Field Manual

EO-001

Why Geospatial Intelligence Resists General-Purpose AI

The most effective geospatial AI systems are not the largest or most general — they encode domain knowledge into their structure

The Universality Problem

Modern machine learning is built on an assumption of transferability. Train a large enough model on enough data, and it will generalize. This works remarkably well for language, where a sentence in English follows roughly the same grammar whether it was written in London or Lagos. It works well for certain classes of image recognition, where a cat is a cat regardless of the camera that photographed it.

It does not work for the Earth.

The Earth's surface is non-stationary. What is true at one location is not true at another, not because of noise or insufficient data, but because the underlying processes are genuinely different. The relationship between precipitation and vegetation depends on soil type, elevation, latitude, land use history, and a dozen other factors that vary continuously across space. A model that learns this relationship in Iowa will make confident and wrong predictions in Senegal.

This is not a data problem. It is a structural one. The technical term in geography is spatial heterogeneity, the recognition that statistical relationships vary across space. Geographically Weighted Regression was developed in the 1990s precisely because ordinary regression, which assumes a single global relationship, systematically failed when applied to spatial data. The same insight now applies to neural networks, but the machine learning community has been slower to absorb it.
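A minimal sketch of the GWR idea, assuming a Gaussian distance kernel and an illustrative bandwidth: instead of one global fit, solve a separate weighted least-squares problem at each location, with observation weights that decay with distance from that location.

```python
# Sketch of Geographically Weighted Regression (GWR): a local linear
# model at each target location, weighted by a Gaussian distance kernel.
# Bandwidth and function names are illustrative, not from any library.
import numpy as np

def gwr_coefficients(X, y, coords, target, bandwidth=1.0):
    """Fit a local linear model at `target` using distance-decayed weights."""
    d = np.linalg.norm(coords - target, axis=1)      # distance of each sample to target
    w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
    Xb = np.hstack([np.ones((len(X), 1)), X])        # add intercept column
    W = np.diag(w)
    # Weighted normal equations: (X' W X) beta = X' W y
    beta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return beta                                      # [intercept, slopes...]
```

Fitting this at two distant locations on spatially heterogeneous data yields genuinely different coefficients, which is exactly the point: the "one global relationship" assumption of ordinary regression is what fails.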

Chapter 8 of the GeoAI Handbook addresses this directly. Xie et al. examine how ignoring spatial heterogeneity not only degrades prediction accuracy but introduces systematic unfairness: models that perform well in data-rich regions and fail in data-poor ones. This is not a theoretical concern. It is the lived reality of anyone who has tried to apply a model trained on Sentinel-2 imagery of European farmland to smallholder agriculture in sub-Saharan Africa.


What the Sensors Actually Produce

Before the AI even begins, there is a prior question that most machine learning pipelines ignore: what, exactly, is the data?

A Sentinel-2 image is not a photograph. It is a set of measurements of electromagnetic reflectance in thirteen spectral bands, captured by a pushbroom scanner moving at roughly 7.5 kilometers per second, geometrically corrected to a map projection, and delivered as calibrated top-of-atmosphere reflectance or, after atmospheric correction with radiative transfer models, bottom-of-atmosphere reflectance. Each of those processing steps encodes assumptions about the physics of light, the composition of the atmosphere, and the geometry of the Earth's surface.

A Sentinel-1 SAR image is something else entirely. It is a measurement of microwave backscatter, the return signal from radar pulses transmitted by the satellite. The physics is different. The information content is different. The noise characteristics are different. A SAR image of a forest tells you about structure and moisture content. An optical image of the same forest tells you about chlorophyll concentration and canopy reflectance. They are not two views of the same thing. They are two measurements of fundamentally different physical properties.

This matters because general-purpose AI architectures (convolutional neural networks, vision transformers, foundation models) treat their inputs as arrays of numbers. They do not know that band 4 of a Sentinel-2 image measures red reflectance while band 8 measures near-infrared. They do not know that the relationship between those two bands is governed by the physics of chlorophyll absorption. They do not know that a SAR backscatter value of −12 dB over water means something completely different from −12 dB over a parking lot.

Domain-specific systems encode this knowledge. They build the physics into the architecture — not as a constraint that limits what the model can learn, but as a prior that helps it learn the right things faster and with less data.
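A small, standard illustration of such a physics-derived prior (a textbook remote sensing index, not a construction from the sources above): the Normalized Difference Vegetation Index combines band 4 and band 8 precisely because chlorophyll absorbs red light while healthy leaf structure strongly scatters near-infrared.

```python
# NDVI: a hand-encoded feature that bakes chlorophyll physics into the
# pipeline, rather than asking a generic network to rediscover it.
import numpy as np

def ndvi(red, nir, eps=1e-6):
    """Normalized Difference Vegetation Index from reflectance arrays.

    red -- band 4 (red) reflectance, nir -- band 8 (near-infrared).
    Returns values in roughly [-1, 1]; higher means denser vegetation.
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero
```

One line of domain knowledge replaces thousands of training examples a generic model would need to learn the same contrast.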

Heinz von Foerster, writing about cybernetics and perception in the 1970s, made a point that applies here with uncomfortable precision. He argued that perception is not passive reception; the sensor does not simply record what is there. Perception is an active process of construction, shaped by the structure of the observer. What a system can see is determined by how it is built to look.

A general-purpose neural network looking at satellite imagery is like von Foerster's hypothetical observer with no framework for interpreting what it sees. It can find statistical patterns. It cannot understand what those patterns mean in physical terms. And when the patterns shift (because the geography changed, or the season changed, or the sensor changed), it has no basis for adaptation.


Federated Learning and the Locality of Knowledge

There is a deeper version of the heterogeneity problem that emerges when you try to train models across distributed sensor networks.

Consider a network of 144 weather stations spread across Indiana, each recording temperature, humidity, wind speed, soil moisture, and twenty other variables at fifteen-minute intervals. A standard approach would be to pool all the data, train a single model, and deploy it back to each station. This is what centralized machine learning does. It assumes that more data produces better models.

Recent work on federated learning for soil moisture prediction tells a different story. When lightweight convolutional neural networks (approximately 800 parameters) are trained locally at each station and only their model weights (never the raw data) are shared, the federated system achieves a mean absolute error within one centibar of the centralized model. A local model with ten times fewer parameters nearly matches the global model that saw all the data.

This is not a failure of centralization. It is evidence that environmental data has a fundamentally local character. The relationships between predictors and outcomes are not identical across a sensor network; they are shaped by local soil composition, microclimate, topography, and land use. A model that tries to learn one universal function across all stations is fighting the data's own structure.

The federated approach respects that structure. Each node learns its local relationships. The aggregation step combines those local models into a global understanding without erasing what makes each locality distinct. This is a principled response to the non-IID (non-independently and identically distributed) nature of spatial data, a property that the machine learning community often treats as a nuisance to be corrected, but which is in fact the most important signal in the dataset.
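A minimal sketch of this train-locally, aggregate-globally pattern, in the spirit of federated averaging (FedAvg). A toy linear model stands in for the paper's CNN; the function names, learning rate, and epoch count are illustrative assumptions.

```python
# Federated averaging sketch: each node runs local gradient descent and
# shares only its weights; the server averages, weighted by sample count.
# Raw data never leaves the node.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One node: a few gradient-descent steps on a local linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

def fedavg(global_w, node_data):
    """Server: aggregate local updates, weighted by local sample counts."""
    updates, counts = [], []
    for X, y in node_data:
        updates.append(local_update(global_w, X, y))
        counts.append(len(y))
    counts = np.asarray(counts, dtype=float)
    return np.average(updates, axis=0, weights=counts)
```

In the real non-IID setting, the interesting cases are exactly where the nodes' local optima differ, which is why personalization on top of the averaged model matters.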

The security implications are worth noting as well. When raw data never leaves the sensor node, the attack surface for data poisoning is fundamentally different. Work on Byzantine-resilient federated learning has demonstrated that even with 50% of nodes compromised and submitting deliberately poisoned model updates, systems that use cosine similarity filtering, multiplicity checking, and committee-based verification can maintain model accuracy above 85%, compared to near-complete failure for naive averaging approaches. The resilience comes not from a central authority but from the distributed structure itself.
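A toy sketch of one of those ingredients, cosine-similarity filtering, using a coordinate-wise median as the reference direction. The reference choice and threshold are assumptions for illustration, not the cited paper's exact protocol.

```python
# Byzantine-resilient aggregation sketch: discard model updates whose
# direction disagrees with a robust (median) reference, then average
# the survivors. Poisoned updates pointing the wrong way get filtered.
import numpy as np

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two update vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def robust_aggregate(updates, threshold=0.0):
    """Keep only updates roughly aligned with the median update."""
    U = np.asarray(updates, dtype=float)
    ref = np.median(U, axis=0)                  # robust reference direction
    keep = [u for u in U if cosine(u, ref) > threshold]
    return np.mean(keep, axis=0) if keep else ref
```

With fewer than half the coordinates controlled by attackers, the median stays anchored to the honest majority, so hostile updates fail the similarity test while naive averaging would be dragged arbitrarily far off.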


Graph Architectures and Spatial Structure

A second line of evidence comes from air quality forecasting, where the spatial relationships between monitoring stations are not just background context but the primary mechanism of the phenomenon being modeled.

PM2.5, particulate matter smaller than 2.5 microns, does not stay where it is generated. It moves. It is carried by wind, blocked by mountains, concentrated in valleys, and dispersed by turbulence. Predicting PM2.5 concentration at a given monitoring station requires knowing not just the local conditions at that station but the conditions at upwind stations, the fires burning within hundreds of kilometers, and the terrain between them.

Graph neural networks are designed for exactly this kind of problem. They represent monitoring stations as nodes in a graph and learn how PM2.5 propagates between them, accounting for wind direction, distance, and elevation differences. A recent GNN-based forecasting system for California achieves a one-hour-ahead MAE of 5.23 μg/m³ across 112 stations, outperforming random forests, LSTMs, and multilayer perceptrons. The advantage is modest in aggregate metrics but dramatic for the cases that matter most: elevated PM2.5 events, where the GNN's spatial inductive bias allows it to capture propagation dynamics that purely temporal models miss.

What makes this architecture interesting is not just its accuracy. It is the way domain knowledge is embedded in the graph construction itself. The GNN only models PM2.5 transport between stations that are within 300 kilometers of each other and differ by less than 1,200 meters in elevation, because the physics of particulate transport does not support transmission across those boundaries. Fire radiative power from satellite observations is aggregated at each station using inverse distance and wind-direction weighting, encoding the physical reality that a fire directly upwind matters more than a fire of equal intensity at the same distance but crosswind.
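The edge-construction constraints described above can be sketched directly. Planar distances and the helper name below are illustrative simplifications, not the paper's implementation, but the thresholds are the ones stated: 300 kilometers and 1,200 meters.

```python
# Domain-constrained graph construction: stations exchange PM2.5 signal
# only if they are close enough horizontally and vertically for
# particulate transport to be physically plausible.
import numpy as np

MAX_DIST_KM = 300.0   # no transport edges beyond this horizontal distance
MAX_ELEV_M = 1200.0   # no edges across larger elevation differences

def build_edges(xy_km, elev_m):
    """Return (i, j) index pairs of stations allowed to share an edge."""
    n = len(xy_km)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(xy_km[i] - xy_km[j])
            if dist <= MAX_DIST_KM and abs(elev_m[i] - elev_m[j]) <= MAX_ELEV_M:
                edges.append((i, j))
    return edges
```

Pruning the graph this way is the inductive bias: the model is structurally incapable of "learning" transport across a mountain range the physics forbids.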

These are not hyperparameters to be tuned. They are physical constraints that define the problem. A general-purpose model would have to discover them from data alone — and might never discover them if the training set does not contain enough variation in wind direction, elevation, and fire location to reveal the relationships.

The GNN framework also enables something that general models cannot do easily: counterfactual simulation. By injecting synthetic fire radiative power values into the graph, representing prescribed burns that have not yet happened, the system can forecast the air quality impact of hypothetical controlled burns at different times of year. This analysis determined that March is the optimal month for prescribed fires in California, and that the short-term PM2.5 increase from controlled burns is dramatically outweighed by the avoided pollution from the wildfires those burns prevent.

This kind of simulation requires a model that understands spatial relationships, not just temporal patterns. It requires architecture that knows what wind direction means, what distance means, what a fire is. General-purpose models can approximate these relationships given enough data. Domain-specific models can reason about them.


The Observer Constructs the Observation

There is a philosophical thread running beneath all of this that connects to how we think about intelligence in Earth observation systems.

Von Foerster's central insight in second-order cybernetics was that the observer is not separate from the observation. The act of measurement is an act of construction. The sensor does not passively record reality; it creates a specific representation of reality, shaped by its spectral range, its spatial resolution, its revisit frequency, its noise characteristics, and the processing chain that transforms raw signals into data products.

This means that the choice of sensor predetermines what phenomena are visible. A multispectral imager can see chlorophyll but not soil structure. A SAR instrument can see soil moisture but not plant health. A thermal sensor can see surface temperature but not subsurface conditions. No single sensor captures "the truth." Each captures a projection of reality through a specific observational lens.

The same principle applies to AI models. A convolutional neural network sees spatial patterns at the scale of its receptive field. A recurrent network sees temporal sequences up to the length of its memory. A graph neural network sees relationships between connected nodes. The architecture determines what the model can perceive, just as the sensor determines what the instrument can measure.

This is why domain-specific AI matters. Not because general-purpose models are bad, but because the choice of architecture is itself an observational decision. When you build a model to analyze satellite imagery, you are choosing what aspects of the data the model can attend to and what it will be structurally blind to. Making that choice with domain knowledge — encoding the physics of the sensor, the geography of the phenomenon, the structure of the observation network — is not constraining the model. It is giving it eyes that can actually see.

Von Foerster distinguished between "trivial machines" — which produce the same output for the same input, every time — and "non-trivial machines," whose output depends on their internal state, which in turn depends on their history. The Earth is the ultimate non-trivial machine. Its response to a stimulus depends on everything that has happened before, at that location and at every connected location. General-purpose AI tries to trivialize this complexity by treating each observation as independent. Domain-specific AI respects the non-triviality.


What This Means for Intelligence Layers

The convergence of these lines of evidence (federated learning that respects data locality, graph architectures that encode spatial structure, domain-specific feature engineering that reflects sensor physics) points toward a particular design philosophy for geospatial intelligence systems.

That philosophy has three pillars.

First, models must be spatially aware. Not as an afterthought, not by adding coordinates as input features to a generic architecture, but structurally. The relationships between locations, the heterogeneity of processes across space, and the physics of how signals propagate through the environment should be encoded in the model's bones.

Second, training must respect data sovereignty. Environmental data is collected by diverse institutions, across jurisdictions, under different regulatory frameworks. A system that requires all data to be centralized before intelligence can be extracted is not just technically suboptimal — it is practically impossible in many of the contexts where geospatial intelligence matters most. Federated and distributed approaches are not compromises. They are the architecture that matches the data's natural structure.

Third, intelligence must be personalized to context. A model trained on temperate forests should not be blindly applied to boreal forests. A change detection algorithm calibrated for urban expansion should not be repurposed for glacial retreat without adaptation. The most useful intelligence layer is one that can be tuned to a specific geography, a specific sensor, a specific application, while still drawing on the broader patterns learned across all contexts.

These are not novel observations. The geospatial research community has been articulating them for years, and the GeoAI Handbook captures the state of the art as of 2023. What remains underbuilt is the infrastructure that would make them operational at scale: systems that federate model training across sensor networks, that encode domain knowledge into production architectures, that personalize predictions to local context without starting from scratch.

That infrastructure is the next frontier. Not bigger models. Better-situated ones.


Further Reading

Handbook of Geospatial Artificial Intelligence (Gao, Hu, Li, eds., 2024) — The first comprehensive handbook on GeoAI. Chapters 3 (Philosophical Foundations), 6 (Spatial Representation Learning), 7 (Spatial Prediction), and 8 (Heterogeneity-Aware Deep Learning) are particularly relevant. CRC Press

"Simulating the Air Quality Impact of Prescribed Fires Using Graph Neural Network-Based PM2.5 Forecasts" (Liao et al., 2025) — Demonstrates GNN-based spatio-temporal forecasting with domain-specific graph construction. Open access. doi:10.1017/eds.2025.4

"Federated Learning for Soil Moisture Prediction" (Zakzouk & Said, 2025) — Shows lightweight CNNs trained in a federated setting matching centralized models on environmental sensor networks with non-IID data distributions.

"DAG-Based Blockchain Sharding for Secure Federated Learning with Non-IID Data" (Lee & Kim, 2022) — Byzantine-resilient distributed model aggregation using hierarchical blockchain architecture. doi:10.3390/s22218263

"Understanding Understanding: Essays on Cybernetics and Cognition" (von Foerster, 1984/2003) — The epistemological foundations of second-order cybernetics. Chapters 7 (Perception of the Future), 8 (On Constructing a Reality), and 11 (Objects: Tokens for Eigen-Behaviors) are essential reading for anyone thinking about how observation systems construct their objects of study. Springer

"Deep Learning and Process Understanding for Data-Driven Earth System Science" (Reichstein et al., 2019) — The landmark Nature paper on integrating domain knowledge with deep learning for Earth science applications.