How to Choose Between NASA Harvest Crop Yield Models
NASA Harvest produces crop yield estimates for a range of uses, from regional food security monitoring to field-level decision support. There is no single model that works best across all of these cases. The choice depends on what data are available, the scale of the analysis, and how early an estimate is needed.
Some models rely on long records of historical yields and perform well where those data are reliable. Others focus on how crops develop during the season and can provide earlier signals of change, or simulate crop growth directly, making it possible to estimate yields even where historical data are limited. These differences reflect tradeoffs, not incremental improvements.
NASA Harvest uses three primary yield models (GEOCIF, ARYA, and VeRCYe) that represent these distinct approaches. They vary in how they use satellite data, what inputs they require, and the conditions under which they perform best. Choosing among them depends on the specific application, including crop type, climate, and data availability.
This post explains how each model works and where it is most useful, with the goal of making those tradeoffs explicit.
Key Features and Applications of NASA Harvest’s Three Main Yield Models
Global Earth Observations for Crop Inventory Forecasting (GEOCIF)
Ritvik Sahajpal, NASA Harvest scientist (University of Maryland) and creator of the GEOCIF yield model.
GEOCIF estimates crop yields by learning from the past. It takes historical yield records and pairs them with satellite and weather data to identify patterns that repeat over time. Instead of relying on fixed agronomic rules, it uses a machine learning model to learn which combinations of temperature, rainfall, vegetation signals, and extreme events tend to produce higher or lower yields in a given region (Sahajpal et al., 2020).
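The idea of learning yield from past seasons can be illustrated with a very small sketch. This is not GEOCIF's actual model or feature set: it swaps in a simple nearest-neighbour regression, and the feature columns, the `HISTORY` table, and the function names are all hypothetical, with invented numbers.

```python
import math

# Hypothetical training table for one region, one row per past season.
# Features: (mean temperature in C, seasonal rainfall in mm, peak NDVI).
# Target: reported yield in t/ha. All values are invented for illustration.
HISTORY = [
    ((21.0, 520.0, 0.78), 9.8),
    ((22.5, 410.0, 0.70), 8.6),
    ((24.0, 300.0, 0.58), 6.1),
    ((21.5, 480.0, 0.75), 9.4),
    ((23.0, 350.0, 0.63), 7.2),
    ((20.5, 560.0, 0.80), 10.1),
]


def feature_scaling(rows):
    """Return per-feature means and standard deviations."""
    n_features = len(rows[0])
    means = [sum(r[i] for r in rows) / len(rows) for i in range(n_features)]
    stds = [
        math.sqrt(sum((r[i] - means[i]) ** 2 for r in rows) / len(rows)) or 1.0
        for i in range(n_features)
    ]
    return means, stds


def predict_yield(history, features, k=3):
    """Average the yields of the k past seasons most similar to `features`."""
    means, stds = feature_scaling([f for f, _ in history])

    def scale(f):
        # Put every feature on a comparable scale before measuring distance.
        return [(v - m) / s for v, m, s in zip(f, means, stds)]

    target = scale(features)
    ranked = sorted(history, key=lambda row: math.dist(scale(row[0]), target))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# A hot, dry season versus a cool, wet one (both hypothetical).
dry_year = predict_yield(HISTORY, (23.8, 310.0, 0.60))
wet_year = predict_yield(HISTORY, (20.8, 540.0, 0.79))
```

The sketch also shows why the historical record matters: with only six seasons, a year unlike any in the table would still be matched to its closest neighbours, however poor the match.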
This approach works well when the data are strong. GEOCIF performs best in places with long, consistent yield records and a clear relationship between what satellites observe and what farmers harvest. As lead developer Dr. Ritvik Sahajpal puts it, “GEOCIF works best with accurate administrative level statistics with a long history that span the full range of regional climate conditions (e.g., severe drought) and a clear relation of satellite signal to yield production - pests are challenging, for example.”
When those conditions are met, GEOCIF can produce reliable forecasts at the scale of districts, states, or countries, often one to three months before harvest.
This dependence on historical data is also a constraint. If yield records are short, inconsistent, or missing, the model has little to learn from. In those cases, performance drops, especially where yield is driven by factors that are not well captured in satellite data.
GEOCIF has been applied across a wide range of crops and regions, from U.S. maize to wheat in Argentina and cereals in parts of Africa, and it is used in systems like the GEOGLAM Crop Monitor. The common requirement across these applications is not the crop or geography, but the availability of reliable historical data.
Agriculture Remotely-sensed Yield Algorithm (ARYA)
ARYA also estimates yield from past data, but it focuses on how crops grow over the season rather than learning from a large set of climate indicators. It tracks the timing and shape of crop development using satellite vegetation signals and temperature, then links that growth pattern to final yield (Franch et al., 2021). In practice, this means looking at when crops reach their peak growth and how long that growth lasts, and using those features to estimate outcomes.
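A minimal sketch of this idea, assuming a single season of NDVI composites and a handful of past seasons, is shown below. It is not ARYA's published algorithm: temperature is left out for brevity, the half-of-peak threshold is an illustrative choice, and the function names and all numbers are hypothetical.

```python
def growth_features(days, ndvi, threshold=0.5):
    """Summarize a season's vegetation curve by its peak and its duration.

    `threshold` is the fraction of peak NDVI above which the crop is
    counted as actively growing (an assumption made for this sketch).
    """
    peak_value = max(ndvi)
    peak_day = days[ndvi.index(peak_value)]
    active = [d for d, v in zip(days, ndvi) if v >= threshold * peak_value]
    duration = active[-1] - active[0]
    return peak_day, peak_value, duration


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


# Invented NDVI observations every 16 days for the current season.
DAYS = list(range(0, 161, 16))
NDVI = [0.20, 0.25, 0.35, 0.50, 0.65, 0.75, 0.72, 0.60, 0.45, 0.30, 0.22]
peak_day, peak_value, duration = growth_features(DAYS, NDVI)

# Invented past seasons: peak NDVI and reported yield (t/ha).
PAST_PEAKS = [0.60, 0.70, 0.80]
PAST_YIELDS = [2.1, 2.9, 4.0]
intercept, slope = fit_line(PAST_PEAKS, PAST_YIELDS)

# Project this season's yield from its growth pattern.
projected_yield = intercept + slope * peak_value
```

Only the peak value feeds the toy regression here; the peak day and duration are extracted to show the kind of timing features the paragraph describes.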
Belen Franch, NASA Harvest scientist (University of Valencia) and creator of the ARYA yield model.
This design makes ARYA useful earlier in the season. Because it models the full growth trajectory, it can project yields up to about two and a half months before harvest. ARYA is typically used at national or sub-national scales, though it can also run at finer resolution (around 1 km) depending on the satellite data available.
ARYA works best when crop growth follows familiar patterns. As model developer Dr. Natacha Kalecinski notes, it performs best when a season’s weather is close to typical conditions. When conditions shift outside that range, the model still detects that yields are changing, but estimating the size of those changes becomes more difficult.
Like GEOCIF, ARYA depends on reliable historical yield data for training. But compared to GEOCIF, it uses a more constrained set of inputs, centered on crop growth timing rather than a broad suite of climate indicators. That makes it easier to run in some settings, but also limits how much variation it can capture when growing conditions diverge from the norm.
ARYA has been widely applied in large-scale agricultural systems, including the U.S., Europe, and Australia, and is used operationally in systems like the European Space Agency’s WorldCereal. The model is particularly effective at identifying relative gains and losses across regions, even when exact yield levels are harder to pin down.
Versatile Crop Yield Estimator (VeRCYe)
Yuval Sadeh, NASA Harvest scientist (Monash University) and creator of the VeRCYe yield model.
VeRCYe takes a different approach. Instead of learning yield directly from past data, it simulates how crops grow and then checks those simulations against what satellites observe. It runs thousands of possible crop growth scenarios using a process-based model (e.g., APSIM), while varying inputs like weather, soil, and management. It then compares those simulated growth patterns to satellite measurements of Leaf Area Index (LAI), which track how crop canopies develop over time. The simulations that best match the observed growth are kept, and their yields are averaged to produce an estimate (Sadeh et al., 2024).
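The simulate, compare, and average steps can be sketched in a few lines. This is a toy under stated assumptions, not the VeRCYe code: the `simulate` function is a hypothetical stand-in for a process-based model run such as APSIM, the two varied parameters stand in for weather, soil, and management inputs, and the "observed" LAI is synthetic.

```python
import math
from itertools import product

OBS_DAYS = [30, 50, 70, 90, 110, 130]  # days after sowing with an LAI observation


def simulate(peak_lai, peak_day, days):
    """Toy stand-in for one crop model run: a bell-shaped LAI curve and a
    yield (t/ha) that scales with canopy size. Both are assumptions."""
    lai = [peak_lai * math.exp(-(((d - peak_day) / 35.0) ** 2)) for d in days]
    yield_t_ha = 1.1 * peak_lai
    return lai, yield_t_ha


def rmse(a, b):
    """Root mean square error between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def estimate_yield(observed_lai, days, keep=5):
    """Run an ensemble, keep the runs closest to the observed LAI,
    and average their simulated yields."""
    runs = []
    for peak_lai, peak_day in product(
        [1.0 + 0.5 * i for i in range(9)],  # peak LAI from 1.0 to 5.0
        range(60, 121, 10),                 # peak day from 60 to 120
    ):
        lai, simulated_yield = simulate(peak_lai, peak_day, days)
        runs.append((rmse(lai, observed_lai), simulated_yield))
    runs.sort(key=lambda run: run[0])
    best = runs[:keep]
    return sum(y for _, y in best) / len(best)


# Synthetic "satellite" LAI generated from a known run, then recovered.
observed, true_yield = simulate(3.0, 90, OBS_DAYS)
estimate = estimate_yield(observed, OBS_DAYS)
```

Note that no historical yield record appears anywhere in the sketch: the estimate comes entirely from matching simulated canopy development to observed canopy development.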
This structure changes what the model needs to operate. Unlike GEOCIF and ARYA, VeRCYe does not require historical yield data for calibration. As model developer Dr. Yuval Sadeh explains, it can operate “without direct calibration to ground-based yield data,” which makes it useful in regions where those data are sparse or unavailable.
Because it is grounded in crop growth processes, VeRCYe can also handle conditions that fall outside the historical record, such as extreme droughts, where empirical models tend to struggle. At the same time, this approach introduces its own constraints. The model depends on accurate inputs for weather, soil, and management, and its performance varies across scales. While it has shown strong accuracy at field scale in some settings, such as wheat in Australia, large-scale performance is still improving and has not yet reached the same level of accuracy.
VeRCYe was first developed and validated for rainfed wheat in Australia. Since then, it has been extended to other crops, including maize and soybean, and applied across regions such as Ukraine, Kazakhstan, Ethiopia, Malawi, France, and the United States. The model can produce yield estimates at multiple spatial scales, from individual fields to country-level assessments, depending on the resolution of the satellite data used. This flexibility, combined with VeRCYe’s ability to operate without yield training data, makes it particularly useful in data-sparse or rapidly changing agricultural systems.
Choosing Between Models
The three yield models are designed for different conditions. The choice is thus not about which model is “best” but about which one fits the data and agricultural context.
If reliable historical yield data are available, GEOCIF is often the most effective option. It uses those records to learn long-term relationships between climate, vegetation signals, and yield, and performs best where those relationships are stable and well-documented.
If the goal is to generate an early signal during the growing season, ARYA provides a more direct approach. By modeling the timing and shape of crop development, it can project yields up to about two and a half months before harvest. This makes it useful for tracking how a season is unfolding, even if exact yield levels are harder to estimate under unusual conditions.
If historical yield data are limited or unavailable, VeRCYe becomes more useful. Because it does not require calibration to ground-based yield data, it can be applied in data-sparse regions or in settings where conditions fall outside the historical record. Its process-based structure also allows it to capture crop responses under extreme conditions that are difficult for empirical models to represent.
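The three rules of thumb above can be written down as a small helper. This is a deliberate simplification for illustration, with hypothetical function and parameter names; it is not an official NASA Harvest selection procedure, and a real choice would also weigh crop type, scale, and input quality.

```python
def suggest_models(reliable_yield_history, want_early_signal=False,
                   extreme_conditions=False):
    """Return candidate models in the order the rules of thumb suggest them."""
    suggestions = []
    if reliable_yield_history:
        # Empirical models need past yields to train on.
        suggestions.append("GEOCIF")
        if want_early_signal:
            suggestions.append("ARYA")
    if not reliable_yield_history or extreme_conditions:
        # The simulation-based model needs no yield calibration data.
        suggestions.append("VeRCYe")
    return suggestions


data_rich = suggest_models(reliable_yield_history=True, want_early_signal=True)
data_sparse = suggest_models(reliable_yield_history=False)
```

The helper can return more than one model, which matches how the models are used together in practice.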
In Practice
Each of these models reflects a different way of estimating yield, with different data requirements and assumptions. In practice, the choice depends on the context: the availability of historical data, the scale of analysis, and how early an estimate is needed.
In high-stakes settings, NASA Harvest often runs multiple models in parallel. Agreement across models can strengthen confidence in emerging signals, while differences can highlight where assumptions or data limitations matter most.
To support this, NASA Harvest’s Christina Justice is leading the development of a NASA Harvest Yield Intercomparison Dashboard that enables side-by-side evaluation of GEOCIF, ARYA, VeRCYe, and external models (e.g., CAPE from the University of California, Santa Barbara Climate Hazards Center and yield models from the European Commission Joint Research Centre). As Justice notes, the goal is to make it easier to assess model performance across regions and applications, and to identify areas of agreement that can inform near real-time agricultural monitoring.
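A simple way to picture a side-by-side comparison is to measure how far apart the model estimates are for the same region and season. The sketch below is an assumption about what such a check could look like, not a description of the dashboard; the 10% tolerance, the function name, and the yield values are hypothetical.

```python
from statistics import mean


def compare_estimates(estimates, tolerance=0.10):
    """Summarize agreement among model estimates for one region.

    `estimates` maps model name to yield in t/ha. Models are said to
    agree when the range of estimates is within `tolerance` of their mean.
    """
    values = list(estimates.values())
    centre = mean(values)
    relative_spread = (max(values) - min(values)) / centre
    return {
        "mean": centre,
        "relative_spread": relative_spread,
        "agree": relative_spread <= tolerance,
    }


# Invented estimates for two regions.
consistent = compare_estimates({"GEOCIF": 3.0, "ARYA": 3.1, "VeRCYe": 2.9})
divergent = compare_estimates({"GEOCIF": 3.0, "ARYA": 3.2, "VeRCYe": 2.2})
```

A region flagged as divergent is not necessarily a failure of any one model; it marks a place where the differing assumptions and data requirements described above deserve a closer look.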
For a more detailed description of yield models and forecasts, please see the recently published chapter authored by NASA Harvest researchers in the Comprehensive Remote Sensing textbook (Becker-Reshef et al., 2026).
This article was written and edited by a team of NASA Harvest researchers including Michael Cecil, Ella Kirchner, Elinor Benami, Ritvik Sahajpal, Belen Franch, Natacha Kalecinski, Italo Moletto-Lobos, Yuval Sadeh, and Christina Justice.