1. Introduction
Extreme weather events and long-term climate change effects will have a large impact on the yield of agriculture crops in the future [
1,
2,
3,
4]. A negative impact on major food and feed crops is expected in temperate and tropical regions under the climate change scenario of global warming equal to 2 °C or more compared to the pre-industrial era [
4]. In some regions, the negative effect of increased temperature and occurrence of extreme events (e.g., drought and pests) on crop production might be offset by the carbon dioxide (CO
2) fertilization effect [
4,
5]. However, the magnitude of the CO
2 fertilization effect and the importance of other interacting factors is still unclear [
4,
5]. Considering these scenarios, it will be a challenge to ensure global food security in the future. In Europe, particularly in southern Europe, extreme weather events have already resulted in crop yield losses [
2,
6,
7]. The extreme drought of 2018 resulted in a drop of 8% in European cereal harvest compared to the five-year average [
6]. Years with extreme heat waves between 1964 and 2007 resulted in national cereal production deficits of 9.1% worldwide [
8].
In order to guarantee food supply, it is crucial to monitor crop yield and understand how weather variability and weather extremes influence crop yield. Accurate and timely crop yield data are indispensable for decision making, as reliable estimations of yield changes are required for strategic planning [
9]. Currently, crop yield statistics are predominantly used to monitor and evaluate the causes of crop yield changes. However, these crop yield statistics have their limitations as the spatial resolution and temporal frequency of these data are limited. Data are, for example, often only available at county or regional levels and assessed yearly [
10].
Remotely sensed vegetation indices (VIs) have a large scope to fill the crop yield data gap [
11]. VIs capture various properties of the Earth’s vegetation by combining the reflectance of specific spectral bands and can be used to monitor crop performance. An advantage of VIs, compared to crop yield statistics, is that information of the crop can be delivered at a high spatio-temporal resolution. The spatial resolution of VIs can be as high as a few meters and some VIs are available at a daily resolution. There is, however, often a tradeoff between the spatial and temporal resolution of VIs. Time series of VIs enable us to evaluate crop development throughout the growing season. In addition, remotely sensed VIs are globally available, allowing us to monitor crop yield at the field level as well as at the global level. Several studies have already demonstrated the possibility to use VIs to monitor yield at different geographical locations and at different scales i.e., from the field to global level [
12,
13,
14,
15,
16,
17]. One of the VIs that is often and successfully used to estimate crop yield is the Normalized Difference Vegetation Index (NDVI) [
14,
18,
19].
Deriving crop yield from VIs can be performed using a mechanistic approach or an empirical approach [
20]. When a mechanistic approach is applied, growth parameters such as: VI, soil characteristics, and management information are used to run and calibrate plant growth models from which crop yield is derived [
21]. When an empirical approach is used, a VI is related to crop yield by using statistical methods without describing the causality between the VI and crop yield as performed in mechanistic-based models [
14,
22]. A simple linear regression model where a VI is modeled as a function of crop yield is an example of an empirical based model. Whereas empirical-based approaches depend on the availability and quality of the data i.e., VI and yield data, mechanistic approaches rely on the assumptions made in the models [
20]. The main limiting factor of empirical crop yield models is that the models cannot be easily extrapolated to other areas [
12,
20].
Wheat is one of the most important cereal crops in Europe, in 2017 wheat constituted 52% of the total harvested cereal production in Europe [
23]. European wheat production is also important on a global scale, as Europe accounted for 32.6% of the global wheat production for the period from 1994 to 2017, and wheat yield in Europe is high (i.e., in 2017 wheat yield in Europe was 44 kg/ha compared to 35 kg/ha globally) [
23]. However, the occurrence of adverse weather conditions in European wheat production areas is expected to increase due to climate change which will lead to a larger wheat yield variability in Europe [
1,
2,
24,
25,
26]. This advocates wheat yield monitoring in Europe. The aim of this paper is to (i) evaluate the possibility to use NDVI time series to estimate yields and (ii) determine which weather variables impact yield changes. We tested the methodology for spring and winter wheat yield in Latvia. NDVI series at the field level were combined with crop yield statistics at the regional level from 2014 until 2018 using an empirical modelling strategy. In addition, we explored which weather variables impact wheat yield changes in Latvia, using output from the Regional Climate Models ALARO-0 and REMO2015 (henceforth REMO), and subsequent correlation analysis between weather variables, wheat yield, and NDVI.
3. Results
3.1. NDVI Series of Winter and Spring Wheat
The average NDVI series from 2014 to 2018 for the studied winter and spring wheat fields in Latvia are visualized in
Figure 4. For winter wheat, the NDVI peak was around mid-May, while for spring wheat the NDVI peak was later in the growing season—around mid-June. For both winter and spring wheat, the NDVI dropped around the end of August which coincided with the harvest time of winter and spring wheat in Latvia.
3.2. aNDVI and Crop Yield Statistics
The NDVI series used to calculate the aNDVI for spring and winter wheat for each of the five NUTS 3 regions are visualized in
Figure 5. The Zemgale region had the highest number of wheat fields, whereas the Latgale region had the lowest number of wheat fields. The number of fields used for the calculation of aNDVI for each year for spring and winter wheat is presented in
Figure 6. Note that the number of fields in the years 2014 and 2018 was lower compared to the other years. The number of winter wheat fields was higher than the number of spring wheat fields, except in 2014. This is because a large proportion of winter wheat fields were damaged by frost due to a cold spell in mid-January in 2014. These fields were subsequently re-sown with spring wheat [
38].
In
Figure 7, the yield at the regional level, as reported by the Central Statistical Bureau of Latvia for spring and winter wheat for the years from 2014 to 2018, is visualized. For spring wheat, a low yield was reported in 2018. For winter wheat there was a low yield reported in the years 2014 and 2018 (
Figure 7).
Similar patterns were visible when the aNDVI calculated for the studied fields was plotted for each year for spring and winter wheat (
Figure 8). Nonetheless, the variation in aNDVI between the years (
Figure 8) was not as pronounced as compared to the variation in yield between the years (
Figure 7).
3.3. Wheat Yield Empirical Model
The statistics of the coefficients (estimate, standard error, t-value, and
p-value) of the wheat yield empirical linear regression model (Equation (1)) are presented in
Table 1. All the coefficients of the linear regression model were significant (
p < 0.001). The explained variance of the linear regression model was 71% and the residual standard error was 0.55 Mg/ha. When a random forest regression model was used instead of a linear regression model, the residual standard error of the random forest model was equal to 0.58 Mg/ha and the explained variance by the model decreased to 62%. Therefore, the linear regression model performed better. The explained variance of the linear regression model decreased by 23% when the region factor was not included in the model. The AIC increased from 3400 to 4574 when region was not included in the model. When neither crop nor region factors were included in the model, the explained variance decreased by 27%—from 71% to 44%. In addition, the AIC increased from 3400 to 4724 when neither crop nor region were included in the model. Allowing the slope of the linear regression model to change depending on the region, i.e., including an interaction term between aNDVI and region, did not result in the improvement of the model accuracy and resulted in non-significant parameter estimates. The proposed linear regression model with the factor crop, indicating if the crop is spring or winter wheat, and the factor region, indicating in which region the field is located, explains the wheat yield variability well. In
Figure 9, the yield, as predicted by the empirical linear regression model, is plotted versus the observed yield for spring and winter wheat. The adjusted R
2 (R
2 = 0.71) and the residual standard error of the linear regression model fit (0.55 Mg/ha) resulted in good estimates of the model parameters (
Table 1). However, the linear regression model overestimated the yield of winter wheat slightly in the years 2014 and 2018 (
Figure 9) and slightly underestimated the 2015 yield of spring wheat.
Figure 9 indicates that the overestimation of winter wheat yield by the linear regression model is similar in all regions in 2014. However, in 2018 the overestimation of winter wheat yield was mostly for the fields located in the Kurzeme region. The underestimation of spring wheat yield in 2015 was similar in all regions.
3.4. Effect of Tmin and Tmax on Wheat Yield
In
Figure 10, the Pearson correlation between the monthly means of temperature (Tmax and Tmin) and aNDVI, and the monthly means of temperature (Tmax and Tmin) and yield for spring wheat is visualized for the months from January to August. The most negative correlation between monthly means of temperature and yield, and temperature and aNDVI, was found in June, indicating that high temperatures in June resulted in lower aNDVI and yield values for spring wheat. In the month of March, there was a high positive correlation between Tmin and yield, and Tmax and yield. The correlations between aNDVI and Tmin, and aNDVI and Tmax of March, showed similar high positive correlations, indicating that high temperatures in March resulted in high aNDVI and yield values for spring wheat.
In
Figure 11, the Pearson correlation between the monthly means of temperature (Tmax and Tmin) and aNDVI, and the monthly means of temperature (Tmax and Tmin) and yield for winter wheat, is visualized for the months from January to August. The most negative correlation between temperature and aNDVI, and temperature and yield, was observed in July, indicating that high temperatures in July resulted in lower aNDVI and yield values for winter wheat. The most positive correlations between temperature and yield, and between temperature and aNDVI, were observed in January.
In
Figure 12, wheat yield is plotted as a function of maximum temperature in June and March, and July and January, for spring and winter wheat yield, respectively. These are the two months which showed the strongest correlation with spring and winter wheat. The highest spring wheat yields were found in the years with low temperatures in June and high temperatures in March. The highest winter wheat yields were found in the years with low temperatures in July and high temperatures in January.
3.5. Regional Wheat Yield Modelling
For winter wheat, the modeled yield was overestimated in 2014 in all regions and in 2018 for fields located in the Kurzeme region (
Figure 9 and
Figure 13). The years with the lowest yields were also 2014 and 2018 (
Figure 7). This suggests that aNDVI did not capture one or more factors affecting low winter wheat yields observed in 2014 and 2018. The correlation of the monthly means of temperature and yield for winter wheat suggests that high temperatures in July resulted in lower winter wheat yields (
Figure 11). The highest July temperatures were observed in 2014 and 2018 (
Figure 13), but their negative impact on winter wheat yield did not show in aNDVI. This might explain the overestimation of modeled winter wheat in the years 2014 and 2018. Considering this observation, the wheat yield model could improve by adding explanatory variables which are known to affect wheat yield such as Tmin or Tmax of July. Indeed, when Tmax of July from the ALARO-0 model or REMO model were added to the linear regression model the explained variance increased by 10%—from 71% to 79% and 11%—from 71% to 80%, respectively. The AIC of the wheat linear regression model decreased from 3400 to 2615 for the ALARO-0 model and to 2702 for the REMO model, when including the July maximum temperatures. The residual standard error decreased from 0.55 to 0.47 Mg/ha and to 0.46 Mg/ha, respectively, when Tmax of July from ALARO-0 or REMO were added to the linear regression model. However, when July maximum temperatures were included in the model, the explained variance by the random forest model was higher compared to the linear regression model. When a random forest modeling approach was used, the explained variance was equal to 97% for the ALARO-0 model and 94% for the REMO model, when including July maximum temperatures. The residual standard error was equal to 0.17 Mg/ha for the ALARO-0 model and 0.23 Mg/ha for the REMO model, when including July maximum temperatures in the random forest models.
Adding explanatory variables which are known to affect wheat yield also improved the explained variance of the individual spring and winter wheat linear regression models. The explained variance of the winter wheat linear regression model increased from 59% to 77% for the ALARO-0 model and to 84% for the REMO model, when including the July maximum temperatures. The AIC of the winter wheat linear regression model decreased from 2690 to 1794 for the ALARO-0 model and to 1258 for the REMO model, when including July maximum temperatures. Both linear regression models indicated that July maximum temperatures had a negative effect on winter wheat yield. The residual standard error decreased from 0.57 to 0.43 Mg/ha and to 0.36 Mg/ha, respectively, when July maximum temperatures from ALARO-0 or REMO models were added to the winter wheat linear regression model. Likewise, the explained variance of the spring wheat linear regression model increased from 59% to 83% for the ALARO-0 model and to 77% for the REMO model, when including the June maximum temperatures. The AIC of the spring wheat linear regression model decreased from 661 to 246 for the ALARO-0 model and to 387 for the REMO model, when including the June maximum temperatures. Both linear regression models indicated that June maximum temperatures had a negative effect on spring wheat yield. The residual standard error decreased from 0.48 to 0.31 Mg/ha and to 0.36 Mg/ha, respectively, when June maximum temperatures from ALARO-0 or REMO models were added to the spring wheat linear regression model.
The explained variance was higher when a random forest regression modelling approach was used for both the winter and spring wheat model when including July and June maximum temperatures, respectively. For the winter wheat random forest model, the explained variance was equal to 98% for the ALARO-0 model and to 97% for the REMO model, when including July maximum temperatures. The residual standard error was equal to 0.12 Mg/ha for the ALARO-0 model and to 0.16 Mg/ha for the REMO model, when including July maximum temperatures in the random forest winter wheat model. For the spring wheat random forest model, the explained variance was equal to 92% for the ALARO-0 model and to 93% for the REMO model, when including June maximum temperatures. The residual standard error was equal to 0.21 Mg/ha for the ALARO-0 model and to 0.20 Mg/ha for the REMO model, when including June maximum temperatures in the random forest spring wheat model.
4. Discussion
The statistics of the empirical models showed that there is a scope to use 10-daily NDVI series with a 300 m resolution provided by satellites with a PROBA-V configuration to estimate spring and winter wheat yield in Latvia (
Table 1 and
Figure 9). The variance explained by the wheat linear regression model decreased by 22% when the factor region was not included as a predictor factor in the wheat yield model. This is likely related to the fact that yield data (the response variable) reflect the regional environment. Wheat yield variability may not be fully captured by aNDVI and may need additional information, such as, for example, differences in soil characteristics, used wheat varieties, and management practices between the regions, although the analysis by [
14] suggests that NDVI time series are sufficient to estimate yields on a parcel to regional scale.
In general, the model performance is in the same range as that of other studies using PROBA-V NDVI data to predict wheat yield [
14,
21,
22]. Zheng, Y. et al. used NDVI to estimate the aboveground biomass with a light use efficiency model [
21], and subsequently predicted wheat yield. Durgun, Y.Ö. et al. fitted an asymmetric double sigmoid model on the NDVI series and used the integration of the fitted function as a predictor for wheat yield [
14]. Compared to these studies, our methodology of computing the integral of the NDVI series (i.e., aNDVI) as a predictor of wheat yield is straightforward. However, a prerequisite is the availability of continuous NDVI series. Methodologies which combine different remotely sensed data—such as vegetation indicators and microwave vegetation optical depth data—can be used to estimate crop yield in regions where continuous NDVI series are scarce due to the frequent presence of clouds [
17]. The developed wheat modelling method can be used in other wheat producing areas. In addition, NDVI, aNDVI, and meteorological variables may capture only part of the important wheat yield-influencing variables. In other regions’ local soil fertility conditions, the effects of variety and weather extremes may explain a much larger proportion of the yield variation. The explained variance by the random forest regression models that included RCM data as a predictor variable was higher compared to the linear regression models. Therefore, in regions where RCM data are available, a random forest regression modeling approach is recommended for estimating regional wheat yields.
Durgun, Y.Ö. et al studied whether better wheat yield estimations could be obtained with thermal instead of calendar time for the integration of PROBA-V NDVI series at spatial resolutions of 100 m, 300 m, and 1 km [
14]. Thermal time outperformed calendar time for estimating winter wheat, but a notable exception occurred for fields with a pixel purity above 65% at 300 m spatial resolution. However, due to low sample size at 300 m resolution no general conclusion could be made by [
14]. The present study seems to confirm the observation of [
14], since we obtained good wheat yield estimations with pure pixel NDVI data at 300 m resolution projected on calendar time (i.e., aNDVI). The use of thermal time did not improve wheat yield estimations. The explained variance decreased by 4% when thermal time instead of calendar time was used in the linear regression wheat yield model.
In general, the variation in correlation from January to August between monthly means of temperature and wheat yield on the one hand, and monthly means of temperature and aNDVI on the other hand, were quite similar.
The variation in correlation of monthly mean temperature and winter and spring wheat yield and aNDVI between the different months from January to August indicated that temperature during some months influenced the yield and aNDVI more than others. For spring wheat, high temperatures in June resulted in lower yield and lower aNDVI, while for winter wheat the same pattern was found in July. The negative effect of temperature in June (for spring wheat) and July (for winter wheat) on aNDVI can be linked to important phenological stages of winter and spring wheat. This observation indicates that the variability in wheat yield in Latvia is likely to increase in the future. Indeed, the average temperature as well as the occurrence of extremely hot days will further increase due to climate change. The negative effect of high temperatures on wheat yield in June (for spring wheat) and July (for winter wheat) confirms similar observations in Europe [
39,
40] and should, therefore, be considered in climate change adaptation strategies [
2,
26,
41]. From the observed positive correlation between temperature in January and winter wheat yield, we could conclude that projected increases in winter temperatures could have a positive effect on winter wheat yield. However, dehardening (i.e., loss of freezing tolerance) of wheat is also triggered by higher temperatures [
42] and may negatively impact crop performance. Winter wheat growing at high temperatures in January generally loses its freezing tolerance and will be more susceptible to low temperatures in spring [
43]. In addition, in the case of, due to global warming, both January and July temperatures increase, the positive effect of high January temperatures on winter wheat yield could be offset by the negative effect of high July temperatures on winter wheat yield. A similar remark can be made for spring wheat, as the positive effect of high March temperatures on spring wheat yield is likely to be offset by the negative effect of high June temperatures on spring wheat yield. These phenomena are already visible at present. In years with high temperatures for both March and June the spring wheat yield is moderate, and in years with high temperatures for both January and July, winter wheat yield is moderate (
Figure 12).
5. Conclusions
Wheat yield was estimated from NDVI time series for spring and winter wheat in Latvia. The wheat yield was derived from the integral of the NDVI series (i.e., aNDVI) using a simple methodology which can be applied to other regions and crops where continuous NDVI series are available. We can conclude that adding (i) region as a factor, (ii) both the region and crop type, and (iii) explanatory weather variables from regional climate models (RCM) to the model improved wheat yield prediction. The importance of adding region as a factor to the model might be explained by regional differences in the dominant Reference Soil Group. Allowing the slope of the model to change depending on the region did not improve wheat yield prediction. Our results suggest that there is a large scope to use NDVI-derived crop yield estimates, such as aNDVI, and that RCM output plays a valuable role in understanding regional weather impacts on yield. In the case that RCM data are available, a random forest model approach is recommended to model regional wheat yields. The high spatial extent of the VI-derived crop yield estimates is an important benefit compared to the commonly used crop yield statistics that are often only available at the regional level. Likewise, the spatio-temporal availability of the regional climate model output is superior to sparsely available climate stations. Using aNDVI and the RCMs ALARO-0 and REMO, we found that high temperatures during June (for spring wheat) and July (for winter wheat) influenced the wheat yield negatively in Latvia. High temperatures in March and January had a positive effect on spring and winter wheat yield, respectively. However, this positive effect of high temperatures was found to be offset when temperatures were high in both March and June, and January and July for spring and winter wheat, respectively. This information should be considered in weather impact analysis and climate change adaptation strategies such as, for example, wheat breeding programs.