1. Introduction
Staple food crops are mainly monocotyledons, such as maize, wheat, and rice. Peas (
Pisum sativum L.) are an important dicotyledonous crop with moderate protein and energy content [
1]. Statistical data from the Food and Agriculture Organization in 2021 indicate a dry pea production of 12.40 million tons in 99 countries and a fresh pea production of 20.52 million tons in 86 countries. Worldwide, China has contributed 11.83% and 55.84% of dry and green peas, respectively [
2]. The reduction in arable land in China has exacerbated the conflict between the demand for food and land use policies [
3]. A rapid and effective estimation of pea yield at the field scale is therefore critical for making field decisions and trade policy [
4].
Traditional methods of measuring crop yield fall into two categories. The first relies on ground-based field surveys or expert farmer knowledge to obtain detailed yield data; however, this approach is labor-intensive, and the results often arrive too late to guide field management. The second uses non-destructive techniques (such as measuring the leaf area index and SPAD values) to observe crop morphological characteristics and estimate yield; however, this approach is subjective and challenging to apply over large areas [
5,
6].
Satellite data have become increasingly used for crop yield estimation in recent decades. Battude et al. [
7] combined a simple regression model with data from several satellites to obtain accurate maize yield estimates over large areas. **e and Huang [
8] used MODIS time series data to estimate yields in growing areas by using deep learning methods. However, the application of satellite data in precision agriculture remains limited, owing to the relatively high associated costs, limited flexibility regarding the spatial and temporal resolutions of the data, and the effects of meteorological conditions [
9].
With the development of the low-altitude platform and integration of sensors [
10], a large number of researchers and breeders have turned their attention to acquiring high-temporal- and high-spatial-resolution images using unmanned aerial vehicles (UAVs) [
11,
12,
13]. A growing number of studies in the literature are now using UAV remote sensing images for crop yield estimations. For instance, Peng et al. [
14] used leaf area index data collected with a UAV to estimate maize yields, and they obtained high accuracy. Song et al. [
15] applied UAV data to generate high-spatial-resolution maps, which were then used to estimate yields. Vega et al. [
16] developed a method for estimating sunflower yields using multitemporal images from a UAV system carrying a multispectral sensor (MS) during the growing season. Soybean [
17], sorghum [
18], barley [
19], and cotton [
20] yields have also been successfully estimated using UAV remote sensing data. However, despite these advances, scant research attention has been paid to pea yield estimations based on UAV remote sensing images.
Several sensors have been adopted to collect data for estimating crop yield, including red green blue (RGB) cameras [
21], MS cameras [
22], hyperspectral cameras [
23], and lidar [
24]. Among them, RGB and MS cameras are the most widely favored, owing to their affordable price, simple operation, and easy transport aboard UAVs. Zhang et al. [
25] used RGB images obtained using consumer-grade UAVs to extract the excess green (ExG) color feature, thereby establishing a corn yield estimation model and demonstrating the potential of RGB cameras for yield estimates. Huang et al. [
26] estimated cotton yield based on the ratio vegetation index (RVI) extracted from an MS camera, achieving good results. Héctor et al. [
27] analyzed different vegetation indices, canopy cover, and plant densities using RGB and MS cameras, and applied a neural network model to estimate corn grain yields. Their results showed that RGB and MS cameras can provide a high coefficient of determination (
R2) between the estimated and observed yields, thus allowing corn grain yields to be accurately characterized and estimated.
Machine learning methods have become a major trend in agricultural research for supporting yield estimations, and have been successfully applied to many crops. Li et al. [
28] applied artificial neural network (ANN), support vector machine (SVM), stepwise multiple regression, and random forest (RF) to estimate wheat yields, which indicated that the ANN model obtained higher accuracy than other models. Ashapure et al. [
29] estimated cotton yields through ANN, SVM, and RF models, revealing that the ANN model outperformed both the SVM and RF models. Guo et al. [
30] adopted several models to estimate maize yields, finding that the SVM provided better estimates than the other machine learning methods. Although machine learning techniques have demonstrated their ability to estimate crop traits, relying on a single estimation model may lead to overfitting, especially when dealing with limited training data. In contrast, ensemble learning (EL), which combines multiple learners, typically achieves significantly better generalization performance than a single learner. This has been empirically demonstrated in previous studies on various crops. For example, Ji et al. demonstrated that EL significantly outperforms a single model in estimating faba bean yield [
31]. However, such applications to pea yield estimation have not yet been reported.
Considering the above, the aims of this study are to: (1) evaluate the estimation performance of pea yields obtained using UAV-based RGB, MS, and feature-level fusion data (RGB + MS); (2) compare the pea yield estimation accuracy in five growth stages; (3) explore the performance of EL and base learners in estimating pea yield; and (4) explore and compare the applicability of machine learning methods for two different pea types (cold-tolerant and common peas).
2. Materials and Methods
2.1. Test Design and Pea Yield Measurement
In this study, peas were sowed on 15 October 2019 and data were collected in 2020. The research was conducted at the experimental base of the Chinese Academy of Agricultural Sciences in Xinxiang, Henan province (113°45′38″ E, 35°8′10″ N). The annual average temperature and humidity of Xinxiang are 14 °C and 68%, respectively. The annual average rainfall is 656.3 mm. The period from June to September is the wettest, with an average precipitation of 409.7 mm. The test site had 90 plots with dimensions of 8 m² (4 m × 2 m; length × width), and 30 pea varieties (16 cold-tolerant varieties and 14 common varieties) were planted with three replicates (
Figure 1).
Prior to sowing, the land was thoroughly plowed and harrowed to loosen the soil and encourage root development, and compound fertilizer was applied at a rate of 600 kg·ha⁻¹. The seeds were sown at a depth of 5–8 cm using the row planting method, with a row spacing of 40 cm and a plant spacing of 10 cm. During the growth of the peas, insecticides were sprayed every two weeks after overwintering, and additional fertilizer was applied during the flowering period. Weeds were removed manually to ensure healthy growth. At maturity (27 May 2020), we harvested the plants from each plot separately and conducted threshing and weighing to obtain yield data. The average yield of the plots was 826 kg·ha⁻¹, with a minimum of 100 kg·ha⁻¹ and a maximum of 1420 kg·ha⁻¹. The phenological stages of the cold-tolerant and common peas are consistent, as detailed in
Table 1.
2.2. UAV-Based Image Acquisition and Processing
We adopted a quadcopter DJI Matrice 210 (SZ DJI Technology Co., Shenzhen, China) electric UAV as a low-altitude observation platform. Two sensors, a Zenmuse X7 camera (SZ DJI Technology Co., Shenzhen, China) and a RedEdge-MX camera (MicaSense Inc., Seattle, WA, USA), were mounted on the DJI Matrice 210 for simultaneously collecting high-resolution RGB and MS images.
Table 2 lists detailed information regarding the sensors.
Figure 2 shows the UAV observation platform and two sensors used in this study.
In this study, UAV-based images were collected during five key pea growth periods: branching (7 March 2020), flowering (3 April 2020), podding (14 April 2020), early filling (23 April 2020), and mid filling (30 April 2020). The five flight missions were conducted under cloudless conditions between 11:00 AM and 1:00 PM to minimize disturbances from cloud cover and wind in the acquired images. To obtain high-quality images, all flights were set to a height of 25 m, with 85% forward and 85% side overlap for the RGB and MS cameras. Images of a calibrated reflectance panel were collected both before and after the flights and later used for MS image calibration. Ground control points (GCPs) that remained fixed during the study were also selected in the field. A differential global navigation satellite system was used to record the GCP coordinates with millimeter precision.
In this study, the RGB and MS images were stitched using Pix4Dmapper 4.4.12 (Pix4D SA, Lausanne, Switzerland) software following the procedure described here. The RGB/MS images and GCP coordinates were first imported into the software, and the “Ag RGB” and “Ag Multispectral” modules in Pix4Dmapper were then selected to stitch the RGB and MS images, respectively. For the MS images, the digital number (DN) values were converted to reflectance values using the previously obtained calibrated reflectance panel images. A digital surface model (DSM), digital terrain model (DTM), and orthomosaic of the test site were then obtained in .tif format.
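The panel-based DN-to-reflectance conversion can be illustrated with a minimal empirical-line sketch in Python/NumPy (the study used Pix4Dmapper's built-in calibration; the function name, single-gain model, and all numeric values below are hypothetical):

```python
import numpy as np

def dn_to_reflectance(dn_band, panel_dn_mean, panel_reflectance):
    """Scale raw DN values so the panel's known reflectance maps onto its observed mean DN."""
    gain = panel_reflectance / panel_dn_mean
    return np.clip(dn_band * gain, 0.0, 1.0)  # reflectance is bounded to [0, 1]

# Illustrative values only: a 2 x 2 DN patch and a panel of 50% reflectance.
band = np.array([[1200.0, 2400.0], [3600.0, 4800.0]])
refl = dn_to_reflectance(band, panel_dn_mean=4800.0, panel_reflectance=0.5)
```

In practice, per-band gains are computed from the panel images taken before and after each flight, so changing illumination between flights is partially compensated.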
2.3. RGB and MS Feature Extraction
The RGB images were used to extract the plant height (PH), canopy coverage (CC), texture information, and DN values of each plot, and the MS images were used to extract the reflectance of the five bands and texture information for each plot. All features except the texture information were extracted using ArcMap 10.5 (Environmental Systems Research Institute, Inc., Redlands, CA, USA); the texture information was extracted in ENVI 5.3 (ITT Visual Information Solutions, Boulder, CO, USA). During estimation, the model inputs depended on the source of the features: the input features for the RGB sensor comprised the plant height, coverage, and vegetation indices extracted from the RGB images; the input features for the MS sensor comprised the vegetation indices extracted from the MS images; and for data fusion, all variables extracted from both sensors were used as input features.
2.3.1. RGB Data Extraction
The PH is a critical parameter for evaluating crop growth status, which has been proven to correlate highly with yield [
32,
33]. The PH derived from the RGB images was therefore used as an important feature for estimating pea yield. The DSM and DTM images obtained by stitching in Pix4Dmapper 4.4.12 were imported into ArcMap 10.5, and the crop surface model (CSM) image was obtained with the raster calculator by subtracting the DTM image from the DSM image (Equation (1)):
CSM = DSM − DTM,(1)
where CSM is the crop surface model, DSM is the digital surface model, and DTM is the digital terrain model. Finally, the maximum CSM value within each plot was taken as the PH for later data processing.
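The CSM differencing and per-plot maximum can be sketched in a few lines of NumPy (the study used the ArcMap raster calculator; the tiny arrays and plot mask here are made-up stand-ins for the real rasters):

```python
import numpy as np

# Hypothetical elevation rasters in meters (real DSM/DTM rasters come from Pix4Dmapper).
dsm = np.array([[101.2, 101.5], [101.8, 101.4]])  # canopy surface elevation
dtm = np.array([[100.9, 100.9], [101.0, 101.0]])  # bare-terrain elevation

csm = dsm - dtm  # Equation (1): CSM = DSM - DTM

# Plant height of one plot = maximum CSM value inside that plot's pixel mask.
plot_mask = np.array([[True, True], [True, False]])
plant_height = csm[plot_mask].max()
```

The same mask-and-maximum step is repeated per plot to build the PH feature table.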
The CC refers to the ratio of the vertical projection area of the crop canopy to the ground surface area of each plot, and can reflect the crop growth status and physiological parameters [
34]. Before extracting CC, binary mask maps of each growth period were established to effectively exclude the background. The pixels of peas in each plot were then divided by the total number of pixels in the sampled plot to calculate the CC [
35] (Equation (2)):
CC = N_pea / N_total,(2)
where N_pea is the number of pea canopy pixels in the plot and N_total is the total number of pixels in the plot.
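Equation (2) reduces to a pixel count on the binary mask; a minimal sketch (the 3 × 4 mask below is fabricated, with 1 = pea canopy and 0 = background):

```python
import numpy as np

# Hypothetical binary mask for one plot after background exclusion.
mask = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
])

# Equation (2): canopy coverage = canopy pixels / total pixels in the plot.
cc = mask.sum() / mask.size
```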
Texture can reflect the structural and geometric features of a canopy [
36]. The single bands of the RGB and MS images containing pea plants were used to extract texture information with the gray-level co-occurrence matrix (GLCM) method in ENVI 5.3 [
37]. The sliding window and sliding step were set to 7 × 7 pixels and 2 pixels, respectively. Eight parameters were obtained from each plot for further data processing: the contrast, correlation, dissimilarity, entropy, homogeneity, mean, second moment, and variance.
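The GLCM idea can be sketched with a hand-rolled co-occurrence count for a single pixel offset (ENVI computes this over the sliding window; the toy 3 × 3 image, the offset, and the two statistics shown are illustrative only):

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    m = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[img[y, x], img[y + dy, x + dx]] += 1  # count co-occurring gray levels
    m = m + m.T  # count each pair in both directions (symmetric GLCM)
    return m / m.sum()

# Toy image quantized to 2 gray levels.
img = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 1]])
p = glcm(img, levels=2)

i, j = np.indices(p.shape)
contrast = float((p * (i - j) ** 2).sum())            # large when neighbors differ
homogeneity = float((p / (1.0 + (i - j) ** 2)).sum())  # large when neighbors agree
```

The remaining six statistics (correlation, dissimilarity, entropy, mean, second moment, variance) are computed from the same matrix p with different weighting functions.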
Vegetation indices (VIs), as an indicator of the spectral features of a canopy, are commonly adopted to estimate crop traits in agriculture research. In this study, the DN values derived from RGB images were used to construct 12 VIs (
Table S1) for estimating pea yield.
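As one concrete example of an RGB-derived VI, the excess green index (ExG = 2g − r − b) mentioned earlier is computed from normalized chromatic coordinates; the single pixel below is hypothetical:

```python
import numpy as np

# One hypothetical pixel with (R, G, B) digital numbers.
rgb = np.array([60.0, 120.0, 20.0])

r, g, b = rgb / rgb.sum()  # normalize so r + g + b = 1
exg = 2 * g - r - b        # excess green index: high for green vegetation
```

The other RGB VIs in Table S1 are similar algebraic combinations of r, g, and b.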
2.3.2. MS Data Extraction
The radiometrically calibrated MS images were used to extract the reflectance values of each band. The reflectance derived from MS images were adopted to calculate 18 VIs (
Table S2) for estimating pea yield.
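Two representative MS indices, the NDVI and the RVI used by Huang et al. above, follow directly from band reflectance; the reflectance values here are illustrative, not measured data:

```python
import numpy as np

# Hypothetical NIR and red reflectance for two plots.
nir = np.array([0.45, 0.50])
red = np.array([0.05, 0.10])

ndvi = (nir - red) / (nir + red)  # normalized difference vegetation index
rvi = nir / red                   # ratio vegetation index
```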
2.4. Regression Technology
The EL model and four base learners, namely Cubist, elastic net (EN), k-nearest neighbors (KNN), and RF, were selected for estimating pea yield. They were implemented with the “caret” package in R (v.4.2.2), and the model hyperparameters were tuned using a grid search with five-fold inner cross-validation.
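The grid-search-with-inner-CV workflow can be illustrated with a self-contained Python/NumPy sketch (the study used caret in R; the synthetic data, the simple k-NN learner, and the candidate grid below are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (50, 3))             # synthetic features
y = X.sum(axis=1) + rng.normal(0.0, 0.05, 50)  # synthetic yield

def knn_predict(Xtr, ytr, Xte, k):
    """Average the targets of the k nearest training samples."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=-1)
    return ytr[np.argsort(d, axis=1)[:, :k]].mean(axis=1)

order = rng.permutation(len(X))  # one fixed random fold assignment

def cv_rmse(k, folds=5):
    """Mean RMSE of k-NN over the five inner folds for a candidate k."""
    scores = []
    for f in range(folds):
        te = order[f::folds]                 # every 5th sample forms one fold
        tr = np.setdiff1d(order, te)
        pred = knn_predict(X[tr], y[tr], X[te], k)
        scores.append(np.sqrt(np.mean((pred - y[te]) ** 2)))
    return float(np.mean(scores))

grid = [1, 3, 5, 7]
best_k = min(grid, key=cv_rmse)  # grid search: keep the k with the lowest CV error
```

The same pattern (enumerate candidate hyperparameters, score each with inner CV, keep the best) applies to the Cubist, EN, and RF hyperparameters as well.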
The Cubist model was developed by RuleQuest and is based on Quinlan’s M5 model tree algorithm [
38]. This algorithm expresses a piecewise multivariate linear function that estimates the value of a variable from a series of independent variables. The training rules of the Cubist model are simple, effective, and fast, and the input space segmentation is automatically carried out by the algorithm, which can handle problems that contain high-dimensional attributes. The Cubist model has been widely applied for leaf area index [
39] and yield estimations [
40].
The EN algorithm is a linear regression model based on Lasso and Ridge, and is an improvement of several linear regression methods designed for solving high-dimensional feature selection problems. This algorithm can balance the sparsity and accuracy of a model by adjusting the value of parameter α in the objective function and selecting a subset of highly correlated features [
41].
The KNN algorithm estimates a sample by averaging the attribute values of its nearest neighbors. When making an estimation, the distances between the test sample and the training samples are first calculated and sorted from smallest to largest, the attributes of the K known samples closest to the estimation sample are determined, and an estimation is then made according to the established decision rules. The value of K and the distance metric are the main factors that affect the KNN model estimation [
42]. The model is easy to understand, requires a small amount of calculation, and offers a wide range of uses.
The RF algorithm is based on the Bagging algorithm, with a decision tree as the basic unit [
43]. Sampling with replacement is used to generate new training sets, each of which is used to train a decision tree, and the final result is obtained by aggregating the estimations of the individual decision trees. The RF algorithm is characterized by randomness, does not easily overfit, and has good anti-noise ability [
44].
Stacking ensemble learning was first proposed by Wolpert [
45]. It is typically a heterogeneous ensemble method that trains multiple individual learners of different types in parallel and then combines them using a meta-model to generate the final estimation result. As shown in
Figure 3, the EL model consists of two levels. The first level consists of Cubist, EN, KNN, and RF, where the initial training set is input to each base learner using five-fold cross-validation. The second level combines the estimation results of the base learners into a new matrix and uses multiple linear regression (MLR) as the secondary learner, leveraging the predictive abilities of the multiple base learners for training and obtaining the final result.
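The two-level scheme in Figure 3 can be sketched end-to-end in Python/NumPy, shrunk to two simple base learners standing in for Cubist/EN/KNN/RF (the synthetic data, the least-squares and k-NN learners, and the fold layout are all assumptions; only the structure, out-of-fold level-1 predictions feeding an MLR meta-learner, mirrors the text):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (60, 2))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.05, 60)  # synthetic yield

def fit_ls(Xtr, ytr):
    """Least-squares linear base learner; returns a predict function."""
    A = np.c_[Xtr, np.ones(len(Xtr))]
    w, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Xte: np.c_[Xte, np.ones(len(Xte))] @ w

def fit_knn(Xtr, ytr, k=5):
    """k-nearest-neighbor base learner; returns a predict function."""
    def predict(Xte):
        d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=-1)
        return ytr[np.argsort(d, axis=1)[:, :k]].mean(axis=1)
    return predict

# Level 1: out-of-fold estimates from each base learner via 5-fold CV.
base_fits = [fit_ls, fit_knn]
Z = np.zeros((len(X), len(base_fits)))
idx = np.arange(len(X))
for f in range(5):
    te = idx[f::5]
    tr = np.setdiff1d(idx, te)
    for j, fit in enumerate(base_fits):
        Z[te, j] = fit(X[tr], y[tr])(X[te])  # predictions on the held-out fold

# Level 2: multiple linear regression (MLR) meta-learner on the stacked matrix.
A = np.c_[Z, np.ones(len(Z))]
w, *_ = np.linalg.lstsq(A, y, rcond=None)
ensemble_pred = A @ w
```

Using out-of-fold rather than in-sample level-1 predictions is what keeps the meta-learner from simply memorizing the strongest base learner's training error.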
2.5. Model Performance Evaluation
To fairly and completely compare the different algorithms for estimating pea yield, we conducted five-fold cross-validation to test the accuracy of the pea yield estimation. The original data were randomly divided into five subsets, with four subsets used as training data and the remaining subset as test data. This process was repeated five times to ensure that all samples were independently validated. Finally, model performance was assessed using the test-set results of the five-fold cross-validation.
The
R2, root-mean-square error (RMSE), and normalized root-mean-square error (NRMSE) were used to evaluate the performance of each algorithm [
33]:
R2 = 1 − Σᵢ(yᵢ − ŷᵢ)² / Σᵢ(yᵢ − ȳ)²,(3)
RMSE = √[(1/n) Σᵢ(yᵢ − ŷᵢ)²],(4)
NRMSE = RMSE / ȳ,(5)
where n is the number of samples, yᵢ is the measured pea yield of sample i, ŷᵢ is the estimated pea yield of sample i, and ȳ denotes the mean of the measured pea yield.
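The three metrics follow directly from their definitions; a short sketch with made-up measured/estimated yields (the normalization of NRMSE by the mean measured yield matches the percentages reported later):

```python
import numpy as np

# Illustrative measured and estimated plot yields (kg/ha); not real data.
y_obs = np.array([700.0, 900.0, 1100.0, 1300.0])
y_est = np.array([750.0, 880.0, 1050.0, 1320.0])

ss_res = np.sum((y_obs - y_est) ** 2)          # residual sum of squares
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                       # coefficient of determination
rmse = np.sqrt(np.mean((y_obs - y_est) ** 2))  # root-mean-square error
nrmse = rmse / y_obs.mean()                    # normalized by mean measured yield
```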
3. Results
3.1. Performance of Sensor Data on Pea Yield
The features extracted from two types of sensors (RGB and MS) and their combination (RGB + MS) were regarded as different datasets for estimating pea yield using Cubist, EN, KNN, and RF models (
Table S3). The estimation performances of the single and dual sensors were compared to determine the optimal sensor condition, as indicated by the highest obtained
R2 value.
Figure 4 shows the robustness assessment of the pea yield estimation. The MS sensor performed better than the RGB sensor during the flowering, podding, early filling, and mid filling stages, with average
R2 values of up to 0.25, 0.36, 0.41, and 0.48, respectively. In contrast, the RGB sensor outperformed the MS sensor in the branching stage, with a higher
R2 value of up to 0.25.
The accuracy of the dual-sensor (RGB + MS) condition was superior to that of the single sensors in all five growth stages, with average
R2 values of up to 0.34, 0.43, 0.51, 0.58, and 0.74, respectively. The fusion data were therefore more helpful for estimating pea yield.
3.2. Effects of Different Growth Stages on Yield Estimation
To explore the effect of five growth stages on estimating pea yield, the remote sensing data of five growth stages (branching, flowering, podding, early filling, and mid filling stages) were compared using four machine learning algorithms. The average
R2 values of the four machine learning algorithms in each growth stage were selected as the result (
Figure 5). The mid filling stage yielded the best estimation accuracy for all sensor conditions (RGB, MS, and their combination), with
R2 values of 0.43, 0.48, and 0.74, respectively. The next best values were sequentially obtained in the early filling stage (
R2 = 0.34, 0.42, and 0.58, respectively), podding stage (
R2 = 0.29, 0.37, and 0.51, respectively), and flowering stage (
R2 = 0.24, 0.25, and 0.43, respectively), while the branching stage presented the worst estimation accuracy (
R2 = 0.25, 0.24, and 0.35, respectively).
Figure 6 also shows the
R2 values of each algorithm in the five growth stages based on RGB, MS, and RGB + MS. Although the estimation accuracy fluctuated during the flowering and podding stages with some of the algorithms, the overall estimated accuracy showed an increasing trend with growth development.
3.3. Model Performance for Pea Yield Estimation
As indicated in
Section 3.1, the fusion of the RGB and MS sensors performed better than a single sensor in estimating pea yield. Therefore, this section adopts an EL model for estimating pea yield using the fused RGB and MS data, and compares its estimation performance with that of the four base learners (
Table 3).
Among the base learners, the EN algorithm obtained the best estimation accuracy in the first four growth stages, with R2 values of 0.49, 0.61, 0.56, and 0.67, respectively, and NRMSE values of 20.8%, 18.0%, 19.3%, and 16.5%. The Cubist algorithm obtained the best estimation accuracy in the mid filling stage (R2 = 0.805, NRMSE = 12.5%). The EL algorithm achieved the highest R2 values across all five growth stages; compared with the average performance of the base learners, EL improved the R2 by 0.18, 0.19, 0.14, 0.11, and 0.11 in the five growth stages, respectively.
In summary, among the base learners, the Cubist and EN algorithms both demonstrated better estimation of pea yield, while the KNN model performed generally poorly in this study. Compared with the single models, the EL model significantly enhanced the estimation accuracy and generalization capability for pea yield.
3.4. Yield Estimation for Different Pea Types
This study tested the applicability of the models by comparing the estimated and measured yields of two different pea types (cold-tolerant and common peas). This section presents only the results obtained under the optimal estimation conditions demonstrated above, namely, the mid filling stage with dual sensors. As shown in
Figure 7, the estimated yield of cold-tolerant peas was slightly higher than that of common peas, which is consistent with the measured values. The estimated yield, therefore, did not significantly change with pea type, thereby reflecting a satisfactory adaptability of the yield estimation models to the two types of peas investigated in this study.
3.5. Estimation Effect Analysis
The fusion of the RGB and MS data showed the best pea yield estimates in the mid filling stage. Hence, the absolute differences between the estimated and measured values in the mid filling stage were used to generate a heat map for the EL model and four base learners (
Figure 8). Visually, smaller color differences represent better estimation accuracy: the EL model provided superior estimates, followed by the EN and Cubist models, while the KNN model presented the worst estimation accuracy, which is consistent with the results in
Section 3.3. In conclusion, the most reliable yield estimation can be obtained using the appropriate algorithm and the fusion of dual-sensor data in the mid filling stage; the scatter plot of the estimated yield against the ground-measured yield is shown in
Figure 9.