2.1. Study Area and Data Description
The Heihe Basin (97°24′–102°10′E, 37°41′–42°42′N), with an area of 130,000 km², is the second largest inland river basin in Northwest China. Our study area is situated in an oasis–desert ecotone of Zhangye City (31°14′–32°37′N, 118°22′–119°14′E) within the middle reaches of the Heihe Basin. The area has an arid continental climate with a long dry season (October to May) and a short rainy season (June to September). The annual mean temperature is 6.5 °C, and the average annual precipitation (evaporation) is 115.6 (2107.1) mm [46]. June to September are the hottest and most humid months, during which the average maximum air temperature reaches 39.3 °C. The study area contains four main land cover types, namely, wetland, impervious surfaces, vegetation, and desert, which are located in the northernmost, northern, middle, and northwestern (as well as southeastern and southwestern) parts, respectively. The oasis lies in the middle of the region and is surrounded by desert. Six ground sites (wetland, maize, orchard, Gobi, wilderness, and desert sites) were selected from large flat areas of the four land cover types (Figure 1).
All selected sites are part of the Heihe Watershed Allied Telemetry Experimental Research (HiWATER), an ongoing watershed-scale eco-hydrological experiment designed from an interdisciplinary perspective to address problems including heterogeneity, scaling, uncertainty, and closing of the water cycle at the watershed scale [47].
All ground observation data were provided by the Cold and Arid Regions Science Data Center at Lanzhou [48]. The actual LST was estimated from upwelling and downwelling longwave radiation observed by pyranometers using the following equation:

$$T_s = \left[\frac{R_{lu} - (1 - \varepsilon) R_{ld}}{\varepsilon \sigma}\right]^{1/4} \quad (1)$$

where $R_{lu}$ ($R_{ld}$) is the surface upwelling (downwelling) longwave radiation, $\varepsilon$ is the land surface emissivity (LSE), $T_s$ is the LST, and $\sigma$ is the Stefan–Boltzmann constant. The temporal resolution of all ground observations is 10 min. Ground observations recorded at the satellite overpass times were used to validate the retrievals (Table 1).
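The ground-based LST estimate described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard longwave balance R_lu = εσT_s⁴ + (1 − ε)R_ld; the function name and the input values are hypothetical and are not HiWATER measurements.

```python
import math

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def lst_from_longwave(r_lu, r_ld, emissivity):
    """Return LST in kelvin from upwelling/downwelling longwave radiation (W m^-2)."""
    if not 0.0 < emissivity <= 1.0:
        raise ValueError("emissivity must be in (0, 1]")
    # Subtract the downwelling radiation reflected by the surface.
    emitted = r_lu - (1.0 - emissivity) * r_ld
    if emitted <= 0.0:
        raise ValueError("non-physical radiation values")
    return (emitted / (emissivity * SIGMA)) ** 0.25


# Illustrative inputs only (assumed, not taken from the paper).
ts_kelvin = lst_from_longwave(r_lu=480.0, r_ld=350.0, emissivity=0.97)
```

Because the emissivity appears both in the reflected term and in the denominator, small LSE errors propagate directly into the estimated LST, which is why a reliable LSE matters for validation.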
The MODIS products used in this study were acquired at 5:55 (UTC) on 3 September 2012 (autumn) and were obtained from the Level-1 and Atmosphere Archive and Distribution System. The MOD11 datasets provide the LSE (bands 31 and 32) and LST at 1-km spatial resolution, and the MOD09 datasets provide the reflectance of bands 1–7 at 500-m spatial resolution [49]; these datasets were used to derive the remote sensing indices and to downscale the MOD11 LST from 1 km to 500 m. In addition to the autumn image, clear-sky images acquired on 17 April 2013, 15 June 2012, and 22 February 2013 were used to test the applicability of our approach in the other seasons (spring, summer, and winter, respectively).
The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) LST and LSE datasets acquired on 3 September 2012 over the middle reaches of the Heihe Basin were also selected. In the arid region, the ASTER LST has a higher spatial resolution (90 m) and is more accurate than the MODIS LST because of the satisfactory estimation of ASTER LSE [48,50]. The ASTER LST was provided by the Cold and Arid Regions Science Data Center at Lanzhou. Because no validation reference is available for the simulated LST, the ASTER images were upscaled to 500-m resolution so that the simulation could be validated against the ASTER LST.
In addition, the land use/land cover (LULC) dataset, which has an overall accuracy of 92.19%, was provided by the Cold and Arid Regions Science Data Center at Lanzhou [51,52]. Its spatial and temporal resolutions are 30 m and 1 month, respectively (Figure 2).
The Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) images were acquired on 21 July 2013 and used to evaluate the applicability of our approach to satellite images with medium-to-high spatial resolution. The Landsat 8 datasets, provided by the United States Geological Survey, include OLI and TIRS images with 30- and 100-m spatial resolutions, respectively. LST and remote sensing indices were calculated from these images.
2.2. Downscaling Methods
MODIS LST products have coarse spatial resolutions. To enhance LST resolution, regression models between ancillary environmental predictors and LST have been established. If the relationships between LST and the predictors do not change with spatial resolution, then a detailed high-resolution LST can be estimated from high-resolution predictors using these relationships.
Random forest (RF) is a nonlinear statistical ensemble method based on bagging. RF builds many regression trees, each of which recursively partitions the data into homogeneous subsets, and averages the predictions of all trees. Each tree is grown independently to its maximum size, without pruning, on a bootstrap sample of the training dataset. The samples left out of a tree's bootstrap sample (the out-of-bag, OOB, data) are then predicted by that tree. By calculating the difference in the mean square errors between the OOB data and the data used to grow the regression trees, the RF algorithm provides an error of prediction, called the OOB error of estimate, for each variable. The binary splits are selected by minimizing the sum-of-squares error between the response variable and the response predicted by a specific split.
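The bagging and OOB ideas described above can be illustrated with scikit-learn, which is an assumed tool here and is not named in the paper. The data are synthetic, and the reported importances are scikit-learn's impurity-based values, which differ from the OOB-based per-variable error described in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for five predictors and an LST-like response (assumed data).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 5))
y = 300.0 + 10.0 * X[:, 0] - 5.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, 500)

rf = RandomForestRegressor(
    n_estimators=200,   # number of bootstrap-trained trees
    oob_score=True,     # evaluate each sample with trees that did not see it
    random_state=0,     # trees are unpruned by default (max_depth=None)
)
rf.fit(X, y)

oob_r2 = rf.oob_score_                                   # R^2 of the OOB predictions
oob_rmse = float(np.sqrt(np.mean((y - rf.oob_prediction_) ** 2)))
importances = rf.feature_importances_                    # impurity-based, sums to 1
```

The OOB score gives an internal estimate of prediction error without a separate validation split, which is convenient when coarse-resolution training pixels are scarce.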
The choice of predictor variables in the RF downscaling approach should be based on existing correlations between LST and biophysical variables. In previous research on LST downscaling with RF, the reflectances of the NIR and red wavebands were selected as predictors. However, these wavebands are not sensitive to the characteristics of some land cover types, especially desert, which dominates a large part of the arid region. Therefore, in this paper, remote sensing indices related to land status (vegetation cover, soil moisture, water cover, impervious surface cover, and desert) were selected: SAVI [53], the normalized multi-band drought index (NMDI) [53], the modified normalized difference water index (MNDWI) [26], NDBI [27], and NDDI [31]. NMDI was selected to evaluate vegetation stress caused by soil water.
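As a rough illustration of how such indices are derived from band reflectances, the sketch below uses the commonly published forms of SAVI, MNDWI, NDBI, and NMDI. The exact formulations and the band assignment used by the authors are not stated in this section, so both are assumptions: for MODIS, red, NIR, green, and the two SWIR inputs are taken here as bands 1, 2, 4, 6, and 7. NDDI is omitted because its definition is not given here.

```python
import numpy as np


def _safe_ratio(num, den):
    """Element-wise num/den, returning NaN where the denominator is zero."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.full(den.shape, np.nan)
    np.divide(num, den, out=out, where=den != 0)
    return out


def savi(red, nir, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor=0.5 is the usual default."""
    return _safe_ratio((1.0 + soil_factor) * (nir - red), nir + red + soil_factor)


def mndwi(green, swir1):
    """Modified normalized difference water index."""
    return _safe_ratio(green - swir1, green + swir1)


def ndbi(nir, swir1):
    """Normalized difference built-up index."""
    return _safe_ratio(swir1 - nir, swir1 + nir)


def nmdi(nir, swir1, swir2):
    """Normalized multi-band drought index (inputs near 0.86, 1.64, 2.13 um)."""
    return _safe_ratio(nir - (swir1 - swir2), nir + (swir1 - swir2))
```

All inputs are expected to be NumPy arrays of surface reflectance with the same shape, so the functions can be applied directly to whole image bands.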
RF regression trees model the relationship between the multiple remote sensing indices and LST through a set of decision rules. The LULC was not used as a predictor so that the influence of LULC on LST downscaling can be examined in the future. Instead, a separate model was established for each land cover type. For each land cover type, the model trained on the coarse-resolution LST and input variables is given as follows:

$$\mathrm{LST}_F = f(\mathrm{SAVI}_C, \mathrm{NMDI}_C, \mathrm{MNDWI}_C, \mathrm{NDBI}_C, \mathrm{NDDI}_C) \quad (2)$$

where the subscript $C$ indicates a variable at coarse resolution and the subscript $F$ refers to the variable fitted from those variables.
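The residual temperature ($e$) is the difference between the original LST ($\mathrm{LST}_O$) and $\mathrm{LST}_F$ from Equation (2). This difference is the model estimation error:

$$e = \mathrm{LST}_O - \mathrm{LST}_F \quad (3)$$

Therefore, the simulated LST at coarse resolution ($\mathrm{LST}_C$) can be estimated from the coarse-resolution LST as follows:

$$\mathrm{LST}_C = f(\mathrm{SAVI}_C, \mathrm{NMDI}_C, \mathrm{MNDWI}_C, \mathrm{NDBI}_C, \mathrm{NDDI}_C) + e \quad (4)$$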
Assuming scale invariance, the trained model was applied to the five high-resolution remote sensing indices. A simulated high-resolution LST ($\mathrm{LST}_H$) is then obtained as follows:

$$\mathrm{LST}_H = f(\mathrm{SAVI}_H, \mathrm{NMDI}_H, \mathrm{MNDWI}_H, \mathrm{NDBI}_H, \mathrm{NDDI}_H) + e \quad (5)$$

where the subscript $H$ indicates a high-resolution variable. For convenience, $\mathrm{LST}_H$ ($\mathrm{LST}_C$) is referred to as the downscaled (simulated) LST, and $\mathrm{LST}_O$ is referred to as the original LST.
Equation (5) holds in regions of every land cover type. Accordingly, the 1-km LST is downscaled with the regression model of each land cover type. For convenience, the proposed approach is called the multiple remote sensing indices approach of random forest (MIRF). In our study, the 1-km coarse resolution is the spatial resolution of the MOD11 LST, and the 500-m high resolution is that of the remote sensing indices. The detailed procedure is presented in Figure 3.
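The procedure described above (one RF model per land cover type, a coarse-scale residual, and application of the model to high-resolution indices) can be sketched as follows. This is not the authors' implementation: scikit-learn, the function name `mirf_sketch`, the flat pixel layout, and the `parent` lookup that maps each 500-m pixel to its 1-km pixel are all assumptions made for illustration, and the residual is assumed to be added back to every fine pixel of its coarse pixel.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def mirf_sketch(idx_c, lst_c, lc_c, idx_f, lc_f, parent, seed=0):
    """Per-land-cover RF downscaling with residual correction (illustrative).

    idx_c: (n_c, k) coarse indices    lst_c: (n_c,) original coarse LST
    idx_f: (n_f, k) fine indices      lc_c, lc_f: land cover labels
    parent: (n_f,) position of the coarse pixel containing each fine pixel
    """
    fitted_c = np.full(lst_c.shape, np.nan)
    pred_f = np.full(lc_f.shape, np.nan)   # stays NaN for classes absent at coarse scale
    for cls in np.unique(lc_c):
        in_c, in_f = lc_c == cls, lc_f == cls
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(idx_c[in_c], lst_c[in_c])
        fitted_c[in_c] = model.predict(idx_c[in_c])      # fitted coarse LST
        if in_f.any():
            pred_f[in_f] = model.predict(idx_f[in_f])    # model applied at fine scale
    residual_c = lst_c - fitted_c                        # original minus fitted
    return pred_f + residual_c[parent], residual_c


# Synthetic example: 60 coarse pixels, four fine pixels each (1 km -> 500 m).
rng = np.random.default_rng(0)
n_c = 60
idx_c = rng.uniform(-1.0, 1.0, (n_c, 5))
lc_c = rng.integers(0, 3, n_c)
lst_c = 300.0 + 8.0 * idx_c[:, 0] - 4.0 * idx_c[:, 3] + lc_c
parent = np.repeat(np.arange(n_c), 4)
idx_f = idx_c[parent] + rng.normal(0.0, 0.05, (n_c * 4, 5))
lc_f = lc_c[parent]
lst_h, residual = mirf_sketch(idx_c, lst_c, lc_c, idx_f, lc_f, parent)
```

A useful sanity check of the residual step is that feeding the coarse indices back in as the "fine" indices reproduces the original coarse LST exactly, since the fitted value and its residual cancel.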
To evaluate the effectiveness of our approach, two typical LST downscaling approaches, namely, DisTrad and basic RF, were selected for comparison. The DisTrad approach downscales LST using a least-squares fit between LST and a vegetation index [20]; the vegetation index at high spatial resolution is used as the predictor to downscale the low-resolution LST. The basic RF approach is based on RF with two reflectance predictors (red and NIR bands) [44]. Unlike in MIRF, the land cover data were used as an additional predictor to simulate LST. The relationship between LST and all three predictors is regressed by RF, and the predictors at high spatial resolution are then used to downscale the low-resolution LST.
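For comparison, a DisTrad-style baseline can be sketched with NumPy alone. The text specifies only a least-squares fit between LST and a vegetation index; the second-order polynomial and the step that adds the coarse residual back are assumptions taken from the commonly described DisTrad formulation, and the function name and `parent` lookup are hypothetical.

```python
import numpy as np


def distrad_sketch(vi_coarse, lst_coarse, vi_fine, parent, degree=2):
    """Least-squares LST-vs-vegetation-index fit applied at fine resolution.

    parent: for each fine pixel, the position of its coarse pixel.
    """
    coeffs = np.polyfit(vi_coarse, lst_coarse, deg=degree)   # least-squares fit
    residual = lst_coarse - np.polyval(coeffs, vi_coarse)    # coarse-scale misfit
    return np.polyval(coeffs, vi_fine) + residual[parent]


# Synthetic example: LST decreases with vegetation cover (assumed values).
vi_c = np.linspace(0.1, 0.8, 20)
lst_c = 320.0 - 30.0 * vi_c + 5.0 * vi_c ** 2
parent = np.repeat(np.arange(20), 4)
vi_f = vi_c[parent] + 0.01
lst_f = distrad_sketch(vi_c, lst_c, vi_f, parent)
```

Because this baseline depends on a single vegetation index, it has little to separate pixels where vegetation is sparse, which is the limitation that motivates using several indices in arid areas.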
In addition, the applicability of the proposed method to satellite images with medium-to-high spatial resolution was evaluated by applying MIRF to Landsat data. The Landsat OLI images were first corrected with the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) atmospheric correction algorithm [54]. Then, the LST was retrieved from the OLI and TIRS datasets using the single-channel method [55]. For convenience, the TIRS images with 100-m resolution were resampled to 90 m by the nearest neighbor method, whereas the OLI images with 30-m resolution were resampled to 90 m by aggregation. In the MIRF approach, the 30-m OLI images served as the high-resolution images, and the 90-m OLI and TIRS images served as the coarse-resolution images.
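The 30-m to 90-m aggregation mentioned above can be sketched as a block average over 3 × 3 pixel windows. The text does not state which aggregation statistic was used, so the mean is an assumption, as is the choice to drop incomplete blocks at the image edge.

```python
import numpy as np


def aggregate_mean(image, factor=3):
    """Average non-overlapping factor x factor blocks of a 2-D array."""
    rows, cols = image.shape
    rows -= rows % factor   # drop incomplete blocks at the bottom edge
    cols -= cols % factor   # drop incomplete blocks at the right edge
    blocks = image[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# A 6 x 6 grid of 30-m pixels becomes a 2 x 2 grid of 90-m pixels.
fine = np.arange(36, dtype=float).reshape(6, 6)
coarse = aggregate_mean(fine, factor=3)
```

Averaging is a natural choice for reflectance-derived indices because it preserves the area-weighted mean of each coarse pixel; it would not be appropriate for categorical layers such as land cover.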
2.3. Evaluation Measures
Three measures, namely, the coefficient of determination ($R^2$), bias, and root-mean-square error (RMSE) [32,56], were used to evaluate the downscaling performance of the MIRF algorithm and to compare it with the other downscaling methods.
$R^2$ is the coefficient of determination between the original and downscaled images; a high $R^2$ indicates satisfactory downscaling. This coefficient is given by the following:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(\mathrm{LST}_{S,i} - \mathrm{LST}_{R,i}\right)^2}{\sum_{i=1}^{n} \left(\mathrm{LST}_{R,i} - \overline{\mathrm{LST}_R}\right)^2} \quad (6)$$

where $\mathrm{LST}_S$ is the simulated LST (Equations (4) and (5)), $\mathrm{LST}_R$ is the reference LST, and $\overline{\mathrm{LST}_R}$ is the average of $\mathrm{LST}_R$ over the entire image. Specifically, $\mathrm{LST}_R$ is the LST observed by the ground instruments in the direct validation and the LST obtained from ASTER in the cross validation.
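Bias and RMSE were used to quantify the errors between the original LST image and the downscaled image. They are calculated as follows:

$$\mathrm{bias} = \frac{1}{n} \sum_{i=1}^{n} \left(\mathrm{LST}_{S,i} - \mathrm{LST}_{R,i}\right) \quad (7)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\mathrm{LST}_{S,i} - \mathrm{LST}_{R,i}\right)^2} \quad (8)$$

where $n$ is the number of pixels in the image.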
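The three measures can be computed together as in the sketch below. The forms used are assumptions where the text leaves room for interpretation: bias is taken as simulated minus reference, and the coefficient of determination as one minus the ratio of the residual to the total sum of squares about the reference mean. Pixels with missing values in either image are excluded.

```python
import numpy as np


def evaluate(lst_sim, lst_ref):
    """Return R^2, bias, and RMSE between simulated and reference LST."""
    sim = np.asarray(lst_sim, dtype=float).ravel()
    ref = np.asarray(lst_ref, dtype=float).ravel()
    valid = np.isfinite(sim) & np.isfinite(ref)   # skip NaN/inf pixels
    sim, ref = sim[valid], ref[valid]
    diff = sim - ref
    bias = diff.mean()
    rmse = np.sqrt(np.mean(diff ** 2))
    r2 = 1.0 - np.sum(diff ** 2) / np.sum((ref - ref.mean()) ** 2)
    return {"r2": float(r2), "bias": float(bias), "rmse": float(rmse)}


# Small hand-checkable example (values in kelvin, assumed for illustration).
scores = evaluate([301.0, 299.0, 305.0, 303.0], [300.0, 300.0, 304.0, 304.0])
```

Bias and RMSE are complementary: in this example the errors cancel to a bias of zero while the RMSE remains 1 K, so reporting both avoids mistaking offsetting errors for agreement.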