1. Introduction
Surface longwave radiation (SLR) is a fundamental component of the energy balance at the Earth’s surface. As the surface warms, surface downward longwave radiation (SDLR) changes at a rate of 7 W m−2 K−1, making it one of the most responsive fluxes within the Earth’s climate system [1]. SLR also affects the exchange of energy and matter within the land–atmosphere system, with consequences for vegetation, soil climate, hydrology, and zoology [2,3,4]. Comprehensive monitoring of SLR on a global scale is therefore indispensable for understanding our changing planet.
As applied research progresses, studies of small regions or watersheds, such as snow processes in glacier melting, regional ecology, and precision agriculture involving evapotranspiration, require remote sensing SLR products with higher spatial resolutions (e.g., finer than 100 m) as input parameters for modeling [5,6]. Furthermore, high spatial resolution SLR data are valuable for constraining global climate models as their resolution increases, thereby reducing the uncertainties in simulations of cloud and radiation processes [7]. Major earthquakes tend to occur near active fault zones within mountainous regions due to strong tectonic activity. Previous studies suggest that variations in surface temperature and water vapor prior to seismic events may be associated with the seismogenic process [8,9]. SLR serves as an integrated geophysical parameter reflecting changes in surface and atmospheric thermal radiation, so high spatial resolution SLR data can support detailed study of the spatiotemporal evolution of pre-seismic thermal anomalies within fault zones. Moreover, coarse-resolution products fall short of the demands of urban, extreme-environment, and agricultural applications, and the current availability of medium- to high-resolution remote sensing thematic data products remains inadequate. Hence, it is important to explore remote sensing retrieval methods for SLR that can deliver reliable surface thermal radiation products at high spatial resolution.
After many years of development, satellite-based SLR estimation has evolved into three major types of retrieval methods: parameterization, physically based, and hybrid methods. Parameterization methods establish linear or non-linear statistical relationships between SLR and readily available surface or atmospheric variables, such as surface temperature, humidity, and cloud cover. For instance, the study [10] used Moderate Resolution Imaging Spectroradiometer (MODIS) land surface temperature/emissivity products to estimate SLR. A widely used parameterization is the formula from the study [11], which relates SDLR to screen-level air temperature and water vapor pressure. While parameterization methods are simple and computationally efficient, they may lack accuracy, particularly over diverse landscapes. Physically based methods directly employ radiative transfer calculations to simulate the atmospheric emission and absorption of longwave radiation from atmospheric profiles of temperature, water vapor, and other relevant parameters. The Moderate Resolution Atmospheric Radiance and Transmittance Model (MODTRAN) [12] is commonly used for this purpose and is employed in this study. This approach can be coupled with atmospheric profiles derived from satellite observations or reanalysis data. Although it offers a physically interpretable basis, it requires accurate atmospheric data and complex model calculations, resulting in a strong dependence on input data and significant computational cost. Hybrid methods combine parameterization and physically based models to leverage the strengths of both, resulting in more robust and accurate SLR estimates. In these methods, radiative transfer models generate the training datasets, and parameterization methods build the statistical relationships. Beyond linear models, machine learning approaches, particularly neural networks, are increasingly employed for SLR estimation because they can handle large datasets and complex relationships. For example, SLR under cloudy-sky conditions was estimated from remotely sensed data by combining parameterization and artificial neural networks [13], highlighting the effectiveness of machine learning in capturing non-linear relationships. Such an approach holds promise for improved accuracy but requires careful construction of training data and thorough model validation, and it remains an active area of research. Each retrieval method has its own strengths and limitations, and the choice of method often depends on the availability of input data, computational resources, and the desired accuracy and resolution of the SLR estimates.
Ongoing studies aim to further improve the accuracy and spatiotemporal resolution of SLR estimates from remote sensing data, as well as to account for complex atmospheric conditions and surface properties. However, in contrast to the algorithms for retrieving SLR at medium to low resolutions [
14,
15,
16,
17], only a few studies have been dedicated to SLR retrieval from high-resolution remote sensing data. Such studies necessitate the integration of satellite imagery, typically Landsat and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data, with meteorological observations including air temperature, humidity, atmospheric pressure, and atmospheric sounding data. Employing parameterization approaches, these investigations achieve remote sensing estimations of SLR [
18,
19,
20]. An early method for SLR estimation was introduced by the study [
21], using Landsat imagery and ground-based meteorological observations to estimate surface radiation balance. Relative to results from airborne observations, satellite-derived net radiation exhibited discrepancies of less than 12%. SDLR is estimated from air temperature and humidity, while surface upward longwave radiation (SULR) is computed via Landsat-derived surface temperature. Subsequent investigations predominantly followed analogous retrieval strategies. Application of this approach to radiation balance studies in wetland areas has demonstrated its efficacy, though universality remains an area for enhancement [
22].
Because SLR is influenced by local circulation patterns, its retrieval from Landsat data requires the integration of land cover types and ground-based observational data [23]. Sparse ground-based parameters, such as air temperature and humidity, must be spatially interpolated to cover larger regions, which unavoidably incurs significant errors. Consequently, the study [24] incorporated the MODIS atmospheric water vapor products into their inversion model. Reference [25], on the other hand, employed the atmospheric radiative transfer model MODTRAN, with ASTER and site-based observational data as inputs, for direct simulation and computation of SDLR [26]. This approach substantially increases the computational burden and requires numerous atmospheric and surface parameters from various field observations, which limits its scalability to broader regions. Reference [27] directly used field-measured atmospheric sounding data and a spatial interpolation method to retrieve SDLR. Landsat and meteorological data have also been used for estimating instantaneous, daily, and daytime surface net radiation under clear-sky conditions [28].
Meanwhile, compared with moderate spatial resolution data products (e.g., MODIS), high spatial resolution multispectral satellite data from instruments such as ASTER or Landsat often reveal limitations. ASTER has five thermal infrared (TIR) bands, a 16-day revisit interval, a higher spatial resolution of 90 m, and a restricted set of data products (e.g., land surface temperature (LST) and band emissivity) pertinent to the estimation of SLR. In contrast, MODIS features 16 TIR bands and a more comprehensive suite of data products (including LST, emissivity, column water vapor, air temperature and humidity profiles, and cloud parameters) available at least four times per day. These limitations of the ASTER data products complicate the direct application of common SLR retrieval models developed for sensors with more comprehensive atmospheric and surface information. For instance, parameterization methods require additional atmospheric or ground-based data, which constrains the global applicability of this approach. Reanalysis data are frequently employed as model input. However, they have a coarse spatial resolution (e.g., 0.25 degrees), and downscaling them to 90 m introduces significant uncertainties. Therefore, developing an SLR retrieval model specifically tailored to high spatial resolution satellite remote sensing data is crucial to ensure its relevance and effectiveness in scientific research.
Notably, advancements in high-resolution remote sensing and machine learning algorithms hold the potential for further refinement of SLR estimations. The Light Gradient Boosting Machine (LightGBM) machine learning model has been widely explored in remote sensing parameter retrievals and object recognition. It stands out as a prominent open-source gradient boosting framework, developed by Microsoft, that enjoys widespread employment in machine learning tasks. An exceptional attribute of LightGBM is its application of the “Gradient-based One-Side Sampling” technique during training. This method prioritizes and selects the most informative data instances, contributing to reduced memory consumption and accelerated training. While conventional gradient boosting frameworks opt for depth-wise tree growth, LightGBM innovatively adopts leaf-wise tree growth. This approach expands the tree’s leaves based on the maximum loss reduction, leading to a more precise and efficacious model. LightGBM has been employed in various applications, such as estimating water depth, salinity and lithium concentration from Landsat data [
29], wind power forecasting [
30], forest canopy height retrieval based on ICESat-2, Landsat-8 and Sentinel-2 data [
31], dynamic water extent mapping at high spatiotemporal resolution using Sentinel-2 data [32], and ground-level PM2.5 estimation from Himawari-8 and auxiliary environmental variables [33]. In certain cases, LightGBM outperforms other machine learning frameworks such as XGBoost and CatBoost in both speed and accuracy [
30,
34]. Its fast performance and ability to handle large-scale tasks make it a top choice for both academic research and practical industrial applications. Notably, limited input features were utilized in constructing the retrieval models in this study. Unlike the LightGBM model, which excels with simpler features, deep learning models typically require more complex features and are trained on larger datasets [
35]. This distinction underscores the different needs and considerations when applying various machine learning techniques in different contexts.
ASTER remote sensing data offer five thermal infrared bands (8–12 μm), a high spatial resolution (90 m), and a long observation record (from 2000 to present), which makes them better suited than Landsat data to validation and analysis after SLR retrieval. These attributes establish ASTER observations as a better dataset for constructing high spatial resolution SLR retrieval models. A summary of common surface longwave radiation products is shown in Table 1. While the temporal resolution of the MDSLF (MSG Downward Surface Longwave Flux) product can be as fine as 30 min, the highest spatial resolution is 1 km, as provided by the GLASS (Global Land Surface Satellite) product. In comparison, ASTER can generate SLR data at a much higher spatial resolution of 90 m, albeit with a limited revisit interval, highlighting the necessity of developing high spatial resolution SLR products.
In this study, we showcase the utilization of ASTER remote sensing data to develop clear-sky SLR models. This is achieved through a combination of atmospheric radiative transfer model and a machine learning technique, specifically the LightGBM algorithm. The upcoming Landsat Next satellite, projected for launch around 2030, is anticipated to feature a thermal infrared band configuration similar to the ASTER sensor [
43]. As such, this study provides a methodological precursor and a foundational reserve of shared technologies for the prospective retrieval of SLR and product generation, utilizing Landsat Next data. Moreover, leveraging the SLR dataset derived from ASTER retrieval serves as a cornerstone for validating medium to low-resolution products, facilitating judicious selection of spatial scales for validation, informing future observational site placements, and furnishing scientific reference for enhancing authenticity assessment methodologies.
3. Methods
The SLR estimation from ASTER TIR measurements involves several steps, as illustrated in Figure 1. First, an ERA5-based atmospheric profile dataset was matched in space and time with a spectral emissivity dataset known as CAMEL. An SDLR-based screening method was then applied to the matched dataset to achieve a relatively even distribution of SDLR, which represents the hydrothermal condition of the near-surface atmosphere. Second, a representative dataset of atmospheric profiles and corresponding spectral emissivities was constructed. Third, the simulation dataset, including ASTER TIR band radiances and SLR components, was generated from this dataset using MODTRAN v5.2. To emulate real ASTER observations, random white noise was added to the simulated TIR bands based on the ASTER band NE∆T. Fourth, an initial full-band model was created using LightGBM v4.2.0 and employed in a global sensitivity analysis to identify the optimal bands for both SDLR and SULR retrievals. The hyperparameters of the LightGBM model were determined using the Optuna v3.5.0 framework based on the optimal bands from the representative dataset. The final SLR models were then constructed based on the optimal bands and hyperparameters and were validated using ground measurements from SURFRAD.
3.1. Rationale of SLR Retrieval Method
The physical basis for this hybrid method in SLR retrieval is introduced in the work [52]. Based on radiative transfer theory, the spectral radiance $L_v$ received at the top of the atmosphere (TOA) in cloud-free conditions can be approximated as the sum of the radiance contributions from the Earth’s surface and all atmospheric levels, as expressed in (1). This formula underpins the retrieval algorithms that derive SLR from measured TOA radiances in the TIR region.

$$L_v = \varepsilon_v B_v(T_s)\,\tau_v(P_s) + \frac{1-\varepsilon_v}{\pi}\,\tau_v(P_s)\int_0^{2\pi}\!\!\int_0^{\pi/2} L_v^{\downarrow}(\theta,\varphi)\cos\theta\,\sin\theta\,d\theta\,d\varphi + \int_{P_s}^{0} B_v(T_P)\,\frac{\partial \tau_v(P)}{\partial P}\,dP \tag{1}$$

where $\varepsilon_v$ is the surface emissivity at wavenumber $v$; $B$ indicates the Planck function; $T_s$ is the surface temperature; $\tau_v(P_s)$ is the total atmospheric transmittance from the surface at pressure $P_s$ to the TOA at $v$, and $\tau_v(P)$ is the corresponding transmittance from pressure level $P$; $T_P$ is the air temperature at pressure $P$; $L_v^{\downarrow}(\theta,\varphi)$ is the atmospheric downwelling radiance reaching the surface; $\theta$ and $\varphi$ denote the zenith angle and azimuth angle, respectively. The first term on the right-hand side of (1) is the spectral radiance emitted directly from the surface and transmitted to the TOA. The second term denotes the hemispheric atmospheric downwelling longwave radiation at $v$ reflected by the surface and subsequently attenuated by the atmosphere along the path from the surface to the TOA. Lastly, the third term represents the atmospheric upwelling path radiation at $v$. The three terms account for the radiative contributions from the surface, atmospheric reflection, and atmospheric emission, respectively, enabling the retrieval of information related to the surface and atmospheric state.
As shown in (1), the coupling of upwelling and downwelling radiative components in the atmospheric radiative transfer equation poses a challenge for the direct estimation of SDLR from TOA measurements. While the connection between SDLR and TOA radiances is inherently non-linear and complex, statistical methods have been proposed to approximate this relationship through linear or non-linear models [
52,
54,
55]. These approaches aim to establish mathematical functions that relate the measured TOA radiances to the desired SDLR quantity, enabling its direct retrieval from satellite observations.
The TOA radiances in TIR bands are strongly dependent on band weighting functions [
52]. Therefore, different bands of the ASTER sensor exhibit specific sensitivities to atmospheric and surface information. The band correlation between ASTER and MODIS is presented in
Figure 2, demonstrating their spectral interrelated characteristics in this critical range. ASTER bands 10 and 11 partly overlap with MODIS band 29, and ASTER bands 13 and 14 partly overlap with MODIS band 31. The MODIS window bands 31 and 32 have been utilized in LST retrieval [
56], and LST has a strong correlation with near-surface air temperature and SULR. Radiance measurements obtained from MODIS bands 29, 31, and 32 are known to provide valuable insights into atmospheric moisture content owing to the weak water vapor absorption observed within these spectral regions [
57], and affirmed by the study [
52]. Therefore, the spectral characteristics of the TIR bands play a crucial role in determining the information content of the measured radiances. By exploiting the spectral overlap and correlations between ASTER and MODIS bands, it is possible to leverage the well-established retrieval techniques for LST and SULR from MODIS to extract relevant information from ASTER observations. Despite the relatively weak signals related to near-surface air temperature and water vapor in the ASTER bands, their presence suggests the feasibility of retrieving SDLR, analogous to the approach using MODIS data [
52,
58].
3.2. Generation of Atmosphere Profiles and Emissivity Spectra Matchups
Atmospheric radiative transfer simulations by the MODTRAN model necessitate specific inputs: atmospheric profiles containing temperature and moisture data, as well as surface emissivity spectra. For every atmospheric profile, surface elevation information based on ERA5 land orography data was added, matched to its latitude and longitude coordinates. Initially, the dataset provided 25 potential emissivity pairs at around 11 and 12 µm [
47]. However, this proved insufficient for MODTRAN simulations. Consequently, the CAMEL database was utilized, offering high spectral resolution within the 3.6–14.3 µm range, to complement each atmospheric profile.
For each profile, emissivity spectra were extracted from a window around it monthly, forming an initial dataset from April 2000 to December 2016. These emissivity spectra exhibit relative uniformity attributable to analogous land cover characteristics, with their fluctuations primarily attributed to phenomena associated with vegetation, snow, or alterations in land cover, all of which are responsive to atmospheric profiles.
Through an iterative process, this dataset was refined using a method based on maximum covariance, see (2). The covariance was calculated between emissivity spectra in the subset and a potential candidate. If the covariance exceeded a predetermined threshold
, the candidate spectrum was added to the subset. When the spectra count within the subset surpassed 20, the threshold increased by 20%, and the process was repeated.
where
m is the number of emissivity spectra in the subset;
n is the number of points in the emissivity spectra;
is the i-th emissivity value of a spectra in the subset, and
is its average emissivity value;
is the i-th emissivity value of a potential candidate, and
is its average emissivity value.
Ultimately, a new emissivity spectra dataset was generated for each atmospheric profile. As Figure 3 shows, only 7 of 1746 emissivity spectra were selected as representatives. These spectra were deemed sufficient to capture the variations relevant to the MODTRAN simulation while significantly reducing the computational resources required for this analysis.
A larger number of atmospheric profiles can represent atmospheric conditions more accurately. However, it also increases the burden of the radiative transfer simulations, which are computationally intensive. Additionally, unbalanced samples can bias the fit of machine learning models. To tackle this challenge and optimize computational efficiency, two approaches were applied.
First, a constraint by considering the SDLR values associated with each profile was introduced. Given that temperature and humidity profiles significantly influence SDLR, this metric serves as a valuable indicator of the collective hydrothermal state of the atmosphere. To calculate the SDLR value, MODTRAN simulations were employed that utilized the temperature and humidity profiles from the newly generated dataset. The SDLR range, spanning from 50 to 500 W/m2, was systematically divided into 50 bins with intervals of 10 W/m2. Subsequently, all atmospheric profiles were categorized based on their respective SDLR bins.
Then, in order to ensure that each bin contained no more than 400 profiles—an empirical value established through iterative experimentations—a screening procedure was carried out. Initially, one atmospheric profile was randomly selected to form the initial dataset. This dataset was then updated iteratively by evaluating the similarity between newly extracted atmospheric profiles and those already present in the dataset. A similarity indicator (Λ) was employed to ascertain that each newly selected profile sufficiently differed from the existing ones. The value of Λ for each new atmospheric profile was determined according to the following formula:
where
and
represent either the air temperature or specific humidity of a new atmospheric profile at the i-th layer in the initial dataset and as the candidate profile, respectively; n denotes the total number of effective layers in an atmospheric profile; m is the number of atmospheric profiles in the new dataset; and
denotes the geopotential height at the i-th layer. The initial threshold values for air temperature and specific humidity are set at 0.5 K and 0.05 g/kg, respectively. If
Λ exceeded 0.5 K for air temperature or 0.05 g/kg for specific humidity, the new atmospheric profile was incorporated into the dataset. Following each epoch, if the new dataset exceeded 400 profiles in a bin, this procedure was reiterated, gradually increasing the air temperature and specific humidity thresholds by 20%.
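The binning and iterative screening described above can be sketched as follows. This is only an illustration under stated assumptions: the `dissimilarity` function is a stand-in (the smallest mean absolute layer-wise difference to any already selected profile) and is not the Λ formula of the text, a single variable is screened, and the names `assign_sdlr_bins` and `screen_bin` are hypothetical.

```python
import numpy as np

def assign_sdlr_bins(sdlr, low=50.0, high=500.0, width=10.0):
    """Return the zero-based index of the 10 W/m2 SDLR bin for each value."""
    edges = np.arange(low, high + width, width)
    return np.digitize(sdlr, edges) - 1

def dissimilarity(candidate, selected):
    """Stand-in indicator (assumption): smallest mean absolute layer-wise
    difference between the candidate and any already selected profile."""
    return float(np.min(np.mean(np.abs(selected - candidate), axis=1)))

def screen_bin(profiles, cap=400, threshold=0.5, growth=1.2, seed=0):
    """Keep profiles that differ enough from those already kept.

    If more than `cap` profiles survive, the threshold is raised by 20%
    (growth=1.2) and the screening is repeated from scratch.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(profiles))
    while True:
        kept = [order[0]]                      # random initial profile
        for idx in order[1:]:
            if dissimilarity(profiles[idx], profiles[kept]) > threshold:
                kept.append(idx)
        if len(kept) <= cap:
            return np.array(kept), threshold
        threshold *= growth

# Synthetic temperature profiles (K) for one SDLR bin: 300 profiles, 8 layers.
demo = 250.0 + 30.0 * np.random.default_rng(1).random((300, 8))
kept_idx, final_threshold = screen_bin(demo, cap=40, threshold=0.5)
print(len(kept_idx), round(final_threshold, 3))
```

Restarting the screening after each threshold increase keeps the result independent of earlier, looser passes, at the cost of repeated pairwise comparisons.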
A uniform sample distribution in terms of SDLR was achieved, aiming to enhance the performance of regression models and to mitigate both overestimation and underestimation at extreme SDLR values [58]. Figure 4a illustrates the distribution of sample numbers in each bin, showing satisfactory overall evenness, although the counts are relatively low for SDLR values <180 W/m2 or >460 W/m2. Finally, a comprehensive and representative atmospheric profile dataset spanning global land areas was generated. The spatial and statistical attributes of this dataset are shown in Figure 4b–e. While these profiles cover a substantial portion of the land surface, their representation in tropical zones is relatively sparse. This could be attributed to the similar, extremely hot and humid conditions prevalent in these regions, for which a limited number of samples is sufficient to capture the variations. The air temperature typically ranges from 200 to 350 K, with a notable peak around 300 K. Total Column Water Vapor (TCWV) values cluster close to 0, with a second peak around 5.0 cm and a maximum of 8.0 cm. In summary, this database represents a refined subset of atmospheric profiles across different regions worldwide.
3.3. MODTRAN Simulations
Both SDLR and SULR, as well as TOA band radiances for the ASTER sensor, were simulated using MODTRAN version 5 [59]. The ASTER sensor’s viewing zenith angle falls within a range of ±8.55°, making the nadir viewing angle a suitable choice for the MODTRAN simulation. To simulate SDLR, the essential input data included the profiles of air temperature, humidity, and ozone, together with surface elevation. The atmosphere model for additional gases, such as CH₄, N₂O, and CO, was adjusted based on the specific time and latitude of the given atmospheric profile. All other settings were kept at their default values. Following this, the TOA spectral radiance ($L_{\mathrm{TOA}}(v)$) is calculated according to the following formula:

$$L_{\mathrm{TOA}}(v) = \left[\varepsilon(v)\,B(v, T_s) + \bigl(1-\varepsilon(v)\bigr)\,L^{\downarrow}(v)\right]\tau(v) + L^{\uparrow}(v) \tag{4}$$

where $v$ is the wavenumber; $T_s$ is the LST, derived from the ERA5-based atmospheric profile dataset, containing six distinct values; $\varepsilon(v)$ is the emissivity spectrum extracted from the CAMEL database at $v$; $\tau(v)$ is the upwelling atmospheric transmittance at $v$; $L^{\downarrow}(v)$ is the surface downward spectral radiance simulated by MODTRAN at $v$; $L^{\uparrow}(v)$ is the spectrum of path radiance towards the TOA, also simulated by MODTRAN at $v$.
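As a minimal sketch of how the TOA spectral radiance is assembled from its components, the snippet below combines surface emission, reflected downwelling radiance, and path radiance. The function names are hypothetical, and the transmittance, downwelling, and path radiance inputs stand for quantities that would come from MODTRAN.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

def planck_wavenumber(v_cm, temp_k):
    """Planck radiance in W m^-2 sr^-1 (cm^-1)^-1 at wavenumber v_cm (cm^-1)."""
    v_m = np.asarray(v_cm, dtype=float) * 100.0            # cm^-1 -> m^-1
    per_m = 2.0 * H * C**2 * v_m**3 / np.expm1(H * C * v_m / (KB * temp_k))
    return per_m * 100.0                                    # per m^-1 -> per cm^-1

def toa_radiance(v_cm, t_surf, emis, trans, l_down, l_up):
    """Surface-leaving radiance attenuated by the atmosphere, plus path radiance."""
    surface_leaving = emis * planck_wavenumber(v_cm, t_surf) + (1.0 - emis) * l_down
    return surface_leaving * trans + l_up

# Illustrative values only (not MODTRAN output).
v = np.array([900.0, 1000.0, 1100.0])
print(toa_radiance(v, 300.0, emis=0.97, trans=0.8, l_down=0.03, l_up=0.02))
```

All spectral inputs may be arrays on a common wavenumber grid, so the same function applies to a full simulated spectrum.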
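The ASTER band radiances ($L_i$) are calculated from the TOA spectral radiance $L_{\mathrm{TOA}}(v)$ as:

$$L_i = \frac{\int_{v_1}^{v_2} f_i(v)\,L_{\mathrm{TOA}}(v)\,dv}{\int_{v_1}^{v_2} f_i(v)\,dv} \tag{5}$$

where $v_1$ and $v_2$ are the lower and upper boundaries of the spectral response function (SRF) of the ASTER TIR band; $f_i(v)$ is the SRF, which can be obtained from
https://asterweb.jpl.nasa.gov/characteristics.asp, accessed on 24 June 2024.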
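The SRF-weighted band averaging can be illustrated with a short numerical sketch. The Gaussian response used here is a hypothetical placeholder rather than a real ASTER SRF, and the trapezoidal rule is an assumed choice of quadrature.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def band_radiance(v, spectral_radiance, srf):
    """SRF-weighted mean of the spectral radiance over the band."""
    return trapezoid(srf * spectral_radiance, v) / trapezoid(srf, v)

# Hypothetical Gaussian SRF centred at 900 cm^-1 (placeholder, not ASTER data).
v = np.linspace(850.0, 950.0, 201)
srf = np.exp(-0.5 * ((v - 900.0) / 15.0) ** 2)
spectrum = 0.1 + 0.001 * (v - 900.0)
print(band_radiance(v, spectrum, srf))
```

Dividing by the integral of the SRF means the response curve does not need to be normalized beforehand.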
The SULR ($F^{\uparrow}$, W/m2) can be calculated as:

$$F^{\uparrow} = \varepsilon_b\,\sigma\,T_s^{4} + (1-\varepsilon_b)\,F^{\downarrow} \tag{6}$$

where $T_s$ is the LST; $\varepsilon_b$ is the broadband emissivity that is integrated from the emissivity spectrum of CAMEL data and corresponds to each atmospheric profile as illustrated in Figure 3; $\sigma$ is the Stefan–Boltzmann constant ($5.67 \times 10^{-8}$ W m−2 K−4); $F^{\downarrow}$ is SDLR (W/m2) simulated by MODTRAN. Subsequently, a comprehensive dataset was assembled. Its inputs are the band radiance values ($L_i$) of the five ASTER TIR bands and the surface elevation. Its outputs are the SULR and SDLR, both simulated by MODTRAN. This dataset was then used to train the machine learning model.
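A small worked example of the SULR calculation, assuming the usual broadband form (surface emission plus the reflected part of SDLR); the function name and the sample values are illustrative only.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def sulr(broadband_emissivity, lst_k, sdlr):
    """Surface upward longwave radiation in W/m2."""
    emitted = broadband_emissivity * SIGMA * lst_k ** 4
    reflected = (1.0 - broadband_emissivity) * sdlr
    return emitted + reflected

# A 300 K surface with emissivity 0.97 under an SDLR of 350 W/m2.
print(round(sulr(0.97, 300.0, 350.0), 2))
```

Because the emissivity is close to 1, the reflected term contributes only a few W/m2, yet omitting it would bias SULR low.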
3.4. The LightGBM Model and Determination of Its Hyperparameters
The LightGBM model was used to establish the statistical relationship between SLR and the radiances of the ASTER TIR bands. Such a model has a range of hyperparameters that require careful tuning to achieve optimal performance. Hyperparameters govern the learning process of the algorithm, influencing how the model learns from the training data and makes predictions on unseen data. They cannot be learned during training and must be set before model training commences, which makes hyperparameter tuning a critical step. Tuning amounts to a search for the combination of settings that enables the model to generalize effectively to new, unseen data. This process entails exploring various hyperparameter values and evaluating the model’s performance using methods such as cross-validation.
To facilitate this process, we employed Optuna, an open-source hyperparameter optimization framework designed to automate the search for optimal hyperparameter configurations [60]. Using 10-fold cross-validation, Optuna searched the hyperparameter space for the optimal parameter combination, with the Root Mean Square Error (RMSE) as the criterion. The resulting parameter settings and their optimal values for both the SULR and SDLR retrieval models are compiled in Table 3. Notably, the optimal configuration for the SDLR retrieval model is more complex than that for the SULR model.
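The scoring step behind such a search, a 10-fold cross-validated RMSE for one candidate configuration, can be sketched without any external framework. The code below is an illustration under explicit assumptions: a ridge regressor stands in for LightGBM, a plain loop over a hypothetical candidate list stands in for Optuna's sampler, and the data are synthetic.

```python
import numpy as np

def kfold_rmse(fit_predict, X, y, n_splits=10, seed=0):
    """Mean RMSE over n_splits held-out folds for one configuration."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return float(np.mean(scores))

def make_ridge(alpha):
    """Stand-in model (assumption): ridge regression with strength alpha."""
    def fit_predict(X_train, y_train, X_test):
        A = np.column_stack([X_train, np.ones(len(X_train))])
        coef = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y_train)
        return np.column_stack([X_test, np.ones(len(X_test))]) @ coef
    return fit_predict

# Synthetic regression problem and a hypothetical candidate list.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.01 * rng.normal(size=200)
candidates = [1e-6, 1e-2, 1.0, 1e2, 1e4]
best_alpha = min(candidates, key=lambda a: kfold_rmse(make_ridge(a), X, y))
print(best_alpha)
```

An optimization framework replaces the fixed candidate list with an adaptive sampler, while the cross-validated objective keeps the same structure.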
3.5. Global Sensitivity Analysis
The candidate inputs for the SLR retrieval models include ASTER bands 10–14 and the surface elevation from the AW3D30 (ALOS World 3D-30 m) DEM product [61]. To identify the optimal inputs for the SULR and SDLR models, a global sensitivity analysis was conducted following the methodologies outlined in the studies [62,63]. Sensitivity analysis studies how uncertainty in the output of a model or system can be apportioned to different sources of uncertainty in its inputs, that is, the extent to which changes in input parameters produce variations in the output. It is therefore an essential and routine step in system modeling. The Sobol method is a variance-based global sensitivity analysis approach that can handle non-linear responses and measure the effects of interactions in non-additive systems [64]. While the global approach captures interactions among inputs, particularly in non-linear and non-additive models, it requires sampling the inputs from probability distributions and is therefore computationally demanding. It works within a probabilistic framework, characterizing uncertainties in both inputs and outputs as probability distributions and decomposing the output variance into parts attributable to individual input variables and their interactions. Consequently, this method quantifies the effects of model inputs or external factors on the desired outputs. By conducting a global sensitivity analysis, the most influential input parameters can be pinpointed, enhancing the model’s reliability and effectiveness.
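To make the variance decomposition concrete, the sketch below estimates first-order and total Sobol indices with a common pick-and-freeze Monte Carlo scheme on a toy function. The sampling scheme, the uniform input distributions, and the toy model are assumptions for illustration; they are not necessarily the configuration used in the study.

```python
import numpy as np

def sobol_indices(model, n_inputs, n_samples=20000, seed=0):
    """Monte Carlo estimates of first-order and total Sobol indices
    for inputs drawn independently from U(0, 1)."""
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, n_inputs))
    B = rng.random((n_samples, n_inputs))
    fA, fB = model(A), model(B)
    both = np.concatenate([fA, fB])
    mean, var = both.mean(), both.var()
    first, total = [], []
    for i in range(n_inputs):
        AB = A.copy()
        AB[:, i] = B[:, i]          # replace only column i with values from B
        fAB = model(AB)
        # Centring fB lowers the estimator variance; fAB - fA has zero mean.
        first.append(np.mean((fB - mean) * (fAB - fA)) / var)
        total.append(0.5 * np.mean((fA - fAB) ** 2) / var)
    return np.array(first), np.array(total)

def toy_model(X):
    """Additive toy function: the third input has no influence."""
    return X[:, 0] + 2.0 * X[:, 1]

S1, ST = sobol_indices(toy_model, n_inputs=3)
print(np.round(S1, 2), np.round(ST, 2))
```

For this additive toy function the analytical first-order indices are 0.2, 0.8, and 0, and the total indices coincide with them because there are no interactions.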
3.6. Training of SLR Models
The TIR bands of the ASTER sensor have an NE∆T of ≤0.3 K [44]. Instrument noise was therefore added to the calculated band radiances. Using a lookup table (LUT), the ASTER band radiances in the MODTRAN-simulated dataset were converted to band brightness temperatures (BTs). Random additive white noise with a mean of 0 K and a standard deviation of 0.3 K was then superimposed on the BTs to emulate ASTER observations. Finally, the five noisy band BTs were converted back to band radiances through the same LUT.
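The noise injection can be sketched as a radiance to BT to radiance round trip. As a simplification, the snippet uses the monochromatic Planck function at nominal band-centre wavelengths instead of the band-integrated LUT described above; the function names are hypothetical and the band centres are approximate.

```python
import numpy as np

C1 = 1.191042972e8   # 2*h*c^2 in W m^-2 sr^-1 um^4
C2 = 1.4387769e4     # h*c/k in um K

# Nominal centre wavelengths of ASTER bands 10-14 in micrometres (approximate).
BAND_CENTRES_UM = np.array([8.30, 8.65, 9.10, 10.60, 11.30])

def radiance_from_bt(bt, wl_um):
    """Planck radiance in W m^-2 sr^-1 um^-1 for brightness temperature bt (K)."""
    return C1 / (wl_um ** 5 * np.expm1(C2 / (wl_um * bt)))

def bt_from_radiance(rad, wl_um):
    """Inverse Planck function: brightness temperature (K) from radiance."""
    return C2 / (wl_um * np.log1p(C1 / (wl_um ** 5 * rad)))

def add_nedt_noise(radiance, wl_um=BAND_CENTRES_UM, sigma_k=0.3, seed=0):
    """Add zero-mean Gaussian noise of sigma_k kelvin in BT space."""
    rng = np.random.default_rng(seed)
    bt = bt_from_radiance(radiance, wl_um)
    noisy_bt = bt + rng.normal(0.0, sigma_k, size=bt.shape)
    return radiance_from_bt(noisy_bt, wl_um)

clean = radiance_from_bt(np.full((4, 5), 290.0), BAND_CENTRES_UM)
print(bt_from_radiance(add_nedt_noise(clean), BAND_CENTRES_UM).round(2))
```

Adding the noise in BT space keeps its magnitude tied to the NE∆T specification, whereas the equivalent radiance perturbation differs from band to band.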
The inputs are normalized using the min-max approach expressed in (7), which linearly transforms the raw input features to a common range from 0 to 1:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{7}$$

where $x$ is the value of an input feature; $x'$ is its normalized value; $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that feature, respectively. Normalization can accelerate training convergence and potentially improve the model’s predictive accuracy.
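A minimal sketch of min-max normalization applied per feature. Taking the minimum and maximum from the training data only and reusing them for other data is an assumed convention here, not something the text specifies.

```python
import numpy as np

def fit_minmax(X_train):
    """Per-feature minimum and maximum of the training inputs."""
    return X_train.min(axis=0), X_train.max(axis=0)

def apply_minmax(X, x_min, x_max):
    """Linearly map each feature so that [x_min, x_max] becomes [0, 1]."""
    return (X - x_min) / (x_max - x_min)

X_train = np.array([[8.0, 100.0], [10.0, 300.0], [12.0, 500.0]])
x_min, x_max = fit_minmax(X_train)
print(apply_minmax(X_train, x_min, x_max))
```

Values outside the training range map outside [0, 1], which is worth checking when the scaling is applied to real observations.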
The MODTRAN-simulated dataset is then randomly partitioned into three parts: training (60%), validation (20%), and test (20%) datasets. The validation dataset is used to monitor performance during model training, while the test dataset serves as an independent dataset for assessing the final model. An early stopping strategy is employed to prevent overfitting: training continues until the validation score no longer improves by at least a user-defined minimum threshold.
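The partitioning and the early stopping rule can be illustrated generically. The sketch below is library-independent: `split_indices` and `early_stopping_round` are hypothetical helpers, and the `patience` parameter (number of rounds tolerated without improvement) is an assumption, since the text only mentions a minimum improvement threshold.

```python
import numpy as np

def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Random, disjoint training/validation/test index sets."""
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def early_stopping_round(val_errors, patience, min_delta=0.0):
    """Index of the best boosting round under an early stopping rule.

    A round counts as an improvement only if it lowers the validation error
    by more than min_delta; training stops after `patience` rounds without one.
    """
    best, best_round = float("inf"), -1
    for rnd, err in enumerate(val_errors):
        if err < best - min_delta:
            best, best_round = err, rnd
        elif rnd - best_round >= patience:
            break
    return best_round

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
print(early_stopping_round([5.0, 4.0, 3.0, 3.1, 3.2, 3.05, 2.0], patience=3))
```

In the second call, training stops before the late improvement to 2.0 is reached, which shows the trade-off a small patience value makes between training time and the chance of missing later gains.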
3.7. Validation
The ASTER data collocated with SURFRAD sites spanning the period from 2000 to 2021 were acquired for validation and analysis. The derived ASTER SLR data were validated against in situ measurements within a 3 × 3 neighboring window (i.e., a radius of 135 m) and a temporal window of 12 min. All pixels within the 3 × 3 window were required to be under clear-sky conditions [5], and their average value was used only when their standard deviation was less than 10 W/m2, to ensure a relatively homogeneous scene. Temporally, all ground measurements (4 samples prior to 2009 and 12 samples post-2009) had to be valid, with a standard deviation of less than 5 W/m2. This criterion reflects the relatively stable nature of SDLR under clear-sky conditions over a short timeframe. To assess accuracy, the commonly used evaluation metrics of bias, RMSE, and the coefficient of determination ($R^2$) were employed. They are defined as follows:

$$\mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right) \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2} \tag{9}$$

$$R^2 = \frac{\left[\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)\right]^2}{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2\,\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \tag{10}$$

where $N$ is the total count of validation samples; $x_i$ is the retrieved SULR/SDLR value for the i-th sample; $y_i$ is the true SULR/SDLR value for the i-th sample, which is the ground measurement; $\bar{x}$ is the average of all $x_i$; and $\bar{y}$ is the average of all $y_i$.
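The matchup screening and the three metrics can be sketched as follows. The function names, the use of NaN to mark cloudy or invalid values, and the population standard deviation are assumptions made for illustration; $R^2$ is computed here as the squared correlation between retrieved and measured values.

```python
import numpy as np

def passes_screening(window_values, ground_values,
                     spatial_std_max=10.0, temporal_std_max=5.0):
    """Accept a matchup only if the pixel window and the ground samples are
    all valid (no NaN) and sufficiently homogeneous (values in W/m2)."""
    window = np.asarray(window_values, dtype=float)
    ground = np.asarray(ground_values, dtype=float)
    if np.isnan(window).any() or np.isnan(ground).any():
        return False
    return bool(window.std() < spatial_std_max and ground.std() < temporal_std_max)

def evaluate(retrieved, truth):
    """Return bias, RMSE, and R^2 of retrieved values against ground truth."""
    x = np.asarray(retrieved, dtype=float)
    y = np.asarray(truth, dtype=float)
    diff = x - y
    bias = float(diff.mean())
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    r2 = float(np.corrcoef(x, y)[0, 1] ** 2)
    return bias, rmse, r2

window = [401.0, 399.5, 400.2, 402.1, 398.9, 400.0, 401.3, 399.8, 400.6]
ground = [395.0, 395.4, 394.8, 395.1]
if passes_screening(window, ground):
    print(np.mean(window), np.mean(ground))
print(evaluate([402.0, 310.0, 355.0], [400.0, 305.0, 350.0]))
```

A positive bias under this sign convention means that the retrieval overestimates the ground measurement on average.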
6. Conclusions
SLR is a fundamental climate variable that governs the radiation budget and energy balance at the Earth’s surface. As an integrated parameter reflecting land surface emission and atmospheric thermal radiation, the estimation of high spatial resolution SLR is essential for climate studies and environmental modeling from local to global scales. This investigation constructed and validated a hybrid statistical-physical approach to retrieve SLR from ASTER TIR satellite data at 90 m resolution. A global database of land surface emissivity spectra and reanalysis-based atmospheric profiles was generated. MODTRAN was used with these datasets to simulate TOA radiances and SLR components under a range of surface and atmospheric conditions. A LightGBM machine learning algorithm was trained on the simulation results to establish quantitative relationships between ASTER band radiances, surface elevation, and SLR fluxes. Global sensitivity analysis determined the optimal input variables to be ASTER bands 13, 14, 12, and 11 for SULR, and bands 10, 11, 12 and surface elevation for SDLR.
Validated against ground-based measurements from the SURFRAD network, the SULR model achieved a bias of 3.42 W/m2 and an RMSE of 17.76 W/m2, while the SDLR model had a bias of 3.92 W/m2 and an RMSE of 25.36 W/m2. Retrievals exhibited systematic biases related to extreme temperature and moisture conditions, with SULR overestimated in hot humid atmospheres and underestimated in cold dry conditions. SDLR showed large negative biases in very arid atmospheric conditions. These errors likely stem from deficiencies in current radiative transfer models under such non-standard temperature and humidity regimes.
Overall, this study demonstrates the potential for a hybrid statistical-physical modeling approach to generate global, long-term SLR datasets at 90 m resolution from ASTER and comparable TIR sensors. The method established provides a pathfinder for SLR retrieval from future missions like Landsat Next. With further refinements to the radiative transfer modeling for extreme atmospheric conditions, this approach could produce valuable high-resolution SLR data for studies of urban climate, ecosystems, evapotranspiration, and land–atmosphere interactions from local to global scales.