1. Introduction
PM2.5 particles are a major component of atmospheric pollution; they can persist in the atmosphere and exert widespread and profound impacts on human health and the environment [1]. Environmental remote sensing provides a means of retrieving PM2.5 concentrations globally, continuously, and in real time [2]. The PM2.5 data from the United States (U.S.) Environmental Protection Agency (EPA), a product of the U.S. ground-based air quality monitoring network, have undergone rigorous quality validation [3] and are widely used in environmental science [4], atmospheric science [5], public health [6], disaster management [7], and other fields. This dataset furnishes researchers and policymakers with crucial data support, which aids a better understanding of atmospheric pollution and more effective responses to it [8]. Nevertheless, existing remote sensing-derived PM2.5 products commonly suffer from low spatial resolution and fail to delineate local details [9]. Acquiring high-resolution PM2.5 spatial distribution data is therefore of paramount importance for the dynamic monitoring and control of atmospheric PM2.5 pollution [10,11,12].
In the task of retrieving PM2.5 concentrations, the selection of predictive factors is a critical step. The choice of predictors should be based on atmospheric physical and chemical processes, ecological environment quality, meteorological factors, and geographic information. Common parameters include Aerosol Optical Depth (AOD), Normalized Difference Vegetation Index (NDVI), meteorological conditions, and topographic features [13]. AOD provides global atmospheric optical information and is a significant indicator of the atmospheric physical and chemical evolution of air pollutants [14]. Numerous studies have demonstrated a significant correlation between AOD and surface PM2.5 concentration, making AOD one of the most reliable explanatory factors in PM2.5 prediction [15,16,17]. NDVI reflects vegetation health, land use changes, and ecosystem productivity, serving as an essential measure of ecological environment quality. Studies have shown that vegetation cover, as measured by NDVI, significantly affects PM2.5 concentrations by reducing dust, adsorbing particles, improving microclimate conditions, reducing pollution sources, and enhancing ecosystem purification functions [18]. Meteorological conditions actively influence the dispersion, dilution, and deposition of pollutants, significantly affecting the spatiotemporal distribution of PM2.5 concentrations [19]. Topographic features alter atmospheric morphology by adjusting air flow, forming temperature inversion layers, influencing meteorological conditions, and creating urban heat island effects, thereby indirectly affecting the spatial distribution of PM2.5 concentrations [20].
The estimation of PM2.5 through remote sensing involves methods such as AOD inversion [21], atmospheric chemical transport models [22], spatiotemporal interpolation [23], and data assimilation [24]. The most widely used parameter is the satellite-monitored AOD. To estimate ground-level PM2.5 from AOD, a typical strategy is to establish the statistical relationship between AOD and PM2.5 [25]. The accuracy of these methods is constrained by the number of monitoring stations, remote sensing data resolution, and model quality [26], and a comprehensive approach incorporating multiple data sources and methods is often necessary to enhance inversion accuracy. However, traditional spatial statistical tools tend to focus on detecting spatial relationships in sample data [27], and when the spatial density and uniformity of sampling points are insufficient, the estimation accuracy and confidence decrease significantly [28]. Air quality products derived from ground-based stations can solve these problems.
In recent years, new research has emerged in which geostatistical tools and machine learning methods are used for PM2.5 inversion. Scholars have designed geographically weighted regression (GWR) models [29,30] and mixed-effects models [31] for detecting geographical relationships between PM2.5 and data such as AOD, meteorological parameters, and land use information. A novel convolutional neural network (CNN) model can utilize the spatial correlation between predictor variables to increase the ground-level PM2.5 estimation accuracy to some extent [32]. By combining AOD and big data, a PM2.5 regression model based on the random forest algorithm has been used to assess the risk of air pollution exposure in the Yangtze River Delta urban agglomeration region during COVID-19 [33]. These inversion models do not fully reconcile nonlinear fitting with spatial relationship detection: traditional geostatistical models cannot fit complex nonlinear relationships, while machine learning methods cannot express spatial non-stationarity.
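To make the idea of geographical weighting concrete, the following is a minimal sketch of a GWR-style local fit, not the models used in the cited studies. It assumes a Gaussian distance kernel, a fixed bandwidth, and synthetic data in which the AOD slope drifts from west to east; the function names (`gaussian_weights`, `gwr_fit_at`) and all numerical values are hypothetical.

```python
import numpy as np

def gaussian_weights(coords, target, bandwidth):
    """Gaussian kernel weights of all samples relative to one target location."""
    d = np.linalg.norm(coords - target, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_fit_at(coords, X, y, target, bandwidth):
    """Weighted least squares at one location; returns [intercept, slopes...]."""
    w = gaussian_weights(coords, target, bandwidth)
    A = np.column_stack([np.ones(len(X)), X])   # design matrix with intercept
    sw = np.sqrt(w)                             # scaling rows by sqrt(w) gives WLS
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic (assumed) data: the PM2.5-AOD slope grows from x = 0 to x = 1.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(500, 2))
aod = rng.uniform(0.1, 1.0, size=500)
true_slope = 10.0 + 20.0 * coords[:, 0]
pm25 = 5.0 + true_slope * aod + rng.normal(0.0, 0.5, size=500)

# Local coefficients at a western and an eastern location.
west = gwr_fit_at(coords, aod[:, None], pm25, np.array([0.1, 0.5]), bandwidth=0.1)
east = gwr_fit_at(coords, aod[:, None], pm25, np.array([0.9, 0.5]), bandwidth=0.1)
print("local AOD slope (west):", west[1])
print("local AOD slope (east):", east[1])
```

A single global OLS fit would return one slope for the whole domain, whereas the two local fits recover clearly different slopes. This is the spatial non-stationarity that GWR is designed to detect, although each local fit remains linear.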
As studies on PM2.5 spatial patterns increase, various machine learning-related methods (e.g., CNN, Artificial Neural Network (ANN), and Generalized Regression Neural Network (GRNN)) have gradually been introduced. To calculate geographically weighted kernels more accurately, the Geographically Neural Network Weighted Regression (GNNWR) combines Ordinary Least Squares (OLS) and neural networks to estimate complex geographical processes [34]. In addition to spatial relationships, time series are also an important research object in the field of GWR. The Geographically and Temporally Weighted Neural Network (GTWNN) accounts for both spatial and temporal non-stationarity and has been applied in high-precision crop yield prediction modeling [35]. To address nonlinearity and spatiotemporal heterogeneity, researchers have proposed another GTWNN using GRNN, which shows superior performance in exploring the spatiotemporal relationship between AOD and PM2.5 [36]. However, these GWR-ANN methods mainly focus on improving the accuracy of regression relationships without considering the impact of training samples on the accuracy of spatial dependence, which degrades their predictive performance to some extent.
Many studies have used GWR or neural networks for PM2.5 inversion, with essentially similar data processing methods. When faced with imperfect training samples, the common approach is to first train the optimal regression model [37]. If the prediction samples are also not ideal, one can either enhance their density using interpolation techniques, effectively filling the entire target resolution space, or proceed without any further adjustments [38]. Finally, the prediction samples are input into the regression model to obtain the prediction results. If non-ideal prediction samples are left unprocessed, interpolation methods are then used to complete the prediction data [39]. This a posteriori method results in predictions with high specificity, greatly limiting the model’s generalization capability [40]. In contrast, constructing a uniform and dense spatial network would lead to a more comprehensive and accurate understanding of spatial non-stationarity.
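One common way to densify non-ideal prediction samples is inverse distance weighting (IDW). The cited studies may use other interpolators, so the following is only an illustrative sketch; the function name `idw`, the `power` and `eps` parameters, and the sample values are assumptions.

```python
import numpy as np

def idw(sample_xy, sample_values, query_xy, power=2.0, eps=1e-12):
    """Inverse distance weighted interpolation at each query location."""
    # Pairwise distances, shape (n_query, n_samples).
    d = np.linalg.norm(query_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    # eps avoids division by zero when a query coincides with a sample.
    w = 1.0 / (d + eps) ** power
    return (w * sample_values).sum(axis=1) / w.sum(axis=1)

# Four hypothetical samples at the corners of a unit square.
sample_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sample_values = np.array([8.0, 12.0, 10.0, 20.0])

# Fill a regular 5 x 5 target grid from the four samples.
axis = np.linspace(0.0, 1.0, 5)
gx, gy = np.meshgrid(axis, axis)
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
grid_values = idw(sample_xy, sample_values, grid_xy)
print(grid_values.reshape(5, 5))
```

Each interpolated value is a weighted average of the sample values with positive weights, so this step smooths the field and cannot add local detail beyond what the samples already contain.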
This study endeavors to incorporate spatial non-stationarity into a machine learning model for the high-precision estimation of PM2.5 from remote sensing data. The proposed Flexible Geographically Weighted Neural Network (FGWNN) model is designed with the Flexible Geographical Neuron (FGN) and Geographically Weighted Activation Function (GWAF) to mitigate the negative impacts of non-uniform and sparse samples on regression accuracy. It enables the simultaneous learning of spatial non-stationarity and global non-linear relationships within the neural network. With FGWNN, PM2.5 data at 2.5 km spatial resolution over the contiguous U.S. (CONUS) can be predicted from conventional satellite remote sensing products. The organization of this paper is as follows. Section 2 elaborates on the study region and data materials associated with this study. Section 3 provides a detailed description of the FGWNN model design and evaluation. Section 4 demonstrates the FGWNN’s performance and the spatiotemporal patterns of PM2.5. Recommendations and further discussions based on this research are presented in Section 5. Finally, we conclude in Section 6.
5. Discussion
The spatial distance between geographical objects determines the strength of their spatial relationships, referred to as spatial dependency [57]. Spatial weighting in GWR [58] describes the varying spatial dependency between individual objects and all objects. In this paper, we define the variation in the strength of this dependency across the region as the spatial dependency field (SDF). Although GWR can effectively detect spatial non-stationarity, the SDF used in the model has two limitations. First, in the sparse state (Figure 11a), training samples are homogeneously distributed in space but with low sample density. The SDF can only roughly reflect the spatial dependency pattern of the original data, overlooking the finer details. Second, in the biased state (Figure 11b), spatial density is insufficient and training samples are heterogeneously distributed in space (eccentric and uneven). This situation can cause significant deformation of the SDF, impairing the accurate representation of the original spatial dependency pattern and significantly diminishing the quality of the final model. In conclusion, if the ideal SDF (Figure 11c) adapted to the target spatial scale is constructed, its learned spatial non-stationarity will be more comprehensive and accurate.
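The sparse, biased, and ideal states can be illustrated numerically. The sketch below does not reproduce the SDF used in this paper; it computes a hypothetical proxy, the summed Gaussian kernel weight that each target location receives from the training samples, under assumed sample layouts and an assumed bandwidth. The helper names `regular_points` and `support_field` are illustrative.

```python
import numpy as np

def regular_points(n, lo=0.0, hi=1.0):
    """n x n regular lattice of points covering [lo, hi] x [lo, hi]."""
    axis = np.linspace(lo, hi, n)
    gx, gy = np.meshgrid(axis, axis)
    return np.column_stack([gx.ravel(), gy.ravel()])

def support_field(samples, targets, bandwidth):
    """Summed Gaussian kernel weight received by each target location."""
    d = np.linalg.norm(targets[:, None, :] - samples[None, :, :], axis=2)
    return np.exp(-0.5 * (d / bandwidth) ** 2).sum(axis=1)

targets = regular_points(25)      # target-resolution grid on the unit square
bandwidth = 0.15                  # assumed kernel bandwidth

layouts = {
    "sparse": regular_points(4),            # 16 samples, evenly spread
    "biased": regular_points(4, 0.0, 0.3),  # 16 samples crowded into one corner
    "dense": regular_points(20),            # 400 samples, evenly spread
}

stats = {}
for name, samples in layouts.items():
    field = support_field(samples, targets, bandwidth)
    stats[name] = {"min": field.min(), "mean": field.mean()}
    print(name, stats[name])
```

In this toy setting, the sparse layout gives every target some support but little of it, the biased layout leaves distant targets with almost none, and the dense layout supports all targets well, which mirrors the three states described above.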
Large-scale, low spatial resolution remote sensing images inevitably suffer from issues related to insufficient spatial details, challenging target identification, limited image quality, and application constraints [59]. The high-definition images inferred and predicted using FGWNN can facilitate the precise identification of areas with air pollution anomalies, providing strong evidence for the analysis of air pollution driving factors [60]. The quality of the SDF constructed by traditional geographical detectors depends on the density and uniformity of the spatial distribution of samples, which often suffer from sparsity and non-homogeneity in real-world data [61]. To overcome this limitation, FGWNN automatically allocates homogeneous and moderate FGNs to the hidden layer, achieving an ideal SDF state. The FGWNN method proposed in this paper realizes the effective detection of spatial relationships through the establishment of a flexible SDF, which can accurately reconstruct the real features of geographical data. Additionally, it significantly enhances the regression accuracy and spatial resolution of PM2.5 inversion, making it a reliable and efficient remote sensing mapping technique.
In exploring the spatial and temporal patterns of PM2.5 using high-precision remote sensing products from FGWNN inversion, different levels of the study area require products with a compatible spatial resolution, so as to strike a balance between inversion accuracy and computational efficiency. Figure 12 shows the study area at four administrative levels, i.e., national level (CONUS), division level (Pacific Division), state level (California State), and county level (Los Angeles County). When the spatial resolution of the remote sensing image is 20 km or finer, PM2.5 data within the CONUS can be obtained with clear image details and no obvious jaggedness. Increasing the spatial resolution of remote sensing products to 10 km can fully demonstrate the spatial distribution characteristics of PM2.5 in the Pacific Division. To clearly detect the air quality distribution pattern in California State, the spatial resolution of remote sensing images is required to be finer than 5 km. Remote sensing products inverted from existing data, with the finest available spatial resolution of 2.5 km, can roughly reflect the general situation in Los Angeles County. Theoretically, the FGWNN model is able to complete the inversion of remote sensing products with an arbitrary target resolution when the spatial resolutions of the independent variables meet the requirements.
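The computational side of this balance can be estimated with simple arithmetic, since the number of grid cells grows with the inverse square of the cell size. The sketch below assumes an approximate CONUS area of 8.08 million km² and ignores map projection and coastline effects; `cell_count` and `CONUS_AREA_KM2` are hypothetical names introduced for illustration.

```python
# Approximate area of the contiguous U.S. in km^2 (assumed round figure).
CONUS_AREA_KM2 = 8.08e6

def cell_count(area_km2, resolution_km):
    """Approximate number of square grid cells needed to tile an area."""
    return round(area_km2 / resolution_km ** 2)

# Resolutions discussed in the text, from coarsest to finest.
counts = {res: cell_count(CONUS_AREA_KM2, res) for res in (20, 10, 5, 2.5)}
for res, n in counts.items():
    print(f"{res:>4} km grid: about {n:,} cells")
```

Under this assumption, moving from a 20 km grid to a 2.5 km grid multiplies the number of cells to be predicted by 64, which is why a coarser product is preferable when it already resolves the features of interest.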
Although the FGWNN has made some progress, there are still some limitations. These potential issues should be further considered in subsequent research or when applying the method more widely. The choice of spatial bandwidth relies on GWR calculations [62], which may limit its practicality in spatiotemporal analysis scenarios. In high-resolution remote sensing data contexts, model training and geographical activation processes incur substantial computational costs [63]. In the future, we plan to optimize the acquisition of spatial bandwidth by embedding this process within FGWNN. Simultaneously, we aim to refine the FGWNN network structure to reduce its dependence on computing resources. It is our hope that the FGWNN model can be extended to the spatiotemporal analysis field, further advancing the development of remote sensing spatiotemporal mapping technology.