1. Introduction
The Tianshan forests are essential for preserving the ecological balance among oasis, desert, and forest ecosystems. They regulate the climate cycle, conserve soil and water, protect biodiversity, sequester carbon, and support the timber economy [1]. However, the forests are under growing pressure from climate change, including prolonged extreme droughts and rising temperatures [2,3]. A survey of the current state of the forests is therefore imperative for assessing these impacts and implementing conservation measures.
Forest canopy height, average height, density, and aboveground biomass (AGB) are key indicators of forest ecosystem structure and biodiversity [4,5,6]. However, because of the region's complex topography and steep slopes, manual data collection and instrumental measurements are difficult and prone to error. At present, most available results are based on large-scale simulations or small forest surveys and do not fully represent the natural forests of the Tianshan. It is therefore necessary to redesign sampling sites so that they better reflect forest characteristics, thereby improving the precision of forest structure parameter simulations and providing a more comprehensive understanding of the forest ecosystem in this region.
The accuracy of mapping products in remote sensing applications relies heavily on the quantity and quality of training data [7,8]. However, ground reference data are limited in quantity because fieldwork is time-consuming and expensive. Long-established forest inventories provide valuable data but lack the spatial information at local and regional scales that effective forest management requires under climate change. Moreover, ground-based observations are hampered by dense canopy cover, limited accessibility, and poor representativeness, which makes it difficult to establish statistical links with satellite observations taken from above the canopy. The discrete nature of field data also makes them difficult to match with continuous earth observation data at various spatial scales.
To overcome these limitations, this study proposes the use of unmanned aerial vehicles (UAVs) as a source of reference data. UAV-based remote sensing can obtain spatially continuous information on species coverage at very high spatial resolution [9,10,11]. Using UAV data instead of in-situ data has several advantages: (1) more data can be collected in a given timeframe, (2) data collection is not impeded by accessibility (e.g., topography) and is therefore more representative, (3) UAV data have the same viewing angle as satellite data, and (4) descriptions of target species from UAV-collected data can be combined with automated algorithms to make reference data collection more efficient [12]. However, UAV data remain limited for characterizing large-scale forests, and the considerable time and economic costs of acquiring them cannot be ignored.
Satellite systems are essential for forest monitoring, especially when UAV-LiDAR resources are limited or the analysis requires a regional or global scale [13,14]. Satellites offer global remote sensing data, ensuring consistency across geographical areas and enabling long-term monitoring of forest structural parameters. High-resolution satellites, such as the Sentinel missions, provide detailed forest characteristics for small-scale studies, local forest management, and environmental sensitivity analyses [15]. The multiple Sentinel bands, spanning the visible, infrared, and microwave ranges, allow estimation of forest biomass, chlorophyll content, and vegetation cover [16,17]. Sentinel Synthetic Aperture Radar (SAR) complements the optical sensors by providing all-weather, day-and-night observation and by capturing surface roughness, which relates to properties such as vegetation density and forest species [18,19].
Machine learning combined with remote sensing imagery has been widely used to simulate forest structural parameters through both classification and regression [16,20]. Machine learning can handle large-scale, complex data effectively through automated feature selection, model tuning, and prediction. The most common and widely used machine learning methods are decision-tree-based methods (e.g., random forest) and support vector machines. Random Forest (RF), a prominent approach based on ensemble learning, performs classification and regression by constructing multiple decision tree models and aggregating their predictions. Notably, RF is robust against noisy data and overfitting while being adaptable to diverse data distributions and feature relationships. It provides feature importance analysis, quantifying the contributions of the features used in decision tree construction. Furthermore, RF excels at handling high-dimensional and large-scale datasets, effectively managing data with numerous features and samples and mitigating dimensionality issues [21,22,23].
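The aggregation and feature importance ideas described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the data are synthetic, the function names (`build_tree`, `fit_forest`, `feature_importance`) and all parameter values are assumptions chosen for illustration, and feature importance is approximated here as the normalized total reduction in squared error attributed to each feature.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, n_try, rng, depth=0, max_depth=3, min_leaf=3):
    """Grow one regression tree; each split tests only n_try random features."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return {"leaf": mean(targets)}
    best = None
    for feature in rng.sample(range(len(rows[0])), n_try):
        for threshold in sorted({row[feature] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
            right = [i for i, row in enumerate(rows) if row[feature] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, feature, threshold, left, right)
    if best is None:
        return {"leaf": mean(targets)}
    cost, feature, threshold, left, right = best
    children = [
        build_tree([rows[i] for i in part], [targets[i] for i in part],
                   n_try, rng, depth + 1, max_depth, min_leaf)
        for part in (left, right)
    ]
    return {"feature": feature, "threshold": threshold,
            "gain": sse(targets) - cost,  # error reduction achieved by this split
            "left": children[0], "right": children[1]}


def predict_tree(node, row):
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def fit_forest(rows, targets, n_trees=25, n_try=2, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], n_try, rng))
    return forest


def predict_forest(forest, row):
    """Aggregate the trees by averaging their predictions."""
    return mean([predict_tree(tree, row) for tree in forest])


def feature_importance(forest, n_features):
    """Share of the total error reduction attributed to each feature."""
    totals = [0.0] * n_features

    def walk(node):
        if "leaf" in node:
            return
        totals[node["feature"]] += node["gain"]
        walk(node["left"])
        walk(node["right"])

    for tree in forest:
        walk(tree)
    total = sum(totals) or 1.0
    return [t / total for t in totals]


# Synthetic data (assumption): feature 0 drives the target, feature 2 is noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(120)]
y = [20 * r[0] + 2 * r[1] + data_rng.gauss(0, 0.5) for r in X]

forest = fit_forest(X, y)
importance = feature_importance(forest, 3)
print([round(v, 3) for v in importance])
```

In this toy setting the first feature should receive most of the importance, because it explains most of the variance in the synthetic target. In practice, an established library implementation would normally be preferred over a hand-written forest.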
This study employed the Bayesian-Random Forest model to assess the capabilities of Sentinel-1 and Sentinel-2 data, along with DEM data, for modeling forest structural parameters. UAV-LiDAR data served as the reference data for training and validating the predictive models. The objectives of this study were to (1) demonstrate the adaptability of the Bayesian-Random Forest model in accurately predicting forest structural parameters; (2) investigate the potential of Sentinel-1, Sentinel-2, and DEM data in predicting forest structural parameters in the Tianshan Mountains; and (3) explore the spatial distribution characteristics of forest canopy height (m), average height (m), density (plants/ha), and AGB (t/ha) in the western Tianshan Mountains.
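Validating predictions against UAV-LiDAR reference values requires some agreement measure. The sketch below computes two commonly used ones, the root mean square error (RMSE) and the coefficient of determination (R²). The choice of these two metrics is an assumption, since this section does not name the metrics used, and the numbers are hypothetical values rather than results from this study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between reference and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical canopy heights in metres (not data from this study).
reference_height = [10.0, 20.0, 30.0, 40.0]   # e.g. derived from UAV-LiDAR
predicted_height = [12.0, 18.0, 33.0, 37.0]   # e.g. model output

print(round(rmse(reference_height, predicted_height), 3))       # 2.55
print(round(r_squared(reference_height, predicted_height), 3))  # 0.948
```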
5. Conclusions
In this study, forest canopy height, average height, density, and AGB for natural Schrenk spruce forests in the western Tianshan Mountains were successfully estimated. We used a machine learning model, Bayesian-Random Forest, with UAV-LiDAR data to measure the forest structural parameters and SRTM DEM, Sentinel-1, and Sentinel-2 data to predict them. This is the first time such extensive UAV-LiDAR data have been used to estimate forest characteristics in this region. The Bayesian optimizer efficiently found the best parameters for the Random Forest model and produced accurate predictions in a short time. Based on the prediction results, the largest forest area, the greatest forest height, the highest density, and the largest AGB were distributed in the 2000–2500 m altitude range of the western Tianshan Mountains. Additionally, all forest structural parameters exhibit a gradual decrease with increasing mountain altitude. This knowledge is crucial for research on how climate change affects forests in Central Asia and is useful for understanding the growth and ecological function of Schrenk spruce forests.
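The altitude-range comparison described above can be sketched as a simple binning step: predicted values are grouped into elevation bands and averaged per band. The 500 m band width, the function name `summarize_by_band`, and the sample values are all assumptions made for illustration. The values are invented and do not reproduce the results of this study.

```python
from collections import defaultdict


def summarize_by_band(samples, band_width=500):
    """Average (elevation_m, value) pairs within fixed-width elevation bands.

    Returns a dict mapping each band's lower bound (m) to the mean value.
    """
    bands = defaultdict(list)
    for elevation, value in samples:
        lower = int(elevation // band_width) * band_width
        bands[lower].append(value)
    return {lower: sum(v) / len(v) for lower, v in sorted(bands.items())}


# Invented (elevation in m, canopy height in m) pairs, for illustration only.
samples = [(1800, 14.0), (2100, 22.0), (2400, 24.0), (2600, 18.0), (2900, 12.0)]

summary = summarize_by_band(samples)
tallest_band = max(summary, key=summary.get)
print(summary)       # {1500: 14.0, 2000: 23.0, 2500: 15.0}
print(tallest_band)  # 2000, i.e. the 2000-2500 m band
```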