1. Introduction
Digital Elevation Models (DEMs) are among the most crucial inputs for many topographical applications. A DEM is a three-dimensional digital representation of the Earth's terrain showing the elevation profile or height variations. A DEM depicts the continuous Earth surface by a large number of points having x, y and z information in a given coordinate system [1]; it takes the form of a raster grid or a vector triangulated irregular network (TIN). A DEM is a primary input to various modelling and quantification processes in multiple disciplines such as hydrology [2,3], glaciology, soil science, agriculture, urban and climate studies, forestry, disaster risk monitoring [4], geomorphology, and environmental monitoring [5]. Remote sensing technologies have evolved considerably, and so has the process of DEM generation. Different sensors provide a variety of inputs for producing DEMs, such as stereo-pairs from optical sensors, spaceborne SAR image pairs for interferometry or radargrammetry, LiDAR point clouds, and digitized contour maps. Every technique developed for generating an elevation model has its own advantages and limitations. Owing to the advantages of active microwave sensors over conventional methods, SAR interferometry has emerged as an advanced technique for generating high-precision DEMs [6].
Synthetic Aperture Radar (SAR) is an active microwave sensor that provides high-spatial and good temporal resolution image pairs for the interferometric generation of DEMs. In the interferometry technique, the phase variation from Earth surface targets, as recorded in the backscatter of the radar signals, is transformed to elevation values using two SAR acquisitions taken over the same area with a time difference or from different viewing angles. The SAR images are compared after registration, and interference between them produces a fringe map termed an interferogram [7]. The elevation values derived from the phase variations are highly correlated with the terrain topography, and deformation patterns can also be mapped from them [8,9]. The availability of a large number of spaceborne SAR sensors, such as Sentinel-1A/1B, RADARSAT-1/2, ALOS PALSAR-1/2, TerraSAR-X and TanDEM-X, provides high spatial and temporal resolution images for carrying out the interferometry process for DEM generation. Since a DEM is a crucial input to varied applications, it can be improved by employing fusion techniques. The task of improving different forms of DEMs has been extensively researched and presented by several authors. Fusion is a technique of combining multi-source data to improve upon individual values and produce a high-quality representation of the data. The first case of data fusion in the earth sciences was in numerical weather forecasting, which belongs to the wider estimation and control theories [10]. The fusion of information from different sources is useful in obtaining a high-resolution elevation model whose quality can be analyzed using other datasets such as LiDAR data or multispectral images [11,12,13]. Various techniques have been followed for fusing DEMs, such as statistical measures of central tendency, sparse representation [14], Kalman filtering [15,16], and ANN frameworks [17]. A novel empirical model, the Successive Best Pixel Selection Approach (SBPSA), was developed for the improvement of InSAR-based DEMs using DEM fusion in the plain terrain of Ghaziabad and its surrounding regions; it derives better elevation values from multiple InSAR-based DEMs by first considering coherence values and then selecting the elevation values nearest to the truth to generate improved fused DEMs. The accuracy assessment shows significant improvement, with an RMSE of 0.98 m for the fused output DEM in comparison to RMSEs of 1.58 m and 1.20 m for the individual input DEMs [18].
TanDEM-X and Cartosat-1 elevation data were fused with the support of an ANN used as a predictive weight-mapping model in weighted-averaging fusion for different land types in urban areas of Munich, Germany [17]. An attempt was made to improve the SRTM DEM using a multilayer perceptron ANN in coastal areas to obtain a global coastal DEM, reducing the vertical error by combining information from vegetation indices and LiDAR reference data [19]. An alternative to interpolation techniques is an ANN model designed in MATLAB to estimate unknown heights of a DTM [20]. The quality of the SRTM DEM was improved in the dense urban areas of Nice and Singapore using multispectral Sentinel-2 and Google Earth imagery as inputs and a high-precision reference DEM as the target in a multi-channel CNN model with a U-Net structure. The model was implemented in the MATLAB Deep Learning toolbox, and the results showed an RMSE of 4.8 m in contrast to the 9.2 m RMSE of the original DEM [21]. Recent research has shown the improvement of a satellite DEM, the TanDEM-X 12 m DEM, using multispectral imagery and the predictive learning capability of ANN models. The ANN model was trained on the Nice area using various indices from Sentinel-2 multispectral data together with ground truth for the area, and validated over the Singapore site. The trained model was then tested over a new study site in Vietnam where ground truth is not available [22].
From the literature survey, it can be inferred that very little research work has been done on DEM improvement using an ANN-based approach specifically for the widely diverse topography of India. The adaptive learning of neural networks thus provides scope for developing methods and models that can serve as tools for DEM fusion to improve existing or newly generated DEMs in the complex and diverse topography of the Indian region. A large number of applications employ a DEM as a primary input; thus, an improvement of the input DEM will add to the potential of the outputs generated by these applications. Moreover, the use of precise ICESat-2 (Ice, Cloud and Land Elevation Satellite) spaceborne altimetry data, tested and validated for DEM assessment, can be explored in applications of ANN-based fusion models [23,24]. The precise elevation data from the ICESat-2 ATL08 (Land and Vegetation) height product, provided with height uncertainty, suffice as reference elevation data in hilly and complex terrains where the collection of ground truth can be time-consuming and difficult. Wang evaluated the accuracy of ICESat-2 data in providing ground elevation in the Alaska region, validating it with airborne LiDAR data and against factors such as slope, vegetation cover and vegetation height [25]. The accuracy of ICESat-2 data products was tested successfully by Zhang in a mountainous region using around 208 footprints with CORS (Continuously Operating Reference System) and UAV (Unmanned Aerial Vehicle) datasets [26]. The abundance of laser photon datasets covering regions around the globe, provided by the ICESat-2 mission, has been employed in the accuracy assessment of open-access InSAR-based DEMs in the Himalayan region [27]. The SRTM 90 m DEM was assessed with ICESat-2 data in Australia for bare ground and for areas with tree cover and vegetation, concluding that in plain areas the ICESat-2 accuracy is similar to the SRTM DEM and showing positive differences in vegetated areas for the ATL08 product [28]. ICESat-2 has also been used in the assessment of open-access DEMs such as TanDEM-X and CartoDEM and in the retrieval of building heights in urban areas [29,30].
This paper is organized in the following manner: The core concepts and background of Artificial Neural Networks (ANNs) are explained in Section 2. Section 3 describes the two study sites considered in this work and explains their various aspects, along with the specifications of the datasets processed in this study. Section 4 discusses in detail the processing steps and methodology followed for the development of the neural network DEM fusion framework for the different topographies of the study sites. The important inferences and assessment results obtained from the neural network models are presented in Section 5. Further, Section 6 gives the important discussions made during the work and, finally, Section 7 concludes the study.
2. Neural Network Fusion Framework
An Artificial Neural Network (ANN) belongs to the broader class of machine learning methods that mimic the human brain; its design is based on biological neurons. The structure of a neural network is interconnected, and its fundamental unit is called a neuron. The special characteristic of a neural network is universal approximation, which makes it highly effective; this property makes a neural network an important tool for solving problems in varied domains, including remote sensing and signal processing [31]. The concept of learning by a single neuron was originally presented in the work of McCulloch and Pitts in the 1940s. According to this model, a neuron has two main parts: a net function u and an activation function a. The net function is the weighted sum of all the inputs plus a bias (Equation (1)) [31]. The activation function is a linear or non-linear transformation of the net function to the desired output.
where the w_i x_i terms are the weighted inputs, θ is the bias or threshold of the neuron, and a is the activation function applied to the net function u.
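As a concrete illustration, the net function and activation of a single neuron can be sketched in a few lines of Python (a minimal example; the inputs, weights and bias are arbitrary values, not data from this study):

```python
import numpy as np

def neuron(x, w, theta):
    """Single neuron: net function u (weighted sum plus bias) and sigmoid activation a."""
    u = np.dot(w, x) + theta           # net function, as in Equation (1)
    a = 1.0 / (1.0 + np.exp(-u))       # sigmoid activation, one common choice
    return u, a

x = np.array([0.5, -1.2, 3.0])         # example inputs
w = np.array([0.8, 0.1, -0.4])         # example weights
u, a = neuron(x, w, theta=0.2)
print(u, a)   # the sigmoid output a always lies in (0, 1)
```

Swapping the sigmoid for TanH or ReLU changes only the second line of the function body.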
A neural network is a non-parametric computational model that can learn non-linear and highly complex relationships between variables. A simple model consists of an input layer, one or more hidden layers and an output layer. The neurons of each layer work in parallel, transforming the input to the desired output as required by the application. Each layer is connected to the next, forming a network. The basic operation of an ANN begins with the reception of information from the outside world through the input layer, which travels to the next connected (hidden) layer once a neuron is activated. The activation of a neuron takes place once the threshold is reached by the weighted input and bias. The hidden layer transforms the input into meaningful outputs using transfer or activation functions. The input data comprise attributes or features of samples that belong to different classes. The aim is for the neural network to determine the correct output for new samples by learning the behaviour of the already classified samples in the training dataset. Weights are arbitrarily initialized values assigned randomly to each input connection, reflecting the importance or amount of influence the input has on the output. The activation or transfer functions are mathematical functions that are differentiable in a definite range; applied to the sum of the products of weights and inputs plus the bias, they determine whether a neuron should be activated or not. Some commonly used transfer functions are the Linear, Sigmoid, TanH, ReLU (Rectified Linear Unit) and Softmax functions. The hidden- and output-layer neurons are equipped with these transfer functions [31].
The information propagates through the network in the forward direction, that is, from the input to the output layer through the hidden layers; this traversal of data is termed forward propagation and the network is called a feed-forward network. In a multilayer perceptron (MLP) model, which consists of a layered structure based on the McCulloch-Pitts neuron model, non-linear activation functions are used and the neurons of successive layers are interconnected. The weight matrices of MLP neurons are determined using the error-backpropagation training method, which builds on the Widrow-Hoff gradient descent procedure of the 1960s. A backpropagation algorithm estimates the error between the predicted and the target outputs and reduces it by propagating the error gradient back through the network over the training samples. The weights are updated using the calculated error gradient. The rate of change of the error with respect to a change in the weights is called the error gradient; it gives the direction of steepest descent for the learning algorithm to seek the global minimum among the several available local minima [32]. According to Equation (2) [31], the weights are updated as follows:
The updated weight w_ij^L(k + 1) is given as the sum of the previous weight w_ij^L(k) and three terms. The second term is the gradient of the mean square error with respect to w_ij^L, computed as the product of the delta error δ_i^L(k) and the output z_j^(L−1)(k), corresponding to the kth training sample and the jth neuron of the (L − 1)th layer; η represents the learning rate or step size and L denotes the layer. The third term is the momentum term for the adaptive adjustment of the step size, based on the previous weight change represented by Δw_ij^L(k − 1). Momentum is gained when the gradient vector points in the same direction in successive epochs, while in a zigzag search pattern the momentum term, with momentum constant μ, helps minimize the mean-square error by regularizing the effective gradient direction. The learning rate and the momentum are selected in the range 0 to 1. Generally, the learning rate is kept small, between 0 and 0.3, and the momentum assumes larger values, between 0.6 and 0.9. The last term represents a small random noise that helps the backpropagation algorithm leap out of local minima during the search for the global minimum when the magnitude of the corresponding gradient vector or momentum term has diminished [31].
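A schematic numpy version of this update rule (Equation (2)) is sketched below; the delta errors, layer outputs and hyperparameter values are made-up illustrations, not values from the study:

```python
import numpy as np

def update_weights(w, delta, z, prev_dw, eta=0.1, mu=0.8, noise_scale=1e-4, rng=None):
    """One backpropagation weight update with momentum and a small noise term.

    w       : weight matrix of layer L, shape (units_L, units_{L-1})
    delta   : delta errors of layer L for the kth sample
    z       : outputs of the (L-1)th layer for that sample
    prev_dw : previous weight change, reused by the momentum term
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    grad_term = eta * np.outer(delta, z)                 # learning-rate-scaled gradient term
    momentum_term = mu * prev_dw                         # momentum on the previous change
    noise = noise_scale * rng.standard_normal(w.shape)   # helps escape local minima
    dw = grad_term + momentum_term + noise
    return w + dw, dw
```

Calling this repeatedly with gradients pointing the same way lets the momentum term grow the effective step, while alternating gradient directions largely cancel it, as described above.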
Generally, a feed-forward multilayer neural network is useful for classification, regression, pattern recognition and prediction problems. Each neuron of a layer is connected to the neurons of the next layer, and each connection possesses a weight. The number of units in the input layer equals the number of input features applied to the network. A heuristic approach is applied to obtain an optimal model architecture by determining the required number of hidden layers, the number of units in the hidden layers, and model parameters such as the activation function, optimizer, batch size and number of epochs [33,34]; these are known as hyperparameters, which are selected either heuristically or by hyperparameter tuning over several iterations of checking the model performance. The number of output-layer neurons equals the number of classes or desired outputs. An ANN model can be implemented on several platforms, such as TensorFlow using the Keras library, or with the built-in applications of the commercial MATLAB NN-Toolbox (Neural Network Toolbox). This study employed both methods for designing ANN models for the study areas. The MATLAB NN-Toolbox provides backpropagation algorithms that converge faster than standard gradient descent backpropagation (traingd, traingdm). The other classes include algorithms with variable learning rates (traingda, traingdx), resilient backpropagation (trainrp), and algorithms that use numerical optimization techniques such as conjugate gradient (traincgf, traincgb, traincgp, trainscg), quasi-Newton (trainbfg) and Levenberg-Marquardt (trainlm) algorithms. Among these, Levenberg-Marquardt is the most widely used, as it converges fastest while minimising the error gradient [35].
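To see why Levenberg-Marquardt converges quickly, the toy sketch below applies the LM update to a one-parameter exponential fit. This is only an illustrative stand-in: trainlm operates on the full network Jacobian, and the data here are synthetic:

```python
import numpy as np

def lm_step(a, x, y, lam=1e-2):
    """One Levenberg-Marquardt update for fitting y ≈ exp(a * x)."""
    r = y - np.exp(a * x)          # residuals
    J = x * np.exp(a * x)          # Jacobian of the model w.r.t. parameter a
    # lam blends gradient descent (large lam) with Gauss-Newton (small lam)
    return a + (J @ r) / (J @ J + lam)

x = np.linspace(0.0, 2.0, 20)
y = np.exp(0.5 * x)                # synthetic data generated with true parameter 0.5
a = 0.0
for _ in range(50):
    a = lm_step(a, x, y)
```

The damping factor lam is what distinguishes LM from plain Gauss-Newton: it is raised when a step increases the error and lowered when the step succeeds (kept fixed here for brevity).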
The models are designed by selecting appropriate parameters and are presented with the training and testing datasets. The training samples include input features from different thematic layers, and the target set includes accurate reference values. The testing dataset contains new samples from the study areas used to test the performance of the trained model. The focus of this study is to develop a method for producing a high-quality DEM, which in turn caters to wide remote sensing applications such as topographic mapping, hydrological modelling, glacial studies, disaster risk monitoring, climate studies, surface deformation and so on. Thus, there is a requirement for a method that produces high-quality DEMs, together with an analysis of their accuracy across diverse terrains and geographical regions.
3. Study Areas and Dataset Used
The study areas for implementing the neural network-based fusion process are selected from diverse terrains in Indian states. The first study area is Ghaziabad and the surrounding region, which is mainly plain terrain. The second study area is the hilly and undulating terrain of Dehradun and its surrounding region (Figure 1).
3.1. Study Area 1: Ghaziabad and Surrounding Region
This study site lies on the western edge of the Uttar Pradesh state of India, within the Delhi-NCR (National Capital Region); it is one of the oldest and largest cities in the state, in the vicinity of the national capital, Delhi. The geographical extent of the study site is from 77° to 78° E longitude and 28° to 29° N latitude, covering from Loni to Pilakhuwa with an area of around 777.9 sq km. The major water body is the river Hindon, which divides the Ghaziabad region into Cis-Hindon in the east and Trans-Hindon in the west; the site falls in the Upper Gangetic plains, with an average elevation of 214 m. The terrain is mostly plain, with elevation values varying from 60 m in the eastern parts to 300 m in the northwestern parts. The relief is featureless, with fertile land varying from alluvial soil to sandy and clayey loam across the city. The climate is of the tropical monsoon type, with warm weather throughout the year. The landform types in this region include highly dense built-up areas comprising rural and urban settlements. The urban class includes flats and two-storey and multi-storey apartments, as this region caters to large industrial sites. Other land use/land cover classes are agricultural fields, croplands, barren land, roads and highways, and the river. This site was selected to implement and analyse the improvement of InSAR-based DEMs in a plain terrain within one of the largest urban areas of the Indian region.
3.2. Study Area 2: Dehradun and Surrounding Region
Dehradun is the capital and the largest and most populated city of the Uttarakhand state; it is located in the Doon Valley, in the foothills of the Himalayas and the Shivalik range. The geographical area of this study site is around 3088 sq km, with a longitudinal and latitudinal extent of 77°34′ to 78°18′ E and 29°58′ to 31°2′ N. The two main rivers flowing through this region are the Ganga in the east and the Yamuna in the west. The area has two major parts: first, Dehradun city, bounded by the Himalayas and the Shivalik ranges to the north and south respectively, and second, Jaunsar-Bawar, located at the base of the mountains. The geography of this area comprises highlands and hills with cooler temperatures and vast, dense forest cover. Topographically, there are two tracts: the montane tract covers the entire Chakrata tehsil, with high mountains, continuous steep slopes and gorges in the Jaunsar-Bawar region; the sub-montane tract includes the Doon Valley, bounded by the Shivalik range in the south and the Himalayas in the north. The hilly terrain of this region has elevation values ranging from 410 m at Clement Town to 700 m in the Malsi area, and up to 1870–2017 m at Mussoorie. The land use/land cover classes of this area comprise largely dense forest cover in the Terai and Bhabar as well as the Shivalik hills, with large tree canopies at Mussoorie. The Doon Valley, in contrast, has large settlements, including the city areas of Dehradun, Doiwala, Harrawala, Herbertpur, Rishikesh, Raiwala and Clement Town. The geomorphological and meteorological conditions of Dehradun and its surrounding regions make them highly vulnerable to natural hazards such as floods, landslides and earthquakes; this makes it an important study site for which good-quality DEMs should be developed for use in disaster risk management and climate study applications.
3.3. Dataset Used
Multiple DEMs are generated using the SAR interferometry technique from Sentinel-1A and 1B image pairs for the two study regions. High-resolution multispectral data for preparing the LULC maps are obtained from the Sentinel-2 MSI product. The precise spaceborne altimetry ICESat-2 photon data are used as reference elevations for training the different neural network models. Finally, the results from the ANN models are assessed for accuracy and quality in comparison to the TanDEM-X 90 m DEM for the two different topographies under study. The Survey of India (SOI) toposheets are also referred to for each region to check the range of elevation values while preparing the training and testing datasets. The details and specifications of all the datasets are given in Table 1.
4. Methodology
The steps followed to carry out the study are depicted in Figure 2. The first step is to generate multiple InSAR-based DEMs for each of the study sites. The SAR image pairs for interferometry are selected mainly on the basis of their perpendicular and temporal baselines using the ASF (Alaska Satellite Facility) Vertex Data Search and Baseline tools. Other factors that affect the quality of InSAR DEMs are the operating wavelength, viewing angle, image coherence and suitable atmospheric conditions [6,36]. Using multi-pass interferometry from the spaceborne SAR sensors, multiple image pairs are selected for both regions. After selecting the suitable sub-swath and polarization, the two images in each pair (referred to as the reference/master and secondary/slave image) are co-registered to create a stack that aligns both products at sub-pixel accuracy so that the phase difference of the acquisitions can be exploited. The orbit file available with the image pairs, containing the sensor's positional information at the time of imaging, is applied to the images. An interferogram is generated from the stack, containing an intensity image, a phase image and a coherence image. The topographic phase is extracted by subtracting the flat-earth phase and removing the interference from atmospheric conditions and noise from the total phase. Phase filtering is performed so that the interferogram phase can be properly unwrapped. The most important step, phase unwrapping over the filtered subset, is performed using the SNAPHU (Statistical-Cost, Network-Flow Algorithm for Phase Unwrapping) algorithm in the open-source SNAP software; phase unwrapping removes the ambiguity in the phase information of the SAR images. The unwrapped phase so obtained is transformed to elevation, and the coherence band is added to the final product to obtain the DEM output. Similarly, the other SAR pairs are used to produce multiple DEMs through the interferometric process for the two study areas, with quality varying according to the baselines and coherence of each image pair.
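The final phase-to-elevation conversion follows the standard repeat-pass InSAR relation; the sketch below uses assumed Sentinel-1-like geometry (the slant range, incidence angle and baseline are illustrative values, not processing parameters from this study, and SNAP performs this conversion internally):

```python
import numpy as np

# Standard repeat-pass InSAR phase-to-height relation, applied after
# flat-earth removal and phase unwrapping.
WAVELENGTH = 0.0556            # Sentinel-1 C-band wavelength, metres
SLANT_RANGE = 850e3            # assumed slant range, metres
INC_ANGLE = np.deg2rad(39.0)   # assumed incidence angle
B_PERP = 150.0                 # assumed perpendicular baseline, metres

def phase_to_height(unwrapped_phase):
    """Convert unwrapped topographic phase (radians) to relative height (metres)."""
    return (unwrapped_phase * WAVELENGTH * SLANT_RANGE * np.sin(INC_ANGLE)
            / (4.0 * np.pi * B_PERP))

# the height of ambiguity: the height change producing one full 2*pi fringe
h_amb = phase_to_height(2.0 * np.pi)
```

With these assumed values, the height of ambiguity is on the order of 100 m; a larger perpendicular baseline shrinks it, increasing height sensitivity, which is why pair selection by baseline matters.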
The elevation values from these multiple InSAR-based DEMs are input to the neural network models. Other geometrical features or topographic attributes are generated from the DEMs to provide the model with deeper insight into the terrain. Several studies show the relationship of these attributes with the quality of DEMs [12,37]. The topographic attributes computed in this study include the Slope (the first-order derivative of the DEM, i.e., the rate of change of elevation in the steepest direction), the Aspect (the orientation of the steepest downhill slope), the Topographic Ruggedness Index (TRI) (the difference between the elevation of a cell and the mean elevation of its eight neighbouring pixels [38]), the Topographic Position Index (TPI) (the difference between a pixel's height and the average height of the neighbouring pixels [39,40]), the Vector Ruggedness Measure (VRM) (a three-dimensional measure of the raster grid in relation to its neighbours [41]), and the land use/land cover class map for both study areas. The input data thus include the elevation values from the InSAR DEMs, the DEM derivative values and the LULC class information, which appear on the respective nodes of the input layer. The height residuals are calculated for each of the DEMs against the precise ICESat-2 ATL08 elevation data, considering the uncertainties available with the product. The extracted information from the raster layers is then filtered by keeping the height residual values that fall within two standard deviations (2σ) of the mean in order to remove outliers. The prepared filtered datasets are used as the training samples for the neural network models.
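Two of these preparation steps, the slope/TRI computation and the 2σ residual filter, can be sketched in numpy as below (plain finite differences on a synthetic grid; the study used GIS tooling, so this is only an assumed illustration):

```python
import numpy as np

def terrain_attributes(dem, cell=90.0):
    """Slope (degrees) and TRI from a DEM grid via plain finite differences."""
    dz_dy, dz_dx = np.gradient(dem, cell)                 # rate of change of elevation
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # TRI: difference between each cell and the mean of its eight neighbours
    pad = np.pad(dem, 1, mode="edge")
    neigh = sum(np.roll(np.roll(pad, i, 0), j, 1)
                for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0))
    tri = np.abs(dem - (neigh / 8.0)[1:-1, 1:-1])
    return slope, tri

def filter_2sigma(samples, residuals):
    """Keep samples whose height residual lies within 2 sigma of the mean residual."""
    mu, sigma = residuals.mean(), residuals.std()
    return samples[np.abs(residuals - mu) <= 2.0 * sigma]
```

For normally distributed residuals the 2σ window retains roughly 95% of the samples while discarding gross outliers such as unwrapping errors.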
Suitable neural network models are designed using a Keras-based ANN model in Google Colaboratory and the MATLAB NN-Toolbox. The Keras-based models are sequential dense-layer feed-forward multilayer perceptron models using the backpropagation algorithm; these models are designed by selecting the appropriate number of layers, the number of neurons in each layer, the activation functions used in each layer, the optimizer, the batch size and the number of epochs after performing hyperparameter optimization. The training dataset includes the input DEM elevation values along with the related DEM derivative values, and the reference is provided by the ICESat-2 photon points. The random state in the train-test split function ensures the reproducibility of the results. The model training is validated to detect and prevent over- or under-training by visualizing the chosen loss parameter, the Mean Absolute Error (MAE). The training and validation loss curves are used to visualize the performance of the model while training with different activation functions over several iterations. The best-fit model architecture is selected for performing the DEM fusion; this best-trained model is then tested by making predictions over new data samples of the study area that were not included in the training. The network models the relationship by combining the information from the given inputs and the reference (ICESat-2 elevation values). The error gradient between the predicted and the target values is minimized by the backpropagation algorithm. The model predictions are then assessed against the TanDEM-X 90 m DEM to estimate the RMSE (Root Mean Square Error) as a measure of DEM accuracy and quality. The mathematical expression for the RMSE is given in Equation (3); it is the square root of the mean squared height error between the predicted and the observed values [42].
Here, Hi(input) refers to the ith elevation value of the input DEM, Hi(Ref) is the corresponding reference elevation value from the ICESat-2 photon data, and n denotes the total number of observations.
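The training-and-validation workflow described above can be mimicked by a small numpy-only stand-in for the Keras model; the features and targets below are synthetic placeholders for the DEM heights, derivatives and ICESat-2 references, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(42)            # fixed random state for reproducibility

# synthetic stand-in: three input features (e.g. two InSAR heights and a
# slope value) and a target height -- assumed data, not the paper's
n = 400
X = rng.uniform(0.0, 1.0, (n, 3))
y = (0.6 * X[:, 0] + 0.4 * X[:, 1] + 0.1 * X[:, 2]).reshape(-1, 1)

split = int(0.8 * n)                       # simple train/validation split
Xtr, Xval, ytr, yval = X[:split], X[split:], y[:split], y[split:]

# one tanh hidden layer, linear output, trained by full-batch backpropagation
W1, b1 = rng.normal(0.0, 0.5, (3, 8)), np.zeros(8)
W2, b2 = rng.normal(0.0, 0.5, (8, 1)), np.zeros(1)
eta = 0.1
for _ in range(3000):
    h = np.tanh(Xtr @ W1 + b1)             # forward pass
    err = (h @ W2 + b2) - ytr              # prediction error, drives the MSE gradient
    gh = err @ W2.T * (1.0 - h ** 2)       # backpropagate through tanh
    W2 -= eta * (h.T @ err) / len(Xtr); b2 -= eta * err.mean(0)
    W1 -= eta * (Xtr.T @ gh) / len(Xtr); b1 -= eta * gh.mean(0)

val_mae = float(np.abs(np.tanh(Xval @ W1 + b1) @ W2 + b2 - yval).mean())
print(round(val_mae, 4))   # validation MAE, the loss monitored during training
```

In the actual Keras workflow the loop is replaced by model.fit, with the validation MAE curve inspected for over- or under-training exactly as done here with val_mae.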
Another estimate of accuracy is the percentage improvement factor (%IF), calculated as the percentage difference between the RMSE of the input DEM and the RMSE of the predicted or fused DEM, relative to the RMSE of the input DEM [43,44]. Mathematically, it is expressed as given in Equation (4).
where Pi(input) is the ith input DEM elevation value, Pi(fused) is the corresponding elevation of the fused DEM, Oi is the corresponding reference elevation value and N is the total number of observations.
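Equations (3) and (4) reduce to a few lines of Python; the sketch below computes both metrics from height arrays:

```python
import numpy as np

def rmse(pred, ref):
    """Equation (3): root of the mean squared height differences."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def improvement_factor(input_dem, fused_dem, ref):
    """Equation (4): percentage reduction of the fused-DEM RMSE relative to the input DEM."""
    r_in, r_fused = rmse(input_dem, ref), rmse(fused_dem, ref)
    return 100.0 * (r_in - r_fused) / r_in
```

For instance, the RMSE pair quoted in the introduction (1.58 m input, 0.98 m fused) corresponds to an improvement factor of roughly 38%.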
The linear error at the 90th percentile (LE90) is an extensively used parameter for the accuracy assessment of DEMs; it corresponds to the 90% probability that the absolute vertical error lies within a given vertical segment; it is a scalar accuracy measure expressed in terms of a probability for the linear error [45]. It is expressed as given in Equation (5).
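LE90 is commonly computed, under the assumption of normally distributed vertical errors, as 1.6449 times the RMSE; a distribution-free alternative takes the 90th percentile of the absolute errors. Both variants are sketched below (the normal-assumption form is the usual convention, assumed here rather than taken verbatim from Equation (5)):

```python
import numpy as np

def le90_normal(rmse_value):
    """LE90 assuming zero-mean normal errors: 1.6449 is the two-sided 90%
    quantile factor of the standard normal distribution."""
    return 1.6449 * rmse_value

def le90_empirical(errors):
    """Distribution-free LE90: the 90th percentile of the absolute vertical errors."""
    return float(np.percentile(np.abs(errors), 90))
```

The empirical form is preferable when the error distribution is skewed, for example over vegetation or steep slopes, where the normal assumption underlying the 1.6449 factor breaks down.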
Along with this, models are designed in the MATLAB Neural Network Toolbox for performing the fusion of DEMs. The implementation of the fusion framework in MATLAB is similar to the program-based models. Faster-converging and robust backpropagation algorithms are available in this toolbox; the training algorithm used here is TRAINLM (Levenberg-Marquardt), which converges faster than the others. The training and target datasets are imported in the form of matrix variables. Different model architectures are tested by changing the number of layers, the units in each layer, the transfer functions and other parameters to obtain the best-fit predictive model. The models are trained with the suitable architecture using the MSE (Mean Square Error) as the loss parameter to evaluate the training performance; the MSE is the default performance parameter for feed-forward networks and, because it penalizes large errors, is more useful than other loss measures such as the MAE. The trained model is then simulated on the new test data of the study area and the predictions from the model are obtained. The results from the predictive models are assessed against the TanDEM-X 90 m DEM to obtain the RMSE.