Article

Feature Dimension Reduction Using Stacked Sparse Auto-Encoders for Crop Classification with Multi-Temporal, Quad-Pol SAR Data

1 College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
2 Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China
3 College of Information Engineering, Northwest A&F University, Yangling 712100, China
4 Institute of Soil and Water Conservation, Northwest A&F University, Yangling 712100, China
5 School of Electronic Engineering, Xidian University, Xi'an 710071, China
6 Data61, Commonwealth Scientific and Industrial Research Organization, Kensington WA 6151, Australia
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(2), 321; https://doi.org/10.3390/rs12020321
Submission received: 26 November 2019 / Revised: 14 January 2020 / Accepted: 14 January 2020 / Published: 18 January 2020
(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)

Abstract
Crop classification is one of the important agricultural applications of polarimetric synthetic aperture radar (PolSAR) data. For agricultural crop discrimination, multi-temporal data can dramatically increase crop classification accuracies compared with single-temporal data, since the same crop exhibits different external phenomena as it grows. In practice, however, the utilization of multi-temporal data encounters a serious problem known as the “dimension disaster”. Aiming to solve this problem and raise the classification accuracy, this study developed a feature dimension reduction method using stacked sparse auto-encoders (S-SAEs) for crop classification. First, various incoherent scattering decomposition algorithms were employed to extract a variety of detailed and quantitative parameters from multi-temporal PolSAR data. Second, based on an analysis of the configuration and main parameters for constructing an S-SAE, a three-hidden-layer S-SAE network was built to reduce the dimensionality and extract effective features, managing the “dimension disaster” caused by excessive scattering parameters, especially for multi-temporal, quad-pol SAR images. Third, a convolutional neural network (CNN) was constructed and employed to further enhance the crop classification performance. Finally, the performance of the proposed strategy was assessed with simulated multi-temporal Sentinel-1 data for two experimental sites established by the European Space Agency (ESA). The experimental results showed that the overall accuracy of the proposed method was at least 17% higher than that of the long short-term memory (LSTM) method in the case of a 1% training ratio. Meanwhile, for the CNN classifier, the overall accuracy was almost 4% higher than those of the principal component analysis (PCA) and locally linear embedding (LLE) methods. The comparison studies clearly demonstrated the advantage of the proposed multi-temporal crop classification methodology in terms of classification accuracy, even with small training ratios.

1. Introduction

Acquiring crop type information is very important for crop growth monitoring, biomass estimation, crop yield prediction, etc. [1,2,3]. Unlike conventional optical remote sensing, polarimetric synthetic aperture radar (PolSAR) is an active remote sensing technology that can work under all weather conditions, day and night. During the past few decades, crop classification with PolSAR data has attracted steadily increasing research interest [4,5,6].
Currently, a variety of classification algorithms have been proposed and developed for PolSAR data. Generally, their development has gone through three phases [7]: (1) Lee investigated the regularities of distribution and applied statistical models to discriminate different objects [8]. (2) With deeper research on the mechanisms of PolSAR data, the scattering mechanisms of electromagnetic waves were introduced into the analysis and application of PolSAR data [9,10]. Many studies have proven that polarimetric features retrieved by both coherent and incoherent decomposition algorithms can improve the recognition and classification accuracy [11,12], especially for quad-pol SAR images [13,14]. (3) More recently, several deep learning architectures have been developed and notable results have been attained [15,16]. These knowledge-based algorithms [17,18] have opened a new horizon in this area. However, most existing methods concentrate only on single-date PolSAR data. For agricultural crop-type discrimination, single-temporal PolSAR images cannot provide sufficient information since the same crop shows different external phenomena as it grows [19]. Additionally, the date of data collection is crucial; for instance, it is very challenging to discriminate different crops during the sowing periods. Accordingly, it is necessary to take advantage of multi-temporal PolSAR images to produce classification results with a high accuracy [14]. Fortunately, with the rapid development of SAR techniques, increasing numbers of spaceborne SAR systems have been launched and operate in orbit, collecting large amounts of multi-temporal SAR data for Earth observation. Nowadays, several representative systems are available for civilian applications, including the C-band Sentinel-1 systems [20,21], RADARSAT-2 and the RADARSAT Constellation Mission (RCM) [22,23], the L-band Advanced Land Observing Satellite (ALOS) PALSAR/PALSAR-2 [24,25], the X-band TanDEM-X [26], and the X-band Constellation of Small Satellites for Mediterranean basin Observation (COSMO-SkyMed) constellation [27]. Based on these operational systems, large amounts of multi-temporal PolSAR data can be collected and adopted for crop classification and other applications [28,29,30,31,32,33,34,35,36].
For multi-temporal remote sensing data, a key point is how to take full advantage of the wealth of multi-temporal characteristics. Currently, a variety of classification algorithms exploiting time-series information have been proposed. These algorithms can be split into two main categories. On one hand, long short-term memory (LSTM) networks have been adopted for the recognition and classification of multi-temporal data. For instance, Teimouri et al. combined a fully convolutional network (FCN) and a ConvLSTM network to classify and map land covers with multi-temporal Sentinel-1 data [29]. To further improve the performance of LSTM, its input features can be artificially modified by fusing high-resolution optical and SAR images; Zhou et al. used SAR features extracted by multiple deep convolutional networks (DCNs) as the input features of an LSTM to improve the classification accuracy [30]. On the other hand, time-series or multi-temporal information can be extracted manually from multi-temporal and multi-source data [31,32,33,34,35,36]. For example, Zhong et al. designed a one-dimensional convolution network to extract time-series features and discriminate different ground objects [31]. Yang et al. combined the Normalized Difference Vegetation Index (NDVI) of optical images with SAR data to raise the accuracy of paddy rice classification in mountainous areas [32]. Guo et al. defined a new parameter based on the differential characteristics of Cloude scattering parameters to improve the crop classification accuracy [33]. Clearly, the analysis and application of multi-temporal optical and SAR data are developing rapidly. Nevertheless, these existing methods have the following limitations. First, it is difficult to discriminate multiple crop types using only time-series information due to the similarity between various crops. Second, the performance of LSTM networks depends heavily on the input features, and excessive features or sparse time-series data seriously deteriorate their performance. Consequently, for both LSTM networks and time-series-information-based methods, most algorithms still concentrate on extracting more effective features. Since SAR data represent the composite interaction between radar signals and vegetation and soil [21], a variety of well-known decomposition algorithms have been proposed to efficiently extract polarimetric scattering features [37,38,39,40]. However, the direct utilization of many polarimetric features causes a “dimension disaster” for most classification methods. Therefore, effective feature dimension reduction from these redundant features is a very important task.
In the area of data dimension reduction, early representative algorithms mainly include principal component analysis (PCA) [41] and locally linear embedding (LLE) [42]. However, since most engineering problems are nonlinear, PCA performs poorly due to its assumption that the processed data are linear [43]. LLE can automatically extract low-dimensional nonlinear feature representations from high-dimensional data and is easy to implement, but it is not robust to outliers [44]. More recently, artificial neural networks (ANNs) and deep learning theory have seen considerable development in image classification [45], target recognition [46], data dimensionality reduction [47], and other machine vision fields. With more than three hidden layers, deep learning models contain sufficient complexity to learn effective features from the data itself [4]. Deep learning has also proven to be an extremely powerful tool in remote sensing data analysis [48]. For instance, convolutional neural networks (CNNs) and stacked auto-encoders (S-AEs) are among the most successful network architectures, showing excellent performance in image classification and feature learning [16,49,50].
To make full use of multi-temporal PolSAR images and manage the serious problem of a “dimension disaster”, this study first employed several common scattering decomposition algorithms to extract detailed and quantitative parameters. Second, a three-hidden-layer stacked sparse auto-encoder (S-SAE) network was built to effectively extract polarimetric features and reduce the feature dimensionality. Third, the architecture of a deep CNN classifier was proposed to enhance the crop classification performance with limited training ratios. The main contribution of this paper was to apply S-SAEs to effectively reduce the feature dimension for crop classification improvements with multi-temporal, quad-pol SAR images.
The remaining sections are arranged as follows. Section 2 reviews the PolSAR data structure and polarimetric feature extraction; the main concept of an S-SAE and the architecture of the CNN classifier are also presented and discussed in this section. In Section 3, the main configuration and optimization of an S-SAE are analyzed and the performance of the proposed processing framework is assessed using simulated Sentinel-1 data. Section 4 provides discussions, and Section 5 concludes this paper.

2. Methodology

For the purpose of crop classification with multi-temporal remote sensing data, this study employed stacked sparse auto-encoders to learn low-dimensional features from a number of decomposed scattering signatures, which were then fed to CNN classifiers to achieve classification results with high accuracies. The flowchart of the proposed method is shown in Figure 1; it is mainly composed of three steps: scattering feature extraction, feature dimensionality reduction with an S-SAE, and crop classification with a CNN. It should be emphasized that the second step is the key to this study. Its aim was to acquire optimal low-dimension features that represent sufficient information contained in the original multi-temporal data. In this section, the PolSAR data structure and features, along with stacked sparse auto-encoders, are reviewed, and the architecture of the proposed deep CNN classifier is presented and discussed.

2.1. PolSAR Data Structure and Features

In quad-pol SAR systems, the measured vector data can be expressed as a 2 × 2 complex scattering matrix:
$$\mathbf{S} = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix}, \tag{1}$$
where $S_{HH}$, $S_{VH}$, $S_{HV}$, and $S_{VV}$ are the four scattering elements from the four polarization channels, with “H” and “V” standing for the horizontal and vertical linear polarizations. Under the assumption of reciprocal backscattering, $S_{HV}$ is approximately equal to $S_{VH}$ and the polarimetric scattering matrix can be rewritten as the lexicographic basis vector:
$$\mathbf{h} = \begin{bmatrix} S_{HH} & \sqrt{2}\,S_{HV} & S_{VV} \end{bmatrix}^{T}, \tag{2}$$
where the superscript “$T$” denotes the matrix transpose. A covariance matrix can then be constructed as:
$$\mathbf{C} = \mathbf{h}\,\mathbf{h}^{*T}. \tag{3}$$
Additionally, the Pauli-based scattering matrix can also be obtained as:
$$\mathbf{k}_m = \frac{1}{\sqrt{2}} \begin{bmatrix} S_{HH}+S_{VV} & S_{HH}-S_{VV} & 2S_{HV} \end{bmatrix}^{T}, \qquad \mathbf{T}_m = \mathbf{k}_m\,\mathbf{k}_m^{*T}, \tag{4}$$
where $\mathbf{T}_m$ is the one-look coherency matrix for the $m$th pixel and $(\cdot)^{*}$ denotes the complex conjugate. In practice, PolSAR images are multi-look processed to suppress speckle, so the coherency matrix is spatially averaged to become:
$$\mathbf{T} = \frac{1}{N}\sum_{m=1}^{N} \mathbf{k}_m\,\mathbf{k}_m^{*T}, \tag{5}$$
where N indicates the number of equivalent looks. It has also been proven that the covariance and coherency matrices are linearly related and can be easily converted to each other [51].
According to the covariance and coherency matrices, several obvious features, including the polarization intensities $|S_{HH}|$, $|S_{HV}|$, and $|S_{VV}|$, can be directly obtained, where $|\cdot|$ denotes the absolute value. With deeper research on the mechanisms of PolSAR data, various coherent and incoherent scattering decomposition algorithms based on the covariance and coherency matrices have been employed to extract a variety of detailed and quantitative parameters from multi-temporal PolSAR data. Consequently, many features can be extracted from single and multiple temporal PolSAR images; Bai summarized a 123-dimensional feature vector extracted from single-temporal quad-pol SAR images [52], and some new features have also been proposed recently [15]. Such a large feature set quickly runs into the curse of dimensionality. To take full advantage of the wealth of multi-temporal PolSAR images for crop classification, feature dimension reduction is therefore essential.
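As a concrete illustration of Equations (4) and (5), the following minimal sketch (not the authors' code; the use of NumPy, the array names, and the `looks` window size are all assumptions) builds the Pauli scattering vector and the multi-looked coherency matrix from the three complex channels of a reciprocal quad-pol acquisition:

```python
import numpy as np

def coherency_matrix(s_hh, s_hv, s_vv, looks=5):
    """s_hh, s_hv, s_vv: 2D complex arrays of one quad-pol acquisition
    (reciprocity assumed, so S_HV ~ S_VH). Returns T with shape (H, W, 3, 3)."""
    # Pauli vector k = (1/sqrt(2)) [S_HH + S_VV, S_HH - S_VV, 2 S_HV]^T per pixel
    k = np.stack([s_hh + s_vv, s_hh - s_vv, 2.0 * s_hv], axis=-1) / np.sqrt(2.0)
    # One-look coherency matrix T_m = k k^{*T} for every pixel
    t1 = k[..., :, None] * np.conj(k)[..., None, :]
    # Spatial multi-looking: average T_m over a looks x looks neighborhood
    pad = looks // 2
    t1 = np.pad(t1, ((pad, pad), (pad, pad), (0, 0), (0, 0)), mode="edge")
    rows, cols = s_hh.shape
    T = np.empty((rows, cols, 3, 3), dtype=complex)
    for i in range(rows):
        for j in range(cols):
            T[i, j] = t1[i:i + looks, j:j + looks].mean(axis=(0, 1))
    return T
```

Incoherent decompositions such as the Cloude decomposition then operate on each 3 × 3 matrix of the resulting field.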

2.2. Auto-Encoder

In the past few years, feature learning with neural network architectures has attracted increasing attention, since it can extract optimal features from high-dimension data. An auto-encoder (AE) is an unsupervised learning algorithm whose goal is to set the target values to be approximately equal to the inputs. A single-layer AE network comprises three main steps (i.e., encoding, activation, and decoding), having one visible input layer ($\mathbf{x}$) of $w$ units, one hidden layer ($\mathbf{y}$) of $s$ units, and one reconstruction layer ($\mathbf{z}$) of $w$ units, as shown in Figure 2, where $f(\cdot)$ and $g(\cdot)$ denote the activation functions.
Considering the input data $\mathbf{x}_i \in \mathbb{R}^{w}$, where $i$ is the index of the $i$th data point, the AE network first maps it to the latent representation $\mathbf{y}_i \in \mathbb{R}^{s}$. This process is called the encoding step and can be mathematically represented as:
$$\mathbf{y}_i = f(\mathbf{W}_y \mathbf{x}_i + \mathbf{b}_y), \tag{6}$$
where $\mathbf{W}_y \in \mathbb{R}^{s \times w}$ is the encoding matrix and $\mathbf{b}_y \in \mathbb{R}^{s}$ is the bias. Here, the logistic sigmoid function $f(I) = (1 + e^{-I})^{-1}$ is adopted.
The decoder has a similar structure to the encoder and maps the compressed data to a reconstruction $\mathbf{z}_i \in \mathbb{R}^{w}$ with the weight matrix $\mathbf{W}_z \in \mathbb{R}^{w \times s}$, bias $\mathbf{b}_z \in \mathbb{R}^{w}$, and activation function $g(I) = f(I) = (1 + e^{-I})^{-1}$, which can be represented as:
$$\mathbf{z}_i = g(\mathbf{W}_z \mathbf{y}_i + \mathbf{b}_z). \tag{7}$$
For simplification, the tied-weights strategy $\mathbf{W}_y = \mathbf{W}_z^{T} = \mathbf{W}$ is used. Consequently, three parameters $\{\mathbf{W}, \mathbf{b}_y, \mathbf{b}_z\}$ need to be determined. An auto-encoder can be trained by minimizing the cost function (i.e., the error between the inputs and their reconstructions):
$$\psi(\mathbf{x},\mathbf{z}) = \arg\min_{\mathbf{W},\mathbf{b}_y,\mathbf{b}_z} \frac{1}{2n} \sum_{i=1}^{n} \lVert \mathbf{x}_i - \mathbf{z}_i \rVert^{2}, \tag{8}$$
where $n$ is the number of training samples. In addition, a weight attenuation term can be added to the cost function to control the degree of weight reduction; this term inhibits the influence of noise on the irrelevant components in the target and weight vectors, significantly improves the generalization ability of the network, and effectively avoids overfitting. The regularized cost function is defined as:
$$\tau(\mathbf{x},\mathbf{z},\lambda) = \psi(\mathbf{x},\mathbf{z}) + \lambda\,\Omega_{weights}, \tag{9}$$
$$\Omega_{weights} = \frac{1}{2} \sum_{l=1}^{u} \sum_{i=1}^{s} \sum_{j=1}^{w} \left( W_{ij}^{(l)} \right)^{2}, \tag{10}$$
where $\lambda$ controls the regularization strength, $u$ is the number of hidden layers, and $\Omega_{weights}$ is the weight attenuation term, known as L2 regularization. Using the stochastic gradient descent (SGD) algorithm, the weight matrix and biases are trained and optimized [53].
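For concreteness, the following sketch (an illustrative PyTorch example, not the authors' implementation; the random input stands in for real pixel feature vectors, and the dimensions follow the 252-to-9 reduction used later) trains the single-layer auto-encoder of Equations (6)–(10) with tied weights, sigmoid activations, the L2 weight attenuation term, and SGD:

```python
import torch

w_dim, s_dim, lam = 252, 9, 1e-3          # input size, hidden size, L2 strength
W   = (0.01 * torch.randn(s_dim, w_dim)).requires_grad_()  # tied: W_y = W_z^T = W
b_y = torch.zeros(s_dim, requires_grad=True)
b_z = torch.zeros(w_dim, requires_grad=True)
opt = torch.optim.SGD([W, b_y, b_z], lr=0.1)

x = torch.rand(1024, w_dim)               # stand-in for unlabeled pixel features
for epoch in range(400):                  # 400 pretraining epochs, as in Section 3
    y = torch.sigmoid(x @ W.T + b_y)      # encoder, Eq. (6)
    z = torch.sigmoid(y @ W + b_z)        # decoder with tied weights, Eq. (7)
    recon = 0.5 * ((x - z) ** 2).sum(dim=1).mean()   # reconstruction cost, Eq. (8)
    l2 = 0.5 * (W ** 2).sum()             # weight attenuation (L2) term, Eq. (10)
    loss = recon + lam * l2               # regularized cost, Eq. (9)
    opt.zero_grad()
    loss.backward()
    opt.step()
```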

2.3. Stacked Sparse Auto-Encoder

A sparse auto-encoder (SAE), the foundation of an S-SAE, is developed from an AE by adding sparsity constraints that act on the hidden-layer units. This method can better express high-dimensional features [54]. To realize the inhibitory effect, an SAE uses the Kullback-Leibler (KL) divergence to constrain the average activation value $\hat{\rho}$ of the hidden-layer neuron outputs to be close to a given sparsity parameter $\rho$. The KL divergence is added to the cost function as a penalty term, so the cost function of an SAE is updated as in Equation (11); the penalty term, called the sparse regularization term, is expressed in Equation (12):
$$E(\mathbf{x},\lambda,\beta,\rho) = \frac{1}{2n} \sum_{i=1}^{n} \lVert \mathbf{x}_i - \mathbf{z}_i \rVert^{2} + \lambda\,\Omega_{weights} + \beta\,\Omega_{sparsity}, \tag{11}$$
$$\Omega_{sparsity} = \sum_{i=1}^{D} KL(\rho \,\Vert\, \hat{\rho}_i) = \sum_{i=1}^{D} \left[ \rho \log\frac{\rho}{\hat{\rho}_i} + (1-\rho) \log\frac{1-\rho}{1-\hat{\rho}_i} \right]. \tag{12}$$
In Equation (12), the average activation value $\hat{\rho}_i$ of neuron $i$ of the hidden layer over all training samples is defined as:
$$\hat{\rho}_i = \frac{1}{n} \sum_{j=1}^{n} f\!\left( \mathbf{w}_i^{(1)} \mathbf{x}_j + b_i^{(1)} \right), \tag{13}$$
where $\mathbf{x}_j$ is the $j$th training sample, $\mathbf{w}_i^{(1)}$ is the $i$th row of the weight matrix $\mathbf{W}$ in the first layer, and $b_i^{(1)}$ is the $i$th entry of the bias vector.
The KL divergence is adopted because it measures the difference between two distributions well. If $\hat{\rho}_i = \rho$, then $KL(\rho \,\Vert\, \hat{\rho}_i) = 0$; if the difference between $\rho$ and $\hat{\rho}_i$ is large, the KL penalty forces them to be close. In Equation (11), $\beta$ ($0 < \beta < 1$) controls the weight of the sparse regularization term. To inhibit most neurons, $\rho$ is generally set close to 0; for example, with a value of 0.03, this constraint drives the average activation value of each neuron in the SAE toward 0.03.
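The sparsity penalty itself is only a few lines; the sketch below (same illustrative PyTorch setting and assumptions as above) computes Equations (12) and (13) from the hidden activations and would be added to the reconstruction loss with weight β:

```python
import torch

def sparsity_penalty(y, rho=0.03, eps=1e-8):
    """y: hidden activations with shape (n_samples, n_hidden), values in (0, 1)."""
    rho_hat = y.mean(dim=0).clamp(eps, 1.0 - eps)   # average activation, Eq. (13)
    kl = rho * torch.log(rho / rho_hat) \
         + (1.0 - rho) * torch.log((1.0 - rho) / (1.0 - rho_hat))
    return kl.sum()                                  # sparse penalty, Eq. (12)

# In the AE training loop above: loss = recon + lam * l2 + beta * sparsity_penalty(y)
```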
An S-SAE is a deep learning architecture constructed by linking several sparse auto-encoders in series, in which the outputs of each layer are fed to the following layer. An S-SAE is trained greedily, layer by layer. With a Softmax regression layer appended [49], the network parameters can then be trained and adjusted again, which is called fine-tuning. The structure and training process of an S-SAE are given in Figure 3 and Figure 4, respectively.
In the network, the number of labels is equal to the dimension of the output vector $\mathbf{y}^{(u)}$. Here, $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$, $l \in \{1, 2, \ldots, u\}$, are the parameters of the $l$th layer, where $\mathbf{W}^{(l)}$ is the weight matrix of the $l$th layer and $\mathbf{b}^{(l)}$ is the bias vector. According to the known labels, the whole S-SAE network is fine-tuned to determine the parameters $\mathbf{W}$ and $\mathbf{b}$ using backpropagation.
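The following sketch outlines this two-stage procedure for a 252-100-50-9 S-SAE; it is an assumption-laden illustration, where the `train_sae` stub stands for the unsupervised SAE training loop shown earlier and random tensors stand in for the real features and crop labels:

```python
import torch
import torch.nn as nn

def train_sae(x, hidden):
    # Placeholder: unsupervised SAE training (reconstruction + L2 + KL sparsity)
    # of one layer on x; only the trained encoder is kept and returned.
    return nn.Sequential(nn.Linear(x.shape[1], hidden), nn.Sigmoid())

x = torch.rand(5000, 252)                 # stand-in for the 252 stacked features
encoders, h = [], x
for hidden in (100, 50, 9):               # each layer trains on the previous codes
    enc = train_sae(h, hidden)
    with torch.no_grad():
        h = enc(h)                        # greedy layer-by-layer pretraining
    encoders.append(enc)

# Fine-tuning: stack the pretrained encoders, append a Softmax output layer,
# and adjust the whole network on the labeled samples with backpropagation.
model = nn.Sequential(*encoders, nn.Linear(9, 14))   # 14 crop classes
labels = torch.randint(0, 14, (5000,))               # stand-in ground truth
loss = nn.CrossEntropyLoss()(model(x), labels)       # Softmax inside the loss
loss.backward()
```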

2.4. Architecture of the Proposed Deep CNN Classifier

A convolutional neural network (CNN) is one of the most successful network architectures among deep learning methods. The learning process of CNNs is computationally efficient and insensitive to shifts in the data such as image translation, making the CNN a leading model for recognizing 2D patterns in images [31]. CNN-based deep learning networks have been widely used in remote sensing, such as in large-scale image recognition, semantic segmentation, and target classification [48]. Since ground truth data collection is always a hard task in remote sensing target classification, the aim of this paper was to construct a lightweight CNN network that achieves classification results with a high accuracy, and the classical LeNet was selected as the basic structure. Additionally, it is difficult to obtain better classification performance by simply deepening the network due to the small size of the input data cube. Consequently, this paper improved the LeNet configuration with two parallel branches inspired by the characteristics of GoogLeNet [55]. The proposed structure of the CNN classifier is given in Figure 5. It mainly comprises four convolutional layers, one average-pooling layer, one addition (add) layer, one fully connected layer, and a Softmax classifier. Basically, it retains the same elements as GoogLeNet and utilizes the average-pooling layer to reduce the number of fully connected layers. Furthermore, the network uses two convolution branches with different depths and the add layer to fuse the features from the different depths, enhancing the relevant features to achieve fast convergence. The dimension of the input data is 15 × 15 × M, where M indicates the number of input features. First, the input passes through one convolution layer with 32 filters of size 5 × 5 and a stride of 1; with zero padding, the generated feature cube has a dimension of 15 × 15 × 32. Second, the generated feature maps are fed into two pathways, where one pathway contains two convolution layers and the other contains one. These three convolution layers all have 64 filters of size 3 × 3, with strides of 2, 1, and 2, respectively. Third, the feature maps from the upper pathway are activated using a Rectified Linear Unit (ReLU) function and combined with the features from the lower pathway, yielding feature maps with a dimension of 8 × 8 × 64. Fourth, the combined maps are downsampled by one average-pooling layer with a size of 2 × 2 and a stride of 2. Finally, one fully connected layer maps the data to a vector so that Softmax can calculate the probability for each class.
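The paragraph above fixes almost every dimension, so the architecture can be sketched directly. The PyTorch code below is an illustrative reconstruction (not the authors' released code); the padding values are our assumptions, chosen so that both branches produce the stated 8 × 8 × 64 maps:

```python
import torch
import torch.nn as nn

class TwoBranchCNN(nn.Module):
    def __init__(self, in_channels, n_classes=14):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, 32, 5, stride=1, padding=2)  # 15x15x32
        self.up1 = nn.Conv2d(32, 64, 3, stride=2, padding=1)  # upper branch, 8x8x64
        self.up2 = nn.Conv2d(64, 64, 3, stride=1, padding=1)  # upper branch, 8x8x64
        self.low = nn.Conv2d(32, 64, 3, stride=2, padding=1)  # lower branch, 8x8x64
        self.pool = nn.AvgPool2d(2, stride=2)                 # 8x8x64 -> 4x4x64
        self.fc = nn.Linear(64 * 4 * 4, n_classes)

    def forward(self, x):
        x = self.stem(x)
        # ReLU on the upper branch, then element-wise "add layer" fusion
        fused = torch.relu(self.up2(self.up1(x))) + self.low(x)
        x = self.pool(fused).flatten(1)
        return self.fc(x)   # Softmax is applied inside the cross-entropy loss

logits = TwoBranchCNN(in_channels=9)(torch.rand(4, 9, 15, 15))  # M = 9 features
```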

3. Experiments and Results

3.1. Experimental Sites and PolSAR Data

The performances of the proposed method were assessed with two data sets from two experimental sites. The first experimental site was an approximate 14 km × 19 km rectangular region located in the town of Indian Head (103°66′87.3″W, 50°53′18.1″N) in southeastern Saskatchewan, Canada [56,57]. The second one was a 13 km × 9 km rectangular region located in Flevoland (5°33′53.67″W, 52°26′45.73″N), Netherlands. The location maps of the study areas are shown in Figure 6. These two sites were established by the European Space Agency (ESA) to evaluate the performances of crop classification with Sentinel-1 data. Both study sites contained various crop types. There are mainly 14 classes of crops and the corresponding planting areas are summarized in Table 1.
The two experimental PolSAR data sets were both simulated by ESA with Sentinel-1 system parameters from real RADARSAT-2 data before the launch of the actual Sentinel-1 satellites. The real RADARSAT-2 data were collected from April to September 2009 (Indian Head data set: 21 April, 15 May, 8 June, 2 July, 26 July, 19 August, and 12 September; Flevoland data set: 7 April, 1 May, 25 May, 12 June, 5 July, 29 August, and 22 September), almost covering the whole growing cycles of the main crops. The multi-temporal images had already been coregistered and filtered for speckle noise suppression using an averaged-structure Lee filter [58]. For the performance evaluation, high-resolution optical images and ground surveys were combined to establish the ground truth maps, which are shown in Figure 7.

3.2. Results and Analysis for the Indian Head Site

3.2.1. Polarimetric Feature Extraction

For the subsequent experiments, polarimetric scattering features were first derived from the seven time-series PolSAR images separately using various methods. Some of these features were based on the measured data and directly obtained, and others were calculated with the Freeman decomposition [37], Huynen decomposition [38], Yamaguchi decomposition [39], and Cloude decomposition [40]. It should be noted that the parameter “A” from the Cloude decomposition was substituted by the theta parameter proposed by Ji et al. [34], and the null angle parameters were taken from Chen and Tao [15]. In total, 252 features from the 7 time series were prepared, which are summarized in Table 2; some typical features (i.e., the amplitude of the HH-VV correlation, where HH and VV stand for the complex SAR images from the horizontal and vertical channels, respectively; the phase difference of HH-VV; the co-polarized ratio; and the null angle parameters) are shown in Figure 8.

3.2.2. S-SAE Configuration and Optimization

1. Selection of unsupervised training samples for pretraining the parameters of an S-SAE
Unsupervised pretraining is the first key step in constructing an S-SAE. The ratio of selected training samples significantly affects the performance and computation time. For simplification, a single-layer sparse auto-encoder with a regularization term λ of 0.001, a sparsity regularization term β of 4.5, and a sparsity parameter ρ of 0.25 was used to compress the 252 input features down to 9 features. To investigate the performance of an S-SAE with different training samples, this section compares training sample ratios of 0.3%, 1%, 5%, 10%, 20%, and 50%. The pretraining epochs were 400. After the dimensionality reduction, the nine reduced features were fed into a support vector machine (SVM) classifier. The ratio of supervised training samples for the SVM classifier was 1%, randomly selected, and the remaining 99% were used for testing. In the experiments, crop classifications were carried out ten times and the averaged classification accuracy was used for the evaluation. The overall accuracy (OA) and kappa coefficient of the classification results are plotted in Figure 9a, and Figure 9b gives the training time with different ratios of unsupervised training samples. From Figure 9, it can be seen that the classification accuracy increased with the number of unsupervised pretraining samples. For the 50% training ratio, the OA and kappa were about 6% and 5% higher, respectively, than those for the 0.3% training ratio. However, the training time also became longer. Therefore, taking computational efficiency into account, the ratio of unsupervised pretraining samples was fixed to 0.3%, 1%, or 5% in the following experiments.
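The evaluation protocol can be sketched as follows (scikit-learn; the random arrays are stand-ins for the 9 reduced features and the ground-truth labels): an SVM is trained on 1% of the pixels, tested on the remaining 99%, and the OA and kappa are averaged over ten runs:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

reduced = np.random.rand(20000, 9)           # stand-in for the S-SAE features
labels = np.random.randint(0, 14, 20000)     # stand-in for the 14 crop classes

oa, kappa = [], []
for seed in range(10):                        # ten repeated classifications
    x_tr, x_te, y_tr, y_te = train_test_split(
        reduced, labels, train_size=0.01, stratify=labels, random_state=seed)
    pred = SVC().fit(x_tr, y_tr).predict(x_te)
    oa.append(accuracy_score(y_te, pred))
    kappa.append(cohen_kappa_score(y_te, pred))
print(np.mean(oa), np.mean(kappa))            # averaged OA and kappa coefficient
```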
2. Depth and supervised fine-tuning of the parameters of an S-SAE
To investigate the performance of an S-SAE with different depths, this section sets up five auto-encoders with one to five layers (i.e., AE, S-AE2, S-AE3, S-AE4, and S-AE5), where AE is a single-layer auto-encoder and S-AE denotes a stacked auto-encoder. These five networks were first pretrained, as discussed in the last section, and then fine-tuned with supervised training samples. The ratios of the pretraining samples were 0.3%, 1%, and 5%, and the ratio of the supervised training samples for fine-tuning was set to 1%. To reduce the disturbance of other parameters, λ, β, and ρ of the five S-AEs were all set to 0.001, 0, and 0, respectively. The numbers of epochs for pretraining and fine-tuning were 400 and 800, respectively. Again, the nine reduced features were fed into the SVM classifier after the dimensionality reduction. The OA and kappa of the classification results with the five different S-AEs are shown in Figure 10 (1% pretraining ratio), and the evaluation parameter values are listed in Table 3. From Figure 10 and Table 3, it is clear that: (1) The accuracy of the classification results increased with the number of pretraining samples, which is consistent with the conclusion of the previous step. (2) Comparing Table 3 and Figure 9a, the parameters λ, β, and ρ improved the accuracy, especially in the case of the 0.3% pretraining samples. (3) The fine-tuning of the S-SAE raised the classification accuracy dramatically; however, the improvement became weaker as the number of pretraining samples increased, mainly due to the insufficient training caused by the decreased ratio of fine-tuning samples to pretraining ones. (4) As the number of layers increased, the classification accuracies became gradually higher [55]; however, when the network contained more than three layers, the performance did not increase further. Therefore, the network architecture of 252-100-50-9 was configured as the optimal one and used in the following experiments.
3. Size of the hidden layers
For an AE network, the number of learned features is decided by the hidden layers. To investigate the effects of the hidden layer sizes on crop classification, a group of experiments was conducted. Here, the goal was still to reduce the 252 input features to 9, and the optimal network with hidden sizes of 252-100-50-9 from the previous step was chosen as the original network configuration. Let L1 and L2 denote the numbers of units in the two hidden layers, with searching intervals of [35, 260] and [20, 200], respectively. In the experiment, a grid search strategy with a step of 15 was employed to determine the optimal numbers of hidden neurons (a sketch is given below). The values of λ, β, and ρ were still set to 0.001, 0, and 0. The nine compressed features were fed into SVM classifiers trained with 1% supervised samples. The OA and kappa of the classification results are shown in Figure 11, from which it is clear that the effects of the hidden layer sizes were strong: the highest OA was approximately 10% higher than the lowest. The zone with the higher classification accuracies was mainly distributed in the top-left area, appearing as an inverted triangle marked by a yellow line in Figure 11. In this area, the optimal interval for the size of the second hidden layer shrank gradually with the decrease of the size of the first hidden layer. Consequently, the size of the first hidden layer was much bigger than that of the second one. In addition, when the size of the first hidden layer was close to the size of the input data, a better dimension reduction performance was obtained, mainly because more neurons raise the feature learning ability and improve the performance of the whole S-SAE. Finally, the best classification accuracy, with an OA of 80.45% and a kappa of 75.42%, was achieved when L1 and L2 were 215 and 95, and thus the S-SAE with the structure of 252-215-95-9 was the optimal configuration.
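The grid search itself reduces to two nested loops. In the sketch below, `evaluate_ssae` is a hypothetical helper, not a real library function; it would pretrain and fine-tune the S-SAE with the given layer sizes, reduce the features, train the SVM on 1% of the samples, and return the OA:

```python
best_oa, best_sizes = 0.0, None
for L1 in range(35, 261, 15):          # first hidden layer: [35, 260], step 15
    for L2 in range(20, 201, 15):      # second hidden layer: [20, 200], step 15
        oa = evaluate_ssae(layers=(252, L1, L2, 9))   # hypothetical helper
        if oa > best_oa:
            best_oa, best_sizes = oa, (L1, L2)
print(best_oa, best_sizes)             # the reported optimum is L1 = 215, L2 = 95
```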
4. Parameters of λ , β , and ρ
The determination of λ , β , and ρ is the final step to constructing an optimal S-SAE. In order to study the effects of these three parameters, this study carried out some comparison experiments for a single-layer AE network.
For λ, a grid search over log10 λ was carried out with a stride of 1 over the interval [−7, 1]. The remaining implementation was similar to that in the previous step. The OA and kappa of the classification results with an SVM classifier are plotted in Figure 12, where the highest accuracy was obtained with a λ of 0.1 (i.e., log10 λ = −1), and a λ of 10 generated the lowest accuracy.
For β and ρ, two groups of experiments with log10 λ values of −2 and −1 were compared. For β, the searching interval and step were [0.5, 4.5] and 1; for ρ, they were [0.05, 0.45] and 0.1. The AEs were configured with these parameters and the reduced features were fed into the SVM classifiers. The OA and kappa are plotted in Figure 13, where it is evident that: (1) the values of λ, β, and ρ had a great impact on the recognition performance; and (2) the best results were generated when λ, β, and ρ were set to 0.1, 2.5, and 0.45, respectively. The highest OA and kappa were 80.62% and 75.84%, both about 5% higher than those of an SAE with the original parameters (i.e., λ = 0.001, β = 0, ρ = 0). Additionally, no clear rules for determining the optimal parameters were found in these experiments.
Finally, after a variety of experiments, a group of optimal parameters were determined and are listed in Table 4 for the S-SAE with the structure of 252-215-95-9. Consequently, the OA and kappa of the classification results with the SVM were 81.77% and 77.20%, which were slightly higher than those in the previous section.

3.2.3. Comparison of Classification Results with the Different Methods

In this section, nine reduced features were extracted from the original 252 multi-temporal features using different methods, including PCA, LLE, a single-layer AE(1) of 252-9, a fine-tuned AE(2) of 252-9, a fine-tuned S-AE(1) of 252-100-50-9, a fine-tuned S-AE(2) of 252-215-95-9, and a fine-tuned S-SAE of 252-215-95-9 with the optimal parameters listed in Table 4. The λ, β, and ρ of AE(1), AE(2), S-AE(1), and S-AE(2) were all set to 0.001, 0, and 0, respectively. The ratio of supervised fine-tuning samples for the AEs was 1%. For LLE, the number of neighbors was 300 and the maximum embedding dimensionality was 9. The extracted nine-dimensional features were then fed into the SVM and the proposed CNN classifiers. For the CNN classifier, the initial parameters were randomly selected and updated with a learning rate of 0.01, a momentum parameter of 0.9, and a weight decay of 0.0004; the PCA and LLE baselines are sketched below.
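For reference, the two conventional baselines can be reproduced with scikit-learn as follows (`features` is a stand-in for the 252 stacked multi-temporal features per pixel; the LLE neighbor count of 300 follows the setup above):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

features = np.random.rand(5000, 252)     # stand-in for the real pixel features
pca9 = PCA(n_components=9).fit_transform(features)
lle9 = LocallyLinearEmbedding(n_neighbors=300,
                              n_components=9).fit_transform(features)
```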
First, the ratio of training samples for the SVM and CNN classifiers was 1% and the other 99% were used for testing. The 14 types of crops were classified, including lentil, durum wheat, spring wheat, field pea, oat, canola, grass, mixed pasture, mixed hay, barley, summer fallow, flax, canary seed, and chemical fallow. Compared with the ground truth, some of the classification results, along with the error maps, are shown in Figure 14, and the OA and kappa for the SVM and CNN classifiers, along with the correct classification ratio for each class, are summarized in Table 5 and Table 6, respectively. The crop classification results showed that the optimal S-SAE had the best performance for both the SVM and CNN classifiers. For the SVM classifier, the OA from the S-SAE was 16.77% and 14.56% higher than those from PCA and AE(1), respectively. For the CNN classifier, the OA from the S-SAE was raised by 7.55% and 7.58% relative to those from PCA and AE(1), respectively. The combination of the S-SAE and CNN achieved the best classification results, with an OA and kappa of up to 95.44% and 94.51%, respectively.
Second, the classification methods with different training ratios (i.e., 1%, 5%, and 10%) for training the SVM and CNN classifiers were compared. The quantitative evaluation parameters are listed in Table 7. Meanwhile, some results with the conventional complex Wishart classifier, the LSTM of Zhong et al. [31], and the Chen method [15] are also included. The 36 × 7 decomposed features were the input of the LSTM, and 14 decomposed roll-invariant features from the 7 time series were taken as the input of the Chen method.
From these results, we can conclude that: (1) for feature dimension reduction, the SVM and CNN with the nine features from the S-SAE achieved the best classification results; (2) with the increase of the training ratio, the recognition accuracy of all methods improved, as expected; (3) compared with the SVM classifier, the CNN classifier improved the classification performance of the dimensionality-reduced features, especially for the crops with small planting areas; and (4) among all the compared methods, the proposed method achieved the highest classification accuracy. The LSTM was not effective in discriminating crop species with similar external morphologies, such as barley and durum wheat. Moreover, the proposed S-SAE + CNN method achieved excellent crop classification accuracies, even with limited training ratios.

3.2.4. Comparison of the Dimensionality Reduction Features with Different Methods

To further demonstrate the advantages of this method regarding feature extraction, this section compares the dimensionality-reduced features of the different methods from the following two aspects. First, according to the contribution rates of PCA and the visual quality of LLE and the S-SAE, the first four extracted features from these three methods are visualized in Figure 15, from which it can be seen that the features from the S-SAE exhibited the best visual effects. Second, the standard deviations between different crops and within the same class were calculated for a quantitative comparison. The six main crops with the largest planting areas in the Indian Head site were selected: lentil, spring wheat, field pea, canola, barley, and flax. Six local windows with a size of 100 × 100 were used to choose samples for these crops, and the standard deviations of the optimal feature (i.e., the first column in Figure 15) from each method were calculated and plotted in Figure 16 (in Figure 16a, the horizontal axis indexes the six selected crops; in Figure 16b, it indexes three crops, namely the current and neighboring crops). The S-SAE features show the maximum standard deviation between different crops and the minimum within the same crop. Consequently, the features from the S-SAE had a better discrimination ability and are expected to improve the accuracy of crop recognition and classification.

3.3. Results and Analysis for the Flevoland Site

To further verify the reliability of the proposed method, another experiment with the multi-temporal data from Flevoland was carried out and is reported in this section. The dimensionality reduction and classification methods kept the same setup as in the previous experiment. For a fair comparison, according to the contribution rate (i.e., more than 90%) of the reduced parameters using the PCA method, the original 252 features were all reduced to 12 features, as sketched below.
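How the target dimensionality follows from the PCA contribution rate can be sketched as follows (scikit-learn; `features` again stands in for the real 252-dimensional data): the smallest number of components whose cumulative explained variance exceeds 90% is kept, which gave 12 features here:

```python
import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(5000, 252)      # stand-in for the real pixel features
pca = PCA().fit(features)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)
print(n_keep)                              # reported as 12 for the Flevoland data
```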
First, various dimensionality reduction methods, including PCA, LLE, a single-layer AE(1) of 252-12, a fine-tuned AE(2) of 252-12, a fine-tuned S-AE(1) of 252-100-50-12, a fine-tuned S-AE(2) of 252-215-95-12, and an S-SAE of 252-215-95-12, were used to extract 12 low-dimension features from the original 252 multi-temporal features. In addition, the comparison methods also included a fine-tuned SAE of 252-12 with λ, β, and ρ set to 0.1, 2.5, and 0.45, respectively. The determined optimal parameters of the S-SAE of 252-215-95-12 are listed in Table 8. Compared with the ground truth, some of the classification results and the error maps are shown in Figure 17, and the OA and kappa, along with the correct classification ratio for each class, are summarized in Table 9, from which it can be seen that the optimal S-SAE still had the best performance. The combination of the S-SAE and CNN achieved the best classification results, with an OA and kappa of up to 91.63% and 90.11%, respectively. Furthermore, the AEs with deeper structures and fine-tuned parameters improved the data dimensionality reduction ability.
Second, the classification methods with different training ratios (i.e., 1%, 5%, and 10%) for training the SVM and CNN classifiers were compared. The quantitative evaluation parameters are listed in Table 10. The experimental results show that the classification accuracy with the Chen method was slightly lower than that for the Indian Head data. The other main conclusions are the same as those for the first study site.

4. Discussions

The accuracy of crop recognition and classification can be raised dramatically with multi-temporal quad-pol SAR data. Recently, an increasing number of spaceborne SAR systems launched into orbit can acquire large amounts of real data, providing a great opportunity for multi-temporal data analysis [28]. Meanwhile, deep learning has shown powerful abilities in the field of remote sensing [48]. Based on these two considerations, this study attempted to combine AE-based feature dimension reduction with CNN classifiers to achieve crop classification results with a high accuracy. Based on the theoretical analysis and experimental results above, we provide the following discussions.

4.1. Contribution of Multi-Temporal SAR Data and Decomposed Features

Unlike other ground objects, agricultural crops usually follow a stable and regular growing cycle, and the crop categories generally do not change across the various growing stages. Even so, single-temporal PolSAR images cannot provide sufficient information since the same crop shows different external phenomena as it grows. Compared with single-temporal data, multi-temporal remote sensing data can provide much richer information to improve the classification accuracy [29,30].
To better understand the polarimetric scattering mechanisms of various objects, a variety of polarimetric feature decomposition techniques have been developed [37,38,39,40,59]. With these polarimetric decomposition algorithms, a huge number of features can be extracted from PolSAR data to raise the classification accuracy. For multi-temporal data from M time series, the number of features becomes M times greater. If a deep learning algorithm is applied directly, this enormous feature set encounters the curse of dimensionality [50], especially for quad-pol SAR images. Therefore, data dimension reduction and effective feature extraction are essential.

4.2. Network Construction and Parameter Optimization of an S-SAE

In this study, the S-SAE was employed to extract effective features from multi-temporal polarimetric features. Furthermore, the effects of the training ratios, network configurations, size of the hidden layers, and main parameters were studied in Section 3.2.2.
For the training ratios, increasing the ratio of unsupervised training samples improved the performance of the S-SAE. However, the improvement became weaker as the number of pretraining samples increased, mainly due to the insufficient training caused by the decreased ratio of fine-tuning samples to pretraining ones. In addition, increasing the training samples brought a heavy computation burden. Therefore, an appropriate training ratio is of paramount importance.
For the network depth and the size of the hidden layers, the classification accuracies became higher as the network deepened. However, deeper was not always better [50,53]: when the network contained more than three layers, the performance no longer increased. Additionally, by setting a smaller step size for the searching grid, this study provided new findings: if the number of neurons in the first hidden layer approximates the number of input features and the number of neurons in the second hidden layer keeps a certain gap from the preceding and following layers, a better data dimension reduction performance can be obtained.
For the parameters λ, β, and ρ, the regularization term λ and the sparsity terms β and ρ had uncertain impacts on the performance of the S-SAE. Appropriate values of these parameters improved the OA of the classification results by 5%, a gain even larger than that obtained by optimizing the structure to 252-215-95-9. Unfortunately, no clear rules were found in these experiments, which needs further study in the future.

4.3. Differences from Existing Works

Regarding the purpose of classification with multi-temporal PolSAR data, Lee investigated the regularities of distribution and applied statistical models to discriminate between different objects [8]. The proposed complex Wishart classifier only uses the statistical distribution of the covariance matrix and can be easily extended to the case of multi-band or multi-temporal PolSAR images.
With the development of polarimetric decomposition theory, various decomposed features can be calculated, and it has been proven that the performances of target recognition and classification can be improved greatly with these meaningful parameters [15,33]. Meanwhile, the CNN classifier was also applied to classification with PolSAR images since it has shown great success in computer vision and other fields [15,48]. In order to take full advantage of the polarimetric decomposition and the CNN classifier, a feature dimension reduction was necessary. In this case, the conventional PCA and LLE were used to extract effective information expressed by the reduced parameters.
In Section 3.2.3, for a fair comparison, according to the contribution rate of the reduced parameters using the PCA method, the original 252 features were all reduced to 9 features. The experimental results showed that the S-SAE performed best and that the accuracy of the classification result obtained using the S-SAE + CNN was the highest. For further comparison, PCA and the S-SAE were used to reduce the original 252 features to 3, 6, 9, and 12 low-dimensional features for the Indian Head site, and the reduced features were then fed into the SVM classifier. The OA and kappa with the different numbers of reduced features are shown in Figure 18. It can be clearly seen that the classification accuracy increased as the number of reduced features increased, as expected. Regardless of the dimension of the reduced features, the S-SAE performed better than the conventional PCA method.

4.4. Limitations and Future Work

First, only a limited number of polarimetric decomposition algorithms for quad-pol SAR images were used in this paper. Polarimetric decomposition is still a rapidly developing area and many other algorithms can also be used together to form a rather high-dimensional feature vector [52]. Second, the proposed S-SAE feature dimension reduction method had a heavy computation burden, although it performed much better than PCA. Third, the training ratio, network depth, size of the hidden layers, and other main parameters can impact the performance of the data dimension reduction, especially for a multi-layer deep S-SAE. Finally, the proposed method was only assessed with Sentinel-1 data; its performance with multiple SAR sensors and with combined SAR and optical data will be investigated in future work.

5. Conclusions

To deal with the problem of the dimension disaster, this paper proposed an S-SAE to reduce the data dimension of scattering features extracted from multi-temporal PolSAR images. To validate the performance of the proposed S-SAE + CNN strategy, simulated Sentinel-1 data, along with the established ground truth maps for two experimental sites, were used. Combined with the traditional SVM and the constructed CNN classifiers, the classification results were compared and evaluated. The experimental results showed that the S-SAE could extract effective low-dimension features from the original ones. For the CNN classifier, this method could significantly improve the classification accuracy of crops with small sample sizes compared with the traditional methods. With the increasing availability of PolSAR data, the proposed method provides an efficient way for crop classification with multi-temporal PolSAR data.
To construct an optimal S-SAE, this paper also investigated the effects of the training ratio, network depth, size of the hidden layers, network configuration, and other main parameters. The experimental results showed that these factors can greatly impact the performance of the feature dimension reduction; for a specific purpose, the configuration and main parameters of an S-SAE should be optimized. It should be noted that a single-layer SAE could achieve a dimension reduction performance close to that of the optimized S-SAE when the training parameters were set properly. Due to the complexity of optimizing the parameters of a multi-layer S-SAE, its potential was not fully exploited here. Methods to effectively determine the configuration and parameters for constructing an optimal S-SAE will be discussed in a separate paper.

Author Contributions

J.G. and H.L. designed and conducted the experiments and performed the programming work. J.N. and Z.Z. contributed extensively to the manuscript writing and revision. W.Z. and Z.Z. provided suggestions for the experimental design and result analysis. J.G. and Z.Z. contributed to the editing of the manuscript. W.H. supervised the study. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under contracts 51979233 and 41301450, the 13th Five-Year Plan for Chinese National Key R&D Project (2017YFC0403203), Major Project of Industry-Education-Research Cooperative Innovation in Yangling Demonstration Zone in China (2018CXY-23), the Fundamental Research Funds for the Central Universities (2452019180), and the 111 Project (no. B12007).

Acknowledgments

The authors would like to thank the European Space Agency (ESA) for providing the simulated multi-temporal PolSAR data sets and ground truth information under the Proposal of C1F.21329: Preparation of Exploiting the S1A data for land, agriculture and forest applications.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Boryan, C.; Yang, Z.W.; Mueller, R.; Craig, M. Monitoring US agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer program. Geocarto Int. 2011, 26, 341–358.
  2. Becker-Reshef, I.; Vermote, E.; Lindeman, M.; Justice, C. A generalized regression-based model for forecasting winter wheat yields in Kansas and Ukraine using MODIS data. Remote Sens. Environ. 2010, 114, 1312–1323.
  3. Dong, T.F.; Liu, J.; Qian, B.; Jing, Q.; Croft, H.; Chen, J.; Wang, J.; Huffman, T.; Shang, J.; Chen, P. Deriving maximum light use efficiency from crop growth model and satellite data to improve crop biomass estimation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 104–117.
  4. Sabry, R. Terrain and surface modeling using polarimetric SAR data features. IEEE Trans. Geosci. Remote Sens. 2015, 54, 1170–1184.
  5. Wu, W.; Guo, H.; Li, X. Urban area SAR image man-made target extraction based on the product model and the time–frequency analysis. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 8, 943–952.
  6. Ren, Y.; Li, X.M.; Gao, G.P.; Busche, T.E. Derivation of sea surface tidal current from spaceborne SAR constellation data. IEEE Trans. Geosci. Remote Sens. 2017, 55, 1–12.
  7. Jafari, M.; Maghsoudi, Y.; Zoej, M.J.V. A new method for land cover characterization and classification of polarimetric SAR data using polarimetric signatures. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 3595–3607.
  8. Lee, J.S.; Grunes, M.R.; Kwok, R. Classification of multi-look polarimetric SAR imagery based on complex Wishart distribution. Int. J. Remote Sens. 1992, 15, 2299–2311.
  9. Lardeux, C.; Frison, P.L.; Tison, C.; Souyris, J.C.; Stoll, B.; Fruneau, B.; Rudant, J.P. Support vector machine for multifrequency SAR polarimetric data classification. IEEE Trans. Geosci. Remote Sens. 2009, 47, 4143–4152.
  10. Maghsoudi, Y.; Collins, M.; Leckie, D.G. Polarimetric classification of boreal forest using nonparametric feature selection and multiple classifiers. Int. J. Appl. Earth Obs. Geoinform. 2012, 19, 139–150.
  11. Hellmann, M. A new approach for interpretation of SAR-data using polarimetric techniques. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Seattle, WA, USA, 6–10 July 1998; pp. 2195–2197.
  12. Cloude, S.R.; Pottier, E. A review of target decomposition theorems in radar polarimetry. IEEE Trans. Geosci. Remote Sens. 1996, 34, 498–518.
  13. Lee, J.S.; Grunes, M.R.; Ainsworth, T.L.; Pottier, E.; Krogager, E.; Boerner, W.M. Quantitative comparison of classification capability: Fully-polarimetric versus partially polarimetric SAR. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Honolulu, HI, USA, 24–28 July 2000; pp. 1101–1103.
  14. Mohan, S.; Das, A.; Haldar, D.; Maity, S. Monitoring and retrieval of vegetation parameter using multi-frequency polarimetric SAR data. In Proceedings of the International Asia-Pacific Conference on Synthetic Aperture Radar (APSAR), Seoul, Korea, 26–30 September 2011; pp. 1–4.
  15. Chen, S.W.; Tao, C.S. PolSAR image classification using polarimetric-feature-driven deep convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2018, 15, 627–631.
  16. Zhou, Y.; Wang, H.; Xu, F.; Jin, Y.Q. Polarimetric SAR image classification using deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2017, 13, 1935–1939.
  17. Pierce, L.E.; Ulaby, F.T.; Sarabandi, K.; Dobson, M.C. Knowledge-based classification of polarimetric SAR images. IEEE Trans. Geosci. Remote Sens. 1994, 32, 1081–1086.
  18. Ferrazzoli, P.; Guerriero, L.; Schiavon, G. Experimental and model investigation on radar classification capability. IEEE Trans. Geosci. Remote Sens. 1999, 37, 960–968.
  19. Skriver, H.; Mattia, F.; Satalino, G.; Balenzano, A.; Pauwels, V.R.N.; Verhoest, N.E.C.; Davidson, M. Crop classification using short-revisit multitemporal SAR data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2011, 4, 423–431.
  20. Kussul, N.; Lemoine, G.; Gallego, F.J.; Skakun, S.V.; Lavreniuk, M.; Shelestov, A.Y. Parcel-based crop classification in Ukraine using Landsat-8 data and Sentinel-1A data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 2500–2508.
  21. Veloso, A.; Mermoz, S.; Bouvet, A.; Le Toan, T.; Planells, M.; Dejoux, J.F.; Ceschia, E. Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications. Remote Sens. Environ. 2017, 199, 415–426.
  22. Hoang, H.K.; Bernier, M.; Duchesne, S.; Tran, Y.M. Rice mapping using RADARSAT-2 dual- and quad-pol data in a complex land-use watershed: Cau River Basin (Vietnam). IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3082–3096.
  23. White, L.; Millard, K.; Banks, S.; Richardson, M.; Pasher, J.; Duffe, J. Moving to the RADARSAT Constellation Mission: Comparing synthesized compact polarimetry and dual polarimetry data with fully polarimetric RADARSAT-2 data for image classification of peatlands. Remote Sens. 2017, 9, 573.
  24. McNairn, H.; Shang, J.; Jiao, X.; Champagne, C. The contribution of ALOS PALSAR multipolarization and polarimetric data to crop classification. IEEE Trans. Geosci. Remote Sens. 2009, 47, 3981–3992.
  25. Lucas, R.; Rebelo, L.M.; Fatoyinbo, L.; Rosenqvist, A.; Itoh, T.; Shimada, M.; Simard, M.; Souza-Filho, P.W.; Thomas, N.; Trettin, C. Contribution of L-band SAR to systematic global mangrove monitoring. Mar. Freshw. Res. 2014, 65, 589–603.
  26. Li, X.M.; Lehner, S. Algorithm for sea surface wind retrieval from TerraSAR-X and TanDEM-X data. IEEE Trans. Geosci. Remote Sens. 2013, 52, 2928–2939.
  27. Mattia, F.; Satalino, G.; Balenzano, A.; D'Urso, G.; Capodici, F.; Iacobellis, V.; Milella, P.; Gioia, A.; Rinaldi, M.; Ruggieri, S. Time series of COSMO-SkyMed data for landcover classification and surface parameter retrieval over agricultural sites. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Munich, Germany, 22–27 July 2012; pp. 6511–6514.
  28. Wei, S.; Zhang, H.; Wang, C.; Wang, Y.; Xu, L. Multi-temporal SAR data large-scale crop mapping based on U-Net model. Remote Sens. 2019, 11, 68.
  29. Teimouri, N.; Dyrmann, M.; Jørgensen, R.N. A novel spatio-temporal FCN-LSTM network for recognizing various crop types using multi-temporal radar images. Remote Sens. 2019, 11, 990.
  30. Zhou, Y.N.; Luo, J.; Feng, L.; Zhou, X. DCN-based spatial features for improving parcel-based crop classification using high-resolution optical images and multi-temporal SAR data. Remote Sens. 2019, 11, 1619.
  31. Zhong, L.; Hu, L.; Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ. 2019, 221, 430–443.
  32. Yang, H.; Pan, B.; Wu, W.; Tai, J. Field-based rice classification in Wuhua county through integration of multi-temporal Sentinel-1A and Landsat-8 OLI data. Int. J. Appl. Earth Obs. Geoinf. 2018, 69, 226–236.
  33. Guo, J.; Wei, P.L.; Liu, J.; Jin, B.; Su, B.F.; Zhou, Z.S. Crop classification based on differential characteristics of H/Alpha scattering parameters for multitemporal quad- and dual-polarization SAR images. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6111–6123.
  34. Ji, S.; Zhang, C.; Xu, A.; Shi, Y.; Duan, Y. 3D convolutional neural networks for crop classification with multi-temporal remote sensing images. Remote Sens. 2018, 10, 75.
  35. Sonobe, R. Parcel-based crop classification using multi-temporal TerraSAR-X dual polarimetric data. Remote Sens. 2019, 11, 1148.
  36. Xu, L.; Zhang, H.; Wang, C.; Zhang, B.; Liu, M. Crop classification based on temporal information using Sentinel-1 SAR time-series data. Remote Sens. 2019, 11, 53.
  37. Freeman, A.; Durden, S.L. A three-component scattering model for polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 1998, 36, 963–973.
  38. Huynen, J.R. Phenomenological Theory of Radar Targets; Technical University: Delft, The Netherlands, 1978; pp. 653–712.
  39. Yamaguchi, Y.; Moriyama, T.; Ishido, M.; Yamada, H. Four-component scattering model for polarimetric SAR image decomposition. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1699–1706.
  40. Cloude, S.R.; Pottier, E. An entropy based classification scheme for land applications of polarimetric SAR. IEEE Trans. Geosci. Remote Sens. 1997, 35, 68–78.
  41. Jolliffe, I.T. Principal Component Analysis; Springer: Berlin/Heidelberg, Germany, 2002.
  42. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
  43. Lee, J.-M.; Yoo, C.; Choi, S.W.; Vanrolleghem, P.A.; Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2004, 59, 223–234.
  44. Chang, H.; Yeung, D.Y. Robust locally linear embedding. Pattern Recognit. 2006, 39, 1053–1065.
  45. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655.
  46. Shrivastava, A.; Gupta, A.; Girshick, R. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 761–769.
  47. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [Green Version]
  48. Zhu, X.X.; Tuia, D.; Mou, L.; **a, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [Google Scholar] [CrossRef] [Green Version]
  49. Chen, Y.; Lin, Z.; **ng, Z.; Gang, W.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 7, 2094–2107. [Google Scholar] [CrossRef]
  50. Paul, S.; Kumar, D.N. Spectral-spatial classification of hyperspectral data with mutual information based segmented stacked autoencoder approach. ISPRS J. Photogramm. Remote Sens. 2018, 138, 265–280. [Google Scholar] [CrossRef]
  51. Lee, J.S.; Grunes, M.R.; Ainsworth, T.L.; Du, L.J.; Schuler, D.L.; Cloude, S.R. Unsupervised classification using polarimetric decomposition and the complex Wishart classifier. IEEE Trans. Geosci. Remote Sens. 2002, 37, 2249–2258. [Google Scholar]
  52. Bai, Y.; Peng, D.; Yang, X.; Chen, L.; Yang, W. Supervised feature selection for polarimetric SAR classification. In Proceedings of the 2014 12th International Conference on Signal Processing (ICSP), Hangzhou, China, 19–23 October 2014; pp. 1006–1010. [Google Scholar]
  53. Dong, G.; Liao, G.; Liu, H.; Kuang, G. A Review of the Autoencoder and Its Variants: A Comparative Perspective from Target Recognition in Synthetic-Aperture Radar Images. IEEE Geosci. Remote Sens. Mag. 2018, 6, 44–68. [Google Scholar] [CrossRef]
  54. Ng, A. Sparse Autoencoder; CS294A Lecture notes; Stanford: San Francisco, CA, USA, 2011; Volume 72, pp. 1–19. [Google Scholar]
  55. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition(CVPR), Boston, MA, USA, 8–12 June 2015; pp. 1–9. [Google Scholar]
  56. Caves, R. Final Report: Technical Assistance for the Implementation of the AgriSAR 2009 Campaign; Tech. Rep. 22689/09; ESA: Paris, France, 2009. [Google Scholar]
  57. Caves, R.; Davidson, G.; Padda, J.; Ma, A. Data Analysis-Crop Classification; Tech. Rep. 22689/09/NL/FF/ef; ESA: Paris, France, 2011. [Google Scholar]
  58. Caves, R.; Davidson, G.; Padda, J.; Ma, A. Data Analysis-Multi-Temporal Filtering; Tech. Rep. 22689/09/NL/FF/ef; ESA: Paris, France, 2011. [Google Scholar]
  59. Tao, C.; Chen, S.; Li, Y.; **ao, S. PolSAR land cover classification based on roll-invariant and selected hidden polarimetric features in the rotation domain. Remote Sens. 2017, 9, 660. [Google Scholar]
Figure 1. Flow chart of the proposed method. CNN: convolutional neural network, PolSAR: polarimetric synthetic aperture radar, S-SAE: stacked sparse auto-encoder.
Figure 2. Single-layer auto-encoder (AE) network.
Figure 3. An S-SAE network structure.
Figure 4. S-SAE network fine-tuning process.
Figure 5. CNN classifier architecture.
Figure 6. Location maps of the experimental sites from Google Earth: (a) Indian Head and (b) Flevoland.
Figure 7. Ground truth maps of the experimental sites: (a) Indian Head and (b) Flevoland.
Figure 8. Some typical features: (a) amplitude of the HH-VV correlation, (b) phase difference of HH-VV, (c) co-polarized ratio, (d) null angle parameter θ_Re[T12], and (e) null angle parameter θ_Im[T12]. H: horizontal polarization, V: vertical polarization.
Figure 9. Classification accuracies and training time with different training ratios: (a) classification accuracy and (b) training time.
Figure 10. Classification accuracy with 1% pretraining samples.
Figure 11. Classification accuracies using S-SAEs with different hidden units: (a) OA map and (b) kappa map.
Figure 12. Classification accuracies using SAEs with different λ.
Figure 13. Classification accuracies using S-SAEs with different β and ρ: (a) log10 λ = -1 and (b) log10 λ = -2.
Figure 14. Classification results and ground truth map for the Indian Head site: (a) ground truth map, (A) crop area, (b) complex Wishart classifier, (c) Chen + CNN, (d) long short-term memory (LSTM), (e) locally linear embedded (LLE) + CNN, and (f) S-SAE + CNN. (B–F) are the error maps of (b–f).
Figure 15. Visualization of the dimensionality-reduction features: the first row shows the features of PCA, the second row those of LLE, and the third row those of the S-SAE.
Figure 16. Values of standard deviation: (a) within the same class and (b) between different classes.
Figure 17. Classification results and ground truth map for the Flevoland site: (a) ground truth map, (A) crop area, (b) complex Wishart classifier, (c) Chen + CNN, (d) LSTM, (e) LLE + CNN, and (f) S-SAE + CNN. (B–F) are the error maps of (b–f).
Figure 18. OA and kappa of the classification results by PCA and S-SAE with different numbers of reduced features.
Table 1. Data acquisition dates and basic information of the main crops in the experimental sites.

Indian Head (Canada); acquisition dates: 21 April, 15 May, 8 June, 2 July, 26 July, 19 August, 12 September

Code | Crop Type | Number of Pixels | Proportion
H1 | Lentil | 215,659 | 10.62%
H2 | Durum Wheat | 98,927 | 4.87%
H3 | Spring Wheat | 571,205 | 28.13%
H4 | Field Pea | 252,277 | 12.43%
H5 | Oat | 70,541 | 3.47%
H6 | Canola | 452,068 | 22.27%
H7 | Grass | 23,452 | 1.16%
H8 | Mixed Pasture | 14,608 | 0.72%
H9 | Mixed Hay | 27,135 | 1.34%
H10 | Barley | 106,022 | 5.22%
H11 | Summer fallow | 22,067 | 1.09%
H12 | Flax | 127,757 | 6.29%
H13 | Canary seed | 45,915 | 2.26%
H14 | Chemical fallow | 2,682 | 0.13%

Flevoland (Netherlands); acquisition dates: 21 April, 15 May, 8 June, 2 July, 26 July, 19 August, 12 September

Code | Crop Type | Number of Pixels | Proportion
F1 | Carrots | 440 | 0.1%
F2 | Flower bulbs | 11,499 | 2.58%
F3 | Fruit | 10,198 | 2.29%
F4 | Grass | 33,787 | 7.58%
F5 | Lucerne | 2,255 | 0.51%
F6 | Maize | 18,253 | 4.09%
F7 | Misc | 31,573 | 7.08%
F8 | Onions | 41,001 | 9.19%
F9 | Peas | 7,105 | 1.59%
F10 | Potato | 100,040 | 22.43%
F11 | Spring barley | 6,340 | 1.42%
F12 | Spring wheat | 17,991 | 4.03%
F13 | Sugarbeet | 58,403 | 13.09%
F14 | Winter wheat | 107,142 | 24.02%
Table 2. The 36-dimensional features from a single-temporal PolSAR image.

Features based on measured data (10 in total):
- Polarization intensities: |S_HH|, |S_HV|, |S_VV| (3)
- Amplitude of the HH-VV correlation: |⟨S_HH S_VV*⟩| / √(⟨|S_HH|²⟩⟨|S_VV|²⟩) (1)
- Phase difference of HH-VV: atan[Im(⟨S_HH S_VV*⟩) / Re(⟨S_HH S_VV*⟩)] (1)
- Cross-polarized ratio: 10 log10(⟨|S_HV|²⟩/⟨|S_VV|²⟩) (1)
- Co-polarized ratio: 10 log10(⟨|S_VV|²⟩/⟨|S_HH|²⟩) (1)
- Cross-polarized ratio: 10 log10(⟨|S_HV|²⟩/⟨|S_HH|²⟩) (1)
- Degrees of polarization: ⟨|S_HV|²⟩/(⟨|S_HH|²⟩+⟨|S_VV|²⟩) and ⟨|S_VV|²⟩/(⟨|S_HH|²⟩+⟨|S_HV|²⟩) (2)

Incoherent decomposition (24 in total):
- Freeman decomposition (5)
- Yamaguchi decomposition (7)
- Cloude decomposition (3)
- Huynen decomposition (9)

Other decomposition (2 in total):
- Null angle parameters (2)

Sum: 36. * Re[·] and Im[·] are the real and imaginary parts, respectively; ⟨·⟩ denotes spatial (multi-look) averaging.
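To make the measured-data block of Table 2 concrete, the following NumPy sketch computes those ten features for one acquisition date. It is a minimal illustration only: the channel arrays s_hh, s_hv, s_vv, the boxcar window size, and the function names are placeholder assumptions, not the processing chain actually used in this study.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def boxcar(img, size=5):
    """Local spatial average; complex inputs are averaged per component."""
    if np.iscomplexobj(img):
        return uniform_filter(img.real, size) + 1j * uniform_filter(img.imag, size)
    return uniform_filter(img, size)

def measured_features(s_hh, s_hv, s_vv, size=5, eps=1e-10):
    """The ten measured-data features of Table 2 for one acquisition date.

    s_hh, s_hv, s_vv: 2D complex scattering-matrix channels (placeholders).
    Returns an array of shape (10, rows, cols).
    """
    p_hh = boxcar(np.abs(s_hh) ** 2, size)
    p_hv = boxcar(np.abs(s_hv) ** 2, size)
    p_vv = boxcar(np.abs(s_vv) ** 2, size)
    c_hhvv = boxcar(s_hh * np.conj(s_vv), size)        # <S_HH S_VV*>
    return np.stack([
        np.abs(s_hh), np.abs(s_hv), np.abs(s_vv),      # intensities (3)
        np.abs(c_hhvv) / np.sqrt(p_hh * p_vv + eps),   # HH-VV correlation amplitude
        np.angle(c_hhvv),                              # HH-VV phase difference
        10 * np.log10((p_hv + eps) / (p_vv + eps)),    # HV/VV ratio, dB
        10 * np.log10((p_vv + eps) / (p_hh + eps)),    # VV/HH ratio, dB
        10 * np.log10((p_hv + eps) / (p_hh + eps)),    # HV/HH ratio, dB
        p_hv / (p_hh + p_vv + eps),                    # degree of polarization (1)
        p_vv / (p_hh + p_hv + eps),                    # degree of polarization (2)
    ])
```

Stacking the 36 features (10 measured-data plus 26 decomposition parameters) over the seven acquisition dates yields the 252-dimensional input vectors used below.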
Table 3. Classification accuracy using S-SAEs with different depths.

Algorithm | Architecture | Method | OA/Kappa (%), 0.3% PRA ratio | OA/Kappa (%), 1% PRA ratio | OA/Kappa (%), 5% PRA ratio
AE | 252-9 | PRA | 63.61/52.30 | 67.01/57.26 | 70.35/62.01
AE | 252-9 | PRA+FT | 73.10/65.75 | 76.74/70.51 | 78.91/73.55
S-AE2 | 252-100-9 | PRA | 73.09/65.83 | 74.34/67.57 | 76.23/70.12
S-AE2 | 252-100-9 | PRA+FT | 76.42/70.31 | 77.23/71.35 | 77.74/72.05
S-AE3 | 252-100-50-9 | PRA | 73.08/65.79 | 74.77/68.01 | 76.56/70.48
S-AE3 | 252-100-50-9 | PRA+FT | 76.45/70.27 | 79.75/74.67 | 79.31/74.07
S-AE4 | 252-100-50-110-9 | PRA | 72.91/65.56 | 74.30/67.38 | 77.00/71.03
S-AE4 | 252-100-50-110-9 | PRA+FT | 74.16/67.23 | 77.82/72.13 | 77.51/71.73
S-AE5 | 252-100-50-110-30-9 | PRA | 72.37/64.83 | 74.12/67.14 | 76.47/70.73
S-AE5 | 252-100-50-110-30-9 | PRA+FT | 75.91/69.58 | 78.46/73.00 | 78.14/72.56

PRA stands for pretraining, PRA+FT stands for pretraining plus fine-tuning, and OA stands for overall accuracy.
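The greedy layer-wise scheme behind Table 3 can be sketched in PyTorch: each sparse AE is pretrained (PRA) on the codes produced by the previous layer, and the stacked encoder is then fine-tuned (FT) end-to-end with a classification layer. The layer sizes follow the S-AE3 row (252-100-50-9); the optimizer, learning rates, and epoch counts are illustrative assumptions rather than the settings used in the experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kl_sparsity(rho, rho_hat):
    """KL divergence between target sparsity rho and mean hidden activations."""
    rho_hat = rho_hat.clamp(1e-6, 1 - 1e-6)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

def pretrain_layer(x, n_hidden, lam=0.001, beta=0.4, rho=0.25, epochs=200):
    """Unsupervised pretraining (PRA) of one sparse AE on inputs x."""
    enc = nn.Sequential(nn.Linear(x.shape[1], n_hidden), nn.Sigmoid())
    dec = nn.Linear(n_hidden, x.shape[1])
    opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
    rho = torch.tensor(rho)
    for _ in range(epochs):
        h = enc(x)
        loss = (F.mse_loss(dec(h), x)                                  # reconstruction
                + 0.5 * lam * ((enc[0].weight ** 2).sum()
                               + (dec.weight ** 2).sum())              # weight decay
                + beta * kl_sparsity(rho, h.mean(dim=0)))              # sparsity penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return enc, enc(x).detach()   # detach so later layers train on fixed codes

def train_s_sae(x, y, hidden=(100, 50, 9), n_classes=14, ft_epochs=200):
    """Greedy layer-wise PRA, then supervised fine-tuning (FT) of the stack.

    x: (n_samples, 252) float tensor of stacked multi-temporal features.
    y: (n_samples,) long tensor of crop labels.
    """
    encoders, codes = [], x
    for n_hidden in hidden:                       # 252-100-50-9 (the S-AE3 row)
        enc, codes = pretrain_layer(codes, n_hidden)
        encoders.append(enc)
    model = nn.Sequential(*encoders, nn.Linear(hidden[-1], n_classes))
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(ft_epochs):                    # fine-tune the whole stack
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model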
Table 4. Optimal parameter settings of each layer in an S-SAE for the Indian Head site.

Parameter | L1 | L2 | L3
λ | 0.0024 | 0.02 | 0.001
β | 0.6 | 0.2 | 0.4
ρ | 0.5 | 0.55 | 0.25
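The three quantities tuned in Tables 4 and 8 are the hyper-parameters of the standard sparse auto-encoder cost of Ng [54]: λ weights the weight-decay term, β weights the sparsity penalty, and ρ is the target average activation of each hidden unit. In that formulation,

```latex
J_{\text{sparse}}(W,b) =
  \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\bigl\lVert \hat{x}^{(i)} - x^{(i)} \bigr\rVert^{2}
  + \frac{\lambda}{2}\sum_{l}\bigl\lVert W^{(l)} \bigr\rVert_{F}^{2}
  + \beta \sum_{j=1}^{s}\mathrm{KL}\bigl(\rho \,\Vert\, \hat{\rho}_{j}\bigr),
\qquad
\mathrm{KL}(\rho \Vert \hat{\rho}_{j})
  = \rho \log\frac{\rho}{\hat{\rho}_{j}}
  + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_{j}},
```

where m is the number of training samples, s the number of hidden units, and ρ̂_j the mean activation of hidden unit j over the training set.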
Table 5. Comparison of the support vector machine (SVM) classification accuracy with different dimension reduction methods for the Indian Head site. PCA: principal component analysis.

Method | OA (%) | Kappa (%) | H1–H14 (%)
PCA | 65.00 | 54.88 | 881964819044233181810
LLE | 65.19 | 55.23 | 9019648189483361192110
AE(1) | 67.21 | 57.55 | 831978339046373252310
AE(2) | 76.74 | 70.51 | 8829892229435737293556292
S-AE(3) | 79.75 | 74.67 | 9059893269562257404365340
S-AE(4) | 80.47 | 75.62 | 9259894249468359434965440
S-SAE | 81.77 | 77.20 | 92798943196727484158695325
Table 6. Comparison of the CNN classification accuracy with different dimension reduction methods for the Indian Head site.

Method | OA (%) | Kappa (%) | H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 (%)
PCA | 87.59 | 84.87 | 96 55 96 92 56 98 78 50 79 61 76 80 49 23
LLE | 88.30 | 85.80 | 97 51 94 94 62 99 76 62 79 63 86 84 60 61
AE(1) | 87.86 | 85.29 | 91 44 92 97 63 99 88 59 71 70 89 79 75 31
AE(2) | 93.03 | 91.58 | 96 70 96 99 79 99 88 55 83 80 89 87 86 92
S-AE(3) | 94.74 | 93.65 | 98 69 97 99 81 100 85 58 88 84 95 94 93 61
S-AE(4) | 94.95 | 93.92 | 98 71 97 99 81 100 83 59 82 92 91 94 98 79
S-SAE | 95.44 | 94.51 | 99 78 97 99 80 100 89 57 86 88 97 94 94 75
Table 7. Classification accuracy with different methods and training ratios for the Indian Head site.

Method | 1% (OA/Kappa, %) | 5% (OA/Kappa, %) | 10% (OA/Kappa, %)
Complex Wishart | 60.86/55.89 | 61.05/56.08 | 61.26/56.28
LLE + SVM | 65.19/55.23 | 69.93/61.60 | 72.75/65.35
S-SAE + SVM | 81.77/77.20 | 82.35/78.01 | 84.06/80.17
Chen + SVM | 66.99/57.45 | 71.61/63.77 | 72.13/64.47
Chen + CNN | 91.74/90.04 | 96.35/95.55 | 97.93/97.51
LSTM | 69.67/61.31 | 80.74/76.07 | 82.83/78.76
LLE + CNN | 88.30/85.80 | 96.66/96.00 | 99.23/98.95
S-SAE + CNN | 95.44/94.51 | 99.08/98.89 | 99.61/99.53
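For reference, the overall accuracy (OA) and kappa coefficient reported in Tables 5–10 follow the usual confusion-matrix definitions. A minimal sketch, with y_true and y_pred as placeholder integer label arrays:

```python
import numpy as np

def overall_accuracy_and_kappa(y_true, y_pred, n_classes):
    """OA and Cohen's kappa from flattened integer label arrays."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)                       # confusion matrix
    n = cm.sum()
    po = np.trace(cm) / n                                    # observed agreement = OA
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2    # chance agreement
    return po, (po - pe) / (1 - pe)
```

Because kappa discounts chance agreement, it is systematically below OA for the imbalanced class distributions of Table 1, which matches the OA/kappa gaps seen above.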
Table 8. Optimal parameter settings of each layer in the S-SAE for the Flevoland site.

Parameter | L1 | L2 | L3
λ | 0.001 | 0.02 | 0.001
β | 0.6 | 0.2 | 0.4
ρ | 0.25 | 0.55 | 0.25
Table 9. Comparison of the CNN classification accuracy with different dimension reduction methods for the Flevoland site.

Method | OA (%) | Kappa (%) | F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 (%)
PCA | 87.30 | 84.94 | 0 94 91 89 64 83 70 87 59 95 57 32 90 96
LLE | 87.90 | 85.69 | 5 95 79 92 62 86 67 89 80 94 56 32 94 97
AE(1) | 86.62 | 84.16 | 12 93 92 82 45 84 72 84 79 96 46 18 95 94
AE(2) | 88.09 | 85.89 | 13 94 88 90 50 87 70 88 68 95 49 34 93 97
S-AE(3) | 88.98 | 86.95 | 0 98 86 86 82 74 74 86 81 96 62 39 97 98
S-AE(4) | 89.93 | 87.98 | 3 97 91 87 36 87 80 88 63 97 74 40 94 98
SAE | 91.09 | 89.47 | 34 96 92 91 55 91 79 90 80 98 64 44 95 97
S-SAE | 91.63 | 90.11 | 35 97 92 93 67 90 77 87 85 98 86 41 97 98
Table 10. Classification accuracy with the different methods and training ratios for the Flevoland site.

Method | 1% (OA/Kappa, %) | 5% (OA/Kappa, %) | 10% (OA/Kappa, %)
Complex Wishart | 77.50/73.92 | 76.74/73.08 | 76.11/72.34
LLE + SVM | 80.22/76.07 | 82.63/77.59 | 84.15/79.24
S-SAE + SVM | 82.66/79.22 | 84.81/81.83 | 86.06/83.36
Chen + SVM | 75.42/72.33 | 78.37/75.01 | 80.24/78.55
Chen + CNN | 81.19/77.62 | 93.57/92.40 | 96.41/95.76
LSTM | 73.93/68.51 | 76.52/71.72 | 79.77/75.71
LLE + CNN | 87.90/85.69 | 94.99/94.09 | 97.28/96.30
S-SAE + CNN | 91.63/90.11 | 95.90/95.17 | 97.57/97.14
