1. Introduction
Gypsum karsts have been observed in many countries. These landforms cover approximately 5% of Turkey [1]. The largest share of Turkey's gypsum is located in the Sivas basin. This region is one of the rare areas in the world where karstic landforms with unique features are most prominent.
Karstification requires rocks that can dissolve in water. The more soluble the rock, the faster the karst topography develops. The term karst topography is mostly associated with limestones. However, relatively large karstic depressions are not observed in those types of areas. Gypsum is a soft rock with a hardness of around 2.0, and it dissolves much more easily than limestone, so its dissolution rate is higher. Karstic landforms are therefore formed more quickly and easily in gypsum fields. Dolines are the most distinctive and common landforms of karstic lands. They are mostly circular, although other shapes also occur. Their long axes or diameters can vary from a few meters to a kilometer. In general terms, dolines are formed by the dissolution of gypsum at the bottom and the gradual settling that occurs on the surface. Determining the characteristics of karst landforms is of great importance both for understanding their formation processes and for developing new land-use models.
In many countries, the collapse of gypsum damages residential areas and various engineering structures. Such collapses pose natural hazards to human safety, agricultural activities, and the economy [2,3,4,5,6,7]. In addition, it has been observed that most surface water and groundwater in gypsiferous areas are enriched in sulfate through the dissolution of gypsum. Therefore, the identification of gypsiferous areas and their karstification characteristics should be taken into account in national- and local-scale planning for natural hazards and water resources.
The rapid detection of karstic landforms and of their changes over time is very important for predicting damage in karstic lands. Mapping the dolines in an area is the basis for predicting other dolines that may occur in the future. Creating a doline inventory map is quite difficult, especially in polykarst areas [8], because there are often thousands of very similar dolines in the field.
Image segmentation is commonly used to find objects and boundaries in images [9,10,11,12,13]. Both semantic and instance segmentation methods are used in image segmentation projects. In this study, we used U-Net, a semantic segmentation method, to predict doline areas. With this method, the image is classified automatically by assigning every pixel to one of the defined classes. The method is based on a convolutional neural network (CNN) architecture. The U-Net convolutional network [14] is well suited to semantic segmentation studies. U-Net consists of two parts: an encoder and a decoder. Unlike many other methods, it does not need a large amount of labeled data and can learn from a small number of effective samples. Additionally, U-Net uses skip connections to transfer information from earlier convolution layers to the deconvolution layers, which increases segmentation performance. Because it requires very few training images and yields more precise results in terms of pixel locations, a U-Net model was preferred for detecting dolines in this paper. For the same reasons, it is often preferred in earth science and other studies [15,16,17,18,19,20,21].
The novelty of this paper is that dolines were detected automatically by U-Net semantic segmentation with transfer learning. There are segmentation studies in which features such as buildings, roofs, and roads were extracted from orthophoto images. However, to the best of our knowledge, there have been no studies on doline segmentation. The study area has thousands of dolines, and it is almost impossible to identify all of them with classical field studies. Our model successfully predicted dolines in a randomly selected area.
Literature Review
Many studies in the literature have applied various methods, especially machine learning and deep learning methods, to doline distribution and to similar problems, including the following: object detection [22,23,24,25,26], object classification [27], image segmentation [28,29,30], change detection [31], building object mapping [32], vegetation mapping [33], and structural controls [34].
In the study conducted by Mochales et al. [22], dolines were located in the Ebro basin in northern Spain. The dolines identified were filled with alluvial sediments, agricultural soils, and urban debris. For this purpose, magnetic susceptibility measurements were used, which revealed a remarkable contrast between the host rocks and cavity fillings. Pueyo-Anchuela et al. [23] proposed a geophysical routine for alluvial karst regions that integrates several methodologies into a single model. For the detection of doline areas, the geophysical characteristics involved in each approach, as well as their solutions and uncertainties, were investigated. The suggested sequence of procedures involves a gradual increase in survey time, a reduction in ambiguity, and improved resolution. In the study by Nahhas et al. [24], a deep learning (DL)-based building detection approach using a combination of Light Detection and Ranging (LiDAR) data and orthophotos was investigated. The proposed model was applied to two datasets taken from an urban region with different building types, and it was tested with 21 features and with 10 features. The experimental results showed that the accuracy of the tests with the reduced feature set was higher both in the study region and in the test region. Hussain et al. [25] investigated the Vodose and Fluvial caves in Tarimba (Goias, Brazil) with several geophysical approaches. They determined that their findings were compatible with real field conditions. They found circular and concave landforms formed by karstic processes, which are known as sinkholes or dolines. Čarni et al. [26] proposed using indicator species to describe sections of several dolines and their common species in order to identify dolines with significant conservation value for cold-adapted species. The primary goal of their work was to classify the dolines into landform-vegetation units (LVUs) by considering essential geomorphic features and indicative plant species.
Zero-shot learning (ZSL) is a method for identifying objects that were not seen during the training phase, and it is known to be beneficial in real-world situations. In the study performed by Pradhan et al. [27], CNN and ZSL were integrated as a classification and feature extraction technique to map land cover using high-resolution orthophotos. High accuracy values were obtained in the experiments, which demonstrated the effectiveness of ZSL in land cover mapping based on high-resolution photographs.
Abdollahi et al. [28] presented two novel deep convolution models based on the U-Net family for the segmentation of multiple objects, such as buildings and roads, from aerial photos. They focused on buildings and road networks because of their large presence within urban regions. The presented models are called U-Net with multi-level context gate (MCG-UNet) and bidirectional ConvLSTM UNet model (BCL-UNet). Compared to U-Net and BCL-UNet, MCG-UNet increased the average F1 score by 1.85% and 1.19% for road extraction and by 6.67% and 5.11% for building extraction, respectively. The extraction of road parts is of great importance in most Geographic Information System (GIS) applications. To derive the road class from orthophoto images, Abdollahi and Pradhan [29] employed an integrated technique that included segmentation and classification methods with connected component analysis. The proposed approach has three steps. First, the fractured images were segmented using the multi-resolution segmentation approach. Then, using three different classification techniques (support vector machines, decision trees, and k-nearest neighbor), the results were split into two groups: road and non-road. Finally, connected component labeling was used to extract road components, and morphological processing was performed to increase performance by removing off-road parts and noise. The current methods described in the literature for road extraction produce fragmented output because of obstructions such as shadows, structures, and vehicles. To address the fragmented results, Abdollahi et al. [30] presented SC-roadDeepNet, a deep learning-based architecture that preserves structure shape and connection. In comparison to alternative models such as LinkNet, ResUNet, U-Net, and VNet, the proposed approach improves the F1 score by 5.49%, 4.03%, 3.42%, and 2.27%, respectively.
In the study conducted by Ghasemkhani et al. [31], the proposed model describes the conversion of bare lands into settled or developed areas. The model combines a fuzzy system with Ordered Weighted Averaging (OWA) methods. It consists of four parts: physical fitness, accessibility, neighborhood effect, and calculation of general fitness. Experiments showed that the proposed model predicts changes with high accuracy.
Abdollahi and Pradhan [32] studied the mapping of building objects from aerial images. They proposed a novel deep learning structure called MultiRes-UNet. In their network, they used MultiRes-UNet blocks and convolutional operations with skip connections. They combined semantic edge knowledge with semantic polygons. They tested their model on the roof segmentation dataset and achieved a 0.78% increase in mIoU value.
In another study, Abdollahi and Pradhan [33] studied urban vegetation mapping and described how the output of the DNN model used to categorize vegetation can be interpreted using an explanation technique known as Shapley additive explanations (SHAP). They evaluated the accuracy of their method by mapping vegetation from aerial imagery using spectral and textural parameters, and their results showed an overall accuracy of 94.44%.
Solution dolines are characteristic landforms observed on the high plateaus of the Taurus Mountains. The study by Öztürk et al. [34] concentrated on how the distribution and morphometric characteristics of the dolines in the western portion of the Central Taurus Mountains were influenced by tectonic structures, drainage patterns, and slope conditions. The research revealed that the fault and joint systems formed in the thick-bedded limestone between the thrust faults affect the variation in doline density.
3. Results and Discussion
In this study, the Segmentation Models library was used for the segmentation process with transfer learning techniques. Transfer learning makes it possible to take the weights of a model pre-trained on one task and reuse them for another task. In this technique, the layers and weights of a model pre-trained for classification are used in the first part (encoder) of the U-Net architecture. Then, the layers in the second part of the U-Net (decoder) are trained with the augmented dataset. After preparing the data, the U-Net architecture was trained with three different pre-trained models: ResNet34, EfficientNetB3, and DenseNet121. Each model was trained and evaluated using the same parameters. The results of the models are close to each other, but DenseNet121 achieved higher values than the others. Therefore, the DenseNet121 model was used for the prediction of dolines.
A new area (not including the training and test data) was selected from the study area, and the model was run. This area is about 7.289 km² (Figure 7). A Python script was written for the prediction task. This script loads the model, splits the image into patches, makes a prediction, and creates a polygon file containing the dolines. The script ran successfully and completed in approximately 30 s. A total of 808 dolines were predicted in the selected area. The total area of the dolines is 2.689 km², which corresponds to 37% of the total area. The smallest doline area is 1.5 m², and the largest doline area is 115,775.2 m². The predicted doline areas were almost the same as the real shapes of the dolines. The boundaries of the dolines are correctly predicted, as seen in Figure 7b.
Figure 7a shows the predicted dolines, and Figure 7b shows a zoomed view of an area of the image.
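The summary figures above (count, total area, smallest and largest doline, share of the scene) can be derived directly from the predicted polygons. The following is a minimal sketch under stated assumptions, not the script used in the study: it assumes polygons are given as lists of planar (x, y) vertices in metres, and `polygon_area`, `inventory_stats`, and the toy polygons are hypothetical names and data introduced only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(ring: Sequence[Point]) -> float:
    """Planar area of a simple polygon (shoelace formula), in squared map units."""
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def inventory_stats(polygons: List[Sequence[Point]], scene_area: float) -> Dict[str, float]:
    """Count, total, smallest and largest polygon area, and share of the scene covered."""
    areas = [polygon_area(p) for p in polygons]
    total = sum(areas)
    return {
        "count": len(areas),
        "total_area": total,
        "min_area": min(areas),
        "max_area": max(areas),
        "share": total / scene_area,
    }


# Toy polygons in metres (hypothetical, not taken from the study area).
toy_polygons = [
    [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],  # 10 m x 10 m square
    [(20.0, 0.0), (24.0, 0.0), (20.0, 3.0)],               # right triangle, legs 4 m and 3 m
]
stats = inventory_stats(toy_polygons, scene_area=1000.0)
print(stats)

# Share implied by the two areas reported in the text (km^2 / km^2).
reported_share = 2.689 / 7.289
print(f"{reported_share:.1%}")
```

The last two lines only redo the arithmetic on the reported areas, which gives a share consistent with the rounded value of 37% stated in the text.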
In this study, the Adam optimizer with a learning rate of 0.0001 was used as the optimization algorithm. In addition, ReduceLROnPlateau was used, which reduces the learning rate when the monitored metric stops improving. The combination of Binary Focal Loss and Dice loss was used as the loss function. The batch size was set to 8 because larger batch sizes caused memory errors. The number of epochs was set to 50. Too many epochs can cause the model to overfit, and too few epochs can cause the model to underfit. The optimal number of epochs can be determined by assessing training and test results.
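To make the combined loss concrete, the sketch below writes out Binary Focal Loss and Dice loss in plain Python for flattened binary masks. It is an illustration under assumptions rather than the implementation used in the study: the text does not state the focal parameters or how the two terms were weighted, so `alpha=0.25`, `gamma=2.0`, the smoothing constant, and the unweighted sum are assumptions, and all function names are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-7  # keeps log() away from 0 (assumed value)


def dice_loss(y_true: Sequence[float], y_pred: Sequence[float], smooth: float = 1e-5) -> float:
    """1 - Dice coefficient, computed on soft (probability) predictions."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    denominator = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def binary_focal_loss(y_true: Sequence[float], y_pred: Sequence[float],
                      alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Mean focal loss; the (1 - p)^gamma factor down-weights easy pixels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, EPS), 1.0 - EPS)  # clip probabilities
        positive_term = -alpha * t * (1.0 - p) ** gamma * math.log(p)
        negative_term = -(1.0 - alpha) * (1.0 - t) * p ** gamma * math.log(1.0 - p)
        total += positive_term + negative_term
    return total / len(y_true)


def combined_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Unweighted sum of the two terms (the weighting is an assumption)."""
    return binary_focal_loss(y_true, y_pred) + dice_loss(y_true, y_pred)


mask = [1.0, 0.0, 1.0, 0.0]
good_prediction = [0.9, 0.1, 0.8, 0.2]
poor_prediction = [0.1, 0.9, 0.2, 0.8]
print(combined_loss(mask, good_prediction))
print(combined_loss(mask, poor_prediction))
```

The Dice term responds to the overlap of the whole mask, while the focal term works pixel by pixel and emphasizes hard pixels, which is one common motivation for pairing them.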
Image size is a factor that affects the performance of the model. All classes must be visible in the images. If the image size is small, the number of samples increases, but some images may not contain all classes. If the image size is large, the number of samples decreases. Datasets with dimensions of 128 × 128, 256 × 256, and 320 × 320 were created to determine the optimum size. The best dimension for our data was determined to be 320 × 320.
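The trade-off between tile size and number of samples can be illustrated with a short tiling sketch. It assumes non-overlapping tiles and a hypothetical 3200 × 4800 pixel orthophoto; the actual image dimensions and tiling scheme are not given in the text, and `tile_grid` and `tile_windows` are illustrative names.

```python
import math
from typing import Iterator, Tuple


def tile_grid(height: int, width: int, tile: int) -> Tuple[int, int]:
    """Number of tile rows and columns needed to cover the image."""
    return math.ceil(height / tile), math.ceil(width / tile)


def tile_windows(height: int, width: int, tile: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row_offset, col_offset, tile_height, tile_width); edge tiles may be smaller."""
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield row, col, min(tile, height - row), min(tile, width - col)


# Hypothetical orthophoto size in pixels (not taken from the study).
HEIGHT, WIDTH = 3200, 4800
counts = {}
for size in (128, 256, 320):
    rows, cols = tile_grid(HEIGHT, WIDTH, size)
    counts[size] = rows * cols
    print(f"{size} x {size}: {rows} rows x {cols} cols = {counts[size]} tiles")
```

Smaller tiles yield more samples from the same image, but each tile covers less ground, so a tile is more likely to miss one of the classes.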
Image augmentation is a method for overcoming overfitting in deep CNNs and is widely used to enhance performance in a variety of applications [41]. The DenseNet121 model was tested with non-augmented data. The results show an overfitting problem and an unstable model (Figure 8). The mean IoU value of the training set is 98%, while that of the test set is 70%. In addition, the loss value is not as low as expected. Therefore, data augmentation was applied in this study, and the results improved.
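For segmentation, any geometric augmentation has to be applied identically to the image and to its mask, otherwise the labels no longer line up with the pixels. The sketch below shows this with flips and a 90° rotation on small nested lists. The text does not list which augmentations were used, so the choice of transforms, the 0.5 probability, and the function names are assumptions for illustration only.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[int]]


def hflip(grid: Grid) -> Grid:
    """Mirror left to right."""
    return [row[::-1] for row in grid]


def vflip(grid: Grid) -> Grid:
    """Mirror top to bottom."""
    return [row[:] for row in grid[::-1]]


def rot90(grid: Grid) -> Grid:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


TRANSFORMS: Tuple[Callable[[Grid], Grid], ...] = (hflip, vflip, rot90)


def augment_pair(image: Grid, mask: Grid, rng: random.Random) -> Tuple[Grid, Grid]:
    """Apply each transform with probability 0.5, always to image and mask together."""
    for transform in TRANSFORMS:
        if rng.random() < 0.5:
            image, mask = transform(image), transform(mask)
    return image, mask


# Toy 3 x 3 tile: the mask marks the pixels whose value is at least 5.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [[1 if value >= 5 else 0 for value in row] for row in image]
aug_image, aug_mask = augment_pair(image, mask, random.Random(0))
print(aug_image)
print(aug_mask)
```

Because both arrays go through the same sequence of transforms, the relation between pixel values and labels is preserved in every augmented pair.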
To create a robust model, it is necessary to determine appropriate parameters, so some parameters should be tuned. Tuning the parameters of a model requires a significant amount of time and computational resources, so researchers often choose settings based on previous experience. The settings used in similar studies can also be adopted. In addition, methods such as random search and grid search select the parameters automatically [43,44]. In this study, loss functions, optimizers, and learning rates were tuned. Loss functions are one of the factors affecting model performance, and they do not perform equally well in every model. For complex objectives such as segmentation, it is not possible to decide on a universal loss function [45]. In our study, Dice loss, Binary Cross-Entropy Loss, Jaccard loss, Binary Focal Loss, and Focal Loss were tested individually with the DenseNet121 model. In addition, the combinations Jaccard loss + Dice loss and Binary Focal Loss + Dice loss were tested. The model was tested three times for every loss function. The mean IoU values of Dice loss, Binary Cross-Entropy Loss, Jaccard loss, Binary Focal Loss, and Focal Loss were calculated as 0.7728, 0.7755, 0.7750, 0.7258, and 0.7206, respectively. The mean IoU values of Jaccard loss + Dice loss and Binary Focal Loss + Dice loss were calculated as 0.7494 and 0.7762, respectively. The mean IoU value of Binary Focal Loss + Dice loss is higher than the others.
Optimizers are methods used to minimize the loss function and maximize performance. They are mathematical functions that depend on the model's weights and biases. In this study, three of the most widely used optimizers were tested: Adaptive Moment Estimation (Adam), Root Mean Square Propagation (RMS-Prop), and Stochastic Gradient Descent (SGD). The learning rate is a hyperparameter that controls how much the model weights are changed with each update. It is one of the most important parameters when configuring the model. Learning rates of 0.1, 0.01, 0.001, and 0.0001 were tested for each optimizer. The tests were performed on the DenseNet121 model with a batch size of 8. Test IoU values were used to evaluate the results. The results show that the best optimizer is Adam with a learning rate of 0.0001 (Figure 9). The result for RMS-Prop with a learning rate of 0.0001 is very close to that of Adam. The best learning rate for the SGD optimizer was determined to be 0.1.
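The optimizer and learning rate comparison amounts to a small grid search over 3 × 4 = 12 configurations. The sketch below shows only the search loop. The `evaluate` callback stands in for training the model and returning its test IoU; `toy_score` is a made-up placeholder so that the example runs, and its values are not measurements from the study.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

OPTIMIZERS = ("Adam", "RMSprop", "SGD")
LEARNING_RATES = (0.1, 0.01, 0.001, 0.0001)

Config = Tuple[str, float]


def grid_search(evaluate: Callable[[str, float], float],
                optimizers: Sequence[str] = OPTIMIZERS,
                learning_rates: Sequence[float] = LEARNING_RATES) -> Tuple[Config, Dict[Config, float]]:
    """Score every (optimizer, learning rate) pair and return the best one."""
    results = {
        (optimizer, lr): evaluate(optimizer, lr)
        for optimizer, lr in product(optimizers, learning_rates)
    }
    best = max(results, key=results.get)
    return best, results


def toy_score(optimizer: str, lr: float) -> float:
    """Placeholder score, NOT a measured IoU: it simply peaks at lr = 0.01."""
    return 1.0 - abs(lr - 0.01)


best_config, all_results = grid_search(toy_score)
print(len(all_results), "configurations evaluated; best:", best_config)
```

In practice, `evaluate` would train the network with the given optimizer and learning rate and return the mean test IoU, ideally averaged over repeated runs as described in the text.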
We trained our model three times with the same dataset and the same parameters for each transfer learning method. Mean IoU and mean F-score were calculated for the training and test data, and the results are shown in Table 1.
Experiments show that the results of the methods are very close to each other. The best results were obtained with DenseNet121: 0.8482, 0.9180, 0.7762, and 0.8661 for train IoU, train F-score, test IoU, and test F-score, respectively. These results are satisfactory for a segmentation task. IoU scores and loss graphs of the models are shown in Figure 10.
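For reference, the two reported metrics can be computed from the pixel-wise confusion counts of a binary mask. The sketch below is a plain-Python illustration with hypothetical function names and toy data; it assumes masks are flattened lists of 0 and 1 and treats two empty masks as a perfect match, which is a convention chosen here rather than one stated in the text.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """True positives, false positives, and false negatives for the doline class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def iou_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Intersection over union (Jaccard index): TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    denominator = tp + fp + fn
    return 1.0 if denominator == 0 else tp / denominator


def f_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 (Dice) score: 2 TP / (2 TP + FP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 1.0 if denominator == 0 else 2 * tp / denominator


# Toy flattened masks (hypothetical data).
truth = [1, 1, 0, 0, 1, 0]
prediction = [1, 0, 0, 1, 1, 0]
print(iou_score(truth, prediction), f_score(truth, prediction))
```

For a single pair of masks the two metrics are linked by F = 2·IoU / (1 + IoU), so the F-score is never lower than the IoU. Values averaged over many images need not satisfy this identity exactly.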
After the models were created, the performance of each model was examined on single images selected from the test dataset. For this process, images in which dolines can be observed well were selected. The IoU values of the models and the predicted masks were compared. Performance differs from image to image: ResNet34 was the most successful on some images, DenseNet121 on others, and EfficientNetB3 on others. In brief, no single model is successful on all images. However, in visual inspection, the borders of the masks predicted by DenseNet121 fit the dolines better (Figure 11). In future studies, the predictions of the models could be combined to benefit from the strengths of each.
In the literature, various studies [22,23,25] have been conducted on the detection of dolines. Their common feature is that they identify areas containing dolines using geotechnical methods. Geotechnical methods are also more costly and time consuming than deep learning methods. To the best of our knowledge, there is no study in the literature that detects dolines using deep learning and creates a doline inventory map.
The main objective of our article is to successfully detect and inventory uncovered collapse and dissolution dolines. To achieve this, our study utilizes the U-Net model and deep learning techniques. The proposed model gives successful results in large areas that contain many dolines and where the depressions are not filled. However, different approaches may be necessary for areas where the doline count is low, the depressions are filled, or the dolines are covered by vegetation. We recognize the need to develop specialized image processing techniques and to employ suitable satellite imagery and filtering for the detection of covered dolines.
Accordingly, future studies will need a more specific method for the detection and inventorying of covered dolines. Such work should focus in particular on which image processing techniques can be employed for the accurate detection of these concealed dolines and on which satellite images and filters should be preferred. This can be organized as part of a more comprehensive study on karst areas.
4. Conclusions
Karstic field studies generally consist of measurements and evaluations made in the field. However, karstic areas cannot be fully evaluated by field studies because of their characteristics and complex features. One reason is the abundance of karstic landforms in the field. Research in these areas takes a lot of time and effort.
This study proposes a model for the fast and easy detection of dolines in karstic areas. Deep CNN techniques were used for the segmentation of dolines. The U-Net architecture, which requires less data and gives good results, was preferred for this task. All processes are completed automatically by Python scripts.
The results obtained from our model clearly reflect almost the entire karst structure in the field with all its morphological features. All predicted data were georeferenced and can be used in any GIS software. Thus, morphological measurements regarding dolines can be made easily. These results can be used in inventory studies, risk assessment studies, and geomorphological studies. Moreover, our model can be used to detect different types of landforms in future works.
The applied method introduces a novel approach to doline detection and mapping that is not commonly found in existing methods. The results indicate that the applied method achieves a higher level of accuracy in identifying and mapping dolines than previous techniques. The method was also designed to be efficient and allows faster doline detection and mapping, which can be particularly valuable for large-scale applications.
In conclusion, while this article focuses on its achievements in detecting uncovered dolines, it lays the groundwork for guiding future research. Specialized studies will be required for the detection and inventorying of covered dolines, which will be an essential step for a more comprehensive analysis of karst areas.
Based on the findings and discussions presented in the paper, the main recommendation for further research would be to expand the application of the proposed method to different geographic regions and diverse geological conditions. This would help assess the method's robustness and adaptability in various contexts and improve its generalizability for doline detection and mapping. Additionally, exploring the potential integration of other data sources, such as LiDAR or hyperspectral imagery, with the existing method could enhance its accuracy and applicability. Further investigations into the scalability of the approach for larger areas and its potential for automation would also be valuable for practical applications in environmental and geological studies.