Article

Real-Time Infrared Sea–Sky Line Region Detection in Complex Environment Based on Deep Learning

School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(7), 1092; https://doi.org/10.3390/jmse12071092
Submission received: 12 May 2024 / Revised: 13 June 2024 / Accepted: 25 June 2024 / Published: 28 June 2024
(This article belongs to the Special Issue Machine Learning Methodologies and Ocean Science)

Abstract

Fast and accurate infrared (IR) sea–sky line region (SSLR) detection can improve early warning capability for small targets that appear at the remote sea–sky junction. However, traditional algorithms struggle to achieve high precision, while learning-based ones suffer from low detection speed. To overcome these problems, a novel learning-based algorithm is proposed; rather than detecting the sea–sky line first, it provides the SSLR directly. The algorithm mainly consists of three parts. Firstly, an IR sea–sky line region detection module (ISRDM) is proposed, which combines strip pooling with the connection mode of a cross-stage partial network to extract the features of the SSLR target, whose aspect ratio is highly unbalanced, in a targeted manner, thus improving detection accuracy. Secondly, a lightweight backbone is presented to reduce the model's parameters and thereby improve inference speed. Finally, a detection head based on the spatial-aware attention module (SAMHead) is designed to enhance perception of the SSLR and further reduce inference time. Extensive experiments conducted on three datasets with more than 26,000 frames show that the proposed algorithm achieves approximately 80% average precision (AP), outperforms state-of-the-art algorithms in accuracy, and realizes real-time detection.

1. Introduction

The infrared (IR) sea–sky line (SSL) is the boundary between the sea and sky regions in maritime imagery, serving as a crucial geographical marker and positional reference for detecting and tracking marine targets [1]. The strong penetrating capability of infrared imaging enhances its resistance to interference and enables around-the-clock operation. As infrared targets approach, they are first observed within the sea–sky line region (SSLR). Pinpointing this region therefore extends the warning distance for such targets, improving early detection of small targets that appear at the remote sea–sky junction. Additionally, it can narrow the target detection range and enhance resistance to background noise. Consequently, the detection of the SSLR in infrared imagery holds significant research value and has broad applications in sectors such as infrared early warning systems, UAV navigation [2,3] and pose estimation, video surveillance, and target detection and tracking. However, traditional algorithms are limited in detection accuracy by manual thresholding [4], while learning-based algorithms perform poorly in real time, so balancing accuracy and speed remains an urgent problem.
Currently, infrared SSLR detection faces three primary challenges. Firstly, the low resolution and long wavelength of infrared imaging often result in image blur, making sea–sky line characteristics less distinct. Secondly, natural conditions such as waves, distant mountains, and thick clouds in the complex sea–sky environment introduce interference: the wake of large ships or strong waves can mimic the sea–sky line, while obstacles such as ships, buildings, and reefs near the sea–sky line further complicate detection. Thirdly, unmanned equipment at sea, such as unmanned aerial vehicles, often lacks sufficient computing power for real-time detection due to various constraints. Many existing algorithms assume a straight sea–sky line, first detecting its position and then extracting the SSLR. This approach faces two main issues. Firstly, due to camera distortion and Earth curvature, the sea–sky line is often not straight but a gently sloped curve; in this case, traditional line detection algorithms produce significant errors, making it difficult to obtain an accurate SSLR. Secondly, because the sea–sky line must be detected before the SSLR can be determined, the algorithm design is relatively complex, resulting in greater computational cost and poor real-time performance. Hence, achieving fast and accurate infrared SSLR detection in complex sea–sky environments remains a challenge.
According to their technical routes, existing infrared sea–sky line detection algorithms can be roughly divided into traditional algorithms and deep learning-based algorithms. Traditional algorithms include those based on edge features and those based on image region features. Depending on the edge information used, edge-based detection can be divided into linear fitting, gradient saliency, multiscale transformation, and transform domain approaches. Linear fitting algorithms first select candidate points and then fit a line through them to generate the sea–sky line; random sample consensus (RANSAC) [5,6], the least squares algorithm [7], the line segment detector (LSD) [8,9], and polynomial iterative fitting [10] are classical examples. Gradient saliency algorithms exploit the fact that the SSLR has a higher gradient amplitude than other regions and that pixels along the gradient direction share largely uniform characteristics. Prasad et al. [11] proposed a weighted multiscale transform detection algorithm called MuSCoWERT. In addition, Praczyk [12] proposed a fast multiple-iteration algorithm combining multiscale transforms. Transform domain algorithms convert the original image into the corresponding transform domain for processing and then apply the inverse transform to obtain the result; classical examples include the Hough transform [13], the probabilistic Hough transform (PHT) [14], the Radon transform [15,16], and the wavelet transform [17]. Detection algorithms based on image region features mainly exploit features such as color, brightness, and texture, performing semantic extraction via threshold segmentation so that the sea–sky boundary serves as the detection result. Otsu and the improved two-dimensional Otsu are classic threshold segmentation algorithms. In addition, Song et al. [6] segmented the sea–sky image and extracted the sea–sky line by using the k-means algorithm to divide the image into non-uniform segments and computing gradient values. Traditional algorithms rely mainly on hand-selected thresholds for feature extraction and segmentation, which ties them to specific datasets and limits their generalization ability and robustness. They are also constrained by manually designed detectors, leaving limited room for improvement in detail extraction and image analysis. Therefore, they often achieve low accuracy under strong interference and struggle to adapt to complex infrared sea–sky environments.
In recent years, machine learning-based algorithms have developed rapidly and been widely applied to target detection; since sea–sky line detection is a special case of target detection, machine learning has gradually been applied to it as well. Target detection algorithms fall into two main categories: one is based on region proposals, represented by Fast R-CNN [18], and the other performs regression directly, represented by the YOLO [19] series. Kumeechai et al. [13] slid an 8 × 8 pixel block across the whole image and used a BP neural network to determine whether each block contained sea–sky line elements. Jeong et al. [20] used Convolutional Neural Networks (CNNs) to predict the probability that each pixel belongs to the sea–sky line, obtained a confidence map, and then applied post-processing to extract the line. Mo et al. [21] proposed an SSL detection algorithm based on vertical grayscale distribution features (VGDFs): VGDF images were used to train an SSL recognition model that distinguishes edges with SSL features, and the Hough algorithm was used to fit the result. These algorithms provide a new solution for sea–sky line detection, but their complex network structures entail very large numbers of parameters, and the post-processing steps after the CNN further increase running time, leading to poor real-time performance. In contrast, the proposed algorithm substantially reduces the number of network parameters and improves detection speed through a lightweight network structure, effectively meeting real-time requirements.
The above algorithms assume that the sea–sky line is straight: the detected result is the equation of the line, and the SSLR is then extracted from the line position. At present, few algorithms target infrared SSLR detection directly. Yang et al. [22] introduced the YOLO framework into SSLR detection, using a lightweight improved YOLOv5 network with high accuracy, but their datasets are small, limiting gains in robustness and generalization. Jeong et al. [23] used the color attributes of sky and ocean scenes to detect the SSLR, which is not feasible under infrared imaging. Praczyk [12] detected the SSLR over multiple iterations, but because its ultimate goal is a straight line, its SSLR results are relatively rough. In addition, Transformer-based models have gradually entered image processing and target detection owing to their powerful feature extraction and processing capabilities. Carion et al. [24] proposed DETR, an end-to-end target detection framework that uses the Transformer's self-attention mechanism to detect and classify targets without the anchor boxes or region proposals common in traditional networks. Such methods may also be applicable to SSLR detection, offering further possibilities for efficient detection.
In view of the challenges the above algorithms face in infrared SSLR detection, a novel deep learning-based infrared SSLR detection algorithm is proposed that provides the SSLR directly, eliminating the need to detect the sea–sky line first. The algorithm aims to deliver a high-accuracy, real-time solution for infrared SSLR detection in complex sea–sky environments. Firstly, an infrared sea–sky line region detection module (ISRDM), which combines strip pooling with a cross-stage partial network (CSPNet), is proposed to improve detection accuracy by extracting the characteristic information of the SSLR in a targeted manner. Secondly, a backbone network based on the MOB-SP module is designed, and a detection head based on the spatial-aware attention module (SAMHead) is proposed, to improve detection speed and further enhance the ability to perceive the SSLR.

2. Methods

This section explains in detail the components of the proposed algorithm and how each addresses the difficulties described above. The flow of the proposed algorithm is shown in Figure 1; it mainly comprises the feature extraction network based on the ISRDM, the lightweight backbone network based on the MOB-SP module, and the SAMHead based on the spatial-aware attention mechanism.

2.1. Feature Extraction Network Based on ISRDM

The sea–sky line usually presents as a curved band with a small slope, whose width is several pixels and whose length is close to the width of the image. Therefore, the aspect ratio of the infrared SSLR is highly unbalanced, the brightness of the detection region changes significantly, and associated pixels can be far apart. In response to these characteristics, an infrared sea–sky line region detection module, named ISRDM, is proposed as the core foundational module of the feature extraction network; it combines the strip pooling [25] operation with the connection method of CSPNet [26] and aims to solve the loss of detection performance caused by ignoring the imbalance in the target aspect ratio. Through this design, the network not only improves its ability to extract features from pixels associated over long distances but also avoids some unnecessary computation, making the structure more efficient. ISRDM therefore gives the network a stronger detection ability for targets with unbalanced aspect ratios.
The operation of the standard average pooling layer is shown in Figure 2; it uses a square kernel (such as 3 × 3 or 5 × 5). After a standard pooling layer, the number of channels of the output feature map is unchanged, but its height and width change. When facing a target with an unbalanced aspect ratio, the standard pooling layer introduces irrelevant pixel information, degrading detection ability. Strip pooling instead uses a strip-shaped kernel: it performs average pooling along one dimension of the feature map, with a sampling area of shape H × 1 or 1 × W, rather than pooling over a square area. The outputs of horizontal and vertical strip pooling can be expressed by the following formulas:
$$y_c^h = \frac{1}{h} \sum_{0 \le j < h} x_{c,i,j},$$
$$y_c^w = \frac{1}{w} \sum_{0 \le i < w} x_{c,i,j},$$
where c, i, and j denote the channel, vertical position, and horizontal position in the feature map, respectively; h is the long side of the horizontal strip pooling kernel; and w is the long side of the vertical strip pooling kernel. The advantage of strip pooling is that it pools only along the horizontal or vertical direction, so irrelevant information from the other direction is not introduced. As mentioned earlier, the SSLR has an unbalanced aspect ratio. Compared with standard pooling, strip pooling can re-establish connections between associated pixels in the horizontal or vertical direction and better extract the spatial pixel information associated along that direction when facing such targets.
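For concreteness, the following is a minimal PyTorch sketch of the two pooling directions described above. The class and variable names are ours, and the sketch uses full-length strips as in the original strip pooling work [25]; the ISRDM itself uses finite 1 × 4 and 4 × 1 kernels, as shown in Figure 3.

```python
import torch
import torch.nn as nn

class StripPooling(nn.Module):
    """Minimal sketch: average over one spatial dimension only, so no
    pixels from the orthogonal direction are mixed into the output."""
    def __init__(self):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # 1 x W horizontal strips
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # H x 1 vertical strips

    def forward(self, x):
        n, c, h, w = x.shape
        y_h = self.pool_h(x).expand(n, c, h, w)  # one value per row, broadcast back
        y_w = self.pool_w(x).expand(n, c, h, w)  # one value per column, broadcast back
        return y_h + y_w                         # fuse the two directional summaries

x = torch.randn(1, 16, 288, 384)   # e.g., a 384 x 288 infrared frame
print(StripPooling()(x).shape)     # torch.Size([1, 16, 288, 384])
```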
CSPNet is an improved network whose main design goals are to improve computational efficiency and realize multi-level gradient combination. Structurally, it is divided into two branches, which are merged through a cross-stage connection sub-structure. To save computation, ISRDM is therefore built by combining the CSPNet fusion connection strategy with strip pooling; the specific structure is shown in Figure 3. The channels of the input feature map are divided into two equal groups. The first group is convolved directly to obtain part of the feature map; the other group first passes through a convolution block and then through strip pooling. The output is obtained by concatenating the two feature maps and passing them through a 1 × 1 convolution block.
ISRDM uses strip pooling to extract features from SSLR, which uses a strip pool core with narrow sampling areas and only pools a certain area in the horizontal or vertical direction, effectively avoiding the introduction of irrelevant information. To address the issue of imbalanced aspect ratios in SSLR, strip pooling can establish long-range dependencies on pixels that are far apart in SSLR, better extracting spatial pixel information associated horizontally or vertically, thereby improving the feature extraction ability for targets with imbalanced aspect ratios. CSPNet establishes direct connections between different stages of a network, allowing for the direct flow of information between feature maps at different levels, facilitating feature reuse and information integration. Therefore, ISRDM is built by combining the CSPNet fusion connection strategy and strip pooling, which directly integrates low-level shallow features (usually with more spatial details) with high-level deep features (usually with stronger semantic information), thereby improving the expression ability of features. Meanwhile, this approach can also reduce redundant feature computation through cross-stage connections, as some features can be shared by multiple layers, avoiding the need for independent computation for each layer.
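Under these design choices, the module can be sketched as below, reusing the `StripPooling` sketch above. This is a schematic under our assumptions; the actual channel widths, kernel sizes, and normalization layers follow Figure 3 in the paper.

```python
import torch
import torch.nn as nn

class ISRDM(nn.Module):
    """Sketch of the ISRDM: CSPNet-style channel split into two equal
    groups, one convolved directly, the other passed through a conv
    block and strip pooling, then concatenated and fused by 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.direct = nn.Conv2d(half, half, 3, padding=1)  # first group: direct conv
        self.conv = nn.Conv2d(half, half, 3, padding=1)    # second group: conv block...
        self.strip = StripPooling()                        # ...then strip pooling
        self.fuse = nn.Conv2d(channels, channels, 1)       # 1x1 fusion conv

    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)        # split channels into two equal groups
        a = self.direct(a)
        b = self.strip(self.conv(b))
        return self.fuse(torch.cat([a, b], dim=1))

print(ISRDM(32)(torch.randn(1, 32, 72, 96)).shape)  # torch.Size([1, 32, 72, 96])
```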
In the feature extraction network, the input features first pass through a spatial pyramid pooling (SPP) layer. In the SPP layer, the number of channels is adjusted through a standard convolution block (CBL), and max pooling layers with kernels of different sizes are applied; their outputs are concatenated with the feature map of the previous layer to form a multiscale feature fusion layer, after which the features are integrated through a CBL and fed into the subsequent feature extraction network. A Path Aggregation Network [27] (PANet) is selected to connect the feature extraction network, and each layer has a similar structure composed of a shallow ISRDM, a CBL, and a deep ISRDM. PANet is characterized by top-down connection branches that concatenate feature maps of the same size, effectively reusing shallow features and fusing them with deep features.

2.2. Backbone Network Based on MOB-SP Module

For infrared SSLR detection, the detection region is relatively homogeneous. We therefore redesigned a lightweight backbone network based on MobileNet-v2 [28] combined with the strip pooling operation. It simplifies repeated operations such as convolution, pooling, and downsampling, avoids highly similar and redundant information between feature maps, effectively reduces the model's parameter count, speeds up detection, and saves computational resources.
The backbone network designed in this paper is shown in Figure 1; it is a stack of MOB-SP modules with a stride of 1 or 2. The core structure of MOB-SP comprises an inverted residual connection and a linear bottleneck layer, as shown in Figure 4. The inverted residual module borrows the idea of ResNet [29]: a 1 × 1 convolution first raises the dimension, a 3 × 3 depthwise separable convolution (DW convolution) then extracts features, and a final 1 × 1 convolution reduces the dimension. DW convolution is the core feature extraction module of MobileNet-v2. It is a factorized form of convolution that decomposes the standard convolution into a depthwise convolution and a pointwise convolution; splitting the correlation between the spatial and channel dimensions reduces the number of parameters required and lowers the time and space complexity of the convolution layer. MOB-SP adds a strip pooling operation after the DW convolution to improve the inverted residual module's ability to extract SSLR features. The linear bottleneck layer aims to preserve as much useful information as possible in the network's low-dimensional space: applying the ReLU activation in a low-dimensional space loses much information, so the linear bottleneck layer uses a linear transform instead of the nonlinear activation to prevent excessive destruction of feature information.
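A minimal sketch of the MOB-SP block as described is given below, again reusing the `StripPooling` sketch from Section 2.1. The expansion factor and the normalization/activation choices are our assumptions based on MobileNet-v2 [28]; the strip pooling insertion point follows the text and Figure 4.

```python
import torch
import torch.nn as nn

class MOBSP(nn.Module):
    """Sketch of MOB-SP: MobileNet-v2 inverted residual (1x1 expand ->
    3x3 depthwise -> 1x1 linear project) with strip pooling inserted
    after the depthwise convolution."""
    def __init__(self, c_in, c_out, stride=1, expand=6):  # expand=6 assumed
        super().__init__()
        c_mid = c_in * expand
        self.use_shortcut = stride == 1 and c_in == c_out  # Figure 4b case
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False),                           # expand
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            nn.Conv2d(c_mid, c_mid, 3, stride, 1, groups=c_mid, bias=False),  # DW conv
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            StripPooling(),                                                  # strip pooling
            nn.Conv2d(c_mid, c_out, 1, bias=False),                          # linear bottleneck
            nn.BatchNorm2d(c_out))                                           # no activation here

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_shortcut else y

print(MOBSP(16, 16)(torch.randn(1, 16, 72, 96)).shape)  # torch.Size([1, 16, 72, 96])
```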

2.3. SAMHead

The detection head uses the previously extracted feature information to output prediction results directly in the inference stage. To improve the head's ability to recognize the infrared SSLR, a detection head based on the spatial-aware attention module (SAM), named SAMHead, is proposed to strengthen discrimination across different spatial positions. The core operation of SAMHead is deformable convolution (DConv). Deformable convolution differs from standard convolution in that an offset, obtained through a convolution layer, is added at each sampling position. It can be expressed by the following formula:
$$y(p_n) = \sum_{i=0}^{k} x(p_n + p_i + \Delta p_i) \cdot w(p_i),$$
where x is the input data, y is the output of the deformable convolution, w is the convolution kernel, k is the size of w, p_n is the output position, p_i is a position within w, and Δp_i is the offset of each point in the convolution kernel. Assuming a 3 × 3 kernel, deformable convolution adds an offset to each of the nine sampling points, changing which positions in the 3 × 3 region are sampled so that points with richer features can be selected, thereby further improving detection.
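As an illustration, a 3 × 3 deformable convolution can be built from torchvision's DeformConv2d, with a plain convolution predicting the offsets Δp_i. This is a generic sketch of the operation, not the exact SAM configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """Sketch of deformable convolution: a plain conv predicts two offset
    values (dx, dy) for each of the k*k kernel taps at every position,
    and DeformConv2d shifts its sampling points accordingly."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        self.dconv = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        return self.dconv(x, self.offset(x))

print(DeformBlock(64, 64)(torch.randn(1, 64, 36, 48)).shape)  # torch.Size([1, 64, 36, 48])
```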
The structure of SAM is shown in Figure 5 and mainly includes three steps: first, the information from the feature extraction network is decoupled; then, deformable convolution is used for sparse sampling; and, finally, the feature information of each level is aggregated at the same spatial location. By adding offsets to the convolution operation, SAM enhances the deformation representation ability and significantly improves the expressive power of SAMHead without increasing the amount of computation.
SAMHead processes the feature maps from the feature extraction network through two SAMs and then outputs prediction information. SAMHead uses deformable convolution to adjust the sampling position of the convolutional kernel, enabling the network to obtain richer contextual information and improve the feature expression ability. Meanwhile, SAMHead allows the convolutional kernel to adaptively adjust based on the geometric shape of the input feature map, which enables the network to better capture the contour of the target and improve the localization accuracy of the bounding box.
Since the SSLR is a rectangle whose length equals the image width and whose shape is consistent across images, differing only in height, we use a single SAMHead to detect the SSLR in order to improve the model's speed. SAMHead first presets an anchor box and uses the k-means adaptive algorithm to redefine it according to the sizes of targets in different datasets, so as to better fit the targets in a specific dataset. Each channel can be defined by the following formula:
$$channels = [x, y, w, h, obj],$$
where (x, y), (w, h), and obj represent the predicted coordinates of the target center, the width and height of the detection box, and the target confidence, respectively. For convenience of calculation, x, y, w, and h are normalized to [0, 1]; the value of obj also lies between 0 and 1, and a greater obj value indicates a higher probability that the box contains the target.
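The anchor redefinition step can be sketched as k-means over the labeled box sizes, using 1 − IoU as the distance, a common choice for this task (the paper does not state its exact distance metric, so that choice, the cluster count, and the data below are all assumptions):

```python
import numpy as np

def kmeans_anchors(wh, k=3, iters=100, seed=0):
    """Sketch: cluster the (w, h) sizes of labeled SSLR boxes into k
    anchors, assigning each box to the anchor with the highest shape IoU."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # shape-only IoU between every box and every anchor
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0])
                 * np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = (wh[:, None, 0] * wh[:, None, 1]
                 + anchors[None, :, 0] * anchors[None, :, 1] - inter)
        assign = (inter / union).argmax(axis=1)  # max IoU = min (1 - IoU)
        anchors = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                            else anchors[i] for i in range(k)])
    return anchors

# hypothetical wide, flat boxes typical of SSLR labels
wh = np.abs(np.random.default_rng(1).normal([300, 20], [30, 5], (500, 2)))
print(kmeans_anchors(wh).round(1))
```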

2.4. Loss Function

The loss function of the proposed algorithm consists of two parts: a location loss and a confidence loss. The location loss adopts the CIoU loss function. Intersection over Union (IoU) is a general evaluation metric in visual tasks, shown schematically in Figure 6, and can be calculated by the following formula:
$$IoU = \frac{(h^{gt} \times w^{gt}) \cap (h \times w)}{(h^{gt} \times w^{gt}) \cup (h \times w)},$$
where h^gt and w^gt are the height and width of the ground-truth box, and h and w are the height and width of the predicted box. CIoU and the location loss function L_p can be calculated by the following formulas:
$$CIoU = IoU - \frac{d^2(O^{gt}, O)}{c^2} - \alpha f,$$
$$L_p = 1 - CIoU,$$
where O^gt and O are the centers of the ground-truth box and the predicted box, d is the distance between O^gt and O, c is the diagonal length of the minimum enclosing rectangle of the two boxes, α is a weight, and f is the aspect-ratio similarity, calculated by the following formula:
$$f = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2,$$
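A sketch of the location loss in code is given below, under the common convention that α = f / ((1 − IoU) + f); the paper only calls α "a weight", so that choice is an assumption. Boxes are (cx, cy, w, h) tensors, normalized as in the formula above.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Sketch of L_p = 1 - CIoU for (cx, cy, w, h) boxes."""
    px, py, pw, ph = pred.unbind(-1)
    gx, gy, gw, gh = target.unbind(-1)
    # intersection and union of the two axis-aligned boxes
    iw = (torch.min(px + pw / 2, gx + gw / 2) - torch.max(px - pw / 2, gx - gw / 2)).clamp(0)
    ih = (torch.min(py + ph / 2, gy + gh / 2) - torch.max(py - ph / 2, gy - gh / 2)).clamp(0)
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # squared center distance d^2 over squared enclosing-box diagonal c^2
    cw = torch.max(px + pw / 2, gx + gw / 2) - torch.min(px - pw / 2, gx - gw / 2)
    ch = torch.max(py + ph / 2, gy + gh / 2) - torch.min(py - ph / 2, gy - gh / 2)
    d2 = (px - gx) ** 2 + (py - gy) ** 2
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio similarity f and its weight alpha (alpha choice assumed)
    f = (4 / math.pi ** 2) * (torch.atan(gw / (gh + eps)) - torch.atan(pw / (ph + eps))) ** 2
    alpha = f / (1 - iou + f + eps)
    return 1 - (iou - d2 / c2 - alpha * f)

print(ciou_loss(torch.tensor([[0.5, 0.5, 0.4, 0.1]]),
                torch.tensor([[0.5, 0.52, 0.4, 0.12]])))
```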
Confidence defines the probability that a prediction contains the target. The confidence loss is divided into a positive-sample loss and a negative-sample loss. The confidence loss L_true of positive samples can be expressed as follows:
$$L_{true} = -\sum_{i}^{s \times s} \sum_{j}^{anchor} I_{ij}^{obj} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right],$$
where p_i is the predicted confidence; y_i is the sample label of sample i (for a positive sample, the CIoU value computed from the sample is used as the label; otherwise, it is 0); s is the spatial size of the feature map; and anchor is the number of anchor boxes. Similarly, the confidence loss L_false of negative samples can be expressed as follows:
$$L_{false} = -\sum_{i}^{s \times s} \sum_{j}^{anchor} I_{ij}^{noobj} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right],$$
The parameters are essentially the same as in the formula for L_true, except that the label y_i is inverted: for a negative sample it is 1, and for a positive sample it is 0. The total confidence loss function is then:
$$L_{obj} = L_{true} + L_{false},$$
where Ltrue and Lfalse are the positive sample loss and negative sample loss of confidence, respectively. To sum up, the loss function of the whole network can be obtained as shown in the formula:
$$L = \alpha L_p + \beta L_{obj},$$
where Lp and Lobj are the location loss function and confidence loss function, respectively; α and β are weight parameters.
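The confidence and total losses can be sketched as follows, reading the labeling rule above in the standard YOLO-style form where the binary cross-entropy target is the CIoU value for positives and 0 for negatives; tensor shapes and the α, β values here are illustrative.

```python
import torch
import torch.nn.functional as F

def confidence_loss(p, ciou, pos_mask):
    """Sketch of L_obj = L_true + L_false: BCE over predicted confidence p,
    with target = CIoU for positive samples and 0 for negatives."""
    y = torch.where(pos_mask, ciou.detach().clamp(0, 1), torch.zeros_like(p))
    bce = F.binary_cross_entropy(p, y, reduction='none')
    l_true = bce[pos_mask].sum()    # positive-sample term
    l_false = bce[~pos_mask].sum()  # negative-sample term
    return l_true + l_false

def total_loss(l_p, l_obj, alpha=1.0, beta=1.0):
    """L = alpha * L_p + beta * L_obj; the weights are hyperparameters."""
    return alpha * l_p + beta * l_obj

p = torch.rand(8)                   # predicted confidences
ciou = torch.rand(8)                # CIoU of each prediction with its target
pos = torch.tensor([1, 0, 0, 1, 0, 0, 0, 1], dtype=torch.bool)
print(total_loss(torch.tensor(0.3), confidence_loss(p, ciou, pos)))
```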

3. Experiments and Discussion

3.1. Datasets

Three datasets are used to train and verify the proposed algorithm: the InfML-HDD [30] dataset built by our laboratory and two public datasets, the Singapore Maritime dataset [31] (Singapore-NR) and the Mar-DCT [32] dataset. At present, few publicly available datasets exist for infrared SSLR detection. Our laboratory therefore developed the InfML-HDD dataset, which was captured near the coast and contains over 6000 infrared sea–sky images, covering daytime and nighttime environments; various meteorological conditions such as heavy fog, dense clouds, and rain; and typical scenes such as coastal mountains, sea vessels, and target occlusion near the sea–sky line. The performance of an infrared SSLR detection algorithm can thus be verified effectively on InfML-HDD. Singapore-NR contains 30 infrared videos, from which 13,162 frames are extracted. A subset of the Mar-DCT dataset can be used for SSLR detection research [33], from which 7374 infrared images are extracted. To facilitate evaluation, the images of the three datasets were uniformly resized to 384 × 288 pixels and relabeled using Labelme 3.16.2. The specific details of the three datasets are shown in Table 1.

3.2. Training Details

In the training phase, stochastic gradient descent was used for 300 epochs with a mini-batch size of 16. The initial learning rate is 0.01, and hardswish is used as the activation function, which makes the loss converge faster. In addition, cosine learning rate decay is used to improve training. The experimental environment is PyTorch 1.6.0 and CUDA 9.2 running on Linux, on a GeForce RTX 3080 GPU and an Intel i9-12900K CPU. No synchronized multi-GPU training or synchronized normalization techniques are used. Each of the three datasets was divided into a training set and a testing set at a ratio of 7:3; the training set is used to train the proposed algorithm, and the testing set is used for performance testing and metric evaluation of the trained model.
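The reported setup corresponds to roughly the following skeleton; the momentum value and the placeholder model/dataloader are our assumptions, not from the paper.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 16, 3), nn.Hardswish())  # placeholder detector
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # momentum assumed
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=300)  # cosine decay

for epoch in range(300):
    # ... iterate mini-batches of 16, compute the loss, opt.zero_grad(),
    # loss.backward(), opt.step() ...
    sched.step()  # decay the learning rate once per epoch
```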

3.3. Evaluation Metric

The detection results of the proposed algorithm are evaluated from the aspects of accuracy, speed, and model calculation. The accuracy Acc can be calculated using the following formula:
$$Acc = \frac{I_{success}}{I_{total}},$$
where I_success is the number of images in which the SSLR is successfully detected, and I_total is the total number of images. The definition of a successful SSLR detection in this paper is illustrated in Figure 7: the detection box must completely contain the sea–sky line, and it must not exceed the rectangular band whose height spans ±10 pixels around the two end points of the sea–sky line and whose length equals the image width.
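Our reading of this criterion, in code (the SSL is given as one row coordinate per image column; function and variable names are illustrative):

```python
def detection_success(box, ssl_rows, img_w, tol=10):
    """Sketch of the Figure 7 criterion: the box (x1, y1, x2, y2) must
    span the image width, contain every sea-sky-line pixel, and stay
    within +/- tol pixels of the line's two end-point heights."""
    x1, y1, x2, y2 = box
    spans_width = x1 <= 0 and x2 >= img_w - 1
    contains_ssl = all(y1 <= y <= y2 for y in ssl_rows)
    band_top = min(ssl_rows[0], ssl_rows[-1]) - tol
    band_bottom = max(ssl_rows[0], ssl_rows[-1]) + tol
    within_band = y1 >= band_top and y2 <= band_bottom
    return spans_width and contains_ssl and within_band

print(detection_success((0, 138, 383, 155), [148] * 384, 384))  # True
```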
Speed is evaluated using detection frames per second (FPS) and the inference time per single image (ITSI). Because there is only one kind of detection target in this paper, namely the SSLR, the average precision (AP) and the number of model parameters are also introduced to evaluate performance. The AP is the approximate area under the precision–recall (PR) curve and jointly measures precision and recall; it can be computed under different IoU thresholds. Its calculation can be expressed by the following formula:
$$AP = \int_0^1 P(R) \, dR,$$
where P is the precision, the proportion of predicted positives that are truly positive, and R is the recall, the proportion of actual positives that are correctly detected. The number of model parameters reflects model complexity: more parameters mean the model requires more computational resources for training and inference, resulting in a higher computational load.
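In practice, the integral is approximated from sampled PR points. A sketch with all-point interpolation, which is standard in detection benchmarks (the toy numbers are hypothetical):

```python
import numpy as np

def average_precision(recall, precision):
    """Sketch of AP as the area under the precision-recall curve."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # make precision monotone non-increasing
    idx = np.where(r[1:] != r[:-1])[0]         # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# toy PR points for a detector evaluated at 3 thresholds
print(average_precision(np.array([0.2, 0.6, 0.9]),
                        np.array([1.0, 0.8, 0.6])))  # 0.7
```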

3.4. Ablation Experiments on InfML-HDD

In order to explore the impact of the proposed ISRDM, MOB-SP-based backbone network, and SAMHead on the overall performance of the algorithm, we conducted ablation experiments on the InfML-HDD dataset, and the results are shown in Table 2. In the table, √ indicates that the module was added, and the baseline is the YOLOv5s network. All parameters in the experiment are the same except for different module selections.
As can be seen from Table 2, adding SAMHead brought significant performance improvements, so all subsequent models include it. Compared with SAMHead alone, ISRDM + SAMHead gained 7.6% in AP and 1.2% in Acc. The introduction of ISRDM thus significantly improves detection accuracy and AP, showing its strong detection ability for targets with unbalanced aspect ratios like the SSLR, at the cost of some detection speed. The MOB-SP-based backbone raises detection speed to 241.1 frames/s without sacrificing detection performance, forming an effective complement to ISRDM. With all three modules, the AP and Acc reached their highest values of 82.4% and 99.3%, while FPS and the parameter count remained close to the best values. Overall, under the combined influence of the modules, the proposed algorithm achieves an excellent balance across the evaluation metrics.

3.5. Comparison and Analysis

Four classical traditional sea–sky line detection algorithms and several state-of-the-art one-stage CNN-based target detection algorithms are selected for comparison. The four traditional algorithms are the line segment detector-based algorithm LSD [8], the multiscale and Radon transform-based algorithm MuSCoWERT [11], the edge detection and Hough transform-based algorithm Edge–Hough [34], and the RANSAC algorithm [35]. The CNN-based algorithms include the YOLOv5 series [36], the YOLOv7 series [37], and the lightweight improved YOLOv5 algorithm YOLOv5-MOBv2 [22].
The traditional algorithms detect a straight line representing the sea–sky line. To evaluate the traditional algorithms and the proposed algorithm uniformly, we convert the traditional algorithms' results; a schematic of the conversion is shown in Figure 8. Specifically, the detected line is extended five pixels above its upper endpoint and five pixels below its lower endpoint; this vertical extent is taken as the box height and the image width as the box length, yielding a detection box representing the SSLR, which is then used as the traditional algorithm's detection result for subsequent metric calculation.
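In code, this conversion amounts to the following (a sketch of Figure 8; the endpoint values in the example are hypothetical):

```python
def line_to_region(y_top, y_bottom, img_w, pad=5):
    """Sketch of Figure 8: turn a detected sea-sky line, given by the
    row coordinates of its two endpoints, into an SSLR detection box by
    padding 5 px above the upper endpoint and below the lower one."""
    y1 = min(y_top, y_bottom) - pad      # upper edge of the box
    y2 = max(y_top, y_bottom) + pad      # lower edge of the box
    return (0, y1, img_w - 1, y2)        # (x1, y1, x2, y2), full image width

print(line_to_region(140, 152, 384))     # (0, 135, 383, 157)
```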
The various algorithms were evaluated on the same test sets of the three datasets. A comparison of their detection results is shown in Table 3 and Figure 9, and typical detection results of the proposed algorithm on the three datasets are shown in Figure 10 and Figure 11. In Figure 10, each column from left to right represents an environmental category. In the first column, the proposed algorithm effectively distinguishes the SSL from strong sea waves while eliminating interference from thick clouds in the sky, which traditional algorithms often find difficult to overcome. In the second and third columns, the wakes of container ships, coastal mountains, and ships near the SSL all increase the probability that traditional algorithms extract incorrect feature points; the proposed algorithm handles these disturbances through its powerful feature extraction capability. The fourth and fifth columns show a waveless ocean environment with specular reflection and a dense fog environment with strong waves, respectively. In these scenarios the SSL characteristics are greatly blurred, so traditional algorithms perform poorly and CNN-based algorithms cannot easily capture the SSL; in contrast, the proposed algorithm identifies the SSL effectively and achieves correct detection.
Compared with the traditional algorithms, the proposed algorithm shows clear advantages in detection accuracy and AP. As can be seen from Figure 9, the proposed algorithm achieves the highest AP on all three datasets, exceeding the best traditional algorithm by 28.1%, 28.1%, and 26.9%, respectively. Traditional detection algorithms are vulnerable to interference from natural conditions such as waves, distant mountains, and thick clouds; for example, the Mar-DCT and Singapore-NR datasets contain many large ships and coastal mountains near the sea–sky line, which increase the probability of false identification. The proposed algorithm effectively overcomes these problems. Even in the heavy fog scenes of the InfML-HDD dataset, it still detects the SSLR well, whereas the traditional algorithms can hardly detect it at all in this environment. Meanwhile, the real-time performance of the proposed algorithm is also satisfactory, reaching 237.8 frames/s; this is slower than Edge–Hough, but its detection quality is far better.
Compared with the CNN-based detection algorithms, the proposed algorithm also achieves the best detection accuracy and stability. Its AP is the highest on the InfML-HDD dataset, reaching 82.4%, and thanks to the design of ISRDM, its accuracy of 99.3% also exceeds those of YOLOv5m and YOLOv7. The lightweight MOB-SP backbone and the SAMHead yield a minimal parameter count and the fastest detection speed among the CNN-based methods: the parameter count is only slightly higher than that of YOLOv5-MOBv2, but the detection speed reaches 237.8 frames/s, 33.9 frames/s higher than YOLOv5-MOBv2. The proposed algorithm therefore achieves both higher detection accuracy and faster detection speed, and, as further shown in Figure 12, the best performance under evaluations of different stringency. On the two public datasets, there are many large ships near the sea–sky line in Singapore-NR, and the coastal mountains in Mar-DCT are continuous; the proposed algorithm effectively overcomes these interferences and achieves excellent results, verifying its robustness and generalization.

4. Conclusions

Based on deep learning principles, a novel algorithm is proposed for infrared SSLR detection in complex sea–sky environments. The algorithm is specifically tailored to capture the shape features of the SSLR while prioritizing detection speed. First, a feature extraction network based on the ISRDM is proposed to effectively identify SSLR targets with unbalanced aspect ratios. Second, a lightweight backbone network using the MOB-SP module is designed to significantly enhance detection speed. Finally, a SAMHead based on the spatial-aware attention mechanism is proposed to boost the spatial perception of the SSLR while reducing model inference time. Extensive experiments on three datasets comprising over 26,000 images demonstrate that the proposed algorithm achieves superior detection accuracy, faster speed, and enhanced robustness compared with state-of-the-art algorithms. The algorithm represents a significant advance in infrared SSLR detection; it can be applied to infrared early warning systems for ocean monitoring and to target detection and tracking in complex marine environments, with broad application prospects.

Author Contributions

Conceptualization, Y.W. and J.F.; methodology, Y.W.; software, Y.W.; formal analysis, Y.W.; investigation, Y.W.; data curation, J.F.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W., J.Z. and F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Aeronautical Science Foundation of China (grant number 20230023051001) and the Academic Excellence Foundation of BUAA for Ph.D. Students.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting this study’s findings are available from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, L.; Fan, S.; Liu, Y.; Li, Y.; Fei, C.; Liu, J.; Liu, B.; Dong, Y.; Liu, Z.; Zhao, X. A Review of Methods for Ship Detection with Electro-Optical Images in Marine Environments. J. Mar. Sci. Eng. 2021, 9, 1408. [Google Scholar] [CrossRef]
  2. Yu, Q.; Su, Y. Local Defogging Algorithm for the First Frame Image of Unmanned Surface Vehicles Based on a Radar-Photoelectric System. J. Mar. Sci. Eng. 2022, 10, 969. [Google Scholar] [CrossRef]
  3. Zheng, J.; Chen, J.; Wu, X.; Liang, H.; Zheng, Z.; Zhu, C.; Liu, Y.; Sun, C.; Wang, C.; He, D. Analysis and Compensation of Installation Perpendicularity Error in Unmanned Surface Vehicle Electro-Optical Devices by Using Sea–Sky Line Images. J. Mar. Sci. Eng. 2023, 11, 863. [Google Scholar] [CrossRef]
  4. Liu, J.; Li, H.; Liu, J.; Xie, S.; Luo, J. Real-Time Monocular Obstacle Detection Based on Horizon Line and Saliency Estimation for Unmanned Surface Vehicles. Mob. Netw. Appl. 2021, 26, 1372–1385. [Google Scholar] [CrossRef]
  5. Kim, S.; Lee, J. Small Infrared Target Detection by Region-Adaptive Clutter Rejection for Sea-Based Infrared Search and Track. Sensors 2014, 14, 13210–13242. [Google Scholar] [CrossRef]
  6. Song, H.; Ren, H.; Song, Y.; Chang, S.; Zhao, Z. A Sea–Sky Line Detection Method Based on the RANSAC Algorithm in the Background of Infrared Sea–Land–Sky Images. J. Russ. Laser Res. 2021, 42, 318–327. [Google Scholar] [CrossRef]
  7. Longstaff, F.A.; Schwartz, E.S. Valuing American Options by Simulation: A Simple Least-Squares Approach. Rev. Financ. Stud. 2001, 14, 113–147. [Google Scholar] [CrossRef]
  8. Dong, L.; Ma, D.; Ma, D.; Xu, W. Fast Infrared Horizon Detection Algorithm Based on Gradient Directional Filtration. J. Opt. Soc. Am. A 2020, 37, 1795–1805. [Google Scholar] [CrossRef]
  9. Grompone von Gioi, R.; Jakubowicz, J.; Morel, J.-M.; Randall, G. LSD: A Fast Line Segment Detector with a False Detection Control. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 722–732. [Google Scholar] [CrossRef]
  10. Lin, C.; Chen, W.; Zhou, H. Multi-Visual Feature Saliency Detection for Sea-Surface Targets through Improved Sea-Sky-Line Detection. J. Mar. Sci. Eng. 2020, 8, 799. [Google Scholar] [CrossRef]
  11. Prasad, D.K.; Rajan, D.; Rachmawati, L.; Rajabally, E.; Quek, C. MuSCoWERT: Multi-Scale Consistence of Weighted Edge Radon Transform for Horizon Detection in Maritime Images. J. Opt. Soc. Am. A 2016, 33, 2491–2500. [Google Scholar] [CrossRef]
  12. Praczyk, T. A Quick Algorithm for Horizon Line Detection in Marine Images. J. Mar. Sci. Technol. 2018, 23, 164–177. [Google Scholar] [CrossRef]
  13. Kumeechai, P.; Jiriwibhakorn, S. Effective Horizon Detection on Complex Seas Using Back Propagation Neural Network. In Proceedings of the 2019 16th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Pattaya, Thailand, 10–13 July 2019; pp. 790–793. [Google Scholar]
  14. Fu, J.; Zhao, J.; Li, F. Infrared Sea-Sky Line Detection Utilizing Self-Adaptive Laplacian of Gaussian Filter and Visual-Saliency-Based Probabilistic Hough Transform. IEEE Geosci. Remote Sens. Lett. 2022, 19, 7002605. [Google Scholar] [CrossRef]
  15. Özertem, K.A. A Fast Automatic Target Detection Method for Detecting Ships in Infrared Scenes. In Proceedings of the Automatic Target Recognition XXVI, Baltimore, MD, USA, 18–19 April 2016; Volume 9844, pp. 16–29. [Google Scholar]
  16. Li, F.; Zhang, J.; Sun, W.; Jin, J.; Li, L.; Dai, Y. Sea–Sky Line Detection Using Gray Variation Differences in the Time Domain for Unmanned Surface Vehicles. Signal Image Video Process. 2021, 15, 139–146. [Google Scholar] [CrossRef]
  17. Kong, X.; Liu, L.; Qian, Y.; Cui, M. Automatic Detection of Sea-Sky Horizon Line and Small Targets in Maritime Infrared Imagery. Infrared Phys. Technol. 2016, 76, 185–199. [Google Scholar] [CrossRef]
  18. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  19. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  20. Jeong, C.; Yang, H.S.; Moon, K. A Novel Approach for Detecting the Horizon Using a Convolutional Neural Network and Multi-Scale Edge Detection. Multidimens. Syst. Signal Process. 2019, 30, 1187–1204. [Google Scholar] [CrossRef]
  21. Mo, W.; Pei, J. Sea-Sky Line Detection in the Infrared Image Based on the Vertical Grayscale Distribution Feature. Vis. Comput. 2023, 39, 1915–1927. [Google Scholar] [CrossRef]
  22. Yang, L.; Zhang, P.; Huang, L.; Wu, L. Sea-Sky-Line Detection Based on Improved YOLOv5 Algorithm. In Proceedings of the 2021 IEEE 2nd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 17–19 December 2021; Volume 2, pp. 688–694. [Google Scholar]
  23. Jeong, C.Y.; Yang, H.S.; Moon, K. Fast Horizon Detection in Maritime Images Using Region-of-Interest. Int. J. Distrib. Sens. Netw. 2018, 14, 1550147718790753. [Google Scholar] [CrossRef]
  24. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020. [Google Scholar]
  25. Hou, Q.; Zhang, L.; Cheng, M.-M.; Feng, J. Strip Pooling: Rethinking Spatial Pooling for Scene Parsing. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4002–4011. [Google Scholar]
  26. Wang, C.-Y.; Mark Liao, H.-Y.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580. [Google Scholar]
  27. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  28. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  30. Fu, J.; Li, F.; Zhao, J. Real-Time Infrared Horizon Detection in Maritime and Land Environments Based on Hyper-Laplace Filter and Convolutional Neural Network. IEEE Trans. Instrum. Meas. 2023, 72, 5016513. [Google Scholar] [CrossRef]
  31. Prasad, D.K.; Rajan, D.; Rachmawati, L.; Rajabally, E.; Quek, C. Video Processing From Electro-Optical Sensors for Object Detection and Tracking in a Maritime Environment: A Survey. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1993–2016. [Google Scholar] [CrossRef]
  32. Bloisi, D.D.; Iocchi, L.; Pennisi, A.; Tombolini, L. ARGOS-Venice Boat Classification. In Proceedings of the 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Karlsruhe, Germany, 25–28 August 2015; pp. 1–6. [Google Scholar]
  33. Hashmani, M.A.; Umair, M. A Novel Visual-Range Sea Image Dataset for Sea Horizon Line Detection in Changing Maritime Scenes. J. Mar. Sci. Eng. 2022, 10, 193. [Google Scholar] [CrossRef]
  34. Xin, Z.; Kong, S.; Wu, Y.; Zhan, G.; Yu, J. A Hierarchical Stabilization Control Method for a Three-Axis Gimbal Based on Sea–Sky-Line Detection. Sensors 2022, 22, 2587. [Google Scholar] [CrossRef] [PubMed]
  35. Zhu, L.; Liu, J.; Chen, J. Detection of Sea Surface Obstacle Based on Super-Pixel Probabilistic Graphical Model and Sea-Sky-Line. In Proceedings of the Eleventh International Conference on Machine Vision (ICMV 2018), Munich, Germany, 1–3 November 2018; Volume 11041, pp. 516–525. [Google Scholar]
  36. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 15 January 2024).
  37. YOLOv7. Available online: https://github.com/WongKinYiu/yolov7 (accessed on 21 December 2023).
Figure 1. A flowchart of the proposed algorithm. The model can be mainly divided into three parts: the backbone network based on the MOB-SP module is in the red dashed box, the feature extraction network based on ISRDM is in the blue dashed box, and the SAMHead is in the green dashed box, which is used to output prediction information. In addition, the composition details of some basic modules are shown in the yellow box on the left.
Figure 2. Comparison between strip pooling and standard pooling. (a) Vertical strip pooling, (b) horizontal strip pooling, (c) standard square pooling.
Figure 3. The structure of ISRDM is on the left, consisting of two branches: the feature map of one branch is directly convolved, while the feature map of another branch is sequentially convolved through convolutional blocks, stripe pooling, and convolution. The implementation method of strip pooling is located in the dashed box on the right, which includes two branches to perform horizontal and vertical strip pooling on feature vectors. Specifically, the two pooling kernel sizes are set to 1 × 4 and 4 × 1, respectively.
Figure 4. Structure of the MOB-SP module. (a) MOB-SP module when stride = 2. (b) MOB-SP module when stride = 1, which contains a shortcut connection.
Figure 5. Structure of SAM. The feature information of each level is first sparsely sampled using DConv and then aggregated at the same spatial position.
Figure 6. Schematic diagram of CIoU.
Figure 7. Example of successfully detecting the sea–sky line region.
Figure 8. Example of converting sea–sky line detected by traditional algorithms into detection region.
Figure 9. AP-Speed scatter plots of various algorithms on three datasets, with the horizontal axis pointing to the left indicating faster detection speed and the vertical axis pointing upwards indicating higher AP values. From the plots, it can be observed that the proposed algorithm achieved an excellent balance between accuracy and speed on three datasets. (a) InfML-HDD. (b) Singapore-NR. (c) Mar-DCT.
Figure 10. Typical detection results of the proposed algorithm on the InfML-HDD dataset. Each column from left to right represents a category. (a) Thick clouds and strong waves have properties similar to the SSL, posing significant interference to traditional algorithms. (b) The wake of container ships is a significant disturbance. (c) Coastal mountains and ships near the SSLR can interfere with detection. (d) A waveless marine environment with specular reflection and breakwaters. (e) In dense fog environments, waves have more prominent characteristics than the SSL; traditional algorithms are almost ineffective here and some CNN-based algorithms also perform poorly, while the proposed algorithm achieves good detection results.
Figure 11. Typical detection results of the proposed algorithm on two public datasets. The first row is the Singapore-NR dataset, and the second row is the Mar-DCT dataset. The frames in the Singapore-NR dataset have interference from large ships near the sea–sky line, while in Mar-DCT, they are mainly from coastal mountain ranges. The proposed algorithm has achieved satisfactory detection results on both datasets.
Figure 12. AP50 and AP75 for different algorithms on the three datasets. (a) InfML-HDD. (b) Singapore-NR. (c) Mar-DCT. AP50 and AP75 are the average precision obtained with IoU thresholds of 0.5 and 0.75, respectively. The proposed algorithm achieves the highest values on both AP50 and AP75, further indicating the best performance under evaluations of different stringency.
Table 1. Comparison of three datasets.
| Dataset | InfML-HDD | Singapore-NR | Mar-DCT |
|---|---|---|---|
| Image source | Image | Video | Video |
| Frames | 6055 | 13,162 | 7374 |
| Image size | 384 × 288 | 1920 × 1080 | 704 × 576 |
| Image type | Infrared range | Near-infrared range | Infrared range |
| Shooting | Moving | Static | Static |
| Time | Day/Night | Day | Night |
| Image characteristics | Cloud edges and waves; coastal mountains; heavy fog and rain; specular reflection | Large container ships near the sea–sky line; sea waves and reflections | Coastal mountains; low resolution and low contrast |
Table 2. Results of ablation experiments on the InfML-HDD dataset, where bold represents the best results.
| MOB-SP | ISRDM | SAMHead | AP (%) | Params (M) | Acc (%) | FPS |
|---|---|---|---|---|---|---|
|  |  |  | 65.6 | 7.5 | 96.2 | 166.3 |
|  |  | √ | 73.6 | 7.2 | 97.4 | 203.6 |
| √ |  | √ | 73.2 | **6.0** | 97.2 | **241.1** |
|  | √ | √ | 81.2 | 7.6 | 98.6 | 186.7 |
| √ | √ | √ | **82.4** | 6.2 | **99.3** | 237.8 |
Table 3. Comparison of the performance of the various algorithms on the three datasets, where bold represents the best results.
| Algorithm | InfML-HDD AP (%) | InfML-HDD Acc (%) | Singapore-NR AP (%) | Singapore-NR Acc (%) | Mar-DCT AP (%) | Mar-DCT Acc (%) | Params (M) | FPS | ITSI (ms) |
|---|---|---|---|---|---|---|---|---|---|
| LSD [8] | 50.8 | 80.1 | 51.9 | 81.2 | 49.6 | 78.6 | - | 179.7 | 5.56 |
| Edge–Hough [34] | 47.3 | 76.5 | 45.4 | 74.3 | 46.9 | 75.2 | - | **320.9** | **3.12** |
| RANSAC [35] | 40.1 | 71.1 | 36.3 | 65.8 | 37.6 | 67.3 | - | 30.3 | 33.0 |
| MuSCoWERT [11] | 54.3 | 84.9 | 52.1 | 82.3 | 52.4 | 82.6 | - | 103.1 | 9.70 |
| YOLOv5s [36] | 65.6 | 96.2 | 66.4 | 96.4 | 68.1 | 95.6 | 7.5 | 166.3 | 6.01 |
| YOLOv5m [36] | 71.9 | 97.3 | 70.7 | 96.6 | 71.3 | 96.1 | 21.2 | 148.2 | 6.75 |
| YOLOv7-tiny [37] | 62.9 | 95.1 | 64.5 | 95.0 | 60.9 | 94.3 | 6.5 | 191.7 | 5.22 |
| YOLOv7 [37] | 76.8 | 97.8 | 77.6 | 97.1 | 78.3 | 97.2 | 36.9 | 115.5 | 8.66 |
| YOLOv5-MOBv2 [22] | 60.7 | 94.7 | 64.7 | 95.3 | 59.7 | 94.1 | **5.8** | 203.9 | 4.90 |
| Proposed | **82.4** | **99.3** | **80.2** | **99.1** | **79.3** | **98.8** | 6.2 | 237.8 | 4.21 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
