The purpose of ship classification is to identify various types of ships as accurately as possible, which is of great significance for safeguarding maritime traffic rights and interests and improving coastal defense early warning. With the improvement of imaging technologies of all kinds, image-based ship classification has become the mainstream approach to ship target classification and recognition. According to the data source, ship images can be roughly divided into radar images, satellite remote-sensing images, infrared images and visible light images. The most widely used radar imaging technology is synthetic aperture radar (SAR). The advantages of SAR imaging are a wide monitoring range, a short observation period and all-weather operation. On the other hand, the price of using radar is vulnerability to other electromagnetic interference. Moreover, the captured ship targets occupy only a small part of the whole image, so classification methods for radar images are only suitable for larger targets, and the classification effect for small boats at long distances is poor. Optical remote-sensing satellite imaging is easily affected by changes in ocean weather and illumination, which makes long-term real-time monitoring difficult. Infrared imaging can provide rich information on targets and their backgrounds at night or under insufficient light, and it has a strong anti-jamming ability. However, infrared imaging is affected by weather, temperature and other factors; on the sea surface, interference from waves, clouds and the like greatly reduces image accuracy, and infrared imaging cannot provide rich color information when the image quality is low. The visible light image contains gray-level information over multiple bands, and its quality has improved steadily, which makes target features easier to find and extract.
For the problem of ship classification, the actual system can obtain a variety of images. This can be addressed using fusion methods that produce high-resolution multispectral images from a high-resolution panchromatic image and low-resolution multispectral images [1,2].
Several traditional algorithms were suggested by Rainey et al. [3] for feature extraction and identification in ship images, including LBP, HOG and SIFT, together with classifiers such as the nearest-neighbor algorithm and SVM. Arguedas [4] used LBP features to extract texture features from ship images for ship classification. Parameswaran et al. [5] applied the bag-of-words model, originally used in text classification, to ship classification and proposed a two-stage ship recognition technique based on structural features; the method can effectively distinguish ship types such as cargo ships from ship images. Leclerc et al. [6] proposed a commercial ship classification algorithm based on structural feature analysis which distinguishes features such as density estimates, the position of the ship’s integral principal axis and the proportions of the integral quantities of the left, middle and right parts. A synchronous experiment in the East China Sea experimental area showed that the average classification accuracy of the COSMO-SkyMed image quotient method was 89.94%. Liang et al. [7] suggested using a BP neural network to classify six kinds of infrared images. After pre-processing the images, the Hu invariant moments, the edge image and the perimeter-area ratio were selected as features, and the accuracy of a four-layer BP neural network was about 84%. Traditional ship image classification methods are based on expert systems, which recognize ships according to ship type and lack good generalization performance; therefore, ship classification accuracy needs to be enhanced. With the rapid development of deep learning, convolutional neural networks have become a research hotspot in the field of image classification. Rainey et al. [8] trained a convolutional neural network to recognize ships in satellite images and achieved good results. Liu et al. [9] proposed an improved residual network to detect and classify remote-sensing ship images, which is prone to overfitting due to the small dataset. Khellal et al. [10] proposed an extreme learning network to recognize ships in infrared images; the method is suitable for infrared recognition systems, but after extracting extreme learning features it still requires an ensemble of extreme learning machines for classification. A CNN model with multi-resolution input has also been proposed; its performance was evaluated on TerraSAR-X images covering five maritime categories. The classification effect differed across resolutions, but how the change in image resolution affects the internal activations of the CNN remained unclear from the test. To address these issues, the residual network [18] was introduced, the structure of which is shown in Figure 1.
In the residual structure shown in Figure 1, if the input is x, each weight layer is a 3 × 3 convolutional layer and the mapping learned by the parameterized layers in the structure is f(x), then the output of the residual structure is f(x) + x. In the network, assuming that the mth through Mth layers are composed of multiple such consecutive residual structures, the forward propagation process of this part of the network is shown in Equation (1):

x_M = x_m + \sum_{i=m}^{M-1} f(x_i, W_i),   (1)

where x_M is the output of these continuous residual structures, x_m is the input of the first of these layers, W_i is the parameter of the ith layer (from the mth layer to the Mth layer) and x_i is the input of the ith layer.
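As a concrete illustration, the forward pass described above can be sketched in NumPy. This is a minimal sketch, not the paper's implementation: small dense weight layers stand in for the 3 × 3 convolutions, and the dimensions and weight scales are illustrative assumptions.

```python
import numpy as np

def residual_branch(x, w1, w2):
    """The learned mapping f(x, W): two weight layers with a ReLU between.
    Dense layers stand in for 3x3 convolutions to keep the sketch short."""
    return np.maximum(0.0, x @ w1) @ w2

rng = np.random.default_rng(0)
d = 8                          # feature dimension (illustrative)
x_m = rng.normal(size=d)       # input of the first (mth) residual unit

x = x_m.copy()
residual_sum = np.zeros(d)
for _ in range(4):             # layers m..M as consecutive residual units
    w1 = rng.normal(scale=0.1, size=(d, d))
    w2 = rng.normal(scale=0.1, size=(d, d))
    f = residual_branch(x, w1, w2)
    x = x + f                  # identity shortcut: x_{i+1} = x_i + f(x_i, W_i)
    residual_sum += f

# Equation (1): the output equals the input plus the summed residual branches.
assert np.allclose(x, x_m + residual_sum)
```

The assertion at the end simply restates Equation (1): because each unit adds its residual branch onto an unmodified identity path, the output of the stack is the first input plus the sum of all learned residuals.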
When performing backpropagation, according to the chain rule, the calculation of the gradient of the first of these layers is shown in Equation (2):

\frac{\partial loss}{\partial x_m} = \frac{\partial loss}{\partial x_M} \cdot \frac{\partial x_M}{\partial x_m} = \frac{\partial loss}{\partial x_M} \left( 1 + \frac{\partial}{\partial x_m} \sum_{i=m}^{M-1} f(x_i, W_i) \right).   (2)

It can be seen from Equation (2) that the gradient of the mth layer contains a term, ∂loss/∂x_M, propagated directly from the error at the Mth layer. Even if the gradient contributed by the intermediate layers is extremely small, the gradient at this layer will not vanish.
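This effect can be checked numerically. The sketch below is a simplified linear model with assumed dimensions and weight scales: backpropagating through a plain chain multiplies the upstream gradient by a small weight matrix at every layer, so it vanishes, while the identity term contributed by the shortcut connections preserves it.

```python
import numpy as np

rng = np.random.default_rng(1)
d, depth = 6, 30
# Small linear "weight layers"; weights this small shrink the gradient
# at every step of a plain (non-residual) chain.
Ws = [rng.normal(scale=0.05, size=(d, d)) for _ in range(depth)]

g = np.ones(d)                            # upstream gradient at layer M
plain = g.copy()
residual = g.copy()
for W in reversed(Ws):
    plain = W.T @ plain                   # plain net: grad_m = W^T grad_{m+1}
    residual = residual + W.T @ residual  # residual net: (I + W)^T grad_{m+1}

# The plain gradient has effectively vanished after 30 layers,
# while the identity path keeps the residual gradient many orders
# of magnitude larger.
assert np.linalg.norm(plain) < 1e-6
assert np.linalg.norm(residual) > 1e4 * np.linalg.norm(plain)
```

The `1` in Equation (2) corresponds to the `residual + ...` term in the loop: the upstream gradient always passes through unchanged, regardless of how small the weight-dependent term becomes.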
Google’s Inception series of models, from V1 to V3, starts from the width of the model instead of its depth. The design assumes that objects of different sizes require convolution kernels of different sizes, so parallel convolution kernels are adopted. At the same time, the Inception networks also perform well in terms of model size and computational efficiency. For example, using two stacked 3 × 3 convolution kernels instead of one 5 × 5 convolution kernel reduces the number of parameters without weakening the expressive ability.
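The parameter saving from this factorization is easy to verify. The count below assumes C input and C output channels and ignores biases; the channel count is illustrative, not taken from the original text.

```python
# Two stacked 3x3 convolutions cover the same 5x5 receptive field as a
# single 5x5 convolution but need fewer parameters (18*C*C vs 25*C*C).
C = 64                                   # channel count (illustrative)
params_5x5 = 5 * 5 * C * C               # one 5x5 conv, C -> C channels
params_two_3x3 = 2 * (3 * 3 * C * C)     # two 3x3 convs, C -> C channels
assert params_two_3x3 < params_5x5       # 73,728 < 102,400

# Receptive field of two stacked 3x3 kernels: 3 + (3 - 1) = 5 pixels.
assert 3 + (3 - 1) == 5
```

The saving is a factor of 18/25 = 0.72 in this layer, and the extra nonlinearity between the two 3 × 3 layers is part of why expressive ability is not weakened.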
GoogLeNet is a 22-layer deep convolutional neural network in the Inception family, developed by researchers at Google. It was introduced to make classification and detection more efficient and is still widely used in classification tasks.