Author Contributions
Conceptualization, D.W. and L.W.; methodology, D.W. and L.W.; software, D.W.; validation, L.W. and H.L.; formal analysis, M.X.; investigation, D.W.; resources, M.X. and L.W.; data curation, D.W.; writing—original draft preparation, D.W.; writing—review and editing, M.X.; visualization, D.W.; supervision, L.W.; project administration, L.W.; funding acquisition, L.W. All authors have read and agreed to the published version of the manuscript.
Figure 1.
The network framework of MHCNet. Four auxiliary modules are proposed in MHCNet, which are the cross-scale feature-attention module (CSAM), global semantic filtering module (GSFM), double-branch information-fusion module (DBIFM), and similarity-enhancement module (SEM).
Figure 2.
Structure of cross-scale feature-attention module.
Figure 3.
Structure of global semantic filtering module.
Figure 4.
Structure of double-branch information-fusion module.
Figure 5.
Structure of similarity-enhancement module.
Figure 6.
Improved pyramid pooling module structure.
Figure 7.
Heatmaps, where (a) is the heatmap of the backbone network + SEM, (b) is the heatmap after adding CSAM to (a), (c) after further adding GSFM to (b), (d) after further adding DBIFM to (c), and (e) is the ground-truth label. As illustrated in the figure, the model's attention to the changed areas gradually increases as the modules are stacked, indicating that each module improves the model's feature-representation ability.
Figure 8.
BTRS-CD dataset. Each column shows a pair of bi-temporal remote-sensing images: image1 and image2 are the real remote-sensing images, and label is the ground-truth label (white is the changed region, black is the unchanged region), where (a) shows a no-change area, (b) the construction of factories, (c) roads transformed into fields, (d) factories transformed into vacant land, and (e) changes within a building complex.
Figure 9.
Distribution of the change-area proportion in the BTRS-CD dataset. The percentage of changed area is plotted along the horizontal axis, and the number of samples along the vertical axis.
Figure 10.
LEVIR dataset diagram. Each column shows a pair of bi-temporal remote-sensing images: image1 and image2 are the real remote-sensing images, and label is the ground-truth label (white is the changed region, black is the unchanged region), where (a) shows the conversion of good land to buildings, (b) the conversion of forests to buildings (concentrated change areas), (c) the conversion of forests to buildings (scattered change areas), (d) the conversion of forests to buildings (unevenly distributed change areas), and (e) houses built on open land.
Figure 11.
Prediction maps of different algorithms on the TRS-CD dataset. Three pairs of bi-temporal remote-sensing images are compared in (I–III). Image1 and Image2 are bi-temporal Google Earth images; label is the ground-truth label; (a) is the prediction map of our MHCNet; (b–j) are the prediction maps of MFGANnet, BiSeNet, ChangNet, FC_CONC, FC_DIFF, FC_EF, FCN8s, TCDNet, and UNet, respectively.
Figure 12.
Prediction maps of different algorithms on the LEVIR-CD dataset. Three pairs of bi-temporal remote-sensing images are compared in (I–III). Image1 and Image2 are bi-temporal Google Earth images; label is the ground-truth label; (a) is the prediction map of our MHCNet; (b–j) are the prediction maps of MFGANnet, BiSeNet, ChangNet, FC_CONC, FC_DIFF, FC_EF, FCN8s, TCDNet, and UNet, respectively.
Table 1.
Comparative experiment of MHCNet under different backbone networks (bold numbers represent optimal results).
Backbone | ACC (%) | RC (%) | PR (%) | MIoU (%) |
---|---|---|---|---|
VGG16 | 93.96 | 63.83 | 74.63 | 76.70 |
VGG19 | 93.46 | 65.90 | 69.61 | 75.59 |
ResNet18 | 95.79 | 73.92 | 82.30 | 83.56 |
ResNet34 | 96.01 | 75.33 | 82.90 | 84.36 |
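The tables report overall accuracy (ACC), recall (RC), precision (PR), and mean intersection over union (MIoU). A minimal sketch of how these metrics are computed from a binary change map, assuming the standard per-pixel definitions with MIoU averaged over the changed and unchanged classes (the paper's exact averaging convention is not stated here):

```python
import numpy as np

def change_detection_metrics(pred, label):
    """ACC, RC, PR, and MIoU for binary change maps (0 = unchanged, 1 = changed).

    Assumes standard definitions: RC and PR are computed on the change class,
    and MIoU is the mean IoU over both classes.
    """
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    tp = np.sum(pred & label)      # changed pixels correctly detected
    tn = np.sum(~pred & ~label)    # unchanged pixels correctly detected
    fp = np.sum(pred & ~label)     # false alarms
    fn = np.sum(~pred & label)     # missed changes

    acc = (tp + tn) / (tp + tn + fp + fn)              # overall pixel accuracy
    rc = tp / (tp + fn) if (tp + fn) else 0.0          # recall, change class
    pr = tp / (tp + fp) if (tp + fp) else 0.0          # precision, change class
    iou_change = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    iou_unchanged = tn / (tn + fp + fn) if (tn + fp + fn) else 0.0
    miou = (iou_change + iou_unchanged) / 2
    return acc, rc, pr, miou
```

For example, a 2×2 prediction `[[1, 0], [1, 1]]` against label `[[1, 0], [0, 1]]` gives ACC = 0.75, RC = 1.0, PR ≈ 0.667, and MIoU ≈ 0.583.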
Table 2.
Ablation experiments of MHCNet (bold numbers represent optimal results).
Method | ACC (%) | RC (%) | PR (%) | MIoU (%) | Param (M) |
---|---|---|---|---|---|
Backbone | 95.53 | 74.09 | 79.85 | 82.77 | 30.65 |
Backbone + CSAM | 95.71 | 74.55 | 80.63 | 83.14 | 33.45 |
Backbone + CSAM + GSFM | 95.80 | 74.59 | 81.71 | 83.58 | 36.25 |
Backbone + CSAM + GSFM + DBIFM | 95.94 | 74.83 | 82.50 | 84.19 | 40.08 |
Backbone + CSAM + GSFM + DBIFM + SEM | 96.01 | 75.33 | 82.90 | 84.36 | 40.43 |
Table 3.
Comparative experiments on TRS-CD dataset (bold numbers represent optimal and suboptimal results).
Method | ACC (%) | RC (%) | PR (%) | MIoU (%) | Param (M) | Flops (GMac) |
---|---|---|---|---|---|---|
BiSeNet [44] | 95.21 | 71.92 | 78.56 | 81.33 | 22.02 | 22.48 |
FCN8s [45] | 92.85 | 66.06 | 66.84 | 74.49 | 18.65 | 80.68 |
UNet [46] | 92.68 | 59.67 | 70.00 | 73.18 | 13.42 | 124.21 |
FC_DIFF [47] | 91.12 | 39.27 | 74.83 | 65.87 | 11.35 | 19.29 |
FC_EF [47] | 90.11 | 48.06 | 67.38 | 66.51 | 11.35 | 14.79 |
FC_CONC [47] | 91.58 | 51.25 | 70.87 | 69.61 | 11.55 | 19.30 |
ChangNet [48] | 94.18 | 62.78 | 75.62 | 76.88 | 23.52 | 42.73 |
TCDNet [49] | 95.07 | 69.98 | 79.31 | 80.97 | 23.28 | 32.65 |
MFGANnet [50] | 95.54 | 72.40 | 80.09 | 82.32 | 33.53 | 52.82 |
MHCNet (Ours) | 96.01 | 75.33 | 82.90 | 84.36 | 40.43 | 59.07 |
Table 4.
Comparison experiments on the LEVIR-CD dataset (bold numbers represent optimal and suboptimal results).
Method | ACC (%) | RC (%) | PR (%) | MIoU (%) | Param (M) | Flops (GMac) |
---|---|---|---|---|---|---|
BiSeNet | 98.04 | 80.49 | 78.74 | 83.36 | 22.02 | 22.48 |
FCN8s | 98.39 | 79.08 | 83.33 | 84.68 | 18.65 | 80.68 |
UNet | 98.62 | 81.32 | 84.69 | 86.25 | 13.42 | 124.21 |
FC_DIFF | 98.46 | 78.84 | 85.72 | 85.26 | 11.35 | 19.29 |
FC_EF | 97.94 | 80.26 | 78.07 | 82.86 | 11.35 | 14.79 |
FC_CONC | 98.54 | 79.72 | 86.53 | 86.09 | 11.55 | 19.30 |
TCDNet | 98.20 | 77.02 | 83.05 | 83.63 | 23.28 | 32.65 |
ChangNet | 98.12 | 79.57 | 81.21 | 83.74 | 23.52 | 42.73 |
MFGANnet | 98.30 | 78.49 | 84.70 | 84.73 | 33.53 | 52.82 |
MHCNet (Ours) | 98.65 | 81.79 | 86.59 | 86.92 | 40.43 | 59.07 |
Table 5.
Comparative experiments on TRS-LEVIR dataset (bold numbers represent optimal and suboptimal results).
Method | ACC (%) | RC (%) | PR (%) | MIoU (%) | Param (M) | Flops (GMac) |
---|---|---|---|---|---|---|
BiSeNet | 91.97 | 60.65 | 61.69 | 66.08 | 22.02 | 22.48 |
FCN8s | 92.08 | 59.97 | 57.53 | 64.25 | 18.65 | 80.68 |
UNet | 92.28 | 61.44 | 62.48 | 65.55 | 13.42 | 124.21 |
FC_DIFF | 90.18 | 44.55 | 56.74 | 63.93 | 11.35 | 19.29 |
FC_EF | 91.74 | 45.19 | 60.72 | 64.95 | 11.35 | 14.79 |
FC_CONC | 90.46 | 44.85 | 59.11 | 63.86 | 11.55 | 19.30 |
TCDNet | 90.83 | 63.62 | 63.32 | 66.31 | 23.28 | 32.65 |
ChangNet | 91.07 | 54.07 | 56.68 | 62.80 | 23.52 | 42.73 |
MFGANnet | 92.16 | 61.47 | 64.27 | 67.94 | 33.53 | 52.82 |
MHCNet (Ours) | 92.44 | 63.30 | 65.12 | 68.71 | 40.43 | 59.07 |
Table 6.
Comparative experiments on LEVIR-TRS dataset (bold numbers represent optimal and suboptimal results).
Method | ACC (%) | RC (%) | PR (%) | MIoU (%) | Param (M) | Flops (GMac) |
---|---|---|---|---|---|---|
BiSeNet | 88.51 | 55.74 | 58.26 | 65.31 | 22.02 | 22.48 |
FCN8s | 88.10 | 55.15 | 57.73 | 62.29 | 18.65 | 80.68 |
UNet | 88.28 | 56.83 | 58.14 | 65.74 | 13.42 | 124.21 |
FC_DIFF | 88.86 | 53.26 | 52.35 | 65.67 | 11.35 | 19.29 |
FC_EF | 87.54 | 52.15 | 57.74 | 64.57 | 11.35 | 14.79 |
FC_CONC | 88.61 | 54.36 | 51.49 | 65.55 | 11.55 | 19.30 |
TCDNet | 88.48 | 55.57 | 55.84 | 66.24 | 23.28 | 32.65 |
ChangNet | 88.23 | 56.09 | 54.70 | 64.10 | 23.52 | 42.73 |
MFGANnet | 88.39 | 56.17 | 56.31 | 66.34 | 33.53 | 52.82 |
MHCNet (Ours) | 88.90 | 57.12 | 58.02 | 66.78 | 40.43 | 59.07 |