Training Convolutional Neural Networks with Multi-Size Images and Triplet Loss for Remote Sensing Scene Classification

Jianming Zhang; Chaoquan Lu; Jin Wang; Xiao-Guang Yue; Se-Jung Lim; Zafer Al-Makhadmeh; Amr Tolba

doi:10.3390/s20041188

Sensors ◽

10.3390/s20041188 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1188 ◽

Cited By ~ 10

Author(s):

Jianming Zhang ◽

Chaoquan Lu ◽

Jin Wang ◽

Xiao-Guang Yue ◽

Se-Jung Lim ◽

...

Keyword(s):

Remote Sensing ◽

Classification Accuracy ◽

Model Parameters ◽

Classification Algorithms ◽

Scene Classification ◽

Training Strategy ◽

Network Training ◽

Training Stage ◽

Triplet Loss ◽

And Training

Many remote sensing scene classification algorithms improve their classification accuracy by adding modules, which increases the parameters and computing overhead of the model at the inference stage. In this paper, we explore how to improve the classification accuracy of the model without adding modules at the inference stage. First, we propose a network training strategy of training with multi-size images. Then, we introduce more supervision information through triplet loss and design a branch for the triplet loss. In addition, dropout is introduced between the feature extractor and the classifier to avoid over-fitting. These modules work only at the training stage and do not increase the model parameters at the inference stage. We use ResNet18 as the baseline and add the three modules to it. We perform experiments on three datasets: AID, NWPU-RESISC45, and OPTIMAL. Experimental results show that our model combined with the three modules is more competitive than many existing classification algorithms. In addition, ablation experiments on OPTIMAL show that dropout, triplet loss, and training with multi-size images improve the overall accuracy of the model on the test set by 0.53%, 0.38%, and 0.7%, respectively. The combination of the three modules improves the overall accuracy of the model by 1.61%. The three modules therefore improve the classification accuracy of the model without increasing model parameters at the inference stage. Training with multi-size images brings a greater gain in accuracy than either of the other two modules, but combining all three gives the best result.
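
The triplet loss mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the Euclidean distance, the margin value of 1.0, and the function names below are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: push the anchor-negative distance to exceed
    the anchor-positive distance by at least `margin` (assumed value)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(gap, 0.0)


# Toy 2-D embeddings (hypothetical values).
anchor = [0.0, 0.0]
same_class = [0.0, 1.0]    # distance 1 from the anchor
other_class = [3.0, 4.0]   # distance 5 from the anchor

loss_easy = triplet_loss(anchor, same_class, other_class)  # already separated
loss_hard = triplet_loss(anchor, other_class, same_class)  # roles swapped
print(loss_easy, loss_hard)
```

The loss is zero once the negative is farther from the anchor than the positive by the margin, so only triplets that violate the margin contribute a training signal.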

A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13101950 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1950

Author(s):

Cuiping Shi ◽

Xin Zhao ◽

Liguo Wang

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Classification Accuracy ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Remote Sensing Image ◽

Classification Performance ◽

Attention Mechanism ◽

Scene Classification

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve the classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, thereby increasing the complexity of the model. To solve this problem, in this paper, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted with multi convolution cooperation. Then, the weights of the feature are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depth separable convolution and asymmetric convolution are implemented to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.
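
To make the parameter savings from depth separable and asymmetric convolution concrete, here is a small counting sketch. It uses the standard textbook parameter counts, ignores biases, and assumes equal input and output channels for the asymmetric case; the channel sizes are hypothetical and are not taken from AMB-CNN.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1
    pointwise convolution (biases ignored)."""
    return k * k * c_in + c_in * c_out


def asymmetric_pair_params(channels, k):
    """Weights in a 1 x k convolution followed by a k x 1 convolution,
    assuming the channel count stays the same throughout."""
    return 2 * k * channels * channels


std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
asym = asymmetric_pair_params(64, 3)
square = standard_conv_params(64, 64, 3)
print(std, sep, asym, square)
```

For these illustrative sizes the separable layer needs roughly an eighth of the weights of the standard layer, and the asymmetric pair needs two thirds of the weights of the square kernel it replaces.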

A Lightweight Convolutional Neural Network Based on Group-Wise Hybrid Attention for Remote Sensing Scene Classification

Remote Sensing ◽

10.3390/rs14010161 ◽

2021 ◽

Vol 14 (1) ◽

pp. 161

Author(s):

Cuiping Shi ◽

Xinlei Zhang ◽

Jingwei Sun ◽

Liguo Wang

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Spatial Attention ◽

Classification Performance ◽

Model Parameters ◽

Scene Classification ◽

Spatial Dimensions ◽

Work First ◽

Channel Dimension

With the development of computer vision, attention mechanisms have been widely studied. Although the introduction of an attention module into a network model can help to improve classification performance on remote sensing scene images, the direct introduction of an attention module can increase the number of model parameters and amount of calculation, resulting in slower model operations. To solve this problem, we carried out the following work. First, a channel attention module and spatial attention module were constructed. The input features were enhanced through channel attention and spatial attention separately, and the features recalibrated by the attention modules were fused to obtain the features with hybrid attention. Then, to reduce the increase in parameters caused by the attention module, a group-wise hybrid attention module was constructed. The group-wise hybrid attention module divided the input features into four groups along the channel dimension, then used the hybrid attention mechanism to enhance the features in the channel and spatial dimensions for each group, then fused the features of the four groups along the channel dimension. Through the use of the group-wise hybrid attention module, the number of parameters and computational burden of the network were greatly reduced, and the running time of the network was shortened. Finally, a lightweight convolutional neural network was constructed based on the group-wise hybrid attention (LCNN-GWHA) for remote sensing scene image classification. Experiments on four open and challenging remote sensing scene datasets demonstrated that the proposed method has great advantages, in terms of classification accuracy, even with a very low number of parameters.
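
The split, enhance, and fuse pattern described above can be sketched in plain Python. Only the four-way split along the channel dimension comes from the abstract; `toy_attention` is a hypothetical stand-in that rescales each channel by a sigmoid of its mean, and it is not the hybrid attention module of LCNN-GWHA.

```python
import math


def split_channels(channels, groups=4):
    """Split a list of channels into `groups` equal chunks along the
    channel dimension."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by the group count")
    size = len(channels) // groups
    return [channels[i * size:(i + 1) * size] for i in range(groups)]


def toy_attention(group):
    """Placeholder enhancement (assumption): scale every channel by the
    sigmoid of its mean activation."""
    enhanced = []
    for channel in group:
        mean = sum(channel) / len(channel)
        weight = 1.0 / (1.0 + math.exp(-mean))
        enhanced.append([weight * value for value in channel])
    return enhanced


def group_wise_enhance(channels, groups=4):
    """Enhance each group separately, then fuse by concatenating the groups
    back along the channel dimension."""
    fused = []
    for group in split_channels(channels, groups):
        fused.extend(toy_attention(group))
    return fused


# Eight channels, each flattened to four spatial positions (toy values).
features = [[float(c)] * 4 for c in range(8)]
output = group_wise_enhance(features)
print(len(output), len(output[0]))
```

The output keeps the input's channel count and spatial size, which is what lets such a module be dropped into an existing network. The parameter savings reported in the abstract come from the real attention layers operating on a quarter of the channels at a time, which this placeholder does not model.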

AMN: Attention Metric Network for One-Shot Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs12244046 ◽

2020 ◽

Vol 12 (24) ◽

pp. 4046

Author(s):

Xirong Li ◽

Fangling Pu ◽

Rui Yang ◽

Rong Gui ◽

Xin Xu

Keyword(s):

Remote Sensing ◽

Classification Problem ◽

Remote Sensing Image ◽

Similarity Measurement ◽

Scene Classification ◽

Feature Maps ◽

Training Strategy ◽

Measurement Results ◽

Classification Tasks ◽

The Cost

In recent years, deep neural network (DNN) based scene classification methods have achieved promising performance. However, the data-driven training strategy requires a large number of labeled samples, so DNN-based methods cannot solve the scene classification problem when only a small number of labeled images are available. As the number and variety of scene images continue to grow, the cost and difficulty of manual annotation also increase. Therefore, it is important to address the scene classification problem with only a few labeled samples. In this paper, we propose an attention metric network (AMN) in the framework of few-shot learning (FSL) to improve the performance of one-shot scene classification. AMN is composed of a self-attention embedding network (SAEN) and a cross-attention metric network (CAMN). In SAEN, we adopt the spatial attention and the channel attention of feature maps to obtain abundant features of scene images. In CAMN, we propose a novel cross-attention mechanism that highlights the features most relevant to different categories and improves the similarity measurement performance. A loss function combining mean square error (MSE) loss with multi-class N-pair loss is developed, which helps to promote the intra-class similarity and inter-class variance of embedding features and also improves the similarity measurement results. Experiments on the NWPU-RESISC45 dataset and the RSD-WHU46 dataset demonstrate that our method achieves state-of-the-art results on one-shot remote sensing image scene classification tasks.

DropBand: A Simple and Effective Method for Promoting the Scene Classification Accuracy of Convolutional Neural Networks for VHR Remote Sensing Imagery

IEEE Geoscience and Remote Sensing Letters ◽

10.1109/lgrs.2017.2785261 ◽

2018 ◽

Vol 15 (2) ◽

pp. 257-261 ◽

Cited By ~ 10

Author(s):

Naisen Yang ◽

Hong Tang ◽

Hongquan Sun ◽

Xin Yang

Keyword(s):

Remote Sensing ◽

Neural Networks ◽

Convolutional Neural Networks ◽

Classification Accuracy ◽

Scene Classification ◽

Remote Sensing Imagery

Research on Indoor Scene Classification Mechanism Based on Multiple Descriptors Fusion

Mobile Information Systems ◽

10.1155/2020/4835198 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Ping Ji ◽

Danyang Qin ◽

Pan Feng ◽

Tingting Lan ◽

Guanyu Sun

Keyword(s):

Classification Accuracy ◽

Region Of Interest ◽

Combination Method ◽

Classification Algorithms ◽

Scene Classification ◽

Depth Images ◽

Large Size ◽

Indoor Scene ◽

Components Analysis ◽

Performance Analysis And Simulation

This study addresses the limitations that non-ROI (region of interest) information interference imposes on traditional scene classification algorithms, including multiscale changes, varying visual angles, and high similarity between classes. An effective indoor scene classification mechanism based on multiple descriptors fusion is proposed, which introduces depth images to improve descriptor efficiency. The greedy descriptor filter algorithm (GDFA) is proposed to obtain valuable descriptors, and a multiple descriptor combination method is also given to further improve descriptor performance. Performance analysis and simulation results show that multiple descriptors fusion achieves higher classification accuracy than principal components analysis (PCA) when the descriptor set is of medium or large size, and that it also improves classification accuracy over other existing algorithms.

A NOVEL FRAMEWORK FOR REMOTE SENSING IMAGE SCENE CLASSIFICATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-657-2018 ◽

2018 ◽

Vol XLII-3 ◽

pp. 657-663 ◽

Cited By ~ 5

Author(s):

S. Jiang ◽

H. Zhao ◽

W. Wu ◽

Q. Tan

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Semantic Category ◽

Distribution Patterns ◽

Remote Sensing Image ◽

Scene Classification ◽

Remote Sensing Images ◽

Training Stage ◽

Feature Extractor ◽

High Level

High resolution remote sensing (HRRS) image scene classification aims to label an image with a specific semantic category. HRRS images contain more details of the ground objects and their spatial distribution patterns than low spatial resolution images. Scene classification can bridge the gap between low-level features and high-level semantics. It can be applied in urban planning, target detection and other fields. This paper proposes a novel framework for HRRS image scene classification. This framework combines the convolutional neural network (CNN) and XGBoost, using the CNN as a feature extractor and XGBoost as a classifier. The framework is then evaluated on two different HRRS image datasets: the UC-Merced dataset and the NWPU-RESISC45 dataset. Our framework achieved satisfactory accuracies on the two datasets, 95.57% and 83.35% respectively. The experimental results show that our framework is effective for remote sensing image classification. Furthermore, we believe this framework will be more practical for further HRRS scene classification, since it costs less time at the training stage.

Scene classification of remote sensing image based on compound pruning

MATEC Web of Conferences ◽

10.1051/matecconf/202133606030 ◽

2021 ◽

Vol 336 ◽

pp. 06030

Author(s):

Fengbing Jiang ◽

Fang Li ◽

Guoliang Yang

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Image Data ◽

Remote Sensing Image ◽

Convolution Neural Network ◽

Model Parameters ◽

Scene Classification ◽

Remote Sensing Image Classification ◽

And Storage

Convolution neural networks for remote sensing image scene classification consume a lot of time and storage space to train, test and save the model. In this paper, elastic variables are first defined for convolution layer filters, and a compound pruning method for convolution neural networks is proposed that combines filter elasticity with the batch normalization scaling factor. Only the pruning rate hyperparameter needs to be adjusted during training. In the process of training, the performance of the model can be improved by means of transfer learning. In this paper, algorithm tests are carried out on NWPU-RESISC45 remote sensing image data to verify the effectiveness of the proposed method. According to the experimental results, the proposed method can not only effectively reduce the number of model parameters and computation, but also ensure the accuracy of the algorithm in remote sensing image classification.
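
One ingredient named above, ranking filters by their batch normalization scaling factor under a single pruning rate, can be sketched as follows. This sketch covers only the scaling-factor part: the abstract does not define the elastic variables, so they are left out, and the function name and example values are hypothetical.

```python
def filters_to_prune(gammas, prune_rate):
    """Return the indices of the filters whose batch-norm scaling factors
    have the smallest magnitude, for a given pruning rate in [0, 1)."""
    if not 0.0 <= prune_rate < 1.0:
        raise ValueError("prune_rate must be in [0, 1)")
    n_prune = int(len(gammas) * prune_rate)
    # Rank filter indices by |gamma|, smallest first.
    ranked = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]))
    return sorted(ranked[:n_prune])


# Hypothetical scaling factors for an 8-filter layer.
gammas = [0.9, 0.01, -0.5, 0.02, 0.7, -0.03, 0.4, 0.6]
pruned = filters_to_prune(gammas, prune_rate=0.25)
print(pruned)
```

A small scaling factor means the filter's normalized output is almost suppressed, which is why its magnitude is a common proxy for filter importance.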

RESEARCH ON REMOTE SENSING IMAGE CLASSIFICATION BASED ON FEATURE LEVEL FUSION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-2185-2018 ◽

2018 ◽

Vol XLII-3 ◽

pp. 2185-2189 ◽

Cited By ~ 1

Author(s):

L. Yuan ◽

G. Zhu

Keyword(s):

Remote Sensing ◽

Image Classification ◽

Classification Accuracy ◽

Remote Sensing Image ◽

Support Vector ◽

Classification Algorithms ◽

Remote Sensing Image Classification ◽

Feature Level Fusion ◽

Fused Image ◽

Level Fusion

Remote sensing image classification, an important direction of remote sensing image processing and application, has been widely studied. However, existing classification algorithms still suffer from misclassification and missing points, which lowers the final classification accuracy. In this paper, we select Sentinel-1A and Landsat8 OLI images as data sources and propose a classification method based on feature level fusion. We compare three feature level fusion algorithms (i.e., Gram-Schmidt spectral sharpening, Principal Component Analysis transform, and Brovey transform) and then select the best fused image for the classification experiment. In the classification process, we choose four image classification algorithms (i.e., Minimum distance, Mahalanobis distance, Support Vector Machine, and ISODATA) for a comparison experiment. We use overall classification precision and the Kappa coefficient as the accuracy evaluation criteria and analyse the four classification results of the fused image. The experimental results show that the fusion effect of Gram-Schmidt spectral sharpening is better than that of the other methods. Among the four classification algorithms, the fused image is best suited to Support Vector Machine classification: the overall classification precision is 94.01% and the Kappa coefficient is 0.91. The fused image from Sentinel-1A and Landsat8 OLI not only has more spatial information and spectral texture characteristics, but also enhances the distinguishing features of the images. The proposed method helps improve the accuracy and stability of remote sensing image classification.
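
The two evaluation criteria named above, overall classification precision and the Kappa coefficient, can both be computed from a confusion matrix. The sketch below uses the standard definitions of overall accuracy and Cohen's kappa; the 2 x 2 confusion matrix is made up for illustration and is not data from the paper.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for the agreement expected by
    chance, computed from row and column totals."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical confusion matrix: rows are true classes, columns predictions.
cm = [[45, 5],
      [10, 40]]
oa = overall_accuracy(cm)
kappa = cohen_kappa(cm)
print(round(oa, 2), round(kappa, 2))
```

Kappa is lower than overall accuracy here because half of the agreement in this balanced toy example would be expected by chance alone.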

Remote Sensing Image Scene Classification via Label Augmentation and Intra-Class Constraint

Remote Sensing ◽

10.3390/rs13132566 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2566

Author(s):

Hao Xie ◽

Yushi Chen ◽

Pedram Ghamisi

Keyword(s):

Remote Sensing ◽

Classification Accuracy ◽

Data Augmentation ◽

Original Data ◽

Small Data ◽

Scene Classification ◽

Training Set ◽

Output Distribution ◽

Training Samples ◽

Leibler Divergence

In recent years, many convolutional neural network (CNN)-based methods have been proposed to address the scene classification tasks of remote sensing images. Since the number of training samples in RS datasets is generally small, data augmentation is often used to expand the training set. It is, however, not appropriate when original data augmentation methods keep the label and change the content of the image at the same time. In this study, label augmentation (LA) is presented to fully utilize the training set by assigning a joint label to each generated image, which considers the label and data augmentation at the same time. Moreover, the output of images obtained by different data augmentation is aggregated in the test process. However, the augmented samples increase the intra-class diversity of the training set, which is a challenge to complete the following classification process. To address the above issue and further improve classification accuracy, Kullback–Leibler divergence (KL) is used to constrain the output distribution of two training samples with the same scene category to generate a consistent output distribution. Extensive experiments were conducted on widely-used UCM, AID and NWPU datasets. The proposed method can surpass the other state-of-the-art methods in terms of classification accuracy. For example, on the challenging NWPU dataset, competitive overall accuracy (i.e., 91.05%) is obtained with a 10% training ratio.
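
The intra-class constraint above relies on the Kullback–Leibler divergence between the output distributions of two samples from the same scene category. A minimal sketch of that quantity follows. The logits are hypothetical, and the abstract does not state the direction of the divergence or whether it is symmetrized, so the one-directional form below is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length; `eps`
    guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


# Hypothetical classifier outputs for two images of the same scene class.
p = softmax([2.0, 1.0, 0.1])
q = softmax([1.5, 1.2, 0.3])

kl_pq = kl_divergence(p, q)
kl_pp = kl_divergence(p, p)
print(kl_pq, kl_pp)
```

Minimizing this value during training pulls the two output distributions together. It reaches zero only when they coincide.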

LPIN: A Lightweight Progressive Inpainting Network for Improving the Robustness of Remote Sensing Images Scene Classification

Remote Sensing ◽

10.3390/rs14010053 ◽

2021 ◽

Vol 14 (1) ◽

pp. 53

Author(s):

Weining An ◽

Xinqi Zhang ◽

Hang Wu ◽

Wenchang Zhang ◽

Yaohua Du ◽

...

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Classification Accuracy ◽

Image Inpainting ◽

Intrinsic Noise ◽

Transmission Efficiency ◽

Atmospheric Environment ◽

Combined Approach ◽

Scene Classification ◽

High Level

At present, the classification accuracy of high-resolution Remote Sensing Image Scene Classification (RSISC) has reached a quite high level on standard datasets. However, in practical applications, the intrinsic noise of satellite sensors and the disturbance of the atmospheric environment often degrade real Remote Sensing (RS) images. This introduces defects into the images, which affects the performance and reduces the robustness of RSISC methods. Moreover, due to restrictions on memory and power consumption, the methods also need a small number of parameters and fast computing speed to be implemented on small portable systems such as unmanned aerial vehicles. In this paper, a Lightweight Progressive Inpainting Network (LPIN) and a novel combined approach of LPIN and existing RSISC methods are proposed to improve the robustness of RSISC tasks and satisfy the requirements of methods on portable systems. The defects in real RS images are inpainted by LPIN to provide a purified input for classification. With the combined approach, the classification accuracy on RS images with defects can be improved to the original level of those without defects. The LPIN is designed as a lightweight model. Measures are adopted to ensure a high gradient transmission efficiency while reducing the number of network parameters. Multiple loss functions are used to get reasonable and realistic inpainting results. Extensive image inpainting tests of LPIN and classification tests with the combined approach are carried out on the NWPU-RESISC45, UC Merced Land-Use and AID datasets. They indicate that the LPIN achieves state-of-the-art inpainting quality with fewer parameters and a faster inpainting speed. Furthermore, the combined approach keeps the classification accuracy on RS images with defects comparable to that on images without defects, which will improve the robustness of high-resolution RSISC tasks.
