Dense Connectivity Based Two-Stream Deep Feature Fusion Framework for Aerial Scene Classification

Yunlong Yu; Fuxian Liu

doi:10.3390/rs10071158

Remote Sensing ◽

10.3390/rs10071158 ◽

2018 ◽

Vol 10 (7) ◽

pp. 1158 ◽

Cited By ~ 27

Author(s):

Yunlong Yu ◽

Fuxian Liu

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Local Binary Patterns ◽

Aerial Image ◽

Great Success ◽

Scene Classification ◽

Fusion Model ◽

Deep Architecture ◽

Deep Feature ◽

Training Samples

Aerial scene classification is an active and challenging problem in high-resolution remote sensing imagery understanding. Deep learning models, especially convolutional neural networks (CNNs), have achieved prominent performance in this field. The extraction of deep features from the layers of a CNN model is widely used in these CNN-based methods. Although the CNN-based approaches have obtained great success, there is still plenty of room to further increase the classification accuracy. In particular, fusing deep features with other features has great potential to improve aerial scene classification performance. Therefore, we propose two effective architectures based on the idea of feature-level fusion. The first architecture, i.e., the texture coded two-stream deep architecture, uses the raw RGB network stream and the mapped local binary patterns (LBP) coded network stream to extract two different sets of features and fuses them using a novel deep feature fusion model. In the second architecture, i.e., the saliency coded two-stream deep architecture, we employ the saliency coded network stream as the second stream and fuse it with the raw RGB network stream using the same feature fusion model. For validation and comparison, our proposed architectures are evaluated via comprehensive experiments with three publicly available remote sensing scene datasets. The classification accuracies of the saliency coded two-stream architecture with our feature fusion model reach 97.79%, 98.90%, 94.09%, 95.99%, 85.02%, and 87.01% on the UC-Merced dataset (50% and 80% training samples), the Aerial Image Dataset (AID) (20% and 50% training samples), and the NWPU-RESISC45 dataset (10% and 20% training samples), respectively, outperforming state-of-the-art methods.
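As a rough illustration of what an LBP-coded input stream starts from, the sketch below computes basic 3x3 local binary pattern codes for a grayscale image. It is a minimal sketch under stated assumptions: the function name `lbp_codes`, the neighbour ordering, and the `>=` threshold rule are illustrative choices, and the further "mapping" of LBP codes that the abstract mentions is not reproduced here.

```python
from typing import List

# Clockwise 8-neighbour offsets starting at the top-left pixel.
# The starting point and direction are an arbitrary but fixed convention (assumption).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image: List[List[int]]) -> List[List[int]]:
    """Return basic 3x3 LBP codes for the interior pixels of a grayscale image.

    Each neighbour that is >= the centre pixel contributes one bit, so every
    code lies in the range 0..255. Border pixels are skipped, so the output
    is two rows and two columns smaller than the input.
    """
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


# A flat patch sets all eight bits; an isolated bright pixel sets none.
print(lbp_codes([[5, 5, 5], [5, 5, 5], [5, 5, 5]]))  # [[255]]
print(lbp_codes([[0, 0, 0], [0, 9, 0], [0, 0, 0]]))  # [[0]]
```

The resulting code image would then be fed to the second network stream in place of the raw RGB pixels; how the codes are mapped before that step is specific to the paper and is left out of this sketch.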

Deep Discriminative Representation Learning with Attention Map for Scene Classification

Remote Sensing ◽

10.3390/rs12091366 ◽

2020 ◽

Vol 12 (9) ◽

pp. 1366 ◽

Cited By ~ 5

Author(s):

Jun Li ◽

Daoyu Lin ◽

Yang Wang ◽

Guangluan Xu ◽

Yunyan Zhang ◽

...

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Representation Learning ◽

Classification Performance ◽

Great Success ◽

Scene Classification ◽

Remote Sensing Images ◽

Discriminative Ability ◽

Feature Representations ◽

Benchmark Datasets

In recent years, convolutional neural networks (CNNs) have shown great success in the scene classification of computer vision images. Although these CNNs can achieve excellent classification accuracy, the discriminative ability of feature representations extracted from CNNs is still limited in distinguishing more complex remote sensing images. Therefore, we propose a unified feature fusion framework based on an attention mechanism in this paper, which is called Deep Discriminative Representation Learning with Attention Map (DDRL-AM). Firstly, by applying the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm, attention maps associated with the predicted results are generated in order to make CNNs focus on the most salient parts of the image. Secondly, a spatial feature transformer (SFT) is designed to extract discriminative features from attention maps. Then an innovative two-channel CNN architecture is proposed by the fusion of features extracted from attention maps and the RGB (red green blue) stream. A new objective function that considers both center loss and cross-entropy loss is optimized to decrease the influence of inter-class dispersion and within-class variance. In order to show its effectiveness in classifying remote sensing images, the proposed DDRL-AM method is evaluated on four public benchmark datasets. The experimental results demonstrate the competitive scene classification performance of the DDRL-AM approach. Moreover, the visualization of features extracted by the proposed DDRL-AM method indicates that the discriminative ability of the features has been increased.
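The Grad-CAM step mentioned above can be sketched in a few lines once the feature maps and their gradients are available. This is a minimal sketch, assuming the activations of one convolutional layer and the gradients of the predicted class score with respect to them have already been extracted from a network; the function name `grad_cam` and the nested-list data layout are illustrative, not taken from the paper.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Combine feature maps into one attention map, Grad-CAM style.

    activations[k] is the k-th feature map of the chosen layer and
    gradients[k] holds d(class score) / d(activations[k]) at each position.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight: the spatial average of that channel's gradient.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]
    # ReLU keeps only the regions that support the class score.
    return [[max(0.0, value) for value in row] for row in cam]


acts = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
print(grad_cam(acts, grads))  # [[0.0, 0.0], [1.0, 2.0]]
```

In practice the map is usually upsampled to the input resolution before being used as an attention input; that step is omitted here.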

Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images

Remote Sensing ◽

10.3390/rs11171996 ◽

2019 ◽

Vol 11 (17) ◽

pp. 1996 ◽

Cited By ~ 7

Author(s):

Ruixi Zhu ◽

Li Yan ◽

Nan Mo ◽

Yi Liu

Keyword(s):

Remote Sensing ◽

Loss Function ◽

Feature Fusion ◽

Cross Entropy ◽

Scene Classification ◽

Remote Sensing Images ◽

Graphic Processing Units ◽

Entropy Loss ◽

Deep Feature

Scene classification of high-resolution remote sensing images (HRRSI) is one of the most important means of land-cover classification. Deep learning techniques, especially the convolutional neural network (CNN), have been widely applied to the scene classification of HRRSI due to the advancement of graphic processing units (GPU). However, they tend to extract features from the whole image rather than from discriminative regions. The visual attention mechanism can force the CNN to focus on discriminative regions, but it may suffer from the influence of intra-class diversity and repeated texture. Motivated by these problems, we propose an attention-based deep feature fusion (ADFF) framework that consists of three parts, namely attention maps generated by Gradient-weighted Class Activation Mapping (Grad-CAM), a multiplicative fusion of deep features, and the center-based cross-entropy loss function. First, we use attention maps generated by Grad-CAM as an explicit input in order to force the network to concentrate on discriminative regions. Then, deep features derived from the original images and from the attention maps are fused by multiplicative fusion, in order to improve the ability to distinguish scenes with repeated texture while also taking the salient regions into account. Finally, the center-based cross-entropy loss function, which combines the cross-entropy loss and the center loss, is applied to the fused features during backpropagation so as to reduce the effect of intra-class diversity on feature representations. The proposed ADFF architecture is tested on three benchmark datasets to show its performance in scene classification. The experiments confirm that the proposed method outperforms most competitive scene classification methods, with an average overall accuracy of 94% under different training ratios.
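The two numerical ingredients named in this abstract, multiplicative fusion and a loss that adds a center term to cross-entropy, can be sketched as follows. This is a hedged illustration only: element-wise multiplication as the fusion operator, the standard center-loss form of half the squared distance to the class center, and the weighting factor `lam` are assumptions, since the abstract does not give the exact formulation.

```python
import math
from typing import List, Sequence


def multiplicative_fusion(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Fuse two equal-length feature vectors by element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature: Sequence[float], center: Sequence[float]) -> float:
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits, label, feature, centers, lam=0.5):
    """Cross-entropy plus lam times the center loss (lam is a hypothetical weight)."""
    return cross_entropy(logits, label) + lam * center_loss(feature, centers[label])


fused = multiplicative_fusion([1.0, 2.0], [3.0, 4.0])
print(fused)  # [3.0, 8.0]
print(center_loss([3.0, 0.0], [0.0, 4.0]))  # 12.5
```

The center term pulls features of the same class toward a shared point, which is how such a loss is expected to counter intra-class diversity; in a real training loop the class centers would be updated alongside the network weights.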

Self-Attention-Based Deep Feature Fusion for Remote Sensing Scene Classification

IEEE Geoscience and Remote Sensing Letters ◽

10.1109/lgrs.2020.2968550 ◽

2021 ◽

Vol 18 (1) ◽

pp. 43-47 ◽

Cited By ~ 4

Author(s):

Ran Cao ◽

Leyuan Fang ◽

Ting Lu ◽

Nanjun He

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Scene Classification ◽

Deep Feature

A Novel Deep Feature Fusion Network For Remote Sensing Scene Classification

IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium ◽

10.1109/igarss.2019.8898900 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yangyang Li ◽

Qi Wang ◽

Xiaoxu Liang ◽

Licheng Jiao

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Scene Classification ◽

Deep Feature

Retraction: Zhu R. et al. Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images. Remote Sensing. 2019, 11(17), 1996

Remote Sensing ◽

10.3390/rs12040742 ◽

2020 ◽

Vol 12 (4) ◽

pp. 742 ◽

Cited By ~ 1

Author(s):

Ruixi Zhu ◽

Li Yan ◽

Nan Mo ◽

Yi Liu

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Research Method ◽

Feature Fusion ◽

Scene Classification ◽

Remote Sensing Images ◽

Deep Feature

We have been made aware that the innovative contributions, research method and the majority of the content of this article [...]

Deep Feature Fusion for VHR Remote Sensing Scene Classification

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2017.2700322 ◽

2017 ◽

Vol 55 (8) ◽

pp. 4775-4784 ◽

Cited By ~ 146

Author(s):

Souleyman Chaib ◽

Huan Liu ◽

Yanfeng Gu ◽

Hongxun Yao

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Scene Classification ◽

Deep Feature

Remote Sensing Scene Classification Using Sparse Representation-based Framework with Deep Feature Fusion

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ◽

10.1109/jstars.2021.3084441 ◽

2021 ◽

pp. 1-1

Author(s):

Shaohui Mei ◽

Keli Yan ◽

Mingyang Ma ◽

Xiaoning Chen ◽

Shun Zhang ◽

...

Keyword(s):

Remote Sensing ◽

Sparse Representation ◽

Feature Fusion ◽

Scene Classification ◽

Deep Feature

FEATURE FUSION FOR CROSS-MODAL SCENE CLASSIFICATION OF REMOTE SENSING IMAGE

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliv-m-3-2021-63-2021 ◽

2021 ◽

Vol XLIV-M-3-2021 ◽

pp. 63-66

Author(s):

W. Geng ◽

W. Zhou ◽

S. Jin

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Remote Sensing Image ◽

Aerial Images ◽

Aerial Image ◽

SVM Classifier ◽

Scene Classification ◽

Street View ◽

Modal Model

Abstract. Scene classification plays an important role in the remote sensing field. Traditional approaches use high-resolution remote sensing images as the data source to extract powerful features. Although this kind of method is common, model performance is severely affected by the image quality of the dataset, and relying on a single modality (source) of images tends to lose some scene semantic information, which eventually degrades the classification accuracy. Nowadays, multi-modal remote sensing data have become easy to obtain thanks to the development of remote sensing technology. How to carry out scene classification of cross-modal data has become an interesting topic in the field. To solve the above problems, this paper proposes using feature fusion for cross-modal scene classification of remote sensing images, i.e., aerial and ground street view images, so that the advantages of aerial images and ground street view data complement each other. Our cross-modal model is based on a Siamese network. Specifically, we first train the cross-modal model on pairs of aerial and ground images from the different data sources. Then, the trained model is used to extract the deep features of each aerial and ground image pair, and the features of the two perspectives are fused to train an SVM classifier for scene classification. Our approach has been demonstrated using two public benchmark datasets, AiRound and CV-BrCT. The preliminary results show that the proposed method achieves state-of-the-art performance compared with the traditional methods, indicating that the information from ground data can contribute to aerial image classification.
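The final stage described here, fusing aerial and ground features and training an SVM on the result, can be sketched as below. This is a minimal sketch under loud assumptions: the abstract does not say how the two feature vectors are fused, so L2-normalised concatenation is assumed; the toy two-dimensional features stand in for deep features from a trained network; and a small hinge-loss subgradient trainer stands in for a library SVM. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def fuse(aerial: Sequence[float], ground: Sequence[float]) -> List[float]:
    """Assumed fusion: concatenate the normalised aerial and ground features."""
    return l2_normalize(aerial) + l2_normalize(ground)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01) -> Tuple[List[float], float]:
    """Binary linear SVM (labels in {-1, +1}) trained by hinge-loss subgradient descent."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (dot(w, x) + b) < 1:
                # Margin violated: step along the hinge-loss and regulariser subgradients.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regulariser shrinks the weights.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if dot(w, x) + b >= 0 else -1


# Toy (aerial, ground) feature pairs for two scene classes.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.9, 0.1], [0.8, 0.2]),
         ([0.0, 1.0], [0.0, 1.0]), ([0.1, 0.9], [0.2, 0.8])]
labels = [1, 1, -1, -1]
fused = [fuse(a, g) for a, g in pairs]
w, b = train_linear_svm(fused, labels)
print([predict(w, b, x) for x in fused])  # [1, 1, -1, -1]
```

A multi-class setup, as needed for real scene datasets, would typically train one such classifier per class or use a library SVM with built-in multi-class support.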

A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13101950 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1950

Author(s):

Cuiping Shi ◽

Xin Zhao ◽

Liguo Wang

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Classification Accuracy ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Remote Sensing Image ◽

Classification Performance ◽

Attention Mechanism ◽

Scene Classification

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve the classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, thereby increasing the complexity of the model. To solve this problem, in this paper, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted by several convolutions working together. Then, the weights of the features are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depthwise separable convolution and asymmetric convolution are implemented to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.
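To make the parameter saving concrete, the following sketch counts the weights of a standard convolution, a depthwise separable convolution, and an asymmetric (k x 1 followed by 1 x k) pair. It is arithmetic only, with assumptions stated plainly: biases are ignored, the asymmetric pair is assumed to keep the output channel count between its two layers, and the layer sizes are illustrative rather than taken from the paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """One k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def asymmetric_conv_params(k: int, c_in: int, c_out: int) -> int:
    """A k x 1 convolution followed by a 1 x k convolution.

    Assumes the intermediate feature map already has c_out channels.
    """
    return k * c_in * c_out + k * c_out * c_out


k, c_in, c_out = 3, 64, 64  # illustrative layer size
print(standard_conv_params(k, c_in, c_out))        # 36864
print(depthwise_separable_params(k, c_in, c_out))  # 4672
print(asymmetric_conv_params(k, c_in, c_out))      # 24576
```

For this layer size the depthwise separable form needs roughly one eighth of the weights of the standard convolution, and the asymmetric pair needs two thirds.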

A Lightweight and Fine-Grained Feature Fusion Network for Remote Sensing Scene Classification

10.1109/icspcc52875.2021.9564476 ◽

2021 ◽

Author(s):

Lin Bai ◽

Qingxin Liu ◽

Cuiling Li ◽

Zhen Ye ◽

Meng Hui

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Scene Classification ◽

Fine Grained
