A Lightweight Object Detection Framework for Remote Sensing Images

Lang Huyan; Yunpeng Bai; Ying Li; Dongmei Jiang; Yanning Zhang; Quan Zhou; Jiayuan Wei; Juanni Liu; Yi Zhang; Tao Cui

doi:10.3390/rs13040683

Remote Sensing ◽

10.3390/rs13040683 ◽

2021 ◽

Vol 13 (4) ◽

pp. 683

Author(s):

Lang Huyan ◽

Yunpeng Bai ◽

Ying Li ◽

Dongmei Jiang ◽

Yanning Zhang ◽

...

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Real Time ◽

Large Scale ◽

Feature Fusion ◽

Computational Cost ◽

Feature Representation ◽

Detection Accuracy ◽

Remote Sensing Images ◽

Low Level

Onboard real-time object detection in remote sensing images is a crucial but challenging task in a computation-constrained setting. The task requires an algorithm that performs well while keeping time and space complexity low. However, previous convolutional neural network (CNN) based object detectors for remote sensing images suffer from heavy computational cost, which hinders their deployment on satellites. Moreover, an onboard detector must detect objects at vastly different scales. To address these issues, we propose a lightweight one-stage multi-scale feature fusion detector called MSF-SNET for onboard real-time object detection in remote sensing images. Using the lightweight SNET as the backbone network reduces the number of parameters and the computational complexity. To strengthen the detection of small objects, three low-level features are extracted from the three stages of SNET. In the detection part, another three convolutional layers are designed to further extract deep features with rich semantic information for large-scale object detection. To improve detection accuracy, the deep features and low-level features are fused to enhance the feature representation. Extensive experiments and comprehensive evaluations are conducted on the openly available NWPU VHR-10 and DIOR datasets. Compared with other state-of-the-art detectors, the proposed detection framework has fewer parameters and calculations, while maintaining consistent accuracy.
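The fusion of deep and low-level features described in this abstract can be illustrated with a minimal sketch. The abstract does not say how MSF-SNET performs the fusion, so the scheme below (nearest-neighbour upsampling followed by channel concatenation) is an assumption, and the names `upsample_nearest` and `fuse` are hypothetical.

```python
from typing import List

# A feature map as nested lists: channels x height x width.
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every value `factor` times along height and width."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def fuse(low: FeatureMap, deep: FeatureMap) -> FeatureMap:
    """Upsample `deep` to the spatial size of `low`, then concatenate channels.

    Assumed fusion scheme for illustration only, not the paper's method.
    """
    factor = len(low[0]) // len(deep[0])
    up = upsample_nearest(deep, factor)
    if len(up[0]) != len(low[0]) or len(up[0][0]) != len(low[0][0]):
        raise ValueError("spatial sizes must differ by an integer factor")
    return low + up  # list concatenation stacks the channels


low = [[[1.0, 2.0], [3.0, 4.0]]]  # 1 channel, 2x2 (fine spatial detail)
deep = [[[9.0]]]                  # 1 channel, 1x1 (coarse semantic feature)
fused = fuse(low, deep)           # 2 channels, 2x2
```

Concatenation keeps both sources intact and leaves the weighting to later layers; element-wise addition is a common alternative when the channel counts match.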

Subtask Attention Based Object Detection in Remote Sensing Images

Remote Sensing ◽

10.3390/rs13101925 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1925

Author(s):

Shengzhou Xiong ◽

Yihua Tan ◽

Yansheng Li ◽

Cai Wen ◽

Pei Yan

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Feature Fusion ◽

Detection Task ◽

Feature Representation ◽

Detection Accuracy ◽

Remote Sensing Images ◽

Attention Network ◽

Multi Scale ◽

Automatic Interpretation

Object detection in remote sensing images (RSIs) is one of the basic tasks in the field of remote sensing image automatic interpretation. In recent years, the deep object detection frameworks of natural scene images (NSIs) have been introduced into object detection on RSIs, and the detection performance has improved significantly because of the powerful feature representation. However, there are still many challenges concerning the particularities of remote sensing objects. One of the main challenges is the missed detection of small objects, which have fewer than five percent of the pixels of large objects. Generally, the existing algorithms deal with this problem by multi-scale feature fusion based on a feature pyramid. However, the benefits of this strategy are limited, considering that the location of small objects in the feature map will disappear when the detection task is processed at the end of the network. In this study, we propose a subtask attention network (StAN), which handles the detection task directly on the shallow layer of the network. First, StAN contains one shared feature branch and two subtask attention branches, for a semantic auxiliary subtask and a detection subtask, based on the multi-task attention network (MTAN). Second, the detection branch uses only low-level features to account for small objects. Third, an attention map guidance mechanism is put forward to optimize the network so that it preserves its identification ability. Fourth, the multi-dimensional sampling module (MdS), global multi-view channel weights (GMulW) and target-guided pixel attention (TPA) are designed to further improve detection accuracy in complex scenes. The experimental results on the NWPU VHR-10 and DOTA datasets demonstrate that the proposed algorithm achieves state-of-the-art (SOTA) performance and that the missed detection of small objects decreases. Ablation experiments also confirm the effects of MdS, GMulW and TPA.

Detection of Schools in Remote Sensing Images Based on Attention-Guided Dense Network

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10110736 ◽

2021 ◽

Vol 10 (11) ◽

pp. 736

Author(s):

Han Fu ◽

Xiangtao Fan ◽

Zhenzhen Yan ◽

Xiaoping Du

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Feature Fusion ◽

State Of The Art ◽

Feature Representation ◽

Detection Accuracy ◽

Dense Network ◽

Remote Sensing Images ◽

Composite Object ◽

Detection Algorithms

The detection of primary and secondary schools (PSSs) is a meaningful task for composite object detection in remote sensing images (RSIs). As typical composite objects in RSIs, PSSs have diverse appearances and complex backgrounds, which makes it difficult to extract their features effectively with existing deep-learning-based object detection algorithms. To address the challenges of PSS detection, we propose an end-to-end framework called the attention-guided dense network (ADNet), which can effectively improve the detection accuracy of PSSs. First, a dual attention module (DAM) is designed to enhance the ability to represent complex characteristics and to alleviate distractions in the background. Second, a dense feature fusion module (DFFM) is built to promote the flow of attention cues into low layers, which guides the generation of hierarchical feature representations. Experimental results demonstrate that our proposed method outperforms the state-of-the-art methods and achieves 79.86% average precision. The study demonstrates the effectiveness of our proposed method for PSS detection.

Elongated Small Object Detection from Remote Sensing Images Using Hierarchical Scale-Sensitive Networks

Remote Sensing ◽

10.3390/rs13163182 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3182

Author(s):

Zheng He ◽

Li Huang ◽

Weijiang Zeng ◽

Xining Zhang ◽

Yongxin Jiang ◽

...

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Large Scale ◽

Small Scale ◽

Detection Accuracy ◽

Small Object ◽

Direction Vector ◽

Remote Sensing Images ◽

Ship Detection ◽

Hierarchical Scale

The detection of elongated objects, such as ships, from satellite images has very important application prospects in marine transportation, shipping management, and many other scenarios. At present, research on general object detection using neural networks has made significant progress. However, in ship detection from remote sensing images, the elongated shape of ships and the wide variety of ship sizes often make the detection accuracy unsatisfactory. In particular, the detection accuracy for small-scale ships is much lower than that for large-scale ones. To this end, we propose a hierarchical scale-sensitive CenterNet (HSSCenterNet) for ship detection from remote sensing images. HSSCenterNet adopts a multi-task learning strategy. First, it introduces a dual-direction vector to represent the posture or direction of the tilted bounding box and employs a two-layer network to predict this vector, which improves the detection block of CenterNet and develops the ability to detect targets with tilted posture. Second, it divides the full-scale detection task into three parallel sub-tasks for large-scale, medium-scale, and small-scale ship detection, respectively, and obtains the final results with non-maximum suppression. Experimental results show that HSSCenterNet achieves significantly improved performance in detecting small-scale ship targets while maintaining high performance at medium and large scales.
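The last step of this abstract, merging the outputs of the three scale-specific sub-tasks with non-maximum suppression, can be sketched as follows. This is a simplified illustration: it uses axis-aligned boxes rather than the tilted boxes of HSSCenterNet, and the 0.5 IoU threshold and the sample detections are assumptions.

```python
from typing import List, Tuple

# Axis-aligned box with score: (x1, y1, x2, y2, score).
# The paper predicts tilted boxes; axis-aligned ones keep the sketch short.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


# Hypothetical detections from the three parallel sub-tasks.
small = [(10.0, 10.0, 20.0, 20.0, 0.6)]
medium = [(11.0, 11.0, 21.0, 21.0, 0.9)]   # overlaps the small-branch box
large = [(100.0, 100.0, 200.0, 200.0, 0.8)]

final = nms(small + medium + large)
```

Here the small-branch box overlaps the higher-scoring medium-branch box with an IoU of 81/119, so it is suppressed and two detections remain.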

SSD7-FFAM: A Real-Time Object Detection Network Friendly to Embedded Devices from Scratch

Applied Sciences ◽

10.3390/app11031096 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1096

Author(s):

Qing Li ◽

Yingcheng Lin ◽

Wei He

Keyword(s):

Object Detection ◽

Real Time ◽

Large Scale ◽

Feature Fusion ◽

Contextual Information ◽

Attention Mechanism ◽

Detection Accuracy ◽

Single Shot ◽

Feature Maps ◽

Embedded Devices

The high requirements for computing and memory are the biggest challenges in deploying existing object detection networks to embedded devices. Existing lightweight object detectors directly use lightweight neural network architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which results in poor network structure flexibility and is unsuitable for some specific scenarios. In this paper, we propose a lightweight object detection network, Single-Shot MultiBox Detector (SSD)7-Feature Fusion and Attention Mechanism (FFAM), which saves storage space and reduces the amount of calculation by reducing the number of convolutional layers. We offer a novel Feature Fusion and Attention Mechanism (FFAM) method to improve detection accuracy. First, the FFAM method fuses high-level feature maps rich in semantic information with low-level feature maps to improve detection accuracy for small objects. Second, a lightweight attention mechanism, formed by cascading channel and spatial attention modules, is employed to enhance the contextual information of the target and guide the network to focus on its easy-to-recognize features. The SSD7-FFAM achieves 83.7% mean Average Precision (mAP), 1.66 MB parameters, and 0.033 s average running time on the NWPU VHR-10 dataset. The results indicate that the proposed SSD7-FFAM is more suitable for deployment to embedded devices for real-time object detection.
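As a quick check on the real-time claim, the reported average running time can be converted to a frame rate. The sketch assumes that the 0.033 s figure is the per-image latency and that images are processed one at a time; the abstract does not state either.

```python
# Reported average running time on NWPU VHR-10 (from the abstract).
avg_time_s = 0.033

# Assumption: one image per forward pass, no batching or pipelining.
frames_per_second = 1.0 / avg_time_s

print(f"{frames_per_second:.1f} frames per second")
```

Under these assumptions the detector runs at roughly 30 frames per second.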

Real-Time Garbage Object Detection With Data Augmentation and Feature Fusion Using SUAV Low-Altitude Remote Sensing Images

IEEE Geoscience and Remote Sensing Letters ◽

10.1109/lgrs.2021.3074415 ◽

2021 ◽

pp. 1-5

Author(s):

Weiyang Chen ◽

Haifeng Wang ◽

Hao Li ◽

Quanjing Li ◽

Yang Yang ◽

...

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Real Time ◽

Data Augmentation ◽

Feature Fusion ◽

Remote Sensing Images ◽

Low Altitude

Real-Time Object Detection in Remote Sensing Images Based on Visual Perception and Memory Reasoning

Electronics ◽

10.3390/electronics8101151 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1151 ◽

Cited By ~ 4

Author(s):

Xia Hua ◽

Xinqing Wang ◽

Ting Rui ◽

Dong Wang ◽

Faming Shao

Keyword(s):

Remote Sensing ◽

Visual Perception ◽

Object Detection ◽

Real Time ◽

Detection Accuracy ◽

Small Object ◽

Remote Sensing Images ◽

Feature Maps ◽

Convolutional Network ◽

Fully Convolutional Network

Aiming at the real-time detection of multiple objects and micro-objects in large-scene remote sensing images, a cascaded convolutional neural network real-time object-detection framework for remote sensing images is proposed, which integrates visual perception and convolutional memory network reasoning. The detection framework is composed of two fully convolutional networks, namely, the strengthened object self-attention pre-screening fully convolutional network (SOSA-FCN) and the object accurate detection fully convolutional network (AD-FCN). SOSA-FCN introduces a self-attention module to extract attention feature maps and constructs a depth feature pyramid to optimize the attention feature maps by combining convolutional long-term and short-term memory networks. It guides the acquisition of potential sub-regions of the object in the scene, reduces the computational complexity, and enhances the network’s ability to extract multi-scale object features. It adapts to the complex background and small object characteristics of a large-scene remote sensing image. In AD-FCN, the object mask and object orientation estimation layer are designed to achieve fine positioning of candidate frames. The performance of the proposed algorithm is compared with that of other advanced methods on NWPU_VHR-10, DOTA, UCAS-AOD, and other open datasets. The experimental results show that the proposed algorithm significantly improves the efficiency of object detection while ensuring detection accuracy and has high adaptability. It has extensive engineering application prospects.

A Real-Time Tree Crown Detection Approach for Large-Scale Remote Sensing Images on FPGAs

Remote Sensing ◽

10.3390/rs11091025 ◽

2019 ◽

Vol 11 (9) ◽

pp. 1025 ◽

Cited By ~ 3

Author(s):

Weijia Li ◽

Conghui He ◽

Haohuan Fu ◽

Juepeng Zheng ◽

Runmin Dong ◽

...

Keyword(s):

Remote Sensing ◽

Real Time ◽

Large Scale ◽

Satellite Image ◽

Detection Accuracy ◽

Tree Crown ◽

Remote Sensing Images ◽

Time Data ◽

Original Algorithm ◽

Detection Approach

On-board real-time tree crown detection from high-resolution remote sensing images helps to avoid the delay between data acquisition and processing, reduce the quantity of data transmitted from the satellite to the ground, monitor the growing condition of individual trees, and discover tree damage as early as possible. Existing tree crown detection studies based on high-performance platforms either focus on processing small images or suffer from high power consumption or slow processing speed. In this paper, we propose the first FPGA-based real-time tree crown detection approach for large-scale satellite images. A pipelined-friendly and resource-economic tree crown detection algorithm (PF-TCD) is designed by reconstructing and modifying the workflow of the original algorithm into three computational kernels on FPGAs. Compared with the well-optimized software implementation of the original algorithm on an Intel 12-core CPU, our proposed PF-TCD achieves a speedup of 18.75 times for a satellite image with a size of 12,188 × 12,576 pixels without reducing the detection accuracy. The image processing time for the large-scale remote sensing image is only 0.33 s, which satisfies the requirements of on-board real-time data processing on satellites.
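The figures in this abstract imply a pixel throughput and a CPU baseline time, which the short calculation below derives. It assumes that the 0.33 s processing time and the 18.75 times speedup refer to the same 12,188 × 12,576 image; the abstract suggests this but does not state it outright.

```python
# Figures reported in the abstract.
width_px, height_px = 12_188, 12_576
fpga_time_s = 0.33
speedup = 18.75

pixels = width_px * height_px

# Derived values (assumption: the time and the speedup describe the same image).
throughput_px_per_s = pixels / fpga_time_s
implied_cpu_time_s = fpga_time_s * speedup

print(f"{pixels:,} pixels")
print(f"{throughput_px_per_s / 1e6:.0f} million pixels per second on the FPGA")
print(f"{implied_cpu_time_s:.2f} s implied CPU baseline")
```

Under that assumption the image holds about 153 million pixels and the CPU baseline would take a little over 6 s.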

An Approach on Image Processing of Deep Learning Based on Improved SSD

Symmetry ◽

10.3390/sym13030495 ◽

2021 ◽

Vol 13 (3) ◽

pp. 495

Author(s):

Liang Jin ◽

Guodong Liu

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Object Detection ◽

Real Time ◽

Remote Sensing Image ◽

Detection Accuracy ◽

Remote Sensing Images ◽

Image Detection ◽

Ship Detection ◽

Real Time Detection

Compared with ordinary images, remote sensing images contain many kinds of objects with large scale variations and provide more detail. Ships are a typical object in remote sensing images, and ship detection plays an essential role in the field of remote sensing. With the rapid development of deep learning, remote sensing image detection methods based on convolutional neural networks (CNNs) have come to occupy a key position. In remote sensing images, objects are closely arranged and small-scale objects account for a large proportion of them. In addition, the convolution layers in a CNN lack ample context information, which leads to low detection accuracy on remote sensing images. To improve detection accuracy while keeping real-time detection speed, this paper proposes an efficient object detection algorithm for ship detection in remote sensing images based on an improved SSD. First, we add a feature fusion module to the shallow feature layers to refine the feature extraction ability for small objects. Then, we add a Squeeze-and-Excitation (SE) module to each feature layer, introducing an attention mechanism into the network. Experimental results on the Synthetic Aperture Radar ship detection dataset (SSDD) show that the mAP reaches 94.41% and the average detection speed is 31 FPS. Compared with SSD and other representative object detection algorithms, the improved algorithm achieves better detection accuracy and can realize real-time detection.
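The Squeeze-and-Excitation step mentioned in this abstract can be sketched in plain Python. This is a generic SE block, not the authors' implementation: biases are omitted for brevity, and the weights `w1` and `w2` are hypothetical values chosen only to make the effect visible.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width
Matrix = List[List[float]]


def squeeze_excite(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Generic SE block without biases (a simplification).

    w1 has shape (reduced x channels), w2 has shape (channels x reduced).
    """
    # Squeeze: global average pooling gives one value per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every channel by its gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]


fmap = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 1
]
w1 = [[1.0, 0.0]]      # hypothetical weights, 2 channels reduced to 1
w2 = [[0.0], [10.0]]   # hypothetical weights, 1 expanded back to 2
out = squeeze_excite(fmap, w1, w2)
```

With these weights the gate for channel 0 is exactly 0.5 and the gate for channel 1 is close to 1, so the block halves the first channel and leaves the second almost unchanged.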

Enhanced Feature Representation in Detection for Optical Remote Sensing Images

Remote Sensing ◽

10.3390/rs11182095 ◽

2019 ◽

Vol 11 (18) ◽

pp. 2095 ◽

Cited By ~ 4

Author(s):

Kun Fu ◽

Zhuo Chen ◽

Yue Zhang ◽

Xian Sun

Keyword(s):

Remote Sensing ◽

State Of The Art ◽

Computational Cost ◽

Feature Representation ◽

Detection Accuracy ◽

Optical Remote Sensing ◽

Remote Sensing Images ◽

Two Stage ◽

Multi Scale ◽

One Stage

In recent years, deep learning has led to a remarkable breakthrough in object detection in remote sensing images. In practice, two-stage detectors perform well regarding detection accuracy but are slow. On the other hand, one-stage detectors integrate the detection pipeline of two-stage detectors to simplify the detection process, and are faster, but with lower detection accuracy. Enhancing the capability of feature representation may be a way to improve the detection accuracy of one-stage detectors. For this goal, this paper proposes a novel one-stage detector with enhanced capability of feature representation. The enhanced capability benefits from two proposed structures: dual top-down module and dense-connected inception module. The former efficiently utilizes multi-scale features from multiple layers of the backbone network. The latter both widens and deepens the network to enhance the ability of feature representation with limited extra computational cost. To evaluate the effectiveness of proposed structures, we conducted experiments on horizontal bounding box detection tasks on the challenging DOTA dataset and gained 73.49% mean Average Precision (mAP), achieving state-of-the-art performance. Furthermore, our method ran significantly faster than the best public two-stage detector on the DOTA dataset.

Multi-Modality and Multi-Scale Attention Fusion Network for Land Cover Classification from VHR Remote Sensing Images

Remote Sensing ◽

10.3390/rs13183771 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3771

Author(s):

Tao Lei ◽

Linze Li ◽

Zhiyong Lv ◽

Mingzhe Zhu ◽

Xiaogang Du ◽

...

Keyword(s):

Remote Sensing ◽

Land Cover ◽

Large Scale ◽

Feature Fusion ◽

Network Models ◽

Land Cover Classification ◽

Extraction Methods ◽

Feature Representation ◽

Remote Sensing Images ◽

Multi Scale

Land cover classification from very high-resolution (VHR) remote sensing images is a challenging task due to the complexity of geographic scenes and the varying shape and size of ground targets. It is difficult to improve VHR remote sensing image classification results by using the spectral data directly or by using traditional multi-scale feature extraction methods. To address this problem, we propose a multi-modality and multi-scale attention fusion network for land cover classification from VHR remote sensing images. First, based on the encoding-decoding network, we design a multi-modality fusion module that can simultaneously fuse more useful features and avoid redundant features. This addresses the problem of low classification accuracy for some objects caused by the weak feature representation ability of single-modality data. Second, a novel multi-scale spatial context enhancement module is introduced to improve feature fusion, which addresses the large scale variation of objects in remote sensing images and captures long-range spatial relationships between objects. The proposed network and comparative networks were evaluated on two public datasets, Vaihingen and Potsdam. The proposed network achieves better classification results, with a mean F1-score of 88.6% on the Vaihingen dataset and 92.3% on the Potsdam dataset. Experimental results show that our model is superior to the state-of-the-art network models.
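The mean F1-score reported in this abstract is the per-class F1 averaged over classes. The sketch below shows that computation; the class names and pixel counts are hypothetical and are not taken from the Vaihingen or Potsdam results.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted average of the per-class F1 scores."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical per-class pixel counts: (true positives, false positives, false negatives).
counts = {
    "building": (90, 10, 10),  # F1 = 180 / 200 = 0.9
    "tree": (80, 20, 20),      # F1 = 160 / 200 = 0.8
}
result = mean_f1(counts)
```

For these counts the mean F1 is 0.85. Because the average is unweighted, rare classes count as much as common ones.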
