Multi-Scale Feature Integrated Attention-Based Rotation Network for Object Detection in VHR Aerial Images

Feng Yang; Wentong Li; Haiwei Hu; Wanyi Li; Peng Wang

doi:10.3390/s20061686

Multi-Scale Feature Integrated Attention-Based Rotation Network for Object Detection in VHR Aerial Images

Sensors ◽

10.3390/s20061686 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1686 ◽

Cited By ~ 3

Author(s):

Feng Yang ◽

Wentong Li ◽

Haiwei Hu ◽

Wanyi Li ◽

Peng Wang

Keyword(s):

Object Detection ◽

Large Scale ◽

Ground Truth ◽

Classification Performance ◽

Aerial Images ◽

Detection Methods ◽

Robust Detection ◽

Scale Feature ◽

Multi Scale ◽

Bounding Boxes

Accurate and robust detection of multi-class objects in very high resolution (VHR) aerial images has been playing a significant role in many real-world applications. The traditional detection methods have made remarkable progresses with horizontal bounding boxes (HBBs) due to CNNs. However, HBB detection methods still exhibit limitations including the missed detection and the redundant detection regions, especially for densely-distributed and strip-like objects. Besides, large scale variations and diverse background also bring in many challenges. Aiming to address these problems, an effective region-based object detection framework named Multi-scale Feature Integration Attention Rotation Network (MFIAR-Net) is proposed for aerial images with oriented bounding boxes (OBBs), which promotes the integration of the inherent multi-scale pyramid features to generate a discriminative feature map. Meanwhile, the double-path feature attention network supervised by the mask information of ground truth is introduced to guide the network to focus on object regions and suppress the irrelevant noise. To boost the rotation regression and classification performance, we present a robust Rotation Detection Network, which can generate efficient OBB representation. Extensive experiments and comprehensive evaluations on two publicly available datasets demonstrate the effectiveness of the proposed framework.

Download Full-text

A Single Shot Framework with Multi-Scale Feature Fusion for Geospatial Object Detection

Remote Sensing ◽

10.3390/rs11050594 ◽

2019 ◽

Vol 11 (5) ◽

pp. 594 ◽

Cited By ~ 11

Author(s):

Shuo Zhuang ◽

Ping Wang ◽

Boran Jiang ◽

Gang Wang ◽

Cong Wang

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Large Scale ◽

Feature Fusion ◽

Aerial Images ◽

Detection Methods ◽

Single Shot ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale

With the rapid advances in remote-sensing technologies and the larger number of satellite images, fast and effective object detection plays an important role in understanding and analyzing image information, which could be further applied to civilian and military fields. Recently object detection methods with region-based convolutional neural network have shown excellent performance. However, these two-stage methods contain region proposal generation and object detection procedures, resulting in low computation speed. Because of the expensive manual costs, the quantity of well-annotated aerial images is scarce, which also limits the progress of geospatial object detection in remote sensing. In this paper, on the one hand, we construct and release a large-scale remote-sensing dataset for geospatial object detection (RSD-GOD) that consists of 5 different categories with 18,187 annotated images and 40,990 instances. On the other hand, we design a single shot detection framework with multi-scale feature fusion. The feature maps from different layers are fused together through the up-sampling and concatenation blocks to predict the detection results. High-level features with semantic information and low-level features with fine details are fully explored for detection tasks, especially for small objects. Meanwhile, a soft non-maximum suppression strategy is put into practice to select the final detection results. Extensive experiments have been conducted on two datasets to evaluate the designed network. Results show that the proposed approach achieves a good detection performance and obtains the mean average precision value of 89.0% on a newly constructed RSD-GOD dataset and 83.8% on the Northwestern Polytechnical University very high spatial resolution-10 (NWPU VHR-10) dataset at 18 frames per second (FPS) on a NVIDIA GTX-1080Ti GPU.

Download Full-text

Object Detection Algorithm Based on Improved YOLOv3

Electronics ◽

10.3390/electronics9030537 ◽

2020 ◽

Vol 9 (3) ◽

pp. 537 ◽

Cited By ~ 5

Author(s):

Liquan Zhao ◽

Shuaiyang Li

Keyword(s):

Markov Chains ◽

Object Detection ◽

Large Scale ◽

Ground Truth ◽

Detection Algorithm ◽

Detection Methods ◽

Cluster Method ◽

Cluster Center ◽

Initial Cluster ◽

Bounding Boxes

The ‘You Only Look Once’ v3 (YOLOv3) method is among the most widely used deep learning-based object detection methods. It uses the k-means cluster method to estimate the initial width and height of the predicted bounding boxes. With this method, the estimated width and height are sensitive to the initial cluster centers, and the processing of large-scale datasets is time-consuming. In order to address these problems, a new cluster method for estimating the initial width and height of the predicted bounding boxes has been developed. Firstly, it randomly selects a couple of width and height values as one initial cluster center separate from the width and height of the ground truth boxes. Secondly, it constructs Markov chains based on the selected initial cluster and uses the final points of every Markov chain as the other initial centers. In the construction of Markov chains, the intersection-over-union method is used to compute the distance between the selected initial clusters and each candidate point, instead of the square root method. Finally, this method can be used to continually update the cluster center with each new set of width and height values, which are only a part of the data selected from the datasets. Our simulation results show that the new method has faster convergence speed for initializing the width and height of the predicted bounding boxes and that it can select more representative initial widths and heights of the predicted bounding boxes. Our proposed method achieves better performance than the YOLOv3 method in terms of recall, mean average precision, and F1-score.

Download Full-text

Geospatial Object Detection on High Resolution Remote Sensing Imagery Based on Double Multi-Scale Feature Pyramid Network

Remote Sensing ◽

10.3390/rs11070755 ◽

2019 ◽

Vol 11 (7) ◽

pp. 755 ◽

Cited By ~ 20

Author(s):

Xiaodong Zhang ◽

Kun Zhu ◽

Guanzhou Chen ◽

Xiaoliang Tan ◽

Lifei Zhang ◽

...

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Object Detection ◽

Large Scale ◽

Training Data ◽

Validation Dataset ◽

Remote Sensing Imagery ◽

Scale Feature ◽

Multi Scale ◽

Feature Pyramid

Object detection on very-high-resolution (VHR) remote sensing imagery has attracted a lot of attention in the field of image automatic interpretation. Region-based convolutional neural networks (CNNs) have been vastly promoted in this domain, which first generate candidate regions and then accurately classify and locate the objects existing in these regions. However, the overlarge images, the complex image backgrounds and the uneven size and quantity distribution of training samples make the detection tasks more challenging, especially for small and dense objects. To solve these problems, an effective region-based VHR remote sensing imagery object detection framework named Double Multi-scale Feature Pyramid Network (DM-FPN) was proposed in this paper, which utilizes inherent multi-scale pyramidal features and combines the strong-semantic, low-resolution features and the weak-semantic, high-resolution features simultaneously. DM-FPN consists of a multi-scale region proposal network and a multi-scale object detection network, these two modules share convolutional layers and can be trained end-to-end. We proposed several multi-scale training strategies to increase the diversity of training data and overcome the size restrictions of the input images. We also proposed multi-scale inference and adaptive categorical non-maximum suppression (ACNMS) strategies to promote detection performance, especially for small and dense objects. Extensive experiments and comprehensive evaluations on large-scale DOTA dataset demonstrate the effectiveness of the proposed framework, which achieves mean average precision (mAP) value of 0.7927 on validation dataset and the best mAP value of 0.793 on testing dataset.

Download Full-text

Attentional single-shot network with multi-scale feature fusion for object detection in aerial images

2020 Chinese Automation Congress (CAC) ◽

10.1109/cac51589.2020.9326692 ◽

2020 ◽

Author(s):

Yusheng Wang ◽

Hongzhang Wang ◽

Eryong Tang ◽

Ye Liu

Keyword(s):

Object Detection ◽

Feature Fusion ◽

Aerial Images ◽

Single Shot ◽

Scale Feature ◽

Multi Scale

Download Full-text

A Lightweight Keypoint-Based Oriented Object Detection of Remote Sensing Images

Remote Sensing ◽

10.3390/rs13132459 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2459

Author(s):

Yangyang Li ◽

Heting Mao ◽

Ruijiao Liu ◽

Xuan Pei ◽

Licheng Jiao ◽

...

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Large Scale ◽

Detection Methods ◽

Gaussian Kernel ◽

Remote Sensing Images ◽

Computational Overhead ◽

Comparable Performance ◽

Bounding Boxes ◽

Oriented Object

Object detection in remote sensing images has been widely used in military and civilian fields and is a challenging task due to the complex background, large-scale variation, and dense arrangement in arbitrary orientations of objects. In addition, existing object detection methods rely on the increasingly deeper network, which increases a lot of computational overhead and parameters, and is unfavorable to deployment on the edge devices. In this paper, we proposed a lightweight keypoint-based oriented object detector for remote sensing images. First, we propose a semantic transfer block (STB) when merging shallow and deep features, which reduces noise and restores the semantic information. Then, the proposed adaptive Gaussian kernel (AGK) is adapted to objects of different scales, and further improves detection performance. Finally, we propose the distillation loss associated with object detection to obtain a lightweight student network. Experiments on the HRSC2016 and UCAS-AOD datasets show that the proposed method adapts to different scale objects, obtains accurate bounding boxes, and reduces the influence of complex backgrounds. The comparison with mainstream methods proves that our method has comparable performance under lightweight.

Download Full-text

A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit

Electronics ◽

10.3390/electronics10030279 ◽

2021 ◽

Vol 10 (3) ◽

pp. 279

Author(s):

Rafael Padilla ◽

Wesley L. Passos ◽

Thadeu L. B. Dias ◽

Sergio L. Netto ◽

Eduardo A. B. da Silva

Keyword(s):

Comparative Analysis ◽

Object Detection ◽

Open Source ◽

Performance Metrics ◽

Ground Truth ◽

Detection Methods ◽

Detection Algorithms ◽

Spatio Temporal ◽

Bounding Boxes ◽

Temporal Overlap

Recent outstanding results of supervised object detection in competitions and challenges are often associated with specific metrics and datasets. The evaluation of such methods applied in different contexts have increased the demand for annotated datasets. Annotation tools represent the location and size of objects in distinct formats, leading to a lack of consensus on the representation. Such a scenario often complicates the comparison of object detection methods. This work alleviates this problem along the following lines: (i) It provides an overview of the most relevant evaluation methods used in object detection competitions, highlighting their peculiarities, differences, and advantages; (ii) it examines the most used annotation formats, showing how different implementations may influence the assessment results; and (iii) it provides a novel open-source toolkit supporting different annotation formats and 15 performance metrics, making it easy for researchers to evaluate the performance of their detection algorithms in most known datasets. In addition, this work proposes a new metric, also included in the toolkit, for evaluating object detection in videos that is based on the spatio-temporal overlap between the ground-truth and detected bounding boxes.

Download Full-text

Mask OBB: A Semantic Attention-Based Mask Oriented Bounding Box Representation for Multi-Category Object Detection in Aerial Images

Remote Sensing ◽

10.3390/rs11242930 ◽

2019 ◽

Vol 11 (24) ◽

pp. 2930 ◽

Cited By ~ 6

Author(s):

Jinwang Wang ◽

Jian Ding ◽

Haowen Guo ◽

Wensheng Cheng ◽

Ting Pan ◽

...

Keyword(s):

Object Detection ◽

Empirical Studies ◽

Classification Problem ◽

Aerial Images ◽

Detection Methods ◽

Detection Accuracy ◽

Lateral Connection ◽

Feature Pyramid ◽

Bounding Boxes ◽

Definition Of

Object detection in aerial images is a fundamental yet challenging task in remote sensing field. As most objects in aerial images are in arbitrary orientations, oriented bounding boxes (OBBs) have a great superiority compared with traditional horizontal bounding boxes (HBBs). However, the regression-based OBB detection methods always suffer from ambiguity in the definition of learning targets, which will decrease the detection accuracy. In this paper, we provide a comprehensive analysis of OBB representations and cast the OBB regression as a pixel-level classification problem, which can largely eliminate the ambiguity. The predicted masks are subsequently used to generate OBBs. To handle huge scale changes of objects in aerial images, an Inception Lateral Connection Network (ILCN) is utilized to enhance the Feature Pyramid Network (FPN). Furthermore, a Semantic Attention Network (SAN) is adopted to provide the semantic feature, which can help distinguish the object of interest from the cluttered background effectively. Empirical studies show that the entire method is simple yet efficient. Experimental results on two widely used datasets, i.e., DOTA and HRSC2016, demonstrate that the proposed method outperforms state-of-the-art methods.

Download Full-text

A Slimmer Network with Polymorphic and Group Attention Modules for More Efficient Object Detection in Aerial Images

Remote Sensing ◽

10.3390/rs12223750 ◽

2020 ◽

Vol 12 (22) ◽

pp. 3750

Author(s):

Wei Guo ◽

Weihong Li ◽

Zhenghao Li ◽

Weiguo Gong ◽

Jinkai Cui ◽

...

Keyword(s):

Object Detection ◽

Detection Efficiency ◽

Aerial Images ◽

Aerial Image ◽

Detection Methods ◽

Detection Accuracy ◽

Practical Applications ◽

Multi Scale ◽

High Detection Efficiency ◽

Object Features

Object detection is one of the core technologies in aerial image processing and analysis. Although existing aerial image object detection methods based on deep learning have made some progress, there are still some problems remained: (1) Most existing methods fail to simultaneously consider multi-scale and multi-shape object characteristics in aerial images, which may lead to some missing or false detections; (2) high precision detection generally requires a large and complex network structure, which usually makes it difficult to achieve the high detection efficiency and deploy the network on resource-constrained devices for practical applications. To solve these problems, we propose a slimmer network for more efficient object detection in aerial images. Firstly, we design a polymorphic module (PM) for simultaneously learning the multi-scale and multi-shape object features, so as to better detect the hugely different objects in aerial images. Then, we design a group attention module (GAM) for better utilizing the diversiform concatenation features in the network. By designing multiple detection headers with adaptive anchors and the above-mentioned two modules, we propose a one-stage network called PG-YOLO for realizing the higher detection accuracy. Based on the proposed network, we further propose a more efficient channel pruning method, which can slim the network parameters from 63.7 million (M) to 3.3M that decreases the parameter size by 94.8%, so it can significantly improve the detection efficiency for real-time detection. Finally, we execute the comparative experiments on three public aerial datasets, and the experimental results show that the proposed method outperforms the state-of-the-art methods.

Download Full-text

MTI-YOLO: A Light-Weight and Real-Time Deep Neural Network for Insulator Detection in Complex Aerial Images

Energies ◽

10.3390/en14051426 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1426

Author(s):

Chuanyang Liu ◽

Yiquan Wu ◽

Jingjing Liu ◽

Jiaming Han

Keyword(s):

Feature Detection ◽

Feature Fusion ◽

Memory Storage ◽

Aerial Images ◽

Detection Accuracy ◽

Composite Insulator ◽

Running Time ◽

Scale Feature ◽

Multi Scale ◽

Good Trade

Insulator detection is an essential task for the safety and reliable operation of intelligent grids. Owing to insulator images including various background interferences, most traditional image-processing methods cannot achieve good performance. Some You Only Look Once (YOLO) networks are employed to meet the requirements of actual applications for insulator detection. To achieve a good trade-off among accuracy, running time, and memory storage, this work proposes the modified YOLO-tiny for insulator (MTI-YOLO) network for insulator detection in complex aerial images. First of all, composite insulator images are collected in common scenes and the “CCIN_detection” (Chinese Composite INsulator) dataset is constructed. Secondly, to improve the detection accuracy of different sizes of insulator, multi-scale feature detection headers, a structure of multi-scale feature fusion, and the spatial pyramid pooling (SPP) model are adopted to the MTI-YOLO network. Finally, the proposed MTI-YOLO network and the compared networks are trained and tested on the “CCIN_detection” dataset. The average precision (AP) of our proposed network is 17% and 9% higher than YOLO-tiny and YOLO-v2. Compared with YOLO-tiny and YOLO-v2, the running time of the proposed network is slightly higher. Furthermore, the memory usage of the proposed network is 25.6% and 38.9% lower than YOLO-v2 and YOLO-v3, respectively. Experimental results and analysis validate that the proposed network achieves good performance in both complex backgrounds and bright illumination conditions.

Download Full-text

MM-FPN: Multi-path and Multi-scale Feature Pyramid Network for Object Detection

10.1109/isceic53685.2021.00072 ◽

2021 ◽

Author(s):

Sheng Dong ◽

Jiaxin Zhang ◽

Zehui Qu

Keyword(s):

Object Detection ◽

Scale Feature ◽

Multi Scale ◽

Feature Pyramid

Download Full-text