A Slimmer Network with Polymorphic and Group Attention Modules for More Efficient Object Detection in Aerial Images

Wei Guo; Weihong Li; Zhenghao Li; Weiguo Gong; Jinkai Cui; Xinran Wang

doi:10.3390/rs12223750

Remote Sensing ◽ 10.3390/rs12223750 ◽ 2020 ◽ Vol 12 (22) ◽ pp. 3750

Author(s): Wei Guo ◽ Weihong Li ◽ Zhenghao Li ◽ Weiguo Gong ◽ Jinkai Cui ◽ ...

Keyword(s): Object Detection ◽ Detection Efficiency ◽ Aerial Images ◽ Aerial Image ◽ Detection Methods ◽ Detection Accuracy ◽ Practical Applications ◽ Multi Scale ◽ High Detection Efficiency ◽ Object Features

Object detection is one of the core technologies in aerial image processing and analysis. Although existing deep-learning-based methods for aerial image object detection have made progress, two problems remain: (1) most existing methods fail to consider the multi-scale and multi-shape characteristics of objects in aerial images simultaneously, which may lead to missed or false detections; (2) high-precision detection generally requires a large and complex network structure, which makes it difficult to achieve high detection efficiency and to deploy the network on resource-constrained devices for practical applications. To solve these problems, we propose a slimmer network for more efficient object detection in aerial images. First, we design a polymorphic module (PM) that learns multi-scale and multi-shape object features simultaneously, so as to better detect the widely varying objects in aerial images. Then, we design a group attention module (GAM) to make better use of the diverse concatenated features in the network. By combining multiple detection headers with adaptive anchors and the two modules above, we propose a one-stage network called PG-YOLO to achieve higher detection accuracy. Based on the proposed network, we further propose a more efficient channel pruning method, which slims the network from 63.7 million (M) parameters to 3.3M, a 94.8% reduction in parameter size, and thereby significantly improves detection efficiency for real-time detection. Finally, we conduct comparative experiments on three public aerial datasets, and the experimental results show that the proposed method outperforms the state-of-the-art methods.
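
The abstract does not describe how the channel pruning method works, so the following is only a generic sketch of magnitude-based channel selection, not the authors' method. The function names (`reduction_percent`, `select_channels`), the per-channel importance scores, and the keep ratio are all assumptions made for illustration. The first function simply re-derives the reported 94.8% figure from the two parameter counts.

```python
def reduction_percent(before, after):
    """Relative reduction in parameter count, in percent."""
    return 100.0 * (before - after) / before


def select_channels(scales, keep_ratio):
    """Return the indices of the channels to keep (hypothetical helper).

    `scales` holds one importance score per channel (for example the
    magnitude of a learned per-channel scaling factor). Channels with the
    largest absolute score are kept; at least one channel always survives.
    """
    n_keep = max(1, round(len(scales) * keep_ratio))
    ranked = sorted(range(len(scales)), key=lambda i: abs(scales[i]), reverse=True)
    return sorted(ranked[:n_keep])


# Parameter counts quoted in the abstract, in millions.
print(round(reduction_percent(63.7, 3.3), 1))  # 94.8

# Toy example: keep the half of the channels with the largest scores.
print(select_channels([0.9, 0.01, 0.5, 0.02], keep_ratio=0.5))  # [0, 2]
```

In a real network, the kept indices would be used to slice the weights of the pruned layer and of the layer that consumes its output, usually followed by fine-tuning.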

Multiple-Oriented and Small Object Detection with Convolutional Neural Networks for Aerial Image

Remote Sensing ◽ 10.3390/rs11182176 ◽ 2019 ◽ Vol 11 (18) ◽ pp. 2176 ◽ Cited By ~ 3

Author(s): Chen ◽ Zhong ◽ Tan

Keyword(s): Neural Networks ◽ Object Detection ◽ Convolutional Neural Networks ◽ Aerial Images ◽ Superior Performance ◽ Aerial Image ◽ Detection Accuracy ◽ Small Object ◽ Data Set ◽ Orientation Information

Detecting objects in aerial images is a challenging task due to the multiple orientations and relatively small size of the objects. Although many traditional detection models have demonstrated acceptable performance by using an image pyramid and multiple templates in a sliding-window manner, such techniques are inefficient and costly. Recently, convolutional neural networks (CNNs) have been used successfully for object detection and have demonstrated considerably better performance than traditional detection methods; however, this success has not yet been extended to aerial images. To overcome these problems, we propose a detection model based on two CNNs. One CNN is designed to propose many object-like regions, which are generated from feature maps at multiple scales and hierarchies together with orientation information. With this design, the positioning of small objects becomes more accurate, and the generated regions with orientation information are better suited to objects arranged in arbitrary orientations. The other CNN is designed for object recognition; it first extracts the features of each generated region and then makes the final decisions. The results of extensive experiments on the vehicle detection in aerial imagery (VEDAI) and overhead imagery research data set (OIRDS) datasets indicate that the proposed model performs well in terms of both detection accuracy and detection speed.

Multi-Scale Feature Integrated Attention-Based Rotation Network for Object Detection in VHR Aerial Images

Sensors ◽ 10.3390/s20061686 ◽ 2020 ◽ Vol 20 (6) ◽ pp. 1686 ◽ Cited By ~ 3

Author(s): Feng Yang ◽ Wentong Li ◽ Haiwei Hu ◽ Wanyi Li ◽ Peng Wang

Keyword(s): Object Detection ◽ Large Scale ◽ Ground Truth ◽ Classification Performance ◽ Aerial Images ◽ Detection Methods ◽ Robust Detection ◽ Scale Feature ◽ Multi Scale ◽ Bounding Boxes

Accurate and robust detection of multi-class objects in very high resolution (VHR) aerial images plays a significant role in many real-world applications. Traditional detection methods have made remarkable progress with horizontal bounding boxes (HBBs) thanks to CNNs. However, HBB detection methods still exhibit limitations, including missed detections and redundant detection regions, especially for densely distributed and strip-like objects. In addition, large scale variations and diverse backgrounds bring further challenges. To address these problems, an effective region-based object detection framework named Multi-scale Feature Integration Attention Rotation Network (MFIAR-Net) is proposed for aerial images with oriented bounding boxes (OBBs); it promotes the integration of the inherent multi-scale pyramid features to generate a discriminative feature map. Meanwhile, a double-path feature attention network supervised by the mask information of the ground truth is introduced to guide the network to focus on object regions and suppress irrelevant noise. To boost rotation regression and classification performance, we present a robust Rotation Detection Network, which can generate an efficient OBB representation. Extensive experiments and comprehensive evaluations on two publicly available datasets demonstrate the effectiveness of the proposed framework.
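
The "redundant detection regions" of HBBs for strip-like objects can be made concrete with a little geometry. The sketch below is an illustration only and is not taken from the paper: the helper names and the (centre, width, height, angle) parameterisation of an OBB are assumptions. It computes the corners of a rotated box and the axis-aligned box that encloses it.

```python
import math


def obb_corners(cx, cy, w, h, theta):
    """Corners of a w x h box centred at (cx, cy), rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def enclosing_hbb(corners):
    """Smallest axis-aligned box (x1, y1, x2, y2) containing all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def hbb_area(hbb):
    x1, y1, x2, y2 = hbb
    return (x2 - x1) * (y2 - y1)


# A strip-like object (100 x 10) rotated by 45 degrees.
corners = obb_corners(0.0, 0.0, 100.0, 10.0, math.radians(45))
ratio = hbb_area(enclosing_hbb(corners)) / (100.0 * 10.0)
print(round(ratio, 2))  # 6.05
```

For this example the enclosing HBB covers about six times the area of the object itself, which is the kind of background clutter that an OBB representation avoids.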

A Single Shot Framework with Multi-Scale Feature Fusion for Geospatial Object Detection

Remote Sensing ◽ 10.3390/rs11050594 ◽ 2019 ◽ Vol 11 (5) ◽ pp. 594 ◽ Cited By ~ 11

Author(s): Shuo Zhuang ◽ Ping Wang ◽ Boran Jiang ◽ Gang Wang ◽ Cong Wang

Keyword(s): Remote Sensing ◽ Object Detection ◽ Large Scale ◽ Feature Fusion ◽ Aerial Images ◽ Detection Methods ◽ Single Shot ◽ Feature Maps ◽ Scale Feature ◽ Multi Scale

With the rapid advances in remote-sensing technologies and the growing number of satellite images, fast and effective object detection plays an important role in understanding and analyzing image information, which can be further applied in civilian and military fields. Recently, object detection methods based on region-based convolutional neural networks have shown excellent performance. However, these two-stage methods include both region proposal generation and object detection procedures, resulting in low computation speed. Because manual annotation is expensive, well-annotated aerial images are scarce, which also limits the progress of geospatial object detection in remote sensing. In this paper, on the one hand, we construct and release a large-scale remote-sensing dataset for geospatial object detection (RSD-GOD) that consists of 5 different categories with 18,187 annotated images and 40,990 instances. On the other hand, we design a single shot detection framework with multi-scale feature fusion. The feature maps from different layers are fused through up-sampling and concatenation blocks to predict the detection results. High-level features with semantic information and low-level features with fine details are fully exploited for detection tasks, especially for small objects. Meanwhile, a soft non-maximum suppression strategy is used to select the final detection results. Extensive experiments have been conducted on two datasets to evaluate the designed network. The results show that the proposed approach achieves good detection performance, obtaining a mean average precision of 89.0% on the newly constructed RSD-GOD dataset and 83.8% on the Northwestern Polytechnical University very high spatial resolution-10 (NWPU VHR-10) dataset at 18 frames per second (FPS) on an NVIDIA GTX-1080Ti GPU.
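
The abstract mentions a soft non-maximum suppression strategy without giving its details. As a rough illustration, here is a minimal sketch of the Gaussian variant of soft-NMS, in which overlapping boxes have their scores decayed rather than being discarded outright. The choice of the Gaussian variant, the `sigma` and `score_thresh` values, and the (x1, y1, x2, y2) box format are assumptions, not settings reported by the authors.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS; returns (box, score) pairs in selection order."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Pick the highest-scoring box that is still in play.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        kept.append((box, score))
        # Decay the scores of the others according to their overlap with it.
        decayed = []
        for other, s in remaining:
            s *= math.exp(-(iou(box, other) ** 2) / sigma)
            if s >= score_thresh:
                decayed.append((other, s))
        remaining = decayed
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 3))
```

In this toy input the second box overlaps the first heavily, so its score drops well below 0.8 and it is ranked after the distant third box, but it is not removed as it would be under hard NMS with a typical threshold.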

Multi-Scale Geospatial Object Detection Based on Shallow-Deep Feature Extraction

Remote Sensing ◽ 10.3390/rs11212525 ◽ 2019 ◽ Vol 11 (21) ◽ pp. 2525 ◽ Cited By ~ 5

Author(s): Dalal AL-Alimi ◽ Yuxiang Shao ◽ Ruyi Feng ◽ Mohammed A. A. Al-qaness ◽ Mohamed Abd Elaziz ◽ ...

Keyword(s): Feature Extraction ◽ Object Detection ◽ Extraction Methods ◽ Aerial Images ◽ Detection Accuracy ◽ Feature Maps ◽ Deep Convolutional Neural Networks ◽ Multi Scale ◽ Deep Feature ◽ Deep Feature Extraction

Multi-class detection in remote sensing images (RSIs) has garnered wide attention and enabled service applications in many fields, including civil and military ones. However, several factors make detection in aerial images more challenging than in natural scene images: objects do not have a fixed size, often appear at widely varying scales, sometimes appear in dense groups (such as vehicles and storage tanks), and have different surroundings or background areas. Furthermore, all of this makes the manual annotation of objects very complex and costly. Feature extraction methods have a powerful effect on object detection, and deep convolutional neural networks (CNNs) extract deep features more successfully than traditional methods. This study introduces a novel network structure and a feature extraction method, named shallow-deep feature extraction (SDFE), which employs a squeeze-and-excitation network (SENet) and a residual network (ResNet) to obtain feature maps and improves resolution and localization at the same time. Furthermore, the model reduces the loss of dense groups and small objects and provides higher and more stable detection accuracy, which is not significantly affected by changes in the intersection over union (IoU) threshold, and overcomes the difficulties of RSIs. Moreover, this study presents strong evidence about the factors that affect detection in RSIs. The proposed shallow-deep and multi-scale (SD-MS) method outperforms other approaches on the ten classes of the NWPU VHR-10 dataset.
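
For readers unfamiliar with the squeeze-and-excitation idea that SDFE builds on, the following is a minimal sketch of a generic SE operation on a tiny feature map, written with plain Python lists. It is not the SDFE architecture: the function name, the omission of bias terms, and the toy weights are assumptions for illustration. Each channel is averaged ("squeeze"), passed through two small fully connected layers with ReLU and sigmoid ("excitation"), and then rescaled by the resulting gate.

```python
import math


def squeeze_excite(feature, w1, w2):
    """Generic squeeze-and-excitation on a feature map (biases omitted).

    feature: list of C channels, each a list of H rows of W floats.
    w1: reduction weights, shape (C_reduced, C).
    w2: expansion weights, shape (C, C_reduced).
    """
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature, gates)]


# Two 2x2 channels, reduced to one hidden unit (toy weights).
feature = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
w1 = [[0.5, 0.5]]
w2 = [[0.0], [0.0]]  # zero weights give a gate of exactly 0.5 per channel
print(squeeze_excite(feature, w1, w2))
```

With the zero expansion weights used here, every gate is sigmoid(0) = 0.5, so the output is the input halved; trained weights would instead emphasise some channels and suppress others.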

Mask OBB: A Semantic Attention-Based Mask Oriented Bounding Box Representation for Multi-Category Object Detection in Aerial Images

Remote Sensing ◽ 10.3390/rs11242930 ◽ 2019 ◽ Vol 11 (24) ◽ pp. 2930 ◽ Cited By ~ 6

Author(s): Jinwang Wang ◽ Jian Ding ◽ Haowen Guo ◽ Wensheng Cheng ◽ Ting Pan ◽ ...

Keyword(s): Object Detection ◽ Empirical Studies ◽ Classification Problem ◽ Aerial Images ◽ Detection Methods ◽ Detection Accuracy ◽ Lateral Connection ◽ Feature Pyramid ◽ Bounding Boxes ◽ Definition Of

Object detection in aerial images is a fundamental yet challenging task in the remote sensing field. As most objects in aerial images appear in arbitrary orientations, oriented bounding boxes (OBBs) have a clear advantage over traditional horizontal bounding boxes (HBBs). However, regression-based OBB detection methods suffer from ambiguity in the definition of learning targets, which decreases detection accuracy. In this paper, we provide a comprehensive analysis of OBB representations and cast OBB regression as a pixel-level classification problem, which can largely eliminate the ambiguity. The predicted masks are subsequently used to generate OBBs. To handle the huge scale changes of objects in aerial images, an Inception Lateral Connection Network (ILCN) is utilized to enhance the Feature Pyramid Network (FPN). Furthermore, a Semantic Attention Network (SAN) is adopted to provide semantic features, which help distinguish objects of interest from the cluttered background effectively. Empirical studies show that the entire method is simple yet efficient. Experimental results on two widely used datasets, i.e., DOTA and HRSC2016, demonstrate that the proposed method outperforms state-of-the-art methods.

A Survey on Deep Learning Based Methods and Datasets for Monocular 3D Object Detection

Electronics ◽ 10.3390/electronics10040517 ◽ 2021 ◽ Vol 10 (4) ◽ pp. 517

Author(s): Seong-heum Kim ◽ Youngbae Hwang

Keyword(s): Deep Learning ◽ Object Detection ◽ Low Cost ◽ Detection Methods ◽ Future Research ◽ 3D Object ◽ Practical Applications ◽ Depth Sensors ◽ Significant Research ◽ 3D Object Detection

Owing to recent advancements in deep learning methods and relevant databases, it is becoming increasingly easy to recognize 3D objects using only RGB images from single viewpoints. This study investigates the major breakthroughs and current progress in deep learning-based monocular 3D object detection. For relatively low-cost data acquisition systems without depth sensors or cameras at multiple viewpoints, we first consider existing databases with 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, deep learning-based monocular 3D object detection methods that overcome significant research challenges are categorized and summarized. We present the key concepts and detailed descriptions of representative single-stage and multiple-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.

A Multi-Task Network with Distance–Mask–Boundary Consistency Constraints for Building Extraction from Aerial Images

Remote Sensing ◽ 10.3390/rs13142656 ◽ 2021 ◽ Vol 13 (14) ◽ pp. 2656

Author(s): Furong Shi ◽ Tong Zhang

Keyword(s): Distance Estimation ◽ Image Data ◽ Learning Technologies ◽ Aerial Images ◽ Superior Performance ◽ Aerial Image ◽ Great Success ◽ Building Extraction ◽ Shape Information ◽ Multi Scale

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during the down-sampling process, which results in discontinuous segmentation or inaccurate segmentation boundaries. To compensate for the loss of shape information, two shape-related auxiliary tasks (i.e., boundary prediction and distance estimation) were jointly learned with the building segmentation task in our proposed network. Meanwhile, two consistency constraint losses were designed based on the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on the multi-scale features, one regression loss and two classification losses were used for predicting the distance-transform map, segmentation, and boundary. Two inter-task consistency-loss functions were constructed to ensure the consistency between distance maps and masks, and the consistency between masks and boundary maps. Experimental results on three public aerial image data sets showed that our method achieved superior performance over the recent state-of-the-art models.
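
To make the relationship between masks, distance maps, and boundaries more tangible, here is a small brute-force sketch. It is an illustration under assumptions rather than the authors' formulation: the abstract does not say whether their distance transform is signed, truncated, or normalised, and the function names are hypothetical. The sketch computes, for every building pixel, the Euclidean distance to the nearest background pixel, and then recovers a boundary map from that distance map.

```python
import math


def distance_map(mask):
    """Distance from each foreground pixel to the nearest background pixel.

    mask: list of rows of 0/1 values. Background pixels get 0.0. If the
    mask has no background at all, every distance is left at 0.0.
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] and background:
                out[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return out


def boundary_from_distance(dist):
    """Foreground pixels that touch the background (4-connectivity)."""
    return [[1 if 0.0 < d <= 1.0 else 0 for d in row] for row in dist]


# A 3x3 building footprint inside a 5x5 tile.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_map(mask)
boundary = boundary_from_distance(dist)
print(dist[2][2])                     # 2.0
print(sum(map(sum, boundary)))        # 8
```

Because the boundary and the mask can both be derived from the distance map in this way, predictions for the three tasks can be checked against one another, which is the intuition behind inter-task consistency constraints.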

A new multi-scale backbone network for object detection based on asymmetric convolutions

Science Progress ◽ 10.1177/00368504211011343 ◽ 2021 ◽ Vol 104 (2) ◽ pp. 003685042110113

Author(s): Xianghua Ma ◽ Zhenkun Yang

Keyword(s): Object Detection ◽ Image Features ◽ Detection Accuracy ◽ Mobile Platforms ◽ Multi Scale ◽ Backbone Network ◽ Aspect Ratios ◽ Pascal Voc ◽ Scale Characteristics ◽ Detection Speed

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. It is widely recognized that although lightweight object detectors have a high detection speed, their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: they have different aspect ratios, which can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, namely the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features at different scales by allocating learnable weight parameters to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into a state-of-the-art backbone such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO 2014 datasets. Experimental results show that the detection precision can be greatly improved while a fast detection speed is maintained.
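
The idea of running asymmetric convolutions in parallel and merging them with learnable weights can be sketched in a few lines of plain Python. This is not the MAC block itself: the abstract does not give the kernel shapes or the weighting scheme, so the 1x3, 3x1, and 1x5 kernels and the softmax normalisation of the branch weights below are assumptions chosen only to illustrate the mechanism.

```python
import math


def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' size)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


def merge_branches(image, kernels, logits):
    """Weighted sum of parallel branches; weights are softmax(logits)."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    branches = [conv2d_same(image, k) for k in kernels]
    h, w = len(image), len(image[0])
    return [
        [sum(wt * b[r][c] for wt, b in zip(weights, branches)) for c in range(w)]
        for r in range(h)
    ]


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernels = [
    [[1.0, 1.0, 1.0]],            # 1x3: horizontal context (assumed shape)
    [[1.0], [1.0], [1.0]],        # 3x1: vertical context (assumed shape)
    [[0.0, 0.0, 1.0, 0.0, 0.0]],  # 1x5: wider horizontal kernel (assumed shape)
]
logits = [0.0, 0.0, 0.0]          # stand-ins for learnable branch weights
print(merge_branches(image, kernels, logits))
```

In a trained network the kernels and the branch logits would be learned jointly, so the block could favour whichever aspect ratio suits the objects in the data.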

MTI-YOLO: A Light-Weight and Real-Time Deep Neural Network for Insulator Detection in Complex Aerial Images

Energies ◽ 10.3390/en14051426 ◽ 2021 ◽ Vol 14 (5) ◽ pp. 1426

Author(s): Chuanyang Liu ◽ Yiquan Wu ◽ Jingjing Liu ◽ Jiaming Han

Keyword(s): Feature Detection ◽ Feature Fusion ◽ Memory Storage ◽ Aerial Images ◽ Detection Accuracy ◽ Composite Insulator ◽ Running Time ◽ Scale Feature ◽ Multi Scale ◽ Good Trade

Insulator detection is an essential task for the safe and reliable operation of intelligent grids. Because insulator images contain various background interferences, most traditional image-processing methods cannot achieve good performance. Some You Only Look Once (YOLO) networks have been employed to meet the requirements of practical insulator detection applications. To achieve a good trade-off among accuracy, running time, and memory storage, this work proposes the modified YOLO-tiny for insulator (MTI-YOLO) network for insulator detection in complex aerial images. First, composite insulator images are collected in common scenes and the “CCIN_detection” (Chinese Composite INsulator) dataset is constructed. Second, to improve the detection accuracy for insulators of different sizes, multi-scale feature detection headers, a multi-scale feature fusion structure, and the spatial pyramid pooling (SPP) model are adopted in the MTI-YOLO network. Finally, the proposed MTI-YOLO network and the compared networks are trained and tested on the “CCIN_detection” dataset. The average precision (AP) of our proposed network is 17% and 9% higher than that of YOLO-tiny and YOLO-v2, respectively. Compared with YOLO-tiny and YOLO-v2, the running time of the proposed network is slightly higher. Furthermore, the memory usage of the proposed network is 25.6% and 38.9% lower than that of YOLO-v2 and YOLO-v3, respectively. Experimental results and analysis validate that the proposed network achieves good performance in both complex backgrounds and bright illumination conditions.

Augmented Reality and Machine Learning Incorporation Using YOLOv3 and ARKit

Applied Sciences ◽ 10.3390/app11136006 ◽ 2021 ◽ Vol 11 (13) ◽ pp. 6006

Author(s): Huy Le ◽ Minh Nguyen ◽ Wei Qi Yan ◽ Hoa Nguyen

Keyword(s): Machine Learning ◽ Augmented Reality ◽ Object Detection ◽ Feature Detection ◽ Detection Methods ◽ Detection Accuracy ◽ Data Annotation ◽ Machine Learning Model ◽ Potential Benefits ◽ Feature Detection And Tracking

Augmented reality is one of the fastest growing fields, receiving increased funding over the last few years as people realise the potential benefits of rendering virtual information in the real world. Most of today’s marker-based augmented reality applications use local feature detection and tracking techniques. The disadvantage of these techniques is that the markers must be modified to match the unique classified algorithms, or they suffer from low detection accuracy. Machine learning is an ideal solution to overcome the current drawbacks of image processing in augmented reality applications. However, traditional data annotation requires extensive time and labour, as it is usually done manually. This study incorporates machine learning to detect and track augmented reality marker targets in an application using deep neural networks. We first implement an auto-generated dataset tool, which is used for machine learning dataset preparation. The final iOS prototype application incorporates object detection, object tracking and augmented reality. The machine learning model is trained to recognise the differences between targets using YOLOv3, one of the most well-known YOLO object detection methods. The final product makes use of ARKit, a valuable toolkit for developing augmented reality applications.
