Capsule Networks for Object Detection in UAV Imagery

2019 ◽  
Vol 11 (14) ◽  
pp. 1694 ◽  
Author(s):  
Mohamed Lamine Mekhalfi ◽  
Mesay Belete Bejiga ◽  
Davide Soresina ◽  
Farid Melgani ◽  
Begüm Demir

Recent advances in Convolutional Neural Networks (CNNs) have attracted great attention in remote sensing due to their high capability to model the high-level semantic content of Remote Sensing (RS) images. However, CNNs do not explicitly retain the relative position of objects in an image and, thus, the effectiveness of the obtained features is limited for complex object detection problems. To address this problem, in this paper we introduce Capsule Networks (CapsNets) for object detection in Unmanned Aerial Vehicle-acquired images. Unlike CNNs, CapsNets extract and exploit information about objects’ relative positions across several layers, which enables parsing crowded scenes with overlapping objects. Experimental results obtained on two datasets for car and solar panel detection problems show that CapsNets provide object detection accuracies similar to state-of-the-art deep models with significantly reduced computational time. This is due to the fact that CapsNets emphasize dynamic routing instead of depth.
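The dynamic routing the abstract refers to repeatedly applies CapsNet's "squash" nonlinearity, which keeps a capsule's output vector length in [0, 1) as a measure of detection confidence. A minimal illustrative sketch in plain Python (not the authors' implementation):

```python
import math

def squash(vector):
    """CapsNet 'squash' nonlinearity: rescales a capsule's output vector so
    its length lies in [0, 1) while preserving its direction. Short vectors
    shrink toward zero; long vectors approach unit length."""
    sq_norm = sum(x * x for x in vector)
    if sq_norm == 0.0:
        return [0.0 for _ in vector]
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / norm
    return [scale * x for x in vector]
```

Because the squashed length acts as a probability-like score, routing can iteratively strengthen agreement between lower- and higher-level capsules without adding depth, which is the source of the reduced computational cost the abstract mentions.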

Author(s):  
Jwalin Bhatt ◽  
Khurram Azeem Hashmi ◽  
Muhammad Zeshan Afzal ◽  
Didier Stricker

In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms, since off-the-shelf OCR components cannot process it reliably. An essential step in document analysis pipelines is therefore to detect these graphical components; doing so leads to a high-level conceptual understanding of documents that makes their digitization viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. In this work, we outline and summarize deep learning approaches for detecting graphical page objects in document images, discussing the most relevant approaches and the state of the art in graphical page object detection. This work provides a comprehensive understanding of the current state of the art and its related challenges. Furthermore, we discuss the leading datasets along with their quantitative evaluation, and briefly cover promising directions for further improvement.


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 886 ◽  
Author(s):  
Francisco Alarcón ◽  
Manuel García ◽  
Ivan Maza ◽  
Antidio Viguria ◽  
Aníbal Ollero

This article presents a precise landing system that allows rotary-wing UAVs to approach and land safely on moving platforms, without using GNSS at any stage of the landing maneuver, and with a centimeter level accuracy and high level of robustness. This system implements a novel concept where the relative position and velocity between the aerial vehicle and the landing platform are calculated from the angles of a cable that physically connects the UAV and the landing platform. The use of a cable also incorporates a number of extra benefits, such as increasing the precision in the control of the UAV altitude. It also facilitates centering the UAV right on top of the expected landing position, and increases the stability of the UAV just after contacting the landing platform. The system was implemented in an unmanned helicopter and many tests were carried out under different conditions for measuring the accuracy and the robustness of the proposed solution. Results show that the developed system allowed landing with centimeter accuracy by using only local sensors and that the helicopter could follow the landing platform in multiple trajectories at different velocities.
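The abstract does not give the geometry explicitly, but the core idea — recovering the platform's relative position from the angles of a taut cable of known length — can be sketched with simple spherical trigonometry. The angle names and conventions below are assumptions for illustration, not the paper's formulation:

```python
import math

def relative_position(cable_length, polar_deg, azimuth_deg):
    """Toy cable-angle localization model (an assumption, not the paper's
    exact method): given the cable length, the polar angle measured from the
    vertical, and the azimuth in the horizontal plane, recover the landing
    platform's position relative to the UAV's cable attachment point."""
    polar = math.radians(polar_deg)
    azimuth = math.radians(azimuth_deg)
    horizontal = cable_length * math.sin(polar)
    x = horizontal * math.cos(azimuth)
    y = horizontal * math.sin(azimuth)
    z = -cable_length * math.cos(polar)  # platform hangs below the UAV
    return x, y, z
```

With a hanging cable the angles are directly measurable at the attachment point, which is why the approach needs no GNSS: all quantities are local to the vehicle.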


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6530
Author(s):  
Ruihong Yin ◽  
Wei Zhao ◽  
Xudong Fan ◽  
Yongfeng Yin

There are a large number of studies on geospatial object detection. However, many existing methods only focus on either accuracy or speed. Methods with both fast speed and high accuracy are of great importance in some scenes, like search and rescue, and military information acquisition. In remote sensing images, there are some targets that are small and have few textures and low contrast compared with the background, which impose challenges on object detection. In this paper, we propose an accurate and fast single shot detector (AF-SSD) for high spatial remote sensing imagery to solve these problems. Firstly, we design a lightweight backbone to reduce the number of trainable parameters of the network. In this lightweight backbone, we also use some wide and deep convolutional blocks to extract more semantic information and keep the high detection precision. Secondly, a novel encoding–decoding module is employed to detect small targets accurately. With up-sampling and summation operations, the encoding–decoding module can add strong high-level semantic information to low-level features. Thirdly, we design a cascade structure with spatial and channel attention modules for targets with low contrast (named low-contrast targets) and few textures (named few-texture targets). The spatial attention module can extract long-range features for few-texture targets. By weighting each channel of a feature map, the channel attention module can guide the network to concentrate on easily identifiable features for low-contrast and few-texture targets. The experimental results on the NWPU VHR-10 dataset show that our proposed AF-SSD achieves superior detection performance: parameters 5.7 M, mAP 88.7%, and 0.035 s per image on average on an NVIDIA GTX-1080Ti GPU.
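The channel-attention step described above weights each channel of a feature map by a gate derived from its global response. A minimal sketch in plain Python (no learned weights — that simplification is an assumption for illustration, not the AF-SSD module):

```python
import math

def channel_attention(feature_map):
    """Minimal channel-attention sketch: each channel (an H x W grid) is
    'squeezed' by global average pooling, the pooled value passes through a
    sigmoid gate, and the whole channel is re-weighted by that gate, so
    strongly responding channels are emphasized over weak ones."""
    out = []
    for channel in feature_map:
        pooled = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        gate = 1.0 / (1.0 + math.exp(-pooled))  # sigmoid squeeze-gate
        out.append([[gate * v for v in row] for row in channel])
    return out
```

In a real detector the gate would come from a small learned bottleneck rather than the raw pooled value, but the re-weighting mechanism — guiding the network toward easily identifiable channels — is the same.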


2019 ◽  
Vol 11 (3) ◽  
pp. 286 ◽  
Author(s):  
Jiangqiao Yan ◽  
Hongqi Wang ◽  
Menglong Yan ◽  
Wenhui Diao ◽  
Xian Sun ◽  
...  

Recently, methods based on the Faster region-based convolutional neural network (R-CNN) have become popular for multi-class object detection in remote sensing images due to their outstanding detection performance. These methods generally propose candidate regions of interest (ROIs) through a region proposal network (RPN), and regions with a high enough intersection-over-union (IoU) value against the ground truth are treated as positive samples for training. In this paper, we find that the detection results of such methods are sensitive to the choice of IoU threshold. Specifically, detection performance on small objects is poor when a typical, higher threshold is chosen, while a lower threshold results in poor localization accuracy caused by a large number of false positives. To address these issues, we propose a novel IoU-Adaptive Deformable R-CNN framework for multi-class object detection. Specifically, by analyzing the different roles that IoU can play in different parts of the network, we propose an IoU-guided detection framework to reduce the loss of small-object information during training. In addition, an IoU-based weighted loss is designed, which learns the IoU information of positive ROIs to improve detection accuracy effectively. Finally, a class-aspect-ratio-constrained non-maximum suppression (CARC-NMS) is proposed, which further improves the precision of the results. Extensive experiments validate the effectiveness of our approach, and we achieve state-of-the-art detection performance on the DOTA dataset.
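The threshold sensitivity discussed above comes down to how IoU is computed and used to label proposals. A minimal sketch (illustrative only; `label_proposals` is a hypothetical helper, not the paper's code):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_proposals(proposals, ground_truth, threshold):
    """Proposals whose IoU with any ground-truth box clears the threshold
    become positive training samples -- the labeling step whose threshold
    choice small objects are sensitive to."""
    return [p for p in proposals
            if any(iou(p, g) >= threshold for g in ground_truth)]
```

A small object whose proposal overlaps the ground truth only partially can fall below a strict threshold and contribute no positive samples at all, which is exactly the information loss the IoU-guided framework targets.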


2018 ◽  
Vol 10 (9) ◽  
pp. 1470 ◽  
Author(s):  
Yun Ren ◽  
Changren Zhu ◽  
Shunping Xiao

Region-based convolutional networks have shown remarkable ability for object detection in optical remote sensing images. However, standard CNNs are inherently limited in modeling geometric transformations due to the fixed geometric structures of their building modules. To address this, we introduce a new module named deformable convolution that is integrated into the prevailing Faster R-CNN. By adding 2D offsets to the regular sampling grid of the standard convolution, it learns augmented spatial sampling locations from the target task without additional supervision. In our work, a deformable Faster R-CNN is constructed by substituting the standard convolution layers with deformable convolution layers in the last network stage. In addition, top-down and skip connections are adopted to produce a single high-level feature map of fine resolution, on which the predictions are made. To make the model robust to occlusion, a simple yet effective data augmentation technique is proposed for training the network. Experimental results show that our deformable Faster R-CNN improves the mean average precision by a large margin on the SORSI and HRRS datasets.
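Because the learned 2D offsets move sampling points off the integer pixel grid, deformable convolution reads feature values by bilinear interpolation. A minimal sketch of that sampling step (plain Python, illustrative only):

```python
import math

def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2D grid at the fractional location (y, x).
    Deformable convolution needs this because its learned offsets shift
    sampling locations off the integer grid. Assumes in-bounds coordinates:
    0 <= y <= H-1 and 0 <= x <= W-1."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(img) - 1)
    x1 = min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy
```

Each kernel tap then samples at (grid position + learned offset) instead of the fixed grid position, which is what lets the module adapt its receptive field to object shape.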


2020 ◽  
Vol 12 (1) ◽  
pp. 182 ◽  
Author(s):  
Lingxuan Meng ◽  
Zhixing Peng ◽  
Ji Zhou ◽  
Jirong Zhang ◽  
Zhenyu Lu ◽  
...  

Unmanned aerial vehicle (UAV) remote sensing and deep learning provide a practical approach to object detection. However, most current approaches for processing UAV remote-sensing data cannot carry out object detection in real time for emergencies such as firefighting. This study proposes a new approach for integrating UAV remote sensing and deep learning for the real-time detection of ground objects. Excavators, which often threaten pipeline safety, are selected as the target object. A widely used deep-learning algorithm, namely You Only Look Once V3, is first used to train the excavator detection model on a workstation, which is then deployed on an embedded board carried by a UAV. The recall rate of the trained excavator detection model is 99.4%, demonstrating that the trained model has very high accuracy. Then, a UAV-based excavator detection system (UAV-ED) is constructed for operational application. UAV-ED is composed of a UAV Control Module, a UAV Module, and a Warning Module. A UAV experiment with different scenarios was conducted to evaluate the performance of the UAV-ED. The whole process from the UAV's observation of an excavator to the Warning Module (350 km away from the testing area) receiving the detection results lasted only about 1.15 s. Thus, the UAV-ED system performs well and would benefit the management of pipeline safety.


2021 ◽  
Vol 13 (6) ◽  
pp. 1132
Author(s):  
Zhibao Wang ◽  
Lu Bai ◽  
Guangfu Song ◽  
Jie Zhang ◽  
Jinhua Tao ◽  
...  

Estimation of the number and geo-location of oil wells is important for policy makers, considering their impact on energy resource planning. With recent developments in optical remote sensing, it is possible to identify oil wells from satellite images. Moreover, recent advances in deep learning frameworks for object detection in remote sensing make it possible to detect oil wells from remote sensing images automatically. In this paper, we collected a dataset named Northeast Petroleum University–Oil Well Object Detection Version 1.0 (NEPU–OWOD V1.0) based on high-resolution remote sensing images from Google Earth Imagery. Our dataset includes 1192 oil wells in 432 images from Daqing City, which has the largest oilfield in China. In this study, we compared nine state-of-the-art deep learning models for object detection in optical remote sensing images. Experimental results show that these models achieve high precision on our collected dataset, which demonstrates the great potential of deep learning for oil well detection in remote sensing.


2021 ◽  
Vol 13 (11) ◽  
pp. 2163
Author(s):  
Zhou Huang ◽  
Huaixin Chen ◽  
Biyuan Liu ◽  
Zhixi Wang

Although remarkable progress has been made in salient object detection (SOD) in natural scene images (NSI), the SOD of optical remote sensing images (RSI) still faces significant challenges due to varying spatial resolutions, cluttered backgrounds, and complex imaging conditions, mainly for two reasons: (1) accurate localization of salient objects; and (2) subtle boundaries of salient objects. This paper explores the inherent properties of multi-level features to develop a novel semantic-guided attention refinement network (SARNet) for SOD of optical RSI. Specifically, the proposed semantic-guided decoder (SGD) roughly but accurately locates multi-scale objects by aggregating multiple high-level features, and this global semantic information then guides the integration of subsequent features in a step-by-step feedback manner to make full use of deep multi-level features. Simultaneously, the proposed parallel attention fusion (PAF) module combines cross-level features and semantic-guided information to refine the object's boundary and gradually highlight the entire object area. Finally, the proposed network architecture is trained through an end-to-end fully supervised model. Quantitative and qualitative evaluations on two public RSI datasets and additional NSI datasets across five metrics show that our SARNet is superior to 14 state-of-the-art (SOTA) methods without any post-processing.


2020 ◽  
Vol 34 (07) ◽  
pp. 11653-11660 ◽  
Author(s):  
Yudong Liu ◽  
Yongtao Wang ◽  
Siwei Wang ◽  
Tingting Liang ◽  
Qijie Zhao ◽  
...  

In existing CNN-based detectors, the backbone network is a very important component for basic feature extraction, and the performance of the detectors highly depends on it. In this paper, we aim to achieve better detection performance by building a more powerful backbone from existing ones like ResNet and ResNeXt. Specifically, we propose a novel strategy for assembling multiple identical backbones by composite connections between adjacent backbones, to form a more powerful backbone named Composite Backbone Network (CBNet). In this way, CBNet iteratively feeds the output features of the previous backbone, namely high-level features, as part of the input features to the succeeding backbone, in a stage-by-stage fashion, and finally the feature maps of the last backbone (named the Lead Backbone) are used for object detection. We show that CBNet can be very easily integrated into most state-of-the-art detectors and significantly improve their performance. For example, it boosts the mAP of FPN, Mask R-CNN and Cascade R-CNN on the COCO dataset by about 1.5 to 3.0 points. Moreover, experimental results show that instance segmentation results can be improved as well. Specifically, by simply integrating the proposed CBNet into the baseline detector Cascade Mask R-CNN, we achieve a new state-of-the-art result on the COCO dataset (mAP of 53.3) with a single model, which demonstrates the great effectiveness of the proposed CBNet architecture. Code will be made available at https://github.com/PKUbahuangliuhe/CBNet.
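The stage-by-stage composite connection described above can be sketched abstractly. Here each "backbone" is modeled as a list of per-stage functions, and merging features is simplified to addition — both assumptions for illustration, not the paper's exact architecture:

```python
def cbnet_forward(backbones, image):
    """CBNet composite-connection sketch: stage k of backbone i receives its
    own previous stage's output combined with the stage-k output of backbone
    i-1 (merging simplified here to '+'). The last (Lead) backbone's stage
    outputs would feed the detection head."""
    prev_stage_outputs = None
    for backbone in backbones:
        x = image
        stage_outputs = []
        for k, stage in enumerate(backbone):
            if prev_stage_outputs is not None:
                x = x + prev_stage_outputs[k]  # composite connection
            x = stage(x)
            stage_outputs.append(x)
        prev_stage_outputs = stage_outputs
    return prev_stage_outputs  # Lead Backbone features
```

Because the assembled backbones are identical, pretrained weights can initialize every copy, which is part of why CBNet drops into existing detectors so easily.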

