A brief review and challenges of object detection in optical remote sensing imagery

Shahid Karim; Ye Zhang; Shoulin Yin; Irfana Bibi; Ali Anwar Brohi

doi:10.3233/mgs-200330

Multiagent and Grid Systems ◽

10.3233/mgs-200330 ◽

2020 ◽

Vol 16 (3) ◽

pp. 227-243

Author(s):

Shahid Karim ◽

Ye Zhang ◽

Shoulin Yin ◽

Irfana Bibi ◽

Ali Anwar Brohi

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Object Detection ◽

Detection Methods ◽

Optical Remote Sensing ◽

Single Shot ◽

Processing Efficiency ◽

Regression Problem ◽

Detection Algorithms ◽

Sensing Applications

Traditional object detection algorithms and strategies struggle to meet the requirements of data processing efficiency, performance, speed and intelligence in object detection. By studying and imitating the cognitive ability of the brain, deep learning can analyze and process data features. It has a strong visualization ability and has become the mainstream approach in current object detection applications. First, we discuss the development of traditional object detection methods. Second, the object detection frameworks that combine region proposals with convolutional neural networks (CNNs), e.g. Region-based CNN (R-CNN), Spatial Pyramid Pooling Network (SPP-Net), Fast R-CNN and Faster R-CNN, are briefly characterized for optical remote sensing applications. The You Only Look Once (YOLO) algorithm is representative of the frameworks (e.g. YOLO and Single Shot MultiBox Detector (SSD)) that transform object detection into a regression problem. The limitations of remote sensing images and object detectors are highlighted and discussed. The feasibility and limitations of these approaches will help researchers prudently select appropriate image enhancements. Finally, the problems of deep learning object detection algorithms are summarized and future recommendations are offered.
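
The idea of treating detection as a regression problem can be illustrated with a minimal sketch of how a ground-truth box might be encoded as a regression target in a YOLO-style grid. The function name `encode_yolo_target`, the 7 × 7 grid and the 448 × 448 image size in the usage note are assumptions chosen for illustration; they follow the general scheme of the first YOLO version and are not taken from the reviewed paper.

```python
def encode_yolo_target(box, image_w, image_h, grid_size=7):
    """Encode a pixel-space box (x_min, y_min, x_max, y_max) as a grid-cell
    regression target. Hypothetical helper for illustration only."""
    x_min, y_min, x_max, y_max = box

    # Box centre, normalised to [0, 1] relative to the image size.
    cx = (x_min + x_max) / 2.0 / image_w
    cy = (y_min + y_max) / 2.0 / image_h

    # The grid cell containing the centre is responsible for this object.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)

    # Regression targets: centre offset inside the cell, plus width and
    # height as fractions of the whole image.
    x_offset = cx * grid_size - col
    y_offset = cy * grid_size - row
    width = (x_max - x_min) / image_w
    height = (y_max - y_min) / image_h

    return row, col, (x_offset, y_offset, width, height)
```

For example, `encode_yolo_target((100, 200, 300, 400), 448, 448)` assigns the box to the cell in row 4, column 3, and the network would then be trained to regress the four continuous values for that cell.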

Object detection on remote sensing images using deep learning: an improved single shot multibox detector method

Journal of Electronic Imaging ◽

10.1117/1.jei.28.3.033026 ◽

2019 ◽

Vol 28 (03) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Kun Zhao ◽

Xiaoxi Ren ◽

Zhenzhen Kong ◽

Min Liu

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Object Detection ◽

Single Shot ◽

Remote Sensing Images ◽

Detector Method

Ship Detection Based on YOLOv2 for SAR Imagery

Remote Sensing ◽

10.3390/rs11070786 ◽

2019 ◽

Vol 11 (7) ◽

pp. 786 ◽

Cited By ~ 41

Author(s):

Yang-Lang Chang ◽

Amare Anagaw ◽

Lena Chang ◽

Yi Wang ◽

Chih-Yu Hsiao ◽

...

Keyword(s):

Deep Learning ◽

Object Detection ◽

Real Time ◽

Experimental Results ◽

Detection Methods ◽

Computational Time ◽

Detection Accuracy ◽

Single Shot ◽

Ship Detection ◽

Sar Imagery

Synthetic aperture radar (SAR) imagery has been used as a promising data source for monitoring maritime activities, and its application to oil and ship detection has been the focus of many previous studies. Many object detection methods, ranging from traditional to deep learning approaches, have been proposed. However, the majority of them are computationally intensive and have accuracy problems. The huge volume of remote sensing data also poses a challenge for real-time object detection. To mitigate this problem, a high performance computing (HPC) method has been proposed to accelerate SAR imagery analysis using GPU-based computing. In this paper, we propose an enhanced GPU-based deep learning method to detect ships in SAR images. The You Only Look Once version 2 (YOLOv2) deep learning framework is used to model the architecture and train the model. YOLOv2 is a state-of-the-art real-time object detection system that outperforms the Faster Region-Based Convolutional Network (Faster R-CNN) and Single Shot Multibox Detector (SSD) methods. Additionally, in order to reduce computational time while keeping detection accuracy relatively competitive, we develop a new architecture with fewer layers, called YOLOv2-reduced. In the experiments, we use two datasets for training and testing: the SAR Ship Detection Dataset (SSDD) and the Diversified SAR Ship Detection Dataset (DSSDD). YOLOv2 test results showed an increase in ship detection accuracy as well as a noticeable reduction in computational time compared to Faster R-CNN. The proposed YOLOv2 architecture achieves an accuracy of 90.05% and 89.13% on the SSDD and DSSDD datasets, respectively. The proposed YOLOv2-reduced architecture has a detection performance similar to that of YOLOv2, but with less computational time on an NVIDIA TITAN X GPU. The experimental results show that deep learning can make a big leap forward in improving the performance of SAR image ship detection.

A Review of Remote Sensing Image Object Detection Algorithms Based on Deep Learning

2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC) ◽

10.1109/icivc50857.2020.9177453 ◽

2020 ◽

Author(s):

Zhe Zheng ◽

Lin Lei ◽

Hao Sun ◽

Gangyao Kuang

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Object Detection ◽

Remote Sensing Image ◽

Detection Algorithms ◽

Image Object Detection ◽

Image Object

VaryBlock: A Novel Approach for Object Detection in Remote Sensed Images

Sensors ◽

10.3390/s19235284 ◽

2019 ◽

Vol 19 (23) ◽

pp. 5284 ◽

Cited By ~ 3

Author(s):

Heng Zhang ◽

Jiayu Wu ◽

Yanli Liu ◽

Jia Yu

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Poor Performance ◽

Detection Methods ◽

Optical Remote Sensing ◽

Remote Sensing Images ◽

Novel Approach ◽

Challenging Tasks ◽

The Mean ◽

General Object

In recent years, research on optical remote sensing images has received increasing attention. Object detection, one of the most challenging tasks in remote sensing, has been remarkably advanced by convolutional neural network (CNN)-based methods such as You Only Look Once (YOLO) and Faster R-CNN. However, due to the complexity of backgrounds and the distinctive object distribution, directly applying these general object detection methods to remote sensing object detection usually yields poor performance. To tackle this problem, a highly efficient and robust framework based on YOLO is proposed. We devise VaryBlock and integrate it into the architecture, which effectively offsets some of the information loss caused by downsampling. In addition, several techniques are used to improve performance and avoid overfitting. Experimental results show that the proposed method improves the mean average precision by a large margin on the NWPU VHR-10 dataset.

An Oil Well Dataset Derived from Satellite-Based Remote Sensing

Remote Sensing ◽

10.3390/rs13061132 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1132

Author(s):

Zhibao Wang ◽

Lu Bai ◽

Guangfu Song ◽

Jie Zhang ◽

Jinhua Tao ◽

...

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Object Detection ◽

State Of The Art ◽

Google Earth ◽

Oil Wells ◽

Oil Well ◽

Optical Remote Sensing ◽

Learning Models ◽

Remote Sensing Images

Estimation of the number and geo-location of oil wells is important for policy holders considering their impact on energy resource planning. With recent developments in optical remote sensing, it is possible to identify oil wells from satellite images. Moreover, recent advances in deep learning frameworks for object detection in remote sensing make it possible to detect oil wells automatically from remote sensing images. In this paper, we collected a dataset named Northeast Petroleum University–Oil Well Object Detection Version 1.0 (NEPU–OWOD V1.0) based on high-resolution remote sensing images from Google Earth Imagery. Our database includes 1192 oil wells in 432 images from Daqing City, which has the largest oilfield in China. In this study, we compared nine state-of-the-art deep learning models for object detection in optical remote sensing images. Experimental results show that these models achieve high precision on our collected dataset, which demonstrates the great potential for oil well detection in remote sensing.

RESOLUTION-AWARE NETWORK WITH ATTENTION MECHANISMS FOR REMOTE SENSING OBJECT DETECTION

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-909-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 909-916

Author(s):

Z. Tian ◽

W. Wang ◽

B. Tian ◽

R. Zhan ◽

J. Zhang

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Attention Mechanism ◽

Image Resolution ◽

Uneven Distribution ◽

Detection Methods ◽

Optical Remote Sensing ◽

Feature Maps ◽

Optical Remote Sensing Image ◽

Comprehensive Evaluations

Deep-learning-based object detection methods are increasingly applied to the interpretation of optical remote sensing images. Although these methods can obtain promising results in general conditions, the designed networks usually ignore the characteristics of remote sensing images, such as large image resolution and the uneven distribution of object locations. In this paper, an effective detection method based on the convolutional neural network is proposed. First, to make the designed network better suited to the image resolution, EfficientNet is incorporated into the detection framework as the backbone network. EfficientNet employs the compound scaling method to adjust the depth and width of the network, thereby meeting the needs of different input image resolutions. Then, an attention mechanism is introduced into the proposed method to improve the extracted feature maps. The attention mechanism makes the network focus more on the object areas while reducing the influence of the background areas, and thus reduces the influence of the uneven distribution. Comprehensive evaluations on a public object detection dataset demonstrate the effectiveness of the proposed method.
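
The compound scaling mentioned above can be sketched as follows. In the original EfficientNet formulation, a single coefficient phi scales network depth, width and input resolution together through fixed base factors. The constants below (1.2, 1.1, 1.15) are the ones reported for the EfficientNet baseline; the function name `compound_scaling` is hypothetical, and nothing here reflects the specific configuration used by the detector in this abstract.

```python
# Base factors reported for EfficientNet, chosen so that
# ALPHA * BETA**2 * GAMMA**2 is approximately 2, i.e. each unit increase
# of phi roughly doubles the compute cost.
ALPHA = 1.2   # depth factor
BETA = 1.1    # width factor
GAMMA = 1.15  # resolution factor


def compound_scaling(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi.
    Illustrative helper, not part of any library API."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution_mult = GAMMA ** phi
    return depth_mult, width_mult, resolution_mult


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scaling(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```

A larger phi gives a deeper, wider network that expects higher-resolution input, which is why this family of backbones is a natural fit when input image sizes vary.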

Mapping Utility Poles in Aerial Orthoimages Using ATSS Deep Learning Method

Sensors ◽

10.3390/s20216070 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6070

Author(s):

Matheus Gomes ◽

Jonathan Silva ◽

Diogo Gonçalves ◽

Pedro Zamboni ◽

Jader Perez ◽

...

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Urban Areas ◽

Aerial Images ◽

Detection Methods ◽

Ground Sample ◽

Sensing Applications ◽

Bounding Box ◽

Utility Poles ◽

Remote Sensing Applications

Mapping utility poles using side-view images acquired with car-mounted cameras is a time-consuming task, especially in larger areas, due to the need for street-by-street surveying. Aerial images cover larger areas and can be a feasible alternative, although detecting and mapping utility poles in urban environments using top-view images is challenging. Thus, we propose the use of Adaptive Training Sample Selection (ATSS) for detecting utility poles in urban areas, since it is a novel method that has not yet been investigated in remote sensing applications. Here, we compared ATSS with Faster Region-based Convolutional Neural Networks (Faster R-CNN) and Focal Loss for Dense Object Detection (RetinaNet), both currently used in remote sensing applications, to assess the performance of the proposed methodology. We used 99,473 patches of 256 × 256 pixels with a ground sample distance (GSD) of 10 cm. The patches were divided into training, validation and test datasets in approximate proportions of 60%, 20% and 20%, respectively. As the utility pole labels are point coordinates and the object detection methods require a bounding box, we assessed the influence of the bounding box size on the ATSS method by varying the dimensions from 30 × 30 to 70 × 70 pixels. For the proposed task, our findings show that ATSS is, on average, 5% more accurate than Faster R-CNN and RetinaNet. For a bounding box size of 40 × 40, we achieved an Average Precision at an intersection over union of 50% (AP50) of 0.913 for ATSS, 0.875 for Faster R-CNN and 0.874 for RetinaNet. Regarding the influence of the bounding box size on ATSS, our results indicate that the AP50 is about 6.5% higher for 60 × 60 compared to 30 × 30. For AP75, this margin reaches 23.1% in favor of the 60 × 60 bounding box size. In terms of computational cost, all the methods tested remain at the same level, with an average processing time of around 0.048 s per patch. Our findings show that ATSS outperforms the other methodologies and is suitable for developing operational tools that can automatically detect and map utility poles.
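
Two steps described above, turning a point label into a fixed-size bounding box and checking a detection against the 50% and 75% intersection-over-union thresholds behind AP50 and AP75, can be sketched as below. This is a minimal illustration under assumed conventions (boxes as `(x_min, y_min, x_max, y_max)` in pixels); the helper names are hypothetical and this is not the authors' evaluation code.

```python
def point_to_box(cx, cy, size):
    """Build a square box of side `size` centred on a point label."""
    half = size / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    gt = point_to_box(100, 100, 40)    # 40 x 40 box around the pole label
    pred = point_to_box(110, 100, 40)  # prediction shifted 10 px to the right
    print(iou(pred, gt))                        # IoU of the pair
    print(is_true_positive(pred, gt, 0.5))      # AP50-style check
    print(is_true_positive(pred, gt, 0.75))     # AP75-style check
```

With a 40 × 40 box, a 10-pixel shift still passes the 50% threshold but fails the 75% one, which gives some intuition for why the choice of box size affects AP75 more strongly than AP50.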

Efficient Object Detection Framework and Hardware Architecture for Remote Sensing Images

Remote Sensing ◽

10.3390/rs11202376 ◽

2019 ◽

Vol 11 (20) ◽

pp. 2376 ◽

Cited By ~ 4

Author(s):

Li ◽

Zhang ◽

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Computational Complexity ◽

Object Detection ◽

Graphics Processing Units ◽

Feature Fusion ◽

Hardware Architecture ◽

Single Shot ◽

Remote Sensing Images ◽

Feature Maps

Object detection in remote sensing images on a satellite or aircraft has important economic and military significance and is full of challenges. This task requires not only accurate and efficient algorithms, but also a high-performance, low-power hardware architecture. However, existing deep-learning-based object detection algorithms require further optimization in small object detection, computational complexity and parameter size. Meanwhile, general-purpose processors cannot achieve good power efficiency, and previous deep learning processor designs still have untapped potential for exploiting parallelism. To address these issues, we propose an efficient context-based feature fusion single shot multibox detector (CBFF-SSD) framework, using the lightweight MobileNet as the backbone network to reduce parameters and computational complexity, and adding feature fusion units and detecting feature maps to enhance the recognition of small objects and improve detection accuracy. Based on the analysis and optimization of the computation of each layer in the algorithm, we propose an efficient hardware architecture for a deep learning processor with multiple neural processing units (NPUs) composed of 2D processing elements (PEs), which can calculate multiple output feature maps simultaneously. The parallel architecture, hierarchical on-chip storage organization and local registers are used to achieve parallel processing, sharing and reuse of data, and to make the processor's computation more efficient. Extensive experiments and comprehensive evaluations on the public NWPU VHR-10 dataset, and comparisons with some state-of-the-art approaches, demonstrate the effectiveness and superiority of the proposed framework. Moreover, to evaluate the performance of the proposed hardware architecture, we implement it on a Xilinx XC7Z100 field programmable gate array (FPGA) and test it on the proposed CBFF-SSD and VGG16 models. Experimental results show that our processor is more power efficient than general-purpose central processing units (CPUs) and graphics processing units (GPUs), and has better performance density than other state-of-the-art FPGA-based designs.

Fish Species Recognition with Faster R-CNN Inception-v2 using QUT FISH Dataset

Lontar Komputer Jurnal Ilmiah Teknologi Informasi ◽

10.24843/lkjiti.2020.v11.i03.p03 ◽

2020 ◽

Vol 11 (3) ◽

pp. 144

Author(s):

Yonatan Adiwinata ◽

Akane Sasaoka ◽

I Putu Agung Bayupati ◽

Oka Sudana

Keyword(s):

Deep Learning ◽

Object Detection ◽

Fish Species ◽

Species Recognition ◽

Species Conservation ◽

Detection Methods ◽

Single Shot ◽

Natural Ecosystems ◽

Efficient Technology

Fish species conservation has a big impact on the balance of natural ecosystems. Efficient technology for identifying fish species could help fish conservation. The most recent related research classified fish species using deep learning methods, most of which were based on convolutional layers or Convolutional Neural Networks (CNNs). This research experimented with a deep-learning-based object detection method, Faster R-CNN, which makes it possible to recognize the species of fish in an image without additional image preprocessing. The aim was to assess the performance of Faster R-CNN against other object detection methods, such as SSD, in fish species detection. The fish dataset used was the QUT FISH Dataset. The accuracy of Faster R-CNN reached 80.4%, far above the 49.2% accuracy of the Single Shot Detector (SSD) model.

A Single Shot Framework with Multi-Scale Feature Fusion for Geospatial Object Detection

Remote Sensing ◽

10.3390/rs11050594 ◽

2019 ◽

Vol 11 (5) ◽

pp. 594 ◽

Cited By ~ 11

Author(s):

Shuo Zhuang ◽

Ping Wang ◽

Boran Jiang ◽

Gang Wang ◽

Cong Wang

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Large Scale ◽

Feature Fusion ◽

Aerial Images ◽

Detection Methods ◽

Single Shot ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale

With the rapid advances in remote-sensing technologies and the growing number of satellite images, fast and effective object detection plays an important role in understanding and analyzing image information, which can be further applied in civilian and military fields. Recently, object detection methods based on region-based convolutional neural networks have shown excellent performance. However, these two-stage methods contain region proposal generation and object detection procedures, resulting in low computation speed. Because of the high cost of manual annotation, well-annotated aerial images are scarce, which also limits the progress of geospatial object detection in remote sensing. In this paper, on the one hand, we construct and release a large-scale remote-sensing dataset for geospatial object detection (RSD-GOD) that consists of 5 different categories with 18,187 annotated images and 40,990 instances. On the other hand, we design a single shot detection framework with multi-scale feature fusion. The feature maps from different layers are fused through up-sampling and concatenation blocks to predict the detection results. High-level features with semantic information and low-level features with fine details are fully explored for detection tasks, especially for small objects. Meanwhile, a soft non-maximum suppression strategy is used to select the final detection results. Extensive experiments have been conducted on two datasets to evaluate the designed network. Results show that the proposed approach achieves good detection performance, with a mean average precision of 89.0% on the newly constructed RSD-GOD dataset and 83.8% on the Northwestern Polytechnical University very high spatial resolution-10 (NWPU VHR-10) dataset at 18 frames per second (FPS) on an NVIDIA GTX-1080Ti GPU.
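
The soft non-maximum suppression step mentioned above can be sketched as follows. Instead of discarding every box that overlaps the current best detection, soft NMS lowers the scores of overlapping boxes in proportion to their overlap. The abstract does not say which decay function is used, so the Gaussian variant, the parameter names `sigma` and `score_threshold`, and their default values are assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft NMS. Returns (box, score) pairs in selection order."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the highest-scoring remaining detection.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))

        # Decay the scores of the others according to their overlap with
        # the selected box; drop those that fall below the threshold.
        rescored = []
        for box, score in remaining:
            overlap = iou(best_box, box)
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_threshold:
                rescored.append((box, new_score))
        remaining = rescored
    return kept
```

In this sketch a heavily overlapping box is kept with a reduced score rather than removed outright, which is the property that tends to help with densely packed objects.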
