Multiscale Rotated Bounding Box-Based Deep Learning Method for Detecting Ship Targets in Remote Sensing Images

Sensors ◽  
2018 ◽  
Vol 18 (8) ◽  
pp. 2702 ◽  
Author(s):  
Shuxin Li ◽  
Zhilong Zhang ◽  
Biao Li ◽  
Chuwei Li

Since remote sensing images are captured from above the target, such as from a satellite or aircraft platform, ship targets can appear at any orientation. When ship targets are detected using horizontal bounding boxes, the box contains background clutter. This clutter makes it harder to detect the ship and find its precise location, especially when targets are in close proximity or close to the shore. To solve these problems, this paper proposes a deep learning algorithm that uses multiscale rotated bounding boxes to detect ship targets in a complex background and obtain the location and orientation of each ship. When labeling the oriented targets, we use the five-parameter method to ensure that the box remains rectangular. The algorithm uses a pretrained deep network to extract features and then splits into two branches to output the result. One branch predicts the target class, while the other predicts the location and angle information. In the training stage, we match the prior multiscale rotated bounding boxes to the ground-truth bounding boxes to obtain the positive sample information and use it to train the deep learning model. When matching the rotated bounding boxes, we narrow down the selection scope to reduce the amount of calculation. In the testing stage, we use the trained model to predict and obtain the final result after score thresholding and non-maximum suppression post-processing. Experiments conducted on a remote sensing dataset show that the algorithm is robust in detecting ship targets under complex conditions, such as wave clutter backgrounds, targets in close proximity, ships close to the shore, and multiscale variation. Compared to other algorithms, our algorithm not only exhibits better performance in ship detection but also obtains the precise location and orientation information of the ship.
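To illustrate why a five-parameter label always describes a rectangle, the following minimal sketch converts (cx, cy, w, h, theta) into four corner points. The function name, the corner order, and the angle convention (radians, rotation about the center) are assumptions made for illustration; the abstract does not specify the paper's exact convention.

```python
import math

def rotated_box_corners(cx, cy, w, h, theta):
    """Return the four corners of a box given as (cx, cy, w, h, theta).

    Assumed convention (not taken from the paper): theta is in radians
    and rotates the box about its center (cx, cy).
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = w / 2.0, h / 2.0
    # Corner offsets of the unrotated box, listed in order around the box.
    offsets = ((-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h))
    corners = []
    for dx, dy in offsets:
        # Standard 2-D rotation of the offset, then shift to the center.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners
```

Because the corners are generated by rotating a fixed rectangle, the side lengths stay w and h for any angle, which is the property that regressing four free vertices would not guarantee.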

2021 ◽  
Vol 13 (22) ◽  
pp. 4517
Author(s):  
Falin Wu ◽  
Jiaqi He ◽  
Guopeng Zhou ◽  
Haolun Li ◽  
Yushuang Liu ◽  
...  

Object detection in remote sensing images plays an important role in both military and civilian remote sensing applications. Objects in remote sensing images differ from those in natural images. They are characterized by scale diversity, arbitrary orientation, and dense arrangement, which cause difficulties in object detection. For oblique, densely arranged objects with a large aspect ratio, using an oriented bounding box can help to avoid deleting correct detection bounding boxes by mistake. The classic rotational region convolutional neural network (R2CNN) has advantages for text detection. However, R2CNN performs poorly in detecting slender objects with arbitrary orientation in remote sensing images, and its fault tolerance rate is low. To solve this problem, this paper proposes an improved R2CNN based on a double detection head structure and a three-point regression method, namely, TPR-R2CNN. The proposed network modifies the original R2CNN network structure by applying a double fully connected (2-fc) detection head and classification fusion. One detection head is for classification and horizontal bounding box regression; the other is for classification and oriented bounding box regression. The three-point regression method (TPR) is proposed for oriented bounding box regression, which determines the position of the oriented bounding box by regressing the coordinates of the center point and the first two vertices. The proposed network was validated on the DOTA-v1.5 and HRSC2016 datasets, where it achieved mean average precision (mAP) gains of 3.90% and 15.27%, respectively, over feature pyramid network (FPN) baselines with a ResNet-50 backbone.
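As a rough illustration of how a center point and two vertices can fix an oriented box, the sketch below recovers the remaining two vertices by reflecting the known ones through the center. This is an assumed decoding step, not the paper's stated procedure, and the function name is hypothetical.

```python
def complete_box_from_three_points(center, v1, v2):
    """Recover all four vertices from the center and two adjacent vertices.

    Assumption for illustration: the opposite vertices are the point
    reflections of v1 and v2 through the center. The result is a
    rectangle only if v1 and v2 are equally far from the center.
    """
    cx, cy = center
    v3 = (2 * cx - v1[0], 2 * cy - v1[1])  # vertex opposite v1
    v4 = (2 * cx - v2[0], 2 * cy - v2[1])  # vertex opposite v2
    return [v1, v2, v3, v4]
```

Reflection through the center always yields a parallelogram; the rectangle constraint would have to be enforced separately, for example in the regression targets.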


2020 ◽  
Vol 12 (1) ◽  
pp. 143 ◽  
Author(s):  
Xiaoliang Qian ◽  
Sheng Lin ◽  
Gong Cheng ◽  
Xiwen Yao ◽  
Hangli Ren ◽  
...  

The objective of detection in remote sensing images is to determine the location and category of all targets in these images. Anchor-based methods are the most prevalent deep learning-based methods, but they still have some problems that need to be addressed. First, the existing metric (i.e., intersection over union (IoU)) cannot measure the distance between two bounding boxes when they do not overlap. Second, the existing bounding box regression loss cannot directly optimize the metric in the training process. Third, existing methods that adopt a hierarchical deep network choose only a single-level feature layer for the feature extraction of region proposals, meaning they do not take full advantage of multi-level features. To resolve the above problems, a novel object detection method for remote sensing images based on improved bounding box regression and multi-level features fusion is proposed in this paper. First, a new metric named generalized IoU is applied, which can quantify the distance between two bounding boxes, regardless of whether they overlap. Second, a novel bounding box regression loss is proposed, which can not only optimize the new metric (i.e., generalized IoU) directly but also overcome the problem that the existing bounding box regression loss based on the new metric cannot adaptively change the gradient based on the metric value. Finally, a multi-level features fusion module is proposed and incorporated into the existing hierarchical deep network, which can make full use of the multi-level features for each region proposal. The quantitative comparisons between the proposed method and the baseline method on the large-scale dataset DIOR demonstrate that incorporating the proposed bounding box regression loss, the multi-level features fusion module, and a combination of both into the baseline method yields absolute gains of approximately 0.7%, 1.4%, and 2.2% in terms of mAP, respectively. Comparison with state-of-the-art methods demonstrates that the proposed method achieves state-of-the-art performance. The curves of average precision at different thresholds show that the advantage of the proposed method is more evident when the threshold of generalized IoU (or IoU) is relatively high, which means that the proposed method can improve the precision of object localization. Similar conclusions can be obtained on the NWPU VHR-10 dataset.
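The following minimal sketch shows how generalized IoU stays informative for non-overlapping boxes. It uses the standard definition (IoU minus the share of the smallest enclosing box not covered by the union) for axis-aligned boxes given as (x1, y1, x2, y2). The function name and box format are assumptions, and this is not the paper's proposed regression loss.

```python
def generalized_iou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area. The value lies in (-1, 1].
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection is zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h
    return iou - (enclose - union) / enclose
```

For two disjoint boxes, plain IoU is 0 no matter how far apart they are, while this value becomes more negative as the gap grows, so it can still rank candidate boxes.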


2021 ◽  
Vol 26 (1) ◽  
pp. 200-215
Author(s):  
Muhammad Alam ◽  
Jian-Feng Wang ◽  
Cong Guangpei ◽  
LV Yunrong ◽  
Yuanfang Chen

In recent years, the success of deep learning in natural scene image processing has boosted its application in the analysis of remote sensing images. In this paper, we apply Convolutional Neural Networks (CNN) to the semantic segmentation of remote sensing images. We improve the Encoder-Decoder CNN structures SegNet (with index pooling) and U-net to make them suitable for multi-target semantic segmentation of remote sensing images. The results show that these two models have their own advantages and disadvantages in the segmentation of different objects. In addition, we propose an integrated algorithm that combines these two models. Experimental results show that the presented integrated algorithm can exploit the advantages of both models for multi-target segmentation and achieve better segmentation than either model alone.


2021 ◽  
Vol 13 (13) ◽  
pp. 2524
Author(s):  
Ziyi Chen ◽  
Dilong Li ◽  
Wentao Fan ◽  
Haiyan Guan ◽  
Cheng Wang ◽  
...  

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. In recent research, the self-attention module has attracted wide interest in many fields, including building extraction. However, most current deep learning models equipped with a self-attention module still overlook the effectiveness of reconstruction bias. By tipping the balance between the abilities of encoding and decoding, i.e., making the decoding network much more complex than the encoding network, the semantic segmentation ability is reinforced. To address the lack of research on combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines self-attention and reconstruction-bias modules. In the encoding part, a self-attention module is added to learn the attention weights of the inputs. Through the self-attention module, the network pays more attention to positions where there may be salient regions. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets: the WHU and Massachusetts Building datasets. We achieve IoU scores of 89.39% and 73.49% for the WHU and Massachusetts Building datasets, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method’s results are satisfactory.
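For readers unfamiliar with the IoU score quoted for building extraction, here is a minimal sketch of the metric on binary masks stored as nested lists. The function name and the choice to return 1.0 when both masks are empty are assumptions for illustration, not the paper's evaluation code.

```python
def mask_iou(pred, target):
    """Intersection over union of two binary masks of equal shape.

    Masks are nested lists of 0/1 values. Returning 1.0 when both masks
    are empty is a convention assumed here.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1  # pixel marked as building in both masks
            if p or t:
                union += 1  # pixel marked as building in at least one mask
    return inter / union if union else 1.0
```

A score of 89.39% therefore means that the overlap between predicted and reference building pixels covers about nine tenths of their combined area.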


2018 ◽  
Vol 10 (6) ◽  
pp. 964 ◽  
Author(s):  
Zhenfeng Shao ◽  
Ke Yang ◽  
Weixun Zhou

Benchmark datasets are essential for developing and evaluating remote sensing image retrieval (RSIR) approaches. However, most of the existing datasets are single-labeled, with each image annotated by a single label representing the most significant semantic content of the image. This is sufficient for simple problems, such as distinguishing between a building and a beach, but multiple labels and sometimes even dense (pixel) labels are required for more complex problems, such as RSIR and semantic segmentation. We therefore extended the existing multi-labeled dataset collected for multi-label RSIR and presented a dense labeling remote sensing dataset termed "DLRSD". DLRSD contained a total of 17 classes, and the pixels of each image were assigned labels from these 17 pre-defined classes. We used DLRSD to evaluate the performance of RSIR methods ranging from traditional handcrafted feature-based methods to deep learning-based ones. More specifically, we evaluated the performance of RSIR methods from both single-label and multi-label perspectives. These results demonstrated the advantages of multiple labels over single labels for interpreting complex remote sensing images. DLRSD provided the literature with a benchmark for RSIR and other pixel-based problems such as semantic segmentation.


2019 ◽  
Vol 11 (9) ◽  
pp. 1117 ◽  
Author(s):  
Haopeng Zhang ◽  
Qin Deng

The frequent hazy weather with air pollution in North China has attracted wide attention in the past few years. One of the most important pollution sources is anthropogenic emission from fossil-fuel power plants. To relieve the pollution and assist urban environment monitoring, it is necessary to continuously monitor the working status of power plants. Satellite or airborne remote sensing provides high-quality data for such tasks. In this paper, we design a power plant monitoring framework based on deep learning to automatically detect power plants and determine their working status in high-resolution remote sensing images (RSIs). To this end, we collected a dataset named BUAA-FFPP60 containing RSIs of over 60 fossil-fuel power plants in the Beijing-Tianjin-Hebei region in North China, which covers about 123 km² of urban area. We compared eight state-of-the-art deep learning models and comprehensively analyzed their performance in terms of accuracy, speed, and hardware cost. Experimental results show that our deep learning-based framework can effectively detect fossil-fuel power plants and determine their working status with a mean average precision of up to 0.8273, showing good potential for urban environment monitoring.


2017 ◽  
Vol 12 ◽  
pp. 05012 ◽  
Author(s):  
Ying Liu ◽  
Hong-Yuan Cui ◽  
Zheng Kuang ◽  
Guo-Qing Li
