CISPNet: Automatic Detection of Remote Sensing Images from Google Earth in Complex Scenes Based on Context Information Scene Perception

Wenxu Shi; Jinhong Jiang; Shengli Bao; Dailun Tan

doi:10.3390/app9224836

CISPNet: Automatic Detection of Remote Sensing Images from Google Earth in Complex Scenes Based on Context Information Scene Perception

Applied Sciences ◽

10.3390/app9224836 ◽

2019 ◽

Vol 9 (22) ◽

pp. 4836 ◽

Cited By ~ 3

Author(s):

Wenxu Shi ◽

Jinhong Jiang ◽

Shengli Bao ◽

Dailun Tan

Keyword(s):

Remote Sensing ◽

Scene Perception ◽

Remote Sensing Image ◽

Google Earth ◽

Visual Object ◽

Context Information ◽

Single Shot ◽

Aspect Ratios ◽

Image Dataset

The ability to detect small targets and the speed of the detector are both critical for remote sensing image detection. In this paper, we propose an effective and efficient method, CISPNet, with high detection accuracy and a compact architecture. According to the characteristics of the data, we apply a context information scene perception (CISP) module to obtain contextual information for targets of different scales, and we use k-means clustering to set the aspect ratios and sizes of the default boxes. The proposed method inherits the network structure of the Single Shot MultiBox Detector (SSD) and introduces the CISP module into it. We create a dataset in the Pascal Visual Object Classes (VOC) format, annotated with three types of detection targets: aircraft, ship, and oil tanker. Experimental results on our remote sensing image dataset and on the Northwestern Polytechnical University very-high-resolution (NWPU VHR-10) dataset demonstrate that CISPNet performs much better than the original SSD and other detectors, especially for small objects. Specifically, our network achieves 80.34% mean average precision (mAP) at 50.7 frames per second (FPS) with an input size of 300 × 300 pixels on the remote sensing image dataset. In extended experiments, CISPNet also outperforms SSD at detecting fuzzy targets in remote sensing images.
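
The k-means step for choosing default boxes can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not state the distance measure or the initialisation, so plain Euclidean distance on (width, height) pairs and a deterministic area-based initialisation are assumed here, and the names `kmeans_boxes`, `box_priors` and `toy_boxes` are hypothetical.

```python
import math

def kmeans_boxes(boxes, k, iters=50):
    """Cluster (width, height) pairs with plain Euclidean k-means (an assumption)."""
    # Deterministic initialisation: spread the starting centroids over the
    # boxes sorted by area, so repeated runs give the same clusters.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centroids = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            nearest = min(
                range(k),
                key=lambda j: math.hypot(w - centroids[j][0], h - centroids[j][1]),
            )
            clusters[nearest].append((w, h))
        updated = []
        for j, members in enumerate(clusters):
            if members:
                updated.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
            else:
                updated.append(centroids[j])  # keep an empty cluster where it was
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return centroids

def box_priors(centroids):
    """Turn each centroid into a scale (sqrt of area) and an aspect ratio (w / h)."""
    return [{"scale": math.sqrt(w * h), "aspect_ratio": w / h} for w, h in centroids]

# Made-up ground-truth box sizes in pixels: three near-square, three elongated.
toy_boxes = [(20, 20), (22, 18), (18, 21), (60, 20), (64, 22), (58, 19)]
priors = box_priors(kmeans_boxes(toy_boxes, k=2))
```

On the toy data the two clusters separate the near-square boxes from the elongated ones, which is the kind of data-driven prior the abstract describes. An IoU-based distance is a common alternative to Euclidean distance for this purpose.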

A Public Dataset for Fine-Grained Ship Classification in Optical Remote Sensing Images

Remote Sensing ◽

10.3390/rs13040747 ◽

2021 ◽

Vol 13 (4) ◽

pp. 747

Author(s):

Yanghua Di ◽

Zhiguo Jiang ◽

Haopeng Zhang

Keyword(s):

Remote Sensing ◽

Image Data ◽

Remote Sensing Image ◽

Google Earth ◽

Optical Remote Sensing ◽

Remote Sensing Images ◽

Visual Categorization ◽

Class Differences ◽

Fine Grained ◽

Ship Classification

Fine-grained visual categorization (FGVC) is an important and challenging problem due to large intra-class differences and small inter-class differences caused by deformation, illumination, angles, etc. Although major advances have been achieved on natural images in the past few years thanks to the release of popular datasets such as CUB-200-2011, Stanford Cars and Aircraft, fine-grained ship classification in remote sensing images has rarely been studied because publicly available datasets are relatively scarce. In this paper, we investigate a large amount of remote sensing image data of sea ships and determine the 42 most common categories for fine-grained visual categorization. Building on our previous DSCR dataset, a dataset for ship classification in remote sensing images, we collect more remote sensing images containing warships and civilian ships of various scales from Google Earth and from other popular remote sensing image datasets, including DOTA, HRSC2016 and NWPU VHR-10. We call our dataset FGSCR-42, meaning a dataset for Fine-Grained Ship Classification in Remote sensing images with 42 categories. The whole FGSCR-42 dataset contains 9320 images of the most common types of ships. We evaluate popular object classification algorithms and fine-grained visual categorization algorithms to build a benchmark. Our FGSCR-42 dataset is publicly available at our webpages.

WSF-NET: Weakly Supervised Feature-Fusion Network for Binary Segmentation in Remote Sensing Image

Remote Sensing ◽

10.3390/rs10121970 ◽

2018 ◽

Vol 10 (12) ◽

pp. 1970 ◽

Cited By ~ 9

Author(s):

Kun Fu ◽

Wanxuan Lu ◽

Wenhui Diao ◽

Menglong Yan ◽

Hao Sun ◽

...

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Class Imbalance ◽

Remote Sensing Image ◽

Google Earth ◽

Training Strategy ◽

Binary Segmentation ◽

Supervised Methods ◽

Weakly Supervised

Binary segmentation in remote sensing aims to obtain a binary prediction mask that classifies each pixel in the given image. Deep learning methods have shown outstanding performance in this task. Existing fully supervised methods need massive high-quality datasets with manual pixel-level annotations. However, such annotations are generally expensive and sometimes unreliable. Recently, weakly supervised methods that use only image-level annotations have proven effective on natural imagery, significantly reducing the dependence on manual fine labeling. In this paper, we review existing methods and propose a novel weakly supervised binary segmentation framework that addresses class imbalance via a balanced binary training strategy. In addition, a weakly supervised feature-fusion network (WSF-Net) is introduced to adapt to the unique characteristics of objects in remote sensing images. The experiments were conducted on two challenging remote sensing datasets: the Water dataset and the Cloud dataset. The Water dataset was acquired from Google Earth with a resolution of 0.5 m, and the Cloud dataset was acquired by the Gaofen-1 satellite with a resolution of 16 m. The results demonstrate that, using only image-level annotations, our method achieves results comparable to fully supervised methods.

Regression Tree CNN for Estimation of Ground Sampling Distance Based on Floating-Point Representation

Remote Sensing ◽

10.3390/rs11192276 ◽

2019 ◽

Vol 11 (19) ◽

pp. 2276

Author(s):

Jae-Hun Lee ◽

Sanghoon Sull

Keyword(s):

Remote Sensing ◽

Regression Tree ◽

Remote Sensing Image ◽

Input Image ◽

Google Earth ◽

Aerial Image ◽

Floating Point ◽

Binomial Tree ◽

Sampling Distance ◽

Public Datasets

The estimation of ground sampling distance (GSD) from a remote sensing image enables measurement of the size of an object as well as more accurate segmentation in the image. In this paper, we propose a regression tree convolutional neural network (CNN) for estimating the value of GSD from an input image. The proposed regression tree CNN consists of a feature extraction CNN and a binomial tree layer. The proposed network first extracts features from an input image. Based on the extracted features, it predicts the GSD value, which is represented as a floating-point number with an exponent and a mantissa. The exponent is computed by coarse-scale classification and the mantissa by finer-scale regression, which improves the results. Experimental results with a Google Earth aerial image dataset and a mixed dataset consisting of eight public remote sensing image datasets with different GSDs show that the proposed network reduces the GSD prediction error rate by 25% compared to a baseline network that directly estimates the GSD.
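
The exponent and mantissa representation can be sketched with the Python standard library. The abstract does not state the base or the normalisation the authors use, so this sketch assumes base 2 with a mantissa in [0.5, 1), which is what `math.frexp` returns; the function names `encode_gsd` and `decode_gsd` are hypothetical.

```python
import math

def encode_gsd(gsd):
    """Split a positive GSD (metres per pixel) into an integer exponent and a mantissa.

    The exponent is a small integer that could serve as a coarse class label;
    the mantissa lies in [0.5, 1) and could serve as a bounded regression target.
    """
    mantissa, exponent = math.frexp(gsd)  # gsd == mantissa * 2**exponent
    return exponent, mantissa

def decode_gsd(exponent, mantissa):
    """Recombine the two parts into a GSD value."""
    return math.ldexp(mantissa, exponent)

# Example with an assumed GSD of 0.3 m per pixel.
exponent, mantissa = encode_gsd(0.3)
recovered = decode_gsd(exponent, mantissa)
```

Splitting the target this way keeps the regression output in a narrow, fixed range regardless of whether the image has a GSD of centimetres or tens of metres, while the coarse scale is handled as a classification problem.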

Acquisitions of Vegetation Coverage and Cultivated Land Occupation Ratio of Taiyuan Valley Plain for Example Using CBERS-02B CCD Image

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.518-523.5663 ◽

2012 ◽

Vol 518-523 ◽

pp. 5663-5667

Author(s):

Shi Wei Li ◽

Ji Long Zhang ◽

Jian Sheng Yang

Keyword(s):

Remote Sensing ◽

Water Conservation ◽

Vegetation Index ◽

Normalized Difference Vegetation Index ◽

Remote Sensing Image ◽

Google Earth ◽

Vegetation Coverage ◽

Cultivated Land ◽

Land Occupation ◽

Coverage Ratio

Vegetation cover is very important for air quality, soil and water conservation capacity, and soil formation in an area. Using a remote sensing image of the Taiyuan Valley Plain, the Normalized Difference Vegetation Index (NDVI) and unsupervised classification, a vegetation coverage map showing the distribution of non-cultivated and cultivated land was produced with ERDAS Imagine software. To evaluate the accuracy of the results, 200 points were sampled randomly and the high spatial resolution remote sensing image from Google Earth was used as the reference. The overall classification accuracy is 82%, with a Kappa statistic of 0.81. Counting the total pixel area shows that the vegetation coverage was 46% and the cultivated land coverage ratio was 31% in the study area.
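
As a rough illustration of the NDVI step, the index is (NIR - Red) / (NIR + Red) per pixel. The sketch below is not the ERDAS Imagine workflow from the paper: it replaces the unsupervised classification with a fixed NDVI threshold, and the threshold of 0.3, the reflectance values and the function names are all assumptions made for the example.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def vegetation_fraction(nir_band, red_band, threshold=0.3):
    """Share of pixels whose NDVI exceeds the (assumed) vegetation threshold."""
    values = [
        ndvi(n, r)
        for nir_row, red_row in zip(nir_band, red_band)
        for n, r in zip(nir_row, red_row)
    ]
    return sum(v > threshold for v in values) / len(values)

# Made-up 2 x 2 reflectance grids: the top row is vegetated, the bottom row is not.
nir_band = [[0.50, 0.45], [0.20, 0.10]]
red_band = [[0.10, 0.15], [0.18, 0.12]]
fraction = vegetation_fraction(nir_band, red_band)
```

Counting pixels above the threshold and dividing by the total pixel count mirrors the pixel-area counting described in the abstract, under the simplifying assumption that all pixels cover the same ground area.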

Multi-class object detection in remote sensing image based on context information and regularized convolutional network

Second Target Recognition and Artificial Intelligence Summit Forum ◽

10.1117/12.2551461 ◽

2020 ◽

Author(s):

Bei Cheng ◽

Zhengzhou Li ◽

Qingqing Wu

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Remote Sensing Image ◽

Context Information ◽

Convolutional Network

Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network

Sensors ◽

10.3390/s20247241 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7241

Author(s):

Dengji Zhou ◽

Guizhou Wang ◽

Guojin He ◽

Tengfei Long ◽

Ranyu Yin ◽

...

Keyword(s):

Remote Sensing ◽

Spatial Resolution ◽

High Spatial Resolution ◽

Remote Sensing Image ◽

Context Information ◽

Building Extraction ◽

Remote Sensing Images ◽

Global Features ◽

Long Distance ◽

Attention Network

Building extraction from high spatial resolution remote sensing images is an active topic in remote sensing applications and computer vision. This paper presents a supervised semantic segmentation model named Pyramid Self-Attention Network (PISANet). Its structure is simple and contains only two parts. The first is the network backbone, which learns the local features of buildings (short-distance context information around each pixel) from the image. The second is the pyramid self-attention module, which obtains the global features (long-distance context information with other pixels in the image) and the comprehensive features (color, texture, geometric and high-level semantic features) of the building. The network is an end-to-end approach. In the training stage, the input is the remote sensing image and its corresponding label, and the output is a probability map (the probability that each pixel is or is not a building). In the prediction stage, the input is the remote sensing image, and the output is the building extraction result. The complexity of the network structure was reduced so that it is easy to implement. The proposed PISANet was tested on two datasets. The overall accuracy reached 94.50% and 96.15%, the intersection-over-union reached 77.45% and 87.97%, and the F1 index reached 87.27% and 93.55%, respectively. In experiments on different datasets, PISANet obtained high overall accuracy, a low error rate and improved integrity of individual buildings.
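
The three reported metrics can be computed from a predicted and a reference building mask as in the minimal sketch below. It assumes the usual pixel-wise definitions for a single binary class; the function name `building_metrics` and the toy masks are hypothetical, and the paper's exact evaluation protocol is not given in the abstract.

```python
def building_metrics(pred, truth):
    """Overall accuracy, IoU and F1 for two binary masks (1 = building)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building predicted, building in reference
            elif p and not t:
                fp += 1          # building predicted, background in reference
            elif not p and t:
                fn += 1          # building missed
            else:
                tn += 1          # background agreed
    total = tp + fp + fn + tn
    union = tp + fp + fn
    return {
        "overall_accuracy": (tp + tn) / total,
        "iou": tp / union if union else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if union else 0.0,
    }

# Made-up 2 x 3 masks for illustration.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
metrics = building_metrics(pred, truth)
```

For a single confusion matrix the two overlap metrics are tied by F1 = 2 · IoU / (1 + IoU), so F1 is always at least as large as IoU, while overall accuracy also rewards correctly labelled background pixels.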

A New Remote Sensing Image Dataset for Large-Scale Remote Sensing Detection

2019 IEEE International Conference on Real-time Computing and Robotics (RCAR) ◽

10.1109/rcar47638.2019.9043971 ◽

2019 ◽

Author(s):

Dongyang Xie ◽

Jun Cheng ◽

Dapeng Tao

Keyword(s):

Remote Sensing ◽

Large Scale ◽

Remote Sensing Image ◽

Image Dataset

One Method of Urban Land Covers Information Extraction

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.380-384.4011 ◽

2013 ◽

Vol 380-384 ◽

pp. 4011-4014

Author(s):

Da Peng Xing

Keyword(s):

Remote Sensing ◽

Nearest Neighbor ◽

Kappa Statistic ◽

Impervious Surface ◽

Remote Sensing Image ◽

Urban Land ◽

Google Earth ◽

K Nearest Neighbor ◽

Surface Information ◽

Vegetation Water

This study examines a method for extracting urban land cover information from one scene of a LANDSAT-TM remote sensing image of Chongqing, China. Three index calculations were used to produce three thematic images: NDVI, which enhances vegetation information; an expression built from the spectral features of water, which enhances water information; and NDBI (Normalized Difference Built-up Index), which enhances impervious surface information. After stacking the three thematic images into one image, supervised classification with the k-Nearest Neighbor algorithm, based on object-oriented features of the image, was used to extract vegetation, water and impervious surface information. To assess the final classification accuracy, 100 random sampling points were chosen and the high spatial resolution remote sensing image from Google Earth was taken as the reference. The overall classification accuracy is 83.4%, and the Kappa statistic is 0.814.
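
Overall accuracy and the Kappa statistic are both derived from a confusion matrix of classified versus reference labels at the sample points. The sketch below uses the standard Cohen's kappa formula; the 3 x 3 matrix is invented for illustration and is not the paper's data, and the function name `accuracy_and_kappa` is hypothetical.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are classified labels and columns are reference labels.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        return observed, 0.0  # degenerate case: only one class present
    return observed, (observed - expected) / (1 - expected)

# Invented counts for vegetation, water and impervious surface (100 points).
confusion = [
    [30, 3, 2],
    [2, 28, 5],
    [1, 4, 25],
]
overall_accuracy, kappa = accuracy_and_kappa(confusion)
```

Kappa discounts the agreement that would occur by chance given the class proportions, which is why it is reported alongside overall accuracy.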

Change Capsule Network for Optical Remote Sensing Image Change Detection

Remote Sensing ◽

10.3390/rs13142646 ◽

2021 ◽

Vol 13 (14) ◽

pp. 2646

Author(s):

Quanfu Xu ◽

Keming Chen ◽

Guangyao Zhou ◽

Xian Sun

Keyword(s):

Remote Sensing ◽

Change Detection ◽

Remote Sensing Image ◽

Google Earth ◽

Small Data ◽

Optical Remote Sensing ◽

Optical Remote Sensing Image ◽

Vector Difference ◽

Image Change Detection ◽

Image Pairs

Change detection based on deep learning has made great progress recently, but there are still some challenges, such as the small data size in open-labeled datasets, the different viewpoints in image pairs, and the poor similarity measures in feature pairs. To alleviate these problems, this paper presents a novel change capsule network for optical remote sensing image change detection. It takes advantage of the capsule network, which can better deal with different viewpoints and can achieve satisfactory performance with small training data. First, two identical non-shared weight capsule networks are designed to extract the vector-based features of image pairs. Second, the unchanged region reconstruction module is adopted to keep the feature space of the unchanged region more consistent. Third, vector cosine and vector difference are utilized to compare the vector-based features in a capsule network efficiently, which can enlarge the separability between the changed pixels and the unchanged pixels. Finally, a binary change map can be produced by analyzing both the vector cosine and vector difference. With the unchanged region reconstruction module and the vector cosine and vector difference module, the extracted feature pairs in a change capsule network are more comparable and separable. Moreover, to test the effectiveness of the proposed change capsule network in dealing with the different viewpoints in multi-temporal images, we collect a new change detection dataset from a taken-over Al Udeid Air Base (AUAB) using Google Earth. The results of the experiments carried out on the AUAB dataset show that a change capsule network can better deal with the different viewpoints and can improve the comparability and separability of feature pairs. Furthermore, a comparison of the experimental results carried out on the AUAB dataset and the SZTAKI AirChange Benchmark Set demonstrates the effectiveness and superiority of the proposed method.
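
The comparison of vector-based features by cosine and by difference can be illustrated on a pair of per-pixel feature vectors. The abstract does not say how the two measures are combined into the binary change map, so the rule below (flag a change if either measure crosses a threshold) and both threshold values are assumptions, as are the function names.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (direction only)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def difference_norm(u, v):
    """Euclidean length of the difference vector (direction and magnitude)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def is_changed(u, v, cos_threshold=0.9, diff_threshold=0.5):
    """Assumed decision rule: changed if the vectors disagree in either measure."""
    return cosine_similarity(u, v) < cos_threshold or difference_norm(u, v) > diff_threshold

before = (1.0, 0.0)
same_place = (0.98, 0.05)   # nearly identical feature
rotated = (0.0, 1.0)        # different direction
rescaled = (3.0, 0.0)       # same direction, different length
```

The `rescaled` case shows why the two measures are complementary: its cosine with `before` is 1, so cosine alone would miss it, while the difference norm of 2 exposes the change.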

Different Viewpoints Image Registration for Remote Sensing Based on Multiple Image Features

10.20944/preprints201705.0027.v1 ◽

2017 ◽

Author(s):

Kun Yang ◽

Anning Pan ◽

Yang Yang ◽

Su Zhang ◽

Sim Heng Ong

Keyword(s):

Remote Sensing ◽

Image Registration ◽

Geometric Structure ◽

Remote Sensing Image ◽

Image Features ◽

Google Earth ◽

Reference Image ◽

Feature Descriptor ◽

Multiple Image ◽

Invariant Feature

Remote sensing image registration with different viewpoints plays an important role in geographic information systems. However, when ground relief variations and imaging viewpoint changes are present, non-rigid distortion occurs and registration becomes increasingly challenging. Current methods miss true correspondences when non-rigid geometric distortion occurs. To address the problem, we propose a robust remote sensing image registration method based on SIFT feature distance and geometric structure features. First, the scale-invariant feature transform (SIFT), a partially intensity-invariant feature descriptor, is used to extract reliable feature point sets from the sensed and reference images. Second, a novel algorithm based on multiple image features, which constrains the geometric structure during transformation, is used to estimate exact correspondences between the point sets. Finally, an accurate alignment is achieved by mapping the sensed image to the reference image using a thin-plate spline. We evaluated the proposed method on three sets of remote sensing images obtained from an unmanned aerial vehicle (UAV) and Google Earth, and compared it with five state-of-the-art methods. Our algorithm solved the non-rigid registration problem for remote sensing images with different viewpoints and showed the best alignments in most cases.
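
The final mapping step can be sketched as a textbook two-dimensional thin-plate spline fitted to matched control points. This covers only the interpolation stage and assumes the correspondences are already known; it is not the authors' correspondence algorithm. The kernel U(r) = r² log r, the small Gaussian elimination solver and all names are choices made for this example, and the control points must be distinct and not all on one line.

```python
import math

def _kernel(r2):
    # U(r) = r^2 log r expressed through r^2, with U(0) defined as 0.
    return 0.5 * r2 * math.log(r2) if r2 > 0 else 0.0

def _solve(a, b):
    """Solve a x = b (b has two columns) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [[m[i][n + j] / m[i][i] for j in range(len(b[0]))] for i in range(n)]

def fit_tps(src, dst):
    """Fit a spline that maps each src control point exactly onto its dst point."""
    n = len(src)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    b = [[0.0, 0.0] for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = _kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        a[i][n:] = [1.0, xi, yi]                         # affine part
        a[n][i], a[n + 1][i], a[n + 2][i] = 1.0, xi, yi  # side conditions
        b[i] = list(dst[i])
    return src, _solve(a, b)

def warp(model, point):
    """Map one point of the sensed image into the reference frame."""
    src, coef = model
    n = len(src)
    x, y = point
    out = []
    for d in range(2):
        value = coef[n][d] + coef[n + 1][d] * x + coef[n + 2][d] * y
        for (xs, ys), c in zip(src, coef):
            value += c[d] * _kernel((x - xs) ** 2 + (y - ys) ** 2)
        out.append(value)
    return tuple(out)

# Made-up control points: four fixed corners and one displaced centre point.
control_src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
control_dst = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.6, 0.55)]
model = fit_tps(control_src, control_dst)
```

The spline passes through every control pair and bends smoothly in between, which is what lets it absorb the non-rigid distortion that a single affine or projective transform cannot.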
