Multimodel Deep Learning for Person Detection in Aerial Images

Mirela Kundid Vasić; Vladan Papić

doi:10.3390/electronics9091459

Multimodel Deep Learning for Person Detection in Aerial Images

Electronics ◽

10.3390/electronics9091459 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1459

Author(s):

Mirela Kundid Vasić ◽

Vladan Papić

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Contextual Information ◽

Geographical Area ◽

Aerial Images ◽

Person Detection ◽

Current State ◽

Aerial Vehicle ◽

Novel Method ◽

Better Than

In this paper, we propose a novel method for person detection in aerial images of nonurban terrain gathered by an Unmanned Aerial Vehicle (UAV), which plays an important role in Search And Rescue (SAR) missions. The UAV in SAR operations contributes significantly due to the ability to survey a larger geographical area from an aerial viewpoint. Because of the high altitude of recording, the object of interest (person) covers a small part of an image (around 0.1%), which makes this task quite challenging. To address this problem, a multimodel deep learning approach is proposed. The solution consists of two different convolutional neural networks in region proposal, as well as in the classification stage. Additionally, contextual information is used in the classification stage in order to improve the detection results. Experimental results tested on the HERIDAL dataset achieved precision of 68.89% and a recall of 94.65%, which is better than current state-of-the-art methods used for person detection in similar scenarios. Consequently, it may be concluded that this approach is suitable for usage as an auxiliary method in real SAR operations.

Download Full-text

ResMem-Net: memory based deep CNN for image memorability estimation

PeerJ Computer Science ◽

10.7717/peerj-cs.767 ◽

2021 ◽

Vol 7 ◽

pp. e767

Author(s):

Arockia Praveen ◽

Abdulfattah Noorwali ◽

Duraimurugan Samiayya ◽

Mohammad Zubair Khan ◽

Durai Raj Vincent P M ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Mean Squared Error ◽

State Of The Art ◽

Rank Correlation ◽

Current State ◽

Intermediate Layers ◽

Better Than ◽

Made In ◽

Memory Efficient

Image memorability is a very hard problem in image processing due to its subjective nature. But due to the introduction of Deep Learning and the large availability of data and GPUs, great strides have been made in predicting the memorability of an image. In this paper, we propose a novel deep learning architecture called ResMem-Net that is a hybrid of LSTM and CNN that uses information from the hidden layers of the CNN to compute the memorability score of an image. The intermediate layers are important for predicting the output because they contain information about the intrinsic properties of the image. The proposed architecture automatically learns visual emotions and saliency, shown by the heatmaps generated using the GradRAM technique. We have also used the heatmaps and results to analyze and answer one of the most important questions in image memorability: “What makes an image memorable?”. The model is trained and evaluated using the publicly available Large-scale Image Memorability dataset (LaMem) from MIT. The results show that the model achieves a rank correlation of 0.679 and a mean squared error of 0.011, which is better than the current state-of-the-art models and is close to human consistency (p = 0.68). The proposed architecture also has a significantly low number of parameters compared to the state-of-the-art architecture, making it memory efficient and suitable for production.

Download Full-text

SK-Net: Deep Learning on Point Cloud via End-to-End Discovery of Spatial Keypoints

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6113 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6422-6429

Author(s):

Weikun Wu ◽

Yan Zhang ◽

David Wang ◽

Yunqi Lei

Keyword(s):

Deep Learning ◽

Spatial Pattern ◽

Point Cloud ◽

State Of The Art ◽

Contextual Information ◽

Feature Representation ◽

Specific Point ◽

End To End ◽

Ablation Study ◽

Better Than

Since the PointNet was proposed, deep learning on point cloud has been the concentration of intense 3D research. However, existing point-based methods usually are not adequate to extract the local features and the spatial pattern of a point cloud for further shape understanding. This paper presents an end-to-end framework, SK-Net, to jointly optimize the inference of spatial keypoint with the learning of feature representation of a point cloud for a specific point cloud task. One key process of SK-Net is the generation of spatial keypoints (Skeypoints). It is jointly conducted by two proposed regulating losses and a task objective function without knowledge of Skeypoint location annotations and proposals. Specifically, our Skeypoints are not sensitive to the location consistency but are acutely aware of shape. Another key process of SK-Net is the extraction of the local structure of Skeypoints (detail feature) and the local spatial pattern of normalized Skeypoints (pattern feature). This process generates a comprehensive representation, pattern-detail (PD) feature, which comprises the local detail information of a point cloud and reveals its spatial pattern through the part district reconstruction on normalized Skeypoints. Consequently, our network is prompted to effectively understand the correlation between different regions of a point cloud and integrate contextual information of the point cloud. In point cloud tasks, such as classification and segmentation, our proposed method performs better than or comparable with the state-of-the-art approaches. We also present an ablation study to demonstrate the advantages of SK-Net.

Download Full-text

IMG2nDSM: Height Estimation from Single Airborne RGB Images with Deep Learning

Remote Sensing ◽

10.3390/rs13122417 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2417

Author(s):

Savvas Karatsiolis ◽

Andreas Kamilaris ◽

Ian Cole

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Aerial Imagery ◽

Aerial Images ◽

Surface Model ◽

Large Area ◽

Digital Terrain ◽

Terrain Models ◽

Architectural Features ◽

Rgb Images

Estimating the height of buildings and vegetation in single aerial images is a challenging problem. A task-focused Deep Learning (DL) model that combines architectural features from successful DL models (U-NET and Residual Networks) and learns the mapping from a single aerial imagery to a normalized Digital Surface Model (nDSM) was proposed. The model was trained on aerial images whose corresponding DSM and Digital Terrain Models (DTM) were available and was then used to infer the nDSM of images with no elevation information. The model was evaluated with a dataset covering a large area of Manchester, UK, as well as the 2018 IEEE GRSS Data Fusion Contest LiDAR dataset. The results suggest that the proposed DL architecture is suitable for the task and surpasses other state-of-the-art DL approaches by a large margin.

Download Full-text

Towards Scalable Economic Photovoltaic Potential Analysis Using Aerial Images and Deep Learning

Energies ◽

10.3390/en14133800 ◽

2021 ◽

Vol 14 (13) ◽

pp. 3800

Author(s):

Sebastian Krapf ◽

Nils Kemmerzell ◽

Syed Khawaja Haseeb Khawaja Haseeb Uddin ◽

Manuel Hack Hack Vázquez ◽

Fabian Netzler ◽

...

Keyword(s):

Deep Learning ◽

System Analysis ◽

State Of The Art ◽

Critical Role ◽

Semantic Segmentation ◽

Energy System ◽

Aerial Images ◽

Potential Analysis ◽

3D Data ◽

Challenges And Opportunities

Roof-mounted photovoltaic systems play a critical role in the global transition to renewable energy generation. An analysis of roof photovoltaic potential is an important tool for supporting decision-making and for accelerating new installations. State of the art uses 3D data to conduct potential analyses with high spatial resolution, limiting the study area to places with available 3D data. Recent advances in deep learning allow the required roof information from aerial images to be extracted. Furthermore, most publications consider the technical photovoltaic potential, and only a few publications determine the photovoltaic economic potential. Therefore, this paper extends state of the art by proposing and applying a methodology for scalable economic photovoltaic potential analysis using aerial images and deep learning. Two convolutional neural networks are trained for semantic segmentation of roof segments and superstructures and achieve an Intersection over Union values of 0.84 and 0.64, respectively. We calculated the internal rate of return of each roof segment for 71 buildings in a small study area. A comparison of this paper’s methodology with a 3D-based analysis discusses its benefits and disadvantages. The proposed methodology uses only publicly available data and is potentially scalable to the global level. However, this poses a variety of research challenges and opportunities, which are summarized with a focus on the application of deep learning, economic photovoltaic potential analysis, and energy system analysis.

Download Full-text

One for All: Neural Joint Modeling of Entities and Events

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016851 ◽

2019 ◽

Vol 33 ◽

pp. 6851-6858 ◽

Cited By ~ 4

Author(s):

Trung Minh Nguyen ◽

Thien Huu Nguyen

Keyword(s):

Deep Learning ◽

Recent Work ◽

State Of The Art ◽

Contextual Information ◽

Joint Modeling ◽

Event Extraction ◽

Event Trigger ◽

The Individual ◽

Novel Model ◽

Argument Roles

The previous work for event extraction has mainly focused on the predictions for event triggers and argument roles, treating entity mentions as being provided by human annotators. This is unrealistic as entity mentions are usually predicted by some existing toolkits whose errors might be propagated to the event trigger and argument role recognition. Few of the recent work has addressed this problem by jointly predicting entity mentions, event triggers and arguments. However, such work is limited to using discrete engineering features to represent contextual information for the individual tasks and their interactions. In this work, we propose a novel model to jointly perform predictions for entity mentions, event triggers and arguments based on the shared hidden representations from deep learning. The experiments demonstrate the benefits of the proposed method, leading to the state-of-the-art performance for event extraction.

Download Full-text

A Survey of Graphical Page Object Detection with Deep Neural Networks

10.20944/preprints202104.0739.v1 ◽

2021 ◽

Author(s):

Jwalin Bhatt ◽

Khurram Azeem Hashmi ◽

Muhammad Zeshan Afzal ◽

Didier Stricker

Keyword(s):

Deep Learning ◽

Object Detection ◽

Conceptual Understanding ◽

Deep Neural Networks ◽

State Of The Art ◽

Learning Approaches ◽

Document Images ◽

Essential Information ◽

Current State ◽

High Level

In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms. Off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components. It leads to a high-level conceptual understanding of the documents that makes digitization of documents viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved many folds. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in the document images. Therefore, we discuss the most relevant deep learning-based approaches and state-of-the-art graphical page object detection in document images. This work provides a comprehensive understanding of the current state-of-the-art and related challenges. Furthermore, we discuss leading datasets along with the quantitative evaluation. Moreover, it discusses briefly the promising directions that can be utilized for further improvements.

Download Full-text

Mosaic Super-resolution via Sequential Feature Pyramid Networks

10.36227/techrxiv.11402130 ◽

2019 ◽

Author(s):

Mehrdad Shoeiby ◽

Mohammad Ali Armin ◽

Sadegh Aliakbarian ◽

Saeed Anwar ◽

Lars petersson

Keyword(s):

State Of The Art ◽

Super Resolution ◽

Autonomous Driving ◽

Single Shot ◽

Current State ◽

Wide Range ◽

Feature Pyramid ◽

Novel Method ◽

Convolutional Lstm ◽

Mosaic Images

<div>Advances in the design of multi-spectral cameras have</div><div>led to great interests in a wide range of applications, from</div><div>astronomy to autonomous driving. However, such cameras</div><div>inherently suffer from a trade-off between the spatial and</div><div>spectral resolution. In this paper, we propose to address</div><div>this limitation by introducing a novel method to carry out</div><div>super-resolution on raw mosaic images, multi-spectral or</div><div>RGB Bayer, captured by modern real-time single-shot mo-</div><div>saic sensors. To this end, we design a deep super-resolution</div><div>architecture that benefits from a sequential feature pyramid</div><div>along the depth of the network. This, in fact, is achieved</div><div>by utilizing a convolutional LSTM (ConvLSTM) to learn the</div><div>inter-dependencies between features at different receptive</div><div>fields. Additionally, by investigating the effect of different</div><div>attention mechanisms in our framework, we show that a</div><div>ConvLSTM inspired module is able to provide superior at-</div><div>tention in our context. Our extensive experiments and anal-</div><div>yses evidence that our approach yields significant super-</div><div>resolution quality, outperforming current state-of-the-art</div><div>mosaic super-resolution methods on both Bayer and multi-</div><div>spectral images. Additionally, to the best of our knowledge,</div><div>our method is the first specialized method to super-resolve</div><div>mosaic images, whether it be multi-spectral or Bayer.</div><div><br></div>

Download Full-text

A Robust Context-Based Deep Learning Approach for Highly Imbalanced Hyperspectral Classification

Computational Intelligence and Neuroscience ◽

10.1155/2021/9923491 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Juan F. Ramirez Rochac ◽

Nian Zhang ◽

Lara A. Thompson ◽

Tolessa Deksissa

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Mineral Exploration ◽

Classification Models ◽

Noise Resistance ◽

Deep Convolutional Neural Networks ◽

Current State ◽

Feature Augmentation ◽

Active Research ◽

Hyperspectral Classification

Hyperspectral imaging is an area of active research with many applications in remote sensing, mineral exploration, and environmental monitoring. Deep learning and, in particular, convolution-based approaches are the current state-of-the-art classification models. However, in the presence of noisy hyperspectral datasets, these deep convolutional neural networks underperform. In this paper, we proposed a feature augmentation approach to increase noise resistance in imbalanced hyperspectral classification. Our method calculates context-based features, and it uses a deep convolutional neuronet (DCN). We tested our proposed approach on the Pavia datasets and compared three models, DCN, PCA + DCN, and our context-based DCN, using the original datasets and the datasets plus noise. Our experimental results show that DCN and PCA + DCN perform well on the original datasets but not on the noisy datasets. Our robust context-based DCN was able to outperform others in the presence of noise and was able to maintain a comparable classification accuracy on clean hyperspectral images.

Download Full-text

ON THE QUASI-STATIONARY DISTRIBUTION OF SIS MODELS

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964816000188 ◽

2016 ◽

Vol 30 (4) ◽

pp. 622-639 ◽

Cited By ~ 1

Author(s):

Gaofeng Da ◽

Maochao Xu ◽

Shouhuai Xu

Keyword(s):

Stationary Distribution ◽

Upper Bound ◽

Hazard Rate ◽

State Of The Art ◽

Upper Bounds ◽

Hazard Rate Order ◽

Reversed Hazard Rate ◽

Novel Method ◽

Better Than ◽

Quasi Stationary Distribution

In this paper, we propose a novel method for constructing upper bounds of the quasi-stationary distribution of SIS processes. Using this method, we obtain an upper bound that is better than the state-of-the-art upper bound. Moreover, we prove that the fixed point map Φ [7] actually preserves the equilibrium reversed hazard rate order under a certain condition. This allows us to further improve the upper bound. Some numerical results are presented to illustrate the results.

Download Full-text

PPDIST, global 0.1° daily and 3-hourly precipitation probability distribution climatologies for 1979–2018

Scientific Data ◽

10.1038/s41597-020-00631-x ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Hylke E. Beck ◽

Seth Westra ◽

Jackson Tan ◽

Florian Pappenberger ◽

George J. Huffman ◽

...

Keyword(s):

Neural Networks ◽

Probability Distribution ◽

State Of The Art ◽

Intertropical Convergence Zone ◽

Coefficient Of Determination ◽

Current State ◽

Peak Intensity ◽

Global Land ◽

The Neural Networks ◽

Better Than

Abstract We introduce the Precipitation Probability DISTribution (PPDIST) dataset, a collection of global high-resolution (0.1°) observation-based climatologies (1979–2018) of the occurrence and peak intensity of precipitation (P) at daily and 3-hourly time-scales. The climatologies were produced using neural networks trained with daily P observations from 93,138 gauges and hourly P observations (resampled to 3-hourly) from 11,881 gauges worldwide. Mean validation coefficient of determination (R2) values ranged from 0.76 to 0.80 for the daily P occurrence indices, and from 0.44 to 0.84 for the daily peak P intensity indices. The neural networks performed significantly better than current state-of-the-art reanalysis (ERA5) and satellite (IMERG) products for all P indices. Using a 0.1 mm 3 h−1 threshold, P was estimated to occur 12.2%, 7.4%, and 14.3% of the time, on average, over the global, land, and ocean domains, respectively. The highest P intensities were found over parts of Central America, India, and Southeast Asia, along the western equatorial coast of Africa, and in the intertropical convergence zone. The PPDIST dataset is available via www.gloh2o.org/ppdist.

Download Full-text