Vehicle Detection in Aerial Images Using a Fast Oriented Region Search and the Vector of Locally Aggregated Descriptors

Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3294 ◽  
Author(s):  
Liu ◽  
Ding ◽  
Zhu ◽  
Xiu ◽  
Li ◽  
...  

Vehicle detection in aerial images plays a significant role in civil and military applications, and it faces many challenges, including the overhead-view perspective, highly complex backgrounds, and variation in vehicle appearance. This paper presents a robust vehicle detection scheme to overcome these issues. In the detection stage, we propose a novel algorithm to generate oriented proposals that enclose vehicle objects properly as rotated rectangles with orientations. To discriminate between object and background in the proposals, we propose a modified vector of locally aggregated descriptors (VLAD) image representation model built on a recently proposed image feature, the local steering kernel (LSK). By applying non-maximum suppression (NMS) after classification, we show that each vehicle object is detected with a single oriented bounding box. Experiments are conducted on aerial images to compare the proposed method with state-of-the-art methods and to evaluate the impact of each component of the model. The results demonstrate the robustness of the proposed method under various circumstances and its superior performance over existing vehicle detection approaches.
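The NMS step the abstract mentions can be sketched in a few lines. This is a minimal greedy NMS over axis-aligned boxes for clarity; the paper's version operates on oriented (rotated) rectangles, which additionally requires a rotated-rectangle intersection routine. Function names here are illustrative, not the authors'.

```python
import numpy as np

def iou(a, b):
    # a, b: [x1, y1, x2, y2] axis-aligned boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring remaining box
    # and suppress all boxes that overlap it above the threshold.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        mask = np.array([iou(boxes[i], boxes[j]) < iou_thresh
                         for j in order[1:]], dtype=bool)
        order = order[1:][mask]
    return keep
```

After classification, running this on the scored oriented proposals leaves one surviving box per vehicle, which is exactly the "single-oriented bounding box" behaviour described above.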

PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0250782
Author(s):  
Bin Wang ◽  
Bin Xu

With the rapid development of Unmanned Aerial Vehicles (UAVs), vehicle detection in aerial images plays an important role in many applications. Compared with general object detection problems, vehicle detection in aerial images remains a challenging research topic, since it is plagued by several unique factors, e.g., different camera angles, small vehicle sizes, and complex backgrounds. In this paper, a Feature Fusion Deep-Projection Convolution Neural Network is proposed to enhance the ability to detect small vehicles in aerial images. The backbone of the proposed framework utilizes a novel residual block named the stepwise res-block to explore high-level semantic features while conserving low-level detail features at the same time. A specially designed feature fusion module is adopted in the proposed framework to further balance the features obtained from different levels of the backbone. A deep-projection deconvolution module is used to minimize the information contamination introduced by the down-sampling/up-sampling processes. The proposed framework has been evaluated on the UCAS-AOD, VEDAI, and DOTA datasets. According to the evaluation results, the proposed framework outperforms other state-of-the-art vehicle detection algorithms for aerial images.


2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Kaifeng Li ◽  
Bin Wang

With the rapid development of deep learning and the wide use of Unmanned Aerial Vehicles (UAVs), CNN-based algorithms for vehicle detection in aerial images have been widely studied in the past several years. As a downstream task of general object detection, vehicle detection in aerial images differs from general object detection in ground-view images in several respects, e.g., larger image areas, smaller target sizes, and more complex backgrounds. In this paper, to improve the performance of this task, a Dense Attentional Residual Network (DAR-Net) is proposed. The proposed network employs a novel dense waterfall residual block (DW res-block) to effectively preserve spatial information and extract high-level semantic information at the same time. A multiscale receptive field attention (MRFA) module is also designed to select informative features from the feature maps and enhance the ability of multiscale perception. Based on the DW res-block and the MRFA module, and to protect spatial information, the proposed framework adopts a new backbone that downsamples the feature map only 3 times; i.e., the total downsampling ratio of the proposed backbone is 8. These designs alleviate the degradation problem, improve the information flow, and strengthen feature reuse. In addition, deep-projection units are used to reduce the information loss caused by downsampling operations, and identity mappings are applied to each stage of the proposed backbone to further improve the information flow. The proposed DAR-Net is evaluated on the VEDAI, UCAS-AOD, and DOTA datasets. The experimental results demonstrate that the proposed framework outperforms other state-of-the-art algorithms.
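The downsampling-ratio argument above is simple arithmetic worth making explicit: each stride-2 stage halves the spatial resolution, so 3 stages give a ratio of 2³ = 8, leaving small vehicles far more pixels on the final feature map than a conventional ratio-32 backbone would. A tiny illustration (the sizes are examples, not from the paper):

```python
# Each stride-2 downsampling halves the spatial resolution, so
# num_downsamples stages give a total ratio of 2 ** num_downsamples.
def feature_map_size(input_size, num_downsamples):
    return input_size // (2 ** num_downsamples)

print(feature_map_size(512, 3))  # ratio 8  -> 64x64 feature map
print(feature_map_size(512, 5))  # ratio 32 -> 16x16 feature map
```

A vehicle spanning 24 input pixels still covers 3 cells at ratio 8, but less than one cell at ratio 32, which is why the shallow-downsampling backbone helps small targets.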


2020 ◽  
Vol 12 (16) ◽  
pp. 2558 ◽  
Author(s):  
Nan Mo ◽  
Li Yan

Vehicles in aerial images generally have small sizes and an unbalanced number of samples per category, which leads to the poor performance of existing vehicle detection algorithms. Therefore, an oriented vehicle detection framework based on an improved Faster RCNN is proposed for aerial images. First of all, we propose an oversampling and stitching data augmentation method to decrease the negative effect of category imbalance in the training dataset and construct a new dataset with a balanced number of samples. Then, considering that the pooling operation may weaken the discriminative ability of features for small objects, we propose to amplify the feature map so that detailed information hidden in the last feature map can be enriched. Finally, we design a joint training loss function, including a center loss, for both horizontal and oriented bounding boxes, to reduce the impact of small inter-class diversity on vehicle detection. The proposed framework is evaluated on the VEDAI dataset, which consists of 9 vehicle categories. The experimental results show that the proposed framework outperforms previous approaches with a mean average precision of 60.4% and 60.1% in detecting horizontal and oriented bounding boxes respectively, about 8% better than Faster RCNN.
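The oversampling half of the augmentation idea can be sketched as follows. This is a minimal, assumed implementation: minority-class samples are duplicated at random until every category matches the largest one. The paper additionally stitches samples together, which is not shown here, and the function name is hypothetical.

```python
import random
from collections import Counter

def oversample_to_balance(samples, labels, seed=0):
    # Duplicate minority-class samples at random until every class
    # reaches the size of the largest class.
    rng = random.Random(seed)
    target = max(Counter(labels).values())
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([y] * target)
    return out_samples, out_labels
```

Applied to a dataset with, say, 400 cars and 50 vans, this yields 400 of each before training, so the classifier no longer sees the rare category an order of magnitude less often.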


Author(s):  
S. Lakrih ◽  
J. Diouri

In this paper, fractal theory and wavelet transform are combined to detect and classify self-extinguishing and fugitive scenarios of power quality disturbances (PQDs). After deciding whether the disturbance is simple or complex, the additional voltage is denoised through Discrete Wavelet Transform (DWT); the denoising process is adapted according to whether the distorted voltage contains oscillatory transients or not. At the detection stage, the grille fractal dimension of the DWT decomposition detail is computed. Then, a threshold is deduced to detect the start and end moments of the disturbance. The results reveal that the proposed detection scheme yields accurate location of PQDs even in the presence of high oscillatory transients. An algorithm based on geometric and statistical approaches is developed at the classification stage to recognize PQDs automatically. The geometric classification is based on Continuous Wavelet Transform (CWT), whereas the statistical classification is based on Multifractal Detrended Fluctuations Analysis (MFDFA) and an energy metric. The results prove that the combination of geometric and statistical classification can serve as an effective discrimination tool for PQDs. The major strength of the proposed approach is its ability to interpret the impact of each disturbance on the multifractal behavior of the nominal voltage, thus giving the possibility to draw the necessary generalizations for real-time applications.
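The grid ("grille") fractal dimension used in the detection stage is, in essence, a box-counting estimate. A simplified sketch is below, assuming the standard box-counting procedure applied to a signal's graph; the paper computes it on DWT detail coefficients and derives a detection threshold from it, neither of which is shown here.

```python
import numpy as np

def box_counting_dimension(signal, scales=(2, 4, 8, 16, 32)):
    # Estimate the box-counting (grid) fractal dimension of a signal's
    # graph: overlay an s x s grid, count occupied boxes N(s) at each
    # scale s, then fit log N(s) ~ D * log s; the slope estimates D.
    x = np.linspace(0.0, 1.0, len(signal))
    y = (signal - signal.min()) / (np.ptp(signal) + 1e-12)
    counts = []
    for s in scales:
        cols = np.minimum((x * s).astype(int), s - 1)
        rows = np.minimum((y * s).astype(int), s - 1)
        counts.append(len(set(zip(cols.tolist(), rows.tolist()))))
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope
```

A smooth segment of the nominal voltage gives a dimension near 1, while an oscillatory transient fills more grid boxes and pushes the estimate upward, which is what makes the dimension usable as a disturbance indicator.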


2021 ◽  
Vol 13 (14) ◽  
pp. 2656
Author(s):  
Furong Shi ◽  
Tong Zhang

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during the down-sampling process, which results in discontinuous segmentation or inaccurate segmentation boundaries. To compensate for the loss of shape information, two shape-related auxiliary tasks (i.e., boundary prediction and distance estimation) were jointly learned with the building segmentation task in our proposed network. Meanwhile, two consistency-constraint losses were designed on top of the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on the multi-scale features, one regression loss and two classification losses were used for predicting the distance-transform map, the segmentation mask, and the boundary. Two inter-task consistency-loss functions were constructed to ensure consistency between distance maps and masks, and between masks and boundary maps. Experimental results on three public aerial image data sets showed that our method achieved superior performance over recent state-of-the-art models.
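The distance-estimation auxiliary target can be generated from a binary building mask with a Euclidean distance transform. A minimal sketch, assuming `scipy.ndimage.distance_transform_edt` for the transform and per-image normalisation; the function name and normalisation choice are illustrative, not necessarily the authors' exact recipe.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_map_target(mask):
    # For each building pixel, the Euclidean distance to the nearest
    # non-building pixel; background stays 0. Normalised to [0, 1]
    # per image so the regression target is scale-independent.
    d = distance_transform_edt(mask)
    return d / d.max() if d.max() > 0 else d

mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 1                # a 3x3 "building"
target = distance_map_target(mask)
# The centre of the building is farthest from the background, so it
# receives the largest target value.
```

Because the distance map peaks at building interiors and vanishes at boundaries, a consistency loss can check that the predicted mask's interior agrees with where the predicted distance map is large.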


Author(s):  
Tianyang Xu ◽  
Zhenhua Feng ◽  
Xiao-Jun Wu ◽  
Josef Kittler

Discriminative Correlation Filters (DCF) have been shown to achieve impressive performance in visual object tracking. However, existing DCF-based trackers rely heavily on learning regularised appearance models from invariant image feature representations. To further improve the accuracy of DCF and provide a parsimonious model from the attribute perspective, we propose to gauge the relevance of multi-channel features for the purpose of channel selection. This is achieved by assessing the information conveyed by the features of each channel as a group, using an adaptive group elastic net that induces independent sparsity and temporal smoothness on the DCF solution. The robustness and stability of the learned appearance model are significantly enhanced by the proposed method, as the process of channel selection performs implicit spatial regularisation. We use the augmented Lagrangian method to optimise the discriminative filters efficiently. The experimental results obtained on a number of well-known benchmarking datasets demonstrate the effectiveness and stability of the proposed method. A superior performance over the state-of-the-art trackers is achieved using less than 10% of the deep feature channels.
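Group sparsity over channels amounts to scoring each channel as a whole and discarding the uninformative ones. The sketch below is a crude stand-in for the adaptive group elastic net: it ranks channels by their L2 group energy and keeps a small fraction, mimicking the "less than 10% of channels" outcome. The function and its threshold rule are assumptions, not the paper's solver.

```python
import numpy as np

def select_channels(features, keep_ratio=0.1):
    # features: (C, H, W) multi-channel feature map. Score each
    # channel by its L2 energy (treating the whole channel as one
    # group) and keep only the top keep_ratio fraction.
    c = features.shape[0]
    energy = np.sqrt((features ** 2).sum(axis=(1, 2)))
    k = max(1, int(round(c * keep_ratio)))
    keep = np.argsort(energy)[::-1][:k]
    return np.sort(keep)
```

The actual method replaces this hard top-k rule with an elastic-net penalty solved via the augmented Lagrangian, so the selection is learned jointly with the filter and varies smoothly over time.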


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sarv Priya ◽  
Tanya Aggarwal ◽  
Caitlin Ward ◽  
Girish Bathla ◽  
Mathews Jacob ◽  
...  

Side experiments are performed on radiomics models to improve their reproducibility. We measure the impact of myocardial masks, radiomic side experiments, and the data augmentation for information transfer (DAFIT) approach on differentiating patients with and without pulmonary hypertension (PH) using cardiac MRI (CMRI) derived radiomics. Feature extraction was performed from the left ventricle (LV) and right ventricle (RV) myocardial masks using CMRI in 82 patients (42 PH and 40 controls). Various side-study experiments were evaluated: original data without and with intraclass correlation (ICC) feature filtering, and the DAFIT approach (without and with ICC feature filtering). Multiple machine learning and feature selection strategies were evaluated. The primary analysis included all PH patients, with a subgroup analysis covering PH patients with preserved LVEF (≥ 50%). For both the primary and subgroup analyses, the DAFIT approach without feature filtering was the highest performer (AUC 0.957–0.958). The ICC approaches showed poor performance compared to the DAFIT approach. The performance of combined LV and RV masks was superior to either mask alone. The top-performing model varied across approaches (AUC 0.862–0.958). In summary, the DAFIT approach with features from combined LV and RV masks provides superior performance, while feature-filtering approaches perform poorly. Model performance varies depending on the feature selection and model combination.

