scholarly journals MS-Faster R-CNN: Multi-Stream Backbone for Improved Faster R-CNN Object Detection and Aerial Tracking from UAV Images

2021 ◽  
Vol 13 (9) ◽  
pp. 1670
Author(s):  
Danilo Avola ◽  
Luigi Cinque ◽  
Anxhelo Diko ◽  
Alessio Fagioli ◽  
Gian Luca Foresti ◽  
...  

Tracking objects across multiple video frames is a challenging task due to several difficult issues such as occlusions, background clutter, lighting as well as object and camera view-point variations, which directly affect the object detection. These aspects are even more emphasized when analyzing unmanned aerial vehicles (UAV) based images, where the vehicle movement can also impact the image quality. A common strategy employed to address these issues is to analyze the input images at different scales to obtain as much information as possible to correctly detect and track the objects across video sequences. Following this rationale, in this paper, we introduce a simple yet effective novel multi-stream (MS) architecture, where different kernel sizes are applied to each stream to simulate a multi-scale image analysis. The proposed architecture is then used as backbone for the well-known Faster-R-CNN pipeline, defining a MS-Faster R-CNN object detector that consistently detects objects in video sequences. Subsequently, this detector is jointly used with the Simple Online and Real-time Tracking with a Deep Association Metric (Deep SORT) algorithm to achieve real-time tracking capabilities on UAV images. To assess the presented architecture, extensive experiments were performed on the UMCD, UAVDT, UAV20L, and UAV123 datasets. The presented pipeline achieved state-of-the-art performance, confirming that the proposed multi-stream method can correctly emulate the robust multi-scale image analysis paradigm.

2019 ◽  
Vol 277 ◽  
pp. 02005
Author(s):  
Ning Feng ◽  
Le Dong ◽  
Qianni Zhang ◽  
Ning Zhang ◽  
Xi Wu ◽  
...  

Real-time semantic segmentation has become crucial in many applications such as medical image analysis and autonomous driving. In this paper, we introduce a single semantic segmentation network, called DNS, for joint object detection and segmentation task. We take advantage of multi-scale deconvolution mechanism to perform real time computations. To this goal, down-scale and up-scale streams are utilized to combine the multi-scale features for the final detection and segmentation task. By using the proposed DNS, not only the tradeoff between accuracy and cost but also the balance of detection and segmentation performance are settled. Experimental results for PASCAL VOC datasets show competitive performance for joint object detection and segmentation task.


Sensors ◽  
2020 ◽  
Vol 20 (12) ◽  
pp. 3591 ◽  
Author(s):  
Haidi Zhu ◽  
Haoran Wei ◽  
Baoqing Li ◽  
Xiaobing Yuan ◽  
Nasser Kehtarnavaz

This paper addresses real-time moving object detection with high accuracy in high-resolution video frames. A previously developed framework for moving object detection is modified to enable real-time processing of high-resolution images. First, a computationally efficient method is employed, which detects moving regions on a resized image while maintaining moving regions on the original image with mapping coordinates. Second, a light backbone deep neural network in place of a more complex one is utilized. Third, the focal loss function is employed to alleviate the imbalance between positive and negative samples. The results of the extensive experimentations conducted indicate that the modified framework developed in this paper achieves a processing rate of 21 frames per second with 86.15% accuracy on the dataset SimitMovingDataset, which contains high-resolution images of the size 1920 × 1080.


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 941
Author(s):  
Rakesh Chandra Joshi ◽  
Saumya Yadav ◽  
Malay Kishore Dutta ◽  
Carlos M. Travieso-Gonzalez

Visually impaired people face numerous difficulties in their daily life, and technological interventions may assist them to meet these challenges. This paper proposes an artificial intelligence-based fully automatic assistive technology to recognize different objects, and auditory inputs are provided to the user in real time, which gives better understanding to the visually impaired person about their surroundings. A deep-learning model is trained with multiple images of objects that are highly relevant to the visually impaired person. Training images are augmented and manually annotated to bring more robustness to the trained model. In addition to computer vision-based techniques for object recognition, a distance-measuring sensor is integrated to make the device more comprehensive by recognizing obstacles while navigating from one place to another. The auditory information that is conveyed to the user after scene segmentation and obstacle identification is optimized to obtain more information in less time for faster processing of video frames. The average accuracy of this proposed method is 95.19% and 99.69% for object detection and recognition, respectively. The time complexity is low, allowing a user to perceive the surrounding scene in real time.


2021 ◽  
Vol 421 ◽  
pp. 173-182
Author(s):  
Haijun Zhang ◽  
Mingshan Sun ◽  
Qun Li ◽  
Linlin Liu ◽  
Ming Liu ◽  
...  

2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Iljung Yoon ◽  
Heewon Joung ◽  
Jooheung Lee

We implement Zynq-based self-reconfigurable system to perform real-time edge detection of 1080p video sequences. While object edge detection is a fundamental tool in computer vision, noises in the video frames negatively affect edge detection results significantly. Moreover, due to the high computational complexity of 1080p video filtering operations, hardware implementation on reconfigurable hardware fabric is necessary. Here, the proposed embedded system utilizes dynamic reconfiguration capability of Zynq SoC so that partial reconfiguration of different filter bitstreams is performed during run-time according to the detected noise density level in the incoming video frames. Pratt’s Figure of Merit (PFOM) to evaluate the accuracy of edge detection is analyzed for various noise density levels, and we demonstrate that adaptive run-time reconfiguration of the proposed filter bitstreams significantly increases the accuracy of edge detection results while efficiently providing computing power to support real-time processing of 1080p video frames. Performance results on configuration time, CPU usage, and hardware resource utilization are also compared.


2021 ◽  
Vol 2082 (1) ◽  
pp. 012012
Author(s):  
Xu Zhang ◽  
Fang Han ◽  
Ping Wang ◽  
Wei Jiang ◽  
Chen Wang

Abstract Feature pyramids have become an essential component in most modern object detectors, such as Mask RCNN, YOLOv3, RetinaNet. In these detectors, the pyramidal feature representations are commonly used which represent an image with multi-scale feature layers. However, the detectors can’t be used in many real world applications which require real time performance under a computationally limited circumstance. In the paper, we study network architecture in YOLOv3 and modify the classical backbone--darknet53 of YOLOv3 by using a group of convolutions and dilated convolutions (DC). Then, a novel one-stage object detection network framework called DC-YOLOv3 is proposed. A lot of experiments on the Pascal 2017 benchmark prove the effectiveness of our framework. The results illustrate that DC-YOLOv3 achieves comparable results with YOLOv3 while being about 1.32× faster in training time and 1.38× faster in inference time.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5080
Author(s):  
Baohua Qiang ◽  
Ruidong Chen ◽  
Mingliang Zhou ◽  
Yuanchao Pang ◽  
Yijie Zhai ◽  
...  

In recent years, increasing image data comes from various sensors, and object detection plays a vital role in image understanding. For object detection in complex scenes, more detailed information in the image should be obtained to improve the accuracy of detection task. In this paper, we propose an object detection algorithm by jointing semantic segmentation (SSOD) for images. First, we construct a feature extraction network that integrates the hourglass structure network with the attention mechanism layer to extract and fuse multi-scale features to generate high-level features with rich semantic information. Second, the semantic segmentation task is used as an auxiliary task to allow the algorithm to perform multi-task learning. Finally, multi-scale features are used to predict the location and category of the object. The experimental results show that our algorithm substantially enhances object detection performance and consistently outperforms other three comparison algorithms, and the detection speed can reach real-time, which can be used for real-time detection.


Sign in / Sign up

Export Citation Format

Share Document