MS-Faster R-CNN: Multi-Stream Backbone for Improved Faster R-CNN Object Detection and Aerial Tracking from UAV Images

Danilo Avola; Luigi Cinque; Anxhelo Diko; Alessio Fagioli; Gian Luca Foresti; Alessio Mecca; Daniele Pannone; Claudio Piciarelli

doi:10.3390/rs13091670

Remote Sensing ◽

10.3390/rs13091670 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1670

Author(s):

Danilo Avola ◽

Luigi Cinque ◽

Anxhelo Diko ◽

Alessio Fagioli ◽

Gian Luca Foresti ◽

...

Keyword(s):

Image Analysis ◽

Object Detection ◽

Real Time ◽

Video Sequences ◽

Multi Scale ◽

Video Frames ◽

Common Strategy ◽

Real Time Tracking ◽

UAV Images ◽

SORT Algorithm

Tracking objects across multiple video frames is a challenging task due to issues such as occlusions, background clutter, and lighting, as well as object and camera viewpoint variations, all of which directly affect object detection. These issues are even more pronounced when analyzing images from unmanned aerial vehicles (UAVs), where the vehicle movement can also degrade image quality. A common strategy to address them is to analyze the input images at different scales to obtain as much information as possible for correctly detecting and tracking objects across video sequences. Following this rationale, in this paper we introduce a simple yet effective novel multi-stream (MS) architecture, in which a different kernel size is applied to each stream to simulate multi-scale image analysis. The proposed architecture is then used as the backbone of the well-known Faster R-CNN pipeline, defining an MS-Faster R-CNN object detector that consistently detects objects in video sequences. Subsequently, this detector is used jointly with the Simple Online and Real-time Tracking with a Deep Association Metric (Deep SORT) algorithm to achieve real-time tracking on UAV images. To assess the presented architecture, extensive experiments were performed on the UMCD, UAVDT, UAV20L, and UAV123 datasets. The presented pipeline achieved state-of-the-art performance, confirming that the proposed multi-stream method can correctly emulate the robust multi-scale image analysis paradigm.
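
The multi-stream idea in this abstract (parallel streams, each with its own kernel size, applied to the same input) can be illustrated with a minimal pure-Python sketch. This is not the authors' backbone: the kernel sizes (3, 5, 7), the fixed mean filters standing in for learned convolutions, and the names `box_filter` and `multi_stream_features` are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the MS-Faster R-CNN backbone from the paper.
# Each "stream" filters the same image with a different kernel size, and the
# per-stream feature maps are stacked as channels.

def box_filter(image, k):
    """Apply a k x k mean filter with zero padding (output keeps the input size)."""
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out


def multi_stream_features(image, kernel_sizes=(3, 5, 7)):
    """Run one stream per kernel size (hypothetical sizes) and stack the results."""
    return [box_filter(image, k) for k in kernel_sizes]


# A single bright pixel in a 7x7 image: larger kernels spread it more widely.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
features = multi_stream_features(image)
```

Each entry of `features` has the same spatial size as the input, so the streams can be stacked as channels; a real backbone would learn the filter weights rather than fix them.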

DNS: A multi-scale deconvolution semantic segmentation network for joint detection and segmentation

MATEC Web of Conferences ◽

10.1051/matecconf/201927702005 ◽

2019 ◽

Vol 277 ◽

pp. 02005

Author(s):

Ning Feng ◽

Le Dong ◽

Qianni Zhang ◽

Ning Zhang ◽

Xi Wu ◽

...

Keyword(s):

Image Analysis ◽

Object Detection ◽

Real Time ◽

Medical Image ◽

Medical Image Analysis ◽

Semantic Segmentation ◽

Autonomous Driving ◽

Joint Detection ◽

Multi Scale ◽

Segmentation Task

Real-time semantic segmentation has become crucial in many applications, such as medical image analysis and autonomous driving. In this paper, we introduce a single semantic segmentation network, called DNS, for the joint object detection and segmentation task. We take advantage of a multi-scale deconvolution mechanism to perform real-time computations. To this end, down-scale and up-scale streams are used to combine the multi-scale features for the final detection and segmentation task. The proposed DNS addresses both the tradeoff between accuracy and cost and the balance between detection and segmentation performance. Experimental results on the PASCAL VOC datasets show competitive performance for the joint object detection and segmentation task.

Interpolation-Based Object Detection Using Motion Vectors for Embedded Real-time Tracking Systems

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) ◽

10.1109/cvprw.2018.00104 ◽

2018 ◽

Cited By ~ 3

Author(s):

Takayuki Ujiie ◽

Masayuki Hiromoto ◽

Takashi Sato

Keyword(s):

Object Detection ◽

Real Time ◽

Tracking Systems ◽

Motion Vectors ◽

Real Time Tracking

Real-Time Moving Object Detection in High-Resolution Video Sensing

Sensors ◽

10.3390/s20123591 ◽

2020 ◽

Vol 20 (12) ◽

pp. 3591 ◽

Cited By ~ 1

Author(s):

Haidi Zhu ◽

Haoran Wei ◽

Baoqing Li ◽

Xiaobing Yuan ◽

Nasser Kehtarnavaz

Keyword(s):

High Resolution ◽

Object Detection ◽

Real Time ◽

Moving Object Detection ◽

Moving Object ◽

Computationally Efficient ◽

Real Time Processing ◽

Time Processing ◽

Video Frames ◽

High Resolution Images

This paper addresses real-time moving object detection with high accuracy in high-resolution video frames. A previously developed framework for moving object detection is modified to enable real-time processing of high-resolution images. First, a computationally efficient method is employed that detects moving regions on a resized image and maps their coordinates back to the corresponding regions of the original image. Second, a light backbone deep neural network is used in place of a more complex one. Third, the focal loss function is employed to alleviate the imbalance between positive and negative samples. The results of the extensive experiments conducted indicate that the modified framework developed in this paper achieves a processing rate of 21 frames per second with 86.15% accuracy on the SimitMovingDataset dataset, which contains high-resolution images of size 1920 × 1080.
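
The focal loss mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the values gamma = 2 and alpha = 0.25 are common defaults and are an assumption here, since the abstract does not state the settings used in the paper.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction (p = predicted P(class 1), y in {0, 1})."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, 1e-12), 1.0))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    gamma and alpha are assumed defaults, not values taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident, correct prediction
hard = focal_loss(0.1, 1)  # confident, wrong prediction
```

Relative to cross-entropy, the easy sample is down-weighted far more strongly than the hard one, which is how the loss keeps abundant easy negatives from dominating training.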

Efficient Multi-Object Detection and Smart Navigation Using Artificial Intelligence for Visually Impaired People

Entropy ◽

10.3390/e22090941 ◽

2020 ◽

Vol 22 (9) ◽

pp. 941

Author(s):

Rakesh Chandra Joshi ◽

Saumya Yadav ◽

Malay Kishore Dutta ◽

Carlos M. Travieso-Gonzalez

Keyword(s):

Artificial Intelligence ◽

Object Detection ◽

Real Time ◽

Visually Impaired ◽

Auditory Information ◽

Visually Impaired People ◽

Video Frames ◽

Average Accuracy ◽

Impaired People ◽

Impaired Person

Visually impaired people face numerous difficulties in their daily life, and technological interventions may help them meet these challenges. This paper proposes a fully automatic, artificial intelligence-based assistive technology that recognizes different objects and provides auditory inputs to the user in real time, giving the visually impaired person a better understanding of their surroundings. A deep-learning model is trained with multiple images of objects that are highly relevant to the visually impaired person. Training images are augmented and manually annotated to make the trained model more robust. In addition to computer vision-based techniques for object recognition, a distance-measuring sensor is integrated to make the device more comprehensive by recognizing obstacles while navigating from one place to another. The auditory information that is conveyed to the user after scene segmentation and obstacle identification is optimized to obtain more information in less time for faster processing of video frames. The average accuracy of the proposed method is 95.19% and 99.69% for object detection and recognition, respectively. The time complexity is low, allowing a user to perceive the surrounding scene in real time.

Real-Time Tracking of Video Sequences in a Panoramic View for Object-Based Video Coding

Image Analysis - Lecture Notes in Computer Science ◽

10.1007/3-540-45103-x_134 ◽

2003 ◽

pp. 1022-1029

Author(s):

Matthijs Douze ◽

Vincent Charvillat

Keyword(s):

Video Coding ◽

Real Time ◽

Video Sequences ◽

Panoramic View ◽

Object Based ◽

Real Time Tracking

An empirical study of multi-scale object detection in high resolution UAV images

Neurocomputing ◽

10.1016/j.neucom.2020.08.074 ◽

2021 ◽

Vol 421 ◽

pp. 173-182

Author(s):

Haijun Zhang ◽

Mingshan Sun ◽

Qun Li ◽

Linlin Liu ◽

Ming Liu ◽

...

Keyword(s):

High Resolution ◽

Empirical Study ◽

Object Detection ◽

Multi Scale ◽

UAV Images

Zynq-Based Reconfigurable System for Real-Time Edge Detection of Noisy Video Sequences

Journal of Sensors ◽

10.1155/2016/2654059 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Iljung Yoon ◽

Heewon Joung ◽

Jooheung Lee

Keyword(s):

Edge Detection ◽

Embedded System ◽

Real Time ◽

Video Sequences ◽

Density Level ◽

Real Time Processing ◽

Reconfigurable System ◽

Video Frames ◽

Noise Density ◽

Run Time

We implement a Zynq-based self-reconfigurable system to perform real-time edge detection of 1080p video sequences. While object edge detection is a fundamental tool in computer vision, noise in the video frames significantly degrades edge detection results. Moreover, due to the high computational complexity of 1080p video filtering operations, implementation on a reconfigurable hardware fabric is necessary. Here, the proposed embedded system uses the dynamic reconfiguration capability of the Zynq SoC so that partial reconfiguration of different filter bitstreams is performed at run time according to the noise density level detected in the incoming video frames. Pratt's Figure of Merit (PFOM), which evaluates the accuracy of edge detection, is analyzed for various noise density levels, and we demonstrate that adaptive run-time reconfiguration of the proposed filter bitstreams significantly increases the accuracy of edge detection results while efficiently providing the computing power to support real-time processing of 1080p video frames. Performance results on configuration time, CPU usage, and hardware resource utilization are also compared.
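
Pratt's Figure of Merit, used in this abstract to score edge detection accuracy, is commonly defined as PFOM = (1 / max(N_I, N_D)) * sum over detected edge pixels of 1 / (1 + alpha * d^2), where N_I and N_D are the numbers of ideal and detected edge pixels and d is the distance from a detected pixel to the nearest ideal one. The sketch below is a brute-force illustration under that definition; the scaling constant alpha = 1/9 is the customary choice and is an assumption here, as is the function name `pratt_fom`.

```python
def pratt_fom(ideal_edges, detected_edges, alpha=1.0 / 9.0):
    """Pratt's Figure of Merit for two collections of (row, col) edge pixels.

    Returns a score in (0, 1]; 1.0 means the detected edges match the ideal ones.
    alpha = 1/9 is the customary scaling constant (assumed, not from the paper).
    """
    if not ideal_edges or not detected_edges:
        return 0.0
    total = 0.0
    for (y, x) in detected_edges:
        # Squared distance to the nearest ideal edge pixel (brute force).
        d2 = min((y - iy) ** 2 + (x - ix) ** 2 for (iy, ix) in ideal_edges)
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(len(ideal_edges), len(detected_edges))


# A vertical edge at column 2, and a detection displaced by one pixel.
ideal = [(y, 2) for y in range(5)]
shifted = [(y, 3) for y in range(5)]
perfect_score = pratt_fom(ideal, ideal)
shifted_score = pratt_fom(ideal, shifted)
```

Dividing by the larger of the two pixel counts penalizes both missing and spurious edge pixels, while the distance term penalizes displaced ones.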

DC-YOLOv3: A novel efficient object detection algorithm

Journal of Physics Conference Series ◽

10.1088/1742-6596/2082/1/012012 ◽

2021 ◽

Vol 2082 (1) ◽

pp. 012012

Author(s):

Xu Zhang ◽

Fang Han ◽

Ping Wang ◽

Wei Jiang ◽

Chen Wang

Keyword(s):

Object Detection ◽

Real Time ◽

Real World ◽

Network Architecture ◽

Detection Algorithm ◽

Training Time ◽

Feature Representations ◽

Multi Scale ◽

Time Performance ◽

Real World Applications

Feature pyramids have become an essential component in most modern object detectors, such as Mask RCNN, YOLOv3, and RetinaNet. These detectors commonly use pyramidal feature representations, which represent an image with multi-scale feature layers. However, the detectors cannot be used in many real-world applications that require real-time performance under limited computational resources. In this paper, we study the network architecture of YOLOv3 and modify its classical backbone, darknet53, by using a group of convolutions and dilated convolutions (DC). Then, a novel one-stage object detection network framework called DC-YOLOv3 is proposed. Extensive experiments on the Pascal 2017 benchmark demonstrate the effectiveness of our framework. The results illustrate that DC-YOLOv3 achieves results comparable to YOLOv3 while being about 1.32× faster in training time and 1.38× faster in inference time.
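
The dilated convolutions named in this abstract can be illustrated in one dimension. The sketch below is a generic example of dilation, not the DC-YOLOv3 backbone: a kernel of size k with dilation d covers (k - 1) * d + 1 input samples without adding weights. The function names are hypothetical.

```python
def effective_kernel_size(k, dilation):
    """Number of input samples spanned by a size-k kernel with the given dilation."""
    return (k - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D cross-correlation with a dilated kernel."""
    k = len(kernel)
    span = effective_kernel_size(k, dilation)
    return [
        # Kernel taps are applied to samples spaced `dilation` apart.
        sum(kernel[j] * signal[i + j * dilation] for j in range(k))
        for i in range(len(signal) - span + 1)
    ]


signal = list(range(10))
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)    # taps at i, i+1, i+2
dilated = dilated_conv1d(signal, [1, 1, 1], dilation=2)  # taps at i, i+2, i+4
```

Both calls use three weights, but the dilated one sees a window of five samples, which is why dilation is a cheap way to enlarge the receptive field.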

REAL-TIME MOVING OBJECT DETECTION IN VIDEO SEQUENCES USING SPATIO-TEMPORAL ADAPTIVE GAUSSIAN MIXTURE MODELS

Proceedings of the International Conference on Computer Vision Theory and Applications ◽

10.5220/0002816904130418 ◽

2010 ◽

Keyword(s):

Object Detection ◽

Real Time ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Moving Object Detection ◽

Moving Object ◽

Video Sequences ◽

Spatio Temporal

Convolutional Neural Networks-Based Object Detection Algorithm by Jointing Semantic Segmentation for Images

Sensors ◽

10.3390/s20185080 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5080

Author(s):

Baohua Qiang ◽

Ruidong Chen ◽

Mingliang Zhou ◽

Yuanchao Pang ◽

Yijie Zhai ◽

...

Keyword(s):

Object Detection ◽

Real Time ◽

Image Data ◽

Semantic Segmentation ◽

Image Understanding ◽

Detection Algorithm ◽

Vital Role ◽

Multi Scale ◽

Segmentation Task ◽

High Level

In recent years, an increasing amount of image data has come from various sensors, and object detection plays a vital role in image understanding. For object detection in complex scenes, more detailed information in the image should be obtained to improve the accuracy of the detection task. In this paper, we propose an object detection algorithm by jointing semantic segmentation (SSOD) for images. First, we construct a feature extraction network that integrates the hourglass structure network with the attention mechanism layer to extract and fuse multi-scale features, generating high-level features with rich semantic information. Second, the semantic segmentation task is used as an auxiliary task to allow the algorithm to perform multi-task learning. Finally, multi-scale features are used to predict the location and category of the object. The experimental results show that our algorithm substantially enhances object detection performance and consistently outperforms three other comparison algorithms, and that its detection speed is fast enough for real-time detection.
