Video Object Detection Using Event-Aware Convolutional LSTM and Object Relation Networks

Chen Zhang; Zhengyu Xia; Joohee Kim

doi:10.3390/electronics10161918

Electronics, 2021, Vol 10 (16), pp. 1918, doi:10.3390/electronics10161918

Author(s): Chen Zhang, Zhengyu Xia, Joohee Kim

Keyword(s): Object Detection, Reference Frames, Contextual Information, Object Relation, Detection Performance, Video Object, Relation Module, Ratio Change, Large Motion, Supporting Frame

Common video-based object detectors exploit temporal contextual information to improve detection performance. However, detecting objects under challenging conditions has not yet been thoroughly studied. In this paper, we focus on improving the detection performance for challenging events such as aspect ratio change, occlusion, or large motion. To this end, we propose a video object detection network using event-aware ConvLSTM and object relation networks. The proposed event-aware ConvLSTM highlights the areas where those challenging events take place. Compared with a traditional ConvLSTM, it makes temporal contextual information easier to exploit in support of video-based object detectors under challenging events. To further improve the detection performance, an object relation module using supporting frame selection is applied to enhance the pooled features for the target ROI. It selects the features of the same object from one of the reference frames rather than from all of them. Experimental results on the ImageNet VID dataset show that the proposed method achieves an mAP of 81.0% without any post-processing and can handle challenging events efficiently in video object detection.
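The abstract does not state how the supporting frame is chosen, so the following is only a minimal sketch of the general idea: for one target ROI feature, look across all reference frames and keep the single best-matching ROI feature instead of aggregating every frame. The cosine-similarity criterion and the names `cosine_similarity` and `select_supporting_feature` are assumptions made for illustration, not the authors' method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def select_supporting_feature(target_feat, reference_frames):
    """Pick the one reference ROI most similar to the target ROI.

    reference_frames: list of frames, each a list of ROI feature vectors.
    Returns (frame_index, roi_index, similarity), or None if there are no ROIs.
    ASSUMPTION: similarity-based selection is a stand-in for the paper's criterion.
    """
    best = None
    for frame_idx, rois in enumerate(reference_frames):
        for roi_idx, feat in enumerate(rois):
            sim = cosine_similarity(target_feat, feat)
            if best is None or sim > best[2]:
                best = (frame_idx, roi_idx, sim)
    return best


# Toy example: two reference frames with hypothetical 2-D ROI features.
target = [1.0, 0.0]
references = [
    [[0.0, 1.0], [0.5, 0.5]],  # frame 0
    [[0.9, 0.1]],              # frame 1
]
choice = select_supporting_feature(target, references)
```

In this toy input the selected feature comes from frame 1, since its single ROI points in nearly the same direction as the target feature.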

Optical-flow-based framework to boost video object detection performance with object enhancement

Expert Systems with Applications, 2021, Vol 170, pp. 114544, doi:10.1016/j.eswa.2020.114544

Author(s): Long Fan, Tao Zhang, Wenli Du

Keyword(s): Object Detection, Optical Flow, Detection Performance, Video Object

More successful recognition: Seeking the relation of video object detection performance with video coding parameters

2015 12th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), 2015, doi:10.1109/iccwamtip.2015.7493971

Author(s): Tao Liu, Zemin Wu, Mingyong Zeng, Qingzhu Jiang, Lei Hu

Keyword(s): Video Coding, Object Detection, Detection Performance, Video Object

Object Detection Based on Faster R-CNN Algorithm with Skip Pooling and Fusion of Contextual Information

Sensors, 2020, Vol 20 (19), pp. 5490, doi:10.3390/s20195490

Author(s): Yi Xiao, Xinqing Wang, Peng Zhang, Fanjie Meng, Faming Shao

Keyword(s): Neural Network, Deep Learning, Object Detection, Detection Efficiency, Contextual Information, Recall Rate, Detection Performance, Single Shot, Average Improvement, Neural Network Algorithm

Deep learning is currently the mainstream method of object detection, and the faster region-based convolutional neural network (Faster R-CNN) holds a pivotal position within it. It performs impressively in ordinary scenes. Under special conditions, however, such as when objects are occluded, deformed, or small, its detection performance can still be unsatisfactory. This paper proposes an improved algorithm that combines the Faster R-CNN framework with skip pooling and fusion of contextual information to improve detection performance under these conditions. The improvement has three parts: the first adds a context information feature extraction model after conv5_3 of the convolutional layer; the second adds skip pooling; and the third replaces the region proposal network (RPN) with a more efficient guided anchor RPN (GA-RPN), which can maintain the recall rate while improving the detection performance. The context model lets the detector fully obtain the contextual information of the object, especially when the object is occluded or deformed. Skip pooling obtains more detailed information from different feature layers of the deep neural network and is aimed especially at scenes with small objects. Compared with Faster R-CNN, the you only look once series (such as YOLOv3), single shot detectors (such as SSD512), and other object detection algorithms, the proposed algorithm achieves an average improvement of 6.857% in mean average precision (mAP) while maintaining a certain recall rate. This result strongly supports the claim that the proposed method has a higher detection rate and detection efficiency in such cases.

Local Attention Sequence Model for Video Object Detection

Applied Sciences, 2021, Vol 11 (10), pp. 4561, doi:10.3390/app11104561

Author(s): Zhenhui Li, Xiaoping Zhuang, Haibo Wang, Yong Nie, Jianzhong Tang

Keyword(s): Information Processing, Object Detection, Detection Performance, Temporal Information, Detection Accuracy, Video Object, Processing Efficiency

Video object detection still faces several difficulties and challenges. For example, the imbalance of positive and negative samples leads to low information processing efficiency, and detection performance declines in abnormal situations in video. This paper examines video object detection based on local attention to address such challenges. We propose a local attention sequence model and optimize the parameters and computation of ConvGRU. The model processes spatial and temporal information in videos more efficiently and ultimately improves detection performance under abnormal conditions. Experiments on ImageNet VID show that our method improves the detection accuracy by 5.3%, and the visualization results show that the method adapts to different abnormal conditions, thereby improving the reliability of video object detection.

Sequence Level Semantics Aggregation for Video Object Detection

2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, doi:10.1109/iccv.2019.00931

Cited By ~ 12

Author(s): Haiping Wu, Yuntao Chen, Naiyan Wang, Zhao-Xiang Zhang

Keyword(s): Object Detection, Video Object

Global and local feature alignment for video object detection

Proceedings of the 2nd ACM International Conference on Multimedia in Asia, 2021, doi:10.1145/3444685.3446263

Author(s): Haihui Ye, Qiang Qi, Ying Wang, Yang Lu, Hanzi Wang

Keyword(s): Object Detection, Local Feature, Video Object, Global And Local, Feature Alignment

Small Object Detection in Remote Sensing Images with Residual Feature Aggregation-Based Super-Resolution and Object Detector Network

Remote Sensing, 2021, Vol 13 (9), pp. 1854, doi:10.3390/rs13091854

Author(s): Syed Muhammad Arsalan Bashir, Yi Wang

Keyword(s): Remote Sensing, Object Detection, Resolution Enhancement, Super Resolution, Detection Performance, Image Resolution, Small Object, Remote Sensing Images, Feature Aggregation, Image Super Resolution

This paper deals with detecting small objects in remote sensing images from satellites or any aerial vehicle by using image super-resolution to enhance image resolution ahead of a deep-learning-based detection method. It provides a rationale for image super-resolution for small objects and improves the current super-resolution (SR) framework by incorporating a cyclic generative adversarial network (GAN) and residual feature aggregation (RFA) to improve detection performance. The novelty of the method is threefold: first, the proposed framework is independent of the final object detector, i.e., YOLOv3 could be replaced with Faster R-CNN or any other object detector; second, a residual feature aggregation network was used in the generator, which significantly improved the detection performance as the RFA network detected complex features; and third, the whole network was transformed into a cyclic GAN. The image super-resolution cyclic GAN with RFA and YOLO as the detection network is termed SRCGAN-RFA-YOLO, and its detection accuracy is compared with that of other methods. Rigorous experiments on both satellite images and aerial images (ISPRS Potsdam, VAID, and Draper Satellite Image Chronology datasets) were performed, and the results showed that the detection performance increased when super-resolution methods were used for spatial resolution enhancement; for an IoU of 0.10, an AP of 0.7867 was achieved for a scale factor of 16.
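The reported AP depends on the IoU threshold used to decide whether a predicted box matches a ground-truth box. As a minimal sketch of that matching step only (not the paper's evaluation code), the following computes intersection over union for axis-aligned boxes; the `(x1, y1, x2, y2)` box layout and the helper name `is_match` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.10):
    """True when the prediction overlaps the ground truth by at least `threshold` IoU."""
    return iou(pred_box, gt_box) >= threshold


# Two 2x2 boxes sharing a 1x1 corner: IoU = 1 / (4 + 4 - 1) = 1/7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

With a threshold of 0.10 these two boxes count as a match, while a stricter threshold such as 0.5 would reject them, which is why an AP figure is only meaningful alongside its IoU threshold.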

A Note on Advantages of the Fuzzy Gabor Filter in Object and Text Detection

Symmetry, 2021, Vol 13 (4), pp. 678, doi:10.3390/sym13040678

Author(s): Vladimir Tadic, Tatjana Loncar-Turukalo, Akos Odry, Zeljen Trpovski, Attila Toth, ...

Keyword(s): Object Detection, Detection Method, Gabor Filter, Low Cost, Fuzzy Optimization, Detection Performance, Text Detection, License Plate, 2D Gabor Filter, Fine Tune

This note presents a fuzzy optimization of Gabor filter-based object and text detection. The derivation of a 2D Gabor filter and the guidelines for the fuzzification of the filter parameters are described. The fuzzy Gabor filter proved to be a robust text and object detection method for low-quality input images, as extensively evaluated on the problem of license plate localization. The extended set of examples confirmed that the fuzzy optimized Gabor filter with adequately fuzzified parameters detected the desired license plate texture components and substantially improved object detection compared with the classic Gabor filter. The robustness of the proposed approach was further demonstrated on other images of various origins containing text and different textures, captured using low-cost or modest-quality acquisition procedures. The possibility of fine-tuning the fuzzification procedure to better suit certain applications offers the potential to further boost detection performance.
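For orientation, the classic (non-fuzzy) 2D Gabor filter that the note starts from can be sketched as a Gaussian envelope multiplied by a cosine carrier. The code below uses the standard textbook real-valued form; the parameter names (`sigma`, `theta`, `lambd`, `gamma`, `psi`) are conventional choices assumed here, and the note's fuzzification of these parameters is not reproduced.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a classic 2D Gabor kernel as a size x size list of lists.

    size  : odd kernel width/height in pixels
    sigma : standard deviation of the Gaussian envelope
    theta : orientation of the carrier in radians
    lambd : wavelength of the cosine carrier in pixels
    gamma : spatial aspect ratio of the envelope
    psi   : phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A small horizontal-frequency kernel; the values here are arbitrary examples.
kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, lambd=4.0)
```

Convolving an image with a bank of such kernels at several orientations responds strongly to stripe-like textures, which is the property that text and license plate detection relies on.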
