Local Attention Sequence Model for Video Object Detection

2021, Vol 11 (10), pp. 4561
Author(s): Zhenhui Li, Xiaoping Zhuang, Haibo Wang, Yong Nie, Jianzhong Tang

Video object detection still faces several difficulties and challenges. For example, the imbalance between positive and negative samples leads to low information-processing efficiency, and detection performance declines in abnormal situations in video. This paper examines video object detection based on local attention to address such challenges. We propose a local attention sequence model and optimize the parameters and computation of ConvGRU. The model processes spatial and temporal information in videos more efficiently and ultimately improves detection performance under abnormal conditions. Experiments on ImageNet VID show that our method improves detection accuracy by 5.3%, and visualization results show that the method adapts to different abnormal conditions, thereby improving the reliability of video object detection.
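For readers unfamiliar with the recurrence being optimized, a minimal ConvGRU cell in PyTorch might look as follows; the layer shapes and names are illustrative assumptions, not the authors' optimized variant.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # Update and reset gates are computed jointly from the
        # concatenated input and hidden state.
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde  # convex blend of old and new state

# Example: propagate a 64-channel hidden state over a spatial feature map.
cell = ConvGRUCell(in_ch=256, hid_ch=64)
h = torch.zeros(1, 64, 32, 32)
h = cell(torch.randn(1, 256, 32, 32), h)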

2019, Vol 11 (3), pp. 286
Author(s): Jiangqiao Yan, Hongqi Wang, Menglong Yan, Wenhui Diao, Xian Sun, ...

Recently, methods based on the Faster region-based convolutional neural network (R-CNN) have been popular for multi-class object detection in remote sensing images due to their outstanding detection performance. These methods generally propose candidate regions of interest (ROIs) through a region proposal network (RPN), and regions with sufficiently high intersection-over-union (IoU) values against the ground truth are treated as positive samples for training. In this paper, we find that the detection results of such methods are sensitive to the choice of IoU threshold. Specifically, detection performance on small objects is poor when a typical higher threshold is chosen, while a lower threshold results in poor localization accuracy caused by a large number of false positives. To address these issues, we propose a novel IoU-Adaptive Deformable R-CNN framework for multi-class object detection. Specifically, by analyzing the different roles that IoU can play in different parts of the network, we propose an IoU-guided detection framework to reduce the loss of small-object information during training. In addition, an IoU-based weighted loss is designed, which learns the IoU information of positive ROIs to effectively improve detection accuracy. Finally, a class-aspect-ratio-constrained non-maximum suppression (CARC-NMS) is proposed, which further improves the precision of the results. Extensive experiments validate the effectiveness of our approach, and we achieve state-of-the-art detection performance on the DOTA dataset.
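As a sketch of how an IoU-based weighted loss can work in general (the paper's exact formulation may differ), one can scale each positive ROI's classification loss by its IoU with the matched ground truth, so that well-localized positives contribute more:

import torch
import torch.nn.functional as F

def iou_weighted_cls_loss(logits, labels, ious):
    """Cross-entropy over ROIs, weighted per positive ROI by its IoU.

    logits: (N, C) class scores; labels: (N,) with 0 = background;
    ious: (N,) IoU of each ROI with its matched ground-truth box.
    """
    ce = F.cross_entropy(logits, labels, reduction="none")
    # Background ROIs keep unit weight; positives are scaled by IoU.
    w = torch.where(labels > 0, ious, torch.ones_like(ious))
    return (w * ce).sum() / w.sum().clamp(min=1e-6)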


2021
Author(s): Shane Littrell, Jonathan Albert Fugelsang

The growing prevalence of misinformation (i.e., bullshit) in society carries with it an increased need to understand the processes underlying many people's susceptibility to falling for it. Though several cognitive and metacognitive variables have been found to be associated with a greater propensity to fall for bullshit, little attention has been paid to people's perceptions of, and confidence in, their own ability to detect it, or to the phenomenology of the thinking processes they employ when evaluating misleading information. Here we report two studies (N = 412) examining the associations between bullshit detection accuracy, confidence in one's bullshit detection abilities, and the metacognitive experience of evaluating potentially misleading information. We find that people with the poorest bullshit detection performance grossly overestimate their detection abilities and significantly overplace those abilities relative to others. Additionally, highly bullshit-receptive people reported using both intuitive and reflective thinking processes when evaluating misleading information. These results suggest that some people may have a "bullshit blind spot" and that traditional miserly-processing explanations of receptivity to misleading information may be insufficient to fully account for these effects.


Author(s): Zhengkai Jiang, Peng Gao, Chaoxu Guo, Qian Zhang, Shiming Xiang, ...

Deep convolutional neural networks have achieved great success on various image recognition tasks. However, it is nontrivial to transfer existing networks to video, since most of them are developed for static images. Frame-by-frame processing is suboptimal because temporal information that is vital for video understanding is discarded entirely; it is also slow and inefficient, which hinders practical use. In this paper, we propose LWDN (Locally-Weighted Deformable Neighbors) for video object detection without relying on time-consuming optical flow extraction networks. LWDN latently aligns high-level features between keyframes, and between keyframes and non-keyframes. Inspired by (Zhu et al. 2017a) and (Hetang et al. 2017), who propose aggregating features between keyframes, we adopt a brain-inspired memory mechanism to propagate and update the memory feature from keyframe to keyframe, a process we call Memory-Guided Propagation. With this memory mechanism, the discriminative ability of features in both keyframes and non-keyframes is enhanced, which helps to improve detection accuracy. Extensive experiments on the VID dataset demonstrate that our method achieves a superior speed-accuracy trade-off, i.e., 76.3% on the challenging VID dataset while maintaining 20 fps on a Titan X GPU.
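The memory-guided propagation described above can be pictured, in simplified form, as a learned gate that blends the running memory with each new keyframe feature. The module below is a hypothetical PyTorch sketch of such a gated update, not the LWDN implementation.

import torch
import torch.nn as nn

class MemoryGuidedUpdate(nn.Module):
    """Gated blend of a running memory feature with a new keyframe feature."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, memory, key_feat):
        # g near 1 preserves the old memory; g near 0 writes the new keyframe.
        g = torch.sigmoid(self.gate(torch.cat([memory, key_feat], 1)))
        return g * memory + (1 - g) * key_feat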


Electronics, 2021, Vol 10 (16), pp. 1918
Author(s): Chen Zhang, Zhengyu Xia, Joohee Kim

Common video-based object detectors exploit temporal contextual information to improve the performance of object detection. However, detecting objects under challenging conditions has not been thoroughly studied yet. In this paper, we focus on improving detection performance for challenging events such as aspect ratio change, occlusion, or large motion. To this end, we propose a video object detection network using event-aware ConvLSTM and object relation networks. Our proposed event-aware ConvLSTM is able to highlight the areas where such challenging events take place; compared with traditional ConvLSTM, it makes it easier to exploit temporal contextual information to support video-based object detectors under challenging events. To further improve detection performance, an object relation module with supporting-frame selection is applied to enhance the pooled features for the target ROI. It effectively selects features of the same object from one of the reference frames rather than from all of them. Experimental results on the ImageNet VID dataset show that the proposed method achieves an mAP of 81.0% without any post-processing and handles challenging events efficiently in video object detection.
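The supporting-frame selection step can be illustrated with a simple similarity criterion: for each target ROI, pick the single reference-frame ROI feature closest to it. The cosine-similarity rule below is an assumed stand-in for the paper's relation-based selection, shown only to make the "one reference frame rather than all of them" idea concrete.

import torch
import torch.nn.functional as F

def select_supporting_feature(target_roi, ref_rois):
    """Return the reference ROI feature most similar to the target ROI.

    target_roi: (C, H, W) pooled feature; ref_rois: list of (C, H, W)
    features, one per candidate reference frame.
    """
    t = F.normalize(target_roi.flatten(), dim=0)
    sims = torch.stack([torch.dot(t, F.normalize(r.flatten(), dim=0))
                        for r in ref_rois])
    return ref_rois[int(sims.argmax())]  # single best supporting frame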


2021, Vol 13 (9), pp. 1703
Author(s): He Yan, Chao Chen, Guodong Jin, Jindong Zhang, Xudong Wang, ...

The traditional method of constant false-alarm rate detection is based on an assumed statistical model of the echo. Against a background of sea clutter and other interference, its target recognition accuracy is low and its false-alarm rate is high. Computer vision techniques have therefore been widely discussed as a way to improve detection performance. However, the majority of studies have focused on synthetic aperture radar because of its high resolution; for low-resolution coastal defense radar, detection performance remains unsatisfactory. To this end, we propose a novel target detection method for coastal defense radar based on the faster region-based convolutional neural network (Faster R-CNN). The main processing steps are as follows: (1) Faster R-CNN is selected as the sea-surface target detector because of its high detection accuracy; (2) a modified Faster R-CNN is employed, based on the sparsity and small target sizes in the data set; and (3) soft non-maximum suppression is exploited to eliminate overlapping detection boxes. Furthermore, detailed comparative experiments on a real coastal defense radar data set are performed. The mean average precision of the proposed method is improved by 10.86% compared with that of the original Faster R-CNN.
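Soft non-maximum suppression, used in step (3), decays the scores of boxes that overlap the current top detection rather than discarding them outright. Below is a minimal Gaussian variant following the standard Soft-NMS formulation; the decay and score thresholds are illustrative, not the paper's settings.

import numpy as np

def soft_nms(boxes, scores, sigma=0.5, min_score=1e-3):
    """Gaussian Soft-NMS. boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,).
    Returns indices of kept boxes in selection order."""
    scores = scores.astype(float).copy()
    live, keep = list(range(len(scores))), []
    while live:
        m = max(live, key=lambda i: scores[i])
        keep.append(m)
        live.remove(m)
        for i in live:
            # Decay each remaining score by its overlap with the top box.
            scores[i] *= np.exp(-iou(boxes[m], boxes[i]) ** 2 / sigma)
        live = [i for i in live if scores[i] >= min_score]
    return keep

def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0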


2021, Vol 11 (11), pp. 4894
Author(s): Anna Scius-Bertrand, Michael Jungo, Beat Wolf, Andreas Fischer, Marc Bui

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems that reduce the required amount of annotation. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages rendered with a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.
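The self-training procedure described above can be summarized as a pseudo-labeling loop: train on synthetic pages, label the real manuscripts with the current model's confident detections, and retrain. The sketch below injects the training and detection routines as callables; the round count and confidence threshold are assumed values, not those used in the article.

from typing import Callable, List

def self_train(fit: Callable, detect: Callable,
               synthetic: List, real: List,
               rounds: int = 3, min_conf: float = 0.9):
    """fit(labeled_pages) -> model; detect(model, page) -> [(box, score)]."""
    model = fit(synthetic)                # 1. bootstrap on font-rendered synthetic pages
    for _ in range(rounds):
        pseudo = []
        for page in real:
            dets = [(b, s) for b, s in detect(model, page) if s >= min_conf]
            pseudo.append((page, dets))   # 2. keep only confident detections as labels
        model = fit(pseudo)               # 3. retrain on pseudo-labeled real pages
    return model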

