Quantifying and Transferring Contextual Information in Object Detection

Wei-Shi Zheng;  Shaogang Gong;  Tao Xiang

doi:10.1109/tpami.2011.164

Unsupervised Moving Object Detection via Contextual Information Separation

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2019.00097 ◽

2019 ◽

Cited By ~ 4

Author(s):

Yanchao Yang ◽

Antonio Loquercio ◽

Davide Scaramuzza ◽

Stefano Soatto

Keyword(s):

Object Detection ◽

Contextual Information ◽

Moving Object Detection ◽

Moving Object ◽

Information Separation


Small Object Detection in Traffic Scenes Based on Attention Feature Fusion

Sensors ◽

10.3390/s21093031 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3031

Author(s):

Jing Lian ◽

Yuhang Yin ◽

Linhui Li ◽

Zhenghao Wang ◽

Yafu Zhou

Keyword(s):

Object Detection ◽

Feature Fusion ◽

Contextual Information ◽

Detection Accuracy ◽

Small Object ◽

Limited Information ◽

Feature Maps ◽

Multi Scale ◽

Validation Set ◽

Small Object Detection

Traffic scenes contain many small objects, but their low resolution and limited information make them difficult to detect, even though detecting them is important for understanding the traffic environment. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which can better integrate contextual information from different layers. Finally, the AFFB replaces the linear fusion module in the object detection network to obtain the final network structure. The experimental results show that, compared to the baseline model YOLOv5s, this method achieves a higher mean Average Precision (mAP) while maintaining real-time performance. On the validation set of the traffic scene dataset BDD100K, it increases the mAP of all objects by 0.9 percentage points and the mAP of small objects by 3.5%.
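The fusion idea in this abstract can be illustrated with a minimal sketch. It is not the authors' MS-CAB or AFFB: the abstract gives no layer details, so the gate below (a sigmoid over a local term plus a channel-average term) and the names `channel_attention` and `attention_fuse` are assumptions made purely for illustration. The sketch only shows how an attention gate can replace a plain linear sum when merging two feature maps.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(feature, global_weight=1.0):
    """Return one gate in (0, 1) per element of `feature`.

    feature: list of channels, each a flat list of floats.
    Assumption: the "local" scale is the element itself and the "global"
    scale is the channel average; a real block would use learned layers.
    """
    gates = []
    for channel in feature:
        global_ctx = sum(channel) / len(channel)  # global scale
        gates.append([sigmoid(v + global_weight * global_ctx) for v in channel])
    return gates


def attention_fuse(x, y):
    """Fuse two same-shaped feature maps as g * x + (1 - g) * y."""
    summed = [[a + b for a, b in zip(cx, cy)] for cx, cy in zip(x, y)]
    gates = channel_attention(summed)
    return [
        [g * a + (1.0 - g) * b for g, a, b in zip(cg, cx, cy)]
        for cg, cx, cy in zip(gates, x, y)
    ]
```

Because each gate lies strictly between 0 and 1, every fused value is a convex combination of the two inputs, so the fusion can favor one layer over the other per element instead of weighting them equally.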


SSD7-FFAM: A Real-Time Object Detection Network Friendly to Embedded Devices from Scratch

Applied Sciences ◽

10.3390/app11031096 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1096

Author(s):

Qing Li ◽

Yingcheng Lin ◽

Wei He

Keyword(s):

Object Detection ◽

Real Time ◽

Large Scale ◽

Feature Fusion ◽

Contextual Information ◽

Attention Mechanism ◽

Detection Accuracy ◽

Single Shot ◽

Feature Maps ◽

Embedded Devices

High computing and memory requirements are the biggest challenges in deploying existing object detection networks to embedded devices. Existing lightweight object detectors directly use lightweight neural network architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which results in poor network structure flexibility and is not suitable for some specific scenarios. In this paper, we propose a lightweight object detection network, Single-Shot MultiBox Detector (SSD)7-Feature Fusion and Attention Mechanism (FFAM), abbreviated SSD7-FFAM, which saves storage space and reduces the amount of calculation by reducing the number of convolutional layers. We offer a novel Feature Fusion and Attention Mechanism (FFAM) method to improve detection accuracy. First, the FFAM method fuses high-level feature maps rich in semantic information with low-level feature maps to improve the detection accuracy of small objects. Second, a lightweight attention mechanism that cascades channel and spatial attention modules is employed to enhance the target's contextual information and guide the network to focus on its easy-to-recognize features. The SSD7-FFAM achieves 83.7% mean Average Precision (mAP), 1.66 MB of parameters, and 0.033 s average running time on the NWPU VHR-10 dataset. The results indicate that the proposed SSD7-FFAM is more suitable for deployment to embedded devices for real-time object detection.
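A cascade of channel attention followed by spatial attention can be sketched as below. This is a parameter-free simplification written for illustration only: the abstract does not specify the FFAM layers, so the pooling choices, the sigmoid gates, and the function name `channel_then_spatial_attention` are all assumptions, and a real module would use learned weights.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_then_spatial_attention(feature):
    """Apply a channel gate, then a spatial gate.

    feature: C x N nested list (C channels, N flattened spatial positions).
    Assumption: both gates come from simple averages passed through a
    sigmoid, with no learned parameters.
    """
    # Channel attention: one gate per channel from its global average.
    channel_gates = [sigmoid(sum(ch) / len(ch)) for ch in feature]
    scaled = [[g * v for v in ch] for g, ch in zip(channel_gates, feature)]

    # Spatial attention: one gate per position from the cross-channel mean.
    n_channels = len(scaled)
    n_positions = len(scaled[0])
    spatial_gates = [
        sigmoid(sum(scaled[c][p] for c in range(n_channels)) / n_channels)
        for p in range(n_positions)
    ]
    return [[spatial_gates[p] * ch[p] for p in range(n_positions)] for ch in scaled]
```

The output keeps the input shape; each value is the input scaled by two gates in (0, 1), so channels and positions with larger average responses are attenuated less than the rest.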


Feature-Enhanced Occlusion Perception Object Detection for Smart Cities

Wireless Communications and Mobile Computing ◽

10.1155/2021/5544194 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Jie Xu ◽

Hanyuan Wang ◽

Mingzhu Xu ◽

Fan Yang ◽

Yifei Zhou ◽

...

Keyword(s):

Object Detection ◽

Traffic Control ◽

Spatial Information ◽

Smart Cities ◽

Contextual Information ◽

Feature Representation ◽

Superior Performance ◽

Feature Maps ◽

Pascal Voc ◽

Occluded Objects

Object detection is widely used in smart cities, including safety monitoring, traffic control, and car driving. However, in the smart city scenario, many objects are occluded, and most popular object detectors are sensitive to various real-world occlusions. This paper proposes a feature-enhanced occlusion perception object detector that simultaneously detects occluded objects and fully utilizes spatial information. To generate hard examples with occlusions, a mask generator localizes and masks discriminative regions with weakly supervised methods. To obtain an enriched feature representation, we design a multiscale representation fusion module to combine hierarchical feature maps. Moreover, this method exploits contextual information by stacking representations from different regions in feature maps. The model is trained end-to-end by minimizing the multitask loss. Our model obtains superior performance compared to previous object detectors, with 77.4% mAP and 74.3% mAP on PASCAL VOC 2007 and PASCAL VOC 2012, respectively. It also achieves 24.6% mAP on MS COCO. Experiments demonstrate that the proposed method improves the effectiveness of object detection, making it highly suitable for smart city applications that need to discover key objects with occlusions.


Improving Small Objects Detection using Transformer

10.36227/techrxiv.16921000.v2 ◽

2021 ◽

Author(s):

Shikha Dubey ◽

Farrukh Olimov ◽

Muhammad Aasim Rafique ◽

Moongu Jeon

Keyword(s):

Object Detection ◽

Short Distance ◽

Feature Fusion ◽

Contextual Information ◽

Spatial Association ◽

Trade Off ◽

Inductive Bias ◽

Deep Layers ◽

Bounding Boxes ◽

Objects Detection

General artificial intelligence is a trade-off between the inductive bias of an algorithm and its out-of-distribution generalization performance. The conspicuous impact of inductive bias is an unceasing trend of improved predictions in various problems in computer vision like object detection. Although a recently introduced object detection technique, based on transformers (DETR), shows results competitive to the conventional and modern object detection models, its accuracy deteriorates for detecting small-sized objects (in perspective). This study examines the inductive bias of DETR and proposes a normalized inductive bias for object detection using a transformer (SOF-DETR). It uses a lazy-fusion of features to sustain deep contextual information of objects present in the image. The features from multiple subsequent deep layers are fused with element-wise-summation and input to a transformer network for object queries that learn the long and short-distance spatial association in the image by the attention mechanism. SOF-DETR uses a global set-based prediction for object detection, which directly produces a set of bounding boxes. The experimental results on the MS COCO dataset show the effectiveness of the added normalized inductive bias and feature fusion techniques by detecting more small-sized objects than DETR.
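The element-wise summation step mentioned in this abstract can be shown with a small sketch. It assumes the feature maps have already been brought to one shape (the abstract does not say how), and the function name `fuse_by_summation` is hypothetical.

```python
def fuse_by_summation(feature_maps):
    """Element-wise sum of same-shaped 2-D feature maps (lists of rows).

    Assumption: every map already has identical height and width; any
    resizing or projection needed to get there is outside this sketch.
    """
    first = feature_maps[0]
    rows, cols = len(first), len(first[0])
    for fm in feature_maps:
        if len(fm) != rows or any(len(r) != cols for r in fm):
            raise ValueError("all feature maps must share one shape")
    return [
        [sum(fm[r][c] for fm in feature_maps) for c in range(cols)]
        for r in range(rows)
    ]
```

Summation keeps the fused map the same size as each input, so the sequence handed to the transformer does not grow with the number of layers being fused.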


A Survey of the Four Pillars for Small Object Detection: Multiscale Representation, Contextual Information, Super-Resolution, and Region Proposal

IEEE Transactions on Systems Man and Cybernetics Systems ◽

10.1109/tsmc.2020.3005231 ◽

2020 ◽

pp. 1-18

Author(s):

Guang Chen ◽

Haitao Wang ◽

Kai Chen ◽

Zhijun Li ◽

Zida Song ◽

...

Keyword(s):

Object Detection ◽

Contextual Information ◽

Super Resolution ◽

Small Object ◽

Multiscale Representation ◽

Small Object Detection


SSD-TSEFFM: New SSD Using Trident Feature and Squeeze and Extraction Feature Fusion

Sensors ◽

10.3390/s20133630 ◽

2020 ◽

Vol 20 (13) ◽

pp. 3630 ◽

Cited By ~ 1

Author(s):

Young-Joon Hwang ◽

Jin-Gu Lee ◽

Un-Chul Moon ◽

Ho-Hyun Park

Keyword(s):

Object Detection ◽

Semantic Information ◽

Feature Fusion ◽

Contextual Information ◽

Single Shot ◽

Small Object ◽

Dilated Convolution ◽

Average Improvement ◽

Proposed Model ◽

Small Object Detection

The single shot multi-box detector (SSD) exhibits low accuracy in small-object detection because it does not consider the scale contextual information between its layers, and its shallow layers lack adequate semantic information. To improve the accuracy of the original SSD, this paper proposes a new single shot multi-box detector using trident feature and squeeze and extraction feature fusion (SSD-TSEFFM); this detector employs the trident network and the squeeze and excitation feature fusion module. A trident feature module (TFM), inspired by the trident network, is developed to capture the scale contextual information. Because this module applies dilated convolution, it makes the proposed model robust to scale changes. Further, the squeeze and excitation block feature fusion module (SEFFM) is used to provide more semantic information to the model. The SSD-TSEFFM is compared with Faster R-CNN (2015), SSD (2016), and DF-SSD (2020) on the PASCAL VOC 2007 and 2012 datasets. The experimental results demonstrate the high accuracy of the proposed model in small-object detection, in addition to good overall accuracy. The SSD-TSEFFM achieved 80.4% mAP and 80.2% mAP on the 2007 and 2012 datasets, respectively. This indicates an average improvement of approximately 2% over the other models.
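How dilated convolution widens the receptive field can be seen in a one-dimensional sketch. It is not the paper's TFM: the signal, the kernel, the dilation rates (1, 2, 3), and the function names are illustrative assumptions. The sketch applies one shared kernel at several dilation rates, which is the general pattern of trident-style branches.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D cross-correlation with taps spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def trident_branches(signal, kernel, dilations=(1, 2, 3)):
    """Apply one shared kernel at several dilation rates, one output per branch."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one output value."""
    return (kernel_size - 1) * dilation + 1
```

With a kernel of size 3, dilations 1, 2, and 3 span 3, 5, and 7 input samples respectively, so the branches see the same content at different scales without adding any weights.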


Object Detection in Remote Sensing Images Based on a Scene-Contextual Feature Pyramid Network

Remote Sensing ◽

10.3390/rs11030339 ◽

2019 ◽

Vol 11 (3) ◽

pp. 339 ◽

Cited By ~ 5

Author(s):

Chaoyue Chen ◽

Weiguo Gong ◽

Yongliang Chen ◽

Weihong Li

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Contextual Information ◽

Model Performance ◽

Small Object ◽

Remote Sensing Images ◽

Art Object ◽

Contextual Feature ◽

Feature Pyramid ◽

Small Object Detection

Object detection has attracted increasing attention in the field of remote sensing image analysis. Complex backgrounds, vertical views, and variations in target type and size in remote sensing images make object detection a challenging task. In this work, considering that the types of objects are often closely related to the scene in which they are located, we propose a convolutional neural network (CNN) that incorporates scene-contextual information for object detection. Specifically, we put forward the scene-contextual feature pyramid network (SCFPN), which aims to strengthen the relationship between the target and the scene and to solve problems resulting from variations in target size. Additionally, to improve the capability of feature extraction, the network is constructed by repeatedly stacking an aggregated residual building block. This block increases the receptive field, which allows the network to extract richer information for targets and achieve excellent performance in small object detection. Moreover, to improve the performance of the proposed model, we use group normalization, which divides the channels into groups and computes the mean and variance for normalization within each group, to overcome the limitations of batch normalization. The proposed method is validated on a challenging public dataset. The experimental results demonstrate that our proposed method outperforms other state-of-the-art object detection models.
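The group normalization step described in this abstract (split the channels into groups, then normalize with each group's own mean and variance) can be sketched for a single sample as follows. The learnable per-channel scale and shift used in practice are omitted, and the function name, the `eps` default, and the data layout are assumptions for illustration.

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample group by group.

    channels: list of C channels, each a flat list of floats.
    Assumption: no learnable scale/shift; `eps` only guards against
    division by zero.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out
```

The statistics are computed from one sample only, so the result does not depend on batch size, which is the property that distinguishes it from batch normalization.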


Quantifying contextual information for object detection

2009 IEEE 12th International Conference on Computer Vision ◽

10.1109/iccv.2009.5459344 ◽

2009 ◽

Cited By ~ 1

Author(s):

Wei-Shi Zheng ◽

Shaogang Gong ◽

Tao Xiang

Keyword(s):

Object Detection ◽

Contextual Information


Object Detection Based on Faster R-CNN Algorithm with Skip Pooling and Fusion of Contextual Information

Sensors ◽

10.3390/s20195490 ◽

2020 ◽

Vol 20 (19) ◽

pp. 5490

Author(s):

Yi Xiao ◽

Xinqing Wang ◽

Peng Zhang ◽

Fanjie Meng ◽

Faming Shao

Keyword(s):

Neural Network ◽

Deep Learning ◽

Object Detection ◽

Detection Efficiency ◽

Contextual Information ◽

Recall Rate ◽

Detection Performance ◽

Single Shot ◽

Average Improvement ◽

Neural Network Algorithm

Deep learning is currently the mainstream method of object detection, and the Faster region-based convolutional neural network (Faster R-CNN) holds a pivotal position within it. Faster R-CNN performs well in ordinary scenes, but its detection performance can still be unsatisfactory under special conditions, such as when the object is occluded, deformed, or small. This paper proposes an improved algorithm that builds on the Faster R-CNN framework by adding skip pooling and fusion of contextual information, improving detection performance under these special conditions. The improvement has three main parts. The first part adds a context information feature extraction model after conv5_3 of the convolutional layers, so that the network can fully obtain the contextual information of the object, especially when the object is occluded or deformed. The second part adds skip pooling, which obtains more detailed information from different feature layers of the deep neural network and is especially aimed at scenes with small objects. The third part replaces the region proposal network (RPN) with a more efficient guided anchor RPN (GA-RPN), which can maintain the recall rate while improving the detection performance. Compared with Faster R-CNN, the You Only Look Once series (e.g., YOLOv3), single shot detectors (e.g., SSD512), and other object detection algorithms, the proposed algorithm achieves an average improvement of 6.857% in mean average precision (mAP) while maintaining a certain recall rate. This indicates that the proposed method has a higher detection rate and detection efficiency in these cases.
