Small Object Detection in Traffic Scenes Based on Attention Feature Fusion

Jing Lian; Yuhang Yin; Linhui Li; Zhenghao Wang; Yafu Zhou

doi:10.3390/s21093031

Sensors, 2021, Vol 21 (9), pp. 3031, doi:10.3390/s21093031

Author(s): Jing Lian; Yuhang Yin; Linhui Li; Zhenghao Wang; Yafu Zhou

Keyword(s): Object Detection; Feature Fusion; Contextual Information; Detection Accuracy; Small Object; Limited Information; Feature Maps; Multi Scale; Validation Set; Small Object Detection

Traffic scenes contain many small objects, but their low resolution and limited information make them difficult to detect. Small object detection is very important for understanding traffic scene environments. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which can better integrate contextual information from different layers. Finally, the AFFB replaces the linear fusion module in the object detection network to obtain the final network structure. The experimental results show that, compared to the benchmark model YOLOv5s, this method achieves a higher mean Average Precision (mAP) while maintaining real-time performance. On the validation set of the traffic scene dataset BDD100K, it increases the mAP of all objects by 0.9 percentage points and the mAP of small objects by 3.5%.
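The abstract does not give the equations of the MS-CAB or the AFFB, so the following is only a minimal sketch of the general idea of attention-weighted fusion. It assumes the fusion weight is a sigmoid of a local term plus a global term, and that the two inputs are blended as a convex combination; the learned convolutions of a real block are omitted, and the names `attention_weights` and `attention_fuse` are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def attention_weights(feat):
    """Illustrative stand-in for a multi-scale channel attention block.

    feat is a list of channels; each channel is a flat list of activations.
    Learned convolutions are omitted here (assumption, not the paper's block).
    """
    weights = []
    for channel in feat:
        # Global scale: one context value per channel (its average).
        global_ctx = sum(channel) / len(channel)
        # Local scale: the activation at each position, combined with the global term.
        weights.append([sigmoid(local + global_ctx) for local in channel])
    return weights

def attention_fuse(shallow, deep):
    """Blend two same-shaped feature maps with attention weights instead of a plain sum."""
    summed = [[a + b for a, b in zip(cs, cd)] for cs, cd in zip(shallow, deep)]
    w = attention_weights(summed)
    return [
        [wi * a + (1.0 - wi) * b for wi, a, b in zip(cw, cs, cd)]
        for cw, cs, cd in zip(w, shallow, deep)
    ]

shallow = [[1.0, 2.0], [0.5, -1.0]]
deep = [[3.0, 0.0], [0.5, 1.0]]
fused = attention_fuse(shallow, deep)
```

Because each weight lies between 0 and 1, every fused value stays between the two input values at that position, whereas a linear (sum or concatenation) fusion treats both inputs uniformly.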

ZoomInNet: A Novel Small Object Detector in Drone Images with Cross-Scale Knowledge Distillation

Remote Sensing, 2021, Vol 13 (6), pp. 1198, doi:10.3390/rs13061198

Author(s): Bi-Yuan Liu; Huai-Xin Chen; Zhou Huang; Xing Liu; Yun-Zhi Yang

Keyword(s): Object Detection; Feature Representation; Detection Accuracy; Small Object; Feature Maps; Ground Object; Knowledge Distillation; The Cross; The Difference; Small Object Detection

Drone-based object detection has been widely applied in ground object surveillance, urban patrol, and other fields. However, the dramatic scale changes and complex backgrounds of drone images usually result in weak feature representation of small objects, which makes high-precision object detection challenging. To improve small object detection, this paper proposes a novel cross-scale knowledge distillation (CSKD) method, which enhances the features of small objects in a manner similar to image enlargement and is therefore termed ZoomInNet. First, based on an efficient feature pyramid network structure, the teacher and student networks are trained with images at different scales to introduce the cross-scale feature. Then, the proposed layer adaption (LA) and feature level alignment (FA) mechanisms are applied to align the feature sizes of the two models. After that, the adaptive key distillation point (AKDP) algorithm is used to find the crucial positions in the feature maps that need knowledge distillation. Finally, the position-aware L2 loss is used to measure the difference between feature maps from the cross-scale models, compressing the cross-scale information into a single model. Experiments on the challenging VisDrone2018 dataset show that the proposed method draws on the advantages of image pyramid methods while avoiding their heavy computation, and that it significantly improves the detection accuracy of small objects. In addition, a comparison with mainstream methods shows that our method has the best performance in small object detection.
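The abstract names a position-aware L2 loss but does not define it. As a rough illustration only, the sketch below assumes it is a squared feature difference averaged over the positions selected as key distillation points; the function name `position_aware_l2` and the 0/1 `key_mask` are hypothetical, and the feature maps are flattened to plain lists.

```python
def position_aware_l2(teacher, student, key_mask):
    """Mean squared feature difference over the positions selected by key_mask.

    All three arguments are flat lists of equal length. key_mask holds
    non-negative weights (1.0 for a key distillation point, 0.0 otherwise).
    """
    if not (len(teacher) == len(student) == len(key_mask)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(key_mask)
    if total_weight == 0:
        return 0.0  # no key positions selected, nothing to distill
    weighted = sum(m * (t - s) ** 2 for t, s, m in zip(teacher, student, key_mask))
    return weighted / total_weight

teacher = [1.0, 2.0, 3.0]
student = [1.0, 0.0, 0.0]
loss_key = position_aware_l2(teacher, student, [0.0, 1.0, 0.0])  # only the middle position counts
loss_all = position_aware_l2(teacher, student, [1.0, 1.0, 1.0])  # plain mean squared error
```

With the mask restricted to the middle position, only that mismatch contributes to the loss, which is the point of distilling at selected positions rather than over the whole map.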

Improved YOLOv3 with duplex FPN for object detection based on deep learning

International Journal of Electrical Engineering Education, 2021, pp. 002072092098352, doi:10.1177/0020720920983524

Author(s): Seokyong Shin; Hyunho Han; Sang Hun Lee

Keyword(s): Deep Learning; Object Detection; Autonomous Vehicles; Detection Accuracy; Small Object; Feature Maps; Low Level; Small Object Detection; High Level; Networks Structure

YOLOv3 is a deep learning-based real-time object detector mainly used in applications such as video surveillance and autonomous vehicles. In this paper, we propose an improved YOLOv3 (You Only Look Once version 3) with a Duplex FPN, which enhances large object detection by utilizing low-level feature information. The conventional YOLOv3 improved small object detection performance by applying the FPN (Feature Pyramid Networks) structure to YOLOv2. However, because YOLOv3 with an FPN structure is specialized in detecting small objects, it has difficulty detecting large objects. Therefore, this paper replaces the existing FPN structure of YOLOv3 with a Duplex FPN, which can utilize low-level location information in high-level feature maps. This improves the detection accuracy of large objects. In addition, an extra detection layer is added to the top-level feature map to prevent parts of large objects from going undetected. Further, the dimension clusters of each detection layer are reassigned so that the network quickly learns to detect objects accurately. The proposed method was compared and analyzed on the PASCAL VOC dataset. The experimental results showed that the bounding box accuracy of large objects improved owing to the Duplex FPN and the extra detection layer, and that the proposed method succeeded in detecting large objects that the existing YOLOv3 missed.

SSD7-FFAM: A Real-Time Object Detection Network Friendly to Embedded Devices from Scratch

Applied Sciences, 2021, Vol 11 (3), pp. 1096, doi:10.3390/app11031096

Author(s): Qing Li; Yingcheng Lin; Wei He

Keyword(s): Object Detection; Real Time; Large Scale; Feature Fusion; Contextual Information; Attention Mechanism; Detection Accuracy; Single Shot; Feature Maps; Embedded Devices

High computing and memory requirements are the biggest challenges in deploying existing object detection networks to embedded devices. Existing lightweight object detectors directly use lightweight neural network architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which results in poor flexibility of the network structure and makes them unsuitable for some specific scenarios. In this paper, we propose a lightweight object detection network, Single-Shot MultiBox Detector (SSD)7-Feature Fusion and Attention Mechanism (FFAM), which saves storage space and reduces the amount of calculation by reducing the number of convolutional layers. We offer a novel Feature Fusion and Attention Mechanism (FFAM) method to improve detection accuracy. First, the FFAM method fuses high-level feature maps, which are rich in semantic information, with low-level feature maps to improve the detection accuracy of small objects. Then, a lightweight attention mechanism, formed by cascading channel and spatial attention modules, is employed to enhance the target's contextual information and guide the network to focus on its easy-to-recognize features. The SSD7-FFAM achieves 83.7% mean Average Precision (mAP), 1.66 MB parameters, and 0.033 s average running time on the NWPU VHR-10 dataset. The results indicate that the proposed SSD7-FFAM is more suitable for deployment to embedded devices for real-time object detection.

SSD-TSEFFM: New SSD Using Trident Feature and Squeeze and Extraction Feature Fusion

Sensors, 2020, Vol 20 (13), pp. 3630, doi:10.3390/s20133630

Cited By ~ 1

Author(s): Young-Joon Hwang; Jin-Gu Lee; Un-Chul Moon; Ho-Hyun Park

Keyword(s): Object Detection; Semantic Information; Feature Fusion; Contextual Information; Single Shot; Small Object; Dilated Convolution; Average Improvement; Proposed Model; Small Object Detection

The single shot multi-box detector (SSD) exhibits low accuracy in small-object detection; this is because it does not consider the scale contextual information between its layers, and the shallow layers lack adequate semantic information. To improve the accuracy of the original SSD, this paper proposes a new single shot multi-box detector using trident feature and squeeze and extraction feature fusion (SSD-TSEFFM); this detector employs the trident network and the squeeze and excitation feature fusion module. Furthermore, a trident feature module (TFM) is developed, inspired by the trident network, to consider the scale contextual information. The use of this module makes the proposed model robust to scale changes owing to the application of dilated convolution. Further, the squeeze and excitation block feature fusion module (SEFFM) is used to provide more semantic information to the model. The SSD-TSEFFM is compared with the faster regions with convolution neural network features (RCNN) (2015), SSD (2016), and DF-SSD (2020) on the PASCAL VOC 2007 and 2012 datasets. The experimental results demonstrate the high accuracy of the proposed model in small-object detection, in addition to a good overall accuracy. The SSD-TSEFFM achieved 80.4% mAP and 80.2% mAP on the 2007 and 2012 datasets, respectively. This indicates an average improvement of approximately 2% over other models.
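To illustrate why dilated convolution helps with scale changes, here is a minimal one-dimensional sketch. It assumes parallel branches that share one kernel and differ only in dilation rate, which is an assumption about the trident-style design rather than a detail given in the abstract; `dilated_conv1d` is a hypothetical helper, not the paper's module.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D correlation with gaps of `dilation` between kernel taps."""
    if dilation < 1:
        raise ValueError("dilation must be at least 1")
    # Receptive field: number of input samples spanned by one output value.
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]  # the same weights are reused by every branch

# One branch per dilation rate; larger rates see a wider context per output.
branches = {d: dilated_conv1d(signal, kernel, d) for d in (1, 2, 3)}
```

The three-tap kernel covers 3, 5, and 7 input samples at dilation 1, 2, and 3 respectively, so the same weights respond to structures of different sizes without adding parameters.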

FASSD: A Feature Fusion and Spatial Attention-Based Single Shot Detector for Small Object Detection

Electronics, 2020, Vol 9 (9), pp. 1536, doi:10.3390/electronics9091536

Author(s): Deng Jiang; Bei Sun; Shaojing Su; Zhen Zuo; Peng Wu; ...

Keyword(s): Object Detection; Spatial Attention; Feature Fusion; Single Shot; Small Object; Feature Maps; Feature Representations; Small Object Detection; High Level; Detection Speed

Deep learning methods have significantly improved object detection performance, but small object detection remains an extremely challenging task in computer vision. We propose a feature fusion and spatial attention-based single shot detector (FASSD) for small object detection. We fuse high-level semantic information into shallow layers to generate discriminative feature representations for small objects. To adaptively enhance the expression of small object areas and suppress the feature response of background regions, the spatial attention block learns a self-attention mask to enhance the original feature maps. We also establish a small object dataset (LAKE-BOAT) of a scene with a boat on a lake and test our algorithm on it to evaluate its performance. The results show that our FASSD achieves 79.3% mAP (mean average precision) on the PASCAL VOC2007 test with a 300 × 300 input, which outperforms the original single shot multibox detector (SSD) by 1.6 points, as well as most improved algorithms based on SSD. The corresponding detection speed was 45.3 FPS (frames per second) on the VOC2007 test using a single NVIDIA TITAN RTX GPU. The test results of a simplified FASSD on the LAKE-BOAT dataset indicate that our model achieved an improvement of 3.5% mAP over the baseline network while maintaining a real-time detection speed (64.4 FPS).

Enhancement and Fusion of Multi-Scale Feature Maps for Small Object Detection

2020 39th Chinese Control Conference (CCC), 2020, doi:10.23919/ccc50068.2020.9189352

Author(s): Zhijun Xue; Wenjie Chen; Jing Li

Keyword(s): Object Detection; Small Object; Feature Maps; Scale Feature; Multi Scale; Small Object Detection

Small Targets Detection for Transmission Tower Based on SRGAN and Faster RCNN

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering), 2021, Vol 14, doi:10.2174/2352096514666211026143543

Author(s): Runze Liu; Guangwei Yan; Hui He; Yubin An; Ting Wang; ...

Keyword(s): Object Detection; Super Resolution; Generative Adversarial Networks; Stable Operation; Detection Accuracy; Small Object; Detection Model; Tower Equipment; Small Object Detection; Small Targets

Background: Power line inspection is essential to ensure the safe and stable operation of the power system. Object detection for tower equipment can significantly improve inspection efficiency. However, because small targets have low resolution and limited features, their detection accuracy is not easy to improve. Objective: This study aimed to increase the resolution of small targets while making their texture and detailed features more prominent, so that the detection model can perceive them. Methods: In this paper, we propose an algorithm that employs generative adversarial networks to improve the detection accuracy of small objects. First, the original image is converted into a super-resolution image by a super-resolution reconstruction network (SRGAN). Then, the object detection framework Faster RCNN is used to detect objects in the super-resolution images. Result: The experimental results on two small object recognition datasets show that the model proposed in this paper has good robustness. In particular, it can detect targets missed by Faster RCNN, which indicates that SRGAN can effectively enhance the detailed information of small targets by improving the resolution. Conclusion: We found that higher resolution data is conducive to obtaining more detailed information about small targets, which can help the detection algorithm achieve higher accuracy. The small object detection model based on the generative adversarial network proposed in this paper is feasible and more efficient. Compared with Faster RCNN, this model performs better on small object detection.

A Novel Multi-Scale Feature Fusion Method for Region Proposal Network in Fast Object Detection

International Journal of Data Warehousing and Mining, 2020, Vol 16 (3), pp. 132-145, doi:10.4018/ijdwm.2020070107

Author(s): Gang Liu; Chuyi Wang

Keyword(s): Object Detection; Multiple Scales; Feature Fusion; Uniform Space; Fusion Method; Well Performance; Feature Maps; Neural Network Models; Scale Feature; Multi Scale

Neural network models have been widely used in the field of object detection. Region proposal methods are widely used in current object detection networks and have achieved good performance. Common region proposal methods search for objects by generating thousands of candidate boxes. Compared to other region proposal methods, the region proposal network (RPN) method improves the accuracy and detection speed with several hundred candidate boxes. However, since the feature maps contain insufficient information, the ability of RPN to detect and locate small-sized objects is poor. This article proposes a novel multi-scale feature fusion method for the region proposal network to solve the above problems. The proposed method, called the multi-scale region proposal network (MS-RPN), can generate suitable feature maps for the region proposal network. In MS-RPN, the selected feature maps at multiple scales are each fine-tuned and compressed into a uniform space. The generated fusion feature maps are called refined fusion features (RFFs). RFFs incorporate abundant detail and context information, and they are sent to the RPN to generate better region proposals. The proposed approach is evaluated on the PASCAL VOC 2007 and MS COCO benchmark tasks. MS-RPN obtains significant improvements over comparable state-of-the-art detection models.
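The step of bringing feature maps at multiple scales into a uniform space can be pictured with a small sketch. This is only an illustration under simplifying assumptions: the maps are one-dimensional, resampling is nearest-neighbour, and the learned fine-tuning and compression layers of MS-RPN are left out. The names `resize_nearest` and `fuse_scales` are hypothetical.

```python
def resize_nearest(values, target_len):
    """Nearest-neighbour resampling of a 1-D feature map to target_len samples."""
    n = len(values)
    return [values[min(n - 1, (i * n) // target_len)] for i in range(target_len)]

def fuse_scales(feature_maps, target_len):
    """Resample every map to target_len and stack the results as channels."""
    return [resize_nearest(fm, target_len) for fm in feature_maps]

fine = [1, 2, 3, 4, 5, 6, 7, 8]   # high resolution, detail information
mid = [10, 20, 30, 40]
coarse = [100, 200]               # low resolution, context information

fused = fuse_scales([fine, mid, coarse], target_len=4)
```

After resampling, every position of the fused map carries one value from each scale, so a proposal network reading it sees detail and context together.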

An Evaluation of Deep Learning Methods for Small Object Detection

Journal of Electrical and Computer Engineering, 2020, Vol 2020, pp. 1-18, doi:10.1155/2020/3189691

Cited By ~ 2

Author(s): Nhat-Duy Nguyen; Tien Do; Thanh Duc Ngo; Duy-Dinh Le

Keyword(s): Deep Learning; Object Detection; State Of The Art; Rapid Development; Empirical Evaluation; Grid Cell; Small Object; Feature Maps; Comparative Results; Small Object Detection

Small object detection is an interesting topic in computer vision. With the rapid development of deep learning, it has drawn the attention of many researchers, who compete with innovative approaches. These innovations include region proposals, divided grid cells, multiscale feature maps, and new loss functions. As a result, the performance of object detection has recently improved significantly. However, most state-of-the-art detectors, in both one-stage and two-stage approaches, still struggle to detect small objects. In this study, we evaluate current state-of-the-art deep learning models from both approaches, such as Fast RCNN, Faster RCNN, RetinaNet, and YOLOv3. We provide an in-depth assessment of the advantages and limitations of the models. Specifically, we run models with different backbones on different datasets with multiscale objects to find out which types of objects suit each model and backbone. Extensive empirical evaluation was conducted on 2 standard datasets, namely, a small object dataset and a filtered dataset from PASCAL VOC 2007. Finally, comparative results and analyses are presented.

Object Detection Network Based on Feature Fusion and Attention Mechanism

Future Internet, 2019, Vol 11 (1), pp. 9, doi:10.3390/fi11010009

Cited By ~ 6

Author(s): Ying Zhang; Yimin Chen; Chen Huang; Mingke Gao

Keyword(s): Object Detection; Feature Fusion; Empirical Evaluation; Attention Mechanism; Detection Accuracy; Small Object; Art Object; Pascal Voc; Almost All; The Impact

In recent years, almost all top-performing object detection networks have used CNN (convolutional neural network) features. In this work, we add feature fusion to the object detection network to obtain a better CNN feature, which combines deep, semantically rich CNN features with shallow, high-resolution ones, thus improving performance on small objects. In addition, an attention mechanism is applied to our object detection network, AF R-CNN (attention mechanism and convolution feature fusion based object detection), to enhance the impact of significant features and weaken background interference. Our AF R-CNN is a single end-to-end network. We choose the pre-trained network VGG-16 to extract CNN features. Our detection network is trained on the PASCAL VOC 2007 and 2012 datasets. Empirical evaluation on the PASCAL VOC 2007 dataset demonstrates the effectiveness and improvement of our approach. Our AF R-CNN achieves an object detection accuracy of 75.9% on PASCAL VOC 2007, six points higher than Faster R-CNN.
