Multiscale Object Detection in Infrared Streetscape Images Based on Deep Learning and Instance Level Data Augmentation

Hao Qu; Lilian Zhang; Xuesong Wu; Xiaofeng He; Xiaoping Hu; Xudong Wen

doi:10.3390/app9030565


Applied Sciences ◽

10.3390/app9030565 ◽

2019 ◽

Vol 9 (3) ◽

pp. 565 ◽

Cited By ~ 6

Author(s):

Hao Qu ◽

Lilian Zhang ◽

Xuesong Wu ◽

Xiaofeng He ◽

Xiaoping Hu ◽

...

Keyword(s):

Object Detection ◽

Data Augmentation ◽

Region Of Interest ◽

Complex Environments ◽

Feature Maps ◽

Multi Scale ◽

Level Data ◽

Training Stage ◽

Street Scene ◽

Layer Region

Object detection in infrared images has attracted growing attention in recent years. However, there are few studies on multi-scale object detection in infrared street scene images, and the lack of high-quality infrared datasets hinders research into such algorithms. To address these issues, we first make a series of modifications to the Faster Region-based Convolutional Neural Network (Faster R-CNN). A double-layer region proposal network (RPN) is proposed to predict proposals of different scales on both fine and coarse feature maps. Second, a multi-scale pooling module is introduced into the backbone of the network to explore the response of objects at different scales. Furthermore, the inception4 module and the position-sensitive region of interest (ROI) align (PSalign) pooling layer are used to extract richer features of the objects. Third, this paper proposes instance-level data augmentation, which takes into account the imbalance between categories while enlarging the dataset. In the training stage, online hard example mining is used to further improve the robustness of the algorithm in complex environments. The experimental results show that, compared with the baseline, our detection method has state-of-the-art performance.
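The abstract above mentions online hard example mining without giving details. As a rough illustration of the general idea only (not the authors' implementation), the sketch below keeps the highest-loss examples from a batch so that only they contribute to the update. The function name `select_hard_examples` and the sample loss values are hypothetical.

```python
from typing import List


def select_hard_examples(losses: List[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-loss examples, hardest first.

    Hypothetical helper: a detector would compute one loss per region of
    interest and backpropagate only through the selected indices.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Rank indices by descending loss; ties keep their original order.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:keep]


# Made-up per-RoI losses, for illustration only.
roi_losses = [0.05, 1.20, 0.30, 2.10, 0.01, 0.75]
print(select_hard_examples(roi_losses, keep=3))
```

In practice the selection is usually combined with non-maximum suppression so that near-duplicate regions are not all counted as hard; that step is omitted here.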


Automatic Carotid Artery Detection Using Attention Layer Region-Based Convolution Neural Network

International Journal of Humanoid Robotics ◽

10.1142/s0219843619500154 ◽

2019 ◽

Vol 16 (04) ◽

pp. 1950015

Author(s):

Xiaoyan Wang ◽

Xingyu Zhong ◽

Ming Xia ◽

Weiwei Jiang ◽

Xiaojie Huang ◽

...

Keyword(s):

Carotid Artery ◽

Object Detection ◽

State Of The Art ◽

Region Of Interest ◽

Feature Maps ◽

Interactive Approach ◽

Detection Systems ◽

Art Object ◽

Layer Region ◽

Artery Detection

Localization of the vessel region of interest (ROI) in medical images provides an interactive approach that can assist doctors in evaluating carotid artery diseases. Accurate vessel detection is a prerequisite for subsequent procedures such as wall segmentation, plaque identification and 3D reconstruction. Deep learning models such as CNNs have been widely used in medical image processing and achieve state-of-the-art performance. Faster R-CNN is one of the most representative and successful methods for object detection. Using the outputs of feature maps in different layers has proved to be a useful way to improve detection performance. However, the common method is to ensemble the outputs of different layers directly, without considering the special characteristics and differing importance of each layer. In this work, we introduce a new network named Attention Layer R-CNN (AL R-CNN) and use it for automatic carotid artery detection. It integrates a new module named Attention Layer Part (ALP) into a basic Faster R-CNN system to better assemble the feature maps of different layers. Experimental results on a carotid dataset show that our method surpasses other state-of-the-art object detection systems.
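The abstract contrasts direct ensembling of layer outputs with weighting each layer by its importance, but it does not describe the internals of the ALP module. The sketch below therefore shows only a generic softmax-weighted sum of per-layer feature vectors, which is an assumption made for illustration. The names `softmax` and `fuse_layers`, and the idea of one scalar score per layer, are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(features: List[List[float]], scores: List[float]) -> List[float]:
    """Fuse same-length feature vectors with softmax attention weights.

    Equal scores reduce to a plain average (direct ensembling); unequal
    scores let one layer dominate the fused result.
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one score per non-empty feature list")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")
    weights = softmax(scores)
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(length)]


shallow = [1.0, 2.0]
deep = [3.0, 4.0]
print(fuse_layers([shallow, deep], scores=[0.0, 0.0]))  # plain average
print(fuse_layers([shallow, deep], scores=[0.0, 2.0]))  # leans toward `deep`
```

In a trained network the scores would be learned rather than fixed by hand.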


A Novel Multi-Scale Feature Fusion Method for Region Proposal Network in Fast Object Detection

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2020070107 ◽

2020 ◽

Vol 16 (3) ◽

pp. 132-145

Author(s):

Gang Liu ◽

Chuyi Wang

Keyword(s):

Object Detection ◽

Multiple Scales ◽

Feature Fusion ◽

Uniform Space ◽

Fusion Method ◽

Well Performance ◽

Feature Maps ◽

Neural Network Models ◽

Scale Feature ◽

Multi Scale

Neural network models have been widely used in the field of object detection. Region proposal methods are widely used in current object detection networks and have achieved good performance. Common region proposal methods search for objects by generating thousands of candidate boxes. Compared to other region proposal methods, the region proposal network (RPN) improves accuracy and detection speed with only several hundred candidate boxes. However, since the feature maps contain insufficient information, the ability of RPN to detect and locate small objects is poor. This article proposes a novel multi-scale feature fusion method for the region proposal network to solve these problems. The proposed method, called multi-scale region proposal network (MS-RPN), generates suitable feature maps for the region proposal network. In MS-RPN, the selected feature maps at multiple scales are fine-tuned separately and compressed into a uniform space. The resulting fused feature maps are called refined fusion features (RFFs). RFFs incorporate abundant detail and context information and are sent to the RPN to generate better region proposals. The proposed approach is evaluated on the PASCAL VOC 2007 and MS COCO benchmark tasks. MS-RPN obtains significant improvements over comparable state-of-the-art detection models.


MF-Net: Multi-Scale Information Fusion Network for CNV Segmentation in Retinal OCT Images

Frontiers in Neuroscience ◽

10.3389/fnins.2021.743769 ◽

2021 ◽

Vol 15 ◽

Author(s):

Qingquan Meng ◽

Lianyu Wang ◽

Tingting Wang ◽

Meng Wang ◽

Weifang Zhu ◽

...

Keyword(s):

Information Fusion ◽

Data Augmentation ◽

Contextual Information ◽

Treatment Plan ◽

Speckle Noise ◽

Medical Image Segmentation ◽

Unlabeled Data ◽

Feature Maps ◽

Multi Scale ◽

Segmentation Accuracy

Choroidal neovascularization (CNV) is one of the blinding ophthalmologic diseases. It is mainly caused by new blood vessels growing in the choroid and penetrating Bruch's membrane. Accurate segmentation of CNV is essential for ophthalmologists to analyze the condition of the patient and specify a treatment plan. Although many deep learning-based methods have achieved promising results in medical image segmentation tasks, CNV segmentation in retinal optical coherence tomography (OCT) images remains very challenging because of the blurred boundary of CNV, large morphological differences, speckle noise, and interference from other similar diseases. In addition, the lack of pixel-level annotated data is one of the factors limiting further improvement of CNV segmentation accuracy. To improve the accuracy of CNV segmentation, a novel multi-scale information fusion network (MF-Net) based on a U-shaped architecture is proposed for CNV segmentation in retinal OCT images. A novel multi-scale adaptive-aware deformation module (MAD) is designed and inserted at the top of the encoder path, guiding the model to focus on multi-scale deformation of the targets and aggregating contextual information. Meanwhile, to improve the ability of the network to supplement high-level feature maps with low-level, local, high-resolution semantic information, a novel semantics-details aggregation module (SDA) between the encoder and decoder is proposed. In addition, a semi-supervised version of MF-Net (SemiMF-Net) is designed based on a pseudo-label data augmentation strategy, which leverages unlabeled data to further improve CNV segmentation accuracy. Finally, comprehensive experiments are conducted to validate the performance of the proposed MF-Net and SemiMF-Net. The experimental results show that both MF-Net and SemiMF-Net outperform other state-of-the-art algorithms.
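The abstract does not say how pseudo-labels are chosen from unlabeled data. One common approach, shown here purely as an assumption and not as the method used in the paper, is to keep only predictions whose confidence passes a threshold. The function name `select_pseudo_labels` and the 0.9 threshold are hypothetical.

```python
from typing import List, Tuple


def select_pseudo_labels(
    probabilities: List[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs for confident predictions.

    `probabilities` holds the predicted foreground probability per sample.
    Values at or above `threshold` become label 1, values at or below
    `1 - threshold` become label 0, and everything else is discarded.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0]")
    selected: List[Tuple[int, int]] = []
    for index, probability in enumerate(probabilities):
        if probability >= threshold:
            selected.append((index, 1))
        elif probability <= 1.0 - threshold:
            selected.append((index, 0))
    return selected


# Made-up model outputs on four unlabeled samples.
print(select_pseudo_labels([0.95, 0.50, 0.03, 0.90]))
```

The selected pairs would then be added to the labeled set for another round of training. For segmentation the same test would be applied per pixel rather than per sample.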


Fast Object Detection and Recognition Algorithm Based on Improved Multi-Scale Feature Maps

Laser & Optoelectronics Progress ◽

10.3788/lop56.021002 ◽

2019 ◽

Vol 56 (2) ◽

pp. 021002

Author(s):

单倩文 Shan Qianwen ◽

郑新波 Zheng Xinbo ◽

何小海 He Xiaohai ◽

滕奇志 Teng Qizhi ◽

吴晓红 Wu Xiaohong

Keyword(s):

Object Detection ◽

Recognition Algorithm ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale ◽

Detection And Recognition


Multi-scale Pyramid Feature Maps for Object Detection

2017 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES) ◽

10.1109/dcabes.2017.59 ◽

2017 ◽

Author(s):

Hao Huijun ◽

Ye Ronghua ◽

Chen Zhongyu ◽

Zheng Zhonglong

Keyword(s):

Object Detection ◽

Feature Maps ◽

Multi Scale


PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud

Sensors ◽

10.3390/s21010136 ◽

2020 ◽

Vol 21 (1) ◽

pp. 136

Author(s):

Fangyu Li ◽

Weizheng Jin ◽

Cien Fan ◽

Lian Zou ◽

Qingsheng Chen ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Multi Scale ◽

Backbone Network ◽

3D Object Detection ◽

Different Levels ◽

Fine Branch

3D object detection in LiDAR point clouds has been extensively used in autonomous driving, intelligent robotics, and augmented reality. Although one-stage 3D detectors have satisfactory training and inference speed, they still suffer from performance problems due to insufficient utilization of bird's eye view (BEV) information. In this paper, a new backbone network is proposed to perform cross-layer fusion of multi-scale BEV feature maps, making full use of the available information for detection. Specifically, our proposed backbone network is divided into a coarse branch and a fine branch. In the coarse branch, we use the pyramidal feature hierarchy (PFH) to generate multi-scale BEV feature maps, which retain the advantages of different levels and serve as the input of the fine branch. In the fine branch, our proposed pyramid splitting and aggregation (PSA) module deeply integrates different levels of multi-scale feature maps, thereby improving the expressive ability of the final features. Extensive experiments on the challenging KITTI-3D benchmark show that our method performs better in both 3D and BEV object detection than some previous state-of-the-art methods. Experimental results measured by average precision (AP) demonstrate the effectiveness of our network.


Improving multi-class Boosting-based object detection

Integrated Computer-Aided Engineering ◽

10.3233/ica-200636 ◽

2020 ◽

Vol 28 (1) ◽

pp. 81-96

Author(s):

José Miguel Buenaposada ◽

Luis Baumela

Keyword(s):

Deep Learning ◽

Object Detection ◽

Data Augmentation ◽

Detection Performance ◽

Significant Progress ◽

Training Techniques ◽

Multi Scale ◽

Bounding Box ◽

Open Issue ◽

The Impact

In recent years we have witnessed significant progress in the performance of object detection in images. This advance stems from the use of rich discriminative features produced by deep models and the adoption of new training techniques. Although these techniques have been extensively used in the mainstream deep learning-based models, it is still an open issue to analyze their impact in alternative, and computationally more efficient, ensemble-based approaches. In this paper we evaluate the impact of the adoption of data augmentation, bounding box refinement and multi-scale processing in the context of multi-class Boosting-based object detection. In our experiments we show that use of these training advancements significantly improves the object detection performance.


FAS-Net: Construct Effective Features Adaptively for Multi-Scale Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6947 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12573-12580

Author(s):

Jiangqiao Yan ◽

Yue Zhang ◽

Zhonghan Chang ◽

Tengfei Zhang ◽

Menglong Yan ◽

...

Keyword(s):

Object Detection ◽

Feature Fusion ◽

Region Of Interest ◽

Feature Representation ◽

Adaptive Selection ◽

Multi Scale ◽

Common Operation ◽

Feature Pyramid ◽

Segmentation Task ◽

Valid Information

The feature pyramid is the mainstream method for multi-scale object detection. In most detectors with a feature pyramid, each proposal is predicted based on feature grids pooled from only one feature level, which is assigned heuristically. Recent studies report that the feature representation extracted using this method is sub-optimal, since it ignores the valid information that exists on other unselected layers of the feature pyramid. To address this issue, researchers have proposed fusing valid information across all feature levels. However, these methods can be further improved: the feature fusion strategies, which use common operations (element-wise max or sum) in most detectors, should be replaced by a more flexible approach. In this work, a novel method called feature adaptive selection subnetwork (FAS-Net) is proposed to construct effective features for detecting objects of different scales. In particular, its adaptation consists of two levels: global attention and local adaptive selection. First, we model the global context of each feature map with a global attention based feature selection module (GAFSM), which adaptively strengthens the effective features across each layer. Then we extract the features of each region of interest (RoI) on the entire feature pyramid to construct an RoI feature pyramid. Finally, the RoI feature pyramid is sent to the feature adaptive selection module (FASM) to integrate the strengthened features adaptively according to the input. Our FAS-Net can be easily extended to other two-stage object detectors with a feature pyramid, and supports quantitative analysis of the importance of different feature levels for multi-scale objects. In addition, FAS-Net can be applied to the instance segmentation task and yields consistent improvements. Experiments on PASCAL07/12 and MSCOCO17 demonstrate the effectiveness and generalization of the proposed method.
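The abstract refers to a heuristic that assigns each proposal to a single pyramid level, without naming it. A widely used rule of this kind comes from the original Feature Pyramid Network paper: k = floor(k0 + log2(sqrt(w·h) / 224)), clamped to the available levels. The sketch below implements that rule as an illustration. Whether FAS-Net's baselines use exactly these constants is an assumption, and the function name `assign_pyramid_level` is hypothetical.

```python
import math


def assign_pyramid_level(
    width: float,
    height: float,
    k0: int = 4,
    canonical_size: float = 224.0,
    k_min: int = 2,
    k_max: int = 5,
) -> int:
    """Map an RoI of the given size to one pyramid level (FPN-style rule).

    An RoI whose scale sqrt(width * height) equals `canonical_size` maps to
    level `k0`; halving the scale moves one level finer, doubling it moves
    one level coarser. The result is clamped to [k_min, k_max].
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = math.sqrt(width * height)
    level = math.floor(k0 + math.log2(scale / canonical_size))
    return max(k_min, min(k_max, level))


for size in (32, 112, 224, 448):
    print(size, assign_pyramid_level(size, size))
```

Because every RoI lands on exactly one level, features at the other levels are unused for that RoI, which is the limitation the abstract describes.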


Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds

Sensors ◽

10.3390/s20030704 ◽

2020 ◽

Vol 20 (3) ◽

pp. 704 ◽

Cited By ~ 6

Author(s):

Hongwu Kuang ◽

Bei Wang ◽

Jianping An ◽

Ming Zhang ◽

Zehan Zhang

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Cloud Data ◽

Multi Scale ◽

Feature Pyramid ◽

Point Data ◽

3D Object Detection

Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network (Voxel-FPN), a novel one-stage 3D object detector that uses only raw data from LIDAR sensors. The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. The encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas the decoder fuses multiple feature maps from various scales with a Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method is better at extracting features from point data and outperforms some baselines on the challenging KITTI-3D benchmark, achieving good speed and accuracy in real-world scenarios.
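The top-down fusion mentioned in the abstract can be illustrated with a deliberately simplified sketch: each coarser map is upsampled and added to the next finer one. This uses 1D lists and nearest-neighbour upsampling, and it omits the lateral convolutions of a real Feature Pyramid Network, so it is an assumption-laden toy rather than the Voxel-FPN decoder. The names `upsample_nearest` and `top_down_fuse` are hypothetical.

```python
from typing import List


def upsample_nearest(values: List[float], factor: int = 2) -> List[float]:
    """Repeat each element `factor` times (nearest-neighbour upsampling)."""
    return [v for v in values for _ in range(factor)]


def top_down_fuse(pyramid: List[List[float]]) -> List[List[float]]:
    """Fuse a fine-to-coarse pyramid from the top (coarsest) level down.

    Each level must be twice as long as the next coarser one. The coarsest
    level is kept as-is; every finer level receives the upsampled fused
    map from the level above, added element-wise.
    """
    if not pyramid:
        raise ValueError("pyramid must contain at least one level")
    fused = [list(pyramid[-1])]
    for level in reversed(pyramid[:-1]):
        upsampled = upsample_nearest(fused[0], 2)
        if len(upsampled) != len(level):
            raise ValueError("each level must be twice the size of the next")
        fused.insert(0, [a + b for a, b in zip(level, upsampled)])
    return fused


levels = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0], [3.0]]
print(top_down_fuse(levels))
```

After fusion the finest level carries information from every coarser level, which is the property that top-down pathways are meant to provide.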


Small Object Detection in Traffic Scenes Based on Attention Feature Fusion

Sensors ◽

10.3390/s21093031 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3031

Author(s):

Jing Lian ◽

Yuhang Yin ◽

Linhui Li ◽

Zhenghao Wang ◽

Yafu Zhou

Keyword(s):

Object Detection ◽

Feature Fusion ◽

Contextual Information ◽

Detection Accuracy ◽

Small Object ◽

Limited Information ◽

Feature Maps ◽

Multi Scale ◽

Validation Set ◽

Small Object Detection

There are many small objects in traffic scenes, but due to their low resolution and limited information, detecting them is still a challenge. Small object detection is very important for understanding traffic scene environments. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which can better integrate contextual information from different layers. Finally, the AFFB is used to replace the linear fusion module in the object detection network to obtain the final network structure. The experimental results show that, compared to the benchmark model YOLOv5s, this method achieves a higher mean Average Precision (mAP) while maintaining real-time performance. It increases the mAP of all objects by 0.9 percentage points on the validation set of the traffic scene dataset BDD100K, and at the same time increases the mAP of small objects by 3.5%.
