Anchor Generation Optimization and Region of Interest Assignment for Vehicle Detection

Ye Wang; Zhenyi Liu; Weiwen Deng

doi:10.3390/s19051089

Anchor Generation Optimization and Region of Interest Assignment for Vehicle Detection

Sensors ◽

10.3390/s19051089 ◽

2019 ◽

Vol 19 (5) ◽

pp. 1089 ◽

Cited By ~ 3

Author(s):

Ye Wang ◽

Zhenyi Liu ◽

Weiwen Deng

Keyword(s):

Pedestrian Detection ◽

Region Of Interest ◽

Vehicle Detection ◽

Detection Accuracy ◽

Fixed Size ◽

Feature Maps ◽

Feature Map ◽

Bounding Box ◽

New Feature ◽

And Training

Region proposal network (RPN) based object detection, such as Faster Regions with CNN (Faster R-CNN), has gained considerable attention due to its high accuracy and fast speed. However, it has room for improvements when used in special application situations, such as the on-board vehicle detection. Original RPN locates multiscale anchors uniformly on each pixel of the last feature map and classifies whether an anchor is part of the foreground or background with one pixel in the last feature map. The receptive field of each pixel in the last feature map is fixed in the original faster R-CNN and does not coincide with the anchor size. Hence, only a certain part can be seen for large vehicles and too much useless information is contained in the feature for small vehicles. This reduces detection accuracy. Furthermore, the perspective projection results in the vehicle bounding box size becoming related to the bounding box position, thereby reducing the effectiveness and accuracy of the uniform anchor generation method. This reduces both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROI) are generated. The ROI pooling layer projects an ROI to the last feature map and forms a new feature map with a fixed size for final classification and box regression. The number of feature map pixels in the projected region can also influence the detection performance but this is not accurately controlled in former works. In this paper, the original faster R-CNN is optimized, especially for the on-board vehicle detection. This paper tries to solve these above-mentioned problems. The proposed method is tested on the KITTI dataset and the result shows a significant improvement without too many tricky parameter adjustments and training skills. The proposed method can also be used on other objects with obvious foreshortening effects, such as on-board pedestrian detection. The basic idea of the proposed method does not rely on concrete implementation and thus, most deep learning based object detectors with multiscale feature maps can be optimized with it.

Download Full-text

Detection and Segmentation of Mature Green Tomatoes Based on Mask R-CNN with Automatic Image Acquisition Approach

Sensors ◽

10.3390/s21237842 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7842

Author(s):

Linlu Zu ◽

Yanping Zhao ◽

Jiuqin Liu ◽

Fei Su ◽

Yan Zhang ◽

...

Keyword(s):

Image Acquisition ◽

Region Of Interest ◽

Bilinear Interpolation ◽

Training Process ◽

Fixed Size ◽

Target Region ◽

Feature Map ◽

Bounding Box ◽

Proposed Model ◽

Acquisition Processes

Since the mature green tomatoes have color similar to branches and leaves, some are shaded by branches and leaves, and overlapped by other tomatoes, the accurate detection and location of these tomatoes is rather difficult. This paper proposes to use the Mask R-CNN algorithm for the detection and segmentation of mature green tomatoes. A mobile robot is designed to collect images round-the-clock and with different conditions in the whole greenhouse, thus, to make sure the captured dataset are not only objects with the interest of users. After the training process, RestNet50-FPN is selected as the backbone network. Then, the feature map is trained through the region proposal network to generate the region of interest (ROI), and the ROIAlign bilinear interpolation is used to calculate the target region, such that the corresponding region in the feature map is pooled to a fixed size based on the position coordinates of the preselection box. Finally, the detection and segmentation of mature green tomatoes is realized by the parallel actions of ROI target categories, bounding box regression and mask. When the Intersection over Union is equal to 0.5, the performance of the trained model is the best. The experimental results show that the F1-Score of bounding box and mask region all achieve 92.0%. The image acquisition processes are fully unobservable, without any user preselection, which are a highly heterogenic mix, the selected Mask R-CNN algorithm could also accurately detect mature green tomatoes. The performance of this proposed model in a real greenhouse harvesting environment is also evaluated, thus facilitating the direct application in a tomato harvesting robot.

Download Full-text

A Novel Detector Based on Convolution Neural Networks for Multiscale SAR Ship Detection in Complex Background

Sensors ◽

10.3390/s20092547 ◽

2020 ◽

Vol 20 (9) ◽

pp. 2547 ◽

Cited By ~ 2

Author(s):

Wenxin Dai ◽

Yuqing Mao ◽

Rongao Yuan ◽

Yijing Liu ◽

Xuemei Pu ◽

...

Keyword(s):

Region Of Interest ◽

Feature Representation ◽

Feature Maps ◽

Sar Images ◽

Feature Map ◽

The Public ◽

Single Feature ◽

Great Performance ◽

Residual Block ◽

Fusion Feature

Convolution neural network (CNN)-based detectors have shown great performance on ship detections of synthetic aperture radar (SAR) images. However, the performance of current models has not been satisfactory enough for detecting multiscale ships and small-size ones in front of complex backgrounds. To address the problem, we propose a novel SAR ship detector based on CNN, which consist of three subnetworks: the Fusion Feature Extractor Network (FFEN), Region Proposal Network (RPN), and Refine Detection Network (RDN). Instead of using a single feature map, we fuse feature maps in bottom–up and top–down ways and generate proposals from each fused feature map in FFEN. Furthermore, we further merge features generated by the region-of-interest (RoI) pooling layer in RDN. Based on the feature representation strategy, the CNN framework constructed can significantly enhance the location and semantics information for the multiscale ships, in particular for the small ships. On the other hand, the residual block is introduced to increase the network depth, through which the detection precision could be further improved. The public SAR ship dataset (SSDD) and China Gaofen-3 satellite SAR image are used to validate the proposed method. Our method shows excellent performance for detecting the multiscale and small-size ships with respect to some competitive models and exhibits high potential in practical application.

Download Full-text

Armored Target Detection in Battlefield Environment Based on Top-Down Aggregation Network and Hierarchical Scale Optimization

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419500071 ◽

2019 ◽

Vol 33 (04) ◽

pp. 1950007 ◽

Cited By ~ 2

Author(s):

Haoze Sun ◽

Tianqing Chang ◽

Lei Zhang ◽

Guozhen Yang ◽

Bin Han ◽

...

Keyword(s):

Target Detection ◽

Region Of Interest ◽

Recall Rate ◽

Detection Methods ◽

Detection Accuracy ◽

Feature Maps ◽

Top Down ◽

Long Distance ◽

General Object ◽

Battlefield Environment

Armored equipment plays a crucial role in the ground battlefield. The fast and accurate detection of enemy armored targets is significant to take the initiative in the battlefield. Comparing to general object detection and vehicle detection, armored target detection in battlefield environment is more challenging due to the long distance of observation and the complicated environment. In this paper, an accurate and robust automatic detection method is proposed to detect armored targets in battlefield environment. Firstly, inspired by Feature Pyramid Network (FPN), we propose a top-down aggregation (TDA) network which enhances shallow feature maps by aggregating semantic information from deeper layers. Then, using the proposed TDA network in a basic Faster R-CNN framework, we explore the further optimization of the approach for armored target detection: for the Region of Interest (RoI) Proposal Network (RPN), we propose a multi-branch RPNs framework to generate proposals that match the scale of armored targets and the specific receptive field of each aggregated layer and design hierarchical loss for the multi-branch RPNs; for RoI Classifier Network (RCN), we apply RoI pooling on the single finest scale feature map and construct a light and fast detection network. To evaluate our method, comparable experiments with state-of-art detection methods were conducted on a challenging dataset of images with armored targets. The experimental results demonstrate the effectiveness of the proposed method in terms of detection accuracy and recall rate.

Download Full-text

Research on Multiscene Vehicle Dataset Based on Improved FCOS Detection Algorithms

Complexity ◽

10.1155/2021/9167116 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Fei Yan ◽

Hui Zhang ◽

Tianyang Zhou ◽

Zhiyong Fan ◽

Jia Liu

Keyword(s):

Vehicle Detection ◽

Autonomous Driving ◽

Intelligent Transportation ◽

Detection Accuracy ◽

Comparative Experiment ◽

Feature Maps ◽

Propagation Process ◽

Detection Algorithms ◽

Balance Principle ◽

Complex Scenes

Whether in intelligent transportation or autonomous driving, vehicle detection is an important part. Vehicle detection still faces many problems, such as inaccurate vehicle detection positioning and low detection accuracy in complex scenes. FCOS as a representative of anchor-free detection algorithms was once a sensation, but now it seems to be slightly insufficient. Based on this situation, we propose an improved FCOS algorithm. The improvements are as follows: (1) we introduce a deformable convolution into the backbone to solve the problem that the receptive field cannot cover the overall goal; (2) we add a bottom-up information path after the FPN of the neck module to reduce the loss of information in the propagation process; (3) we introduce the balance module according to the balance principle, which reduces inconsistent detection of the bbox head caused by the mismatch of variance of different feature maps. To enhance the comparative experiment, we have extracted some of the most recent datasets from UA-DETRAC, COCO, and Pascal VOC. The experimental results show that our method has achieved good results on its dataset.

Download Full-text

Separable reverse connected network for efficient multi-scale vehicle detection

International Journal of Advanced Robotic Systems ◽

10.1177/1729881419870678 ◽

2019 ◽

Vol 16 (4) ◽

pp. 172988141987067

Author(s):

Enze Yang ◽

Linlin Huang ◽

Jian Hu

Keyword(s):

Vehicle Detection ◽

Visual Object ◽

Detection Accuracy ◽

Feature Maps ◽

Compression Technique ◽

Connected Network ◽

Multi Scale ◽

Model Compression ◽

Training Scheme ◽

Wide Range

Vehicle detection is involved in a wide range of intelligent transportation and smart city applications, and the demand of fast and accurate detection of vehicles is increasing. In this article, we propose a convolutional neural network-based framework, called separable reverse connected network, for multi-scale vehicles detection. In this network, reverse connected structure enriches the semantic context information of previous layers, while separable convolution is introduced for sparse representation of heavy feature maps generated from subnetworks. Further, we use multi-scale training scheme, online hard example mining, and model compression technique to accelerate the training process as well as reduce the parameters. Experimental results on Pascal Visual Object Classes (VOC) 2007 + 2012 and MicroSoft Common Objects in COntext (MS COCO) 2014 demonstrate the proposed method yields state-of-the-art performance. Moreover, by separable convolution and model compression, the network of two-stage detector is accelerated by about two times with little loss of detection accuracy.

Download Full-text

A Nighttime Vehicle Detection Method with Attentive GAN for Accurate Classification and Regression

Entropy ◽

10.3390/e23111490 ◽

2021 ◽

Vol 23 (11) ◽

pp. 1490

Author(s):

Yan Liu ◽

Tiantian Qiu ◽

Jingwen Wang ◽

Wenting Qi

Keyword(s):

Region Of Interest ◽

Vehicle Detection ◽

Detection Algorithm ◽

Vital Role ◽

Detection Accuracy ◽

Local Regression ◽

Generative Adversarial Network ◽

Automatic Driving ◽

Road Lighting ◽

Adversarial Network

Vehicle detection plays a vital role in the design of Automatic Driving System (ADS), which has achieved remarkable improvements in recent years. However, vehicle detection in night scenes still has considerable challenges for the reason that the vehicle features are not obvious and are easily affected by complex road lighting or lights from vehicles. In this paper, a high-accuracy vehicle detection algorithm is proposed to detect vehicles in night scenes. Firstly, an improved Generative Adversarial Network (GAN), named Attentive GAN, is used to enhance the vehicle features of nighttime images. Then, with the purpose of achieving a higher detection accuracy, a multiple local regression is employed in the regression branch, which predicts multiple bounding box offsets. An improved Region of Interest (RoI) pooling method is used to get distinguishing features in a classification branch based on Faster Region-based Convolutional Neural Network (R-CNN). Cross entropy loss is introduced to improve the accuracy of classification branch. The proposed method is examined with the proposed dataset, which is composed of the selected nighttime images from BDD-100k dataset (Berkeley Diverse Driving Database, including 100,000 images). Compared with a series of state-of-the-art detectors, the experiments demonstrate that the proposed algorithm can effectively contribute to vehicle detection accuracy in nighttime.

Download Full-text

Pedestrian Detection under Parallel Feature Fusion Based on Choquet Integral

Symmetry ◽

10.3390/sym13020250 ◽

2021 ◽

Vol 13 (2) ◽

pp. 250

Author(s):

Rong Yang ◽

Yun Wang ◽

Ying Xu ◽

Li Qiu ◽

Qiang Li

Keyword(s):

Feature Fusion ◽

Choquet Integral ◽

Detection System ◽

Pedestrian Detection ◽

Texture Feature ◽

Current Method ◽

Detection Accuracy ◽

Parallel Feature ◽

New Feature ◽

The Cost

Feature-based pedestrian detection method is currently the mainstream direction to solve the problem of pedestrian detection. In this kind of method, whether the appropriate feature can be extracted is the key to the comprehensive performance of the whole pedestrian detection system. It is believed that the appearance of a pedestrian can be better captured by the combination of edge/local shape feature and texture feature. In this field, the current method is to simply concatenate HOG (histogram of oriented gradient) features and LBP (local binary pattern) features extracted from an image to produce a new feature with large dimension. This kind of method achieves better performance at the cost of increasing the number of features. In this paper, Choquet integral based on the signed fuzzy measure is introduced to fuse HOG and LBP descriptors in parallel that is expected to improve accuracy without increasing feature dimensions. The parameters needed in the whole fusion process are optimized by a training algorithm based on genetic algorithm. This architecture has three advantages. Firstly, because the fusion of HOG and LBP features is parallel, the dimensions of the new features are not increased. Secondly, the speed of feature fusion is fast, thus reducing the time of pedestrian detection. Thirdly, the new features after fusion have the advantages of HOG and LBP features, which is helpful to improve the detection accuracy. The series of experimentation with the architecture proposed in this paper reaches promising and satisfactory results.

Download Full-text

Using an Improved YOLOv4 Deep Learning Network for Accurate Detection of Whitefly and Thrips on Sticky Trap Images

Transactions of the ASABE ◽

10.13031/trans.14394 ◽

2021 ◽

Vol 64 (3) ◽

pp. 919-927

Author(s):

Dujin Wang ◽

Yizhong Wang ◽

Ming Li ◽

Xinting Yang ◽

Jianwei Wu ◽

...

Keyword(s):

Deep Learning ◽

Image Features ◽

Sticky Trap ◽

Field Conditions ◽

Detection Accuracy ◽

List Type ◽

Small Object ◽

Feature Maps ◽

Feature Map ◽

Pest Detection

HighlightsThe proposed method detected thrips and whitefly more accurately than previous methods.The proposed method demonstrated good robustness to illumination reflections and different pest densities.Small pest detection was improved by adding large-scale feature maps and more residual units to a shallow network.Machine vision and deep learning created an end-to-end model to detect small pests on sticky traps in field conditions.Abstract. Pest detection is the basis of precise control in vegetable greenhouses. To improve the detection accuracy and robustness for two common small pests (whitefly and thrips) in greenhouses, this study proposes a novel small object detection approach based on the YOLOv4 model. Yellow sticky trap (YST) images at the original resolution (2560 × 1920 pixels) were collected using pest monitoring equipment in a greenhouse. The images were then cropped and labeled to create sub-images (416 × 416 pixels) to construct an experimental dataset. The labeled images used in this study (900 training, 100 validation, and 200 test) are available for comparative studies. To enhance the model’s ability to detect small pests, the feature map at the 8-fold downsampling layer in the backbone network was merged with the feature map at the 4-fold downsampling layer to generate a new layer and output a feature map with a size of 104 × 104 pixels. Furthermore, the residual units in the first two residual blocks were enlarged by four times to extract more shallow image features and the location information of target pests to withstand image degradation in the field. The experimental results showed that the mean average precision (mAP) for detection of whitefly and thrips using the proposed approach was improved by 8.2% and 3.4% compared with the YOLOv3 and YOLOv4 models, respectively. The detection performance slightly decreased as the pest densities increased in the YST image, but the mAP value was still 92.7% in the high-density dataset, which indicates that the proposed model has good robustness over a range of pest densities. Compared with previous similar studies, the proposed method has better potential to monitor whitefly and thrips using YSTs in field conditions. Keywords: Deep learning, Greenhouse pest management, Image processing, Pest detection, Small object, YOLOv4.

Download Full-text

Vehicle and Pedestrian Detection Based on Multi-level Feature Fusion in Autonomous Driving

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200304123323 ◽

2020 ◽

Vol 13 ◽

Author(s):

Chen Guoqiang ◽

Yi Huailong ◽

Mao Zhuangzhuang

Keyword(s):

Autonomous Vehicles ◽

Feature Fusion ◽

Pedestrian Detection ◽

Autonomous Driving ◽

Seasonal Effects ◽

Detection Accuracy ◽

Semantic Features ◽

Feature Maps ◽

Safe Driving ◽

Multi Level

Aims: The factors including light, weather, dynamic objects, seasonal effects and structures bring great challenges for the autonomous driving algorithm in the real world. Autonomous vehicles can detect different object obstacles in complex scenes to ensure safe driving. Background: The ability to detect vehicles and pedestrians is critical to the safe driving of autonomous vehicles. Automated vehicle vision systems must handle extremely wide and challenging scenarios. Objective: The goal of the work is to design a robust detector to detect vehicles and pedestrians. The main contribution is that the Multi-level Feature Fusion Block (MFFB) and the Detector Cascade Block (DCB) are designed. The multi-level feature fusion and multi-step prediction are used which greatly improve the detection object precision. Methods: The paper proposes a vehicle and pedestrian object detector, which is an end-to-end deep convolutional neural network. The key parts of the paper are to design the Multi-level Feature Fusion Block (MFFB) and Detector Cascade Block (DCB). The former combines inherent multi-level features by combining contextual information with useful multi-level features that combine high resolution but low semantics and low resolution but high semantic features. The latter uses multi-step prediction, cascades a series of detectors, and combines predictions of multiple feature maps to handle objects of different sizes. Results: The experiments on the RobotCar dataset and the KITTI dataset show that our algorithm can achieve high precision results through real-time detection. The algorithm achieves 84.61% mAP on the RobotCar dataset and is evaluated on the well-known KITTI benchmark dataset, achieving 81.54% mAP. In particular, the detection accuracy of a single-category vehicle reaches 90.02%. Conclusion: The experimental results show that the proposed algorithm has a good trade-off between detection accuracy and detection speed, which is beyond the current state-of-the-art RefineDet algorithm. The 2D object detector is proposed in the paper, which can solve the problem of vehicle and pedestrian detection and improve the accuracy, robustness and generalization ability in autonomous driving.

Download Full-text

Automatic accounting of Baikal diatomic algae: approaches and prospects

Issues of modern algology (Вопросы современной альгологии) ◽

10.33624/2311-0147-2019-2(20)-295-299 ◽

2019 ◽

pp. 295-299

Author(s):

Кonstantin А. Elshin ◽

Еlena I. Molchanova ◽

Мarina V. Usoltseva ◽

Yelena V. Likhoshway

Keyword(s):

Object Detection ◽

Loss Function ◽

Classification Accuracy ◽

Diatom Species ◽

Bounding Box ◽

Synedra Acus ◽

And Training

Using the TensorFlow Object Detection API, an approach to identifying and registering Baikal diatom species Synedra acus subsp. radians has been tested. As a result, a set of images was formed and training was conducted. It is shown that аfter 15000 training iterations, the total value of the loss function was obtained equal to 0,04. At the same time, the classification accuracy is equal to 95%, and the accuracy of construction of the bounding box is also equal to 95%.

Download Full-text