Multimodal Fusion Object Detection System for Autonomous Vehicles

Author(s):  
Michael Person ◽  
Mathew Jensen ◽  
Anthony O. Smith ◽  
Hector Gutierrez

For autonomous vehicles to navigate roadways safely, accurate object detection must take place before safe path planning can occur. Currently, general-purpose convolutional neural network (CNN) object detection models achieve the highest detection accuracies of any method. However, there is a gap among the proposed detection frameworks: those that provide the high detection accuracy necessary for deployment do not perform inference in real time, while those that perform inference in real time have low detection accuracy. We propose the multimodal fusion detection system (MFDS), a sensor fusion system that combines the speed of a fast image detection CNN model with the accuracy of light detection and ranging (LiDAR) point cloud data through a decision tree approach. The primary objective is to bridge the trade-off between speed and accuracy. The motivation for MFDS is to reduce the computational complexity associated with using a CNN model to extract features from an image. To improve efficiency, MFDS extracts complementary features from the LiDAR point cloud to obtain comparable detection accuracy. MFDS is novel in not only using the image detections to aid three-dimensional (3D) LiDAR detection but also using the LiDAR data to jointly bolster the image detections and provide 3D detections. MFDS achieves 3.7% higher accuracy than the base CNN detection model and operates at 10 Hz. Additionally, its memory footprint is small enough to fit on the Nvidia TX1 embedded device.
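The fusion idea can be pictured with a minimal sketch, not the authors' implementation: the decision tree logic is omitted, and the projection matrix, thresholds, and function names below are illustrative assumptions. A 2D CNN detection gates the LiDAR points that project inside its bounding box, and those points yield a coarse 3D detection.

```python
import numpy as np

def points_in_box(points_xyz, proj_matrix, box_2d):
    """Return the LiDAR points whose image projection falls inside box_2d."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])      # (n, 4) homogeneous coords
    img = (proj_matrix @ homo.T).T                        # (n, 3) camera coords
    in_front = img[:, 2] > 0                              # keep points ahead of the camera
    uv = img[in_front, :2] / img[in_front, 2:3]           # perspective divide
    x1, y1, x2, y2 = box_2d
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
             (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return points_xyz[in_front][inside]

def fit_3d_box(points_xyz):
    """Coarse axis-aligned 3D box (center, size) from the selected points."""
    lo, hi = points_xyz.min(axis=0), points_xyz.max(axis=0)
    return (lo + hi) / 2.0, hi - lo

# Usage with stand-in data (dummy projection matrix and random cloud):
P = np.array([[700.0, 0.0, 600.0, 0.0],
              [0.0, 700.0, 200.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
cloud = np.random.uniform([-10, -5, 1], [10, 5, 30], size=(5000, 3))
hits = points_in_box(cloud, P, box_2d=(500, 150, 700, 300))
if len(hits):
    center, size = fit_3d_box(hits)
```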

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Wanyi Zhang ◽  
Xiuhua Fu ◽  
Wei Li

3D object detection based on point cloud data in unmanned driving scenes has long been a research hotspot in autonomous driving perception. With the development and maturity of deep neural network technology, methods that use neural networks to detect three-dimensional objects have begun to show great advantages. Experimental results show that the mismatch between anchors and training samples degrades detection accuracy, a problem that has not been well solved. The contributions of this paper are as follows. First, deformable convolution is introduced into a point cloud object detection network for the first time, which enhances the network's adaptability to vehicles with different orientations and shapes. Second, a new anchor generation method for the RPN is proposed, which effectively prevents the mismatch between anchors and ground truth and removes the angle classification loss from the loss function. Compared with state-of-the-art methods, the AP and AOS of the detection results are improved.
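As a hedged illustration of the first contribution (assuming a PyTorch-style BEV backbone; this is not the paper's network), a deformable convolution can be inserted into a point cloud detection backbone with torchvision.ops.DeformConv2d, with the sampling offsets predicted by an ordinary convolution:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBEVBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (dx, dy) per kernel sampling location
        self.offset_conv = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):
        offsets = self.offset_conv(x)        # learned sampling offsets
        return self.deform_conv(x, offsets)  # features adapted to object orientation/shape

# Usage on a dummy BEV feature map (batch=1, 64 channels, 200x176 grid):
feat = torch.randn(1, 64, 200, 176)
out = DeformBEVBlock(64)(feat)
```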


2021 ◽  
Vol 11 (13) ◽  
pp. 6016
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

For autonomous vehicles, it is critical to be aware of the driving environment to avoid collisions and drive safely. The recent evolution of convolutional neural networks has contributed significantly to accelerating the development of object detection techniques that enable autonomous vehicles to handle rapid changes in various driving environments. However, collisions in an autonomous driving environment can still occur due to undetected obstacles and various perception problems, particularly occlusion. Thus, we propose a robust object detection algorithm for environments in which objects are truncated or occluded, employing RGB images and light detection and ranging (LiDAR) bird’s eye view (BEV) representations. This structure combines independent detection results obtained in parallel through “you only look once” networks using an RGB image and a height map converted from the BEV representation of LiDAR’s point cloud data (PCD). The region proposal of an object is determined via non-maximum suppression, which suppresses the bounding boxes of adjacent regions. A performance evaluation of the proposed scheme was performed using the KITTI vision benchmark suite dataset. The results demonstrate that detection accuracy with the integration of PCD BEV representations is superior to that obtained when only an RGB camera is used. In addition, robustness is improved: detection accuracy is significantly enhanced even when the target objects are partially occluded when viewed from the front, demonstrating that the proposed algorithm outperforms the conventional RGB-based model.
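A minimal sketch of the LiDAR-to-BEV step, assuming a simple max-height rasterization and made-up grid parameters rather than the paper's exact settings, shows how a point cloud becomes a height map that a YOLO-style detector can consume as an image:

```python
import numpy as np

def bev_height_map(points, x_range=(0, 70), y_range=(-40, 40), cell=0.1):
    """Rasterize points (N, 3) into a grid holding the max height per cell."""
    cols = int((x_range[1] - x_range[0]) / cell)
    rows = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((rows, cols), dtype=np.float32)   # empty cells stay at 0

    keep = (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) & \
           (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    pts = points[keep]
    c = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    r = ((pts[:, 1] - y_range[0]) / cell).astype(int)
    np.maximum.at(grid, (r, c), pts[:, 2])             # keep the highest point per cell
    return grid

# Usage with a random stand-in cloud:
cloud = np.random.uniform([0, -40, -2], [70, 40, 2], size=(20000, 3))
height_map = bev_height_map(cloud)                     # shape (800, 700)
```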


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6927
Author(s):  
Qingsheng Chen ◽  
Cien Fan ◽  
Weizheng Jin ◽  
Lian Zou ◽  
Fangyu Li ◽  
...  

Three-dimensional object detection from point cloud data is becoming more and more significant, especially for autonomous driving applications. However, it is difficult for LiDAR to capture the complete structure of an object in a real scene due to its scanning characteristics. Although existing methods have made great progress, most of them ignore prior information about object structure, such as symmetry. In this paper, we therefore use the symmetry of the object to complete the missing part of the point cloud before detecting it. Specifically, we propose a two-stage detection framework. In the first stage, we adopt an encoder–decoder structure to generate the symmetry points of the foreground points and combine the symmetry points with the non-empty voxel centers to form an enhanced point cloud. In the second stage, the enhanced point cloud is fed into the baseline, an anchor-based region proposal network, to generate the detection results. Extensive experiments on the challenging KITTI benchmark show the effectiveness of our method, which achieves better performance on both 3D and BEV (bird's eye view) object detection than some previous state-of-the-art methods.
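The enhancement step can be sketched under simplifying assumptions: instead of the learned encoder–decoder, a known symmetry plane per object is assumed, foreground points are mirrored across it, and the mirrored points are merged with the non-empty voxel centers into an enhanced cloud. All names and parameters below are illustrative.

```python
import numpy as np

def mirror_points(points, plane_y):
    """Reflect points (N, 3) across the vertical plane y = plane_y."""
    mirrored = points.copy()
    mirrored[:, 1] = 2 * plane_y - mirrored[:, 1]
    return mirrored

def voxel_centers(points, voxel=0.2):
    """Centers of all voxels that contain at least one point."""
    idx = np.unique(np.floor(points / voxel).astype(int), axis=0)
    return (idx + 0.5) * voxel

def enhanced_cloud(foreground, full_cloud, plane_y, voxel=0.2):
    sym = mirror_points(foreground, plane_y)           # completed hidden side
    return np.vstack([sym, voxel_centers(full_cloud, voxel)])

# Usage with stand-in data: the visible half of a "car" seen by the sensor.
car_half = np.random.uniform([10.0, -1.0, 0.0], [14.0, 0.0, 1.5], size=(300, 3))
scene = np.random.uniform([0, -10, 0], [30, 10, 2], size=(3000, 3))
enhanced = enhanced_cloud(car_half, scene, plane_y=0.0)
```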


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7221
Author(s):  
Baifan Chen ◽  
Hong Chen ◽  
Dian Yuan ◽  
Lingli Yu

The object detection algorithm based on vehicle-mounted LiDAR is a key component of the perception system on autonomous vehicles. It can provide high-precision and highly robust obstacle information for the safe driving of autonomous vehicles. However, most algorithms operate on a large amount of point cloud data, which makes real-time detection difficult. To solve this problem, this paper proposes a fast 3D object detection method based on three main steps: First, the ground segmentation by discriminant image (GSDI) method is used to convert point cloud data into discriminant images for ground point segmentation, which avoids computing directly on the point cloud data and improves the efficiency of ground point segmentation. Second, an image detector is used to generate the region of interest of the three-dimensional object, which effectively narrows the search range. Finally, the dynamic distance threshold clustering (DDTC) method is designed for point cloud data of different densities, which improves the detection of long-distance objects and avoids the over-segmentation produced by traditional algorithms. Experiments have shown that this algorithm can meet the real-time requirements of autonomous driving while maintaining high accuracy.
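A hedged sketch of the idea behind DDTC (the threshold formula and parameters are assumptions, not the paper's exact rule): Euclidean region growing whose merge radius increases with distance from the sensor, so sparse far-away objects are not fragmented.

```python
import numpy as np
from scipy.spatial import cKDTree

def ddtc(points, base=0.3, gain=0.01):
    """Label points by region growing with a range-dependent distance threshold."""
    labels = -np.ones(len(points), dtype=int)
    tree = cKDTree(points)
    current = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = current
        while stack:
            i = stack.pop()
            radius = base + gain * np.linalg.norm(points[i])   # threshold grows with range
            for j in tree.query_ball_point(points[i], radius):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Usage: two small clusters, one near the sensor and one far away.
near = np.random.normal([5, 0, 0], 0.1, size=(50, 3))
far = np.random.normal([50, 10, 0], 0.4, size=(50, 3))
labels = ddtc(np.vstack([near, far]))
```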


2021 ◽  
Vol 11 (11) ◽  
pp. 4894
Author(s):  
Anna Scius-Bertrand ◽  
Michael Jungo ◽  
Beat Wolf ◽  
Andreas Fischer ◽  
Marc Bui

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems that reduce the required amount of annotation. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages rendered with a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.
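The self-training loop can be summarized at a high level with hypothetical helper functions (this is not the authors' code, and the detector interface is assumed): a detector pre-trained on font-rendered synthetic pages pseudo-labels the real manuscripts, and its confident detections become the training data for the next round.

```python
def self_training(detector, synthetic_pages, real_pages, rounds=3, conf_thresh=0.5):
    """Adapt a detector to real manuscripts without human annotation (sketch)."""
    detector.train(synthetic_pages)                 # initial model from font-rendered pages
    for _ in range(rounds):
        pseudo_labels = []
        for page in real_pages:
            boxes = detector.predict(page)          # character boxes with confidences
            kept = [b for b in boxes if b.confidence >= conf_thresh]
            pseudo_labels.append((page, kept))      # confident detections become labels
        detector.train(pseudo_labels)               # retrain on the real manuscripts
    return detector
```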


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 884
Author(s):  
Chia-Ming Tsai ◽  
Yi-Horng Lai ◽  
Yung-Da Sun ◽  
Yu-Jen Chung ◽  
Jau-Woei Perng

Numerous sensors can obtain images or point cloud data on land; in water, however, the rapid attenuation of electromagnetic signals and the lack of light restrict sensing functions. This study expands the use of two- and three-dimensional detection technologies in underwater applications to detect abandoned tires. A three-dimensional acoustic sensor, the BV5000, is used in this study to collect underwater point cloud data. Several pre-processing steps are proposed to remove noise and the seabed from the raw data. The point clouds are then processed to obtain two data types: a 2D image and a 3D point cloud. Deep learning methods operating in different dimensions are used to train the models. In the two-dimensional method, the point cloud is converted into a bird’s eye view image, and the Faster R-CNN and YOLOv3 network architectures are used to detect tires. In the three-dimensional method, the point cloud associated with a tire is cut out from the raw data and used as training data, and the PointNet and PointConv network architectures are then used for tire classification. The results show that both approaches provide good accuracy.
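As a rough sketch of the three-dimensional branch (a heavily simplified stand-in, not the actual PointNet or PointConv architectures used in the study), a per-point shared MLP followed by symmetric max pooling gives an order-invariant classifier for tire versus non-tire clouds:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, points):                 # points: (batch, n_points, 3)
        feats = self.point_mlp(points)         # per-point features
        pooled = feats.max(dim=1).values       # order-invariant aggregation
        return self.head(pooled)               # class logits (tire / not tire)

# Usage on a dummy batch of 8 clouds with 1024 points each:
logits = TinyPointNet()(torch.randn(8, 1024, 3))
```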


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 201
Author(s):  
Michael Bekele Maru ◽  
Donghwan Lee ◽  
Kassahun Demissie Tola ◽  
Seunghee Park

Modeling a structure in the virtual world using three-dimensional (3D) information enhances our understanding of how a structure reacts to any disturbance, while also aiding in its visualization. Generally, 3D point clouds are used for determining structural behavioral changes. Light detection and ranging (LiDAR) is one of the crucial ways by which a 3D point cloud dataset can be generated. Additionally, 3D cameras are commonly used to develop a point cloud containing many points on the external surface of an object. The main objective of this study was to compare the performance of optical sensors, namely a depth camera (DC) and a terrestrial laser scanner (TLS), in estimating structural deflection. We also applied bilateral filtering techniques, which are commonly used in image processing, to the point cloud data to enhance their accuracy and increase the application prospects of these sensors in structural health monitoring. The results from these sensors were validated by comparing them with the outputs from a linear variable differential transformer sensor, which was mounted on the beam during an indoor experiment. The results showed that the datasets obtained from both sensors were acceptable for nominal deflections of 3 mm and above because the error range was less than ±10%. However, the results obtained from the TLS were better than those obtained from the DC.
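A hedged sketch of bilateral filtering applied to point cloud heights (the neighborhood scheme and parameters are assumptions, not the study's exact processing): each point's height is smoothed by neighbors weighted both by spatial proximity and by similarity of height, so measurement noise is suppressed while genuine deflection edges are preserved.

```python
import numpy as np
from scipy.spatial import cKDTree

def bilateral_filter_z(points, radius=0.05, sigma_s=0.02, sigma_r=0.01):
    """Smooth z values of points (N, 3) with a spatial kernel and a range kernel."""
    tree = cKDTree(points[:, :2])                   # neighbors found in the x-y plane
    filtered = points.copy()
    for i, p in enumerate(points):
        idx = tree.query_ball_point(p[:2], radius)
        nbrs = points[idx]
        d_s = np.linalg.norm(nbrs[:, :2] - p[:2], axis=1)    # spatial distance
        d_r = np.abs(nbrs[:, 2] - p[2])                      # height difference
        w = np.exp(-d_s**2 / (2 * sigma_s**2)) * np.exp(-d_r**2 / (2 * sigma_r**2))
        filtered[i, 2] = np.sum(w * nbrs[:, 2]) / np.sum(w)
    return filtered

# Usage: a noisy scan of a slightly deflected beam surface.
xy = np.random.uniform(0, 1, size=(2000, 2))
z = 0.003 * np.sin(np.pi * xy[:, 0]) + np.random.normal(0, 0.0005, 2000)
smoothed = bilateral_filter_z(np.column_stack([xy, z]))
```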


2013 ◽  
Vol 796 ◽  
pp. 513-518
Author(s):  
Rong Jin ◽  
Bing Fei Gu ◽  
Guo Lian Liu

In this paper, 110 female undergraduates at Soochow University are measured using both a 3D non-contact measurement system and manual measurement. The 3D point cloud data of the human body are taken as the research object using reverse engineering software, and secondary development is carried out on the basis of the optimized point cloud data. In accordance with the definition of the chest width points and other feature points, and within the operational limits of the three-dimensional point cloud data, the width, thickness, and length dimensions of the curve through the chest width point are measured. Body types are classified using the ratio between the thickness and width of the curve as the classification index. The generation rules of the chest curve are determined for each type using linear regression. A human arm model can then be established automatically by the computer, thereby effectively improving the individualized modeling of the female upper-body mannequin.
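The classification and regression steps can be illustrated with synthetic numbers and made-up thresholds (not the study's data or cut-off values): body types are assigned from the thickness-to-width ratio of the chest curve, and a linear rule is fitted per type by least squares.

```python
import numpy as np

def classify_ratio(thickness, width, cuts=(0.70, 0.80)):
    """Assign a body-type label from the thickness/width ratio of the chest curve."""
    ratio = thickness / width
    return np.digitize(ratio, cuts)            # labels 0, 1, or 2

# Synthetic measurements: chest width (cm) and chest thickness (cm).
width = np.random.uniform(26, 34, 200)
thickness = width * np.random.uniform(0.65, 0.85, 200)
labels = classify_ratio(thickness, width)

# One linear regression of thickness on width per body type (the "generation rule").
rules = {t: np.polyfit(width[labels == t], thickness[labels == t], 1)
         for t in np.unique(labels)}
```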

