We Learn Better Road Pothole Detection: from Attention Aggregation to Adversarial Domain Adaptation

2020, doi:10.36227/techrxiv.12813323.v1

Author(s): Rui Fan, Hengli Wang, Junaid Bocus, Ming Liu

Keyword(s): Domain Adaptation, State Of The Art, Semantic Segmentation, Depth Image, The Road, Depth Images, Different Types, Processing Module, RGB Images, Pothole Detection

Manual visual inspection, typically performed by certified inspectors, is still the main form of road pothole detection. This process is, however, not only tedious, time-consuming and costly, but also dangerous for the inspectors. Furthermore, the road pothole detection results are always subjective, because they depend entirely on the inspector's experience. In this paper, we first introduce a disparity (or inverse depth) image processing module, named quasi inverse perspective transformation (QIPT), which makes damaged road areas highly distinguishable. Then, we propose a novel attention aggregation (AA) framework, which improves semantic segmentation networks for road pothole detection by taking advantage of different types of attention modules. Moreover, we develop a novel training set augmentation technique based on adversarial domain adaptation, where synthetic road RGB images and transformed road disparity (or inverse depth) images are generated to enhance the training of semantic segmentation networks.

The experimental results illustrate that, firstly, the disparity (or inverse depth) images transformed by our QIPT module become more informative; secondly, the adversarial domain adaptation not only significantly improves the performance of state-of-the-art semantic segmentation networks, but also accelerates their convergence. In addition, AA-UNet and AA-RTFNet, our best-performing implementations, outperform all other state-of-the-art single-modal and data-fusion networks, respectively, for road pothole detection.
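The abstract does not give the QIPT equations. As a rough illustration of why a transformed disparity map can make damaged road areas stand out, the sketch below assumes a flat road and negligible camera roll, so that road disparity is approximately linear in the image row (the familiar v-disparity line). It fits that line by least squares and subtracts it; pixels below the road surface, such as the bottom of a pothole, are farther away and therefore leave negative residuals. The helpers `fit_row_model` and `transform_disparity` and the `offset` parameter are hypothetical; this is not the authors' QIPT.

```python
def fit_row_model(rows, disparities):
    """Least-squares fit of the assumed road model d = a0 + a1 * v (hypothetical helper)."""
    n = len(rows)
    mean_v = sum(rows) / n
    mean_d = sum(disparities) / n
    cov = sum((v - mean_v) * (d - mean_d) for v, d in zip(rows, disparities))
    var = sum((v - mean_v) ** 2 for v in rows)
    a1 = cov / var
    a0 = mean_d - a1 * mean_v
    return a0, a1


def transform_disparity(disparity, a0, a1, offset=0.0):
    """Subtract the fitted road disparity from every pixel, row by row.

    Road pixels map to roughly `offset`; pixels below the road surface
    (farther away, hence smaller disparity) map to values below `offset`.
    """
    return [
        [d - (a0 + a1 * v) + offset for d in row]
        for v, row in enumerate(disparity)
    ]


if __name__ == "__main__":
    # Toy 4x3 disparity map of a flat road (d = 10 + 2v) with one damaged pixel at (3, 1).
    disparity = [[10, 10, 10], [12, 12, 12], [14, 14, 14], [16, 14, 16]]
    # Fit on the undamaged first column only.
    a0, a1 = fit_row_model([0, 1, 2, 3], [row[0] for row in disparity])
    for row in transform_disparity(disparity, a0, a1):
        print(row)  # the damaged pixel stands out as -2.0 in the last row
```

In practice the road model would have to be fitted robustly (for example with RANSAC) on road pixels only, since damaged areas would otherwise bias the fit.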

We Learn Better Road Pothole Detection: from Attention Aggregation to Adversarial Domain Adaptation

2020, doi:10.36227/techrxiv.12813323.v2

Author(s): Rui Fan, Hengli Wang, Junaid Bocus, Ming Liu

Keyword(s): Domain Adaptation, State Of The Art, Semantic Segmentation, Depth Image, The Road, Depth Images, Different Types, Processing Module, RGB Images, Pothole Detection

Manual visual inspection, typically performed by certified inspectors, is still the main form of road pothole detection. This process is, however, not only tedious, time-consuming and costly, but also dangerous for the inspectors. Furthermore, the road pothole detection results are always subjective, because they depend entirely on the inspector's experience. In this paper, we first introduce a disparity (or inverse depth) image processing module, named quasi inverse perspective transformation (QIPT), which makes damaged road areas highly distinguishable. Then, we propose a novel attention aggregation (AA) framework, which improves semantic segmentation networks for road pothole detection by taking advantage of different types of attention modules. Moreover, we develop a novel training set augmentation technique based on adversarial domain adaptation, where synthetic road RGB images and transformed road disparity (or inverse depth) images are generated to enhance the training of semantic segmentation networks.

The experimental results illustrate that, firstly, the disparity (or inverse depth) images transformed by our QIPT module become more informative; secondly, the adversarial domain adaptation not only significantly improves the performance of state-of-the-art semantic segmentation networks, but also accelerates their convergence. In addition, AA-UNet and AA-RTFNet, our best-performing implementations, outperform all other state-of-the-art single-modal and data-fusion networks, respectively, for road pothole detection.

HRDepthNet: Depth Image-Based Marker-Less Tracking of Body Joints

Sensors, Vol 21 (4), pp. 1356, 2021, doi:10.3390/s21041356

Author(s): Linda Christin Büker, Finnja Zuber, Andreas Hein, Sebastian Fudickar

Keyword(s): Color Images, Depth Image, Accuracy Evaluation, Timed Up And Go, Position Errors, Depth Images, Upper And Lower Extremities, RGB Images, Human Joints, Body Joints

Although approaches for detecting joint positions in color images, such as HRNet and OpenPose, are available, corresponding approaches for depth images have received limited attention, even though depth images have several advantages over color images, such as robustness to light variation and invariance to color and texture. Accordingly, we introduce High-Resolution Depth Net (HRDepthNet), a machine-learning-driven approach to detect human joints (body, head, and upper and lower extremities) purely in depth images. HRDepthNet retrains the original HRNet for depth images. For this purpose, a dataset was created holding depth (and RGB) images recorded with subjects conducting the timed up and go test, an established geriatric assessment. The annotations were made manually on the RGB images. Training and evaluation were conducted with this dataset. For accuracy evaluation, the detection of body joints was evaluated via COCO's evaluation metrics, which indicated that the resulting depth-image-based model achieved better results than HRNet trained and applied on the corresponding RGB images. An additional evaluation of the position errors showed a median deviation of 1.619 cm (x-axis), 2.342 cm (y-axis) and 2.4 cm (z-axis).
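COCO's keypoint metrics are built on object keypoint similarity (OKS), and the position errors above are reported as per-axis medians. The sketch below shows both computations for a single subject. The function names are hypothetical, the per-joint constants `k` are passed in rather than taken from COCO's published values, and reading "median deviation" as the median absolute per-axis difference is an assumption.

```python
import math
import statistics


def object_keypoint_similarity(pred, gt, visible, area, k):
    """COCO-style OKS for one person (hypothetical helper).

    pred, gt: per-joint (x, y) pixel coordinates.
    visible:  per-joint flags; only labelled joints are scored.
    area:     object area in pixels^2, playing the role of s^2.
    k:        per-joint falloff constants (COCO uses k_i = 2 * sigma_i).
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), vis, ki in zip(pred, gt, visible, k):
        if not vis:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
        count += 1
    # Simplification: return 0 when no joint is labelled.
    return total / count if count else 0.0


def median_axis_errors(pred_3d, ref_3d):
    """Median absolute difference per axis (x, y, z) over matched 3D joints."""
    return tuple(
        statistics.median([abs(p[axis] - r[axis]) for p, r in zip(pred_3d, ref_3d)])
        for axis in range(3)
    )


if __name__ == "__main__":
    pred = [(10.0, 10.0), (20.0, 21.0)]
    gt = [(10.0, 10.0), (20.0, 20.0)]
    print(object_keypoint_similarity(pred, gt, [True, True], area=1.0, k=[1.0, 1.0]))
    print(median_axis_errors([(1, 2, 3), (0, 0, 0), (2, 4, 6)], [(0, 0, 0)] * 3))  # (1, 2, 3)
```

COCO's average precision is then obtained by thresholding OKS at several levels, which is not reproduced here.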

Semantic Segmentation of a Printed Circuit Board for Component Recognition Based on Depth Images

Sensors, Vol 20 (18), pp. 5318, 2020, doi:10.3390/s20185318

Author(s): Dongnian Li, Changming Li, Chengjun Chen, Zhengxu Zhao

Keyword(s): Random Forest, Printed Circuit Board, Semantic Segmentation, Circuit Board, Depth Image, Training Set, Pixel Classification, Printed Circuit, Depth Images, Illumination Changes

Locating and identifying the components mounted on a printed circuit board (PCB) based on machine vision is an important and challenging problem for automated PCB inspection and automated PCB recycling. In this paper, we propose a PCB semantic segmentation method based on depth images that segments and recognizes components on the PCB through pixel classification. The training image set for the PCB was synthesized automatically by graphics rendering. Using a series of concentric circles centered at a given depth pixel, we extracted depth difference features from the depth images in the training set to train a random forest pixel classifier. With the trained random forest pixel classifier, we performed semantic segmentation of the PCB to segment and recognize its components through pixel classification. Experiments on both synthetic and real test sets were conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that our method can segment and recognize most of the components in a real depth image of the PCB. Our method is immune to illumination changes and can be implemented in parallel on a GPU.
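One plausible reading of the concentric-circle depth difference features is sketched below: depth is sampled at evenly spaced angles on circles of several radii around a pixel, and each feature is the sampled depth minus the center depth. The radii, the number of samples per circle, and the border clamping are assumptions, and `depth_difference_features` is a hypothetical helper rather than the authors' code.

```python
import math


def depth_difference_features(depth, row, col, radii, samples_per_circle):
    """Depth differences between a centre pixel and points on concentric circles around it.

    depth: 2D list of depth values (same unit throughout, e.g. millimetres).
    Returns len(radii) * samples_per_circle values, circle by circle.
    Samples that fall outside the image are clamped to the nearest border
    pixel (an assumption made for this sketch).
    """
    height, width = len(depth), len(depth[0])
    centre = depth[row][col]
    features = []
    for radius in radii:
        for i in range(samples_per_circle):
            angle = 2.0 * math.pi * i / samples_per_circle
            y = min(max(round(row + radius * math.sin(angle)), 0), height - 1)
            x = min(max(round(col + radius * math.cos(angle)), 0), width - 1)
            features.append(depth[y][x] - centre)
    return features


if __name__ == "__main__":
    # 5x5 board surface at 100 mm with a one-pixel component 10 mm closer to the sensor.
    depth = [[100] * 5 for _ in range(5)]
    depth[2][2] = 90
    print(depth_difference_features(depth, 2, 2, radii=[1, 2], samples_per_circle=4))
    # → [10, 10, 10, 10, 10, 10, 10, 10]
```

Because every feature is a difference, a flat surface yields zeros at any absolute depth. The per-pixel feature vectors could then be used to train a random forest, for example with scikit-learn's `RandomForestClassifier`; that step is left out to keep the sketch dependency-free.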

Multiscale Road Extraction in Remote Sensing Images

Computational Intelligence and Neuroscience, Vol 2019, pp. 1-9, 2019, doi:10.1155/2019/2373798

Cited By ~ 4

Author(s): Aziguli Wulamu, Zuxian Shi, Dezheng Zhang, Zheyu He

Keyword(s): Remote Sensing, Network Architecture, Semantic Segmentation, Road Extraction, Remote Sensing Images, The Road, Proposed Model, Different Types, Spatial Pyramid Pooling, The One

Recent advances in convolutional neural networks (CNNs) have shown impressive results in semantic segmentation. Among the successful CNN-based methods, U-Net has achieved excellent performance. In this paper, we propose a novel network architecture based on U-Net and atrous spatial pyramid pooling (ASPP) for the road extraction task in remote sensing. On the one hand, the U-Net structure can effectively extract valuable features; on the other hand, ASPP is able to exploit multiscale context information in remote sensing images. Compared with the baseline, the proposed model improves the pixel-wise mean Intersection over Union (mIoU) by 3 points. Experimental results show that the proposed network architecture can handle different types of road surface extraction tasks under various terrains in Yinchuan city, solves the road connectivity problem to some extent, and has a certain tolerance to shadows and occlusion.
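Since the reported gain is expressed in mIoU, a minimal sketch of how mIoU is usually computed from per-pixel labels may help. Skipping classes that appear in neither prediction nor ground truth is a common convention but an assumption here, and `mean_iou` is a hypothetical helper.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union from flat sequences of per-pixel class labels.

    Classes that occur in neither the prediction nor the ground truth are
    skipped rather than counted as IoU = 1 or IoU = 0.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Binary road (1) vs background (0) on a flattened 2x4 image.
    target = [0, 0, 1, 1, 0, 0, 1, 1]
    pred = [0, 1, 1, 1, 0, 0, 0, 1]
    print(mean_iou(pred, target, num_classes=2))  # 0.6
```

With mIoU expressed as a percentage, a 3-point improvement means, for example, a rise from 70% to 73%.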

Self-Ensembling Attention Networks: Addressing Domain Shift for Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 33, pp. 5581-5588, 2019, doi:10.1609/aaai.v33i01.33015581

Cited By ~ 3

Author(s): Yonghao Xu, Bo Du, Lefei Zhang, Qian Zhang, Guoli Wang, ...

Keyword(s): Domain Adaptation, State Of The Art, Semantic Segmentation, Great Success, Learning Models, Target Domain, Attention Networks, Source Domain, Benchmark Datasets, Different Levels

Recent years have witnessed the great success of deep learning models in semantic segmentation. Nevertheless, these models may not generalize well to unseen image domains because of domain shift. Since pixel-level annotations are laborious to collect, developing algorithms that can adapt models from a labeled source domain to a target domain is of great significance. To this end, we propose self-ensembling attention networks to reduce the domain gap between different datasets. To the best of our knowledge, the proposed method is the first attempt to introduce a self-ensembling model to domain adaptation for semantic segmentation, which provides a different view on how to learn domain-invariant features. Besides, since different regions in an image usually exhibit different levels of domain gap, we introduce the attention mechanism into the proposed framework to generate attention-aware features, which are further used to guide the calculation of the consistency loss in the target domain. Experiments on two benchmark datasets demonstrate that the proposed framework yields competitive performance compared with state-of-the-art methods.
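Self-ensembling for domain adaptation is usually built on the mean-teacher idea: a teacher network whose weights are an exponential moving average (EMA) of the student's, plus a consistency loss between their predictions on target images. The sketch below shows both pieces on plain Python lists, with the consistency term weighted per pixel by an attention map. The abstract does not say exactly how attention enters the loss, so this weighting, the `alpha` value, and the helper names are assumptions.

```python
def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher update: each teacher weight becomes an EMA of the matching student weight."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def attention_weighted_consistency(student_probs, teacher_probs, attention):
    """Squared-error consistency loss on target pixels, weighted per pixel by attention.

    student_probs, teacher_probs: per-pixel lists of class probabilities.
    attention: per-pixel non-negative weights (hypothetical weighting scheme).
    """
    total, weight_sum = 0.0, 0.0
    for s, t, a in zip(student_probs, teacher_probs, attention):
        total += a * sum((si - ti) ** 2 for si, ti in zip(s, t))
        weight_sum += a
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    teacher, student = [0.0, 1.0], [1.0, 1.0]
    for _ in range(3):  # the teacher drifts towards the student
        teacher = ema_update(teacher, student, alpha=0.5)
    print(teacher)  # [0.875, 1.0]
    student_probs = [[0.9, 0.1], [0.4, 0.6]]
    teacher_probs = [[0.8, 0.2], [0.4, 0.6]]
    print(attention_weighted_consistency(student_probs, teacher_probs, [1.0, 0.5]))
```

Only the student is trained by gradient descent in this scheme; the teacher changes solely through the EMA update.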

Full Resolution Dense Depth Recovery by Fusing RGB Images and Sparse Depth

2020, doi:10.36227/techrxiv.11687193.v1

Author(s): Guoliang Liu

Keyword(s): State Of The Art, Depth Estimation, Depth Image, Estimation Accuracy, Estimation Result, Recovery Method, Depth Recovery, Full Resolution, RGB Images, RGB Image

Full-resolution depth is required in many real-world engineering applications. However, existing depth sensors, e.g., LiDARs, only offer sparse depth samples with limited resolution and noise. Here, we propose a deep-learning-based full-resolution depth recovery method that uses monocular images and the corresponding sparse depth measurements of the target environment. The novelty of our idea is that the similar structure information shared by the RGB image and the depth image is used to refine the dense depth estimation result. This structure information can be found using a correlation layer in the regression neural network. We show that the proposed method achieves higher estimation accuracy than state-of-the-art methods. Experiments conducted on the NYU Depth V2 dataset demonstrate the effectiveness of our idea.
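The abstract does not define its correlation layer. A common form, popularized by FlowNet, compares two feature maps by taking channel-averaged dot products over a window of displacements. The sketch below assumes that form with zero padding at the borders and treats the RGB and depth branches as producing feature maps of equal size; `correlation` and `max_disp` are hypothetical names.

```python
def correlation(feat_a, feat_b, max_disp):
    """Correlation volume between two H x W x C feature maps (nested lists).

    For every displacement (dy, dx) with |dy|, |dx| <= max_disp, returns an
    H x W map whose entry (y, x) is the channel-averaged dot product of
    feat_a[y][x] and feat_b[y + dy][x + dx]; out-of-bounds positions give 0.0.
    """
    h, w, c = len(feat_a), len(feat_a[0]), len(feat_a[0][0])
    volume = {}
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            cmap = [[0.0] * w for _ in range(h)]
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        dot = sum(a * b for a, b in zip(feat_a[y][x], feat_b[yy][xx]))
                        cmap[y][x] = dot / c
            volume[(dy, dx)] = cmap
    return volume


if __name__ == "__main__":
    rgb_feat = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
    depth_feat = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [0.0, 0.0]]]
    volume = correlation(rgb_feat, depth_feat, max_disp=1)
    print(len(volume), volume[(0, 1)][0][0])  # 9 0.5
```

In a network, the (2 * max_disp + 1)^2 maps would typically be stacked as channels and passed to the following layers; high responses mark positions where the two modalities share structure.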

Automatic 3D Landmark Extraction System Based on an Encoder–Decoder Using Fusion of Vision and LiDAR

Remote Sensing, Vol 12 (7), pp. 1142, 2020, doi:10.3390/rs12071142

Author(s): Jeonghoon Kwak, Yunsick Sung

Keyword(s): Point Cloud, Point Clouds, Depth Image, 3D Point Cloud, Digital World, Depth Images, 3D Point Clouds, RGB Images, RGB Image, 3D Landmarks

To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in the 3D digital world. To recognize a user's motions, 3D landmarks are provided by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red green blue (RGB) image collected visually. However, extracting 3D landmarks requires manual supervision, whether they originate from the RGB image or from the 3D point cloud. Thus, a method for extracting 3D landmarks without manual supervision is needed. Herein, an RGB image and a 3D point cloud are used to extract 3D landmarks. The 3D point cloud provides the relative distance between a LiDAR and a user. Because, owing to disparities, the point cloud cannot cover the user's entire body, it cannot generate a dense depth image that provides the boundary of the user's body. Therefore, up-sampling is performed to increase the density of the depth image generated from the 3D point cloud; the density depends on the 3D point cloud. This paper proposes a system for extracting 3D landmarks using 3D point clouds and RGB images without manual supervision. A depth image, which provides the boundary of a user's motion, is generated using the 3D point cloud and the RGB image collected by a LiDAR and an RGB camera, respectively. To extract 3D landmarks automatically, an encoder–decoder model is trained with the generated depth images and the RGB images, and 3D landmarks are extracted from these images with the trained encoder model. The method of extracting 3D landmarks using RGB depth (RGBD) images was verified experimentally, and 3D landmarks were extracted to evaluate the user's motions with RGBD images. In this manner, landmarks could be extracted according to the user's motions, rather than from the RGB images alone. The depth images generated by the proposed method were 1.832 times denser than the up-sampling-based depth images generated with bilateral filtering.
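Generating a depth image from a LiDAR point cloud, and comparing densities such as the reported factor of 1.832, can be sketched as follows. The sketch assumes the points are already expressed in the camera frame (extrinsic calibration applied), uses an ideal pinhole model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, keeps the nearest point per pixel, and defines density as the fraction of pixels with a valid depth. None of these choices is taken from the paper.

```python
def project_to_depth_image(points, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points (z forward) into a sparse depth image.

    Pinhole model: u = fx * x / z + cx, v = fy * y / z + cy. When several
    points land on the same pixel, the nearest one is kept. Empty pixels hold 0.0.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


def density(depth):
    """Fraction of pixels that carry a valid (non-zero) depth value."""
    total = sum(len(row) for row in depth)
    valid = sum(1 for row in depth for d in row if d > 0)
    return valid / total


if __name__ == "__main__":
    points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    sparse = project_to_depth_image(points, 1.0, 1.0, 1.0, 1.0, width=3, height=3)
    print(density(sparse))  # 2 of 9 pixels are filled
```

Under this definition, a density ratio like the one reported would be `density(proposed) / density(baseline)` for depth images of the same size.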

SemanticDepth: Fusing Semantic Segmentation and Monocular Depth Estimation for Enabling Autonomous Driving in Roads without Lane Lines

Sensors, Vol 19 (14), pp. 3224, 2019, doi:10.3390/s19143224

Cited By ~ 4

Author(s): Pablo R. Palafox, Johannes Betz, Felix Nobis, Konstantin Riedl, Markus Lienkamp

Keyword(s): Semantic Segmentation, Depth Estimation, Autonomous Driving, Warning Systems, The Road, Lane Departure, RGB Images, Monocular Depth, On The Road, The City

Typically, lane departure warning systems rely on lane lines being present on the road. However, in many scenarios, e.g., secondary roads or some streets in cities, lane lines are either not present or not sufficiently well signaled. In this work, we present a vision-based method to locate a vehicle within the road when no lane lines are present, using only RGB images as input. To this end, we propose to fuse the outputs of a semantic segmentation and a monocular depth estimation architecture to locally reconstruct a semantic 3D point cloud of the viewed scene. We only retain points belonging to the road and, additionally, to any kind of fences or walls that might be present right at the sides of the road. We then compute the width of the road at a certain point on the planned trajectory and, additionally, what we denote as the fence-to-fence distance. Our system is suited to any kind of motoring scenario and is especially useful when lane lines are not present on the road or do not signal the path correctly. The additional fence-to-fence distance computation is complementary to the road's width estimation. We quantitatively test our method on a set of images featuring streets of the city of Munich that contain a road-fence structure, so as to compare our two proposed variants, namely the road's width and the fence-to-fence distance computation. In addition, we also validate our system qualitatively on the Stuttgart sequence of the publicly available Cityscapes dataset, where no fences or walls are present at the sides of the road, thus demonstrating that our system can be deployed in a standard city-like environment. For the benefit of the community, we make our software open source.
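The road-width step can be illustrated by back-projecting road-labelled pixels with the pinhole model and measuring their lateral extent in a thin slice at a chosen forward distance. The sketch assumes a segmentation map and a metric depth map of equal size, hypothetical intrinsics, a camera x axis pointing across the road, and a slice tolerance; it omits the outlier filtering a real system would need. The helper names are hypothetical.

```python
def backproject(depth, labels, fx, fy, cx, cy, keep_label):
    """Back-project pixels with the given semantic label and positive depth to camera-frame 3D points."""
    points = []
    for v, (depth_row, label_row) in enumerate(zip(depth, labels)):
        for u, (z, label) in enumerate(zip(depth_row, label_row)):
            if label == keep_label and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def width_at_distance(points, distance, tolerance):
    """Lateral extent (max x - min x) of points whose depth z is within `tolerance` of `distance`."""
    xs = [x for x, _, z in points if abs(z - distance) <= tolerance]
    return max(xs) - min(xs) if xs else None


if __name__ == "__main__":
    ROAD = 1  # hypothetical label id
    depth = [[10.0] * 4, [10.0] * 4]
    labels = [[0, 1, 1, 0], [1, 1, 1, 1]]
    road_points = backproject(depth, labels, fx=10.0, fy=10.0, cx=1.5, cy=0.5, keep_label=ROAD)
    print(width_at_distance(road_points, distance=10.0, tolerance=0.5))  # 3.0
```

The fence-to-fence distance could be approximated the same way by keeping fence and wall labels instead of road labels.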

An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 34 (07), pp. 12613-12620, 2020, doi:10.1609/aaai.v34i07.6952

Cited By ~ 3

Author(s): Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, ...

Keyword(s): Marginal Distribution, Domain Adaptation, State Of The Art, Feature Space, Semantic Segmentation, The State, Object Size, Feature Maps, Feature Representations, Unsupervised Domain Adaptation

We focus on Unsupervised Domain Adaptation (UDA) for the task of semantic segmentation. Recently, adversarial alignment has been widely adopted to match the marginal distribution of feature representations across two domains globally. However, this strategy fails to adapt the representations of tail classes or small objects for semantic segmentation, since the alignment objective is dominated by head categories or large objects. In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise feature-space adversarial perturbations. Specifically, we first perturb the intermediate feature maps with several attack objectives (i.e., discriminator and classifier) on each individual position for both domains, and then the classifier is trained to be invariant to the perturbations. By perturbing each position individually, our model treats each location evenly regardless of the category or object size and thus circumvents the aforementioned issue. Moreover, the domain gap in feature space is reduced by extrapolating source and target perturbed features towards each other with attacks on the domain discriminator. Our approach achieves state-of-the-art performance on two challenging domain adaptation tasks for semantic segmentation: GTA5 → Cityscapes and SYNTHIA → Cityscapes.
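A single pointwise perturbation against the classifier objective can be sketched for a linear softmax classifier, for which the gradient of the cross-entropy loss with respect to a feature vector is `W^T (p - y)`. Each position's feature vector is moved a distance `eps` along its own L2-normalized gradient, so every location receives the same budget regardless of class or object size. The sketch omits the discriminator attack and the subsequent training; the classifier form, `eps`, and the helper names are assumptions, and on target positions the label would have to be a pseudo-label such as the predicted class.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_of(weights, bias, feature):
    """Linear classifier: one logit per class, weights[k] . feature + bias[k]."""
    return [sum(w * f for w, f in zip(row, feature)) + b for row, b in zip(weights, bias)]


def cross_entropy(weights, bias, feature, label):
    return -math.log(softmax(logits_of(weights, bias, feature))[label])


def perturb_feature(weights, bias, feature, label, eps):
    """Move one feature vector by eps along the L2-normalised gradient of the classifier loss."""
    probs = softmax(logits_of(weights, bias, feature))
    # For softmax + cross-entropy, d(loss)/d(feature) = W^T (p - onehot(label)).
    grad = [
        sum(weights[k][c] * (probs[k] - (1.0 if k == label else 0.0)) for k in range(len(weights)))
        for c in range(len(feature))
    ]
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(feature)
    return [f + eps * g / norm for f, g in zip(feature, grad)]


def perturb_feature_map(weights, bias, features, labels, eps):
    """Pointwise attack: every position is perturbed independently with the same budget eps."""
    return [perturb_feature(weights, bias, f, y, eps) for f, y in zip(features, labels)]


if __name__ == "__main__":
    W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    f = [1.0, 0.5]
    f_adv = perturb_feature(W, b, f, label=0, eps=0.1)
    print(cross_entropy(W, b, f, 0), cross_entropy(W, b, f_adv, 0))  # the loss increases
```

Training the classifier to predict the same class on both the clean and the perturbed features is what makes it invariant to such perturbations.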
