Object-Level Semantic Map Construction for Dynamic Scenes

2021 ◽  
Vol 11 (2) ◽  
pp. 645
Author(s):  
Xujie Kang ◽  
Jing Li ◽  
Xiangtao Fan ◽  
Hongdeng Jian ◽  
Chen Xu

Visual simultaneous localization and mapping (SLAM) is challenging in dynamic environments, as moving objects can impair camera pose tracking and mapping. This paper introduces a method for robust dense object-level SLAM in dynamic environments that takes a live stream of RGB-D frames as input, detects moving objects, and segments the scene into distinct objects while simultaneously tracking and reconstructing their 3D structures. The approach provides a new method of dynamic object detection that integrates prior knowledge from a pre-constructed object model database, object-oriented 3D tracking against the camera pose, and the association between instance segmentation results on the current frame and the object database to identify dynamic objects in the current frame. By leveraging the 3D static model for frame-to-model alignment, together with dynamic object culling, camera motion estimation reduces the overall drift. Based on the estimated camera poses and the instance segmentation results, an object-level semantic map representation is constructed for the world map. Experimental results on the TUM RGB-D dataset, comparing the proposed method to related state-of-the-art approaches, demonstrate that our method achieves similar performance in static scenes and improved accuracy and robustness in dynamic scenes.
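The dynamic object culling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the point format, the use of a plain Euclidean residual, and the 5 cm threshold are all assumptions.

```python
import numpy as np

def cull_dynamic_points(model_pts, obs_pts, T_cam, thresh=0.05):
    """Flag observed 3D points whose position disagrees with the static model.

    model_pts : (N, 3) matched points in the world/model frame
    obs_pts   : (N, 3) the same points observed in the current camera frame
    T_cam     : (4, 4) world-to-camera transform from the previous estimate
    thresh    : residual (metres) above which a point is treated as dynamic
    Returns a boolean mask, True for points kept as static.
    """
    homo = np.hstack([model_pts, np.ones((len(model_pts), 1))])
    predicted = (T_cam @ homo.T).T[:, :3]   # where static points should appear
    residual = np.linalg.norm(predicted - obs_pts, axis=1)
    return residual < thresh                # moving-object points exceed thresh

# Static points agree with the model; a displaced point is culled.
model = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 2.0]])
obs = model.copy()
obs[1] += [0.5, 0.0, 0.0]                   # simulate a moving object
mask = cull_dynamic_points(model, obs, np.eye(4))
```

Only the points surviving this mask would then feed the frame-to-model alignment, so the pose estimate is not dragged by the mover.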

Robotica ◽  
2019 ◽  
Vol 38 (2) ◽  
pp. 256-270 ◽  
Author(s):  
Jiyu Cheng ◽  
Yuxiang Sun ◽  
Max Q.-H. Meng

Visual simultaneous localization and mapping (visual SLAM) has developed considerably in recent decades. To facilitate tasks such as path planning and exploration, traditional visual SLAM systems provide mobile robots with a geometric map, which overlooks semantic information. To address this problem, inspired by the recent success of deep neural networks, we combine them with a visual SLAM system to conduct semantic mapping. Both geometric and semantic information are projected into 3D space to generate a 3D semantic map. We also use an optical-flow-based method to deal with moving objects, so that our method works robustly in dynamic environments. We performed experiments on the public TUM dataset and on our own recorded office dataset. Experimental results demonstrate the feasibility and strong performance of the proposed method.
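One common way to realize an optical-flow-based moving-object check like the one mentioned above is to treat features whose flow deviates strongly from the dominant (ego-motion) flow as dynamic. The sketch below assumes a simple median/MAD-style outlier test; the paper's actual criterion may differ.

```python
import numpy as np

def flow_outlier_mask(flow_vecs, k=3.0):
    """Mark feature tracks whose optical flow deviates from the dominant motion.

    flow_vecs : (N, 2) per-feature flow vectors between consecutive frames.
    Features far from the median flow (camera ego-motion) are treated as
    belonging to moving objects.
    """
    median = np.median(flow_vecs, axis=0)
    dev = np.linalg.norm(flow_vecs - median, axis=1)
    scale = np.median(dev) + 1e-6            # robust spread estimate
    return dev > k * scale                   # True = likely moving object

# Nine features follow the camera; the tenth sits on a moving object.
flows = np.tile([1.0, 0.0], (10, 1))
offsets = 0.01 * np.array([[1, 0], [0, 1], [-1, 0], [0, -1],
                           [1, 1], [-1, -1], [1, -1], [-1, 1], [0, 0]])
flows[:9] += offsets
flows[9] = [8.0, 5.0]
moving = flow_outlier_mask(flows)
```

Features flagged as moving would be excluded both from pose estimation and from the projected semantic map.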


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Chongben Tao ◽  
Yufeng Jin ◽  
Feng Cao ◽  
Zufeng Zhang ◽  
Chunguang Li ◽  
...  

Existing Visual SLAM (VSLAM) algorithms suffer from low localization accuracy and low label classification accuracy when constructing semantic maps of indoor environments with sparse feature points. This paper proposes a 3D semantic VSLAM algorithm called BMASK-RCNN, based on Mask Scoring RCNN. First, feature points are extracted from images by the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm. Second, map points of the reference keyframe are projected onto the current frame for feature matching and pose estimation, and an inverse depth filter estimates the scene depth of each newly created keyframe to obtain camera pose changes. To achieve object detection and semantic segmentation for both static and dynamic objects in indoor environments, and then construct a dense 3D semantic map with the VSLAM algorithm, a Mask Scoring RCNN with a partially adjusted structure is used, trained via transfer learning on a TUM RGB-D SLAM dataset. The semantic information of independent targets in the scene, including object categories, not only supports high localization accuracy but also enables probabilistic updates of the semantic estimates by marking movable objects, thereby reducing the impact of moving objects on real-time mapping. Simulations and real-world comparisons against three other algorithms show that the proposed algorithm is more robust and that the semantic information used in 3D semantic mapping can be obtained accurately.
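The inverse depth filter mentioned above typically maintains a Gaussian estimate per feature and fuses each new triangulated observation into it. The sketch below shows one such Gaussian-fusion update under that assumption; the specific variances are illustrative, not from the paper.

```python
import numpy as np

def fuse_inverse_depth(mu, sigma2, obs, obs_sigma2):
    """One Gaussian-fusion update of an inverse-depth filter.

    mu, sigma2      : current inverse-depth estimate and its variance
    obs, obs_sigma2 : new inverse-depth observation (e.g. from triangulation)
    Returns the fused estimate and its reduced variance.
    """
    new_sigma2 = (sigma2 * obs_sigma2) / (sigma2 + obs_sigma2)
    new_mu = new_sigma2 * (mu / sigma2 + obs / obs_sigma2)
    return new_mu, new_sigma2

# Refine a rough prior with one observation of a point at depth 2 m
# (inverse depth 0.5); the variance shrinks with every fusion step.
mu, var = 0.6, 0.04
mu, var = fuse_inverse_depth(mu, var, 0.5, 0.04)
```

Once the variance drops below a convergence threshold, the feature's depth is considered reliable and can be inserted into the map.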


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3714 ◽  
Author(s):  
Guihua Liu ◽  
Weilin Zeng ◽  
Bo Feng ◽  
Feng Xu

Although many impressive SLAM systems have achieved exceptional accuracy in real environments, most of them are verified only in static environments. For mobile robots and autonomous driving, however, dynamic objects in the scene can cause tracking failure or large deviations in pose estimation. In this paper, a general visual SLAM system for dynamic scenes with multiple sensors, called DMS-SLAM, is proposed. First, the combination of GMS feature matching and a sliding window is used to initialize the system, which eliminates the influence of dynamic objects and constructs a static initial 3D map. Then, the corresponding 3D points of the current frame in the local map are obtained by reprojection. These points are combined with a constant-velocity model or a reference-frame model to estimate the pose of the current frame and update the 3D map points in the local map. Finally, the keyframes selected by the tracking module are combined with the GMS feature matching algorithm to add static 3D map points to the local map. DMS-SLAM performs pose tracking, loop closure detection and relocalization based on the static 3D map points of the local map, and supports monocular, stereo and RGB-D sensors in dynamic scenes. Exhaustive evaluation on the public TUM and KITTI datasets demonstrates that DMS-SLAM outperforms state-of-the-art visual SLAM systems in both accuracy and speed in dynamic scenes.
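The constant-velocity model used for pose prediction above has a compact form on 4x4 homogeneous transforms: reapply the last inter-frame motion. A minimal sketch (not DMS-SLAM's code):

```python
import numpy as np

def predict_pose(T_prev, T_curr):
    """Constant-velocity motion model: apply the last inter-frame motion again.

    T_prev, T_curr : (4, 4) camera poses of the two most recent frames.
    Returns the predicted (4, 4) pose of the next frame.
    """
    velocity = np.linalg.inv(T_prev) @ T_curr   # relative motion of last step
    return T_curr @ velocity                    # extrapolate one step forward

# A camera translating 0.1 m along x per frame keeps doing so.
def trans_x(x):
    T = np.eye(4)
    T[0, 3] = x
    return T

T_pred = predict_pose(trans_x(0.0), trans_x(0.1))
```

The predicted pose seeds the reprojection search for map-point correspondences; if too few matches are found, the system falls back to the reference-frame model.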


2020 ◽  
Author(s):  
Guoliang Liu

Visual simultaneous localization and mapping (SLAM) is the core of intelligent robot navigation systems. Many traditional SLAM algorithms assume that the scene is static. When a dynamic object appears in the environment, the accuracy of visual SLAM can degrade due to interference from the dynamic features of moving objects. This strong assumption limits SLAM applications for service robots or driverless cars in real dynamic environments. In this paper, a dynamic object removal algorithm that combines object recognition and optical flow techniques is proposed within a visual SLAM framework for dynamic scenes. Experimental results show that the new method can detect moving objects effectively and improves SLAM performance compared with state-of-the-art methods.
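Combining object recognition with optical flow, as proposed above, is usually done so that features are removed only when both cues agree: the feature lies on a recognizably movable class *and* its flow disagrees with the ego-motion. The sketch below illustrates that combination; the class list, threshold, and ego-flow input are assumptions, not the paper's specifics.

```python
import numpy as np

MOVABLE = {"person", "car", "dog"}          # classes assumed potentially movable

def dynamic_feature_mask(labels, flow_vecs, ego_flow, thresh=1.0):
    """Combine object recognition with an optical-flow consistency check.

    A feature is discarded only if it lies on a recognized movable object AND
    its flow disagrees with the expected ego-motion flow, so parked cars or
    standing people are kept.
    labels    : list of N class names, one per feature
    flow_vecs : (N, 2) measured flow; ego_flow : (2,) expected flow
    """
    semantic = np.array([c in MOVABLE for c in labels])
    residual = np.linalg.norm(flow_vecs - ego_flow, axis=1)
    return semantic & (residual > thresh)   # True = discard feature

labels = ["wall", "car", "car"]             # second car is parked, third drives
flows = np.array([[1.0, 0.0], [1.1, 0.0], [6.0, 2.0]])
drop = dynamic_feature_mask(labels, flows, ego_flow=np.array([1.0, 0.0]))
```

Requiring both cues avoids the over-culling that a purely semantic mask causes in scenes full of stationary "movable" objects.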



Author(s):  
SHENGPING ZHANG ◽  
HONGXUN YAO ◽  
SHAOHUI LIU

Traditional background subtraction methods perform poorly when scenes contain dynamic backgrounds such as waving tree branches, spouting fountains, illumination changes, and camera jitter. In this paper, from the view of spatial context, we present a novel and effective dynamic background subtraction method with three contributions. First, we present a novel local dependency descriptor, called the local dependency histogram (LDH), to effectively model the spatial dependencies between a pixel and its neighboring pixels. These spatial dependencies contain substantial evidence for differentiating dynamic background regions from moving objects of interest. Second, based on the proposed LDH, an effective approach to dynamic background subtraction is proposed, in which each pixel is modeled as a group of weighted LDHs. A pixel is labeled as foreground or background by comparing the LDH computed in the current frame against its model LDHs, and the model LDHs are adaptively updated by the current LDH. Finally, unlike traditional approaches that use a fixed threshold to judge whether a pixel matches its model, an adaptive thresholding technique is proposed. Experimental results on a diverse set of dynamic scenes validate that the proposed method significantly outperforms traditional methods for dynamic background subtraction.
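The pixel-model comparison described above can be illustrated with a heavily simplified stand-in for the LDH: a histogram of centre-to-neighbour intensity differences, matched by histogram intersection. The paper's actual descriptor quantization, weighting, and adaptive threshold are more elaborate; everything below is an assumption for illustration only.

```python
import numpy as np

def local_dependency_histogram(patch, bins=8):
    """A simplified LDH stand-in: histogram of intensity differences between
    the centre pixel and its 8 neighbours (the paper's exact quantization
    differs -- this is illustrative only).
    patch : (3, 3) grayscale patch around the pixel.
    """
    centre = patch[1, 1]
    diffs = np.delete(patch.ravel(), 4) - centre      # 8 neighbour differences
    hist, _ = np.histogram(diffs, bins=bins, range=(-255, 255))
    return hist / hist.sum()                          # normalized histogram

def matches_model(hist, model_hist, thresh=0.7):
    """Histogram intersection: high overlap => pixel matches its model."""
    return np.minimum(hist, model_hist).sum() >= thresh

flat = np.full((3, 3), 30.0)                          # static background patch
pattern = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
textured = flat + 200.0 * pattern                     # foreground-like patch
h_bg = local_dependency_histogram(flat)
h_fg = local_dependency_histogram(textured)
```

In the full method, each pixel keeps several weighted model LDHs and the best-matching one is updated; a non-matching current LDH marks the pixel as foreground.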


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 230
Author(s):  
Xiangwei Dang ◽  
Zheng Rong ◽  
Xingdong Liang

Accurate localization and reliable mapping are essential for the autonomous navigation of robots. As one of the core technologies for autonomous navigation, Simultaneous Localization and Mapping (SLAM) has attracted widespread attention in recent decades. Based on vision or LiDAR sensors, great efforts have been devoted to achieving real-time SLAM that can support a robot’s state estimation. However, most mature SLAM methods work under the assumption that the environment is static, while in dynamic environments they yield degraded performance or even fail. In this paper, we first quantitatively evaluate the performance of state-of-the-art LiDAR-based SLAM methods, taking into account different patterns of moving objects in the environment. Through semi-physical simulation, we observed that the shape, size, and distribution of moving objects can all significantly impact SLAM performance, and we obtained instructive results from a quantitative comparison between LOAM and LeGO-LOAM. Second, based on this investigation, a novel approach named EMO for eliminating moving objects in SLAM by fusing LiDAR and mmW-radar is proposed, aimed at improving the accuracy and robustness of state estimation. The method fully exploits the complementary characteristics of the two sensors to fuse their information at two different resolutions. Moving objects are efficiently detected by radar based on the Doppler effect, accurately segmented and localized by LiDAR, and then filtered out of the point clouds through data association, with accurate synchronization in time and space. Finally, the point clouds representing the static environment are used as the input to SLAM. The proposed approach is evaluated through experiments using both semi-physical simulation and real-world datasets.
The results demonstrate the effectiveness of the method at improving SLAM performance in accuracy (at least a 30% decrease in absolute position error) and robustness in dynamic environments.
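The radar-LiDAR association step described above can be sketched as a simple gating operation: LiDAR points falling near a radar detection with significant Doppler speed are dropped. This is a minimal illustration assuming already-synchronized sensor frames and a fixed gating radius; EMO's actual segmentation is more refined.

```python
import numpy as np

def filter_moving_points(lidar_pts, radar_dets, doppler_thresh=0.5, radius=1.5):
    """Remove LiDAR points near radar detections with significant Doppler speed.

    lidar_pts  : (N, 3) LiDAR point cloud (sensor frame, synchronized)
    radar_dets : (M, 4) radar detections as [x, y, z, radial_speed]
    Points within `radius` metres of a moving radar detection are dropped;
    detections below `doppler_thresh` m/s are treated as static clutter.
    """
    keep = np.ones(len(lidar_pts), dtype=bool)
    for x, y, z, v in radar_dets:
        if abs(v) < doppler_thresh:          # static detection: keep its points
            continue
        dist = np.linalg.norm(lidar_pts - np.array([x, y, z]), axis=1)
        keep &= dist > radius                # cull points on the moving target
    return lidar_pts[keep]

pts = np.array([[10.0, 0.0, 0.0], [10.2, 0.1, 0.0], [3.0, 5.0, 0.0]])
dets = np.array([[10.0, 0.0, 0.0, 4.0]])     # one moving target at (10, 0, 0)
static_pts = filter_moving_points(pts, dets)
```

The surviving static cloud is what feeds the downstream SLAM front end, which is where the reported accuracy gain comes from.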


2020 ◽  
Vol 17 (4) ◽  
pp. 172988142094727
Author(s):  
Wenlong Zhang ◽  
Xiaoliang Sun ◽  
Qifeng Yu

Due to cluttered background motion, accurate moving object segmentation in unconstrained videos remains a significant open problem, especially for slow-moving objects. This article proposes an accurate moving object segmentation method based on robust seed selection. Seed pixels of the object and the background are selected robustly using optical flow cues. First, the method detects the moving object’s rough contour according to local differences in the weighted orientation cues of the optical flow. Then, the detected rough contour guides object and background seed pixel selection. The object seed pixels in the previous frame are propagated to the current frame according to the optical flow, improving the robustness of the seed selection. Finally, we adopt the random walker algorithm to segment the moving object accurately according to the selected seed pixels. Experiments on publicly available datasets indicate that the proposed method segments moving objects accurately in unconstrained videos.
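The seed propagation step above has a direct implementation: move each previous-frame seed along its flow vector and discard seeds that leave the image. A minimal sketch under the assumption of dense per-pixel flow in (row, col) order:

```python
import numpy as np

def propagate_seeds(seeds, flow, shape):
    """Propagate object seed pixels to the next frame along the optical flow.

    seeds : (N, 2) integer (row, col) seed coordinates in the previous frame
    flow  : (H, W, 2) per-pixel flow, flow[r, c] = (dr, dc)
    shape : (H, W) image size; propagated seeds falling outside are dropped.
    """
    moved = seeds + flow[seeds[:, 0], seeds[:, 1]]
    moved = np.rint(moved).astype(int)
    inside = ((moved >= 0) & (moved < np.array(shape))).all(axis=1)
    return moved[inside]

H, W = 4, 4
flow = np.zeros((H, W, 2))
flow[..., 1] = 1.0                      # everything shifts one column right
seeds = np.array([[1, 1], [2, 3]])      # second seed will exit the frame
new_seeds = propagate_seeds(seeds, flow, (H, W))
```

The propagated object seeds, together with background seeds taken outside the rough contour, are then handed to the random walker solver for the final segmentation.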

