Visual Map Construction Using RGB-D Sensors for Image-Based Localization in Indoor Environments

2017, Vol. 2017, pp. 1-18
Author(s): Guanyuan Feng, Lin Ma, Xuezhi Tan

RGB-D sensors capture RGB images and depth images simultaneously, which makes it possible to acquire depth information at the pixel level. This paper focuses on the use of RGB-D sensors to construct a visual map, an extended dense 3D map containing the essential elements for image-based localization: the poses of the database camera, visual features, and the 3D structure of the building. Taking advantage of matched visual features and their corresponding depth values, a novel local optimization algorithm is proposed to achieve point cloud registration and database camera pose estimation. Next, graph-based optimization is used to obtain global consistency of the map. On the basis of the visual map, an image-based localization method that makes use of the epipolar constraint is investigated. The performance of both visual map construction and image-based localization is evaluated on typical indoor scenes. The simulation results show that the average position errors of the database camera and the query camera can be limited to within 0.2 m and 0.9 m, respectively.
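
As a rough sketch of the epipolar-constraint step in such a pipeline (not the authors' implementation), the snippet below recovers the relative pose of a query camera with respect to a database image from matched keypoints using OpenCV; the intrinsic matrix and the synthetic correspondences are illustrative assumptions.

```python
# Hypothetical sketch of epipolar-constraint localization (OpenCV).
import cv2
import numpy as np

K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])  # assumed RGB-D intrinsics

def relative_pose(pts_db, pts_query, K):
    """Pose of the query camera w.r.t. a database image from 2D matches."""
    E, _ = cv2.findEssentialMat(pts_db, pts_query, K,
                                method=cv2.RANSAC, threshold=1.0)
    # recoverPose decomposes E and keeps the (R, t) with positive depths
    _, R, t, _ = cv2.recoverPose(E, pts_db, pts_query, K)
    return R, t  # t is up to scale; depth values from the map can fix it

# toy demo with synthetic, noise-free correspondences
rng = np.random.default_rng(0)
pts3d = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 6.0], size=(50, 3))
R_true, _ = cv2.Rodrigues(np.array([0.0, 0.1, 0.0]))
t_true = np.array([[0.2], [0.0], [0.0]])

def project(P):  # pinhole projection of Nx3 points
    return (K @ P.T / P.T[2]).T[:, :2]

pts_db = project(pts3d)
pts_query = project((R_true @ pts3d.T + t_true).T)
R, t = relative_pose(pts_db, pts_query, K)
```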

Sensors, 2021, Vol. 21 (4), pp. 1356
Author(s): Linda Christin Büker, Finnja Zuber, Andreas Hein, Sebastian Fudickar

With approaches for detecting joint positions in color images, such as HRNet and OpenPose, readily available, corresponding approaches for depth images have received limited consideration, even though depth images offer several advantages over color images, such as robustness to lighting variation and invariance to color and texture. We therefore introduce High-Resolution Depth Net (HRDepthNet), a machine-learning-driven approach to detect human joints (body, head, and upper and lower extremities) in depth images alone. HRDepthNet retrains the original HRNet on depth images. To this end, a dataset was created containing depth (and RGB) images of subjects conducting the timed up and go test, an established geriatric assessment; the joint positions were manually annotated on the RGB images. Training and evaluation were conducted with this dataset. For accuracy evaluation, the detection of body joints was assessed via COCO's evaluation metrics, which indicated that the resulting depth-image-based model achieved better results than HRNet trained and applied on the corresponding RGB images. An additional evaluation of the position errors showed median deviations of 1.619 cm (x-axis), 2.342 cm (y-axis), and 2.4 cm (z-axis).
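
A minimal sketch of the retraining idea, assuming a PyTorch workflow: swap the first convolution of an RGB backbone for a single-channel one so the network accepts depth images. A torchvision ResNet stands in for HRNet here, purely for illustration.

```python
# Illustrative only: adapting an RGB pose backbone to depth input.
import torch
import torch.nn as nn
from torchvision.models import resnet18  # stand-in for HRNet (assumption)

model = resnet18(weights=None)  # in practice, start from pretrained weights
old = model.conv1
# Replace the 3-channel stem with a 1-channel one for depth maps.
model.conv1 = nn.Conv2d(1, old.out_channels, kernel_size=old.kernel_size,
                        stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    # common trick: average the RGB filters so pretrained features carry over
    model.conv1.weight.copy_(old.weight.mean(dim=1, keepdim=True))

depth_batch = torch.randn(2, 1, 256, 256)  # normalised depth images
out = model(depth_batch)  # fine-tune on manually annotated joints from here
```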


Author(s): Juan Andrade-Cetto, Alberto Sanfeliu

A system that builds and maintains a dynamic map for a mobile robot is presented. A learning rule associated with each observed landmark is used to compute its robustness. The position of the robot during map construction is estimated by combining sensor readings, motion commands, and the current map state by means of an Extended Kalman Filter. The combination of landmark strength validation and Kalman filtering for map updating and robot position estimation allows for robust learning of moderately dynamic indoor environments.
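
The combination of motion commands, sensor readings, and map state described here is the classic EKF localization loop; the sketch below is a minimal, generic version (a unicycle motion model and range-bearing landmark observations are assumptions, not the authors' exact formulation).

```python
# Minimal EKF sketch for pose tracking with point landmarks (illustrative).
# State: robot pose (x, y, heading).
import numpy as np

def predict(mu, Sigma, u, dt, Q):
    """Motion update from commands u = (v, w), assuming a unicycle model."""
    x, y, th = mu
    v, w = u
    mu = np.array([x + v * dt * np.cos(th), y + v * dt * np.sin(th), th + w * dt])
    F = np.array([[1, 0, -v * dt * np.sin(th)],
                  [0, 1,  v * dt * np.cos(th)],
                  [0, 0,  1]])  # Jacobian of the motion model
    return mu, F @ Sigma @ F.T + Q

def update(mu, Sigma, z, landmark, R):
    """Correct the pose with a range-bearing observation of a known landmark."""
    dx, dy = landmark - mu[:2]
    q = dx ** 2 + dy ** 2
    z_hat = np.array([np.sqrt(q), np.arctan2(dy, dx) - mu[2]])
    H = np.array([[-dx / np.sqrt(q), -dy / np.sqrt(q), 0],
                  [dy / q,           -dx / q,         -1]])
    S = H @ Sigma @ H.T + R
    K = Sigma @ H.T @ np.linalg.inv(S)  # Kalman gain
    innov = z - z_hat
    innov[1] = (innov[1] + np.pi) % (2 * np.pi) - np.pi  # wrap bearing
    return mu + K @ innov, (np.eye(3) - K @ H) @ Sigma
```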


2018, Vol. 2018, pp. 1-13
Author(s): Hasan Mahmud, Md. Kamrul Hasan, Abdullah-Al-Tariq, Md. Hasanul Kabir, M. A. Mottalib

Symbolic gestures are static hand postures with conventionalized meanings. They can be performed without voice, even in very complex environments containing variations in rotation and scale, and may be produced under different illumination conditions or against occluding backgrounds. Any hand gesture recognition system should therefore find sufficiently discriminative features, such as hand-finger contextual information. In existing approaches, however, the depth information of the fingers, which represents finger shape, is utilized only to a limited extent for extracting discriminative features. Yet if finger-bending information (i.e., a finger overlapping the palm), extracted from the depth map, is used as a local feature, static gestures that vary only slightly become distinguishable. Our work corroborates this idea: we generate depth silhouettes with enhanced contrast to obtain more discriminative keypoints, which in turn improves the recognition accuracy up to 96.84%. We apply the Scale-Invariant Feature Transform (SIFT) algorithm, which takes the generated depth silhouettes as input and produces robust feature descriptors as output. These features, after conversion into fixed-length feature vectors, are fed into a multiclass Support Vector Machine (SVM) classifier to measure the accuracy. We tested our approach on a standard dataset containing 10 symbolic gestures representing the 10 numeric symbols (0-9), and then verified and compared our results across depth images, binary images, and images consisting of the hand-finger edge information generated from the same dataset. Our results show higher accuracy when applying SIFT features to depth images. Accurately recognizing numeric symbols performed through hand gestures has a large impact on various Human-Computer Interaction (HCI) applications, including augmented reality and virtual reality.
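
A hedged sketch of such a SIFT-plus-SVM pipeline, with a bag-of-visual-words step assumed for producing the fixed-length vectors; the details below are illustrative, not the authors' exact implementation.

```python
# Illustrative SIFT -> bag-of-words -> SVM pipeline for gesture silhouettes.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

sift = cv2.SIFT_create()

def descriptors(depth_silhouette):
    """SIFT keypoints/descriptors from an 8-bit depth silhouette image."""
    _, desc = sift.detectAndCompute(depth_silhouette, None)
    return desc if desc is not None else np.zeros((0, 128), np.float32)

def bow_vector(desc, vocab):
    """Histogram of visual words -> fixed-length feature vector."""
    hist = np.zeros(vocab.n_clusters, np.float32)
    if len(desc):
        for w in vocab.predict(desc):
            hist[w] += 1
        hist /= hist.sum()
    return hist

def train(train_imgs, train_labels, k=100):
    """train_imgs: list of silhouettes; train_labels: digits 0-9."""
    all_desc = np.vstack([descriptors(im) for im in train_imgs])
    vocab = KMeans(n_clusters=k, n_init=10).fit(all_desc)  # visual vocabulary
    X = np.array([bow_vector(descriptors(im), vocab) for im in train_imgs])
    clf = SVC(kernel="rbf").fit(X, train_labels)  # multiclass SVM
    return vocab, clf
```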


Author(s): Shengjun Tang, Qing Zhu, Wu Chen, Walid Darwish, Bo Wu, ...

RGB-D sensors are novel sensing systems that capture RGB images along with pixel-wise depth information. Although they are widely used in various applications, RGB-D sensors have significant drawbacks with respect to 3D dense mapping of indoor environments. First, they offer only a limited measurement range (e.g., within 3 m) and a limited field of view. Second, the error of the depth measurement increases with distance from the sensor. In this paper, we propose an enhanced RGB-D mapping method for detailed 3D modeling of large indoor environments that combines RGB image-based modeling and depth-based modeling. The scale-ambiguity problem in pose estimation from RGB image sequences is resolved by integrating the depth and visual information provided by the proposed system. A robust rigid-transformation recovery method is developed to register the RGB image-based and depth-based 3D models together. The proposed method is examined with two datasets collected in indoor environments; the experimental results demonstrate its feasibility and robustness.
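
The scale resolution can be pictured as aligning the arbitrary-scale depths from RGB structure-from-motion with the metric depths from the sensor. Below is a minimal sketch, assuming a simple per-point depth ratio with a robust median; it is not the paper's actual algorithm.

```python
# Sketch of the scale-recovery idea: fix the monocular scale ambiguity by
# comparing up-to-scale SfM depths with metric depths from the RGB-D sensor.
import numpy as np

def recover_scale(sfm_depths, sensor_depths, max_range=3.0):
    """Estimate a single metric scale factor from per-point depth pairs.

    sfm_depths:    depths of triangulated points (arbitrary scale)
    sensor_depths: metric depths of the same points from the depth image
    """
    valid = (sensor_depths > 0) & (sensor_depths < max_range)  # sensor limit
    ratios = sensor_depths[valid] / sfm_depths[valid]
    return np.median(ratios)  # median is robust to outlier matches
```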


Electronics, 2019, Vol. 8 (2), pp. 220
Author(s): Ruibin Guo, Keju Peng, Dongxiang Zhou, Yunhui Liu

Orientation estimation is a crucial part of robotics tasks such as motion control, autonomous navigation, and 3D mapping. In this paper, we propose a robust vision-based method to estimate a robot's drift-free orientation with RGB-D cameras. First, we detect and track hybrid features (i.e., planes, lines, and points) from color and depth images, which provides reliable constraints even in challenging environments with low texture or no consistent lines. Then, we construct a cost function based on these features and, by minimizing it, obtain an accurate rotation matrix for each captured frame with respect to its reference keyframe. Furthermore, we present a vanishing-direction estimation method to extract the Manhattan World (MW) axes; by aligning the current MW axes with the global MW axes, we refine the aforementioned rotation matrix of each keyframe and achieve drift-free orientation. Experiments on public RGB-D datasets demonstrate the robustness and accuracy of the proposed algorithm for orientation estimation. In addition, we have applied the proposed visual compass to pose estimation, and the evaluation on public sequences shows improved accuracy.
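
The MW-axis alignment step can be sketched as an orthogonal Procrustes problem; the formulation below is an assumption for illustration, not the paper's exact derivation.

```python
# Illustrative sketch: align detected Manhattan World axes with the global
# MW axes via an SVD, then refine the keyframe rotation with the result.
import numpy as np

def align_axes(current_axes, global_axes):
    """Rotation mapping the current MW axes onto the global MW axes.
    Both inputs are 3x3 matrices whose columns are unit axis directions."""
    U, _, Vt = np.linalg.svd(global_axes @ current_axes.T)
    D = np.diag([1, 1, np.sign(np.linalg.det(U @ Vt))])  # keep det = +1
    return U @ D @ Vt

def refine_rotation(R_keyframe, current_axes, global_axes):
    """Apply the drift-correcting rotation to a keyframe's rotation matrix."""
    return align_axes(current_axes, global_axes) @ R_keyframe
```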


Sensors, 2019, Vol. 19 (7), pp. 1648
Author(s): Ester Martinez-Martin, Angel del Pobil

Advances in Robotics are leading to a new generation of assistant robots working in ordinary, domestic settings. This evolution raises new challenges in the tasks to be accomplished by the robots. This is the case for object manipulation, where the detect-approach-grasp loop requires a robust recovery stage, especially when the held object slides. Several proprioceptive sensors, such as tactile sensors and contact switches, have been developed in recent decades and could be used for that purpose; nevertheless, their integration may considerably restrict the gripper's flexibility and functionality while increasing its cost and complexity. Alternatively, vision, and in particular depth sensing, can be used, since it is an undoubtedly rich source of information. We present an approach based on depth cameras that robustly evaluates manipulation success, continuously reporting any object loss and consequently allowing the robot to recover from this situation. First, Lab-colour segmentation allows the robot to identify potential manipulators in the image. Then, the depth information is used to detect any edge resulting from contact between two objects. The combination of these techniques allows the robot to accurately detect the presence or absence of contact points between the robot manipulator and a held object. An experimental evaluation in realistic indoor environments supports our approach.
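
A rough sketch of the two stages follows, with illustrative Lab thresholds and an assumed depth-jump criterion; the paper's actual parameters are not reproduced here.

```python
# Hedged sketch: Lab-colour segmentation of the manipulator, then depth
# discontinuities inside that region as evidence of gripper/object contact.
import cv2
import numpy as np

def manipulator_mask(bgr, lab_lo=(0, 110, 110), lab_hi=(255, 145, 145)):
    """Rough Lab-space segmentation of the manipulator (thresholds assumed)."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2Lab)
    return cv2.inRange(lab, np.array(lab_lo), np.array(lab_hi))

def contact_edges(depth_m, mask, jump=0.02):
    """Depth discontinuities (> `jump` metres) inside the manipulator region
    hint at a contact boundary between gripper and held object."""
    gx = np.abs(np.diff(depth_m, axis=1, prepend=depth_m[:, :1]))
    gy = np.abs(np.diff(depth_m, axis=0, prepend=depth_m[:1, :]))
    edges = ((gx > jump) | (gy > jump)).astype(np.uint8) * 255
    return cv2.bitwise_and(edges, edges, mask=mask)

def object_held(depth_m, bgr):
    """Heuristic: contact points present -> the object is still held."""
    return bool(contact_edges(depth_m, manipulator_mask(bgr)).any())
```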


2019, Vol. 16 (04), pp. 1941002
Author(s): Jing Li, Yang Mi, Gongfa Li, Zhaojie Ju

Facial expression recognition has been widely used in human-computer interaction (HCI) systems. Over the years, researchers have proposed different feature descriptors, implemented different classification methods, and carried out a number of experiments on various datasets for automatic facial expression recognition. However, most of this work used 2D static images or 2D video sequences. The main limitations of 2D-based analysis are the problems associated with variations in pose and illumination, which reduce recognition accuracy. An alternative is to incorporate depth information acquired by a 3D sensor, since it is invariant to both pose and illumination. In this paper, we present a two-stream convolutional neural network (CNN)-based facial expression recognition system and test it on our own RGB-D facial expression dataset, collected in unspontaneous scenarios with a Microsoft Kinect for XBOX, an inexpensive and portable device that captures both RGB and depth information. Our fully annotated dataset includes seven expressions (i.e., neutral, sadness, disgust, fear, happiness, anger, and surprise) for 15 subjects (9 males and 6 females) aged 20 to 25. The two individual CNNs are identical in architecture but do not share parameters. To combine the detection results produced by the two CNNs, we propose a late-fusion approach. The experimental results demonstrate that the proposed two-stream network using RGB-D images is superior to using only RGB images or only depth images.
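
A minimal PyTorch sketch of the late-fusion idea; the layer sizes and the score-averaging fusion rule are assumptions for illustration, since the paper only specifies that the two CNNs are identical in structure but unshared.

```python
# Sketch of a two-stream late-fusion classifier (architecture assumed).
import torch
import torch.nn as nn

class Stream(nn.Module):
    """One CNN stream; RGB and depth streams share structure, not weights."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x):
        return self.net(x)

class TwoStreamFER(nn.Module):
    def __init__(self, n_classes=7):  # seven expressions
        super().__init__()
        self.rgb, self.depth = Stream(3), Stream(1)  # no shared parameters
        self.head_rgb = nn.Linear(64, n_classes)
        self.head_depth = nn.Linear(64, n_classes)

    def forward(self, rgb, depth):
        # late fusion: average the per-stream class scores
        return (self.head_rgb(self.rgb(rgb)) +
                self.head_depth(self.depth(depth))) / 2

model = TwoStreamFER()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))
```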


2019, Vol. 11, pp. 175682931882232
Author(s): Navid Dorudian, Stanislao Lauria, Stephen Swift

A novel approach to detecting micro air vehicles in GPS-denied environments using an external RGB-D sensor is presented. The nonparametric background subtraction technique incorporates several innovative mechanisms and enables the detection of fast-moving micro air vehicles by combining colour and depth information. The proposed method stores several colour and depth images as models and then compares each pixel of a new frame with the stored models to classify the pixel as background or foreground. To adapt to scene changes, once a pixel is classified as background, the system updates the model by finding the stored pixel closest to the camera and substituting it with the current pixel. The background-model update presented uses different criteria from existing methods. Additionally, a blind update model is added to adapt to sudden background changes. The proposed architecture is compared with existing techniques using two different micro air vehicles and publicly available datasets; the results show some improvements over existing methods.
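
A simplified sketch of the per-pixel test and the "closest-sample" update described above; the sample counts and thresholds are placeholders, not the published values.

```python
# Illustrative sample-based background subtraction with a joint
# colour/depth test (parameters simplified; not the published method).
import numpy as np

N_SAMPLES, COLOR_TH, DEPTH_TH, MIN_MATCHES = 20, 20.0, 0.05, 2

def classify(pixel_color, pixel_depth, model_colors, model_depths):
    """Foreground if too few stored samples are close in colour AND depth.

    model_colors: (N_SAMPLES, 3) stored colours for this pixel
    model_depths: (N_SAMPLES,) stored depths in metres
    """
    close = ((np.abs(model_colors - pixel_color).sum(axis=-1) < COLOR_TH) &
             (np.abs(model_depths - pixel_depth) < DEPTH_TH))
    return close.sum() < MIN_MATCHES  # True -> foreground (possible MAV)

def update(pixel_color, pixel_depth, model_colors, model_depths):
    """Background pixel: replace the stored sample closest to the camera
    (smallest depth) with the current observation, as the text describes."""
    i = int(np.argmin(model_depths))
    model_colors[i], model_depths[i] = pixel_color, pixel_depth
```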


2020, Vol. 308, pp. 06003
Author(s): Jan-Christoph Schmitz, Stephan Tilgner, Kathrin Kalischewski, Daniel Wagner, Anton Kummert

This paper describes a system that automatically observes whether the driver's hands are on the steering wheel, which is important for knowing whether the driver can intervene if necessary. To accomplish this, an artificial neural network is used that operates on depth information captured by a camera in the roof module of the car, so that the driver and the steering wheel are viewed from above. The resulting classification system is described; it is designed to require as little computational effort as possible, since the target application runs on an embedded system in the car. A dataset is presented, and the effect of the class imbalance incorporated in it is studied. Furthermore, it is examined which part of the available data, i.e., the depth image or the intensity image, is important for achieving the best possible performance. Finally, a learning curve is examined to determine whether recording further training data would be worthwhile.
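
A minimal sketch of such a lightweight depth-image classifier, with illustrative layer sizes and a weighted loss as one common way to counter the class imbalance mentioned in the text; none of this reflects the paper's actual network.

```python
# Sketch of a tiny binary classifier over top-down depth images.
import torch
import torch.nn as nn

class HandsOnWheelNet(nn.Module):
    """Tiny CNN: depth image in, hands-on / hands-off score out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(16, 2)  # {hands off, hands on}

    def forward(self, depth):
        return self.classifier(self.features(depth))

model = HandsOnWheelNet()
scores = model(torch.randn(4, 1, 96, 96))  # batch of depth crops
# class imbalance (see text) can be countered with class weights (values assumed)
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([3.0, 1.0]))
```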

