Latent 3D Volume for Joint Depth Estimation and Semantic Segmentation from a Single Image

Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5765 ◽  
Author(s):  
Seiya Ito ◽  
Naoshi Kaneko ◽  
Kazuhiko Sumi

This paper proposes a novel 3D representation, namely, a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encoded an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, since the real world is three-dimensional, this 2D arrangement discards one dimension and may limit the capacity of the feature representation. In contrast, we examine the idea of arranging the feature vectors in 3D space rather than in a 2D plane. We refer to this 3D volumetric arrangement as a latent 3D volume. We show that the latent 3D volume is beneficial to the tasks of depth estimation and semantic segmentation because these tasks require an understanding of the 3D structure of the scene. Our network first constructs an initial 3D volume using image features and then generates the latent 3D volume by passing the initial 3D volume through several 3D convolutional layers. We perform depth regression and semantic segmentation by projecting the latent 3D volume onto a 2D plane. The evaluation results show that our method outperforms previous approaches on the NYU Depth v2 dataset.
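As a concrete illustration of the idea, the following PyTorch sketch lifts 2D image features into a voxel grid, refines them with 3D convolutions, and projects the volume back onto a 2D plane for the two heads. The layer widths, the toy encoder, and the 40-class output are our assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    class LatentVolumeNet(nn.Module):
        """Sketch: lift 2D image features to a 3D volume, refine with 3D
        convolutions, then flatten the depth axis and predict per-pixel
        depth and semantics."""
        def __init__(self, feat_ch=32, depth_bins=16, num_classes=40):
            super().__init__()
            self.encoder = nn.Sequential(            # toy 2D feature extractor
                nn.Conv2d(3, feat_ch * depth_bins, 3, padding=1), nn.ReLU())
            self.refine3d = nn.Sequential(           # produces the latent 3D volume
                nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
                nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
            self.depth_head = nn.Conv2d(feat_ch * depth_bins, 1, 1)
            self.seg_head = nn.Conv2d(feat_ch * depth_bins, num_classes, 1)
            self.feat_ch, self.depth_bins = feat_ch, depth_bins

        def forward(self, img):                      # img: (B, 3, H, W)
            b, _, h, w = img.shape
            feats = self.encoder(img)                # (B, C*D, H, W)
            vol = feats.view(b, self.feat_ch, self.depth_bins, h, w)
            vol = self.refine3d(vol)                 # latent 3D volume
            flat = vol.view(b, -1, h, w)             # project depth axis to 2D
            return self.depth_head(flat), self.seg_head(flat)

A full model would use a deeper encoder and an upsampling decoder, but the lifting, 3D refinement, and projection steps are the essence of the volumetric arrangement.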

Sensors ◽  
2019 ◽  
Vol 19 (3) ◽  
pp. 563 ◽  
Author(s):  
J. Osuna-Coutiño ◽  
Jose Martinez-Carranza

High-Level Structure (HLS) extraction in a set of images consists of recognizing 3D elements that carry useful information for the user or application. There are several approaches to HLS extraction. However, most of them process two or more images captured from different camera views, or process 3D data in the form of point clouds extracted from the camera images. In contrast, motivated by the extensive work on depth estimation from a single image, where parallax constraints are not required, we propose a novel methodology for HLS extraction from a single image, with promising results. Our method has four steps. First, we use a CNN to predict the depth for a single image. Second, we propose a region-wise analysis to refine the depth estimates. Third, we introduce a graph analysis to segment the depth into semantic orientations, aiming at identifying potential HLS. Finally, the depth sections are provided to a new CNN architecture that predicts HLS in the shape of cubes and rectangular parallelepipeds.
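As one hedged illustration of the second step, region-wise refinement can be as simple as snapping each region's depth values to the region median; the sketch below uses superpixels as a stand-in for the paper's region analysis.

    import numpy as np
    from skimage.segmentation import slic

    def refine_depth_regionwise(image, depth, n_segments=200):
        """Illustrative region-wise refinement: replace each region's depths
        with the region median. Superpixels and the median rule are our
        stand-ins, not the paper's exact analysis."""
        regions = slic(image, n_segments=n_segments, start_label=0)
        refined = depth.copy()
        for r in np.unique(regions):
            mask = regions == r
            refined[mask] = np.median(depth[mask])
        return refined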


Author(s):  
N. Zeller ◽  
C. A. Noury ◽  
F. Quint ◽  
C. Teulière ◽  
U. Stilla ◽  
...  

In this paper we present a new calibration approach for focused plenoptic cameras. We derive a new mathematical projection model of a focused plenoptic camera which considers lateral as well as depth distortion. To this end, we derive a new depth distortion model directly from the theory of depth estimation in a focused plenoptic camera. In total, the model comprises five intrinsic parameters, the parameters for radial and tangential distortion in the image plane, and two new depth distortion parameters. In the proposed calibration we perform a complete bundle adjustment based on a 3D calibration target. The residual of our optimization approach is three-dimensional; the depth residual is defined as a scaled version of the inverse virtual depth difference and thus conforms well to the measured data. Our method is evaluated on different camera setups and shows good accuracy. To better characterize our approach, we also evaluate the accuracy of virtual image points projected back to 3D space.
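The three-dimensional residual can be sketched as follows, with our own variable names: v denotes virtual depth and s the scale factor mentioned in the abstract.

    import numpy as np

    def residual_3d(xy_meas, xy_proj, v_meas, v_pred, s):
        """Sketch of the 3D residual described above: two lateral image-plane
        components plus a depth component defined as a scaled difference of
        inverse virtual depths (names and layout are our assumptions)."""
        lateral = np.asarray(xy_meas) - np.asarray(xy_proj)
        depth = s * (1.0 / v_meas - 1.0 / v_pred)
        return np.array([lateral[0], lateral[1], depth])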


2020 ◽  
Vol 12 (5) ◽  
Author(s):  
Zilong Li ◽  
Songming Hou ◽  
Thomas C. Bishop

The Magic Snake (Rubik's Snake) is a toy that was invented decades ago. It draws much less attention than Rubik's Cube, which was invented by the same professor, Erno Rubik. The number of configurations of a Magic Snake, determined by the number of discrete rotations about the elementary wedges in a typical snake, is far smaller than the number of possible configurations of a typical cube. However, a cube has only a single three-dimensional (3D) structure, while the number of sterically allowed 3D conformations of the snake is unknown. Here, we demonstrate how to represent a Magic Snake as a one-dimensional (1D) sequence that can be converted into a 3D structure. We then provide two strategies for designing Magic Snakes to have specified 3D structures. The first enables the folding of a Magic Snake onto any 3D space curve. The second introduces the idea of "embedding" to expand an existing Magic Snake into a longer, more complex, self-similar Magic Snake. Collectively, these ideas allow us to rapidly list and then compute all possible 3D conformations of a Magic Snake. They also form the basis for multidimensional, multi-scale representations of chain-like structures and other slender bodies, including certain types of robots, polymers, proteins, and DNA.
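The 1D-to-3D conversion can be pictured as walking a kinematic chain: each symbol in the sequence applies a joint twist before a fixed wedge bend. The following is a deliberately simplified toy model of that idea, not the authors' exact encoding or wedge geometry.

    import numpy as np

    def rot_x(k):
        """Rotation by k * 90 degrees about the x-axis (joint states 0-3)."""
        a = np.pi / 2 * k
        c, s = np.cos(a), np.sin(a)
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

    # Fixed 90-degree bend about z standing in for the wedge shape (illustrative).
    WEDGE = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])

    def snake_to_points(sequence):
        """Toy kinematic chain: walk a 1D sequence of joint states into 3D by
        alternating a unit step, the joint twist, and the fixed wedge bend."""
        R = np.eye(3)
        p = np.zeros(3)
        points = [p.copy()]
        for k in sequence:
            p = p + R @ np.array([1.0, 0.0, 0.0])  # advance one wedge length
            R = R @ rot_x(k) @ WEDGE               # twist, then bend
            points.append(p.copy())
        return np.array(points)

    # e.g. snake_to_points([0] * 23) gives vertex positions for the all-zero state.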


2021 ◽  
Vol 38 (6) ◽  
pp. 1719-1726
Author(s):  
Tanbo Zhu ◽  
Die Wang ◽  
Yuhua Li ◽  
Wenjie Dong

In real training, the training conditions are often undesirable, and the use of equipment is severely limited. These problems can be solved by virtual practical training, which breaks the limits of space and lowers the training cost while ensuring training quality. However, the existing methods work poorly in image reconstruction, because they fail to consider the fact that the environmental perception of an actual scene is strongly regular by nature. Therefore, this paper investigates three-dimensional (3D) image reconstruction for a virtual talent training scene. Specifically, a fusion network model was designed, and the deep-seated correlation between target detection and semantic segmentation was explored for images shot in two-dimensional (2D) scenes, in order to enhance the extraction of image features. Next, the vertical and horizontal parallaxes of the scene were solved, and the depth-based virtual talent training scene was reconstructed three-dimensionally, based on the continuity of scene depth. Finally, the proposed algorithm was proved effective through experiments.
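The final step, turning a solved per-pixel depth map into 3D structure, is standard pinhole back-projection; a minimal sketch follows, with hypothetical camera intrinsics passed in (the paper's fusion network and parallax computation sit upstream of this).

    import numpy as np

    def depth_to_points(depth, fx, fy, cx, cy):
        """Standard pinhole back-projection: turn a per-pixel depth map into
        a 3D point cloud, given focal lengths (fx, fy) and principal point
        (cx, cy)."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=-1).reshape(-1, 3)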


2016 ◽  
Vol 28 (4) ◽  
pp. 523-532 ◽  
Author(s):  
Akihiro Obara ◽  
Xu Yang ◽  
Hiromasa Oku

[Figure: Concept of the SLF generated by two projectors]
Triangulation is commonly used to restore 3D scenes, but its frame rate of less than 30 fps, due to time-consuming stereo matching, is an obstacle for applications requiring that results be fed back in real time. The structured light field (SLF) our group proposed previously reduces the amount of calculation in 3D restoration, realizing high-speed measurement. Specifically, the SLF estimates depth information by projecting information on distance directly onto a target. The SLF synthesized as previously reported, however, makes it difficult to extract image features for depth estimation. In this paper, we propose synthesizing the SLF using two projectors in a specific layout. We derive the proposed SLF's basic properties from an optical model. We evaluated the SLF's performance using a prototype we developed, applying it to depth estimation of a randomly moving target at 1000 Hz. We also demonstrate high-speed tracking of the target based on the high-speed depth feedback.
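One simple way to picture how projected light can encode distance directly is two complementary intensity ramps, whose per-pixel ratio maps to depth without any stereo matching. The ramp pattern below is our assumption for illustration, not necessarily the authors' SLF synthesis.

    import numpy as np

    def decode_depth(i1, i2, z_near, z_far):
        """Illustrative structured-light-field decoding: if two projectors
        cast complementary intensity ramps along depth, the per-pixel
        intensity ratio maps linearly to distance. The linear ramp is an
        assumption made for this sketch."""
        ratio = i1 / np.maximum(i1 + i2, 1e-9)   # in [0, 1] per pixel
        return z_near + (z_far - z_near) * ratio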


2006 ◽  
Vol 09 (01n02) ◽  
pp. 99-120 ◽  
Author(s):  
PASCAL BRUNIAUX ◽  
CYRIL NGO NGOC

This study aims to develop a realistic mathematical model of fabric. In contrast to other studies on fabric modeling as a deformable surface, the model described in this article takes into account the geometry of the object. Moreover, it integrates the nonlinear phenomena of the dynamic behavior of the material. As input parameters, the weaving data that define the 3D structure of the object and the mechanical properties of the yarn that express its dynamics are used. Thus, the fabric model is composed of a geometrical model of the fabric (structure) on which a model of yarn (material characterization) is added. This hypothesis is reasonable since a fabric is the result of a three-dimensional assembly of judiciously arranged yarns. Since these yarns interact dynamically, the main difficulty consists of defining the yarn model. In our case, it is composed of various nonlinear functions representing the dynamic behavior of yarn. In order to characterize the flexibility of the material, the weight, the elasticity, and any other mechanical characteristics defining the relation between strain and the stretching of the shape should be taken into account. Firstly, several works dealing with realistic mathematical models of fabric are described, and a taxonomic classification is given in order to position our study within the literature. Secondly, the model of the fabric is described. A geometrical model of the object is presented; it allows one to dimension the object in 3D space and then to position it in its initial state. Subsequently, a nodal model of yarns is described, step by step, in order to demonstrate the separability of the various dynamic behaviors. These nodal links make it simple to integrate the proposed model into the global geometrical model. The methods of numerical resolution used to simulate the complete model of the fabric are then presented; one method is selected and used in order to improve the performance of the fabric simulator and to obtain better stability. Several simulations illustrate the quality of the results obtained.
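A minimal numerical sketch of a nodal yarn model is given below: nodes along a yarn coupled by nonlinear springs (a linear-plus-cubic force law standing in for the paper's nonlinear behavior), integrated with semi-implicit Euler. All coefficients and the force law are illustrative assumptions.

    import numpy as np

    def step_yarn(x, v, masses, rest_len, k_lin, k_cub, damping, dt, g=9.81):
        """One time step of a nodal yarn: x, v are (N, 3) positions and
        velocities, masses is (N,). Adjacent nodes are coupled by a
        nonlinear spring; gravity acts on every node."""
        f = np.zeros_like(x)
        f[:, 1] -= masses * g                        # gravity on each node
        d = x[1:] - x[:-1]                           # segment vectors
        length = np.linalg.norm(d, axis=1, keepdims=True)
        stretch = length - rest_len
        force = (k_lin * stretch + k_cub * stretch**3) * d / np.maximum(length, 1e-9)
        f[:-1] += force                              # pull nodes toward each other
        f[1:] -= force
        v = (v + dt * f / masses[:, None]) * (1.0 - damping)
        return x + dt * v, v                         # semi-implicit Euler update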


2021 ◽  
Vol 10 (11) ◽  
pp. 739
Author(s):  
Fan Yang ◽  
Mingliang Che ◽  
Xinkai Zuo ◽  
Lin Li ◽  
Jiyi Zhang ◽  
...  

Room segmentation is a basic task for the semantic enrichment of point clouds. Recent studies have mainly projected single-floor point clouds to binary images to realize two-dimensional room segmentation. However, these methods have difficulty solving semantic segmentation problems in complex 3D indoor environments, including cross-floor spaces and rooms inside rooms; this is the bottleneck of indoor 3D modeling for non-Manhattan worlds. To make full use of the abundant geometric and spatial structure information in 3D space, a novel 3D room segmentation method that performs room segmentation directly in 3D space is proposed in this study. The method uses a volumetric representation based on a VDB data structure and packs an indoor space with a set of compact spheres, so that rooms emerge as separate connected components. Experimental results on different types of indoor point cloud datasets demonstrate the efficiency of the proposed method.
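One way to picture the idea, stripped of the VDB machinery and sphere packing, is to label the connected components of free space directly in 3D, so cross-floor rooms fall out as separate components. The sketch below is a greatly simplified stand-in for the paper's method.

    import numpy as np
    from scipy import ndimage

    def rooms_from_voxels(occupied, structure=None):
        """Label connected components of free (non-occupied) voxels in a 3D
        boolean grid; each component is a candidate room. A simplification
        of the paper's VDB-based sphere-packing approach."""
        free = ~occupied
        labels, n_rooms = ndimage.label(free, structure=structure)
        return labels, n_rooms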


Acta Numerica ◽  
2017 ◽  
Vol 26 ◽  
pp. 305-364 ◽  
Author(s):  
Onur Özyeşil ◽  
Vladislav Voroninski ◽  
Ronen Basri ◽  
Amit Singer

The structure from motion (SfM) problem in computer vision is to recover the three-dimensional (3D) structure of a stationary scene from a set of projective measurements, represented as a collection of two-dimensional (2D) images, via estimation of motion of the cameras corresponding to these images. In essence, SfM involves the three main stages of (i) extracting features in images (e.g. points of interest, lines, etc.) and matching these features between images, (ii) camera motion estimation (e.g. using relative pairwise camera positions estimated from the extracted features), and (iii) recovery of the 3D structure using the estimated motion and features (e.g. by minimizing the so-called reprojection error). This survey mainly focuses on relatively recent developments in the literature pertaining to stages (ii) and (iii). More specifically, after touching upon the early factorization-based techniques for motion and structure estimation, we provide a detailed account of some of the recent camera location estimation methods in the literature, followed by discussion of notable techniques for 3D structure recovery. We also cover the basics of the simultaneous localization and mapping (SLAM) problem, which can be viewed as a specific case of the SfM problem. Further, our survey includes a review of the fundamentals of feature extraction and matching (i.e. stage (i) above), various recent methods for handling ambiguities in 3D scenes, SfM techniques involving relatively uncommon camera models and image features, and popular sources of data and SfM software.
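A minimal two-view instance of the three stages, using standard OpenCV calls (the survey covers far more general multi-view settings):

    import cv2
    import numpy as np

    def two_view_sfm(img1, img2, K):
        """Two-view SfM: (i) feature extraction and matching, (ii) relative
        motion from the essential matrix, (iii) structure by triangulation.
        K is the 3x3 camera intrinsic matrix; the result is up to scale."""
        sift = cv2.SIFT_create()
        k1, d1 = sift.detectAndCompute(img1, None)          # stage (i)
        k2, d2 = sift.detectAndCompute(img2, None)
        matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)
        p1 = np.float32([k1[m.queryIdx].pt for m in matches])
        p2 = np.float32([k2[m.trainIdx].pt for m in matches])
        E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)  # stage (ii)
        _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # stage (iii)
        P2 = K @ np.hstack([R, t])
        pts4 = cv2.triangulatePoints(P1, P2, p1.T, p2.T)
        return R, t, (pts4[:3] / pts4[3]).T                 # 3D points up to scale

A full pipeline would then minimize the reprojection error over all cameras and points (bundle adjustment), which is where most of the surveyed methods differ.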


Author(s):  
J. Frank ◽  
B. F. McEwen ◽  
M. Radermacher ◽  
C. L. Rieder

The tomographic reconstruction from multiple projections of cellular components within a thick section offers a way of visualizing and quantifying their three-dimensional (3D) structure. However, asymmetric objects require as many views as possible, from the widest possible tilt range; otherwise the reconstruction may be uninterpretable. Even in the absence of geometric obstructions, the increasing path length of the electrons through the section as the tilt angle increases imposes the ultimate upper limit on the projection range. With the maximum tilt angle fixed, the only way to improve the faithfulness of the reconstruction is to change the tilting mode from single-axis to conical; a point within the object projected with a tilt angle of 60° over a full 360° azimuthal range is then reconstructed as a slightly elongated (axis ratio 1.2:1) ellipsoid.
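For comparison with the 1.2:1 ratio quoted for conical tilting, a standard estimate of the elongation caused by a limited single-axis tilt range can be computed directly; a short sketch:

    import numpy as np

    def single_axis_elongation(max_tilt_deg):
        """Standard elongation estimate for single-axis tilting with maximum
        tilt angle alpha: e = sqrt((a + sin a cos a) / (a - sin a cos a)),
        with a in radians."""
        a = np.radians(max_tilt_deg)
        return np.sqrt((a + np.sin(a) * np.cos(a)) / (a - np.sin(a) * np.cos(a)))

    # single_axis_elongation(60) ~ 1.55, noticeably worse than the ~1.2
    # quoted above for conical tilting at the same maximum angle.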

