OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 2066 ◽  
Author(s):  
Bruno Berenguel-Baeta ◽  
Jesus Bermudez-Cameo ◽  
Jose J. Guerrero

Omnidirectional and 360° images are becoming widespread in industry and in consumer society, causing omnidirectional computer vision to gain attention. Their wide field of view allows a great amount of information about the environment to be gathered from a single image. However, the distortion of these images requires the development of specific algorithms for their treatment and interpretation. Moreover, a high number of images is essential for correctly training learning-based computer vision algorithms. In this paper, we present a tool for generating datasets of omnidirectional images with semantic and depth information. These images are synthesized from a set of captures acquired in a realistic virtual environment for Unreal Engine 4 through an interface plugin. We gather a variety of well-known projection models, such as equirectangular and cylindrical panoramas, different fish-eye lenses, catadioptric systems, and empiric models. Furthermore, our tool includes photorealistic non-central projection systems, such as non-central panoramas and non-central catadioptric systems. As far as we know, this is the first reported tool for generating photorealistic non-central images in the literature. Moreover, since the omnidirectional images are generated virtually, we provide pixel-wise semantic and depth information as well as perfect knowledge of the calibration parameters of the cameras. This allows the creation of pixel-precise ground-truth information for training learning algorithms and testing 3D vision approaches. To validate the proposed tool, different computer vision algorithms are tested, such as line extraction from dioptric and catadioptric central images, 3D layout recovery and SLAM using equirectangular panoramas, and 3D reconstruction from non-central panoramas.
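The equirectangular panorama named above maps each pixel column to a longitude and each row to a latitude on the viewing sphere. A minimal sketch of that pixel-to-ray mapping follows; the axis conventions and half-pixel offset are illustrative assumptions, not taken from the OmniSCV implementation:

```python
import numpy as np

def equirect_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction vector.

    Longitude spans [-pi, pi) across the width, latitude [pi/2, -pi/2]
    down the height; the +0.5 offset samples pixel centers.
    """
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])
```

For a 360 x 180 panorama, the central pixel maps to the forward direction, and every returned ray has unit length by construction.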

2021 ◽  
Vol 18 (1) ◽  
pp. 172988142199332
Author(s):  
Xintao Ding ◽  
Boquan Li ◽  
Jinbao Wang

Indoor object detection is a very demanding and important task for robot applications. Object knowledge, such as two-dimensional (2D) shape and depth information, may be helpful for detection. In this article, we focus on the region-based convolutional neural network (CNN) detector and propose a geometric property-based Faster R-CNN method (GP-Faster) for indoor object detection. GP-Faster incorporates geometric properties into Faster R-CNN to improve detection performance. In detail, we first use mesh grids that are the intersections of direct and inverse proportion functions to generate appropriate anchors for indoor objects. After the anchors are regressed to the regions of interest produced by a region proposal network (RPN-RoIs), we then use 2D geometric constraints to refine the RPN-RoIs, in which the 2D constraint of every class is a convex hull region enclosing the width and height coordinates of the ground-truth boxes on the training set. Comparison experiments are implemented on two indoor datasets, SUN2012 and NYUv2. Since depth information is available in NYUv2, we add depth constraints to GP-Faster and propose a 3D geometric property-based Faster R-CNN (DGP-Faster) on NYUv2. The experimental results show that both GP-Faster and DGP-Faster improve the mean average precision.
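The mesh-grid anchor idea can be sketched as follows: anchors are placed at the intersections of direct proportion lines h = k·w (fixed aspect ratios) and inverse proportion curves h = c/w (fixed areas), which gives one (width, height) pair per (ratio, area) combination. The parameter names and values below are illustrative, not those used by GP-Faster:

```python
import numpy as np

def mesh_grid_anchors(slopes, areas):
    """Anchor (width, height) pairs at intersections of direct
    proportion lines h = k * w and inverse proportion curves
    h = c / w. Solving the two equations gives w = sqrt(c / k)
    and h = sqrt(c * k)."""
    anchors = []
    for k in slopes:      # aspect ratio h / w of the direct line
        for c in areas:   # anchor area w * h of the inverse curve
            w = np.sqrt(c / k)
            h = np.sqrt(c * k)
            anchors.append((w, h))
    return anchors
```

For example, a square aspect ratio (k = 1) with area 64 yields an 8 x 8 anchor, while k = 2 with area 32 yields a 4 x 8 anchor.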


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 54
Author(s):  
Peng Liu ◽  
Zonghua Zhang ◽  
Zhaozong Meng ◽  
Nan Gao

Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest due to flexible use and extremely low system requirements, but inherently ill-posed and ambiguous characteristics still cause unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and wavelet-based loss function to recover the depth information of a scene. Two improvements were achieved, compared with previous methods. First, we combined feature distillation and joint attention mechanisms to boost feature modulation discrimination. The network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates features using a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which improves loss function effectiveness by obtaining more structural details. The experimental results on challenging indoor and outdoor benchmark datasets verified the proposed method’s superiority compared with current state-of-the-art methods.
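As a rough illustration of a wavelet-based loss, the sketch below computes a one-level 2D Haar decomposition and takes an L1 penalty over all sub-bands, so errors in the detail (edge) bands contribute alongside the coarse approximation. The single decomposition level and equal band weights are assumptions, not the paper's configuration:

```python
import numpy as np

def haar_decompose(img):
    """One level of the 2D Haar transform: approximation (ll) plus
    horizontal (lh), vertical (hl), and diagonal (hh) detail bands."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0
    lh = (a - b + c - d) / 4.0
    hl = (a + b - c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, lh, hl, hh

def wavelet_l1_loss(pred, gt):
    """Mean absolute error summed over the four Haar sub-bands of the
    predicted and ground-truth depth maps."""
    loss = 0.0
    for p_band, g_band in zip(haar_decompose(pred), haar_decompose(gt)):
        loss += np.mean(np.abs(p_band - g_band))
    return loss
```

Penalizing the detail bands directly is what pushes the network toward sharper depth discontinuities than a plain pixel-wise loss would.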


2021 ◽  
Vol 33 (3) ◽  
pp. 604-609
Author(s):  
Daisuke Kondo

The teleoperation of construction machinery has been introduced to mines and disaster sites. However, the work efficiency of teleoperations is lower than that of onboard operations owing to limitations in the viewing angle and insufficient depth information. To solve these problems and realize effective teleoperations, the Komatsu MIRAI Construction Equipment Cooperative Research Center is developing the next-generation teleoperation cockpit. In this study, we develop a display for teleoperations with a wide field-of-view, a portable projection screen, and a system that reproduces motion parallax, which is suitable for depth perception in the operating range of construction machinery.
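Motion parallax reproduction rests on a similar-triangles relation: when the operator's head translates, a point behind the screen must shift on screen in proportion to how far it lies beyond the screen plane. The sketch below shows only that underlying geometry; the cockpit's actual rendering model is not described here:

```python
def parallax_shift(head_dx, point_depth, screen_depth):
    """Horizontal on-screen shift (same units as head_dx) that keeps a
    point at point_depth visually fixed in the world when the viewer's
    head moves laterally by head_dx. By similar triangles, the line
    from the head at (head_dx, 0) to the point at (0, point_depth)
    crosses the screen plane at head_dx * (1 - screen_depth / point_depth)."""
    return head_dx * (1.0 - screen_depth / point_depth)
```

Points on the screen plane do not shift at all, while very distant points shift almost as far as the head moved, which is the depth cue the display reproduces.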


2020 ◽  
Vol 10 (15) ◽  
pp. 5143
Author(s):  
Marco Buzzelli

We present a review of methods for automatic estimation of visual saliency: the perceptual property that makes specific elements in a scene stand out and grab the attention of the viewer. We focus on domains that are especially recent and relevant, as they make saliency estimation particularly useful and/or effective: omnidirectional images, image groups for co-saliency, and video sequences. For each domain, we perform a selection of recent methods, highlight their commonalities and differences, and describe their unique approaches. We also report and analyze the datasets involved in the development of such methods, in order to reveal additional peculiarities of each domain, such as the representation used for the ground-truth saliency information (scanpaths, saliency maps, or salient object regions). We define domain-specific evaluation measures and provide quantitative comparisons on the basis of common datasets and evaluation criteria, highlighting the different impact of existing approaches on each domain. We conclude by synthesizing the emerging directions for research in the specialized literature, which include novel representations for omnidirectional images, inter- and intra-image saliency decomposition for co-saliency, and saliency shift for video saliency estimation.
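As one example of the evaluation measures involved, Normalized Scanpath Saliency (NSS) is a standard fixation-based metric: the predicted saliency map is z-scored and averaged at human fixation locations, so chance-level predictions score near zero. A minimal sketch:

```python
import numpy as np

def nss(saliency_map, fixation_mask):
    """Normalized Scanpath Saliency: mean of the z-scored saliency
    map at the pixels marked as human fixations. Higher is better;
    a uniform map scores ~0."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return float(s[fixation_mask.astype(bool)].mean())
```

Other common measures in this literature (AUC variants, KL divergence, correlation coefficient) compare full maps rather than sampling at fixations.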


2016 ◽  
Vol 3 (5) ◽  
pp. 160225 ◽  
Author(s):  
Rhodri S. Wilson ◽  
Lei Yang ◽  
Alison Dun ◽  
Annya M. Smyth ◽  
Rory R. Duncan ◽  
...  

Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets now plays an essential role in understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components, from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells at very high temporal rates. Our analysis of the dynamics of very large cohorts of tens of thousands of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provides a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.
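A standard way to detect the caged behaviour described above is the time-averaged mean squared displacement (MSD) of each linked trajectory: free diffusion gives an MSD linear in the time lag, whereas caging in nanodomains makes it plateau. This is a generic sketch of the quantity, not the authors' pipeline:

```python
import numpy as np

def mean_squared_displacement(track, max_lag=None):
    """Time-averaged MSD of a single 2D trajectory (N x 2 array):
    for each lag, average the squared displacement over all pairs of
    points separated by that lag."""
    n = len(track)
    if max_lag is None:
        max_lag = n - 1
    msd = []
    for lag in range(1, max_lag + 1):
        d = track[lag:] - track[:-lag]
        msd.append(np.mean(np.sum(d * d, axis=1)))
    return np.array(msd)
```

Fitting the resulting curve (linear vs. saturating) per trajectory is what lets large cohorts of molecules be classified as freely diffusing or confined.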


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Gary A. Atkinson ◽  
Lyndon N. Smith ◽  
Melvyn L. Smith ◽  
Christopher K. Reynolds ◽  
David J. Humphries ◽  
...  

Abstract The digestive health of cows is one of the primary factors that determine their well-being and productivity. Under- and over-feeding are both commonplace in the beef and dairy industry, leading to welfare issues, negative environmental impacts, and economic losses. Unfortunately, digestive health is difficult for farmers to routinely monitor on large farms due to many factors, including the need to transport faecal samples to a laboratory for compositional analysis. This paper describes a novel means of monitoring digestive health via a low-cost, easy-to-use imaging device based on computer vision. The method involves the rapid capture of multiple visible and near-infrared images of faecal samples. A novel three-dimensional analysis algorithm is then applied to objectively score the condition of the sample based on its geometrical features. While there is no universal ground truth for comparison of results, the order of scores matched a qualitative human prediction very closely. The algorithm is also able to detect the presence of undigested fibres and corn kernels using a deep learning approach. Detection rates for corn and fibre in image regions were of the order of 90%. These results indicate the potential to develop this system for on-farm, real-time monitoring of the digestive health of individual animals, allowing early intervention to effectively adjust feeding strategy.


Presently, multi-object tracking (MOT) is mainly applied to predicting the positions of many predefined objects across successive frames, given the ground-truth position of each target in the first frame. MOT is gaining interest in computer vision because of its applicability in various fields. Many works presented in recent years have aimed to design a MOT algorithm with maximum accuracy and robustness. In this paper, we introduce an efficient and robust MOT algorithm using Mask R-CNN. The use of Mask R-CNN effectively identifies the objects present in the image while concurrently creating a high-quality segmentation mask for every instance. The presented MOT algorithm is validated on three benchmark datasets through extensive simulations. The presented tracking algorithm demonstrates its efficiency in tracking multiple objects precisely.
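The association step of such a tracking-by-detection pipeline can be sketched as greedy IoU matching between detections in consecutive frames. This is a minimal stand-in for illustration only; the presented algorithm builds on Mask R-CNN instance masks rather than plain boxes, and the matching threshold below is an assumption:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def link_detections(prev_boxes, curr_boxes, thresh=0.3):
    """Greedily pair each previous-frame detection with its best
    unmatched current-frame detection above the IoU threshold.
    Returns (prev_idx, curr_idx) pairs."""
    matches, used = [], set()
    for i, p in enumerate(prev_boxes):
        best, best_iou = None, thresh
        for j, c in enumerate(curr_boxes):
            if j in used:
                continue
            v = iou(p, c)
            if v > best_iou:
                best, best_iou = j, v
        if best is not None:
            matches.append((i, best))
            used.add(best)
    return matches
```

Production trackers typically replace the greedy loop with Hungarian assignment and add appearance or motion cues, but the IoU gating shown here is the common core.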


2022 ◽  
Vol 15 ◽  
Author(s):  
Min-seok Kim ◽  
Joon Hyuk Cha ◽  
Seonhwa Lee ◽  
Lihong Han ◽  
Wonhyoung Park ◽  
...  

Few anatomical structure segmentation studies have used deep learning; in those that have, the numbers of training and ground-truth images were small and the accuracies were low or inconsistent. Surgical video anatomy analysis faces various obstacles, including a variable, fast-changing view, large deformations, occlusions, low illumination, and inadequate focus. In addition, it is difficult and costly to obtain a large and accurate dataset of operative video anatomical structures, including arteries. In this study, we investigated cerebral artery segmentation using an automatic ground-truth generation method. Indocyanine green (ICG) fluorescence intraoperative cerebral videoangiography was used to create a ground-truth dataset mainly for cerebral arteries and partly for cerebral blood vessels, including veins. Four different neural network models were trained on the dataset and compared. Before augmentation, 35,975 training images and 11,266 validation images were used; after augmentation, 260,499 training and 90,129 validation images were used. A Dice score of 79% for cerebral artery segmentation was achieved using the DeepLabv3+ model trained on the automatically generated dataset. Strict validation in different patient groups was conducted, and arteries were discerned from veins using the ICG videoangiography phase. We achieved fair accuracy, which demonstrates the appropriateness of the methodology. This study proved the feasibility of cerebral artery segmentation in the operative field of view using deep learning, and the effectiveness of the automatic blood vessel ground-truth generation method using ICG fluorescence videoangiography. With this method, computer vision can discern blood vessels and arteries from veins in a neurosurgical microscope field of view, a technique essential for vessel-anatomy-based neurosurgical navigation. In addition, surgical assistance, safety, and autonomous neurorobotics that detect or manipulate cerebral vessels would require computer vision to identify blood vessels and arteries.
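The Dice score reported above is a standard segmentation metric: twice the overlap between the predicted and ground-truth masks, divided by their total area. A minimal sketch for binary masks:

```python
import numpy as np

def dice_score(pred_mask, gt_mask, eps=1e-8):
    """Dice coefficient between two binary masks:
    2 * |pred AND gt| / (|pred| + |gt|). The eps term only guards
    against division by zero when both masks are empty."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))
```

A score of 1.0 means perfect overlap, so the 79% result corresponds to substantial but imperfect agreement between the predicted arteries and the ICG-derived ground truth.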

