scholarly journals Hand Depth Image Denoising and Superresolution via Noise-Aware Dictionaries

2016 ◽  
Vol 2016 ◽  
pp. 1-12
Author(s):  
Huayang Li ◽  
Dehui Kong ◽  
Shaofan Wang ◽  
Baocai Yin

This paper proposes a two-stage method for hand depth image denoising and superresolution, using bilateral filters and learned dictionaries via noise-aware orthogonal matching pursuit (NAOMP) based K-SVD. The bilateral filtering phase recovers singular points and removes artifacts on silhouettes by averaging depth data using neighborhood pixels on which both depth difference and RGB similarity restrictions are imposed. The dictionary learning phase uses NAOMP for training dictionaries which separates faithful depth from noisy data. Compared with traditional OMP, NAOMP adds a residual reduction step which effectively weakens the noise term within the residual during the residual decomposition in terms of atoms. Experimental results demonstrate that the bilateral phase and the NAOMP-based learning dictionaries phase corporately denoise both virtual and real depth images effectively.

2019 ◽  
Vol 1 (3) ◽  
pp. 883-903 ◽  
Author(s):  
Daulet Baimukashev ◽  
Alikhan Zhilisbayev ◽  
Askat Kuzdeuov ◽  
Artemiy Oleinikov ◽  
Denis Fadeyev ◽  
...  

Recognizing objects and estimating their poses have a wide range of application in robotics. For instance, to grasp objects, robots need the position and orientation of objects in 3D. The task becomes challenging in a cluttered environment with different types of objects. A popular approach to tackle this problem is to utilize a deep neural network for object recognition. However, deep learning-based object detection in cluttered environments requires a substantial amount of data. Collection of these data requires time and extensive human labor for manual labeling. In this study, our objective was the development and validation of a deep object recognition framework using a synthetic depth image dataset. We synthetically generated a depth image dataset of 22 objects randomly placed in a 0.5 m × 0.5 m × 0.1 m box, and automatically labeled all objects with an occlusion rate below 70%. Faster Region Convolutional Neural Network (R-CNN) architecture was adopted for training using a dataset of 800,000 synthetic depth images, and its performance was tested on a real-world depth image dataset consisting of 2000 samples. Deep object recognizer has 40.96% detection accuracy on the real depth images and 93.5% on the synthetic depth images. Training the deep learning model with noise-added synthetic images improves the recognition accuracy for real images to 46.3%. The object detection framework can be trained on synthetically generated depth data, and then employed for object recognition on the real depth data in a cluttered environment. Synthetic depth data-based deep object detection has the potential to substantially decrease the time and human effort required for the extensive data collection and labeling.


Author(s):  
Alaa Abid Muslam Abid Ali ◽  
Mohammed Iqbal Dohan ◽  
Saif Khalid Musluh

One of the very efficient and resource conservative image processing methodology is with the help of bilateral filters. This technique filters the image without the help of edge smoothing but it does employs spatial averaging in a non-linear way. The filtering technique discussed above is very much dependent on the parameters of its filters. A very slight change in filter parameter values effects the outputs and results in a most drastic manner. In this paper, the author has worked on two contributions. In the applications concerning image denoising, the author has contributed in study of the parameter selection of bilateral filters which are optimal in nature. The contribution number two is about extending the present work i.e. extension of the filters which are bilateral in nature. In this process, the bilateral filtering of images is applied to the lower frequency sub-bands which is also known as approximation sub-band. This sub-band is obtained by using the wavelet transformations. Hence, a new framework for image denoising will be created which will be combination of multiresolution bilateral filtering and wavelets transformation techniques. As a matter of fact, this combination is efficient in contradicting noise from an image.


Author(s):  
Alaa Abid Muslam Abid Ali ◽  
Mohammed Iqbal Dohan ◽  
Saif Khalid Musluh

One of the very efficient and resource conservative image processing methodology is with the help of bilateral filters. This technique filters the image without the help of edge smoothing but it does employs spatial averaging in a non-linear way. The filtering technique discussed above is very much dependent on the parameters of its filters. A very slight change in filter parameter values effects the outputs and results in a most drastic manner. In this paper, the author has worked on two contributions. In the applications concerning image denoising, the author has contributed in study of the parameter selection of bilateral filters which are optimal in nature. The contribution number two is about extending the present work i.e. extension of the filters which are bilateral in nature. In this process, the bilateral filtering of images is applied to the lower frequency sub-bands which is also known as approximation sub-band. This sub-band is obtained by using the wavelet transformations. Hence, a new framework for image denoising will be created which will be combination of multiresolution bilateral filtering and wavelets transformation techniques. As a matter of fact, this combination is efficient in contradicting noise from an image.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Seyed Muhammad Hossein Mousavi ◽  
S. Younes Mirinezhad

AbstractThis study presents a new color-depth based face database gathered from different genders and age ranges from Iranian subjects. Using suitable databases, it is possible to validate and assess available methods in different research fields. This database has application in different fields such as face recognition, age estimation and Facial Expression Recognition and Facial Micro Expressions Recognition. Image databases based on their size and resolution are mostly large. Color images usually consist of three channels namely Red, Green and Blue. But in the last decade, another aspect of image type has emerged, named “depth image”. Depth images are used in calculating range and distance between objects and the sensor. Depending on the depth sensor technology, it is possible to acquire range data differently. Kinect sensor version 2 is capable of acquiring color and depth data simultaneously. Facial expression recognition is an important field in image processing, which has multiple uses from animation to psychology. Currently, there is a few numbers of color-depth (RGB-D) facial micro expressions recognition databases existing. With adding depth data to color data, the accuracy of final recognition will be increased. Due to the shortage of color-depth based facial expression databases and some weakness in available ones, a new and almost perfect RGB-D face database is presented in this paper, covering Middle-Eastern face type. In the validation section, the database will be compared with some famous benchmark face databases. For evaluation, Histogram Oriented Gradients features are extracted, and classification algorithms such as Support Vector Machine, Multi-Layer Neural Network and a deep learning method, called Convolutional Neural Network or are employed. The results are so promising.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1299
Author(s):  
Honglin Yuan ◽  
Tim Hoogenkamp ◽  
Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, when compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset consisting of commonly used objects for benchmarking in 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups by the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested by the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and photo-realistic images are helpful in increasing the performance of pose estimation algorithms.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1356
Author(s):  
Linda Christin Büker ◽  
Finnja Zuber ◽  
Andreas Hein ◽  
Sebastian Fudickar

With approaches for the detection of joint positions in color images such as HRNet and OpenPose being available, consideration of corresponding approaches for depth images is limited even though depth images have several advantages over color images like robustness to light variation or color- and texture invariance. Correspondingly, we introduce High- Resolution Depth Net (HRDepthNet)—a machine learning driven approach to detect human joints (body, head, and upper and lower extremities) in purely depth images. HRDepthNet retrains the original HRNet for depth images. Therefore, a dataset is created holding depth (and RGB) images recorded with subjects conducting the timed up and go test—an established geriatric assessment. The images were manually annotated RGB images. The training and evaluation were conducted with this dataset. For accuracy evaluation, detection of body joints was evaluated via COCO’s evaluation metrics and indicated that the resulting depth image-based model achieved better results than the HRNet trained and applied on corresponding RGB images. An additional evaluation of the position errors showed a median deviation of 1.619 cm (x-axis), 2.342 cm (y-axis) and 2.4 cm (z-axis).


Author(s):  
Maryam Abedini ◽  
Horriyeh Haddad ◽  
Marzieh Faridi Masouleh ◽  
Asadollah Shahbahrami

This study proposes an image denoising algorithm based on sparse representation and Principal Component Analysis (PCA). The proposed algorithm includes the following steps. First, the noisy image is divided into overlapped [Formula: see text] blocks. Second, the discrete cosine transform is applied as a dictionary for the sparse representation of the vectors created by the overlapped blocks. To calculate the sparse vector, the orthogonal matching pursuit algorithm is used. Then, the dictionary is updated by means of the PCA algorithm to achieve the sparsest representation of vectors. Since the signal energy, unlike the noise energy, is concentrated on a small dataset by transforming into the PCA domain, the signal and noise can be well distinguished. The proposed algorithm was implemented in a MATLAB environment and its performance was evaluated on some standard grayscale images under different levels of standard deviations of white Gaussian noise by means of peak signal-to-noise ratio, structural similarity indexes, and visual effects. The experimental results demonstrate that the proposed denoising algorithm achieves significant improvement compared to dual-tree complex discrete wavelet transform and K-singular value decomposition image denoising methods. It also obtains competitive results with the block-matching and 3D filtering method, which is the current state-of-the-art for image denoising.


Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2815
Author(s):  
Shih-Hung Yang ◽  
Yao-Mao Cheng ◽  
Jyun-We Huang ◽  
Yon-Ping Chen

Automatic fingerspelling recognition tackles the communication barrier between deaf and hearing individuals. However, the accuracy of fingerspelling recognition is reduced by high intra-class variability and low inter-class variability. In the existing methods, regular convolutional kernels, which have limited receptive fields (RFs) and often cannot detect subtle discriminative details, are applied to learn features. In this study, we propose a receptive field-aware network with finger attention (RFaNet) that highlights the finger regions and builds inter-finger relations. To highlight the discriminative details of these fingers, RFaNet reweights the low-level features of the hand depth image with those of the non-forearm image and improves finger localization, even when the wrist is occluded. RFaNet captures neighboring and inter-region dependencies between fingers in high-level features. An atrous convolution procedure enlarges the RFs at multiple scales and a non-local operation computes the interactions between multi-scale feature maps, thereby facilitating the building of inter-finger relations. Thus, the representation of a sign is invariant to viewpoint changes, which are primarily responsible for intra-class variability. On an American Sign Language fingerspelling dataset, RFaNet achieved 1.77% higher classification accuracy than state-of-the-art methods. RFaNet achieved effective transfer learning when the number of labeled depth images was insufficient. The fingerspelling representation of a depth image can be effectively transferred from large- to small-scale datasets via highlighting the finger regions and building inter-finger relations, thereby reducing the requirement for expensive fingerspelling annotations.


Sign in / Sign up

Export Citation Format

Share Document