Robust 3D Hand Detection from a Single RGB-D Image in Unconstrained Environments

Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6360
Author(s):  
Chi Xu ◽  
Jun Zhou ◽  
Wendi Cai ◽  
Yunkai Jiang ◽  
Yongbo Li ◽  
...  

Three-dimensional hand detection from a single RGB-D image is an important technology which supports many useful applications. Practically, it is challenging to robustly detect human hands in unconstrained environments because the RGB-D channels can be affected by many uncontrollable factors, such as light changes. To tackle this problem, we propose a 3D hand detection approach which improves robustness and accuracy by adaptively fusing the complementary features extracted from the RGB-D channels. Using the fused RGB-D feature, the 2D bounding boxes of hands are detected first, and then the 3D locations along the z-axis are estimated through a cascaded network. Furthermore, we present a challenging RGB-D hand detection dataset collected in unconstrained environments. Different from previous works, which primarily rely on either the RGB or the D channel, we adaptively fuse the RGB-D channels for hand detection. Evaluation results show that the D channel is crucial for hand detection in unconstrained environments. Our RGB-D fusion-based approach significantly improves hand detection accuracy from 69.1 to 74.1 compared to one of the state-of-the-art RGB-based hand detectors. The existing RGB- or D-based methods are unstable under unseen lighting conditions: in dark conditions, the accuracy of the RGB-based method drops significantly to 48.9, and in back-light conditions, the accuracy of the D-based method drops dramatically to 28.3. Compared with these methods, our RGB-D fusion-based approach is much more robust, without severe accuracy degradation, achieving accuracies of 62.5 and 65.9 in these two extreme lighting conditions, respectively.
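The adaptive fusion idea can be sketched in miniature: a gating score per modality (here just two scalars, which in the real system would come from a small learned subnetwork; the function names and the softmax gating are assumptions for illustration, not taken from the paper) is normalized and used to weight the RGB and depth features before they are combined.

```python
import math

def adaptive_fusion(rgb_feat, depth_feat, gate_logits):
    """Fuse per-channel RGB and depth features with softmax gates.

    gate_logits: (g_rgb, g_depth) scores, assumed to be predicted by a
    small gating network; a higher score gives that modality more weight.
    This is a toy scalar-gated sketch, not the paper's architecture.
    """
    g_rgb, g_d = gate_logits
    z = math.exp(g_rgb) + math.exp(g_d)
    w_rgb, w_d = math.exp(g_rgb) / z, math.exp(g_d) / z
    # Convex combination of the two feature vectors, element-wise.
    return [w_rgb * r + w_d * d for r, d in zip(rgb_feat, depth_feat)]
```

With equal logits the fusion degenerates to a plain average; a strongly dominant logit (e.g. when the RGB channel is unreliable in the dark) effectively selects the other modality.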

2021 ◽  
Author(s):  
Saihua Cai ◽  
Jinfu Chen ◽  
Haibo Chen ◽  
Chi Zhang ◽  
Qian Li ◽  
...  

Existing association-based outlier detection approaches seek potential outliers in the huge full set of an uncertain data stream (UDS), but cannot effectively process the small subset of the UDS that satisfies preset constraints; thus, they are time-consuming. To solve this problem, this paper proposes a novel minimal rare pattern-based outlier detection approach, namely Constrained Minimal Rare Pattern-based Outlier Detection (CMRP-OD), to discover outliers from small sets of the UDS that satisfy user-preset succinct or convertible monotonic constraints. First, the two concepts of "maximal probability" and "support cap" are proposed to compress the scale of extensible patterns, and a matrix is designed to store the information of each valid pattern in order to reduce the number of scans of the UDS, thus decreasing time consumption. Second, more factors that can influence the determination of outliers are considered in the design of the deviation indices, thus increasing detection accuracy. Extensive experiments show that, compared with state-of-the-art approaches, the CMRP-OD approach improves detection accuracy by at least 10%, and its time cost is reduced by almost half.
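The mining primitive the approach builds on, minimal rare patterns, can be illustrated with a brute-force sketch (the constraints, probability caps, and matrix-based storage of CMRP-OD are all omitted; the function names and the `max_len` cutoff are assumptions for illustration): a pattern is minimal rare if its support falls below the threshold while every proper subset is still frequent.

```python
from itertools import combinations

def support(transactions, pattern):
    """Number of transactions (sets of items) containing `pattern`."""
    return sum(1 for t in transactions if pattern <= t)

def minimal_rare_patterns(transactions, min_sup, max_len=2):
    """Brute-force minimal rare pattern mining, as a teaching sketch.

    A pattern is rare if support < min_sup, and minimal rare if in
    addition all of its proper subsets are frequent.
    """
    items = sorted({i for t in transactions for i in t})
    mrps = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            p = frozenset(combo)
            if support(transactions, p) >= min_sup:
                continue  # frequent, so not rare
            # Minimality check: every (k-1)-subset must be frequent.
            if all(support(transactions, p - {i}) >= min_sup for i in p):
                mrps.append(p)
    return mrps
```

On the stream `[{a,b}, {a,b}, {a,c}, {b}]` with `min_sup=2`, only `{c}` is minimal rare: it is infrequent, while `{a,c}` and `{b,c}` are excluded because their subset `{c}` is already rare.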


2020 ◽  
Vol 34 (07) ◽  
pp. 11474-11481 ◽  
Author(s):  
Minghui Liao ◽  
Zhaoyi Wan ◽  
Cong Yao ◽  
Kai Chen ◽  
Xiang Bai

Recently, segmentation-based methods have become quite popular in scene text detection, as segmentation results can more accurately describe scene text of various shapes, such as curved text. However, the post-processing step of binarization is essential for segmentation-based detection: it converts the probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process inside a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, on which it consistently achieves state-of-the-art results in terms of both detection accuracy and speed. In particular, with a lightweight backbone, the performance improvements brought by DB are significant, so that we can find an ideal tradeoff between detection accuracy and efficiency. Specifically, with a ResNet-18 backbone, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB.
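The DB module replaces the hard step function of binarization with a steep sigmoid of the difference between the probability map and a learned threshold map, which keeps the operation differentiable. A per-pixel sketch (the amplifying factor k=50 follows the published paper; the scalar formulation here is a simplification of the per-pixel maps):

```python
import math

def differentiable_binarization(p, t, k=50.0):
    """Approximate binarization: 1 / (1 + exp(-k * (P - T))).

    p: probability-map value in [0, 1]
    t: learned threshold-map value at the same pixel
    k: amplifying factor; large k makes the sigmoid approach a hard
       0/1 step while remaining differentiable for training.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))
```

Pixels well above the threshold saturate near 1, pixels below it near 0, so at inference the output behaves like a binary map while gradients still flow through the threshold branch during training.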


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1213
Author(s):  
Xiaoke Shen ◽  
Ioannis Stamos

Instance segmentation and object detection are significant problems in the fields of computer vision and robotics. We address those problems by proposing a novel object segmentation and detection system. First, we detect 2D objects based on RGB, depth-only, or RGB-D images. A 3D convolutional-based system, named Frustum VoxNet, is proposed. This system generates frustums from the 2D detection results, proposes 3D candidate voxelized images for each frustum, and uses a 3D convolutional neural network (CNN) based on these candidate voxelized images to perform 3D instance segmentation and object detection. Results on the SUN RGB-D dataset show that our RGB-D-based system’s 3D inference is much faster than state-of-the-art methods, without a significant loss of accuracy. At the same time, we can provide segmentation and detection results using depth-only images, with accuracy comparable to RGB-D-based systems. This is important since our methods can also work well in low lighting conditions, or with sensors that do not acquire RGB images. Finally, the use of segmentation as part of our pipeline increases detection accuracy, while providing at the same time 3D instance segmentation.
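The frustum-generation step can be sketched with pinhole geometry: each 2D detection box, together with a near and far depth bound, defines eight 3D corner points obtained by back-projecting the box corners (the box/intrinsics formats and depth bounds here are assumptions for illustration, not the paper's exact interface).

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z to camera coords."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def frustum_corners(bbox, z_near, z_far, intrinsics):
    """Eight 3D corners of the frustum spanned by a 2D detection box
    between depths z_near and z_far.

    bbox: (u0, v0, u1, v1) pixel box; intrinsics: (fx, fy, cx, cy).
    A sketch of the frustum-proposal step; the real system then
    voxelizes the points falling inside this volume.
    """
    u0, v0, u1, v1 = bbox
    fx, fy, cx, cy = intrinsics
    return [backproject(u, v, z, fx, fy, cx, cy)
            for z in (z_near, z_far)
            for u in (u0, u1)
            for v in (v0, v1)]
```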


Electronics ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 661
Author(s):  
Jin Yeong Bok ◽  
Kun Ha Suh ◽  
Eui Chul Lee

Face recognition is a representative biometric that can be easily used; however, spoofing attacks threaten the security of face biometric systems by generating fake faces. Thus, it is not advisable to only consider sophisticated spoofing cases, such as three-dimensional masks, because they require additional equipment, thereby increasing the implementation cost. To prevent easy face spoofing attacks through print and display, the two-dimensional (2D) image analysis method using existing face recognition systems is reasonable. Therefore, we proposed a new database called the “pattern recognition-face spoofing advancement database” that can be used to prevent such attacks based on 2D image analysis. To the best of our knowledge, this is the first face spoofing database that considers changes in both angle and distance. Therefore, it can be used to train various positional relationships between a face and camera. We conducted various experiments to verify the efficiency of this database. The spoofing detection accuracy of our database using ResNet-18 was found to be 96.75%. The experimental results for various scenarios demonstrated that the spoof detection performances were better for images with a pitch angle, near-distance images, and replay attacks than for front images, far-distance images, and print attacks, respectively. In the cross-database verification, the performance when tested with other databases (DBs) after training with our DB was better than in the opposite direction. The results of cross-device verification in terms of camera type showed negligible difference; thus, it was concluded that the type of image sensor does not affect the detection accuracy. Consequently, it was confirmed that the proposed DB, which considers various distances, capture angles, lighting conditions, and backgrounds, can be used as a training DB to detect spoofing attacks in general face recognition systems.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Wenjing Lu

This paper proposes a deep learning-based method for mitosis detection in breast histopathology images. A main problem in mitosis detection is that most of the datasets only have weak labels, i.e., only the coordinates indicating the center of the mitosis region. This makes it hard to apply most of the existing powerful object detection methods to mitosis detection. To solve this problem, this paper first applies a CNN-based algorithm to pixelwisely segment the mitosis regions, based on which bounding boxes of mitoses are generated as strong labels. Based on the generated bounding boxes, an object detection network is trained to accomplish mitosis detection. Experimental results show that the proposed method is effective in detecting mitosis, and its accuracy outperforms that reported in the state-of-the-art literature.
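The weak-to-strong label conversion can be illustrated in its simplest form: expanding a center-point annotation into a box clipped to the image bounds (the paper derives boxes from segmentation masks; the fixed box size here is a simplifying assumption, as are the function and parameter names).

```python
def center_to_bbox(cx, cy, size, img_w, img_h):
    """Turn a weak center-point label into a square bounding box.

    A simplified stand-in for the paper's segmentation-derived boxes:
    a fixed side length `size`, clipped to the image. Returns
    (x0, y0, x1, y1) in pixel coordinates.
    """
    half = size // 2
    x0, y0 = max(0, cx - half), max(0, cy - half)
    x1, y1 = min(img_w, cx + half), min(img_h, cy + half)
    return (x0, y0, x1, y1)
```

The resulting boxes can then serve as "strong" labels for training a standard object detection network.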


Author(s):  
Kanji Tanaka

Although image change detection (ICD) methods provide good detection accuracy for many scenarios, most existing methods rely on place-specific background modeling. The time/space cost for such place-specific models is prohibitive for large-scale scenarios, such as long-term robotic visual simultaneous localization and mapping (SLAM). Therefore, we propose a novel ICD framework that is specifically customized for long-term SLAM. This study is inspired by the multi-map-based SLAM framework, where multiple maps can perform mutual diagnosis and hence do not require any explicit background modeling/model. We extend this multi-map-based diagnosis approach to a more generic single-map-based object-level diagnosis framework (i.e., ICD), where the self-localization module of SLAM, which is the change object indicator, can be used in its original form. Furthermore, we consider map diagnosis on a state-of-the-art deep convolutional neural network (DCN)-based SLAM system (instead of on conventional bag-of-words or landmark-based systems), in which the blackbox nature of the DCN complicates the diagnosis problem. Additionally, we consider a three-dimensional point cloud (PC)-based (instead of typical monocular color image-based) SLAM and adopt a state-of-the-art scan context PC descriptor for map diagnosis for the first time.


2020 ◽  
Vol 2020 (4) ◽  
pp. 76-1-76-7
Author(s):  
Swaroop Shankar Prasad ◽  
Ofer Hadar ◽  
Ilia Polian

Image steganography can have legitimate uses, for example, augmenting an image with a watermark for copyright reasons, but it can also be utilized for malicious purposes. We investigate the detection of malicious steganography using neural network-based classification when images are transmitted through a noisy channel. Noise makes detection harder because the classifier must not only detect perturbations in the image but also decide whether they are due to malicious steganographic modifications or to natural noise. Our results show that reliable detection is possible even for state-of-the-art steganographic algorithms that insert stego bits without affecting an image’s visual quality. The detection accuracy is high (above 85%) if the payload, i.e., the amount of steganographic content in an image, exceeds a certain threshold. At the same time, noise critically affects the steganographic information being transmitted, both through desynchronization (destruction of the information about which bits of the image carry steganographic content) and by flipping these bits themselves. This forces the adversary to use a redundant encoding with a substantial number of error-correction bits for reliable transmission, making detection feasible even for small payloads.
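The redundancy argument can be made concrete with the simplest error-correcting scheme, a repetition code (used here purely as a toy; the paper does not specify the adversary's code): each payload bit is sent r times and recovered by majority vote, which is exactly the kind of payload inflation that makes the stego content easier to detect.

```python
def encode_repetition(bits, r=3):
    """r-fold repetition encoding: each payload bit is embedded r times."""
    return [b for b in bits for _ in range(r)]

def decode_repetition(noisy, r=3):
    """Majority-vote decoding of an r-fold repetition code.

    Survives up to floor(r/2) flipped bits per group, at the cost of
    multiplying the embedded payload size by r.
    """
    out = []
    for i in range(0, len(noisy), r):
        chunk = noisy[i:i + r]
        out.append(1 if sum(chunk) > len(chunk) / 2 else 0)
    return out
```

A 3-fold code already triples the number of stego bits per message bit, pushing even small logical payloads over the detectability threshold.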


2020 ◽  
Vol 17 (3) ◽  
pp. 172988142092566
Author(s):  
Dahan Wang ◽  
Sheng Luo ◽  
Li Zhao ◽  
Xiaoming Pan ◽  
Muchou Wang ◽  
...  

Fire is a fierce disaster, and smoke is an early signal of fire. Since features such as the chrominance, texture, and shape of smoke are very distinctive, many methods based on these features have been developed. But these static characteristics vary widely, so there are exceptions that lead to low detection accuracy. On the other hand, the motion of smoke is much more discriminating than the aforementioned features, so a time-domain neural network is proposed to extract its dynamic characteristics. This smoke recognition network has these advantages: (1) it extracts spatiotemporal features with 3D filters, which operate on dynamic and static characteristics synchronously; (2) high accuracy, with 87.31% of samples classified correctly, which is the state of the art even in chaotic environments, and objects that confuse other methods, such as haze, fog, and moving cars, are distinguished clearly; (3) high sensitivity, with smoke detected on average at the 23rd frame, which is also the state of the art and is meaningful for raising a fire alarm as early as possible; and (4) it is not based on any prior hypothesis, which keeps the method broadly applicable. Finally, a new metric, the difference between the first frame in which smoke is detected and the first frame in which smoke appears, is proposed to compare the sensitivity of algorithms on videos. The experiments confirm that the dynamic characteristics are more discriminating than the aforementioned static characteristics, and that the smoke recognition network is a good tool for extracting compound features.
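The proposed frame-latency metric is simple enough to state directly in code (the representation of detections as a list of flagged frame indices is an assumption for illustration):

```python
def detection_latency(first_smoke_frame, detections):
    """Sensitivity metric from the abstract: the difference between the
    first frame in which smoke is detected and the first frame in
    which smoke actually appears.

    detections: iterable of frame indices the algorithm flagged as
    smoke. Flags before smoke onset are treated as false alarms and
    ignored; returns None if smoke is never detected.
    """
    hits = [f for f in detections if f >= first_smoke_frame]
    return (min(hits) - first_smoke_frame) if hits else None
```

A smaller latency means an earlier alarm; comparing this value across algorithms on the same videos ranks their sensitivity.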


Author(s):  
My Kieu ◽  
Andrew D. Bagdanov ◽  
Marco Bertini

Pedestrian detection is a canonical problem for safety and security applications, and it remains a challenging problem due to the highly variable lighting conditions in which pedestrians must be detected. This article investigates several domain adaptation approaches to adapt RGB-trained detectors to the thermal domain. Building on our earlier work on domain adaptation for privacy-preserving pedestrian detection, we conducted an extensive experimental evaluation comparing top-down and bottom-up domain adaptation, and we also propose two new bottom-up domain adaptation strategies. For top-down domain adaptation, we leverage a detector pre-trained on RGB imagery and efficiently adapt it to perform pedestrian detection in the thermal domain. Our bottom-up domain adaptation approaches include two steps: first, an adapter segment corresponding to the initial layers of the RGB-trained detector is trained to adapt to the new input distribution; then, we reconnect the adapter segment to the original RGB-trained detector for final adaptation with a top-down loss. To the best of our knowledge, our bottom-up domain adaptation approaches outperform the best-performing single-modality pedestrian detection results on KAIST and outperform the state of the art on FLIR.
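The reconnection step of the bottom-up scheme can be sketched as plain function composition: a retrained adapter segment handles the thermal input distribution and then feeds the original RGB-trained trunk (both stages are plain callables in this toy; in the real system they are CNN layer blocks, and a top-down loss fine-tunes the combined model afterwards).

```python
def make_adapted_detector(adapter, rgb_trunk):
    """Compose a thermal-input adapter with an RGB-trained trunk.

    adapter: stand-in for the retrained initial layers.
    rgb_trunk: stand-in for the remaining RGB-trained detector layers.
    Returns a callable detector over thermal inputs. This is only a
    structural sketch of the bottom-up reconnection step.
    """
    def detector(thermal_image):
        return rgb_trunk(adapter(thermal_image))
    return detector
```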


2021 ◽  
Vol 11 (11) ◽  
pp. 4894
Author(s):  
Anna Scius-Bertrand ◽  
Michael Jungo ◽  
Beat Wolf ◽  
Andreas Fischer ◽  
Marc Bui

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems that reduce the required amount of annotation. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages generated with a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.
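The core of each self-training round is pseudo-label selection: the detector trained on synthetic pages is run on real manuscript images, and only its confident detections are kept as labels for the next round. A minimal sketch of that filtering step (the prediction format and the threshold value are assumptions, not taken from the article):

```python
def select_pseudo_labels(predictions, conf_threshold=0.8):
    """Keep only confident detections as pseudo-labels.

    predictions: iterable of (box, label, score) triples from the
    current detector on unlabeled real pages. Detections below the
    confidence threshold are discarded so that noisy labels do not
    propagate into the next self-training round.
    """
    return [(box, label) for box, label, score in predictions
            if score >= conf_threshold]
```

Iterating train-on-synthetic, pseudo-label, retrain gradually shifts the model from the font-rendered domain to the real handwriting domain without any human annotation.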

