Robust 3D Hand Detection from a Single RGB-D Image in Unconstrained Environments

Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6360
Author(s):  
Chi Xu ◽  
Jun Zhou ◽  
Wendi Cai ◽  
Yunkai Jiang ◽  
Yongbo Li ◽  
...  

Three-dimensional hand detection from a single RGB-D image is an important technology which supports many useful applications. Practically, it is challenging to robustly detect human hands in unconstrained environments because the RGB-D channels can be affected by many uncontrollable factors, such as light changes. To tackle this problem, we propose a 3D hand detection approach which improves robustness and accuracy by adaptively fusing the complementary features extracted from the RGB-D channels. Using the fused RGB-D feature, the 2D bounding boxes of hands are detected first, and then the 3D locations along the z-axis are estimated through a cascaded network. Furthermore, we present a challenging RGB-D hand detection dataset collected in unconstrained environments. Different from previous works, which primarily rely on either the RGB or the D channel, we adaptively fuse the RGB-D channels for hand detection. Evaluation results show that the D channel is crucial for hand detection in unconstrained environments. Our RGB-D fusion-based approach significantly improves hand detection accuracy from 69.1 to 74.1 compared to one of the state-of-the-art RGB-based hand detectors. The existing RGB- or D-based methods are unstable under unseen lighting conditions: in dark conditions, the accuracy of the RGB-based method drops significantly to 48.9, and in back-light conditions, the accuracy of the D-based method drops dramatically to 28.3. Compared with these methods, our RGB-D fusion-based approach is much more robust, without severe accuracy degradation, achieving accuracies of 62.5 and 65.9 in these two extreme lighting conditions, respectively.
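The adaptive fusion idea can be sketched in miniature: a gating score per modality (here just two scalars, which in the real system would come from a small learned subnetwork; the function names and the softmax gating are assumptions for illustration, not taken from the paper) is normalized and used to weight the RGB and depth features before they are combined.

```python
import math

def adaptive_fusion(rgb_feat, depth_feat, gate_logits):
    """Fuse per-channel RGB and depth features with softmax gates.

    gate_logits: (g_rgb, g_depth) scores, assumed to be predicted by a
    small gating network; a higher score gives that modality more weight.
    This is a toy scalar-gated sketch, not the paper's architecture.
    """
    g_rgb, g_d = gate_logits
    z = math.exp(g_rgb) + math.exp(g_d)
    w_rgb, w_d = math.exp(g_rgb) / z, math.exp(g_d) / z
    # Convex combination of the two feature vectors, element-wise.
    return [w_rgb * r + w_d * d for r, d in zip(rgb_feat, depth_feat)]
```

With equal logits the fusion degenerates to a plain average; a strongly dominant logit (e.g. when the RGB channel is unreliable in the dark) effectively selects the other modality.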

2021 ◽  
Author(s):  
Saihua Cai ◽  
Jinfu Chen ◽  
Haibo Chen ◽  
Chi Zhang ◽  
Qian Li ◽  
...  

Existing association-based outlier detection approaches seek potential outliers in the huge full set of an uncertain data stream (UDS), but cannot effectively process the small subset of the UDS that satisfies preset constraints; thus, they are time-consuming. To solve this problem, this paper proposes a novel minimal rare pattern-based outlier detection approach, namely Constrained Minimal Rare Pattern-based Outlier Detection (CMRP-OD), to discover outliers from small sets of the UDS that satisfy user-preset succinct or convertible monotonic constraints. First, the two concepts of "maximal probability" and "support cap" are proposed to compress the scale of extensible patterns, and a matrix is designed to store the information of each valid pattern in order to reduce the number of scans of the UDS, thus decreasing time consumption. Second, more factors that can influence the determination of outliers are considered in the design of the deviation indices, thus increasing detection accuracy. Extensive experiments show that, compared with state-of-the-art approaches, the CMRP-OD approach improves detection accuracy by at least 10%, and its time cost is reduced by almost half.
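The mining primitive the approach builds on, minimal rare patterns, can be illustrated with a brute-force sketch (the constraints, probability caps, and matrix-based storage of CMRP-OD are all omitted; the function names and the `max_len` cutoff are assumptions for illustration): a pattern is minimal rare if its support falls below the threshold while every proper subset is still frequent.

```python
from itertools import combinations

def support(transactions, pattern):
    """Number of transactions (sets of items) containing `pattern`."""
    return sum(1 for t in transactions if pattern <= t)

def minimal_rare_patterns(transactions, min_sup, max_len=2):
    """Brute-force minimal rare pattern mining, as a teaching sketch.

    A pattern is rare if support < min_sup, and minimal rare if in
    addition all of its proper subsets are frequent.
    """
    items = sorted({i for t in transactions for i in t})
    mrps = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            p = frozenset(combo)
            if support(transactions, p) >= min_sup:
                continue  # frequent, so not rare
            # Minimality check: every (k-1)-subset must be frequent.
            if all(support(transactions, p - {i}) >= min_sup for i in p):
                mrps.append(p)
    return mrps
```

On the stream `[{a,b}, {a,b}, {a,c}, {b}]` with `min_sup=2`, only `{c}` is minimal rare: it is infrequent, while `{a,c}` and `{b,c}` are excluded because their subset `{c}` is already rare.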


2020 ◽  
Vol 34 (07) ◽  
pp. 11474-11481 ◽  
Author(s):  
Minghui Liao ◽  
Zhaoyi Wan ◽  
Cong Yao ◽  
Kai Chen ◽  
Xiang Bai

Recently, segmentation-based methods have become quite popular in scene text detection, as segmentation results can more accurately describe scene text of various shapes, such as curved text. However, the post-processing step of binarization is essential for segmentation-based detection: it converts the probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process inside a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, on which it consistently achieves state-of-the-art results in terms of both detection accuracy and speed. In particular, with a lightweight backbone, the performance improvements brought by DB are significant, so that we can find an ideal tradeoff between detection accuracy and efficiency. Specifically, with a ResNet-18 backbone, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB.
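The DB module replaces the hard step function of binarization with a steep sigmoid of the difference between the probability map and a learned threshold map, which keeps the operation differentiable. A per-pixel sketch (the amplifying factor k=50 follows the published paper; the scalar formulation here is a simplification of the per-pixel maps):

```python
import math

def differentiable_binarization(p, t, k=50.0):
    """Approximate binarization: 1 / (1 + exp(-k * (P - T))).

    p: probability-map value in [0, 1]
    t: learned threshold-map value at the same pixel
    k: amplifying factor; large k makes the sigmoid approach a hard
       0/1 step while remaining differentiable for training.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))
```

Pixels well above the threshold saturate near 1, pixels below it near 0, so at inference the output behaves like a binary map while gradients still flow through the threshold branch during training.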


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1213
Author(s):  
Xiaoke Shen ◽  
Ioannis Stamos

Instance segmentation and object detection are significant problems in the fields of computer vision and robotics. We address those problems by proposing a novel object segmentation and detection system. First, we detect 2D objects based on RGB, depth-only, or RGB-D images. A 3D convolutional-based system, named Frustum VoxNet, is proposed. This system generates frustums from the 2D detection results, proposes 3D candidate voxelized images for each frustum, and uses a 3D convolutional neural network (CNN) based on these candidate voxelized images to perform 3D instance segmentation and object detection. Results on the SUN RGB-D dataset show that our RGB-D-based system’s 3D inference is much faster than state-of-the-art methods, without a significant loss of accuracy. At the same time, we can provide segmentation and detection results using depth-only images, with accuracy comparable to RGB-D-based systems. This is important since our methods can also work well in low lighting conditions, or with sensors that do not acquire RGB images. Finally, the use of segmentation as part of our pipeline increases detection accuracy, while providing at the same time 3D instance segmentation.
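The frustum-generation step can be sketched with pinhole geometry: each 2D detection box, together with a near and far depth bound, defines eight 3D corner points obtained by back-projecting the box corners (the box/intrinsics formats and depth bounds here are assumptions for illustration, not the paper's exact interface).

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z to camera coords."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def frustum_corners(bbox, z_near, z_far, intrinsics):
    """Eight 3D corners of the frustum spanned by a 2D detection box
    between depths z_near and z_far.

    bbox: (u0, v0, u1, v1) pixel box; intrinsics: (fx, fy, cx, cy).
    A sketch of the frustum-proposal step; the real system then
    voxelizes the points falling inside this volume.
    """
    u0, v0, u1, v1 = bbox
    fx, fy, cx, cy = intrinsics
    return [backproject(u, v, z, fx, fy, cx, cy)
            for z in (z_near, z_far)
            for u in (u0, u1)
            for v in (v0, v1)]
```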


Electronics ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 661
Author(s):  
Jin Yeong Bok ◽  
Kun Ha Suh ◽  
Eui Chul Lee

Face recognition is a representative biometric that can be easily used; however, spoofing attacks threaten the security of face biometric systems by generating fake faces. Thus, it is not advisable to only consider sophisticated spoofing cases, such as three-dimensional masks, because they require additional equipment, thereby increasing the implementation cost. To prevent easy face spoofing attacks through print and display, the two-dimensional (2D) image analysis method using existing face recognition systems is reasonable. Therefore, we proposed a new database called the “pattern recognition-face spoofing advancement database” that can be used to prevent such attacks based on 2D image analysis. To the best of our knowledge, this is the first face spoofing database that considers changes in both angle and distance. Therefore, it can be used to train various positional relationships between a face and camera. We conducted various experiments to verify the efficiency of this database. The spoofing detection accuracy of our database using ResNet-18 was found to be 96.75%. The experimental results for various scenarios demonstrated that the spoof detection performances were better for images with a pitch angle, near-distance images, and replay attacks than for front images, far-distance images, and print attacks, respectively. In the cross-database verification, the performance when tested with other databases (DBs) after training with our DB was better than in the opposite direction. The results of cross-device verification in terms of camera type showed negligible difference; thus, it was concluded that the type of image sensor does not affect the detection accuracy. Consequently, it was confirmed that the proposed DB, which considers various distances, capture angles, lighting conditions, and backgrounds, can be used as a training DB to detect spoofing attacks in general face recognition systems.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Wenjing Lu

This paper proposes a deep learning-based method for mitosis detection in breast histopathology images. A main problem in mitosis detection is that most of the datasets only have weak labels, i.e., only the coordinates indicating the center of the mitosis region. This makes it hard to apply most of the existing powerful object detection methods to mitosis detection. To solve this problem, this paper first applies a CNN-based algorithm to pixelwisely segment the mitosis regions, based on which bounding boxes of mitoses are generated as strong labels. Based on the generated bounding boxes, an object detection network is trained to accomplish mitosis detection. Experimental results show that the proposed method is effective in detecting mitosis, and its accuracy outperforms that reported in the state-of-the-art literature.
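The weak-to-strong label conversion can be illustrated in its simplest form: expanding a center-point annotation into a box clipped to the image bounds (the paper derives boxes from segmentation masks; the fixed box size here is a simplifying assumption, as are the function and parameter names).

```python
def center_to_bbox(cx, cy, size, img_w, img_h):
    """Turn a weak center-point label into a square bounding box.

    A simplified stand-in for the paper's segmentation-derived boxes:
    a fixed side length `size`, clipped to the image. Returns
    (x0, y0, x1, y1) in pixel coordinates.
    """
    half = size // 2
    x0, y0 = max(0, cx - half), max(0, cy - half)
    x1, y1 = min(img_w, cx + half), min(img_h, cy + half)
    return (x0, y0, x1, y1)
```

The resulting boxes can then serve as "strong" labels for training a standard object detection network.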


Author(s):  
Kanji Tanaka

Although image change detection (ICD) methods provide good detection accuracy for many scenarios, most existing methods rely on place-specific background modeling. The time/space cost for such place-specific models is prohibitive for large-scale scenarios, such as long-term robotic visual simultaneous localization and mapping (SLAM). Therefore, we propose a novel ICD framework that is specifically customized for long-term SLAM. This study is inspired by the multi-map-based SLAM framework, where multiple maps can perform mutual diagnosis and hence do not require any explicit background modeling/model. We extend this multi-map-based diagnosis approach to a more generic single-map-based object-level diagnosis framework (i.e., ICD), where the self-localization module of SLAM, which is the change object indicator, can be used in its original form. Furthermore, we consider map diagnosis on a state-of-the-art deep convolutional neural network (DCN)-based SLAM system (instead of on conventional bag-of-words or landmark-based systems), in which the blackbox nature of the DCN complicates the diagnosis problem. Additionally, we consider a three-dimensional point cloud (PC)-based (instead of typical monocular color image-based) SLAM and adopt a state-of-the-art scan context PC descriptor for map diagnosis for the first time.


2020 ◽  
Vol 2020 (4) ◽  
pp. 76-1-76-7
Author(s):  
Swaroop Shankar Prasad ◽  
Ofer Hadar ◽  
Ilia Polian

Image steganography can have legitimate uses, for example, augmenting an image with a watermark for copyright reasons, but it can also be utilized for malicious purposes. We investigate the detection of malicious steganography using neural network-based classification when images are transmitted through a noisy channel. Noise makes detection harder because the classifier must not only detect perturbations in the image but also decide whether they are due to malicious steganographic modifications or to natural noise. Our results show that reliable detection is possible even for state-of-the-art steganographic algorithms that insert stego bits without affecting an image’s visual quality. The detection accuracy is high (above 85%) if the payload, i.e., the amount of steganographic content in an image, exceeds a certain threshold. At the same time, noise critically affects the steganographic information being transmitted, both through desynchronization (destruction of the information about which bits of the image carry steganographic content) and by flipping these bits themselves. This forces the adversary to use a redundant encoding with a substantial number of error-correction bits for reliable transmission, making detection feasible even for small payloads.
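The redundancy argument can be made concrete with the simplest error-correcting scheme, a repetition code (used here purely as a toy; the paper does not specify the adversary's code): each payload bit is sent r times and recovered by majority vote, which is exactly the kind of payload inflation that makes the stego content easier to detect.

```python
def encode_repetition(bits, r=3):
    """r-fold repetition encoding: each payload bit is embedded r times."""
    return [b for b in bits for _ in range(r)]

def decode_repetition(noisy, r=3):
    """Majority-vote decoding of an r-fold repetition code.

    Survives up to floor(r/2) flipped bits per group, at the cost of
    multiplying the embedded payload size by r.
    """
    out = []
    for i in range(0, len(noisy), r):
        chunk = noisy[i:i + r]
        out.append(1 if sum(chunk) > len(chunk) / 2 else 0)
    return out
```

A 3-fold code already triples the number of stego bits per message bit, pushing even small logical payloads over the detectability threshold.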


2020 ◽  
Vol 17 (3) ◽  
pp. 172988142092566
Author(s):  
Dahan Wang ◽  
Sheng Luo ◽  
Li Zhao ◽  
Xiaoming Pan ◽  
Muchou Wang ◽  
...  

Fire is a fierce disaster, and smoke is an early signal of fire. Since features such as the chrominance, texture, and shape of smoke are very distinctive, many methods based on these features have been developed. But these static characteristics vary widely, so there are exceptions that lead to low detection accuracy. On the other hand, the motion of smoke is much more discriminating than the aforementioned features, so a time-domain neural network is proposed to extract its dynamic characteristics. This smoke recognition network has these advantages: (1) it extracts spatiotemporal features with 3D filters, which operate on dynamic and static characteristics synchronously; (2) high accuracy, with 87.31% of samples classified correctly, which is the state of the art even in chaotic environments, and objects that confuse other methods, such as haze, fog, and moving cars, are distinguished clearly; (3) high sensitivity, with smoke detected on average at the 23rd frame, which is also the state of the art and is meaningful for raising a fire alarm as early as possible; and (4) it is not based on any prior hypothesis, which keeps the method broadly applicable. Finally, a new metric, the difference between the first frame in which smoke is detected and the first frame in which smoke appears, is proposed to compare the sensitivity of algorithms on videos. The experiments confirm that the dynamic characteristics are more discriminating than the aforementioned static characteristics, and that the smoke recognition network is a good tool for extracting compound features.
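The proposed frame-latency metric is simple enough to state directly in code (the representation of detections as a list of flagged frame indices is an assumption for illustration):

```python
def detection_latency(first_smoke_frame, detections):
    """Sensitivity metric from the abstract: the difference between the
    first frame in which smoke is detected and the first frame in
    which smoke actually appears.

    detections: iterable of frame indices the algorithm flagged as
    smoke. Flags before smoke onset are treated as false alarms and
    ignored; returns None if smoke is never detected.
    """
    hits = [f for f in detections if f >= first_smoke_frame]
    return (min(hits) - first_smoke_frame) if hits else None
```

A smaller latency means an earlier alarm; comparing this value across algorithms on the same videos ranks their sensitivity.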


Author(s):  
My Kieu ◽  
Andrew D. Bagdanov ◽  
Marco Bertini

Pedestrian detection is a canonical problem for safety and security applications, and it remains a challenging problem due to the highly variable lighting conditions in which pedestrians must be detected. This article investigates several domain adaptation approaches to adapt RGB-trained detectors to the thermal domain. Building on our earlier work on domain adaptation for privacy-preserving pedestrian detection, we conducted an extensive experimental evaluation comparing top-down and bottom-up domain adaptation, and we also propose two new bottom-up domain adaptation strategies. For top-down domain adaptation, we leverage a detector pre-trained on RGB imagery and efficiently adapt it to perform pedestrian detection in the thermal domain. Our bottom-up domain adaptation approaches include two steps: first, an adapter segment corresponding to the initial layers of the RGB-trained detector is trained to adapt to the new input distribution; then, we reconnect the adapter segment to the original RGB-trained detector for final adaptation with a top-down loss. To the best of our knowledge, our bottom-up domain adaptation approaches outperform the best-performing single-modality pedestrian detection results on KAIST and outperform the state of the art on FLIR.
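The reconnection step of the bottom-up scheme can be sketched as plain function composition: a retrained adapter segment handles the thermal input distribution and then feeds the original RGB-trained trunk (both stages are plain callables in this toy; in the real system they are CNN layer blocks, and a top-down loss fine-tunes the combined model afterwards).

```python
def make_adapted_detector(adapter, rgb_trunk):
    """Compose a thermal-input adapter with an RGB-trained trunk.

    adapter: stand-in for the retrained initial layers.
    rgb_trunk: stand-in for the remaining RGB-trained detector layers.
    Returns a callable detector over thermal inputs. This is only a
    structural sketch of the bottom-up reconnection step.
    """
    def detector(thermal_image):
        return rgb_trunk(adapter(thermal_image))
    return detector
```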


2021 ◽  
Vol 11 (11) ◽  
pp. 4894
Author(s):  
Anna Scius-Bertrand ◽  
Michael Jungo ◽  
Beat Wolf ◽  
Andreas Fischer ◽  
Marc Bui

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems that reduce the required amount of annotation. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages generated with a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.
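The core of each self-training round is pseudo-label selection: the detector trained on synthetic pages is run on real manuscript images, and only its confident detections are kept as labels for the next round. A minimal sketch of that filtering step (the prediction format and the threshold value are assumptions, not taken from the article):

```python
def select_pseudo_labels(predictions, conf_threshold=0.8):
    """Keep only confident detections as pseudo-labels.

    predictions: iterable of (box, label, score) triples from the
    current detector on unlabeled real pages. Detections below the
    confidence threshold are discarded so that noisy labels do not
    propagate into the next self-training round.
    """
    return [(box, label) for box, label, score in predictions
            if score >= conf_threshold]
```

Iterating train-on-synthetic, pseudo-label, retrain gradually shifts the model from the font-rendered domain to the real handwriting domain without any human annotation.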

