A New Edge Patch with Rotation Invariance for Object Detection and Pose Estimation

Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 887 ◽  
Author(s):  
Xunwei Tong ◽  
Ruifeng Li ◽  
Lianzheng Ge ◽  
Lijun Zhao ◽  
Ke Wang

Local patch-based methods for object detection and pose estimation are promising. However, to the best of the authors’ knowledge, traditional red-green-blue and depth (RGB-D) patches contain scene interference (foreground occlusion and background clutter) and have little rotation invariance. To solve these problems, a new edge patch is proposed and evaluated in this study. The edge patch is a local RGB-D patch sampled around an edge pixel of the depth image. Because it is sampled along a canonical orientation given by the normal direction of the depth edge, the edge patch is rotation invariant. A depth-detection step removes scene interference from the edge patch, which improves robustness. The framework of the edge patch-based method is described, and the method was evaluated on three public datasets. Compared with existing methods, the proposed method achieved a higher average F1-score (0.956) on the Tejani dataset and a better average detection rate (62%) on the Occlusion dataset, even under severe scene interference. These results show that the proposed method has higher detection accuracy and stronger robustness.
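The edge patch combines two ideas: sampling along a canonical orientation derived from the depth-edge normal, and a depth check that discards interfering pixels. A minimal depth-only sketch follows, assuming the edge normal is approximated by the central-difference depth gradient, nearest-neighbour sampling, and a hypothetical `max_depth_gap` threshold; the paper's actual sampling and depth-detection rules may differ.

```python
import math


def sample_edge_patch(depth, cy, cx, half=2, step=1.0, max_depth_gap=0.05):
    """Sample a (2*half+1)^2 patch around edge pixel (cy, cx).

    The patch grid is rotated so its u-axis follows the depth gradient
    (assumed stand-in for the edge normal), and samples whose depth differs
    from the centre by more than max_depth_gap are masked out as None.
    depth: 2D list of floats; 0.0 marks invalid pixels.
    """
    h, w = len(depth), len(depth[0])

    def d(y, x):
        # Clamp to the image border for the gradient estimate
        return depth[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Central-difference depth gradient gives the canonical orientation
    gx = (d(cy, cx + 1) - d(cy, cx - 1)) / 2.0
    gy = (d(cy + 1, cx) - d(cy - 1, cx)) / 2.0
    theta = math.atan2(gy, gx)
    c, s = math.cos(theta), math.sin(theta)
    z0 = depth[cy][cx]

    patch = []
    for v in range(-half, half + 1):
        row = []
        for u in range(-half, half + 1):
            # Rotate the sampling grid into the canonical frame
            x = cx + step * (u * c - v * s)
            y = cy + step * (u * s + v * c)
            xi, yi = int(round(x)), int(round(y))  # nearest-neighbour sampling
            if not (0 <= yi < h and 0 <= xi < w):
                row.append(None)
                continue
            z = depth[yi][xi]
            # Depth check: drop invalid pixels and pixels far from the centre depth
            if z <= 0.0 or abs(z - z0) > max_depth_gap:
                row.append(None)
            else:
                row.append(z)
        patch.append(row)
    return patch
```

Sampling the same vertical depth step and its 90-degree-rotated copy yields identical patches, which is the rotation invariance the abstract describes; the far side of the step is masked by the depth check.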

Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1032 ◽  
Author(s):  
Mingliang Fu ◽  
Weijia Zhou

In recent years, estimating the 6D pose of object instances with convolutional neural networks (CNNs) has received considerable attention. Depending on whether intermediate cues are used, the relevant literature can be roughly divided into two broad categories: direct methods and two-stage pipelines. In the latter, a CNN regresses intermediate cues, such as 3D object coordinates, semantic keypoints, or virtual control points, instead of pose parameters in the first stage. Object pose can then be solved from correspondence constraints constructed with these intermediate cues. In this paper, we focus on the postprocessing of a two-stage pipeline and propose to combine two learning concepts for estimating object pose in challenging scenes: projection grouping on one side, and correspondence learning on the other. We first employ a local-patch-based method to predict projection heatmaps, which denote the confidence distribution of the projections of the 3D bounding box’s corners. A projection grouping module is then proposed to remove redundant local maxima from each layer of heatmaps. Instead of directly feeding 2D–3D correspondences to the perspective-n-point (PnP) algorithm, multiple correspondence hypotheses are sampled from the local maxima and their neighborhoods and ranked by a correspondence-evaluation network. Finally, correspondences with higher confidence are selected to determine the object pose. Extensive experiments on three public datasets demonstrate that the proposed framework outperforms several state-of-the-art methods.
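The grouping step removes redundant local maxima from each heatmap layer before correspondence hypotheses are sampled. The sketch below is only a stand-in, assuming one 2D heatmap per box corner stored as nested lists, 8-neighbour peak detection, and greedy distance-based suppression with a hypothetical `radius`; the abstract does not specify how the paper's projection grouping module works.

```python
def local_maxima(heatmap, threshold=0.1):
    """Return (score, y, x) for pixels >= all 8 neighbours and >= threshold."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((v, y, x))
    return peaks


def group_peaks(peaks, radius=3.0):
    """Greedy grouping: keep the strongest peak, drop weaker ones within radius."""
    kept = []
    for v, y, x in sorted(peaks, reverse=True):
        if all((y - ky) ** 2 + (x - kx) ** 2 > radius ** 2 for _, ky, kx in kept):
            kept.append((v, y, x))
    return kept
```

Each surviving peak (and, per the abstract, its neighbourhood) would then supply candidate 2D points for the 2D-3D correspondence hypotheses.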


2021 ◽  
Vol 7 (1) ◽  
pp. 53
Author(s):  
Ángel Carro-Lagoa ◽  
Valentín Barral ◽  
Miguel González-López ◽  
Carlos J. Escudero ◽  
Luis Castedo

Indoor positioning systems usually rely on RF-based devices that must be carried by the targets, which is not viable in certain use cases. Recent advances in AI have increased the reliability of person detection in images, thus enabling the use of surveillance cameras for person localization and tracking. This paper evaluates the performance of indoor person location using cameras and edge devices with AI accelerators. We describe the video processing performed in each edge device, including the selected AI models and the post-processing of their outputs to obtain the positions of the detected persons and allow them to be tracked. Person location is based on pose estimation models, as they provide better results than object detection networks in occlusion situations. Experimental results obtained with public datasets show the feasibility of the solution.
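The abstract says location is derived from pose-estimation output rather than detection boxes. One plausible post-processing step, sketched here under stated assumptions, takes the visible ankle keypoints as the ground-contact point and maps it onto the floor with a pre-calibrated 3x3 image-to-floor homography. The keypoint names (borrowed from the COCO convention), the confidence threshold, and the homography are all assumptions; the paper's actual post-processing is not described in the abstract.

```python
def apply_homography(H, u, v):
    """Map image pixel (u, v) to floor-plane coordinates with 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return x / w, y / w


def person_floor_position(keypoints, H, min_conf=0.3):
    """Estimate a person's floor position from ankle keypoints.

    keypoints: dict name -> (u, v, confidence), e.g. from a 2D pose model.
    Returns (x, y) on the floor plane, or None if no ankle is visible.
    """
    ankles = [keypoints[k] for k in ("left_ankle", "right_ankle")
              if k in keypoints and keypoints[k][2] >= min_conf]
    if not ankles:
        return None
    # Mean of the visible ankles is used as the ground-contact point
    u = sum(a[0] for a in ankles) / len(ankles)
    v = sum(a[1] for a in ankles) / len(ankles)
    return apply_homography(H, u, v)
```

Using ankles rather than a box bottom edge is one way pose output can help under occlusion: a box around a partly hidden person may not reach the floor, while a single visible ankle still gives a ground point.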


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Shiming Dai ◽  
Wei Liu ◽  
Wenji Yang ◽  
Lili Fan ◽  
Jihao Zhang

3D hand pose estimation provides basic information about gestures, which is of great significance in the fields of Human-Machine Interaction (HMI) and Virtual Reality (VR). In recent years, 3D hand pose estimation from a single depth image has advanced considerably owing to the development of depth cameras. However, 3D hand pose estimation from a single RGB image remains highly challenging. In this work, we propose a novel four-stage cascaded hierarchical CNN (4CHNet), which uses a hierarchical network to decompose hand pose estimation into finger pose estimation and palm pose estimation, extracts finger features and palm features separately, and finally fuses them to estimate the 3D hand pose. Compared with direct estimation methods, the hand features extracted by the hierarchical network are more representative. Furthermore, concatenating the stages of the network for end-to-end training lets the stages benefit from one another. Experimental results on two public datasets demonstrate that 4CHNet significantly improves the accuracy of 3D hand pose estimation from a single RGB image.
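Decomposing hand pose into palm and finger sub-problems implies a fixed partition of the hand joints and a fusion step that reassembles the two branch outputs. The sketch below shows only that bookkeeping, assuming a common 21-joint layout and a hypothetical palm subset (wrist plus the base joint of each finger chain); the abstract does not specify 4CHNet's joint grouping, and the CNN branches themselves are omitted.

```python
# Hypothetical 21-joint layout: 0 = wrist, then 4 joints per finger
# (thumb 1-4, index 5-8, middle 9-12, ring 13-16, little 17-20).
PALM_IDS = [0, 1, 5, 9, 13, 17]  # wrist + first joint of each finger chain
FINGER_IDS = [i for i in range(21) if i not in PALM_IDS]


def split_hand(joints):
    """Split 21 (x, y, z) joints into palm and finger subsets."""
    if len(joints) != 21:
        raise ValueError("expected 21 joints")
    palm = [joints[i] for i in PALM_IDS]
    fingers = [joints[i] for i in FINGER_IDS]
    return palm, fingers


def fuse_hand(palm, fingers):
    """Reassemble palm and finger estimates (e.g. from two branches) into 21 joints."""
    joints = [None] * 21
    for i, p in zip(PALM_IDS, palm):
        joints[i] = p
    for i, f in zip(FINGER_IDS, fingers):
        joints[i] = f
    return joints
```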


2011 ◽  
Vol 109 ◽  
pp. 285-289
Author(s):  
Xue Na Qiu ◽  
Shi Rong Liu ◽  
Jia Tao Song

This paper studies generic object detection. In view of the complexity and diversity of generic objects, it proposes a Boosting-based generic object detection method combined with bag-of-words. The Boosting method detects efficiently but produces some false detections owing to the diversity and complexity of the objects. The bag-of-words method, by contrast, offers local patch features, simplicity, and robustness, and classifies complex objects well. The proposed method applies bag-of-words to remove the false detections and improve the tracking results of Boosting, and thus achieves high generic object detection accuracy.
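The combination described above uses bag-of-words as a second-stage filter on Boosting detections. A minimal sketch, assuming a precomputed visual vocabulary, hard nearest-word assignment of local patch descriptors, a reference histogram for the target class, and histogram intersection with a hypothetical `min_similarity` threshold standing in for whatever classifier the paper trains:

```python
def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(descriptor, vocabulary[k])))


def bow_histogram(descriptors, vocabulary):
    """L1-normalised bag-of-words histogram of local patch descriptors."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))


def verify_detections(detections, vocabulary, object_model, min_similarity=0.5):
    """Keep Boosting detections whose BoW histogram resembles the object model.

    detections: list of (box, descriptors) pairs from the Boosting detector.
    object_model: reference BoW histogram of the target class.
    """
    kept = []
    for box, descriptors in detections:
        h = bow_histogram(descriptors, vocabulary)
        if histogram_intersection(h, object_model) >= min_similarity:
            kept.append(box)
    return kept
```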


2021 ◽  
Vol 11 (11) ◽  
pp. 4894
Author(s):  
Anna Scius-Bertrand ◽  
Michael Jungo ◽  
Beat Wolf ◽  
Andreas Fischer ◽  
Marc Bui

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between the text in the scanned image and its existing Unicode counterpart; this correspondence can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems that reduce the required amount of annotations. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages using a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.
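Alignment turns detector output plus an existing Unicode transcription into labelled training data, which is what self-training consumes. A minimal sketch of one conservative alignment rule, assuming a single column of characters read top to bottom, a hypothetical confidence threshold, and strict count matching (columns whose counts disagree are skipped); the paper's actual alignment procedure is not detailed in the abstract.

```python
def align_column(detections, transcription, min_score=0.5):
    """Pair detected character boxes with the known transcription of one column.

    detections: list of (y_top, box, score) tuples from the character detector.
    transcription: Unicode string for this column, in reading order.
    Returns (box, char) pseudo-labels, or [] if the counts do not match.
    """
    confident = [d for d in detections if d[2] >= min_score]
    if len(confident) != len(transcription):
        return []  # ambiguous: skip this column rather than guess
    ordered = sorted(confident, key=lambda d: d[0])  # assumed order: top to bottom
    return [(box, ch) for (_, box, _), ch in zip(ordered, transcription)]
```

Skipping mismatched columns trades recall for label quality; in a self-training loop, later rounds with a better detector can recover some of the skipped columns.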


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1299
Author(s):  
Honglin Yuan ◽  
Tim Hoogenkamp ◽  
Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset, consisting of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups by the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images help increase the performance of pose estimation algorithms.
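The abstract states that 2D bounding boxes are produced automatically from the generated data. One straightforward way, assumed here rather than taken from the paper, is to take the tight axis-aligned box around each object's segmentation mask:

```python
def mask_to_bbox(mask):
    """Tight axis-aligned box (x_min, y_min, x_max, y_max) around nonzero pixels.

    mask: 2D list of 0/1 values. Coordinates are inclusive pixel indices.
    Returns None for an empty mask.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return min(xs), min(ys), max(xs), max(ys)
```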


2021 ◽  
Vol 104 (2) ◽  
pp. 003685042110113
Author(s):  
Xianghua Ma ◽  
Zhenkun Yang

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. It is widely recognized that although lightweight object detectors have a high detection speed, their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: their kernels have different aspect ratios, which can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we apply three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features of different scales by assigning learnable weights to the three asymmetric convolution branches. MAC blocks can be inserted into state-of-the-art backbones such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO 2014 datasets. Experimental results show that detection precision is greatly improved while a fast detection speed is maintained.
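The MAC block runs three asymmetric convolutions in parallel and merges them with learnable branch weights. A single-channel, forward-pass-only sketch follows; the kernel shapes (1x3, 3x1, 1x5 in the usage below) and the softmax normalisation of the branch weights are assumptions, and the real block's channel counts, normalisation layers, and training are omitted.

```python
import math


def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding; output has the input's size.

    Assumes odd kernel height and width so the kernel has a centre.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def mac_block(image, kernels, logits):
    """Weighted fusion of parallel asymmetric convolution branches.

    kernels: one kernel per branch, e.g. of shapes 1x3, 3x1, 1x5.
    logits: one learnable scalar per branch; softmax turns them into weights.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    branches = [conv2d_same(image, k) for k in kernels]
    h, w = len(image), len(image[0])
    return [[sum(wt * b[y][x] for wt, b in zip(weights, branches))
             for x in range(w)] for y in range(h)]
```

With a single bright pixel as input, the horizontal kernels spread it along its row and the vertical kernel along its column, so the fused response shows how each aspect ratio contributes in proportion to its branch weight.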


2021 ◽  
Author(s):  
Timon Hofer ◽  
Faranak Shamsafar ◽  
Nuri Benbarka ◽  
Andreas Zell

2021 ◽  
Vol 11 (8) ◽  
pp. 3531
Author(s):  
Hesham M. Eraqi ◽  
Karim Soliman ◽  
Dalia Said ◽  
Omar R. Elezaby ◽  
Mohamed N. Moustafa ◽  
...  

Extensive research efforts have been devoted to identifying and improving roadway features that impact safety. Maintaining roadway safety features relies on costly manual road surveying and data analysis. This paper introduces an automatic roadway safety features detection approach, which harnesses the potential of artificial intelligence (AI) computer vision to make the process more efficient and less costly. Given a front-facing camera and a global positioning system (GPS) sensor, the proposed system automatically evaluates ten roadway safety features. The system is composed of an oriented (or rotated) object detection model, which solves an orientation encoding discontinuity problem to improve detection accuracy, and a rule-based roadway safety evaluation module. To train and validate the proposed model, a fully annotated dataset for roadway safety features extraction was collected, covering 473 km of roads. The baseline results of the proposed method are encouraging when compared to state-of-the-art models. Different oriented object detection strategies are presented and discussed, and the developed model improved the mean average precision (mAP) by 16.9% compared with the literature. The average prediction accuracy for roadway safety features is 84.39%, ranging from 63.12% to 91.11%. The introduced model can pervasively enable/disable autonomous driving (AD) based on the safety features of the road, and empower connected vehicles (CV) to send and receive estimated safety features, alerting drivers to black spots or relatively less-safe segments or roads.
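An orientation encoding discontinuity arises when a box angle is regressed directly: a rectangle at +89 degrees and one at -89 degrees are nearly identical, yet their raw angles differ by 178 degrees. One widely used remedy, shown here as an illustration and not necessarily the one used in this paper, is to regress the doubled angle as a (cos, sin) pair, since a rectangle's orientation repeats every 180 degrees:

```python
import math


def encode_angle(theta):
    """Encode a box orientation (radians) as a continuous 2-vector.

    A rectangle rotated by theta and by theta + pi is the same box, so the
    angle is doubled before taking cos/sin; orientations on either side of
    the wrap-around point then get nearby encodings.
    """
    return math.cos(2.0 * theta), math.sin(2.0 * theta)


def decode_angle(c, s):
    """Recover theta in [-pi/2, pi/2) from a (possibly unnormalised) encoding."""
    theta = 0.5 * math.atan2(s, c)
    # atan2 lies in [-pi, pi], so theta lies in [-pi/2, pi/2]; fold pi/2 onto -pi/2
    if theta >= math.pi / 2:
        theta -= math.pi
    return theta
```

Because `atan2` ignores the overall scale of its arguments, a network's raw two-value output can be decoded without normalising it first.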

