Affordance-Based Grasping Point Detection Using Graph Convolutional Networks for Industrial Bin-Picking Applications

Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 816
Author(s):  
Ander Iriondo ◽  
Elena Lazkano ◽  
Ander Ansuategi

Grasping point detection has traditionally been a core robotic and computer vision problem. In recent years, deep learning based methods have been widely used to predict grasping points, and have shown strong generalization capabilities under uncertainty. In particular, approaches that predict object affordances without relying on object identity have obtained promising results in random bin-picking applications. However, most of them rely on RGB/RGB-D images, and it is not clear to what extent 3D spatial information is used. Graph Convolutional Networks (GCNs) have been successfully used for object classification and scene segmentation in point clouds, and to predict grasping points in simple laboratory experiments. In the present proposal, we adapted the Deep Graph Convolutional Network model with the intuition that learning from n-dimensional point clouds would boost performance in predicting object affordances. To the best of our knowledge, this is the first time that GCNs have been applied to predict affordances for suction and gripper end effectors in an industrial bin-picking environment. Additionally, we designed a bin-picking oriented data preprocessing pipeline which helps ease the learning process and creates a flexible solution for any bin-picking application. To train our models, we created a highly accurate RGB-D/3D dataset which is openly available on demand. Finally, we benchmarked our method against a 2D Fully Convolutional Network based method, improving the top-1 precision score by 1.8% and 1.7% for suction and gripper, respectively.
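As a rough illustration of what a DGCNN-style edge convolution computes on a point cloud (the paper adapts the Deep Graph Convolutional Network model; the function names, shapes, and weights below are illustrative, not the authors' code), a minimal numpy sketch:

```python
import numpy as np

def knn(points, k):
    # Indices of the k nearest neighbors of each point (self excluded).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def edge_conv(points, feats, k, weight):
    # EdgeConv-style aggregation: for each point, build edge features
    # [x_i, x_j - x_i] over its k-NN graph, project them, then max-pool.
    idx = knn(points, k)
    x_i = np.repeat(feats[:, None, :], k, axis=1)      # (N, k, F)
    x_j = feats[idx]                                   # (N, k, F)
    edges = np.concatenate([x_i, x_j - x_i], axis=-1)  # (N, k, 2F)
    return np.maximum(edges @ weight, 0).max(axis=1)   # ReLU, max over neighbors

rng = np.random.default_rng(0)
cloud = rng.normal(size=(32, 3))   # a toy 32-point cloud
w = rng.normal(size=(6, 8))        # 2F = 6 -> 8 output channels
out = edge_conv(cloud, cloud, k=4, weight=w)
```

Stacking such layers on n-dimensional point features is what lets the network exploit 3D spatial structure directly, rather than going through an RGB-D projection.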

2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved, and existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we proposed an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the RNN (LSTM, GRU) defects of vanishing gradients and insufficient performance, and this yields notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and accuracy is improved by 2.4% on the GRID dataset.
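The TCN's core operation is a causal (optionally dilated) 1-D convolution, which is what sidesteps the recurrent gradient path of LSTM/GRU. A minimal sketch of that operation with illustrative names (not the paper's implementation):

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    # Causal dilated 1-D convolution: the output at time t depends only
    # on inputs at times <= t (achieved by left-padding with zeros).
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(6, dtype=float)                       # [0, 1, 2, 3, 4, 5]
y = causal_conv1d(x, np.array([1.0, 1.0]), dilation=2)  # each y[t] = x[t] + x[t-2]
```

Because every output is a fixed-depth sum over past inputs, gradients flow through a short static path instead of being multiplied through time steps as in an RNN.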


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 144
Author(s):  
Yuexing Han ◽  
Xiaolong Li ◽  
Bing Wang ◽  
Lu Wang

Image segmentation plays an important role in the field of image processing, helping to understand images and recognize objects. However, most existing methods are often unable to effectively explore the spatial information in 3D image segmentation, and they neglect the information from the contours and boundaries of the observed objects. In addition, shape boundaries can help to locate the positions of the observed objects, but most existing loss functions neglect boundary information. To overcome these shortcomings, this paper presents a new cascaded 2.5D fully convolutional network (FCN) learning framework to segment 3D medical images. A new boundary loss that incorporates distance, area, and boundary information is also proposed for the cascaded FCNs to learn more boundary and contour features from the 3D medical images. Moreover, an effective post-processing method is developed to further improve the segmentation accuracy. We verified the proposed method on the LITS and 3DIRCADb datasets, which include the liver and tumors. The experimental results show that the performance of the proposed method is better than that of existing methods, with a Dice Per Case score of 74.5% for tumor segmentation, indicating the effectiveness of the proposed method.
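The paper's boundary loss combines distance, area, and boundary information; its exact formulation is in the paper, but a hedged sketch of one common way to mix a region (Dice) term with a distance-weighted boundary term looks like this (the signed distance map is assumed precomputed, negative inside the object):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss on probability maps: the region/area overlap term.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def boundary_loss(pred, signed_dist):
    # Boundary term: probabilities weighted by the signed distance to the
    # ground-truth contour, so errors far from the boundary cost more.
    return (pred * signed_dist).mean()

def combined_loss(pred, target, signed_dist, alpha=0.5):
    # Weighted mix of the region term and the boundary term.
    return alpha * dice_loss(pred, target) + (1 - alpha) * boundary_loss(pred, signed_dist)

mask = np.zeros((8, 8)); mask[2:6, 2:6] = 1.0
sd = np.where(mask > 0, -1.0, 1.0)   # toy signed distance: -1 inside, +1 outside
loss_perfect = combined_loss(mask, mask, sd)
```

A perfect prediction drives the Dice term to zero while the boundary term stays negative, rewarding mass placed inside the true contour.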


Author(s):  
Shengsheng Qian ◽  
Jun Hu ◽  
Quan Fang ◽  
Changsheng Xu

In this article, we focus on the fake news detection task and aim to automatically identify fake news among the vast volume of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, there are three existing challenges: (i) How to represent social media posts effectively, since post content is varied and highly complicated; (ii) how to propose a data-driven method to increase the flexibility of the model to deal with samples in different contexts and news backgrounds; and (iii) how to fully utilize the additional auxiliary information (background knowledge and multi-modal information) of posts for better representation learning. To tackle the above challenges, we propose a novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Network (KMAGCN) to capture semantic representations by jointly modeling the textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, the proposed KMAGCN addresses the challenges from three aspects: (1) It models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts and visual information jointly for model learning. We have conducted extensive experiments on three public real-world datasets and superior results demonstrate the effectiveness of KMAGCN compared with other state-of-the-art algorithms.
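One way to read the "adaptive graph convolutional network" idea is that propagation runs over a blend of the fixed post graph and a learned, data-driven adjacency. A toy numpy sketch of that reading (the blending scheme and all names here are assumptions, not the authors' KMAGCN code):

```python
import numpy as np

def normalize_adj(a):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    a = a + np.eye(len(a))
    d_inv = 1.0 / np.sqrt(a.sum(1))
    return a * d_inv[:, None] * d_inv[None, :]

def adaptive_gcn_layer(x, a_fixed, a_learned, w):
    # Propagate over a blend of the fixed graph and a learned adjacency
    # (ReLU keeps the learned contribution non-negative): ReLU(A_hat X W).
    a_hat = normalize_adj(a_fixed + np.maximum(a_learned, 0.0))
    return np.maximum(a_hat @ x @ w, 0.0)

rng = np.random.default_rng(0)
ring = np.zeros((4, 4))                       # a toy 4-node post graph
for i in range(4):
    ring[i, (i + 1) % 4] = ring[(i + 1) % 4, i] = 1.0
s = rng.normal(size=(4, 4)); s = (s + s.T) / 2  # "learned" symmetric adjacency
out = adaptive_gcn_layer(rng.normal(size=(4, 3)), ring, s, rng.normal(size=(3, 5)))
```

Because the learned component varies with the input, the effective graph adapts per sample, which is what gives the model flexibility across contexts and news backgrounds.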


Author(s):  
Qianmu Yuan ◽  
Jianwen Chen ◽  
Huiying Zhao ◽  
Yaoqi Zhou ◽  
Yuedong Yang

Abstract Motivation Protein–protein interactions (PPI) play crucial roles in many biological processes, and identifying PPI sites is an important step for mechanistic understanding of diseases and design of novel drugs. Since experimental approaches for PPI site identification are expensive and time-consuming, many computational methods have been developed as screening tools. However, these methods are mostly based on neighboring features in sequence, and thus limited in capturing spatial information. Results We propose a deep graph-based framework, Graph convolutional network for Protein–Protein-Interacting Site prediction (GraphPPIS), where the PPI site prediction problem is converted into a graph node classification task and solved by deep learning using the initial residual and identity mapping techniques. We showed that a deeper architecture (up to eight layers) allows significant performance improvement over other sequence-based and structure-based methods, by more than 12.5% and 10.5% on AUPRC and MCC, respectively. Further analyses indicated that the interacting sites predicted by GraphPPIS are more spatially clustered and closer to the native ones even when false-positive predictions are made. The results highlight the importance of capturing spatially neighboring residues for interacting site prediction. Availability and implementation The datasets, the pre-computed features, and the source codes along with the pre-trained models of GraphPPIS are available at https://github.com/biomed-AI/GraphPPIS. The GraphPPIS web server is freely available at https://biomed.nscc-gz.cn/apps/GraphPPIS. Supplementary information Supplementary data are available at Bioinformatics online.
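The "initial residual and identity mapping" techniques are the two tricks that let a GCN go deep (here, up to eight layers) without oversmoothing. One layer of that style of update, sketched in numpy with illustrative hyperparameters (a sketch of the idea, not the GraphPPIS source):

```python
import numpy as np

def gcnii_layer(h, h0, a_hat, w, alpha=0.1, beta=0.5):
    # Initial residual: blend propagated features with the input layer H0,
    # so early-layer information survives deep stacks.
    support = (1 - alpha) * (a_hat @ h) + alpha * h0
    # Identity mapping: pull the weight matrix toward the identity, which
    # keeps each layer close to a no-op and stabilizes deep training.
    return np.maximum(support @ ((1 - beta) * np.eye(w.shape[0]) + beta * w), 0.0)

rng = np.random.default_rng(0)
h = np.abs(rng.normal(size=(6, 4)))   # non-negative toy residue features
# With beta=0 the weight collapses to the identity; with A_hat = I and
# h0 = h the layer should return its input unchanged.
fixed = gcnii_layer(h, h, np.eye(6), np.zeros((4, 4)), alpha=0.1, beta=0.0)
```

In GraphPPIS the graph nodes are residues and `a_hat` would come from a normalized spatial contact map, which is where the structure-based spatial signal enters.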


2018 ◽  
Vol 10 (11) ◽  
pp. 1827 ◽  
Author(s):  
Ahram Song ◽  
Jaewan Choi ◽  
Youkyung Han ◽  
Yongil Kim

Hyperspectral change detection (CD) can be effectively performed using deep-learning networks. Although these approaches require qualified training samples, it is difficult to obtain ground-truth data in the real world, and preserving spatial information during training is difficult due to structural limitations. To solve such problems, our study proposed a novel CD method for hyperspectral images (HSIs), including sample generation and a deep-learning network, called the recurrent three-dimensional (3D) fully convolutional network (Re3FCN), which merged the advantages of a 3D fully convolutional network (FCN) and a convolutional long short-term memory (ConvLSTM). Principal component analysis (PCA) and the spectral correlation angle (SCA) were used to generate training samples with high probabilities of being changed or unchanged. This strategy made it possible to train on fewer, yet representative, samples. The Re3FCN mainly comprised spectral–spatial and temporal modules. In particular, a spectral–spatial module with a 3D convolutional layer extracts the spectral–spatial features from the HSIs simultaneously, whilst a temporal module with ConvLSTM records and analyzes the multi-temporal HSI change information. The study first proposed a simple and effective method to generate samples for network training; this method can be applied effectively to cases with no training samples. Re3FCN can perform end-to-end detection for binary and multiple changes. Moreover, Re3FCN can receive multi-temporal HSIs directly as input without learning the characteristics of multiple changes. Finally, the network could extract joint spectral–spatial–temporal features and it preserved the spatial structure during the learning process through the fully convolutional structure. This study was the first to use a 3D FCN and a ConvLSTM for remote-sensing CD. To demonstrate the effectiveness of the proposed CD method, we performed binary and multi-class CD experiments.
Results revealed that the Re3FCN outperformed other conventional methods, such as change vector analysis, iteratively reweighted multivariate alteration detection, PCA-SCA, FCN, and the combination of 2D convolutional layers and fully connected LSTM.
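The spectral correlation angle used for sample generation measures the angle between mean-centered pixel spectra, so it is invariant to linear gain and offset; a small angle suggests an unchanged pixel. A minimal sketch of one common formulation (details may differ from the paper's):

```python
import numpy as np

def spectral_correlation_angle(x, y, eps=1e-12):
    # SCA: angle between mean-centered spectra. Centering makes the
    # measure invariant to additive offsets; the cosine normalization
    # makes it invariant to multiplicative gain.
    xc, yc = x - x.mean(), y - y.mean()
    cos = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

spec = np.array([1.0, 2.0, 3.0, 4.0])
ang_same = spectral_correlation_angle(spec, 2 * spec + 5)  # gain/offset only
ang_flip = spectral_correlation_angle(spec, -spec)         # anti-correlated
```

Thresholding the angle between co-registered pixels of the two acquisition dates yields the high-confidence "changed"/"unchanged" training samples the abstract describes.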


Author(s):  
M. Corongiu ◽  
A. Masiero ◽  
G. Tucci

Abstract. Nowadays, mobile mapping systems are widely used to quickly collect reliable geospatial information over relatively large areas; thanks to these characteristics, the number of applications and fields exploiting them is continuously increasing. Among such applications, mobile mapping systems have recently been considered by railway system managers to quickly produce and update a database of the geospatial features, or assets, of the railway system. Although several vehicles, devices and acquisition methods can be considered for data collection, the predominant one is probably a mobile mapping system mounted on a train, which moves along the railway tracks and enables the 3D reproduction of the entire railway track area. Given the large amount of data collected by such mobile mapping, automatic procedures have to be used to speed up the extraction of the spatial information of interest, i.e., asset positions and characteristics. This paper considers the problem of extracting such information for cantilever and portal masts by exploiting a mixed approach. First, a set of candidate areas is extracted and pre-processed by considering certain geometric characteristics, mainly derived from the eigenvalues of the covariance matrix of a point neighborhood. Then, a 3D modified Fisher vector deep-learning neural net is used to classify the candidates. Tests of this approach were conducted in two areas of the Italian railway system.
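The geometric characteristics derived from the eigenvalues of a point neighborhood's covariance matrix are typically dimensionality descriptors such as linearity, planarity and sphericity. A minimal numpy sketch (the exact feature set used in the paper may differ):

```python
import numpy as np

def covariance_features(neigh):
    # Eigenvalue-based shape descriptors of a point neighborhood (N, 3):
    # linearity, planarity and sphericity from the sorted covariance
    # eigenvalues l1 >= l2 >= l3. A mast-like (linear) structure gives
    # linearity near 1; a wall gives high planarity; a blob high sphericity.
    cov = np.cov(neigh.T)
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return (l1 - l2) / l1, (l2 - l3) / l1, l3 / l1

rng = np.random.default_rng(0)
line = np.column_stack([np.linspace(0.0, 1.0, 50), np.zeros(50), np.zeros(50)])
lin, pla, sph = covariance_features(line)        # elongated neighborhood
_, _, sph_blob = covariance_features(rng.normal(size=(500, 3)))  # isotropic blob
```

Pre-filtering candidates by such descriptors is what lets the deep classifier run on a small set of mast-like regions rather than the full point cloud.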


Author(s):  
Zhichao Huang ◽  
Xutao Li ◽  
Yunming Ye ◽  
Michael K. Ng

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most existing GCN approaches are designed for homogeneous graphs with a single type of relation. However, heterogeneous graphs with multiple types of relations are also ubiquitous, and there is a lack of methodologies to tackle such graphs. Some previous studies address the issue by performing a conventional GCN on each single relation and then blending the results. However, as the convolutional kernels neglect the correlations across relations, this strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimensional convolution operator extends graph spectral analysis to the eigen-decomposition of a Laplacian tensor. The eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform rather than being limited to the Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs to solve the semi-supervised node classification task, and the results show the superiority of MR-GCN over the state-of-the-art competitors.
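The generalized tensor product behind the Laplacian-tensor eigen-decomposition applies a unitary transform along the relation mode, multiplies the frontal slices, and transforms back; with the DFT as the transform this reduces to the classical t-product. A toy numpy sketch of that product (illustrative, not the MR-GCN implementation):

```python
import numpy as np

def transform_product(a, b, m):
    # Generalized tensor product of (n, n, R) tensors: apply the unitary
    # transform M along the relation mode, multiply the frontal slices
    # pairwise, then apply the inverse (conjugate-transpose) transform.
    a_hat = np.einsum('rs,ijs->ijr', m, a)
    b_hat = np.einsum('rs,jks->jkr', m, b)
    c_hat = np.einsum('ijr,jkr->ikr', a_hat, b_hat)   # slice-wise matmul
    return np.einsum('rs,iks->ikr', np.conj(m).T, c_hat)

rng = np.random.default_rng(1)
a = rng.normal(size=(4, 4, 3))    # 4 nodes, 3 relation types
b = rng.normal(size=(4, 4, 3))
c_id = transform_product(a, b, np.eye(3))  # identity transform: plain slice products
```

Choosing a non-trivial unitary transform mixes information across relation slices before the multiplication, which is exactly how the operator captures cross-relation correlations that per-relation GCNs miss.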


Author(s):  
Y.-H. Lu ◽  
J.-Y. Han

Abstract. Global Navigation Satellite Systems (GNSS) are a mature modern technique for spatial data acquisition, whose performance depends strongly on the GNSS receiver's position. However, high-density buildings in urban areas cause signal obstructions and thus hinder GNSS serviceability. Consequently, GNSS positioning is weakened in urban areas, so deriving proper improvement resolutions is a necessity. Because topographic effects are considered the main factor that directly blocks signal transmission between satellites and receivers, this study integrated airborne LiDAR point clouds and a 2D building boundary map to provide reliable 3D spatial information for analyzing topographic effects. Using such vector data not only supported high-quality GNSS satellite visibility calculations, but also significantly reduced the data volume and processing time. A signal obstruction analysis technique and an optimized computational algorithm are also introduced. In conclusion, this paper proposes using a superimposed-column method to analyze the surroundings of GNSS receivers and thus improve GNSS satellite visibility predictions in an efficient and reliable manner.
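The superimposed-column visibility idea can be caricatured as: collapse all obstructions around the receiver into a maximum obstruction elevation per azimuth bin, then test each satellite against its bin. A minimal sketch (the bin size, data layout, and function names are assumptions, not the paper's algorithm):

```python
def visible_satellites(sat_az_el, horizon_el_by_az_bin, bin_deg=10):
    # sat_az_el: list of (azimuth_deg, elevation_deg) per satellite.
    # horizon_el_by_az_bin: maximum obstruction elevation (deg) for each
    # azimuth bin, precomputed from LiDAR/building data around the receiver.
    # A satellite is visible if its elevation clears its bin's obstruction.
    vis = []
    for az, el in sat_az_el:
        b = int(az // bin_deg) % len(horizon_el_by_az_bin)
        vis.append(el > horizon_el_by_az_bin[b])
    return vis

horizon = [30.0] * 36      # 10-degree bins; uniform 30-degree skyline...
horizon[4] = 60.0          # ...with a tall building toward azimuth 40-50
vis = visible_satellites([(45.0, 50.0), (45.0, 70.0), (185.0, 20.0)], horizon)
```

Precomputing one obstruction angle per azimuth column is what replaces repeated ray-casting against the full 3D model, which is the source of the reported data and runtime savings.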


Geophysics ◽  
2021 ◽  
pp. 1-66
Author(s):  
Guanqun Sheng ◽  
Shuangyu Yang ◽  
Xiaolong Guo ◽  
Xingong Tang

Arrival-time picking of microseismic events is a critical procedure in microseismic data processing. However, as field monitoring data contain many microseismic events with low signal-to-noise ratios (SNRs), traditional arrival-time picking methods based on the instantaneous characteristics of seismic signals cannot meet the picking accuracy and efficiency requirements of microseismic monitoring, owing to the large volume of monitoring data. Conversely, methods based on deep neural networks can significantly improve arrival-time picking accuracy and efficiency in low-SNR environments. Therefore, we propose a deep convolutional network that combines the U-net and DenseNet approaches to pick arrival times automatically. This novel network, called MSNet, not only retains the spatial information of any input signal or profile based on the U-net, but also extracts and integrates more essential features of events and non-events through dense blocks, thereby further improving the picking accuracy and efficiency. An effective workflow is developed to verify the superiority of the proposed method. First, we describe the structure of MSNet and the workflow of the proposed picking method. Then, datasets are constructed using variable microseismic traces from field microseismic monitoring records and from the finite-difference forward modeling of microseismic data to train the network. Subsequently, hyperparameter tuning is conducted to optimize the MSNet. Finally, we test the MSNet using modeled signals with different SNRs and field microseismic data from different monitoring areas. By comparing the picking results of the proposed method with those of the U-net and short-term average/long-term average (STA/LTA) methods, the effectiveness of the proposed method is verified.
The arrival picking results of synthetic data and microseismic field data show that the proposed network has increased adaptability and can achieve high accuracy for picking the arrival-time of microseismic events.
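The STA/LTA baseline the authors compare against declares a pick where the ratio of short-term to long-term average energy first exceeds a threshold. A minimal numpy sketch with trailing windows (window lengths and threshold are illustrative):

```python
import numpy as np

def sta_lta(signal, nsta, nlta):
    # Ratio of short-term to long-term average energy, trailing windows,
    # computed via cumulative sums for efficiency.
    e = np.asarray(signal, dtype=float) ** 2
    c = np.concatenate([[0.0], np.cumsum(e)])
    ratio = np.zeros(len(e))
    for t in range(nlta, len(e)):
        sta = (c[t + 1] - c[t + 1 - nsta]) / nsta   # short trailing window
        lta = (c[t + 1] - c[t + 1 - nlta]) / nlta   # long trailing window
        ratio[t] = sta / max(lta, 1e-12)
    return ratio

def pick_arrival(signal, nsta=5, nlta=50, threshold=3.0):
    # First sample where the STA/LTA ratio exceeds the threshold, or None.
    above = np.flatnonzero(sta_lta(signal, nsta, nlta) > threshold)
    return int(above[0]) if above.size else None

signal = np.concatenate([np.zeros(200), 5.0 * np.ones(100)])
pick = pick_arrival(signal)   # clean onset placed at sample 200
```

On clean signals such an instantaneous-characteristic picker works well; it is exactly in the low-SNR regime described above that its threshold crossings become unreliable, which motivates the learned MSNet picker.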


Author(s):  
Parian Haghighat ◽  
Aden Prince ◽  
Heejin Jeong

The growth in self-fitness mobile applications has encouraged people to turn to personal fitness, which entails integrating self-tracking applications with exercise motion data to reduce fatigue and mitigate the risk of injury. The advancements in computer vision and motion capture technologies hold great promise for improving exercise classification performance. This study investigates the performance of a supervised deep learning model, a Graph Convolutional Network (GCN), in classifying three workouts using motion data from the Azure Kinect device. The model defines the skeleton as a graph and combines GCN layers, a readout layer, and multi-layer perceptrons to build an end-to-end framework for graph classification. The model achieves an accuracy of 95.86% in classifying 19,442 frames. The current model exchanges feature information between each joint and its 1-nearest neighbor, whose impact fades in graph-level classification. Therefore, a future study on improved feature utilization could enhance the model's performance in classifying inter-user exercise variation.
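The described pipeline (skeleton as a graph, GCN layers, a readout layer, then a classifier) can be caricatured in a few lines of numpy; this is an illustrative sketch with random weights, not the study's trained model (which also stacks multi-layer perceptrons):

```python
import numpy as np

def normalize_adj(a):
    # Symmetrically normalized adjacency with self-loops.
    a = a + np.eye(len(a))
    d_inv = 1.0 / np.sqrt(a.sum(1))
    return a * d_inv[:, None] * d_inv[None, :]

def classify_skeleton(x, a, w_gcn, w_out):
    # GCN layer over the joint graph -> mean readout -> class logits.
    h = np.maximum(normalize_adj(a) @ x @ w_gcn, 0.0)  # graph convolution
    g = h.mean(axis=0)                                 # pool joints to one vector
    return g @ w_out                                   # logits over workout classes

rng = np.random.default_rng(0)
chain = np.zeros((5, 5))          # 5-joint chain as a toy skeleton graph
for i in range(4):
    chain[i, i + 1] = chain[i + 1, i] = 1.0
logits = classify_skeleton(rng.normal(size=(5, 3)), chain,
                           rng.normal(size=(3, 8)), rng.normal(size=(8, 3)))
```

The readout's mean pooling is also where the 1-nearest-neighbor feature exchange mentioned above gets diluted: averaging over all joints washes out locally exchanged information, which is the limitation the proposed future work targets.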

