Accurate Instance-Based Segmentation for Boundary Detection in Robot Grasping Application

Hong Hai Hoang; Bao Long Tran

doi:10.3390/app11094248

Applied Sciences ◽

10.3390/app11094248 ◽

2021 ◽

Vol 11 (9) ◽

pp. 4248

Author(s):

Hong Hai Hoang ◽

Bao Long Tran

Keyword(s):

Object Segmentation ◽

State Of The Art ◽

Rapid Development ◽

Spatial Relationship ◽

Learning Technologies ◽

Average Precision ◽

Novel Approach ◽

3D Camera ◽

Robot Grasping ◽

Instance Segmentation

With the rapid development of cameras and deep learning technologies, computer vision tasks such as object detection, object segmentation and object tracking are being widely applied in many fields. For robot grasping tasks, object segmentation aims to classify and localize objects, which helps robots pick objects accurately. The state-of-the-art instance segmentation framework, Mask Region-based Convolutional Neural Network (Mask R-CNN), does not always segment accurately at the edges or borders of objects. An approach using a 3D camera, by contrast, can easily extract entire (foreground) objects, but classifying them can be difficult or require a large amount of computation. We propose a novel approach in which we combine Mask R-CNN with 3D algorithms by adding a 3D processing branch for instance segmentation. The outcomes of the two branches are used together to classify the pixels at object edges by exploiting the spatial relationship between the edge region and the mask region. We analyze the effectiveness of the method by testing on harsh cases of object positions, for example, objects that are close to, overlapping or occluded by each other, to focus on edge and border segmentation. Our proposed method is about 4 to 7% higher and more stable in IoU (intersection over union). This leads to an mAP (mean Average Precision) of 46%, which is a higher accuracy than that of its counterpart. The feasibility experiment shows that our method could be a notable contribution to research on grasping robots.
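The IoU figure quoted above can be made concrete with a small sketch. The function below computes intersection over union for two binary masks stored as lists of rows. The masks and the name `mask_iou` are hypothetical, chosen only to illustrate how a prediction that misses an object's border lowers the score. This is not the paper's evaluation code.

```python
def mask_iou(pred, truth):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; define their IoU as 1.0.
    return inter / union if union else 1.0


# Hypothetical 4x4 masks: the prediction misses one border column of the object.
truth = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_iou(pred, truth))  # 6 shared pixels out of 9 in the union
```

In this toy case a single missed border column costs a third of the score, which is why edge errors weigh heavily on small objects.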

AVILNet: A New Pliable Network with a Novel Metric for Small-Object Segmentation and Detection in Infrared Images

Remote Sensing ◽

10.3390/rs13040555 ◽

2021 ◽

Vol 13 (4) ◽

pp. 555

Author(s):

Ikhwan Song ◽

Sungho Kim

Keyword(s):

Object Segmentation ◽

State Of The Art ◽

Threshold Value ◽

Small Object ◽

Infrared Images ◽

Average Precision ◽

Trade Off ◽

Good Evaluation ◽

Single Dataset ◽

Evaluation Metric

Infrared small-object segmentation (ISOS) has a persistent trade-off problem: which should come first, recall or precision? Striking a fine balance between them is fundamentally important for obtaining the best performance in real applications, such as surveillance, tracking, and many fields related to infrared searching and tracking. The F1-score may be a good evaluation metric for this problem. However, since the F1-score depends on a single specific threshold value, it cannot reflect the user’s requirements across various application environments. Therefore, several metrics are commonly used together. We introduce F-area, a novel metric for a panoptic evaluation of average precision and F1-score. It can simultaneously consider performance in real applications and the potential capability of a model. Furthermore, we propose a new network, called the Amorphous Variable Inter-located Network (AVILNet), which has a pliable structure based on GridNet and is also an ensemble network consisting of a main network and its sub-network. On the International Conference on Computer Vision 2019 ISOS Single dataset, our model achieved an AP of 51.69%, an F1-score of 63.03%, and an F-area of 32.58% using one generator, and an AP of 53.6%, an F1-score of 60.99%, and an F-area of 32.69% using dual generators, beating the existing best record among state-of-the-art ISOS methods (AP, 51.42%; F1-score, 57.04%; and F-area, 29.33%).
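A short sketch may help show why a single-threshold F1-score is fragile. The first part computes F1 at several thresholds on made-up scores and labels. The second part checks an observation about the reported figures: each F-area value equals the product of the corresponding AP and F1-score to two decimals. The abstract does not state that definition, so treat it as an inference from the numbers, not as the authors' formula. The data and the function name `f1_at_threshold` are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1-score when items scoring >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-pixel confidence scores and ground-truth labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
for t in (0.35, 0.5, 0.85):
    # The same model yields a different F1 at each threshold.
    print(t, round(f1_at_threshold(scores, labels, t), 3))

# (AP %, F1 %, F-area %) triples as reported in the abstract.
reported = [(51.69, 63.03, 32.58), (53.6, 60.99, 32.69), (51.42, 57.04, 29.33)]
for ap, f1, f_area in reported:
    # Assumption being checked: F-area ~= AP * F1 (as fractions).
    print(round(ap * f1 / 100, 2), f_area)
```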

WS-RCNN: Learning to Score Proposals for Weakly Supervised Instance Segmentation

Sensors ◽

10.3390/s21103475 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3475

Author(s):

Jia-Rong Ou ◽

Shu-Le Deng ◽

Jin-Gang Yu

Keyword(s):

State Of The Art ◽

Spatial Relationship ◽

The State ◽

Weak Supervision ◽

Large Margin ◽

Deep Network ◽

Heuristic Strategies ◽

Model Training ◽

Weakly Supervised ◽

Instance Segmentation

Weakly supervised instance segmentation (WSIS) provides a promising way to address instance segmentation in the absence of sufficient labeled data for training. Previous attempts at WSIS usually follow a proposal-based paradigm, in which the proposal scoring strategy is critical. These works mostly rely on heuristic strategies for proposal scoring, which largely hampers sustained progress in WSIS. To this end, this paper introduces a novel framework for weakly supervised instance segmentation, called Weakly Supervised R-CNN (WS-RCNN). The basic idea is to deploy a deep network that learns to score proposals under the special setting of weak supervision. To tackle the key issue of acquiring proposal-level pseudo labels for model training, we propose an Attention-Guided Pseudo Labeling (AGPL) strategy, which leverages the local maxima (peaks) in image-level attention maps and the spatial relationship among peaks and proposals to infer pseudo labels. We also suggest a novel training loss, called Entropic OpenSet Loss, to handle background proposals more effectively and so further improve robustness. Comprehensive experiments on two standard benchmarking datasets demonstrate that the proposed WS-RCNN can outperform the state of the art by a large margin, with an improvement of 11.6% on PASCAL VOC 2012 and 10.7% on MS COCO 2014 in terms of mAP50, which indicates that learning-based proposal scoring and the proposed WS-RCNN framework might be a promising way towards WSIS.
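The idea of relating attention peaks to proposals can be illustrated roughly as follows. This is not the AGPL strategy itself, whose rules the abstract does not spell out. The sketch only finds local maxima in a small attention map and reports which peaks fall inside each box proposal. The map, the boxes, the `min_value` cut-off and both function names are assumptions made for illustration.

```python
def local_maxima(grid, min_value=0.5):
    """Return (row, col) cells that exceed min_value and all 8 neighbours."""
    peaks = []
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            v = grid[r][c]
            if v < min_value:
                continue
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def peaks_inside(box, peaks):
    """Peaks lying within an inclusive (row0, col0, row1, col1) box."""
    r0, c0, r1, c1 = box
    return [(r, c) for r, c in peaks if r0 <= r <= r1 and c0 <= c <= c1]


# Hypothetical 5x5 attention map with two clear peaks.
attention = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.1, 0.0],
    [0.1, 0.3, 0.2, 0.4, 0.2],
    [0.0, 0.1, 0.4, 0.8, 0.3],
    [0.0, 0.0, 0.2, 0.3, 0.1],
]
peaks = local_maxima(attention)

# Hypothetical proposals; which peaks each one covers is the spatial cue.
proposals = {"a": (0, 0, 2, 2), "b": (2, 2, 4, 4), "c": (0, 3, 1, 4)}
coverage = {name: peaks_inside(box, peaks) for name, box in proposals.items()}
print(peaks)
print(coverage)
```

A proposal that covers no peak, such as `c` here, would be a natural background candidate, but how peaks are actually turned into pseudo labels is left to the paper.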

Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation

Sensors ◽

10.3390/s20041010 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1010 ◽

Cited By ~ 13

Author(s):

Yiqing Zhang ◽

Jun Chu ◽

Lu Leng ◽

Jun Miao

Keyword(s):

Receptive Field ◽

Spatial Information ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Experimental Results ◽

Small Scale ◽

Feature Maps ◽

Segmentation Accuracy ◽

Instance Segmentation

With the rapid development of flexible vision sensors and visual sensor networks, computer vision tasks, such as object detection and tracking, are entering a new phase. Accordingly, more challenging comprehensive tasks, including instance segmentation, can develop rapidly. Most state-of-the-art network frameworks for instance segmentation are based on Mask R-CNN (mask region-convolutional neural network). However, the experimental results confirm that Mask R-CNN does not always successfully predict instance details. The scale-invariant fully convolutional network structure of Mask R-CNN ignores the difference in spatial information between receptive fields of different sizes. A large-scale receptive field focuses more on detailed information, whereas a small-scale receptive field focuses more on semantic information. So the network cannot consider the relationship between the pixels at the object edge, and these pixels will be misclassified. To overcome this problem, Mask-Refined R-CNN (MR R-CNN) is proposed, in which the stride of ROIAlign (region of interest align) is adjusted. In addition, the original fully convolutional layer is replaced with a new semantic segmentation layer that realizes feature fusion by constructing a feature pyramid network and summing the forward and backward transmissions of feature maps of the same resolution. The segmentation accuracy is substantially improved by combining the feature layers that focus on the global and detailed information. The experimental results on the COCO (Common Objects in Context) and Cityscapes datasets demonstrate that the segmentation accuracy of MR R-CNN is about 2% higher than that of Mask R-CNN using the same backbone. The average precision of large instances reaches 56.6%, which is higher than those of all state-of-the-art methods. In addition, the proposed method requires low time cost and is easily implemented. The experiments on the Cityscapes dataset also prove that the proposed method has great generalization ability.

Deep Learning for Transient Image Reconstruction from ToF Data

Sensors ◽

10.3390/s21061962 ◽

2021 ◽

Vol 21 (6) ◽

pp. 1962

Author(s):

Enrico Buratto ◽

Adriano Simonetto ◽

Gianluca Agresti ◽

Henrik Schäfer ◽

Pietro Zanuttigh

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Light Response ◽

Real Data ◽

Depth Image ◽

Learning Approach ◽

Multiple Reflections ◽

Noisy Input ◽

Novel Approach ◽

Incoming Light

In this work, we propose a novel approach for correcting multi-path interference (MPI) in Time-of-Flight (ToF) cameras by estimating the direct and global components of the incoming light. MPI is an error source linked to the multiple reflections of light inside a scene; each sensor pixel receives information coming from different light paths, which generally leads to an overestimation of the depth. We introduce a novel deep learning approach, which estimates the structure of the time-dependent scene impulse response and from it recovers a depth image with a reduced amount of MPI. The model consists of two main blocks: a predictive model that learns a compact encoded representation of the backscattering vector from the noisy input data, and a fixed backscattering model that translates the encoded representation into the high-dimensional light response. Experimental results on real data show the effectiveness of the proposed approach, which reaches state-of-the-art performance.

A New Multi-Person Pose Estimation Method Using the Partitioned CenterPose Network

Applied Sciences ◽

10.3390/app11094241 ◽

2021 ◽

Vol 11 (9) ◽

pp. 4241

Author(s):

Jiahua Wu ◽

Hyo Jong Lee

Keyword(s):

Pose Estimation ◽

Human Body ◽

State Of The Art ◽

Estimation Method ◽

Bottom Up ◽

Center Point ◽

Novel Approach ◽

Body Joints

In bottom-up multi-person pose estimation, grouping joint candidates into the appropriately structured corresponding instance of a person is challenging. In this paper, a new bottom-up method, the Partitioned CenterPose (PCP) Network, is proposed to better cluster the detected joints. To achieve this goal, we propose a novel approach called Partition Pose Representation (PPR) which integrates the instance of a person and its body joints based on joint offset. PPR leverages information about the center of the human body and the offsets between that center point and the positions of the body’s joints to encode human poses accurately. To enhance the relationships between body joints, we divide the human body into five parts, and then, we generate a sub-PPR for each part. Based on this PPR, the PCP Network can detect people and their body joints simultaneously, then group all body joints according to joint offset. Moreover, an improved l1 loss is designed to more accurately measure joint offset. Using the COCO keypoints and CrowdPose datasets for testing, it was found that the performance of the proposed method is on par with that of existing state-of-the-art bottom-up methods in terms of accuracy and speed.
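The center-plus-offset encoding described above can be sketched in a few lines. The abstract does not say how the body center is obtained or what the improved loss looks like. As stand-ins, the sketch uses the mean of the joints as the center and a plain L1 loss. The joint coordinates and all function names are hypothetical.

```python
def encode_pose(joints):
    """Represent a pose as a center point plus per-joint offsets from it.

    Assumption: the center is the mean of the joints.
    """
    cx = sum(x for x, _ in joints) / len(joints)
    cy = sum(y for _, y in joints) / len(joints)
    offsets = [(x - cx, y - cy) for x, y in joints]
    return (cx, cy), offsets


def decode_pose(center, offsets):
    """Recover joint positions by adding each offset back to the center."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


def l1_loss(pred_offsets, true_offsets):
    """Plain mean L1 distance between predicted and true offsets."""
    total = sum(
        abs(px - tx) + abs(py - ty)
        for (px, py), (tx, ty) in zip(pred_offsets, true_offsets)
    )
    return total / len(true_offsets)


# Hypothetical four-joint pose in pixel coordinates.
joints = [(10, 20), (14, 20), (12, 26), (12, 30)]
center, offsets = encode_pose(joints)
print(center, offsets)
print(decode_pose(center, offsets))

# A prediction that is off by one pixel in x for the first joint only.
predicted = [(offsets[0][0] + 1, offsets[0][1])] + offsets[1:]
print(l1_loss(predicted, offsets))
```

Because every joint is stored relative to one center, joints decoded from the same center belong to the same person by construction, which is the grouping property the abstract relies on.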

PCAN—Part-Based Context Attention Network for Thermal Power Plant Detection in Remote Sensing Imagery

Remote Sensing ◽

10.3390/rs13071243 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1243

Author(s):

Wenxin Yin ◽

Wenhui Diao ◽

Peijin Wang ◽

Xin Gao ◽

Ya Li ◽

...

Keyword(s):

Remote Sensing ◽

Power Plants ◽

State Of The Art ◽

Thermal Power ◽

Image Interpretation ◽

Remote Sensing Image ◽

Thermal Power Plants ◽

Average Precision ◽

Deep Convolutional Neural Networks ◽

Multi Scale

The detection of Thermal Power Plants (TPPs) is a meaningful task for remote sensing image interpretation. It is a challenging task because, as facility objects, TPPs are composed of various distinctive and irregular components. In this paper, we propose a novel end-to-end detection framework for TPPs based on deep convolutional neural networks. Specifically, based on the RetinaNet one-stage detector, a context attention multi-scale feature extraction network is proposed to fuse global spatial attention and strengthen the ability to represent irregular objects. In addition, we design a part-based attention module to adapt to TPPs containing distinctive components. Experiments show that the proposed method outperforms the state-of-the-art methods and can achieve 68.15% mean average precision.

Density Guarantee on Finding Multiple Subgraphs and Subtensors

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3446668 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-32

Author(s):

Quang-huy Duong ◽

Heri Ramampiaro ◽

Kjetil Nørvåg ◽

Thu-lan Dam

Keyword(s):

Lower Bound ◽

State Of The Art ◽

The State ◽

The Other ◽

Exact Methods ◽

Practical Solution ◽

Novel Approach ◽

Wide Range ◽

Real World Datasets ◽

Tensor Data

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most existing works use the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can estimate multiple subtensors, they give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data and giving a higher lower bound on the density. In particular, we guarantee and prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
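For readers unfamiliar with the greedy 2-approximation mentioned above, here is a minimal sketch of the classic greedy peeling procedure for the graph case. It repeatedly removes a minimum-degree vertex and remembers the densest intermediate subgraph, with density taken as edges divided by vertices. This is the baseline the abstract refers to, not the paper's multi-subtensor method. The example graph and the function name are hypothetical, and the input is assumed to be an undirected graph without self-loops.

```python
def greedy_densest_subgraph(edges):
    """Greedy peeling: drop a minimum-degree vertex each step, keep the densest stage.

    Assumes undirected edges with no self-loops. Density is |E| / |V|.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density = num_edges / len(adj) if adj else 0.0
    best_vertices = set(adj)
    while len(adj) > 1:
        # Pick a vertex of minimum degree in the current graph.
        u = min(adj, key=lambda x: len(adj[x]))
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        del adj[u]
        density = num_edges / len(adj)
        if density > best_density:
            best_density, best_vertices = density, set(adj)
    return best_vertices, best_density


# Hypothetical graph: a 4-clique on {1, 2, 3, 4} with a sparse tail 4-5-6.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
print(greedy_densest_subgraph(edges))
```

On this input the peeling strips the tail first and returns the clique. The procedure yields a single subgraph per run, which is the limitation the abstract sets out to address.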

New Multi-View Classification Method with Uncertain Data

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3458282 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-23

Author(s):

Bo Liu ◽

Haowen Zhong ◽

Yanshan Xiao

Keyword(s):

Learning Strategy ◽

State Of The Art ◽

Uncertain Data ◽

Real Life ◽

Support Vector ◽

Classification Methods ◽

Complementary Information ◽

Novel Approach ◽

Svm Model ◽

Iterative Framework

Multi-view classification aims at designing a multi-view learning strategy to train a classifier from multi-view data, which are easily collected in practice. Most existing works on multi-view classification assume that the multi-view data are collected with precise information. However, in real-life applications the collected multi-view data are often uncertain because the collection process is corrupted by noise. This article therefore proposes a novel approach, called uncertain multi-view learning with support vector machine (UMV-SVM), to cope with the problem of multi-view learning with uncertain data. The method first enforces agreement among all the views to seek complementary information in the multi-view data, and takes the uncertainty of the data into consideration by modeling the reachability area of the noise. It then uses an iterative framework to solve the proposed UMV-SVM model and obtain the multi-view classifier for prediction. Extensive experiments on real-life datasets show that the proposed UMV-SVM achieves better performance for uncertain multi-view classification than state-of-the-art multi-view classification methods.

Investigating the Impact of the Bit Depth of Fluorescence-Stained Images on the Performance of Deep Learning-Based Nuclei Instance Segmentation

Diagnostics ◽

10.3390/diagnostics11060967 ◽

2021 ◽

Vol 11 (6) ◽

pp. 967

Author(s):

Amirreza Mahbod ◽

Gerald Schaefer ◽

Christine Löw ◽

Georg Dorffner ◽

Rupert Ecker ◽

...

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Computer Assisted ◽

Important Criterion ◽

Histological Image ◽

Computer Mediated ◽

Image Patches ◽

Processing Techniques ◽

The Impact ◽

Instance Segmentation

Nuclei instance segmentation can be considered a key step in the computer-mediated analysis of histological fluorescence-stained (FS) images. Many computer-assisted approaches have been proposed for this task, and among them, supervised deep learning (DL) methods deliver the best performance. An important criterion that can affect the DL-based nuclei instance segmentation performance on FS images is the image bit depth used, but to our knowledge, no study has so far investigated this impact. In this work, we released a fully annotated FS histological image dataset of nuclei at different image magnifications and from five different mouse organs. Moreover, using different pre-processing techniques and one of the state-of-the-art DL-based methods, we investigated the impact of image bit depth (i.e., eight bits vs. sixteen bits) on the nuclei instance segmentation performance. The results obtained from our dataset and another publicly available dataset showed very competitive nuclei instance segmentation performance for the models trained with 8-bit and 16-bit images. This suggests that processing 8-bit images is sufficient for nuclei instance segmentation of FS images in most cases. The dataset, including the raw image patches and the corresponding segmentation masks, is publicly available in the published GitHub repository.
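To make the bit-depth question concrete, the sketch below shows two common ways of reducing a 16-bit image to 8 bits: dropping the low byte, and stretching the image's own intensity range onto 0..255. The abstract does not say which pre-processing techniques the authors used, so both functions and the sample patch are illustrative assumptions only.

```python
def to_8bit_shift(pixels16):
    """Keep the 8 most significant bits of each 16-bit value."""
    return [[v >> 8 for v in row] for row in pixels16]


def to_8bit_minmax(pixels16):
    """Stretch the image's own min..max range onto 0..255."""
    flat = [v for row in pixels16 for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image carries no contrast to stretch.
        return [[0 for _ in row] for row in pixels16]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in pixels16]


# Hypothetical dim 16-bit patch whose values occupy a narrow range.
patch = [[1000, 1100], [1200, 1255]]
print(to_8bit_shift(patch))   # nearly all contrast is lost
print(to_8bit_minmax(patch))  # contrast is preserved across 0..255
```

For dim fluorescence images the choice of conversion can matter more than the nominal bit depth, which is one reason a comparison needs to state its pre-processing.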

InstantDL: an easy-to-use deep learning pipeline for image segmentation and classification

BMC Bioinformatics ◽

10.1186/s12859-021-04037-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Dominik Jens Elias Waibel ◽

Sayedali Shetab Boushehri ◽

Carsten Marr

Keyword(s):

Image Processing ◽

Deep Learning ◽

Specific Problem ◽

State Of The Art ◽

Image Data ◽

Semantic Segmentation ◽

Parameter Tuning ◽

Cellular Processes ◽

Minimal Effort ◽

Instance Segmentation

Background: Deep learning contributes to uncovering molecular and cellular processes with highly performant algorithms. Convolutional neural networks have become the state-of-the-art tool to provide accurate and fast image data processing. However, published algorithms mostly solve only one specific problem and they typically require a considerable coding effort and machine learning background for their application.

Results: We have thus developed InstantDL, a deep learning pipeline for four common image processing tasks: semantic segmentation, instance segmentation, pixel-wise regression and classification. InstantDL enables researchers with a basic computational background to apply debugged and benchmarked state-of-the-art deep learning algorithms to their own data with minimal effort. To make the pipeline robust, we have automated and standardized workflows and extensively tested it in different scenarios. Moreover, it allows assessing the uncertainty of predictions. We have benchmarked InstantDL on seven publicly available datasets achieving competitive performance without any parameter tuning. For customization of the pipeline to specific tasks, all code is easily accessible and well documented.

Conclusions: With InstantDL, we hope to empower biomedical researchers to conduct reproducible image processing with a convenient and easy-to-use pipeline.
