Affinity Derivation for Accurate Instance Segmentation

Author(s):  
Yiding Liu ◽  
Siyu Yang ◽  
Bin Li ◽  
Wengang Zhou ◽  
Jizheng Xu ◽  
...  

Affinity, which represents whether two pixels belong to a same instance, is an equivalent representation to the instance segmentation labels. Conventional works do not make an explicit exploration on the affinity. In this article, we present two instance segmentation schemes based on pixel affinity information and show the effectiveness of affinity in both aspects. For proposal-free method, we predict pixel affinity for each image and then propose a simple yet effective graph merge algorithm to cluster pixels into instances. It shows that the affinity is powerful as an instance-relevant information to guide the clustering procedure in proposal-free instance segmentation. For proposal-based methods, we extend conventional framework with affinity head and introduce affinity as attached supervision in training phase. Without any additional inference cost, we can improve the performance of existing proposal-based instance segmentation methods, which shows that the affinity can also be applied as an auxiliary loss and training with such extra loss is beneficial to the training progress. Experimental results show that our schemes achieve comparable performance to other state-of-the-art instance segmentation methods. With Cityscapes training data, the proposed proposal-free method achieves 28.8 AP and the proposal-based method gets 27.2 AP both on test sets.

Author(s):  
Panpan Zheng ◽  
Shuhan Yuan ◽  
Xintao Wu ◽  
Jun Li ◽  
Aidong Lu

Many online applications, such as online social networks or knowledge bases, are often attacked by malicious users who commit different types of actions such as vandalism on Wikipedia or fraudulent reviews on eBay. Currently, most of the fraud detection approaches require a training dataset that contains records of both benign and malicious users. However, in practice, there are often no or very few records of malicious users. In this paper, we develop one-class adversarial nets (OCAN) for fraud detection with only benign users as training data. OCAN first uses LSTM-Autoencoder to learn the representations of benign users from their sequences of online activities. It then detects malicious users by training a discriminator of a complementary GAN model that is different from the regular GAN model. Experimental results show that our OCAN outperforms the state-of-the-art oneclass classification models and achieves comparable performance with the latest multi-source LSTM model that requires both benign and malicious users in the training phase.


Forecasting ◽  
2021 ◽  
Vol 3 (4) ◽  
pp. 741-762
Author(s):  
Panagiotis Stalidis ◽  
Theodoros Semertzidis ◽  
Petros Daras

In this paper, a detailed study on crime classification and prediction using deep learning architectures is presented. We examine the effectiveness of deep learning algorithms in this domain and provide recommendations for designing and training deep learning systems for predicting crime areas, using open data from police reports. Having time-series of crime types per location as training data, a comparative study of 10 state-of-the-art methods against 3 different deep learning configurations is conducted. In our experiments with 5 publicly available datasets, we demonstrate that the deep learning-based methods consistently outperform the existing best-performing methods. Moreover, we evaluate the effectiveness of different parameters in the deep learning architectures and give insights for configuring them to achieve improved performance in crime classification and finally crime prediction.


Author(s):  
Hui Ying ◽  
Zhaojin Huang ◽  
Shu Liu ◽  
Tianjia Shao ◽  
Kun Zhou

Current instance segmentation methods can be categorized into segmentation-based methods and proposal-based methods. The former performs segmentation first and then does clustering, while the latter detects objects first and then predicts the mask for each object proposal. In this work, we propose a single-stage method, named EmbedMask, that unifies both methods by taking their advantages, so it can achieve good performance in instance segmentation and produce high-resolution masks in a high speed. EmbedMask introduces two newly defined embeddings for mask prediction, which are pixel embedding and proposal embedding. During training, we enforce the pixel embedding to be close to its coupled proposal embedding if they belong to the same instance. During inference, pixels are assigned to the mask of the proposal if their embeddings are similar. This mechanism brings several benefits. First, the pixel-level clustering enables EmbedMask to generate high-resolution masks and avoids the complicated two-stage mask prediction. Second, the existence of proposal embedding simplifies and strengthens the clustering procedure, so our method can achieve high speed and better performance than segmentation-based methods. Without any bell or whistle, EmbedMask outperforms the state-of-the-art instance segmentation method Mask R-CNN on the challenging COCO dataset, obtaining more detailed masks at a higher speed.


2020 ◽  
Vol 34 (05) ◽  
pp. 7407-7414
Author(s):  
Trapit Bansal ◽  
Pat Verga ◽  
Neha Choudhary ◽  
Andrew McCallum

Understanding the meaning of text often involves reasoning about entities and their relationships. This requires identifying textual mentions of entities, linking them to a canonical concept, and discerning their relationships. These tasks are nearly always viewed as separate components within a pipeline, each requiring a distinct model and training data. While relation extraction can often be trained with readily available weak or distant supervision, entity linkers typically require expensive mention-level supervision – which is not available in many domains. Instead, we propose a model which is trained to simultaneously produce entity linking and relation decisions while requiring no mention-level annotations. This approach avoids cascading errors that arise from pipelined methods and more accurately predicts entity relationships from text. We show that our model outperforms a state-of-the art entity linking and relation extraction pipeline on two biomedical datasets and can drastically improve the overall recall of the system.


2020 ◽  
Vol 10 (22) ◽  
pp. 7982
Author(s):  
Lorenzo Putzu ◽  
Giorgio Fumera

Cell nuclei segmentation is a challenging task, especially in real applications, when the target images significantly differ between them. This task is also challenging for methods based on convolutional neural networks (CNNs), which have recently boosted the performance of cell nuclei segmentation systems. However, when training data are scarce or not representative of deployment scenarios, they may suffer from overfitting to a different extent, and may hardly generalise to images that differ from the ones used for training. In this work, we focus on real-world, challenging application scenarios when no annotated images from a given dataset are available, or when few images (even unlabelled) of the same domain are available to perform domain adaptation. To simulate this scenario, we performed extensive cross-dataset experiments on several CNN-based state-of-the-art cell nuclei segmentation methods. Our results show that some of the existing CNN-based approaches are capable of generalising to target images which resemble the ones used for training. In contrast, their effectiveness considerably degrades when target and source significantly differ in colours and scale.


Author(s):  
Zhipeng Dong ◽  
Hongkun Tian ◽  
Xuefeng Bao ◽  
Yunhui Yan ◽  
Fei Chen

AbstractGrasp estimation is a fundamental technique crucial for robot manipulation tasks. In this work, we present a scene-oriented grasp estimation scheme taking constraints of the grasp pose imposed by the environment into consideration and training on samples satisfying the constraints. We formulate valid grasps for a parallel-jaw gripper as vectors in a two-dimensional (2D) image and detect them with a fully convolutional network that simultaneously estimates the vectors’ origins and directions. The detected vectors are then converted to 6 degree-of-freedom (6-DOF) grasps with a tailored strategy. As such, the network is able to detect multiple grasp candidates from a cluttered scene in one shot using only an RGB image as input. We evaluate our approach on the GraspNet-1Billion dataset and archived comparable performance as state-of-the-art while being efficient in runtime.


2019 ◽  
Vol 12 (2) ◽  
pp. 120-127 ◽  
Author(s):  
Wael Farag

Background: In this paper, a Convolutional Neural Network (CNN) to learn safe driving behavior and smooth steering manoeuvring, is proposed as an empowerment of autonomous driving technologies. The training data is collected from a front-facing camera and the steering commands issued by an experienced driver driving in traffic as well as urban roads. Methods: This data is then used to train the proposed CNN to facilitate what it is called “Behavioral Cloning”. The proposed Behavior Cloning CNN is named as “BCNet”, and its deep seventeen-layer architecture has been selected after extensive trials. The BCNet got trained using Adam’s optimization algorithm as a variant of the Stochastic Gradient Descent (SGD) technique. Results: The paper goes through the development and training process in details and shows the image processing pipeline harnessed in the development. Conclusion: The proposed approach proved successful in cloning the driving behavior embedded in the training data set after extensive simulations.


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data. This is vastly unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.


2021 ◽  
Vol 11 (11) ◽  
pp. 4894
Author(s):  
Anna Scius-Bertrand ◽  
Michael Jungo ◽  
Beat Wolf ◽  
Andreas Fischer ◽  
Marc Bui

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems reducing the required amount of annotations. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages using a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on the character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.


Sign in / Sign up

Export Citation Format

Share Document