Instance Sequence Queries for Video Instance Segmentation with Transformers

Zhujun Xu; Damien Vivet

doi:10.3390/s21134507

Sensors ◽

10.3390/s21134507 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4507

Author(s):

Zhujun Xu ◽

Damien Vivet

Keyword(s):

Data Association ◽

Video Clip ◽

Complex Data ◽

Training Procedure ◽

Post Processing ◽

Bipartite Matching ◽

Frame Method ◽

Memory Constraints ◽

Instance Segmentation

Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building sophisticated post-processing to associate frame-level segmentation results and (2) modeling a video clip as a 3D spatial-temporal volume whose resolution and length are limited by memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. The consistency of instances is preserved by the corresponding order between query slots and network outputs. As a result, there is no need for complex data association. On a TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the YouTube-VIS dataset.

A Two-Stage Data Association Approach for 3D Multi-Object Tracking

Sensors ◽

10.3390/s21092894 ◽

2021 ◽

Vol 21 (9) ◽

pp. 2894

Author(s):

Minh-Quan Dao ◽

Vincent Frémont

Keyword(s):

Object Detection ◽

Object Tracking ◽

Moving Objects ◽

Data Association ◽

Autonomous Driving ◽

Tracking Accuracy ◽

Two Stage ◽

Bipartite Matching ◽

3D Object ◽

3D Object Detection

Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of other moving objects in the scene and predicts their future motion. Thanks to the recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system is essentially made of an object detector and a data association algorithm which establishes track-to-detection correspondence. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled on bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt a two-stage data association method, which was successfully applied to image-based tracking, to the 3D setting, thus providing an alternative for data association in 3D MOT. Our method outperforms the baseline using one-stage bipartite matching for data association, achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the NuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
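
The one-stage baseline mentioned in this abstract, track-to-detection association posed as a Linear Assignment Problem, can be illustrated with a minimal sketch. This is not the authors' code. It assumes 3D object centres as coordinate tuples, Euclidean distance as the cost, and a hypothetical gating threshold `max_dist`. For brevity it finds the optimal assignment by brute-force enumeration, which is only practical for a handful of objects; it does not implement the Hungarian algorithm, which solves the same problem in polynomial time.

```python
from itertools import permutations
from math import dist


def associate(tracks, detections, max_dist=2.0):
    """Return (track_index, detection_index) pairs with minimum total distance.

    Illustrative assumptions: tracks and detections are 3D centre points,
    the cost is Euclidean distance, and max_dist is a hypothetical gate.
    """
    if len(tracks) > len(detections):
        raise ValueError("this sketch assumes len(tracks) <= len(detections)")

    best_cost, best_perm = float("inf"), ()
    # Enumerate every way of giving each track a distinct detection.
    for perm in permutations(range(len(detections)), len(tracks)):
        cost = sum(dist(tracks[t], detections[d]) for t, d in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm

    # Gating: drop matched pairs that are too far apart to be plausible.
    return [
        (t, d)
        for t, d in enumerate(best_perm)
        if dist(tracks[t], detections[d]) <= max_dist
    ]


tracks = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
detections = [(5.2, 5.1, 0.0), (0.1, -0.1, 0.0), (20.0, 20.0, 0.0)]
matches = associate(tracks, detections)
```

Here the first track is paired with the second detection and the second track with the first, while the distant third detection stays unmatched.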

Real-Time Instance Segmentation of Traffic Videos for Embedded Devices

Sensors ◽

10.3390/s21010275 ◽

2021 ◽

Vol 21 (1) ◽

pp. 275

Author(s):

Ruben Panero Martinez ◽

Ionut Schiopu ◽

Bruno Cornelis ◽

Adrian Munteanu

Keyword(s):

Real Time ◽

Network Architecture ◽

Training Procedure ◽

Segmentation Method ◽

Embedded Devices ◽

Network Training ◽

Assignment Algorithm ◽

Ablation Study ◽

Reduced Rate ◽

Instance Segmentation

The paper proposes a novel instance segmentation method for traffic videos devised for deployment on real-time embedded devices. A novel neural network architecture is proposed using a multi-resolution feature extraction backbone and improved network designs for the object detection and instance segmentation branches. A novel post-processing method is introduced to ensure a reduced rate of false detection by evaluating the quality of the output masks. An improved network training procedure is proposed based on a novel label assignment algorithm. An ablation study on the speed-vs.-performance trade-off further modifies the two branches and replaces the conventional ResNet-based performance-oriented backbone with a lightweight speed-oriented design. The proposed architectural variations achieve real-time performance when deployed on embedded devices. The experimental results demonstrate that the proposed instance segmentation method for traffic videos outperforms the You Only Look At CoefficienTs (YOLACT) algorithm, the state-of-the-art real-time instance segmentation method. The proposed architecture achieves qualitative results with 31.57 average precision on the COCO dataset, while its speed-oriented variations achieve speeds of up to 66.25 frames per second on the Jetson AGX Xavier module.

OpSeF: Open source Python framework for collaborative instance segmentation of bioimages

10.1101/2020.04.29.068023 ◽

2020 ◽

Cited By ~ 2

Author(s):

Tobias M. Rasse ◽

Réka Hollandi ◽

Péter Horváth

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Complex Analysis ◽

Ease Of Use ◽

Problem Definition ◽

Training Data ◽

Post Processing ◽

Gpu Clusters ◽

User Tasks ◽

Instance Segmentation

Various pre-trained deep learning models for the segmentation of bioimages have been made available as ‘developer-to-end-user’ solutions. They usually require neither knowledge of machine learning nor coding skills, and are optimized for ease of use and deployability on laptops. However, testing these tools individually is tedious and success is uncertain. Here, we present the ‘Op’en ‘Se’gmentation ‘F’ramework (OpSeF), a Python framework for deep learning-based instance segmentation. OpSeF aims at facilitating the collaboration of biomedical users with experienced image analysts. It builds on the analysts’ knowledge in Python, machine learning, and workflow design to solve complex analysis tasks at any scale in a reproducible, well-documented way. OpSeF defines standard inputs and outputs, thereby facilitating modular workflow design and interoperability with other software. Users play an important role in problem definition, quality control, and manual refinement of results. All analyst tasks are optimized for deployment on Linux workstations or GPU clusters; all user tasks may be performed on any laptop in ImageJ. OpSeF semi-automates preprocessing, convolutional neural network (CNN)-based segmentation in 2D or 3D, and post-processing. It facilitates benchmarking of multiple models in parallel. OpSeF streamlines the optimization of parameters for pre- and post-processing such that an available model may frequently be used without retraining. Even if sufficiently good results are not achievable with this approach, intermediate results can inform the analysts in the selection of the most promising CNN architecture in which the biomedical user might invest the effort of manually labeling training data. We provide Jupyter notebooks that document sample workflows based on various image collections. Analysts may find these notebooks useful to illustrate common segmentation challenges, as they prepare the advanced user for gradually taking over some of their tasks and completing their projects independently. The notebooks may also be used to explore the analysis options available within OpSeF in an interactive way and to document and share final workflows. Currently, three mechanistically distinct CNN-based segmentation methods, the U-Net implementation used in Cellprofiler 3.0, StarDist, and Cellpose, have been integrated within OpSeF. The addition of new networks requires little coding skill, and the addition of new models requires none. Thus, OpSeF might soon become an interactive model repository, in which pre-trained models might be shared, evaluated, and reused with ease.

Automatic instance segmentation of mitochondria in electron microscopy data

10.1101/2021.05.24.444785 ◽

2021 ◽

Author(s):

Luke Nightingale ◽

Joost de Folter ◽

Helen Spiers ◽

Amy Strange ◽

Lucy M Collinson ◽

...

Keyword(s):

Electron Microscopy ◽

Large Scale ◽

Machine Learning Algorithms ◽

Public Repository ◽

Post Processing ◽

Human Cortex ◽

Electron Microscopy Data ◽

Processing Procedure ◽

Microscopy Data ◽

Instance Segmentation

We present a new method for rapid, automated, large-scale 3D mitochondria instance segmentation, developed in response to the ISBI 2021 MitoEM Challenge. In brief, we trained separate machine learning algorithms to predict (1) mitochondria areas and (2) mitochondria boundaries in image volumes acquired from both rat and human cortex with multi-beam scanning electron microscopy. The predictions from these algorithms were combined in a multi-step post-processing procedure that resulted in high semantic and instance segmentation performance. All code is provided via a public repository.

Instance Segmentation Enabled Hybrid Data Association and Discriminative Hashing for Online Multi-Object Tracking

IEEE Transactions on Multimedia ◽

10.1109/tmm.2018.2885922 ◽

2019 ◽

Vol 21 (7) ◽

pp. 1709-1723 ◽

Cited By ~ 1

Author(s):

Peng Dai ◽

Xue Wang ◽

Weihang Zhang ◽

Junfeng Chen

Keyword(s):

Object Tracking ◽

Data Association ◽

Hybrid Data ◽

Instance Segmentation

DETECTION OF FRAME DUPLICATION FORGERY IN VIDEOS BASED ON SPATIAL AND TEMPORAL ANALYSIS

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001412500176 ◽

2012 ◽

Vol 26 (07) ◽

pp. 1250017 ◽

Cited By ~ 20

Author(s):

GUO-SHIANG LIN ◽

JIE-FAN CHANG

Keyword(s):

Video Clip ◽

Color Space ◽

Computation Time ◽

Temporal Analysis ◽

Processing Technique ◽

Post Processing ◽

Spatial And Temporal Analysis ◽

Block Based ◽

Measurement Frame ◽

Frame Duplication

In this paper, we present a passive-blind scheme for detection of frame duplication forgery in videos. The scheme is a coarse-to-fine approach that is implemented in four stages: candidate segment selection, spatial similarity measurement, frame duplication classification, and post-processing. To screen and select duplicated candidates in the temporal domain, the histogram difference of two adjacent frames in the RGB color space is adopted as a feature. Then, to evaluate the similarity of two images, we use a block-based algorithm to measure the spatial correlation between the candidate segment and the corresponding frame in the query template. Based on the results of spatial and temporal analysis, we construct a classifier to detect duplicated clips. In addition, to deal with the partial detection problem, we develop a post-processing technique that examines and merges two adjacent detected candidates into a complete duplicated video clip. Our experimental results demonstrate that the proposed scheme can not only detect frame duplication forgery but also accurately localize duplicated clips in different kinds of videos. The results also show that the scheme outperforms an existing method in terms of precision, recall, accuracy, and computation time.
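
The temporal feature named in this abstract, the histogram difference of two adjacent frames in the RGB color space, might be computed along the following lines. This is a rough sketch under stated assumptions, not the paper's implementation: frames are represented as flat lists of `(r, g, b)` tuples with 8-bit values, the bin count `bins` is a hypothetical parameter, and the sum of absolute bin differences is an assumed distance, since the abstract does not specify one.

```python
def rgb_histogram(frame, bins=4):
    """Count pixels per intensity bin, separately for R, G, and B.

    Assumes `frame` is a flat list of (r, g, b) tuples with values 0..255.
    The result is laid out as [R bins..., G bins..., B bins...].
    """
    hist = [0] * (3 * bins)
    width = 256 // bins
    for pixel in frame:
        for channel, value in enumerate(pixel):
            # Clamp so that 255 still lands in the last bin when 256 % bins != 0.
            hist[channel * bins + min(value // width, bins - 1)] += 1
    return hist


def histogram_difference(frame_a, frame_b, bins=4):
    """Sum of absolute bin differences (an assumed distance measure)."""
    hist_a = rgb_histogram(frame_a, bins)
    hist_b = rgb_histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def adjacent_differences(frames, bins=4):
    """Feature sequence: one difference value per pair of adjacent frames."""
    return [
        histogram_difference(frames[i], frames[i + 1], bins)
        for i in range(len(frames) - 1)
    ]


red = [(255, 0, 0), (255, 0, 0)]
blue = [(0, 0, 255), (0, 0, 255)]
features = adjacent_differences([red, red, blue])
```

Identical adjacent frames give a difference of zero, so a run of near-zero values in the sequence marks frames whose color content barely changes. How the scheme turns this feature into candidate segments is not described in the abstract and is left out here.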

Deep Learning-Based Instance Segmentation Method of Litchi Canopy from UAV-Acquired Images

Remote Sensing ◽

10.3390/rs13193919 ◽

2021 ◽

Vol 13 (19) ◽

pp. 3919

Author(s):

Jiawei Mo ◽

Yubin Lan ◽

Dongzi Yang ◽

Fei Wen ◽

Hongbin Qiu ◽

...

Keyword(s):

Deep Learning ◽

Image Annotation ◽

Strong Dependence ◽

Training Data ◽

Complex Data ◽

Segmentation Method ◽

Proposed Model ◽

Tree Canopies ◽

Digital Orthophoto ◽

Instance Segmentation

Instance segmentation of fruit tree canopies from images acquired by unmanned aerial vehicles (UAVs) is of significance for the precise management of orchards. Although deep learning methods have been widely used in the fields of feature extraction and classification, they still suffer from complex data requirements and a strong dependence on software performance. This paper proposes a deep learning-based instance segmentation method for litchi trees, which has a simple structure and lower requirements for data form. Considering that deep learning models require a large amount of training data, a labor-friendly semi-automatic method for image annotation is introduced, which significantly improves the efficiency of data pre-processing. To address the high computing requirements of deep learning methods, a partition-based method is presented for the segmentation of high-resolution digital orthophoto maps (DOMs). Citrus data is added to the training set to alleviate the lack of diversity of the original litchi dataset. The average precision (AP) is selected as the metric to evaluate the proposed model. The results show that, with the help of training on the litchi-citrus datasets, the best AP on the test set reaches 96.25%.

Linear and nonlinear post-processing of numerically forecasted surface temperature

Nonlinear Processes in Geophysics ◽

10.5194/npg-10-373-2003 ◽

2003 ◽

Vol 10 (4/5) ◽

pp. 373-383 ◽

Cited By ~ 12

Author(s):

M. Casaioli ◽

R. Mantovani ◽

F. Proietti Scorzoni ◽

S. Puca ◽

A. Speranza ◽

...

Keyword(s):

Error Function ◽

Training Procedure ◽

Post Processing ◽

Absolute Minimum ◽

Weather Forecasts ◽

Network Training ◽

Air Temperatures ◽

Weather Stations ◽

Processing Techniques ◽

Annealing Method

In this paper we test different approaches to the statistical post-processing of gridded numerical surface air temperatures (provided by the European Centre for Medium-Range Weather Forecasts) onto the temperature measured at surface weather stations located in the Italian region of Puglia. We consider simple post-processing techniques, like correction for altitude, linear regression from different input parameters and Kalman filtering, as well as a neural network training procedure, stabilised (i.e. driven into the absolute minimum of the error function over the learning set) by means of a Simulated Annealing method. A comparative analysis of the results shows that the performance with neural networks is the best. This is encouraging for systematic use in meteorological forecast-analysis service operations.

Improved Multi Target Tracking in MIMO Radar System Using New Hybrid Monte Carlo–PDAF Algorithm

10.5772/intechopen.95948 ◽

2021 ◽

Author(s):

Khaireddine Zarai ◽

Adnan Cherif

Keyword(s):

Monte Carlo ◽

State Estimation ◽

Target Tracking ◽

Data Association ◽

Mimo Radar ◽

Radar System ◽

Complex Data ◽

Interference Phenomenon ◽

Experimental Database ◽

Multi Target Tracking

This article deals with the multi-target tracking (MTT) problem in MIMO radar systems, which is now seen as a new technological challenge. In different tracking scenarios, measurements from sensors are usually subject to a complex data association issue. The MTT data association problem of assigning measurements to targets or to target state estimates becomes more complex in a MIMO radar system once a crossing-target tracking scenario arises, because interference may disrupt the received signal and cause the state estimation process to fail. To avoid most of these problems, we have improved a hybrid algorithm that combines a particle filter, called “Monte Carlo” (MC), with the Joint Probabilistic Data Association filter (JPDAF). The resulting MC-JPDAF algorithm is proposed to replace the traditional method, which combines the Extended Kalman filter (EKF) with JPDAF and is known as the EKF-JPDAF algorithm. The experimental results show an improvement: MC-JPDAF converges towards an accurate state estimate and is thus more efficient than EKF-JPDAF. The simulation results, obtained with an experimental database and the MATLAB software development framework, indicate that the designed system meets the objectives set for MC-JPDAF.

A Super-pixel based Method for Instance Segmentation Post-processing

2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei51763.2020.9263652 ◽

2020 ◽

Author(s):

Yao Li ◽

Lizhuang Ma

Keyword(s):

Post Processing ◽

Instance Segmentation
