Structure from Articulated Motion: Accurate and Stable Monocular 3D Reconstruction without Training Data

Onorina Kovalenko; Vladislav Golyanik; Jameel Malik; Ahmed Elhayek;  Stricker

doi:10.3390/s19204603

Sensors ◽

10.3390/s19204603 ◽

2019 ◽

Vol 19 (20) ◽

pp. 4603 ◽

Cited By ~ 7

Author(s):

Onorina Kovalenko ◽

Vladislav Golyanik ◽

Jameel Malik ◽

Ahmed Elhayek ◽

Stricker

Keyword(s):

State Of The Art ◽

3D Structure ◽

General Purpose ◽

Human Motion ◽

Training Data ◽

Optimization Strategy ◽

Model Based ◽

Lower Accuracy ◽

Articulated Motion ◽

Data Collections

Recovery of articulated 3D structure from 2D observations is a challenging computer vision problem with many applications. Current learning-based approaches achieve state-of-the-art accuracy on public benchmarks but are restricted to specific types of objects and motions covered by the training datasets. Model-based approaches do not rely on training data but show lower accuracy on these datasets. In this paper, we introduce a model-based method called Structure from Articulated Motion (SfAM), which can recover multiple object and motion types without training on extensive data collections. At the same time, it performs on par with learning-based state-of-the-art approaches on public benchmarks and outperforms previous non-rigid structure from motion (NRSfM) methods. SfAM is built upon a general-purpose NRSfM technique while integrating a soft spatio-temporal constraint on the bone lengths. We use an alternating optimization strategy to recover optimal geometry (i.e., bone proportions) together with 3D joint positions by enforcing bone-length consistency over a series of frames. SfAM is highly robust to noisy 2D annotations, generalizes to arbitrary objects and does not rely on training data, which is shown in extensive experiments on public benchmarks and real video sequences. We believe that it brings a new perspective on the domain of monocular 3D recovery of articulated structures, including human motion capture.
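
The bone-length consistency idea in the abstract above can be illustrated with a small penalty term. This is a minimal sketch, not the SfAM energy itself: the function name `bone_length_penalty`, the `(parent, child)` bone list and the choice of a squared deviation from the per-bone mean length are all assumptions made for illustration.

```python
import numpy as np

def bone_length_penalty(joints, bones):
    """Sum of squared deviations of each bone's length from its mean over frames.

    joints: array of shape (F, J, 3) with 3D joint positions for F frames.
    bones:  list of (parent, child) joint index pairs (hypothetical skeleton).
    """
    joints = np.asarray(joints, dtype=float)
    parents = [p for p, _ in bones]
    children = [c for _, c in bones]
    # Per-frame length of every bone, shape (F, B).
    lengths = np.linalg.norm(joints[:, children] - joints[:, parents], axis=-1)
    # Deviation of each bone from its mean length over the whole sequence.
    deviation = lengths - lengths.mean(axis=0, keepdims=True)
    return float(np.sum(deviation ** 2))
```

A sequence in which the skeleton only translates gives a penalty of zero, while any frame that stretches a bone makes the penalty positive, which is the behaviour a soft constraint of this kind is meant to reward and punish.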

Unsupervised Deep Learning via Affinity Diffusion

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6757 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11029-11036

Author(s):

Jiabo Huang ◽

Qi Dong ◽

Shaogang Gong ◽

Xiatian Zhu

Keyword(s):

Deep Learning ◽

State Of The Art ◽

General Purpose ◽

Training Data ◽

Learning Approach ◽

Model Learning ◽

Feature Representations ◽

Discriminative Feature ◽

Training Samples ◽

Unsupervised Deep Learning

Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning that needs massive labelled training data, dramatically limiting their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples with the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering show the performance superiority of the proposed method over the state-of-the-art unsupervised learning models using six common image recognition benchmarks including MNIST, SVHN, STL10, CIFAR10, CIFAR100 and ImageNet.

Robust Mouse Tracking in Complex Environments using Neural Networks

10.1101/336685 ◽

2018 ◽

Author(s):

Brian Q. Geuther ◽

Sean P. Deats ◽

Kai J. Fox ◽

Steve A. Murray ◽

Robert E. Braun ◽

...

Keyword(s):

Neural Network ◽

Environmental Conditions ◽

State Of The Art ◽

High Accuracy ◽

General Purpose ◽

Training Data ◽

Network Architectures ◽

Complex Environments ◽

Behavioral Experiments ◽

Modern Machine

The ability to track animals accurately is critical for behavioral experiments. For video-based assays, this is often accomplished by manipulating environmental conditions to increase contrast between the animal and the background, in order to achieve proper foreground/background detection (segmentation). However, as behavioral paradigms become more sophisticated with ethologically relevant environments, the approach of modifying environmental conditions offers diminishing returns, particularly for scalable experiments. Currently, there is a need for methods to monitor behaviors over long periods of time, under dynamic environmental conditions, and in animals that are genetically and behaviorally heterogeneous. To address this need, we developed a state-of-the-art neural network-based tracker for mice, using modern machine vision techniques. We test three different neural network architectures to determine their performance on genetically diverse mice under varying environmental conditions. We find that an encoder-decoder segmentation neural network achieves high accuracy and speed with minimal training data. Furthermore, we provide a labeling interface, labeled training data, tuned hyperparameters, and a pre-trained network for the mouse behavior and neuroscience communities. This general-purpose neural network tracker can be easily extended to other experimental paradigms and even to other animals, through transfer learning, thus providing a robust, generalizable solution for biobehavioral research.

ADAPTIVE SEGMENTATION APPROACH FOR HUMAN ACTION DATA

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800141455012x ◽

2014 ◽

Vol 28 (08) ◽

pp. 1455012

Author(s):

CHUNMING GAO ◽

CHANGHUI LI ◽

GUANGHUA TAN ◽

SONGRUI GUO ◽

KE XIAO

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Human Action ◽

Human Motion ◽

Temporal Segmentation ◽

Cluster Number ◽

Maximum Mean Discrepancy ◽

Motion Data ◽

Articulated Motion ◽

Segmentation Approach

Temporal segmentation of human motion data is an essential preprocessing step for action recognition. Because the temporal scale of human actions varies and articulated motion is complex to represent, the problem remains difficult. In particular, when the number of behaviors contained in the motion sequences is unknown in advance, traditional algorithms cannot segment the sequences successfully. In this paper, we extend previous work on change-point detection by probabilistic principal component analysis (PPCA). Building on it, we propose an algorithm for estimating the cluster number that extends PCA and uses the Maximum Mean Discrepancy between samples. Finally, we optimize our approach and detect the cyclic units of each action by aligned cluster analysis. We evaluate and compare the approach with state-of-the-art methods on synthetic data, Motion Capture Dataset and Kinect data. Experimental results demonstrate the effectiveness of our approach.
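
For readers unfamiliar with the Maximum Mean Discrepancy mentioned above, the following is a minimal sketch of the standard biased MMD estimate between two sample sets. The Gaussian (RBF) kernel, the bandwidth parameter `gamma` and the function names are assumptions chosen for illustration; the abstract does not say which kernel or estimator the authors use.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd_squared(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)
```

The estimate is close to zero when both sets come from the same distribution and grows as the two sets of motion frames become easier to tell apart, which is what makes it usable as a signal that a sequence contains different behaviors.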

Risk-Aware Model-Based Control

Frontiers in Robotics and AI ◽

10.3389/frobt.2021.617839 ◽

2021 ◽

Vol 8 ◽

Author(s):

Chen Yu ◽

Andre Rosendo

Keyword(s):

Reinforcement Learning ◽

Value At Risk ◽

Learning Algorithm ◽

State Of The Art ◽

Training Data ◽

Conditional Value At Risk ◽

High Dimensional ◽

Model Based Control ◽

Model Based ◽

Model Free

Model-Based Reinforcement Learning (MBRL) algorithms have been shown to have an advantage in data efficiency, but they are often overshadowed in performance by state-of-the-art model-free methods, especially on high-dimensional and complex problems. In this work, a novel MBRL method is proposed, called Risk-Aware Model-Based Control (RAMCO). It combines uncertainty-aware deep dynamics models with the risk assessment technique Conditional Value at Risk (CVaR). This mechanism is appropriate for real-world applications since it takes epistemic risk into consideration. In addition, we use a model-free solver to produce warm-up training data; this setting improves the performance in low-dimensional environments and compensates for the shortcomings of MBRL in high-dimensional scenarios. In comparison with other state-of-the-art reinforcement learning algorithms, we show that it produces superior results on a walking robot model. We also evaluate the method with an Eidos environment, a novel experimental method that uses multi-dimensional randomly initialized deep neural networks to measure the performance of any reinforcement learning algorithm, and the advantages of RAMCO are highlighted.
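
Conditional Value at Risk, the risk measure named above, can be estimated from sampled returns as the mean of the worst-performing fraction. The sketch below assumes that higher returns are better and that `alpha` is the size of the lower tail; the function name `cvar` and this tail convention are illustrative assumptions, and the abstract does not describe how RAMCO computes or applies the quantity.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR: mean of the worst alpha-fraction of sampled returns."""
    ordered = np.sort(np.asarray(returns, dtype=float))  # ascending, worst first
    # Keep at least one sample so the tail is never empty.
    k = max(1, int(np.ceil(alpha * ordered.size)))
    return float(ordered[:k].mean())
```

With `alpha=1.0` the estimate reduces to the ordinary mean, and smaller values of `alpha` make it increasingly pessimistic, so a controller that ranks candidate actions by this value prefers actions whose bad outcomes are mild.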

Global optimization of an encapsulated Si/SiO$_2$ L3 cavity with a 43 million quality factor

Scientific Reports ◽

10.1038/s41598-021-89410-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

J. P. Vasco ◽

V. Savona

Keyword(s):

Global Optimization ◽

Photonic Crystal ◽

Quality Factor ◽

State Of The Art ◽

Optimization Strategy ◽

High Quality ◽

Optimal Value ◽

Out Of Plane ◽

Fabrication Tolerances ◽

Structural Imperfections

We optimize a silica-encapsulated silicon L3 photonic crystal cavity for ultra-high quality factor by means of a global optimization strategy, where the closest holes surrounding the cavity are varied to minimize out-of-plane losses. We find an optimal value of $Q_c = 4.33 \times 10^7$, which is predicted to be in the 2 million regime in the presence of structural imperfections compatible with state-of-the-art silicon fabrication tolerances.

Improving Semi-Supervised Learning for Audio Classification with FixMatch

Electronics ◽

10.3390/electronics10151807 ◽

2021 ◽

Vol 10 (15) ◽

pp. 1807

Author(s):

Sascha Grollmisch ◽

Estefanía Cano

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Transfer Learning ◽

Data Transfer ◽

State Of The Art ◽

Training Data ◽

Audio Classification ◽

Image Domain ◽

Full Dataset ◽

Audio Data

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data. This is vastly unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.
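
The reliance on augmentation of unannotated data can be made concrete with the unlabeled-data loss commonly associated with FixMatch: predictions on a weakly augmented input provide a pseudo-label, which is kept only when it is confident and is then used as the target for the strongly augmented version of the same input. The sketch below works on already computed class probabilities; the function name and the default threshold of 0.95 are assumptions, and the abstract does not state the settings used for audio.

```python
import numpy as np

def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Confidence-masked cross-entropy between pseudo-labels and strong-view predictions.

    weak_probs, strong_probs: arrays of shape (N, C) with class probabilities
    for the weakly and strongly augmented views of the same N unlabeled items.
    """
    weak_probs = np.asarray(weak_probs, dtype=float)
    strong_probs = np.asarray(strong_probs, dtype=float)
    pseudo_labels = weak_probs.argmax(axis=1)
    # Only confident pseudo-labels contribute to the loss.
    mask = weak_probs.max(axis=1) >= threshold
    picked = strong_probs[np.arange(pseudo_labels.size), pseudo_labels]
    cross_entropy = -np.log(np.clip(picked, 1e-12, 1.0))
    # Average over the whole batch, so masked items count as zero loss.
    return float((cross_entropy * mask).mean())
```

Because unconfident items are masked rather than removed, the loss starts small while the model is uncertain and grows in influence as more pseudo-labels pass the threshold.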

Transcription Alignment of Historical Vietnamese Manuscripts without Human-Annotated Learning Samples

Applied Sciences ◽

10.3390/app11114894 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4894

Author(s):

Anna Scius-Bertrand ◽

Michael Jungo ◽

Beat Wolf ◽

Andreas Fischer ◽

Marc Bui

Keyword(s):

Object Detection ◽

State Of The Art ◽

Positive Impact ◽

Detection System ◽

Training Data ◽

Detection Accuracy ◽

Current State ◽

Alignment Task ◽

Scanned Image ◽

Automatic Transcription

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems reducing the required amount of annotations. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages using a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on the character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.

New polyp image classification technique using transfer learning of network-in-network structure in endoscopic images

Scientific Reports ◽

10.1038/s41598-021-83199-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Young Jae Kim ◽

Jang Pyo Bae ◽

Jun-Won Chung ◽

Dong Kyun Park ◽

Kwang Gi Kim ◽

...

Keyword(s):

Colorectal Cancer ◽

Transfer Learning ◽

Test Data ◽

State Of The Art ◽

Early Stage ◽

Statistical Significance ◽

Recall Rate ◽

Training Data ◽

Fine Tuning ◽

Accuracy Evaluation

Colorectal cancer, which is known to occur in the gastrointestinal tract, is the third most common of 27 major types of cancer in South Korea and worldwide. Colorectal polyps are known to increase the potential of developing colorectal cancer. Detected polyps need to be resected to reduce the risk of developing cancer. This research improved the performance of polyp classification through the fine-tuning of Network-in-Network (NIN) after applying a pre-trained model of the ImageNet database. Random shuffling is performed 20 times on 1000 colonoscopy images. Each set of data is divided into 800 images of training data and 200 images of test data. An accuracy evaluation is performed on the 200 images of test data in each of the 20 experiments. Three compared methods were constructed from AlexNet by transferring the weights trained by three different state-of-the-art databases. A normal AlexNet based method without transfer learning was also compared. The accuracy of the proposed method was higher, with statistical significance, than the accuracy of four other state-of-the-art methods, and showed an 18.9% improvement over the normal AlexNet based method. The area under the curve was approximately 0.930 ± 0.020, and the recall rate was 0.929 ± 0.029. An automatic algorithm can assist endoscopists in identifying polyps that are adenomatous by considering a high recall rate and accuracy. This system can enable the timely resection of polyps at an early stage.
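
The evaluation protocol described above (20 random shuffles of 1000 images, each split into 800 training and 200 test images) can be sketched as follows. The function name, the seed and the use of index arrays in place of actual images are assumptions for illustration; the abstract does not say how the shuffling was implemented.

```python
import numpy as np

def repeated_splits(n_samples=1000, n_train=800, n_repeats=20, seed=0):
    """Return a list of (train_indices, test_indices) pairs from repeated shuffles."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)
        # First n_train shuffled indices train the model, the rest test it.
        splits.append((order[:n_train], order[n_train:]))
    return splits
```

Reporting the mean and spread of a metric over such splits is what yields figures of the form 0.930 ± 0.020.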

A Probabilistic Model for Real-Time Semantic Prediction of Human Motion Intentions from RGBD-Data

Sensors ◽

10.3390/s21124141 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4141

Author(s):

Wouter Houtman ◽

Gosse Bijlenga ◽

Elena Torta ◽

René van de Molengraft

Keyword(s):

Real Time ◽

Collision Avoidance ◽

Probabilistic Model ◽

Real Life ◽

Human Motion ◽

Model Based ◽

Multiple Hypotheses ◽

Navigation Algorithms ◽

The Right ◽

Motion Behavior

For robots to execute their navigation tasks both fast and safely in the presence of humans, it is necessary to make predictions about the route those humans intend to follow. Within this work, a model-based method is proposed that relates human motion behavior perceived from RGBD input to the constraints imposed by the environment by considering typical human routing alternatives. Multiple hypotheses about routing options of a human towards local semantic goal locations are created and validated, including explicit collision avoidance routes. It is demonstrated, with real-time, real-life experiments, that a coarse discretization based on the semantics of the environment suffices to make a proper distinction between a person going, for example, to the left or the right on an intersection. As such, a scalable and explainable solution is presented, which is suitable for incorporation within navigation algorithms.

BeautyNet: Joint Multiscale CNN and Transfer Learning Method for Unconstrained Facial Beauty Prediction

Computational Intelligence and Neuroscience ◽

10.1155/2019/1910624 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14 ◽

Cited By ~ 4

Author(s):

Yikui Zhai ◽

He Cao ◽

Wenbo Deng ◽

Junying Gan ◽

Vincenzo Piuri ◽

...

Keyword(s):

Transfer Learning ◽

Classification Accuracy ◽

Learning Strategy ◽

State Of The Art ◽

Activation Function ◽

Training Data ◽

Fine Grained ◽

Pattern Recognition Problem ◽

Face Features ◽

Facial Beauty

Because of the lack of discriminative face representations and scarcity of labeled training data, facial beauty prediction (FBP), which aims at assessing facial attractiveness automatically, has become a challenging pattern recognition problem. Inspired by recent promising work on fine-grained image classification using the multiscale architecture to extend the diversity of deep features, BeautyNet for unconstrained facial beauty prediction is proposed in this paper. Firstly, a multiscale network is adopted to improve the discriminative power of face features. Secondly, to alleviate the computational burden of the multiscale architecture, MFM (max-feature-map) is utilized as an activation function, which not only lightens the network and speeds up network convergence but also benefits the performance. Finally, a transfer learning strategy is introduced to mitigate the overfitting caused by the scarcity of labeled facial beauty samples and to improve the performance of the proposed BeautyNet. Extensive experiments performed on LSFBD demonstrate that the proposed scheme outperforms the state-of-the-art methods, achieving 67.48% classification accuracy.
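
The max-feature-map activation mentioned above is usually described as splitting the channel dimension into two halves and keeping the element-wise maximum, which halves the number of feature maps. The sketch below follows that common description; the function name and the `(N, C, H, W)` layout are assumptions, and the abstract does not give the exact variant used in BeautyNet.

```python
import numpy as np

def max_feature_map(features):
    """Element-wise max over the two halves of the channel axis.

    features: array of shape (N, C, H, W) with an even number of channels C.
    Returns an array of shape (N, C // 2, H, W).
    """
    features = np.asarray(features, dtype=float)
    channels = features.shape[1]
    if channels % 2 != 0:
        raise ValueError("max-feature-map needs an even number of channels")
    half = channels // 2
    # Channel i competes with channel i + half; the larger response survives.
    return np.maximum(features[:, :half], features[:, half:])
```

Since the output has half as many channels as the input, every layer that follows operates on fewer feature maps, which is one way such an activation can lighten a network.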
