Active learning with uncertainty sampling for large scale activity recognition in smart homes


Reaction-based Enumeration, Active Learning, and Free Energy Calculations to Rapidly Explore Synthetically Tractable Chemical Space and Optimize Potency of Cyclin Dependent Kinase 2 Inhibitors

DOI 10.26434/chemrxiv.7841270, 2019

Author(s): Kyle Konze, Pieter Bos, Markus Dahlgren, Karl Leswing, Ivan Tubert-Brohman, ...

Keyword(s): Free Energy, Drug Discovery, Active Learning, Large Scale, Chemical Space, Population Based, Free Energy Calculations, Computational Technique, Cyclin Dependent Kinase, Energy Calculations

We report a new computational technique, PathFinder, that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. Coupling PathFinder with active learning and cloud-based free energy calculations allows for large-scale potency predictions of compounds on a timescale that impacts drug discovery. The process is further accelerated by using a combination of population-based statistics and active learning techniques. Using this approach, we rapidly optimized R-groups and core hops for inhibitors of cyclin-dependent kinase 2. We explored more than 300,000 ideas and identified 35 ligands with diverse commercially available R-groups and a predicted IC50 < 100 nM, and four unique cores with a predicted IC50 < 100 nM. The rapid turnaround time and the scale of chemical exploration suggest that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.


How to measure uncertainty in uncertainty sampling for active learning

Machine Learning, DOI 10.1007/s10994-021-06003-9, 2021

Author(s): Vu-Linh Nguyen, Mohammad Hossein Shaker, Eyke Hüllermeier

Keyword(s): Machine Learning, Active Learning, Sampling Strategies, Total Uncertainty, Uncertainty Sampling, Different Types, Alternative Approaches, Active Learner, Probabilistic Nature, Different Sources

Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions, as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet alternative approaches to capturing uncertainty in machine learning, along with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources and to separate different types of uncertainty, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling, and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the sampling strategies thus obtained, and compare them in an experimental study.
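As a rough illustration of the ideas in this abstract, the sketch below ranks unlabeled instances by predictive entropy and, separately, splits total uncertainty into aleatoric and epistemic parts using the common ensemble-based entropy decomposition. This is not necessarily one of the measures the paper compares; the function names (`entropy`, `select_queries`, `decompose`) and the probability values are hypothetical and chosen only for the example.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_queries(pool_probs, batch_size=1):
    """Indices of the pool instances with the highest predictive entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:batch_size]

def decompose(member_probs):
    """Ensemble-based split of uncertainty for a single instance.

    member_probs holds one class distribution per ensemble member.
    total     = entropy of the averaged distribution
    aleatoric = average entropy of the individual members
    epistemic = total - aleatoric (disagreement between members)
    """
    n = len(member_probs)
    mean = [sum(col) / n for col in zip(*member_probs)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / n
    return total, aleatoric, total - aleatoric

# Hypothetical predicted class probabilities for four unlabeled instances.
pool_probs = [
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]
queries = select_queries(pool_probs, batch_size=2)

# Two ensemble members that confidently disagree: purely epistemic uncertainty.
total, aleatoric, epistemic = decompose([[1.0, 0.0], [0.0, 1.0]])
```

In this toy pool the near-uniform prediction is queried first. Under the decomposition, an active learner that targets only the epistemic part would prefer instances where ensemble members disagree over instances that are inherently ambiguous.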


A distributable event-oriented architecture for activity recognition in smart homes

Journal of Reliable Intelligent Environments, DOI 10.1007/s40860-020-00125-y, 2021

Author(s): Cédric Demongivert, Kévin Bouchard, Sébastien Gaboury, Bruno Bouchard, Maxime Lussier, ...

Keyword(s): Activity Recognition, Smart Homes


Gravity Control-Based Data Augmentation Technique for Improving VR User Activity Recognition

Symmetry, DOI 10.3390/sym13050845, 2021, Vol 13 (5), pp. 845

Author(s): Dongheun Han, Chulwoo Lee, Hyeongyeop Kang

Keyword(s): Activity Recognition, Large Scale, Data Augmentation, Training Data, Measurement Unit, Gravitational Acceleration, The Neural Network, Typical Data, Robust Recognition, Gravity Acceleration

Neural-network-based human activity recognition (HAR) is increasingly used to recognize the activities of virtual reality (VR) users. The major issue with such a technique is the collection of large-scale training datasets, which are key to deriving a robust recognition model. However, collecting large-scale data is a costly and time-consuming process. Furthermore, increasing the number of activities to be classified requires a much larger amount of training data. Since training on a sparse dataset provides only limited features to the recognition model, it can cause problems such as overfitting and suboptimal results. In this paper, we present a data augmentation technique named gravity control-based augmentation (GCDA) to alleviate the sparse data problem by generating new training data based on the existing data. The benefit of the symmetrical structure of the data is that it increases the amount of data while preserving the properties of the data. The core concept of GCDA is two-fold: (1) decomposing the acceleration data obtained from the inertial measurement unit (IMU) into zero-gravity acceleration and gravitational acceleration, and augmenting them separately, and (2) exploiting gravity as a directional feature and controlling it to augment training datasets. In comparative evaluations, training with GCDA yielded higher classification accuracy (96.39%) than training with typical data augmentation methods (92.29%) or with no augmentation (85.21%).
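The two-part idea described above (split acceleration into gravity and zero-gravity components, then manipulate the gravity direction) can be sketched as follows. This is only one plausible reading of the abstract, not the authors' implementation: the exponential low-pass gravity estimate, the rotation about the x-axis, and the names `split_gravity`, `rotate_about_x`, and `augment` are all assumptions made for illustration.

```python
import math

def split_gravity(samples, alpha=0.9):
    """Split raw 3-axis accelerometer samples into gravity and zero-gravity parts.

    Gravity is estimated with an exponential low-pass filter (an assumption;
    the paper may use a different estimator). alpha close to 1 means a
    slower-moving gravity estimate.
    """
    gravity, linear = [], []
    g = samples[0]
    for s in samples:
        g = tuple(alpha * gi + (1.0 - alpha) * si for gi, si in zip(g, s))
        gravity.append(g)
        linear.append(tuple(si - gi for si, gi in zip(s, g)))
    return gravity, linear

def rotate_about_x(v, angle_rad):
    """Rotate a 3-vector about the x-axis by angle_rad."""
    x, y, z = v
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (x, c * y - s * z, s * y + c * z)

def augment(samples, angle_deg, alpha=0.9):
    """Create a new sequence by rotating only the gravity component."""
    gravity, linear = split_gravity(samples, alpha)
    angle = math.radians(angle_deg)
    return [
        tuple(gi + li for gi, li in zip(rotate_about_x(g, angle), lin))
        for g, lin in zip(gravity, linear)
    ]

# Hypothetical recording of a stationary device (units: m/s^2).
still = [(0.0, 0.0, 9.81)] * 5
tilted = augment(still, angle_deg=90.0)
```

For the stationary recording, the zero-gravity part is approximately zero, so the augmented sequence is simply the gravity vector pointed in a new direction with its magnitude unchanged.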


Active learning and relevance vector machine in efficient estimate of basin stability for large-scale dynamic networks

Chaos: An Interdisciplinary Journal of Nonlinear Science, DOI 10.1063/5.0044899, 2021, Vol 31 (5), pp. 053129

Author(s): Yiming Che, Changqing Cheng

Keyword(s): Active Learning, Large Scale, Dynamic Networks, Relevance Vector Machine, Efficient Estimate, Basin Stability


From Activity Recognition to Intention Recognition for Assisted Living Within Smart Homes

IEEE Transactions on Human-Machine Systems, DOI 10.1109/thms.2016.2641388, 2017, Vol 47 (3), pp. 368-379, Cited By ~ 60

Author(s): Joseph Rafferty, Chris D. Nugent, Jun Liu, Liming Chen

Keyword(s): Activity Recognition, Assisted Living, Smart Homes, Intention Recognition


Bayesian active learning of interatomic force field for molecular dynamics simulation of Pt/Ag(111)

DOI 10.26434/chemrxiv-2021-sk6lf-v2, 2021

Author(s): Kai Xu, Lei Yan, Bingran You

Keyword(s): Molecular Dynamics, Active Learning, Force Field, Density Functional, Process Model, Large Scale, Computational Cost, Dynamics Simulation, Potential Energy Landscape, Three Body

A force field is a central requirement in molecular dynamics (MD) simulation for an accurate description of the potential energy landscape and the time evolution of individual atomic motions. Most energy models are limited by a fundamental tradeoff between accuracy and speed. Although ab initio MD based on density functional theory (DFT) has high accuracy, its high computational cost prevents its use for large-scale and long-timescale simulations. Here, we use Bayesian active learning to construct a Gaussian process model of interatomic forces to describe Pt deposited on Ag(111). An accurate model is obtained within one day of wall time after selecting only 126 atomic environments based on two- and three-body interactions, providing mean absolute errors of 52 and 142 meV/Å for Ag and Pt, respectively. Our work highlights automated and minimalistic training of machine-learning force fields with high fidelity to DFT, which would enable large-scale and long-timescale simulations of alloy surfaces at first-principles accuracy.
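The general mechanism behind Bayesian active learning with a Gaussian process can be sketched in a few lines: the model's predictive variance indicates where it is least certain, and the next training point is taken there. The toy below works on scalar inputs with a squared-exponential kernel, which is an assumption for illustration only; the paper's model acts on atomic environments with two- and three-body kernels, and the names `rbf`, `predictive_variance`, and `next_query` are hypothetical.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel on scalars (illustrative choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def predictive_variance(train_x, x, noise=1e-6):
    """GP posterior variance at x: k(x, x) - k*^T (K + noise I)^-1 k*."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    L = cholesky(K)
    k_star = [rbf(a, x) for a in train_x]
    v = forward_solve(L, k_star)  # v = L^-1 k*, so v.v = k*^T K^-1 k*
    return rbf(x, x) - sum(vi * vi for vi in v)

def next_query(train_x, candidates):
    """Candidate where the model is most uncertain."""
    return max(candidates, key=lambda x: predictive_variance(train_x, x))

# Hypothetical active-learning loop over a small candidate pool.
train_x = [0.0, 1.0]
candidates = [0.5, 2.0, 4.0]
order = []
while candidates:
    x = next_query(train_x, candidates)
    order.append(x)
    train_x.append(x)       # in practice: run the expensive calculation here
    candidates.remove(x)
```

The loop picks the point farthest from existing data first and the point already bracketed by training data last, which is the behaviour that lets a variance-driven learner get by with few expensive reference calculations.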
