Interaction from Structure using Machine Learning: in and out of Equilibrium

Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jun Pei ◽

Zheng Zheng ◽

Hyunji Kim ◽

Lin Song ◽

Sarah Walworth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Probability Function ◽

Pair Potential ◽

Scoring Function ◽

Stable Structure ◽

Scoring Functions ◽

Atom Pair ◽

Data Set ◽

Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. Comparing the accuracy of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
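The abstract does not spell out the ‘comparison’ concept. One plausible reading, sketched here purely as an assumption, is to recast pose selection as pairwise classification: each training example is the difference between the per-atom-pair score vectors of two poses, labelled by whether the first pose is the native one. Emitting every native/decoy pair in both orders gives balanced labels even when decoys far outnumber natives. The function name, feature layout, and toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def build_comparison_set(native: Vector, decoys: List[Vector]) -> Tuple[List[Vector], List[int]]:
    """Turn one native pose and its decoys into pairwise-difference examples.

    Assumption: each pose is described by a fixed-length vector of
    per-atom-pair scores (a hypothetical layout, not the paper's).
    """
    features, labels = [], []
    for decoy in decoys:
        # native minus decoy -> label 1 (the first pose is the native one)
        features.append([n - d for n, d in zip(native, decoy)])
        labels.append(1)
        # decoy minus native -> label 0 (the first pose is a decoy)
        features.append([d - n for n, d in zip(native, decoy)])
        labels.append(0)
    return features, labels


# Toy example: three atom-pair types, one native pose, two decoys.
native_pose = [-1.5, -0.5, -2.0]
decoy_poses = [[-1.0, -0.5, -1.0], [0.0, -1.0, -2.5]]
X, y = build_comparison_set(native_pose, decoy_poses)
```

A feature matrix and label list of this shape could then be passed to any Random Forest classifier, for example scikit-learn's `RandomForestClassifier`, whose feature importances would rank the atom pairs.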

Towards Behaviour Recognition with Unlabelled Sensor Data

Human Behavior Recognition Technologies ◽

10.4018/978-1-4666-3682-8.ch005 ◽

2013 ◽

pp. 86-110

Author(s):

Sook-Ling Chua ◽

Stephen Marsland ◽

Hans W. Guesgen

Keyword(s):

Machine Learning ◽

Data Mining ◽

Inverse Problem ◽

Sensor Data ◽

Training Set ◽

Learning Methods ◽

Machine Learning Methods ◽

Using Data ◽

Symbolic Approach ◽

Behaviour Recognition

The problem of behaviour recognition based on data from sensors is essentially an inverse problem: given a set of sensor observations, identify the sequence of behaviours that gave rise to them. In a smart home, the behaviours are likely to be the standard human behaviours of living, and the observations will depend upon the sensors that the house is equipped with. There are two main approaches to identifying behaviours from the sensor stream. One is to use a symbolic approach, which explicitly models the recognition process. Another is to use a sub-symbolic approach to behaviour recognition, which is the focus in this chapter, using data mining and machine learning methods. While there have been many machine learning methods of identifying behaviours from the sensor stream, they have generally relied upon a labelled dataset, where a person has manually identified their behaviour at each time. This is particularly tedious to do, resulting in relatively small datasets, and is also prone to significant errors as people do not pinpoint the end of one behaviour and commencement of the next correctly. In this chapter, the authors consider methods to deal with unlabelled sensor data for behaviour recognition, and investigate their use. They then consider whether they are best used in isolation, or should be used as preprocessing to provide a training set for a supervised method.

ML-descent: An optimization algorithm for full-waveform inversion using machine learning

Geophysics ◽

10.1190/geo2019-0641.1 ◽

2020 ◽

Vol 85 (6) ◽

pp. R477-R492 ◽

Cited By ~ 2

Author(s):

Bingbing Sun ◽

Tariq Alkhalifah

Keyword(s):

Neural Network ◽

Machine Learning ◽

Inverse Problem ◽

Optimization Algorithm ◽

Waveform Inversion ◽

Slight Modification ◽

Descent Method ◽

Quadratic Functions ◽

Full Waveform Inversion ◽

Full Waveform

Full-waveform inversion (FWI) is a nonlinear optimization problem, and a typical optimization algorithm such as the nonlinear conjugate gradient or limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) would iteratively update the model mainly along the gradient-descent direction of the misfit function or a slight modification of it. Based on the concept of meta-learning, rather than using a hand-designed optimization algorithm, we have trained the machine (represented by a neural network) to learn an optimization algorithm, entitled the “ML-descent,” and apply it in FWI. Using a recurrent neural network (RNN), we use the gradient of the misfit function as the input, and the hidden states in the RNN incorporate the history information of the gradient similar to an LBFGS algorithm. However, unlike the fixed form of the LBFGS algorithm, the machine-learning (ML) version evolves in response to the gradient. The loss function for training is formulated as a weighted summation of the L2 norm of the data residuals in the original inverse problem. As with any well-defined nonlinear inverse problem, the optimization can be locally approximated by a linear convex problem; thus, to accelerate the training, we train the neural network by minimizing randomly generated quadratic functions instead of performing time-consuming FWIs. To further improve the accuracy and robustness, we use a variational autoencoder that projects and represents the model in latent space. We use the Marmousi and the overthrust examples to demonstrate that the ML-descent method shows faster convergence and outperforms conventional optimization algorithms. The energy in the deeper part of the models can be recovered by the ML-descent even when the pseudoinverse of the Hessian is not incorporated in the FWI update.
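The training shortcut described in this abstract, minimizing randomly generated quadratic functions in place of full FWI runs, can be illustrated with a small sketch. It only generates a random convex quadratic and minimizes it with plain gradient descent as a stand-in optimizer; the learned RNN update of ML-descent is not reproduced, and all names, sizes, and the step-size rule are illustrative assumptions.

```python
import random


def random_convex_quadratic(n, rng):
    """Return (A, b) for f(x) = 0.5 * x^T A x - b^T x with A = M^T M + I."""
    M = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    A = [[sum(M[k][i] * M[k][j] for k in range(n)) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return A, b


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def loss(A, b, x):
    Ax = matvec(A, x)
    quad = 0.5 * sum(xi * axi for xi, axi in zip(x, Ax))
    return quad - sum(bi * xi for bi, xi in zip(b, x))


def gradient(A, b, x):
    # Gradient of the quadratic: A x - b
    return [axi - bi for axi, bi in zip(matvec(A, x), b)]


def gradient_descent(A, b, x0, steps=200):
    # For a positive-definite A, trace(A) is an upper bound on the largest
    # eigenvalue, so a step of 1 / trace(A) cannot make the loss increase.
    step = 1.0 / sum(A[i][i] for i in range(len(A)))
    x = list(x0)
    history = [loss(A, b, x)]
    for _ in range(steps):
        g = gradient(A, b, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        history.append(loss(A, b, x))
    return x, history


rng = random.Random(0)  # fixed seed so the toy problem is reproducible
A, b = random_convex_quadratic(4, rng)
x_final, losses = gradient_descent(A, b, [0.0] * 4)
```

In a meta-learning setup, many such cheap quadratics would serve as training tasks, and the hand-written update inside `gradient_descent` is the part a learned optimizer would replace.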

Towards a Fast and Accurate EIT Inverse Problem Solver: A Machine Learning Approach

Electronics ◽

10.3390/electronics7120422 ◽

2018 ◽

Vol 7 (12) ◽

pp. 422 ◽

Cited By ~ 4

Author(s):

Xosé Fernández-Fuentes ◽

David Mera ◽

Andrés Gómez ◽

Ignacio Vidal-Franco

Keyword(s):

Machine Learning ◽

Inverse Problem ◽

Computation Time ◽

Percentage Error ◽

Problem Solver ◽

Impedance Tomography ◽

Quantitative Metrics ◽

Machine Learning Approach ◽

Primal Dual ◽

Internal Body

Different industrial and medical situations require the non-invasive extraction of information from the inside of bodies. This is usually done through tomographic methods that generate images based on internal body properties. However, the image reconstruction involves a mathematical inverse problem, for which accurate resolution demands large computation time and capacity. In this paper we explore the use of Machine Learning to develop an accurate solver for reconstructing Electrical Impedance Tomography images in real time. We compare the results with the Iterative Gauss-Newton and the Primal Dual Interior Point Method, which are both widely used and well-validated solvers. The approaches were compared from both the qualitative and the quantitative viewpoints. The former focused on correctly detecting the internal body features; the latter was based on accurately predicting internal property distributions. Experiments revealed that our approach achieved better accuracy and Cohen’s kappa coefficient (97.57% and 94.60%, respectively) from the qualitative viewpoint. Moreover, it also obtained better quantitative metrics, with a Mean Absolute Percentage Error of 18.28%. Experiments confirmed that neural network algorithms can reconstruct internal body properties with high accuracy, so they would be able to replace more complex and slower alternatives.
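The metrics named in this abstract can be computed as in the following sketch, which uses the standard textbook definitions of Cohen's kappa and the Mean Absolute Percentage Error. The toy labels and values are made up for illustration and are unrelated to the paper's data.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement, p_e the agreement expected by chance
    from the marginal label frequencies. Undefined when p_e == 1.
    """
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; y_true must be nonzero."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Toy data (hypothetical): feature present/absent labels and property values.
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
error = mape([2.0, 4.0], [1.0, 5.0])
```

Kappa corrects raw accuracy for chance agreement, which is why it is typically reported alongside accuracy for the qualitative comparison.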

Higher-order interactions in statistical physics and machine learning: A model-independent solution to the inverse problem at equilibrium

Physical Review E ◽

10.1103/physreve.102.053314 ◽

2020 ◽

Vol 102 (5) ◽

Author(s):

Sjoerd Viktor Beentjes ◽

Ava Khamseh

Keyword(s):

Machine Learning ◽

Inverse Problem ◽

Statistical Physics ◽

Independent Solution ◽

Higher Order ◽

Model Independent

Groundwater pollution monitoring and the inverse problem of source identification. Evaluation of various Machine Learning methods

10.5194/egusphere-egu21-16388 ◽

2021 ◽

Author(s):

Yiannis Kontos ◽

Theodosios Kassandros ◽

Konstantinos Katsifarakis ◽

Kostas Karatzas

Keyword(s):

Neural Network ◽

Machine Learning ◽

Inverse Problem ◽

Real Time ◽

Groundwater Pollution ◽

Source Identification ◽

Monitoring Network ◽

Feature Subset Selection ◽

Pollution Monitoring ◽

Pollution Detection

Groundwater pollution numerical simulations coupled with Genetic Algorithms (GAs) lead to vast computational load, while flow fields’ simplification can compensate in design, but not real-time/operational, applications. Various Machine Learning/Deep Learning (ML/DL) methods/problem-formulations were tested/evaluated for real-time inverse problems of aquifer pollution source identification. Aim: investigate data-driven approaches towards replacing flow simulation with ML/DL trained models identifying the source, faster but efficiently enough. Steady flow in a 1500 m x 1500 m theoretical confined, isotropic aquifer of known characteristics is studied. Two pumping wells (PWs) near the southern boundary provide irrigation/drinking water, defining the flow together with a varying North-South natural flow. Six suspected possible sources, capable of instantaneous leakage, may spread a conservative pollutant. Particle tracking simulates advective mass transport in a 2D flow-field for 2500 1-day timesteps. The 14x14 inner field grid nodes serve as locations of sources, PWs and monitoring wells (MWs; for simple daily yes/no pollution detection and/or drawdown measuring). 15,246 combinations of 6 Source Nrs, 21 N-S hydraulic gradients, 11+11 PW1,2 flow-rates were simulated with existing own software, providing the necessary data-sets for ML training/evaluation. Two basic ML/DL approaches were implemented: Classification (CL) and Computer Vision (CV). In CL, every source is a discrete class, while each MW is a discrete variable. The target variable Y can equal 1 to 6, while input variables X can be: a) 0/1 (MWi polluted or not), b) the first day of MWi’s pollution, c) the duration of MWi’s pollution, d) hydraulic drawdown of MWi. For a bit more realism, the two southern rows of 28 MWs, and the MWs on/around PWs are concealed. CL features the advantage of facilitating Correlation-based Feature Subset Selection (CFSS), indirectly leading to a pseudo-optimization of the monitoring network, minimizing the number of MWs (not the sampling frequency though), based solely on the criterion of efficiency in identifying the source. As a downside, time dimension and spatial correlation of MWs are not considered. Approach (b) being the best scheme, Random Forests (RFs; 86.5576% accuracy), Multi-Layer Perceptron (MLP; 77.5%), and Nearest Neighbors (NN; 86.5%) were tested. CFSS led to only 8 MWs being important, so training with the optimal subsets gave promising results: RF=85.4%, MLP=73.1%, NN=85.4%. In CV, MWis’ pollution input data on a 10-day basis (0-60, 800-on concealed) were formulated into 14x14-pixel black/white images, that is 14x14 binary (0,1) matrices, the t=0 image being the desideratum. A Convolutional Neural Network (CNN; U-Net architecture for image segmentation) achieved 97.1% accuracy. A Convolutional Long/Short-Term Memory Neural Network (CLSTM), training a model to back-propagate predicting each given time step, with unchanged data formulation (60-800d, step 10), exhibits 82.3% accuracy. CLSTM’s performance is timestep-sensitive, with the best results (98% accuracy) obtained using configuration 5-800d, step 6. Concluding, CL’s CFSS minimizes the input space, while CV approaches yield more promising results in terms of accuracy. Each approach has certain constraints in operational applicability, concerning the number of MWs, the sampling resolution and the total elapsed time. This process paves the way for realistic inverse problem solutions, ML-GAs monitoring network optimization, and real-time pollution detection operational systems.
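The size of the simulated data set and the image encoding used in the CV approach can be checked with a short sketch. It reads "11+11 PW1,2 flow-rates" as 11 values for each of the two pumping wells, an interpretation that reproduces the stated total of 15,246 scenarios. The parameter values are placeholder indices and `to_binary_image` is a hypothetical helper, not the authors' software.

```python
from itertools import product

sources = range(1, 7)    # 6 suspected sources (class labels 1..6)
gradients = range(21)    # 21 N-S hydraulic gradients (indices only)
pw1_rates = range(11)    # 11 flow-rates for PW1 (indices only)
pw2_rates = range(11)    # 11 flow-rates for PW2 (indices only)

# Every simulated scenario is one combination of the four parameters:
# 6 * 21 * 11 * 11 = 15246
scenarios = list(product(sources, gradients, pw1_rates, pw2_rates))


def to_binary_image(polluted_nodes, size=14):
    """Encode polluted monitoring wells as a size x size 0/1 matrix."""
    image = [[0] * size for _ in range(size)]
    for row, col in polluted_nodes:
        image[row][col] = 1
    return image


# Toy snapshot: three grid nodes report pollution on a given day.
snapshot = to_binary_image([(0, 3), (1, 3), (2, 4)])
```

A time series of such snapshots is the kind of input an image-based model would consume, with the source location at t=0 as the target.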

Forward-inverse modeling based on scalar and vector radiative transfer models for coupled atmosphere-surface systems and machine learning tools

10.5194/egusphere-egu2020-4217 ◽

2020 ◽

Author(s):

Knut Stamnes ◽

Børge Hamre ◽

Snorre Stamnes ◽

Nan Chen ◽

Yongzhen Fan ◽

...

Keyword(s):

Machine Learning ◽

Inverse Problem ◽

Radiative Transfer ◽

Inverse Problems ◽

Surface Properties ◽

Inverse Modeling ◽

Atmospheric Correction ◽

Machine Learning Techniques ◽

Ocean Bottom ◽

Curvature Effects

Reliable retrieval of atmospheric and surface properties from sensors deployed on satellite platforms relies on accurate simulations of the electromagnetic (EM) signal measured by such sensors. A forward radiative transfer (RT) model of the coupled atmosphere-surface system can be used to simulate how the EM signal responds to changes in atmospheric and surface properties. Realistic RT modeling is a prerequisite for solving the inverse problem, i.e. to infer atmospheric and surface parameters from the EM signals measured at the top of the atmosphere. The surface may consist of a soil-plant canopy, a snow/ice covered surface or an open water body (ocean, lake, river system). An overview will be provided of forward and inverse RT in such coupled atmosphere-surface systems. A coupled system consisting of two adjacent slabs separated by an interface across which the refractive index changes abruptly from its value in air to that in water/ice [1] will be used as an example. Several examples of how to formulate and solve inverse problems involving coupled atmosphere-water systems [2] will be provided to illustrate how solutions to the RT equation can be used as a forward model to solve practical inverse problems. Cloud screening [3], atmospheric correction [4], treatment of two-dimensional surface roughness, Earth curvature effects, and ocean bottom reflection for shallow water in coastal areas will be discussed, and the advantage of using powerful machine learning techniques to solve the inverse problem will be emphasized.

References

[1] Stamnes, K., and J. J. Stamnes, Radiative Transfer in Coupled Environmental Systems, 2015.

[2] Stamnes, K., B. Hamre, S. Stamnes, N. Chen, Y. Fan, W. Li, Z. Lin, and J. J. Stamnes, Progress in forward-inverse modeling based on radiative transfer tools for coupled atmosphere-snow/ice-ocean systems: A review and description of the AccuRT model, 8, 2682, 2018.

[3] Chen, N., W. Li, C. Gatebe, T. Tanikawa, M. Hori, R. Shimada, T. Aoki, and K. Stamnes, New cloud mask algorithm based on machine learning methods and radiative transfer simulations, 219, 62-71, 2018.

[4] Fan, Y., W. Li, C. K. Gatebe, C. Jamet, G. Zibordi, T. Schroeder, and K. Stamnes, Atmospheric correction and aerosol retrieval over coastal waters using multilayer neural networks, 199, 218-240, 2017.

Design of High-Speed Links via a Machine Learning Surrogate Model for the Inverse Problem

2019 Electrical Design of Advanced Packaging and Systems (EDAPS) ◽

10.1109/edaps47854.2019.9011627 ◽

2019 ◽

Author(s):

R. Trinchero ◽

M. Ahadi Dolatsara ◽

K. Roy ◽

M. Swaminathan ◽

F. G. Canavero

Keyword(s):

Machine Learning ◽

Inverse Problem ◽

Surrogate Model ◽

High Speed

DETERMINATION OF DEFORMATION PROPERTIES AND NATURAL STRESSES IN ROCK MASS BY UNDERGROUND GEODESY DATA

Interexpo GEO-Siberia ◽

10.33764/2618-981x-2021-2-4-52-61 ◽

2021 ◽

Vol 2 (4) ◽

pp. 52-61

Author(s):

Anton V. Panov ◽

Leonid A. Nazarov

Keyword(s):

Inverse Problem ◽

Measurement Data ◽

Domain Size ◽

Structural Elements ◽

Room And Pillar Mining ◽

3D Geomechanical Model ◽

Deformation Properties ◽

Pillar Mining ◽

Typical Configuration

The authors have developed and implemented a 3D geomechanical model using the finite element method for a typical configuration of an underground space during room-and-pillar mining. The authors formulate and solve an inverse problem of determining the values and orientation of external horizontal stresses and the deformation characteristics of structural elements of the geotechnology from measurement data on sidewall convergence in rooms in the course of mining. The level curves of different objective functions are analyzed, the solvability of the mixed inverse problem is demonstrated, and the equivalence domain size is correlated with the relative error of input data.

Thermodynamically Consistent Theory of Classical Liquid Structure and Inverse Problem of Extracting Pair Potential

Physics and Chemistry of Liquids ◽

10.1080/00319108408080787 ◽

1984 ◽

Vol 13 (4) ◽

pp. 285-292 ◽

Cited By ~ 10

Author(s):

N. H. March ◽

G. Senatore

Keyword(s):

Inverse Problem ◽

Pair Potential ◽

Liquid Structure ◽

Consistent Theory
