Feasibility of Kd-Trees in Gaussian Process Regression to Partition Test Points in High Resolution Input Space

Algorithms, 2020, Vol 13 (12), pp. 327
Author(s): Ivan De Boi, Bart Ribbens, Pieter Jorissen, Rudi Penne

Bayesian inference using Gaussian processes on large datasets has been studied extensively over the past few years. However, little attention has been given to how to apply these methods in a high-resolution input space. By approximating the set of test points (the points where we want to make predictions, not the training points in the dataset) with a kd-tree, we obtain a multi-resolution data structure that allows considerable gains in performance and memory usage without a significant loss of accuracy. In this paper, we study the feasibility and efficiency of constructing and using such a kd-tree in Gaussian process regression. We propose a cut-off rule that is easy to interpret and to tune. We show our findings on generated toy data in a 3D point cloud and a simulated 2D vibrometry example. This survey is beneficial for researchers who are working in a high-resolution input space. The kd-tree approximation outperforms the naïve Gaussian process implementation in all experiments.
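The idea of partitioning the test points (rather than the training points) can be illustrated with a small sketch. It assumes a squared-exponential kernel, a kd-tree that splits at the median of the widest axis, and a hypothetical cut-off rule: stop splitting once a node's bounding-box diagonal is at most `cutoff` times the kernel length scale, then predict once at the node centroid and reuse that value for every test point in the node. These choices, and all function names below, are our assumptions for illustration, not necessarily the rule proposed in the paper.

```python
import numpy as np


def rbf(a, b, length_scale):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def fit_gp(x_train, y_train, length_scale, noise=1e-2):
    """Solve (K + noise*I) alpha = y once; alpha is reused for every query."""
    k = rbf(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    return np.linalg.solve(k, y_train)


def predict_mean(x_query, x_train, alpha, length_scale):
    """Naive GP posterior mean: one kernel row per query point."""
    return rbf(x_query, x_train, length_scale) @ alpha


def kd_leaves(idx, points, max_extent):
    """Split test-point indices at the median of the widest axis until a node's
    bounding-box diagonal is <= max_extent (hypothetical cut-off rule)."""
    pts = points[idx]
    span = pts.max(0) - pts.min(0)
    if len(idx) == 1 or np.linalg.norm(span) <= max_extent:
        return [idx]
    axis = int(np.argmax(span))
    order = idx[np.argsort(pts[:, axis], kind="stable")]
    mid = len(order) // 2
    return (kd_leaves(order[:mid], points, max_extent)
            + kd_leaves(order[mid:], points, max_extent))


def kd_predict(x_test, x_train, alpha, length_scale, cutoff):
    """Predict once per leaf, at the leaf centroid, and copy the value to all
    test points in that leaf. Returns (predictions, number of leaves)."""
    leaves = kd_leaves(np.arange(len(x_test)), x_test, cutoff * length_scale)
    centroids = np.array([x_test[leaf].mean(0) for leaf in leaves])
    leaf_means = predict_mean(centroids, x_train, alpha, length_scale)
    y = np.empty(len(x_test))
    for leaf, m in zip(leaves, leaf_means):
        y[leaf] = m
    return y, len(leaves)


# Toy example: 200 training points, a dense 20x20x20 grid of test points.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, (200, 3))
y_train = np.sin(4.0 * x_train[:, 0]) + x_train[:, 1] ** 2
g = np.linspace(0.0, 1.0, 20)
x_test = np.stack(np.meshgrid(g, g, g, indexing="ij"), -1).reshape(-1, 3)

ls = 0.3
alpha = fit_gp(x_train, y_train, ls)
exact = predict_mean(x_test, x_train, alpha, ls)
approx, n_leaves = kd_predict(x_test, x_train, alpha, ls, cutoff=0.5)
print(n_leaves, "leaves for", len(x_test), "test points;",
      "mean abs error", np.mean(np.abs(approx - exact)))
```

Setting `cutoff=0.0` degenerates to one leaf per test point, which reproduces the naive prediction; larger values trade accuracy for fewer kernel evaluations, which is the knob a cut-off rule is meant to expose.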

2000, Vol 12 (11), pp. 2719-2741
Author(s): Volker Tresp

The Bayesian committee machine (BCM) is a novel approach to combining estimators that were trained on different data sets. Although the BCM can be applied to the combination of any kind of estimators, the main foci are Gaussian process regression and related systems such as regularization networks and smoothing splines, for which the degrees of freedom increase with the number of training data. Somewhat surprisingly, we find that the performance of the BCM improves if several test points are queried at the same time and is optimal if the number of test points is at least as large as the degrees of freedom of the estimator. The BCM also provides a new solution for on-line learning with potential applications to data mining. We apply the BCM to systems with fixed basis functions and discuss its relationship to Gaussian process regression. Finally, we show how the ideas behind the BCM can be applied in a non-Bayesian setting to extend the input-dependent combination of estimators.
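A minimal sketch of the BCM combination for Gaussian process modules may help. Each module is a GP trained on its own chunk of the data; their joint posteriors at a set of query points are combined by summing precisions and subtracting the prior precision M-1 times (M modules), as in the standard BCM combination rule. The 1-D squared-exponential kernel, the noise level, the jitter term, and the random chunking are our assumptions for illustration.

```python
import numpy as np


def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel for 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)


def gp_posterior(x_tr, y_tr, x_q, noise):
    """Joint GP posterior mean and covariance of f at the query points x_q."""
    k = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    k_q = rbf(x_q, x_tr)
    mean = k_q @ np.linalg.solve(k, y_tr)
    cov = rbf(x_q, x_q) - k_q @ np.linalg.solve(k, k_q.T)
    return mean, cov


def bcm(x_tr, y_tr, x_q, n_modules, noise, jitter=1e-8):
    """Bayesian committee machine: combine per-module GP posteriors at x_q.
    precision = sum_i cov_i^-1 - (M - 1) * prior^-1
    mean      = precision^-1 @ sum_i cov_i^-1 @ mean_i"""
    eye = np.eye(len(x_q))
    precision = -(n_modules - 1) * np.linalg.inv(rbf(x_q, x_q) + jitter * eye)
    weighted = np.zeros(len(x_q))
    for xs, ys in zip(np.array_split(x_tr, n_modules),
                      np.array_split(y_tr, n_modules)):
        m, c = gp_posterior(xs, ys, x_q, noise)
        c_inv = np.linalg.inv(c + jitter * eye)
        precision += c_inv
        weighted += c_inv @ m
    cov = np.linalg.inv(precision)
    return cov @ weighted, cov


rng = np.random.default_rng(0)
x_tr = rng.uniform(0.0, 3.0, 80)     # random order, so array_split gives random modules
y_tr = np.sin(2.0 * x_tr) + rng.normal(0.0, 0.05, 80)
x_q = np.linspace(0.2, 2.8, 5)       # several test points queried jointly
noise = 0.05**2

full_mean, _ = gp_posterior(x_tr, y_tr, x_q, noise)
bcm1_mean, _ = bcm(x_tr, y_tr, x_q, 1, noise)
bcm4_mean, bcm4_cov = bcm(x_tr, y_tr, x_q, 4, noise)
print(np.round(full_mean, 3), np.round(bcm4_mean, 3))
```

With a single module the rule reduces to the ordinary GP posterior; with several modules only small per-module systems are solved, and the query points are combined jointly, which is where the abstract's observation about querying several test points at once comes in.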


Author(s): John L. Hutchison

Over the past five years or so, the development of a new generation of high-resolution electron microscopes operating routinely in the 300-400 kilovolt range has produced a dramatic increase in resolution, to around 1.6 Å for “structure resolution” and approaching 1.2 Å for the information limit. With a large number of such instruments now in operation, it is timely to assess their impact in the various areas of materials science where they are being used. Are they falling short of early expectations? Generally, the manufacturers’ claims regarding resolution are being met, but one unexpected factor that has emerged is the extreme sensitivity of these instruments to both floor-borne and acoustic vibrations. Counteracting these disturbances may require special anti-vibration blocks, or even simple oil-filled dampers together with springs, as well as heavy curtaining around the microscope room to reduce noise levels. In assessing performance levels, optical diffraction analysis is becoming the accepted method, with rotational averaging useful for obtaining a good measure of the information limit. It is worth noting that microscope alignment becomes very critical at the highest resolution. In attempting to appraise the contributions of intermediate-voltage HREMs to materials science, we will outline a few of the areas where they are most widely used. These include semiconductors, oxides, and small metal particles, in addition to metals and minerals.
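Rotational averaging of a diffractogram can be sketched digitally: take the power spectrum of a (digitised) micrograph with an FFT, as a numerical counterpart of optical diffraction, and average it over rings of constant spatial frequency. The sketch below assumes a square image and integer-pixel ring radii; the synthetic lattice-fringe image is our own test input, not microscope data. Reading an information limit off such a profile (where the averaged signal meets the noise background) would depend on the instrument and specimen.

```python
import numpy as np


def diffractogram(image):
    """Centred power spectrum |FFT|^2 of a mean-subtracted image."""
    f = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    return np.abs(f) ** 2


def rotational_average(power):
    """Average a centred power spectrum over rings of equal integer radius
    (in Fourier pixels) around the zero-frequency term."""
    ny, nx = power.shape
    yy, xx = np.indices(power.shape)
    r = np.hypot(yy - ny // 2, xx - nx // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)


# Synthetic "micrograph": lattice fringes with an 8-pixel period plus noise.
rng = np.random.default_rng(0)
n = 64
_, x = np.indices((n, n))
image = np.cos(2 * np.pi * x / 8) + rng.normal(0.0, 0.5, (n, n))
profile = rotational_average(diffractogram(image))
print("strongest ring at radius", int(np.argmax(profile)))
```

For a 64-pixel image, fringes with an 8-pixel period appear as a ring at radius 64 / 8 = 8 in the averaged profile, standing well above the noise floor.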


Author(s): H. Kohl

High-resolution electron microscopy is able to determine the structures of crystals and interfaces with a spatial resolution of somewhat less than 2 Å. As the image depends strongly on instrumental parameters, notably the defocus and the spherical aberration, interpreting micrographs requires a comparison with calculated images. Whereas in the past one was often content with a qualitative comparison of theory with experiment, one now strives for quantitative procedures to extract information from the images [1,2]. For the calculations, one starts by assuming a static potential, thus neglecting inelastic scattering processes. We shall confine the discussion to periodic specimens. All electrons that have only been elastically scattered are confined to very few directions, the Bragg spots. Inelastically scattered electrons, however, can be found in any direction. Therefore, the influence of inelastic processes on the elastically (= Bragg) scattered electrons can be described as an attenuation [3]. For the calculation of high-resolution images, this procedure would be correct only if we had an imaging energy filter capable of removing all phonon-scattered electrons. Such a filter is not realizable in practice, so we are forced to include the contribution of the phonon-scattered electrons.


2020
Author(s): Marc Philipp Bahlke, Natnael Mogos, Jonny Proppe, Carmen Herrmann

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets with potential applications in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression because it can potentially deal with small training sets (as are likely for the rather complex molecular structures required for exploring spin coupling) and because it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
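The two ingredients named in the abstract, a Gaussian process that returns error bars and a linear ridge regression baseline, can be sketched on a two-feature descriptor (a bridge angle and a metal-metal distance). Everything below is an assumption for illustration: the data are synthetic and linear by construction (not the dicopper set), and the squared-exponential kernel, length scale, and ridge penalty are arbitrary choices, whereas the paper compares a range of kernels and descriptors.

```python
import numpy as np


def rbf(a, b, length_scale):
    """Squared-exponential kernel between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_fit_predict(x_tr, y_tr, x_te, length_scale=1.0, amp=1.0, noise=1e-2):
    """GP posterior mean and standard deviation ("error bars") at x_te."""
    y0 = y_tr.mean()                                   # constant prior mean
    k = amp * rbf(x_tr, x_tr, length_scale) + noise * np.eye(len(x_tr))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_tr - y0))
    k_s = amp * rbf(x_te, x_tr, length_scale)
    v = np.linalg.solve(chol, k_s.T)
    var = amp - (v**2).sum(0)                          # prior variance is amp
    return y0 + k_s @ alpha, np.sqrt(np.maximum(var, 0.0))


def ridge_fit_predict(x_tr, y_tr, x_te, lam=1e-2):
    """Linear ridge regression with a bias column (bias is penalised too)."""
    a = np.column_stack([x_tr, np.ones(len(x_tr))])
    w = np.linalg.solve(a.T @ a + lam * np.eye(a.shape[1]), a.T @ y_tr)
    return np.column_stack([x_te, np.ones(len(x_te))]) @ w


# Synthetic stand-in data, NOT the dicopper set: two descriptors per complex.
rng = np.random.default_rng(1)
angle = rng.uniform(90.0, 105.0, 120)       # hypothetical bridge angles (deg)
dist = rng.uniform(2.9, 3.1, 120)           # hypothetical Cu-Cu distances (Angstrom)
x = np.column_stack([angle, dist])
y = -2.0 * (angle - 95.0) + rng.normal(0.0, 0.1, 120)  # linear by construction
train, test = angle < 100.0, angle > 102.0  # fit on small angles, extrapolate
mu, sd = x[train].mean(0), x[train].std(0)
xs = (x - mu) / sd                          # standardize with training statistics

gp_mean, gp_std = gp_fit_predict(xs[train], y[train], xs, amp=y[train].var())
ridge_pred = ridge_fit_predict(xs[train], y[train], xs[test])
print("mean GP std, train vs extrapolated:",
      gp_std[train].mean(), gp_std[test].mean())
```

The GP error bars grow for the extrapolated points, which is the uncertainty signal the abstract refers to; because the synthetic target is linear, the ridge baseline recovers it closely, which mirrors (without reproducing) the point that a well-linearized representation lets a linear model keep up.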


2018
Author(s): Caitlin C. Bannan, David Mobley, A. Geoff Skillman

A variety of fields would benefit from accurate pKa predictions, especially drug design, because a change in ionization state can affect a molecule's physicochemical properties. Participants in the recent SAMPL6 blind challenge were asked to submit predictions for the microscopic and macroscopic pKas of 24 drug-like small molecules. We recently built a general model for predicting pKas using Gaussian process regression trained on physical and chemical features of each ionizable group. Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton. These features are fed into a scikit-learn Gaussian process to predict microscopic pKas, which are then used to analytically determine macroscopic pKas. Our Gaussian process is trained on a set of 2,700 macroscopic pKas from monoprotic and select diprotic molecules. Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge. Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising, considering that the challenge molecules are chemically diverse and often polyprotic while our training set is predominantly monoprotic. Of particular importance to us when building this model was to include an uncertainty estimate, based on the chemistry of the molecule, that would reflect the likely accuracy of our prediction. Our model reports large uncertainties for the molecules that appear to have chemistry outside our domain of applicability, along with good agreement in quantile-quantile plots, indicating that it can predict its own accuracy. The challenge highlighted a variety of means to improve our model, including adding more polyprotic molecules to our training set and more carefully considering which functional groups we do or do not identify as ionizable.
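Two steps mentioned in this abstract can be sketched with the standard library alone. First, one common analytic link between microscopic and macroscopic pKas: when a single protonation state can lose a proton from several sites in parallel, the macroscopic Ka is the sum of the microscopic Ka values. This is only the simplest case and is not necessarily the exact procedure used in the authors' pipeline. Second, the points of a quantile-quantile plot for standardized residuals, which is one way to check whether predicted uncertainties match the observed errors. Function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def macro_pka(micro_pkas):
    """Macroscopic pKa when one protonation state can lose a proton from
    several sites in parallel: Ka_macro = sum of microscopic Ka values."""
    return -math.log10(sum(10.0 ** (-p) for p in micro_pkas))


def qq_points(y_true, y_pred, y_std):
    """(theoretical, observed) quantile pairs for standardized residuals.
    Points near the line y = x suggest the predicted uncertainties are
    consistent with the actual errors."""
    z = sorted((t - p) / s for t, p, s in zip(y_true, y_pred, y_std))
    n = len(z)
    theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theory, z))


# Two equivalent sites with microscopic pKa 4.0: the macroscopic Ka doubles,
# so the macroscopic pKa is 4 - log10(2), about 3.70.
print(round(macro_pka([4.0, 4.0]), 2))

# Hypothetical predictions whose errors are exactly 0.5 * standard-normal
# quantiles, i.e. perfectly calibrated by construction.
pred = [5.0, 6.0, 7.0, 8.0, 9.0]
std = [0.5] * 5
q = [NormalDist().inv_cdf((i + 0.5) / 5) for i in range(5)]
true = [p + s * qi for p, s, qi in zip(pred, std, q)]
for theo, obs in qq_points(true, pred, std):
    print(f"{theo:+.3f} {obs:+.3f}")
```

Polyprotic molecules with several successive protonation states need a full microstate treatment; the parallel-site sum above covers only a single deprotonation step.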


2019, Vol 150 (4), pp. 041101
Author(s): Iakov Polyak, Gareth W. Richings, Scott Habershon, Peter J. Knowles
