Classical Statistics and Modern Machine Learning

Author(s): Mark Chang
2021, Vol 12 (1)
Author(s): Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

Abstract: A theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks, where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels, including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when the labels are noisy or the target is not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.
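
Generalization error here means the test error of a kernel regressor as a function of the number of training samples. The sketch below is not the authors' statistical-mechanics expression; it only measures an empirical learning curve for kernel ridge regression under assumptions of our own: an RBF kernel with length scale 0.5, a ridge of 1e-3, inputs drawn uniformly from [-1, 1], and a sin(3x) target. The helper names (`rbf_kernel`, `kernel_ridge_predict`, `learning_curve`) are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Z, length_scale=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    sq_dist = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale**2))


def kernel_ridge_predict(X_train, y_train, X_test, ridge=1e-3, length_scale=0.5):
    """Fit kernel ridge regression on (X_train, y_train) and predict at X_test."""
    K = rbf_kernel(X_train, X_train, length_scale)
    # Dual coefficients solve (K + ridge * I) alpha = y.
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    # Each prediction is a kernel-weighted sum over the training points.
    return rbf_kernel(X_test, X_train, length_scale) @ alpha


def learning_curve(target, sizes, n_test=1000, noise=0.0, seed=0):
    """Test mean squared error of kernel ridge regression for each training size."""
    rng = np.random.default_rng(seed)
    X_test = rng.uniform(-1.0, 1.0, size=(n_test, 1))
    y_test = target(X_test[:, 0])
    errors = []
    for p in sizes:
        X = rng.uniform(-1.0, 1.0, size=(p, 1))
        # Optional Gaussian label noise on the training targets only.
        y = target(X[:, 0]) + noise * rng.standard_normal(p)
        y_hat = kernel_ridge_predict(X, y, X_test)
        errors.append(float(np.mean((y_hat - y_test) ** 2)))
    return errors


errors = learning_curve(lambda x: np.sin(3 * x), sizes=[2, 10, 100])
print(errors)
```

Setting `noise > 0` adds label noise, one way to probe the noisy-target regime the abstract mentions; this sketch makes no claim to reproduce the paper's learning curves.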


Author(s): Ari S. Benjamin, Hugo L. Fernandes, Tucker Tomlinson, Pavan Ramkumar, Chris VerSteeg, ...

Author(s): Zhe Bai, Liqian Peng

Abstract: Although projection-based reduced-order models (ROMs) for parameterized nonlinear dynamical systems have demonstrated exciting results across a range of applications, their broad adoption has been limited by their intrusivity: implementing such a reduced-order model typically requires significant modifications to the underlying simulation code. To address this, we propose a method that enables traditionally intrusive ROMs to be accurately approximated in a non-intrusive manner. Specifically, the approach approximates the low-dimensional operators associated with projection-based ROMs using modern machine-learning regression techniques. The only requirement of the simulation code is the ability to export the velocity given the state and parameters; this functionality is used to train the approximated low-dimensional operators. In addition to enabling non-intrusivity, we demonstrate that the approach also leads to very low computational complexity, achieving up to a $10^3\times$ speedup in run time. We demonstrate the effectiveness of the proposed technique on two types of PDEs. The domain of applications includes both parabolic and hyperbolic PDEs, regardless of the dimension of the full-order models (FOMs).
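
The non-intrusive workflow described above (export the velocity, project it onto a reduced basis, then learn the low-dimensional operator by regression) can be sketched on a toy problem. Everything here is an assumption for illustration: the full-order model is a parameter-scaled 1D discrete Laplacian, the basis comes from proper orthogonal decomposition (POD) of snapshots, and ordinary least squares on hand-picked features stands in for the machine-learning regressors used in the paper. Because the toy model is linear in the state, the chosen features represent its reduced operator exactly; a nonlinear FOM would need a more flexible regressor.

```python
import numpy as np

N = 64                                   # full-order state dimension (toy choice)
GRID = np.arange(1, N + 1) / (N + 1)     # interior points of [0, 1]


def fom_velocity(x, mu):
    """Stand-in for the simulation code: it only exports dx/dt = f(x; mu).
    Here f is mu times a 1D discrete Laplacian with zero Dirichlet boundaries."""
    v = -2.0 * x
    v[1:] += x[:-1]
    v[:-1] += x[1:]
    return mu * v


def simulate(velocity, x0, mu, dt=0.1, steps=200):
    """Explicit Euler integration; returns the trajectory as columns."""
    states = [x0]
    for _ in range(steps):
        states.append(states[-1] + dt * velocity(states[-1], mu))
    return np.array(states).T


def features(q, mu):
    """Hypothetical regression features; exact for this linear toy problem."""
    return np.concatenate([q, mu * q])


x0 = np.sin(np.pi * GRID) + 0.5 * np.sin(3 * np.pi * GRID)
train_mus = [0.5, 1.0, 1.5]

# 1) FOM snapshots at training parameters -> POD basis V of shape (N, r).
trajectories = [simulate(fom_velocity, x0, mu) for mu in train_mus]
U, s, _ = np.linalg.svd(np.hstack(trajectories), full_matrices=False)
r = 2
V = U[:, :r]

# 2) Training pairs for the reduced operator: features(q, mu) -> V^T f(V q; mu).
X_feat, Y = [], []
for mu, traj in zip(train_mus, trajectories):
    for x in traj.T[::10]:
        q = V.T @ x
        X_feat.append(features(q, mu))
        Y.append(V.T @ fom_velocity(V @ q, mu))
W, *_ = np.linalg.lstsq(np.array(X_feat), np.array(Y), rcond=None)


def rom_velocity(q, mu):
    """Learned low-dimensional operator; never touches the N-dimensional model."""
    return features(q, mu) @ W


# 3) Predict at an unseen parameter and compare with the full model.
mu_test = 0.8
x_rom = V @ simulate(rom_velocity, V.T @ x0, mu_test)[:, -1]
x_fom = simulate(fom_velocity, x0, mu_test)[:, -1]
rel_err = np.linalg.norm(x_rom - x_fom) / np.linalg.norm(x_fom)
print(f"relative error at mu={mu_test}: {rel_err:.2e}")
```

`fom_velocity` is called on full-order states only while building training data; the time loop at `mu_test` runs entirely in r dimensions, and the full simulation there serves only as a reference for the error check.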


Author(s): Elizaveta Shmalko, Yuri Rumyantsev, Ruslan Baynazarov, Konstantin Yamshanov

To calculate the optimal control, a satisfactory mathematical model of the control object is required. Further, when the calculated controls are implemented on a real object, the same model can be used in robot navigation to predict the robot's position and correct sensor data, so it is important that the model adequately reflects the dynamics of the object. Deriving such a model is often time-consuming and sometimes even impossible with traditional methods. Given the increasing diversity and complexity of control objects, including the variety of modern robotic systems, the identification problem, which consists of building a mathematical model of the control object from its input and output data, is becoming increasingly important. The identification of nonlinear systems is of particular interest, since most real systems have nonlinear dynamics. Whereas system identification traditionally consisted of selecting optimal parameters for a chosen model structure, modern machine learning methods open up broader prospects and make it possible to automate the identification process itself. In this paper, the control object is a wheeled robot with a differential drive in the Gazebo simulation environment, which is currently the most popular software package for developing and simulating robotic systems. The mathematical model of the robot is not known in advance, and the main problem is that existing mathematical models do not correspond to the robot's actual dynamics in the simulator. The paper addresses the identification of a mathematical model of the control object using neural networks. A new mixed approach is proposed: it combines well-known simple models of the object with a neural network, trained on sample data, that identifies the object's unaccounted-for dynamic properties.
To generate training data, a software package was written that automates the collection process using two ROS nodes. The neural network was trained with the PyTorch framework, and an open-source software package was created. The identified object model is then used to calculate the optimal control. The results of the computational experiment demonstrate the adequacy and performance of the resulting model. By combining a well-known mathematical model with an additional identified neural network model, the presented approach retains the advantages of established physical modeling while improving its efficiency and accuracy through modern machine learning tools.
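
The mixed approach (a known model plus a learned correction for unmodeled dynamics) can be sketched as a one-step predictor. Everything below is hypothetical: the wheel-slip model standing in for the Gazebo robot, the feature map, and the use of linear least squares in place of the paper's PyTorch neural network, chosen so the example runs with NumPy alone. The correction is fitted to the residual between the observed next state and the known model's prediction.

```python
import numpy as np

DT = 0.05  # control period in seconds (hypothetical)


def known_step(state, u):
    """One Euler step of textbook differential-drive (unicycle) kinematics."""
    th = state[2]
    v, w = u
    return state + DT * np.array([v * np.cos(th), v * np.sin(th), w])


def simulator_step(state, u):
    """Hypothetical stand-in for the simulated robot: wheel slip lowers the
    effective linear and angular speeds, which the known model ignores."""
    th = state[2]
    v, w = u
    v_eff = 0.9 * v - 0.05 * v**2
    w_eff = 0.8 * w
    return state + DT * np.array([v_eff * np.cos(th), v_eff * np.sin(th), w_eff])


def residual_features(state, u):
    """Hypothetical features for the correction term (a neural network in the paper)."""
    th = state[2]
    v, w = u
    return np.array([v * np.cos(th), v * np.sin(th),
                     v**2 * np.cos(th), v**2 * np.sin(th), w, 1.0])


def sample(n, rng):
    """Random states (x, y, theta) and commands (v, omega)."""
    states = rng.uniform([-5.0, -5.0, -np.pi], [5.0, 5.0, np.pi], size=(n, 3))
    inputs = rng.uniform([0.0, -1.5], [1.0, 1.5], size=(n, 2))
    return states, inputs


# Training data: (state, command, next state) triples from the "simulator".
rng = np.random.default_rng(0)
S, U = sample(500, rng)
Phi = np.array([residual_features(s, u) for s, u in zip(S, U)])
targets = np.array([simulator_step(s, u) - known_step(s, u) for s, u in zip(S, U)])
W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)


def hybrid_step(state, u):
    """Known physics plus the identified correction."""
    return known_step(state, u) + residual_features(state, u) @ W


def one_step_error(model, n=200, seed=1):
    """Mean one-step prediction error against the simulator on fresh samples."""
    S_test, U_test = sample(n, np.random.default_rng(seed))
    return float(np.mean([np.linalg.norm(model(s, u) - simulator_step(s, u))
                          for s, u in zip(S_test, U_test)]))


print(one_step_error(known_step), one_step_error(hybrid_step))
```

Keeping the kinematic model as the backbone means the learned part only has to capture the discrepancy, which is the division of labor the abstract describes; a real setup would replace `simulator_step` with logged data from the robot.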


Metagenomics, 2017, Vol 1 (1)
Author(s): Hayssam Soueidan, Macha Nikolski

Abstract: Owing to the complexity and variability of metagenomic studies, modern machine learning approaches have seen increased use in answering a variety of questions across the full range of metagenomic NGS data analysis. We review here the contribution of machine learning techniques to the field of metagenomics by presenting known successful approaches in a unified framework. This review focuses on five important metagenomic problems: OTU clustering, binning, taxonomic profiling and assignment, comparative metagenomics, and gene prediction. For each of these problems, we identify the most prominent methods, summarize the machine learning approaches used, and put them into perspective relative to similar methods. We conclude by looking ahead to the challenge posed by the analysis of interactions within microbial communities and different environments, in a field one could call “integrative metagenomics”.


2021, Vol 73 (03), pp. 25-30
Author(s): Srikanta Mishra, Jared Schuetter, Akhil Datta-Gupta, Grant Bromhal

Algorithms are taking over the world, or so we are led to believe, given their growing pervasiveness in multiple fields of human endeavor such as consumer marketing, finance, design and manufacturing, health care, politics, and sports. The focus of this article is to examine where things stand in regard to the application of these techniques for managing subsurface energy resources in domains such as conventional and unconventional oil and gas, geologic carbon sequestration, and geothermal energy. It is useful to start with some definitions to establish a common vocabulary:
Data analytics (DA): sophisticated data collection and analysis to understand and model hidden patterns and relationships in complex, multivariate data sets.
Machine learning (ML): building a model between predictors and response, where an algorithm (often a black box) is used to infer the underlying input/output relationship from the data.
Artificial intelligence (AI): applying a predictive model with new data to make decisions without human intervention (and with the possibility of feedback for model updating).
Thus, DA can be thought of as a broad framework that helps determine what happened (descriptive analytics), why it happened (diagnostic analytics), what will happen (predictive analytics), or how we can make something happen (prescriptive analytics) (Sankaran et al. 2019). Although DA is built upon a foundation of classical statistics and optimization, it has increasingly come to rely upon ML, especially for predictive and prescriptive analytics (Donoho 2017). While the terms DA, ML, and AI are often used interchangeably, it is important to recognize that ML is essentially a subset of DA and a core enabling element of AI, the broader decision-making construct. In recent years, there has been a proliferation of studies using ML for predictive analytics in the context of subsurface energy resources.
Consider how the number of papers on ML in the OnePetro database has been increasing exponentially since 1990 (Fig. 1). These trends are also reflected in the number of technical sessions devoted to ML/AI topics in conferences organized by SPE, AAPG, and SEG, among others, as well as in books targeted to practitioners in these professions (Holdaway 2014; Mishra and Datta-Gupta 2017; Mohaghegh 2017; Misra et al. 2019). Given these high levels of activity, our goal is to provide some observations and recommendations on the practice of data-driven model building using ML techniques. The observations are motivated by our belief that some geoscientists and petroleum engineers may be jumping the gun by applying these techniques in an ad hoc manner without any foundational understanding, whereas others may be holding off on using these methods because they lack formal ML training and could benefit from some concrete advice on the subject. The recommendations are informed by our experience in applying both conventional statistical modeling and data analytics approaches to practical problems.


2021, Vol 82 (8), pp. 1293-1320
Author(s): P. A. Mukhachev, T. R. Sadretdinov, D. A. Pritykin, A. B. Ivanov, S. V. Solov’ev
