Using Machine Learning to Predict Protein Structure from Spectral Data

2010 ◽  
Author(s):  
Myra Kinalwa ◽  
Andrew J. Doig ◽  
Ewan W. Blanch ◽  
P. M. Champion ◽  
L. D. Ziegler
2019 ◽  
Vol 14 (3) ◽  
pp. 178-189 ◽  
Author(s):  
Xiaoyang Jing ◽  
Qimin Dong ◽  
Ruqian Lu ◽  
Qiwen Dong

Background: Protein inter-residue contact prediction plays an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show typical applications of protein inter-residue contacts. Then, we present a comprehensive review of the three main categories of protein inter-residue contact prediction: correlated mutations methods, machine-learning methods and fusion methods. In addition, we analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss the performance of these methods in detail. Conclusion: Correlated mutations methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods could take advantage of both the machine-learning and correlated mutations methods. Employing more effective fusion strategies could help to further improve the performance of fusion methods.
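The abstract above refers to a "common definition" of inter-residue contacts without stating it. As a rough illustration only, the sketch below assumes a widely used convention that is not given in the abstract: two residues are in contact when their representative atoms (often Cβ) lie closer than 8 Å, and contacts are grouped by sequence separation. The function names `contact_map` and `contacts_by_range`, the threshold, and the separation cut-offs are all assumptions made for this example.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Boolean contact map from one representative atom per residue.

    coords: array of shape (n_residues, 3), in angstroms.
    threshold: distance cut-off in angstroms (assumed value, not from the abstract).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist < threshold

def contacts_by_range(cmap, min_sep, max_sep=None):
    """List contacting pairs (i, j), i < j, with min_sep <= j - i <= max_sep."""
    n = cmap.shape[0]
    # Upper-triangle index pairs whose sequence separation is at least min_sep.
    i, j = np.triu_indices(n, k=min_sep)
    keep = cmap[i, j]
    if max_sep is not None:
        keep &= (j - i) <= max_sep
    return list(zip(i[keep].tolist(), j[keep].tolist()))
```

Under these assumptions, `contacts_by_range(cmap, 24)` would list long-range contacts and `contacts_by_range(cmap, 6, 11)` short-range ones; the exact cut-offs vary between studies.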


2021 ◽  
Author(s):  
Oliver Dixon ◽  
William McCarthy ◽  
Nasser Madani ◽  
Michael Petronis ◽  
Steve McRobbie ◽  
...  

Copper is one of the most important critical metal resources needed to achieve carbon neutrality, with a projected increase in demand of >300% over the next half century from electronics and renewables. Porphyry deposits account for most of the global copper production, but the discovery of new reserves is ever more challenging. Machine learning presents an opportunity to cross-reference new and traditionally under-utilised data sets with a view to developing quantitative predictive models of hydrothermal alteration zones to guide new, ambitious exploration programs.

The aim of this study is to demonstrate a new alteration classification scheme driven by quantitative magnetic and spectral data to feed a machine learning algorithm. The benefits of an alteration model based on quantitative data, rather than on subjective observations by geologists, are that there is no bias in the data collected, the resulting model is quantifiable and therefore easy to model, and the process can be fully automated. Ultimately, this approach aids more detailed exploration and mine modelling, in turn reducing the carbon footprint of the extraction process and identifying new deposits more effectively.

Presented here are magnetic susceptibility and shortwave infrared (SWIR) data collected from the giant Aktogay Cu-Mo porphyry deposit in eastern Kazakhstan, owned by KazMinerals plc, which has a throughput of 30 Mtpa of ore. These data are cross-referenced using a newly developed machine learning algorithm. Generated autonomously, our results reveal twelve statistically and geologically significant clusters that define a new alteration classification for porphyry-style mineralisation. Results are entirely non-subjective, reproducible, quantitative and modellable.

Importantly, magnetic susceptibility measurements improve the algorithm’s ability to identify clusters by 29-36%; enhancing the sophistication of the included magnetic data promises to yield substantially better statistical results. Magnetic remanence data are therefore being compiled on representative samples from each of the twelve identified clusters, including hysteresis, isothermal remanent magnetisation (IRM) acquisition, FORC measurements, natural remanent magnetisation (NRM) and anhysteretic remanent magnetisation (ARM). Through collaboration with industry partners, we aim to develop an automated means of collecting these magnetic remanence data to accompany the machine learning algorithm.


Author(s):  
Arun G. Ingale

Predicting the structure of a protein from its primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences as well as the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. It covers the applications of modeled protein structures and unravels the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. It presents an all-encompassing examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving unique insight into the future applications of modeled protein structures. In this chapter, current protein structure prediction methods are reviewed for a milieu on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional imminent. The basic ideas and advances of these directions are discussed in detail.


2019 ◽  
Vol 11 (22) ◽  
pp. 2605 ◽  
Author(s):  
Wang ◽  
Chen ◽  
Wang ◽  
Li

Salt-affected soil is a prominent ecological and environmental problem in dry farming areas throughout the world. China has nearly 9.9 million km² of salt-affected land. The identification, monitoring, and utilization of salinized soil have become important research topics for promoting sustainable progress. In this paper, field-measured spectral data and soil salinity parameter data are analyzed and transformed, and five machine learning models are compared: random forest regression (RFR), support vector regression (SVR), gradient-boosted regression tree (GBRT), multilayer perceptron regression (MLPR), and least angle regression (Lars). Each model's performance in estimating soil salinity was evaluated on four aspects: handling of collinearity, handling of data noise, stability, and accuracy. The results demonstrate that among the five models, RFR has the best performance in dealing with collinearity, RFR and MLPR have the best performance in dealing with data noise, and the SVR model is the most stable. The Lars model has the highest accuracy, with a determination coefficient (R²) of 0.87, ratio of performance to deviation (RPD) of 2.67, root mean square error (RMSE) of 0.18, and mean absolute percentage error (MAPE) of 0.11. A comprehensive comparison and analysis of the five models then finds that the overall performance of the RFR model is the best; hence, this method is the most suitable for estimating soil salinity using hyperspectral data. This study can provide a reference for the selection of regression methods in subsequent studies on estimating soil salinity using hyperspectral data.
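The abstract reports R², RPD, RMSE and MAPE without defining them. The sketch below shows one common way to compute these four measures from observed and predicted values; it is an illustration, not the authors' code. The function name `regression_metrics` is hypothetical, and the choices to take RPD as the sample standard deviation of the observations divided by RMSE and to express MAPE as a fraction rather than a percentage are assumptions made here.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, RPD and MAPE for paired observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred

    # Root mean square error.
    rmse = np.sqrt(np.mean(resid ** 2))

    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Ratio of performance to deviation (assumed: sample SD of observations / RMSE).
    rpd = np.std(y_true, ddof=1) / rmse

    # Mean absolute percentage error as a fraction; undefined if any y_true is 0.
    mape = np.mean(np.abs(resid / y_true))

    return {"r2": r2, "rmse": rmse, "rpd": rpd, "mape": mape}
```

This sketch does not guard against perfect predictions (RMSE of zero) or zero-valued observations, both of which would need handling in real use.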


Author(s):  
M. Cianciosa ◽  
K.J.H. Law ◽  
E.H. Martin ◽  
D.L. Green
