Protein structure featurization via standard image classification neural networks

2019 ◽  
Author(s):  
Tobias Sikosek

ABSTRACT: Many applications in the biomedical domain involve the detailed molecular and functional characterization of macromolecules such as proteins. Where possible, this involves knowledge of the detailed 3D coordinates of every atom within a protein. At the same time, machine learning has become the basis of much innovation within this domain in recent years. There are, however, a few challenges in applying machine learning to 3D protein structures, such as variability in size and the high dimensionality of the data. It would therefore be beneficial to map every protein structure to a smaller, fixed-dimensional representation that is learned directly from the structure without manual curation. In addition, it would be valuable for biomedical researchers if such approaches required little method development and instead drew on cutting-edge research such as image classification via deep neural networks. Here, an approach is outlined that first reformats protein structures as 2D color images and then applies off-the-shelf neural networks for image classification. It is shown that such neural networks can be trained to effectively encode the CATH protein classification database and that feature vectors extracted from such networks, once trained, can be transferred to a completely new task that is likely to benefit from molecular protein information, namely that of small molecule binding.
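The abstract does not say how a structure is turned into a 2D color image, so the following is only a minimal sketch of one plausible encoding, not the author's method. It assumes one 3D coordinate per residue and fills three channels with the pairwise distance, the sequence separation, and a contact flag; the function name `structure_to_image`, the 64-pixel size, the 40 Å distance cap, and the 8 Å contact cutoff are all hypothetical choices made for illustration.

```python
import numpy as np


def structure_to_image(coords, size=64, max_dist=40.0, contact_cutoff=8.0):
    """Map an (N, 3) array of per-residue coordinates to a (size, size, 3) uint8 image.

    Hypothetical encoding (an assumption, not taken from the paper):
      channel 0: pairwise distance, clipped at max_dist and scaled to [0, 1]
      channel 1: sequence separation |i - j|, scaled to [0, 1]
      channel 2: contact flag (1 where distance < contact_cutoff)
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)

    # Pairwise Euclidean distances between all residues, shape (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Sequence separation between residue indices, shape (N, N).
    idx = np.arange(n)
    sep = np.abs(idx[:, None] - idx[None, :]).astype(float)

    channels = [
        np.clip(dist / max_dist, 0.0, 1.0),
        sep / max(n - 1, 1),
        (dist < contact_cutoff).astype(float),
    ]

    # Nearest-neighbour resampling so every protein yields the same image size.
    pick = np.linspace(0, n - 1, size).round().astype(int)
    image = np.stack([c[np.ix_(pick, pick)] for c in channels], axis=-1)
    return (image * 255).astype(np.uint8)
```

Because every protein maps to the same array shape regardless of its length, the result can be passed to a standard image classifier without changing the network's input layer.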

Author(s):  
B. Biletskyy

Introduction. Determining the spatial structure of proteins is one of the most important unsolved problems facing mankind. Life on Earth is described as protein-based, because protein molecules drive the life processes of living organisms. Proteins make up about 80% of the dry mass of the cell and coordinate metabolic processes. The function of a protein is defined by its spatial structure. The results of recent competitions among methods for determining protein structures have shown significant progress in this important area. One of the research groups presented the AlphaFold 2 method, whose accuracy reached that of experimental methods. Purpose of the article. The aim of the work is to consider and analyze the basic principles of the AlphaFold software package for determining the spatial structure of proteins. Results. We consider the main stages in the process of determining the structure of a protein using the AlphaFold software package. The stages and corresponding methods include: a search for homologous proteins based on multiple alignment methods, construction of a protein-specific differentiable potential using artificial neural networks, and protein structure energy optimization using gradient descent and limited sampling. We discuss how a combination of various bioinformatics techniques, powered by data from open data sources, can lead to significant improvements in the accuracy of protein structure prediction. Special attention is paid to the use of artificial neural networks for building the smooth protein-specific potential and to the subsequent energy minimization based on the constructed potential. Conclusions. The combination of a number of methods and the use of information from protein and genetic data banks allows us to make significant progress in solving the extremely important task of determining the structure of a protein. Keywords: protein spatial structure, Machine Learning, AlphaFold.
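The energy-minimization stage can be illustrated with a toy example. The sketch below is not AlphaFold's potential, which is built by neural networks; it substitutes a simple harmonic potential on pairwise distances, E = Σ_{i<j} (d_ij − t_ij)², and minimizes it by plain gradient descent. The function names, the learning rate, and the step count are assumptions made for illustration.

```python
import numpy as np


def potential_and_grad(x, target):
    """Toy harmonic potential on pairwise distances and its gradient.

    x: (N, 3) coordinates; target: symmetric (N, N) matrix of desired distances.
    Stand-in for a learned protein-specific potential (an assumption).
    """
    diff = x[:, None, :] - x[None, :, :]      # (N, N, 3), x_i - x_j
    dist = np.linalg.norm(diff, axis=-1)      # (N, N)
    np.fill_diagonal(dist, 1.0)               # avoid dividing by zero on the diagonal
    err = dist - target
    np.fill_diagonal(err, 0.0)                # self-pairs contribute nothing
    energy = 0.5 * (err ** 2).sum()           # each unordered pair is counted once
    grad = 2.0 * ((err / dist)[:, :, None] * diff).sum(axis=1)
    return energy, grad


def minimize(x0, target, lr=0.01, steps=500):
    """Plain gradient descent; returns final coordinates and the energy trace."""
    x = np.array(x0, dtype=float)             # copy so the caller's array is untouched
    energies = []
    for _ in range(steps):
        energy, grad = potential_and_grad(x, target)
        energies.append(energy)
        x -= lr * grad
    return x, energies
```

The potential being smooth is what makes this work: the gradient exists everywhere the residues do not coincide, so a small step against it lowers the energy.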


2018 ◽  
Vol 7 (2.7) ◽  
pp. 614 ◽  
Author(s):  
M Manoj Krishna ◽  
M Neelima ◽  
M Harshali ◽  
M Venu Gopala Rao

Image classification is a classical problem in image processing, computer vision, and machine learning. In this paper we study image classification using deep learning. We use the AlexNet convolutional neural network architecture for this purpose. Four test images are selected from the ImageNet database for classification. We cropped the images to various regions and conducted experiments. The results show the effectiveness of deep-learning-based image classification using AlexNet.


2019 ◽  
Vol 116 (18) ◽  
pp. 8960-8965 ◽  
Author(s):  
Michael Hicks ◽  
Istvan Bartha ◽  
Julia di Iulio ◽  
J. Craig Venter ◽  
Amalio Telenti

Sequence variation data of the human proteome can be used to analyze 3D protein structures to derive functional insights. We used genetic variant data from nearly 140,000 individuals to analyze 3D positional conservation in 4,715 proteins and 3,951 homology models using 860,292 missense and 465,886 synonymous variants. Sixty percent of protein structures harbor at least one intolerant 3D site as defined by significant depletion of observed over expected missense variation. Structural intolerance data correlated with deep mutational scanning functional readouts for PPARG, MAPK1/ERK2, UBE2I, SUMO1, PTEN, CALM1, CALM2, and TPK1 and with shallow mutagenesis data for 1,026 proteins. The 3D structural intolerance analysis revealed different features for ligand binding pockets and orthosteric and allosteric sites. Large-scale data on human genetic variation support a definition of functional 3D sites proteome-wide.


2021 ◽  
Vol 11 (Suppl_1) ◽  
pp. S13-S13
Author(s):  
Valery Novoseletsky ◽  
Mikhail Lozhnikov ◽  
Grigoriy Armeev ◽  
Aleksandr Kudriavtsev ◽  
Alexey Shaytan ◽  
...  

Background: Protein structure determination using an X-ray free-electron laser (XFEL) involves analyzing and merging a large number of snapshot diffraction patterns. Convolutional neural networks (CNNs) are widely used to solve numerous computer vision problems, e.g. image classification, and can be used for diffraction pattern analysis. However, the task of protein structure determination using CNNs alone is not yet solved. Methods: We simulated the diffraction patterns using the Condor software library and obtained more than 1000 diffraction patterns for each structure with simulation parameters resembling real ones. To classify diffraction patterns, we tried two approaches that are widely known in the area of image classification: a classic VGG network and residual networks. Results: 1. Recognition of a protein class (GPCRs vs globins). Globins and GPCR-like proteins are typical α-helical proteins. Each of these protein families has a large number of representatives (including those with known structure), but we used only 8 structures from each family. 12,000 diffraction patterns were used for training and 4,000 patterns for testing. Results indicate that all considered networks are able to recognize the protein family type with high accuracy. 2. Recognition of the number of protein molecules in the liposome. We considered the use of liposomes as carriers of membrane or globular proteins for sample delivery in XFEL experiments in order to improve the X-ray beam hit rate. Three sets of diffractograms for liposomes of various radii were calculated, including diffractograms for empty liposomes, liposomes loaded with 5 bacteriorhodopsin molecules, and liposomes loaded with 10 bacteriorhodopsin molecules. The training set consisted of 23,625 diffraction patterns and the test set of 7,875 patterns. We found that all networks used in our study were able to identify the number of protein molecules in liposomes independent of the liposome radius. Our findings make this approach rather promising for the use of liposomes as protein carriers in XFEL experiments. Conclusion: Thus, the performed numerical experiments show that the use of neural network algorithms for the recognition of diffraction images from single macromolecular particles makes it possible to determine changes in the structure at the angstrom scale.


2019 ◽  
Author(s):  
Larry Bliss ◽  
Ben Pascoe ◽  
Samuel K Sheppard

Motivation: Protein structure prediction, which combines theoretical chemistry and bioinformatics, is an increasingly important technique in biotechnology and biomedical research, for example in the design of novel enzymes and drugs. Here, we present a new ensemble bi-layered machine learning architecture that builds directly on ten existing pipelines, providing rapid, high-accuracy, 3-state secondary structure prediction of proteins. Results: After training on 1348 solved protein structures, we evaluated the model with four independent datasets: JPRED4, compiled by the authors of the successful predictor of the same name, and CASP11, CASP12 and CASP13, assembled by the Critical Assessment of protein Structure Prediction consortium, who run biennial experiments focused on objective testing of predictors. These rigorous, pre-established protocols included 7-fold cross-validation and blind testing. This led to a mean Hermes accuracy of 95.5%, significantly (p<0.05) better than the ten previously published models analysed in this paper. Furthermore, Hermes yielded a reduction in standard deviation, fewer lower-boundary outliers, and reduced dependency on solved structures of homologous proteins, as measured by NEFF score. This architecture provides advantages over other pipelines while remaining accessible to users at any level of bioinformatics experience. Availability and Implementation: The source code for Hermes is freely available at: https://github.com/HermesPrediction/Hermes. This page also includes the cross-validation with corresponding models, and all training/testing data presented in this study with predictions and accuracy.
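To make the idea of combining several 3-state predictors concrete, here is a minimal sketch that uses a per-residue majority vote and scores the result by the fraction of residues assigned the correct state. The majority vote is a deliberately simple stand-in and is not the bi-layered learned architecture described in the abstract; the function names and the H/E/C string encoding are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine equal-length H/E/C prediction strings by per-residue majority vote.

    Ties go to the state that appears first among the predictors at that position.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    consensus = []
    for column in zip(*predictions):
        # most_common keeps first-encountered order among equal counts.
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


def three_state_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

For example, `majority_vote(["HHHEEC", "HHCEEC", "HHHECC"])` yields a consensus string that can then be scored against an experimentally assigned state string with `three_state_accuracy`.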


Author(s):  
Hmidi Alaeddine ◽  
Malek Jihene

Reducing the size of convolution filters has been shown to be effective in image classification models. Smaller filters reduce the computation and the number of parameters used in the convolution layer while increasing the efficiency of the representation. The authors present a deep architecture for classification with improved performance. The main objective of this architecture is to improve network performance through a new design based on CONVblock. The proposal is evaluated on two classification datasets, CIFAR-10 and MNIST. The experimental results demonstrate the effectiveness of the proposed method. This architecture achieves an error of 1.4% on CIFAR-10 and 0.055% on MNIST.
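The parameter saving from smaller filters can be checked with simple arithmetic. The sketch below counts the weights of a standard 2D convolution layer (kernel × kernel × input channels × output channels, plus one bias per output channel) and the receptive field of a stack of stride-1 layers. It illustrates the general principle only and does not reproduce the CONVblock design, whose details are not given in the abstract; the function names are hypothetical.

```python
def conv2d_params(kernel, c_in, c_out, bias=True):
    """Number of trainable parameters in a standard 2D convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def stacked_receptive_field(kernels):
    """Receptive field of consecutive stride-1 convolutions with the given kernel sizes."""
    return 1 + sum(k - 1 for k in kernels)


if __name__ == "__main__":
    channels = 64
    one_5x5 = conv2d_params(5, channels, channels, bias=False)
    two_3x3 = 2 * conv2d_params(3, channels, channels, bias=False)
    # Both options cover a 5x5 input window.
    print(stacked_receptive_field([5]), stacked_receptive_field([3, 3]))
    print(one_5x5, two_3x3)
```

With 64 input and output channels, two stacked 3×3 layers cover the same 5×5 window as a single 5×5 layer while using 18C² rather than 25C² weights, and they add an extra nonlinearity between the layers.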


2002 ◽  
Vol 12 (06) ◽  
pp. 447-465 ◽  
Author(s):  
STEPHAN K. CHALUP

Incremental learning concepts are reviewed in machine learning and neurobiology. They are identified in evolution, neurodevelopment and learning. A timeline of qualitative axon, neuron and synapse development summarizes the review on neurodevelopment. A discussion of experimental results on data incremental learning with recurrent artificial neural networks reveals that incremental learning often seems to be more efficient or powerful than standard learning but can produce unexpected side effects. A characterization of incremental learning is proposed which takes the elaborated biological and machine learning concepts into account.


2021 ◽  
Author(s):  
Anastasiya V Kulikova ◽  
Daniel J Diaz ◽  
James M Loy ◽  
Andrew D Ellington ◽  
Claus O Wilke

The fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest. We find that the network can predict the wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely to be correct. Predictions of the consensus are less accurate and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mispredictions of the wild type may identify sites that are primed for mutation and are likely targets for protein engineering.


Author(s):  
Anuraag Velamati et al.

The world is advancing quickly and continuously towards technologies that will make life easier for human beings [22]. Humans are looking for more interactive and advanced ways to improve their learning. One such dream is making a machine think like a human, which led to innovations like AI and deep learning [25]. The world is moving at a fast pace in the domains of AI, deep learning, robotics and machine learning. Using this knowledge and technology, we could develop anything right now [36]. As a sub-domain, the introduction of convolutional neural networks made deep learning extremely strong in image classification and detection [1]. The research that we have conducted is one of a kind. Our research used convolutional neural networks, TensorFlow and Keras.


Author(s):  
Thamires Quadros Froes ◽  
Maria Cristina Nonato ◽  
Marcelo Santos Castilho ◽  
Luana Carlos Campisano Zapata ◽  
Juliana Sayuri Akamine

Background: Dihydroorotate dehydrogenase (DHODH) has long been recognized as an important drug target for proliferative and parasitic diseases, and its inhibitors include compounds that exhibit trypanocidal action and broad-spectrum antiviral activity. Despite numerous and successful efforts in structural and functional characterization of DHODHs, as well as in the development of inhibitors, DHODH hot spots remain largely unmapped and underexplored. Objective: This review describes the tools that are currently available for the identification and characterization of hot spots in protein structures and how freely available web servers can be exploited to predict DHODH hot spots. Moreover, it provides for the first time a review of the antiviral properties of DHODH inhibitors. Method: X-ray structures from human (HsDHODH) and Trypanosoma cruzi DHODH (TcDHODH) had their hot spots predicted by both the FTMap and Fragment Hotspot Maps web servers. Result: FTMap showed that hot spot occupancy in HsDHODH is correlated with the ligand efficiency (LE) of its known inhibitors, and Fragment Hotspot Maps pointed out the contribution of selected moieties to the overall LE. The conformational flexibility of the active site loop in TcDHODH was found to have a major impact on the druggability of the orotate binding site. In addition, both the FTMap and Fragment Hotspot Maps servers predict a novel pocket at the TcDHODH dimer interface (S6 site). Conclusion: This review reports how hot spots can be exploited during hit-to-lead steps and docking studies, or even to improve the inhibitor binding profile, and, using DHODH as a model, points to new drug development opportunities.

