Using Deep Learning to Annotate the Protein Universe

Mapping Intimacies ◽

10.1101/626507 ◽

2019 ◽

Cited By ~ 9

Author(s):

Maxwell L. Bileschi ◽

David Belanger ◽

Drew Bryant ◽

Theo Sanderson ◽

Brandon Carter ◽

...

Keyword(s):

Deep Learning ◽

Amino Acid ◽

Protein Function ◽

Protein Function Prediction ◽

Structural Disorder ◽

Amino Acid Sequences ◽

Computationally Efficient ◽

Learning Sequence ◽

Convolutional Networks ◽

The Relationship

Abstract: Understanding the relationship between amino acid sequence and protein function is a long-standing problem in molecular biology with far-reaching scientific implications. Despite six decades of progress, state-of-the-art techniques cannot annotate 1/3 of microbial protein sequences, hampering our ability to exploit sequences collected from diverse organisms. In this paper, we explore an alternative methodology based on deep learning that learns the relationship between unaligned amino acid sequences and their functional annotations across all 17929 families of the Pfam database. Using the Pfam seed sequences we establish rigorous benchmark assessments that use both random and clustered data splits to control for potentially confounding sequence similarities between train and test sequences. Using Pfam full, we report convolutional networks that are significantly more accurate and computationally efficient than BLASTp, while learning sequence features such as structural disorder and transmembrane helices. Our model co-locates sequences from unseen families in embedding space, allowing sequences from novel families to be accurately annotated. These results suggest deep learning models will be a core component of future protein function prediction tools.
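The clustered data split mentioned in the abstract above can be illustrated with a minimal sketch. It assumes cluster assignments have already been computed by some sequence-similarity tool; the function name `clustered_split`, the sequence IDs, and the cluster IDs are all hypothetical and are not taken from the paper. The point is only that whole clusters go to one side of the split, so near-identical sequences never appear in both train and test.

```python
import random
from collections import defaultdict


def clustered_split(cluster_of, test_fraction=0.2, seed=0):
    """Split sequence IDs so that no cluster spans both train and test.

    cluster_of: dict mapping sequence ID -> cluster ID (assumed to be
    precomputed by a sequence-similarity clustering step).
    """
    # Group sequence IDs by their cluster.
    clusters = defaultdict(list)
    for seq_id, cluster_id in cluster_of.items():
        clusters[cluster_id].append(seq_id)

    # Shuffle the clusters (not the sequences) reproducibly.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    target = test_fraction * len(cluster_of)
    train, test = [], []
    for cluster_id in cluster_ids:
        # Assign whole clusters to test until it reaches the target size.
        bucket = test if len(test) < target else train
        bucket.extend(clusters[cluster_id])
    return train, test


# Toy, made-up cluster assignments for illustration only.
toy_clusters = {
    "seqA": 0, "seqB": 0, "seqC": 1, "seqD": 1,
    "seqE": 2, "seqF": 3, "seqG": 3, "seqH": 4,
}
train_ids, test_ids = clustered_split(toy_clusters, test_fraction=0.25)
```

A random split would instead shuffle individual sequences, which can place two members of the same cluster on opposite sides and inflate test accuracy.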


Sars-Cov-2 Spike protein function prediction using a convolutional neural network ensemble

Design Engineering ◽

10.17762/de.vi.4293 ◽

2021 ◽

pp. 7831-7845

Author(s):

Raghad Monther Eid, Eman K. Elsayed, Fatma T. Ghanam

Keyword(s):

Neural Network ◽

Amino Acid ◽

Protein Function ◽

Protein Function Prediction ◽

Small Error ◽

Amino Acid Sequences ◽

Spike Protein ◽

Neural Network Ensemble ◽

Classification Problems ◽

Past Experiences

Introduction: SARS-CoV-2 has become a worldwide pandemic that affects all aspects of life; therefore, numerous organizations and open research foundations focus their efforts on finding viable therapeutics. Given past experience with SARS, the main focus has been the Spike protein, considered the ideal target for COVID-19 immunotherapies. Most of the vaccines being developed target the spike protein because it covers the virus and helps it invade human cells. Methods: The application of deep neural networks is a quickly expanding field, now reaching many areas including proteomics. Results: Specifically, convolutional neural networks have been used to identify the functional role of amino acid sequences because of their ability to give nearly accurate results for multi-label classification problems. Here we present a modified convolutional deep learning model that can identify whether a given amino acid sequence is a spike protein, based on the length of the sequence and the function of the protein, with a short execution time and a relatively small error rate. Conclusion: CNNs are an efficient tool for supervised multi-label classification problems.


Predicting functions of maize proteins using graph convolutional network

BMC Bioinformatics ◽

10.1186/s12859-020-03745-6 ◽

2020 ◽

Vol 21 (S16) ◽

Author(s):

Guangjie Zhou ◽

Jun Wang ◽

Xiangliang Zhang ◽

Maozu Guo ◽

Guoxian Yu

Keyword(s):

Deep Learning ◽

Amino Acid ◽

Protein Function ◽

Structural Information ◽

Semantic Representation ◽

Model Organism ◽

Amino Acid Sequences ◽

Feature Representation ◽

Convolutional Network ◽

Go Terms

Abstract. Background: Maize (Zea mays ssp. mays L.) is the most widely grown and highest-yielding crop in the world, as well as an important model organism for fundamental research on gene function. The functions of Maize proteins are annotated using the Gene Ontology (GO), which has more than 40000 terms and organizes GO terms in a directed acyclic graph (DAG). It is a huge challenge to accurately assign relevant GO terms to a Maize protein from such a large number of candidate GO terms. Some deep learning models have been proposed to predict protein function, but the effectiveness of these approaches is unsatisfactory. One major reason is that they inadequately utilize the GO hierarchy. Results: To use the knowledge encoded in the GO hierarchy, we propose a deep Graph Convolutional Network (GCN) based model (DeepGOA) to predict GO annotations of proteins. DeepGOA first quantifies the correlations (or edges) between GO terms and updates the edge weights of the DAG by leveraging GO annotations and hierarchy, then learns the semantic representation and latent inter-relations of GO terms by applying a GCN to the updated DAG. Meanwhile, a Convolutional Neural Network (CNN) is used to learn the feature representation of amino acid sequences with respect to the semantic representations. After that, DeepGOA computes the dot product of the two representations, which allows the whole network to be trained end-to-end. Extensive experiments show that DeepGOA can effectively integrate GO structural information and amino acid information, and then annotate proteins accurately. Conclusions: Experiments on the Maize PH207 inbred line and a Human protein sequence dataset show that DeepGOA outperforms the state-of-the-art deep learning based methods. The ablation study shows that the GCN can exploit the knowledge of GO and boost performance. Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=DeepGOA.
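The two-branch design described in the abstract above (a GCN over the GO graph producing term representations, combined by dot product with a sequence representation) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the DeepGOA implementation: the three-term graph, its edge weights, the weight matrix, and the protein vector are made up; the row normalisation is a simplified stand-in for the usual GCN normalisation; and the sigmoid applied to the dot product is an assumption for turning scores into probabilities.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_rows(adj):
    """Add self-loops, then scale each row to sum to 1 (simplified GCN normalisation)."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(normalised_adjacency @ features @ weights)."""
    mixed = matmul(matmul(normalize_rows(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in mixed]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical GO graph with 3 terms; term 0 is linked to terms 1 and 2.
# Edge weights are invented for illustration.
adjacency = [[0.0, 0.5, 0.5],
             [0.5, 0.0, 0.0],
             [0.5, 0.0, 0.0]]
term_features = [[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]           # one-hot input per GO term
layer_weights = [[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]]                # made-up 3x2 weight matrix

term_embeddings = gcn_layer(adjacency, term_features, layer_weights)

# Stand-in for the CNN's sequence representation (same dimension as the term embeddings).
protein_embedding = [0.2, -0.4]

# One score per GO term: dot product of the two representations, squashed to (0, 1).
scores = [sigmoid(sum(p * t for p, t in zip(protein_embedding, term)))
          for term in term_embeddings]
```

Because each score is a differentiable function of both branches, gradients from a single loss can flow into the sequence encoder and the graph encoder together, which is what end-to-end training refers to here.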


Convolutional neural networks with image representation of amino acid sequences for protein function prediction

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2021.107494 ◽

2021 ◽

Vol 92 ◽

pp. 107494

Author(s):

Samia Tasnim Sara ◽

Md Mehedi Hasan ◽

Ahsan Ahmad ◽

Swakkhar Shatabda

Keyword(s):

Neural Networks ◽

Amino Acid ◽

Convolutional Neural Networks ◽

Protein Function ◽

Protein Function Prediction ◽

Image Representation ◽

Function Prediction ◽

Amino Acid Sequences


ProteInfer: deep networks for protein functional inference

10.1101/2021.09.20.461077 ◽

2021 ◽

Author(s):

Theo Sanderson ◽

Maxwell L Bileschi ◽

David Belanger ◽

Lucy Colwell

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Protein Function ◽

Protein Function Prediction ◽

Query Sequence ◽

Functional Space ◽

Amino Acid Sequences ◽

Deep Convolutional Neural Networks ◽

Software Interfaces ◽

Downstream Analysis

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we instead employ deep convolutional neural networks to predict a variety of protein functions (EC numbers and GO terms) directly from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user's personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, visit https://google-research.github.io/proteinfer/
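How a convolutional model can map unaligned sequences of different lengths into one fixed-size space may be easier to see with a small sketch. The code below is a hand-rolled illustration, not ProteInfer's architecture: the filters are hypothetical motif detectors built from one-hot patterns rather than learned weights, and a single convolution with global max pooling stands in for a deep network.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


def conv1d_max_pool(encoded, filters):
    """Slide each filter along the sequence, apply ReLU, keep the maximum.

    filters: list of filters, each a list of `width` rows with 20 weights per row.
    Returns one value per filter, so the output length does not depend on
    the sequence length.
    """
    embedding = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU outputs are non-negative, so 0.0 is a safe floor
        for start in range(len(encoded) - width + 1):
            window = encoded[start:start + width]
            activation = sum(w * x
                             for f_row, e_row in zip(filt, window)
                             for w, x in zip(f_row, e_row))
            best = max(best, max(0.0, activation))
        embedding.append(best)
    return embedding


# Hypothetical filters: each responds most strongly to one two-residue motif.
toy_filters = [one_hot("KR"), one_hot("WW")]

short_embedding = conv1d_max_pool(one_hot("MKRLV"), toy_filters)
long_embedding = conv1d_max_pool(one_hot("ACDKRKRW"), toy_filters)
```

Both inputs yield a vector with one entry per filter even though the sequences differ in length, which is the property that lets full-length sequences be compared in a shared space without alignment.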


Deep_CNN_LSTM_GO: Protein function prediction from amino-acid sequences

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2021.107584 ◽

2021 ◽

Vol 95 ◽

pp. 107584

Author(s):

Mohamed E.M. Elhaj-Abdou ◽

Hassan El-Dib ◽

Amr El-Helw ◽

Mohamed El-Habrouk

Keyword(s):

Amino Acid ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Amino Acid Sequences


Protein function prediction with gene ontology: from traditional to deep learning models

PeerJ ◽

10.7717/peerj.12019 ◽

2021 ◽

Vol 9 ◽

pp. e12019

Author(s):

Thi Thuy Duong Vu ◽

Jaehee Jung

Keyword(s):

Gene Ontology ◽

Deep Learning ◽

Protein Function ◽

High Throughput Sequencing ◽

Protein Function Prediction ◽

Rapid Development ◽

Function Prediction ◽

Amino Acid Sequences ◽

Go Annotation ◽

Go Terms

Protein function prediction is a crucial part of genome annotation. Prediction methods have recently witnessed rapid development, owing to the emergence of high-throughput sequencing technologies. Among the available databases for identifying protein function terms, Gene Ontology (GO) is an important resource that describes the functional properties of proteins. Researchers are employing various approaches to efficiently predict GO terms. Meanwhile, deep learning, a fast-evolving data-driven discipline, exhibits impressive potential for assigning GO terms to amino acid sequences. Herein, we review the currently available computational GO annotation methods for proteins, ranging from conventional to deep learning approaches. Further, we selected some suitable predictors from among the reviewed tools and conducted a mini comparison of their performance using a worldwide challenge dataset. Finally, we discuss the remaining major challenges in the field and emphasize future directions for protein function prediction with GO.


Protein Function Prediction: From Traditional Classifier to Deep Learning

PROTEOMICS ◽

10.1002/pmic.201900119 ◽

2019 ◽

pp. 1900119 ◽

Cited By ~ 27

Author(s):

Zhibin Lv ◽

Chunyan Ao ◽

Quan Zou

Keyword(s):

Deep Learning ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction


A Novel Method for Functional Annotation Prediction Based on Combination of Classification Methods

The Scientific World JOURNAL ◽

10.1155/2014/542824 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9

Author(s):

Jaehee Jung ◽

Heung Ki Lee ◽

Gangman Yi

Keyword(s):

Protein Function ◽

Protein Function Prediction ◽

Controlled Vocabulary ◽

Functional Annotations ◽

Functional Homology ◽

Large Sets ◽

Unknown Protein ◽

Protein Functions ◽

Novel Method ◽

The Relationship

Automated protein function prediction is the assignment of functions to proteins of unknown function using computational methods. This technique is useful for automatically assigning gene functional annotations to undefined sequences in next generation genome analysis (NGS). NGS is a popular research method since high-throughput technologies such as DNA sequencing and microarrays have created large sets of genes. These large sequence sets have greatly increased the need for analysis. Previous research has been based on sequence similarity, as this is strongly related to functional homology. However, this study aimed to designate protein functions by automatically predicting the function of the genome using InterPro (IPR), which can represent the properties of protein families and groups of protein functions. Moreover, we used gene ontology (GO), which is the controlled vocabulary used to comprehensively describe protein function. To define the relationship between IPR and GO terms, three pattern recognition techniques have been employed under different conditions, such as feature selection and weighted values instead of binary ones.


Structure and function of voltage-dependent sodium channels: comparison of brain II and cardiac isoforms

Physiological Reviews ◽

10.1152/physrev.1996.76.3.887 ◽

1996 ◽

Vol 76 (3) ◽

pp. 887-926 ◽

Cited By ~ 193

Author(s):

H. A. Fozzard ◽

D. A. Hanck

Keyword(s):

Amino Acid ◽

Three Dimensional ◽

Point Mutations ◽

Amino Acid Sequences ◽

Na Channel ◽

Na Channels ◽

Voltage Dependent ◽

And Function ◽

Relationship Of ◽

The Relationship

Cardiac and nerve Na channels have broadly similar functional properties and amino acid sequences, but they demonstrate specific differences in gating, permeation, ionic block, modulation, and pharmacology. Resolution of three-dimensional structures of Na channels is unlikely in the near future, but a number of amino acid sequences from a variety of species and isoforms are known so that channel differences can be exploited to gain insight into the relationship of structure to function. The combination of molecular biology to create chimeras and channels with point mutations and high-resolution electrophysiological techniques to study function encourages the idea that predictions of structure from function are possible. With the goal of understanding the special properties of the cardiac Na channel, this review examines the structural (sequence) similarities between the cardiac and nerve channels and considers what is known about the relationship of structure to function for voltage-dependent Na channels in general and for the cardiac Na channels in particular.


A Deep Learning Approach Based on Stacked Denoising Autoencoders for Protein Function Prediction

2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC) ◽

10.1109/compsac.2018.00074 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lester James Miranda ◽

Jinglu Hu

Keyword(s):

Deep Learning ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Learning Approach
