Function Prediction of Human Proteins

Prashansa Roy, Bhavesh Tanawala and Hetal Gaudani

doi:10.46501/ijmtst061111

International Journal for Modern Trends in Science and Technology - RTT2020 ◽

2020 ◽

Vol 6 (11) ◽

pp. 63-66

Keyword(s):

Computational Methods ◽

Protein Function ◽

Structural Features ◽

Biological Data ◽

Protein Mass ◽

Research Areas ◽

Protein Functions ◽

Human Proteins ◽

Enzyme Nomenclature ◽

Biological Methods

Because research is advancing rapidly, biological data are accumulating at an overwhelming rate. Advanced computational techniques are required to extract useful information from this enormous amount of protein data so that the resulting knowledge is practically useful and easily interpretable. For instance, drug discoverers need biological or computational methods to predict the functions of proteins responsible for different kinds of diseases in the human body. Since traditional biological methods are time-consuming and comparatively expensive, various computational methods have been introduced in the respective research areas. In this project, we generate machine learning models that predict the function of unknown proteins and analyze their performance to obtain the model with the highest accuracy. Sequence annotations of proteins, such as amino acid modifications, molecule processing and other structural features like active site, beta strand, chain, etc., together with protein mass and length, are considered for the prediction of protein functions. To further improve accuracy, feature selection has been performed. According to the enzyme nomenclature scheme, proteins are classified into 6 groups. These enzyme classes correspond to the reactions that the proteins catalyze and thus indicate their functions.
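The six groups mentioned above are the top-level classes of the Enzyme Commission (EC) scheme, which would serve as the prediction targets. The following is a minimal sketch of how an EC number could be mapped to its class label; the function name `ec_top_level_class` is hypothetical and is not taken from the paper.

```python
# Top-level Enzyme Commission (EC) classes, used here as the six target labels.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_top_level_class(ec_number):
    """Return the top-level class name for an EC number such as '3.4.21.4'."""
    text = ec_number.strip()
    # Accept an optional "EC" prefix, e.g. "EC 3.4.21.4".
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    first_field = text.split(".")[0]
    if not first_field.isdigit() or int(first_field) not in EC_CLASSES:
        raise ValueError("not one of the six EC classes: " + repr(ec_number))
    return EC_CLASSES[int(first_field)]


label = ec_top_level_class("EC 3.4.21.4")  # trypsin
```

A classifier trained on the sequence-annotation features would then predict one of these six labels for each protein.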

NPF: Network propagation for protein function prediction

10.21203/rs.3.rs-16452/v1 ◽

2020 ◽

Author(s):

Bihai Zhao ◽

Zhihong Zhang ◽

Meiping Jiang ◽

Sai Hu ◽

Yingchun Luo ◽

...

Keyword(s):

Protein Interaction ◽

Protein Function ◽

Function Prediction ◽

Biological Data ◽

Protein Interaction Networks ◽

Functional Similarity ◽

Interaction Networks ◽

Omics Data ◽

Protein Functions ◽

Network Propagation

Background: The accurate annotation of protein functions is of great significance in elucidating the phenomena of life, treating disease and developing new drugs. Various methods have been developed to facilitate the prediction of functions by combining protein interaction networks (PINs) with multi-omics data. However, how to make full use of multiple biological data sources to improve the performance of function annotation remains an open problem. Results: We present NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners with functions similar to those of target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity in a propagation manner. We have verified the great potential of NPF for accurately inferring protein functions. Comprehensive evaluation indicates that NPF achieved higher performance than competing methods in both leave-one-out cross-validation and ten-fold cross-validation. Conclusions: We demonstrated that network propagation combined with multi-omics data can not only discover more partners with similar functions, but also effectively escape the constraints of the "small-world" feature of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information from protein correlations.
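The abstract does not spell out the propagation scheme used by NPF. As a generic illustration of network propagation, the sketch below runs a random walk with restart from a target protein over a toy interaction network; the graph, the restart probability of 0.5 and the function name are all assumptions made for this example, not details of NPF.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, iterations=100):
    """Propagate a unit score from `seed` over an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a dict of visiting scores (higher means closer to the seed).
    """
    nodes = list(adjacency)
    scores = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iterations):
        # Each step returns a fraction `restart` of the mass to the seed ...
        updated = {n: (restart if n == seed else 0.0) for n in nodes}
        # ... and spreads the rest evenly over each node's neighbours.
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                continue
            share = (1.0 - restart) * scores[node] / len(neighbours)
            for nb in neighbours:
                updated[nb] += share
        scores = updated
    return scores


# Hypothetical toy network: P4 is reachable from P1 only through P3.
toy_network = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3"],
}
scores = random_walk_with_restart(toy_network, seed="P1")
```

Proteins with high scores relative to the seed would be treated as candidate partners with similar function; in an integrative method the edge structure could additionally be reweighted using domain or complex information.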

The Applications of Clustering Methods in Predicting Protein Functions

Current Proteomics ◽

10.2174/1570164616666181212114612 ◽

2019 ◽

Vol 16 (5) ◽

pp. 354-358

Author(s):

Weiyang Chen ◽

Weiwei Li ◽

Guohua Huang ◽

Matthew Flavel

Keyword(s):

Computational Methods ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Biological Processes ◽

Clustering Methods ◽

Protein Functions

Background: The understanding of protein function is essential to the study of biological processes. However, the prediction of protein function has been a difficult task for bioinformatics to overcome. This has resulted in many scholars focusing on the development of computational methods to address this problem. Objective: In this review, we introduce the recently developed computational methods of protein function prediction and assess the validity of these methods. We then introduce the applications of clustering methods in predicting protein functions.

Integrated Network Approach to Protein Function Prediction

Information Technology and Management Science ◽

10.7250/itms-2018-0016 ◽

2018 ◽

Vol 21 ◽

pp. 98-103

Author(s):

Natalia Novoselova ◽

Igar Tom

Keyword(s):

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Biological Data ◽

Label Propagation ◽

Integrated Network ◽

Functional Annotations ◽

Additional Information ◽

Integration Schemes ◽

Protein Functions

One of the main problems in functional genomics is the prediction of unknown gene/protein functions. With the rapid spread of high-throughput technologies, a vast amount of biological data describing different aspects of cellular functioning has become available, making it possible to use these data as additional information sources for function prediction and to improve its accuracy. In our research, we describe an approach to protein function prediction based on the integration of several biological datasets. Initially, each dataset is represented as a graph (or network), where the nodes represent genes or their products and the edges represent physical, functional or chemical relationships between nodes. The integration process makes it possible to estimate the importance of each network for the prediction of a particular function, taking into account the imbalance between the functional annotations, notably the disproportion between positively and negatively annotated proteins. Protein function prediction then consists of applying the label propagation algorithm to the integrated biological network in order to annotate unknown proteins or to assign new functions to already known proteins. A comparative analysis of prediction efficiency with several integration schemes shows a positive effect in terms of several performance measures.
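The two steps described here, network integration followed by label propagation, can be sketched as follows. This is a simplified illustration under stated assumptions: the networks are combined by a fixed weighted sum (the paper estimates network importance per function, which is not reproduced here), and the propagation is a basic iterative neighbour average with known labels held fixed. All names, weights and data are hypothetical.

```python
def integrate_networks(networks, weights):
    """Combine edge-weight dicts {(u, v): w} into one network by weighted sum."""
    combined = {}
    for network, weight in zip(networks, weights):
        for (u, v), w in network.items():
            key = tuple(sorted((u, v)))  # treat edges as undirected
            combined[key] = combined.get(key, 0.0) + weight * w
    return combined


def propagate_labels(edges, known, iterations=50):
    """Score unlabeled nodes by the weighted mean of their neighbours' scores.

    known: dict node -> 1.0 (annotated with the function) or 0.0 (not);
    these values stay clamped during propagation.
    """
    neighbours = {}
    for (u, v), w in edges.items():
        neighbours.setdefault(u, []).append((v, w))
        neighbours.setdefault(v, []).append((u, w))
    scores = {n: known.get(n, 0.5) for n in neighbours}
    for _ in range(iterations):
        updated = dict(scores)
        for node, nbrs in neighbours.items():
            if node in known:
                continue
            total = sum(w for _, w in nbrs)
            if total > 0:
                updated[node] = sum(w * scores[nb] for nb, w in nbrs) / total
        scores = updated
    return scores


# Hypothetical toy data: two networks over proteins A-D.
ppi = {("A", "B"): 1.0, ("B", "C"): 1.0}
coexpression = {("B", "A"): 1.0, ("C", "D"): 1.0}
combined = integrate_networks([ppi, coexpression], weights=[0.7, 0.3])
scores = propagate_labels(combined, known={"A": 1.0, "D": 0.0})
```

In this toy case protein B ends up with a higher score than C because it is linked more strongly to the positively annotated protein A.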

Conservation motifs - a novel evolutionary-based classification of proteins

10.1101/2020.01.12.903138 ◽

2020 ◽

Author(s):

Hodaya Beer ◽

Dana Sherill-Rofe ◽

Irene Unterman ◽

Idit Bloch ◽

Mendel Isseroff ◽

...

Keyword(s):

Natural Selection ◽

Protein Evolution ◽

Protein Function ◽

Biological Significance ◽

Protein Protein Interaction ◽

Protein Functions ◽

Human Proteins ◽

Conservation Patterns ◽

Protein Conservation

Cross-species protein conservation patterns, as directed by natural selection, are indicative of the interplay between protein function, protein-protein interaction and evolution. Since the beginning of the genomic era, proteins have been characterized as either conserved or not conserved. This simple classification became archaic and cursory once data on protein orthologs became available for thousands of species. To enrich the language used to describe protein conservation patterns, and to understand their biological significance, we classified 20,294 human proteins against 1096 species. Analyses of the conservation patterns of human proteins in different eukaryotic clades yielded extremely variable and rich patterns that had never been characterized or studied before. Using mathematical classifications, we defined seven conservation motifs: Steps, Critical, Lately Developed, Plateau, Clade Loss, Trait Loss, and Gain, which describe the evolution of human proteins. Overall, our work offers novel terms for conservation patterns and defines a new language intended to comprehensively describe protein evolution. This novel terminology enables the classification of proteins based on evolution, reveals aspects of protein evolution, and improves the understanding of protein functions.

Towards Computational Models of Identifying Protein Ubiquitination Sites

Current Drug Targets ◽

10.2174/1389450119666180924150202 ◽

2019 ◽

Vol 20 (5) ◽

pp. 565-578 ◽

Cited By ~ 1

Author(s):

Lidong Wang ◽

Ruijun Zhang

Keyword(s):

Computational Methods ◽

Computational Models ◽

Feature Representation ◽

Biological Sequence ◽

Post Translational Modification ◽

Test Dataset ◽

Protein Ubiquitination ◽

Protein Functions ◽

Independent Test Dataset ◽

Benchmark Datasets

Ubiquitination is an important post-translational modification (PTM) process for the regulation of protein functions, which is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on the detection of potential ubiquitination sites with the aid of physicochemical test approaches in conjunction with the application of computational methods. The identification of ubiquitination sites using laboratory tests is especially susceptible to the temporality and reversibility of the ubiquitination processes, and is also costly and time-consuming. It has been demonstrated that computational methods are effective in extracting potential rules or inferences from biological sequence collections. Up to the present, the computational strategy has been one of the critical research approaches that have been applied for the identification of ubiquitination sites, and currently, there are numerous state-of-the-art computational methods that have been developed from machine learning and statistical analysis to undertake such work. In the present study, the construction of benchmark datasets is summarized, together with feature representation methods, feature selection approaches and the classifiers involved in several previous publications. In an attempt to explore pertinent development trends for the identification of ubiquitination sites, an independent test dataset was constructed and the predicting results obtained from five prediction tools are reported here, together with some related discussions.

Avoided motifs: short amino acid strings missing from protein datasets

Biological Chemistry ◽

10.1515/hsz-2020-0383 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Pablo Mier ◽

Miguel A. Andrade-Navarro

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Protein Function ◽

Large Protein ◽

New Approach ◽

Cellular Context ◽

Human Proteins ◽

Context Specific ◽

Protein Datasets

According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular contexts, specific short amino acid motifs are missing for unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
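The core idea, finding short amino acid strings that never occur in a dataset, can be illustrated with a brute-force scan. The sketch below is an assumption about how such a search might be done, not the authors' pipeline: it enumerates every possible motif of length k over the 20 standard residues and keeps those absent from all input sequences. The function name and the toy sequences are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def avoided_motifs(sequences, k):
    """Return all length-k amino acid strings that occur in none of the sequences."""
    observed = set()
    for seq in sequences:
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    all_motifs = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [motif for motif in all_motifs if motif not in observed]


# Toy input; a real analysis would scan a whole proteome.
missing = avoided_motifs(["ACDA", "AAC"], k=2)
```

For k = 4 there are 20^4 = 160,000 possible motifs, so the same exhaustive enumeration remains cheap; the costly part of a real analysis is scanning the sequence dataset itself.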

Three-Dimensional Structures of Carbohydrates and Where to Find Them

International Journal of Molecular Sciences ◽

10.3390/ijms21207702 ◽

2020 ◽

Vol 21 (20) ◽

pp. 7702 ◽

Cited By ~ 1

Author(s):

Sofya I. Scherbinina ◽

Philip V. Toukach

Keyword(s):

Experimental Data ◽

Molecular Modeling ◽

Computational Methods ◽

Three Dimensional ◽

Structural Diversity ◽

Structural Data ◽

Structural Features ◽

Data Validation ◽

Data Generation ◽

Efficient Treatment

Analysis and systematization of accumulated data on carbohydrate structural diversity is a subject of great interest for structural glycobiology. Despite being a challenging task, the development of computational methods for efficient treatment and management of spatial (3D) structural features of carbohydrates breaks new ground in modern glycoscience. This review is dedicated to chemo- and glyco-informatics approaches to 3D structural data generation, deposition and processing with regard to carbohydrates and their derivatives. Databases, molecular modeling and experimental data validation services, and structure visualization facilities developed over the last five years are reviewed.

A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods

Current Medicinal Chemistry ◽

10.2174/0929867328666210910125802 ◽

2021 ◽

Vol 28 ◽

Author(s):

Yu-He Yang ◽

Jia-Shu Wang ◽

Shi-Shi Yuan ◽

Meng-Lu Liu ◽

Wei Su ◽

...

Keyword(s):

Machine Learning ◽

Protein Function ◽

Vital Role ◽

Atp Binding ◽

Learning Methods ◽

Machine Learning Methods ◽

Protein Ligand Interactions ◽

Protein Functions ◽

Ligand Interactions ◽

Binding Residues

Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5’-triphosphate (ATP) is one such ligand that plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions and signaling. Knowing the ATP binding residues of proteins is helpful for the annotation of protein function and for drug design. However, due to the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP binding residues is cost-ineffective and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods in detecting ATP binding residues of proteins. We expect this review will be helpful for further research.

Optobiochemistry: Genetically Encoded Control of Protein Activity by Light

Annual Review of Biochemistry ◽

10.1146/annurev-biochem-072420-112431 ◽

2021 ◽

Vol 90 (1) ◽

Author(s):

Jihye Seong ◽

Michael Z. Lin

Keyword(s):

Protein Function ◽

Living Cells ◽

Optical Methods ◽

Annual Review ◽

Publication Date ◽

Protein Activity ◽

Spatiotemporal Resolution ◽

Protein Functions ◽

Protein Classes ◽

Control Protein

Optobiochemical control of protein activities allows the investigation of protein functions in living cells with high spatiotemporal resolution. Over the last two decades, numerous natural photosensory domains have been characterized and synthetic domains engineered and assembled into photoregulatory systems to control protein function with light. Here, we review the field of optobiochemistry, categorizing photosensory domains by chromophore, describing photoregulatory systems by mechanism of action, and discussing protein classes frequently investigated using optical methods. We also present examples of how spatial or temporal control of proteins in living cells has provided new insights not possible with traditional biochemical or cell biological techniques. Expected final online publication date for the Annual Review of Biochemistry, Volume 90 is June 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Embeddings from protein language models predict conservation and variant effects

10.21203/rs.3.rs-584804/v1 ◽

2021 ◽

Author(s):

Céline Marquet ◽

Michael Heinzinger ◽

Tobias Olenyi ◽

Christian Dallago ◽

Michael Bernhofer ◽

...

Keyword(s):

Protein Function ◽

Language Models ◽

Single Amino Acid ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Human Proteins ◽

Entire Sequence ◽

Embedding Methods ◽

Better Than

The emergence of SARS-CoV-2 variants stressed the demand for tools that help interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (LMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or marked amino acids from the context of entire sequence regions. Here, we explored how to benefit from learned protein LM representations (embeddings) to predict SAV effects. Although we have failed so far to predict SAV effects directly from embeddings, this input alone predicted residue conservation almost as accurately from single sequences as using multiple sequence alignments (MSAs), with a two-state per-residue accuracy (conserved/not) of Q2=80% (embeddings) vs. 81% (ConSeq). Considering all SAVs at all residue positions predicted as conserved to affect function reached 68.6% (Q2: effect/neutral; for PMD) without optimization, compared to an expert solution such as SNAP2 (Q2=69.8). Combining predicted conservation with BLOSUM62 to obtain variant-specific binary predictions, DMS experiments of four human proteins were predicted better than by SNAP2, and better than by applying the same simplistic approach to conservation taken from ConSeq. Thus, embedding methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. This allowed prediction of SAV effects for the entire human proteome (~20k proteins) within 17 minutes on a single GPU.
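The abstract does not state exactly how predicted conservation and BLOSUM62 are combined. One plausible rule, assumed here purely for illustration, is to call a variant an "effect" only when the position is predicted conserved and the substitution has a negative BLOSUM62 score. The sketch uses a small hand-copied excerpt of BLOSUM62 values; a real run would load the full matrix, and the function names are hypothetical.

```python
# Small excerpt of BLOSUM62 scores (the matrix is symmetric).
BLOSUM62_EXCERPT = {
    ("L", "I"): 2,
    ("L", "V"): 1,
    ("L", "D"): -4,
    ("L", "P"): -3,
    ("K", "R"): 2,
    ("D", "E"): 2,
}


def substitution_score(wild_type, variant):
    """Look up a score in either orientation; raises KeyError if not in the excerpt."""
    if (wild_type, variant) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(wild_type, variant)]
    return BLOSUM62_EXCERPT[(variant, wild_type)]


def predict_effect(conserved, wild_type, variant):
    """Binary call under the assumed rule: conserved position AND negative score."""
    if not conserved:
        return "neutral"
    if substitution_score(wild_type, variant) < 0:
        return "effect"
    return "neutral"


calls = [
    predict_effect(True, "L", "D"),   # conserved, dissimilar residue
    predict_effect(True, "L", "I"),   # conserved, similar residue
    predict_effect(False, "L", "D"),  # position not conserved
]
```

The zero threshold is an assumption; any cutoff on the substitution score could be tuned against DMS data.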
