Computational Prediction of Protein Function Based on Weighted Mapping of Domains and GO Terms

BioMed Research International ◽

10.1155/2014/641469 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Zhixia Teng ◽

Maozu Guo ◽

Qiguo Dai ◽

Chunyu Wang ◽

Jin Li ◽

...

Keyword(s):

Conditional Probability ◽

Protein Function ◽

Protein Function Prediction ◽

Computational Prediction ◽

Host Protein ◽

Association Strength ◽

Predict Protein Function ◽

Domain Composition ◽

Novel Method ◽

Go Terms

In this paper, we propose a novel method, SeekFun, to predict protein function based on a weighted mapping of domains and GO terms. First, a weighted mapping of domains and GO terms is constructed according to the GO annotations and domain composition of the proteins. The association strength between a domain and a GO term is weighted by symmetrical conditional probability. Second, the mapping is extended along the true paths of the terms based on the GO hierarchy. Finally, the terms associated with resident domains are transferred to the host protein, and the real annotations of the host protein are determined by association strengths. Our comparisons demonstrate that SeekFun outperforms the compared methods in most cases. SeekFun provides a flexible and effective way to predict protein function. It benefits from the well-constructed mapping of domains and GO terms, as well as from a reasonable strategy for inferring the annotations of a protein from those of its domains.
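The weighting step can be illustrated with a minimal sketch. It assumes the common definition of symmetrical conditional probability, SCP(d, t) = P(d|t) · P(t|d) = n(d, t)² / (n(d) · n(t)), estimated from co-occurrence counts over annotated proteins; the exact formulation used by SeekFun may differ. The function name `scp_weights` and the toy data are hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import product


def scp_weights(proteins):
    """Weight each (domain, GO term) pair by symmetrical conditional probability.

    proteins: iterable of (domains, go_terms) pairs, each a set of strings.
    Assumed definition: SCP(d, t) = n(d, t)^2 / (n(d) * n(t)).
    """
    n_domain = Counter()  # proteins containing each domain
    n_term = Counter()    # proteins annotated with each term
    n_joint = Counter()   # proteins with both the domain and the term
    for domains, terms in proteins:
        n_domain.update(domains)
        n_term.update(terms)
        n_joint.update(product(domains, terms))
    return {
        (d, t): (c * c) / (n_domain[d] * n_term[t])
        for (d, t), c in n_joint.items()
    }


# Toy input (illustrative only): three proteins with their domains and terms.
toy = [
    ({"PF00001"}, {"GO:0004930"}),
    ({"PF00001", "PF00002"}, {"GO:0004930", "GO:0007186"}),
    ({"PF00002"}, {"GO:0007186"}),
]
weights = scp_weights(toy)
```

In this toy set, a domain and term that always co-occur receive weight 1.0, while a pair that co-occurs in only one of the proteins carrying each receives 0.25.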

A Novel Method for Functional Annotation Prediction Based on Combination of Classification Methods

The Scientific World JOURNAL ◽

10.1155/2014/542824 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9

Author(s):

Jaehee Jung ◽

Heung Ki Lee ◽

Gangman Yi

Keyword(s):

Protein Function ◽

Protein Function Prediction ◽

Controlled Vocabulary ◽

Functional Annotations ◽

Functional Homology ◽

Large Sets ◽

Unknown Protein ◽

Protein Functions ◽

Novel Method ◽

The Relationship

Automated protein function prediction is the assignment of functions to proteins of unknown function by computational methods. This technique is useful for automatically assigning gene functional annotations to undefined sequences in next generation genome analysis (NGS). NGS is a popular research method since high-throughput technologies such as DNA sequencing and microarrays have created large sets of genes. These large sequence sets have greatly increased the need for analysis. Previous research has been based on sequence similarity, as this is strongly related to functional homology. However, this study aimed to designate protein functions automatically by utilizing InterPro (IPR), which can represent the properties of protein families and groups of protein functions. Moreover, we used gene ontology (GO), which is the controlled vocabulary used to comprehensively describe protein function. To define the relationship between IPR and GO terms, three pattern recognition techniques have been employed under different conditions, such as feature selection and weighted values instead of binary ones.

A Topology-Based Metric for Measuring Term Similarity in the Gene Ontology

Advances in Bioinformatics ◽

10.1155/2012/975783 ◽

2012 ◽

Vol 2012 ◽

pp. 1-17 ◽

Cited By ~ 27

Author(s):

Gaston K. Mazandu ◽

Nicola J. Mulder

Keyword(s):

Gene Ontology ◽

Protein Function ◽

Protein Function Prediction ◽

Similarity Measures ◽

Biological Knowledge ◽

Online Tool ◽

Protein Protein Interaction ◽

Or Groups ◽

Term Similarity ◽

Go Terms

The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to the growth in its popularity. In order to exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. While several GO similarity measures exist, none adequately addresses all issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO-DAG, thus enabling the measurement of functional similarities between proteins based on their GO annotations. We assess the performance of this metric using a ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on the selected set of protein pairs from the CESSM online tool. This metric achieves good performance compared to the existing annotation-based GO measures. We used this new metric to assess functional similarity between orthologues, and show that it is effective at determining whether orthologues are annotated with similar functions and identifying cases where annotation is inconsistent between orthologues.

DYNAMICALLY SEARCHING FOR A DOMAIN FOR PROTEIN FUNCTION PREDICTION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001350008x ◽

2013 ◽

Vol 11 (04) ◽

pp. 1350008 ◽

Cited By ~ 1

Author(s):

JINGYU HOU ◽

YONGQING JIANG

Keyword(s):

Protein Function ◽

Structural Information ◽

Protein Function Prediction ◽

Function Prediction ◽

Computational Approaches ◽

Protein Protein Interaction ◽

Protein Functions ◽

Novel Method ◽

Function Information ◽

The Relationship

The availability of large amounts of protein–protein interaction (PPI) data makes it feasible to use computational approaches to predict protein functions. The base of existing computational approaches is to exploit the known function information of annotated proteins in the PPI data to predict functions of un-annotated proteins. However, these approaches consider the prediction domain (i.e. the set of proteins from which the functions are predicted) as unchangeable during the prediction procedure. This may lead to valuable information being overwhelmed by the unavoidable noise information in the PPI data when predicting protein functions, and in turn, the prediction results will be distorted. In this paper, we propose a novel method to dynamically predict protein functions from the PPI data. Our method regards the function prediction as a dynamic process of finding a suitable prediction domain, from which representative functions of the domain are selected to predict functions of un-annotated proteins. Our method exploits the topological structural information of a PPI network and the semantic relationship between protein functions to measure the relationship between proteins, dynamically select a suitable prediction domain and predict functions. The evaluation on real PPI datasets demonstrated the effectiveness of our proposed method, and generated better prediction results.

COMPUTATIONAL METHOD FOR PROTEIN FUNCTION PREDICTION BY CONSTRUCTING PROTEIN INTERACTION NETWORK DICTIONARY

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001406004661 ◽

2006 ◽

Vol 20 (02) ◽

pp. 285-295 ◽

Cited By ~ 2

Author(s):

HEE-JEONG JIN ◽

HWAN-GUE CHO

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Protein Function ◽

Protein Function Prediction ◽

Interaction Network ◽

Computational Method ◽

Chi Square ◽

Protein Protein Interaction ◽

Protein Functions ◽

Novel Method

In the post-genomic era, predicting protein function is a challenging problem. It is difficult and burdensome work to unravel the functions of a protein by wet experiments only. In this paper, we propose a novel method to predict protein functions by building a "Protein Interaction Network Dictionary (PIND)". This method deduces protein functions by searching for the most similar "words" (an anagram of functions in neighbor proteins on a protein–protein interaction graph) using global alignments. An evaluation of sensitivity and specificity shows that this PIND approach outperforms previous approaches such as Majority Rule and Chi-Square measure, and that it competes with the recently introduced Random Markov Model approach.
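For context, the Majority Rule baseline named above is commonly understood as neighbor counting: a protein is assigned the functions that occur most often among its interaction partners. The sketch below illustrates that baseline only, not the PIND method; the function name, the parameter `k`, and the toy network are hypothetical.

```python
from collections import Counter


def majority_rule(protein, neighbors, functions, k=1):
    """Return the k functions most frequent among a protein's annotated neighbors.

    neighbors: {protein: iterable of interacting proteins}
    functions: {protein: set of known functions}; unannotated proteins are absent.
    """
    votes = Counter()
    for partner in neighbors.get(protein, ()):
        votes.update(functions.get(partner, ()))
    return [function for function, _ in votes.most_common(k)]


# Toy network (illustrative only): query protein "Q" has three annotated partners.
toy_neighbors = {"Q": ["A", "B", "C"]}
toy_functions = {
    "A": {"transport"},
    "B": {"transport", "binding"},
    "C": {"transport", "binding"},
}
predicted = majority_rule("Q", toy_neighbors, toy_functions)
```

Here "transport" receives three votes and "binding" two, so the single predicted function for "Q" is "transport".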

DeepGOPlus: improved protein function prediction from sequence

Bioinformatics ◽

10.1093/bioinformatics/btz595 ◽

2019 ◽

Cited By ~ 17

Author(s):

Maxat Kulmanov ◽

Robert Hoehndorf

Keyword(s):

Protein Function ◽

Drug Targets ◽

Sequence Similarity ◽

Protein Function Prediction ◽

Function Prediction ◽

Supplementary Information ◽

Protein Protein Interaction ◽

Wide Range ◽

Protein Functions ◽

Novel Method

Abstract. Motivation: Protein function prediction is one of the major tasks of bioinformatics that can help in a wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence-based features, protein–protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins, thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than that of methods that incorporate multiple features, and predicting protein functions may require a lot of time. Results: We developed a novel method for predicting protein functions from sequence alone which combines a deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins (if available). We evaluate the performance of DeepGOPlus using the CAFA3 evaluation measures and achieve an Fmax of 0.390, 0.557 and 0.614 for BPO, MFO and CCO evaluations, respectively. These results would have made DeepGOPlus one of the three best predictors in CCO and the second best performing method in the BPO and MFO evaluations. We also compare DeepGOPlus with state-of-the-art methods such as DeepText2GO and GOLabeler on another dataset. DeepGOPlus can annotate around 40 protein sequences per second on common hardware, thereby making fast and accurate function predictions available for a wide range of proteins. Availability and implementation: http://deepgoplus.bio2vec.net/. Supplementary information: Supplementary data are available at Bioinformatics online.
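The Fmax measure reported above can be sketched as follows, assuming the protein-centric definition used in the CAFA evaluations: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean over thresholds. The function name, the threshold grid, and the toy data are assumptions for illustration; this is not the official CAFA evaluation code, and it omits details such as propagating terms through the ontology.

```python
def fmax(truth, scores, thresholds=None):
    """Protein-centric Fmax.

    truth:  {protein: non-empty set of true terms}
    scores: {protein: {term: score in [0, 1]}}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []   # one entry per protein with >= 1 prediction at t
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in truth.items():
            predicted = {g for g, s in scores.get(protein, {}).items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # nothing predicted at this threshold
        p = sum(precisions) / len(precisions)
        r = recall_sum / len(truth)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy benchmark (illustrative only).
toy_truth = {"P1": {"A", "B"}, "P2": {"C"}}
toy_scores = {"P1": {"A": 0.9, "B": 0.4, "X": 0.6}, "P2": {"C": 0.8}}
toy_fmax = fmax(toy_truth, toy_scores)
```

For this toy benchmark the best trade-off occurs at thresholds up to 0.4, where average precision is 5/6 and average recall is 1, giving Fmax = 10/11.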

INGA 2.0: improving protein function prediction for the dark proteome

Nucleic Acids Research ◽

10.1093/nar/gkz375 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W373-W378 ◽

Cited By ~ 8

Author(s):

Damiano Piovesan ◽

Silvio C E Tosatto

Keyword(s):

Protein Function ◽

Current Knowledge ◽

Protein Function Prediction ◽

Domain Architecture ◽

Function Prediction ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Additional Input ◽

Consensus Prediction ◽

Go Terms

Abstract. Our current knowledge of complex biological systems is stored in a computable form through the Gene Ontology (GO), which provides a comprehensive description of gene function. Prediction of GO terms from the sequence remains, however, a challenging task, which is particularly critical for novel genomes. Here we present INGA 2.0, a new version of the INGA software for protein function prediction. INGA exploits homology, domain architecture, interaction networks and information from the ‘dark proteome’, like transmembrane and intrinsically disordered regions, to generate a consensus prediction. INGA was ranked in the top ten methods on both CAFA2 and CAFA3 blind tests. The new algorithm can process entire genomes in a few hours or even less when additional input files are provided. The new interface provides a better user experience by integrating filters and widgets to explore the graph structure of the predicted terms. The INGA web server, databases and benchmarking are available from URL: https://inga.bio.unipd.it/.

NetGO: Improving Large-scale Protein Function Prediction with Massive Network Information

10.1101/439554 ◽

2018 ◽

Author(s):

Ronghui You ◽

Shuwei Yao ◽

Xiaodi Huang ◽

Fengzhu Sun ◽

Hiroshi Mamitsuka ◽

...

Keyword(s):

Protein Function ◽

Large Scale ◽

Protein Function Prediction ◽

Learning To Rank ◽

Performance Comparison ◽

Function Prediction ◽

Sequence Information ◽

Network Information ◽

Species Specific ◽

Go Terms

Abstract. Automated function prediction (AFP) of proteins is of great significance in biology. In essence, AFP is a large-scale multi-label classification over pairs of proteins and GO terms. Existing AFP approaches, however, have limitations on both the protein side and the GO term side. Using various sequence information and the robust learning to rank (LTR) framework, we have developed GOLabeler, a state-of-the-art approach of CAFA3, which overcomes limitations on the GO term side, such as imbalanced GO terms. Unfortunately, on the protein side, the abundant protein information available beyond sequences has not been effectively used for large-scale AFP in CAFA. We propose NetGO, which is able to improve large-scale AFP with massive network information. The novelty of NetGO in using network information is threefold: 1) the powerful LTR framework of NetGO efficiently and effectively integrates both sequence and network information, which makes large-scale AFP straightforward; 2) NetGO can use the whole, massive network information of all species (>2000) in STRING (rather than only high confidence links and/or some specific species); and 3) NetGO can still use network information to annotate a protein by homology transfer even if it is not covered in STRING. Under numerous experimental settings, we examined the performance of NetGO, such as general performance comparison, species-specific prediction, and prediction on difficult proteins, by using training and test data separated by the time-delayed settings of CAFA. Experimental results have clearly demonstrated that NetGO significantly outperforms GOLabeler, DeepGO, and the other compared baseline methods. In addition, several interesting findings from our experiments on NetGO would be useful for future AFP research.

CrowdGO: a wisdom of the crowd-based Gene Ontology annotation tool

10.1101/731596 ◽

2019 ◽

Cited By ~ 1

Author(s):

Maarten J.M.F. Reijnders

Keyword(s):

Gene Ontology ◽

Protein Function ◽

Directed Acyclic Graph ◽

Support Vector Machine Model ◽

Protein Function Prediction ◽

Support Vector ◽

Annotation Tool ◽

Machine Model ◽

Prediction Tools ◽

Go Terms

Abstract. Motivation: Protein function prediction tools vary widely in their methodologies, resulting in different sets of GO terms being correctly predicted. Ideally, multiple tools are combined to achieve a higher recall of GO terms while increasing precision. Results: CrowdGO combines input predictions from any number of tools based on the Gene Ontology Directed Acyclic Graph. Using each GO term's information content, the semantic similarity between GO predictions of different tools, and a Support Vector Machine model, it achieves improved precision and recall compared to each of the tools separately (Figure 1). Availability: CrowdGO can be found at https://gitlab.com/mreijnders/CrowdGO
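Information content can be illustrated with a short sketch. It assumes the widely used annotation-frequency definition, IC(t) = -log2 p(t), where p(t) is the fraction of annotated proteins carrying term t after annotations are propagated to all ancestors in the GO graph. Whether CrowdGO computes it exactly this way is not stated in the abstract; the function names and the toy ontology are hypothetical.

```python
import math


def with_ancestors(terms, parents):
    """Return the given terms plus all of their ancestors in the ontology."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def information_content(annotations, parents):
    """IC(t) = -log2(fraction of proteins annotated with t or a descendant of t).

    annotations: {protein: set of directly annotated terms}
    parents:     {term: iterable of parent terms}
    """
    counts = {}
    for terms in annotations.values():
        for term in with_ancestors(terms, parents):
            counts[term] = counts.get(term, 0) + 1
    total = len(annotations)
    return {term: -math.log2(count / total) for term, count in counts.items()}


# Toy ontology (illustrative only): two children under one root.
toy_parents = {"child_a": ["root"], "child_b": ["root"]}
toy_annotations = {
    "P1": {"child_a"},
    "P2": {"child_b"},
    "P3": {"child_a"},
    "P4": {"root"},
}
ic = information_content(toy_annotations, toy_parents)
```

The root term covers every protein and so carries no information, while rarer, more specific terms score higher: here `child_a` (2 of 4 proteins) has IC 1 and `child_b` (1 of 4) has IC 2.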

FLORA: A Novel Method to Predict Protein Function from Structure in Diverse Superfamilies

PLoS Computational Biology ◽

10.1371/journal.pcbi.1000485 ◽

2009 ◽

Vol 5 (8) ◽

pp. e1000485 ◽

Cited By ~ 32

Author(s):

Oliver C. Redfern ◽

Benoît H. Dessailly ◽

Timothy J. Dallman ◽

Ian Sillitoe ◽

Christine A. Orengo

Keyword(s):

Protein Function ◽

Predict Protein Function ◽

Novel Method

Protein function prediction from protein–protein interaction network using gene ontology based neighborhood analysis and physico-chemical features

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720018500257 ◽

2018 ◽

Vol 16 (06) ◽

pp. 1850025 ◽

Cited By ~ 5

Author(s):

Sovan Saha ◽

Abhimanyu Prasad ◽

Piyali Chatterjee ◽

Subhadip Basu ◽

Mita Nasipuri

Keyword(s):

Functional Groups ◽

Protein Function ◽

Functional Group ◽

Protein Function Prediction ◽

Interaction Network ◽

Function Prediction ◽

Protein Protein Interaction ◽

Physico Chemical ◽

Go Terms ◽

Protein Protein Interaction Network

Protein function prediction from a Protein–Protein Interaction Network (PPIN) and physico-chemical features using the Gene Ontology (GO) classification is very useful for assigning biological or biochemical functions to a protein. It also leads to the identification of significant proteins responsible for various diseases for which drugs are yet to be discovered. The prediction of GO functional terms from PPIN and sequence is therefore an important field of study. In this work, we propose a methodology, Multi Label Protein Function Prediction (ML_PFP), which is based on neighborhood analysis empowered with physico-chemical features of constituent amino acids to predict the functional group of an unannotated protein. A protein does not perform functions in isolation; rather, it performs functions in a group by interacting with others. A protein is thus involved in many functions or, in other words, may be associated with multiple functional groups, labels or GO terms. Though the functional groups of known interacting partner proteins and their physico-chemical features provide useful information, assigning multiple labels to an unannotated protein is a very challenging task. Here, we have taken the Homo sapiens (human) PPIN as well as the Saccharomyces cerevisiae (yeast) PPIN along with their GO terms to predict functional groups or GO terms of unannotated proteins. This work is very challenging, as both the human and yeast protein datasets are voluminous and complex in nature, and multi-label functional group assignment adds a new dimension to this challenge. Our algorithm has been observed to achieve better performance in Cellular Function, Molecular Function and Biological Process on both the yeast and human networks when compared with other existing state-of-the-art methodologies, which will be discussed in detail in the results section.
