Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana

2017, Vol 217 (4), pp. 1521-1534
Author(s): Bjoern Oest Hansen, Etienne H. Meyer, Camilla Ferrari, Neha Vaid, Sara Movahedi, ...

Despite the increasing availability of sequenced genomes, accurate characterization of gene functions is needed to close the genotype-phenotype gap. Recent advances in gene function prediction rely on ensemble approaches that integrate the results from multiple inference methods to produce superior predictions. Yet, these developments remain largely unexplored in plants. We present Neighbor Counting Ensemble, a gene function prediction method that integrates eleven gene co-function networks for Arabidopsis thaliana and produces more accurate gene function predictions for a larger fraction of genes with unknown function. We used these predictions to identify genes involved in mitochondrial complex I formation, and for five of them we confirmed the predictions experimentally. The ensemble predictions are provided as a user-friendly online database, EnsembleNet, available at http://www.gene2function.de/ensemblenet.html.
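The abstract does not spell out how Neighbor Counting Ensemble combines its eleven networks. As a rough illustration only, the sketch below assumes the simplest neighbor-counting scheme: in each co-function network, count how many of a gene's neighbours carry each function term, then sum those counts across networks and rank terms by the total. The function names, the summation rule, and the toy gene IDs are hypothetical; the published method may weight, normalize, or threshold the counts differently.

```python
from collections import Counter


def neighbor_counts(gene, network, annotations):
    """Count how often each term annotates `gene`'s neighbours in one network.

    network: dict mapping a gene to the set of its neighbours (undirected).
    annotations: dict mapping a gene to its set of known function terms.
    """
    counts = Counter()
    for neighbor in network.get(gene, set()):
        counts.update(annotations.get(neighbor, set()))
    return counts


def ensemble_scores(gene, networks, annotations):
    """Sum per-network neighbour counts (an assumed, not published, combination rule)."""
    total = Counter()
    for network in networks:
        total.update(neighbor_counts(gene, network, annotations))
    return total


# Toy data: hypothetical gene IDs and two tiny co-function networks.
net_a = {"g1": {"g2", "g3"}}
net_b = {"g1": {"g2", "g4"}}
annotations = {
    "g2": {"complex I assembly"},
    "g3": {"photosynthesis"},
    "g4": {"complex I assembly"},
}
scores = ensemble_scores("g1", [net_a, net_b], annotations)
print(scores.most_common())  # [('complex I assembly', 3), ('photosynthesis', 1)]
```

Summing across networks lets a term supported by several independent networks outrank one that appears in a single network, which is the intuition behind ensembling; a real implementation would also need to correct for terms that are simply very common.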


Author(s): Jeffrey N Law, Shiv D Kale, T M Murali

Motivation: Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer them to newly sequenced organisms on a genome-wide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods.

Results: We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under 3 min for 200 bacterial species.

Availability and implementation: An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction.

Supplementary information: Supplementary data are available at Bioinformatics online.
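FastSinkSource is only summarized above. As a hedged illustration, the sketch below shows the kind of SinkSource-style label propagation the method is named after: genes annotated with a term (positives) are clamped at score 1, genes known not to have it (negatives) are clamped at 0, and every other gene repeatedly takes the weighted average of its neighbours' scores, scaled by a damping factor `alpha`. The function name, the `alpha` default, and the stopping tolerance are assumptions for illustration. In particular, the sketch stops on a plain convergence tolerance and does not implement the paper's bound on the rate of progress, which is what yields the reported 100-fold speed-up; it also does not show the multi-species score transfer.

```python
def sinksource(weights, positives, negatives, alpha=0.95, tol=1e-6, max_iter=1000):
    """SinkSource-style propagation (simplified sketch, not FastSinkSource itself).

    weights: dict node -> dict neighbour -> edge weight (assumed symmetric).
    Positives are clamped at 1, negatives at 0; unknown nodes are updated
    with a damped weighted average of their neighbours' current scores.
    """
    unknown = [u for u in weights if u not in positives and u not in negatives]
    scores = {u: 0.0 for u in unknown}
    for _ in range(max_iter):
        new_scores = {}
        max_change = 0.0
        for u in unknown:
            degree = sum(weights[u].values())
            if degree == 0:
                new_scores[u] = 0.0
                continue
            total = 0.0
            for v, w in weights[u].items():
                if v in positives:
                    total += w  # positive neighbour contributes score 1
                elif v in scores:
                    total += w * scores[v]  # unknown neighbour: previous score
                # negative (or unlisted) neighbours contribute 0
            new_scores[u] = alpha * total / degree
            max_change = max(max_change, abs(new_scores[u] - scores[u]))
        scores = new_scores
        if max_change < tol:
            break
    return scores


# Toy path graph p - a - b - n with unit weights (hypothetical node names).
weights = {
    "p": {"a": 1.0},
    "a": {"p": 1.0, "b": 1.0},
    "b": {"a": 1.0, "n": 1.0},
    "n": {"b": 1.0},
}
scores = sinksource(weights, positives={"p"}, negatives={"n"}, alpha=1.0)
print(round(scores["a"], 3), round(scores["b"], 3))  # 0.667 0.333
```

With `alpha=1.0` the unknown genes interpolate linearly between the clamped positive and negative ends of the path; a smaller `alpha` shrinks every score toward 0 and guarantees convergence even on graphs where some unknown genes are far from any labelled gene.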


2010, Vol 26 (7), pp. 912-918
Author(s): Christoph Lippert, Zoubin Ghahramani, Karsten M. Borgwardt
