Mining functional annotations across species

2018 ◽  
Author(s):  
Sven Warris ◽  
Steven Dijkxhoorn ◽  
Teije van Sloten ◽  
Bart van de Vossenberg

Motivation: Numerous tools and databases exist to annotate and interpret the functions encoded in genomes (InterProScan, KEGG, GO, etc.). However, analyzing and comparing functionality across a number of genomes, for example those of related species, is not trivial. Results: We present a novel approach in which KEGG and Gene Ontology data are imported into a Neo4j graph database and InterProScan results from several species are added. Using the Neo4j plugin for Cytoscape, users can query this database and visualize functional annotation (sub)graphs to compare and group functional annotations across species.
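As a rough illustration of grouping annotations across species, the sketch below uses plain Python dictionaries in place of the Neo4j graph database described in the abstract. The species names, gene names, and the functions `species_by_term` and `shared_terms` are hypothetical, and the GO identifiers serve only as placeholder labels; this is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical InterProScan-style results: species -> gene -> set of GO term IDs.
annotations = {
    "species_a": {"gene1": {"GO:0003677", "GO:0006355"}, "gene2": {"GO:0016301"}},
    "species_b": {"gene9": {"GO:0003677"}},
    "species_c": {"gene5": {"GO:0016301", "GO:0003677"}},
}

def species_by_term(annotations):
    """Invert the mapping: GO term -> set of species with at least one annotated gene."""
    index = defaultdict(set)
    for species, genes in annotations.items():
        for terms in genes.values():
            for term in terms:
                index[term].add(species)
    return dict(index)

def shared_terms(annotations):
    """Return the GO terms that occur in every species."""
    index = species_by_term(annotations)
    return {term for term, species in index.items() if len(species) == len(annotations)}

print(shared_terms(annotations))
```

In a graph database the same question would be expressed as a query over term and species nodes; the inverted index here plays that role for a small in-memory example.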

2007 ◽  
Vol 31 (3) ◽  
pp. 374-384 ◽  
Author(s):  
Wei Shi ◽  
Wanlei Zhou ◽  
Dakang Xu

Discovery of cis-regulatory elements in gene promoters is a highly challenging research issue in computational molecular biology. This paper presents a novel approach to searching for putative cis-regulatory elements in human promoters by first finding 8-mer sequences of high statistical significance in gene promoters of humans, mice, and Drosophila melanogaster, respectively, and then identifying the most conserved ones across the three species (phylogenetic footprinting). In this study, a conservation analysis of both closely related species (humans and mice) and distantly related species (humans/mice and Drosophila) is conducted, not only to examine more candidates but also to improve the prediction accuracy. We have found 124 putative cis-regulatory elements and grouped these into 20 clusters. The investigation of the coexistence of these clusters in human gene promoters reveals that SP1, EGR, and NRF-1 are the dominant clusters appearing in combinations of up to five clusters. Gene Ontology (GO) analysis also shows that many GO categories of transcription factors binding to these cis-regulatory elements match the GO categories of genes whose promoters contain these elements. Compared with previous research, the contribution of this study lies not only in the finding of new cis-regulatory elements, but also in its pioneering exploration of the coexistence of discovered elements and the GO relationship between transcription factors and regulated genes. This exploration supports the putative cis-regulatory elements found in this study and also gives new insight into the regulation mechanisms of gene expression.
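The two-step idea described above (find over-represented 8-mers per species, then keep those shared across species) can be sketched as follows. This is a minimal sketch under stated assumptions: a simple occurrence threshold (`min_count`) stands in for the paper's statistical significance test, which the abstract does not specify, and the sequences and function names are hypothetical toy examples.

```python
from collections import Counter

def kmer_counts(promoters, k=8):
    """Count every overlapping k-mer across a list of promoter sequences."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def frequent_kmers(promoters, k=8, min_count=2):
    """Assumption: a raw count threshold replaces a real significance test."""
    return {kmer for kmer, n in kmer_counts(promoters, k).items() if n >= min_count}

def conserved_kmers(promoters_by_species, k=8, min_count=2):
    """Keep only the k-mers that pass the threshold in every species."""
    sets = [frequent_kmers(p, k, min_count) for p in promoters_by_species.values()]
    return set.intersection(*sets) if sets else set()

# Toy promoter fragments (hypothetical data, not from the study).
toy = {
    "human": ["AAGGGGCGGGTT", "CCGGGGCGGGAA"],
    "mouse": ["TTGGGGCGGGCC", "GGGGCGGG"],
    "fly": ["GGGGCGGGA", "TGGGGCGGG"],
}
print(conserved_kmers(toy))
```

A real analysis would compare observed counts against a background model and would also need to handle reverse complements, which this sketch ignores.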


2018 ◽  
Vol 14 (1) ◽  
pp. 4-10
Author(s):  
Fang Jing ◽  
Shao-Wu Zhang ◽  
Shihua Zhang

Background: Biological network alignment has been widely studied in the context of protein-protein interaction (PPI) networks, metabolic networks and others in bioinformatics. The topological structure of networks and genomic sequence are generally used by existing methods for achieving this task. Objective and Method: Here we briefly survey the methods generally used for this task and introduce a variant with incorporation of functional annotations based on similarity in Gene Ontology (GO). Making full use of GO information is beneficial to provide insights into precise biological network alignment. Results and Conclusion: We analyze the effect of incorporation of GO information to network alignment. Finally, we make a brief summary and discuss future directions about this topic.
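One simple way to picture how GO information might enter a node-pair score during alignment is shown below. This is an illustrative assumption rather than the method from the survey: Jaccard overlap of GO term sets is used as the functional similarity, and a weight `alpha` (hypothetical) blends it with a precomputed topological similarity.

```python
def go_jaccard(terms_a, terms_b):
    """Jaccard similarity of two GO term sets; 0.0 when both sets are empty."""
    if not terms_a and not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)

def node_score(topo_sim, terms_a, terms_b, alpha=0.5):
    """Blend topological similarity with GO similarity (alpha is a hypothetical weight)."""
    return alpha * topo_sim + (1 - alpha) * go_jaccard(terms_a, terms_b)

# Two proteins with identical annotations and a topological similarity of 0.8.
print(node_score(0.8, {"GO:0003677"}, {"GO:0003677"}))
```

Semantic similarity measures that exploit the GO hierarchy are common alternatives to plain set overlap; the Jaccard form is used here only because it needs no ontology structure.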


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Naihui Zhou ◽  
Yuxiang Jiang ◽  
Timothy R. Bergquist ◽  
Alexandra J. Lee ◽  
Balint Z. Kacsoh ◽  
...  

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11052
Author(s):  
Sushma Naithani ◽  
Daemon Dikeman ◽  
Priyanka Garg ◽  
Noor Al-Bader ◽  
Pankaj Jaiswal

The S-domain subfamily of receptor-like kinases (SDRLKs) in plants is poorly characterized. Most members of this subfamily are currently assigned gene function based on the S-locus Receptor Kinase from Brassica, which acts as the female determinant of self-incompatibility (SI). However, Brassica-like SI mechanisms do not exist in most plants. Thus, automated Gene Ontology (GO) pipelines are not sufficient for functional annotation of SDRLK subfamily members and lead to erroneous association with the GO biological process of SI. Here, we show that manual biocuration can help to correct and improve the gene annotations and their association with relevant biological processes. Using publicly available genomic and transcriptome datasets, we conducted a detailed analysis of the expansion of the rice (Oryza sativa) SDRLK subfamily, the structure of individual genes and proteins, and their expression. The 144-member SDRLK family in rice consists of 82 receptor-like kinases (RLKs) (67 full-length, 15 truncated), 12 receptor-like proteins, 14 SD kinases, 26 kinase-like and 10 GnK2 domain-containing kinases and RLKs. Except for nine genes, all SDRLK family members are transcribed in rice, but they vary in their tissue-specific and stress-response expression profiles. Furthermore, 98 genes show differential expression under biotic stress and 98 under abiotic stress conditions, with 81 genes common to both sets. Our analysis led to the identification of candidate genes likely to play important roles in plant development, pathogen resistance, and abiotic stress tolerance. We propose a nomenclature for the 144 SDRLK gene family members based on conserved structural features of the genes and proteins, gene expression profiles, and literature review. Our biocuration approach, rooted in the principles of findability, accessibility, interoperability and reusability, sets forth an example of how manual annotation of large gene families can fill the knowledge gap that exists due to the implementation of automated GO projections, thereby helping to improve the quality and content of public databases.


2012 ◽  
pp. 899-917 ◽  
Author(s):  
Luciano Milanesi ◽  
Ivan Merelli ◽  
Gabriele Trombetti ◽  
Paolo Cozzi ◽  
Alessandro Orro

A common, ongoing task in functional genomics is to compare an organism's full genome with those of related species, to search large databases for functional annotations of novel sequences, and to identify specific patterns in them, such as ESTs, genes, and microRNAs. Predicting these patterns carries a substantial computational cost: public genome archives exceed one billion sequence traces from over 1,000 organisms, and this number is increasing rapidly as costs decline, so powerful solutions are needed to perform efficient searches. Functional genomics applications therefore require significant computational infrastructure in which reusable tools and resources can be accessed. In particular, grid computing seems to fulfill both the computational and the data management requirements, even if porting applications to this infrastructure can be difficult. Implementing a suitable environment for managing distributed computations can provide a reliable advantage, reducing the gap between the requirements of the functional genomics domain and the potential of this technology.


2011 ◽  
Vol 12 (5) ◽  
pp. 449-462 ◽  
Author(s):  
P. Gaudet ◽  
M. S. Livstone ◽  
S. E. Lewis ◽  
P. D. Thomas

2016 ◽  
Author(s):  
Valentina Iotchkova ◽  
Graham R.S. Ritchie ◽  
Matthias Geihs ◽  
Sandro Morganella ◽  
Josine L. Min ◽  
...  

Loci discovered by genome-wide association studies (GWAS) predominantly map outside protein-coding genes. The interpretation of functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages GWAS findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding that current methods do not address. We further assess enrichment statistics for 27 GWAS traits within regulatory regions from the ENCODE and Roadmap projects. We characterise unique enrichment patterns for traits and annotations, driving novel biological insights. The method is implemented as standalone software and as an R package to facilitate its application by the research community.

