Mining functional annotations across species

Mapping Intimacies ◽

10.1101/369785 ◽

2018 ◽

Author(s):

Sven Warris ◽

Steven Dijkxhoorn ◽

Teije van Sloten ◽

Bart van de Vossenberg

Keyword(s):

Gene Ontology ◽

Related Species ◽

Functional Annotation ◽

Graph Database ◽

Functional Annotations ◽

Novel Approach

Abstract. Motivation: Numerous tools and databases exist to annotate and interpret the functions encoded in genomes (InterProScan, KEGG, GO, etc.). However, analyzing and comparing functionality across a number of genomes, for example those of related species, is not trivial. Results: We present a novel approach in which KEGG and Gene Ontology data are imported into a Neo4j graph database and InterProScan results from several species are added. Using the Neo4j plugin for Cytoscape, users can query this database and visualize functional annotation (sub)graphs to compare and group functional annotations across species.
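The kind of cross-species comparison described in this abstract can be illustrated with a small in-memory sketch. It is not the authors' Neo4j schema or Cytoscape workflow: plain Python dictionaries stand in for the graph database, and the species names, gene names, function names and GO identifiers are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def terms_to_species(annotations):
    """Invert {species: {gene: {GO terms}}} into {GO term: {species}}."""
    index = defaultdict(set)
    for species, genes in annotations.items():
        for terms in genes.values():
            for term in terms:
                index[term].add(species)
    return dict(index)


def shared_and_specific(annotations):
    """Split GO terms into those seen in every species and those seen in exactly one."""
    index = terms_to_species(annotations)
    all_species = set(annotations)
    shared = {term for term, found_in in index.items() if found_in == all_species}
    specific = {
        term: next(iter(found_in))
        for term, found_in in index.items()
        if len(found_in) == 1
    }
    return shared, specific


# Hypothetical toy input: two species, two genes each, illustrative GO identifiers.
toy_annotations = {
    "species_a": {"geneA1": {"GO:0006412", "GO:0005840"}, "geneA2": {"GO:0016301"}},
    "species_b": {"geneB1": {"GO:0006412"}, "geneB2": {"GO:0003677"}},
}
shared_terms, specific_terms = shared_and_specific(toy_annotations)
```

In a graph database the same grouping would be expressed as a query over term and species nodes; the dictionary inversion here only mirrors that idea at toy scale.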

Identifying cis-regulatory elements by statistical analysis and phylogenetic footprinting and analyzing their coexistence and related gene ontology

Physiological Genomics ◽

10.1152/physiolgenomics.00085.2006 ◽

2007 ◽

Vol 31 (3) ◽

pp. 374-384 ◽

Cited By ~ 1

Author(s):

Wei Shi ◽

Wanlei Zhou ◽

Dakang Xu

Keyword(s):

Gene Ontology ◽

Transcription Factors ◽

Related Species ◽

Statistical Significance ◽

Phylogenetic Footprinting ◽

Regulatory Elements ◽

Gene Promoters ◽

Computational Molecular Biology ◽

Novel Approach ◽

Regulation Mechanisms

Discovery of cis-regulatory elements in gene promoters is a highly challenging research issue in computational molecular biology. This paper presents a novel approach to searching for putative cis-regulatory elements in human promoters by first finding 8-mer sequences of high statistical significance in gene promoters of humans, mice, and Drosophila melanogaster, respectively, and then identifying the most conserved ones across the three species (phylogenetic footprinting). In this study, a conservation analysis of both closely related species (humans and mice) and distantly related species (humans/mice and Drosophila) is conducted not only to examine more candidates but also to improve the prediction accuracy. We have found 124 putative cis-regulatory elements and grouped these into 20 clusters. An investigation of the coexistence of these clusters in human gene promoters reveals that SP1, EGR, and NRF-1 are the dominant clusters appearing in combinations of up to five clusters. Gene Ontology (GO) analysis also shows that many GO categories of transcription factors binding to these cis-regulatory elements match the GO categories of genes whose promoters contain these elements. Compared with previous research, the contribution of this study lies not only in the finding of new cis-regulatory elements, but also in its pioneering exploration of the coexistence of discovered elements and the GO relationship between transcription factors and regulated genes. This exploration verifies the putative cis-regulatory elements found in this study and also gives new insight into the regulation mechanisms of gene expression.
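The two-stage idea in this abstract (select notable 8-mers per species, then keep those conserved across species) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: a simple "present in at least a given fraction of promoters" cutoff stands in for the statistical significance test, exact matching stands in for the conservation analysis, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(promoters, k=8):
    """Count, for each k-mer, how many promoter sequences contain it at least once."""
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def frequent_kmers(promoters, k=8, min_fraction=0.5):
    """Return k-mers found in at least min_fraction of the promoters.

    Assumption: this frequency cutoff replaces a proper significance test.
    """
    counts = kmer_counts(promoters, k)
    threshold = min_fraction * len(promoters)
    return {kmer for kmer, n in counts.items() if n >= threshold}


def conserved_kmers(promoters_by_species, k=8, min_fraction=0.5):
    """Intersect the per-species k-mer sets (exact-match conservation only)."""
    per_species = [
        frequent_kmers(promoters, k, min_fraction)
        for promoters in promoters_by_species.values()
    ]
    return set.intersection(*per_species) if per_species else set()


# Hypothetical toy promoters; each contains the 8-mer GGGGCGGG.
toy_promoters = {
    "human": ["AAGGGGCGGGTT", "CCGGGGCGGGAA"],
    "mouse": ["TTGGGGCGGGCC", "GGGGCGGGATAT"],
    "fly": ["ACGGGGCGGGTA", "TGGGGCGGGACA"],
}
conserved = conserved_kmers(toy_promoters, k=8, min_fraction=1.0)
```

On real data the cutoff would be replaced by a background model, and conservation would normally tolerate mismatches and consider both strands; neither is attempted here.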

Brief Survey of Biological Network Alignment and a Variant with Incorporation of Functional Annotations

Current Bioinformatics ◽

10.2174/1574893612666171020103747 ◽

2018 ◽

Vol 14 (1) ◽

pp. 4-10

Author(s):

Fang Jing ◽

Shao-Wu Zhang ◽

Shihua Zhang

Keyword(s):

Gene Ontology ◽

Topological Structure ◽

Metabolic Networks ◽

Biological Network ◽

Genomic Sequence ◽

Network Alignment ◽

Future Directions ◽

Functional Annotations ◽

Protein Protein Interaction ◽

Ppi Networks

Background: Biological network alignment has been widely studied in the context of protein-protein interaction (PPI) networks, metabolic networks and others in bioinformatics. The topological structure of networks and genomic sequence are generally used by existing methods for achieving this task. Objective and Method: Here we briefly survey the methods generally used for this task and introduce a variant with incorporation of functional annotations based on similarity in Gene Ontology (GO). Making full use of GO information is beneficial to provide insights into precise biological network alignment. Results and Conclusion: We analyze the effect of incorporation of GO information to network alignment. Finally, we make a brief summary and discuss future directions about this topic.
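One simple way to fold GO information into a node-pair score for network alignment is sketched below. The abstract does not specify the similarity measure or how it is combined with topology, so both are assumptions here: Jaccard overlap of GO term sets is used as the functional similarity, and a weight `alpha` (a hypothetical parameter) blends it linearly with a precomputed topological similarity.

```python
def go_jaccard(terms_a, terms_b):
    """Jaccard similarity of two GO term collections (0.0 when both are empty)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_score(topo_sim, terms_a, terms_b, alpha=0.5):
    """Blend topological similarity with GO similarity; alpha weights the topology."""
    return alpha * topo_sim + (1 - alpha) * go_jaccard(terms_a, terms_b)


# Hypothetical node pair with placeholder term identifiers.
example_score = combined_score(0.8, {"GO:1", "GO:2"}, {"GO:2", "GO:3"}, alpha=0.5)
```

Semantic similarity measures that use the ontology's hierarchy would usually be preferred over plain set overlap; Jaccard is used only to keep the sketch self-contained.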

Gene ontology functional annotations at the structural domain level

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.22373 ◽

2009 ◽

Vol 76 (3) ◽

pp. 598-607 ◽

Cited By ~ 10

Author(s):

Daniel Lopez ◽

Florencio Pazos

Keyword(s):

Gene Ontology ◽

Structural Domain ◽

Functional Annotations ◽

Domain Level

Methods of Gene Ontology Term Similarity Analysis in Graph Database Environment

Communications in Computer and Information Science - Beyond Databases, Architectures, and Structures ◽

10.1007/978-3-319-06932-6_33 ◽

2014 ◽

pp. 345-354 ◽

Cited By ~ 1

Author(s):

Łukasz Stypka ◽

Michał Kozielski

Keyword(s):

Gene Ontology ◽

Similarity Analysis ◽

Graph Database ◽

Gene Ontology Term ◽

Ontology Term ◽

Term Similarity

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Genome Biology ◽

10.1186/s13059-019-1835-8 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 41

Author(s):

Naihui Zhou ◽

Yuxiang Jiang ◽

Timothy R. Bergquist ◽

Alexandra J. Lee ◽

Balint Z. Kacsoh ◽

...

Keyword(s):

Protein Function ◽

Functional Annotation ◽

Protein Function Prediction ◽

Mutation Screening ◽

Function Prediction ◽

Long Term Memory ◽

Functional Annotations ◽

Genome Wide ◽

New Development ◽

Working Together

Abstract. Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

Beyond gene ontology (GO): using biocuration approach to improve the gene nomenclature and functional annotation of rice S-domain kinase subfamily

PeerJ ◽

10.7717/peerj.11052 ◽

2021 ◽

Vol 9 ◽

pp. e11052

Author(s):

Sushma Naithani ◽

Daemon Dikeman ◽

Priyanka Garg ◽

Noor Al-Bader ◽

Pankaj Jaiswal

Keyword(s):

Gene Ontology ◽

Abiotic Stress ◽

Differential Expression ◽

Functional Annotation ◽

Expression Profiles ◽

Abiotic Stress Tolerance ◽

Family Members ◽

Gene Families ◽

Structural Features ◽

Response Expression

The S-domain subfamily of receptor-like kinases (SDRLKs) in plants is poorly characterized. Most members of this subfamily are currently assigned gene function based on the S-locus Receptor Kinase from Brassica that acts as the female determinant of self-incompatibility (SI). However, Brassica-like SI mechanisms do not exist in most plants. Thus, automated Gene Ontology (GO) pipelines are not sufficient for functional annotation of SDRLK subfamily members and lead to erroneous association with the GO biological process of SI. Here, we show that manual biocuration can help to correct and improve the gene annotations and association with relevant biological processes. Using publicly available genomic and transcriptome datasets, we conducted a detailed analysis of the expansion of the rice (Oryza sativa) SDRLK subfamily, the structure of individual genes and proteins, and their expression. The 144-member SDRLK family in rice consists of 82 receptor-like kinases (RLKs) (67 full-length, 15 truncated), 12 receptor-like proteins, 14 SD kinases, 26 kinase-like and 10 GnK2 domain-containing kinases and RLKs. Except for nine genes, all other SDRLK family members are transcribed in rice, but they vary in their tissue-specific and stress-response expression profiles. Furthermore, 98 genes show differential expression under biotic stress and 98 under abiotic stress conditions, with 81 genes common to both sets. Our analysis led to the identification of candidate genes likely to play important roles in plant development, pathogen resistance, and abiotic stress tolerance. We propose a nomenclature for 144 SDRLK gene family members based on gene/protein conserved structural features, gene expression profiles, and literature review.
Our biocuration approach, rooted in the principles of findability, accessibility, interoperability and reusability, sets forth an example of how manual annotation of large gene families can fill in the knowledge gap that exists due to the implementation of automated GO projections, thereby helping to improve the quality and contents of public databases.

Functional Genomics Applications in GRID

Grid and Cloud Computing ◽

10.4018/978-1-4666-0879-5.ch408 ◽

2012 ◽

pp. 899-917 ◽

Cited By ~ 2

Author(s):

Luciano Milanesi ◽

Ivan Merelli ◽

Gabriele Trombetti ◽

Paolo Cozzi ◽

Alessandro Orro

Keyword(s):

Grid Computing ◽

Data Management ◽

Functional Genomics ◽

Related Species ◽

Functional Annotation ◽

Computational Cost ◽

Functional Genomic ◽

Ongoing Task ◽

Distributed Computations ◽

Suitable Environment

A common ongoing task in Functional Genomics is to compare an organism's full genome with those of related species, to search huge databases for functional annotations of novel sequences, and to identify specific patterns in them, such as ESTs, genes, and microRNAs. Predicting these patterns carries a substantial computational cost: public genome archives exceed one billion sequence traces from over 1,000 organisms, and this number is increasing rapidly as costs decline, so powerful solutions are needed to perform efficient searches. This means that Functional Genomics applications require significant computational infrastructures, where reusable tools and resources can be accessed. In particular, grid computing seems to fulfill both the computational and data management requirements, even if porting applications to this infrastructure can be difficult. The implementation of a suitable environment for the management of distributed computations can provide a reliable advantage, reducing the gap between the requirements of the functional genomic domain and the potential of this technology.

An algorithm for generating representative functional annotations based on Gene Ontology

14th International Workshop on Database and Expert Systems Applications, 2003. Proceedings. ◽

10.1109/dexa.2003.1231990 ◽

2004 ◽

Author(s):

In-Yee Lee ◽

Jan-Ming Ho ◽

Wen-Chang Lin

Keyword(s):

Gene Ontology ◽

Functional Annotations

Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium

Briefings in Bioinformatics ◽

10.1093/bib/bbr042 ◽

2011 ◽

Vol 12 (5) ◽

pp. 449-462 ◽

Cited By ~ 218

Author(s):

P. Gaudet ◽

M. S. Livstone ◽

S. E. Lewis ◽

P. D. Thomas

Keyword(s):

Gene Ontology ◽

Functional Annotations ◽

Gene Ontology Consortium

GARFIELD - GWAS Analysis of Regulatory or Functional Information Enrichment with LD correction

10.1101/085738 ◽

2016 ◽

Cited By ~ 17

Author(s):

Valentina Iotchkova ◽

Graham R.S. Ritchie ◽

Matthias Geihs ◽

Sandro Morganella ◽

Josine L. Min ◽

...

Keyword(s):

Association Studies ◽

R Package ◽

Genome Wide Association Studies ◽

Protein Coding ◽

Functional Annotations ◽

Novel Approach ◽

Genome Wide ◽

Functional Consequences ◽

Genomic Regions ◽

Coding Variants

Loci discovered by genome-wide association studies (GWAS) predominantly map outside protein-coding genes. The interpretation of functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages GWAS findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding that current methods do not address. We further assess enrichment statistics for 27 GWAS traits within regulatory regions from the ENCODE and Roadmap projects. We characterise unique enrichment patterns for traits and annotations, driving novel biological insights. The method is implemented as standalone software and as an R package to facilitate its application by the research community.
