UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase

Author(s):  
Alistair MacDougall ◽  
Vladimir Volynkin ◽  
Rabie Saidi ◽  
Diego Poggioli ◽  
Hermann Zellner ◽  
...  

Abstract

Motivation: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge.

Results: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB.

Availability and implementation: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run them are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.
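The selection-and-propagation idea described in the abstract above can be illustrated with a minimal Python sketch. It is not the UniFIRE implementation and does not use the real UniRule rule format: the `Rule` and `Protein` classes, the `matches` and `annotate` functions, and the signature identifiers such as `SIG_A` are all hypothetical names chosen for illustration. The sketch assumes a rule fires when a protein carries every required signature and, if the rule names a taxon, that taxon appears in the protein's lineage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: required signatures plus an optional taxonomic constraint."""
    rule_id: str
    required_signatures: frozenset
    taxon: Optional[str]   # None means the rule has no taxonomic constraint
    annotations: tuple     # annotation texts to propagate when the rule matches


@dataclass
class Protein:
    """Hypothetical unreviewed record with its signature hits and lineage."""
    accession: str
    signatures: set
    lineage: list
    annotations: list = field(default_factory=list)


def matches(rule, protein):
    """Return True if the protein meets all of the rule's selection criteria."""
    has_signatures = rule.required_signatures <= protein.signatures
    in_taxon = rule.taxon is None or rule.taxon in protein.lineage
    return has_signatures and in_taxon


def annotate(proteins, rules):
    """Propagate each matching rule's annotations, tagged with the rule ID."""
    for protein in proteins:
        for rule in rules:
            if matches(rule, protein):
                protein.annotations.extend(
                    (rule.rule_id, text) for text in rule.annotations
                )
    return proteins


if __name__ == "__main__":
    # Placeholder identifiers, not real InterPro or UniRule entries.
    rule = Rule("RULE_1", frozenset({"SIG_A"}), "Bacteria", ("example function",))
    records = [
        Protein("P1", {"SIG_A", "SIG_B"}, ["Bacteria", "Firmicutes"]),
        Protein("P2", {"SIG_A"}, ["Eukaryota"]),
    ]
    for record in annotate(records, [rule]):
        print(record.accession, record.annotations)
```

Tagging each propagated annotation with its rule ID keeps the provenance of an automatic annotation traceable, which is one plausible design choice for a system of this kind.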

2017 ◽  
Author(s):  
Morgan N. Price ◽  
Adam P. Arkin

Abstract: Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources that link protein sequences to scientific articles (Swiss-Prot, GeneRIF, and EcoCyc). PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.


2020 ◽  
Vol 9 (18) ◽  
Author(s):  
Rita Zgheib ◽  
Hussein Anani ◽  
Didier Raoult ◽  
Pierre-Edouard Fournier

In 2007, Salirhabdus euzebyi was first described as a bacterial isolate from a sea salt evaporation pond. As no genome sequence was previously available for this species, we performed whole-genome sequencing. The chromosome of strain Q1438 was 3,784,443 bp long with 36% G+C content, 3,830 protein-coding genes, and 74 RNA genes.


2017 ◽  
Author(s):  
Kokulapalan Wimalanathan ◽  
Iddo Friedberg ◽  
Carson M. Andorf ◽  
Carolyn J. Lawrence-Dill

Summary: We created a new high-coverage, robust, and reproducible functional annotation of maize protein coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein coding genes, respectively, this study provides annotations for 100% of the genes. We also compared the quality of our newly-derived annotations with the existing Gramene and Phytozome functional annotation sets by comparing all three to a manually annotated gold standard set of 1,619 genes where annotations were primarily inferred from direct assay or mutant phenotype. Evaluations based on the gold standard indicate that our new annotation set is measurably more accurate than those from Phytozome and Gramene. To derive this new high-coverage, high-confidence annotation set we used sequence-similarity and protein-domain-presence methods as well as mixed-method pipelines that were developed for the Critical Assessment of Function Annotation (CAFA) challenge. Our project to improve maize annotations is called maize-GAMER (GO Annotation Method, Evaluation, and Review) and the newly-derived annotations are accessible via MaizeGDB (http://download.maizegdb.org/maize-GAMER) and CyVerse (B73 RefGen_v3 5b+ at doi: doi.org/10.7946/P2S62P and B73 RefGen_v4 Zm00001d.2 at doi: doi.org/10.7946/P2M925).


2019 ◽  
Vol 8 (31) ◽  
Author(s):  
Hussein Anani ◽  
May Khodor ◽  
Didier Raoult ◽  
Pierre-Edouard Fournier

In 2013, Olivibacter jilunii was reported as a bacterial species isolated from contaminated soil. In 2018, a clinical strain from the same species was isolated from the rectal swab of a Hajj pilgrim. Genome sequencing yielded 6,704,032 bp, with 41.2% G+C content, 5,406 protein-coding genes, and 54 predicted RNA genes in strain P8502.


2016 ◽  
Vol 4 (6) ◽  
Author(s):  
Leandro Maza Garrido ◽  
João Marcelo Pereira Alves ◽  
Liliane Santana Oliveira ◽  
Arthur Gruber ◽  
Gabriel Padilla ◽  
...  

Herein, we report a draft genome sequence of the endophytic Curtobacterium sp. strain ER1/6, which was isolated from a surface-sterilized Citrus sinensis branch and has shown the capability to control phytopathogens. Functional annotation of the ~3.4-Mb genome revealed 3,100 protein-coding genes, with many products related to known ecological and biotechnological aspects of this bacterium.


RSC Advances ◽  
2018 ◽  
Vol 8 (38) ◽  
pp. 21414-21430 ◽  
Author(s):  
Muhammad Naveed ◽  
Komal Imran ◽  
Ayesha Mushtaq ◽  
Abdul Samad Mumtaz ◽  
Hussnain A. Janjua ◽  
...  

Over the last decade, various genome sequencing projects have led to the accumulation of an enormous set of genomic data; however, numerous protein-coding genes still need to be functionally characterized.

