Profile HMMs
Recently Published Documents


Total documents: 15 (five years: 2)
H-index: 8 (five years: 0)

2021
Author(s): Jaya Srivastava, Ritu Hembrom, Ankita Kumawat, Petety V. Balaji

Together, the UniProt and BFD databases contain 2.5 billion protein sequences, the large majority of which have been annotated electronically. Compared with manual curation, automated annotation pipelines have the advantage of scale and speed but suffer from relatively higher error rates. This is because sequence homology does not necessarily imply functional homology, molecular function specification is hierarchical, and not all functional families have the same amount of experimental data available for annotation. Consequently, the annotation workflow must be customized if annotation errors are to be minimized. In this study, we illustrate possible ways of customizing the search of sequence databases for functional homologs using profile HMMs. Choosing an optimal bit score threshold is a critical step in the application of HMMs, and we illustrate, through four case studies, how such a threshold can be arrived at. The case studies are the single-domain nucleotide sugar 6-dehydrogenase and lysozyme-C families, and the SH3 and GT-A domains, which are typically found as part of multi-domain proteins. We also discuss the limitations of using profile HMMs for functional annotation and suggest some possible ways to partially overcome them.
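Choosing a bit score cut-off can be framed as a classification problem. Assuming one has bit scores (for example, from a profile HMM search) for sequences known to belong to the family and for sequences known not to, a simple approach is to scan candidate cut-offs and keep the one that maximizes a balanced metric such as the Matthews correlation coefficient (MCC). The sketch below illustrates that idea only; it is not the procedure used in the study, and the function names (`best_threshold`, `mcc`) and scores are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, MCC) for the cut-off that best separates the two sets.

    A sequence is called a family member when its bit score >= threshold.
    Candidate thresholds are the observed scores themselves.
    """
    best = (float("inf"), -1.0)
    for t in sorted(set(pos) | set(neg)):
        tp = sum(s >= t for s in pos)  # true members above the cut-off
        fn = len(pos) - tp
        fp = sum(s >= t for s in neg)  # non-members wrongly above the cut-off
        tn = len(neg) - fp
        score = mcc(tp, fp, tn, fn)
        if score > best[1]:  # strict '>' keeps the lowest cut-off among ties
            best = (t, score)
    return best


# Hypothetical bit scores for curated positive and negative sequences.
positives = [310.2, 295.7, 280.1, 150.4]
negatives = [140.0, 95.3, 60.8, 20.1]
threshold, quality = best_threshold(positives, negatives)
print(threshold, round(quality, 3))  # → 150.4 1.0
```

Ties are broken in favor of the lowest cut-off; a more conservative choice would place the threshold between the highest-scoring negative and the lowest-scoring positive.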


2019, Vol 35 (19), pp. 3829-3830
Author(s): Shaun P Wilkinson

Summary: Hidden Markov models (HMMs) and profile HMMs form an integral part of biological sequence analysis, supporting an ever-growing list of applications. The aphid R package can be used to derive, train, plot, import and export HMMs and profile HMMs in the R environment. Computationally intensive dynamic programming recursions, such as the Viterbi, forward and backward algorithms, are implemented in C++ and parallelized for increased speed and efficiency.

Availability and implementation: The aphid package is released under the GPL-3 license and is freely available for download from CRAN and GitHub (https://github.com/shaunpwilkinson/aphid).

Supplementary information: Supplementary data are available at Bioinformatics online.
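The dynamic programming recursions named above can be illustrated outside R. The sketch below implements the Viterbi and forward algorithms in log space in Python for a plain (non-profile) HMM with hypothetical toy parameters (a fair/loaded die); it is not aphid's API or its C++ implementation. A profile HMM adds match, insert and delete states for each alignment column, but the recursions keep the same shape. The backward algorithm mirrors the forward one, running from the end of the sequence, and is omitted for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical two-state HMM: a fair die (F) and a loaded die (L) that favors 6.
STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {str(i): 1 / 6 for i in range(1, 7)},
    "L": {**{str(i): 0.1 for i in range(1, 6)}, "6": 0.5},
}


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def viterbi(obs: str) -> Tuple[List[str], float]:
    """Most probable state path for obs and its log probability."""
    v = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back: List[Dict[str, str]] = []  # back[i][s]: best predecessor of s at position i + 1
    for x in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: v[-1][r] + math.log(TRANS[r][s]))
            col[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][x])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):  # trace back from the final position
        path.append(ptr[path[-1]])
    return path[::-1], v[-1][last]


def forward(obs: str) -> float:
    """Log probability of obs, summed over all state paths."""
    f = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    for x in obs[1:]:
        f = {
            s: logsumexp([f[r] + math.log(TRANS[r][s]) for r in STATES])
            + math.log(EMIT[s][x])
            for s in STATES
        }
    return logsumexp(list(f.values()))


rolls = "666666666666"
path, logp = viterbi(rolls)
print("".join(path), logp <= forward(rolls))  # → LLLLLLLLLLLL True
```

Because Viterbi keeps only the best path while the forward algorithm sums over all paths, the Viterbi log probability can never exceed the forward log probability.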


2016, Vol 7
Author(s): João M. P. Alves, André L. de Oliveira, Tatiana O. M. Sandberg, Jaime L. Moreno-Gallego, Marcelo A. F. de Toledo, ...

2016, Vol 42, pp. 109-119
Author(s): Wenwen Ding, Kai Liu, Xujia Fu, Fei Cheng

PLoS ONE, 2014, Vol 9 (1), pp. e84558
Author(s): Sandra Van der Auwera, Ingo Bulla, Mario Ziller, Anne Pohlmann, Timm Harder, ...

2013, Vol 29 (19), pp. 2487-2489
Author(s): T. J. Wheeler, S. R. Eddy

2011, Vol 2011, pp. 1-12
Author(s): Dhwani K. Desai, Soumyadeep Nandi, Prashant K. Srivastava, Andrew M. Lynn

Various enzyme identification protocols involving homology transfer by sequence-sequence or profile-sequence comparisons have been devised that use Swiss-Prot sequences associated with EC numbers as the training set. A profile HMM constructed for a particular EC number might select sequences that perform a different enzymatic function, because certain fold-specific residues are conserved across enzymes sharing a common fold. We describe a protocol, ModEnzA (HMM-ModE Enzyme Annotation), which generates profile HMMs that are highly specific at the functional level defined by EC numbers by incorporating information from negative training sequences. We enrich the training dataset by mining sequences from the NCBI Non-Redundant database for increased sensitivity. We compare our method with other enzyme identification methods, both for assigning EC numbers to a genome and for identifying protein sequences associated with an enzymatic activity. We report a sensitivity of 88% and a specificity of 95% in identifying EC numbers and annotating enzymatic sequences from the E. coli genome, which is higher than that of any of the other methods compared. With next-generation sequencing methods producing huge amounts of sequence data, fully automated yet accurate protocols such as ModEnzA are needed for the rapid annotation of newly sequenced genomes and metagenomic sequences.
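Sensitivity and specificity figures like those quoted above follow the usual definitions, TP / (TP + FN) and TN / (TN + FP). The sketch below computes them for a single EC number, assuming a benchmark in which each sequence has at most one true EC number (real proteins may carry several). The function name and data are hypothetical, and this is not ModEnzA's evaluation code.

```python
from typing import Dict, Optional, Tuple


def ec_sensitivity_specificity(
    truth: Dict[str, Optional[str]],
    predicted: Dict[str, Optional[str]],
    ec: str,
) -> Tuple[float, float]:
    """Sensitivity and specificity of predictions for one EC number.

    Both dicts map a sequence ID to one EC number, or None for a
    non-enzyme. Sequences missing from `predicted` count as not predicted.
    """
    tp = fp = tn = fn = 0
    for seq_id, true_ec in truth.items():
        actual = true_ec == ec
        called = predicted.get(seq_id) == ec
        if actual and called:
            tp += 1
        elif actual:
            fn += 1  # true member the method missed
        elif called:
            fp += 1  # non-member the method wrongly annotated
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical benchmark: true vs. predicted EC numbers for five proteins.
truth = {"p1": "1.1.1.22", "p2": "1.1.1.22", "p3": "2.7.1.1", "p4": None, "p5": "1.1.1.22"}
predicted = {"p1": "1.1.1.22", "p2": "1.1.1.22", "p3": "1.1.1.22", "p4": None}
sens, spec = ec_sensitivity_specificity(truth, predicted, "1.1.1.22")
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # → sensitivity=0.67 specificity=0.50
```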


BMC Genomics, 2010, Vol 11 (Suppl 1), pp. S10
Author(s): Paul T Edlefsen, Jun S Liu
