Improving the consistency of domain annotation within the Conserved Domain Database

CDD: specific functional annotation with the Conserved Domain Database

Nucleic Acids Research ◽

10.1093/nar/gkn845 ◽

2009 ◽

Vol 37 (Database) ◽

pp. D205-D210 ◽

Cited By ~ 761

Author(s):

A. Marchler-Bauer ◽

J. B. Anderson ◽

F. Chitsaz ◽

M. K. Derbyshire ◽

C. DeWeese-Scott ◽

...

Keyword(s):

Functional Annotation ◽

Conserved Domain ◽

Domain Database

CDD: a Conserved Domain Database for the functional annotation of proteins

Nucleic Acids Research ◽

10.1093/nar/gkq1189 ◽

2010 ◽

Vol 39 (Database) ◽

pp. D225-D229 ◽

Cited By ~ 1921

Author(s):

A. Marchler-Bauer ◽

S. Lu ◽

J. B. Anderson ◽

F. Chitsaz ◽

M. K. Derbyshire ◽

...

Keyword(s):

Functional Annotation ◽

Conserved Domain ◽

Domain Database

NCBI's Conserved Domain Database and Tools for Protein Domain Analysis

Current Protocols in Bioinformatics ◽

10.1002/cpbi.90 ◽

2019 ◽

Vol 69 (1) ◽

Cited By ~ 4

Author(s):

Mingzhang Yang ◽

Myra K. Derbyshire ◽

Roxanne A. Yamashita ◽

Aron Marchler‐Bauer

Keyword(s):

Domain Analysis ◽

Protein Domain ◽

Conserved Domain ◽

Domain Database

S-Glutathionyl-(chloro)hydroquinone reductases: a novel class of glutathione transferases

Biochemical Journal ◽

10.1042/bj20091863 ◽

2010 ◽

Vol 428 (3) ◽

pp. 419-427 ◽

Cited By ~ 31

Author(s):

Luying Xun ◽

Sara M. Belchik ◽

Randy Xun ◽

Yan Huang ◽

Huina Zhou ◽

...

Keyword(s):

Disulfide Bond ◽

Distinct Group ◽

Degradation Pathway ◽

Glutathione Transferases ◽

Amino Acid Residues ◽

Tblastn Search ◽

Conserved Domain ◽

Ping Pong ◽

Sphingobium Chlorophenolicum ◽

Domain Database

Sphingobium chlorophenolicum completely mineralizes PCP (pentachlorophenol). Two GSTs (glutathione transferases), PcpC and PcpF, are involved in the degradation. PcpC uses GSH to reduce TeCH (tetrachloro-p-hydroquinone) to TriCH (trichloro-p-hydroquinone) and then to DiCH (dichloro-p-hydroquinone) during PCP degradation. However, oxidatively damaged PcpC produces GS-TriCH (S-glutathionyl-TriCH) and GS-DiCH (S-glutathionyl-DiCH) conjugates. PcpF converts the conjugates into TriCH and DiCH, which re-enter the degradation pathway. PcpF was further characterized in the present study. It catalysed GSH-dependent reduction of GS-TriCH via a Ping Pong mechanism. First, PcpF reacted with GS-TriCH to release TriCH and formed a disulfide bond between its Cys53 residue and the GS moiety. Then, a GSH came in to regenerate PcpF and release GS–SG. A TBLASTN search revealed that PcpF homologues were widely distributed in bacteria, halobacteria (archaea), fungi and plants, and they belonged to ECM4 (extracellular mutant 4) group COG0435 in the conserved domain database. Phylogenetic analysis grouped PcpF and homologues into a distinct group, separated from Omega class GSTs. The two groups shared conserved amino acid residues for GSH binding, but had different residues for the binding of the second substrate. Several recombinant PcpF homologues and two human Omega class GSTs were produced in Escherichia coli and purified. They had zero or low activities for transferring GSH to standard substrates, but all had reasonable activities for GSH-dependent reduction of disulfide bonds (thiol transfer), dehydroascorbate and dimethylarsinate. All the tested PcpF homologues reduced GS-TriCH, but the two Omega class GSTs did not. Thus PcpF homologues were tentatively named S-glutathionyl-(chloro)hydroquinone reductases for catalysing the GSH-dependent reduction of GS-TriCH.

Protein Domain Annotations of the SARS-CoV-2 Proteomics as a Blue-Print for Mapping the Features for Drug and Vaccine Designs

Jurnal Matematika dan Sains ◽

10.5614/jms.2020.25.2.1 ◽

2020 ◽

Vol 25 (2) ◽

pp. 26-32

Author(s):

Arli Parikesit ◽

Keyword(s):

Drug Design ◽

Vaccine Design ◽

Causal Agent ◽

Protein Domain ◽

Potential Drug ◽

Potential Target ◽

Conserved Domain ◽

Drug Lead ◽

Blue Print ◽

Domain Database

SARS-CoV-2, the causal agent of the COVID-19 pandemic, remains an enigma in the bioinformatics sense. Current efforts in drug and vaccine design primarily target generally devised protein domains while overlooking the specific features in the proteomics repertoire. However, the NCBI Conserved Domain Database (CDD) can annotate the specific features that are indispensable for more advanced drug and vaccine design. In this regard, the annotation efforts were initiated with the CDD and visualized with the Cn3D 3D protein visualizer. The ATP- and ADP-binding proteins, with their respective domains, were found to be promising targets for drug design. It is recommended that a nucleoside inhibitor that mimics the ATP molecule could serve as a potential drug lead against SARS-CoV-2.

CDD: NCBI's conserved domain database

Nucleic Acids Research ◽

10.1093/nar/gku1221 ◽

2014 ◽

Vol 43 (D1) ◽

pp. D222-D226 ◽

Cited By ~ 1839

Author(s):

Aron Marchler-Bauer ◽

Myra K. Derbyshire ◽

Noreen R. Gonzales ◽

Shennan Lu ◽

Farideh Chitsaz ◽

...

Keyword(s):

Conserved Domain ◽

Domain Database

Protein domain hierarchy Gibbs sampling strategies

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2014-0008 ◽

2014 ◽

Vol 13 (4) ◽

Cited By ~ 7

Author(s):

Andrew F. Neuwald

Keyword(s):

Gibbs Sampling ◽

Sequence Alignment ◽

Gibbs Sampler ◽

Multiple Sequence Alignment ◽

Protein Domain ◽

Sampling Strategies ◽

Multiple Sequence ◽

Manual Curation ◽

Conserved Domain ◽

Domain Database

Hierarchically arranged multiple sequence alignment profiles are useful for modeling protein domains that have functionally diverged into evolutionarily related subgroups. Currently such alignment hierarchies are largely constructed through manual curation, as for the NCBI Conserved Domain Database (CDD). Recently, however, I developed a Gibbs sampler that uses an approach termed

PubMed Text Similarity Model and its application to curation efforts in the Conserved Domain Database

Database ◽

10.1093/database/baz064 ◽

2019 ◽

Vol 2019 ◽

Cited By ~ 2

Author(s):

Rezarta Islamaj ◽

W John Wilbur ◽

Natalie Xie ◽

Noreen R Gonzales ◽

Narmada Thanki ◽

...

Keyword(s):

Pearson Correlation ◽

Information Access ◽

Specific Protein ◽

Reference List ◽

Text Similarity ◽

Multiple Sequence ◽

Conserved Domain ◽

Similarity Model ◽

And Function ◽

Domain Database

This study proposes a text similarity model to help biocuration efforts of the Conserved Domain Database (CDD). CDD is a curated resource that catalogs annotated multiple sequence alignment models for ancient domains and full-length proteins. These models allow for fast searching and quick identification of conserved motifs in protein sequences via Reverse PSI-BLAST. In addition, CDD curators prepare summaries detailing the function of these conserved domains and specific protein families, based on published peer-reviewed articles. To facilitate information access for database users, it is desirable to specifically identify the referenced articles that support the assertions of curator-composed sentences. Moreover, CDD curators desire an alert system that scans the newly published literature and proposes related articles of relevance to the existing CDD records. Our approach to address these needs is a text similarity method that automatically maps a curator-written statement to candidate sentences extracted from the list of referenced articles, as well as the articles in the PubMed Central database. To evaluate this proposal, we paired CDD description sentences with the top 10 matching sentences from the literature, which were given to curators for review. Through this exercise, we discovered that we were able to map the articles in the reference list to the CDD description statements with an accuracy of 77%. In the dataset that was reviewed by curators, we were able to successfully provide references for 86% of the curator statements. In addition, we suggested new articles for curator review, which were accepted by curators to be added into the reference list at an acceptance rate of 50%. Through this process, we developed a substantial corpus of similar sentences from biomedical articles on protein sequence, structure and function research, which constitute the CDD text similarity corpus.
This corpus contains 5159 sentence pairs judged for their similarity on a scale from 1 (low) to 5 (high) doubly annotated by four CDD curators. Curator-assigned similarity scores have a Pearson correlation coefficient of 0.70 and an inter-annotator agreement of 85%. To date, this is the largest biomedical text similarity resource that has been manually judged, evaluated and made publicly available to the community to foster research and development of text similarity algorithms.
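
As an illustration of the statistics quoted in this abstract, the following is a minimal sketch of how a Pearson correlation coefficient between two annotators' 1-to-5 similarity scores could be computed. The scores are hypothetical and are not taken from the CDD corpus. The helper `agreement_within` and its one-point tolerance are assumptions made for illustration only; the abstract does not say how its inter-annotator agreement figure was defined.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def agreement_within(xs, ys, tolerance=1):
    """Fraction of pairs whose two scores differ by at most `tolerance`.

    Hypothetical agreement measure; the abstract does not define its own.
    """
    close = sum(1 for x, y in zip(xs, ys) if abs(x - y) <= tolerance)
    return close / len(xs)


# Hypothetical scores (1 = low, 5 = high) from two annotators; not CDD data.
annotator_a = [1, 2, 3, 4, 5, 3, 2, 4]
annotator_b = [1, 3, 3, 5, 4, 1, 2, 4]

r = pearson(annotator_a, annotator_b)
agree = agreement_within(annotator_a, annotator_b)
print(f"Pearson r = {r:.2f}, agreement within 1 point = {agree:.0%}")
```

The correlation is written out by hand so that the sketch depends only on the standard library.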

CDD: a conserved domain database for interactive domain family analysis

Nucleic Acids Research ◽

10.1093/nar/gkl951 ◽

2007 ◽

Vol 35 (Database) ◽

pp. D237-D240 ◽

Cited By ~ 560

Author(s):

A. Marchler-Bauer ◽

J. B. Anderson ◽

M. K. Derbyshire ◽

C. DeWeese-Scott ◽

N. R. Gonzales ◽

...

Keyword(s):

Domain Family ◽

Conserved Domain ◽

Family Analysis ◽

Domain Database

CDD: a Conserved Domain Database for protein classification

Nucleic Acids Research ◽

10.1093/nar/gki069 ◽

2004 ◽

Vol 33 (Database issue) ◽

pp. D192-D196 ◽

Cited By ~ 620

Author(s):

A. Marchler-Bauer

Keyword(s):

Protein Classification ◽

Conserved Domain ◽

Domain Database
