TIAMMAt: Leveraging biodiversity to revise protein domain models, evidence from innate immunity

Molecular Biology and Evolution, 2021
DOI: 10.1093/molbev/msab258
Author(s): Michael G Tassia, Kyle T David, James P Townsend, Kenneth M Halanych
Keyword(s): Innate Immunity, Consensus Sequence, Protein Domain, Homologous Sequence, Valuable Insight, Sequence Evolution, Protein Families, Model Species, Domain Profile, Domain Models

Abstract Sequence annotation is fundamental for studying the evolution of protein families, particularly when working with non-model species. Given the rapidly increasing number of species with high-quality genome sequences, accurate domain modeling that is representative of species diversity is crucial for understanding the sequence evolution of protein families and their inferred function(s). Here, we describe a bioinformatic tool called TIAMMAt (Taxon-Informed Adjustment of Markov Model Attributes) which revises domain profile hidden Markov models (HMMs) by incorporating homologous domain sequences from underrepresented and non-model species. Using innate immunity pathways as a case study, we show that revising profile HMM parameters to directly account for variation in homologs among underrepresented species provides valuable insight into the evolution of protein families. Following adjustment by TIAMMAt, domain profile HMMs exhibit changes in their per-site amino acid state emission probabilities and insertion/deletion probabilities while maintaining the overall structure of the consensus sequence. Our results show that domain revision can heavily impact evolutionary interpretations for some families (i.e., NLR’s NACHT domain), whereas impact on other domains (e.g., rel homology domain and interferon regulatory factor domains) is minimal due to high levels of sequence conservation across the sampled phylogenetic depth (i.e., Metazoa). Importantly, TIAMMAt revises target domain models to reflect homologous sequence variation using the taxonomic distribution under consideration by the user. TIAMMAt’s flexibility to revise any subset of the Pfam database using a user-defined taxonomic pool will make it a valuable tool for future protein evolution studies, particularly when incorporating (or focusing on) non-model species.

Galectins in teleost fish: Zebrafish (Danio rerio) as a model species to address their biological roles in development and innate immunity

Glycoconjugate Journal, 2004, Vol 21 (8-9), pp. 503-521
DOI: 10.1007/s10719-004-5541-7
Cited by: ~69
Author(s): Gerardo R. Vasta, Hafiz Ahmed, Shao-J. Du, Davin Henrikson
Keyword(s): Innate Immunity, Danio Rerio, Teleost Fish, Model Species

Fundamental Characteristics of Bat Interferon Systems

Frontiers in Cellular and Infection Microbiology, 2020, Vol 10
DOI: 10.3389/fcimb.2020.527921
Author(s): Emily Clayton, Muhammad Munir
Keyword(s): Innate Immunity, Immune System, Model Species, Novel Genes, Highly Pathogenic, Essential Component, Biological Features, Zoonotic Viruses, Recent Developments, Flying Fox

Interferons are an essential component of the innate arm of the immune system and are arguably one of the most important lines of defence against viruses. The human IFN system and its functionality have already been largely characterized and studied in detail. However, the IFN systems of bats had been only marginally examined until recent developments in the Bat1k project, which opened new research opportunities by identifying novel genes in six new bat genomes that are likely associated with the viral tolerance exhibited by bats. Interestingly, bats have been hypothesized to possess the ability to establish a host-virus relationship in which, despite being infected, they exhibit limited signs of disease and still retain the ability to transmit the disease to other susceptible hosts. Bats are one of the most abundant and widespread vertebrates on the planet and host many zoonotic viruses that are highly pathogenic to humans. Several genomic, immunological, and biological features are thought to underlie novel antiviral mechanisms of bats. This review aims to explore the bat IFN system and developments in its diverse IFN features, focusing mainly on the model species, the Australian black flying fox (Pteropus alecto), while also highlighting bat innate immunity as an exciting and fruitful area of research to understand their ability to control viral-mediated pathogenesis.

Faculty Opinions recommendation of System wide analysis of the evolution of innate immunity in the nematode model species Caenorhabditis elegans and Pristionchus pacificus.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2012
DOI: 10.3410/f.717959303.793462703
Author(s): Sebastian Fugmann
Keyword(s): Innate Immunity, Caenorhabditis Elegans, Model Species, Pristionchus Pacificus

Faster rates of molecular sequence evolution in reproduction-related genes and in species with hypodermic sperm morphologies

DOI: 10.1101/2021.08.16.456242
Year: 2021
Author(s): R Axel W Wiberg, Jeremias N Brand, Lukas Schaerer
Keyword(s): Sexual Selection, Sequence Evolution, Model Species, Free Living, Molecular Sequence, Functional Annotations, Genome Wide, Large Genus, Signatures Of Selection, Efficiency Of Selection

Sexual selection is expected to drive the evolution of many striking behaviours and morphologies, leaving signatures of selection at loci underlying these phenotypes. However, relatively few studies have contrasted molecular sequence evolution at such loci across lineages that differ in their sexual selection context. Our comparative genomics study of Macrostomum, a large genus of free-living simultaneously hermaphroditic flatworms, takes advantage of functional annotations from the model species, M. lignano, and transcriptome assemblies of 97 congeners. We compare molecular sequence evolution in species with contrasting sperm morphologies, which are strongly associated with multiple convergent shifts in the mating strategy and thus reflect the sexual selection context in Macrostomum. The sperm of most reciprocally mating species carry lateral bristles, likely functioning as anchoring mechanisms against post-copulatory sperm removal. Hypodermically mating species lack these bristles, potentially as an adaptation to the different environment experienced by hypodermic sperm. We document faster molecular sequence evolution in reproduction-related genes than in ubiquitously expressed genes across all sperm morphologies, consistent with more intense selection acting on the former. Furthermore, we observed faster molecular sequence evolution in species with hypodermic sperm morphologies, in both reproduction-related and ubiquitously expressed genes. These genome-wide patterns suggest that shifts to hypodermic mating reduce the efficiency of selection, possibly due to higher selfing rates in hypodermically mating species. Moreover, we find little evidence for convergent amino acid changes across species. We provide the first comprehensive comparative analysis of molecular sequence evolution in a group of simultaneously hermaphroditic animals, across well-replicated contrasts of lineages with divergent sperm morphologies.

THE JANUS-FACED E-VALUES OF HMMER2: EXTREME VALUE DISTRIBUTION OR LOGISTIC FUNCTION?

Journal of Bioinformatics and Computational Biology, 2011, Vol 09 (01), pp. 179-206
DOI: 10.1142/s0219720011005264
Cited by: ~7
Author(s): Wing-Cheong Wong, Sebastian Maurer-Stroh, Frank Eisenhaber
Keyword(s): Query Sequence, Extreme Value Distribution, Logistic Function, Extreme Value, Value Distribution, Domain Model, Protein Domain, Global Mode, Ongoing Research, Domain Models

E-value guided extrapolation of protein domain annotation from libraries such as Pfam with the HMMER suite is indispensable for hypothesizing about the function of experimentally uncharacterized protein sequences. Since the recent release of HMMER3 does not supersede all functions of HMMER2, the latter will remain relevant for ongoing research as well as for the evaluation of annotations that reside in databases and in the literature. In HMMER2, the E-value is computed from the score via a logistic function or via a domain model-specific extreme value distribution (EVD); the lower of the two is returned as the E-value for the domain hit in the query sequence. We find that, for thousands of domain models, this treatment results in switching from the EVD to the statistical model with the logistic function when scores grow (for Pfam release 23, 99% in the global mode and 75% in the fragment mode). If the score corresponding to the breakpoint results in an E-value above a user-defined threshold (e.g. 0.1), a critical score region with conflicting E-values from the logistic function (below the threshold) and from the EVD (above the threshold) does exist. Thus, this switch will affect E-value guided annotation decisions in an automated mode. To emphasize, switching in the fragment mode is of no practical relevance since it occurs only at E-values far below 0.1. Unfortunately, a critical score region does exist for 185 domain models in the hmmpfam and 1,748 domain models in the hmmsearch global-search mode. For 145 out of the respective 185 models, the critical score region is indeed populated by actual sequences. In total, 24.4% of their hits have a logistic function-derived E-value < 0.1 when the EVD provides an E-value > 0.1. We provide examples of false annotations and critically discuss the appropriateness of a logistic function as an alternative to the EVD.
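The "lower of the two" rule described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the two P-value expressions below are assumptions based on the commonly described HMMER2 conventions: a logistic bound 1 / (1 + 2^score) for a bit score, and a Gumbel-type EVD tail with location `mu` and scale `lam`. The parameter values in the usage note are hypothetical and are not taken from any Pfam model.

```python
import math


def logistic_pvalue(score: float) -> float:
    """Assumed logistic bound on the P-value for a bit score: 1 / (1 + 2**score)."""
    if score >= 0:
        # Rewrite as 2**-s / (1 + 2**-s) so large scores underflow to 0 instead of overflowing.
        t = 2.0 ** (-score)
        return t / (1.0 + t)
    return 1.0 / (1.0 + 2.0 ** score)


def evd_pvalue(score: float, mu: float, lam: float) -> float:
    """Assumed Gumbel-type tail: P(S > score) = 1 - exp(-exp(-lam * (score - mu)))."""
    x = -lam * (score - mu)
    if x > 700.0:
        # exp(x) would overflow; the tail probability is 1 to machine precision.
        return 1.0
    return -math.expm1(-math.exp(x))


def lower_of_two_evalue(score: float, mu: float, lam: float, n_seqs: int):
    """Return (E-value, name of the model that supplied it), taking the lower P-value."""
    p_log = logistic_pvalue(score)
    p_evd = evd_pvalue(score, mu, lam)
    if p_evd < p_log:
        return n_seqs * p_evd, "evd"
    return n_seqs * p_log, "logistic"
```

With the hypothetical parameters `mu = -10.0` and `lam = 0.5`, a score of 10 bits takes its E-value from the EVD, while a score of 40 bits takes it from the logistic function. Under these assumed formulas, such a switch at growing scores can occur whenever `lam` is smaller than ln 2, because the logistic bound then decays faster than the EVD tail.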

Viral Sequence Evolution in Acute Hepatitis C Virus Infection

Journal of Virology, 2007, Vol 81 (21), pp. 11658-11668
DOI: 10.1128/jvi.00995-07
Cited by: ~73
Author(s): Thomas Kuntzen, Joerg Timm, Andrew Berical, Lia L. Lewis-Ximenez, Andrea Jones, ...
Keyword(s): T Cells, Hepatitis C Virus, Hepatitis C, T Cell, Consensus Sequence, T Cell Responses, Hcv Infection, Sequence Evolution, Cell Responses, Immunodeficiency Virus

ABSTRACT CD8+-T-cell responses play an important role in the containment and clearance of hepatitis C virus (HCV) infection, and an association between viral persistence and development of viral escape mutations has been postulated. While escape from CD8+-T-cell responses has been identified as a major driving force for the evolution of human immunodeficiency virus (HIV) and simian immunodeficiency virus (SIV), a broader characterization of this relationship is needed in HCV infection. To determine the extent, kinetics, and driving forces of HCV sequence evolution, we sequenced the entire HCV genome longitudinally in four subjects monitored for up to 30 months after acute infection. For two subjects the transmission sources were also available. Of 53 total nonenvelope amino acid substitutions detected, a majority represented forward mutations away from the consensus sequence. In contrast to studies in HIV and SIV, however, only 11% of these were associated with detectable CD8+ T-cell responses. Interestingly, 19% of nonenvelope mutations represented changes toward the consensus sequence, suggesting reversion in the absence of immune pressure upon transmission. Notably, the rate of evolution of forward and reverse mutations correlated with the conservation of each residue, which is indicative of structural constraints influencing the kinetics of viral evolution. Finally, the rate of sequence evolution was observed to decline over the course of infection, possibly reflective of diminishing selection pressure by dysfunctional CD8+ T cells. Taken together, these data provide insight into the extent to which HCV is capable of evading early CD8+ T-cell responses and support the hypothesis that dysfunction of CD8+ T cells may be associated with failure to resolve HCV infections.

Application of genetic semihomology algorithm to theoretical studies on various protein families.

Acta Biochimica Polonica, 2001, Vol 48 (1), pp. 21-33
DOI: 10.18388/abp.2001_5109
Cited by: ~5
Author(s): J Leluk, B Hanus-Lorenz, A F Sikorski
Keyword(s): Consensus Sequence, High Accuracy, Theoretical Studies, Protein Families, Optimal Sequence, Multiple Alignments, Low Degree, Correct Alignment, Statistical Approaches, Related Proteins

Several protein families of different nature were studied for genetic relationship, correct alignment at non-homologous fragments, optimal sequence consensus construction, and confirmation of their actual relevance. A comparison of the genetic semihomology approach with statistical approaches indicates a high accuracy and cognitive significance of the former. This is particularly pronounced in the study of related proteins that show a low degree of homology. The multiple sequence alignments were verified and corrected with respect to the questionable, non-homologous fragments. The verified alignments were the basis for consensus sequence formation. The frequency of occurrence of six-codon amino acids versus position variability was studied, and their possible role in amino acid mutational exchange at variable positions is discussed.

Consensus sequence design as a general strategy to create hyperstable, biologically active proteins

DOI: 10.1101/466391
Year: 2018
Cited by: ~1
Author(s): Matt Sternke, Katherine W. Tripp, Doug Barrick
Keyword(s): Biological Activity, High Stability, Biological Activities, Consensus Sequence, Biologically Active, General Strategy, Protein Families, Sequence Design, Consensus Sequences, Naturally Occurring

Abstract Consensus sequence design offers a promising strategy for designing proteins of high stability while retaining biological activity since it draws upon an evolutionary history in which residues important for both stability and function are likely to be conserved. Although there have been several reports of successful consensus design of individual targets, it is unclear from these anecdotal studies how often this approach succeeds, and how often it fails. Here, we attempt to assess generality by designing consensus sequences for a set of six protein families with a range of chain-lengths, structures, and activities. We characterize the resulting consensus proteins for stability, structure, and biological activities in an unbiased way. We find that all six consensus proteins adopt cooperatively folded structures in solution. Strikingly, four out of six of these consensus proteins show increased thermodynamic stability over naturally-occurring homologues. Each consensus protein tested for function maintained at least partial biological activity. Though peptide binding affinity by a consensus-designed SH3 is rather low, Km values for consensus enzymes are similar to values from extant homologues. Though consensus enzymes are slower than extant homologues at low temperature, they are faster than some thermophilic enzymes at high temperature. An analysis of sequence properties shows consensus proteins to be enriched in charged residues, and rarified in uncharged polar residues. Sequence differences between consensus and extant homologues are predominantly located at weakly conserved surface residues, highlighting the importance of these residues in the success of the consensus strategy.

Significance Statement A major goal of protein design is to create proteins that have high stability and biological activity. Drawing on evolutionary information encoded within extant protein sequences, consensus sequence design has produced several successes in achieving this goal. Here we explore the generality with which consensus design can be used to enhance protein stability and maintain biological activity. By designing and characterizing consensus sequences for six unrelated protein families, we find that consensus design shows high success rates in creating well-folded, hyperstable proteins that retain biological activities. Remarkably, many of these consensus proteins show higher stabilities than naturally-occurring sequences of their respective protein families. Our study highlights the utility of consensus sequence design and informs the mechanisms by which it works.
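As a rough illustration of what a consensus sequence is, the following Python sketch derives one from a small multiple sequence alignment by majority rule. The abstract does not describe the authors' actual procedure, so the gap handling (dropping columns where gaps are a strict majority) and the alphabetical tie-breaking are assumptions made for this example, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned: list[str], gap: str = "-") -> str:
    """Return the majority-rule consensus of equal-length aligned sequences.

    Assumptions for this sketch: a column is dropped when gaps form a strict
    majority, and ties between residues are broken alphabetically.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(column)
        gap_count = counts.pop(gap, 0)
        # Skip columns that are mostly gaps (or contain nothing but gaps).
        if not counts or gap_count * 2 > len(column):
            continue
        # Sorting first makes max() return the alphabetically first residue on ties.
        residue, _ = max(sorted(counts.items()), key=lambda item: item[1])
        consensus.append(residue)
    return "".join(consensus)
```

For the toy alignment `["MKV-L", "MRV-L", "MKVAL", "MKI-L"]`, the fourth column is mostly gaps and is dropped, and the remaining columns yield the most frequent residue at each position.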

CDD/SPARCLE: the conserved domain database in 2020

Nucleic Acids Research, 2019, Vol 48 (D1), pp. D265-D268
DOI: 10.1093/nar/gkz991
Cited by: ~159
Author(s): Shennan Lu, Jiyao Wang, Farideh Chitsaz, Myra K Derbyshire, Renata C Geer, ...
Keyword(s): Protein Domain, Molecular Function, Protein Families, Bacterial Proteins, Protein Architecture, Conserved Domain, Single Protein, Domain Database, User Queries

Abstract As NLM’s Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research. CDD offers an archive of pre-computed domain annotations as well as live search services for single protein or nucleotide queries and for larger sets of protein query sequences. CDD staff has continued to characterize protein families via conserved domain architectures and has built up a significant corpus of curated domain architectures in support of naming bacterial proteins in RefSeq. These architecture definitions are available via SPARCLE, the Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

Transcriptomic analysis in the leech Theromyzon tessulatum: involvement of cystatin B in innate immunity

Biochemical Journal, 2004, Vol 380 (3), pp. 617-625
DOI: 10.1042/bj20040478
Cited by: ~19
Author(s): Christophe LEFEBVRE, Claude COCQUERELLE, Franck VANDENBULCKE, David HOT, Ludovic HUOT, ...
Keyword(s): Immune Response, Innate Immunity, Cysteine Proteinase, Consensus Sequence, Structural Features, Cysteine Proteinase Inhibitor, Bacterial Challenge, Threonine Deaminase, Cystatin B, Coelomic Cells

At the present time, there is little information on mechanisms of innate immunity in invertebrate groups other than insects, especially annelids. In the present study, we have performed a transcriptomic study of the immune response in the leech Theromyzon tessulatum after bacterial challenge, by a combination of differential display RT (reverse transcriptase)–PCR and cDNA microarrays. The results show relevant modulations concerning several known and unknown genes. Indeed, threonine deaminase, malate dehydrogenase, cystatin B, polyadenylate-binding protein and α-tubulin-like genes are up-regulated after immunostimulation. We focused on cystatin B (stefin B), which is an inhibitor of cysteine proteinases involved in the vertebrate immune response. We have cloned the full-length cDNA and named the T. tessulatum gene as Tt-cysb. Main structural features of cystatins were identified in the derived amino acid sequence of Tt-cysb cDNA; namely, a glycine residue in the N-terminus and a consensus sequence of Gln-Xaa-Val-Xaa-Gly (QXVXG) corresponding to the catalytic site. Moreover, Tt-cysb is the first cystatin B gene characterized in invertebrates. We have determined by in situ hybridization and immunocytochemistry that Tt-cysb is only expressed in large coelomic cells. In addition, this analysis confirmed that Tt-cysb is up-regulated after bacterial challenge, and that increased expression occurs only in coelomic cells. These data demonstrate that the innate immune response in the leech involves a cysteine proteinase inhibitor that is not found in ecdysozoan models, such as Drosophila melanogaster or Caenorhabditis elegans, and so underlines the great need for information about innate immunity mechanisms in different invertebrate groups.