Are Nonsynonymous Transversions Generally More Deleterious than Nonsynonymous Transitions?

Zhengting Zou; Jianzhi Zhang

doi:10.1093/molbev/msaa200

Molecular Biology and Evolution ◽

10.1093/molbev/msaa200 ◽

2020 ◽

Vol 38 (1) ◽

pp. 181-191

Keyword(s):

Amino Acid ◽

Dna Sequences ◽

Sequence Evolution ◽

Codon Model ◽

Protein Coding ◽

Fitness Effects ◽

Genome Wide ◽

Species Pairs ◽

Species Specific ◽

Evolutionary Lineages

Abstract It has been suggested that, due to the structure of the genetic code, nonsynonymous transitions are less likely than transversions to cause radical changes in amino acid physicochemical properties and are therefore, on average, less deleterious. This view was supported by some but not all mutagenesis experiments. Because laboratory measures of fitness effects have limited sensitivity, and the relative frequencies of different mutations in mutagenesis studies may not match those in nature, we revisit this issue here using comparative genomics. We extend the standard codon model of sequence evolution by adding a parameter, η, that quantifies the ratio of the fixation probability of transitional nonsynonymous mutations to that of transversional nonsynonymous mutations. We then estimate η from the concatenated alignment of all protein-coding DNA sequences of two closely related genomes. Surprisingly, η ranges from 0.13 to 2.0 across 90 species pairs sampled from the tree of life, with 51 statistically significant incidences of η < 1 and 30 statistically significant incidences of η > 1. Hence, whether nonsynonymous transversions are overall more deleterious than nonsynonymous transitions is species-dependent. Because nonsynonymous transitions and transversions correspond to different groups of amino acid replacements, η is influenced by the relative exchangeabilities of amino acid pairs. Indeed, an extensive search reveals that the large variation in η is primarily explained by the recently reported among-species disparity in amino acid exchangeabilities. These findings demonstrate that genome-wide nucleotide substitution patterns in coding sequences have species-specific features and are more variable among evolutionary lineages than currently thought.
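As a rough illustration of where η could enter a codon model, the sketch below builds an unscaled instantaneous rate matrix over the 61 sense codons of the standard genetic code in the style of the Goldman-Yang model, with transition/transversion ratio κ (kappa), nonsynonymous/synonymous ratio ω (omega) and equilibrium codon frequencies π (pi). The parameterization is an assumption made for illustration, not necessarily the authors' exact formulation: η (eta) multiplies only nonsynonymous transitions, so η = 1 recovers the standard model.

```python
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
SENSE = [c for c in CODE if CODE[c] != "*"]  # 61 sense codons
PURINES = set("AG")


def is_transition(x, y):
    # A<->G or C<->T; assumes x != y
    return (x in PURINES) == (y in PURINES)


def rate(ci, cj, kappa, omega, eta, pi):
    """Assumed rate q_ij: pi_j * kappa(if ts) * omega(if nonsyn) * eta(if nonsyn ts)."""
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes occur instantaneously
    ts = is_transition(*diffs[0])
    q = pi[cj] * (kappa if ts else 1.0)
    if CODE[ci] != CODE[cj]:
        q *= omega * (eta if ts else 1.0)
    return q


def rate_matrix(kappa, omega, eta, pi):
    """Unscaled rate matrix as nested dicts; each row sums to zero."""
    Q = {ci: {cj: rate(ci, cj, kappa, omega, eta, pi) for cj in SENSE if cj != ci}
         for ci in SENSE}
    for ci in SENSE:
        Q[ci][ci] = -sum(Q[ci].values())
    return Q
```

Under this assumed form, η < 1 lowers the rate of nonsynonymous transitions relative to nonsynonymous transversions; estimating η jointly with κ and ω by maximum likelihood, as the abstract describes, is beyond this sketch.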

Codon harmonization reduces amino acid misincorporation in bacterially expressed P. falciparum proteins and improves their immunogenicity

AMB Express ◽

10.1186/s13568-019-0890-6 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Neeraja Punde ◽

Jennifer Kooken ◽

Dagmar Leary ◽

Patricia M. Legler ◽

Evelina Angov

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Codon Usage ◽

Dna Sequences ◽

Structural Integrity ◽

Host Cells ◽

Loss Of Function ◽

Species Specific ◽

And Function ◽

The Impact

Abstract Codon usage frequency influences protein structure and function. The frequency with which codons are used potentially affects primary, secondary and tertiary protein structure. Species-specific differences in codon usage can result in poor expression, loss of function, insolubility or truncation. “Codon harmonization” more closely aligns native codon usage frequencies with those of the expression host, particularly within putative inter-domain segments, where slower rates of translation may play a role in protein folding. Heterologous expression of Plasmodium falciparum genes in Escherichia coli has been challenging because of their AT-rich codon bias and highly repetitive DNA sequences. Here, codon harmonization was applied to the malarial antigen CelTOS (Cell-traversal protein for ookinetes and sporozoites). CelTOS is a highly conserved P. falciparum protein involved in cellular traversal through mosquito and vertebrate host cells. It reversibly refolds after thermal denaturation, making it a desirable malarial vaccine candidate. Protein expressed in E. coli from a codon-harmonized sequence of P. falciparum CelTOS (CH-PfCelTOS) was compared with protein expressed from the native codon sequence (N-PfCelTOS) to assess the impact of codon usage on protein expression levels, solubility, yield, stability, structural integrity, recognition by CelTOS-specific mAbs and immunogenicity in mice. Although the translated proteins were expected to be identical, the products of the codon-harmonized sequence differed in helical content and showed a narrower distribution of polypeptides in mass spectra, indicating lower heterogeneity and fewer amino acid misincorporations in the codon-harmonized version. Hydrophobic-to-hydrophobic amino acid substitutions were observed more commonly than any other type.
CH-PfCelTOS induced significantly higher antibody levels than N-PfCelTOS; however, no significant differences in either IFN-γ or IL-4 cellular responses were detected between the two antigens.
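One simple way to picture aligning native codon usage frequencies with those of the expression host is sketched below: each native codon is replaced by the host synonymous codon whose relative usage in the host is closest to that codon's relative usage in the native organism. This is a simplified, assumed rule (published harmonization procedures differ in detail), the usage tables are hypothetical inputs, and the standard genetic code is assumed. Because only synonymous codons are candidates, the encoded protein is unchanged by construction.

```python
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def harmonize(cds, native_usage, host_usage):
    """Hypothetical harmonizer: usage tables map codon -> relative synonymous usage."""
    out = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        target = native_usage.get(codon, 0.0)
        # pick the host synonym whose usage best matches the native codon's usage;
        # ties are broken alphabetically so the result is deterministic
        best = min(SYNONYMS[CODE[codon]],
                   key=lambda c: (abs(host_usage.get(c, 0.0) - target), c))
        out.append(best)
    return "".join(out)
```

With made-up usage values, a codon that is common in the native organism is mapped to a common host synonym, and a rare one to a rare host synonym.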

Gene Families, Epistasis and the Amino Acid Preferences of Protein Homologs

Evolutionary Bioinformatics ◽

10.1177/1176934319870485 ◽

2019 ◽

Vol 15 ◽

pp. 117693431987048

Author(s):

Evandro Ferrada

Keyword(s):

Amino Acid ◽

Genetic Background ◽

Sequence Divergence ◽

Gene Families ◽

Structure And Function ◽

Sequence Evolution ◽

Systematic Analysis ◽

Fitness Effects ◽

Computational Work ◽

And Function

To preserve structure and function, proteins tend to preferentially conserve amino acids at particular sites along the sequence. Because mutations can affect structure and function, the question arises whether the preference of a protein site for a particular amino acid varies between protein homologs, and to what extent that variation depends on sequence divergence. Answering these questions can help in the development of models of sequence evolution, as well as provide insights into how the fitness effects of mutations depend on the genetic background of sequences, a phenomenon known as epistasis. Here, I comment on recent computational work providing a systematic analysis of the extent to which the amino acid preferences of proteins depend on the background mutations of protein homologs.

Gene Expression in Trypanosomatid Parasites

Journal of Biomedicine and Biotechnology ◽

10.1155/2010/525241 ◽

2010 ◽

Vol 2010 ◽

pp. 1-15 ◽

Cited By ~ 91

Author(s):

Santiago Martínez-Calvillo ◽

Juan C. Vizuet-de-Rueda ◽

Luis E. Florencio-Martínez ◽

Rebeca G. Manning-Cela ◽

Elisa E. Figueroa-Angulo

Keyword(s):

Gene Expression ◽

Dna Sequences ◽

Transcription Initiation ◽

Histone Variants ◽

Protein Coding ◽

Functional Studies ◽

Polycistronic Transcription ◽

General Transcription Factors ◽

Genome Wide ◽

Trypanosomatid Parasites

The parasites Leishmania spp., Trypanosoma brucei, and Trypanosoma cruzi are the trypanosomatid protozoa that cause the deadly human diseases leishmaniasis, African sleeping sickness, and Chagas disease, respectively. These organisms possess unique mechanisms for gene expression, such as constitutive polycistronic transcription of protein-coding genes and trans-splicing. Little is known about either the DNA sequences or the proteins that are involved in the initiation and termination of transcription in trypanosomatids. In silico analyses of the genome databases of these parasites led to the identification of a small number of proteins involved in gene expression. However, functional studies have revealed that trypanosomatids have more general transcription factors than originally estimated. Many posttranslational histone modifications, histone variants, and chromatin modifying enzymes have been identified in trypanosomatids, and recent genome-wide studies showed that epigenetic regulation might play a very important role in gene expression in this group of parasites. Here, we review and comment on the most recent findings related to transcription initiation and termination in trypanosomatid protozoa.

Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model

10.1101/2021.11.10.468111 ◽

2021 ◽

Author(s):

Metin Balaban ◽

Nishat Anjum Bristy ◽

Ahnaf Faisal ◽

Md Shamsuzzoha Bayzid ◽

Siavash Mirarab

Keyword(s):

Dna Sequences ◽

Distance Estimation ◽

Sequence Evolution ◽

Phylogenetic Distance ◽

Strand Bias ◽

Alignment Free ◽

Bias Model ◽

Genome Wide ◽

Genome Wide Data ◽

Complex Models

While aligning sequences has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods have much appeal in terms of simplifying the process of inference, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for some emerging forms of data, such as genome skims, which cannot be assembled. Despite this appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation is that they typically rely on simplified models of sequence evolution such as Jukes-Cantor. It is possible to compute pairwise distances under more complex models by computing frequencies of base substitutions, provided that these quantities can be estimated in the alignment-free setting. A particular limitation is that for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the strand of the DNA sequences is unknown. Under such conditions, the so-called no-strand-bias models are the most complex models that can be used. Here, we show how to calculate distances under a no-strand-bias restriction of the General Time Reversible (GTR) model, called TK4, without relying on alignments. The method relies on replacing letters in the input sequences and then computing Jaccard indices between k-mer sets. For the method to work on large genomes, we also need to compute the number of k-mer mismatches after replacement that are due to random chance. We show in simulations that these alignment-free distances can be highly accurate when genomes evolve under the assumed models, and we examine the effectiveness of the method on real genomic data.
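The last step described above, turning a Jaccard index J between k-mer sets into a distance, can be sketched with the widely used Mash-style estimate D = -(1/k) ln(2J / (1 + J)). This is a swapped-in, Jukes-Cantor-like estimator shown for illustration, not the TK4 correction developed in the preprint, and the purine/pyrimidine (RY) recoding is a hypothetical example of "replacing letters", not necessarily the scheme the authors use.

```python
import math

# Hypothetical letter replacement: purines -> R, pyrimidines -> Y
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def kmer_set(seq, k, mapping=None):
    seq = seq.upper()
    if mapping is not None:
        seq = "".join(mapping.get(c, c) for c in seq)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_distance(seq1, seq2, k=21, mapping=None):
    """Mash-style per-base distance from the Jaccard index of k-mer sets."""
    j = jaccard(kmer_set(seq1, k, mapping), kmer_set(seq2, k, mapping))
    if j == 0:
        return math.inf  # no shared k-mers: distance not estimable
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

In a toy example, a single A-to-G transition changes the plain k-mer sets but leaves every RY-recoded k-mer intact, so the recoded distance reflects only transversions. That is one way letter replacement can separate substitution types that plain k-mer comparisons mix together.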

Neutral theories of molecular evolution

A Primer of Molecular Population Genetics ◽

10.1093/oso/9780198838944.003.0004 ◽

2019 ◽

pp. 67-84

Author(s):

Asher D. Cutter

Keyword(s):

Molecular Evolution ◽

Dna Sequences ◽

Neutral Theory ◽

Null Model ◽

Sequence Evolution ◽

Neutral Model ◽

Frequency Distributions ◽

Fitness Effects ◽

Theoretical Predictions ◽

Dna Sequence Evolution

Chapter 4, “Neutral theories of molecular evolution,” outlines the logic and predictions of the neutral theory of molecular evolution and its derivatives as a simple conceptual framework for understanding DNA sequence evolution. It introduces the standard neutral model as a null model of evolutionary change in DNA sequences to describe patterns of polymorphism within species and divergence between species. An overview is provided for the molecular clock concept and for predictions about the amount of polymorphism and allele frequency distributions within populations. This chapter covers how population size and selection intersect to define nearly neutral fitness effects and their implications, as well as misinterpretations and misapplications of Neutral Theory. This overview provides a foundation for how theoretical predictions offer null models for tests of molecular evolution developed in later chapters.
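A standard quantitative link between population size, selection and the molecular clock is Kimura's diffusion approximation for the fixation probability of a new mutation, u = (1 - e^(-2s)) / (1 - e^(-4Ns)), with substitution rate k = 2Nμu. The sketch assumes a diploid population whose effective size equals the census size N and additive fitnesses 1, 1 + s and 1 + 2s; these assumptions are ours, not the chapter's. When s = 0, u = 1/(2N) and k = μ, the neutral molecular clock; when |Ns| is much smaller than 1, u stays close to 1/(2N), which is the sense in which a mutation is nearly neutral.

```python
import math


def _log_expm1(x):
    """log(e**x - 1) for x > 0, computed without overflow."""
    return x + math.log(-math.expm1(-x))


def fixation_probability(s, n):
    """Kimura's fixation probability for a new mutant at initial frequency 1/(2N)."""
    if s == 0:
        return 1.0 / (2 * n)
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    # s < 0: evaluate the same ratio in log space so e^(4N|s|) cannot overflow
    return math.exp(_log_expm1(-2 * s) - _log_expm1(-4 * n * s))


def substitution_rate(mu, s, n):
    """k = 2N * mu * u: new mutations per generation times their fixation probability."""
    return 2 * n * mu * fixation_probability(s, n)
```

Using `expm1` keeps the formula accurate for very small |s|, which is exactly the nearly neutral regime the chapter discusses.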

Phylogenomics for Chagas Disease Vectors of the Rhodnius Genus (Hemiptera, Triatominae): What We Learn From Mito-Nuclear Conflicts and Recommendations

Frontiers in Ecology and Evolution ◽

10.3389/fevo.2021.750317 ◽

2022 ◽

Vol 9 ◽

Author(s):

Jonathan Filée ◽

Marie Merle ◽

Héloïse Bastide ◽

Florence Mougel ◽

Jean-Michel Bérenger ◽

...

Keyword(s):

Molecular Data ◽

Mitochondrial Genes ◽

Valid Species ◽

Close Relative ◽

Protein Coding ◽

Genome Wide ◽

A Genome ◽

Chagas Disease Vectors ◽

Tree Topologies ◽

Species Specific

In this study, we provide a very large DNA dataset on Rhodnius species, including 36 samples representing 16 valid species of the three Rhodnius groups: pictipes, prolixus and pallescens. Samples were sequenced at low depth with whole-genome shotgun sequencing (Illumina technology). Using phylogenomic data comprising 15 mitochondrial genes (13.3 kb), partial nuclear rDNA (5.2 kb) and 51 nuclear protein-coding genes (36.3 kb), we resolve sticking points in the Rhodnius phylogeny. At the species level, we confirm the species status of R. montenegrensis and R. marabaensis, and we agree with the synonymy of R. taquarussuensis with R. neglectus. We also suggest revisiting the species status of R. milesi, which is more likely R. nasutus. We propose defining a robustus species complex comprising four closely related species: R. marabaensis, R. montenegrensis, R. prolixus and R. robustus. As Psammolestes tertius was included in the Rhodnius clade, we strongly recommend reclassifying this species as R. tertius. At the Rhodnius group level, molecular data consistently support the clustering of the pictipes and pallescens groups, which are more closely related to each other than either is to the prolixus group. Moreover, comparing mitochondrial and nuclear tree topologies, our results demonstrate that various introgression events occurred in all three Rhodnius groups, in laboratory strains as well as in wild specimens. We show that introgressions occurred frequently in the prolixus group, involving the related species of the robustus complex as well as the pair R. nasutus and R. neglectus. A genome-wide analysis highlighted an introgression event in the pictipes group between R. stali and R. brethesi and suggested complex gene flow among the three species of the pallescens group: R. colombiensis, R. pallescens and R. ecuadoriensis. The molecular data also support a sylvatic distribution of R. prolixus in Brazil (Pará state) and the monophyly of R. robustus.
As we detected extensive introgression events and selective pressure on mitochondrial genes, we strongly recommend reconstructing separate mitochondrial and nuclear phylogenies and taking advantage of mito-nuclear conflicts to obtain a comprehensive view of the evolution of this genus.

Genome-Wide Mining, Characterization and Development of miRNA-SSRs in Arabidopsis thaliana

10.1101/203851 ◽

2017 ◽

Cited By ~ 4

Author(s):

Anuj Kumar ◽

Aditi Chauhan ◽

Mansi Sharma ◽

Sai Kumar Kompelli ◽

Vijay Gahlaut ◽

...

Keyword(s):

Arabidopsis Thaliana ◽

Dna Sequences ◽

Tandem Repeats ◽

Full Length ◽

Coding Region ◽

Protein Coding ◽

Coding Regions ◽

Mirna Genes ◽

Genome Wide ◽

Varying Length

Abstract Simple Sequence Repeats (SSRs), also known as microsatellites, are short tandem repeats of DNA sequence motifs 1-6 bp long. In plants, SSRs are an important class of molecular markers because of their hypervariable and co-dominant nature, which makes them useful for both genetic studies and marker-assisted breeding. SSRs are widespread throughout the genome of an organism, so a large number of SSR datasets are available, most of them from either protein-coding regions or untranslated regions. Only recently has their occurrence within microRNA (miRNA) genes received attention. miRNAs are themselves a class of non-coding RNAs (ncRNAs), 19-22 nucleotides (nt) long, that play an important role in regulating gene expression in plants under different biotic and abiotic stresses. In this communication, we describe a study in which miRNA-SSRs were mined in full-length pre-miRNA sequences of Arabidopsis thaliana. The sequences were retrieved using annotations available at EnsemblPlants, and miRNA-SSR flanking primers designed with the BatchPrimer3 server were found to be well distributed. Our analysis shows that miRNA-SSRs are relatively rare in protein-coding regions but abundant in non-coding regions. All 147 observed di-, tri-, tetra-, penta- and hexanucleotide SSRs were located in non-coding regions, across all five chromosomes of A. thaliana. We confirm that miRNA-SSRs are common across full-length pre-miRNAs, and we envisage that such studies will help identify new markers for breeding.
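Mining perfect SSRs of the kind counted above (di- to hexanucleotide motifs) can be sketched as a left-to-right scan that, at each position, tries motif lengths 2 to 6 and counts consecutive copies. The minimum copy numbers are hypothetical thresholds chosen for illustration, not the criteria used in the study, and motifs that are themselves repeats of a shorter unit (for example ATAT, or a mononucleotide run such as AA) are skipped so that each repeat is reported at its shortest period.

```python
# Hypothetical minimum copy numbers per motif length (not taken from the study)
MIN_COPIES = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq):
    """Return (start, motif, copies) for perfect di- to hexanucleotide repeats."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        hit = None
        for unit in range(2, 7):
            motif = seq[i:i + unit]
            if len(motif) < unit or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                copies += 1
            if copies >= MIN_COPIES[unit]:
                hit = (i, motif, copies)
                break
        if hit:
            hits.append(hit)
            i += hit[2] * len(hit[1])  # jump past the whole repeat
        else:
            i += 1
    return hits
```

Dedicated SSR miners add features this sketch omits, such as imperfect or compound repeats.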

Physicochemical amino acid properties better describe substitution rates in large populations

10.1101/378893 ◽

2018 ◽

Author(s):

Claudia C. Weber ◽

Simon Whelan

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Protein A ◽

Sequence Evolution ◽

Codon Model ◽

Codon Substitution ◽

Amino Acid Properties ◽

Large Populations ◽

History Of ◽

Treat All

Abstract Substitutions between chemically distant amino acids are known to occur less frequently than those between more similar amino acids. This knowledge, however, is not reflected in most codon substitution models, which treat all non-synonymous changes as if they were equivalent in terms of impact on the protein. A variety of methods for integrating chemical distances into models have been proposed, a common approach being to divide substitutions into radical and conservative categories. Nevertheless, it remains unclear whether the resulting models describe sequence evolution better than their simpler counterparts.
We propose a parametric codon model that distinguishes between radical and conservative substitutions, allowing us to assess whether radical substitutions are preferentially removed by selection. Applying our new model to a range of phylogenomic data, we find that differentiating between radical and conservative substitutions provides a significantly better fit for large populations, but we see no equivalent improvement for smaller populations. Comparing codon and amino acid models using these same data shows that alignments from large populations tend to select phylogenetic models containing information about amino acid exchangeabilities, whereas the structure of the genetic code is more important for smaller populations.
Our results suggest that selection against radical substitutions is, on average, more pronounced in large populations than in smaller ones. The reduced observable effect of selection in smaller populations may be due to stronger genetic drift, which makes preferences harder to detect. Our results imply an important connection between the life history of a phylogenetic group and the model that best describes its evolution.
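To make the radical/conservative split concrete, the sketch below classifies an amino acid replacement and returns the nonsynonymous rate ratio a codon model might apply to it, with separate ratios ω_C (conservative) and ω_R (radical) replacing the single ω of a standard model. The grouping by side-chain charge (positive R, H, K; negative D, E; all others neutral) is an assumed, illustrative choice; the abstract does not say which property classes the authors' model uses.

```python
# Assumed charge classes; any amino acid not listed is treated as neutral
CHARGE = {**{aa: "positive" for aa in "RHK"}, **{aa: "negative" for aa in "DE"}}


def charge_class(aa):
    return CHARGE.get(aa, "neutral")


def substitution_class(aa1, aa2):
    """'synonymous', 'conservative' (same charge class) or 'radical'."""
    if aa1 == aa2:
        return "synonymous"
    return "conservative" if charge_class(aa1) == charge_class(aa2) else "radical"


def omega_for(aa1, aa2, omega_conservative, omega_radical):
    """Rate ratio multiplying a codon change that replaces aa1 with aa2."""
    cls = substitution_class(aa1, aa2)
    if cls == "synonymous":
        return 1.0
    return omega_conservative if cls == "conservative" else omega_radical
```

If selection preferentially removes radical changes, the fitted ω_R would be expected to fall below ω_C; the abstract reports that this distinction improves fit mainly for large populations.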

Introduction of restriction enzyme sites in protein-coding DNA sequences by site-specific mutagenesis not affecting the amino acid sequence: a computer program

Nucleic Acids Research ◽

10.1093/nar/12.1part2.777 ◽

1984 ◽

Vol 12 (1Part2) ◽

pp. 777-787 ◽

Cited By ~ 7

Author(s):

R. Arentzen ◽

W.C. Ripka

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Computer Program ◽

Restriction Enzyme ◽

Dna Sequences ◽

Protein Coding ◽

Site Specific
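The title describes a program that introduces restriction enzyme sites by site-specific mutagenesis without changing the encoded amino acid sequence. The record does not describe the 1984 program's algorithm, so the following is only a brute-force sketch of the core check: slide the recognition site along the coding sequence and keep each position where overwriting the sequence with the site preserves the translated protein. The standard genetic code, uppercase ACGT input, and EcoRI's GAATTC as the example site are assumptions.

```python
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def translate(cds):
    # translates complete codons only; '*' marks a stop codon
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def silent_site_positions(cds, site):
    """Return (position, number_of_base_changes) for every silent placement of site."""
    cds, site = cds.upper(), site.upper()
    protein = translate(cds)
    hits = []
    for p in range(len(cds) - len(site) + 1):
        candidate = cds[:p] + site + cds[p + len(site):]
        if translate(candidate) == protein:
            changes = sum(a != b for a, b in zip(cds[p:p + len(site)], site))
            hits.append((p, changes))
    return hits
```

For the toy coding sequence ATG GAG TTT TAA (Met-Glu-Phe-stop), the only silent placement of GAATTC starts at position 3 and needs two base changes (GAG to GAA, TTT to TTC).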

Organization, primary structure, and evolution of histone H2A and H2B genes of the fission yeast Schizosaccharomyces pombe.

Molecular and Cellular Biology ◽

10.1128/mcb.5.11.3261 ◽

1985 ◽

Vol 5 (11) ◽

pp. 3261-3269 ◽

Cited By ~ 21

Author(s):

J Choe ◽

T Schuster ◽

M Grunstein

Keyword(s):

Amino Acid ◽

Schizosaccharomyces Pombe ◽

Fission Yeast ◽

Dna Sequences ◽

Sequence Evolution ◽

Hydrophobic Core ◽

Histone H2a ◽

Acid Protein ◽

Close Proximity ◽

Pair Sequence

The histone H2A and H2B genes of the fission yeast Schizosaccharomyces pombe were cloned and sequenced. Southern blot and sequence analyses showed that, unlike other eucaryotes, Saccharomyces cerevisiae included, S. pombe has unequal numbers of these genes, containing two histone H2A genes (H2A-alpha and -beta) and only one H2B gene (H2B-alpha) per haploid genome. H2A- and H2B-alpha are adjacent to each other and are divergently transcribed. H2A-beta has no other histone gene in close proximity. Preceding both H2A-alpha and -beta is a highly conserved 19-base-pair sequence (5'-CATCAC/AAACCCTAACCCTG-3'). The H2A DNA sequences encode two histone H2A subtypes differing in amino acid sequence (three residues) and size (H2A-alpha, 131 residues; H2A-beta, 130 residues). H2B-alpha codes for a 125-amino-acid protein. Sequence evolution is extensive between S. pombe and S. cerevisiae and displays unique patterns of divergence. Certain N-terminal sequences normally divergent between eucaryotes are conserved between the two yeasts. In contrast, the normally conserved hydrophobic core of H2A is as divergent between the yeasts as between S. pombe and calf.
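Reading "CATCAC/AAACCCTAACCCTG" as CATCA, then C or A, then AACCCTAACCCTG gives exactly 19 bp, matching the stated length, so the element can be searched with a single regular-expression character class. The sketch below scans both strands, since H2A-alpha and H2B-alpha are divergently transcribed; this reading of the slash and the helper names are assumptions for illustration.

```python
import re

# Assumed reading of 5'-CATCAC/AAACCCTAACCCTG-3': position 6 is C or A (19 bp)
ELEMENT = "CATCA[CA]AACCCTAACCCTG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_element(seq):
    """Return sorted (forward-strand start, strand) pairs for matches of ELEMENT."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in re.finditer(ELEMENT, seq)]
    n = len(seq)
    # a match at rc[s:e] corresponds to seq[n - e:n - s] on the forward strand
    hits += [(n - m.end(), "-") for m in re.finditer(ELEMENT, revcomp(seq))]
    return sorted(hits)
```

Reporting forward-strand coordinates for both strands makes hits on opposite strands directly comparable, as needed when the element precedes genes transcribed in opposite directions.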
