MCALIGN: Stochastic Alignment of Noncoding DNA Sequences Based on an Evolutionary Model of Sequence Evolution

P. D. Keightley

doi:10.1101/gr.1571904

Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part II: Perturbation analyses

10.1101/023606 ◽

2015 ◽

Cited By ~ 4

Author(s):

Kiyoshi Ezawa ◽

Dan Graur ◽

Giddy Landan

Keyword(s):

Markov Model ◽

Exact Solutions ◽

Ab Initio ◽

Dna Sequences ◽

Continuous Time ◽

Evolutionary Model ◽

Local Alignment ◽

Sequence Evolution ◽

Sequence Alignments ◽

Insertions And Deletions

Abstract

Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of the sequence evolution through indel processes. In a separate paper (Ezawa, Graur and Landan 2015a), we established a theoretical basis of our ab initio perturbative formulation of a genuine evolutionary model, more specifically, a continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. And we showed that, under some conditions, the ab initio probability of an alignment can be factorized into the product of an overall factor and contributions from regions (or local alignments) separated by gapless columns.

Results: This paper describes how our ab initio perturbative formulation can be concretely used to approximately calculate the probabilities of all types of local pairwise alignments (PWAs) and some typical types of local multiple sequence alignments (MSAs). For each local alignment type, we calculated the fewest-indel contribution and the next-fewest-indel contribution to its probability, and we compared them under various conditions. We also derived a system of integral equations that can be numerically solved to give “exact solutions” for some common types of local PWAs. And we compared the obtained “exact solutions” with the fewest-indel contributions. The results indicated that even the fewest-indel terms alone can quite accurately approximate the probabilities of local alignments, as long as the segments and the branches in the tree are of modest lengths. Moreover, in the light of our formulation, we examined parameter regions where other indel models can safely approximate the correct evolutionary probabilities. The analyses also suggested some modifications necessary for these models to improve the accuracy of their probability estimations.

Conclusions: At least under modest conditions, our ab initio perturbative formulation can quite accurately calculate alignment probabilities under biologically realistic indel models. It also provides a sound reference point that other indel models can be compared to. [This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]


Perturbative formulation of general continuous-time Markov model of sequence evolution via insertions/deletions, Part IV: Incorporation of substitutions and other mutations

10.1101/023622 ◽

2015 ◽

Cited By ~ 5

Author(s):

Kiyoshi Ezawa ◽

Dan Graur ◽

Giddy Landan

Keyword(s):

Markov Model ◽

Ab Initio ◽

Dna Sequences ◽

Continuous Time ◽

Evolutionary Model ◽

Genomic Rearrangements ◽

Sequence Evolution ◽

Time Axis ◽

Insertions And Deletions ◽

Sufficient Set

Background: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of the sequence evolution through indel processes. In a separate paper (Ezawa, Graur and Landan 2015a), we established the theoretical basis of our ab initio perturbative formulation of a continuous-time Markov model of the evolution of an entire sequence via insertions and deletions along time axis. In other separate papers (Ezawa, Graur and Landan 2015b,c), we also developed various analytical and computational methods to concretely calculate alignment probabilities via our formulation. In terms of frequencies, however, substitutions are usually more common than indels. Moreover, many experiments suggest that other mutations, such as genomic rearrangements and recombination, also play some important roles in sequence evolution.

Results: Here, we extend our ab initio perturbative formulation of a genuine evolutionary model so that it can incorporate other mutations. We give a sufficient set of conditions that the probability of evolution via both indels and substitutions is factorable into the product of an overall factor and local contributions. We also show that, under a set of conditions, the probability can be factorized into two sub-probabilities, one via indels alone and the other via substitutions alone. Moreover, we show that our formulation can be extended so that it can also incorporate genomic rearrangements, such as inversions and duplications. We also discuss how to accommodate some other types of mutations within our formulation.

Conclusions: Our ab initio perturbative formulation thus extended could in principle describe the stochastic evolution of an entire sequence along time axis via major types of mutations. [This paper and three other papers (Ezawa, Graur and Landan 2015a,b,c) describe a series of our efforts to develop, apply, and extend the ab initio perturbative formulation of a general continuous-time Markov model of indels.]


Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

Physical Review E ◽

10.1103/physreve.51.5084 ◽

1995 ◽

Vol 51 (5) ◽

pp. 5084-5091 ◽

Cited By ~ 426

Author(s):

S. V. Buldyrev ◽

A. L. Goldberger ◽

S. Havlin ◽

R. N. Mantegna ◽

M. E. Matsa ◽

...

Keyword(s):

Long Range ◽

Dna Sequences ◽

Noncoding Dna ◽

Range Correlation ◽

Correlation Properties


Chloroplast Noncoding DNA Sequences Reveal Genetic Distinction and Diversity between Wild and Cultivated Prunus yedoensis

Journal of the American Society for Horticultural Science ◽

10.21273/jashs.142.6.434 ◽

2017 ◽

Vol 142 (6) ◽

pp. 434-443

Author(s):

Eun Ju Cheong ◽

Myong-Suk Cho ◽

Seung-Chul Kim ◽

Chan-Soo Kim

Keyword(s):

United States ◽

Dna Sequences ◽

The United States ◽

Jeju Island ◽

Noncoding Dna ◽

Noncoding Regions ◽

Naturally Occurring ◽

Prunus Yedoensis ◽

History Of ◽

Ornamental Trees

Cultivated flowering cherries (Prunus subgenus Cerasus), which are one of the most popular ornamental trees around the world, have been developed through artificial hybridizations among wild flowering cherries. Among the hundreds of cultivars of flowering cherries, Prunus ×yedoensis ‘Somei-yoshino’ is the most common and widespread. However, its origin and genetic relationship to wild P. yedoensis, naturally occurring on Jeju Island, South Korea, have long been debated. We used sequence polymorphisms in eight chloroplast DNA (cpDNA) noncoding regions to distinguish wild and cultivated flowering cherries among 104 individuals (55 accessions). We were able to distinguish two distinct groups, one corresponding to wild P. yedoensis collections from Jeju Island and the other collections of cultivated P. ×yedoensis from Korea, Japan, and the United States. The chlorotype diversity of wild P. yedoensis in Jeju Island and cultivated P. ×yedoensis collections in the United States was quite high, suggesting multiple natural hybrid origins and long history of cultivation from different original sources, respectively.


Variation in the Neisseria meningitidis FadL-like protein: an evolutionary model for a relatively low-abundance surface antigen

Microbiology ◽

10.1099/mic.0.043182-0 ◽

2010 ◽

Vol 156 (12) ◽

pp. 3596-3608 ◽

Cited By ~ 3

Author(s):

Daniel Yero ◽

Caroline Vipond ◽

Yanet Climent ◽

Gretel Sardiñas ◽

Ian M. Feavers ◽

...

Keyword(s):

Neisseria Meningitidis ◽

Dna Sequences ◽

Outer Membrane Proteins ◽

Evolutionary Model ◽

Surface Protein ◽

Synonymous Substitution ◽

Immune Selection ◽

Positive Selection Pressure ◽

Population Structuring ◽

Sequence Types

The molecular diversity of a novel Neisseria meningitidis antigen, encoded by the ORF NMB0088 of MC58 (FadL-like protein), was assessed in a panel of 64 diverse meningococcal strains. The panel consisted of strains belonging to different serogroups, serotypes, serosubtypes and MLST sequence types, of different clinical sources, years and countries of isolation. Based on the sequence variability of the protein, the FadL-like protein has been divided into four variant groups in this species. Antigen variants were associated with specific serogroups and MLST clonal complexes. Maximum-likelihood analyses were used to determine the relationships among sequences and to compare the selection pressures acting on the encoded protein. Furthermore, a model of population genetics and molecular evolution was used to detect natural selection in DNA sequences using the non-synonymous : synonymous substitution (dN : dS) ratio. The meningococcal sequences were also compared with those of the related surface protein in non-pathogenic commensal Neisseria species to investigate potential horizontal gene transfer. The N. meningitidis fadL gene was subject to only weak positive selection pressure and was less diverse than meningococcal major outer-membrane proteins. The majority of the variability in fadL was due to recombination among existing alleles from the same or related species that resulted in a discrete mosaic structure in the meningococcal population. In general, the population structuring observed based on the FadL-like membrane protein indicates that it is under intermediate immune selection. However, the emergence of a new subvariant within the hyperinvasive lineages demonstrates the phenotypic adaptability of N. meningitidis, probably in response to selective pressure.


Rapid genomic changes in newly synthesized amphiploids of Triticum and Aegilops. I. Changes in low-copy noncoding DNA sequences

Genome ◽

10.1139/gen-41-2-272 ◽

1998 ◽

Vol 41 (2) ◽

pp. 272-277 ◽

Cited By ~ 33

Author(s):

B. Liu ◽

J.M. Vega ◽

G. Segal ◽

S. Abbo ◽

M. Rodova ◽

...

Keyword(s):

Dna Sequences ◽

Noncoding Dna ◽

Genomic Changes


Comment on “Linguistic Features of Noncoding DNA Sequences”

Physical Review Letters ◽

10.1103/physrevlett.76.1978 ◽

1996 ◽

Vol 76 (11) ◽

pp. 1978-1978 ◽

Cited By ~ 20

Author(s):

Richard F. Voss

Keyword(s):

Dna Sequences ◽

Linguistic Features ◽

Noncoding Dna


Clustering of Identical Oligomers in Coding and Noncoding DNA Sequences

Journal of Biomolecular Structure and Dynamics ◽

10.1080/07391102.1999.10508342 ◽

1999 ◽

Vol 17 (1) ◽

pp. 79-87 ◽

Cited By ~ 6

Author(s):

Rachel H. R. Stanley ◽

Nikolay V. Dokholyan ◽

Sergey V. Buldyrev ◽

Shlomo Havlin ◽

H. Eugene Stanley

Keyword(s):

Dna Sequences ◽

Noncoding Dna


Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model

10.1101/2021.11.10.468111 ◽

2021 ◽

Author(s):

Metin Balaban ◽

Nishat Anjum Bristy ◽

Ahnaf Faisal ◽

Md Shamsuzzoha Bayzid ◽

Siavash Mirarab

Keyword(s):

Dna Sequences ◽

Distance Estimation ◽

Sequence Evolution ◽

Phylogenetic Distance ◽

Strand Bias ◽

Alignment Free ◽

Bias Model ◽

Genome Wide ◽

Genome Wide Data ◽

Complex Models

While aligning sequences has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods have much appeal in terms of simplifying the process of inference, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for some emerging forms of data such as genome skims, which cannot be assembled. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is that they typically rely on simplified models of sequence evolution such as Jukes-Cantor. It is possible to compute pairwise distances under more complex models by computing frequencies of base substitutions provided that these quantities can be estimated in the alignment-free setting. A particular limitation is that for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the strand of DNA sequences is unknown. Under such conditions, the so-called no-strand bias models are the most complex models that can be used. Here, we show how to calculate distances under a no-strand bias restriction of the General Time Reversible (GTR) model called TK4 without relying on alignments. The method relies on replacing letters in the input sequences, and subsequent computation of Jaccard indices between k-mer sets. For the method to work on large genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance. We show in simulation that these alignment-free distances can be highly accurate when genomes evolve under the assumed models, and we examine the effectiveness of the method on real genomic data.
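
The two building blocks named in the abstract above, letter replacement followed by a Jaccard index between k-mer sets, can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function names `replace_letters`, `kmer_set`, and `jaccard` are hypothetical, and the mapping used in the example is arbitrary and is not the replacement scheme of the TK4 method. The sketch also omits the correction for chance k-mer matches and the conversion from Jaccard index to evolutionary distance.

```python
def replace_letters(seq, mapping):
    """Replace single letters in seq according to mapping (hypothetical helper).

    The mapping is supplied by the caller; nothing here encodes the
    actual replacement scheme used by the method in the abstract.
    """
    return seq.translate(str.maketrans(mapping))


def kmer_set(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets.

    Returns 0.0 when both sets are empty (a convention chosen for this sketch).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Arbitrary illustrative mapping: merge C into T before extracting k-mers.
    mapping = {"C": "T"}
    s1 = replace_letters("ACGTACGT", mapping)
    s2 = replace_letters("ACGTTCGT", mapping)
    print(jaccard(kmer_set(s1, 3), kmer_set(s2, 3)))
```

Representing each sequence as a set discards k-mer multiplicity, which is what makes the Jaccard index applicable; any multiplicity-aware variant would need a different data structure.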


The impact of epigenetic information on genome evolution

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2020.0114 ◽

2021 ◽

Vol 376 (1826) ◽

Cited By ~ 3

Author(s):

Soojin V. Yi ◽

Michael A. D. Goodisman

Keyword(s):

Genome Evolution ◽

Dna Sequence ◽

Dna Sequences ◽

Sequence Evolution ◽

Epigenetic Information ◽

Dna Mutation ◽

Dna Mutations ◽

Physical Interactions ◽

Eukaryotic Organisms ◽

The Impact

Epigenetic information affects gene function by interacting with chromatin, while not changing the DNA sequence itself. However, it has become apparent that the interactions between epigenetic information and chromatin can, in fact, indirectly lead to DNA mutations and ultimately influence genome evolution. This review evaluates the ways in which epigenetic information affects genome sequence and evolution. We discuss how DNA methylation has strong and pervasive effects on DNA sequence evolution in eukaryotic organisms. We also review how the physical interactions arising from the connections between histone proteins and DNA affect DNA mutation and repair. We then discuss how a variety of epigenetic mechanisms exert substantial effects on genome evolution by suppressing the movement of transposable elements. Finally, we examine how genome expansion through gene duplication is also partially controlled by epigenetic information. Overall, we conclude that epigenetic information has widespread indirect effects on DNA sequences in eukaryotes and represents a potent cause and constraint of genome evolution. This article is part of the theme issue ‘How does epigenetics influence the course of evolution?’
