Model-based inference of punctuated molecular evolution

2019 ◽  
Author(s):  
Marc Manceau ◽  
Julie Marin ◽  
Hélène Morlon ◽  
Amaury Lambert

Abstract In standard models of molecular evolution, DNA sequences evolve through asynchronous substitutions according to Poisson processes with a constant rate (called the molecular clock) or a time-varying rate (relaxed clock). However, DNA sequences can also undergo episodes of fast divergence that will appear as synchronous substitutions affecting several sites simultaneously at the macroevolutionary time scale. Here, we develop a model combining basal, clock-like molecular evolution with episodes of fast divergence called spikes arising at speciation events. Given a multiple sequence alignment and its time-calibrated species phylogeny, our model is able to detect speciation events (including hidden ones) co-occurring with spike events and to estimate the probability and amplitude of these spikes on the phylogeny. We identify the conditions under which spikes can be distinguished from the natural variance of the clock-like component of molecular evolution and from temporal variations of the clock. We apply the method to genes underlying snake venom proteins and identify several spikes at gene-specific locations in the phylogeny. This work should pave the way for analyses relying on whole genomes to inform on modes of species diversification.

2020 ◽  
Vol 37 (11) ◽  
pp. 3308-3323 ◽  
Author(s):  
Marc Manceau ◽  
Julie Marin ◽  
Hélène Morlon ◽  
Amaury Lambert

Abstract In standard models of molecular evolution, DNA sequences evolve through asynchronous substitutions according to Poisson processes with a constant rate (called the molecular clock) or a rate that can vary (relaxed clock). However, DNA sequences can also undergo episodes of fast divergence that will appear as synchronous substitutions affecting several sites simultaneously at the macroevolutionary timescale. Here, we develop a model, which we call the Relaxed Clock with Spikes model, combining basal, clock-like molecular substitutions with episodes of fast divergence called spikes arising at speciation events. Given a multiple sequence alignment and its time-calibrated species phylogeny, our model is able to detect speciation events (including hidden ones) co-occurring with spike events and to estimate the probability and amplitude of these spikes on the phylogeny. We identify the conditions under which spikes can be distinguished from the natural variance of the clock-like component of molecular substitutions and from variations of the clock. We apply the method to genes underlying snake venom proteins and identify several spikes at gene-specific locations in the phylogeny. This work should pave the way for analyses relying on whole genomes to inform on modes of species diversification.
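To make the contrast between gradual, clock-like substitutions and spikes concrete, here is a toy simulation in Python. It is not the authors' Relaxed Clock with Spikes implementation: it assumes a strict clock, treats branches independently, and models a spike as an extra Poisson-distributed burst of substitutions that occurs with probability `p_spike` at the speciation event starting a branch. The function and parameter names (`simulate_branch_substitutions`, `rate`, `p_spike`, `spike_mean`) are hypothetical.

```python
import numpy as np

def simulate_branch_substitutions(branch_lengths, rate, p_spike, spike_mean, rng):
    """Toy model: clock-like Poisson substitutions plus optional spikes.

    Assumptions for illustration only: a strict clock with constant `rate`
    (substitutions per unit time), and, independently for each branch, a
    spike with probability `p_spike` at the speciation event that starts the
    branch, adding a Poisson(`spike_mean`) burst that does not depend on time.
    """
    t = np.asarray(branch_lengths, dtype=float)
    clock = rng.poisson(rate * t)                     # gradual, time-proportional substitutions
    has_spike = rng.random(t.shape) < p_spike         # branches whose origin carries a spike
    bursts = np.where(has_spike, rng.poisson(spike_mean, t.shape), 0)
    return clock + bursts, has_spike

rng = np.random.default_rng(0)
lengths = np.full(10_000, 2.0)   # many branches of equal duration
subs, has_spike = simulate_branch_substitutions(lengths, rate=1.5, p_spike=0.2,
                                                spike_mean=5.0, rng=rng)

# Expected mean: rate*t + p_spike*spike_mean = 3.0 + 1.0 = 4.0
# Expected variance: 3.0 (clock) + 5.0 (spikes) = 8.0, above the Poisson value 4.0
print(subs.mean(), subs.var())
```

A pure Poisson clock has variance equal to its mean, so the excess variance here is one crude signature of spikes. As the abstract notes, telling spikes apart from clock variance and from variation in the clock is only possible under certain conditions, which this toy does not address.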


2019 ◽  
Vol 15 (01) ◽  
pp. 1-8
Author(s):  
Ashish C Patel ◽  
C G Joshi

Current data storage technologies can no longer keep pace with the exponentially growing amount of data generated by the extensive use of social networking, photos, media, etc. The "digital world", estimated at 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance beyond being kept in a cool, dark place. DNA is small and has high density: just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, whereas CDs, hard disks, and other devices store information as 0s and 1s written on tracks. In DNA-based storage, once a digital file has been converted into binary code, encoding and decoding are the key steps. After the file is encoded, arbitrary single-stranded DNA sequences are synthesized and can be kept in deep freeze until use. When the information needs to be recovered, this is done by DNA sequencing. Next-generation sequencing (NGS) can produce sequences at very high throughput and at a much lower cost than the first sequencing technologies, less than about 0.1 USD per MB of data. Post-sequencing processing includes aligning all reads with multiple sequence alignment (MSA) algorithms to obtain the consensus sequences. The consensus sequence is then decoded by reversing the encoding process. Most earlier DNA data storage efforts sequenced and decoded the entire body of stored information with no random access, but it is now possible to retrieve selected files (e.g., only the required image from a collection) from a DNA pool using PCR-based random access.
Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
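The binarization, encoding, and decoding steps can be illustrated with the simplest possible scheme: a direct 2-bit mapping (00→A, 01→C, 10→G, 11→T). This mapping is an assumption chosen for illustration; practical systems add sequence constraints (for example avoiding long homopolymer runs), error-correcting codes, and primer sites for PCR-based random access, none of which are shown here.

```python
BASES = "ACGT"  # hypothetical mapping: the base at index i encodes the 2-bit value i
BASE_TO_BITS = {b: i for i, b in enumerate(BASES)}

def encode(data: bytes) -> str:
    """Binarize bytes and map each 2-bit group to one nucleotide (4 bases per byte)."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):          # most significant bit pair first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def decode(seq: str) -> bytes:
    """Reverse the encoding: every 4 nucleotides rebuild one byte."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    result = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        result.append(byte)
    return bytes(result)

strand = encode(b"Hi")
print(strand)                  # CAGACGGC
assert decode(strand) == b"Hi"
```

Decoding is the exact inverse of encoding, which mirrors the abstract's point that the consensus sequence is decoded by reversing the encoding process; in a real pipeline that consensus comes from aligned sequencing reads rather than a single error-free strand.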


Genome ◽  
2004 ◽  
Vol 47 (4) ◽  
pp. 732-741 ◽  
Author(s):  
Wolfgang Staiber

The origin of germline-limited chromosomes (Ks) as descendants of somatic chromosomes (Ss) and their structural evolution were recently elucidated in the chironomid Acricotopus. The Ks consist of large S-homologous sections and of heterochromatic segments containing germline-specific, highly repetitive DNA sequences. Less is known about the molecular evolution and features of the sequences in the S-homologous K sections. More information on this was obtained by comparing homologous gene sequences of Ks and Ss. Genes for 5.8S, 18S, 28S, and 5S ribosomal RNA were chosen for the comparison and were therefore first isolated by PCR from somatic DNA of Acricotopus and sequenced. Specific K DNA was collected by microdissection of monopolar moving K complements from differential gonial mitoses and was then amplified by degenerate oligonucleotide primer (DOP)-PCR. Using the sequence data of the somatic rDNAs, the homologous 5.8S and 5S rDNA sequences were isolated by PCR from the DOP-PCR sequence pool of the Ks. In addition, a number of K DOP-PCR sequences were directly cloned and analysed. One K clone contained a section of a putative N-acetyltransferase gene. Compared with its homolog from the Ss, the sequence exhibited few nucleotide substitutions (99.2% sequence identity). The same was true for the 5.8S and 5S sequences from Ss and Ks (97.5%–100% identity). This supports the idea that the S-homologous K sequences may be conserved and do not evolve independently of their somatic homologs. Possible mechanisms effecting such conservation of S-derived sequences in the Ks are discussed.
Key words: microdissection, DOP-PCR, germline-limited chromosomes, molecular evolution.
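Identity values such as 99.2% express the fraction of aligned positions at which two sequences carry the same nucleotide. A minimal Python sketch, assuming the two sequences are already aligned to equal length and that gap columns are excluded from the denominator (conventions differ between tools, so this choice is an assumption, not the article's stated method):

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, ignoring columns with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue                      # skip gap columns
        compared += 1
        matches += a == b                 # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# One mismatch in 125 ungapped columns gives 99.2% identity.
a = "ACGT" * 31 + "A"
b = "ACGT" * 31 + "C"
print(round(percent_identity(a, b), 1))   # 99.2
```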


Molecules ◽  
2018 ◽  
Vol 23 (11) ◽  
pp. 2748 ◽  
Author(s):  
Ae-Ree Lee ◽  
Na-Hyun Kim ◽  
Yeo-Jin Seo ◽  
Seo-Ree Choi ◽  
Joon-Hwa Lee

Z-DNA is stabilized by various Z-DNA binding proteins (ZBPs) that play important roles in RNA editing, innate immune response, and viral infection. In this review, the structures and dynamics of various ZBPs complexed with Z-DNA are summarized to better understand the mechanisms by which ZBPs selectively recognize d(CG)-repeat DNA sequences in genomic DNA and efficiently convert them to left-handed Z-DNA to achieve their biological function. The intermolecular interaction of ZBPs with Z-DNA strands is mediated through a single continuous recognition surface that consists of an α3 helix and a β-hairpin. In the ZBP-Z-DNA complexes, three identical, conserved residues (N173, Y177, and W195 in the Zα domain of human ADAR1) play central roles in the interaction with Z-DNA. ZBPs convert a 6-base-pair DNA duplex to a Z-form helix via a B-Z transition mechanism in which the ZBP first binds to B-DNA and then shifts the equilibrium from B-DNA to Z-DNA, a conformation that is then selectively stabilized by the additional binding of a second ZBP molecule. During the B-Z transition, ZBPs selectively recognize the alternating d(CG)n sequence and convert it to a Z-form helix in long genomic DNA through multiple sequence-discrimination steps. In addition, the intermediate complex formed by ZBPs and B-DNA, which is modulated by varying conditions, determines the degree of B-Z transition.
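The alternating d(CG)n tracts that ZBPs recognize can be located in a genomic sequence with a simple pattern search. A Python sketch using the standard `re` module; the minimum of three CG dinucleotides (6 bp) is an assumed threshold, and real Z-DNA propensity also depends on other alternating purine-pyrimidine sequences, supercoiling, and solution conditions that are not modeled here.

```python
import re

# Three or more consecutive CG dinucleotides, i.e. d(CG)n with n >= 3 (assumed threshold).
CG_REPEAT = re.compile(r"(?:CG){3,}")

def find_cg_repeats(seq: str):
    """Return (start, end, n_repeats) for each run of >= 3 consecutive CG dinucleotides."""
    hits = []
    for m in CG_REPEAT.finditer(seq.upper()):
        hits.append((m.start(), m.end(), (m.end() - m.start()) // 2))
    return hits

print(find_cg_repeats("ttaCGCGCGCGaatCGCGtt"))   # [(3, 11, 4)]
```

The short CGCG run near the end is ignored because it has only two repeats, below the chosen threshold.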


1987 ◽  
Vol 7 (12) ◽  
pp. 4185-4193
Author(s):  
K A Roebuck ◽  
R J Walker ◽  
W E Stumph

The DNA sequence requirements of chicken U1 RNA gene expression have been examined in an oocyte transcription system. An enhancer region, which was required for efficient U1 RNA gene expression, is contained within a region of conserved DNA sequences spanning nucleotide positions -230 to -183, upstream of the transcriptional initiation site. These DNA sequences can be divided into at least two distinct subregions or domains that acted synergistically to provide a greater than 20-fold stimulation of U1 RNA synthesis. The first domain contains the octamer sequence ATGCAAAT and was recognized by a DNA-binding factor present in HeLa cell extracts. The second domain (the SPH domain) consists of conserved sequences immediately downstream of the octamer and is an essential component of the enhancer. In the oocyte, the DNA sequences of the SPH domain were able to enhance gene expression at least 10-fold in the absence of the octamer domain. In contrast, the octamer domain, although required for full U1 RNA gene activity, was unable to stimulate expression in the absence of the adjacent downstream DNA sequences. These findings imply that sequences 3' of the octamer play a major role in the function of the chicken U1 RNA gene enhancer. This concept was supported by transcriptional competition studies in which a cloned chicken U4B RNA gene was used to compete for limiting transcription factors in oocytes. Multiple sequence motifs that can function in a variety of cis-linked configurations may be a general feature of vertebrate small nuclear RNA gene enhancers.
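Locating the octamer sequence ATGCAAAT in an upstream region is, in its simplest form, an exact string search on both strands, since a double-stranded binding site can be read in either orientation. A minimal Python sketch (exact matches only; transcription-factor sites are usually scored with degenerate patterns or position weight matrices, which this does not attempt). The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif(seq: str, motif: str = "ATGCAAAT"):
    """Return sorted (position, strand) pairs for exact motif matches.

    Positions are 0-based offsets on the input (top) strand; a '-' hit means
    the motif reads 5'->3' on the complementary strand at that location.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)

print(find_motif("ggATGCAAATccATTTGCATgg"))   # [(2, '+'), (12, '-')]
```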


2016 ◽  
Vol 7 (3) ◽  
pp. 36-55 ◽  
Author(s):  
El-amine Zemali ◽  
Abdelmadjid Boukra

One of the most challenging tasks in bioinformatics is solving the Multiple Sequence Alignment (MSA) problem. It consists of comparing a set of protein or DNA sequences with the aim of predicting their structure and function. This paper introduces a new bio-inspired approach to this problem. The approach, named BA-MSA, is based on the Bat Algorithm (BA), a recent evolutionary algorithm inspired by the behavior of bats seeking their prey. The proposed approach includes a new mechanism for generating the initial population: a guide tree is generated for each solution with a progressive approach by varying some parameters, and each generated guide tree is then enhanced by a Hill-Climbing algorithm. In addition, to deal with the premature convergence of BA, a new restart technique is proposed that introduces more diversification when premature convergence is detected. Balibase 2.0 datasets are used for the experiments. Comparison with well-known methods such as MSA-GA, MSA-GA (w/prealign), ClustalW, and SAGA, and with a recent method (BBOMP), shows the effectiveness of the proposed approach.
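For readers unfamiliar with the Bat Algorithm, here is a minimal sketch of the standard continuous version (Yang's frequency, velocity, loudness, and pulse-rate updates) minimizing a simple test function. It is not BA-MSA: the guide-tree initialization, Hill-Climbing refinement, and restart technique described above are omitted, alignments are discrete rather than continuous, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def bat_algorithm(objective, dim, n_bats=20, n_iter=200, bounds=(-5.0, 5.0),
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    """Minimal continuous Bat Algorithm for minimization (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_bats, dim))   # bat positions (candidate solutions)
    v = np.zeros((n_bats, dim))              # bat velocities
    loudness = np.ones(n_bats)               # A_i: shrinks each time bat i accepts a move
    r0 = rng.uniform(0.0, 1.0, n_bats)       # maximum pulse emission rate per bat
    pulse = np.zeros(n_bats)                 # r_i: grows toward r0 as bat i improves
    fitness = np.array([objective(p) for p in x])
    best_idx = int(fitness.argmin())
    best, best_fit = x[best_idx].copy(), float(fitness[best_idx])

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            # Global move: a random frequency scales the pull toward the current best.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] += (x[i] - best) * freq
            candidate = np.clip(x[i] + v[i], lo, hi)
            # Local move: with probability 1 - r_i, random walk around the best,
            # with step size set by the average loudness.
            if rng.random() > pulse[i]:
                step = rng.uniform(-1.0, 1.0, dim) * loudness.mean()
                candidate = np.clip(best + step, lo, hi)
            f_new = objective(candidate)
            # Accept an improvement only with probability A_i; then get quieter, pulse more.
            if f_new <= fitness[i] and rng.random() < loudness[i]:
                x[i], fitness[i] = candidate, f_new
                loudness[i] *= alpha
                pulse[i] = r0[i] * (1.0 - np.exp(-gamma * t))
            if f_new <= best_fit:
                best, best_fit = candidate.copy(), f_new
    return best, best_fit

def sphere(z):
    """Test function with its minimum (0) at the origin."""
    return float(np.sum(z ** 2))

best, value = bat_algorithm(sphere, dim=2)
print(best, value)
```

Decreasing loudness and increasing pulse rate shift the search from exploration toward exploitation over time, which is also why BA can converge prematurely; that weakness is what the restart technique in the abstract targets.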


2004 ◽  
Vol 20 (9) ◽  
pp. 1468-1469 ◽  
Author(s):  
M. C. Frith ◽  
A. S. Halees ◽  
U. Hansen ◽  
Z. Weng

2011 ◽  
Vol 366 (1577) ◽  
pp. 2503-2513 ◽  
Author(s):  
Lindell Bromham

DNA sequences evolve at different rates in different species. This rate variation has been most closely examined in mammals, revealing a large number of characteristics that can shape the rate of molecular evolution. Many of these traits are part of the mammalian life-history continuum: species with small body size, rapid generation turnover, high fecundity and short lifespans tend to have faster rates of molecular evolution. In addition, the rate of molecular evolution in mammals might be influenced by behaviour (such as mating system), ecological factors (such as range restriction) and evolutionary history (such as diversification rate). I discuss the evidence for these patterns of rate variation, and the possible explanations of these correlations. I also consider the impact of these systematic patterns of rate variation on the reliability of the molecular date estimates that have been used to suggest a Cretaceous radiation of modern mammals, before the final extinction of the dinosaurs.
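Why rate variation matters for molecular dates follows from the basic clock relation: if two lineages diverged T years ago and each accumulates substitutions at rate r per site per year, their expected pairwise distance is d = 2rT, so T = d/(2r). A small Python sketch with made-up numbers (not data from this article) shows how applying one average rate to a faster-evolving lineage pair shifts the inferred date:

```python
def divergence_time(distance: float, rate: float) -> float:
    """Strict molecular-clock divergence time, T = d / (2r).

    distance: pairwise substitutions per site between two lineages.
    rate: substitutions per site per year along each lineage (assumed equal).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)

# Hypothetical numbers for illustration only.
d = 0.10               # observed substitutions per site
assumed_rate = 1e-9    # average rate applied to all lineages
true_rate = 2e-9       # actual rate of a faster-evolving (e.g. small-bodied) pair

t_inferred = divergence_time(d, assumed_rate)   # about 50 million years
t_true = divergence_time(d, true_rate)          # about 25 million years
print(t_inferred / 1e6, t_true / 1e6)
```

Underestimating the rate of a fast-evolving lineage pushes its inferred divergence too far into the past (here by a factor of two), which is the kind of systematic bias that bears on proposed Cretaceous dates.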


2015 ◽  
Vol 13 (04) ◽  
pp. 1550016 ◽  
Author(s):  
El-Amine Zemali ◽  
Abdelmadjid Boukra

Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarity among a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on a recent metaheuristic inspired by the mathematics of biogeography, named biogeography-based optimization (BBO). To improve the exploration ability of BBO, we introduce a new concept that allows better exploration of the search space: it manipulates multiple populations, each with its own parameters. These parameters are used to build progressive alignments, allowing more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to each operator's efficiency. To test the performance of the proposed approach, we used a set of datasets from Balibase 2.0 and compared it with many recent algorithms such as GAPAM, MSA-GA, QEAMSA, and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.
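MSA methods like this one are compared by the score of the alignments they produce. As an illustration of what such an objective can look like, here is a sum-of-pairs (SP) score in Python: every pair of rows is scored column by column with a match/mismatch/gap scheme. This is an assumption about the kind of score involved; the abstract does not specify its objective function, and the scoring values below are hypothetical.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2, gap_char="-"):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    Each column contributes the score of every pair of rows; a pair of gaps
    scores 0 (a common convention, assumed here).
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    score = 0
    for a_row, b_row in combinations(alignment, 2):
        for a, b in zip(a_row, b_row):
            if a == gap_char and b == gap_char:
                continue
            if a == gap_char or b == gap_char:
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score

aln = ["AC-GT", "ACTGT", "A--GT"]
print(sum_of_pairs(aln))   # 2
```

In a metaheuristic such as BBO or BA, a function of this kind would serve as the fitness that migration, mutation, or local-search operators try to maximize.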

