Model-based inference of punctuated molecular evolution

Mapping Intimacies ◽

10.1101/852343 ◽

2019 ◽

Cited By ~ 1

Author(s):

Marc Manceau ◽

Julie Marin ◽

Hélène Morlon ◽

Amaury Lambert

Keyword(s):

Molecular Evolution ◽

Dna Sequences ◽

Temporal Variations ◽

Multiple Sequence ◽

Model Combining ◽

Constant Rate ◽

Whole Genomes ◽

Standard Models ◽

Natural Variance ◽

Venom Proteins

In standard models of molecular evolution, DNA sequences evolve through asynchronous substitutions according to Poisson processes with a constant rate (called the molecular clock) or a time-varying rate (relaxed clock). However, DNA sequences can also undergo episodes of fast divergence that will appear as synchronous substitutions affecting several sites simultaneously at the macroevolutionary time scale. Here, we develop a model combining basal, clock-like molecular evolution with episodes of fast divergence called spikes arising at speciation events. Given a multiple sequence alignment and its time-calibrated species phylogeny, our model is able to detect speciation events (including hidden ones) co-occurring with spike events and to estimate the probability and amplitude of these spikes on the phylogeny. We identify the conditions under which spikes can be distinguished from the natural variance of the clock-like component of molecular evolution and from temporal variations of the clock. We apply the method to genes underlying snake venom proteins and identify several spikes at gene-specific locations in the phylogeny. This work should pave the way for analyses relying on whole genomes to inform on modes of species diversification.

Model-Based Inference of Punctuated Molecular Evolution

Molecular Biology and Evolution ◽

10.1093/molbev/msaa144 ◽

2020 ◽

Vol 37 (11) ◽

pp. 3308-3323 ◽

Cited By ~ 2

Author(s):

Marc Manceau ◽

Julie Marin ◽

Hélène Morlon ◽

Amaury Lambert

Keyword(s):

Molecular Evolution ◽

Dna Sequences ◽

Multiple Sequence ◽

Model Combining ◽

Constant Rate ◽

Whole Genomes ◽

Standard Models ◽

Natural Variance ◽

Venom Proteins ◽

Relaxed Clock

In standard models of molecular evolution, DNA sequences evolve through asynchronous substitutions according to Poisson processes with a constant rate (called the molecular clock) or a rate that can vary (relaxed clock). However, DNA sequences can also undergo episodes of fast divergence that will appear as synchronous substitutions affecting several sites simultaneously at the macroevolutionary timescale. Here, we develop a model, which we call the Relaxed Clock with Spikes model, combining basal, clock-like molecular substitutions with episodes of fast divergence called spikes arising at speciation events. Given a multiple sequence alignment and its time-calibrated species phylogeny, our model is able to detect speciation events (including hidden ones) co-occurring with spike events and to estimate the probability and amplitude of these spikes on the phylogeny. We identify the conditions under which spikes can be distinguished from the natural variance of the clock-like component of molecular substitutions and from variations of the clock. We apply the method to genes underlying snake venom proteins and identify several spikes at gene-specific locations in the phylogeny. This work should pave the way for analyses relying on whole genomes to inform on modes of species diversification.
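
The contrast between clock-like substitutions and spikes can be illustrated with a toy simulation of a single branch. This is only a sketch under simplifying assumptions, not the authors' Relaxed Clock with Spikes implementation: the function name `simulate_branch`, its parameters, and the reading of "amplitude" as the per-site probability of being hit by a spike are all hypothetical choices made for illustration, and multiple hits at one site are not tracked.

```python
import math
import random


def simulate_branch(n_sites, clock_rate, branch_length,
                    spike_prob, spike_amplitude, rng):
    """Return (clock_sites, spike_sites) for one branch of a phylogeny.

    Hypothetical toy model, not the published method.
    """
    # Clock-like component: under a Poisson process with rate `clock_rate`,
    # a site sees at least one substitution event over `branch_length`
    # with probability 1 - exp(-rate * time). Sites change independently.
    p_clock = 1.0 - math.exp(-clock_rate * branch_length)
    clock_sites = {i for i in range(n_sites) if rng.random() < p_clock}

    # Spike component (assumption): with probability `spike_prob` the
    # speciation event at the start of the branch triggers a spike that
    # hits each site simultaneously with probability `spike_amplitude`.
    spike_sites = set()
    if rng.random() < spike_prob:
        spike_sites = {i for i in range(n_sites)
                       if rng.random() < spike_amplitude}
    return clock_sites, spike_sites


if __name__ == "__main__":
    rng = random.Random(1)
    clock, spike = simulate_branch(1000, 0.01, 5.0, 0.3, 0.05, rng)
    print(len(clock), "clock-like changes;", len(spike), "spike changes")
```

Because the spike changes are synchronous, they do not scale with branch length, which is what separates them from the clock-like component in this toy setting.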

Deoxyribonucleic Acid as a Tool for Digital Information Storage: An Overview

THE INDIAN JOURNAL OF VETERINARY SCIENCES AND BIOTECHNOLOGY ◽

10.21887/ijvsbt.15.1.1 ◽

2019 ◽

Vol 15 (01) ◽

pp. 1-8

Author(s):

Ashish C Patel ◽

C G Joshi

Keyword(s):

Data Storage ◽

Dna Sequences ◽

Consensus Sequence ◽

Random Access ◽

Information Storage ◽

Digital Data ◽

Digital Information ◽

Multiple Sequence ◽

Digital World ◽

Digital File

Current data storage technologies can no longer keep pace with the exponentially growing amounts of data generated through the extensive use of social networking, photos, and media. The "digital world", which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance beyond storage in a cool, dark place. DNA is small and very dense; just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, whereas CDs, hard disks, and other devices store information as 0s and 1s on spiral tracks. In DNA-based storage, the digital file is first converted into binary code; encoding and decoding are then the key steps of the storage system. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freeze until use. When the information needs to be recovered, this can be done using DNA sequencing. Next-generation sequencing (NGS) is capable of producing sequences with very high throughput at a much lower cost (less than about 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes aligning all reads using multiple sequence alignment (MSA) algorithms to obtain consensus sequences. The consensus sequence is then decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access. Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
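
The encode, sequence, consensus, decode pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a scheme taken from the review: the two-bits-per-base mapping (00=A, 01=C, 10=G, 11=T), the function names, and the per-column majority vote over already-aligned, equal-length reads are all hypothetical. Practical schemes typically add error correction and avoid long runs of a single base, which this sketch does not.

```python
from collections import Counter

# Assumed mapping for illustration: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Map each byte to four bases, two bits per base, high bits first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data
                   for shift in (6, 4, 2, 0))


def decode(strand: str) -> bytes:
    """Reverse `encode`: read four bases at a time back into one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def consensus(reads):
    """Majority base per column of equal-length, already-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*reads))


if __name__ == "__main__":
    strand = encode(b"Hi")
    # Three simulated reads, one carrying a single sequencing error.
    reads = [strand, strand[:3] + "T" + strand[4:], strand]
    print(strand, decode(consensus(reads)))
```

The majority vote stands in for the MSA-based consensus step; real reads would first need alignment because sequencing also introduces insertions and deletions.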

Molecular evolution of homologous gene sequences in germline-limited and somatic chromosomes of Acricotopus

Genome ◽

10.1139/g04-026 ◽

2004 ◽

Vol 47 (4) ◽

pp. 732-741 ◽

Cited By ~ 2

Author(s):

Wolfgang Staiber

Keyword(s):

Molecular Evolution ◽

Dna Sequences ◽

Sequence Data ◽

Structural Evolution ◽

5S Rdna ◽

Oligonucleotide Primer ◽

Homologous Gene ◽

Gene Sequences ◽

Nucleotide Substitutions ◽

Degenerate Oligonucleotide

The origin of germline-limited chromosomes (Ks) as descendants of somatic chromosomes (Ss) and their structural evolution was recently elucidated in the chironomid Acricotopus. The Ks consist of large S-homologous sections and of heterochromatic segments containing germline-specific, highly repetitive DNA sequences. Less is known about the molecular evolution and features of the sequences in the S-homologous K sections. More information about this was obtained by comparing homologous gene sequences of Ks and Ss. Genes for 5.8S, 18S, 28S, and 5S ribosomal RNA were chosen for the comparison and were therefore first isolated by PCR from somatic DNA of Acricotopus and sequenced. Specific K DNA was collected by microdissection of monopolar moving K complements from differential gonial mitoses and was then amplified by degenerate oligonucleotide primer (DOP)-PCR. With the sequence data of the somatic rDNAs, the homologous 5.8S and 5S rDNA sequences were isolated by PCR from the DOP-PCR sequence pool of the Ks. In addition, a number of K DOP-PCR sequences were directly cloned and analysed. One K clone contained a section of a putative N-acetyltransferase gene. Compared with its homolog from the Ss, the sequence exhibited few nucleotide substitutions (99.2% sequence identity). The same was true for the 5.8S and 5S sequences from Ss and Ks (97.5%–100% identity). This supports the idea that the S-homologous K sequences may be conserved and do not evolve independently from their somatic homologs. Possible mechanisms effecting such conservation of S-derived sequences in the Ks are discussed. Key words: microdissection, DOP-PCR, germline-limited chromosomes, molecular evolution.
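
Identity figures such as those quoted above are commonly computed as the share of matching positions between two aligned sequences. The abstract does not say how its values were obtained, so the following is only an illustrative sketch: the function name `percent_identity` is hypothetical, and it assumes two equal-length, already-aligned sequences with no special treatment of gaps.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two aligned sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    # Compare position by position, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```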

Thermodynamic Model for B-Z Transition of DNA Induced by Z-DNA Binding Proteins

Molecules ◽

10.3390/molecules23112748 ◽

2018 ◽

Vol 23 (11) ◽

pp. 2748 ◽

Cited By ~ 5

Author(s):

Ae-Ree Lee ◽

Na-Hyun Kim ◽

Yeo-Jin Seo ◽

Seo-Ree Choi ◽

Joon-Hwa Lee

Keyword(s):

Dna Binding ◽

Dna Sequences ◽

Genomic Dna ◽

Binding Proteins ◽

Dna Binding Proteins ◽

Multiple Sequence ◽

Left Handed ◽

Transition Mechanism ◽

Dna Strands ◽

Z Dna

Z-DNA is stabilized by various Z-DNA binding proteins (ZBPs) that play important roles in RNA editing, innate immune response, and viral infection. In this review, the structures and dynamics of various ZBPs complexed with Z-DNA are summarized to better understand the mechanisms by which ZBPs selectively recognize d(CG)-repeat DNA sequences in genomic DNA and efficiently convert them to left-handed Z-DNA to achieve their biological function. The intermolecular interaction of ZBPs with Z-DNA strands is mediated through a single continuous recognition surface which consists of an α3 helix and a β-hairpin. In the ZBP-Z-DNA complexes, three identical, conserved residues (N173, Y177, and W195 in the Zα domain of human ADAR1) play central roles in the interaction with Z-DNA. ZBPs convert a 6-base-pair DNA to a Z-form helix via the B-Z transition mechanism in which the ZBP first binds to B-DNA and then shifts the equilibrium from B-DNA to Z-DNA, a conformation that is then selectively stabilized by the additional binding of a second ZBP molecule. During B-Z transition, ZBPs selectively recognize the alternating d(CG)n sequence and convert it to a Z-form helix in long genomic DNA through multiple sequence discrimination steps. In addition, the intermediate complex formed by ZBPs and B-DNA, which is modulated by varying conditions, determines the degree of B-Z transition.
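
Alternating d(CG)n tracts of the kind mentioned above can be located in a sequence with a simple pattern search. This sketch only finds candidate sequence motifs and says nothing about whether a B-Z transition actually occurs; the function name `find_cg_repeats` and the default threshold of three repeats (6 bases) are assumptions made for illustration.

```python
import re


def find_cg_repeats(sequence: str, min_repeats: int = 3):
    """Return (start, end) spans of runs of at least `min_repeats` CG units."""
    # (?:CG){n,} matches n or more consecutive CG dinucleotides.
    pattern = re.compile(r"(?:CG){%d,}" % min_repeats)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_cg_repeats("ATCGCGCGTTCGA"))
```

Spans are zero-based and end-exclusive, so they can be used directly as Python slices of the input sequence.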

Multiple functional motifs in the chicken U1 RNA gene enhancer

Molecular and Cellular Biology ◽

10.1128/mcb.7.12.4185-4193.1987 ◽

1987 ◽

Vol 7 (12) ◽

pp. 4185-4193

Author(s):

K A Roebuck ◽

R J Walker ◽

W E Stumph

Keyword(s):

Gene Expression ◽

Dna Sequences ◽

Initiation Site ◽

Sequence Motifs ◽

Small Nuclear Rna ◽

Multiple Sequence ◽

Transcriptional Initiation ◽

Transcription System ◽

Cell Extracts ◽

Gene Enhancer

The DNA sequence requirements of chicken U1 RNA gene expression have been examined in an oocyte transcription system. An enhancer region, which was required for efficient U1 RNA gene expression, is contained within a region of conserved DNA sequences spanning nucleotide positions -230 to -183, upstream of the transcriptional initiation site. These DNA sequences can be divided into at least two distinct subregions or domains that acted synergistically to provide a greater than 20-fold stimulation of U1 RNA synthesis. The first domain contains the octamer sequence ATGCAAAT and was recognized by a DNA-binding factor present in HeLa cell extracts. The second domain (the SPH domain) consists of conserved sequences immediately downstream of the octamer and is an essential component of the enhancer. In the oocyte, the DNA sequences of the SPH domain were able to enhance gene expression at least 10-fold in the absence of the octamer domain. In contrast, the octamer domain, although required for full U1 RNA gene activity, was unable to stimulate expression in the absence of the adjacent downstream DNA sequences. These findings imply that sequences 3' of the octamer play a major role in the function of the chicken U1 RNA gene enhancer. This concept was supported by transcriptional competition studies in which a cloned chicken U4B RNA gene was used to compete for limiting transcription factors in oocytes. Multiple sequence motifs that can function in a variety of cis-linked configurations may be a general feature of vertebrate small nuclear RNA gene enhancers.

VISTA Family of Computational Tools for Comparative Analysis of DNA Sequences and Whole Genomes

Gene Mapping, Discovery, and Expression ◽

10.1385/1-59745-097-9:69 ◽

2006 ◽

pp. 69-90 ◽

Cited By ~ 3

Author(s):

Inna Dubchak ◽

Dmitriy V. Ryaboy

Keyword(s):

Comparative Analysis ◽

Dna Sequences ◽

Computational Tools ◽

Whole Genomes

Using a Bio-Inspired Algorithm to Resolve the Multiple Sequence Alignment Problem

International Journal of Applied Metaheuristic Computing ◽

10.4018/ijamc.2016070103 ◽

2016 ◽

Vol 7 (3) ◽

pp. 36-55 ◽

Cited By ~ 2

Author(s):

El-amine Zemali ◽

Abdelmadjid Boukra

Keyword(s):

Sequence Alignment ◽

Dna Sequences ◽

Multiple Sequence Alignment ◽

Bat Algorithm ◽

Premature Convergence ◽

Hill Climbing ◽

Initial Population ◽

Multiple Sequence ◽

Guide Tree ◽

And Function

One of the most challenging tasks in bioinformatics is solving the Multiple Sequence Alignment (MSA) problem. It consists of comparing a set of protein or DNA sequences with the aim of predicting their structure and function. This paper introduces a new bio-inspired approach to this problem. The approach, named BA-MSA, is based on the Bat Algorithm (BA), a recent evolutionary algorithm inspired by the behavior of bats seeking their prey. The proposed approach includes a new mechanism for generating the initial population: a guide tree is generated for each solution with a progressive approach by varying some parameters, and the generated guide tree is then enhanced by a Hill-Climbing algorithm. In addition, to deal with the premature convergence of BA, a new restart technique is proposed that introduces more diversification when premature convergence is detected. Balibase 2.0 datasets are used for the experiments. The comparison with well-known methods such as MSA-GA, MSA-GA (w\prealign), ClustalW, and SAGA, and with a recent method (BBOMP), shows the effectiveness of the proposed approach.

Site2genome: locating short DNA sequences in whole genomes

Bioinformatics ◽

10.1093/bioinformatics/bth094 ◽

2004 ◽

Vol 20 (9) ◽

pp. 1468-1469 ◽

Cited By ~ 2

Author(s):

M. C. Frith ◽

A. S. Halees ◽

U. Hansen ◽

Z. Weng

Keyword(s):

Dna Sequences ◽

Whole Genomes

The genome as a life-history character: why rate of molecular evolution varies between mammal species

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2011.0014 ◽

2011 ◽

Vol 366 (1577) ◽

pp. 2503-2513 ◽

Cited By ~ 121

Author(s):

Lindell Bromham

Keyword(s):

Life History ◽

Molecular Evolution ◽

Dna Sequences ◽

Small Body ◽

Rate Variation ◽

Mammal Species ◽

Range Restriction ◽

High Fecundity ◽

Rate Of Molecular Evolution ◽

The Impact

DNA sequences evolve at different rates in different species. This rate variation has been most closely examined in mammals, revealing a large number of characteristics that can shape the rate of molecular evolution. Many of these traits are part of the mammalian life-history continuum: species with small body size, rapid generation turnover, high fecundity and short lifespans tend to have faster rates of molecular evolution. In addition, rate of molecular evolution in mammals might be influenced by behaviour (such as mating system), ecological factors (such as range restriction) and evolutionary history (such as diversification rate). I discuss the evidence for these patterns of rate variation, and the possible explanations of these correlations. I also consider the impact of these systematic patterns of rate variation on the reliability of the molecular date estimates that have been used to suggest a Cretaceous radiation of modern mammals, before the final extinction of the dinosaurs.

Resolving the multiple sequence alignment problem using biogeography-based optimization with multiple populations

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001550016x ◽

2015 ◽

Vol 13 (04) ◽

pp. 1550016 ◽

Cited By ~ 3

Author(s):

El-Amine Zemali ◽

Abdelmadjid Boukra

Keyword(s):

Sequence Alignment ◽

Dna Sequences ◽

Multiple Sequence Alignment ◽

Search Space ◽

New Method ◽

Average Score ◽

Solution Quality ◽

Multiple Sequence ◽

Multiple Populations ◽

Alignment Problem

Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarity between a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on a recent metaheuristic inspired by the mathematics of biogeography, named biogeography-based optimization (BBO). To improve the exploration ability of BBO, we have introduced a new concept allowing better exploration of the search space. It consists of manipulating multiple populations, each with its own parameters. These parameters are used to build up progressive alignments allowing more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to the operators' efficiency. To test the performance of the proposed approach, we have considered a set of datasets from Balibase 2.0 and compared it with many recent algorithms such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.
