Inferring the frequency spectrum of derived variants to quantify adaptive molecular evolution in protein-coding genes of Drosophila melanogaster

bioRxiv ◽

10.1101/039404 ◽

2016 ◽

Cited By ~ 1

Author(s):

Peter D. Keightley ◽

Jose Campos ◽

Tom Booker ◽

Brian Charlesworth

Keyword(s):

Amino Acid ◽

Molecular Evolution ◽

Frequency Spectrum ◽

Population Genomics ◽

Selective Constraint ◽

Accurate Estimation ◽

Protein Coding ◽

Protein Coding Genes ◽

Adaptive Molecular Evolution ◽

Amino Acid Mutations

Many approaches for inferring adaptive molecular evolution analyze the unfolded site frequency spectrum (SFS), a vector of counts of sites with different numbers of copies of derived alleles in a sample of alleles from a population. Accurate inference of the high copy number elements of the SFS is difficult, however, because of misassignment of alleles as derived versus ancestral. This is a known problem with parsimony using outgroup species. Here, we show that the problem is particularly serious if there is variation in the substitution rate among sites brought about by variation in selective constraint levels. We present a new method for inferring the SFS using one or two outgroups, which attempts to overcome the problem of misassignment. We show that two outgroups are required for accurate estimation of the SFS if there is substantial variation in selective constraints, which is expected to be the case for nonsynonymous sites of protein-coding genes. We apply the method to estimate unfolded SFSs for synonymous and nonsynonymous sites from Phase 2 of the Drosophila Population Genomics Project. We use the unfolded spectra to estimate the frequency and strength of advantageous and deleterious mutations, and estimate that ~50% of amino acid substitutions are positively selected, but that less than 0.5% of new amino acid mutations are beneficial, with a scaled selection strength of Nes ≈ 12.
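
The unfolded SFS described above can be illustrated with a minimal Python sketch. It assumes biallelic sites and polarizes each site by naive single-outgroup parsimony (the ingroup allele matching the outgroup is called ancestral). This is the simple approach whose misassignment problem the abstract describes, not the authors' new method, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def derived_count_by_parsimony(alleles: Sequence[str], outgroup: str) -> Optional[int]:
    """Count derived copies at one biallelic site using a single outgroup.

    Naive parsimony: the ingroup allele that matches the outgroup is taken as
    ancestral. Returns None when the site cannot be polarized this way.
    """
    states = set(alleles)
    if len(states) > 2 or outgroup not in states:
        return None
    return sum(1 for a in alleles if a != outgroup)


def unfolded_sfs(derived_counts: Sequence[int], n: int) -> List[int]:
    """Element i of the result counts sites carrying i derived copies (0..n)."""
    sfs = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"derived count {k} outside 0..{n}")
        sfs[k] += 1
    return sfs


# A derived singleton (true ancestral state A) in a sample of n = 4 alleles.
site = ["A", "A", "A", "G"]
print(derived_count_by_parsimony(site, "A"))  # 1: correctly a singleton
# If the outgroup lineage has itself substituted A -> G, parsimony flips the
# call and the site lands in the high-frequency class n - 1 = 3.
print(derived_count_by_parsimony(site, "G"))  # 3: misassigned

print(unfolded_sfs([1, 1, 3, 0], n=4))  # [1, 2, 0, 1, 0]
```

Each misassigned site moves from class i to class n − i. Because low-frequency classes hold many more sites than high-frequency ones, even a small misassignment rate can noticeably inflate the high copy number elements of the SFS.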

Inferring the Frequency Spectrum of Derived Variants to Quantify Adaptive Molecular Evolution in Protein-Coding Genes of Drosophila melanogaster

Genetics ◽

10.1534/genetics.116.188102 ◽

2016 ◽

Vol 203 (2) ◽

pp. 975-984 ◽

Cited By ~ 43

Author(s):

Peter D. Keightley ◽

José L. Campos ◽

Tom R. Booker ◽

Brian Charlesworth

Keyword(s):

Drosophila Melanogaster ◽

Molecular Evolution ◽

Frequency Spectrum ◽

Protein Coding ◽

Protein Coding Genes ◽

Adaptive Molecular Evolution

Molecular Evolution at the decapentaplegic Locus in Drosophila

Genetics ◽

10.1093/genetics/145.2.297 ◽

1997 ◽

Vol 145 (2) ◽

pp. 297-309 ◽

Cited By ~ 3

Author(s):

Stuart J Newfeld ◽

Richard W Padgett ◽

Seth D Findley ◽

Brent G Richter ◽

Michele Sanicola ◽

...

Keyword(s):

Molecular Evolution ◽

Transforming Growth Factor ◽

Transforming Growth Factor Β ◽

Selective Constraint ◽

Amino Acid Sequences ◽

Regulatory Sequences ◽

Protein Coding ◽

Terminal Ligand ◽

Cdna Sequences ◽

Interspecific Comparisons

Using an elaborate set of cis-regulatory sequences, the decapentaplegic (dpp) gene displays a dynamic pattern of gene expression during development. The C-terminal portion of the DPP protein is processed to generate a secreted signaling molecule belonging to the transforming growth factor-β (TGF-β) family. This signal, the DPP ligand, is able to influence the developmental fates of responsive cells in a concentration-dependent fashion. Here we examine the sequence level organization of a significant portion of the dpp locus in Drosophila melanogaster and use interspecific comparisons with D. simulans, D. pseudoobscura and D. virilis to explore the molecular evolution of the gene. Our interspecific analysis identified significant selective constraint on both the nucleotide and amino acid sequences. As expected, interspecific comparison of protein coding sequences shows that the C-terminal ligand region is highly conserved. However, the central portion of the protein is also conserved, while the N-terminal third is quite variable. Comparison of noncoding regions reveals significant stretches of nucleotide identity in the 3′ untranslated portion of exon 3 and in the intron between exons 2 and 3. An examination of cDNA sequences representing five classes of dpp transcripts indicates that these transcripts encode the same polypeptide.
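
Locating stretches of high nucleotide identity, as in the noncoding comparison above, can be sketched as a sliding-window scan over a pairwise alignment. This minimal Python sketch assumes the two sequences are already aligned to equal length with '-' marking gaps; the window and step sizes are arbitrary illustrative choices, not values from the study, and the function name is hypothetical.

```python
from typing import List, Tuple


def window_identity(seq_a: str, seq_b: str, window: int = 50, step: int = 10) -> List[Tuple[int, float]]:
    """Return (start, fraction identical) for each window of a pairwise alignment.

    Alignment columns where either sequence has a gap ('-') are skipped;
    windows made up entirely of gap columns are omitted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = [
            (x, y)
            for x, y in zip(seq_a[start:start + window], seq_b[start:start + window])
            if x != "-" and y != "-"
        ]
        if pairs:
            matches = sum(1 for x, y in pairs if x.upper() == y.upper())
            results.append((start, matches / len(pairs)))
    return results


print(window_identity("AAAACCCC", "AAAAGGGG", window=4, step=4))
# [(0, 1.0), (4, 0.0)]
```

Windows whose identity stays well above the background level for the species pair would be candidate conserved stretches; deciding what counts as "significant" needs a null model that this sketch does not include.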

Unbiasing in the Genome Analysis of Iconic Shark Species

10.20944/preprints201911.0214.v1 ◽

2019 ◽

Author(s):

Kazuaki Yamaguchi ◽

Shigehiro Kuraku

Keyword(s):

Molecular Evolution ◽

Genome Stability ◽

White Shark ◽

Shark Species ◽

Whole Genome ◽

Potential Bias ◽

Short Article ◽

Protein Coding ◽

Protein Coding Genes ◽

Selection Of

A previous study involving whole genome sequencing of the white shark suggested unique molecular evolution accounting for gigantism and the enhanced longevity of sharks, including positive selection of dozens of protein-coding genes potentially involved in genome stability. We reanalyzed some of these genes and identified serious flaws in the previous study's results. In this short article, we scrutinize one of the serious problems we identified, report other concerns, and point out a potential bias in analyzing iconic shark species in general.

iMKT: the integrative McDonald and Kreitman test

Nucleic Acids Research ◽

10.1093/nar/gkz372 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W283-W288 ◽

Cited By ~ 5

Author(s):

Jesús Murga-Moreno ◽

Marta Coronado-Zamora ◽

Sergi Hervas ◽

Sònia Casillas ◽

Antonio Barbadilla

Keyword(s):

Reference Site ◽

Population Genomics ◽

Genomic Sequence ◽

Sequence Data ◽

Protein Coding ◽

Web Based ◽

Dna Sequence Data ◽

Protein Coding Genes ◽

Population Genomic ◽

Testing Complex

The McDonald and Kreitman test (MKT) is one of the most powerful and widely used methods to detect and quantify recurrent natural selection using DNA sequence data. Here we present iMKT (acronym for integrative McDonald and Kreitman test), a novel web-based service performing four distinct MKT types. It allows the detection and estimation of four different selection regimes (adaptive, neutral, strongly deleterious and weakly deleterious) acting on any genomic sequence. iMKT can analyze both users' own population genomic data and pre-loaded Drosophila melanogaster and human sequences of protein-coding genes obtained from the largest population genomic datasets to date. Advanced options in the website allow testing complex hypotheses such as the application example shown here: do genes located in high recombination regions undergo higher rates of adaptation? We hope that iMKT will become a reference tool for the study of evolutionary adaptation in massive population genomics datasets, especially in Drosophila and humans. iMKT is a free resource online at https://imkt.uab.cat.
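
The standard MKT compares a 2 × 2 table of nonsynonymous and synonymous divergence (Dn, Ds) and polymorphism (Pn, Ps). The minimal Python sketch below computes the neutrality index NI = (Pn/Ps)/(Dn/Ds), the proportion of adaptive substitutions α = 1 − (Ds·Pn)/(Dn·Ps) = 1 − NI, and a two-sided Fisher's exact test p-value using only the standard library. It covers only the standard test, not iMKT's extended variants or its web service, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> dict:
    """Standard MKT summary from divergence (dn, ds) and polymorphism (pn, ps) counts."""
    ni = (pn / ps) / (dn / ds)  # neutrality index
    return {
        "NI": ni,
        "alpha": 1.0 - ni,  # equals 1 - (ds * pn) / (dn * ps)
        "p_value": fisher_exact_two_sided(dn, pn, ds, ps),  # table [[dn, pn], [ds, ps]]
    }


result = mk_test(dn=20, ds=10, pn=5, ps=10)
print(result["NI"], result["alpha"])  # 0.25 0.75
print(result["p_value"])
```

The counts in the usage line are invented for illustration. Slightly deleterious polymorphisms segregating at low frequency inflate Pn and bias α downward, which is part of why extended MKT variants exist.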

Complete Genome Sequence of Salmonella enterica Siphophage Shemara

Microbiology Resource Announcements ◽

10.1128/mra.01518-19 ◽

2020 ◽

Vol 9 (6) ◽

Author(s):

Michael Chung ◽

Yicheng Xie ◽

Heather Newkirk ◽

Mei Liu ◽

Jason J. Gill ◽

...

Keyword(s):

Amino Acid ◽

Genome Sequence ◽

Complete Genome Sequence ◽

Salmonella Enterica ◽

Complete Genome ◽

Protein Coding ◽

Content Type ◽

Protein Coding Genes ◽

Amino Acid Levels

Here, we present the annotated genome of Shemara, a siphophage of Salmonella enterica. The Shemara genome is 44 kb with 83 predicted protein-coding genes. At the nucleotide and amino acid levels, Shemara is most similar to phages in the Guernseyvirinae subfamily.

Comparative Molecular Evolution of Primary (Buchnera) and Secondary Symbionts of Aphids Based on Two Protein-Coding Genes

Journal of Molecular Evolution ◽

10.1007/s00239-001-2307-8 ◽

2002 ◽

Vol 55 (2) ◽

pp. 127-137 ◽

Cited By ~ 18

Author(s):

Andrés Moya ◽

Amparo Latorre ◽

Beatriz Sabater-Muñoz ◽

Francisco J. Silva

Keyword(s):

Molecular Evolution ◽

Protein Coding ◽

Protein Coding Genes

Comparing Molecular Evolution in Two Mitochondrial Protein Coding Genes (Cytochrome b and ND2) in the Dabbling Ducks (Tribe: Anatini)

Molecular Phylogenetics and Evolution ◽

10.1006/mpev.1997.0481 ◽

1998 ◽

Vol 10 (1) ◽

pp. 82-94 ◽

Cited By ~ 194

Author(s):

Kevin P Johnson ◽

Michael D Sorenson

Keyword(s):

Molecular Evolution ◽

Mitochondrial Protein ◽

Dabbling Ducks ◽

Protein Coding ◽

Protein Coding Genes

Utilizing Amino Acid Composition and Entropy of Potential Open Reading Frames to Identify Protein-Coding Genes

Microorganisms ◽

10.3390/microorganisms9010129 ◽

2021 ◽

Vol 9 (1) ◽

pp. 129

Author(s):

Katelyn McNair ◽

Carol L. Ecale Zhou ◽

Brian Souza ◽

Stephanie Malfatti ◽

Robert A. Edwards

Keyword(s):

Amino Acid ◽

Gene Prediction ◽

Training Model ◽

Entropy Density ◽

Open Reading Frames ◽

Initial Training ◽

Training Set ◽

Protein Coding ◽

Protein Coding Genes ◽

Reading Frames

One of the main steps in gene-finding in prokaryotes is determining which open reading frames encode a protein and which occur by chance alone. There are many different methods to differentiate the two; the most prevalent approach is using shared homology with a database of known genes. This method presents many pitfalls, most notably the catch that you only find genes that you have seen before. The four most popular prokaryotic gene-prediction programs (GeneMark, Glimmer, Prodigal, Phanotate) all use a protein-coding training model to predict protein-coding genes, with the latter three allowing the training model to be created ab initio from the input genome. Different methods are available for creating the training model; to increase the accuracy of such tools, we present GOODORFS, a method for identifying protein-coding genes within the set of all possible open reading frames (ORFs). Our workflow takes the amino acid frequencies of each ORF, calculates an entropy density profile (EDP), clusters the EDPs using k-means, and selects the cluster with the lowest variation as the coding ORFs. To test the efficacy of our method, we ran GOODORFS on 14,179 annotated phage genomes and compared our results to the initial training-set creation step of four other similar methods (Glimmer, MED2, PHANOTATE, Prodigal). We found that GOODORFS was the most accurate (0.94) and had the best F1-score (0.85), while Glimmer had the highest precision (0.92) and PHANOTATE had the highest recall (0.96).
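
The workflow summarized above (amino acid frequencies → entropy density profile → k-means → lowest-variation cluster) can be sketched in plain Python. This is an illustrative sketch, not the GOODORFS code. It assumes the EDP is the per-residue entropy terms of the composition normalized by the total Shannon entropy, s_i = −p_i log p_i / H, plus squared Euclidean distance, plain Lloyd's k-means, and mean squared distance to the centroid as the measure of "variation". All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Vector = List[float]


def entropy_density_profile(protein: str) -> Vector:
    """EDP of a translated ORF: s_i = -p_i log(p_i) / H over the 20 amino acids."""
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    terms = []
    for aa in AMINO_ACIDS:
        p = counts[aa] / total
        terms.append(-p * math.log(p) if p > 0 else 0.0)
    h = sum(terms)  # Shannon entropy of the composition
    return [t / h for t in terms] if h > 0 else terms


def _sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def _mean(points: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[List[Vector]]:
    """Plain Lloyd's k-means with centroids initialized from randomly chosen points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters: List[List[Vector]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster empties out.
        new_centroids = [_mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


def within_cluster_variance(cluster: List[Vector]) -> float:
    centre = _mean(cluster)
    return sum(_sq_dist(p, centre) for p in cluster) / len(cluster)


def pick_low_variance_cluster(edps: List[Vector], k: int = 2, seed: int = 0) -> List[Vector]:
    """Return the non-empty cluster with the smallest within-cluster variance."""
    clusters = [c for c in kmeans(edps, k, seed=seed) if c]
    return min(clusters, key=within_cluster_variance)
```

In use, each translated ORF would contribute one 20-dimensional EDP, and the selected cluster would be treated as the putative coding set. A single-member cluster trivially has zero variance, so a real pipeline would need something like a minimum cluster size; that refinement is omitted here.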

The draft genome sequence of Eucalyptus polybractea based on hybrid assembly with short and long reads

10.1101/2021.05.18.444652 ◽

2021 ◽

Author(s):

Teng Li ◽

David Kainer ◽

William J Foley ◽

Allen Rodrigo ◽

Carsten Kuelheim

Keyword(s):

Population Genomics ◽

De Novo ◽

Draft Genome ◽

Hybrid Assembly ◽

Illumina Hiseq ◽

Protein Coding ◽

Genome Coverage ◽

Protein Coding Genes ◽

Long Reads ◽

Long Read

Eucalyptus polybractea is a small, multi-stemmed tree, which is widely cultivated in Australia for the production of Eucalyptus oil. We report the hybrid assembly of the E. polybractea genome utilizing both short- and long-read technology. We generated 44 Gb of Illumina HiSeq short reads and 8 Gb of Nanopore long reads, representing approximately 83 and 15 times genome coverage, respectively. The hybrid-assembled genome, after polishing, contained 24,864 scaffolds with an accumulated length of 523 Mb (N50 = 40.3 kb; BUSCO-calculated genome completeness of 94.3%). The genome contained 35,385 predicted protein-coding genes detected by combining homology-based and de novo approaches. We have provided the first assembled genome based on hybrid sequences from the highly diverse Eucalyptus subgenus Symphyomyrtus, and revealed the value of including long reads from Nanopore technology for enhancing the contiguity of the assembled genome, as well as for improving its completeness. We anticipate that the E. polybractea genome will be an invaluable resource supporting a range of studies in genetics, population genomics and evolution of related species in Eucalyptus.
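
The coverage and N50 figures above follow from two simple calculations: fold coverage is total sequenced bases divided by genome size, and N50 is the scaffold length L such that scaffolds of length ≥ L hold at least half of the assembly. A minimal Python sketch follows. It assumes the 523 Mb assembly length as a stand-in for the genome size, which is an assumption and not necessarily the value the authors used. The function names are hypothetical.

```python
from typing import Iterable


def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Assumption: the 523 Mb assembly length stands in for the genome size.
print(round(fold_coverage(44e9, 523e6), 1))  # 84.1 (Illumina)
print(round(fold_coverage(8e9, 523e6), 1))   # 15.3 (Nanopore)
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))     # 8
```

These values are in line with the reported ~83× and ~15×. The exact figure depends on which genome-size estimate is used as the denominator.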

Genomic analysis of the four ecologically distinct cactus host populations of Drosophila mojavensis

10.1101/530154 ◽

2019 ◽

Cited By ~ 3

Author(s):

Carson W. Allan ◽

Luciano M. Matzkin

Keyword(s):

Molecular Evolution ◽

Genomic Analysis ◽

Drosophila Mojavensis ◽

Shape Variation ◽

Isolated Populations ◽

High Coverage ◽

Protein Coding ◽

Protein Coding Genes ◽

Santa Catalina Island ◽

And Behavior

Background: Relationships between an organism and its environment can be fundamental to understanding how populations change over time and how species arise. Local ecological conditions can shape variation at multiple levels, including the evolutionary history and trajectories of coding genes. This study examines the rate of molecular evolution at protein-coding genes throughout the genome in response to host adaptation in the cactophilic Drosophila mojavensis. These insects are intimately associated with cactus necroses, developing as larvae and feeding as adults in these necrotic tissues. Drosophila mojavensis is composed of four isolated populations across the deserts of western North America, and each population has adapted to utilize different cacti that are chemically, nutritionally, and structurally distinct.

Results: High-coverage Illumina sequencing was performed on three previously unsequenced populations of D. mojavensis. Genomes were assembled using the previously sequenced genome of D. mojavensis from Santa Catalina Island (USA) as a template. Protein-coding genes were aligned across all four populations, and rates of protein evolution were determined for all loci using several approaches.

Conclusions: Loci that exhibited elevated rates of molecular evolution tended to be shorter, have fewer exons and low expression, be transcriptionally responsive to cactus host use, and have fixed expression differences across the four cactus host populations. Fast-evolving genes were involved in metabolism, detoxification, chemosensory reception, reproduction and behavior. The results of this study give insight into the process and the genomic consequences of local ecological adaptation.
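
Rates of protein evolution compared across loci, as above, are commonly summarized as the ratio dN/dS (ω). A minimal Python sketch follows. It assumes nonsynonymous and synonymous sites and differences have already been counted (for example by a Nei-Gojobori-style procedure) and applies the Jukes-Cantor correction to each proportion. The abstract does not specify which approaches the study used, so this is one common measure, not necessarily theirs. The function names and example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float) -> float:
    """omega = dN / dS from counts of differences and sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0.0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return dn / ds


# Hypothetical locus: 6 nonsynonymous differences over 300 sites,
# 10 synonymous differences over 100 sites.
print(round(dn_ds(6, 300, 10, 100), 3))  # well below 1: strong constraint
```

Loci with higher ω than the genome-wide distribution would be candidates for the "elevated rates of molecular evolution" discussed above. Values near or above 1 suggest relaxed constraint or positive selection.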
