Inferring the frequency spectrum of derived variants to quantify adaptive molecular evolution in protein-coding genes of Drosophila melanogaster

2016 ◽  
Author(s):  
Peter D. Keightley ◽  
Jose Campos ◽  
Tom Booker ◽  
Brian Charlesworth

Many approaches for inferring adaptive molecular evolution analyze the unfolded site frequency spectrum (SFS), a vector of counts of sites with different numbers of copies of derived alleles in a sample of alleles from a population. Accurate inference of the high copy number elements of the SFS is difficult, however, because of misassignment of alleles as derived versus ancestral. This is a known problem with parsimony using outgroup species. Here, we show that the problem is particularly serious if there is variation in the substitution rate among sites brought about by variation in selective constraint levels. We present a new method for inferring the SFS using one or two outgroups, which attempts to overcome the problem of misassignment. We show that two outgroups are required for accurate estimation of the SFS if there is substantial variation in selective constraints, which is expected to be the case for nonsynonymous sites of protein-coding genes. We apply the method to estimate unfolded SFSs for synonymous and nonsynonymous sites from Phase 2 of the Drosophila Population Genomics Project. We use the unfolded spectra to estimate the frequency and strength of advantageous and deleterious mutations, and estimate that ~50% of amino acid substitutions are positively selected, but that less than 0.5% of new amino acid mutations are beneficial, with a scaled selection strength of Nes ≈ 12.
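To make the definition of the unfolded SFS concrete, the following minimal sketch tallies sites by their number of derived-allele copies and shows how misassigning the ancestral state at a single site moves a low-frequency variant into the high copy number end of the spectrum. It illustrates the definition only and is not the inference method described in the abstract; the function names and the toy counts are assumptions made for this example.

```python
def unfolded_sfs(derived_counts, n):
    """Return a list of length n + 1 where entry i is the number of sites
    carrying i copies of the derived allele in a sample of n alleles."""
    sfs = [0] * (n + 1)
    for count in derived_counts:
        if not 0 <= count <= n:
            raise ValueError(f"derived count {count} outside 0..{n}")
        sfs[count] += 1
    return sfs


def mispolarize(derived_counts, n, wrong_sites):
    """Simulate ancestral-state misassignment: at each site index listed in
    wrong_sites, the derived and ancestral alleles are swapped, so a count
    of c copies is recorded as n - c."""
    return [n - c if i in wrong_sites else c
            for i, c in enumerate(derived_counts)]


# Hypothetical toy data: derived-allele counts at five sites, n = 10 alleles.
counts = [1, 1, 2, 9, 0]
true_sfs = unfolded_sfs(counts, 10)

# Misassigning site 0 turns a singleton into an apparent 9-copy variant.
observed_sfs = unfolded_sfs(mispolarize(counts, 10, {0}), 10)
```

In this toy case one misassigned singleton doubles the count in the 9-copy class, which is why the high copy number elements are the ones most sensitive to polarization errors.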

Genetics ◽  
1997 ◽  
Vol 145 (2) ◽  
pp. 297-309 ◽  
Author(s):  
Stuart J Newfeld ◽  
Richard W Padgett ◽  
Seth D Findley ◽  
Brent G Richter ◽  
Michele Sanicola ◽  
...  

Using an elaborate set of cis-regulatory sequences, the decapentaplegic (dpp) gene displays a dynamic pattern of gene expression during development. The C-terminal portion of the DPP protein is processed to generate a secreted signaling molecule belonging to the transforming growth factor-β (TGF-β) family. This signal, the DPP ligand, is able to influence the developmental fates of responsive cells in a concentration-dependent fashion. Here we examine the sequence-level organization of a significant portion of the dpp locus in Drosophila melanogaster and use interspecific comparisons with D. simulans, D. pseudoobscura and D. virilis to explore the molecular evolution of the gene. Our interspecific analysis identified significant selective constraint on both the nucleotide and amino acid sequences. As expected, interspecific comparison of protein-coding sequences shows that the C-terminal ligand region is highly conserved. However, the central portion of the protein is also conserved, while the N-terminal third is quite variable. Comparison of noncoding regions reveals significant stretches of nucleotide identity in the 3′ untranslated portion of exon 3 and in the intron between exons 2 and 3. An examination of cDNA sequences representing five classes of dpp transcripts indicates that these transcripts encode the same polypeptide.


Author(s):  
Kazuaki Yamaguchi ◽  
Shigehiro Kuraku

A previous study involving whole genome sequencing of the white shark suggested unique molecular evolution accounting for gigantism and the enhanced longevity of sharks including positive selection of dozens of protein-coding genes potentially involved in genome stability. We performed a reanalysis on some of the genes and identified serious flaws in their results. In this short article, we scrutinize one of the serious problems we identified, report other concerns, and point out a potential bias in analyzing iconic shark species in general.


2019 ◽  
Vol 47 (W1) ◽  
pp. W283-W288 ◽  
Author(s):  
Jesús Murga-Moreno ◽  
Marta Coronado-Zamora ◽  
Sergi Hervas ◽  
Sònia Casillas ◽  
Antonio Barbadilla

The McDonald and Kreitman test (MKT) is one of the most powerful and widely used methods to detect and quantify recurrent natural selection using DNA sequence data. Here we present iMKT (acronym for integrative McDonald and Kreitman test), a novel web-based service performing four distinct MKT types. It allows the detection and estimation of four different selection regimes (adaptive, neutral, strongly deleterious and weakly deleterious) acting on any genomic sequence. iMKT can analyze both users' own population genomic data and pre-loaded Drosophila melanogaster and human sequences of protein-coding genes obtained from the largest population genomic datasets to date. Advanced options in the website allow testing complex hypotheses such as the application example shown here: do genes located in high recombination regions undergo higher rates of adaptation? We aim for iMKT to become a reference tool for the study of evolutionary adaptation in massive population genomics datasets, especially in Drosophila and humans. iMKT is a free resource online at https://imkt.uab.cat.
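As a rough illustration of what an MKT quantifies, the sketch below computes the classical estimate of the proportion of adaptive substitutions, α = 1 − (Ds·Pn)/(Dn·Ps), from the four counts of the standard MKT table. This is the basic textbook estimator only; it is not the iMKT code, and it does not reproduce the four MKT variants that the service offers. The function name and the counts are hypothetical.

```python
def mkt_alpha(dn, ds, pn, ps):
    """Classical MKT estimate of the proportion of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous divergent (fixed) sites
    pn, ps: nonsynonymous and synonymous polymorphic sites
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for a single gene.
alpha = mkt_alpha(dn=40, ds=100, pn=20, ps=100)
```

With these made-up counts the ratio of nonsynonymous to synonymous changes is twice as high for divergence as for polymorphism, so half of the nonsynonymous substitutions are attributed to positive selection.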


2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Michael Chung ◽  
Yicheng Xie ◽  
Heather Newkirk ◽  
Mei Liu ◽  
Jason J. Gill ◽  
...  

Here, we present the annotated genome of Shemara, a siphophage of Salmonella enterica. The Shemara genome is 44 kb with 83 predicted protein-coding genes. At the nucleotide and amino acid levels, Shemara is most similar to phages in the Guernseyvirinae subfamily.


2002 ◽  
Vol 55 (2) ◽  
pp. 127-137 ◽  
Author(s):  
Andrés Moya ◽  
Amparo Latorre ◽  
Beatriz Sabater-Muñoz ◽  
Francisco J. Silva

2021 ◽  
Vol 9 (1) ◽  
pp. 129
Author(s):  
Katelyn McNair ◽  
Carol L. Ecale Zhou ◽  
Brian Souza ◽  
Stephanie Malfatti ◽  
Robert A. Edwards

One of the main steps in gene-finding in prokaryotes is determining which open reading frames encode a protein, and which occur by chance alone. There are many different methods to differentiate the two; the most prevalent approach is using shared homology with a database of known genes. This method presents many pitfalls, most notably the catch that you only find genes that you have seen before. The four most popular prokaryotic gene-prediction programs (GeneMark, Glimmer, Prodigal, PHANOTATE) all use a protein-coding training model to predict protein-coding genes, with the latter three allowing for the training model to be created ab initio from the input genome. Different methods are available for creating the training model, and to increase the accuracy of such tools, we present here GOODORFS, a method for identifying protein-coding genes within a set of all possible open reading frames (ORFs). Our workflow begins with taking the amino acid frequencies of each ORF, calculating an entropy density profile (EDP), using KMeans to cluster the EDPs, and then selecting the cluster with the lowest variation as the coding ORFs. To test the efficacy of our method, we ran GOODORFS on 14,179 annotated phage genomes, and compared our results to the initial training-set creation step of four other similar methods (Glimmer, MED2, PHANOTATE, Prodigal). We found that GOODORFS was the most accurate (0.94) and had the best F1-score (0.85), while Glimmer had the highest precision (0.92) and PHANOTATE had the highest recall (0.96).
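The workflow in this abstract (amino acid frequencies, entropy density profile, k-means clustering, lowest-variation cluster) can be sketched in a few lines. The version below is a simplified stand-in, not the GOODORFS implementation. It assumes the EDP of an ORF is s_i = −p_i·log(p_i)/H, where p_i is the frequency of amino acid i and H is the Shannon entropy of the frequencies, and it assumes "variation" means the mean squared distance of cluster members to their centroid. The small pure-Python k-means takes the place of a library KMeans, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def edp(protein):
    """Entropy density profile (20 values) of one translated ORF."""
    counts = Counter(aa for aa in protein if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    terms = []
    for aa in AMINO_ACIDS:
        p = counts[aa] / total
        terms.append(-p * math.log(p) if p > 0 else 0.0)
    entropy = sum(terms)
    if entropy == 0:
        # A single repeated residue has zero entropy; return an all-zero profile.
        return [0.0] * len(AMINO_ACIDS)
    return [t / entropy for t in terms]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def lowest_variation_cluster(points, labels, centroids):
    """Label of the non-empty cluster with the smallest mean squared
    distance between its members and its centroid."""
    best, best_var = None, float("inf")
    for j, centroid in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        var = sum(sq_dist(p, centroid) for p in members) / len(members)
        if var < best_var:
            best, best_var = j, var
    return best


# Toy 2-D points standing in for 20-dimensional EDP vectors.
toy_points = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [9.0, 9.0]]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
tight = lowest_variation_cluster(toy_points, toy_labels, toy_centroids)
```

In a real run, `points` would be the EDP vectors of every candidate ORF in a genome, and the ORFs assigned to the selected cluster would form the putative coding set. The deterministic initialization keeps the sketch reproducible; a production clustering step would normally use several random restarts.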


2021 ◽  
Author(s):  
Teng Li ◽  
David Kainer ◽  
William J Foley ◽  
Allen Rodrigo ◽  
Carsten Kuelheim

Eucalyptus polybractea is a small, multi-stemmed tree, which is widely cultivated in Australia for the production of Eucalyptus oil. We report the hybrid assembly of the E. polybractea genome utilizing both short- and long-read technology. We generated 44 Gb of Illumina HiSeq short reads and 8 Gb of Nanopore long reads, representing approximately 83 and 15 times genome coverage, respectively. The hybrid-assembled genome, after polishing, contained 24,864 scaffolds with an accumulated length of 523 Mb (N50 = 40.3 kb; BUSCO-calculated genome completeness of 94.3%). The genome contained 35,385 predicted protein-coding genes detected by combining homology-based and de novo approaches. We have provided the first assembled genome based on hybrid sequences from the highly diverse Eucalyptus subgenus Symphyomyrtus, and revealed the value of including long-reads from Nanopore technology for enhancing the contiguity of the assembled genome, as well as for improving its completeness. We anticipate that the E. polybractea genome will be an invaluable resource supporting a range of studies in genetics, population genomics and evolution of related species in Eucalyptus.
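For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below shows the standard definition: the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the assembly length. The function name and the scaffold lengths are hypothetical and unrelated to the E. polybractea assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kb.
example_n50 = n50([2, 3, 4, 5, 6])
```

Here the total is 20 kb, and the two longest scaffolds (6 kb and 5 kb) together pass the 10 kb halfway mark, so the N50 is 5 kb.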


2019 ◽  
Author(s):  
Carson W. Allan ◽  
Luciano M. Matzkin

Background: Relationships between an organism and its environment can be fundamental in understanding how populations change over time and species arise. Local ecological conditions can shape variation at multiple levels, among them the evolutionary history and trajectories of coding genes. This study examines the rate of molecular evolution at protein-coding genes throughout the genome in response to host adaptation in the cactophilic Drosophila mojavensis. These insects are intimately associated with cactus necroses, developing as larvae and feeding as adults in these necrotic tissues. Drosophila mojavensis is composed of four isolated populations across the deserts of western North America and each population has adapted to utilize different cacti that are chemically, nutritionally, and structurally distinct. Results: High coverage Illumina sequencing was performed on three previously unsequenced populations of D. mojavensis. Genomes were assembled using the previously sequenced genome of D. mojavensis from Santa Catalina Island (USA) as a template. Protein-coding genes were aligned across all four populations and rates of protein evolution were determined for all loci using several approaches. Conclusions: Loci that exhibited elevated rates of molecular evolution tended to be shorter, have fewer exons, have low expression, be transcriptionally responsive to cactus host use and have fixed expression differences across the four cactus host populations. Fast-evolving genes were involved with metabolism, detoxification, chemosensory reception, reproduction and behavior. The results of this study give insight into the process and the genomic consequences of local ecological adaptation.

