Frequent birth of de novo genes in the compact yeast genome

De novo gene evolution: How do we transition from non-coding to coding?

10.7287/peerj.preprints.3031v1 ◽

2017 ◽

Author(s):

Jorge Ruiz-Orera ◽

José Luis Villanueva-Cañas ◽

William Blevins ◽

M.Mar Albà

Keyword(s):

De Novo ◽

Gene Evolution ◽

Purifying Selection ◽

Neutral Evolution ◽

Functional Protein ◽

Protein Coding ◽

Coding Sequences ◽

Sequence Composition ◽

Protein Coding Genes ◽

Small Proteins

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNAs). Recently, sequencing of ribosome-protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are actually translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs are likely to be true protein-coding genes (and thus previously misclassified), the bulk of lncRNAs code for proteins whose variation patterns are consistent with neutral evolution. We also show that ORFs with a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.
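
The abstract gauges purifying selection by comparing non-synonymous with synonymous variation. A minimal sketch of that comparison as a pN/pS ratio, assuming per-ORF counts of polymorphisms and of non-synonymous and synonymous sites are already available; the function name, the tuple layout and the example counts are hypothetical, and the authors' actual pipeline may differ. Because short lncRNA ORFs carry few polymorphisms each, the sketch pools counts across ORFs before taking the ratio.

```python
from typing import Iterable, Tuple

# One ORF's counts: (non-synonymous polymorphisms, synonymous polymorphisms,
#                    non-synonymous sites, synonymous sites)  -- hypothetical layout
OrfCounts = Tuple[int, int, float, float]


def pooled_pn_ps(orfs: Iterable[OrfCounts]) -> float:
    """pN/pS computed from counts pooled over many (short) ORFs.

    Values well below 1 suggest purifying selection; values near 1 are
    consistent with neutral evolution of the encoded peptides.
    """
    n_poly = s_poly = 0
    n_sites = s_sites = 0.0
    for n_count, s_count, n_site, s_site in orfs:
        n_poly += n_count
        s_poly += s_count
        n_sites += n_site
        s_sites += s_site
    if s_poly == 0 or n_sites == 0 or s_sites == 0:
        raise ValueError("need synonymous polymorphisms and non-zero site counts")
    pn = n_poly / n_sites  # non-synonymous polymorphisms per non-synonymous site
    ps = s_poly / s_sites  # synonymous polymorphisms per synonymous site (neutral proxy)
    return pn / ps
```

For two hypothetical ORFs, `pooled_pn_ps([(2, 5, 150.0, 50.0), (3, 15, 150.0, 50.0)])` pools to (5/300)/(20/100), roughly 0.083, a value that would point to purifying selection rather than neutrality.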

Contribution of retrotransposition to developmental disorders

Nature Communications ◽

10.1038/s41467-019-12520-y ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 10

Author(s):

Eugene J. Gardner ◽

Elena Prigmore ◽

Giuseppe Gallone ◽

Petr Danecek ◽

Kaitlin E. Samocha ◽

...

Keyword(s):

Developmental Disorders ◽

De Novo ◽

Purifying Selection ◽

Selective Constraint ◽

Protein Coding ◽

Genome Wide ◽

De Novo Gene ◽

The Impact ◽

Transcribed Sequences

Mobile genetic elements (MEs) are segments of DNA which can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans, several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome-sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patient's symptoms (0.04% of probands), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate the genome-wide germline ME mutation rate and selective constraint, and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein-coding genes and a framework for future evolutionary and disease studies.
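
The 0.04% figure is the share of probands carrying a likely causative de novo ME. A quick check of that arithmetic, using only the two counts stated in the abstract:

```python
probands = 9738        # exome-sequenced trios with DD-affected probands
likely_causative = 4   # de novo MEs judged likely causative
yield_pct = 100 * likely_causative / probands
print(f"{yield_pct:.2f}% of probands")  # prints 0.04% of probands
```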

Standard codon substitution models overestimate purifying selection for non-stationary data

10.7287/peerj.preprints.2218v1 ◽

2016 ◽

Author(s):

Benjamin D Kaehler ◽

Von Bing Yap ◽

Gavin A Huttley

Keyword(s):

Natural Selection ◽

De Novo ◽

Purifying Selection ◽

Neutral Evolution ◽

Protein Coding ◽

Synonymous Substitutions ◽

New Model ◽

Sequence Composition ◽

Codon Substitution ◽

Substitution Models

Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of non-synonymous substitutions to the rate of neutral evolution, typically assumed to be the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied blindly in the presence of changing sequence composition, the number of substitutions is systematically biased towards overestimation. We extend these findings to the case of codon substitution models and further demonstrate that the ratio of non-synonymous to synonymous rates of substitution tends to be underestimated across three data sets of insects, mammals, and vertebrates. Our basis for comparison is a non-stationary codon substitution model that allows sequence composition to change. Model selection and model fit results demonstrate that our new model tends to fit the data better. Direct measurement of non-stationarity shows that bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.
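
Why time-reversibility implies fixed composition can be shown in one line. Using the standard definitions (not taken from the abstract): let $Q$ be the substitution rate matrix, whose rows sum to zero, and $\pi$ the equilibrium base or codon frequencies. Detailed balance forces $\pi$ to be stationary, so a sequence that starts at $\pi$, as reversible models usually assume, keeps that composition on every branch:

```latex
% Q: rate matrix with \sum_i Q_{ji} = 0 for every row j;  \pi: equilibrium frequencies
\pi_i Q_{ij} = \pi_j Q_{ji} \;\; \forall\, i,j
\;\Longrightarrow\;
\sum_i \pi_i Q_{ij} = \pi_j \sum_i Q_{ji} = 0
\;\Longrightarrow\;
\pi Q = \mathbf{0}
\;\Longrightarrow\;
\pi\, e^{Qt} = \pi \quad \text{for all } t \ge 0 .
```

A non-stationary model drops the detailed-balance constraint, so the composition $\pi(t) = \pi(0)\, e^{Qt}$ is free to change along a branch.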

Extreme purifying selection against point mutations in the human genome

10.1101/2021.08.23.457339 ◽

2021 ◽

Author(s):

Noah Dukler ◽

Mehreen R Mughal ◽

Ritika Ramani ◽

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Human Genome ◽

De Novo ◽

Point Mutations ◽

Purifying Selection ◽

Selection Coefficient ◽

Sequencing Data ◽

Protein Coding ◽

Coding Regions ◽

Protein Coding Genes ◽

Selective Effects

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious.
Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
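
The abstract defines λs as the fractional depletion of rare variants at target sites relative to matched neutral sites. A deliberately simplified sketch of that ratio follows. Unlike ExtRaINSIGHT, it assumes target and neutral sites already share the same expected mutation rate, so it applies no model of local or neighbor-dependent mutation-rate variation; the function name and counts are hypothetical.

```python
def naive_lambda_s(rare_target: int, n_target: int,
                   rare_neutral: int, n_neutral: int) -> float:
    """Simplified ultraselection estimate: 1 - (target rare-SNV rate / neutral rate).

    Simplifying assumption (not part of ExtRaINSIGHT): target and neutral
    sites have identical expected mutation rates, so the raw per-site rates
    of rare variants are compared directly.
    """
    if n_target <= 0 or n_neutral <= 0 or rare_neutral <= 0:
        raise ValueError("need positive site counts and a positive neutral rare-variant count")
    target_rate = rare_target / n_target     # rare SNVs (MAF < 0.1%) per target site
    neutral_rate = rare_neutral / n_neutral  # rare SNVs per matched neutral site
    return 1.0 - target_rate / neutral_rate
```

A value near 0 means rare variants occur at the neutral rate; a value near 1 means almost all of the rare variants expected at those sites are missing, the pattern expected when new mutations there are strongly deleterious.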

The Paf1 complex broadly impacts the transcriptome of Saccharomyces cerevisiae

10.1101/567495 ◽

2019 ◽

Author(s):

Mitchell A. Ellison ◽

Alex R. Lederer ◽

Marcie H. Warner ◽

Travis Mavrich ◽

Elizabeth A. Raupach ◽

...

Keyword(s):

Gene Expression ◽

De Novo ◽

Differential Expression Analysis ◽

Chromatin Modification ◽

Antisense Transcription ◽

Yeast Genome ◽

Mrna Levels ◽

Nascent Transcript ◽

Protein Coding ◽

Multiple Loci

The Polymerase Associated Factor 1 complex (Paf1C) is a multifunctional regulator of eukaryotic gene expression important for the coordination of transcription with chromatin modification and post-transcriptional processes. In this study, we investigated the extent to which the functions of Paf1C combine to regulate the Saccharomyces cerevisiae transcriptome. While previous studies focused on the roles of Paf1C in controlling mRNA levels, here we took advantage of a genetic background that enriches for unstable transcripts and demonstrate that deletion of PAF1 affects all classes of Pol II transcripts, including multiple classes of noncoding RNAs. By conducting a de novo differential expression analysis independent of gene annotations, we found that Paf1 positively and negatively regulates antisense transcription at multiple loci. Comparisons with nascent transcript data revealed that many, but not all, changes in RNA levels detected by our analysis are due to changes in transcription instead of post-transcriptional events. To investigate the mechanisms by which Paf1 regulates protein-coding genes, we focused on genes involved in iron and phosphate homeostasis, which were differentially affected by PAF1 deletion. Our results indicate that Paf1 stimulates phosphate gene expression through a mechanism that is independent of any individual Paf1C-dependent histone modification. In contrast, the inhibition of iron gene expression by Paf1 correlates with a defect in H3 K36 tri-methylation. Finally, we showed that one iron regulon gene, FET4, is coordinately controlled by Paf1 and transcription of upstream noncoding DNA. Together these data identify roles for Paf1C in controlling both coding and noncoding regions of the yeast genome.
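
An annotation-independent differential expression analysis compares signal along the genome directly instead of counting reads per annotated gene. One simple way to do that is sketched below, assuming per-base coverage on one strand for two conditions. The fixed window size, the pseudocount and the function name are illustrative assumptions, not the study's method, and the sketch omits library-size normalization and replicate statistics.

```python
import math
from typing import List, Sequence


def window_log2fc(cov_a: Sequence[float], cov_b: Sequence[float],
                  window: int = 100, pseudocount: float = 1.0) -> List[float]:
    """log2((B + pc) / (A + pc)) of summed coverage in consecutive windows.

    cov_a, cov_b: per-base coverage for condition A (e.g. wild type) and
    condition B (e.g. a deletion mutant) along one strand of one chromosome.
    """
    if len(cov_a) != len(cov_b):
        raise ValueError("coverage tracks must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for start in range(0, len(cov_a), window):
        a = sum(cov_a[start:start + window])  # total signal in condition A
        b = sum(cov_b[start:start + window])  # total signal in condition B
        out.append(math.log2((b + pseudocount) / (a + pseudocount)))
    return out
```

Because windows tile the whole strand, changes in unannotated antisense or intergenic transcripts are captured just like changes in annotated genes.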

Contribution of Retrotransposition to Developmental Disorders

10.1101/471375 ◽

2018 ◽

Cited By ~ 2

Author(s):

Eugene J. Gardner ◽

Elena Prigmore ◽

Giuseppe Gallone ◽

Petr Danecek ◽

Kaitlin E. Samocha ◽

...

Keyword(s):

Developmental Disorders ◽

De Novo ◽

Purifying Selection ◽

Mobile Genetic Elements ◽

Protein Coding ◽

Protein Coding Genes ◽

Genome Wide ◽

The Impact ◽

Transcribed Sequences

Mobile genetic elements (MEs) are segments of DNA which can generate new copies of themselves and of other transcribed sequences via an RNA intermediate, a process known as retrotransposition (RT). In humans, several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. To address this, we have identified RT-derived events in 9,738 exome-sequenced trios with DD-affected probands as part of the Deciphering Developmental Disorders (DDD) study. We have ascertained 9 de novo MEs, 4 of which are likely causative of the patient's symptoms (0.04% of probands), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we have estimated genome-wide germline ME mutation rates and constraint, and demonstrated that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein-coding genes and a framework for future evolutionary and disease studies.

De novo gene evolution: How do we transition from non-coding to coding?

10.7287/peerj.preprints.3031 ◽

2017 ◽

Author(s):

Jorge Ruiz-Orera ◽

José Luis Villanueva-Cañas ◽

William Blevins ◽

M.Mar Albà

Keyword(s):

De Novo ◽

Gene Evolution ◽

Neutral Evolution ◽

Functional Protein ◽

Protein Coding ◽

Coding Sequences ◽

Sequence Composition ◽

Protein Coding Genes ◽

Small Proteins ◽

De Novo Gene

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNAs). Recently, sequencing of ribosome-protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are actually translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs are likely to be true protein-coding genes (and thus previously misclassified), the bulk of lncRNAs code for proteins whose variation patterns are consistent with neutral evolution. We also show that ORFs with a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.

Turdoides affinis mitogenome reveals the translational efficiency and importance of NADH dehydrogenase complex-I in the Leiothrichidae family

Scientific Reports ◽

10.1038/s41598-020-72674-4 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Indrani Sarkar ◽

Prateek Dey ◽

Sanjeev Kumar Sharma ◽

Swapna Devi Ray ◽

Venkata Hanumat Sastry Kochiganti ◽

...

Keyword(s):

De Novo ◽

Purifying Selection ◽

Sister Group ◽

Synonymous Substitution ◽

Translational Efficiency ◽

Evolutionary Analysis ◽

Peninsular India ◽

Protein Coding ◽

Protein Coding Genes ◽

Complete Mitogenome

The mitochondrial genome provides useful information about a species' evolution and phylogenetics. We took advantage of high-throughput next-generation sequencing to sequence the complete mitogenome of the yellow-billed babbler (Turdoides affinis), a species endemic to Peninsular India and Sri Lanka. Both reference-based and de novo assemblies of the mitogenome were performed, and the de novo assembly was found to be the most appropriate. The complete mitogenome of the yellow-billed babbler (assembled de novo) was 17,672 bp in length with 53.2% AT composition. Thirteen protein-coding genes, along with two rRNAs and 22 tRNAs, were detected. The arrangement of these genes was found to be conserved among Leiothrichidae mitogenomes. Duplicated control regions were found in the newly sequenced mitogenome. Downstream bioinformatics analysis revealed the effects of translational efficiency and purifying selection on the thirteen protein-coding genes of the yellow-billed babbler mitogenome. Ka/Ks analysis indicated the highest synonymous substitution rate in the nad6 gene. Evolutionary analysis revealed the conserved nature of all the protein-coding genes across Leiothrichidae mitogenomes. Our limited phylogenetic results placed T. affinis in a separate group, sister to Garrulax. Overall, our results provide useful information for future studies on the evolutionary and adaptive mechanisms of birds belonging to the Leiothrichidae family.
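
The 53.2% AT composition is the share of A and T among the assembled bases. A minimal sketch of that calculation; the function name is hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption about how such a figure would be computed.

```python
def at_percent(seq: str) -> float:
    """Percentage of A/T among the unambiguous A, C, G, T bases of a sequence."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    acgt = at + seq.count("C") + seq.count("G")  # ambiguous bases (N, R, Y...) ignored
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * at / acgt
```

Applied to the full assembled mitogenome string, this would return the AT percentage directly.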

De novo variants in population constrained fetal brain enhancers and intellectual disability

10.1101/621029 ◽

2019 ◽

Author(s):

Matias G De Vas ◽

Myles G Garstang ◽

Shweta S Joshi ◽

Tahir N Khan ◽

Goutham Atla ◽

...

Keyword(s):

Intellectual Disability ◽

De Novo ◽

System Development ◽

Fetal Brain ◽

Purifying Selection ◽

Nervous System Development ◽

Loss Of Function ◽

Enhancer Activity ◽

Protein Coding ◽

Preferential Expression

Purpose: The genetic etiology of a major fraction of patients with intellectual disability (ID) remains unknown. De novo mutations (DNMs) in protein-coding genes explain up to 40% of cases, but the potential role of regulatory DNMs is still poorly understood.

Methods: We sequenced 70 whole genomes from 24 ID probands and their unaffected parents and analyzed 30 previously sequenced genomes from exome-negative ID probands.

Results: We found that de novo variants (DNVs) were selectively enriched in fetal brain-specific enhancers that show purifying selection in the human population. DNV-containing enhancers were associated with genes that show preferential expression in the prefrontal cortex, have been previously implicated in ID or related disorders, and exhibit intolerance to loss-of-function variants. DNVs from ID probands preferentially disrupted putative binding sites of neuronal transcription factors compared with DNVs from healthy individuals, and most showed allele-specific enhancer activity. In addition, we identified recurrently mutated enhancer clusters that regulate genes involved in nervous system development (CSMD1, OLFM1 and POU3F3). Moreover, CRISPR-based perturbation of a DNV-containing enhancer caused CSMD1 overexpression and abnormal expression of neurodevelopmental regulators.

Conclusion: Our results provide new evidence that DNVs in constrained fetal brain-specific enhancers play a role in the etiology of ID.
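
An enrichment of de novo variants in a class of enhancers is typically judged by comparing the observed count with the count expected from region size and mutation rate. A minimal sketch of a one-sided Poisson test for such an excess, assuming the expected count has already been computed; the function name is hypothetical, and the study itself may have used a different test or expectation model.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected): a one-sided enrichment p-value."""
    if observed <= 0:
        return 1.0
    # 1 - P(X <= observed - 1), summing the Poisson PMF term by term
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)
```

For example, `poisson_upper_tail(12, 4.0)` would give the probability of seeing 12 or more DNVs in regions where 4 were expected by chance (both numbers hypothetical).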

A map of constrained coding regions in the human genome

10.1101/220814 ◽

2017 ◽

Cited By ~ 8

Author(s):

James M. Havrilla ◽

Brent S. Pedersen ◽

Ryan M. Layer ◽

Aaron R. Quinlan

Keyword(s):

Human Genome ◽

Developmental Disorders ◽

De Novo ◽

Purifying Selection ◽

Protein Domain ◽

De Novo Mutations ◽

Protein Coding ◽

Constrained Coding ◽

Coding Regions ◽

Pathogenic Variants

Deep catalogs of genetic variation collected from many thousands of humans enable the detection of intraspecies constraint by revealing coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single metrics cannot capture the fine-scale variability in constraint within each protein-coding gene. To provide greater resolution, we have created a detailed map of constrained coding regions (CCRs) in the human genome by leveraging coding variation observed among 123,136 humans from the Genome Aggregation Database (gnomAD). The most constrained coding regions in our map are enriched for both pathogenic variants in ClinVar and de novo mutations underlying developmental disorders. CCRs also reveal protein domain families under high constraint, suggest unannotated or incomplete protein domains, and facilitate the prioritization of previously unseen variation in studies of disease. Finally, a subset of CCRs with the highest constraint likely lie within genes that cause as-yet-unobserved human phenotypes, owing to strong purifying selection.
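
The intuition behind a constrained coding region is a stretch of coding sequence in which no variant has been observed across many individuals. A minimal sketch of that first step, assuming coding positions have been concatenated into one coordinate space and observed variant positions are given as 0-based offsets within it; the function name is hypothetical, and the published map refines such regions with further modeling that is not reproduced here.

```python
from typing import List, Sequence, Tuple


def variant_free_regions(length: int, variant_positions: Sequence[int],
                         min_len: int = 1) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of coding positions with no observed variant.

    length: total number of concatenated coding positions.
    variant_positions: 0-based offsets of observed variants in that space.
    """
    regions = []
    start = 0
    for pos in sorted(set(variant_positions)):
        if not 0 <= pos < length:
            raise ValueError(f"variant position {pos} out of range")
        if pos - start >= min_len:
            regions.append((start, pos))  # run ends just before this variant
        start = pos + 1
    if length - start >= min_len:
        regions.append((start, length))   # trailing run after the last variant
    # Longest variant-free stretches first: candidates for the strongest constraint.
    return sorted(regions, key=lambda r: r[1] - r[0], reverse=True)
```

For example, `variant_free_regions(10, [3, 4, 8])` yields `[(0, 3), (5, 8), (9, 10)]`; ties keep their genomic order because Python's sort remains stable with `reverse=True`.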
