The landscape of somatic mutations in protein coding genes in apparently benign human tissues carries signatures of relaxed purifying selection

Vinod Kumar Yadav; James DeGregori; Subhajyoti De

doi:10.1093/nar/gkw086

Analysis of Stop Codons within Prokaryotic Protein-Coding Genes Suggests Frequent Readthrough Events

International Journal of Molecular Sciences, 2021, Vol 22 (4), pp. 1876, doi:10.3390/ijms22041876

Author(s): Frida Belinky, Ishan Ganguly, Eugenia Poliakov, Vyacheslav Yurchenko, Igor B. Rogozin

Keyword(s): Stop Codon, Purifying Selection, Protein Product, Intermediate Step, Protein Coding, Stop Codons, Protein Coding Genes, Synonymous Sites, Prokaryotic Protein, Sense Codon

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons—where an intermediate step is a nonsense substitution—show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: Such codons are likely to cause a substantial decrease of protein expression levels.
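
As a minimal illustration of what "in-frame stop codon" means in the abstract above, the sketch below scans a coding sequence codon by codon and reports internal stops. It is not the authors' pipeline: the function name `in_frame_stops` and the use of the standard stop codons (TAA, TAG, TGA) are assumptions made here for illustration.

```python
# Stop codons of the standard genetic code (assumption: some bacterial
# lineages use other translation tables, which this sketch ignores).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of internal in-frame stop codons.

    The final codon is skipped because a terminal stop is expected there.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3  # any trailing partial codon is ignored
    return [
        i
        for i in range(n_codons - 1)
        if seq[3 * i : 3 * i + 3] in STOP_CODONS
    ]


# Toy sequence: ATG AAA TAG GGC TGA (an internal TAG at codon index 2).
print(in_frame_stops("ATGAAATAGGGCTGA"))  # [2]
```

A gene flagged this way would conventionally be called a pseudogene; the abstract argues that conservation (dN/dS below 1) suggests many such genes are still functional.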

Quantifying gene selection in cancer through protein functional alteration bias

Nucleic Acids Research, 2019, Vol 47 (13), pp. 6642-6655, doi:10.1093/nar/gkz546

Cited By ~ 7

Author(s): Nadav Brandes, Nathan Linial, Michal Linial

Keyword(s): Somatic Mutations, Gene Selection, De Novo, Cancer Genes, Driver Genes, Protein Coding, Protein Coding Genes, Machine Learning Model, Implicit And Explicit, False Discoveries

Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods have been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancerous samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
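
The idea of comparing a gene's observed mutations to that gene's own background can be sketched as a simple resampling test. This is only an illustration of the general principle, not FABRIC's actual statistics or interface: the function `empirical_bias_pvalue`, the score convention (lower score means more damaging), and the toy numbers are all assumptions.

```python
import random


def empirical_bias_pvalue(observed, background, n_draws=10_000, seed=0):
    """One-sided empirical p-value for a bias towards damaging mutations.

    observed   -- effect scores of the mutations actually seen in a gene
    background -- effect scores of all possible mutations in the same gene
    Assumption: a lower score means a more damaging mutation.
    """
    if not observed or not background:
        raise ValueError("observed and background must be non-empty")
    rng = random.Random(seed)
    k = len(observed)
    observed_mean = sum(observed) / k
    hits = 0
    for _ in range(n_draws):
        # Draw as many background mutations as were observed.
        draw = rng.choices(background, k=k)
        if sum(draw) / k <= observed_mean:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_draws + 1)


# Hypothetical gene: background scores spread over 0..1, observed ones low.
background_scores = [i / 20 for i in range(21)]
observed_scores = [0.05, 0.10, 0.00, 0.15, 0.05, 0.10]
print(empirical_bias_pvalue(observed_scores, background_scores))
```

Because each gene is tested only against its own background, differences in gene length or sequence composition between genes do not enter the comparison.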

Protein-Coding Genes of Helicobacter pylori Predominantly Present Purifying Selection though Many Membrane Proteins Suffer from Selection Pressure: A Proposal to Analyze Bacterial Pangenomes

Genes, 2021, Vol 12 (3), pp. 377, doi:10.3390/genes12030377

Author(s): Alejandro Rubio, Antonio Pérez-Pulido

Keyword(s): Helicobacter Pylori, Membrane Proteins, Selection Pressure, Purifying Selection, Protein Coding, Evolutionary Selection, Protein Coding Genes, The Core, Genes Encoding, Selection For

The availability of complete genome sequences has shown that bacterial genomes can carry genes that are not present in every strain of a species. The genes shared by all strains make up the core of the species, but the pangenome can be much larger and usually includes genes that appear in only one strain. Once the pangenome of a species has been estimated, further studies can be undertaken, such as analyzing the evolutionary selection acting on protein-coding genes. Most genes of a pangenome are expected to be subject to purifying selection, which ensures the conservation of function, especially those in the core group. However, some genes can be subject to selection pressure, such as virulence genes that need to escape the host immune system; this is more common in the accessory group of the pangenome. We analyzed 180 strains of Helicobacter pylori, a bacterium that colonizes the gastric mucosa of half the world population and has a low number of genes (around 1500 in a strain and 3000 in the pangenome). After estimating the pangenome, we calculated the evolutionary selection for each gene and found that 85% of them are subject to purifying selection, while the remaining genes show some degree of selection pressure. As expected, the latter group is enriched in genes encoding membrane proteins putatively involved in interaction with host tissues. This group also contains a high number of uncharacterized genes and genes encoding putative spurious proteins, which suggests that they could be false positives from the gene finders used to identify them. Together, these results suggest that this kind of analysis can be useful for validating gene predictions and functionally characterizing proteins in complete genomes.

Extreme purifying selection against point mutations in the human genome

2021, doi:10.1101/2021.08.23.457339

Author(s): Noah Dukler, Mehreen R Mughal, Ritika Ramani, Yi-Fei Huang, Adam Siepel

Keyword(s): Human Genome, De Novo, Point Mutations, Purifying Selection, Selection Coefficient, Sequencing Data, Protein Coding, Coding Regions, Protein Coding Genes, Selective Effects

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious. Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
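
The "fractional depletion" that defines λs can be illustrated with a naive point estimate: one minus the ratio of rare-variant density in the target sites to that in matched neutral sites. This sketch deliberately omits what the abstract says the method controls for (local variation and neighbor-dependence in mutation rate), and it is not the ExtRaINSIGHT model itself; the function name, arguments, and example counts are hypothetical.

```python
def fractional_depletion(target_rare, target_sites, neutral_rare, neutral_sites):
    """Naive estimate of rare-variant depletion in target vs. neutral sites.

    Returns 1 - (target rare-variant density / neutral rare-variant density).
    A value near 0 means no depletion; a value near 1 means that almost all
    expected rare variants are missing from the target set.
    """
    if target_sites <= 0 or neutral_sites <= 0:
        raise ValueError("site counts must be positive")
    if neutral_rare <= 0:
        raise ValueError("neutral set must contain at least one rare variant")
    target_density = target_rare / target_sites
    neutral_density = neutral_rare / neutral_sites
    return 1.0 - target_density / neutral_density


# Hypothetical counts: 80 rare variants per 1,000 target sites versus
# 100 rare variants per 1,000 matched neutral sites.
print(round(fractional_depletion(80, 1_000, 100, 1_000), 3))
```

With these toy counts, one fifth of the rare variants expected under neutrality are missing from the target set.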

Patterns of Natural Selection on Mitochondrial Protein-Coding Genes in Lungless Salamanders: Relaxed Purifying Selection and Presence of Positively Selected Codon Sites in the Family Plethodontidae

International Journal of Genomics, 2021, Vol 2021, pp. 1-12, doi:10.1155/2021/6671300

Author(s): Ryosuke Kakehashi, Atsushi Kurabayashi

Keyword(s): Mitochondrial Protein, Purifying Selection, Recent Common Ancestor, Protein Coding, Protein Coding Genes, Most Recent Common Ancestor, Plethodontid Salamander, Branch Model, The Family, Oxygen Requirements

There are two distinct lungless groups in caudate amphibians (salamanders and newts): the family Plethodontidae and the genus Onychodactylus, from the family Hynobiidae. Lunglessness is considered to have evolved in response to environmental and/or ecological adaptation with respect to oxygen requirements. We performed selection analyses on lungless salamanders to elucidate the selective patterns of mitochondrial protein-coding genes associated with lunglessness. The branch model and RELAX analyses revealed the occurrence of relaxed selection (an increase of the dN/dS ratio = ω value) in most mitochondrial protein-coding genes of plethodontid salamander branches but not in those of Onychodactylus. Additional branch model and RELAX analyses indicated that direct-developing plethodontids showed the relaxed pattern for most mitochondrial genes, although metamorphosing plethodontids had fewer relaxed genes. Furthermore, aBSREL analysis detected positively selected codons in three plethodontid branches but not in Onychodactylus. One of these three branches corresponded to the most recent common ancestor, and the others corresponded with the most recent common ancestors of direct-developing branches within Hemidactyliinae. The positive selection of mitochondrial protein-coding genes in Plethodontidae is probably associated with the evolution of direct development.

Theory of prokaryotic genome evolution

Proceedings of the National Academy of Sciences, 2016, Vol 113 (41), pp. 11399-11407, doi:10.1073/pnas.1614083113

Cited By ~ 62

Author(s): Itamar Sela, Yuri I. Wolf, Eugene V. Koonin

Keyword(s): Genome Size, Prokaryotic Genome, Genetic Material, Purifying Selection, Synonymous Substitution, Protein Coding, Protein Coding Genes, New Genes, Extra Energy, Prokaryotic Genomes

Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.

A simple method for estimating the intensity of purifying selection in protein-coding genes

Molecular Biology and Evolution, 1999, Vol 16 (1), pp. 49-53, doi:10.1093/oxfordjournals.molbev.a026037

Cited By ~ 22

Author(s): R. Ophir, T. Itoh, D. Graur, T. Gojobori

Keyword(s): Purifying Selection, Simple Method, Protein Coding, Protein Coding Genes

Severe Plastid Genome Size Reduction in a Mycoheterotrophic Orchid, Danxiaorchis singchiana, Reveals Heavy Gene Loss and Gene Relocations

Plants, 2020, Vol 9 (4), pp. 521, doi:10.3390/plants9040521

Author(s): Shiou Yih Lee, Kaikai Meng, Haowei Wang, Renchao Zhou, Wenbo Liao, ...

Keyword(s): Purifying Selection, Housekeeping Genes, Single Copy, Intact Protein, Bootstrap Support, Protein Coding, Gene Block, Protein Coding Genes, Strong Bootstrap Support, Cremastra Appendiculata

Danxiaorchis singchiana (Orchidaceae) is a leafless mycoheterotrophic orchid in the subfamily Epidendroideae. We sequenced the complete plastome of D. singchiana. The plastome has a reduced size of 87,931 bp, which includes a pair of inverted repeat (IR) regions of 13,762 bp each that are separated by a large single copy (LSC) region of 42,575 bp and a small single copy (SSC) region of 17,831 bp. When compared to its sister taxa, Cremastra appendiculata and Corallorhiza striata var. involuta, D. singchiana showed an inverted gene block in the LSC and SSC regions. A total of 61 genes were predicted, including 21 tRNA, 4 rRNA, and 36 protein-coding genes. While most of the housekeeping genes were still intact and seem to be protein-coding, only four photosynthesis-related genes appeared presumably intact. The majority of the presumably intact protein-coding genes seem to have undergone purifying selection (dN/dS < 1), and only the psaC gene was positively selected (dN/dS > 1) when compared to that in Cr. appendiculata. Phylogenetic analysis of 26 complete plastome sequences from 24 species of the tribe Epidendreae revealed that D. singchiana diverged after Cr. appendiculata and is sister to the genus Corallorhiza with strong bootstrap support (100%).

Contribution of Retrotransposition to Developmental Disorders

2018, doi:10.1101/471375

Cited By ~ 2

Author(s): Eugene J. Gardner, Elena Prigmore, Giuseppe Gallone, Petr Danecek, Kaitlin E. Samocha, ...

Keyword(s): Developmental Disorders, De Novo, Purifying Selection, Mobile Genetic Elements, Protein Coding, Protein Coding Genes, Genome Wide, The Impact, Transcribed Sequences

Mobile genetic elements (MEs) are segments of DNA which, through an RNA intermediate, can generate new copies of themselves and other transcribed sequences through the process of retrotransposition (RT). In humans several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. As such, we have identified RT-derived events in 9,738 exome sequenced trios with DD-affected probands as part of the Deciphering Developmental Disorders (DDD) study. We have ascertained 9 de novo MEs, 4 of which are likely causative of the patient’s symptoms (0.04% of probands), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we have estimated genome-wide germline ME mutagenesis and constraint and demonstrated that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein coding genes and a framework for future evolutionary and disease studies.

How do we transition from non-coding to coding?

2017, doi:10.7287/peerj.preprints.3031v1

Author(s): Jorge Ruiz-Orera, José Luis Villanueva-Cañas, William Blevins, M.Mar Albà

Keyword(s): De Novo, Gene Evolution, Purifying Selection, Neutral Evolution, Functional Protein, Protein Coding, Coding Sequences, Sequence Composition, Protein Coding Genes, Small Proteins

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNA). Recently, sequencing of ribosome protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are actually translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs are likely to be true protein-coding genes (and thus previously misclassified), the bulk of lncRNAs code for proteins which show variation patterns consistent with neutral evolution. We also show that the ORFs that have a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.
