Identification of Genes under Purifying Selection in Human Cancers

10.1101/129205 ◽

2017 ◽

Author(s):

Robert A. Mathis ◽

Ethan S. Sokol ◽

Piyush B. Gupta

Keyword(s):

Negative Selection ◽

Somatic Mutations ◽

Purifying Selection ◽

Sequencing Data ◽

Systematic Assessment ◽

Coding Regions ◽

Strong Negative Selection ◽

Cancer Genomes ◽

Lung Adenocarcinomas ◽

Widespread Interest

Abstract: There is widespread interest in finding therapeutic vulnerabilities by analyzing the somatic mutations in cancers. Most analyses have focused on identifying driver oncogenes mutated in patient tumors, but this approach is incapable of discovering genes essential for tumor growth yet not activated through mutation. We show that such genes can be systematically discovered by mining cancer sequencing data for evidence of purifying selection. We show that purifying selection reduces substitution rates in coding regions of cancer genomes, depleting up to 90% of mutations for some genes. Moreover, mutations resulting in non-conservative amino acid substitutions are under strong negative selection in tumors, whereas conservative substitutions are more tolerated. Genes under purifying selection include members of the EGFR and FGFR pathways in lung adenocarcinomas, and DNA repair pathways in melanomas. A systematic assessment of purifying selection in tumors would identify hundreds of tumor-specific enablers and thus novel targets for therapy.

Extreme purifying selection against point mutations in the human genome

10.1101/2021.08.23.457339 ◽

2021 ◽

Author(s):

Noah Dukler ◽

Mehreen R Mughal ◽

Ritika Ramani ◽

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Human Genome ◽

De Novo ◽

Point Mutations ◽

Purifying Selection ◽

Selection Coefficient ◽

Sequencing Data ◽

Protein Coding ◽

Coding Regions ◽

Protein Coding Genes ◽

Selective Effects

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious.
Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
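
The idea of a "fractional depletion of rare variants relative to matched neutral sites" can be illustrated with a minimal sketch. This is not the ExtRaINSIGHT implementation: the function name, its arguments, and the example counts are hypothetical, and the sketch assumes a single shared mutation rate rather than the mutation-rate matching and likelihood model the abstract describes.

```python
def fractional_depletion(target_rare, target_sites, neutral_rare, neutral_sites):
    """Naive depletion estimate: 1 - observed / expected rare variants.

    Assumption (not from the abstract): target and neutral sites share one
    mutation rate, so the neutral rare-variant density gives the expectation.
    """
    if target_sites <= 0 or neutral_sites <= 0 or neutral_rare <= 0:
        raise ValueError("site counts and neutral variant count must be positive")
    neutral_density = neutral_rare / neutral_sites   # rare variants per neutral site
    expected = neutral_density * target_sites        # expected count if target were neutral
    return 1.0 - target_rare / expected


# Hypothetical counts: 60 rare variants seen at 1,000 target sites, against
# 1,000 rare variants at 10,000 matched neutral sites (roughly 40% depletion).
lam = fractional_depletion(target_rare=60, target_sites=1000,
                           neutral_rare=1000, neutral_sites=10000)
```

A value near 0 would indicate no detectable depletion, while a value near 1 would indicate that almost all expected rare variants are missing from the target sites.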

Integrative Analysis of Somatic Mutations in Non-coding Regions Altering RNA Secondary Structures in Cancer Genomes

Scientific Reports ◽

10.1038/s41598-019-44489-5 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Funan He ◽

Ran Wei ◽

Zhan Zhou ◽

Leihuan Huang ◽

Yinan Wang ◽

...

Keyword(s):

Somatic Mutations ◽

Secondary Structures ◽

Integrative Analysis ◽

Rna Secondary Structures ◽

Coding Regions ◽

Cancer Genomes

A computational framework for detecting signatures of accelerated somatic evolution in cancer genomes

10.1101/177261 ◽

2017 ◽

Author(s):

Kyle S. Smith ◽

Debashis Ghosh ◽

Katherine S. Pollard ◽

Subhajyoti De

Keyword(s):

Somatic Mutations ◽

Local Context ◽

Whole Genome Sequencing Data ◽

Nuclear Space ◽

Evolutionary Divergence ◽

Sequencing Data ◽

Computational Framework ◽

Somatic Evolution ◽

A Genome ◽

Cancer Genomes

Abstract: Cancer genomes evolve by accumulating somatic mutations, diverging from the genome of the host. It remains unclear to what extent somatic evolutionary divergence is comparable across different regions of the cancer genome versus concentrated in specific genomic elements. We present a novel computational framework, SASE-mapper, to identify genomic regions that show signatures of accelerated somatic evolution (SASE) in a subset of samples in a cohort, marked by accumulation of an excess of somatic mutations compared to that expected based on local, context-aware background mutation rates in the cancer genomes. Analyzing tumor whole genome sequencing data for 365 samples from 5 cohorts, we detect recurrent SASE at a genome-wide scale. The SASEs were enriched for genomic elements associated with active chromatin, and regulatory regions of several known cancer genes had SASE in multiple cohorts. Regions with SASE carried specific mutagenic signatures and often co-localized within the 3D nuclear space, suggesting a common basis. A subset of SASEs was frequently associated with regulatory changes in key cancer pathways and also with poor clinical outcome. While the SASE-associated mutations were not necessarily recurrent at base-pair resolution, the SASEs recurrently targeted the same functional regions, with similar consequences. It is likely that regulatory redundancy and plasticity promote the prevalence of SASE-like patterns in cancer genomes.

Evolutionary dynamics of neoantigens in growing tumours

10.1101/536433 ◽

2019 ◽

Cited By ~ 8

Author(s):

Eszter Lakatos ◽

Marc J. Williams ◽

Ryan O. Schenck ◽

William C. H. Cross ◽

Jacob Househam ◽

...

Keyword(s):

Negative Selection ◽

Evolutionary Dynamics ◽

Immune Escape ◽

Patient Specific ◽

Mathematical Framework ◽

Cancer Evolution ◽

Sequencing Data ◽

Clone Size ◽

Strong Negative Selection ◽

Immune Escape Mechanisms

Abstract: Cancer evolution is driven by the acquisition of somatic mutations that provide cells with a beneficial phenotype in a changing microenvironment. However, mutations that give rise to neoantigens, novel cancer-specific peptides that elicit an immune response, are likely to be disadvantageous. Here we show how the clonal structure and immunogenotype of growing tumours is shaped by negative selection in response to neoantigenic mutations. We construct a mathematical model of neoantigen evolution in a growing tumour, and verify the model using genomic sequencing data. The model predicts that, in the absence of active immune escape mechanisms, tumours either evolve clonal neoantigens (antigen-‘hot’), or have no clonally-expanded neoantigens at all (antigen-‘cold’), whereas antigen-‘warm’ tumours (with high frequency subclonal neoantigens) form only following the evolution of immune evasion. Counterintuitively, strong negative selection for neoantigens during tumour formation leads to an increased number of antigen-warm or -hot tumours, as a consequence of selective pressure for immune escape. Further, we show that the clone size distribution under negative selection is effectively-neutral, and moreover, that stronger negative selection paradoxically leads to more neutral-like dynamics. Analysis of antigen clone sizes and immune escape in colorectal cancer exome sequencing data confirms these results. Overall, we provide and verify a mathematical framework to understand the evolutionary dynamics and clonality of neoantigens in human cancers that may inform patient-specific immunotherapy decision-making.

Up-regulation of cryptochrome 1 gene expression in cotton bollworm (Helicoverpa armigera) during migration over the Bohai Sea

PeerJ ◽

10.7717/peerj.8071 ◽

2019 ◽

Vol 7 ◽

pp. e8071

Author(s):

Liyu Yang ◽

Yingjie Liu ◽

Philip Donkersley ◽

Pengjun Xu

Keyword(s):

Helicoverpa Armigera ◽

Negative Selection ◽

Bohai Sea ◽

Purifying Selection ◽

Migration Behavior ◽

Negative Selection Pressure ◽

Strong Negative Selection ◽

Novel Transcripts ◽

Cryptochrome 1

Cryptochromes (CRYs) are flavoproteins that play a pivotal role in circadian clocks, which mediate organismal behaviors such as feeding, mating and migratory navigation. Herein, we identified novel transcripts in Helicoverpa armigera of six isoforms of cry1 and seven isoforms of cry2 by Sanger sequencing. Phylogenetic analysis showed that the transcripts of cry1 and cry2 align closely with other insect crys, indicating within-species divergence of Hacry. A dN/dS analysis revealed that the coding sequence of cry1 was under strong purifying (negative) selection, whereas cry2 was less constrained and showed weaker purifying selection than cry1. In general, Hacrys were more abundantly transcribed in wild migrating populations than in laboratory-maintained populations, and expression of cry2 was lower than that of cry1 in all samples tested. Moreover, when compared with the migrating parental population, offspring reared in laboratory conditions showed a significant reduction in transcription of cry1 but not cry2. These results strongly suggest that cry1 was more related to the migration behavior of H. armigera than cry2.
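
As a rough illustration of what a dN/dS comparison measures, the sketch below computes the ratio from uncorrected per-site substitution proportions. The function name and the counts are hypothetical and are not taken from the study; real analyses typically apply a multiple-substitution correction and a codon model, which this sketch omits.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of nonsynonymous to synonymous substitutions per site.

    Uses raw proportions (no multiple-hit correction), so it is only a
    first approximation for closely related sequences.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_subs / nonsyn_sites   # nonsynonymous substitutions per nonsynonymous site
    ds = syn_subs / syn_sites         # synonymous substitutions per synonymous site
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


# Hypothetical counts for a constrained gene: few nonsynonymous changes
# relative to synonymous ones give a ratio well below 1.
omega = dn_ds(nonsyn_subs=5, nonsyn_sites=500, syn_subs=20, syn_sites=200)
```

A ratio well below 1 is conventionally read as purifying selection, a ratio near 1 as approximately neutral evolution, and a ratio above 1 as a sign of positive selection.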

Human-lineage-specific genomic elements are associated with neurodegenerative disease and APOE transcript usage

Nature Communications ◽

10.1038/s41467-021-22262-5 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Zhongbo Chen ◽

David Zhang ◽

Regina H. Reynolds ◽

Emil K. Gustavsson ◽

...

Keyword(s):

Neurological Diseases ◽

Purifying Selection ◽

Whole Genome Sequencing Data ◽

Human Lineage ◽

Sequencing Data ◽

Protein Coding ◽

Potential Association ◽

High Depth ◽

Specific Sequences ◽

Human Specific

Abstract: Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer’s disease with more severe tau and amyloid pathological burden. Thus, we demonstrate potential association of human-lineage-specific sequences in brain development and neurological disease.

A simple method to estimate the in-house limit of detection for genetic mutations with low allele frequencies in whole-exome sequencing analysis by next-generation sequencing

BMC Genomic Data ◽

10.1186/s12863-020-00956-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Takumi Miura ◽

Satoshi Yasuda ◽

Yoji Sato

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Somatic Mutations ◽

Limit Of Detection ◽

Allele Frequencies ◽

Genetic Mutations ◽

Sequencing Data ◽

Simple Method ◽

Whole Exome ◽

Generation Sequencing

Abstract

Background: Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency.

Results: Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 G base pair (Gbp) sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, whose p-value decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with the sequencing data of 15 Gbp or more was estimated to be between 5 and 10%.

Conclusions: For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold of low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
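
The LOD definition above (the allele frequency at which the RSD reaches 30%, read off a moving-average curve) can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function names, the window size, the linear interpolation at the crossing, and the replicate values are all assumptions.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)


def moving_average(values, window):
    """Simple moving average over consecutive windows."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def estimate_lod(replicates_by_mutation, threshold=30.0, window=3):
    """Allele frequency where the smoothed RSD curve crosses the threshold.

    Each entry holds replicate allele-frequency measurements (in %) for one
    mutation. Returns None if the curve never crosses the threshold.
    """
    points = sorted((mean(v), rsd_percent(v)) for v in replicates_by_mutation)
    afs = moving_average([p[0] for p in points], window)
    rsds = moving_average([p[1] for p in points], window)
    # Scan from high to low allele frequency for the first crossing.
    for i in range(len(afs) - 1, 0, -1):
        lo_af, hi_af = afs[i - 1], afs[i]
        lo_rsd, hi_rsd = rsds[i - 1], rsds[i]
        if hi_rsd <= threshold < lo_rsd:
            # Linear interpolation between the two bracketing points (assumption).
            frac = (lo_rsd - threshold) / (lo_rsd - hi_rsd)
            return lo_af + frac * (hi_af - lo_af)
    return None


# Hypothetical replicate measurements (allele frequency in %) for two mutations.
demo = [[1, 2, 3], [9, 10, 11]]
demo_lod = estimate_lod(demo, threshold=30.0, window=1)
```

With real data, the window size trades noise suppression against resolution near the crossing point, so it is worth checking how sensitive the estimate is to that choice.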

912 Preferential recognition of neoantigens over non-canonical peptides in cancer patients

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2021-sitc2021.912 ◽

2021 ◽

Vol 9 (Suppl 3) ◽

pp. A958-A958

Author(s):

Maria Lozano-Rabella ◽

Andrea Garcia-Garijo ◽

Jara Palomero ◽

Florian Erhard ◽

Juan Martín-Liberal ◽

...

Keyword(s):

T Cells ◽

Tumor Cell ◽

Somatic Mutations ◽

Tumor Antigens ◽

Tumor Rejection ◽

Tumor Cell Lines ◽

Coding Regions ◽

Cell Responses ◽

Whole Exome

Background: Despite recent advances in exome and RNA sequencing to identify tumor-rejection antigens including neoantigens, the existing techniques fail to identify the vast majority of antigens targeted by tumor-reactive cells. A growing number of studies suggest that HLA-I peptides derived from non-canonical (nonC) open reading frames or derived from allegedly non-coding regions can contribute to tumor immunogenicity. Here we use proteogenomics to identify personalized candidate canonical and non-canonical tumor-rejection antigens and to evaluate their contribution to cancer immune surveillance in patients.

Methods: Whole exome sequencing was performed to identify the non-synonymous somatic mutations (NSM) and immunopeptidomics to identify the HLA-I presented peptides (pHLA) in 9 patient-derived tumor cell lines (TCL). The Peptide-PRISM proteogenomics pipeline was used to identify both canonical and non-canonical pHLA, including those derived from NSM in coding regions. All peptides containing mutations and derived from either cancer-testis (CTA) or tumor-associated antigens (TAA) were selected as candidate tumor antigens. For nonC peptides, an immunopeptidomics healthy dataset containing several tissues and HLA-allotypes was used to eliminate those derived from normal ORFs and select nonC peptides preferentially expressed in tumor cells (nonC-TE). The selected candidate peptides were synthesized, pulsed onto autologous APCs and co-cultured with tumor-reactive ex vivo expanded lymphocytes to assess immune recognition (figure 1).

Results: NonC-TE peptides were identified in all TCL studied, ranging from 0.5% to 5.4% of the total HLA-I presented peptides (n = 506). As described previously, 5’UTRs were the main source. Of note, the tumor type did not have an impact on the frequency of presented nonC peptides; rather, the presence of HLA-A*11:01 and HLA-A*03:01 was a major determinant. T cell responses were detected against at least 13/33 putative neoantigens, 2/24 CTA and 2/61 TAA. In contrast, none of the 471 nonC-TE candidate peptides tested thus far, including one containing an NSM, was able to elicit a recall immune response. Nevertheless, T cells recognizing at least 3 of them were detected through in vitro sensitization of non-autologous PBMCs.

Abstract 912 Figure 1. Workflow diagram. Tumor biopsies and blood samples are obtained from cancer patients (left panel). Patient-derived tumor cell lines are generated in vitro, the peptides presented on HLA molecules are further isolated and analyzed in a mass-spectrometer (top panel). Whole exome sequencing (WES) from matched tumor and healthy tissue is performed to identify the non-synonymous somatic mutations (NSM) (middle panel). Peptide-PRISM proteogenomics pipeline combines the information from the immunopeptidomics data and WES to identify pHLA sequences from both canonical and non-canonical candidate tumor antigens (top right panel). Lymphocyte populations, either TILs or sorted PBMCs, are expanded and further screened for pre-existing T cell responses (bottom panel) against the candidate epitopes by co-culturing the T cells with peptide-pulsed autologous APC. The recognition is assessed by measuring IFNg release by elispot and the upregulation of activation surface markers by FACS (bottom right panel).

Conclusions: Our results show that although HLA-I nonC peptides were frequently presented in all TCLs studied and they can be immunogenic, neoantigens derived from mutations in canonical coding regions were preferentially recognized by tumor-reactive lymphocytes, suggesting T cells targeting the latter are primed more efficiently. The identification of mutated nonC antigens using whole genome sequencing to identify mutations in non-coding regions warrants further examination. Still, the specificity of many tumor-reactive TILs remains unknown.

Ethics Approval: This study was approved by the “Comité de Ética de Investigación con Medicamentos del Hospital Universitario Vall d’Hebron” institution’s Ethics Board; approval number PR(AG)537/2019.

Longitudinal dynamics of clonal hematopoiesis identifies gene-specific fitness effects

10.1101/2021.05.27.446006 ◽

2021 ◽

Author(s):

Neil A Robertson ◽

Eric Latorre-Crespo ◽

Maria Terrada-Terradas ◽

Alison C Purcell ◽

Benjamin J Livesey ◽

...

Keyword(s):

Stem Cells ◽

Stem Cell ◽

Somatic Mutations ◽

Growth Potential ◽

Healthy Individuals ◽

Sequencing Data ◽

Fitness Effects ◽

Increased Risk ◽

Neutral Mutations ◽

Proliferative Advantage

The prevalence of clonal haematopoiesis of indeterminate potential (CHIP) in healthy individuals increases rapidly from age 60 onwards and has been associated with increased risk for malignancy, heart disease and ischemic stroke. CHIP is driven by somatic mutations in stem cells that are also drivers of myeloid malignancies. Since mutations in stem cells often drive leukaemia, we hypothesised that stem cell fitness substantially contributes to transformation from CHIP to leukaemia. Stem cell fitness is defined as the proliferative advantage over cells carrying no or only neutral mutations. We set out to quantify the fitness effects of CHIP drivers over a 15 year timespan in older age, using longitudinal error-corrected sequencing data. It is currently unknown whether mutations in different CHIP genes lead to distinct fitness advantages that could form the basis for patient stratification. We developed a new method based on drift-induced fluctuation (DIF) filtering to extract fitness effects from longitudinal data, and thus quantify the growth potential of variants within each individual. Our approach discriminates naturally drifting populations of cells and faster growing clones, while taking into account individual mutational context. We show that gene-specific fitness differences can outweigh inter-individual variation and therefore could form the basis for personalised clinical management.
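
To make the notion of a "proliferative advantage" concrete, the sketch below estimates a clone's growth rate from longitudinal variant allele frequencies by least squares on the log scale. It is a simplification under stated assumptions, not the drift-induced fluctuation (DIF) filtering method described above: the function name, the pure exponential-growth assumption, and the example time points are hypothetical.

```python
import math


def growth_rate_per_year(times, vafs):
    """Least-squares slope of log(VAF) against time.

    Assumes exponential clone growth (VAF ~ exp(rate * t)), which is an
    assumption of this sketch and only plausible while the clone is small.
    """
    if len(times) != len(vafs) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(v) for v in vafs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return sxy / sxx


# Hypothetical trajectory: a clone sampled every 3 years, starting at 1% VAF
# and generated with a growth rate of 0.1 per year.
sample_times = [0, 3, 6, 9]
sample_vafs = [0.01 * math.exp(0.1 * t) for t in sample_times]
rate = growth_rate_per_year(sample_times, sample_vafs)
```

A slope near zero would be consistent with a drifting, effectively neutral clone, while a clearly positive slope suggests a growth advantage; separating the two rigorously is what a dedicated filtering step is for.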

Divergent and convergent evolution of housekeeping genes in human–pig lineage

PeerJ ◽

10.7717/peerj.4840 ◽

2018 ◽

Vol 6 ◽

pp. e4840 ◽

Cited By ~ 4

Author(s):

Kai Wei ◽

Tingting Zhang ◽

Lei Ma

Keyword(s):

Active Sites ◽

Evolutionary Dynamics ◽

Purifying Selection ◽

Housekeeping Genes ◽

Neutral Evolution ◽

Structure Evolution ◽

Tissue Cell ◽

Sequencing Data ◽

Cellular Functions ◽

Species Specific

Housekeeping genes are ubiquitously expressed and maintain basic cellular functions across tissue/cell type conditions. The present study aimed to develop a set of pig housekeeping genes and compare the structure, evolution and function of housekeeping genes in the human–pig lineage. By using RNA sequencing data, we identified 3,136 pig housekeeping genes. Compared with human housekeeping genes, we found that pig housekeeping genes were longer and subjected to slightly weaker purifying selection pressure and faster neutral evolution. Common housekeeping genes, shared by the two species, are under stronger purifying selection than species-specific genes. However, pig- and human-specific housekeeping genes have similar functions. Some species-specific housekeeping genes have evolved independently to form similar protein active sites or structures, such as the classical catalytic serine–histidine–aspartate triad, implying that they have converged to maintain the basic cellular function, which allows them to adapt to the environment. Human and pig housekeeping genes have varied structures and gene lists, but they have converged to maintain basic cellular functions essential for the existence of a cell, regardless of its specific role in the species. The results of our study shed light on the evolutionary dynamics of housekeeping genes.
