A comprehensive analysis of RNA sequences reveals macroscopic somatic clonal expansion across normal tissues

2018 ◽  
Author(s):  
Keren Yizhak ◽  
Francois Aguet ◽  
Jaegil Kim ◽  
Julian Hess ◽  
Kirsten Kubler ◽  
...  

Abstract Cancer genome studies have significantly advanced our knowledge of somatic mutations. However, how these mutations accumulate in normal cells and whether they promote pre-cancerous lesions remain poorly understood. Here we perform a comprehensive analysis of normal tissues by utilizing RNA sequencing data from ∼6,700 samples across 29 normal tissues collected as part of the Genotype-Tissue Expression (GTEx) project. We identify somatic mutations using a newly developed pipeline, RNA-MuTect, for calling somatic mutations directly from RNA-seq samples and their matched-normal DNA. When applied to the GTEx dataset, we detect multiple variants across different tissues and find that mutation burden is associated with both the age of the individual and tissue proliferation rate. We also detect hotspot cancer mutations that share tissue specificity with their matched cancer type. This study is the first to analyze a large number of samples across multiple normal tissues, identifying clones with genomic aberrations observed in cancer.

NAR Cancer ◽  
2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Julianne K David ◽  
Sean K Maden ◽  
Benjamin R Weeder ◽  
Reid F Thompson ◽  
Abhinav Nellore

Abstract This study probes the distribution of putatively cancer-specific junctions across a broad set of publicly available non-cancer human RNA sequencing (RNA-seq) datasets. We compared cancer and non-cancer RNA-seq data from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) Project and the Sequence Read Archive. We found that (i) averaging across cancer types, 80.6% of exon–exon junctions thought to be cancer-specific based on comparison with tissue-matched samples (σ = 13.0%) are in fact present in other adult non-cancer tissues throughout the body; (ii) 30.8% of junctions not present in any GTEx or TCGA normal tissues are shared by multiple samples within at least one cancer type cohort, and 87.4% of these distinguish between different cancer types; and (iii) many of these junctions not found in GTEx or TCGA normal tissues (15.4% on average, σ = 2.4%) are also found in embryological and other developmentally associated cells. These findings refine the meaning of RNA splicing event novelty, particularly with respect to the human neoepitope repertoire. Ultimately, cancer-specific exon–exon junctions may have a substantial causal relationship with the biology of disease.
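
The comparison in finding (i) can be illustrated with plain set operations. The following is a minimal sketch, not the authors' pipeline: the junction tuples, the cohort names (`cancer_A`, `cancer_B`) and the function `fraction_in_other_tissues` are all hypothetical, and the toy values bear no relation to the percentages reported in the abstract.

```python
from statistics import mean
from typing import Dict, Set, Tuple

# A junction is identified here by (chromosome, start, end, strand).
# This representation is an assumption made for illustration only.
Junction = Tuple[str, int, int, str]


def fraction_in_other_tissues(candidates: Set[Junction],
                              other_normals: Set[Junction]) -> float:
    """Fraction of candidate junctions that also occur in other non-cancer tissues."""
    if not candidates:
        return 0.0
    return len(candidates & other_normals) / len(candidates)


# Hypothetical junctions that look cancer-specific when compared only
# with tissue-matched normal samples.
toy_candidates: Dict[str, Set[Junction]] = {
    "cancer_A": {
        ("chr1", 100, 200, "+"),
        ("chr1", 300, 400, "+"),
        ("chr2", 50, 90, "-"),
        ("chr3", 10, 20, "+"),
    },
    "cancer_B": {
        ("chr1", 100, 200, "+"),
        ("chr5", 10, 80, "-"),
    },
}

# Hypothetical junctions observed in other adult non-cancer tissues.
toy_other_normals: Set[Junction] = {
    ("chr1", 100, 200, "+"),
    ("chr2", 50, 90, "-"),
    ("chr3", 10, 20, "+"),
}

# Per-cohort fraction, then the average across cohorts.
per_type = {
    name: fraction_in_other_tissues(junctions, toy_other_normals)
    for name, junctions in toy_candidates.items()
}
average_fraction = mean(list(per_type.values()))
```

Averaging the per-cohort fractions, rather than pooling all junctions, mirrors the abstract's phrase "averaging across cancer types"; whether the authors weighted cohorts equally is an assumption of this sketch.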


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e13032-e13032 ◽  
Author(s):  
Anton Buzdin ◽  
Andrew Garazha ◽  
Maxim Sorokin ◽  
Alex Glusker ◽  
Alexey Aleshin ◽  
...  

e13032 Background: Intracellular molecular pathways (IMPs) control all major events in the living cell. They are considered hotspots in contemporary oncology because knowledge of IMP activation is essential for understanding mechanisms of molecular pathogenesis. Profiling IMPs requires RNA-seq data for tumors and for a collection of reference normal tissues. However, there is currently a shortage of such profiles for normal tissues from healthy human donors, uniformly profiled in a single series of experiments. The largest dataset of normal profiles, GTEx, is only partly accessible through dbGaP. In the TCGA database, normal samples are adjacent to surgically removed tumors and may be affected by tumor-linked growth factors, inflammation and altered vascularization. ENCODE datasets were obtained from autopsies of normal tissues, but they cannot form statistically significant reference groups. Methods: Tissue samples representing 20 organs were taken post mortem from healthy human donors killed in road accidents, no later than 36 hours after death; blood samples were taken from healthy volunteers. Gene expression was profiled in RNA-seq experiments using the same reagents, equipment and protocols. Bioinformatic algorithms for IMP analysis were developed and validated using experimental and public gene expression datasets. Results: From the original sequencing data we constructed the largest fully open reference expression database of normal human tissues, comprising 465 profiles and termed the Oncobox Atlas of Normal Tissue Expression (ANTE; original data: GSE120795). We next developed a method termed Oncobox for interrogating activation of IMPs in human cancers. It includes modules for expression data harmonization and comparison and an algorithm for automatic annotation of molecular pathways. The Oncobox system enables accurate scoring of thousands of molecular pathways using RNA-seq data. Oncobox pathway analysis is also applicable to quantitative proteomics and microRNA data in oncology. Conclusions: The Oncobox system can be used for a plethora of applications in cancer research, including finding differentially regulated genes and IMPs and discovering new pathway-related diagnostic and prognostic biomarkers.


2017 ◽  
Vol 18 (1) ◽  
Author(s):  
Seirana Hashemi ◽  
Abbas Nowzari Dalini ◽  
Adrin Jalali ◽  
Ali Mohammad Banaei-Moghaddam ◽  
Zahra Razaghi-Moghadam

2021 ◽  
Vol 23 (1) ◽  
pp. 58
Author(s):  
Mareike Polenkowski ◽  
Sebastian Burbano de Lara ◽  
Aldrige Allister ◽  
Thi Nguyen ◽  
Teruko Tamura ◽  
...  

Identification of cancer-specific target molecules and biomarkers may be useful in the development of novel treatment and immunotherapeutic strategies. We have recently demonstrated that the expression of long noncoding (lnc) RNAs can be cancer-type specific due to abnormal chromatin remodeling and alternative splicing. Furthermore, we identified the functional small protein C20orf204-189AA, encoded by the long intergenic noncoding RNA Linc00176, which is expressed predominantly in hepatocellular carcinoma (HCC), and determined that it enhances transcription of ribosomal RNAs and supports growth of HCC. In this study we combined RNA-sequencing and polysome profiling to identify novel micropeptides that originate from HCC-specific lncRNAs. We identified nine lncRNAs that are expressed exclusively in HCC cells but not in the liver or other normal tissues. Here, DNase-sequencing data revealed that the altered chromatin structure plays a key role in the HCC-specific expression of lncRNAs. Three out of nine HCC-specific lncRNAs contain at least one open reading frame (ORF) longer than 50 amino acids (aa) and are enriched in the polysome fraction, suggesting that they are translated. We generated a peptide-specific antibody to characterize one candidate, NONHSAT013026.2/Linc013026. We show that Linc013026 encodes a 68 amino acid micropeptide that is mainly localized at the perinuclear region. Linc013026-68AA is expressed in a subset of HCC cells and plays a role in cell proliferation, suggesting that Linc013026-68AA may be used as an HCC-specific target molecule. Our finding also sheds light on the role of the previously ignored 'dark proteome', which originates from noncoding regions, in the maintenance of cancer.


2021 ◽  
Vol 84 (1) ◽  
Author(s):  
Hayato Ogawa ◽  
Keita Horitani ◽  
Yasuhiro Izumiya ◽  
Soichi Sano

Contrary to earlier beliefs, every cell in the individual is genetically different due to somatic mutations. Consequently, tissues become a mixture of cells with distinct genomes, a phenomenon termed somatic mosaicism. Recent advances in genome sequencing technology have unveiled possible causes of mutations and how they shape the unique mutational landscape of the tissues. Moreover, the analysis of sequencing data in combination with clinical information has revealed the impacts of somatic mosaicism on disease processes. In this review, we discuss somatic mosaicism in various tissues and its clinical implications for human disease. Expected final online publication date for the Annual Review of Physiology, Volume 84 is February 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2020 ◽  
pp. 445-455
Author(s):  
James D. Brenton ◽  
Tim Eisen

Cancer is a genetic disease in which progressive accumulation of mutations in the genome of somatic cells induces abnormal biological capabilities. Cancer-inducing mutations may originate from single base substitutions or large chromosomal rearrangements; but ultimately they disrupt normal cellular processes by altering protein function or disturbing the regulation of gene expression. Loss-of-function mutations in tumour suppressor genes inactivate physiological control of cell processes, whereas gain-of-function mutations directly affect physiological networks and, for example, induce pathological activation of signalling pathways. For many common cancers, we are now close to defining unique sets of somatic alterations which confer a specific signature of the cancer type and are also highly specific to the individual patient.


2021 ◽  
Vol 22 (S4) ◽  
Author(s):  
Zexian Zeng ◽  
Chengsheng Mao ◽  
Andy Vo ◽  
Xiaoyu Li ◽  
Janna Ore Nugent ◽  
...  

Abstract Background: Genetic information is becoming more readily available and is increasingly being used to predict patient cancer types as well as their subtypes. Most classification methods thus far utilize somatic mutations as independent features for classification and are limited by study power. We aim to develop a novel method to effectively explore the landscape of genetic variants, including germline variants, and small insertions and deletions for cancer type prediction. Results: We proposed DeepCues, a deep learning model that utilizes convolutional neural networks to unbiasedly derive features from raw cancer DNA sequencing data for disease classification and relevant gene discovery. Using raw whole-exome sequencing as features, germline variants and somatic mutations, including insertions and deletions, were interactively amalgamated for feature generation and cancer prediction. We applied DeepCues to a dataset from TCGA to classify seven different types of major cancers and obtained an overall accuracy of 77.6%. We compared DeepCues to conventional methods and demonstrated a significant overall improvement (p < 0.001). Strikingly, using DeepCues, the top 20 breast cancer relevant genes we identified had a 40% overlap with the top 20 known breast cancer driver genes. Conclusion: Our results support DeepCues as a novel method to improve the representational resolution of DNA sequencing and its power in deriving features from raw sequences for cancer type prediction, as well as discovering new cancer relevant genes.


2015 ◽  
Vol 112 (23) ◽  
pp. E3050-E3057 ◽  
Author(s):  
Christian L. Barrett ◽  
Christopher DeBoever ◽  
Kristen Jepsen ◽  
Cheryl C. Saenz ◽  
Dennis A. Carson ◽  
...  

Tumor-specific molecules are needed across diverse areas of oncology for use in early detection, diagnosis, prognosis and therapy. Large and growing public databases of transcriptome sequencing data (RNA-seq) derived from tumors and normal tissues hold the potential of yielding tumor-specific molecules, but because the data are new they have not been fully explored for this purpose. We have developed custom bioinformatic algorithms and used them with 296 high-grade serous ovarian (HGS-OvCa) tumor and 1,839 normal RNA-seq datasets to identify mRNA isoforms with tumor-specific expression. We rank prioritized isoforms by likelihood of being expressed in HGS-OvCa tumors and not in normal tissues and analyzed 671 top-ranked isoforms by high-throughput RT-qPCR. Six of these isoforms were expressed in a majority of the 12 tumors examined but not in 18 normal tissues. An additional 11 were expressed in most tumors and only one normal tissue, which in most cases was fallopian or colon. Of the 671 isoforms, the topmost 5% (n = 33) ranked based on having tumor-specific or highly restricted normal tissue expression by RT-qPCR analysis are enriched for oncogenic, stem cell/cancer stem cell, and early development loci—including ETV4, FOXM1, LSR, CD9, RAB11FIP4, and FGFRL1. Many of the 33 isoforms are predicted to encode proteins with unique amino acid sequences, which would allow them to be specifically targeted for one or more therapeutic strategies—including monoclonal antibodies and T-cell–based vaccines. The systematic process described herein is readily and rapidly applicable to the more than 30 additional tumor types for which sufficient amounts of RNA-seq already exist.


2020 ◽  
pp. jmedgenet-2020-106905
Author(s):  
Ji-Hye Oh ◽  
Chang Ohk Sung

Background: Somatic mutations are a major driver of cancer development and many have now been identified in various cancer types, but the comprehensive somatic mutation status of the normal tissues matched to tumours has not been revealed. Method: We analysed the somatic mutations of whole exome sequencing data in 392 patient tumour and normal tissue pairs based on the corresponding blood samples across 10 tumour types. Results: Many of the mutations involved in oncogenic pathways, such as PI3K, NOTCH and TP53, were identified in the normal tissues. The ageing-related mutational signature was the most prominent contributing signature found, and the mutations in the normal tissues were frequently in genes involved in late replication time (p<0.0001). Variants were rarely overlapping across tissue types, but shared variants between normal and matched tumour tissue were present. These shared variants were frequently pathogenic when compared with non-shared variants (p=0.001) and showed a higher variant-allele-fraction (p<0.0001). Normal tissue-specific mutated genes were frequently non-cancer-associated (p=0.009). PIK3CA mutations were identified in 6 normal tissues and were harboured by all of the matched cancer tissues. Multiple types of PIK3CA mutations were found in normal breast and matched cancer tissues. The PIK3CA mutations exclusively present in normal tissue may indicate clonal expansions unrelated to the tumour. In addition, the PIK3CA mutations appeared to have arisen before the occurrence of the allelic imbalance. Conclusion: Our current results suggest that somatic mutant clones exist in normal tissues and that their clonal expansion could be linked to cancer development.


2019 ◽  
Author(s):  
Julianne K. David ◽  
Sean K. Maden ◽  
Benjamin R. Weeder ◽  
Reid F. Thompson ◽  
Abhinav Nellore

Abstract We compared cancer and non-cancer RNA sequencing (RNA-seq) data from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) Project, and the Sequence Read Archive (SRA). We found that: 1) averaging across cancer types, 80.6% of exon-exon junctions thought to be cancer-specific based on comparison with tissue-matched samples are in fact present in other adult non-cancer tissues throughout the body; 2) 30.8% of junctions not present in any GTEx or TCGA normal tissues are shared by multiple samples within at least one cancer type cohort, and 87.4% of these distinguish between different cancer types; and 3) many of these junctions not found in GTEx or TCGA normal tissues (15.4% on average) are also found in embryological and other developmentally associated cells. This study probes the distribution of putatively cancer-specific junctions across a broad set of publicly available non-cancer human RNA-seq datasets. Overall, we identify a subset of shared cancer-specific junctions that could represent novel sources of cancer neoantigens. We further describe a framework for characterizing possible origins of these junctions, including potential developmental and embryological sources, as well as cell type-specific markers particularly related to cell types of cancer origin. These findings refine the meaning of RNA splicing event novelty, particularly with respect to the human neoepitope repertoire. Ultimately, cancer-specific exon-exon junctions may affect the anti-cancer immune response and may have a substantial causal relationship with the biology of disease.

