TOD-CUP: a gene expression rank-based majority vote algorithm for tissue origin diagnosis of cancers of unknown primary

Author(s):  
Yifei Shen ◽  
Qinjie Chu ◽  
Xinxin Yin ◽  
Yinjun He ◽  
Panpan Bai ◽  
...  

Abstract Gene expression profiling holds great potential as a new approach to histological diagnosis and precision medicine of cancers of unknown primary (CUP). Batch effects and different data types greatly decrease the predictive performance of biomarker-based algorithms, and to date few methods have been widely applied to identify the tissue origin of CUP. To address this problem and assist in more precise diagnosis, we have developed a gene expression rank-based majority vote algorithm for tissue origin diagnosis of CUP (TOD-CUP) for the most common cancer types. Based on massive tissue-specific RNA-seq data sets (10 553) found in The Cancer Genome Atlas (TCGA), 538 feature genes (biomarkers) were selected based on their gene expression ranks and used to predict tissue types. The top scoring pairs (TSPs) classifier of the tumor type was optimized using the TCGA training samples. To test the prediction accuracy of our TOD-CUP algorithm, we analyzed (1) two microarray data sets (1029 Agilent and 2277 Affymetrix/Illumina chips) and found 91% and 94% prediction accuracy, respectively; (2) RNA-seq data from five cancer types derived from 141 public metastatic cancer tumor samples, achieving 94% accuracy; and (3) a total of 25 clinical cancer samples (including 14 metastatic cancer samples), of which 24 were classified correctly (96.0% accuracy). Taken together, the TOD-CUP algorithm provides a powerful and robust means to accurately identify the tissue origin of 24 cancer types across different data platforms. To make the TOD-CUP algorithm easily accessible for clinical application, we established a Web-based server for tumor tissue origin diagnosis (http://ibi.zju.edu.cn/todcup/).
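The abstract above does not give the details of the TSP classifier, so the following is only a minimal sketch of the general idea behind a rank-based majority vote: each gene pair casts a vote according to which of its two genes is more highly expressed within the sample. The gene names, the two tissue labels and the pair list are hypothetical and are not taken from TOD-CUP, which covers 24 cancer types and 538 feature genes.

```python
import math
from collections import Counter


def tsp_vote(expression, pairs):
    """Predict a label by majority vote over gene pairs.

    expression: dict mapping gene name -> expression value for ONE sample.
    pairs: list of (gene_a, gene_b, label_if_a_higher, label_otherwise).
    """
    votes = Counter()
    for gene_a, gene_b, label_a, label_b in pairs:
        # Only the within-sample ordering of the two genes is used,
        # never the absolute expression values.
        if expression[gene_a] > expression[gene_b]:
            votes[label_a] += 1
        else:
            votes[label_b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical gene pairs and labels, for illustration only.
pairs = [
    ("G1", "G2", "lung", "colon"),
    ("G3", "G4", "lung", "colon"),
    ("G5", "G6", "colon", "lung"),
]

# One sample on a count-like scale, and the same sample after a
# monotone (log2) transform, standing in for a different platform scale.
sample_counts = {"G1": 120.0, "G2": 30.0, "G3": 8.0, "G4": 2.5, "G5": 1.0, "G6": 40.0}
sample_log = {gene: math.log2(value + 1.0) for gene, value in sample_counts.items()}

print(tsp_vote(sample_counts, pairs))
print(tsp_vote(sample_log, pairs))
```

Because a monotone transform preserves the ordering of genes within a sample, both calls return the same label. This is the property that makes rank-based rules attractive when batch effects or platform differences shift absolute expression values.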

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Xianlu Laura Peng ◽  
Richard A. Moffitt ◽  
Robert J. Torphy ◽  
Keith E. Volmar ◽  
Jen Jen Yeh

Abstract Tumors are mixtures of different compartments. While global gene expression analysis profiles the average expression of all compartments in a sample, identifying the specific contribution of each compartment remains a challenge. With the increasing recognition of the importance of non-neoplastic components, the ability to break down the gene expression contribution of each is critical. Here, we develop DECODER, an integrated framework which performs de novo deconvolution and single-sample compartment weight estimation. We use DECODER to deconvolve 33 TCGA tumor RNA-seq data sets and show that it may be applied to other data types including ATAC-seq. We demonstrate that it can be utilized to reproducibly estimate cellular compartment weights in pancreatic cancer that are clinically meaningful. Application of DECODER across cancer types advances the capability of identifying cellular compartments in an unknown sample and may have implications for identifying the tumor of origin for cancers of unknown primary.


Cancers ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 2476 ◽  
Author(s):  
Shaoli Das ◽  
Kevin Camphausen ◽  
Uma Shankavaram

To elucidate the role of immune cell infiltration as a prognostic signature in solid tumors, we analyzed immune-function-related genes from four publicly available single-cell RNA-Seq data sets and twenty bulk tumor RNA-Seq data sets from The Cancer Genome Atlas (TCGA). Unsupervised clustering of pan-cancer transcriptomic signature showed two major immune function types: one related to NK-, T-, and B-cell functions and the other related to monocyte, macrophage, dendritic cell, and Toll-like receptor functions. Kaplan–Meier analysis showed differential prognosis of these two groups, dependent on the cancer type. Our analysis of TCGA solid tumors with an elastic net model identified 155 genes associated with disease-free survival in different tumor types with varied influence across different cancer types. With this gene set, we computed cancer-specific prognostic immune score models for individual cancer types that predicted disease-free and overall survival. Validation of our model on available published data of immune checkpoint blockade therapies on melanoma, kidney renal cell carcinoma, non-small cell lung cancer, gastric cancer and bladder cancer confirmed that cancer-specific higher immune scores are associated with response to immunotherapy. Our analysis provides a comprehensive map of cancer-specific immune-related prognostic gene sets that are associated with immunotherapy response.


2020 ◽  
Author(s):  
Ramon Viñas ◽  
Tiago Azevedo ◽  
Eric R. Gamazon ◽  
Pietro Liò

Abstract A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we present GAIN-GTEx, a method for gene expression imputation based on Generative Adversarial Imputation Networks. In order to increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We compare our model to several standard and state-of-the-art imputation methods and show that GAIN-GTEx is significantly superior in terms of predictive performance and runtime. Furthermore, our results indicate strong generalisation on RNA-Seq data from 3 cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.


2021 ◽  
Vol 9 (Suppl 3) ◽  
pp. A86-A86
Author(s):  
Paul DePietro ◽  
Mary Nesline ◽  
Yong Hee Lee ◽  
RJ Seager ◽  
Erik Van Roey ◽  
...  

Background Immune checkpoint inhibitor-based therapies have achieved impressive success in the treatment of several cancer types. Predictive immune biomarkers, including PD-L1, MSI and TMB, are well established as surrogate markers for immune evasion and tumor-specific neoantigens across many tumors. Positive detection across cancer types varies, but overall ~50% of patients test negative for these primary immune markers [1]. In this study, we investigated the prevalence of secondary immune biomarkers outside of PD-L1, TMB and MSI. Methods Comprehensive genomic and immune profiling, including PD-L1 IHC, TMB, MSI and gene expression of 395 immune related genes, was performed on 6078 FFPE tumors representing 34 cancer types, predominantly composed of lung cancer (36.7%), colorectal cancer (11.9%) and breast cancer (8.5%). Expression levels by RNA-seq of 36 genes targeted by immunotherapies in solid tumor clinical trials, identified as secondary immune biomarkers, were ranked against a reference population. Genes with a rank value ≥75th percentile were considered high and values were associated with PD-L1 (positive ≥1%), MSI (MSI-H or MSS) and TMB (high ≥10 Mut/Mb) status. Additionally, secondary immune biomarker status was segmented by tumor type and cancer immune cycle roles. Results In total, 41.0% of cases were PD-L1+, 6.4% TMB+, and 0.1% MSI-H. 12.6% of cases were positive for >2 of these markers while 39.9% were triple negative (PD-L1-/TMB-/MSS). Of the PD-L1-/TMB-/MSS cases, 89.1% were high for at least one secondary immune biomarker, with 69.3% having ≥3 markers. PD-L1-/TMB-/MSS tumor types with ≥50% prevalence of high secondary immune biomarkers included brain, prostate, kidney, sarcoma, gallbladder, breast, colorectal, and liver cancer. High expression of cancer testis antigen secondary immune biomarkers (e.g., NY-ESO-1, LAGE-1A, MAGE-A4) was most commonly observed in bladder, ovarian, sarcoma, liver, and prostate cancer (≥15%).
Tumors demonstrating T-cell priming (e.g., CD40, OX40, CD137), trafficking (e.g., TGFB1, TLR9, TNF) and/or recognition (e.g., CTLA4, LAG3, TIGIT) secondary immune biomarkers were most represented by kidney, gallbladder, and sarcoma (≥40%), with melanoma, esophageal, head & neck, cervical, stomach, and lung cancer least represented (≥15%). Conclusions Our studies show comprehensive tumor profiling that includes gene expression can detect secondary immune biomarkers targeted by investigational therapies in ~90% of PD-L1-/TMB-/MSS cases. While genomic profiling could also provide therapeutic choices for a percentage of these patients, detection of secondary immune biomarkers by RNA-seq provides additional options for patients without a clear therapeutic path as determined by PD-L1 testing and genomic profiling alone. Reference [1] Huang R S P, Haberberger J, Severson E, et al. A pan-cancer analysis of PD-L1 immunohistochemistry and gene amplification, tumor mutation burden and microsatellite instability in 48,782 cases. Mod Pathol 2021;34: 252–263.
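The Methods of the abstract above call a gene "high" when its rank against a reference population is at or above the 75th percentile, but do not say how the rank is computed. A minimal sketch follows, assuming the percentile rank is the percentage of reference values less than or equal to the observed value. That definition, the function names and the toy reference values are all assumptions made for illustration.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Percentage of reference values that are <= value (assumed definition)."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def is_high(value, reference, cutoff=75.0):
    """Flag an expression value as 'high' at or above the percentile cutoff."""
    return percentile_rank(value, reference) >= cutoff


# Toy reference population for a single gene (hypothetical values).
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

print(percentile_rank(6.0, reference), is_high(6.0, reference))
print(percentile_rank(5.0, reference), is_high(5.0, reference))
```

With eight reference values, an observed value of 6.0 sits at the 75th percentile and is flagged high, while 5.0 sits at 62.5 and is not.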


2019 ◽  
Author(s):  
Marcus Alvarez ◽  
Elior Rahmani ◽  
Brandon Jew ◽  
Kristina M. Garske ◽  
Zong Miao ◽  
...  

Abstract Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. In contrast to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.


2019 ◽  
Vol 36 (7) ◽  
pp. 1373-1383 ◽  
Author(s):  
Longjun Wu ◽  
Kailey E Ferger ◽  
J David Lambert

Abstract It has been proposed that animals have a pattern of developmental evolution resembling an hourglass because the most conserved development stage—often called the phylotypic stage—is always in midembryonic development. Although the topic has been debated for decades, recent studies using molecular data such as RNA-seq gene expression data sets have largely supported the existence of periods of relative evolutionary conservation in middevelopment, consistent with the phylotypic stage and the hourglass concepts. However, so far this approach has only been applied to a limited number of taxa across the tree of life. Here, using established phylotranscriptomic approaches, we found a surprising reverse hourglass pattern in two molluscs and a polychaete annelid, representatives of the Spiralia, an understudied group that contains a large fraction of metazoan body plan diversity. These results suggest that spiralians have a divergent midembryonic stage, with more conserved early and late development, which is the inverse of the pattern seen in almost all other organisms where these phylotranscriptomic approaches have been reported. We discuss our findings in light of proposed reasons for the phylotypic stage and hourglass model in other systems.


2015 ◽  
Vol 76 (1) ◽  
Author(s):  
Ang Jun Chin ◽  
Andri Mirzal ◽  
Habibollah Haron

Gene expression profiling is well known for its broad applications and achievements in disease discovery and analysis, especially in cancer research. Spectral clustering is robust to irrelevant features, which makes it appropriate for gene expression analysis. However, in previous work, performance comparisons with other clustering methods were limited and only a few microarray data sets were analyzed in each study. In this study, we demonstrate the use of spectral clustering in identifying cancer types or subtypes from microarray gene expression profiling. Spectral clustering was applied to eleven microarray data sets and its clustering performance was compared with the results in the literature. Overall, spectral clustering slightly outperformed the corresponding results in the literature. It also offered more stable clustering performance, as shown by its smaller standard deviation. Moreover, spectral clustering outperformed the corresponding methods in the literature on six of the eleven data sets. Spectral clustering is therefore a promising method for identifying cancer types or subtypes from microarray gene expression data sets.


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Anna Pačínková ◽  
Vlad Popovici

The dysfunction of the DNA mismatch repair system results in microsatellite instability (MSI). MSI plays a central role in the development of multiple human cancers. In colon cancer, despite being associated with resistance to 5-fluorouracil treatment, MSI is a favourable prognostic marker. In gastric and endometrial cancers, its prognostic value is not so well established. Nevertheless, recognising MSI tumours may be important for predicting the therapeutic effect of immune checkpoint inhibitors. Several gene expression signatures were trained on microarray data sets to understand the regulatory mechanisms underlying microsatellite instability in colorectal cancer. A wealth of expression data already exists in the form of microarray data sets. However, RNA-seq has become routine for transcriptome analysis. The new MSI gene expression signature presented here is the first to be valid across two different platforms, microarrays and RNA-seq. In the case of colon cancer, its estimated performance was (i) AUC = 0.94, 95% CI = (0.90 – 0.97) on RNA-seq and (ii) AUC = 0.95, 95% CI = (0.92 – 0.97) on microarray. The 25-gene expression signature was also validated in two independent microarray colon cancer data sets. Despite being derived from colorectal cancer, the signature maintained good performance on RNA-seq and microarray gastric cancer data sets (AUC = 0.90, 95% CI = (0.85 – 0.94) and AUC = 0.83, 95% CI = (0.69 – 0.97), respectively). Furthermore, this classifier retained high concordance even when classifying RNA-seq endometrial cancers (AUC = 0.71, 95% CI = (0.62 – 0.81)). These results indicate that the new signature was able to remove the platform-specific differences while preserving the underlying biological differences between MSI/MSS phenotypes in colon cancer samples.
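The performance figures in the abstract above are AUC values. As a reminder of what that statistic measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive sample receives a higher signature score than a randomly chosen negative one, with ties counted as one half. The scores and labels are invented for illustration (1 standing for MSI, 0 for MSS) and are not data from the study; how the 25-gene signature score itself is computed is not described in the abstract.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: list of numeric classifier scores.
    labels: list of 0/1 class labels, same length as scores.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical signature scores and labels (1 = MSI, 0 = MSS).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]

print(auc(scores, labels))
```

In this toy example, five of the six positive/negative pairs are ordered correctly, so the AUC is 5/6. Since the statistic depends only on the ordering of scores, it is unchanged by any monotone rescaling of the signature.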


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 2663-2663
Author(s):  
Matthew A Care ◽  
Stephen M Thirdborough ◽  
Andrew J Davies ◽  
Peter W.M. Johnson ◽  
Andrew Jack ◽  
...  

Abstract Purpose To assess whether comparative gene network analysis can reveal characteristic immune response signatures that predict clinical response in diffuse large B-cell lymphoma (DLBCL). Background The wealth of available gene expression data sets for DLBCL and other cancer types provides a resource to define recurrent pathological processes at the level of gene expression and gene correlation neighbourhoods. This is of particular relevance in the context of cancer immune responses, where convergence onto common patterns may drive shared gene expression profiles. Where existing and novel immunotherapies harness the immune response for therapeutic benefit, such responses may provide predictive biomarkers. Methods We independently analysed publicly available DLBCL gene expression data sets and a wide compendium of gene expression data from diverse cancer types, and then asked whether common elements of cancer host response could be identified from the resulting networks. Using 10 DLBCL gene expression data sets, encompassing 2030 cases, we established pairwise gene correlation matrices per data set, which were merged to generate median correlations of gene pairs across all data sets. Gene network analysis and unsupervised clustering were then applied to define global representations of DLBCL gene expression neighbourhoods. In parallel, a diverse range of solid and lymphoid malignancies, including breast, colorectal, oesophageal, head and neck, non-small cell lung, prostate and pancreatic cancer, Hodgkin lymphoma, follicular lymphoma and DLBCL, were independently analysed using an orthogonal weighted gene correlation network analysis (WGCNA) of gene expression data sets, from which correlated modules across diverse cancer types were identified. The biology of the resulting gene neighbourhoods was assessed by signature and ontology enrichment, and the overlap between gene correlation neighbourhoods and WGCNA-derived modules associated with immune/host responses was analysed.
Results Amongst DLBCL data, we identified distinct gene correlation neighbourhoods associated with the immune response. These included elements of IFN-polarised responses, core T-cell and cytotoxic signatures, as well as distinct macrophage responses. Neighbourhoods linked to macrophages separated CD163 from CD68 and CD14. In the WGCNA analysis of diverse cancer types, clusters corresponding to these immune response neighbourhoods were independently identified, including a highly similar cluster related to CD163. The overlapping CD163 clusters in both analyses linked to diverse Fc-receptors, complement pathway components and patterns of scavenger receptors potentially linked to alternative macrophage activation. The relationship between the CD163 macrophage gene expression cluster and outcome was tested in DLBCL data sets, identifying a poor response in CD163-cluster-high patients, which reached statistical significance in one data set (GSE10846). Notably, the effect of the CD163-associated gene neighbourhood, which correlates with poor outcome post rituximab-containing immunochemotherapy, is distinct from the effect of IFNG-STAT1-IRF1 polarised cytotoxic responses. The latter represents the predominant immune response pattern separating cell of origin unclassifiable (Type-III) DLBCL from either ABC or GCB DLBCL subsets, and is associated with a trend toward positive outcome. Conclusion Comparative gene expression network analysis identifies common immune response signatures shared between DLBCL and other cancer types. Gene expression clusters linked to CD163 macrophage responses and IFNG-STAT1-IRF1 polarised cytotoxic responses are common patterns with apparently divergent outcome associations.
Disclosures Davies: CTI: Honoraria; Gilead: Consultancy, Honoraria, Research Funding; Mundipharma: Honoraria, Research Funding; Bayer: Research Funding; Takeda: Honoraria, Research Funding; Janssen: Honoraria, Research Funding; Roche: Honoraria, Research Funding; GSK: Research Funding; Pfizer: Honoraria; Celgene: Honoraria, Research Funding. Jack: Janssen: Research Funding.
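The Methods of the abstract above describe computing a pairwise gene correlation matrix per data set and merging these into median correlations across data sets. A minimal sketch of that merging step for a single gene pair follows, assuming Pearson correlation (the abstract does not name the correlation measure). The gene names and the three toy data sets are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def median_correlation(datasets, gene_a, gene_b):
    """Median over data sets of the within-data-set correlation of two genes.

    datasets: list of dicts, each mapping gene name -> list of expression
    values across that data set's samples.
    """
    return median(pearson(d[gene_a], d[gene_b]) for d in datasets)


# Three toy data sets (hypothetical genes and values).
datasets = [
    {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [2.0, 4.0, 6.0]},  # r = 1.0
    {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [3.0, 2.0, 1.0]},  # r = -1.0
    {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [1.0, 3.0, 2.0]},  # r = 0.5
]

print(median_correlation(datasets, "GENE_A", "GENE_B"))
```

Correlations are computed within each data set and only then summarised, so differences in scale or platform between data sets do not enter the calculation directly. Taking the median rather than the mean also limits the influence of any single outlying data set.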


2019 ◽  
Author(s):  
Bidossessi Wilfried Hounkpe ◽  
Francine Chenou ◽  
Franciele Lima ◽  
Erich Vinicius de Paula

Abstract Housekeeping (HK) genes are constitutively expressed genes that are required for the maintenance of basic cellular functions. Despite their importance in the calibration of gene expression, as well as the understanding of many genomic and evolutionary features, important discrepancies have been observed in studies that previously identified these genes. Here, we present Housekeeping Transcript Atlas (HRT Atlas v1.0, www.housekeeping.unicamp.br), a web-based database which addresses some of the previously observed limitations in the identification of these genes, and offers a more accurate database of human and mouse HK genes and transcripts. The database was generated by mining massive human and mouse RNA-seq data sets, including 12,482 and 507 high-quality RNA-seq samples from 82 human non-disease tissues/cells and 15 healthy tissues/cells of C57BL/6 wild type mouse, respectively. Users can visualize the expression and download lists of 2,158 human HK transcripts from 2,176 HK genes and 3,024 mouse HK transcripts from 3,277 mouse HK genes. HRT Atlas also offers the most stable and suitable tissue-selective candidate reference transcripts for normalization of qPCR experiments. Specific primers and predicted modifiers of gene expression for some of these HK transcripts are also proposed. HRT Atlas has also been integrated with regulatory elements from the Epiregio server. All of these resources can be accessed and downloaded from the web browser of any computer or small device.

