Leveraging collective regulatory effects of long-range DNA methylations to predict gene expressions and estimate their effects on phenotypes in cancer

Mapping Intimacies ◽

10.1101/472589 ◽

2018 ◽

Cited By ~ 1

Author(s):

Soyeon Kim ◽

Hyun Jung Park ◽

Xiangqin Cui ◽

Degui Zhi

Keyword(s):

Gene Expression ◽

Long Range ◽

Estrogen Receptor Status ◽

The Cancer Genome Atlas ◽

Collective Effects ◽

Statistical Machine Learning ◽

Promoter Regions ◽

Gene Expressions ◽

Cancer Data ◽

Clinical Phenotypes

Abstract: DNA methylation of various genomic regions plays an important role in regulating gene expression in diverse biological contexts. However, most genome-wide studies of gene expression have focused on 1) methylation in cis, not in trans, and 2) a single CpG, not the collective effects of multiple CpGs. In this study, we developed a statistical machine learning model, geneEXPLORER (gene expression prediction by long-range epigenetic regulation), that quantifies the collective effects of both cis- and trans-methylation on gene expression. By applying geneEXPLORER to The Cancer Genome Atlas (TCGA) breast and lung cancer data, we found that most genes are affected by methylation as far as 10 Mb or more from their promoter regions, and that long-range methylation explains 50% of the variation in gene expression on average, far more than cis-methylation does. The highly predictive genes are related to breast cancer, especially oncogenes and tumor suppressor genes. Further, the predicted gene expression levels could predict clinical phenotypes such as breast tumor status and estrogen receptor status (AUC = 0.999 and 0.94, respectively) as accurately as the measured gene expression levels. These results suggest that geneEXPLORER provides a means for accurate imputation of gene expression, which can be further used to predict clinical phenotypes.
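
The abstract does not say which model geneEXPLORER fits. As a minimal sketch of the underlying idea, pooling many CpG methylation levels into one linear predictor of a gene's expression and scoring it by the fraction of variance explained, the Python below uses ridge regression as an assumed stand-in; the helper names (`fit_ridge`, `r_squared`) and the two-CpG toy data are hypothetical.

```python
# Hypothetical sketch: ridge regression is an assumed stand-in, not geneEXPLORER itself.


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(x, y, lam):
    """Fit expression y ~ b0 + x @ w with an L2 penalty lam * ||w||^2 on the CpG weights."""
    n, p = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in x]
    yc = [v - y_mean for v in y]
    # Penalized normal equations on centered data: (Xc^T Xc + lam * I) w = Xc^T yc
    a = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(xc, yc)) for i in range(p)]
    w = solve(a, b)
    b0 = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return b0, w


def predict(b0, w, x):
    """Predicted expression for each sample (row of CpG methylation levels)."""
    return [b0 + sum(wj * xj for wj, xj in zip(w, row)) for row in x]


def r_squared(y, y_hat):
    """Fraction of the variance in measured expression y explained by predictions y_hat."""
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def toy_expression(row):
    """Hypothetical 'true' expression as a linear function of two CpGs."""
    return 0.5 + 2.0 * row[0] - 1.0 * row[1]


if __name__ == "__main__":
    train_x = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5], [0.3, 0.7]]
    test_x = [[0.9, 0.1], [0.6, 0.6]]
    b0, w = fit_ridge(train_x, [toy_expression(r) for r in train_x], lam=1e-6)
    test_y = [toy_expression(r) for r in test_x]
    print("held-out R^2:", r_squared(test_y, predict(b0, w, test_x)))
```

Scoring on held-out samples, as in the demo, matters here: with thousands of long-range CpGs per gene, in-sample R^2 would overstate predictive power, which is one reason to penalize the weights.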

Mining The Cancer Genome Atlas gene expression data for lineage markers in distinguishing bladder urothelial carcinoma and prostate adenocarcinoma

Scientific Reports ◽

10.1038/s41598-021-85993-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ewe Seng Ch’ng

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

The Cancer Genome Atlas ◽

Relative Importance ◽

Expression Data ◽

Gene Expressions ◽

Urothelial Carcinomas ◽

Cancer Genome Atlas ◽

Lineage Markers ◽

Genome Atlas

Abstract: Distinguishing bladder urothelial carcinomas from prostate adenocarcinomas among poorly differentiated carcinomas arising at the bladder neck requires a panel of lineage markers. Publicly available gene expression data from The Cancer Genome Atlas (TCGA) provide an avenue to examine the utility of these markers. This study aimed to verify the expression of urothelial and prostate lineage markers in the respective carcinomas and to determine the relative importance of these markers in making this distinction. Gene expression data for these markers were downloaded from the TCGA Pan-Cancer database for bladder and prostate carcinomas, and differential expression of the markers was analyzed. Standard linear discriminant analyses were applied to establish the relative importance of these markers in lineage determination and to construct the model that best makes the distinction. All urothelial lineage genes except the gene for uroplakin III were significantly expressed in bladder urothelial carcinomas (p < 0.001). In descending order of importance for distinguishing them from prostate adenocarcinomas, the genes for uroplakin II, S100P, GATA3 and thrombomodulin had high discriminant loadings (> 0.3). All prostate lineage genes were significantly expressed in prostate adenocarcinomas (p < 0.001). In descending order of importance for distinguishing them from bladder urothelial carcinomas, the genes for NKX3.1, prostate-specific antigen (PSA), prostate-specific acid phosphatase, prostein, and prostate-specific membrane antigen had high discriminant loadings (> 0.3). The combined gene expression of uroplakin II, S100P, NKX3.1 and PSA approached 100% accuracy in tumor classification in both the training and validation sets. Based on mining of gene expression data, a combination of four lineage markers helps distinguish bladder urothelial carcinomas from prostate adenocarcinomas.
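
To illustrate how discriminant loadings can rank markers, the sketch below fits a two-class Fisher linear discriminant in plain Python to hypothetical expression values of two markers, one informative and one not. The loading is computed here as the correlation between each marker and the discriminant score across all samples; this is a simplification, since statistical packages usually report pooled within-group correlations, and the study's software is not stated.

```python
# Hypothetical sketch of two-class Fisher LDA with correlation-based loadings.


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def scores(w, x):
    """Project each sample onto the discriminant direction w."""
    return [sum(wj * xj for wj, xj in zip(w, row)) for row in x]


def fit_lda(x, labels):
    """Return w = S_w^-1 (mu1 - mu0) and a threshold midway between projected class means."""
    p = len(x[0])
    groups = {k: [r for r, lab in zip(x, labels) if lab == k] for k in (0, 1)}
    mu = {k: [mean([r[j] for r in g]) for j in range(p)] for k, g in groups.items()}
    # Pooled within-class scatter matrix S_w
    s_w = [[0.0] * p for _ in range(p)]
    for k, g in groups.items():
        for r in g:
            d = [r[j] - mu[k][j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s_w[i][j] += d[i] * d[j]
    w = solve(s_w, [mu[1][j] - mu[0][j] for j in range(p)])
    threshold = 0.5 * (scores(w, [mu[0]])[0] + scores(w, [mu[1]])[0])
    return w, threshold


def classify(w, threshold, x):
    return [1 if s > threshold else 0 for s in scores(w, x)]


def loadings(w, x):
    """Correlation of each marker with the discriminant score (a simplified loading)."""
    s = scores(w, x)
    return [pearson([row[j] for row in x], s) for j in range(len(x[0]))]


if __name__ == "__main__":
    # Columns: [informative marker, uninformative marker]; label 1 = second tumor type.
    x = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [1.1, 6.0],
         [3.0, 4.5], [3.2, 5.5], [2.9, 3.5], [3.1, 4.0]]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    w, thr = fit_lda(x, labels)
    print("predicted:", classify(w, thr, x))
    print("loadings:", loadings(w, x))
```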

Collective effects of long-range DNA methylations predict gene expressions and estimate phenotypes in cancer

Scientific Reports ◽

10.1038/s41598-020-60845-2 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Soyeon Kim ◽

Hyun Jung Park ◽

Xiangqin Cui ◽

Degui Zhi

Keyword(s):

Long Range ◽

Collective Effects ◽

Gene Expressions ◽

DNA Methylations

Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/798189 ◽

2013 ◽

Vol 2013 ◽

pp. 1-14

Author(s):

Osamu Komori ◽

Mari Pritchard ◽

Shinto Eguchi

Keyword(s):

Gene Expression ◽

High Performance ◽

Published Data ◽

Learning Approaches ◽

Statistical Machine Learning ◽

Mutual Coherence ◽

Gene Expressions ◽

Microarray Gene Expression ◽

Analysis Methods ◽

Microarray Gene

This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expression data. We focus on pattern recognition to extract informative features embedded in the data for the prediction of phenotypes. It has been pointed out that severe difficulties arise from the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data and detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to extract only the informative genes with high performance from among all observed genes. We investigate why this difficulty persists even though many analysis methods and learning algorithms have been actively proposed in statistical machine learning. We focus on the mutual coherence, or the absolute value of the Pearson correlation between two genes, and describe the distribution of this correlation for the selected set of genes and for the total set. We show that the problem of finding informative genes in high-dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence.
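
A small Python sketch of the quantity under discussion: it computes the absolute Pearson correlation for every pair of genes across subjects and takes the maximum, which is the usual definition of mutual coherence for a gene set. The gene names and values are made up; comparing these pairwise values for a selected gene set against all genes is the kind of distributional summary the abstract describes.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def abs_correlations(genes):
    """Map each gene pair to |Pearson r| across subjects; genes maps name -> values."""
    return {(g, h): abs(pearson(genes[g], genes[h]))
            for g, h in combinations(sorted(genes), 2)}


def mutual_coherence(genes):
    """Largest absolute pairwise correlation within the gene set."""
    return max(abs_correlations(genes).values())


if __name__ == "__main__":
    toy = {"A": [1, 2, 3, 4], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
    print(abs_correlations(toy))
    print(mutual_coherence(toy))
```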

The prognostic potential of alternative transcript isoforms across human tumors

10.1101/036947 ◽

2016 ◽

Author(s):

Juan L. Trincado ◽

E. Sebestyén ◽

A. Pagés ◽

E. Eyras

Keyword(s):

Gene Expression ◽

Estrogen Receptor ◽

Lymph Node ◽

Cancer Progression ◽

Estrogen Receptor Status ◽

Breast Tumors ◽

The Cancer Genome Atlas ◽

Alternative Transcript ◽

Transcript Isoforms ◽

Receptor Status

Abstract
Background: Phenotypic changes during cancer progression are associated with alterations in gene expression, which can be exploited to build molecular signatures for tumor stage identification and prognosis. However, it is not yet known whether the relative abundance of transcript isoforms may be informative for clinical stage and survival.
Methods: Using information theory and machine learning methods, we integrated RNA sequencing and clinical data from The Cancer Genome Atlas project to perform the first systematic analysis of the prognostic potential of transcript isoforms in 12 solid tumors and to build new predictive signatures for stage and prognosis. This analysis was also performed in breast tumors stratified by estrogen receptor status and in melanoma tumors with proliferative and invasive phenotypes.
Results: Transcript isoform signatures accurately separate early from late stage and metastatic from non-metastatic tumors, and are predictive of the survival of patients with undetermined lymph node invasion or metastatic status. These signatures show accuracy similar to, and sometimes better than, that of known gene expression signatures, and are largely independent of gene expression changes. Furthermore, we show frequent transcript isoform changes in breast tumors according to estrogen receptor status, and in melanoma tumors according to the invasive or proliferative phenotype, and derive accurate predictive models of stage and survival within each patient subgroup.
Conclusions: Our analyses reveal new signatures based on transcript isoform abundances that characterize tumor phenotypes and their progression independently of gene expression. Transcript isoform signatures appear especially relevant for determining lymph node invasion and metastasis, and may contribute to current strategies in precision cancer medicine.
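
As a hedged example of an information-theoretic score for one isoform feature, the sketch below discretizes hypothetical relative isoform abundances at an assumed threshold and computes the mutual information (in bits) with a clinical label such as early versus late stage. The abstract does not specify the discretization or the exact measure used, so both are assumptions.

```python
from collections import Counter
from math import log2


def discretize(values, threshold):
    """Label each relative isoform abundance 'high' or 'low' (the threshold is an assumption)."""
    return ["high" if v > threshold else "low" for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), with counts substituted for probabilities
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


if __name__ == "__main__":
    abundance = [0.9, 0.8, 0.2, 0.1]  # hypothetical isoform fractions per tumor
    stage = ["late", "late", "early", "early"]
    print(mutual_information(discretize(abundance, 0.5), stage))
```

A feature that perfectly splits two balanced classes scores 1 bit; a feature independent of the label scores 0.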

Longitudinal Analysis of Gene Expression Changes During Cervical Carcinogenesis Reveals Potential Therapeutic Targets

Evolutionary Bioinformatics ◽

10.1177/1176934320920574 ◽

2020 ◽

Vol 16 ◽

pp. 117693432092057

Author(s):

Lijun Yu ◽

Meiyan Wei ◽

Fengyan Li

Keyword(s):

Gene Expression ◽

Western Blotting ◽

Protein Interactions ◽

Target Genes ◽

Quantitative Polymerase Chain Reaction ◽

Enrichment Analysis ◽

Gene Interaction ◽

Gene Expression Omnibus ◽

The Cancer Genome Atlas ◽

Gene Expressions

Despite advances in the treatment of cervical cancer (CC), the prognosis of patients with CC still needs to be improved. This study aimed to explore candidate gene targets for CC. CC datasets were downloaded from the Gene Expression Omnibus database. Genes with similar expression trends across the successive steps of CC development were clustered using Short Time-series Expression Miner (STEM) software. Gene functions were then analyzed using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. Protein interactions among the genes of interest were predicted, followed by drug-target genes and prognosis-associated genes. The expression of the predicted genes was determined using real-time quantitative polymerase chain reaction (RT-qPCR) and Western blotting. Red and green profiles, with upward and downward expression trends, respectively, were screened using STEM. Genes with increased expression were significantly enriched in DNA replication, cell-cycle-related biological processes, and the p53 signaling pathway. Based on predictions from the Drug-Gene Interaction database, 17 drug-gene interaction pairs involving 3 red-profile genes (TOP2A, RRM2, and POLA1) and 16 drugs were obtained. Analysis of The Cancer Genome Atlas data showed that high POLA1 expression was significantly correlated with prolonged survival, indicating that POLA1 is protective against CC. RT-qPCR and Western blotting showed that the expression of TOP2A, RRM2, and POLA1 gradually increased over the multistep process of CC. TOP2A, RRM2, and POLA1 may be targets for the treatment of CC; however, further studies are needed to validate these findings.
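
STEM clusters genes by matching them to predefined model temporal profiles. The sketch below is a simplified assumption rather than STEM's implementation (STEM also selects a subset of distinct profiles and assesses their significance by permutation): it enumerates profiles that start at 0 and move by at most one unit between consecutive time points, then assigns a gene to the profile with the highest Pearson correlation. An upward profile would correspond to the red trend and a downward one to the green trend; the time series values are hypothetical.

```python
from itertools import product
from math import sqrt


def model_profiles(n_points, max_step=1):
    """Profiles starting at 0 that change by at most max_step units per step; the flat profile is skipped."""
    profiles = []
    for steps in product(range(-max_step, max_step + 1), repeat=n_points - 1):
        profile = [0]
        for s in steps:
            profile.append(profile[-1] + s)
        if any(profile):
            profiles.append(tuple(profile))
    return profiles


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    denom = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / denom if denom else 0.0


def assign_profile(series, profiles):
    """Pick the best-correlated profile; shifting to start at 0 mirrors the profiles but does not change r."""
    shifted = [v - series[0] for v in series]
    return max(profiles, key=lambda p: pearson(shifted, p))


if __name__ == "__main__":
    profiles = model_profiles(4)  # e.g. normal -> precancerous stages -> cancer
    print(assign_profile([1.0, 1.9, 3.1, 4.0], profiles))
    print(assign_profile([4.0, 3.0, 2.0, 1.0], profiles))
```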

High-dimensional covariance matrices tests for analyzing multi-tumor gene expression data

Statistical Methods in Medical Research ◽

10.1177/09622802211009257 ◽

2021 ◽

pp. 096228022110092

Author(s):

Abdullah Qayed ◽

Dong Han

Keyword(s):

Gene Expression ◽

Empirical Studies ◽

The Cancer Genome Atlas ◽

Covariance Structures ◽

Gene Expressions ◽

Multiple Tumor ◽

Gene Sets ◽

Cancer Genome Atlas ◽

Tumor Types ◽

Tumor Gene Expression

When multiple sets are collected per subject in microarray data, gene set analysis requires characterizing intra-subject variation in gene expression profiles. For each subject, the data can be written as a matrix whose rows are indexed by the different subsets of gene expressions (e.g. multiple tumor types) and whose columns are indexed by the genes. To test assumptions about intra-subject (tumor) variation, we present and perform tests of multi-set sphericity and multi-set identity of covariance structures across subjects (tumor types). We demonstrate through both theoretical and empirical studies that the tests have good properties. We applied the proposed tests to The Cancer Genome Atlas (TCGA) data and tested the covariance structures of gene expression across several tumor types.
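
The abstract does not give the test statistics themselves. For orientation only, the sketch below computes a classical single-set sphericity statistic, John's U = (1/p) * tr[(S / (tr(S)/p) - I)^2], which is 0 when the sample covariance matrix S is a multiple of the identity and grows as S departs from sphericity. It is not the authors' multi-set test, and the data are toy values.

```python
def covariance(rows):
    """Unbiased sample covariance matrix; rows are samples, columns are genes."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def john_u(s):
    """John's sphericity statistic U = (1/p) tr[(S / (tr(S)/p) - I)^2]."""
    p = len(s)
    scale = sum(s[i][i] for i in range(p)) / p
    # For a symmetric matrix A, tr(A @ A) equals the sum of squared entries of A.
    return sum((s[i][j] / scale - (1.0 if i == j else 0.0)) ** 2
               for i in range(p) for j in range(p)) / p


if __name__ == "__main__":
    spherical = covariance([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    print(john_u(spherical))
    print(john_u([[1.0, 0.5], [0.5, 1.0]]))
```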

Integrative analysis of DNA methylation and gene expression profiles identified potential breast cancer-specific diagnostic markers

Bioscience Reports ◽

10.1042/bsr20201053 ◽

2020 ◽

Vol 40 (5) ◽

Author(s):

Xinhua Liu ◽

Yonglin Peng ◽

Ju Wang

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

DNA Methylation ◽

Expression Profiles ◽

Gene Expression Profiles ◽

The Cancer Genome Atlas ◽

Diagnostic Model ◽

Breast Cancer Patients ◽

Gene Expressions ◽

Normal Tissues

Abstract: Breast cancer is a common malignant tumor among women, and its prognosis is largely determined by the timing and accuracy of diagnosis. Here we propose to identify a robust DNA methylation-based, breast cancer-specific diagnostic signature. Genome-wide DNA methylation and gene expression profiles of breast cancer patients, along with their adjacent normal tissues, were obtained from The Cancer Genome Atlas (TCGA) as the training set. CpGs with significantly higher methylation levels in breast cancer than not only in adjacent normal tissues and ten other common cancers from TCGA but also in healthy breast tissues from the Gene Expression Omnibus (GEO) were retained for logistic regression analysis. Another independent breast cancer DNA methylation dataset from GEO was used as the testing set. Many CpGs were hypermethylated in breast cancer samples compared with adjacent normal tissues, and their methylation tended to be negatively correlated with gene expression. Eight CpGs located at RIIAD1, ENPP2, ESPN, and ETS1 were finally retained. The diagnostic model reliably separated breast cancer (BRCA) from normal samples. In addition, the chromatin accessibility status of RIIAD1, ENPP2, ESPN and ETS1 differed markedly between the MCF-7 and MDA-MB-231 cell lines. In conclusion, the present study may help enable early and accurate diagnosis of breast cancer.
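
A minimal sketch of the final modelling step, assuming plain (unpenalized) logistic regression fitted by batch gradient descent on CpG beta values. The two CpG columns below are hypothetical stand-ins for the eight retained CpGs, and the learning rate and epoch count are arbitrary choices, not values from the study.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic loss; x holds beta values, y is 1 for tumor."""
    n, p = len(x), len(x[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(x, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(p):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, row):
    """Estimated probability that a sample with these beta values is tumor."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


if __name__ == "__main__":
    # Hypermethylated tumor samples first, then normal samples.
    x = [[0.8, 0.7], [0.9, 0.85], [0.75, 0.8], [0.2, 0.1], [0.15, 0.3], [0.3, 0.2]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = fit_logistic(x, y)
    print([round(predict_proba(w, b, r), 3) for r in x])
```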

Identification of Deregulated Transcription Factors Involved in Specific Bladder Cancer Subtypes

10.29007/v7qj ◽

2020 ◽

Author(s):

Magali Champion ◽

Julien Chiquet ◽

Pierre Neuvial ◽

Mohamed Elati ◽

François Radvanyi ◽

...

Keyword(s):

Gene Expression ◽

Bladder Cancer ◽

Transcription Factor ◽

Transcription Factors ◽

Target Genes ◽

The Cancer Genome Atlas ◽

Reference Network ◽

Data Set ◽

Cancer Subtypes ◽

Cancer Data

Comparison between tumor and healthy cells may reveal abnormal regulatory behavior between a transcription factor and the genes it regulates, even when the transcription factor genes themselves are not differentially expressed. We propose a methodology for identifying transcription factors involved in the deregulation of genes in tumor cells. This strategy is based on inferring, from gene expression data, a reference gene regulatory network that connects transcription factors to their downstream targets. Gene expression levels in tumor samples are then carefully compared with this reference network to detect deregulated target genes. Finally, a linear model is used to measure the ability of each transcription factor to explain these deregulations. We assess the performance of our method through numerical experiments on a public bladder cancer data set derived from The Cancer Genome Atlas project. We identify genes known to be involved in the development of specific bladder cancer subtypes, as well as new potential biomarkers.

Molecular Network-Based Drug Prediction in Thyroid Cancer

International Journal of Molecular Sciences ◽

10.3390/ijms20020263 ◽

2019 ◽

Vol 20 (2) ◽

pp. 263 ◽

Cited By ~ 2

Author(s):

Xingyu Xu ◽

Haixia Long ◽

Baohang Xi ◽

Binbin Ji ◽

Zejun Li ◽

...

Keyword(s):

Gene Expression ◽

Thyroid Cancer ◽

Hormone Secretion ◽

The Cancer Genome Atlas ◽

Expression Level ◽

Literature Mining ◽

Drug Selection ◽

Exact Test ◽

Cancer Data ◽

Receptor Localization

As a common malignant tumor, thyroid cancer lacks effective preventive and therapeutic drugs, so it is crucial to provide an effective drug selection method for thyroid cancer patients. The Connectivity Map (CMAP) project provides an experimentally validated strategy to repurpose and optimize cancer drugs; its rationale is to select drugs that reverse the gene expression changes induced by cancer. However, it has a few limitations. First, CMAP was performed on cell lines, which usually differ from human tissues. Second, only gene expression information was considered, while information about gene regulation and modules/pathways was largely ignored. In this study, we first comprehensively measured the perturbations caused by thyroid cancer in patients, including changes at the gene expression, gene co-expression and gene module levels. We then provided a drug selection pipeline to reverse these perturbations based on drug signatures derived from tissue studies. We applied the pipeline to The Cancer Genome Atlas (TCGA) thyroid cancer data, consisting of 56 normal and 500 cancer samples. As a result, we obtained 812 up-regulated and 213 down-regulated genes, whose functions are significantly enriched in extracellular matrix and receptor localization to synapses. In addition, a total of 33,778 significantly differentially co-expressed gene pairs were found, which form a large module associated with impaired immune function and low immunity. Finally, using Fisher's exact test, we predicted drugs and gene perturbations that could reverse the gene expression and co-expression changes incurred by the development of thyroid cancer. Top predicted drugs included validated drugs such as baclofen, nevirapine, glucocorticoid and formaldehyde.
Combining our analyses with literature mining, we inferred that the regulation of thyroid hormone secretion might be closely related to the inhibition of the proliferation of thyroid cancer cells.
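
Drug-signature matching of this kind can be scored with a one-sided Fisher's exact test on a 2x2 table. In the sketch below, which is an assumption about how the table might be set up, the rows split genes into up-regulated in thyroid cancer or not, the columns split them into down-regulated by a drug signature or not, and cell a counts the overlap (genes whose change the drug would reverse). The p-value is the hypergeometric upper tail given fixed margins; the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]] (enrichment of cell a)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    # Hypergeometric upper tail: P(overlap >= a) given fixed row and column margins.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 3 of 3 disease-up genes are down-regulated by the drug.
    print(fisher_exact_greater(3, 0, 0, 3))
```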

Correlation of QNBC immune and basal signature to targeted therapies and race.

Journal of Clinical Oncology ◽

10.1200/jco.2017.35.15_suppl.e14537 ◽

2017 ◽

Vol 35 (15_suppl) ◽

pp. e14537-e14537

Author(s):

Raymond Wesley Hughley ◽

Windy Marie Dean-Colomb ◽

Shweta Tripathi ◽

Melissa Davis ◽

Clayton Clopton Yates

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Targeted Therapies ◽

Poor Prognosis ◽

The Cancer Genome Atlas ◽

Gene Expressions ◽

Cancer Subtypes ◽

Caucasian Americans ◽

Therapy Targets ◽

Significance Threshold

e14537 Background: Triple negative breast cancer (TNBC) has been identified as one of the most aggressive breast cancer subtypes. We previously identified a subtype of TNBC that lacks the androgen receptor (AR), which we termed quadruple negative breast cancer (QNBC). We have shown that women with QNBC have a worse prognosis than those with TNBC alone. Additionally, we have noted that African American (AA) women with TNBC have a greater chance of having QNBC (28%) and a higher rate of disease progression (25%) than Caucasian American (CA) women. We also found that these QNBC patients have a unique immune signature that correlates with the more aggressive basal subtype. Here, we explore individual genes and their correlation with race and immune therapies. Methods: We analyzed mRNA expression in 925 tumors from CA or AA women from The Cancer Genome Atlas. Samples were dichotomized as positive or negative using a mean gene expression cutoff. Gene expression comparisons among groups were made using standard t-tests, ANOVA, and odds ratios with a significance threshold of alpha = 0.05. Therapy targets were selected from the list of significantly differentially expressed genes. Boxplots and heatmaps were made using R packages. Cumulative incidence plots for time to progression were constructed using XLStats, with Gray's test for significance. Results: QNBC CA have higher expression of PDCD4 (p < .001), CD46 (p < .001), CD3EAP (p = .04) and CD84 (p = .07) than QNBC AA. CD47 (p = .63) and PD1 (p = .96) show no significant differences by race. QNBC AA show higher expression of NFKB (p < .001), CCL2 (p < .001), and IL12 (p < .001) than QNBC CA. Interestingly, QNBCs with high expression of CD47 (p = .04) have a poor prognosis. In addition, QNBC patients with low expression of CD46 (p = .025), PDCD4 (p < .001), and CD3EAP (p < .001) have poor prognosis and poor overall survival.
Conclusions: Current immune-targeted therapies against CD47 and PD1 look favorable for treatment in some QNBC patients. Race also appears to affect the expression of other major therapy target genes in the QNBC immune basal signature. This may imply that QNBC AA and CA patients have different immune responses, which may explain racial differences in progression. Assessment of immune markers may contribute to more accurate prognosis.
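
The dichotomization and odds-ratio steps in the Methods can be sketched as follows. Expression above the cohort mean is called positive (ties are treated as negative, an assumption), and the odds ratio is the cross-product ad/bc of the resulting 2x2 table against a binary outcome such as progression. The function names and all values are hypothetical.

```python
def dichotomize_by_mean(values):
    """True (positive) if expression is above the cohort mean; ties count as negative."""
    cutoff = sum(values) / len(values)
    return [v > cutoff for v in values]


def odds_ratio(exposed, outcome):
    """Cross-product odds ratio ad/bc for two boolean lists; raises if b or c is 0."""
    a = sum(1 for e, o in zip(exposed, outcome) if e and o)
    b = sum(1 for e, o in zip(exposed, outcome) if e and not o)
    c = sum(1 for e, o in zip(exposed, outcome) if not e and o)
    d = sum(1 for e, o in zip(exposed, outcome) if not e and not o)
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: zero cell in denominator")
    return (a * d) / (b * c)


if __name__ == "__main__":
    marker_positive = dichotomize_by_mean([5.0, 6.0, 7.0, 1.0, 2.0, 3.0])
    progressed = [True, True, False, True, False, False]
    print(odds_ratio(marker_positive, progressed))
```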
