Analyzing high‐dimensional gene expression and DNA methylation data with R. Hongmei Zhang (2020). Chapman & Hall/CRC Press, Mathematical and Computational Biology. 202 pages, £59.99 (Paperback), £150.00 (Hardbound), £53.99 (e‐book). ISBN 9780367495169

Analyzing High-Dimensional Gene Expression and DNA Methylation Data with R

10.1201/9780429155192 ◽

2020 ◽

Author(s):

Hongmei Zhang

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

High Dimensional ◽

Methylation Data

Abstract P098: An Epigenome-wide Study of Obesity in African American Youth and Young Adults: Novel Findings, Replication in Neutrophils and Relationship With Gene Expression

Circulation ◽

10.1161/circ.135.suppl_1.p098 ◽

2017 ◽

Vol 135 (suppl_1) ◽

Author(s):

Xiaoling Wang ◽

Yue Pan ◽

Haidong Zhu ◽

Guang Hao ◽

Xin Wang ◽

...

Keyword(s):

Gene Expression ◽

Older Adults ◽

DNA Methylation ◽

Young Adults ◽

Methylation Data ◽

Middle Aged ◽

CpG Sites ◽

Genome Wide ◽

Obesity Status ◽

Youth And Young Adults

Background: Several large-scale epigenome-wide association studies of obesity-related DNA methylation changes have been published and together identified 46 CpG sites. These studies were conducted in middle-aged and older adults, Caucasians and African Americans (AAs), using leukocytes. To what extent these signals are independent of cell composition, and to what extent they may influence gene expression, has not been systematically investigated. Furthermore, the high prevalence of obesity comorbidities in middle-aged and older populations may hide or bias the DNA methylation changes related to obesity itself. Methods: In this study of healthy AA youth and young adults, genome-wide DNA methylation data from leukocytes were obtained from three independent studies using the Infinium HumanMethylation450 BeadChip: the EpiGO study (96 obese cases vs. 92 lean controls, aged 14-21, 50% females, test of interest is obesity status), the LACHY study (284 participants from the general population, aged 14-18, 50% females, test of interest is BMI), and the Georgia Stress and Heart study (298 participants from the general population, aged 18-38, 52% females, test of interest is BMI). Genome-wide DNA methylation data from purified neutrophils, as well as genome-wide gene expression data from leukocytes using the Illumina HT12 V4 array, were also obtained for the EpiGO samples. Results: The meta-analysis of the 3 cohorts identified 76 obesity-related CpG sites in leukocytes with p<1×10⁻⁷. Of the 46 previously identified CpG sites, 36 were replicated in this AA youth and young adult sample with the same direction and p<0.05. Of the 107 CpG sites comprising the 36 replicated ones and the 71 newly identified ones, 71 CpG sites (66%) had their relationship with obesity replicated in purified neutrophils (p<0.05). The analysis of the cis regulation of gene expression by the 107 CpG sites showed that 59 CpG sites had at least one gene within 250 kb with an expression difference between obese cases and lean controls. Furthermore, of the 59 CpG sites, 6 showed significantly negative correlations and 1 showed a significantly positive correlation with the differentially expressed genes. These CpG sites are located in the SOCS3, CISH, ABCG1, PIM3 and PTGDS genes. Conclusion: In this study of AA youth and young adults, we identified novel CpG sites associated with obesity and replicated the majority of the CpG sites previously identified in middle-aged and older adults. For the first time, we showed that the majority of the obesity-related CpG sites identified from leukocytes are not driven by cell composition, and we provided a direct link between DNA methylation, gene expression and obesity status for 7 CpG sites in 5 genes.
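
The cis-regulation step in this abstract pairs each CpG site with the genes lying within 250 kb of it. A minimal Python sketch of that pairing follows. It assumes distance is measured to the gene body (the abstract does not say whether the gene body or the transcription start site is used), and the gene names and coordinates are hypothetical placeholders, not data from the study.

```python
CIS_WINDOW_BP = 250_000  # window size stated in the abstract


def distance_to_gene(cpg_pos, gene_start, gene_end):
    """Distance in bp from a CpG to a gene body; 0 if the CpG lies inside it."""
    if cpg_pos < gene_start:
        return gene_start - cpg_pos
    if cpg_pos > gene_end:
        return cpg_pos - gene_end
    return 0


def cis_genes(cpg, genes, window=CIS_WINDOW_BP):
    """Return names of genes on the CpG's chromosome within `window` bp of it."""
    chrom, pos = cpg
    return [
        name
        for name, (g_chrom, start, end) in genes.items()
        if g_chrom == chrom and distance_to_gene(pos, start, end) <= window
    ]


# Hypothetical coordinates, for illustration only.
genes = {
    "GENE_A": ("chr1", 1_000_000, 1_050_000),
    "GENE_B": ("chr1", 1_400_000, 1_450_000),
    "GENE_C": ("chr2", 1_000_000, 1_050_000),
    "GENE_D": ("chr1", 1_600_000, 1_650_000),
}
nearby = cis_genes(("chr1", 1_200_000), genes)
print(nearby)
```

Each CpG and gene pair found this way would then be tested for an expression difference or a methylation-expression correlation, which is outside the scope of this sketch.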

The Expression Level of mRNA, Protein, and DNA Methylation Status of FOSL2 of Uyghur in XinJiang in Type 2 Diabetes

Journal of Diabetes Research ◽

10.1155/2016/5957404 ◽

2016 ◽

Vol 2016 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Jun Li ◽

Siyuan Li ◽

Ying Hu ◽

Guolei Cao ◽

Siyao Wang ◽

...

Keyword(s):

Gene Expression ◽

Type 2 Diabetes ◽

DNA Methylation ◽

Methylation Status ◽

Methylation Data ◽

Biochemical Indicators ◽

Expression Levels ◽

Protein Levels ◽

CpG Sites

Objective. We investigated the expression levels of FOSL2 mRNA and protein and evaluated DNA methylation in the blood of Uyghur patients with type 2 diabetes mellitus (T2DM) from Xinjiang. We also evaluated whether FOSL2 gene expression was associated with clinical and biochemical indicators of T2DM. Methods. One hundred Uyghur subjects were divided into two groups: T2DM and nonimpaired glucose tolerance (NGT). DNA methylation of FOSL2 was analyzed by MassARRAY spectrometry, and methylation data for individual units were generated with the EpiTyper v1.0.5 software. The mRNA and protein expression levels of FOS-like antigen 2 (FOSL2) were analyzed. Results. mRNA and protein levels in the T2DM group differed significantly from those in the NGT group, while methylation rates of eight CpG units within the FOSL2 gene were higher in the T2DM group. Methylation of CpG sites was found to inversely correlate with expression of other markers. Conclusions. The results show that mRNA, protein, and DNA methylation of the FOSL2 gene are correlated among Uyghur T2DM patients. FOSL2 protein and mRNA were downregulated and the DNA was hypermethylated, all of which may be involved in T2DM pathogenesis in this population.

Uncovering Driver DNA Methylation Events in Nonsmoking Early Stage Lung Adenocarcinoma

BioMed Research International ◽

10.1155/2016/2090286 ◽

2016 ◽

Vol 2016 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Xindong Zhang ◽

Lin Gao ◽

Zhi-Ping Liu ◽

Songwei Jia ◽

Luonan Chen

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Lung Adenocarcinoma ◽

Early Stage ◽

Methylation Data ◽

Driver Genes ◽

Differential Network Analysis ◽

Differential Network ◽

Aberrant DNA Methylation ◽

Never Smokers

As smoking rates decrease, proportionally more cases of lung adenocarcinoma occur in never-smokers, and aberrant DNA methylation has been suggested to contribute to the tumorigenesis of lung adenocarcinoma. Among the large number of differentially methylated genes, it is extremely difficult to distinguish which ones play key roles in tumorigenic processes via DNA methylation-mediated gene silencing. By integrating gene expression and DNA methylation data, a pipeline combined with differential network analysis is designed to uncover driver methylation genes and responsive modules, which demonstrate distinctive expression and network topology in tumors with aberrant DNA methylation. In total, 135 genes are recognized as candidate driver genes in early stage lung adenocarcinoma, and the 30 top-ranked genes are recognized as driver methylation genes. Functional annotation and the differential network analysis indicate the roles of the identified driver genes in tumorigenesis, while a literature study reveals significant correlations of the top 30 genes with early stage lung adenocarcinoma in never-smokers. The analysis pipeline can also be employed to identify driver epigenetic events in other cancers characterized by matched gene expression and DNA methylation data.
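
The abstract does not spell out how its differential network analysis is computed. As a rough illustration of the general idea only, the Python sketch below flags gene pairs whose expression correlation changes strongly between tumor and normal samples. The Pearson-difference criterion, the threshold value, and the gene names and expression values are all assumptions made for this example and are not taken from the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def differential_edges(tumor, normal, threshold=1.0):
    """Gene pairs whose correlation differs between conditions by >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(tumor), 2):
        diff = pearson(tumor[g1], tumor[g2]) - pearson(normal[g1], normal[g2])
        if abs(diff) >= threshold:
            edges[(g1, g2)] = diff
    return edges


# Hypothetical expression values (four samples per condition).
normal = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [1, 3, 2, 4]}
tumor = {"G1": [1, 2, 3, 4], "G2": [8, 6, 4, 2], "G3": [1, 3, 2, 4]}

edges = differential_edges(tumor, normal)
for pair, diff in edges.items():
    print(pair, round(diff, 2))
```

In this toy data only the relationships involving G2 are rewired, so G2 is the gene that such an analysis would single out for follow-up.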

Comprehensive analysis of gene expression and DNA methylation data identifies potential biomarkers and functional epigenetic modules for lung adenocarcinoma

Genetics and Molecular Biology ◽

10.1590/1678-4685-gmb-2019-0164 ◽

2020 ◽

Vol 43 (3) ◽

Author(s):

XiaoCong Wang ◽

YanMei Li ◽

HuiHua Hu ◽

FangZheng Zhou ◽

Jie Chen ◽

...

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Lung Adenocarcinoma ◽

Comprehensive Analysis ◽

Methylation Data ◽

Potential Biomarkers

Penalized logistic regression based on L1/2 penalty for high-dimensional DNA methylation data

Technology and Health Care ◽

10.3233/thc-209016 ◽

2020 ◽

Vol 28 ◽

pp. 161-171

Author(s):

Hong-Kun Jiang ◽

Yong Liang

Keyword(s):

DNA Methylation ◽

Logistic Regression ◽

High Dimensional ◽

Methylation Data ◽

Penalized Logistic Regression

Integration of DNA Methylation Data and Gene Expression Data for Prostate Adenocarcinoma: A Proof of Concept

Current Bioinformatics ◽

10.2174/1574893612666170328171106 ◽

2017 ◽

Vol 12 (5) ◽

Cited By ~ 1

Author(s):

Arpit Singh ◽

Razia Rahman ◽

Yasha Hasija

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Gene Expression Data ◽

Prostate Adenocarcinoma ◽

Methylation Data ◽

Expression Data ◽

Proof Of Concept

DNA Methylation Module Network-Based Prognosis and Molecular Typing of Cancer

Genes ◽

10.3390/genes10080571 ◽

2019 ◽

Vol 10 (8) ◽

pp. 571 ◽

Cited By ~ 4

Author(s):

Ze-Jia Cui ◽

Xiong-Hui Zhou ◽

Hong-Yu Zhang

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Molecular Typing ◽

Core Gene ◽

Cancer Prognosis ◽

Methylation Data ◽

Expression Data ◽

Module Network ◽

The Core ◽

Gene Modules

Cancer prognosis and molecular typing are critical for cancer treatment. Previous studies have identified gene signatures for the prognosis and typing of cancer based on gene expression data. Some studies have shown that DNA methylation is associated with cancer development, progression, and metastasis. In addition, DNA methylation data are more stable than gene expression data in cancer prognosis. Therefore, in this work, we focused on DNA methylation data. Prior research has shown that gene modules are more reliable in cancer prognosis than gene signatures and that gene modules are not isolated. However, few studies have considered cross-talk among gene modules, so some gene modules that are important for cancer may be overlooked. Therefore, we constructed a gene co-methylation network based on the DNA methylation data of cancer patients and detected the gene modules in this network. Then, by permutation testing, cross-talk between every pair of modules was identified, generating the module network. Next, the core gene modules in the module network were identified using the K-shell method, and these core gene modules were used as features to study the prognosis and molecular typing of cancer. Our method was applied to three types of cancer (breast invasive carcinoma, skin cutaneous melanoma, and uterine corpus endometrial carcinoma). Based on the core gene modules identified from the constructed DNA methylation module networks, we can not only distinguish the prognosis of cancer patients but also perform molecular typing of cancer. These results indicate that our method has important application value for the diagnosis of cancer and may reveal potential carcinogenic mechanisms.
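
The K-shell step named in this abstract can be illustrated with a small, generic k-shell decomposition in Python. This is a sketch of the standard peeling procedure, not the authors' code. It assumes an undirected module network without self-loops, stored as a symmetric adjacency dictionary, and the module names M1 to M5 are hypothetical.

```python
def k_shell(adjacency):
    """Return {node: shell index} for an undirected graph {node: set(neighbors)}."""
    # Work on a copy so the caller's graph is left untouched.
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}
    shells = {}
    k = 0
    while graph:
        # Nodes whose remaining degree is at most k belong to shell k.
        peel = [node for node, nbrs in graph.items() if len(nbrs) <= k]
        if not peel:
            k += 1  # nothing left to peel at this level; move to the next shell
            continue
        for node in peel:
            shells[node] = k
            for nbr in graph.pop(node):
                graph[nbr].discard(node)
    return shells


# Hypothetical module network: a triangle (M1, M2, M3), a pendant module M4,
# and an isolated module M5.
module_network = {
    "M1": {"M2", "M3", "M4"},
    "M2": {"M1", "M3"},
    "M3": {"M1", "M2"},
    "M4": {"M1"},
    "M5": set(),
}
shells = k_shell(module_network)
top_shell = max(shells.values())
core_modules = {node for node, shell in shells.items() if shell == top_shell}
print(shells, core_modules)
```

Modules in the highest shell are the most densely interconnected ones, which is the sense in which a k-shell analysis treats them as the core of the network.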

MLW-gcForest: a multi-weighted gcForest model towards the staging of lung adenocarcinoma based on multi-modal genetic data

BMC Bioinformatics ◽

10.1186/s12859-019-3172-z ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Yunyun Dong ◽

Wenkai Yang ◽

Jiawen Wang ◽

Juanjuan Zhao ◽

Yan Qiang ◽

...

Keyword(s):

Gene Expression ◽

Lung Adenocarcinoma ◽

Copy Number ◽

Genetic Data ◽

Small Samples ◽

High Dimensional ◽

Methylation Data ◽

RNA-Seq ◽

Number Variation ◽

Staging Model

Background: Lung cancer is one of the most common types of cancer, and lung adenocarcinoma accounts for the largest proportion of cases. Currently, accurate staging is a prerequisite for effective diagnosis and treatment of lung adenocarcinoma. Previous research has mainly used single-modal data, such as gene expression data, for classification and prediction. Integrating multi-modal genetic data (gene expression RNA-seq, methylation data and copy number variation) from the same patient makes it possible to use multi-modal genetic data for cancer prediction. A new machine learning method called gcForest has recently been proposed. This method has been shown to be suitable for classification in some fields. However, the model may face challenges when applied to small samples and high-dimensional genetic data. Results: In this paper, we propose a multi-weighted gcForest algorithm (MLW-gcForest) to construct a lung adenocarcinoma staging model using multi-modal genetic data. The new algorithm is based on the standard gcForest algorithm. First, different weights are assigned to different random forests according to the classification performance of these forests in the standard gcForest model. Second, because the feature vectors generated under different scanning granularities influence the final classification result differently, the feature vectors are given weights according to the proposed sorting optimization algorithm. Then, we train three MLW-gcForest models on three single-modal datasets (gene expression RNA-seq, methylation data, and copy number variation) and perform decision fusion to stage lung adenocarcinoma. Experimental results suggest that the MLW-gcForest model is superior to the standard gcForest model in constructing a staging model of lung adenocarcinoma and is better than the traditional classification methods. The accuracy, precision, recall, and AUC reached 0.908, 0.896, 0.882, and 0.96, respectively. Conclusions: The MLW-gcForest model has great potential in lung adenocarcinoma staging, which is helpful for the diagnosis and personalized treatment of lung adenocarcinoma. The results suggest that the MLW-gcForest algorithm is effective on multi-modal genetic data, which consist of small samples and are high dimensional.
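
The abstract describes weighting models by classification performance and fusing the decisions of three single-modal models, but it gives no formulas. The Python sketch below shows one simple way such a fusion could look: an accuracy-weighted average of class-probability vectors. The weighting rule, the function name `fuse_predictions`, and all numbers are assumptions for illustration and should not be read as the MLW-gcForest procedure itself.

```python
def fuse_predictions(prob_vectors, accuracies):
    """Accuracy-weighted average of class-probability vectors.

    Returns the fused probability vector and the index of the predicted class.
    """
    total = sum(accuracies)
    weights = [acc / total for acc in accuracies]  # normalize weights to sum to 1
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, prob_vectors))
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical outputs of three single-modal models for one patient and two
# candidate stages, with hypothetical validation accuracies as weights.
expression_probs = [0.6, 0.4]
methylation_probs = [0.3, 0.7]
copy_number_probs = [0.2, 0.8]

fused, predicted = fuse_predictions(
    [expression_probs, methylation_probs, copy_number_probs],
    accuracies=[0.5, 0.25, 0.25],
)
print(fused, predicted)
```

Because the weights are normalized, the fused vector remains a probability distribution, and a modality that performs better on validation data has proportionally more influence on the final stage call.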

New variable selection strategy for analysis of high-dimensional DNA methylation data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720018500105 ◽

2018 ◽

Vol 16 (04) ◽

pp. 1850010 ◽

Cited By ~ 2

Author(s):

Jiyun Choi ◽

Kipoong Kim ◽

Hokeun Sun

Keyword(s):

DNA Methylation ◽

Variable Selection ◽

Group Structure ◽

Association Studies ◽

High Dimensional ◽

Selection Strategy ◽

Methylation Data ◽

Selection Probability ◽

Cancer Data ◽

CpG Sites

In genetic association studies, regularization methods are often used because of their computational efficiency in the analysis of high-dimensional genomic data. DNA methylation data generated from the Infinium HumanMethylation450 BeadChip Kit have a group structure in which an individual gene consists of multiple Cytosine–phosphate–Guanine (CpG) sites. Consequently, group-based regularization can precisely detect outcome-related CpG sites. Representative examples are sparse group lasso (SGL) and network-based regularization. The former is powerful when most of the CpG sites within the same gene are associated with a phenotype outcome. In contrast, the latter is preferred when only a few of the CpG sites within the same gene are related to the outcome. In this paper, we propose a new variable selection strategy based on a selection probability that measures the selection frequency of individual variables selected by both SGL and network-based regularization. In an extensive simulation study, we demonstrated that the proposed strategy shows outstanding selection performance relative to both SGL and network-based regularization under any situation. We also applied the proposed strategy to identify differentially methylated CpG sites and their corresponding genes from ovarian cancer data.
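
A selection probability of the kind described here can be thought of as the fraction of fitted models in which a variable is selected. The Python sketch below computes that frequency over a pooled set of runs. How the paper generates the runs and combines the two methods is not stated in the abstract, so pooling SGL and network-based runs with equal weight is an assumption, and the CpG labels and selected sets are hypothetical.

```python
from collections import Counter


def selection_probability(selected_sets):
    """Fraction of runs in which each variable was selected."""
    counts = Counter(var for selected in selected_sets for var in set(selected))
    n_runs = len(selected_sets)
    return {var: count / n_runs for var, count in counts.items()}


# Hypothetical variables selected in repeated fits of each method.
sgl_runs = [{"cg01", "cg02"}, {"cg01"}]
network_runs = [{"cg01", "cg03"}, {"cg02"}]

probabilities = selection_probability(sgl_runs + network_runs)
# Keep variables selected in at least half of all runs (illustrative cutoff).
stable = sorted(var for var, p in probabilities.items() if p >= 0.5)
print(probabilities, stable)
```

Ranking variables by this frequency rewards CpG sites that are picked consistently, whichever of the two regularization methods happens to suit the underlying group structure.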
