Correcting for cell-type composition bias in epigenome-wide association studies

In epigenome-wide association studies analysing DNA methylation from samples containing multiple cell types, it is essential to adjust the analysis for cell type composition. One well-established strategy for achieving this is reference-based cell type deconvolution, which relies on knowledge of the DNA methylation profiles of the purified constituent cell types. These profiles are used to estimate the cell type proportions of each sample, which can then be incorporated to adjust the association analysis. Bronchoalveolar lavage is commonly used to sample the lung in clinical practice and contains a mixture of cell types whose proportions can vary across samples, affecting the overall methylation profile. A current barrier to the use of bronchoalveolar lavage in DNA methylation-based research is the lack of reference DNA methylation profiles for each of the constituent cell types, which makes reference-based cell composition estimation difficult. Herein, we use bronchoalveolar lavage samples collected from children with cystic fibrosis to define DNA methylation profiles for the four most common and clinically relevant cell types: alveolar macrophages, granulocytes, lymphocytes and alveolar epithelial cells. We then demonstrate the use of these methylation profiles, in conjunction with an established reference-based methylation deconvolution method, to estimate the cell type composition of two different tissue types: a publicly available dataset derived from artificial blood-based cell mixtures, and further bronchoalveolar lavage samples. The reference DNA methylation profiles developed in this work can be used for future reference-based cell type composition estimation of bronchoalveolar lavage. This will facilitate the use of this tissue in studies examining the role of DNA methylation in lung health and disease.
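
The abstract above does not name the deconvolution method it uses, so the following is only a minimal sketch of the general idea behind reference-based estimation. It assumes that a bulk methylation profile is roughly a weighted sum of cell-type reference profiles, and it uses non-negative least squares as a stand-in technique. The function name `estimate_proportions`, the simulated reference matrix, and the chosen proportions are all hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(bulk, reference):
    """Estimate cell type proportions for one sample.

    bulk:      (n_cpgs,) methylation beta values of the mixed sample
    reference: (n_cpgs, n_cell_types) beta values of purified cell types
    Assumption: bulk is approximately reference @ proportions.
    """
    # Non-negative least squares keeps every weight >= 0.
    weights, _residual = nnls(reference, bulk)
    total = weights.sum()
    if total == 0:
        raise ValueError("no cell type explains the bulk profile")
    # Rescale so the estimated proportions sum to one.
    return weights / total


# Simulated data only: four hypothetical cell types, 200 CpG sites.
rng = np.random.default_rng(0)
n_cpgs, n_types = 200, 4
reference = rng.uniform(0.05, 0.95, size=(n_cpgs, n_types))
true_props = np.array([0.6, 0.2, 0.15, 0.05])
bulk = reference @ true_props + rng.normal(0, 0.01, size=n_cpgs)

estimated = estimate_proportions(bulk, reference)
print(np.round(estimated, 3))
```

The estimated proportions could then be added as covariates in the per-site association model. Established methods differ in how they select informative CpG sites and in how they constrain the fit.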

Estimating cell-type-specific DNA methylation effects in heterogeneous cellular populations

Epigenomics, 2020. DOI: 10.2217/epi-2020-0147

Author(s): Yen-Chen A Feng, Yichen Guo, Lucile Pain, G Mark Lathrop, Catherine Laprise, ...

Keyword(s): Association Studies, Methylation Pattern, Cell Type, Specific Effects, Cell Type Composition, Type Composition, Cell Groups, Reliable Performance, Cell Type Specific, Monte Carlo Expectation Maximization

Aim: To develop a method for estimating cell-specific effects in epigenomic association studies in the presence of cell type heterogeneity. Materials & methods: We used a Monte Carlo Expectation-Maximization (MCEM) algorithm with a Metropolis–Hastings sampler to reconstruct the ‘missing’ cell-specific methylation values and to estimate their associations with phenotypes free of confounding by cell type proportions. Results: Simulations showed reliable performance of the method under various settings, including when the cell type is rare. Application to a real dataset recapitulated the directly measured cell-specific methylation pattern in whole blood. Conclusion: This work provides a framework for identifying important cell groups and accounting for cell type composition, which is useful for studying the role of epigenetic changes in human traits and diseases.

Edge sparse PCA based on gene network for correcting cell type heterogeneity in epigenome-wide association studies

2020. DOI: 10.21203/rs.3.rs-52780/v1

Author(s): Miao Rui, Dang Qi, Huang Hai Hui, Xia Liang Yong, Yong Liang

Keyword(s): Network Structure, Gene Network, Association Studies, Cell Counting, Cell Type, Methylation Site, Sparse PCA, False Discovery, Cell Type Composition, Type Composition

Abstract Background: In epigenome-wide association studies (EWAS), the mixed methylation signal produced by a combination of different cell types may lead researchers to report methylation sites that are falsely associated with the phenotype of interest. To address this problem, researchers have proposed reference-free methods based on sparse principal component analysis (PCA) to correct for false discoveries in EWAS. However, the existing model assumes that all methylation sites have the same a priori probability in each PC loading, although it is known that the genes corresponding to the methylation sites already have a network structure. In this paper, we show that the results of the existing EWAS correction model are still not good enough. Integrating the existing methylation network as prior knowledge into the sparse PCA model can effectively improve the correction ability of the existing model. Result: Based on these ideas, we propose GN-ReFAEWAS, a model that incorporates the prior methylation gene network structure into the PCA framework for feature extraction. This model can be used to correct for false discoveries in EWAS. The GN-ReFAEWAS model does not need cell count data and can estimate cell type composition from the methylation principal components. The key to this model is solving a sparse regularization problem on the methylation network, which this paper does using a regularization and random sampling algorithm. We ran experiments on one simulated data set and three real data sets and compared the model with four existing EWAS correction models. The experimental results show that the GN-ReFAEWAS model is superior to the existing models. Conclusion: The results show that the GN-ReFAEWAS model can provide a better estimate of cell-type composition and reduce false positives in EWAS.
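
To make the reference-free idea concrete, here is a minimal sketch of PCA-based adjustment on simulated data. It uses plain PCA, not sparse PCA and not the network prior of GN-ReFAEWAS, so it illustrates only the general principle that leading methylation principal components can stand in for unmeasured cell type composition. The function names, the simulated confounder, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def top_pc_scores(meth, k):
    """Return the first k principal component scores of a samples x sites matrix."""
    centered = meth - meth.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def phenotype_effect(site_values, phenotype, covariates=None):
    """OLS coefficient of the phenotype for one site, optionally adjusted for covariates."""
    columns = [np.ones(len(phenotype)), phenotype]
    if covariates is not None:
        columns.append(covariates)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, site_values, rcond=None)
    return coef[1]


# Simulated data only: one hidden cell type proportion drives methylation at
# every site and is correlated with the phenotype. The phenotype itself has
# no direct effect on methylation.
rng = np.random.default_rng(1)
n_samples, n_sites = 300, 500
cell_prop = rng.uniform(0.2, 0.8, size=n_samples)
phenotype = cell_prop + rng.normal(0, 0.1, size=n_samples)
loadings = rng.normal(0, 0.3, size=n_sites)
noise = rng.normal(0, 0.02, size=(n_samples, n_sites))
meth = 0.5 + np.outer(cell_prop - 0.5, loadings) + noise

# Test the site most strongly driven by the hidden cell type proportion.
site = int(np.argmax(np.abs(loadings)))
pcs = top_pc_scores(meth, k=1)
unadjusted = phenotype_effect(meth[:, site], phenotype)
adjusted = phenotype_effect(meth[:, site], phenotype, pcs)
print(f"unadjusted effect: {unadjusted:.3f}, PC-adjusted effect: {adjusted:.3f}")
```

In this simulation the unadjusted coefficient is a pure artefact of composition, and adding the first principal component as a covariate shrinks it towards zero. Real data are rarely this clean, which is the gap that sparse and network-informed variants aim to narrow.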

Correcting for cell-type heterogeneity in epigenome-wide association studies: premature analyses and conclusions

2017. DOI: 10.1101/121533

Author(s): Shijie C Zheng, Stephan Beck, Andrew E. Jaffe, Devin C. Koestler, Kasper D. Hansen, ...

Keyword(s): Statistical Analysis, State Of The Art, Association Studies, Free Cell, Deconvolution Method, Cell Type, Cell Type Composition, Type Composition, Rigorous Statistical Analysis, Constructive Criticism

Abstract Recently, a study by Rahmani et al [1] claimed that a reference-free cell-type deconvolution method, called ReFACTor, leads to improved power and improved estimates of cell-type composition compared to competing reference-free and reference-based methods in the context of Epigenome-Wide Association Studies (EWAS). However, we identified many critical flaws (both conceptual and statistical in nature), which seriously question the validity of their claims. We outlined constructive criticism in a recent correspondence letter, Zheng et al [2]. The purpose of this letter is two-fold. First, to present additional analyses, which demonstrate that our original criticism is statistically sound. Second, to highlight additional serious concerns, which Rahmani et al have not yet addressed. In summary, we find that ReFACTor has not been demonstrated to outperform state-of-the-art reference-free methods such as SVA or RefFreeEWAS, nor state-of-the-art reference-based methods. Thus, the claim by Rahmani et al (a claim reiterated in their recent response letter [3]) that ReFACTor represents an advance over the state-of-the-art is not supported by an objective and rigorous statistical analysis of the data.

EpiDISH web server: Epigenetic Dissection of Intra-Sample-Heterogeneity with online GUI

Bioinformatics, 2019. DOI: 10.1093/bioinformatics/btz833

Cited By ~ 2

Author(s): Shijie C Zheng, Charles E Breeze, Stephan Beck, Danyue Dong, Tianyu Zhu, ...

Keyword(s): Web Application, Association Studies, Web Server, Cell Types, R Package, Cell Type, Cell Type Composition, Type Composition, User Friendly, Sample Heterogeneity

Abstract Summary: It is well recognized that cell-type heterogeneity hampers the interpretation of Epigenome-Wide Association Studies (EWAS). Many tools have emerged to address this issue, including several R/Bioconductor packages that infer cell-type composition. Here we present a web application for cell-type deconvolution, which offers the functionality of our EpiDISH Bioconductor/R package in a user-friendly GUI environment. Users can upload their data to infer cell-type composition and differentially methylated cytosines in individual cell-types (DMCTs) for a range of different tissues. Availability and implementation: EpiDISH web server is implemented with Shiny in R, and is freely available at https://www.biosino.org/EpiDISH/.

Epigenome-wide association studies without the need for cell-type composition

Nature Methods, 2014, Vol 11 (3), pp. 309-311. DOI: 10.1038/nmeth.2815

Cited By ~ 150

Author(s): James Zou, Christoph Lippert, David Heckerman, Martin Aryee, Jennifer Listgarten

Keyword(s): Association Studies, Cell Type, Cell Type Composition, Type Composition

Faculty Opinions recommendation of Epigenome-wide association studies without the need for cell-type composition.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2014. DOI: 10.3410/f.718272350.793490839

Author(s): Stephan Beck

Keyword(s): Association Studies, Cell Type, Cell Type Composition, Type Composition

Distinguishing biological from technical sources of variation by leveraging multiple methylation datasets

2019. DOI: 10.1101/521146

Author(s): Mike Thompson, Zeyuan Johnson Chen, Elior Rahmani, Eran Halperin

Keyword(s): Biological Effects, Association Studies, Real Data, Relevant Information, R Package, Cell Type, Global Methylation, Cell Type Composition, Type Composition, Sources Of Variation

Abstract DNA methylation remains one of the most widely studied epigenetic markers. One of the major challenges in population studies of methylation is the presence of global methylation effects that may mask local signals. Such global effects may be due to either technical effects (e.g., batch effects) or biological effects (e.g., cell-type composition, genetics). Many methods have been developed for the detection of such global effects, typically in the context of epigenome-wide association studies. However, current unsupervised methods do not distinguish between biological and technical effects, resulting in a loss of highly relevant information. Though supervised methods can be used to estimate known biological effects, it remains difficult to identify and estimate unknown biological effects that globally affect the methylome. Here, we propose CONFINED, a reference-free method based on sparse canonical correlation analysis that captures replicable sources of variation (such as age, sex, and cell-type composition) across multiple methylation datasets and distinguishes them from dataset-specific sources of variability (e.g., technical effects). Consequently, we demonstrate through simulated and real data that by leveraging multiple datasets simultaneously, our approach captures several replicable sources of biological variation better than previous reference-free methods and is considerably more robust to technical noise than previous reference-free methods. CONFINED is available as an R package as detailed at https://github.com/cozygene/CONFINED.

Changes in cell type composition in the hepatopancreas of Chinese mitten crab Eriocheir sinensis during the molting cycle

Journal of Fishery Sciences of China, 2013, Vol 20 (6), pp. 1175-1181. DOI: 10.3724/sp.j.1118.2013.01175

Author(s): Zhihuan TIAN, Xianjiang KANG, Chuanzhen JIAO

Keyword(s): Eriocheir Sinensis, Cell Type, Chinese Mitten Crab, Molting Cycle, Cell Type Composition, Type Composition, Mitten Crab
