Distinguishing biological from technical sources of variation by leveraging multiple methylation datasets

2019 ◽  
Author(s):  
Mike Thompson ◽  
Zeyuan Johnson Chen ◽  
Elior Rahmani ◽  
Eran Halperin

Abstract DNA methylation remains one of the most widely studied epigenetic markers. One of the major challenges in population studies of methylation is the presence of global methylation effects that may mask local signals. Such global effects may be due to either technical effects (e.g., batch effects) or biological effects (e.g., cell-type composition, genetics). Many methods have been developed for the detection of such global effects, typically in the context of epigenome-wide association studies. However, current unsupervised methods do not distinguish between biological and technical effects, resulting in a loss of highly relevant information. Though supervised methods can be used to estimate known biological effects, it remains difficult to identify and estimate unknown biological effects that globally affect the methylome. Here, we propose CONFINED, a reference-free method based on sparse canonical correlation analysis that captures replicable sources of variation (such as age, sex, and cell-type composition) across multiple methylation datasets and distinguishes them from dataset-specific sources of variability (e.g., technical effects). We demonstrate through simulated and real data that, by leveraging multiple datasets simultaneously, our approach captures several replicable sources of biological variation better than previous reference-free methods and is considerably more robust to technical noise. CONFINED is available as an R package as detailed at https://github.com/cozygene/CONFINED.

Author(s):  
Shijie C Zheng ◽  
Charles E Breeze ◽  
Stephan Beck ◽  
Danyue Dong ◽  
Tianyu Zhu ◽  
...  

Abstract Summary: It is well recognized that cell-type heterogeneity hampers the interpretation of Epigenome-Wide Association Studies (EWAS). Many tools have emerged to address this issue, including several R/Bioconductor packages that infer cell-type composition. Here we present a web application for cell-type deconvolution, which offers the functionality of our EpiDISH Bioconductor/R package in a user-friendly GUI environment. Users can upload their data to infer cell-type composition and differentially methylated cytosines in individual cell-types (DMCTs) for a range of different tissues. Availability and implementation: The EpiDISH web server is implemented with Shiny in R, and is freely available at https://www.biosino.org/EpiDISH/.


2021 ◽  
Vol 12 ◽  
Author(s):  
Shivanthan Shanthikumar ◽  
Melanie R. Neeland ◽  
Richard Saffery ◽  
Sarath C. Ranganathan ◽  
Alicia Oshlack ◽  
...  

In epigenome-wide association studies analysing DNA methylation from samples containing multiple cell types, it is essential to adjust the analysis for cell type composition. One well established strategy for achieving this is reference-based cell type deconvolution, which relies on knowledge of the DNA methylation profiles of purified constituent cell types. These are then used to estimate the cell type proportions of each sample, which can then be incorporated to adjust the association analysis. Bronchoalveolar lavage is commonly used to sample the lung in clinical practice and contains a mixture of different cell types that can vary in proportion across samples, affecting the overall methylation profile. A current barrier to the use of bronchoalveolar lavage in DNA methylation-based research is the lack of reference DNA methylation profiles for each of the constituent cell types, which makes reference-based cell composition estimation difficult. Herein, we use bronchoalveolar lavage samples collected from children with cystic fibrosis to define DNA methylation profiles for the four most common and clinically relevant cell types: alveolar macrophages, granulocytes, lymphocytes and alveolar epithelial cells. We then demonstrate the use of these methylation profiles in conjunction with an established reference-based methylation deconvolution method to estimate the cell type composition of two different tissue types: a publicly available dataset derived from artificial blood-based cell mixtures and further bronchoalveolar lavage samples. The reference DNA methylation profiles developed in this work can be used for future reference-based cell type composition estimation of bronchoalveolar lavage. This will facilitate the use of this tissue in studies examining the role of DNA methylation in lung health and disease.
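
The abstract above does not name the deconvolution method it uses. As a minimal sketch of the general idea behind reference-based deconvolution, the bulk methylation profile of a sample can be modelled as a weighted average of purified cell-type profiles, with weights that are non-negative and sum to one. The function names (`project_to_simplex`, `estimate_proportions`), the toy reference values and the choice of projected gradient descent are all assumptions made for illustration, not the authors' implementation.

```python
def project_to_simplex(values):
    """Euclidean projection onto {w : w_i >= 0, sum(w) == 1}."""
    ordered = sorted(values, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / count
        if value - candidate > 0:
            theta = candidate  # keep the threshold of the last valid prefix
    return [max(v - theta, 0.0) for v in values]


def estimate_proportions(reference, mixture, steps=2000):
    """Estimate cell-type proportions by constrained least squares.

    reference: one row per CpG site, one column per cell type (beta values).
    mixture:   bulk beta value per CpG site for a single sample.
    Minimises 0.5 * ||reference @ w - mixture||^2 over the simplex
    with projected gradient descent (hypothetical helper, illustration only).
    """
    n_types = len(reference[0])
    weights = [1.0 / n_types] * n_types
    # The squared Frobenius norm bounds the largest eigenvalue of R^T R,
    # so its reciprocal is a safe step size.
    step = 1.0 / sum(x * x for row in reference for x in row)
    for _ in range(steps):
        residual = [
            sum(r * w for r, w in zip(row, weights)) - m
            for row, m in zip(reference, mixture)
        ]
        gradient = [
            sum(row[j] * res for row, res in zip(reference, residual))
            for j in range(n_types)
        ]
        weights = project_to_simplex(
            [w - step * g for w, g in zip(weights, gradient)]
        )
    return weights


# Toy reference: 6 CpG sites x 3 cell types (made-up beta values).
TOY_REFERENCE = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.1],
    [0.2, 0.8, 0.2],
    [0.1, 0.2, 0.8],
]
```

Given a bulk profile measured at the same CpG sites, `estimate_proportions(TOY_REFERENCE, bulk_profile)` returns one estimated proportion per reference cell type. Published tools typically add marker selection and more robust fitting, which this sketch omits.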


Author(s):  
Yun Zhang ◽  
Jonavelle Cuerdo ◽  
Marc K Halushka ◽  
Matthew N McCall

Abstract Variable cellular composition of tissue samples represents a significant challenge for the interpretation of genomic profiling studies. Substantial effort has been devoted to modeling and adjusting for compositional differences when estimating differential expression between sample types. However, relatively little attention has been given to the effect of tissue composition on co-expression estimates. In this study, we illustrate the effect of variable cell-type composition on correlation-based network estimation and provide a mathematical decomposition of the tissue-level correlation. We show that a class of deconvolution methods developed to separate tumor and stromal signatures can be applied to two component cell-type mixtures. In simulated and real data, we identify conditions in which a deconvolution approach would be beneficial. Our results suggest that uncorrelated cell-type-specific markers are ideally suited to deconvolute both the expression and co-expression patterns of an individual cell type. We provide a Shiny application for users to interactively explore the effect of cell-type composition on correlation-based co-expression estimation for any cell types of interest.
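
The paper's own decomposition is not reproduced in this abstract. As a rough sketch of why composition affects tissue-level correlation, assume (these are illustrative assumptions, not the authors' model) a two-component mixture in which the tissue-level expression of genes $X$ and $Y$ is a weighted sum of their expression in cell types 1 and 2, that the proportion $p$ of cell type 1 is independent of the cell-type-specific expression, and that expression in the two cell types is mutually independent. Writing $\mu_{X_k}$ and $\mu_{Y_k}$ for the mean expression in cell type $k$, the law of total covariance gives:

```latex
\begin{aligned}
X &= p\,X_1 + (1-p)\,X_2, \qquad Y = p\,Y_1 + (1-p)\,Y_2,\\
\operatorname{Cov}(X,Y)
  &= \mathbb{E}\!\left[\operatorname{Cov}(X,Y\mid p)\right]
   + \operatorname{Cov}\!\left(\mathbb{E}[X\mid p],\,\mathbb{E}[Y\mid p]\right)\\
  &= \mathbb{E}[p^2]\,\operatorname{Cov}(X_1,Y_1)
   + \mathbb{E}[(1-p)^2]\,\operatorname{Cov}(X_2,Y_2)
   + \operatorname{Var}(p)\,(\mu_{X_1}-\mu_{X_2})(\mu_{Y_1}-\mu_{Y_2}).
\end{aligned}
```

Under these assumptions the last term is nonzero whenever composition varies across samples and both genes differ in mean expression between the two cell types, so two genes can appear co-expressed at the tissue level even if they are uncorrelated within each cell type.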


Epigenomics ◽  
2020 ◽  
Author(s):  
Yen-Chen A Feng ◽  
Yichen Guo ◽  
Lucile Pain ◽  
G Mark Lathrop ◽  
Catherine Laprise ◽  
...  

Aim: To develop a method for estimating cell-specific effects in epigenomic association studies in the presence of cell type heterogeneity. Materials & methods: We used a Monte Carlo Expectation-Maximization (MCEM) algorithm with a Metropolis–Hastings sampler to reconstruct the ‘missing’ cell-specific methylation values and to estimate their associations with phenotypes free of confounding by cell type proportions. Results: Simulations showed reliable performance of the method under various settings, including when the cell type is rare. Application to a real dataset recapitulated the directly measured cell-specific methylation pattern in whole blood. Conclusion: This work provides a framework to identify important cell groups and account for cell type composition, which is useful for studying the role of epigenetic changes in human traits and diseases.


2021 ◽  
Author(s):  
Belinda Phipson ◽  
Choon Boon Sim ◽  
Enzo R. Porrello ◽  
Alex W Hewitt ◽  
Joseph Powell ◽  
...  

Single cell RNA sequencing (scRNA-seq) has rapidly gained popularity over the last few years for profiling the transcriptomes of thousands to millions of single cells. To date, more than a thousand software packages have been developed to analyse scRNA-seq data. These focus predominantly on visualization, dimensionality reduction and cell type identification. Single cell technology is now being used to analyse experiments with complex designs, including biological replication. One question that can be asked of single cell experiments, and that has not been possible to address with bulk RNA-seq data, is whether cell type proportions differ between two or more experimental conditions. As well as gene expression changes, the relative depletion or enrichment of a particular cell type can be the functional consequence of disease or treatment. However, cell type proportion estimates from scRNA-seq data are variable, and statistical methods that correctly account for different sources of variability are needed to confidently identify statistically significant shifts in cell type composition between experimental conditions. We present propeller, a robust and flexible method that leverages biological replication to find statistically significant differences in cell type proportions between groups. The propeller method is publicly available in the open source speckle R package (https://github.com/Oshlack/speckle).
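
The abstract does not describe propeller's test in detail, so the following is only a simplified sketch of the underlying idea: compute cell type proportions per biological replicate, then compare them between groups using the replicate-level variability. The arcsine square-root transform and the plain Welch t-statistic are choices assumed here for illustration, and the function names are hypothetical; this is not the speckle implementation.

```python
import math
from statistics import mean, variance


def cell_type_proportions(counts):
    """Convert per-sample cell counts {cell_type: n_cells} to proportions."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def arcsin_sqrt(proportion):
    """Variance-stabilising transform for proportions (assumed choice)."""
    return math.asin(math.sqrt(proportion))


def welch_t(group_a, group_b):
    """Welch t-statistic for two groups of replicate-level values."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def proportion_shift_statistic(samples_a, samples_b, cell_type):
    """t-statistic for a shift in one cell type's proportion between groups.

    samples_a, samples_b: lists of per-replicate count dictionaries.
    Positive values mean the cell type is more abundant in group A.
    """
    a = [arcsin_sqrt(cell_type_proportions(s)[cell_type]) for s in samples_a]
    b = [arcsin_sqrt(cell_type_proportions(s)[cell_type]) for s in samples_b]
    return welch_t(a, b)
```

Each group needs at least two replicates for the variance to be defined, which reflects the abstract's point that biological replication is what makes such a test possible.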


2020 ◽  
Author(s):  
Miao Rui ◽  
Dang Qi ◽  
Huang Hai Hui ◽  
Xia Liang Yong ◽  
Yong Liang

Abstract Background: In epigenome-wide association studies (EWAS), mixed methylation signals arising from the combination of different cell types may lead researchers to identify methylation sites that are falsely associated with the phenotype of interest. To address this problem, researchers have proposed reference-free methods based on sparse principal component analysis (PCA) to correct false discoveries in EWAS. However, the existing model assumes that all methylation sites have the same a priori probability in each PC loading, although a network structure is already known to exist among the genes corresponding to the methylation sites. In this paper, we show that the results of the existing EWAS correction model are still not good enough. If the existing methylation network is integrated as prior knowledge into the sparse PCA model, the correction ability of the existing model can be effectively improved. Result: Based on these ideas, we propose GN-ReFAEWAS, a model that incorporates the prior methylation gene network structure into the PCA framework for feature extraction. This model can be used to correct false discoveries in EWAS. The GN-ReFAEWAS model does not need cell count data and can estimate cell-type composition from methylation principal components. The key to this model is solving a sparse regularization problem on the methylation network, which this paper does using regularization and a random sampling algorithm. We used one simulated dataset and three real datasets for experiments and compared against four existing EWAS correction models. The experimental results show that the GN-ReFAEWAS model is superior to the existing models. Conclusion: The results show that the GN-ReFAEWAS model can provide a better estimate of cell-type composition and reduce false positives in EWAS.


2017 ◽  
Author(s):  
Shijie C Zheng ◽  
Stephan Beck ◽  
Andrew E. Jaffe ◽  
Devin C. Koestler ◽  
Kasper D. Hansen ◽  
...  

Abstract Recently, a study by Rahmani et al [1] claimed that a reference-free cell-type deconvolution method, called ReFACTor, leads to improved power and improved estimates of cell-type composition compared to competing reference-free and reference-based methods in the context of Epigenome-Wide Association Studies (EWAS). However, we identified many critical flaws (both conceptual and statistical in nature), which seriously question the validity of their claims. We outlined constructive criticism in a recent correspondence letter, Zheng et al [2]. The purpose of this letter is two-fold. First, to present additional analyses, which demonstrate that our original criticism is statistically sound. Second, to highlight additional serious concerns, which Rahmani et al have not yet addressed. In summary, we find that ReFACTor has not been demonstrated to outperform state-of-the-art reference-free methods such as SVA or RefFreeEWAS, nor state-of-the-art reference-based methods. Thus, the claim by Rahmani et al (a claim reiterated in their recent response letter [3]) that ReFACTor represents an advance over the state-of-the-art is not supported by an objective and rigorous statistical analysis of the data.


2017 ◽  
Author(s):  
Gabriel E. Hoffman ◽  
Brigham J. Hartley ◽  
Erin Flaherty ◽  
Ian Ladran ◽  
Peter Gochman ◽  
...  

Abstract Whereas highly penetrant variants have proven well-suited to human induced pluripotent stem cell (hiPSC)-based models, the power of hiPSC-based studies to resolve the much smaller effects of common variants within the size of cohorts that can be realistically assembled remains uncertain. In developing a large case/control schizophrenia (SZ) hiPSC-derived cohort of neural progenitor cells and neurons, we identified and accounted for a variety of technical and biological sources of variation. Reducing the stochastic effects of the differentiation process by correcting for cell type composition boosted the SZ signal in hiPSC-based models and increased the concordance with post mortem datasets. Because this concordance was strongest in hiPSC-neurons, it suggests that this cell type may better model genetic risk for SZ. We predict a growing convergence between hiPSC and post mortem studies as both approaches expand to larger cohort sizes. For studies of complex genetic disorders, to maximize the power of hiPSC cohorts currently feasible, in most cases and whenever possible, we recommend expanding the number of individuals even at the expense of the number of replicate hiPSC clones.

