scholarly journals A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies

2017 ◽  
Author(s):  
Andrew E Teschendorff ◽  
Charles E Breeze ◽  
Shijie C Zheng ◽  
Stephan Beck

AbstractBackgroundIntra-sample cellular heterogeneity presents numerous challenges to the identification of biomarkers in large Epigenome-Wide Association Studies (EWAS). While a number of reference-based deconvolution algorithms have emerged, their potential remains underexplored and a comparative evaluation of these algorithms beyond tissues such as blood is still lacking.ResultsHere we present a novel framework for reference-based inference, which leverages cell-type specific DNAse Hypersensitive Site (DHS) information from the NIH Epigenomics Roadmap to construct an improved reference DNA methylation database. We show that this leads to a marginal but statistically significant improvement of cell-count estimates in whole blood as well as in mixtures involving epithelial cell-types. Using this framework we compare a widely used state-of-the-art reference-based algorithm (called constrained projection) to two non-constrained approaches including CIBERSORT and a method based on robust partial correlations. We conclude that the widely-used constrained projection technique may not always be optimal. Instead, we find that the method based on robust partial correlations is generally more robust across a range of different tissue types and for realistic noise levels. We call the combined algorithm which uses DHS data and robust partial correlations for inference, EpiDISH (Epigenetic Dissection of Intra-Sample Heterogeneity). Finally, we demonstrate the added value of EpiDISH in an EWAS of smoking.ConclusionsEstimating cell-type fractions and subsequent inference in EWAS may benefit from the use of non-constrained reference-based cell-type deconvolution methods.

2020 ◽  
Vol 29 (11) ◽  
pp. 1922-1932
Author(s):  
Priyanka Nandakumar ◽  
Dongwon Lee ◽  
Thomas J Hoffmann ◽  
Georg B Ehret ◽  
Dan Arking ◽  
...  

Abstract Hundreds of loci have been associated with blood pressure (BP) traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ~100 000 Genetic Epidemiology Research on Aging study participants. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. We aggregate variants within these putative CREs within 50 Kb of the start or end of ‘expressed’ genes in these tissues or cell types using public expression data and use deltaSVM scores as weights in the group-wise sequence kernel association test to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study, as a positive control, and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.


2018 ◽  
Author(s):  
Xuran Wang ◽  
Jihwan Park ◽  
Katalin Susztak ◽  
Nancy R. Zhang ◽  
Mingyao Li

AbstractWe present MuSiC, a method that utilizes cell-type specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. When applied to pancreatic islet and whole kidney expression data in human, mouse, and rats, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables characterization of cellular heterogeneity of complex tissues for identification of disease mechanisms.


2021 ◽  
Author(s):  
Rujin Wang ◽  
Danyu Lin ◽  
Yuchao Jiang

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.


2018 ◽  
Author(s):  
Xiangyu Luo ◽  
Can Yang ◽  
Yingying Wei

In epigenome-wide association studies, the measured signals for each sample are a mixture of methylation profiles from different cell types. The current approaches to the association detection only claim whether a cytosine-phosphate-guanine (CpG) site is associated with the phenotype or not, but they cannot determine the cell type in which the risk-CpG site is affected by the phenotype. Here, we propose a solid statistical method, HIgh REsolution (HIRE), which not only substantially improves the power of association detection at the aggregated level as compared to the existing methods but also enables the detection of risk-CpG sites for individual cell types.


2020 ◽  
Author(s):  
Alexandre P. Marand ◽  
Zongliang Chen ◽  
Andrea Gallavotti ◽  
Robert J. Schmitz

ABSTRACTCis-regulatory elements (CREs) encode the genomic blueprints for coordinating spatiotemporal gene expression programs underlying highly specialized cell functions. To identify CREs underlying cell-type specification and developmental transitions, we implemented single-cell sequencing of Assay for Transposase Accessible Chromatin in an atlas of Zea mays organs. We describe 92 distinct states of chromatin accessibility across more than 165,913 putative CREs, 56,575 cells, and 52 known cell-types in maize using a novel implementation of regularized quasibinomial logistic regression. Cell states were largely determined by combinatorial accessibility of transcription factors (TFs) and their binding sites. A neural network revealed that cell identity could be accurately predicted (>0.94) solely based on TF binding site accessibility. Co-accessible chromatin recapitulated higher-order chromatin interactions, with distinct sets of TFs coordinating cell type-specific regulatory dynamics. Pseudotime reconstruction and alignment with Arabidopsis thaliana trajectories identified conserved TFs, associated motifs, and cis-regulatory regions specifying sequential developmental progressions. Cell-type specific accessible chromatin regions were enriched with phenotype-associated genetic variants and signatures of selection, revealing the major cell-types and putative CREs targeted by modern maize breeding. Collectively, our analysis affords a comprehensive framework for understanding cellular heterogeneity, evolution, and cis-regulatory grammar of cell-type specification in a major crop species.


2017 ◽  
Author(s):  
Sebastian Preissl ◽  
Rongxin Fang ◽  
Yuan Zhao ◽  
Ramya Raviram ◽  
Yanxiao Zhang ◽  
...  

ABSTRACTGenome-wide analysis of chromatin accessibility in primary tissues has uncovered millions of candidate regulatory sequences in the human and mouse genomes1–4. However, the heterogeneity of biological samples used in previous studies has prevented a precise understanding of the dynamic chromatin landscape in specific cell types. Here, we show that analysis of the transposase-accessible-chromatin in single nuclei isolated from frozen tissue samples can resolve cellular heterogeneity and delineate transcriptional regulatory sequences in the constituent cell types. Our strategy is based on a combinatorial barcoding assisted single cell assay for transposase-accessible chromatin5 and is optimized for nuclei from flash-frozen primary tissue samples (snATAC-seq). We used this method to examine the mouse forebrain at seven development stages and in adults. From snATAC-seq profiles of more than 15,000 high quality nuclei, we identify 20 distinct cell populations corresponding to major neuronal and non-neuronal cell-types in foetal and adult forebrains. We further define cell-type specific cis regulatory sequences and infer potential master transcriptional regulators of each cell population. Our results demonstrate the feasibility of a general approach for identifying cell-type-specific cis regulatory sequences in heterogeneous tissue samples, and provide a rich resource for understanding forebrain development in mammals.


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Liduo Yin ◽  
Yanting Luo ◽  
Xiguang Xu ◽  
Shiyu Wen ◽  
Xiaowei Wu ◽  
...  

Abstract Background Numerous cell types can be identified within plant tissues and animal organs, and the epigenetic modifications underlying such enormous cellular heterogeneity are just beginning to be understood. It remains a challenge to infer cellular composition using DNA methylomes generated for mixed cell populations. Here, we propose a semi-reference-free procedure to perform virtual methylome dissection using the nonnegative matrix factorization (NMF) algorithm. Results In the pipeline that we implemented to predict cell-subtype percentages, putative cell-type-specific methylated (pCSM) loci were first determined according to their DNA methylation patterns in bulk methylomes and clustered into groups based on their correlations in methylation profiles. A representative set of pCSM loci was then chosen to decompose target methylomes into multiple latent DNA methylation components (LMCs). To test the performance of this pipeline, we made use of single-cell brain methylomes to create synthetic methylomes of known cell composition. Compared with highly variable CpG sites, pCSM loci achieved a higher prediction accuracy in the virtual methylome dissection of synthetic methylomes. In addition, pCSM loci were shown to be good predictors of the cell type of the sorted brain cells. The software package developed in this study is available in the GitHub repository (https://github.com/Gavin-Yinld). Conclusions We anticipate that the pipeline implemented in this study will be an innovative and valuable tool for the decoding of cellular heterogeneity.


2021 ◽  
Author(s):  
Dongqing Sun ◽  
Yihan Xiao ◽  
Zhaoyang Liu ◽  
Taiwen Li ◽  
Qiu Wu ◽  
...  

AbstractThe recent advances in spatial transcriptomics have brought unprecedented opportunities to understand the cellular heterogeneity in the spatial context. However, the current limitations of spatial technologies hamper the exploration of cellular localizations and interactions at single-cell level. Here, we present spatial transcriptomics deconvolution by topic modeling (STRIDE), a computational method to decompose cell-types from spatial mixtures by leveraging topic profiles trained from single-cell transcriptomics. STRIDE accurately estimated the cell-type proportions and showed balanced specificity and sensitivity compared to existing methods. We demonstrate STRIDE’s utility by applying it to different spatial platforms and biological systems. Deconvolution by STRIDE not only mapped rare cell-types to spatial locations but also improved the identification of spatial localized genes and domains. Moreover, topics discovered by STRIDE were associated with cell-type-specific functions, and could be further used to integrate successive sections and reconstruct the three-dimensional architecture of tissues. Taken together, STRIDE is a versatile and extensible tool for integrated analysis of spatial and single-cell transcriptomics and is publicly available at https://github.com/DongqingSun96/STRIDE.


2020 ◽  
Vol 52 (11) ◽  
pp. 1798-1808
Author(s):  
Junha Cha ◽  
Insuk Lee

AbstractUnderstanding cellular heterogeneity is the holy grail of biology and medicine. Cells harboring identical genomes show a wide variety of behaviors in multicellular organisms. Genetic circuits underlying cell-type identities will facilitate the understanding of the regulatory programs for differentiation and maintenance of distinct cellular states. Such a cell-type-specific gene network can be inferred from coregulatory patterns across individual cells. Conventional methods of transcriptome profiling using tissue samples provide only average signals of diverse cell types. Therefore, reconstructing gene regulatory networks for a particular cell type is not feasible with tissue-based transcriptome data. Recently, single-cell omics technology has emerged and enabled the capture of the transcriptomic landscape of every individual cell. Although single-cell gene expression studies have already opened up new avenues, network biology using single-cell transcriptome data will further accelerate our understanding of cellular heterogeneity. In this review, we provide an overview of single-cell network biology and summarize recent progress in method development for network inference from single-cell RNA sequencing (scRNA-seq) data. Then, we describe how cell-type-specific gene networks can be utilized to study regulatory programs specific to disease-associated cell types and cellular states. Moreover, with scRNA data, modeling personal or patient-specific gene networks is feasible. Therefore, we also introduce potential applications of single-cell network biology for precision medicine. We envision a rapid paradigm shift toward single-cell network analysis for systems biology in the near future.


Author(s):  
Shijie C Zheng ◽  
Charles E Breeze ◽  
Stephan Beck ◽  
Danyue Dong ◽  
Tianyu Zhu ◽  
...  

Abstract Summary It is well recognized that cell-type heterogeneity hampers the interpretation of Epigenome-Wide Association Studies (EWAS). Many tools have emerged to address this issue, including several R/Bioconductor packages that infer cell-type composition. Here we present a web application for cell-type deconvolution, which offers the functionality of our EpiDISH Bioconductor/R package in a user-friendly GUI environment. Users can upload their data to infer cell-type composition and differentially methylated cytosines in individual cell-types (DMCTs) for a range of different tissues. Availability and implementation EpiDISH web server is implemented with Shiny in R, and is freely available at https://www.biosino.org/EpiDISH/.


Sign in / Sign up

Export Citation Format

Share Document