Accurate prediction of single-cell DNA methylation states using deep learning

2016 ◽  
Author(s):  
Christof Angermueller ◽  
Heather J. Lee ◽  
Wolf Reik ◽  
Oliver Stegle

Abstract Recent technological advances have enabled assaying DNA methylation at single-cell resolution. Current protocols are limited by incomplete CpG coverage and hence methods to predict missing methylation states are critical to enable genome-wide analyses. Here, we report DeepCpG, a computational approach based on deep neural networks to predict DNA methylation states from DNA sequence and incomplete methylation profiles in single cells. We evaluated DeepCpG on single-cell methylation data from five cell types generated using alternative sequencing protocols, finding that DeepCpG yields substantially more accurate predictions than previous methods. Additionally, we show that the parameters of our model can be interpreted, thereby providing insights into the effect of sequence composition on methylation variability.

Author(s):  
Dong-Sung Lee ◽  
Chongyuan Luo ◽  
Jingtian Zhou ◽  
Sahaana Chandran ◽  
Angeline Rivkin ◽  
...  

Abstract The ability to profile epigenomic features in single cells is facilitating the study of the variation in transcription regulation at the single-cell level. Single-cell methods have also facilitated the generation of cell-type-resolved transcriptomic and epigenetic profiles of lineages derived from complex heterogeneous samples. However, integrating different epigenetic features remains challenging, as many current methods profile a single data type at a time. Furthermore, some epigenetic features, such as 3D genome organization, are intrinsically variable between single cells of the same lineage, so it remains unclear how well these methods may resolve cell types from complex mixtures. Here we describe a method for profiling 3D genome organization and DNA methylation in single cells. This protocol accompanies Lee et al. (Nature Methods 2019) after peer review to aid potential users in applying the method to their own samples.


2017 ◽  
Vol 12 (3) ◽  
pp. 534-547 ◽  
Author(s):  
Stephen J Clark ◽  
Sébastien A Smallwood ◽  
Heather J Lee ◽  
Felix Krueger ◽  
Wolf Reik ◽  
...  

2021 ◽  
Author(s):  
Geert-Jan Huizing ◽  
Gabriel Peyré ◽  
Laura Cantini

Abstract The recent advent of high-throughput single-cell molecular profiling is revolutionizing biology and medicine by unveiling the diversity of cell types and states contributing to development and disease. The identification and characterization of cellular heterogeneity is typically achieved through unsupervised clustering, which crucially relies on a similarity metric. We here propose the use of Optimal Transport (OT) as a cell-cell similarity metric for single-cell omics data. OT defines distances to compare, in a geometrically faithful way, high-dimensional data represented as probability distributions. It is thus expected to better capture complex relationships between features and produce a performance improvement over state-of-the-art metrics. To speed up computations and cope with the high dimensionality of single-cell data, we consider the entropic regularization of the classical OT distance. We then extensively benchmark OT against state-of-the-art metrics over thirteen independent datasets, including simulated, scRNA-seq, scATAC-seq and single-cell DNA methylation data. First, we test the ability of the metrics to detect the similarity between cells belonging to the same groups (e.g. cell types, cell lines of origin). Then, we apply unsupervised clustering and test the quality of the resulting clusters. In our in-depth evaluation, OT is found to improve cell-cell similarity inference and cell clustering in all simulated and real scRNA-seq data, while its performance is comparable with Pearson correlation in scATAC-seq and single-cell DNA methylation data. All our analyses are reproducible through the OT-scOmics Jupyter notebook available at https://github.com/ComputationalSystemsBiology/OT-scOmics.
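
As a rough illustration of the idea in this abstract, the sketch below computes an entropy-regularized OT cost between cells using Sinkhorn scaling iterations. It is not the OT-scOmics implementation: the function names, the toy three-feature cells, the ground cost `|i - j| / 2` between features, and the values of `eps` and `n_iter` are all assumptions chosen for the example.

```python
import math

def normalize(counts):
    """Turn a cell's non-negative feature counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def sinkhorn_cost(a, b, cost, eps=0.1, n_iter=200):
    """Transport cost <P, C> of the entropy-regularized OT plan between a and b.

    a, b : probability vectors (each sums to 1)
    cost : ground cost matrix between features, cost[i][j] >= 0
    eps  : regularization strength (hypothetical default)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan's marginals match a and b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan entries are P_ij = u_i * K_ij * v_j
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

# Toy data: three features, cost grows with the distance between feature indices
cost = [[abs(i - j) / 2 for j in range(3)] for i in range(3)]
cell_a = normalize([8, 2, 0])
cell_b = normalize([7, 3, 0])   # similar profile to cell_a
cell_c = normalize([0, 2, 8])   # very different profile

d_ab = sinkhorn_cost(cell_a, cell_b, cost)
d_ac = sinkhorn_cost(cell_a, cell_c, cost)
print(d_ab < d_ac)
```

In this toy setting the similar pair (`cell_a`, `cell_b`) receives a smaller cost than the dissimilar pair, which is the property a clustering method would rely on.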


2018 ◽  
Author(s):  
Stephanie M. Linker ◽  
Lara Urban ◽  
Stephen Clark ◽  
Mariya Chhatriwala ◽  
Shradha Amatya ◽  
...  

Abstract Background: Alternative splicing is a key regulatory mechanism in eukaryotic cells and increases the effective number of functionally distinct gene products. Using bulk RNA sequencing, splicing variation has been studied across human tissues and in genetically diverse populations. This has identified disease-relevant splicing events, as well as associations between splicing and genomic variations, including sequence composition and conservation. However, variability in splicing between single cells from the same tissue or cell type and its determinants remain poorly understood. Results: We applied parallel DNA methylation and transcriptome sequencing to differentiating human induced pluripotent stem cells to characterize splicing variation (exon skipping) and its determinants. Our results show that variation in single-cell splicing can be accurately predicted based on local sequence composition and genomic features. We observe moderate but consistent contributions from local DNA methylation profiles to splicing variation across cells. A combined model that is built based on sequence as well as DNA methylation information accurately predicts different splicing modes of individual cassette exons (AUC=0.85). These categories include the conventional inclusion and exclusion patterns, but also more subtle modes of cell-to-cell variation in splicing. Finally, we identified and characterized associations between DNA methylation and splicing changes during cell differentiation. Conclusions: Our study yields new insights into alternative splicing at the single-cell level and reveals a previously underappreciated link between DNA methylation variation and splicing.


2020 ◽  
Vol 126 (9) ◽  
pp. 1112-1126 ◽  
Author(s):  
Jesse W. Williams ◽  
Holger Winkels ◽  
Christopher P. Durant ◽  
Konstantin Zaitsev ◽  
Yanal Ghosheh ◽  
...  

Technological advances in characterizing molecular heterogeneity at the single-cell level have ushered in a deeper understanding of the biological diversity of cells present in tissues, including atherosclerotic plaques. New subsets of cells have been discovered among cell types previously considered homogeneous. The commercial availability of systems to obtain transcriptomes and matching surface phenotypes from thousands of single cells is rapidly changing our understanding of cell types and lineage identity. Emerging methods to infer cellular functions are beginning to shed new light on the interplay of components involved in multifaceted disease responses, like atherosclerosis. Here, we provide a technical guide for the design, implementation, assembly, and interpretation of current single-cell transcriptomics approaches from the perspective of employing these tools for advancing cardiovascular disease research.


Author(s):  
Jingyi Jessica Li

Abstract Single-cell RNA sequencing (scRNA-seq) is a burgeoning field where experimental techniques and computational methods have been under rapid evolution in the past six years. These technological advances have allowed biomedical researchers to identify new cell types, delineate cell sub-populations, and infer cell differentiation trajectories in various tissue samples. Among the important features extractable from scRNA-seq data, the predominant ones are individual genes’ expression levels in single cells. Most analyses require a preprocessing step that converts a scRNA-seq dataset into a count matrix, where rows correspond to cells (or genes), columns correspond to genes (or cells), and each entry is a count, i.e. the number of sequenced reads or unique molecular identifiers (UMIs) mapped to a gene in a cell. Single-cell count matrices are highly sparse; for example, a typical matrix constructed from a droplet-based dataset may have >90% of counts as zeros.
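
The count matrix described in this abstract can be sketched in a few lines. The example below is only an illustration: the record format `(cell, gene, umi)`, the cell and gene names, and the helper functions are hypothetical, and real pipelines handle alignment, barcode correction, and UMI error correction that are ignored here.

```python
from collections import Counter

def count_matrix(records, cells, genes):
    """Build a cells x genes matrix whose entries count distinct UMIs."""
    # Reads sharing the same (cell, gene, UMI) are collapsed to one molecule
    unique_molecules = set(records)
    counts = Counter((cell, gene) for cell, gene, _umi in unique_molecules)
    # Counter returns 0 for (cell, gene) pairs that were never observed
    return [[counts[(c, g)] for g in genes] for c in cells]

def zero_fraction(matrix):
    """Fraction of matrix entries equal to zero (a simple sparsity measure)."""
    entries = [x for row in matrix for x in row]
    return sum(1 for x in entries if x == 0) / len(entries)

# Hypothetical mapped reads: (cell barcode, gene, UMI)
records = [
    ("cell1", "GeneA", "AAT"),
    ("cell1", "GeneA", "AAT"),  # duplicate read of the same molecule
    ("cell1", "GeneA", "CGT"),
    ("cell2", "GeneC", "TTG"),
]
cells = ["cell1", "cell2", "cell3"]
genes = ["GeneA", "GeneB", "GeneC"]

matrix = count_matrix(records, cells, genes)
sparsity = zero_fraction(matrix)
print(matrix)
print(sparsity)
```

Here rows are cells and columns are genes; the transposed convention mentioned in the abstract works equally well as long as it is applied consistently.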


2020 ◽  
Vol 6 (51) ◽  
pp. eaba9031
Author(s):  
Laiyi Fu ◽  
Lihua Zhang ◽  
Emmanuel Dollinger ◽  
Qinke Peng ◽  
Qing Nie ◽  
...  

Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by both studying sequence motifs enriched within predicted binding peaks and using predicted TFs for discovering cell types. We develop a new metric “TF activity score” to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.


2020 ◽  
Author(s):  
Laiyi Fu ◽  
Lihua Zhang ◽  
Emmanuel Dollinger ◽  
Qinke Peng ◽  
Qing Nie ◽  
...  

Abstract Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding many biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining binding profiles at a single-cell level remains elusive. Here we report scFAN (Single Cell Factor Analysis Network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pre-trained on genome-wide bulk ATAC-seq, DNA sequence and ChIP-seq data, and utilizes single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by studying sequence motifs enriched within predicted binding peaks and investigating the effectiveness of predicted TF peaks for discovering cell types. We develop a new metric “TF activity score” to characterize each cell, and show that the activity scores can reliably capture cell identities. The method allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hanyu Zhang ◽  
Ruoyi Cai ◽  
James Dai ◽  
Wei Sun

Abstract We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on the existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it accommodates a cell type with known proportions but an unknown reference and estimates its methylation. This is motivated by the case of studying methylation in tumor cells, where bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and the tumor cell proportion can be estimated from copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated by DNA methylation have expected associations with mutation load and survival time, while the estimates from gene expression miss such associations.
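
To make the reference-based setting concrete, the sketch below estimates the proportion of one cell type in a two-cell-type mixture by ordinary least squares. It is a deliberately simplified stand-in and not EMeth itself: it does not detect or down-weight aberrant CpGs, it handles only two cell types, and the reference methylation values and function name are made up for the example.

```python
def estimate_proportion(bulk, ref_a, ref_b):
    """Least-squares proportion of cell type A in a two-cell-type bulk sample.

    Model per CpG: bulk = p * ref_a + (1 - p) * ref_b, with 0 <= p <= 1.
    """
    # Rearranged model: (bulk - ref_b) = p * (ref_a - ref_b)
    num = sum((y - rb) * (ra - rb) for y, ra, rb in zip(bulk, ref_a, ref_b))
    den = sum((ra - rb) ** 2 for ra, rb in zip(ref_a, ref_b))
    # Clip to the valid range for a proportion
    return min(1.0, max(0.0, num / den))

# Hypothetical reference methylation levels (beta values) at four CpGs
ref_a = [0.9, 0.1, 0.8, 0.2]
ref_b = [0.1, 0.7, 0.3, 0.6]

# Simulate a noise-free bulk sample containing 30% of cell type A
true_p = 0.3
bulk = [true_p * ra + (1 - true_p) * rb for ra, rb in zip(ref_a, ref_b)]

p_hat = estimate_proportion(bulk, ref_a, ref_b)
print(round(p_hat, 6))
```

With noise-free data the estimate recovers the simulated proportion; CpGs that violate the mixing model would bias this plain least-squares fit, which is the problem the abstract says EMeth addresses.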


2010 ◽  
Vol 20 (12) ◽  
pp. 1719-1729 ◽  
Author(s):  
M. D. Robinson ◽  
C. Stirzaker ◽  
A. L. Statham ◽  
M. W. Coolen ◽  
J. Z. Song ◽  
...  
