Accurate prediction of single-cell DNA methylation states using deep learning

bioRxiv · 10.1101/055715 · 2016 · Cited by ~8

Author(s): Christof Angermueller, Heather J. Lee, Wolf Reik, Oliver Stegle

Keyword(s): DNA Methylation, Single Cell, Deep Neural Networks, Single Cells, Cell Types, Computational Approach, Methylation Data, Sequence Composition, Technological Advances, Genome Wide

Abstract: Recent technological advances have enabled assaying DNA methylation at single-cell resolution. Current protocols are limited by incomplete CpG coverage and hence methods to predict missing methylation states are critical to enable genome-wide analyses. Here, we report DeepCpG, a computational approach based on deep neural networks to predict DNA methylation states from DNA sequence and incomplete methylation profiles in single cells. We evaluated DeepCpG on single-cell methylation data from five cell types generated using alternative sequencing protocols, finding that DeepCpG yields substantially more accurate predictions than previous methods. Additionally, we show that the parameters of our model can be interpreted, thereby providing insights into the effect of sequence composition on methylation variability.
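
The abstract describes predicting missing CpG states from two inputs: the DNA sequence around a CpG and the cell's incomplete methylation profile. The Python sketch below shows one plausible way to assemble those inputs for a single target CpG; it is not DeepCpG's actual preprocessing. The window half-width, the number of neighbours `k`, the A/C/G/T one-hot order, the `MISSING` padding sentinel and the function names are all illustrative assumptions, and no neural network is included.

```python
from typing import Dict, List, Tuple

# Assumed one-hot column order; any other symbol (e.g. 'N' or padding) becomes all zeros.
ALPHABET = "ACGT"
# Hypothetical sentinel meaning "no observed neighbour in this slot".
MISSING = (-1, -1)


def one_hot_window(seq: str, center: int, half_width: int) -> List[List[int]]:
    """One-hot encode the 2*half_width + 1 bases centred on `center`, zero-padding past the sequence ends."""
    rows = []
    for pos in range(center - half_width, center + half_width + 1):
        base = seq[pos].upper() if 0 <= pos < len(seq) else "N"
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def neighbour_features(observed: Dict[int, int], target: int, k: int) -> List[Tuple[int, int]]:
    """(state, distance) of the k nearest observed CpGs on each side of `target`.

    `observed` maps genomic position -> methylation state (0 or 1) for CpGs covered in
    this cell; the target itself is excluded even if it is observed. Output order is
    upstream farthest-to-nearest, then downstream nearest-to-farthest, with each side
    padded to length k using MISSING.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    upstream = sorted(p for p in observed if p < target)[-k:]   # k nearest, ascending
    downstream = sorted(p for p in observed if p > target)[:k]  # k nearest, ascending
    up = [(observed[p], target - p) for p in upstream]
    down = [(observed[p], p - target) for p in downstream]
    return [MISSING] * (k - len(up)) + up + down + [MISSING] * (k - len(down))
```

For example, `neighbour_features({100: 1, 150: 0, 250: 1}, 200, 2)` returns `[(1, 100), (0, 50), (1, 50), (-1, -1)]`: two upstream neighbours, one downstream neighbour and one padded slot. Keeping distances next to states would let a downstream model weight nearby CpGs differently from distant ones.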

Single-cell multi-omic profiling of chromatin conformation and DNA methylation

10.21203/rs.2.11454/v1 · 2019 · Cited by ~1

Author(s): Dong-Sung Lee, Chongyuan Luo, Jingtian Zhou, Sahaana Chandran, Angeline Rivkin, ...

Keyword(s): DNA Methylation, Single Cell, Genome Organization, Single Cells, Cell Types, Cell Type, Cell Level, 3D Genome, Heterogeneous Samples, Single Data

Abstract: The ability to profile epigenomic features in single cells is facilitating the study of variation in transcriptional regulation at the single-cell level. Single-cell methods have also facilitated the generation of cell-type-resolved transcriptomic and epigenetic profiles of lineages derived from complex heterogeneous samples. However, integrating different epigenetic features remains challenging, as many current methods profile a single data type at a time. Furthermore, some epigenetic features, such as 3D genome organization, are intrinsically variable between single cells of the same lineage, so it remains unclear how well these methods can resolve cell types from complex mixtures. Here we describe a method for profiling 3D genome organization and DNA methylation in single cells. This protocol accompanies Lee et al. (Nature Methods 2019) and is shared after peer review to help potential users apply the method to their own samples.

Genome-wide base-resolution mapping of DNA methylation in single cells using single-cell bisulfite sequencing (scBS-seq)

Nature Protocols · 10.1038/nprot.2016.187 · 2017 · Vol 12 (3) · pp. 534-547 · Cited by ~86

Author(s): Stephen J Clark, Sébastien A Smallwood, Heather J Lee, Felix Krueger, Wolf Reik, ...

Keyword(s): DNA Methylation, Single Cell, Bisulfite Sequencing, Single Cells, Genome Wide, Resolution Mapping

Optimal Transport improves cell-cell similarity inference in single-cell omics data

10.1101/2021.03.19.436159 · 2021

Author(s): Geert-Jan Huizing, Gabriel Peyré, Laura Cantini

Keyword(s): DNA Methylation, Single Cell, Optimal Transport, State Of The Art, Cell Types, Unsupervised Clustering, Methylation Data, Omics Data, Similarity Metric, Cell Cell

Abstract: The recent advent of high-throughput single-cell molecular profiling is revolutionizing biology and medicine by unveiling the diversity of cell types and states contributing to development and disease. The identification and characterization of cellular heterogeneity is typically achieved through unsupervised clustering, which crucially relies on a similarity metric. We here propose the use of Optimal Transport (OT) as a cell-cell similarity metric for single-cell omics data. OT defines distances to compare, in a geometrically faithful way, high-dimensional data represented as probability distributions. It is thus expected to better capture complex relationships between features and to improve performance over state-of-the-art metrics. To speed up computations and cope with the high dimensionality of single-cell data, we consider the entropic regularization of the classical OT distance. We then extensively benchmark OT against state-of-the-art metrics on thirteen independent datasets, including simulated, scRNA-seq, scATAC-seq and single-cell DNA methylation data. First, we test the ability of the metrics to detect the similarity between cells belonging to the same groups (e.g. cell types, cell lines of origin). Then, we apply unsupervised clustering and test the quality of the resulting clusters. In our in-depth evaluation, OT is found to improve cell-cell similarity inference and cell clustering in all simulated and real scRNA-seq data, while its performance is comparable to Pearson correlation on scATAC-seq and single-cell DNA methylation data. All our analyses are reproducible through the OT-scOmics Jupyter notebook available at https://github.com/ComputationalSystemsBiology/OT-scOmics.
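
The entropic regularization mentioned above is usually computed with Sinkhorn's matrix-scaling iterations. The sketch below compares two cells, each normalized into a probability distribution over features, given a feature-to-feature ground cost. The toy cost matrices, `eps`, the iteration limit and the tolerance are assumptions; the function reports the transport cost of the regularized plan rather than the full regularized objective, and it is not taken from the OT-scOmics notebook.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalize(profile: Sequence[float]) -> List[float]:
    """Rescale a non-negative feature profile (e.g. counts) so that it sums to 1."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must have positive total mass")
    return [x / total for x in profile]


def sinkhorn_distance(
    a: Sequence[float],
    b: Sequence[float],
    cost: Matrix,
    eps: float = 0.1,
    n_iter: int = 1000,
    tol: float = 1e-9,
) -> Tuple[float, Matrix]:
    """Entropic OT between histograms a and b; returns (<P, C>, P)."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps); smaller eps is closer to unregularized OT.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan's marginals match a and b.
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [a[i] / Kv[i] for i in range(n)]
        KTu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v_new = [b[j] / KTu[j] for j in range(m)]
        change = max(abs(x - y) for x, y in zip(v, v_new))
        v = v_new
        if change < tol:
            break
    # Regularized transport plan P = diag(u) K diag(v).
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return transport_cost, plan
```

A call such as `sinkhorn_distance(normalize(cell_x), normalize(cell_y), cost)` yields a dissimilarity that could feed a clustering step. Lowering `eps` brings the result closer to unregularized OT but needs more iterations and risks underflow in `K`.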

Combined single-cell profiling of expression and DNA methylation reveals splicing regulation and heterogeneity

10.1101/328138 · 2018 · Cited by ~1

Author(s): Stephanie M. Linker, Lara Urban, Stephen Clark, Mariya Chhatriwala, Shradha Amatya, ...

Keyword(s): DNA Methylation, Alternative Splicing, Single Cell, Single Cells, Exon Skipping, Sequence Composition, Local Sequence, Single Cell Profiling, Induced Pluripotent, Cell Variation

Abstract:

Background: Alternative splicing is a key regulatory mechanism in eukaryotic cells and increases the effective number of functionally distinct gene products. Using bulk RNA sequencing, splicing variation has been studied across human tissues and in genetically diverse populations. This has identified disease-relevant splicing events, as well as associations between splicing and genomic variations, including sequence composition and conservation. However, variability in splicing between single cells from the same tissue or cell type, and its determinants, remain poorly understood.

Results: We applied parallel DNA methylation and transcriptome sequencing to differentiating human induced pluripotent stem cells to characterize splicing variation (exon skipping) and its determinants. Our results show that variation in single-cell splicing can be accurately predicted from local sequence composition and genomic features. We observe moderate but consistent contributions from local DNA methylation profiles to splicing variation across cells. A combined model built on both sequence and DNA methylation information accurately predicts different splicing modes of individual cassette exons (AUC=0.85). These categories include the conventional inclusion and exclusion patterns, but also more subtle modes of cell-to-cell variation in splicing. Finally, we identified and characterized associations between DNA methylation and splicing changes during cell differentiation.

Conclusions: Our study yields new insights into alternative splicing at the single-cell level and reveals a previously underappreciated link between DNA methylation variation and splicing.
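
AUC values such as the 0.85 quoted above measure how well predicted scores rank exons of one splicing mode above exons of the others. The sketch below computes a binary AUC as the probability that a randomly chosen positive outranks a randomly chosen negative (ties count one half), and applies it one-vs-rest per mode. The abstract does not say how AUCs were aggregated across modes, so the one-vs-rest scheme and any mode names used with it are assumptions.

```python
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Binary ROC AUC: Mann-Whitney U divided by the number of positive-negative pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    scores_per_mode: Dict[str, Sequence[float]], modes: Sequence[str]
) -> Dict[str, float]:
    """AUC per mode, treating exons of that mode as positives and all other exons as negatives."""
    return {
        mode: roc_auc(scores, [1 if m == mode else 0 for m in modes])
        for mode, scores in scores_per_mode.items()
    }
```

The pairwise loop is quadratic and chosen for readability; a rank-based formulation gives the same value in O(n log n).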

Single Cell RNA Sequencing in Atherosclerosis Research

Circulation Research · 10.1161/circresaha.119.315940 · 2020 · Vol 126 (9) · pp. 1112-1126 · Cited by ~10

Author(s): Jesse W. Williams, Holger Winkels, Christopher P. Durant, Konstantin Zaitsev, Yanal Ghosheh, ...

Keyword(s): Single Cell, Biological Diversity, Single Cells, Cell Types, Atherosclerotic Plaques, Molecular Heterogeneity, Cellular Functions, Technological Advances, Technical Guide, Atherosclerosis Research

Technological advances in characterizing molecular heterogeneity at the single-cell level have ushered in a deeper understanding of the biological diversity of cells present in tissues, including atherosclerotic plaques. New subsets of cells have been discovered among cell types previously considered homogeneous. The commercial availability of systems to obtain transcriptomes and matching surface phenotypes from thousands of single cells is rapidly changing our understanding of cell types and lineage identity. Emerging methods to infer cellular functions are beginning to shed new light on the interplay of components involved in multifaceted disease responses such as atherosclerosis. Here, we provide a technical guide for the design, implementation, assembly and interpretation of current single-cell transcriptomics approaches, from the perspective of employing these tools to advance cardiovascular disease research.

A new bioinformatics tool to recover missing gene expression in single-cell RNA sequencing data

Journal of Molecular Cell Biology · 10.1093/jmcb/mjaa053 · 2020

Author(s): Jingyi Jessica Li

Keyword(s): Single Cell, RNA Sequencing, Single Cells, Rapid Evolution, Cell Types, Sequencing Data, Tissue Samples, Technological Advances, Single Cell RNA Sequencing, Or Genes

Abstract: Single-cell RNA sequencing (scRNA-seq) is a burgeoning field in which experimental techniques and computational methods have evolved rapidly over the past six years. These technological advances have allowed biomedical researchers to identify new cell types, delineate cell sub-populations, and infer cell differentiation trajectories in various tissue samples. Among the important features extractable from scRNA-seq data, the predominant ones are individual genes’ expression levels in single cells. Most analyses require a preprocessing step that converts an scRNA-seq dataset into a count matrix, where rows correspond to cells (or genes), columns correspond to genes (or cells), and each entry is a count, i.e. the number of sequenced reads or unique molecular identifiers (UMIs) mapped to a gene in a cell. Single-cell count matrices are highly sparse; for example, a typical matrix constructed from a droplet-based dataset may have >90% of counts as zeros.
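
The count-matrix construction described above can be sketched as follows, with cells as rows and genes as columns (one of the two orientations the abstract allows). Each input record is assumed to be a `(cell_barcode, gene, umi)` tuple for one read already assigned to a gene, and reads sharing a UMI within the same cell and gene are counted once. Only non-zero entries are stored, in compressed sparse row (CSR) form, which keeps a matrix with >90% zeros cheap to hold. The record format and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Counts = Dict[str, Dict[str, int]]


def build_counts(records: Iterable[Tuple[str, str, str]]) -> Counts:
    """Count distinct UMIs per (cell, gene); zero entries are never stored."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, gene, umi in records:
        umis[(cell, gene)].add(umi)  # duplicate reads of one molecule collapse here
    counts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (cell, gene), seen in umis.items():
        counts[cell][gene] = len(seen)
    return dict(counts)


def to_csr(
    counts: Counts, cells: Sequence[str], genes: Sequence[str]
) -> Tuple[List[int], List[int], List[int]]:
    """Return CSR arrays (indptr, indices, data) with one row per cell, in the given order.

    `genes` must contain every gene that appears in `counts`.
    """
    gene_index = {g: j for j, g in enumerate(genes)}
    indptr, indices, data = [0], [], []
    for cell in cells:
        row = counts.get(cell, {})  # cells with no reads become empty rows
        for gene in sorted(row, key=lambda g: gene_index[g]):
            indices.append(gene_index[gene])
            data.append(row[gene])
        indptr.append(len(indices))
    return indptr, indices, data


def zero_fraction(nnz: int, n_rows: int, n_cols: int) -> float:
    """Fraction of matrix entries that are zero, given the number of stored non-zeros."""
    return 1.0 - nnz / (n_rows * n_cols)
```

Row `r` of the matrix is recovered from `indices[indptr[r]:indptr[r + 1]]` and the matching slice of `data`; every position not listed there is an implicit zero.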

Predicting transcription factor binding in single cells through deep learning

Science Advances · 10.1126/sciadv.aba9031 · 2020 · Vol 6 (51) · pp. eaba9031

Author(s): Laiyi Fu, Lihua Zhang, Emmanuel Dollinger, Qinke Peng, Qing Nie, ...

Keyword(s): Deep Learning, Single Cell, Single Cells, Cell Types, Chromatin Accessibility, Sequence Motifs, Genome Wide, Chromatin Immunoprecipitation Sequencing, Deep Learning Model, Accessible Chromatin

Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by both studying sequence motifs enriched within predicted binding peaks and using predicted TFs for discovering cell types. We develop a new metric “TF activity score” to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.

Predicting transcription factor binding in single cells through deep learning

10.1101/2020.01.14.905232 · 2020

Author(s): Laiyi Fu, Lihua Zhang, Emmanuel Dollinger, Qinke Peng, Qing Nie, ...

Keyword(s): Transcription Factor, Deep Learning, Single Cell, Single Cells, Cell Types, Chromatin Accessibility, Biological Processes, Sequence Motifs, Genome Wide, Deep Learning Model

Abstract: Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding many biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining binding profiles at the single-cell level remains elusive. Here we report scFAN (Single Cell Factor Analysis Network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pre-trained on genome-wide bulk ATAC-seq, DNA sequence and ChIP-seq data, and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by studying sequence motifs enriched within predicted binding peaks and by investigating the effectiveness of predicted TF peaks for discovering cell types. We develop a new metric, “TF activity score”, to characterize each cell, and show that the activity scores can reliably capture cell identities. The method allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.

EMeth: An EM algorithm for cell type decomposition based on DNA methylation data

Scientific Reports · 10.1038/s41598-021-84864-9 · 2021 · Vol 11 (1)

Author(s): Hanyu Zhang, Ruoyi Cai, James Dai, Wei Sun

Keyword(s): DNA Methylation, Tumor Cells, T Regulatory Cells, Simulated Data, Cell Types, Computational Method, Methylation Data, Cell Type, A Cell, Type Decomposition

Abstract: We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it allows for a cell type with a known proportion but an unknown reference, and estimates that cell type's methylation. This is motivated by studies of methylation in tumor cells: bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and the tumor cell proportion can be estimated from copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated from DNA methylation have the expected associations with mutation load and survival time, whereas the estimates from gene expression miss such associations.
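
The reference-based setting above can be illustrated with an estimator much simpler than EMeth's EM algorithm. Assuming bulk methylation levels `y` satisfy y ≈ R p, where `R[i][t]` is the reference methylation of CpG `i` in cell type `t` and `p` holds the proportions, the sketch below runs projected gradient descent on a Huber loss, keeping `p` non-negative and summing to one. The Huber loss caps the influence of CpGs with large residuals, loosely in the spirit of reducing the contribution of CpGs inconsistent with the model; it is a swapped-in technique, not EMeth, and it does not handle a cell type with an unknown reference. The threshold `c`, the fixed step size and the iteration count are assumptions.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {p : p_t >= 0, sum(p) = 1} using the sort-based algorithm."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # threshold from the largest j that keeps a positive entry
    return [max(x - theta, 0.0) for x in v]


def huber_psi(r: float, c: float) -> float:
    """Derivative of the Huber loss: r itself for |r| <= c, clipped to +/-c beyond."""
    return max(-c, min(c, r))


def deconvolve(
    y: Sequence[float], R: Sequence[Sequence[float]], c: float = 0.1, n_iter: int = 5000
) -> List[float]:
    """Estimate cell type proportions p from bulk levels y and reference matrix R."""
    n, k = len(R), len(R[0])
    # Fixed step 1/trace(R^T R) <= 1/lambda_max(R^T R); the Huber loss has curvature at most R^T R.
    step = 1.0 / sum(R[i][t] ** 2 for i in range(n) for t in range(k))
    p = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        resid = [y[i] - sum(R[i][t] * p[t] for t in range(k)) for i in range(n)]
        psi = [huber_psi(r, c) for r in resid]  # large residuals contribute at most c
        grad = [-sum(psi[i] * R[i][t] for i in range(n)) for t in range(k)]
        p = project_to_simplex([p[t] - step * grad[t] for t in range(k)])
    return p
```

Passing `c=float("inf")` turns the Huber loss into ordinary least squares, which makes it easy to compare how far a single aberrant CpG shifts the estimate with and without the cap.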

Evaluation of affinity-based genome-wide DNA methylation data: Effects of CpG density, amplification bias, and copy number variation

Genome Research · 10.1101/gr.110601.110 · 2010 · Vol 20 (12) · pp. 1719-1729 · Cited by ~92

Author(s): M. D. Robinson, C. Stirzaker, A. L. Statham, M. W. Coolen, J. Z. Song, ...

Keyword(s): DNA Methylation, Copy Number Variation, Copy Number, Methylation Data, Amplification Bias, Genome Wide, Number Variation
