Assessment of computational methods for the analysis of single-cell ATAC-seq data

2019 ◽  
Author(s):  
Huidong Chen ◽  
Caleb Lareau ◽  
Tommaso Andreani ◽  
Michael E. Vinyard ◽  
Sara P. Garcia ◽  
...  

Abstract Background Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans), leads to inherent data sparsity (1-10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (20-50% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. Results We present a benchmarking framework that was applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were evaluated by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed. Conclusions This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC was the only method able to analyze a large dataset (> 80,000 cells).

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Huidong Chen ◽  
Caleb Lareau ◽  
Tommaso Andreani ◽  
Michael E. Vinyard ◽  
Sara P. Garcia ◽  
...  

Abstract Background Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans), leads to inherent data sparsity (1–10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (10–45% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. Results We present a benchmarking framework that is applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were compared by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed. Conclusions This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC is the only method able to analyze a large dataset (> 80,000 cells).
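The abstract above describes two quantities that are easy to illustrate: the per-cell sparsity of a peak-by-cell matrix, and how well a clustering recovers known cell types. The following is a minimal sketch, not the benchmark's own code. The abstract does not name its scoring metric, so the adjusted Rand index is used here as one common choice, and the function names `detection_rate` and `adjusted_rand_index` are illustrative assumptions.

```python
from collections import Counter
from math import comb


def detection_rate(cell_counts):
    """Fraction of peaks with at least one read in a single cell."""
    return sum(1 for v in cell_counts if v > 0) / len(cell_counts)


def adjusted_rand_index(truth, pred):
    """Chance-corrected agreement between two labelings of the same cells.

    Assumption: ARI stands in for whatever metric a benchmark might use;
    it is 1.0 for identical partitions and near 0 for random ones.
    """
    n = len(truth)
    if n != len(pred) or n < 2:
        raise ValueError("need two labelings of the same >= 2 cells")
    # Count pairs of cells that share a cluster in both, truth only, pred only.
    both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = in_truth * in_pred / comb(n, 2)
    max_index = (in_truth + in_pred) / 2
    if max_index == expected:  # both labelings are trivial partitions
        return 1.0
    return (both - expected) / (max_index - expected)
```

Cluster labels are compared as partitions, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect match even though the label names differ.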


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhe Cui ◽  
Ya Cui ◽  
Yan Gao ◽  
Tao Jiang ◽  
Tianyi Zang ◽  
...  

Single-cell Assay for Transposase Accessible Chromatin sequencing (scATAC-seq) has been widely used to profile genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq, the peaks of scATAC-seq are much sparser due to the lower copy numbers (diploid in humans) and the inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC is free of literature-based markers and robust across datasets in different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.


2020 ◽  
Author(s):  
Alexandre P. Marand ◽  
Zongliang Chen ◽  
Andrea Gallavotti ◽  
Robert J. Schmitz

ABSTRACT Cis-regulatory elements (CREs) encode the genomic blueprints for coordinating spatiotemporal gene expression programs underlying highly specialized cell functions. To identify CREs underlying cell-type specification and developmental transitions, we implemented single-cell sequencing of Assay for Transposase Accessible Chromatin in an atlas of Zea mays organs. We describe 92 distinct states of chromatin accessibility across more than 165,913 putative CREs, 56,575 cells, and 52 known cell-types in maize using a novel implementation of regularized quasibinomial logistic regression. Cell states were largely determined by combinatorial accessibility of transcription factors (TFs) and their binding sites. A neural network revealed that cell identity could be accurately predicted (>0.94) solely based on TF binding site accessibility. Co-accessible chromatin recapitulated higher-order chromatin interactions, with distinct sets of TFs coordinating cell type-specific regulatory dynamics. Pseudotime reconstruction and alignment with Arabidopsis thaliana trajectories identified conserved TFs, associated motifs, and cis-regulatory regions specifying sequential developmental progressions. Cell-type specific accessible chromatin regions were enriched with phenotype-associated genetic variants and signatures of selection, revealing the major cell-types and putative CREs targeted by modern maize breeding. Collectively, our analysis affords a comprehensive framework for understanding cellular heterogeneity, evolution, and cis-regulatory grammar of cell-type specification in a major crop species.


2021 ◽  
Author(s):  
Adam Francisco ◽  
Jine Li ◽  
Alaa Farghli ◽  
Matt Kanke ◽  
Bo Shui ◽  
...  

Fibrolamellar carcinoma (FLC) is an aggressive liver cancer with no effective therapeutic options. The extracellular environment of FLC tumors is poorly characterized and may contribute to cancer growth and/or metastasis. To bridge this knowledge gap, we assessed pathways relevant to proteoglycans, a major component of the extracellular matrix. We first analyzed gene expression data from FLC and non-malignant liver tissue to identify changes in glycosaminoglycan (GAG) biosynthesis pathways. We then implemented a novel LC-MS/MS based method to quantify the abundance of different types of GAGs in patient tumors, followed by measurement of the levels of different GAG-associated proteins. Finally, we performed the first single-cell assay for transposase-accessible chromatin sequencing on FLC tumors to identify which cell types are linked to the most dominant GAG-associated protein in FLC. Our results reveal a pathologic aberrancy in chondroitin (but not heparan) sulfate proteoglycans in FLC and highlight a potential role for activated stellate cells.


2016 ◽  
Author(s):  
Vladimir Yu. Kiselev ◽  
Kristina Kirschner ◽  
Michael T. Schaub ◽  
Tallulah Andrews ◽  
Andrew Yiu ◽  
...  

Abstract Using single-cell RNA-seq (scRNA-seq), the full transcriptome of individual cells can be acquired, enabling a quantitative cell-type characterisation based on expression profiles. However, due to the large variability in gene expression, identifying cell types based on the transcriptome remains challenging. We present Single-Cell Consensus Clustering (SC3), a tool for unsupervised clustering of scRNA-seq data. SC3 achieves high accuracy and robustness by consistently integrating different clustering solutions through a consensus approach. Tests on twelve published datasets show that SC3 outperforms five existing methods while remaining scalable, as shown by the analysis of a large dataset containing 44,808 cells. Moreover, an interactive graphical implementation makes SC3 accessible to a wide audience of users, and SC3 aids biological interpretation by identifying marker genes, differentially expressed genes and outlier cells. We illustrate the capabilities of SC3 by characterising newly obtained transcriptomes from subclones of neoplastic cells collected from patients.
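The idea of integrating several clustering solutions through a consensus can be sketched with a co-association matrix: for each pair of cells, record the fraction of solutions that place them in the same cluster. This is a generic illustration and not SC3's implementation. The function names are hypothetical, and the final grouping step (connected components above a threshold) is a simplifying assumption chosen to keep the sketch dependency-free.

```python
def consensus_matrix(solutions):
    """Entry [i][j] is the fraction of solutions that co-cluster cells i and j."""
    n = len(solutions[0])
    if any(len(labels) != n for labels in solutions):
        raise ValueError("every solution must label the same cells")
    counts = [[0] * n for _ in range(n)]
    for labels in solutions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(solutions) for c in row] for row in counts]


def consensus_groups(matrix, threshold=0.5):
    """Assumed final step: link cells whose consensus exceeds the threshold
    and return one group id per cell (connected components)."""
    n = len(matrix)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Three clusterings of four cells; label names differ, and the third disagrees.
solutions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
groups = consensus_groups(consensus_matrix(solutions))
```

Because only co-membership is counted, the solutions do not need matching label names, and a single dissenting solution is outvoted by the other two.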


2021 ◽  
Author(s):  
Xinhai Pan ◽  
Hechen Li ◽  
Xiuwei Zhang

Recently, the combined scRNA-seq and CRISPR/Cas9 genome editing technologies have enabled simultaneous readouts of gene expressions and lineage barcodes, which allows for the reconstruction of the cell division tree, and makes it possible to trace the origin of each cell type. Computational methods are emerging to take advantage of the jointly profiled scRNA-seq and lineage barcode data to better reconstruct the cell division history or to infer the cell state trajectories. Here, we present TedSim (single cell Temporal dynamics Simulator), a simulator that simulates the cell division events from the root cell to present-day cells, simultaneously generating the CRISPR/Cas9 genome editing lineage barcodes and scRNA-seq data. In particular, TedSim generates cells from multiple cell types through cell division events. TedSim can be used to benchmark and investigate computational methods which use either or both of the two types of data, scRNA-seq and lineage barcodes, to study cell lineages or trajectories. TedSim is available at: https://github.com/Galaxeee/TedSim.
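To make the simulation idea concrete, here is a toy sketch of the lineage-barcode half of such a simulator: a binary cell division tree in which each daughter inherits its parent's barcode and unedited sites may acquire an irreversible edit at each division. It is not TedSim's model and does not generate scRNA-seq data. The function name `simulate_lineage` and the parameters `n_sites`, `edit_rate`, and `n_states` are hypothetical choices made for illustration.

```python
import random


def simulate_lineage(depth, n_sites=8, edit_rate=0.1, n_states=4, seed=0):
    """Simulate `depth` rounds of symmetric division from one root cell.

    Cells are named by their path from the root ("" is the root, "01" is
    the second daughter of the first daughter). A barcode is a list of
    sites; 0 means unedited, 1..n_states is an edit outcome.
    Returns (all_cells, leaf_ids).
    """
    rng = random.Random(seed)
    cells = {"": [0] * n_sites}
    frontier = [""]
    for _ in range(depth):
        next_frontier = []
        for parent in frontier:
            for branch in "01":
                barcode = list(cells[parent])  # daughters inherit all edits
                for site, state in enumerate(barcode):
                    # Only unedited sites can change; edits are permanent.
                    if state == 0 and rng.random() < edit_rate:
                        barcode[site] = rng.randint(1, n_states)
                child = parent + branch
                cells[child] = barcode
                next_frontier.append(child)
        frontier = next_frontier
    return cells, frontier


cells, leaves = simulate_lineage(depth=4)
```

The leaf barcodes are what a lineage reconstruction method would observe, while the full `cells` dictionary holds the ground-truth tree against which a reconstruction can be scored.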


2021 ◽  
Author(s):  
Maria Brbic ◽  
Kaidi Cao ◽  
John W Hickey ◽  
Yuqi Tan ◽  
Michael Snyder ◽  
...  

Spatial protein and RNA imaging technologies have been gaining rapid attention but current computational methods for annotating cells are based on techniques established for dissociated single-cell technologies and thus do not take spatial organization into account. Here we present STELLAR, a geometric deep learning method that utilizes spatial and molecular cell information to automatically assign cell types from an annotated reference set as well as discover new cell types and cell states. STELLAR transfers annotations across different dissection regions, tissues, and donors and detects higher-order tissue structures with dramatic time savings.


2019 ◽  
Author(s):  
Caleb A. Lareau ◽  
Fabiana M. Duarte ◽  
Jennifer G. Chew ◽  
Vinay K. Kartha ◽  
Zach D. Burkett ◽  
...  

Abstract While recent technical advancements have facilitated the mapping of epigenomes at single-cell resolution, the throughput and quality of these methods have limited the widespread adoption of these technologies. Here, we describe a droplet microfluidics platform for single-cell assay for transposase accessible chromatin (scATAC-seq) for high-throughput single-cell profiling of chromatin accessibility. We use this approach for the unbiased discovery of cell types and regulatory elements within the mouse brain. Further, we extend the throughput of this approach by pairing combinatorial indexing with droplet microfluidics, enabling single-cell studies at a massive scale. With this approach, we measure chromatin accessibility across resting and stimulated human bone marrow derived cells to reveal changes in the cis- and trans- regulatory landscape across cell types and upon stimulation conditions at single-cell resolution. Altogether, we describe a total of 502,207 single-cell profiles, demonstrating the scalability and flexibility of this droplet-based platform.


2018 ◽  
Author(s):  
John R. Sinnamon ◽  
Kristof A. Torkenczy ◽  
Michael W. Linhoff ◽  
Sarah Vitak ◽  
Hannah A. Pliner ◽  
...  

ABSTRACT Here we present a comprehensive map of the accessible chromatin landscape of the mouse hippocampus at single-cell resolution. Substantial advances of this work include the optimization of single-cell combinatorial indexing assay for transposase accessible chromatin (sci-ATAC-seq), a software suite, scitools, for the rapid processing and visualization of single-cell combinatorial indexing datasets, and a valuable resource of hippocampal regulatory networks at single-cell resolution. We utilized sci-ATAC-seq to produce 2,346 high-quality single-cell chromatin accessibility maps with a mean unique read count per cell of 29,201 from both fresh and frozen hippocampi, observing little difference in accessibility patterns between the preparations. Using this dataset, we identified eight distinct major clusters of cells representing both neuronal and non-neuronal cell types and characterized the driving regulatory factors and differentially accessible loci that define each cluster. We then applied a recently described co-accessibility framework, Cicero, which identified 146,818 links between promoters and putative distal regulatory DNA. Identified co-accessibility networks showed cell-type specificity, shedding light on key dynamic loci that reconfigure to specify hippocampal cell lineages. Lastly, we carried out an additional sci-ATAC-seq preparation from cultured hippocampal neurons (899 high-quality cells, 43,532 mean unique reads) that revealed substantial alterations in their epigenetic landscape compared to nuclei from hippocampal tissue. This dataset and accompanying analysis tools provide a new resource that can guide subsequent studies of the hippocampus.

