HiCluster: A Robust Single-Cell Hi-C Clustering Method Based on Convolution and Random Walk

2018 ◽  
Author(s):  
Jingtian Zhou ◽  
Jianzhu Ma ◽  
Yusi Chen ◽  
Chuankai Cheng ◽  
Bokan Bao ◽  
...  

3D genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe HiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real data as benchmarks, HiCluster significantly improves clustering accuracy when applied to low coverage Hi-C datasets compared to existing methods. After imputation by HiCluster, structures similar to topologically associating domains (TADs) could be identified within single cells, and their consensus boundaries among cells were enriched at the TAD boundaries observed in bulk samples. In summary, HiCluster facilitates visualization and comparison of single-cell 3D genomes.

2019 ◽  
Vol 116 (28) ◽  
pp. 14011-14018 ◽  
Author(s):  
Jingtian Zhou ◽  
Jianzhu Ma ◽  
Yusi Chen ◽  
Chuankai Cheng ◽  
Bokan Bao ◽  
...  

Three-dimensional genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe scHiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real single-cell Hi-C data as benchmarks, scHiCluster significantly improves clustering accuracy when applied to low coverage datasets compared with existing methods. After imputation by scHiCluster, topologically associating domain (TAD)-like structures (TLSs) can be identified within single cells, and their consensus boundaries were enriched at the TAD boundaries observed in bulk cell Hi-C samples. In summary, scHiCluster facilitates visualization and comparison of single-cell 3D genomes.
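
The two imputation steps named in the abstract above, linear convolution followed by random walk, can be sketched on a toy contact matrix. This is a minimal illustration under stated assumptions, not the scHiCluster implementation: the box-filter window (`pad`), the restart probability (`restart`), the row normalization, and the function names `convolve_contacts` and `random_walk_with_restart` are all hypothetical choices, since the abstract does not give them.

```python
import numpy as np


def convolve_contacts(mat, pad=1):
    """Average each entry with its (2*pad+1) x (2*pad+1) neighborhood.

    Assumption: a plain box filter with zero padding at the matrix edges.
    """
    n = mat.shape[0]
    padded = np.pad(mat, pad, mode="constant")
    out = np.zeros((n, n), dtype=float)
    for dx in range(2 * pad + 1):
        for dy in range(2 * pad + 1):
            out += padded[dx:dx + n, dy:dy + n]
    return out / (2 * pad + 1) ** 2


def random_walk_with_restart(mat, restart=0.5, tol=1e-6, max_iter=100):
    """Iterate Q <- (1 - restart) * Q @ P + restart * I until Q stops changing.

    P is the row-normalized input matrix (assumed normalization).
    """
    n = mat.shape[0]
    row_sums = mat.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave empty rows as all zeros
    trans = mat / row_sums
    identity = np.eye(n)
    q = identity.copy()
    for _ in range(max_iter):
        q_next = (1 - restart) * q @ trans + restart * identity
        converged = np.linalg.norm(q_next - q) < tol
        q = q_next
        if converged:
            break
    return q


# Toy sparse, symmetric single-cell contact matrix (made-up counts).
raw = np.zeros((6, 6))
raw[0, 1] = raw[1, 0] = 1
raw[2, 3] = raw[3, 2] = 2
raw[4, 5] = raw[5, 4] = 1

imputed = random_walk_with_restart(convolve_contacts(raw))
print(np.count_nonzero(raw), "nonzero entries before imputation")
print(np.count_nonzero(imputed), "nonzero entries after imputation")
```

In this sketch the convolution spreads each observed contact to neighboring bins, and the random walk then propagates signal between bins that share neighbors, so the imputed matrix is much denser than the raw one.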


Author(s):  
Tianming Zhou ◽  
Ruochi Zhang ◽  
Jian Ma

The spatial organization of the genome in the cell nucleus is pivotal to cell function. However, how the 3D genome organization and its dynamics influence cellular phenotypes remains poorly understood. The very recent development of single-cell technologies for probing the 3D genome, especially single-cell Hi-C (scHi-C), has ushered in a new era of unveiling cell-to-cell variability of 3D genome features at an unprecedented resolution. Here, we review recent developments in computational approaches to the analysis of scHi-C, including data processing, dimensionality reduction, imputation for enhancing data quality, and the revealing of 3D genome features at single-cell resolution. While much progress has been made in computational method development to analyze single-cell 3D genomes, substantial future work is needed to improve data interpretation and multimodal data integration, which are critical to reveal fundamental connections between genome structure and function among heterogeneous cell populations in various biological contexts. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2021 ◽  
Author(s):  
Qing Xie ◽  
Chengong Han ◽  
Victor Jin ◽  
Shili Lin

Single-cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single-cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating things further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros), whereas others are due to insufficient sequencing depth (sampling zeros), especially for loci that interact infrequently. Differentiating between structural zeros and sampling zeros is important because correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the scHi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, we propose HiCImpute, a Bayesian hierarchy model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of the scHi-C 2D data structure into account while also borrowing information from similar single cells and bulk data, when such are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cell types of L4 and L5 in the prefrontal cortex.
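
The distinction between structural and sampling zeros described above can be made concrete with a small simulation. This is only an illustrative sketch, not the HiCImpute model: the distance-decay rate, the 20% structural-zero fraction, the Poisson sampling, and all variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 8  # number of genomic bins in the toy matrix

# Assumed expected contact rate that decays with genomic distance.
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])
rate = 2.0 / (1.0 + dist)

# Structural zeros: locus pairs that never interact (picked at random here).
structural = np.triu(rng.random((n, n)) < 0.2, k=1)
structural = structural | structural.T

# Sample symmetric counts, then force structural pairs to zero.
upper = np.triu(rng.poisson(rate), k=1)
counts = upper + upper.T
counts[structural] = 0

# Every off-diagonal observed zero is either structural or a sampling zero.
off_diag = ~np.eye(n, dtype=bool)
observed_zero = (counts == 0) & off_diag
sampling_zero = observed_zero & ~structural

print("observed zeros:  ", observed_zero.sum())
print("structural zeros:", structural.sum())
print("sampling zeros:  ", sampling_zero.sum())
```

In the observed `counts` matrix the two kinds of zero are indistinguishable; only the simulation's hidden `structural` mask separates them, which is the inference problem the abstract describes.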


2020 ◽  
Vol 52 (10) ◽  
pp. 468-477
Author(s):  
Alexander C. Zambon ◽  
Tom Hsu ◽  
Seunghee Erin Kim ◽  
Miranda Klinck ◽  
Jennifer Stowe ◽  
...  

Much of our understanding of the regulatory mechanisms governing the cell cycle in mammals has relied heavily on methods that measure the aggregate state of a population of cells. While instrumental in shaping our current understanding of cell proliferation, these approaches mask the genetic signatures of rare subpopulations such as quiescent (G0) and very slowly dividing (SD) cells. Results described in this study and those of others using single-cell analysis reveal that even in clonally derived immortalized cancer cells, ∼1–5% of cells can exhibit G0 and SD phenotypes. Therefore, to enable the study of these rare cell phenotypes, we established an integrated molecular, computational, and imaging approach to track, isolate, and genetically perturb single cells as they proliferate. A genetically encoded cell-cycle reporter (K67p-FUCCI) was used to track single cells as they traversed the cell cycle. A set of R scripts was written to quantify K67p-FUCCI over time. To enable the further study of G0 and SD phenotypes, we retrofitted a live cell imaging system with a micromanipulator to enable single-cell targeting for functional validation studies. Single-cell analysis revealed that HT1080 and MCF7 cells had doubling times of ∼24 and ∼48 h, respectively, with highly variable durations of the G1 and G2 phases. Direct single-cell microinjection of mRNA encoding GFP achieved detectable GFP fluorescence within ∼5 h in both cell types. These findings, coupled with the possibility of targeting several hundred single cells, improve throughput and sensitivity over conventional methods to study rare cell subpopulations.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Chunxiang Wang ◽  
Xin Gao ◽  
Juntao Liu

Background: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. Results: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. Conclusion: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.


2020 ◽  
Author(s):  
Jeremy Lombardo ◽  
Marzieh Aliaghaei ◽  
Quy Nguyen ◽  
Kai Kessenbrock ◽  
Jered Haun

Tissues are composed of highly heterogeneous mixtures of cell subtypes, and this diversity is increasingly being characterized using high-throughput single-cell analysis methods. However, these efforts are hindered by the fact that tissues must first be dissociated into single-cell suspensions that are viable and still accurately represent phenotypes from the original tissue. Current methods for breaking down tissues are inefficient, labor-intensive, subject to high variability, and potentially biased towards cell subtypes that are easier to release. Here, we present a microfluidic platform consisting of three different tissue processing technologies that can perform the complete tissue-to-single-cell workflow, including digestion, disaggregation, and filtration. First, we developed a new microfluidic digestion device that can be loaded with minced tissue specimens quickly and easily, and that then uses the combination of proteolytic enzyme activity and fluid shear forces to accelerate tissue breakdown. Next, we integrated dissociation and filter technologies into a single device, which enhanced single-cell numbers and fully prepared the sample for single-cell analysis. The final multi-device platform was then evaluated using a diverse array of tissue types that exhibited a wide range of properties. For murine kidney and mammary tumor, we found that microfluidic processing produced 2.5-fold more single, viable cells. Single-cell RNA sequencing (scRNA-seq) further revealed that device processing enriched for endothelial cells, fibroblasts, and basal epithelium, and did not increase stress responses. For murine liver and heart, which are softer tissues containing fragile cell types, processing time could be reduced to 15 min, and even as short as 1 min. We also demonstrated that periodic recovery at defined time intervals produced substantially more hepatocytes and cardiomyocytes than continuous operation, most likely by preventing damage to fragile cell types. In future work, we will seek to integrate additional operations such as upstream tissue preparation and downstream microfluidic cell sorting and detection to create powerful point-of-care single-cell diagnostic platforms.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1158 ◽  
Author(s):  
Fanny Perraudeau ◽  
Davide Risso ◽  
Kelly Street ◽  
Elizabeth Purdom ◽  
Sandrine Dudoit

Novel single-cell transcriptome sequencing assays allow researchers to measure gene expression levels at the resolution of single cells and offer the unprecedented opportunity to investigate, at the molecular level, fundamental biological questions such as stem cell differentiation or the discovery and characterization of rare cell types. However, such assays raise challenging statistical and computational questions and require the development of novel methodology and software. Using stem cell differentiation in the mouse olfactory epithelium as a case study, this integrated workflow provides a step-by-step tutorial to the methodology and associated software for the following four main tasks: (1) dimensionality reduction accounting for zero inflation and overdispersion and adjusting for gene- and cell-level covariates; (2) cell clustering using resampling-based sequential ensemble clustering; (3) inference of cell lineages and pseudotimes; and (4) differential expression analysis along lineages.


Author(s):  
Dong-Sung Lee ◽  
Chongyuan Luo ◽  
Jingtian Zhou ◽  
Sahaana Chandran ◽  
Angeline Rivkin ◽  
...  

The ability to profile epigenomic features in single cells is facilitating the study of the variation in transcription regulation at the single-cell level. Single-cell methods have also facilitated the generation of cell-type-resolved transcriptomic and epigenetic profiles of lineages derived from complex heterogeneous samples. However, integrating different epigenetic features remains challenging, as many current methods profile a single data type at a time. Furthermore, some epigenetic features, such as 3D genome organization, are intrinsically variable between single cells of the same lineage, so it remains unclear how well these methods can resolve cell types from complex mixtures. Here we describe a method for profiling 3D genome organization and DNA methylation in single cells. This protocol accompanies Lee et al. (Nature Methods 2019) after peer review to aid potential users in applying the method to their own samples.


2019 ◽  
Author(s):  
Erwin M. Schoof ◽  
Nicolas Rapin ◽  
Simonas Savickas ◽  
Coline Gentil ◽  
Eric Lechman ◽  
...  

In recent years, cellular life science research has experienced a significant shift, moving away from conducting bulk cell interrogation towards single-cell analysis. It is only through single-cell analysis that a complete understanding of cellular heterogeneity, and the interplay between various cell types that are fundamental to specific biological phenotypes, can be achieved. Single-cell assays at the protein level have been predominantly limited to targeted, antibody-based methods. However, here we present an experimental and computational pipeline that establishes a comprehensive single-cell mass spectrometry-based proteomics workflow. By exploiting a leukemia culture system containing functionally defined leukemic stem cells, progenitors, and terminally differentiated blasts, we demonstrate that our workflow is able to explore the cellular heterogeneity within this aberrant developmental hierarchy. We show that our approach is capable of quantifying hundreds of proteins across hundreds of single cells using limited instrument time. Furthermore, we developed a computational pipeline (SCeptre) that effectively clusters the data and permits the extraction of cell-specific proteins and functional pathways. This proof-of-concept work lays the foundation for future global single-cell proteomics studies.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Shahin Mohammadi ◽  
Jose Davila-Velderrain ◽  
Manolis Kellis

Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet’s superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating among the transcriptomic cell types and subtypes of the human prefrontal cortex.

