Efficient synergistic single-cell genome assembly

Mapping Intimacies ◽

10.1101/002972 ◽

2014 ◽

Author(s):

Narjes S. Movahedi ◽

Zeinab Taghavi ◽

Mallory Embree ◽

Harish Nagarajan ◽

Karsten Zengler ◽

...

Keyword(s):

Single Cell ◽

De Novo ◽

Single Cells ◽

Data Sets ◽

Protein Coding ◽

Single Cell Sequencing ◽

Assembly Method ◽

Sequencing Experiment ◽

Cell Genome ◽

Lower Depth

Because the vast majority of microbes are unculturable, single-cell sequencing has become an important method for gaining insight into microbial physiology. Single-cell sequencing methods, currently powered by multiple displacement genome amplification (MDA), have passed important milestones such as finishing and closing the genome of a prokaryote. However, the quality and reliability of genome assemblies from single cells are still unsatisfactory because MDA bias causes uneven coverage depth and leaves scattered chunks of the genome absent from the final collection of reads. In this work, our new algorithm, Hybrid De novo Assembler (HyDA), demonstrates the power of co-assembling multiple single-cell genomic data sets by significantly improving assembly quality in terms of predicted functional elements and length statistics. Co-assemblies contain significantly more base pairs and protein-coding genes, cover more subsystems, and consist of longer contigs than individual assemblies produced by the same algorithm or by the state-of-the-art single-cell assemblers SPAdes and IDBA-UD. HyDA is also able to avoid chimeric assemblies by detecting and separating shared and exclusive pieces of sequence for the input data sets. By replacing one deep single-cell sequencing experiment with a few single-cell sequencing experiments of lower depth, the co-assembly method can hedge against the risk of failure and loss of the sample without significantly increasing sequencing cost. Application of the single-cell co-assembler HyDA to the study of three uncultured members of an alkane-degrading methanogenic community validated the usefulness of the co-assembly concept.
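The idea of separating shared and exclusive pieces of sequence across input data sets can be illustrated with a toy k-mer labelling sketch. This is not HyDA's implementation, which the abstract does not describe. The function names, the data set names `cell_A` and `cell_B`, and the reads are hypothetical, and the sketch ignores reverse complements, sequencing errors, and coverage.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def color_kmers(datasets, k):
    """Map each k-mer to the set of data set names whose reads contain it."""
    colors = defaultdict(set)
    for name, reads in datasets.items():
        for read in reads:
            for kmer in kmers(read, k):
                colors[kmer].add(name)
    return colors


def split_shared_exclusive(colors):
    """Separate k-mers seen in several data sets from those seen in only one."""
    shared = {km for km, names in colors.items() if len(names) > 1}
    exclusive = defaultdict(set)
    for km, names in colors.items():
        if len(names) == 1:
            exclusive[next(iter(names))].add(km)
    return shared, dict(exclusive)


# Hypothetical toy reads from two single-cell data sets.
datasets = {
    "cell_A": ["ACGTAC", "GTACGG"],
    "cell_B": ["ACGTAC", "TTTACG"],
}
colors = color_kmers(datasets, k=4)
shared, exclusive = split_shared_exclusive(colors)
print(sorted(shared))
print({name: sorted(kms) for name, kms in exclusive.items()})
```

In a co-assembly setting, labels of this kind would let sequence supported by only one data set be reported separately instead of being merged into a chimeric contig.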

Cellsnp-lite: an efficient tool for genotyping single cells

10.1101/2020.12.31.424913 ◽

2021 ◽

Author(s):

Xianjie Huang ◽

Yuanhua Huang

Keyword(s):

Single Cell ◽

Single Cells ◽

Basic Research ◽

Substantial Improvement ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Memory Efficiency ◽

Computational Speed ◽

Cell Data

Abstract

Summary: Single-cell sequencing is an increasingly used technology and has promising applications in basic research and clinical translation. However, genotyping methods developed for bulk sequencing data have not been well adapted for single-cell data, in terms of both computational parallelization and a simplified user interface. Here we introduce a software tool, cellsnp-lite, implemented in C/C++ and based on the well-supported htslib package, for genotyping in single-cell sequencing data from both droplet- and well-based platforms. On various experimental data sets, it shows substantial improvement in computational speed and memory efficiency while retaining highly concordant results compared to existing methods. Cellsnp-lite therefore eases genetic analysis of increasingly large single-cell data sets.

Availability: The source code is freely available at https://github.com/single-cell-genetics/[email protected]

Neural Data Visualization for Scalable and Generalizable Single Cell Analysis

10.1101/289223 ◽

2018 ◽

Cited By ~ 2

Author(s):

Hyunghoon Cho ◽

Bonnie Berger ◽

Jian Peng

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Single Cells ◽

Data Sets ◽

Cell Analysis ◽

Data Set ◽

Unseen Data ◽

Sequencing Experiment ◽

Cell Expression

Summary: Single-cell RNA sequencing is becoming effective and accessible as emerging technologies push its scale to millions of cells and beyond. Visualizing the landscape of single-cell expression has been a fundamental tool in single-cell analysis. However, standard methods for visualization, such as t-distributed stochastic neighbor embedding (t-SNE), not only lack scalability to data sets with millions of cells, but also are unable to generalize to new cells, an important ability for transferring knowledge across fast-accumulating data sets. We introduce net-SNE, which trains a neural network to learn a high-quality visualization of single cells that also generalizes to unseen data. While matching the visualization quality of t-SNE on 14 benchmark data sets of varying sizes, from hundreds to 1.3 million cells, net-SNE also effectively positions previously unseen cells, even when an entire subtype is missing from the initial data set or when the new cells are from a different sequencing experiment. Furthermore, given a “reference” visualization, net-SNE can vastly reduce the computational burden of visualizing millions of single cells from multiple days to just a few minutes of runtime. Our work provides a general framework for bootstrapping single-cell analysis from existing data sets.

Single-cell sequencing techniques from individual to multiomics analyses

Experimental & Molecular Medicine ◽

10.1038/s12276-020-00499-2 ◽

2020 ◽

Vol 52 (9) ◽

pp. 1419-1427

Author(s):

Yukie Kashima ◽

Yoshitaka Sakamoto ◽

Keiya Kaneko ◽

Masahide Seki ◽

Yutaka Suzuki ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Experimental Methods ◽

Chromatin Accessibility ◽

Data Sets ◽

Transcriptome Data ◽

Detailed Understanding ◽

Single Cell Sequencing ◽

Single Cell Rna Sequencing ◽

Molecular Profiles

Abstract: Here, we review single-cell sequencing techniques for individual and multiomics profiling in single cells. We mainly describe single-cell genomic, epigenomic, and transcriptomic methods, and examples of their applications. For the integration of multilayered data sets, such as the transcriptome data derived from single-cell RNA sequencing and chromatin accessibility data derived from single-cell ATAC-seq, there are several computational integration methods. We also describe single-cell experimental methods for the simultaneous measurement of two or more omics layers. We can achieve a detailed understanding of the basic molecular profiles and those associated with disease in each cell by utilizing a large number of single-cell sequencing techniques and the accumulated data sets.

ChromVAR: Inferring transcription factor variation from single-cell epigenomic data

10.1101/110346 ◽

2017 ◽

Cited By ~ 8

Author(s):

Alicia N. Schep ◽

Beijing Wu ◽

Jason D. Buenrostro ◽

William J. Greenleaf

Keyword(s):

Transcription Factor ◽

Single Cell ◽

De Novo ◽

Single Cells ◽

R Package ◽

Chromatin Accessibility ◽

Data Sets ◽

Sequence Motifs ◽

Computational Approaches

Abstract: Single-cell ATAC-seq (scATAC-seq) yields sparse data, which makes applying conventional computational approaches to data analysis challenging or impossible. We developed chromVAR, an R package for analyzing sparse chromatin accessibility data by estimating the gain or loss of accessibility within sets of peaks sharing the same motif or annotation while controlling for known technical biases. chromVAR enables accurate clustering of scATAC-seq profiles and the characterization of known, or the de novo identification of novel, sequence motifs associated with variation in chromatin accessibility across single cells or other sparse epigenomic data sets.
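One simple way to express "gain or loss of accessibility within a set of peaks" is to compare each cell's observed fragment count in that peak set with the count expected if the cell matched the population average. The sketch below is written in Python for illustration and does not use the chromVAR R package or reproduce its exact computation; in particular it omits the correction for technical biases that the abstract mentions. The function name and the toy count matrix are hypothetical.

```python
def raw_deviations(counts, motif_peaks):
    """Relative gain or loss of accessibility per cell for one peak set.

    counts[c][p] is the number of fragments from cell c in peak p.
    motif_peaks lists the indices of peaks sharing a motif or annotation.
    Assumes every cell has at least one fragment and the peak set is not
    empty of fragments, so the expected count is never zero.
    """
    n_peaks = len(counts[0])
    peak_totals = [sum(cell[p] for cell in counts) for p in range(n_peaks)]
    grand_total = sum(peak_totals)
    # Fraction of all fragments, pooled over cells, that fall in the peak set.
    motif_fraction = sum(peak_totals[p] for p in motif_peaks) / grand_total

    deviations = []
    for cell in counts:
        observed = sum(cell[p] for p in motif_peaks)
        expected = motif_fraction * sum(cell)
        deviations.append((observed - expected) / expected)
    return deviations


# Hypothetical toy data: two cells, four peaks; peaks 0 and 1 share a motif.
counts = [
    [4, 4, 1, 1],
    [1, 1, 4, 4],
]
devs = raw_deviations(counts, motif_peaks=[0, 1])
print(devs)
```

A positive value indicates more accessibility in the peak set than the population average predicts for a cell of that depth, and a negative value indicates less.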

Faculty Opinions recommendation of Efficient de novo assembly of single-cell bacterial genomes from short-read data sets.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13296960.14657061 ◽

2011 ◽

Author(s):

Steven Salzberg

Keyword(s):

Single Cell ◽

De Novo Assembly ◽

De Novo ◽

Data Sets ◽

Bacterial Genomes ◽

Short Read

The Development of an Effective Bacterial Single-Cell Lysis Method Suitable for Whole Genome Amplification in Microfluidic Platforms

Micromachines ◽

10.3390/mi9080367 ◽

2018 ◽

Vol 9 (8) ◽

pp. 367 ◽

Cited By ~ 6

Author(s):

Yuguang Liu ◽

Dirk Schulze-Makuch ◽

Jean-Pierre de Vera ◽

Charles Cockell ◽

Thomas Leya ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Bacterial Species ◽

Cell Manipulation ◽

Microfluidic Platforms ◽

Gram Positive ◽

Gram Negative ◽

Genome Amplification ◽

Single Cell Sequencing ◽

Wide Range

Single-cell sequencing is a powerful technology that provides the capability of analyzing a single cell within a population. This technology is mostly coupled with microfluidic systems for controlled cell manipulation and precise fluid handling to shed light on the genomes of a wide range of cells. So far, single-cell sequencing has been focused mostly on human cells due to the ease of lysing the cells for genome amplification. The major challenges that bacterial species pose to genome amplification from single cells include the rigid bacterial cell walls and the need for an effective lysis protocol compatible with microfluidic platforms. In this work, we present a lysis protocol that can be used to extract genomic DNA from both gram-positive and gram-negative species without interfering with the amplification chemistry. Corynebacterium glutamicum was chosen as a typical gram-positive model and Nostoc sp. as a gram-negative model due to major challenges reported in previous studies. Our protocol is based on thermal and chemical lysis. We consider 80% of single-cell replicates that lead to >5 ng DNA after amplification as successful attempts. The protocol was directly applied to Gloeocapsa sp. and the single cells of the eukaryotic Sphaerocystis sp. and achieved a 100% success rate.

P02.10 FocuSCOPE: a single cell, multi-omics solution to simultaneously analyze tumor variants and microenvironment

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2021-itoc8.22 ◽

2021 ◽

Vol 9 (Suppl 1) ◽

pp. A12.1-A12

Author(s):

Y Arjmand Abbassi ◽

N Fang ◽

W Zhu ◽

Y Zhou ◽

Y Chen ◽

...

Keyword(s):

Gene Expression ◽

Tumor Microenvironment ◽

Single Cell ◽

High Throughput ◽

Immune Cells ◽

Genetic Variants ◽

Expression Profiles ◽

Single Cells ◽

Gene Expression Profiles ◽

Single Cell Sequencing

Recent advances in high-throughput single-cell sequencing technologies have greatly improved our understanding of complex biological systems. Heterogeneous samples such as tumor tissues commonly harbor cancer cell-specific genetic variants and gene expression profiles, both of which have been shown to be related to the mechanisms of disease development, progression, and responses to treatment. Furthermore, stromal and immune cells within the tumor microenvironment interact with cancer cells to play important roles in tumor responses to systemic therapy such as immunotherapy or cell therapy. However, most current high-throughput single-cell sequencing methods detect only gene expression levels or epigenetic events such as chromatin conformation. Information on important genetic variants, including mutations or fusions, is not captured. To better understand the mechanisms of tumor responses to systemic therapy, it is essential to decipher the connection between genotype and gene expression patterns of both tumor cells and cells in the tumor microenvironment. We developed FocuSCOPE, a high-throughput multi-omics sequencing solution that can detect both genetic variants and the transcriptome from the same single cells. FocuSCOPE has been used to successfully perform single-cell analysis of both gene expression profiles and point mutations, fusion genes, or intracellular viral sequences from thousands of cells simultaneously, delivering comprehensive insights into tumor and immune cells in the tumor microenvironment at single-cell resolution.

Disclosure Information: Y. Arjmand Abbassi: None. N. Fang: None. W. Zhu: None. Y. Zhou: None. Y. Chen: None. U. Deutsch: None.

EPGA-SC: A framework for de novo assembly of single-cell sequencing reads

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2019.2945761 ◽

2019 ◽

pp. 1-1 ◽

Cited By ~ 1

Author(s):

Xingyu Liao ◽

Min Li ◽

Junwei Luo ◽

You Zou ◽

Fangxiang Wu ◽

...

Keyword(s):

Single Cell ◽

De Novo Assembly ◽

De Novo ◽

Single Cell Sequencing

PhyDOSE: Design of Follow-up Single-cell Sequencing Experiments of Tumors

10.1101/2020.03.30.016410 ◽

2020 ◽

Author(s):

Leah Weber ◽

Nuraini Aguse ◽

Nicholas Chia ◽

Mohammed El-Kebir

Keyword(s):

Single Cell ◽

Retrospective Analysis ◽

High Fidelity ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Bulk Data ◽

Sequencing Experiment ◽

Tumor Phylogeny ◽

Number Of Cells

Abstract: The combination of bulk and single-cell DNA sequencing data of the same tumor enables the inference of high-fidelity phylogenies that form the input to many important downstream analyses in cancer genomics. While many studies simultaneously perform bulk and single-cell sequencing, some studies have analyzed initial bulk data to identify which mutations to target in a follow-up single-cell sequencing experiment, thereby decreasing cost. Bulk data provide an additional untapped source of valuable information, composed of candidate phylogenies and associated clonal prevalence. Here, we introduce PhyDOSE, a method that uses this information to strategically optimize the design of follow-up single-cell experiments. Underpinning our method is the observation that only a small number of clones uniquely distinguish one candidate tree from all other trees. We incorporate distinguishing features into a probabilistic model that infers the number of cells to sequence so as to confidently reconstruct the phylogeny of the tumor. We validate PhyDOSE using simulations and a retrospective analysis of a leukemia patient, concluding that PhyDOSE’s computed number of cells resolves tree ambiguity even in the presence of typical single-cell sequencing errors. We also conduct a retrospective analysis on an acute myeloid leukemia cohort, demonstrating the potential to achieve similar results with a significant reduction in the number of cells sequenced. In a prospective analysis, we demonstrate that only a small number of cells suffice to disambiguate the solution space of trees in a recent lung cancer cohort. In summary, PhyDOSE proposes cost-efficient single-cell sequencing experiments that yield high-fidelity phylogenies, which will improve downstream analyses aimed at deepening our understanding of cancer biology.

Author summary: Cancer development in a patient can be explained using a phylogeny, a tree that describes the evolutionary history of a tumor and has therapeutic implications. A tumor phylogeny is constructed from sequencing data, commonly obtained using either bulk or single-cell DNA sequencing technology. The accuracy of tumor phylogeny inference increases when both types of data are used, but single-cell sequencing may become prohibitively costly with an increasing number of cells. Here, we propose a method that uses bulk sequencing data to guide the design of a follow-up single-cell sequencing experiment. Our results suggest that PhyDOSE provides a significant decrease in the number of cells to sequence compared to the number of cells sequenced in existing studies. The ability to make informed decisions based on prior data can help reduce the cost of follow-up single-cell sequencing experiments of tumors, improving the accuracy of tumor phylogeny inference and ultimately getting us closer to understanding and treating cancer.
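The question of how many cells to sequence can be made concrete with the simplest possible case: a single distinguishing clone with known prevalence, no sequencing errors, and cells sampled independently. This is an assumed simplification for illustration, not PhyDOSE's probabilistic model, which handles multiple candidate trees and sequencing errors. Under these assumptions, the probability of seeing at least one cell of the clone among n cells is 1 - (1 - f)^n, and the smallest sufficient n follows directly. The function name and the example numbers are hypothetical.

```python
import math


def cells_needed(prevalence, confidence):
    """Smallest n with P(at least one cell of the clone among n) >= confidence.

    Assumes independent sampling of cells, a single clone of interest with
    the given prevalence, and no sequencing errors.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - prevalence)**n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


# Hypothetical example: a clone at 10% prevalence, 95% confidence.
n = cells_needed(prevalence=0.10, confidence=0.95)
print(n)  # 29
```

Rarer distinguishing clones drive the required number of cells up quickly, which is why prevalence estimates from bulk data are useful when planning the follow-up experiment.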

CIM-seq

10.21203/rs.3.pex-1365/v1 ◽

2021 ◽

Author(s):

Nathanael Andrews ◽

Martin Enge

Keyword(s):

Single Cell ◽

Single Cells ◽

Likelihood Estimation ◽

Cell Types ◽

Data Sets ◽

Target Tissue ◽

Data Set ◽

Rnaseq Data ◽

Cell Data

Abstract: CIM-seq is a tool for deconvoluting RNA-seq data from cell multiplets (clusters of two or more cells) in order to identify physically interacting cells in a given tissue. The method requires two RNA-seq data sets from the same tissue: one of single cells to be used as a reference, and one of cell multiplets to be deconvoluted. CIM-seq is compatible with both droplet-based sequencing methods, such as Chromium Single Cell 3′ Kits from 10x Genomics, and plate-based methods, such as Smart-seq2. The pipeline consists of three parts: 1) dissociation of the target tissue, FACS sorting of single cells and multiplets, and conventional scRNA-seq; 2) feature selection and clustering of cell types in the single-cell data set, generating a blueprint of transcriptional profiles in the given tissue; 3) computational deconvolution of multiplets through maximum likelihood estimation (MLE) to determine the most likely cell type constituents of each multiplet.
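The maximum likelihood step can be sketched with a toy model. This is not CIM-seq's implementation. It assumes that each constituent cell contributes equally to the multiplet, that read counts follow a multinomial distribution over genes, and that multiplets contain at most two cells. The cell type names, reference profiles, and counts are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def mixture(profiles, types):
    """Average the reference expression fractions of the given cell types."""
    n_genes = len(next(iter(profiles.values())))
    return [
        sum(profiles[t][g] for t in types) / len(types)
        for g in range(n_genes)
    ]


def log_likelihood(counts, probs):
    """Multinomial log-likelihood up to a constant that depends only on counts."""
    return sum(x * math.log(p) for x, p in zip(counts, probs))


def deconvolve(counts, profiles, max_cells=2):
    """Return the cell type combination that maximizes the likelihood."""
    best_types, best_ll = None, -math.inf
    for size in range(1, max_cells + 1):
        for types in combinations_with_replacement(sorted(profiles), size):
            ll = log_likelihood(counts, mixture(profiles, types))
            if ll > best_ll:
                best_types, best_ll = types, ll
    return best_types


# Hypothetical reference blueprint: expression fractions over three genes.
profiles = {
    "type_A": [0.70, 0.20, 0.10],
    "type_B": [0.10, 0.70, 0.20],
    "type_C": [0.10, 0.20, 0.70],
}
multiplet_counts = [40, 45, 15]  # hypothetical read counts for one multiplet
result = deconvolve(multiplet_counts, profiles)
print(result)  # ('type_A', 'type_B')
```

Exhaustive enumeration is only workable for a handful of cell types and small multiplets; with more types, the search over constituent combinations would need an optimization strategy.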
