CellCoal: Coalescent Simulation of Single-Cell Sequencing Samples

David Posada

Molecular Biology and Evolution · doi:10.1093/molbev/msaa025 · 2020 · Vol 37 (5) · pp. 1535-1542

Keyword(s): Single Cell · Single Cells · Software Tool · Coalescent Simulation · Sequencing Data · Single Nucleotide Variants · Allelic Dropout · Flexible Tool · Single Cell Sequencing · Somatic Evolution

Abstract: Our capacity to study individual cells has enabled a new level of resolution for understanding complex biological systems such as multicellular organisms or microbial communities. Not surprisingly, several methods have been developed in recent years with a formidable potential to investigate the somatic evolution of single cells in both healthy and pathological tissues. However, single-cell sequencing data can be quite noisy due to various technical biases, so inferences resulting from these new methods need to be carefully validated. Here, I introduce CellCoal, a software tool for the coalescent simulation of single-cell sequencing genotypes. CellCoal simulates the history of single-cell samples obtained from somatic cell populations with different demographic histories and produces single-nucleotide variants under a variety of mutation models, sequencing read counts, and genotype likelihoods, accounting for the allelic imbalance, allelic dropout, and amplification and sequencing errors typical of this type of data. CellCoal is a flexible tool that can be used to understand the implications of different somatic evolutionary processes at the single-cell level, and to benchmark dedicated bioinformatic tools for the analysis of single-cell sequencing data. CellCoal is available at https://github.com/dapogon/cellcoal.
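To make the core idea of coalescent simulation concrete, here is a minimal Python sketch. It is not CellCoal's code and covers only a small fraction of what the abstract lists. It assumes a constant population size, haploid 0/1 genotypes, and infinite sites, and uses the standard scaling in which k lineages coalesce at rate k(k-1)/2 and mutations accrue at rate θ/2 per lineage. Demography, diploidy, read counts, allelic dropout and errors are all left out. The function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Lineage:
    height: float     # time (coalescent units) of this node; 0 for sampled cells
    tips: frozenset   # indices of the sampled cells descending from this node


def simulate_genealogy(n_cells, rng):
    """Kingman coalescent under constant population size.

    Returns (tips, branch_length) for every non-root branch of the tree.
    """
    lineages = [Lineage(0.0, frozenset([i])) for i in range(n_cells)]
    branches = []
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time to the next coalescence
        a, b = rng.sample(range(k), 2)         # two distinct lineages, chosen uniformly
        for child in (lineages[a], lineages[b]):
            branches.append((child.tips, t - child.height))
        parent = Lineage(t, lineages[a].tips | lineages[b].tips)
        lineages = [x for i, x in enumerate(lineages) if i not in (a, b)]
        lineages.append(parent)
    return branches


def poisson_count(rate, length, rng):
    """Number of events of a Poisson process with the given rate on an interval."""
    if rate <= 0:
        return 0
    count, t = 0, rng.expovariate(rate)
    while t < length:
        count += 1
        t += rng.expovariate(rate)
    return count


def simulate_snv_matrix(n_cells, theta, seed=0):
    """Cells x SNVs 0/1 matrix under infinite sites (every mutation hits a new site).

    Mutations fall on each branch at rate theta/2 per coalescent time unit; a
    mutation marks every cell below its branch as carrying the derived allele.
    """
    rng = random.Random(seed)
    sites = []
    for tips, length in simulate_genealogy(n_cells, rng):
        sites.extend([tips] * poisson_count(theta / 2, length, rng))
    return [[1 if cell in tips else 0 for tips in sites] for cell in range(n_cells)]


matrix = simulate_snv_matrix(n_cells=6, theta=5.0, seed=42)
for row in matrix:
    print("".join(map(str, row)))
```

Because the root branch is never mutated, every simulated SNV is carried by at least one cell and missing from at least one. That nested pattern across columns is what noise-free data would look like before the technical errors described above are layered on top.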

MQuad enables clonal substructure discovery using single cell mitochondrial variants

doi:10.1101/2021.03.27.437331 · 2021

Author(s): Aaron Wing Cheung Kwok · Chen Qiao · Rongting Huang · Mai-Har Sham · Joshua W. K. Ho · ...

Keyword(s): DNA Sequencing · Single Cell · Single Cells · High Sensitivity · Copy Number Variations · Sequencing Data · Single Nucleotide · Single Cell Sequencing · mtDNA Variants · Python Package

Abstract: Mitochondrial mutations are increasingly recognised as informative endogenous genetic markers that can be used to reconstruct cellular clonal structure from single-cell RNA or DNA sequencing data. However, there is a lack of effective computational methods to identify informative mtDNA variants in noisy and sparse single-cell sequencing data. Here we present MQuad, an open-source computational tool that accurately calls clonally informative mtDNA variants in a population of single cells, together with an analysis suite for complete clonality inference based on single-cell RNA or DNA sequencing data. Using a variety of simulated and experimental single-cell sequencing data, we showed that MQuad can identify mitochondrial variants with both high sensitivity and high specificity, outperforming existing methods by a large margin. Furthermore, we demonstrated its wide applicability across different single-cell sequencing protocols, particularly in complementing single-nucleotide and copy-number variations to extract finer clonal resolution. MQuad is a Python package available via https://github.com/single-cell-genetics/MQuad.
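The abstract does not describe how MQuad decides that a variant is clonally informative. One generic way to pose the question is to ask whether per-cell allele counts are better explained by two binomial components (for example, cells with and without the variant) than by one, and to score this with a BIC difference. The sketch below illustrates that idea under our own assumptions: two components, EM from fixed starting values, and hypothetical function names. It should not be read as MQuad's implementation.

```python
import math


def _binom_loglik(ad, dp, p):
    """Binomial log-likelihood of ad ALT reads out of dp, without the log C(dp, ad) term.

    The dropped term is identical under every model fitted to the same data,
    so it cancels in the BIC difference computed below.
    """
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at p = 0 or 1
    return ad * math.log(p) + (dp - ad) * math.log(1 - p)


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def loglik_one_component(ad, dp):
    p = sum(ad) / sum(dp)  # maximum-likelihood allele frequency shared by all cells
    return sum(_binom_loglik(a, d, p) for a, d in zip(ad, dp))


def loglik_two_components(ad, dp, n_iter=100):
    """Fit a two-component binomial mixture by EM and return its log-likelihood."""
    p, w = [0.05, 0.5], [0.5, 0.5]  # arbitrary starting values (hypothetical choice)
    for _ in range(n_iter):
        # E-step: posterior membership of each cell in each component
        resp = []
        for a, d in zip(ad, dp):
            logs = [math.log(max(w[k], 1e-12)) + _binom_loglik(a, d, p[k]) for k in range(2)]
            norm = _logsumexp(logs)
            resp.append([math.exp(x - norm) for x in logs])
        # M-step: update mixture weights and per-component allele frequencies
        for k in range(2):
            rk = [r[k] for r in resp]
            w[k] = sum(rk) / len(rk)
            depth = sum(r * d for r, d in zip(rk, dp))
            p[k] = sum(r * a for r, a in zip(rk, ad)) / depth if depth > 0 else p[k]
    return sum(
        _logsumexp([math.log(max(w[k], 1e-12)) + _binom_loglik(a, d, p[k]) for k in range(2)])
        for a, d in zip(ad, dp)
    )


def delta_bic(ad, dp):
    """BIC(1 component) - BIC(2 components); large positive values favour two groups of cells."""
    n = len(ad)
    bic1 = 1 * math.log(n) - 2 * loglik_one_component(ad, dp)
    bic2 = 3 * math.log(n) - 2 * loglik_two_components(ad, dp)
    return bic1 - bic2


bimodal = delta_bic([0] * 20 + [45] * 20, [50] * 40)  # half the cells carry the variant
uniform = delta_bic([5] * 40, [50] * 40)               # every cell looks the same
print(bimodal > 0, uniform < 0)  # True True
```

The BIC penalty is what stops the two-component model from winning by default. On uniform data its extra two parameters buy no likelihood gain, so the difference comes out negative.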

Cellsnp-lite: an efficient tool for genotyping single cells

doi:10.1101/2020.12.31.424913 · 2021

Author(s): Xianjie Huang · Yuanhua Huang

Keyword(s): Single Cell · Single Cells · Basic Research · Substantial Improvement · Data Sets · Sequencing Data · Single Cell Sequencing · Memory Efficiency · Computational Speed · Cell Data

Abstract. Summary: Single-cell sequencing is an increasingly used technology with promising applications in basic research and clinical translation. However, genotyping methods developed for bulk sequencing data have not been well adapted to single-cell data, in terms of either computational parallelization or a simplified user interface. Here we introduce cellsnp-lite, a software tool implemented in C/C++ and based on the well-supported htslib package, for genotyping single-cell sequencing data from both droplet-based and well-based platforms. On various experimental data sets, it shows substantial improvements in computational speed and memory efficiency while retaining highly concordant results compared with existing methods. Cellsnp-lite therefore lightens the genetic analysis of increasingly large single-cell data. Availability: The source code is freely available at https://github.com/single-cell-genetics/[…]
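The abstract does not describe cellsnp-lite's internals. To illustrate what per-cell genotyping of droplet data involves at a single SNP, the sketch below collapses reads by (cell barcode, UMI) and tallies allele counts per cell. Input tuples stand in for aligned reads, which a real tool would read from BAM files through htslib. The AD/DP/OTH naming is an assumption modelled on common VCF-style fields, and `count_alleles` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def count_alleles(reads, ref, alt):
    """Per-cell allele counts at one SNP position.

    reads: iterable of (cell_barcode, umi, base) for every read covering the SNP.
    Reads sharing a (cell, UMI) pair are collapsed to their majority base, so each
    molecule is counted once. Returns three Counters keyed by cell barcode:
    ad (ALT molecules), dp (REF + ALT molecules), oth (molecules with any other base).
    """
    bases_per_molecule = defaultdict(Counter)
    for cell, umi, base in reads:
        bases_per_molecule[(cell, umi)][base] += 1

    ad, dp, oth = Counter(), Counter(), Counter()
    for (cell, _umi), bases in bases_per_molecule.items():
        base = bases.most_common(1)[0][0]  # ties resolve to the base seen first
        if base == alt:
            ad[cell] += 1
            dp[cell] += 1
        elif base == ref:
            dp[cell] += 1
        else:
            oth[cell] += 1
    return ad, dp, oth


reads = [
    ("AAACCTG", "u1", "A"), ("AAACCTG", "u1", "A"), ("AAACCTG", "u2", "G"),
    ("TTTGCAC", "u1", "G"), ("TTTGCAC", "u3", "T"),
]
ad, dp, oth = count_alleles(reads, ref="A", alt="G")
print(dict(ad), dict(dp), dict(oth))
```

Collapsing by UMI before counting stops PCR duplicates of one molecule from inflating a cell's apparent depth. Well-based data without UMIs could pass a unique read identifier in that slot instead.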

SNV identification from single-cell RNA sequencing data

Human Molecular Genetics · doi:10.1093/hmg/ddz207 · 2019 · Vol 28 (21) · pp. 3569-3583

Author(s): Patricia M Schnepp · Mengjie Chen · Evan T Keller · Xiang Zhou

Keyword(s): DNA Sequencing · Single Cell · RNA Sequencing · Single Cells · Specific Gene · Sequencing Data · Single Nucleotide Variants · Single Cell RNA Sequencing · Sequencing Studies · Genomic Regions

Abstract: Integrating single-cell RNA sequencing (scRNA-seq) data with genotypes obtained from DNA sequencing studies facilitates the detection of functional genetic variants underlying cell type-specific gene expression variation. Unfortunately, most existing scRNA-seq studies do not come with DNA sequencing data; thus, being able to call single nucleotide variants (SNVs) from scRNA-seq data alone can provide crucial complementary information, aid the detection of functional SNVs, and maximize the potential of existing scRNA-seq studies. Here, we perform extensive analyses to evaluate the utility of two SNV calling pipelines (GATK and Monovar), originally designed for SNV calling in bulk and single-cell DNA sequencing data, respectively. In both pipelines, we examined various parameter settings to determine the accuracy of the final SNV call set and provide practical recommendations for applied analysts. We found that combining all reads from the single cells and following the GATK Best Practices resulted in the highest number of SNVs identified with high concordance. In individual single cells, Monovar yielded higher-quality SNVs, although none of the pipelines analyzed was capable of calling a reasonable number of SNVs with high accuracy. In addition, we found that SNV calling quality varies across functional genomic regions. Our results open the door to new ways of leveraging scRNA-seq for the future investigation of SNV function.

Linked-read analysis identifies mutations in single cell DNA sequencing data

doi:10.1101/211169 · 2017

Author(s): Craig L. Bohrson · Allison R. Barton · Michael A. Lodato · Rachel E. Rodin · Vinay Viswanadham · ...

Keyword(s): Single Cell · DNA Isolation · Single Cells · Mutation Rates · Accurate Estimation · Single Cell Level · Sequencing Data · Single Nucleotide Variants · Cell Level · Single Nucleotide

Abstract: Whole-genome sequencing of DNA from single cells has the potential to reshape our understanding of mutational heterogeneity in normal and diseased tissues. A major difficulty, however, is distinguishing true mutations from artifactual mutations that arise during DNA isolation and amplification. Here, we describe linked-read analysis (LiRA), a method that uses the phasing of somatic single nucleotide variants with nearby germline variants to identify true mutations, thereby allowing accurate estimation of somatic mutation rates at the single-cell level.
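LiRA's statistical model is more involved than the abstract can convey. The sketch below shows only the phasing logic it alludes to. A genuine somatic SNV arose on one of the two parental haplotypes, so among reads that also cover a nearby heterozygous germline site, the ALT allele should appear with one germline allele only, and the other haplotype should show the reference. The category names and `classify_candidate` are hypothetical. The simplification ignores amplification errors that happen to fall on a single haplotype, which a real method must still handle.

```python
from collections import Counter


def classify_candidate(linked_reads, ref, alt, germ_a, germ_b):
    """Classify a candidate somatic SNV from reads that also cover a nearby germline het.

    linked_reads: iterable of (germline_base, candidate_base), one tuple per read.
    germ_a, germ_b: the two germline alleles, i.e. the two haplotypes.
    """
    counts = Counter(linked_reads)
    haps_with_alt = [g for g in (germ_a, germ_b) if counts[(g, alt)] > 0]
    if not haps_with_alt:
        return "no linked support"
    if len(haps_with_alt) == 2:
        # A real mutation arises on one haplotype; ALT on both suggests an artifact.
        return "discordant"
    other = germ_b if haps_with_alt[0] == germ_a else germ_a
    if counts[(other, ref)] == 0:
        # The other haplotype is unobserved (e.g. allelic dropout), so it cannot be confirmed.
        return "insufficient"
    return "concordant"


reads = [("A", "T")] * 5 + [("G", "C")] * 4  # ALT T only with germline A; G carries REF C
print(classify_candidate(reads, ref="C", alt="T", germ_a="A", germ_b="G"))
```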

Conbase: a software for unsupervised discovery of clonal somatic mutations in single cells through read phasing

doi:10.1101/259994 · 2018

Author(s): Joanna Hård · Ezeddin Al Håkim · Marie Kindblom · Åsa K. Björklund · Ilke Demirci · ...

Keyword(s): Somatic Mutations · Single Cells · Read Depth · Sequencing Data · Single Nucleotide Variants · Allelic Dropout · Healthy Human · Human Donor

Abstract: Here we report the development of Conbase, a software application for identifying somatic mutations in single-cell DNA sequencing data with high rates of allelic dropout and low read depth. Conbase leverages data from multiple samples in a dataset and uses read phasing to call somatic single nucleotide variants and to accurately predict the genotypes of whole-genome amplified single cells at somatic variant loci. We demonstrate the accuracy of Conbase on simulated datasets, in vitro expanded fibroblasts, and in vivo clonally expanded lymphocyte populations isolated directly from a healthy human donor.

Accurate and scalable variant calling from single cell DNA sequencing data with ProSolo

Nature Communications · doi:10.1038/s41467-021-26938-w · 2021 · Vol 12 (1)

Author(s): David Lähnemann · Johannes Köster · Ute Fischer · Arndt Borkhardt · Alice C. McHardy · ...

Keyword(s): DNA Sequencing · Single Cell · Single Cells · Variant Calling · Sequencing Data · Computationally Efficient · Single Nucleotide Variants · Efficient Manner · Single Nucleotide · Amplification Bias

Abstract: Accurate single-cell mutational profiles can reveal genomic cell-to-cell heterogeneity. However, sequencing libraries suitable for genotyping require whole-genome amplification, which introduces allelic bias and copy errors. The resulting data violate the assumptions of variant callers developed for bulk sequencing. Thus, only dedicated models accounting for amplification bias and errors can provide accurate calls. We present ProSolo for calling single nucleotide variants from multiple displacement amplified (MDA) single-cell DNA sequencing data. ProSolo probabilistically models a single cell jointly with a bulk sequencing sample and integrates all relevant MDA biases in a site-specific and scalable (because computationally efficient) manner. This achieves higher accuracy in calling and genotyping single nucleotide variants in single cells than state-of-the-art tools and supports imputation of insufficiently covered genotypes when downstream tools cannot handle missing data. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly. ProSolo is implemented in an extendable framework, with code and usage at: https://github.com/prosolo/prosolo
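The abstract does not say how ProSolo controls the false discovery rate. A standard Bayesian way to do it, given a posterior error probability (PEP) for each call, is sketched below. It is a generic technique, not necessarily ProSolo's exact procedure, and `select_calls_at_fdr` is a hypothetical name. The expected FDR of any reported set equals the mean PEP of its members. Taking calls in order of increasing PEP therefore gives the largest set whose mean stays at or below the target α.

```python
def select_calls_at_fdr(pep, alpha):
    """Return indices of calls to report so that the expected FDR is at most alpha.

    pep[i] is the posterior probability that call i is a false discovery. For any
    set size, the smallest achievable mean PEP comes from the calls with the
    lowest PEPs, so scanning calls in increasing-PEP order finds the largest
    set whose mean PEP (its expected FDR) stays within the bound.
    """
    order = sorted(range(len(pep)), key=lambda i: pep[i])
    total, n_selected = 0.0, 0
    for rank, i in enumerate(order, start=1):
        total += pep[i]
        if total / rank > alpha:
            break  # the running mean can only grow from here, since PEPs are sorted
        n_selected = rank
    return sorted(order[:n_selected])


print(select_calls_at_fdr([0.01, 0.5, 0.02, 0.2, 0.04], alpha=0.05))  # [0, 2, 4]
```

In the example, the three lowest PEPs average about 0.023. Adding the 0.2 call would push the expected FDR to about 0.068, so selection stops at three calls.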

SCIΦ: Single-cell mutation identification via phylogenetic inference

doi:10.1101/290908 · 2018

Author(s): Jochen Singer · Jack Kuipers · Katharina Jahn · Niko Beerenwinkel

Keyword(s): Single Cell · Lymphoblastic Leukemia · Evolutionary Relationship · Simulated Data · Error Rates · Cancer Therapies · Sequencing Data · Allelic Dropout · Single Cell Sequencing · Real World Datasets

Abstract: Understanding the evolution of cancer is important for the development of appropriate cancer therapies. The task is challenging because tumors evolve as heterogeneous cell populations with an unknown number of genetically distinct subclones of varying frequencies. Conventional approaches based on bulk sequencing are limited in addressing this challenge, as clones cannot be observed directly. Single-cell sequencing holds the promise of resolving the heterogeneity of tumors; however, it has its own challenges, including elevated error rates, allelic dropout, and uneven coverage. Here, we develop a new approach to mutation detection in individual tumor cells that leverages the evolutionary relationships among cells. Our method, called SCIΦ, jointly calls mutations in individual cells and estimates the tumor phylogeny among these cells. Employing a Markov chain Monte Carlo scheme, we robustly account for the various sources of noise in single-cell sequencing data. Our approach enables us to reliably call mutations in each single cell, even in experiments with high dropout rates and missing data. We show that SCIΦ outperforms existing methods on simulated data and apply it to two real-world datasets: a whole-exome breast cancer dataset and a panel-sequencing acute lymphoblastic leukemia dataset. Availability: https://github.com/cbg-ethz/SCIPhI

rCASC: reproducible Classification Analysis of Single Cell sequencing data

doi:10.1101/430967 · 2018

Author(s): Luca Alessandrì · Marco Beccuti · Maddalena Arigoni · Martina Olivero · Greta Romano · ...

Keyword(s): Single Cell · Single Cells · R Package · Cellular Heterogeneity · Supplementary Information · Sequencing Data · Single Cell Sequencing · Analysis Workflow · User Friendly · Bioinformatics Workflows

Abstract. Summary: Single-cell RNA sequencing has emerged as an essential tool to investigate cellular heterogeneity and to highlight cell sub-population-specific signatures. Dedicated and user-friendly bioinformatics workflows are now required to exploit the deconvolution of single-cell transcriptomes. Furthermore, there is a growing need for bioinformatics workflows that guarantee both functional reproducibility (i.e., saving information about the data and analysis parameters) and computational reproducibility (i.e., storing an image of the actual computation environment). Here, we present rCASC, a modular RNA-seq analysis workflow that supports data analysis from count generation to the identification of cell sub-population signatures, guaranteeing both functional and computational reproducibility. Availability and Implementation: rCASC is part of the reproducible bioinformatics project. rCASC is a Docker-based application controlled by an R package available at https://github.com/kendomaniac/rCASC. Supplementary information: Supplementary data are available at the rCASC GitHub repository.

ProSolo: Accurate Variant Calling from Single Cell DNA Sequencing Data

doi:10.1101/2020.04.27.064071 · 2020

Author(s): David Lähnemann · Johannes Köster · Ute Fischer · Arndt Borkhardt · Alice C. McHardy · ...

Keyword(s): DNA Sequencing · Single Cell · Single Cells · Variant Calling · Sequencing Data · Single Nucleotide Variants · Single Nucleotide · Biologically Relevant · Amplification Bias · Missing Genotypes

Abstract: Obtaining accurate mutational profiles from single-cell DNA is essential for analyzing genomic cell-to-cell heterogeneity at the finest level of resolution. However, sequencing libraries suitable for genotyping require whole-genome amplification, which introduces allelic bias and copy errors. As a result, single-cell DNA sequencing data violate the assumptions of variant callers developed for bulk sequencing, which, when applied to single cells, generate significant numbers of false positives and false negatives. Only dedicated models accounting for amplification bias and errors will be able to provide more accurate calls. We present ProSolo, a probabilistic model for calling single nucleotide variants from multiple displacement amplified single-cell DNA sequencing data. It introduces a mechanistically motivated empirical model of amplification bias that improves the quantification of genotyping uncertainty. To account for amplification errors, it jointly models the single-cell sample with a bulk sequencing sample from the same cell population, which also enables a biologically relevant imputation of missing genotypes for the single cell. Through these innovations, ProSolo achieves substantially higher performance in calling and genotyping single nucleotide variants in single cells than all state-of-the-art tools. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly, not only for single nucleotide variant calls but also for artefacts of single-cell methodology that one may wish to identify, such as allele dropout. ProSolo’s model is implemented in a flexible framework, encouraging extensions. The source code and usage instructions are available at: https://github.com/prosolo/prosolo

Single-cell tumor phylogeny inference with copy-number constrained mutation losses

doi:10.1101/840355 · 2019

Author(s): Gryte Satas · Simone Zaccaria · Geoffrey Mon · Benjamin J. Raphael

Keyword(s): Single Cell · Copy Number · Phylogenetic Trees · Colorectal Cancer Patient · Simulated Data · Cell Tumor · Tumor Evolution · Sequencing Data · Single Nucleotide Variants · Single Cell Sequencing

Abstract. Motivation: Single-cell DNA sequencing enables the measurement of somatic mutations in individual tumor cells and provides data to reconstruct the evolutionary history of the tumor. Nearly all existing methods to construct phylogenetic trees from single-cell sequencing data use single-nucleotide variants (SNVs) as markers. However, most solid tumors contain copy-number aberrations (CNAs) that can overlap loci containing SNVs. Particularly problematic are CNAs that delete an SNV, thus returning the SNV locus to the unmutated state. Such mutation losses are allowed in some models of SNV evolution, but these models are generally too permissive, allowing mutation losses without evidence of a CNA overlapping the locus. Results: We introduce a novel loss-supported evolutionary model, a generalization of the infinite sites and Dollo models, that constrains mutation losses to loci with evidence of a decrease in copy number. We design a new algorithm, Single-Cell Algorithm for Reconstructing the Loss-supported Evolution of Tumors (Scarlet), that infers phylogenies from single-cell tumor sequencing data using the loss-supported model and a probabilistic model of sequencing errors and allele dropout. On simulated data, we show that Scarlet outperforms current single-cell phylogeny methods, recovering more accurate trees and correcting errors in SNV data. On single-cell sequencing data from a metastatic colorectal cancer patient, Scarlet constructs a phylogeny that is more consistent with the observed copy-number data and also reveals a simpler monoclonal seeding of the metastasis, contrasting with published reports of polyclonal seeding in this patient. Scarlet substantially improves single-cell phylogeny inference in tumors with CNAs, yielding new insights into the analysis of tumor evolution. Availability: Software is available at github.com/raphael-group/[…]
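As a concrete reading of the loss-supported constraint described above, here is a simplified checker. For a tree whose edges are annotated with SNV gains and losses and whose nodes carry segment copy numbers, it reports any SNV gained more than once (the infinite-sites part) and any loss on an edge where the SNV's segment does not decrease in copy number. This is not Scarlet's algorithm. It infers no tree and has no error model, and it also skips checking that each loss lies below the corresponding gain. `Edge` and `loss_supported_violations` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Edge:
    parent: str
    child: str
    gained: set = field(default_factory=set)  # SNVs acquired on this edge
    lost: set = field(default_factory=set)    # SNVs lost on this edge


def loss_supported_violations(edges, copy_number, snv_segment):
    """List violations of a simplified loss-supported model.

    copy_number[node][segment] -> integer copy number of that segment in that node.
    snv_segment[snv] -> the segment that contains the SNV.
    Checks two constraints: each SNV is gained at most once (infinite sites), and
    every loss happens on an edge where the SNV's segment decreases in copy number.
    """
    violations = []
    gains = Counter(snv for e in edges for snv in e.gained)
    for snv, n in sorted(gains.items()):
        if n > 1:
            violations.append(f"{snv} gained {n} times")
    for e in edges:
        for snv in sorted(e.lost):
            seg = snv_segment[snv]
            if copy_number[e.child][seg] >= copy_number[e.parent][seg]:
                violations.append(f"{snv} lost on {e.parent}->{e.child} without a copy-number decrease")
    return violations


edges = [Edge("root", "A", gained={"snv1"}), Edge("A", "B", lost={"snv1"})]
cn = {"root": {"seg1": 2}, "A": {"seg1": 2}, "B": {"seg1": 1}}
print(loss_supported_violations(edges, cn, {"snv1": "seg1"}))  # []
cn["B"]["seg1"] = 2  # without the copy-number drop, the loss is no longer supported
print(loss_supported_violations(edges, cn, {"snv1": "seg1"}))
```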
