SiCloneFit: Bayesian inference of population structure, genotype,and phylogeny of tumor clones from single-cell genome sequencing data

Mapping Intimacies ◽

10.1101/394262 ◽

2018 ◽

Cited By ~ 3

Author(s):

Hamim Zafar ◽

Nicholas Navin ◽

Ken Chen ◽

Luay Nakhleh

Keyword(s):

Single Cell ◽

Single Cells ◽

Evolutionary Relationship ◽

Sequencing Data ◽

Allelic Dropout ◽

Site Model ◽

Joint Reconstruction ◽

Technical Errors ◽

Gibbs Sampling Algorithm ◽

Cell Genome

AbstractAccumulation and selection of somatic mutations in a Darwinian framework result in intra-tumor heterogeneity (ITH) that poses significant challenges to the diagnosis and clinical therapy of cancer. Identification of the tumor cell populations (clones) and reconstruction of their evolutionary relationship can elucidate this heterogeneity. Recently developed single-cell DNA sequencing (SCS) technologies promise to resolve ITH to a single-cell level. However, technical errors in SCS datasets, including false-positives (FP), false-negatives (FN) due to allelic dropout and cell doublets, significantly complicate these tasks. Here, we propose a non-parametric Bayesian method that reconstructs the clonal populations as clusters of single cells, genotypes of each clone and the evolutionary relationships between the clones. It employs a tree-structured Chinese restaurant process as the prior on the number and composition of clonal populations. The evolution of the clonal populations is modeled by a clonal phylogeny and a finite-site model of evolution to account for potential mutation recurrence and losses. We probabilistically account for FP and FN errors, and cell doublets are modeled by employing a Beta-binomial distribution. We develop a Gibbs sampling algorithm comprising of partial reversible-jump and partial Metropolis-Hastings updates to explore the joint posterior space of all parameters. The performance of our method on synthetic and experimental datasets suggests that joint reconstruction of tumor clones and clonal phylogeny under a finite-site model of evolution leads to more accurate inferences. Our method is the first to enable this joint reconstruction in a fully Bayesian framework, thus providing measures of support of the inferences it makes.

Download Full-text

SCIΦ: Single-cell mutation identification via phylogenetic inference

10.1101/290908 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jochen Singer ◽

Jack Kuipers ◽

Katharina Jahn ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

Lymphoblastic Leukemia ◽

Evolutionary Relationship ◽

Simulated Data ◽

Error Rates ◽

Cancer Therapies ◽

Sequencing Data ◽

Allelic Dropout ◽

Single Cell Sequencing ◽

Real World Datasets

AbstractUnderstanding the evolution of cancer is important for the development of appropriate cancer therapies. The task is challenging because tumors evolve as heterogeneous cell populations with an unknown number of genetically distinct subclones of varying frequencies. Conventional approaches based on bulk sequencing are limited in addressing this challenge as clones cannot be observed directly. Single-cell sequencing holds the promise of resolving the heterogeneity of tumors; however, it has its own challenges including elevated error rates, allelic dropout, and uneven coverage. Here, we develop a new approach to mutation detection in individual tumor cells by leveraging the evolutionary relationship among cells. Our method, called SCIΦ, jointly calls mutations in individual cells and estimates the tumor phylogeny among these cells. Employing a Markov Chain Monte Carlo scheme we robustly account for the various sources of noise in single-cell sequencing data. Our approach enables us to reliably call mutations in each single cell even in experiments with high dropout rates and missing data. We show that SCIΦ outperforms existing methods on simulated data and applied it to different real-world datasets, namely a whole exome breast cancer as well as a panel acute lymphoblastic leukemia dataset. Availability: https://github.com/cbg-ethz/SCIPhI

Download Full-text

CellCoal: Coalescent Simulation of Single-Cell Sequencing Samples

Molecular Biology and Evolution ◽

10.1093/molbev/msaa025 ◽

2020 ◽

Vol 37 (5) ◽

pp. 1535-1542 ◽

Cited By ~ 1

Author(s):

David Posada

Keyword(s):

Single Cell ◽

Single Cells ◽

Software Tool ◽

Coalescent Simulation ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Allelic Dropout ◽

Flexible Tool ◽

Single Cell Sequencing ◽

Somatic Evolution

Abstract Our capacity to study individual cells has enabled a new level of resolution for understanding complex biological systems such as multicellular organisms or microbial communities. Not surprisingly, several methods have been developed in recent years with a formidable potential to investigate the somatic evolution of single cells in both healthy and pathological tissues. However, single-cell sequencing data can be quite noisy due to different technical biases, so inferences resulting from these new methods need to be carefully contrasted. Here, I introduce CellCoal, a software tool for the coalescent simulation of single-cell sequencing genotypes. CellCoal simulates the history of single-cell samples obtained from somatic cell populations with different demographic histories and produces single-nucleotide variants under a variety of mutation models, sequencing read counts, and genotype likelihoods, considering allelic imbalance, allelic dropout, amplification, and sequencing errors, typical of this type of data. CellCoal is a flexible tool that can be used to understand the implications of different somatic evolutionary processes at the single-cell level, and to benchmark dedicated bioinformatic tools for the analysis of single-cell sequencing data. CellCoal is available at https://github.com/dapogon/cellcoal.

Download Full-text

Quality assessment of single-cell RNA sequencing data by coverage skewness analysis

10.1101/2019.12.31.890269 ◽

2019 ◽

Author(s):

Imad Abugessaisa ◽

Shuhei Noguchi ◽

Melissa Cardon ◽

Akira Hasegawa ◽

Kazuhide Watanabe ◽

...

Keyword(s):

Quality Assessment ◽

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Assessment Method ◽

Poor Quality ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Gene Coverage ◽

The Impact

AbstractAnalysis and interpretation of single-cell RNA-sequencing (scRNA-seq) experiments are compromised by the presence of poor quality cells. For meaningful analyses, such poor quality cells should be excluded to avoid biases and large variation. However, no clear guidelines exist. We introduce SkewC, a novel quality-assessment method to identify poor quality single-cells in scRNA-seq experiments. The method is based on the assessment of gene coverage for each single cell and its skewness as a quality measure. To validate the method, we investigated the impact of poor quality cells on downstream analyses and compared biological differences between typical and poor quality cells. Moreover, we measured the ratio of intergenic expression, suggesting genomic contamination, and foreign organism contamination of single-cell samples. SkewC is tested in 37,993 single-cells generated by 15 scRNA-seq protocols. We envision SkewC as an indispensable QC method to be incorporated into scRNA-seq experiment to preclude the possibility of scRNA-seq data misinterpretation.

Download Full-text

SCELLECTOR: ranking amplification bias in single cells using shallow sequencing

BMC Bioinformatics ◽

10.1186/s12859-020-03858-y ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Vivekananda Sarangi ◽

Alexandre Jourdon ◽

Taejeong Bae ◽

Arijit Panda ◽

Flora Vaccarino ◽

...

Keyword(s):

Single Cell ◽

Multiple Displacement Amplification ◽

Single Cells ◽

Sequencing Data ◽

High Coverage ◽

Amplification Bias ◽

Single Cell Profiling ◽

High Concordance ◽

Human Neuronal Cells

Abstract Background The study of mosaic mutation is important since it has been linked to cancer and various disorders. Single cell sequencing has become a powerful tool to study the genome of individual cells for the detection of mosaic mutations. The amount of DNA in a single cell needs to be amplified before sequencing and multiple displacement amplification (MDA) is widely used owing to its low error rate and long fragment length of amplified DNA. However, the phi29 polymerase used in MDA is sensitive to template fragmentation and presence of sites with DNA damage that can lead to biases such as allelic imbalance, uneven coverage and over representation of C to T mutations. It is therefore important to select cells with uniform amplification to decrease false positives and increase sensitivity for mosaic mutation detection. Results We propose a method, Scellector (single cell selector), which uses haplotype information to detect amplification quality in shallow coverage sequencing data. We tested Scellector on single human neuronal cells, obtained in vitro and amplified by MDA. Qualities were estimated from shallow sequencing with coverage as low as 0.3× per cell and then confirmed using 30× deep coverage sequencing. The high concordance between shallow and high coverage data validated the method. Conclusion Scellector can potentially be used to rank amplifications obtained from single cell platforms relying on a MDA-like amplification step, such as Chromium Single Cell profiling solution.

Download Full-text

Estimating the Allele-Specific Expression of SNVs From 10× Genomics Single-Cell RNA-Sequencing Data

Genes ◽

10.3390/genes11030240 ◽

2020 ◽

Vol 11 (3) ◽

pp. 240 ◽

Cited By ~ 2

Author(s):

Prashant N. M. ◽

Hongyu Liu ◽

Pavlos Bousounis ◽

Liam Spurr ◽

Nawaf Alomran ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Sequencing Data ◽

Specific Expression ◽

Single Nucleotide ◽

Healthy Donors ◽

Allele Expression ◽

Single Cell Rna Sequencing ◽

Allele Specific

With the recent advances in single-cell RNA-sequencing (scRNA-seq) technologies, the estimation of allele expression from single cells is becoming increasingly reliable. Allele expression is both quantitative and dynamic and is an essential component of the genomic interactome. Here, we systematically estimate the allele expression from heterozygous single nucleotide variant (SNV) loci using scRNA-seq data generated on the 10×Genomics Chromium platform. We analyzed 26,640 human adipose-derived mesenchymal stem cells (from three healthy donors), sequenced to an average of 150K sequencing reads per cell (more than 4 billion scRNA-seq reads in total). High-quality SNV calls assessed in our study contained approximately 15% exonic and >50% intronic loci. To analyze the allele expression, we estimated the expressed variant allele fraction (VAFRNA) from SNV-aware alignments and analyzed its variance and distribution (mono- and bi-allelic) at different minimum sequencing read thresholds. Our analysis shows that when assessing positions covered by a minimum of three unique sequencing reads, over 50% of the heterozygous SNVs show bi-allelic expression, while at a threshold of 10 reads, nearly 90% of the SNVs are bi-allelic. In addition, our analysis demonstrates the feasibility of scVAFRNA estimation from current scRNA-seq datasets and shows that the 3′-based library generation protocol of 10×Genomics scRNA-seq data can be informative in SNV-based studies, including analyses of transcriptional kinetics.

Download Full-text

Scirpy: A Scanpy extension for analyzing single-cell T-cell receptor sequencing data

10.1101/2020.04.10.035865 ◽

2020 ◽

Author(s):

Gregor Sturm ◽

Tamas Szabo ◽

Georgios Fotakis ◽

Marlene Haider ◽

Dietmar Rieder ◽

...

Keyword(s):

T Cell ◽

Single Cell ◽

Large Scale ◽

Single Cells ◽

Cell Receptor ◽

Sequencing Data ◽

Seamless Integration ◽

T Cell Phenotypes ◽

Cell Phenotypes

AbstractSummaryAdvances in single-cell technologies have enabled the investigation of T cell phenotypes and repertoires at unprecedented resolution and scale. Bioinformatic methods for the efficient analysis of these large-scale datasets are instrumental for advancing our understanding of adaptive immune responses in cancer, but also in infectious diseases like COVID-19. However, while well-established solutions are accessible for the processing of single-cell transcriptomes, no streamlined pipelines are available for the comprehensive characterization of T cell receptors. Here we propose Scirpy, a scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data.Availability and implementationScirpy source code and documentation are available at https://github.com/icbi-lab/scirpy.

Download Full-text

Gene set inference from single-cell sequencing data using a hybrid of matrix factorization and variational autoencoders

10.1101/740415 ◽

2019 ◽

Author(s):

Soeren Lukassen ◽

Foo Wei Ten ◽

Roland Eils ◽

Christian Conrad

Keyword(s):

Neural Network ◽

Single Cell ◽

Network Model ◽

Neural Network Model ◽

Matrix Factorization ◽

Latent Variable ◽

Single Cells ◽

Sequencing Data ◽

Gene Set ◽

Gene Sets

AbstractRecent advances in single-cell RNA sequencing (scRNA-Seq) have driven the simultaneous measurement of the expression of 1,000s of genes in 1,000s of single cells. These growing data sets allow us to model gene sets in biological networks at an unprecedented level of detail, in spite of heterogenous cell populations. Here, we propose an unsupervised deep neural network model that is a hybrid of matrix factorization and conditional variational autoencoders (CVA), which utilizes weights as matrix factorizations to obtain gene sets, while class-specific inputs to the latent variable space facilitate a plausible identification of cell types. This artificial neural network model seamlessly integrates functional gene set inference, experimental batch effect correction, and static gene identification, which we conceptually prove here for three single-cell RNA-Seq datasets and suggest for future single-cell-gene analytics.

Download Full-text

Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data

10.1101/2020.10.14.340299 ◽

2020 ◽

Author(s):

Junpeng Zhang ◽

Lin Liu ◽

Taosheng Xu ◽

Wu Zhang ◽

Chunwen Zhao ◽

...

Keyword(s):

Single Cell ◽

Regulatory Networks ◽

Single Cells ◽

Small Scale ◽

Mirna Regulation ◽

Sequencing Data ◽

Resolution Level ◽

Novel Strategy ◽

Cell Cell

AbstractBackgroundExisting computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allows the analysis of miRNA regulation regarding a group of cells, rather than the miRNA regulation unique to individual cells. Recent advance in single-cell miRNA-mRNA co-sequencing technology has opened a way for investigating miRNA regulation at single-cell level. However, as currently single-cell miRNA-mRNA co-sequencing data is just emerging and only available at small-scale, there is a strong need of novel methods to exploit existing single-cell data for the study of cell-specific miRNA regulation.ResultsIn this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation) to use single-cell miRNA-mRNA co-sequencing data to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to the miRNA-mRNA co-sequencing data in 19 K562 single-cells to identify cell-specific miRNA-mRNA regulatory networks to understand miRNA regulation in each K562 single-cell. By analyzing the obtained cell-specific miRNA-mRNA regulatory networks, we observe that the miRNA regulation in each K562 single-cell is unique. Moreover, we conduct detailed analysis on the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. Finally, through exploring cell-cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single-cells to help understand cell-cell crosstalk.ConclusionsTo the best of our knowledge, CSmiR is the first method to explore miRNA regulation at a single-cell resolution level, and we believe that it can be a useful method to enhance the understanding of cell-specific miRNA regulation.

Download Full-text

MQuad enables clonal substructure discovery using single cell mitochondrial variants

10.1101/2021.03.27.437331 ◽

2021 ◽

Author(s):

Aaron Wing Cheung Kwok ◽

Chen Qiao ◽

Rongting Huang ◽

Mai-Har Sham ◽

Joshua W. K. Ho ◽

...

Keyword(s):

Dna Sequencing ◽

Single Cell ◽

Single Cells ◽

High Sensitivity ◽

Copy Number Variations ◽

Sequencing Data ◽

Single Nucleotide ◽

Single Cell Sequencing ◽

Mtdna Variants ◽

Python Package

AbstractMitochondrial mutations are increasingly recognised as informative endogenous genetic markers that can be used to reconstruct cellular clonal structure using single-cell RNA or DNA sequencing data. However, there is a lack of effective computational methods to identify informative mtDNA variants in noisy and sparse single-cell sequencing data. Here we present an open source computational tool MQuad that accurately calls clonally informative mtDNA variants in a population of single cells, and an analysis suite for complete clonality inference, based on single cell RNA or DNA sequencing data. Through a variety of simulated and experimental single cell sequencing data, we showed that MQuad can identify mitochondrial variants with both high sensitivity and specificity, outperforming existing methods by a large extent. Furthermore, we demonstrated its wide applicability in different single cell sequencing protocols, particularly in complementing single-nucleotide and copy-number variations to extract finer clonal resolution. MQuad is a Python package available via https://github.com/single-cell-genetics/MQuad.

Download Full-text

Cellsnp-lite: an efficient tool for genotyping single cells

10.1101/2020.12.31.424913 ◽

2021 ◽

Author(s):

Xianjie Huang ◽

Yuanhua Huang

Keyword(s):

Single Cell ◽

Single Cells ◽

Basic Research ◽

Substantial Improvement ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Memory Efficiency ◽

Computational Speed ◽

Cell Data

AbstractSummarySingle-cell sequencing is an increasingly used technology and has promising applications in basic research and clinical translations. However, genotyping methods developed for bulk sequencing data have not been well adapted for single-cell data, in terms of both computational parallelization and simplified user interface. Here we introduce a software, cellsnp-lite, implemented in C/C++ and based on well supported package htslib, for genotyping in single-cell sequencing data for both droplet and well based platforms. On various experimental data sets, it shows substantial improvement in computational speed and memory efficiency with retaining highly concordant results compared to existing methods. Cellsnp-lite therefore lightens the genetic analysis for increasingly large single-cell data.AvailabilityThe source code is freely available at https://github.com/single-cell-genetics/[email protected]

Download Full-text