Integrative inference of subclonal tumour evolution from single-cell and bulk sequencing data


10.1101/234914 ◽

2017 ◽

Cited By ~ 12

Author(s):

Salem Malikic ◽

Katharina Jahn ◽

Jack Kuipers ◽

S. Cenk Sahinalp ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

Single Cells ◽

Simulated Data ◽

Resistant Cell ◽

Current Data ◽

Sequencing Data ◽

Reconstruction Accuracy ◽

Tree Reconstruction ◽

Tumour Heterogeneity ◽

Tumour Genetics

Abstract. Understanding the evolutionary history and subclonal composition of a tumour represents one of the key challenges in overcoming treatment failure due to resistant cell populations. Most of the current data on tumour genetics stems from short-read bulk sequencing data. While this type of data is characterised by low sequencing noise and cost, it consists of aggregate measurements across a large number of cells. It is therefore of limited use for the accurate detection of the distinct cellular populations present in a tumour and the unambiguous inference of their evolutionary relationships. Single-cell DNA sequencing instead provides data of the highest resolution for studying intra-tumour heterogeneity and evolution, but is characterised by higher sequencing costs and elevated noise rates. In this work, we develop the first computational approach that infers trees of tumour evolution from combined single-cell and bulk sequencing data. Using a comprehensive set of simulated data, we show that our approach systematically outperforms existing methods with respect to tree reconstruction accuracy and subclone identification. High-fidelity reconstructions are obtained even with a modest number of single cells. We also show that combining single-cell and bulk sequencing data provides more realistic mutation histories for real tumours.


Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00194-5 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Leah L. Weber ◽

Mohammed El-Kebir

Keyword(s):

DNA Sequencing ◽

Single Cell ◽

Evolutionary Process ◽

Treatment Decision ◽

Real Data ◽

Current Data ◽

Fast Method ◽

Sequencing Data ◽

Evolutionary Trajectory ◽

Cancer Types

Abstract. Background: Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as for understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched, and understanding which cancer types and which patients follow each of these trajectories, could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations of current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem, which we prove to be NP-hard, as a means of testing two alternative hypotheses for the pattern of evolution. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and application to real data, we demonstrate the performance of our method, which outperforms a competing machine learning approach. Conclusion: Phyolin is an accurate, easy-to-use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
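
The linear-versus-branched distinction in the abstract above can be illustrated on a noise-free binary cell-by-mutation matrix. The sketch below is not Phyolin and does not solve the LPPF problem (it performs no flipping and uses no constraint programming); it only checks, under the assumption that a linear perfect phylogeny corresponds to mutation carrier sets that are totally ordered by inclusion, whether a given matrix is already linear. The function name and the example matrices are hypothetical.

```python
from typing import List


def is_linear_perfect_phylogeny(matrix: List[List[int]]) -> bool:
    """Check whether a binary matrix (rows = cells, columns = mutations)
    is consistent with a linear (non-branching) mutation order.

    Assumption for this sketch: the matrix is error-free, and "linear"
    means the sets of cells carrying each mutation form a chain under
    set inclusion.
    """
    if not matrix:
        return True
    n_mutations = len(matrix[0])
    # For each mutation, collect the set of cells that carry it.
    carriers = [
        frozenset(i for i, row in enumerate(matrix) if row[j] == 1)
        for j in range(n_mutations)
    ]
    # Sort from largest to smallest carrier set; in a chain every set
    # must contain the one that follows it.
    ordered = sorted(carriers, key=len, reverse=True)
    return all(later <= earlier for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical toy data: three cells, three mutations.
linear_example = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
branched_example = [
    [1, 1, 0],
    [1, 0, 1],
]
print(is_linear_perfect_phylogeny(linear_example))
print(is_linear_perfect_phylogeny(branched_example))
```

Real single-cell data contain false positives and false negatives, which is why the abstract frames the task as a flipping problem rather than a direct check like this one.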


Cryopreservation of human cancers conserves tumour heterogeneity for single-cell multi-omics analysis

Genome Medicine ◽

10.1186/s13073-021-00885-z ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Sunny Z. Wu ◽

Daniel L. Roden ◽

Ghamdan Al-Eryani ◽

Nenad Bartonicek ◽

Kate Harvey ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

High Throughput ◽

Single Cells ◽

Cellular Heterogeneity ◽

Tumour Heterogeneity ◽

Fresh Tissue ◽

Human Cancers ◽

Cryopreserved Cell ◽

Single Cell RNA Sequencing

Abstract. Background: High-throughput single-cell RNA sequencing (scRNA-Seq) has emerged as a powerful tool for exploring cellular heterogeneity among complex human cancers. scRNA-Seq studies using fresh human surgical tissue are logistically difficult, preclude histopathological triage of samples, and limit the ability to perform batch processing. These constraints can introduce technical biases when integrating patient datasets and increase experimental costs. Although tissue preservation methods have previously been explored to address such issues, they have yet to be examined on complex human tissues, such as solid cancers, or on high-throughput scRNA-Seq platforms. Methods: Using the Chromium 10X platform, we sequenced a total of ~120,000 cells from fresh and cryopreserved replicates across three primary breast cancers, two primary prostate cancers and a cutaneous melanoma. We performed detailed analyses between cells from each condition to assess the effects of cryopreservation on cellular heterogeneity, cell quality, clustering and the identification of gene ontologies. In addition, we performed single-cell immunophenotyping using CITE-Seq on a single breast cancer sample cryopreserved as solid tissue fragments. Results: Tumour heterogeneity identified from fresh tissues was largely conserved in cryopreserved replicates. We show that sequencing of single cells prepared from cryopreserved tissue fragments or from cryopreserved cell suspensions is comparable to sequencing of cells prepared from fresh tissue, with cryopreserved cell suspensions displaying higher correlations with fresh tissue in gene expression. We show that cryopreservation had minimal impact on the results of downstream analyses such as biological pathway enrichment. For some tumours, cryopreservation modestly increased cell stress signatures compared to freshly analysed tissue. Further, we demonstrate the advantage of cryopreserving whole cells for detecting cell-surface proteins using CITE-Seq, which is impossible using other preservation methods such as single-nucleus sequencing. Conclusions: We show that the viable cryopreservation of human cancers provides high-quality single cells for multi-omics analysis. Our study guides new experimental designs for tissue biobanking for future clinical single-cell RNA sequencing studies.


Quality assessment of single-cell RNA sequencing data by coverage skewness analysis

10.1101/2019.12.31.890269 ◽

2019 ◽

Author(s):

Imad Abugessaisa ◽

Shuhei Noguchi ◽

Melissa Cardon ◽

Akira Hasegawa ◽

Kazuhide Watanabe ◽

...

Keyword(s):

Quality Assessment ◽

Single Cell ◽

RNA Sequencing ◽

Single Cells ◽

Assessment Method ◽

Poor Quality ◽

Sequencing Data ◽

Single Cell RNA Sequencing ◽

Gene Coverage ◽

The Impact

Abstract. Analysis and interpretation of single-cell RNA-sequencing (scRNA-seq) experiments are compromised by the presence of poor-quality cells. For meaningful analyses, such poor-quality cells should be excluded to avoid biases and large variation. However, no clear guidelines exist. We introduce SkewC, a novel quality-assessment method to identify poor-quality single cells in scRNA-seq experiments. The method is based on the assessment of gene coverage for each single cell, using the skewness of that coverage as a quality measure. To validate the method, we investigated the impact of poor-quality cells on downstream analyses and compared biological differences between typical and poor-quality cells. Moreover, we measured the ratio of intergenic expression, which suggests genomic contamination, and foreign-organism contamination of single-cell samples. SkewC was tested on 37,993 single cells generated by 15 scRNA-seq protocols. We envision SkewC as an indispensable QC method to be incorporated into scRNA-seq experiments to preclude the possibility of scRNA-seq data misinterpretation.
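
To make the idea of coverage skewness concrete, here is a minimal sketch of one way such a statistic could be computed for a single cell. It is not the SkewC implementation: the binning of the gene body, the weighted-moment definition of skewness and the function name are all assumptions made for illustration, and the abstract does not specify how SkewC defines or thresholds the measure.

```python
from typing import Sequence


def coverage_skewness(coverage: Sequence[float]) -> float:
    """Skewness of a gene-body coverage profile for one cell.

    Assumption for this sketch: coverage[i] is the read density in
    positional bin i along the normalised gene body (5' -> 3'), and
    skewness is the standardised third moment of bin position,
    weighted by coverage.
    """
    total = sum(coverage)
    if total <= 0:
        raise ValueError("coverage profile must contain some reads")
    positions = range(len(coverage))
    mean = sum(p * w for p, w in zip(positions, coverage)) / total
    variance = sum(w * (p - mean) ** 2 for p, w in zip(positions, coverage)) / total
    if variance == 0:
        return 0.0  # all reads fall in a single bin
    third = sum(w * (p - mean) ** 3 for p, w in zip(positions, coverage)) / total
    return third / variance ** 1.5


# Hypothetical five-bin profiles.
print(coverage_skewness([1, 1, 1, 1, 1]))  # even coverage
print(coverage_skewness([0, 0, 1, 2, 7]))  # reads piled up at the 3' end
```

With this definition an even profile gives a value of zero, a 3'-heavy profile gives a negative value and a 5'-heavy profile gives a positive one; any cut-off for calling a cell poor quality would have to be chosen separately.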


Haplotype-aware single-cell multiomics uncovers functional effects of somatic structural variation

10.1101/2021.11.11.468039 ◽

2021 ◽

Author(s):

Hyobin Jeong ◽

Karen Grimes ◽

Peter-Martin Bruch ◽

Tobias Rausch ◽

Patrick Hasenfeld ◽

...

Keyword(s):

Single Cell ◽

Nucleosome Occupancy ◽

Single Cells ◽

Chromosomal Rearrangements ◽

Regulatory Elements ◽

Computational Method ◽

Tumour Heterogeneity ◽

Cancer Genomes ◽

Functional Consequences ◽

Oncogenic Transcription Factor

Somatic structural variants (SVs) are widespread in cancer genomes; however, their impact on tumorigenesis and intra-tumour heterogeneity is incompletely understood, since methods to functionally characterize the broad spectrum of SVs arising in single cancer cells are lacking. We present a computational method, scNOVA, that couples SV discovery with nucleosome occupancy analysis by haplotype-resolved single-cell sequencing, to systematically uncover SV effects on cis-regulatory elements and gene activity. Application to leukemias and cell lines uncovered SV outcomes at several loci, including dysregulated cancer-related pathways and mono-allelic oncogene expression near SV breakpoints. At the intra-patient level, we identified different yet overlapping subclonal SVs that converge on aberrant Wnt signaling. We also deconvoluted the effects of catastrophic chromosomal rearrangements resulting in oncogenic transcription factor dysregulation. scNOVA directly links SVs to their functional consequences, opening the door for single-cell multiomics of SVs in heterogeneous cell populations.


SCELLECTOR: ranking amplification bias in single cells using shallow sequencing

BMC Bioinformatics ◽

10.1186/s12859-020-03858-y ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Vivekananda Sarangi ◽

Alexandre Jourdon ◽

Taejeong Bae ◽

Arijit Panda ◽

Flora Vaccarino ◽

...

Keyword(s):

Single Cell ◽

Multiple Displacement Amplification ◽

Single Cells ◽

Sequencing Data ◽

High Coverage ◽

Amplification Bias ◽

Single Cell Profiling ◽

High Concordance ◽

Human Neuronal Cells

Abstract. Background: The study of mosaic mutations is important since they have been linked to cancer and various disorders. Single-cell sequencing has become a powerful tool to study the genome of individual cells for the detection of mosaic mutations. The DNA in a single cell needs to be amplified before sequencing, and multiple displacement amplification (MDA) is widely used owing to its low error rate and the long fragment length of the amplified DNA. However, the phi29 polymerase used in MDA is sensitive to template fragmentation and to the presence of sites with DNA damage, which can lead to biases such as allelic imbalance, uneven coverage and overrepresentation of C to T mutations. It is therefore important to select cells with uniform amplification to decrease false positives and increase sensitivity for mosaic mutation detection. Results: We propose a method, Scellector (single cell selector), which uses haplotype information to detect amplification quality in shallow-coverage sequencing data. We tested Scellector on single human neuronal cells, obtained in vitro and amplified by MDA. Qualities were estimated from shallow sequencing with coverage as low as 0.3× per cell and then confirmed using 30× deep-coverage sequencing. The high concordance between shallow and high-coverage data validated the method. Conclusion: Scellector can potentially be used to rank amplifications obtained from single-cell platforms relying on an MDA-like amplification step, such as the Chromium Single Cell profiling solution.


Estimating the Allele-Specific Expression of SNVs From 10× Genomics Single-Cell RNA-Sequencing Data

Genes ◽

10.3390/genes11030240 ◽

2020 ◽

Vol 11 (3) ◽

pp. 240 ◽

Cited By ~ 2

Author(s):

Prashant N. M. ◽

Hongyu Liu ◽

Pavlos Bousounis ◽

Liam Spurr ◽

Nawaf Alomran ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Single Cells ◽

Sequencing Data ◽

Specific Expression ◽

Single Nucleotide ◽

Healthy Donors ◽

Allele Expression ◽

Single Cell RNA Sequencing ◽

Allele Specific

With the recent advances in single-cell RNA-sequencing (scRNA-seq) technologies, the estimation of allele expression from single cells is becoming increasingly reliable. Allele expression is both quantitative and dynamic and is an essential component of the genomic interactome. Here, we systematically estimate the allele expression from heterozygous single nucleotide variant (SNV) loci using scRNA-seq data generated on the 10× Genomics Chromium platform. We analyzed 26,640 human adipose-derived mesenchymal stem cells (from three healthy donors), sequenced to an average of 150K sequencing reads per cell (more than 4 billion scRNA-seq reads in total). High-quality SNV calls assessed in our study contained approximately 15% exonic and >50% intronic loci. To analyze the allele expression, we estimated the expressed variant allele fraction (VAF_RNA) from SNV-aware alignments and analyzed its variance and distribution (mono- and bi-allelic) at different minimum sequencing read thresholds. Our analysis shows that when assessing positions covered by a minimum of three unique sequencing reads, over 50% of the heterozygous SNVs show bi-allelic expression, while at a threshold of 10 reads, nearly 90% of the SNVs are bi-allelic. In addition, our analysis demonstrates the feasibility of scVAF_RNA estimation from current scRNA-seq datasets and shows that the 3′-based library generation protocol of 10× Genomics scRNA-seq data can be informative in SNV-based studies, including analyses of transcriptional kinetics.
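
The role of the minimum read threshold described above can be sketched in a few lines. This is an illustration rather than the authors' pipeline: it assumes the expressed variant allele fraction is the number of variant-supporting reads divided by the total of variant and reference reads at a locus, and it assumes that "bi-allelic" means both alleles are observed (a fraction strictly between 0 and 1). The function names and read counts are hypothetical.

```python
from typing import Optional


def expressed_vaf(ref_reads: int, alt_reads: int, min_reads: int = 3) -> Optional[float]:
    """Expressed variant allele fraction at one SNV locus in one cell.

    Assumption for this sketch: the fraction is alt / (alt + ref), and
    loci covered by fewer than `min_reads` reads are not assessed.
    """
    total = ref_reads + alt_reads
    if total < min_reads:
        return None  # too few reads to estimate the fraction
    return alt_reads / total


def allelic_status(vaf: Optional[float]) -> str:
    """Label a locus as mono- or bi-allelic (assumed definition)."""
    if vaf is None:
        return "not assessed"
    return "bi-allelic" if 0.0 < vaf < 1.0 else "mono-allelic"


# Hypothetical (reference, variant) read counts at three loci.
for ref, alt in [(2, 1), (1, 1), (12, 0)]:
    vaf = expressed_vaf(ref, alt, min_reads=3)
    print(ref, alt, vaf, allelic_status(vaf))
```

Raising `min_reads` discards sparsely covered loci, where a single sampled read can make a heterozygous site look mono-allelic; this is consistent with the higher bi-allelic share the abstract reports at the 10-read threshold.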


Scirpy: A Scanpy extension for analyzing single-cell T-cell receptor sequencing data

10.1101/2020.04.10.035865 ◽

2020 ◽

Author(s):

Gregor Sturm ◽

Tamas Szabo ◽

Georgios Fotakis ◽

Marlene Haider ◽

Dietmar Rieder ◽

...

Keyword(s):

T Cell ◽

Single Cell ◽

Large Scale ◽

Single Cells ◽

Cell Receptor ◽

Sequencing Data ◽

Seamless Integration ◽

T Cell Phenotypes ◽

Cell Phenotypes

Abstract. Summary: Advances in single-cell technologies have enabled the investigation of T cell phenotypes and repertoires at unprecedented resolution and scale. Bioinformatic methods for the efficient analysis of these large-scale datasets are instrumental for advancing our understanding of adaptive immune responses in cancer, but also in infectious diseases like COVID-19. However, while well-established solutions are accessible for the processing of single-cell transcriptomes, no streamlined pipelines are available for the comprehensive characterization of T cell receptors. Here we propose Scirpy, a scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data. Availability and implementation: Scirpy source code and documentation are available at https://github.com/icbi-lab/scirpy.


Gene set inference from single-cell sequencing data using a hybrid of matrix factorization and variational autoencoders

10.1101/740415 ◽

2019 ◽

Author(s):

Soeren Lukassen ◽

Foo Wei Ten ◽

Roland Eils ◽

Christian Conrad

Keyword(s):

Neural Network ◽

Single Cell ◽

Network Model ◽

Neural Network Model ◽

Matrix Factorization ◽

Latent Variable ◽

Single Cells ◽

Sequencing Data ◽

Gene Set ◽

Gene Sets

Abstract. Recent advances in single-cell RNA sequencing (scRNA-Seq) have driven the simultaneous measurement of the expression of thousands of genes in thousands of single cells. These growing data sets allow us to model gene sets in biological networks at an unprecedented level of detail, in spite of heterogeneous cell populations. Here, we propose an unsupervised deep neural network model that is a hybrid of matrix factorization and conditional variational autoencoders (CVA), which utilizes weights as matrix factorizations to obtain gene sets, while class-specific inputs to the latent variable space facilitate a plausible identification of cell types. This artificial neural network model seamlessly integrates functional gene set inference, experimental batch effect correction, and static gene identification, which we conceptually prove here for three single-cell RNA-Seq datasets and suggest for future single-cell-gene analytics.


Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data

10.1101/2020.10.14.340299 ◽

2020 ◽

Author(s):

Junpeng Zhang ◽

Lin Liu ◽

Taosheng Xu ◽

Wu Zhang ◽

Chunwen Zhao ◽

...

Keyword(s):

Single Cell ◽

Regulatory Networks ◽

Single Cells ◽

Small Scale ◽

miRNA Regulation ◽

Sequencing Data ◽

Resolution Level ◽

Novel Strategy ◽

Cell Cell

Abstract. Background: Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allow the analysis of miRNA regulation across a group of cells, rather than the miRNA regulation unique to individual cells. Recent advances in single-cell miRNA-mRNA co-sequencing technology have opened a way to investigate miRNA regulation at the single-cell level. However, as single-cell miRNA-mRNA co-sequencing data are only just emerging and are available only at small scale, there is a strong need for novel methods that exploit existing single-cell data for the study of cell-specific miRNA regulation. Results: In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation), which uses single-cell miRNA-mRNA co-sequencing data to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to the miRNA-mRNA co-sequencing data of 19 K562 single cells to identify cell-specific miRNA-mRNA regulatory networks and to understand miRNA regulation in each K562 single cell. By analyzing the obtained cell-specific miRNA-mRNA regulatory networks, we observe that the miRNA regulation in each K562 single cell is unique. Moreover, we conduct a detailed analysis of the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. Finally, by exploring the cell-cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single cells to help understand cell-cell crosstalk. Conclusions: To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at single-cell resolution, and we believe that it can be a useful method to enhance the understanding of cell-specific miRNA regulation.


Single-cell copy number calling and event history reconstruction

10.1101/2020.04.28.065755 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jack Kuipers ◽

Mustafa Anıl Tuncel ◽

Pedro Ferreira ◽

Katharina Jahn ◽

Niko Beerenwinkel

Keyword(s):

DNA Sequencing ◽

Single Cell ◽

Copy Number ◽

Driving Forces ◽

Simulated Data ◽

Read Depth ◽

Cancer Diagnostics ◽

Whole Genome ◽

Copy Number Alterations ◽

Sequencing Data

Copy number alterations are driving forces of tumour development and the emergence of intra-tumour heterogeneity. A comprehensive picture of these genomic aberrations is therefore essential for the development of personalised and precise cancer diagnostics and therapies. Single-cell sequencing offers the highest resolution for copy number profiling down to the level of individual cells. Recent high-throughput protocols allow for the processing of hundreds of cells through shallow whole-genome DNA sequencing. The resulting low read-depth data poses substantial statistical and computational challenges to the identification of copy number alterations. We developed SCICoNE, a statistical model and MCMC algorithm tailored to single-cell copy number profiling from shallow whole-genome DNA sequencing data. SCICoNE reconstructs the history of copy number events in the tumour and uses these evolutionary relationships to identify the copy number profiles of the individual cells. We show the accuracy of this approach in evaluations on simulated data and demonstrate its practicability in applications to a xenograft breast cancer sample.
