SCOPE: a normalization and copy number estimation method for single-cell DNA sequencing

2019 ◽  
Author(s):  
Rujin Wang ◽  
Dan-Yu Lin ◽  
Yuchao Jiang

Abstract: Whole-genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This technology circumvents the averaging effects associated with bulk-tissue sequencing and increases resolution while decreasing ambiguity in tracking the evolutionary history of cancer. However, scDNA-seq data are highly sparse and noisy because of the biases and artifacts introduced during library preparation and sequencing. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data of cancer cells. The main features of SCOPE include: (i) a Poisson latent factor model for normalization, which borrows information across cells and regions to estimate bias, using negative control cells identified by cell-specific Gini coefficients; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the normalization step, which accounts for the aberrant copy number changes that deviate from the null distributions; and (iii) a cross-sample segmentation procedure to identify breakpoints that are shared across cells from the same subclone. We evaluate SCOPE on a diverse set of scDNA-seq data in cancer genomics, using array-based calls of purified bulk samples as gold standards and whole-exome sequencing and single-cell RNA sequencing as orthogonal validations; we find that, compared to existing methods, SCOPE offers more accurate copy number estimates. Further, we demonstrate SCOPE on three recently released scDNA-seq datasets by 10X Genomics: we show that it can reliably recover 1% cancer cell spike-ins from a background of normal cells and that it successfully reconstructs cancer subclonal structure from ∼10,000 breast cancer cells.
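The abstract says negative control cells are identified by cell-specific Gini coefficients but does not give the computation. The sketch below shows one common way to compute a Gini coefficient from a cell's per-bin read counts and to flag cells with even coverage as candidate normal cells. It is an illustration and not SCOPE's implementation: the function names, the `max_gini` cutoff of 0.12, and the toy counts are all assumptions made for this example.

```python
def gini_coefficient(counts):
    """Gini coefficient of non-negative per-bin read counts.

    Returns 0.0 for perfectly even coverage and approaches 1.0 as reads
    concentrate in a few bins.
    """
    values = sorted(float(c) for c in counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one bin with a positive read count")
    # Rank-weighted sum over the sorted counts (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def flag_candidate_normal_cells(counts_by_cell, max_gini=0.12):
    """Return the names of cells whose Gini coefficient is at most max_gini.

    max_gini is a hypothetical cutoff chosen for this sketch.
    """
    return sorted(
        cell
        for cell, counts in counts_by_cell.items()
        if gini_coefficient(counts) <= max_gini
    )


# Toy data (made up): cell_a has even coverage, cell_b is highly uneven.
toy_counts = {
    "cell_a": [10, 10, 10, 10],
    "cell_b": [1, 1, 1, 37],
}
candidate_normals = flag_candidate_normal_cells(toy_counts)
```

A low Gini coefficient indicates that reads are spread evenly across bins, which is the pattern expected of a cell without large copy number changes; any real cutoff would need to be tuned to the sequencing protocol and bin size.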

Cell Systems ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 445-452.e6 ◽  
Author(s):  
Rujin Wang ◽  
Dan-Yu Lin ◽  
Yuchao Jiang

2020 ◽  
Vol 16 (7) ◽  
pp. e1008012 ◽  
Author(s):  
Xian F. Mallory ◽  
Mohammadamin Edrisi ◽  
Nicholas Navin ◽  
Luay Nakhleh

Author(s):  
Jack Kuipers ◽  
Mustafa Anıl Tuncel ◽  
Pedro Ferreira ◽  
Katharina Jahn ◽  
Niko Beerenwinkel

Copy number alterations are driving forces of tumour development and the emergence of intra-tumour heterogeneity. A comprehensive picture of these genomic aberrations is therefore essential for the development of personalised and precise cancer diagnostics and therapies. Single-cell sequencing offers the highest resolution for copy number profiling down to the level of individual cells. Recent high-throughput protocols allow for the processing of hundreds of cells through shallow whole-genome DNA sequencing. The resulting low read-depth data poses substantial statistical and computational challenges to the identification of copy number alterations. We developed SCICoNE, a statistical model and MCMC algorithm tailored to single-cell copy number profiling from shallow whole-genome DNA sequencing data. SCICoNE reconstructs the history of copy number events in the tumour and uses these evolutionary relationships to identify the copy number profiles of the individual cells. We show the accuracy of this approach in evaluations on simulated data and demonstrate its practicability in applications to a xenograft breast cancer sample.


2021 ◽  
Author(s):  
Nicholas Navin ◽  
Jake Leighton ◽  
Min Hu ◽  
Emi Sei ◽  
Funda Meric-Bernstam

Single-cell DNA sequencing (scDNA-seq) methods are powerful tools for profiling mutations in cancer cells; however, most genomic regions characterized in single cells are non-informative. To overcome this issue, we developed a Multi-Patient-Targeted (MPT) scDNA-seq method. MPT involves first performing bulk exome sequencing across a cohort of cancer patients to identify somatic mutations, which are then pooled together to develop a single custom targeted panel for high-throughput scDNA-seq using a microfluidics platform. We applied MPT to profile 330 mutations across 23,500 cells from 5 TNBC patients, which showed that 3 tumors were monoclonal and 2 tumors were polyclonal. From these data, we reconstructed mutational lineages and identified early mutational and copy number events, including early TP53 mutations that occurred in all five patients. Collectively, our data suggest that MPT can overcome technical obstacles for studying tumor evolution using scDNA-seq by profiling information-rich mutation sites.
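The pooling step described above, in which somatic mutations called per patient are merged into one targeted panel, can be sketched as a deduplicating merge. This is a minimal illustration under assumed conventions and not the authors' pipeline: the `(chromosome, position, ref, alt)` tuple format, the function name, and the placeholder coordinates are all hypothetical.

```python
def build_pooled_panel(mutations_by_patient):
    """Merge per-patient somatic mutation calls into one deduplicated panel.

    Each mutation is a (chromosome, position, ref, alt) tuple. The result
    maps every unique site to the sorted list of patients carrying it.
    """
    panel = {}
    for patient, mutations in mutations_by_patient.items():
        for site in mutations:
            panel.setdefault(site, set()).add(patient)
    # Sort sites so the panel order is reproducible.
    return {site: sorted(patients) for site, patients in sorted(panel.items())}


# Placeholder calls (made-up coordinates, not real variants).
bulk_calls = {
    "patient_1": [("chr1", 100, "A", "G"), ("chr17", 200, "C", "T")],
    "patient_2": [("chr17", 200, "C", "T"), ("chr3", 300, "G", "A")],
}
pooled_panel = build_pooled_panel(bulk_calls)
```

Keeping the list of source patients next to each site makes it possible to tell, after single-cell sequencing, which panel sites are expected to be informative for a given patient.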


2017 ◽  
Author(s):  
Yuchao Jiang ◽  
Rujin Wang ◽  
Eugene Urrutia ◽  
Ioannis N. Anastopoulos ◽  
Katherine L. Nathanson ◽  
...  

Abstract: High-throughput DNA sequencing enables detection of copy number variations (CNVs) on the genome-wide scale with finer resolution compared to array-based methods, but suffers from biases and artifacts that lead to false discoveries and low sensitivity. We describe CODEX2, a statistical framework for full-spectrum CNV profiling that is sensitive for variants with both common and rare population frequencies and that is applicable to study designs with and without negative control samples. We demonstrate and evaluate CODEX2 on whole-exome and targeted sequencing data, where biases are the most prominent. CODEX2 outperforms existing methods and, in particular, significantly improves sensitivity for common CNVs.


2019 ◽  
Author(s):  
Xian Fan ◽  
Mohammadamin Edrisi ◽  
Nicholas Navin ◽  
Luay Nakhleh

Abstract: Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for or adapted to single-cell DNA sequencing data. Understanding the strengths and limitations that are unique to each of these methods is very important for obtaining accurate copy number profiles from single-cell DNA sequencing data. Here we review the major steps that are followed by these methods when analyzing such data, and then review the strengths and limitations of the methods individually. In terms of segmenting the genome into regions of different copy numbers, we categorize the methods into three groups, select a representative method from each group that has been commonly used in this context, and benchmark them on simulated as well as real datasets. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, with the large datasets being generated, the methods must be computationally efficient.


2021 ◽  
Author(s):  
Wilson McKerrow ◽  
Shane A. Evans ◽  
Azucena Rocha ◽  
John Sedivy ◽  
Nicola Neretti ◽  
...  

Abstract: LINE-1 retrotransposons are known to be expressed in early development, in tumors and in the germline. Less is known about LINE-1 expression at the single-cell level, especially outside the context of cancer. Because LINE-1 elements are present at a high copy number, many transcripts that are not driven by the LINE-1 promoter nevertheless terminate at the LINE-1 3’ UTR. Thus, 3’ targeted single-cell RNA-seq datasets are not appropriate for studying LINE-1. However, 5’ targeted single-cell datasets provide an opportunity to analyze LINE-1 expression at the single-cell level. Most LINE-1 copies are 5’ truncated, and a transcript that contains the LINE-1 5’ UTR as its 5’ end is likely to have been transcribed from its promoter. We developed a method, L1-sc (LINE-1 expression for single cells), to quantify LINE-1 expression in 5’ targeted 10x Genomics single-cell RNA-seq datasets. Our method confirms that LINE-1 expression is high in cancer cells, but low or absent in immune cells. We also find that LINE-1 expression is elevated in epithelial compared to immune cells outside of the context of cancer and that it is also elevated in neurons compared to glia in the mouse hippocampus.


2021 ◽  
Author(s):  
Sanjana Rajan ◽  
Simone Zaccaria ◽  
Matthew V. Cannon ◽  
Maren Cam ◽  
Amy C. Gross ◽  
...  

Abstract: Osteosarcoma is an aggressive malignancy characterized by high genomic complexity. Identification of few recurrent mutations in protein coding genes suggests that somatic copy-number aberrations (SCNAs) are the genetic drivers of disease. Models around genomic instability conflict: it is unclear if osteosarcomas result from pervasive ongoing clonal evolution with continuous optimization of the fitness landscape or an early catastrophic event followed by stable maintenance of an abnormal genome. We address this question by investigating SCNAs in 12,019 tumor cells obtained from expanded patient tissues using single-cell DNA sequencing, in ways that were previously impossible with bulk sequencing. Using the CHISEL algorithm, we inferred allele- and haplotype-specific SCNAs from whole-genome single-cell DNA sequencing data. Surprisingly, we found that, despite extensive genomic aberrations, cells within each tumor exhibit remarkably homogeneous SCNA profiles with little sub-clonal diversification. Longitudinal analysis between two pairs of patient samples obtained at distant time points (early detection, relapse) demonstrated remarkable conservation of SCNA profiles over tumor evolution. Phylogenetic analysis suggests that the bulk of SCNAs was acquired early in the oncogenic process, with few new events arising in response to therapy or during adaptation to growth in distant tissues. These data suggest that early catastrophic events, rather than sustained genomic instability, drive formation of these extensively aberrant genomes. Overall, we demonstrate the power of combining single-cell DNA sequencing with an allele- and haplotype-specific SCNA inference algorithm to resolve longstanding questions regarding genetics of tumor initiation and progression, questioning the underlying assumptions of genomic instability inferred from bulk tumor data.

