Cardelino: Integrating whole exomes and single-cell transcriptomes to reveal phenotypic impact of somatic variants

2018 ◽  
Author(s):  
Davis J. McCarthy ◽  
Raghd Rostom ◽  
Yuanhua Huang ◽  
Daniel J. Kunz ◽  
Petr Danecek ◽  
...  

Abstract: Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA sequencing, using either bulk or single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution.
Key findings: A novel approach for integrating DNA-seq and single-cell RNA-seq data to reconstruct clonal substructure for single-cell transcriptomes. Evidence for non-neutral evolution of clonal populations in human fibroblasts. Proliferation and cell cycle pathways are commonly distorted in mutated clonal populations.
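To make the idea of inferring a cell's clone of origin concrete, here is a toy sketch and not cardelino's actual model. It assumes a known 0/1 clone-by-variant matrix, per-cell alternative-allele and total read counts at each variant, a uniform prior over clones, and two fixed allele fractions (`theta_present`, `theta_absent`). All of these names and values are hypothetical and chosen only for illustration.

```python
import math

def clone_posterior(alt, depth, clone_config,
                    theta_present=0.5, theta_absent=0.01):
    """Posterior probability of each clone for one cell (uniform prior).

    alt, depth   : alternative-allele and total read counts per variant
    clone_config : one 0/1 list per clone saying which variants it carries
    The theta values are assumed, illustrative allele fractions.
    """
    log_liks = []
    for genotype in clone_config:
        ll = 0.0
        for a, d, g in zip(alt, depth, genotype):
            theta = theta_present if g else theta_absent
            # Binomial log-likelihood; the binomial coefficient is the same
            # for every clone, so it is dropped.
            ll += a * math.log(theta) + (d - a) * math.log(1.0 - theta)
        log_liks.append(ll)
    # Normalise in a numerically stable way.
    peak = max(log_liks)
    weights = [math.exp(ll - peak) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]

# Clone 0 carries only variant 1; clone 1 carries variants 1 and 2.
clone_config = [[1, 0], [1, 1]]
# This cell shows alternative reads at both variants.
posterior = clone_posterior(alt=[3, 4], depth=[6, 8], clone_config=clone_config)
best_clone = posterior.index(max(posterior))
```

Because the cell has alternative reads at the second variant, nearly all of the posterior mass falls on the clone that carries it. A real method would also have to handle sparse coverage and uncertainty in the clonal tree, which this sketch ignores.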

2020 ◽  
Vol 36 (15) ◽  
pp. 4255-4262
Author(s):  
Si-Yi Chen ◽  
Chun-Jie Liu ◽  
Qiong Zhang ◽  
An-Yuan Guo

Abstract
Motivation: T-cell receptors (TCRs) recognize antigens and play vital roles in T-cell immunology. Surveying TCR repertoires by characterizing complementarity-determining region 3 (CDR3) is a key issue. Due to the high diversity of CDR3 and technological limitations, accurate characterization of CDR3 repertoires remains a great challenge.
Results: We propose a computational method named CATT for ultra-sensitive and precise detection of TCR CDR3 sequences. CATT can be applied to TCR sequencing, RNA-Seq and single-cell TCR(RNA)-Seq data to characterize CDR3 repertoires. CATT integrates a de Bruijn graph-based micro-assembly algorithm, a data-driven error correction model and a Bayesian inference algorithm to characterize CDR3 repertoires self-adaptively and with high sensitivity. Benchmarks on in silico and experimental datasets demonstrated that CATT showed superior recall and precision compared with existing tools, especially for data with short read length and small size and for single-cell sequencing data. Thus, CATT will be a useful tool for TCR analysis in cancer and immunology research.
Availability and implementation: http://bioinfo.life.hust.edu.cn/CATT or https://github.com/GuoBioinfoLab/CATT.
Supplementary information: Supplementary data are available at Bioinformatics online.
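For readers unfamiliar with the de Bruijn graphs mentioned in the Results, the following is a minimal generic sketch of how such a graph is built from reads. It is not CATT's micro-assembly code; the function name and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Every k-mer in a read adds an edge from its first k-1 bases
    to its last k-1 bases.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

# Two overlapping toy reads that together spell ACGTT.
graph = de_bruijn_edges(["ACGT", "CGTT"], k=3)
```

Walking the edges from AC through CG and GT to TT recovers the sequence spanned by both reads, which is the basic idea behind assembling short reads into longer contigs.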


2020 ◽  
Author(s):  
Parashar Dhapola ◽  
Mohamed Eldeeb ◽  
Amol Ugale ◽  
Rasmus Olofzon ◽  
Eva Erlandsson ◽  
...  

Abstract: Single-cell transcriptomics facilitates innovative approaches to define and identify cell types within tissues and cell populations. An emerging interest in the cancer field is to assess the heterogeneity of transformed cells, including the identification of tumor-initiating cells based on similarities to their normal counterparts. However, such cell mapping is often confounded by the large effects on total gene expression programs introduced by strong perturbations such as an oncogenic event. Here, we present Nabo, a novel computational method that allows mapping of cells from one population to the most similar cells in a reference population, independently of confounding changes to gene expression programs initiated by perturbation. We validated this method on multiple datasets from different sources and platforms and show that Nabo achieves higher accuracy than conventional classification methods. Nabo is available as an integrated toolkit for preprocessing, cell mapping, differential gene expression identification, and visualization of single-cell RNA-Seq data. For exploratory studies, Nabo includes methods to help evaluate the reliability of cell mapping results. We applied Nabo to droplet-based single-cell RNA-Seq data of healthy and oncogene-induced (MLL-ENL) hematopoietic progenitor cells (GMLPs) differentiating in vitro. Despite the substantial cellular heterogeneity resulting from differentiation of GMLPs and the large transcriptional effects induced by the fusion oncogene, Nabo could pinpoint the specific cell stage where differentiation arrest occurs, which included an immunophenotypic definition of the tumor-initiating population. Thus, Nabo allows for relevant comparison between target and control cells, without being confounded by differences in population heterogeneity.


Author(s):  
Alex M. Ascensión ◽  
Sandra Fuertes-Álvarez ◽  
Olga Ibañez-Solé ◽  
Ander Izeta ◽  
Marcos J. Araúzo-Bravo

2019 ◽  
Author(s):  
Christina Huan Shi ◽  
Kevin Y. Yip

Abstract: K-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second-best method, but still ran competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data highly consistent with CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used for a preview of cell clusters for an early detection of potential data problems, before running a much more time-consuming full analysis pipeline.
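The problem this abstract addresses can be illustrated with a small sketch: exact k-mer counting followed by removal of rare k-mers that are likely sequencing errors. This is not the CQF-deNoise algorithm, which removes false k-mers dynamically during counting inside a compact data structure. Here the filtering happens afterwards on a plain dictionary, and the `min_count` threshold is an assumed, illustrative parameter.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def drop_rare_kmers(counts, min_count=2):
    """Keep k-mers seen at least min_count times (assumed threshold)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}

# The third read differs from the others by one simulated error (A -> T).
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
filtered = drop_rare_kmers(counts, min_count=2)
```

The single erroneous base creates two k-mers, CGTT and GTTC, that each occur once. Dropping them shrinks the table from five entries to three, which hints at why removing false k-mers early can save memory on real data.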


2018 ◽  
Vol 138 (4) ◽  
pp. 811-825 ◽  
Author(s):  
Christina Philippeos ◽  
Stephanie B. Telerman ◽  
Bénédicte Oulès ◽  
Angela O. Pisco ◽  
Tanya J. Shaw ◽  
...  

2020 ◽  
Vol 30 (4) ◽  
pp. 611-621 ◽  
Author(s):  
Chiaowen Joyce Hsiao ◽  
PoYuan Tung ◽  
John D. Blischak ◽  
Jonathan E. Burnett ◽  
Kenneth A. Barr ◽  
...  

Author(s):  
Massimo Andreatta ◽  
Santiago J Carmona

Abstract
Summary: STACAS is a computational method for the identification of integration anchors in the Seurat environment, optimized for the integration of single-cell (sc) RNA-seq datasets that share only a subset of cell types. We demonstrate that by (i) correcting batch effects while preserving relevant biological variability across datasets, (ii) filtering aberrant integration anchors with a quantitative distance measure and (iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations.
Availability and implementation: Source code and R package available at https://github.com/carmonalab/STACAS; Docker image available at https://hub.docker.com/repository/docker/mandrea1/stacas_demo.


2018 ◽  
Author(s):  
Xianwen Ren ◽  
Liangtao Zheng ◽  
Zemin Zhang

Abstract: Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory of the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
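The abstract does not say how its random projection step is carried out, so the sketch below shows only the textbook Gaussian variant and may differ from what sscClust does. Each cell's expression vector is multiplied by a random matrix to obtain a shorter vector that a clustering algorithm could then consume. The function name, the seed and the toy data are assumptions made for illustration.

```python
import math
import random

def random_projection(rows, out_dim, seed=0):
    """Project each row onto out_dim random Gaussian directions.

    Entries are drawn from N(0, 1) and scaled by 1/sqrt(out_dim) so that
    squared vector lengths are preserved in expectation.
    """
    rng = random.Random(seed)
    in_dim = len(rows[0])
    scale = 1.0 / math.sqrt(out_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(row[i] * matrix[i][j] for i in range(in_dim))
             for j in range(out_dim)]
            for row in rows]

# Three toy "cells" with four "genes"; the third is the sum of the first two.
rows = [[1.0, 0.0, 2.0, 0.0],
        [0.0, 3.0, 0.0, 1.0],
        [1.0, 3.0, 2.0, 1.0]]
projected = random_projection(rows, out_dim=2, seed=0)
```

The projection is linear, so relationships such as "cell 3 is the sum of cells 1 and 2" survive the reduction. With enough output dimensions, pairwise distances are also approximately preserved, which is what makes clustering in the reduced space reasonable.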

