Using Single Nucleotide Variations in Single-Cell RNA-Seq to Identify Subpopulations and Genotype-phenotype Linkage

Mapping Intimacies

10.1101/095810

2016

Cited By ~ 4

Author(s): Olivier Poirion, Xun Zhu, Travers Ching, Lana X. Garmire

Keyword(s): Gene Expression, Single Cell, Single Cells, Transcript Abundance, RNA-seq, Linear Modeling, Modeling Framework, Single Nucleotide, Single Nucleotide Variations

Despite its popularity, characterizing subpopulations by transcript abundance is subject to a significant amount of noise. We propose using effective and expressed single nucleotide variations (eeSNVs) from scRNA-seq data as alternative features for identifying tumor subpopulations. We developed a linear modeling framework, SSrGE, to link eeSNVs to gene expression. In all the datasets tested, eeSNVs identify subpopulations more accurately than gene expression does. Previously validated cancer-relevant genes are also highly ranked, confirming the significance of the method. Moreover, SSrGE can analyze coupled DNA-seq and RNA-seq data from the same single cells, demonstrating its value for integrating multi-omics single-cell techniques. In summary, SNV features from scRNA-seq data have merits both for identifying subpopulations and for linking genotype to phenotype. SSrGE is available at https://github.com/lanagarmire/SSrGE.
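As a minimal sketch of how a sparse linear model can select the eeSNVs associated with one gene's expression, the code below assumes a Lasso-penalized linear regression on a cell-by-SNV indicator matrix. The penalty, the toy data, and every function name are our assumptions for illustration. This is not the SSrGE API. Cyclic coordinate descent with soft-thresholding drives the weights of uninformative SNVs to exactly zero:

```python
from typing import List

Matrix = List[List[float]]


def center_columns(X: Matrix) -> Matrix:
    """Subtract each column's mean so that no intercept term is needed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def soft_threshold(z: float, gamma: float) -> float:
    """Lasso proximal operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: Matrix, y: List[float], alpha: float, n_iter: int = 200) -> List[float]:
    """Minimize (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    X: cells x SNVs (centered); y: one gene's expression per cell (centered).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # y - Xb, with b initialised to zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant SNV column carries no information
            # Correlation of column j with the residual that excludes b_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            b[j] = new_bj
    return b


# Hypothetical toy data: SNV 0 is present exactly in the high-expression cells,
# and SNV 1 is unrelated to expression.
snvs = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
expression = [2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0]
y_mean = sum(expression) / len(expression)
coefs = lasso_cd(center_columns(snvs), [v - y_mean for v in expression], alpha=0.1)
print(coefs)  # SNV 0 gets a nonzero weight; SNV 1 is shrunk to exactly 0.0
```

The penalty `alpha` controls how many SNVs survive. Larger values keep fewer SNVs.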

SHERRY2: A method for rapid and sensitive single cell RNA-seq

10.1101/2021.12.25.474161

2021

Author(s): Lin Di, Bo Liu, Yuzhu Lyu, Shihui Zhao, Yuhong Pang, ...

Keyword(s): Gene Expression, Single Cell, Dynamic Range, Single Cells, RNA-seq, Wide Dynamic Range, Uniform Coverage, Optimized Protocol, Tn5 Transposase, Higher Sensitivity

Many single-cell RNA-seq applications aim to probe a wide dynamic range of gene expression, but most methods still struggle to quantify low-abundance transcripts accurately. Based on our previous finding that Tn5 transposase can directly cut and tag DNA/RNA heteroduplexes, we present SHERRY2, an optimized protocol for sequencing the transcriptomes of single cells or single nuclei. SHERRY2 is robust and scalable, and it has higher sensitivity and more uniform coverage than prevalent scRNA-seq methods. With a throughput of a few thousand cells per batch, SHERRY2 can reveal subtle transcriptomic differences between cells and facilitate important biological discoveries.

Bayesian inference of the gene expression states of single cells from scRNA-seq data

10.1101/2019.12.28.889956

2019

Cited By ~ 3

Author(s): Jérémie Breda, Mihaela Zavolan, Erik van Nimwegen

Keyword(s): Gene Expression, Single Cell, Single Cells, Downstream Processing, Noise Removal, RNA-seq, Expression Of Genes, Normalization Methods, Quantify Gene Expression, Selection Of

In spite of a large investment in developing methods for analyzing single-cell RNA-seq data, there is still little agreement on how best to normalize such data, i.e. how to quantify the gene expression states of single cells from such data. We start from a few basic requirements. Inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and changes in expression state should be measured as fold-changes rather than as changes in absolute levels. From these first principles we derive a unique Bayesian procedure for normalizing single-cell RNA-seq data. Our implementation of this procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated error bars directly from raw UMI counts without any tunable parameters. We compared Sanity with other recent normalization methods on a selection of scRNA-seq datasets. Sanity outperforms them on basic downstream processing tasks such as clustering cells into subtypes and identifying differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest-expressed genes to be the most variable in expression, whereas in reality these genes provide the least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations between expression levels and the total UMI count of each cell, as well as spurious co-expression of genes.
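The claim about lowly expressed genes can be illustrated with a toy simulation. This is not the Sanity algorithm. It simply assumes genes with no biological variability, so every cell draws each gene's UMI count from the same Poisson distribution. Under pure Poisson sampling the variance equals the mean, so the squared coefficient of variation (CV^2) is 1/mean. A method that reads raw count variability as biological will therefore rank the lowest-expressed genes as the most variable. The gene means and cell count below are hypothetical:

```python
import math
import random
from statistics import mean, pvariance


def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def raw_cv2(lam: float, n_cells: int, rng: random.Random) -> float:
    """Squared coefficient of variation of raw counts for a gene with constant true expression."""
    counts = [poisson_sample(rng, lam) for _ in range(n_cells)]
    m = mean(counts)
    return pvariance(counts, m) / m ** 2


rng = random.Random(0)
# Three hypothetical genes with identical (zero) biological variability.
cv2_by_mean = {lam: raw_cv2(lam, 5000, rng) for lam in (0.5, 5.0, 50.0)}
for lam, cv2 in cv2_by_mean.items():
    print(f"mean={lam:5.1f}  CV^2={cv2:.3f}  1/mean={1 / lam:.3f}")
```

The printed CV^2 values track 1/mean, so the apparent variability comes entirely from sampling noise.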

Single-cell RNA-seq data reveals TNBC tumor heterogeneity through characterizing subclone compositions and proportions

10.1101/858290

2019

Author(s): Weida Wang, Jinyuan Xu, Shuyuan Wang, Peng Xia, Li Zhang, ...

Keyword(s): Gene Expression, Single Cell, Tumor Heterogeneity, Single Cells, RNA-seq, Biological Functions, Gene Markers, Gene Expression Matrix, Deconvolution Algorithm, Expression Matrix

Understanding subclonal architecture and its biological functions is one of the key challenges in characterizing and investigating the causes of triple-negative breast cancer (TNBC). Here we combine single-cell and bulk sequencing data to analyze tumor heterogeneity by characterizing subclone compositions and proportions. Using single-cell RNA-seq data (GSE118389), we identified five distinct cell subpopulations and characterized their biological functions based on their gene markers. Functional annotation showed that C1 and C2 are related to immune functions, while C5 is related to programmed cell death. We then used the subclonal basis gene expression matrix to apply a deconvolution algorithm to TCGA tissue RNA-seq data. The microenvironment is diverse among TNBC subclones, and C1 in particular is closely related to T cells. We also found that high C5 proportions were associated with poor survival. For five-year overall survival in the GSE96058 dataset, the log-rank test p-value was 0.0158 and the HR [95% CI] was 2.557 [1.160-5.636]. Collectively, our analysis reveals both intra-tumor and inter-tumor heterogeneity in TNBC (subclone compositions and proportions) and their association with the subclonal microenvironment. It also uncovers the combination of subclones dictating poor outcomes in this disease.

Highlights: We applied a deconvolution algorithm to a subclonal basis gene expression matrix to link single cells and bulk tissue.
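The abstract does not name its deconvolution algorithm. As a minimal sketch, the code below assumes non-negative least squares, a common choice for this kind of problem that we swapped in rather than took from the paper. It also assumes a hypothetical basis matrix with one expression signature per subclone. A bulk profile can then be decomposed into subclone proportions by coordinate descent:

```python
from typing import List


def nnls_cd(signatures: List[List[float]], bulk: List[float], n_iter: int = 500) -> List[float]:
    """Non-negative least squares: min ||bulk - sum_j w_j * signatures[j]||^2 with w_j >= 0.

    signatures: one nonzero expression vector (same genes as `bulk`) per subclone.
    """
    k = len(signatures)
    w = [0.0] * k
    resid = list(bulk)
    sq_norms = [sum(v * v for v in sig) for sig in signatures]
    for _ in range(n_iter):
        for j, sig in enumerate(signatures):
            # Exact least-squares minimiser over w_j alone, clipped at zero.
            proj = sum(s * (r + s * w[j]) for s, r in zip(sig, resid))
            new_wj = max(0.0, proj / sq_norms[j])
            delta = new_wj - w[j]
            resid = [r - s * delta for r, s in zip(resid, sig)]
            w[j] = new_wj
    return w


def to_proportions(weights: List[float]) -> List[float]:
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [wj / total for wj in weights]


# Hypothetical 4-gene signatures for two subclones and a bulk sample mixed 70:30.
subclone_a = [10.0, 0.0, 5.0, 1.0]
subclone_b = [0.0, 8.0, 2.0, 6.0]
bulk_sample = [0.7 * a + 0.3 * b for a, b in zip(subclone_a, subclone_b)]
proportions = to_proportions(nnls_cd([subclone_a, subclone_b], bulk_sample))
print(proportions)  # close to [0.7, 0.3]
```

Clipping at zero keeps every estimated proportion non-negative. Unconstrained least squares can produce negative subclone fractions when signatures overlap.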

Solo: doublet identification via semi-supervised deep learning

10.1101/841981

2019

Cited By ~ 1

Author(s): Nicholas Bernstein, Nicole Fong, Irene Lam, Margaret Roy, David G. Hendrickson, ...

Keyword(s): Gene Expression, Deep Learning, High Resolution, Single Cell, Single Cells, Detection Methods, Learning Approach, RNA-seq, Previous Approach, Cell Technology

Single-cell RNA-seq (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often produce two or more cells that share the same cell-identifying barcode. These “doublets” violate the fundamental premise of single-cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo can be combined with experimental doublet detection methods to purify scRNA-seq data to true single cells beyond any previous approach.
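Semi-supervised doublet detectors need labeled examples of doublets, and these can be generated from the data itself. A minimal sketch, assuming artificial doublets are made by summing the UMI count vectors of two randomly chosen observed cells. The function names and toy counts are hypothetical, and the deep learning classifier that Solo trains on such labels is omitted:

```python
import random
from typing import List, Tuple

Counts = List[List[int]]  # cells x genes UMI counts


def simulate_doublets(counts: Counts, n_doublets: int,
                      seed: int = 0) -> Tuple[Counts, List[Tuple[int, int]]]:
    """Create artificial doublets by adding the count vectors of two distinct random cells."""
    rng = random.Random(seed)
    doublets, pairs = [], []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)  # two different cells
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
        pairs.append((i, j))
    return doublets, pairs


def build_training_set(counts: Counts, n_doublets: int,
                       seed: int = 0) -> Tuple[Counts, List[int]]:
    """Label observed cells 0 (presumed singlets) and simulated doublets 1."""
    doublets, _ = simulate_doublets(counts, n_doublets, seed)
    return counts + doublets, [0] * len(counts) + [1] * len(doublets)


observed = [[5, 0, 2], [0, 3, 1], [4, 1, 0], [1, 1, 6]]
features, labels = build_training_set(observed, n_doublets=2)
print(len(features), labels)  # 6 [0, 0, 0, 0, 1, 1]
```

The observed cells may themselves include doublets, so the 0 labels are only presumed singlets. This label noise is one reason a semi-supervised method is used here instead of treating the labels as ground truth.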

Comparative analysis of sequencing technologies platforms for single-cell transcriptomics

10.1101/463117

2018

Cited By ~ 1

Author(s): Kedar Nath Natarajan, Zhichao Miao, Miaomiao Jiang, Xiaoyun Huang, Hongpo Zhou, ...

Keyword(s): Gene Expression, Single Cell, Single Cells, K562 Cells, Library Preparation, RNA-seq, Illumina HiSeq, Technical Variability, Sequencing Technologies, Sequencing Platforms

All single-cell RNA-seq protocols and technologies require library preparation prior to sequencing on a platform such as Illumina. Here, we present the first report to use the BGISEQ-500 platform for scRNA-seq, and we compare its sensitivity and accuracy with those of Illumina sequencing. We generate a scRNA-seq resource of 468 unique single cells and 1,297 matched single cDNA samples, applying the SMARTer and Smart-seq2 protocols to mESCs and K562 cells with RNA spike-ins. We sequence these libraries on both the BGISEQ-500 and Illumina HiSeq platforms using single- and paired-end reads. The two platforms have comparable sensitivity and accuracy in quantifying gene expression, and both show low technical variability. Our study provides a standardised scRNA-seq resource to benchmark new scRNA-seq library preparation protocols and sequencing platforms.

LTMG: A novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data

10.1101/430009

2018

Cited By ~ 1

Author(s): Changlin Wan, Wennan Chang, Yu Zhang, Fenil Shah, Xiaoyu Lu, ...

Keyword(s): Gene Expression, Single Cell, Single Cells, Cell Types, R Package, Data Sets, RNA-seq, Cell Functions, Transcriptional Regulatory, A Cell

A key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells. The large number of observed zero and low expressions complicates this further. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships among transcriptional regulatory inputs, mRNA metabolism, and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cells, representing a gene's diverse expression states. Dropouts and low expressions are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG achieves significantly better goodness of fit than three other state-of-the-art models on an extensive number of single-cell data sets. In addition, several independent experimental data sets validate both our systems kinetic approach to handling low and zero expressions and the correctness of the identified multimodality. Application to data from complex tissues demonstrated that LTMG can extract varied expression states specific to cell types or cell functions. Based on LTMG, we further developed a differential gene expression test (LTMG-DGE) and a co-regulation module identification method (LTMG-GCR). We experimentally validated that LTMG-DGE has higher sensitivity and specificity in detecting differentially expressed genes than five other popular methods. LTMG-GCR is capable of retrieving the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package implementing all of these analyses is available at https://github.com/zy26/LTMGSCA.
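Treating dropouts and low values as a truncated left tail, rather than as exact observations, can be written as a likelihood. A sketch, assuming each cell's (log-scale) value is either observed above a cutoff or known only to lie at or below it, which is our censoring-style reading of "left truncated". It also assumes the mixture components are plain Gaussians. The cutoff handling, toy values, and all names are assumptions and do not come from the LTMGSCA package:

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, sd)


def normal_pdf(x: float, mu: float, sd: float) -> float:
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mu: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def left_censored_mixture_loglik(values: List[float], cutoff: float,
                                 components: List[Component]) -> float:
    """Log-likelihood of a Gaussian mixture in which values <= cutoff are left-censored.

    Values above the cutoff contribute the mixture density; values at or below it
    (dropouts and low expression) contribute only P(X <= cutoff) under the mixture.
    """
    total = 0.0
    for x in values:
        if x > cutoff:
            lik = sum(w * normal_pdf(x, mu, sd) for w, mu, sd in components)
        else:
            lik = sum(w * normal_cdf(cutoff, mu, sd) for w, mu, sd in components)
        total += math.log(lik)
    return total


# Hypothetical gene: a suppressed state near the cutoff and an active state around 5.
cells = [0.0, 0.0, 0.0, 4.6, 5.1, 5.4, 2.2]
model = [(0.5, 1.0, 1.0), (0.5, 5.0, 0.5)]
print(left_censored_mixture_loglik(cells, cutoff=1.0, components=model))
```

Fitting the mixture would mean maximizing this quantity over the weights, means, and standard deviations, for example with EM. That step is not shown here.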

Comprehensive Bulk and Single Cell Transcriptomic Characterization of SF3B1 Mutation Reveals Its Pleiotropic Effects in Chronic Lymphocytic Leukemia

Blood

10.1182/blood.v126.23.2906.2906

2015

Vol 126 (23)

pp. 2906-2906

Cited By ~ 1

Author(s): Jean Fan, Lili Wang, Angela N Brooks, Youzhong Wan, Donna S Neuberg, ...

Keyword(s): Alternative Splicing, Single Cell, Single Cells, Lymphocytic Leukemia, RNA-seq, Wild Type, Mutation Status, Total RNA, Gene Sets

Large-scale sequencing efforts have identified SF3B1 as a recurrently mutated gene in chronic lymphocytic leukemia (CLL). While SF3B1 mutations have been associated with adverse clinical outcome in CLL, mechanistic understanding of its role in the oncogenic phenotype remains lacking. We therefore undertook a comprehensive transcriptomic characterization of CLL in relation to SF3B1 mutation status at both the bulk and single-cell levels. We first profiled bulk mature poly-A selected RNA by sequencing (RNA-seq) from 37 CLLs (13 SF3B1 wild-type, 24 mutated). We identified and classified splice alterations with the tool JuncBASE. SF3B1 mutation was associated with increased alternative splicing, with the most pervasive changes in 3' splice site selection. In total, 304 alternatively spliced events were significantly associated with SF3B1 mutation, 4 of which we validated by qRT-PCR in 20 independent CLL samples with known SF3B1 mutation status. We further identified 1963 differentially expressed genes (q < 0.2) associated with SF3B1 mutation. By gene set enrichment analysis, SF3B1 mutation appeared to impact a variety of cancer- and CLL-associated gene pathways, including DNA damage response, apoptosis regulation, chromatin remodeling, RNA processing, and Notch activation (q < 0.01). Approximately 20% of these gene sets were also significantly enriched for genes exhibiting alternative splicing in association with SF3B1 mutation. As SF3B1 acts at the level of pre-mRNA, we also performed bulk RNA-seq with total RNA libraries generated from 5 CLLs (2 SF3B1 wild-type, 3 with the common K700E mutation). We again observed an enrichment of 3' splice site changes. Relative to the poly-A data analysis above, we found ~30% overlap of differentially expressed genes and ~16% overlap of enriched gene sets.
One differentially over-expressed gene associated with SF3B1 mutation unique to this total RNA data analysis and validated by total RNA qPCR of independent CLL samples was TERC, an essential RNA component of telomerase that serves as a replication template during telomeric elongation. TERC is a non-polyadenylated transcript and thus was undetected by our previous poly-A selected RNA-seq and by targeted qRT-PCR of oligo dT-generated cDNA. Recent reports have highlighted the involvement of the spliceosome in telomerase RNA processing, and shorter telomere length of CLLs with SF3B1 mutation. Thus, although further investigation will be needed, our analyses suggest a potential mechanism by which SF3B1 mutation contributes to aberrant regulation of telomerase activity. Since SF3B1 is commonly found as a subclonal mutation in CLL, and because signals obtained from bulk analyses reflect only the average characteristics of the population, we assessed the transcriptomic effects of SF3B1 mutation in single cells within a subset of CLL cases. We developed a novel and sensitive microfluidic approach that performs multiplexed targeted amplification of RNA to simultaneously detect somatic mutation status, gene expression (96 targets), and alternative splicing (45 targets) within the same individual cell for 96 to 288 cells from 5 patients with different SF3B1 mutations. From the same patient sample, single cells with SF3B1 mutation generally exhibited increased alternative splicing for events identified from the bulk analysis, thus confirming the association of SF3B1 mutation with altered splicing at the single cell level. Different SF3B1 hotspot mutations within the HEAT repeat domains exhibited similar patterns of alternative splicing while a mutation outside of the repeat domain did not. 
Furthermore, we confirmed significant changes in gene expression between SF3B1 wild-type and mutant cells for target genes involved in the Notch pathway (NCOR2), cell cycle (CDKN2A, CCND1) and apoptosis (TXNIP). Consistent with these analyses, functional studies with overexpression of full-length mutated SF3B1 in hematopoietic cell lines confirmed the modulation of these pathways by this putative CLL driver. Our high-resolution single-cell analysis further uncovered 2 transcription factors strongly associated with SF3B1 mutation that were not previously appreciated (KLF3 and KLF8). Our comprehensive transcriptomic analysis thus highlights SF3B1 mutation as an efficient mechanism for generating a complex of changes relevant to CLL biology, changes that can contribute to disease progression. Disclosures: Kipps: Pharmacyclics, AbbVie, Celgene, Genentech, AstraZeneca, Gilead Sciences: Other: Advisor. Li: Fluidigm: Employment. Livak: Fluidigm: Employment.

CCSN: Single Cell RNA Sequencing Data Analysis by Conditional Cell-specific Network

10.1101/2020.01.25.919829

2020

Author(s): Lin Li, Hao Dai, Zhaoyuan Fang, Luonan Chen

Keyword(s): Gene Expression, Single Cell, RNA Sequencing, Network Flow, Single Cells, Cellular Heterogeneity, RNA-seq, Sequencing Data, Cell Clustering, A Cell

The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared with bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which raises new computational difficulties. Based on statistical independence, the cell-specific network (CSN) method quantifies the overall associations between genes in each cell, but it tends to overestimate them because of indirect effects. To overcome this problem, we propose the “conditional cell-specific network” (CCSN) method, which measures the direct associations between genes by eliminating indirect associations. CCSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as a transformation from less “reliable” gene expression to more “reliable” gene-gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. We used a number of scRNA-seq datasets to demonstrate the advantages of our approach: (1) each cell gets its own direct association network; (2) most existing scRNA-seq methods designed for gene expression matrices also apply to CCSN-transformed degree matrices; (3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. CCSN is publicly available at http://sysbio.sibcb.ac.cn/cb/chenlab/soft/CCSN.zip.
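CCSN builds on the CSN statistic, which tests whether two genes are independent within a small neighborhood of one cell. The sketch below computes only that unconditional CSN statistic. CCSN additionally conditions on other genes to remove indirect associations, and that step is not shown. The sketch assumes neighborhoods of a fixed, hypothetical size `box`, defined as the cells closest in expression to cell k. Following our understanding of the original CSN method, it standardizes n_xy/n - (n_x/n)(n_y/n) by its standard deviation under independence:

```python
import math
from typing import List, Set


def neighborhood(values: List[float], k: int, box: int) -> Set[int]:
    """Indices of the `box` cells whose expression is closest to cell k's (ties broken by index)."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i] - values[k]))
    return set(order[:box])


def csn_statistic(x: List[float], y: List[float], k: int, box: int) -> float:
    """Standardized local-independence statistic for genes x and y in cell k.

    Requires 0 < box < len(x). Large positive values suggest x and y are associated in cell k.
    """
    n = len(x)
    nx_cells = neighborhood(x, k, box)
    ny_cells = neighborhood(y, k, box)
    n_x, n_y = len(nx_cells), len(ny_cells)
    n_xy = len(nx_cells & ny_cells)
    # Null standard deviation of n_xy/n - (n_x/n)(n_y/n), rescaled by n^2.
    null_sd = math.sqrt(n_x * n_y * (n - n_x) * (n - n_y) / (n - 1))
    return (n * n_xy - n_x * n_y) / null_sd


gene_x = [float(i) for i in range(20)]
gene_y = [2.0 * v + 1.0 for v in gene_x]  # perfectly co-ranked with gene_x
print(csn_statistic(gene_x, gene_y, k=10, box=5))  # sqrt(19), about 4.359, for this toy case
```

Repeating the calculation for every gene pair in cell k yields that cell's network. Thresholding the statistic then gives the per-cell degree matrix mentioned in the abstract.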

MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA-seq data.

10.1101/020842

2015

Cited By ~ 3

Author(s): Greg Finak, Andrew McDavid, Masanao Yajima, Jingyuan Deng, Vivian Gersuk, ...

Keyword(s): Single Cell, Gene Set Enrichment Analysis, Cellular Heterogeneity, Single Cell Level, RNA-seq, Linear Modeling, Cell Level, Modeling Framework, Extrinsic Noise, Transcriptomic Data

Single-cell transcriptomic profiling enables the unprecedented interrogation of gene expression heterogeneity in rare cell populations that would otherwise be obscured in bulk RNA sequencing experiments. The stochastic nature of transcription is revealed in the bimodality of single-cell transcriptomic data, a feature shared across single-cell expression platforms. There is, however, a paucity of computational tools that take advantage of this unique characteristic. We present a new methodology to analyze single-cell transcriptomic data that models this bimodality within a coherent generalized linear modeling framework. We propose a two-part, generalized linear model that allows one to characterize biological changes in the proportions of cells that are expressing each gene, and in the positive mean expression level of that gene. We introduce the cellular detection rate, the fraction of genes turned on in a cell, and show how it can be used to simultaneously adjust for technical variation and so-called “extrinsic noise” at the single-cell level without the use of control genes. Our model permits direct inference on statistics formed by collections of genes, facilitating gene set enrichment analysis. The residuals defined by such models can be manipulated to interrogate cellular heterogeneity and gene-gene correlation across cells and conditions, providing insights into the temporal evolution of networks of co-expressed genes at the single-cell level. Using two single-cell RNA-seq datasets, including newly generated data from Mucosal Associated Invariant T (MAIT) cells, we show how model residuals can be used to identify significant changes across biologically relevant gene sets that are missed by other methods and characterize cellular heterogeneity in response to stimulation.
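The cellular detection rate and the two quantities the two-part model describes are easy to compute descriptively. A minimal sketch, assuming a cells-by-genes matrix of already log-transformed expression in which zero means "not detected". The function names and toy matrix are hypothetical. MAST itself fits regression models for both parts rather than reporting these raw summaries:

```python
from typing import List, Tuple

Matrix = List[List[float]]  # cells x genes, e.g. log2(TPM + 1)


def cellular_detection_rate(expr: Matrix) -> List[float]:
    """Fraction of genes detected (value > 0) in each cell."""
    return [sum(1 for v in cell if v > 0) / len(cell) for cell in expr]


def hurdle_summaries(expr: Matrix, gene: int) -> Tuple[float, float]:
    """Return (fraction of cells expressing the gene, mean expression among those cells)."""
    values = [cell[gene] for cell in expr]
    positive = [v for v in values if v > 0]
    frac = len(positive) / len(values)
    pos_mean = sum(positive) / len(positive) if positive else float("nan")
    return frac, pos_mean


expr = [
    [0.0, 2.0, 1.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
print(cellular_detection_rate(expr))  # [0.5, 0.25, 1.0]
print(hurdle_summaries(expr, gene=0))  # (0.666..., 2.0)
```

In MAST the CDR enters both parts of the model as a covariate. The proportion-expressing and positive-mean parts are therefore compared between conditions only after adjusting for how many genes each cell detects.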

SIS-seq, a molecular ‘time machine’, connects single cell fate with gene programs

10.1101/403113

2018

Cited By ~ 3

Author(s): Luyi Tian, Jaring Schreuder, Daniela Zalcenstein, Jessica Tran, Nikolce Kocovski, ...

Keyword(s): Gene Expression, Single Cell, Cell Fate, Single Cells, Cell Subset, Dendritic Cell Subset, RNA-seq, Time Machine, Clonal Heterogeneity, And Function

Conventional single-cell RNA-seq methods are destructive, so a given cell cannot then also be tested for fate and function without a time machine. Here, we develop a clonal method, SIS-seq, in which single cells are allowed to divide and their progeny are assayed separately in SISter conditions: some for fate, others by RNA-seq. We cross-correlate progenitor gene expression with mature cell fate within each clone and repeat this for many clones. This identifies the earliest gene expression signatures of dendritic cell subset development. SIS-seq could be used to study other populations harboring clonal heterogeneity, including stem, reprogrammed and cancer cells, to reveal the transcriptional origins of fate decisions.
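The clone-level cross-correlation can be sketched as follows. The sketch assumes one expression vector per gene, measured in the RNA-seq sister of each clone, and one fate readout per clone. The fate readout here is a hypothetical fraction of the fate sister's progeny that became a given dendritic cell subset. Genes are ranked by Pearson correlation across clones. This illustrates the idea and is not the authors' pipeline:

```python
import math
from typing import Dict, List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rank_genes_by_fate(expr: Dict[str, List[float]],
                       fate: List[float]) -> List[Tuple[str, float]]:
    """Sort genes by the correlation of their sister-cell expression with clone fate."""
    scores = {gene: pearson(values, fate) for gene, values in expr.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data for four clones: fate fraction per clone and expression per gene.
fate = [0.1, 0.4, 0.8, 0.9]
expr = {"geneA": [1.0, 2.0, 4.0, 5.0], "geneB": [3.0, 3.5, 2.0, 1.0]}
ranked = rank_genes_by_fate(expr, fate)
print(ranked)  # geneA first (positive), geneB last (negative)
```

With many clones, genes at the top and bottom of this ranking are the candidates for early fate-associated signatures.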
