ColorCells: a database of expression, classification and functions of lncRNAs in single cells

Author(s):  
Ling-Ling Zheng ◽  
Jing-Hua Xiong ◽  
Wu-Jian Zheng ◽  
Jun-Hao Wang ◽  
Zi-Liang Huang ◽  
...  

Abstract Although long noncoding RNAs (lncRNAs) have significant tissue specificity, their expression and variability in single cells remain unclear. Here, we developed ColorCells (http://rna.sysu.edu.cn/colorcells/), a resource for comparative analysis of lncRNA expression, classification and functions in single-cell RNA-Seq data. ColorCells was applied to 167 913 publicly available scRNA-Seq datasets from six species, and identified a batch of cell-specific lncRNAs. These lncRNAs show surprising levels of expression variability between different cell clusters, and have a cell classification ability comparable to that of known marker genes. Cell-specific lncRNAs have been identified and further validated by in vitro experiments. We found that lncRNAs are typically co-expressed with the mRNAs in the same cell cluster, which can be used to uncover lncRNAs’ functions. Our study emphasizes the need to uncover lncRNAs in all cell types and shows the power of lncRNAs as novel marker genes at single-cell resolution.

2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and is more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
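The rejection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a classifier has already produced per-class probabilities for each cell. The function name `assign_or_reject`, the threshold values and the "Unassigned" label are hypothetical and are not taken from the JIND code base, which learns its thresholds rather than hard-coding them.

```python
from typing import Dict, List


def assign_or_reject(probs: List[Dict[str, float]],
                     thresholds: Dict[str, float],
                     reject_label: str = "Unassigned") -> List[str]:
    """Give each cell its most probable type, or reject the cell when the
    top probability falls below that type's own confidence threshold."""
    labels = []
    for cell in probs:
        best = max(cell, key=cell.get)  # most probable cell type
        if cell[best] >= thresholds[best]:
            labels.append(best)
        else:
            labels.append(reject_label)
    return labels


# Hypothetical classifier outputs for three cells.
probs = [
    {"T cell": 0.95, "B cell": 0.05},
    {"T cell": 0.60, "B cell": 0.40},
    {"T cell": 0.30, "B cell": 0.70},
]
# Hypothetical per-type thresholds (assumed here, learned in practice).
thresholds = {"T cell": 0.80, "B cell": 0.65}

labels = assign_or_reject(probs, thresholds)
print(labels)
```

Using one threshold per cell type, rather than a single global cutoff, lets a type that is easily confused with its neighbours demand more confidence than a well-separated one.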


2019 ◽  
Author(s):  
Marcus Alvarez ◽  
Elior Rahmani ◽  
Brandon Jew ◽  
Kristina M. Garske ◽  
Zong Miao ◽  
...  

Abstract Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. Contrary to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data fast and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
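The general idea of separating debris from nuclei with EM can be sketched as a two-component multinomial mixture over droplet count vectors. This is a generic textbook mixture under stated assumptions, not DIEM's actual model: the function name, the toy counts, the pseudocount, and the initialization by total count (cutoff of 20) are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def em_multinomial_mixture(counts: List[List[int]],
                           init_resp: List[List[float]],
                           n_iter: int = 50,
                           pseudo: float = 1e-2
                           ) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Fit a multinomial mixture by EM, starting from soft assignments."""
    n, g, k = len(counts), len(counts[0]), len(init_resp[0])
    resp = [row[:] for row in init_resp]
    for _ in range(n_iter):
        # M-step: mixing weights and per-component gene proportions.
        pi = [sum(resp[i][c] for i in range(n)) / n for c in range(k)]
        theta = []
        for c in range(k):
            w = [pseudo + sum(resp[i][c] * counts[i][j] for i in range(n))
                 for j in range(g)]
            total = sum(w)
            theta.append([v / total for v in w])
        # E-step: posterior membership of each droplet (log-sum-exp for stability).
        for i in range(n):
            logp = [math.log(pi[c] + 1e-300)
                    + sum(counts[i][j] * math.log(theta[c][j]) for j in range(g))
                    for c in range(k)]
            m = max(logp)
            z = [math.exp(v - m) for v in logp]
            s = sum(z)
            resp[i] = [v / s for v in z]
    return resp, theta, pi


# Toy droplets over four genes: two background-enriched, two nucleus-enriched.
counts = [
    [8, 7, 1, 0], [9, 6, 0, 1], [7, 8, 1, 1],        # debris-like droplets
    [2, 1, 30, 25], [1, 2, 28, 31], [3, 1, 27, 29],  # nucleus-like droplets
]
# Assumed initialization: low-count droplets start as probable debris.
init = [[0.9, 0.1] if sum(row) < 20 else [0.1, 0.9] for row in counts]

resp, theta, pi = em_multinomial_mixture(counts, init)
is_debris = [r[0] > 0.5 for r in resp]  # component 0 is the debris component
print(is_debris)
```

Droplets whose posterior favours the debris component would then be filtered before clustering; a fuller model would use one component per cell type rather than a single "cell" component.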


2018 ◽  
Author(s):  
Douglas Abrams ◽  
Parveen Kumar ◽  
R. Krishna Murthy Karuturi ◽  
Joshy George

Abstract Background: The advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies and analytical packages available to guide experimental design and to devise a suitable analysis procedure for cell type identification. Results: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis, and implemented it in an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying number of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment. Conclusions: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either the Seurat or the Simlr algorithm can be used to analyze a single cell dataset for any number of single cells isolated (a minimum of 1000 single cells was tested). However, when marker genes are expected to be upregulated only up to a fold change of 2, the choice of single cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining the single cell experimental design.


2021 ◽  
Author(s):  
Zhengyu Ouyang ◽  
Nathanael Bourgeois ◽  
Eugenia Lyashenko ◽  
Paige Cundiff ◽  
Patrick F Cullen ◽  
...  

Induced pluripotent stem cell (iPSC) derived cell types are increasingly employed as in vitro model systems for drug discovery. For these studies to be meaningful, it is important to understand the reproducibility of the iPSC-derived cultures and their similarity to equivalent endogenous cell types. Single-cell and single-nucleus RNA sequencing (RNA-seq) are useful to gain such understanding, but they are expensive and time-consuming, while bulk RNA-seq data can be generated more quickly and at lower cost. In silico cell type decomposition is an efficient, inexpensive, and convenient alternative that can leverage bulk RNA-seq to derive more fine-grained information about these cultures. We developed CellMap, a computational tool that derives cell type profiles from publicly available single-cell and single-nucleus datasets to infer cell types in bulk RNA-seq data from iPSC-derived cell lines.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Qingnan Liang ◽  
Rachayata Dharmat ◽  
Leah Owen ◽  
Akbar Shakoor ◽  
Yumei Li ◽  
...  

Abstract Single-cell RNA-seq is a powerful tool for decoding the heterogeneity of complex tissues by generating transcriptomic profiles of individual cells. Here, we report a single-nuclei RNA-seq (snRNA-seq) transcriptomic study on human retinal tissue, which is composed of multiple cell types with distinct functions. Six samples from three healthy donors are profiled and high-quality RNA-seq data is obtained for 5873 single nuclei. All major retinal cell types are observed and marker genes for each cell type are identified. The gene expression of the macular and peripheral retina is compared at the cell-type level. Furthermore, our dataset shows an improved power for prioritizing genes associated with human retinal diseases compared to both mouse single-cell RNA-seq and human bulk RNA-seq results. In conclusion, we demonstrate that obtaining single cell transcriptomes from human frozen tissues can provide insight missed by either human bulk RNA-seq or animal models.


2020 ◽  
Author(s):  
Siamak Yousefi ◽  
Hao Chen ◽  
Jesse F. Ingels ◽  
Melinda S. McCarty ◽  
Arthur G. Centeno ◽  
...  

SUMMARY Single cell RNA sequencing has enabled quantification of single cells and identification of different cell types and subtypes as well as cell functions in different tissues. Single cell RNA sequence analyses assume that acquired RNAs correspond to cells; however, RNAs from contamination within the input data are also captured by these assays. The sequencing of background contamination, as well as unwanted cells making their way to the final assay, can potentially confound the correct biological interpretation of single cell transcriptomic data. Here we demonstrate two approaches to deal with background contamination as well as profiling of unwanted cells in the assays. We use three real-life datasets of whole-cell capture and nucleotide single-cell captures generated by Fluidigm and 10x technologies and show that these methods reduce the effect of contamination, strengthen clustering of cells and improve biological interpretation.


2019 ◽  
Author(s):  
Ayshwarya Subramanian ◽  
Eriene-Heidi Sidhom ◽  
Maheswarareddy Emani ◽  
Nareh Sahakian ◽  
Katherine Vernon ◽  
...  

Abstract Human iPSC-derived kidney organoids have the potential to revolutionize discovery, but assessing their consistency and reproducibility across iPSC lines and reducing the generation of off-target cells remain open challenges. Here, we used single cell RNA-Seq (scRNA-Seq) to profile 415,775 cells to show that organoid composition and development are comparable to human fetal and adult kidneys. Although cell classes were largely reproducible across iPSC lines, time points, protocols, and replicates, cell proportions were variable between different iPSC lines. Off-target cell proportions were the most variable. Prolonged in vitro culture did not alter cell types, but organoid transplantation under the mouse kidney capsule diminished off-target cells. Our work shows how scRNA-seq can help score organoids for reproducibility, faithfulness and quality, that kidney organoids derived from different iPSC lines are comparable surrogates for human kidney, and that transplantation enhances their formation by diminishing off-target cells.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 3887-3887
Author(s):  
Moosa Qureshi ◽  
Fernando Calero-Nieto ◽  
Iwo Kucinski ◽  
Sarah Kinston ◽  
George Giotopoulos ◽  
...  

Abstract The C/EBPα transcription factor plays a pivotal role in myeloid differentiation and E2F-mediated cell cycle regulation. Although CEBPA mutations are common in acute myeloid leukaemia (AML), little is known regarding pre-leukemic alterations caused by mutated CEBPA. Here, we investigated early events involved in pre-leukemic transformation driven by CEBPA N321D in the LMPP-like cell line Hoxb8-FL (Redecke et al., Nat Methods 2013), which can be maintained in vitro as a self-renewing LMPP population using Flt3L and estradiol, as well as differentiated both in vitro and in vivo into myeloid and lymphoid cell types. Hoxb8-FL cells were retrovirally transduced with Empty Vector (EV), wild-type CEBPA (CEBPA WT) or its N321D mutant form (CEBPA N321D). CEBPA WT-transduced cells showed increased expression of CD11b and SIRPα and downregulation of c-kit, suggesting that wild-type CEBPA was sufficient to promote differentiation even under LMPP growth conditions. Interestingly, we did not observe the same phenotype in CEBPA N321D-transduced cells. Upon withdrawal of estradiol, both EV and CEBPA WT-transduced cells differentiated rapidly into a conventional dendritic cell (cDC) phenotype by day 7 and died within 12 days. By contrast, CEBPA N321D-transduced cells continued to grow for more than 56 days, with an initial cDC phenotype but by day 30 demonstrating a plasmacytoid dendritic cell precursor phenotype. CEBPA N321D-transduced cells were morphologically distinct from EV-transduced cells. To test leukemogenic potential in vivo, we performed transplantation experiments in lethally irradiated mice. Serial monitoring of peripheral blood demonstrated that Hoxb8-FL derived cells had disappeared by 4 weeks, and did not reappear. However, at 6 months CEBPA N321D-transduced cells could still be detected in bone marrow, in contrast to EV-transduced cells, but without any leukemic phenotype.
To identify early events involved in pre-leukemic transformation, the differentiation profiles of EV, CEBPA WT and CEBPA N321D-transduced cells were examined with single cell RNA-seq (scRNA-seq). 576 single cells were taken from 3 biological replicates at days 0 and 5 post-differentiation, and analysed using the Automated Single-Cell Analysis Pipeline (Gardeux et al., Bioinformatics 2017). Visualisation by t-SNE (Fig 1) demonstrated: (i) CEBPA WT-transduced cells formed a distinct cluster at day 0 before withdrawal of estradiol; (ii) CEBPA N321D-transduced cells separated from EV and CEBPA WT-transduced cells after 5 days of differentiation; and (iii) two subpopulations could be identified within the CEBPA N321D-transduced cells at day 5, with a cluster of five CEBPA N321D-transduced single cells distributed amongst or very close to the day 0 non-differentiated cells. Differential expression analysis identified 224 genes upregulated and 633 genes downregulated specifically in the CEBPA N321D-transduced cells when compared to EV cells after 5 days of differentiation. This gene expression signature revealed that CEBPA N321D-transduced cells switched on a HSC/MEP/CMP transcriptional program and switched off a myeloid dendritic cell program. Finally, to further dissect the effect of the N321D mutation, the binding profile of endogenous and CEBPA N321D was compared by ChIP-seq before and after 5 days of differentiation. Integration with scRNA-seq data identified 160 genes specifically downregulated in CEBPA N321D-transduced cells which were associated with the binding of the mutant protein. This list of genes included genes previously implicated in dendritic cell differentiation (such as NOTCH2, JAK2), as well as a number of genes not previously implicated in the evolution of AML, representing potentially novel therapeutic targets. Disclosures No relevant conflicts of interest to declare.


2021 ◽  
Author(s):  
Lorenzo Martini ◽  
Roberta Bardini ◽  
Stefano Di Carlo

The mammalian cortex contains a great variety of neuronal cells. In particular, GABAergic interneurons, which play a major role in neuronal circuit function, exhibit an extraordinary diversity of cell types. In this regard, single-cell RNA-seq analysis is crucial to study cellular heterogeneity. To identify and analyze rare cell types, it is necessary to reliably label cells through known markers. All related studies therefore depend on the quality of the employed marker genes. In this work, we investigate how a set of chosen inhibitory interneuron markers performs. The gene set consists of both immunohistochemistry-derived genes and single-cell RNA-seq taxonomy ones. We employed various human and mouse datasets of the brain cortex, subsequently processed with the Monocle3 pipeline. We defined metrics based on the relations between unsupervised cluster results and the marker expression. Specifically, we calculated the specificity, the fraction of cells expressing, and some metrics derived from decision tree analysis, such as entropy gain and impurity reduction. The results highlighted the strong reliability of some markers but also the low quality of others. More interestingly, though, a correlation emerges between the general performance of the gene set and the experimental quality of the datasets. Therefore, the proposed method allows evaluating the quality of a dataset in relation to its reliability for the study of inhibitory interneuron cellular heterogeneity.
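The kinds of metrics listed in this abstract can be illustrated with a minimal sketch for one marker and one target cluster. The definitions used here are assumptions for illustration, since the abstract does not give formulas: "fraction expressing" is the share of target-cluster cells that express the marker, "specificity" is the share of non-target cells that do not express it, and entropy gain and Gini impurity reduction are computed for a single decision-tree split on "marker expressed or not". The function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List


def entropy(labels: List[bool]) -> float:
    """Binary Shannon entropy (bits) of in-cluster membership labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def gini(labels: List[bool]) -> float:
    """Binary Gini impurity of in-cluster membership labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def marker_metrics(expressed: List[bool], in_cluster: List[bool]) -> Dict[str, float]:
    """Score one marker against one cluster; assumes both groups are non-empty."""
    n = len(expressed)
    pos = [c for e, c in zip(expressed, in_cluster) if e]      # cells expressing
    neg = [c for e, c in zip(expressed, in_cluster) if not e]  # cells not expressing
    n_in = sum(in_cluster)
    n_out = n - n_in
    false_pos = len(pos) - sum(pos)  # expressing cells outside the cluster

    def reduction(impurity) -> float:
        # Parent impurity minus the size-weighted impurity of the two children.
        return (impurity(in_cluster)
                - len(pos) / n * impurity(pos)
                - len(neg) / n * impurity(neg))

    return {
        "fraction_expressing": sum(pos) / n_in,
        "specificity": (n_out - false_pos) / n_out,
        "entropy_gain": reduction(entropy),
        "impurity_reduction": reduction(gini),
    }


# Toy example: 8 cells, the first 4 belong to the target cluster.
in_cluster = [True, True, True, True, False, False, False, False]
expressed = [True, True, True, False, True, False, False, False]

metrics = marker_metrics(expressed, in_cluster)
print(metrics)
```

A perfect marker would score 1.0 on the first two metrics and recover the full parent entropy (1 bit here) as gain; the toy marker, with one miss and one false positive, scores well below that.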


2021 ◽  
Author(s):  
Wenjing Ma ◽  
Sumeet Sharma ◽  
Peng Jin ◽  
Shannon L Gourley ◽  
Zhaohui Qin

The rapid proliferation of single-cell RNA-sequencing (scRNA-seq) datasets has revealed cell heterogeneity at unprecedented scales. Several deconvolution methods have been developed to decompose bulk experiments to reveal cell type contributions. However, these methods lack power in identifying the accurate cell type composition when the reference dataset contains a considerable number of sub-cell types. Here, we present LRcell, an R Bioconductor package (http://bioconductor.org/packages/release/bioc/html/LRcell.html) that aims to identify the specific sub-cell type(s) that drive the changes observed in a bulk RNA-seq differential gene expression experiment. In addition, LRcell provides pre-embedded marker genes computed from putative single-cell RNA-seq experiments as options to execute the analyses.

