GeneSwitches: Ordering gene-expression and functional events in single-cell experiments

2019 ◽  
Author(s):  
Elaine Y. Cao ◽  
John F. Ouyang ◽  
Owen J.L. Rackham

Abstract Summary Emerging single-cell RNA-seq technologies have made it possible to capture and assess the gene expression of individual cells. Based on the similarity of gene expression profiles, many tools have been developed to generate an in silico ordering of cells in the form of pseudo-time trajectories. However, these tools do not provide a means to find the ordering of critical gene expression changes over pseudo-time. We present GeneSwitches, a tool that takes any single-cell pseudo-time trajectory and determines the precise order of gene-expression and functional-event changes over time. GeneSwitches uses a statistical framework based on logistic regression to identify the order in which genes are either switched on or off along pseudo-time. With this information, users can identify the order in which surface markers appear, investigate how functional ontologies are gained or lost over time, and compare the ordering of switching genes from two related pseudo-temporal processes. Availability GeneSwitches is available at https://geneswitches.ddnetbio.com Contact owen.rackham@duke-nus.edu.sg Supplementary Information is available at http://www.ddnetbio.com/files/GeneSwitches_SI.pdf

2020 ◽  
Vol 36 (10) ◽  
pp. 3273-3275
Author(s):  
Elaine Y Cao ◽  
John F Ouyang ◽  
Owen J L Rackham

Abstract Summary Emerging single-cell RNA-sequencing technologies have made it possible to capture and assess the gene expression of individual cells. Based on the similarity of gene expression profiles, many tools have been developed to generate an in silico ordering of cells in the form of pseudo-time trajectories. However, these tools do not provide a means to find the ordering of critical gene expression changes over pseudo-time. We present GeneSwitches, a tool that takes any single-cell pseudo-time trajectory and determines the precise order of gene expression and functional-event changes over time. GeneSwitches uses a statistical framework based on logistic regression to identify the order in which genes are either switched on or off along pseudo-time. With this information, users can identify the order in which surface markers appear, investigate how functional ontologies are gained or lost over time and compare the ordering of switching genes from two related pseudo-temporal processes. Availability GeneSwitches is available at https://geneswitches.ddnetbio.com. Supplementary information Supplementary data are available at Bioinformatics online.
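The logistic-regression idea in the abstract above can be illustrated with a minimal sketch. It is not the GeneSwitches implementation: it assumes each gene has already been reduced to a binary on/off state per cell (an assumption of this sketch), uses synthetic data, and the names `fit_logistic` and `switch_point` are hypothetical. The switch time is taken as the pseudo-time at which the fitted probability of being "on" crosses 0.5.

```python
import numpy as np

def fit_logistic(t, y, n_iter=50, ridge=1e-6):
    """Fit P(on | t) = sigmoid(b0 + b1 * t) by Newton-Raphson."""
    X = np.column_stack([np.ones(len(t)), t])
    beta = np.zeros(2)
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)   # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * w[:, None]).T @ X + ridge * np.eye(2)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def switch_point(t, y):
    """Return (pseudo-time where P(on) = 0.5, 'on' or 'off' direction)."""
    b0, b1 = fit_logistic(t, y)
    direction = "on" if b1 > 0 else "off"
    return -b0 / b1, direction

# Synthetic gene (assumed data): switches on around pseudo-time 0.6.
rng = np.random.default_rng(0)
pseudotime = rng.uniform(0.0, 1.0, size=500)
p_on = 1.0 / (1.0 + np.exp(-12.0 * (pseudotime - 0.6)))
state = (rng.random(500) < p_on).astype(float)

est_time, est_direction = switch_point(pseudotime, state)
print(f"estimated switch at {est_time:.2f}, direction: {est_direction}")
```

Running the same fit per gene and sorting genes by their estimated switch time would give an ordering of switching events along the trajectory.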


Author(s):  
Bong-Hyun Kim ◽  
Kijin Yu ◽  
Peter C W Lee

Abstract Motivation Cancer classification based on gene expression profiles has provided insight into the causes of cancer and cancer treatment. Recently, machine learning-based approaches have been attempted in downstream cancer analysis to address the large differences in gene expression values, as determined by single-cell RNA sequencing (scRNA-seq). Results We designed cancer classifiers that can identify 21 types of cancers and normal tissues based on bulk RNA-seq as well as scRNA-seq data. Training was performed with 7398 cancer samples and 640 normal samples from 21 tumors and normal tissues in TCGA based on the 300 most significant genes expressed in each cancer. Then, we compared neural network (NN), support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF) methods. The NN performed consistently better than other methods. We further applied our approach to scRNA-seq transformed by kNN smoothing and found that our model successfully classified cancer types and normal samples. Availability and implementation Cancer classification by neural network. Supplementary information Supplementary data are available at Bioinformatics online.
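As an illustration of one of the four compared methods, the sketch below implements a plain k-nearest-neighbors classifier over expression vectors. It is not the authors' pipeline: the data are synthetic, the two class labels and the 20-gene feature size are made up for the example, and `knn_predict` is a hypothetical helper name.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query_x, k=5):
    """Label each query profile by majority vote among its k nearest training profiles."""
    preds = []
    for q in query_x:
        dist = np.linalg.norm(train_x - q, axis=1)   # Euclidean distance to every training sample
        nearest = np.argsort(dist)[:k]
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds

# Synthetic expression data (assumed): 20 selected genes, two well-separated classes.
rng = np.random.default_rng(0)
n_genes = 20
tumor = rng.normal(3.0, 1.0, size=(40, n_genes))
normal = rng.normal(0.0, 1.0, size=(40, n_genes))
train_x = np.vstack([tumor, normal])
train_y = ["tumor"] * 40 + ["normal"] * 40

query_x = np.vstack([rng.normal(3.0, 1.0, size=(5, n_genes)),
                     rng.normal(0.0, 1.0, size=(5, n_genes))])
expected = ["tumor"] * 5 + ["normal"] * 5
preds = knn_predict(train_x, train_y, query_x)
print(preds)
```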


Author(s):  
Meichen Dong ◽  
Aatish Thennavan ◽  
Eugene Urrutia ◽  
Yun Li ◽  
Charles M Perou ◽  
...  

Abstract Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.
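The deconvolution problem described above can be sketched as a non-negative least-squares fit: a bulk profile is modeled as a mixture of cell-type reference profiles, and the mixing proportions are estimated. This is a generic illustration on synthetic data, not SCDC's algorithm. The function `deconvolve`, the projected-gradient solver, and the fixed ensemble weights are all assumptions of this sketch; the abstract does not say how SCDC chooses its weights.

```python
import numpy as np

def deconvolve(basis, bulk, n_iter=5000):
    """Estimate non-negative cell-type proportions p such that basis @ p ~ bulk.

    basis: (n_genes, n_cell_types) reference expression profiles
    bulk:  (n_genes,) bulk expression profile
    Solved by projected gradient descent, then normalized to sum to 1.
    """
    step = 1.0 / np.linalg.norm(basis, 2) ** 2       # 1 / Lipschitz constant of the gradient
    p = np.full(basis.shape[1], 1.0 / basis.shape[1])
    for _ in range(n_iter):
        grad = basis.T @ (basis @ p - bulk)
        p = np.maximum(p - step * grad, 0.0)         # project onto p >= 0
    return p / p.sum()

# Synthetic reference and bulk sample (assumed data).
rng = np.random.default_rng(0)
basis_a = rng.gamma(shape=2.0, scale=1.0, size=(50, 3))
true_props = np.array([0.6, 0.3, 0.1])
bulk = basis_a @ true_props

est_a = deconvolve(basis_a, bulk)

# A second, perturbed reference stands in for a dataset from another laboratory.
basis_b = basis_a * rng.uniform(0.8, 1.2, size=basis_a.shape)
est_b = deconvolve(basis_b, bulk)

# Hypothetical ensemble step: weighted average with fixed, made-up weights.
combined = np.average([est_a, est_b], axis=0, weights=[0.7, 0.3])
print(est_a, est_b, combined)
```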


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Thomas M. Adams ◽  
Tjelvar S. G. Olsson ◽  
Ricardo H. Ramírez-González ◽  
Ruth Bryant ◽  
Rosie Bryson ◽  
...  

Abstract Background Transcriptomics is being increasingly applied to generate new insight into the interactions between plants and their pathogens. For the wheat yellow (stripe) rust pathogen (Puccinia striiformis f. sp. tritici, Pst) RNA-based sequencing (RNA-Seq) has proved particularly valuable, overcoming the barriers associated with its obligate biotrophic nature. This includes the application of RNA-Seq approaches to study Pst and wheat gene expression dynamics over time and the Pst population composition through the use of a novel RNA-Seq based surveillance approach called “field pathogenomics”. As a dual RNA-Seq approach, the field pathogenomics technique also provides gene expression data from the host, giving new insight into host responses. However, this has created a wealth of data for interrogation. Results Here, we used the field pathogenomics approach to generate 538 new RNA-Seq datasets from Pst-infected field wheat samples, doubling the amount of transcriptomics data available for this important pathosystem. We then analysed these datasets alongside 66 RNA-Seq datasets from four Pst infection time-courses and 420 Pst-infected plant field and laboratory samples that were publicly available. A database of gene expression values for Pst and wheat was generated for each of these 1024 RNA-Seq datasets and incorporated into the development of the rust expression browser (http://www.rust-expression.com). This enables, for the first time, simultaneous ‘point-and-click’ access to gene expression profiles for Pst and its wheat host and represents the largest database of processed RNA-Seq datasets available for any of the three Puccinia wheat rust pathogens. We also demonstrated the utility of the browser through investigation of expression of putative Pst virulence genes over time and examined the host plant's response to Pst infection.
Conclusions The rust expression browser offers immense value to the wider community, facilitating data sharing and transparency, and the underlying database can be continually expanded as more datasets become publicly available.


2019 ◽  
Author(s):  
Meichen Dong ◽  
Aatish Thennavan ◽  
Eugene Urrutia ◽  
Yun Li ◽  
Charles M. Perou ◽  
...  

Abstract Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.


2018 ◽  
Author(s):  
Brandon Monier ◽  
Adam McDermaid ◽  
Jing Zhao ◽  
Anne Fennell ◽  
Qin Ma

Abstract Motivation Next-Generation Sequencing has made much more large-scale genomic and transcriptomic data available. Studies with RNA-sequencing (RNA-seq) data typically involve generation of gene expression profiles that can be further analyzed, often involving differential gene expression (DGE). This process enables comparison across samples of two or more factor levels. A recurring issue with DGE analyses is the complicated nature of the comparisons to be made, in which a variety of factor combinations, pairwise comparisons, and main or blocked main effects need to be tested. Results Here we present a tool called IRIS-DGE, which is a server-based DGE analysis tool developed using Shiny. It provides a straightforward, user-friendly platform for performing comprehensive DGE analysis, and crucial analyses that help design hypotheses and determine key genomic features. IRIS-DGE integrates the three most commonly used R-based DGE tools to determine differentially expressed genes (DEGs) and includes numerous methods for performing preliminary analysis on user-provided gene expression information. Additionally, this tool integrates a variety of visualizations, in a highly interactive manner, for improved interpretation of preliminary and DGE analyses. Availability IRIS-DGE is freely available at http://bmbl.sdstate.edu/IRIS/ [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Johan Gustafsson ◽  
Felix Held ◽  
Jonathan Robinson ◽  
Elias Björnson ◽  
Rebecka Jörnsten ◽  
...  

Abstract Background Cell-type specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. Results We evaluated different normalization methods, quantified the magnitude of variation introduced by different sources, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We applied methods such as random forest regression to a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that the technical variation across laboratories is of the same magnitude as the biological variation across cell types. Tissue of origin and cell subtype are less important but still substantial factors, while the difference between individuals is relatively small. We also show that much of the difference between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Conclusions Our work shows the importance of either matching or correcting for technical factors when creating cell-type specific gene expression profiles that are to be used together with bulk samples.


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 1231-1231
Author(s):  
Chih Long Liu ◽  
Bo Dai ◽  
Aaron M. Newman ◽  
Ravi Majeti ◽  
Ash A Alizadeh

Abstract Abstract 1231 Background: Current methods for defining and isolating human hematopoietic stem and progenitor cells using surface markers enrich for unique functional properties of these populations. However, significant functional heterogeneity in these compartments remains with important implications for understanding normal and altered hematopoiesis. Using flow sorting to enrich >10,000 cells as progenitor subpopulations, we previously characterized the gene expression signature of normal human HSC (Majeti et al. 2009 PNAS 106(9):3396–3401). We hypothesized that interrogation of the transcriptomes of single cells from this compartment could resolve remaining heterogeneity and help identify and better define features of progenitor cells and hematopoietic stem cells (HSCs). Methods: Using normal human bone marrow aspirates and a FACS Aria II instrument equipped with a specialized single-cell sorting apparatus, we sorted cells enriched for HSCs based on expression of Lin-CD34+CD38-CD90+CD45RA− into 1-cell, 10-cell, 100-cell, and 40000-cell (bulk) representations. We used at least 5 replicates per group and verified single cell deposition by direct visualization. We amplified cDNA from these corresponding inputs using an exponential whole transcriptome amplification (WTA) scheme (Miltenyi SuperAmp), and evaluated gene expression profiles by two microarray platforms (Agilent/GE Healthcare 60K, and Affymetrix U133 plus 2.0), and by RNA-Seq (Illumina). We used gene expression correlation between replicates within and between microarrays as a means of assessing methodological reproducibility and estimating population heterogeneity. Results: Whole transcriptome amplification yielded cDNA ranging from 0.2–1 kb for 10 and 100 cells, with significantly lower size distribution of amplified cDNA observed for single cells.
Gene expression profiles had significantly better replicate reproducibility and array coverage with the Agilent microarray platform when compared with the Affymetrix U133 Plus 2.0 platform (gene coverage of 84% for 100 cells, 73% for 10 cells and 50% for 1 cell for Agilent vs 24% for 100 cells, 11% for 10 cells and 5.7% for 1 cell for Affymetrix). RNA-Seq profiling of the same populations is ongoing with major technical optimizations focused on reducing amplification of non-human templates while maintaining library complexity and representation. Using biological replicates for each input size, we observed high inter-replicate correlation levels for expression profiles obtained for bulk sorted HSCs from 8 healthy donors (∼40000-cells, average r=0.97) and for 100-cell and 10-cell inputs from a single donor (r=0.96–0.99, respectively). While intra-array concordance of replicate measurements (n=14642) was high (r>0.91) within each of 5 single cells from a single donor, comparison of 5 single cells from the same donor identified significant heterogeneity, when compared to the 10-cell and 100-cell sub-clusters (Figure 1). Individual genes characteristically expressed by these heterogeneous single cell populations are currently being investigated by FACS and Fluidigm arrays. A larger experiment characterizing 192 single progenitor cells, employing Agilent microarrays and RNA-Seq is currently in progress. Conclusions: Single cell transcriptome profiling is feasible, with best performance on 60-mer microarrays. Single cell transcriptomes exhibit lower but reasonable levels of reproducibility (r>0.7) and precision as compared with higher cell numbers. Gene expression profiles of single cells capture gene expression heterogeneity in HSCs. Disclosures: No relevant conflicts of interest to declare.
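The replicate-correlation measure used in this abstract can be sketched as the mean pairwise Pearson correlation across replicate expression profiles. The example below uses synthetic profiles, not the study's data. The noise levels are arbitrary stand-ins for "bulk-like" and "single-cell-like" inputs, and `replicate_correlation` is a hypothetical helper name.

```python
import numpy as np

def replicate_correlation(profiles):
    """Mean pairwise Pearson r across replicates; profiles is (n_replicates, n_genes)."""
    r = np.corrcoef(profiles)                 # rows are treated as variables (replicates)
    upper = np.triu_indices_from(r, k=1)      # each replicate pair counted once
    return r[upper].mean()

# Synthetic data (assumed): one shared expression signal plus replicate-specific noise.
rng = np.random.default_rng(0)
signal = rng.normal(8.0, 2.0, size=1000)
bulk_like = signal + rng.normal(0.0, 0.3, size=(5, 1000))     # low technical noise
single_like = signal + rng.normal(0.0, 2.0, size=(5, 1000))   # high noise and heterogeneity

bulk_r = replicate_correlation(bulk_like)
single_r = replicate_correlation(single_like)
print(f"bulk-like r = {bulk_r:.2f}, single-cell-like r = {single_r:.2f}")
```

Lower inter-replicate correlation at small input sizes can reflect either technical noise or genuine cell-to-cell heterogeneity; the correlation alone does not separate the two.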


2020 ◽  
Author(s):  
Johan Gustafsson ◽  
Felix Held ◽  
Jonathan Robinson ◽  
Elias Björnson ◽  
Rebecka Jörnsten ◽  
...  

Abstract Cell-type specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. We evaluated different normalization methods, quantified the variance explained by different factors, evaluated the effect on deconvolution of cell type fractions, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We investigated a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that the technical variation across laboratories is substantial, even for genes specifically selected for deconvolution, and has a confounding effect on deconvolution. Tissue of origin is also a substantial factor, highlighting the challenge of applying cell type profiles derived from blood to mixtures from other tissues. We also show that much of the difference between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Our work shows the importance of either matching or correcting for technical factors when creating cell-type specific gene expression profiles that are to be used together with bulk samples.

