Detection of condition-specific marker genes from RNA-seq data with MGFR

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and is more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
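The rejection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a classifier has already produced per-cell class probabilities and that one confidence threshold per cell type is available. The function name, the threshold values and the "Unassigned" label are hypothetical, and the abstract does not say how JIND learns its thresholds, so this is not JIND's implementation.

```python
def classify_with_rejection(probs, labels, thresholds, unassigned="Unassigned"):
    """Assign each cell its most probable type, or reject it.

    probs      -- list of per-cell probability lists, one entry per label
    labels     -- cell-type names, in the same order as the probabilities
    thresholds -- dict mapping each cell type to its minimum confidence
                  (hypothetical values; how they are learned is not shown)
    """
    calls = []
    for row in probs:
        # Index of the most probable cell type for this cell.
        best = max(range(len(row)), key=row.__getitem__)
        label = labels[best]
        # Keep the call only if it clears the threshold of the predicted type.
        calls.append(label if row[best] >= thresholds[label] else unassigned)
    return calls


labels = ["T cell", "B cell"]
thresholds = {"T cell": 0.9, "B cell": 0.6}
probs = [[0.95, 0.05], [0.70, 0.30], [0.35, 0.65]]
calls = classify_with_rejection(probs, labels, thresholds)
```

Because the threshold depends on the predicted type, the second cell (0.70 for "T cell") is rejected while the third (0.65 for "B cell") is kept.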

289 COOPERATIVE EXPRESSION OF PLURIPOTENCY-RELATED GENES AND NEURAL CREST MARKER GENES IN PORCINE GFP-TRANSGENIC SKIN-DERIVED PROGENITORS

Reproduction Fertility and Development ◽

10.1071/rdv21n1ab289 ◽

2009 ◽

Vol 21 (1) ◽

pp. 241

Author(s):

M. T. Zhao ◽

C. S. Isom ◽

J. G. Zhao ◽

Y. H. Hao ◽

J. Ross ◽

...

Keyword(s):

Gene Expression ◽

Stem Cells ◽

Neural Crest ◽

Therapeutic Potential ◽

Marker Gene ◽

Stem Cell Marker ◽

Marker Genes ◽

Specific Marker ◽

Marker Gene Expression ◽

Neural Crest Origin

Recently, neural crest-derived multipotent progenitors from skin have attracted much attention, as the skin may provide an accessible, autologous source of stem cells with therapeutic potential (Toma JG et al. 2001 Nat. Cell Biol. 3, 778–784). The multipotent property of stem cells can be traced back to the expression of specific marker genes that are expressed exclusively in multipotent stem cells and not in any type of differentiated cell. Here we demonstrate the multipotency and neural crest origin of porcine GFP-transgenic skin-derived progenitors (termed pSKP) in vitro by marker gene expression analysis. The pSKP cells were isolated from the back skin of GFP-transgenic fetuses by serum-free selection culture in the presence of EGF (20 ng mL⁻¹) and bFGF (40 ng mL⁻¹), and developed into spheres in 1–2 weeks (Dyce PW et al. 2004 Biochem. Biophys. Res. Commun. 316, 651–658). Three groups of RT-PCR primers were used on total RNA from purified pSKP cells: pluripotency-related genes (Oct4, Sox2, Nanog, Stat3), neural crest marker genes (p75NGFR, Slug, Twist, Pax3, Sox9, Sox10) and lineage-specific genes (GFAP, tubulin β-III, leptin). Expression of both pluripotency-related genes and neural crest marker genes was detected in undifferentiated pSKP cells. In addition, transcripts for fibronectin, vimentin and nestin (a neural stem cell marker) were also present. The percentages of cells positive for Oct4, fibronectin and vimentin were 12.3%, 67.9% and 53.7%, respectively. Differentiation assays followed by immunocytochemistry showed the appearance of tubulin β-III-positive (39.4%) and GFAP-positive (42.6%) cells in culture, which share the characteristics of neurons and glial cells, respectively. Thus, we confirm the multiple lineage potential and neural crest origin of pSKP cells at the level of marker gene expression. This work was funded by National Institutes of Health National Center for Research Resources RR013438.

ASAP: A web-based platform for the analysis and interactive visualization of single-cell RNA-seq data

10.1101/096222 ◽

2016 ◽

Cited By ~ 5

Author(s):

Vincent Gardeux ◽

Fabrice David ◽

Adrian Shajkofci ◽

Petra C Schwalie ◽

Bart Deplancke

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Transcriptome Profiling ◽

Cell Types ◽

Complete Analysis ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Web Based ◽

Wide Range

Abstract Motivation: Single-cell RNA-sequencing (scRNA-seq) allows whole transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups in the world, yet, these groups often lack the expertise to handle complex scRNA-seq data sets. Results: We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering, and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, researchers (including those lacking computational expertise) are able to interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how we can use ASAP to simply reproduce the results from a single-cell study of 91 mouse cells involving five distinct cell types. Availability: The tool is freely available at http://[email protected]

scClustViz – Single-cell RNAseq cluster assessment and visualization

F1000Research ◽

10.12688/f1000research.16198.2 ◽

2019 ◽

Vol 7 ◽

pp. 1522 ◽

Cited By ~ 8

Author(s):

Brendan T. Innes ◽

Gary D. Bader

Keyword(s):

Gene Expression ◽

Single Cell ◽

Clustering Algorithms ◽

Expression Patterns ◽

Software Tool ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Single Experiment

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.
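The idea of scoring a clustering by the differential expression between its clusters can be sketched as follows. This is an illustrative Python sketch, not scClustViz code (the tool itself is written in R): a plain mean-difference cutoff stands in for a statistical test, and the function name, the `min_diff` parameter and the toy data are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def min_pairwise_de_genes(expr, clusters, min_diff=1.0):
    """Count differing genes for every pair of clusters.

    expr     -- dict: gene -> list of log-expression values, one per cell
    clusters -- list of cluster labels, one per cell (same cell order)
    min_diff -- mean-difference cutoff standing in for a statistical test
    Returns (smallest count over all pairs, dict of counts per pair).
    """
    labels = sorted(set(clusters))
    if len(labels) < 2:
        raise ValueError("need at least two clusters to compare")
    counts = {}
    for a, b in combinations(labels, 2):
        n = 0
        for values in expr.values():
            mean_a = mean(v for v, c in zip(values, clusters) if c == a)
            mean_b = mean(v for v, c in zip(values, clusters) if c == b)
            if abs(mean_a - mean_b) >= min_diff:
                n += 1
        counts[(a, b)] = n
    # The least-separated pair indicates whether the clustering over-splits.
    return min(counts.values()), counts


expr = {"g1": [5, 5, 0, 0], "g2": [1, 1, 1, 1], "g3": [0, 0, 3, 3]}
clusters = ["A", "A", "B", "B"]
weakest, per_pair = min_pairwise_de_genes(expr, clusters)
```

Running the same function on clusterings at several resolutions and comparing the smallest pairwise count is one simple way to see where neighbouring clusters stop being distinguishable.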

Expression Atlas update: from tissues to single cells

Nucleic Acids Research ◽

10.1093/nar/gkz947 ◽

2019 ◽

Cited By ~ 34

Author(s):

Irene Papatheodorou ◽

Pablo Moreno ◽

Jonathan Manning ◽

Alfonso Muñoz-Pomer Fuentes ◽

Nancy George ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cells ◽

Marker Gene ◽

Cell Types ◽

Added Value ◽

Rna Seq ◽

Gene And Protein Expression ◽

Expression Atlas ◽

Public Archives

Abstract Expression Atlas is EMBL-EBI’s resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.

scClustViz – Single-cell RNAseq cluster assessment and visualization

F1000Research ◽

10.12688/f1000research.16198.1 ◽

2018 ◽

Vol 7 ◽

pp. 1522 ◽

Cited By ~ 6

Author(s):

Brendan T. Innes ◽

Gary D. Bader

Keyword(s):

Gene Expression ◽

Single Cell ◽

Clustering Algorithms ◽

Expression Patterns ◽

Software Tool ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Single Experiment

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.

Spatial Maps of Hepatocellular Carcinoma Transcriptomes Highlight an Unexplored Landscape of Heterogeneity and a Novel Gene Signature for Survival

10.21203/rs.3.rs-812356/v1 ◽

2021 ◽

Author(s):

Nan Zhao ◽

Yanhui Zhang ◽

Runfen Cheng ◽

Danfang Zhang ◽

Fan Li ◽

...

Keyword(s):

Gene Expression ◽

Hepatocellular Carcinoma ◽

Drug Targets ◽

Marker Gene ◽

Risk Groups ◽

Gene Signature ◽

Low Risk ◽

Marker Genes ◽

Specific Marker ◽

Gene Profiles

Abstract Background: Hepatocellular carcinoma (HCC) often presents with satellite nodules, rendering current curative treatments ineffective in many patients. The heterogeneity of HCC is a major challenge in personalized medicine. The emergence of spatial transcriptomics (ST) provides a powerful strategy for delineating the complex molecular landscapes of tumors. Methods: In this study, we investigated tissue-wide gene expression heterogeneity in tumor and adjacent nonneoplastic tissues using ST technology. We analyzed the transcriptomes of nearly 10820 tissue regions and identified the main gene expression clusters and their specific marker genes (differentially expressed genes, DEGs) in patients. The DEGs were analyzed from two perspectives. First, we identified two distinct gene profiles associated with satellite nodules and conducted a more comprehensive analysis of both gene profiles. Their clinical relevance for human HCC was validated with KM Plotter. Second, we screened DEGs against the TCGA database to divide the HCC cohort into high- and low-risk groups according to Cox analysis. HCC patients from the ICGC cohort were used for validation. Kaplan-Meier analysis was used to compare overall survival (OS) between the high- and low-risk groups. Univariate and multivariate Cox analyses were applied to determine the independent predictors of OS. Results: Novel markers for the prediction of satellite nodules were identified, and a tumor cluster-specific marker gene signature model (6 genes) for HCC prognosis was constructed. Conclusion: The establishment of marker gene profiles may be an important step towards an unbiased view of HCC, and the 6-gene signature can be used for prognostic prediction in HCC. This analysis will help us to clarify one of the possible sources of HCC heterogeneity and to uncover pathogenic mechanisms and novel anti-tumor drug targets.
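The split into high- and low-risk groups from a gene signature can be illustrated with a small sketch. It assumes the common convention of a linear risk score (Cox coefficient times expression, summed over signature genes) with a median cutoff; the abstract does not state the cutoff or the coefficients, so the gene names, coefficient values and patient data below are hypothetical.

```python
from statistics import median


def risk_groups(expression, coefficients):
    """Score patients with a linear gene signature and split at the median.

    expression   -- dict: patient -> dict of gene -> expression value
    coefficients -- dict: gene -> Cox regression coefficient (hypothetical)
    Returns (scores per patient, "high"/"low" group per patient).
    """
    scores = {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }
    # Median cutoff is an assumption; other cutoffs are equally possible.
    cutoff = median(scores.values())
    groups = {p: "high" if s > cutoff else "low" for p, s in scores.items()}
    return scores, groups


coefficients = {"geneA": 0.5, "geneB": -1.0}
expression = {
    "p1": {"geneA": 2.0, "geneB": 0.0},
    "p2": {"geneA": 0.0, "geneB": 1.0},
    "p3": {"geneA": 4.0, "geneB": 1.0},
    "p4": {"geneA": 2.0, "geneB": 2.0},
}
scores, groups = risk_groups(expression, coefficients)
```

The resulting group labels are what a Kaplan-Meier comparison of overall survival would then be stratified by.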

scSorter: assigning cells to known cell types according to marker genes

Genome Biology ◽

10.1186/s13059-021-02281-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Hongyu Guo ◽

Jun Li

Keyword(s):

Real Data ◽

Cell Types ◽

Exact Expression ◽

Marker Genes ◽

Specific Marker ◽

Sequencing Data ◽

Reference Dataset ◽

Over Expression ◽

Higher Power ◽

Cell Type Specific

Abstract On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on an observation that the expected over-expression of marker genes is often absent in a nonnegligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to express at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power compared to existing methods.

Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BMC Bioinformatics ◽

10.1186/s12859-020-03810-0 ◽

2020 ◽

Vol 21 (S18) ◽

Author(s):

Sudipta Acharya ◽

Laizhong Cui ◽

Yi Pan

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Marker Gene ◽

Biological Data ◽

Protein Interaction Data ◽

Marker Genes ◽

Data Sets ◽

Gene Markers ◽

Multi Objective

Abstract Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature or gene selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population. Results: In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with minimal or no loss of sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion: A thorough comparative analysis has been performed against five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The results obtained show the superiority of the proposed method. The reported results are also validated through a biological significance test and heatmap plotting.

RNA-Seq Data-Mining Allows the Discovery of Two Long Non-Coding RNA Biomarkers of Viral Infection in Humans

International Journal of Molecular Sciences ◽

10.3390/ijms21082748 ◽

2020 ◽

Vol 21 (8) ◽

pp. 2748 ◽

Cited By ~ 1

Author(s):

Ruth Barral-Arca ◽

Alberto Gómez-Carballa ◽

Miriam Cebey-López ◽

María José Currás-Tuala ◽

Sara Pischedda ◽

...

Keyword(s):

Gene Expression ◽

Viral Infections ◽

Umbilical Vein ◽

Cell Types ◽

Dermal Fibroblasts ◽

Learning Approaches ◽

Rna Seq ◽

Wide Range ◽

Healthy Control ◽

Umbilical Vein Endothelial Cells

There is a growing interest in unraveling gene expression mechanisms leading to viral host invasion and infection progression. Current findings reveal that long non-coding RNAs (lncRNAs) are implicated in the regulation of the immune system by influencing gene expression through a wide range of mechanisms. By mining whole-transcriptome shotgun sequencing (RNA-seq) data using machine learning approaches, we detected two lncRNAs (ENSG00000254680 and ENSG00000273149) that are downregulated in a wide range of viral infections and different cell types, including blood mononuclear cells, umbilical vein endothelial cells, and dermal fibroblasts. The efficiency of these two lncRNAs was positively validated in different viral phenotypic scenarios. These two lncRNAs showed a strong downregulation in virus-infected patients when compared to healthy control transcriptomes, indicating that these biomarkers are promising targets for infection diagnosis. To the best of our knowledge, this is the very first study using host lncRNA biomarkers for the diagnosis of human viral infections.
