Detection of condition-specific marker genes from RNA-seq data with MGFR

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6970 ◽  
Author(s):  
Khadija El Amrani ◽  
Gregorio Alanis-Lobato ◽  
Nancy Mah ◽  
Andreas Kurtz ◽  
Miguel A. Andrade-Navarro

The identification of condition-specific genes is key to advancing our understanding of cell fate decisions and disease development. Differential gene expression analysis (DGEA) has been the standard tool for this task. However, the number of samples that modern transcriptomic technologies allow us to study makes DGEA a daunting task. On the other hand, experiments with low numbers of replicates lack the statistical power to detect differentially expressed genes. We have previously developed MGFM, a tool for marker gene detection from microarrays that is particularly useful in the latter case. Here, we have adapted the algorithm behind MGFM to detect markers in RNA-seq data. MGFR groups samples with similar gene expression levels and flags potential markers of a sample type if their highest expression values represent all replicates of this type. We have benchmarked MGFR against other methods and found that its proposed markers accurately characterize the functional identity of different tissues and cell types in standard and single-cell RNA-seq datasets. We then performed a more detailed analysis of three of these datasets, which profile the transcriptomes of different human tissues, immune cell types and human blastocyst cell types, respectively. MGFR’s predicted markers were compared to gold-standard lists for these datasets and outperformed those of the other marker detectors. Finally, we suggest novel candidate marker genes for the examined tissues and cell types. MGFR is implemented as a freely available Bioconductor package (https://doi.org/doi:10.18129/B9.bioc.MGFR), which facilitates its use and integration with bioinformatics pipelines.
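The marker criterion described in this abstract can be illustrated with a minimal Python sketch. This is not the MGFR package (which is distributed for R through Bioconductor) and it simplifies the published algorithm: it only checks whether the highest expression values of one gene are exactly the replicates of a single sample type. The function name `candidate_marker_type` and the toy values are hypothetical.

```python
from collections import Counter


def candidate_marker_type(expression, sample_types):
    """Return the sample type this gene may mark, or None.

    expression:   expression values of ONE gene, one value per sample
    sample_types: sample-type label of each sample, in the same order

    Simplified assumption: a gene is a candidate marker of a type when
    its n highest values are exactly the n replicates of that type and
    are strictly higher than every other sample.
    """
    if len(expression) != len(sample_types):
        raise ValueError("expression and sample_types must have equal length")
    if not expression:
        return None

    replicates = Counter(sample_types)
    # Rank samples from highest to lowest expression.
    ranked = sorted(zip(expression, sample_types),
                    key=lambda pair: pair[0], reverse=True)

    top_type = ranked[0][1]
    n = replicates[top_type]
    if n == len(ranked):
        return None  # only one sample type, nothing to compare against

    # All replicates of the top-ranked type must occupy the first n ranks.
    if any(label != top_type for _, label in ranked[:n]):
        return None
    # The lowest replicate must still exceed the best sample of any other type.
    if ranked[n - 1][0] <= ranked[n][0]:
        return None
    return top_type


# Hypothetical toy data: two replicates each of three tissues.
types = ["liver", "liver", "kidney", "kidney", "heart", "heart"]
print(candidate_marker_type([9.1, 8.7, 1.2, 0.9, 1.5, 1.1], types))  # liver
print(candidate_marker_type([9.1, 1.0, 8.7, 0.9, 1.5, 1.1], types))  # None
```

In the second call one kidney replicate ranks above a liver replicate, so the top values no longer represent all replicates of a single type and the gene is not flagged.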

2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
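The rejection step mentioned in this abstract (cell-type-specific confidence thresholds) can be sketched in a few lines of Python. This is only an illustration of the general idea, not JIND's API: JIND learns its thresholds during training, whereas here the thresholds, the probabilities, the function name `assign_with_rejection` and the label `"Unassigned"` are all hypothetical.

```python
def assign_with_rejection(probabilities, thresholds, rejected_label="Unassigned"):
    """Assign one cell to its most probable type, or reject it.

    probabilities: dict mapping cell type -> predicted probability for this cell
    thresholds:    dict mapping cell type -> minimum confidence required
                   to accept a prediction of that type (assumed given here)
    """
    if not probabilities:
        raise ValueError("probabilities must not be empty")

    # Most probable cell type for this cell.
    best_type = max(probabilities, key=probabilities.get)

    # Reject the cell when the classifier is not confident enough
    # for that particular cell type.
    if probabilities[best_type] < thresholds[best_type]:
        return rejected_label
    return best_type


# Hypothetical thresholds: T cells need higher confidence than B cells.
thresholds = {"T cell": 0.8, "B cell": 0.6}
print(assign_with_rejection({"T cell": 0.55, "B cell": 0.45}, thresholds))  # Unassigned
print(assign_with_rejection({"T cell": 0.10, "B cell": 0.90}, thresholds))  # B cell
```

Using one threshold per cell type, rather than a single global cutoff, lets closely related types be held to a stricter standard than well-separated ones.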


2009 ◽  
Vol 21 (1) ◽  
pp. 241
Author(s):  
M. T. Zhao ◽  
C. S. Isom ◽  
J. G. Zhao ◽  
Y. H. Hao ◽  
J. Ross ◽  
...  

Recently, neural crest-derived multipotent progenitors from skin have attracted much attention, as the skin may provide an accessible, autologous source of stem cells with therapeutic potential (Toma JG et al. 2001 Nat. Cell Biol. 3, 778–784). The multipotent property of stem cells can be traced back to the expression of specific marker genes that are expressed exclusively in multipotent stem cells and not in any type of differentiated cell. Here we demonstrate the multipotency and neural crest origin of porcine GFP-transgenic skin-derived progenitors (termed pSKP) in vitro by marker gene expression analysis. The pSKP cells were isolated from the back skin of GFP transgenic fetuses by serum-free selection culture in the presence of EGF (20 ng mL–1) and bFGF (40 ng mL–1), and developed into spheres in 1–2 weeks (Dyce PW et al. 2004 Biochem. Biophys. Res. Commun. 316, 651–658). Three groups of RT-PCR primers were used on total RNA from purified pSKP cells: pluripotency-related genes (Oct4, Sox2, Nanog, Stat3), neural crest marker genes (p75NGFR, Slug, Twist, Pax3, Sox9, Sox10) and lineage-specific genes (GFAP, tubulin β-III, leptin). Expression of both pluripotency-related genes and neural crest marker genes was detected in undifferentiated pSKP cells. In addition, transcripts for fibronectin, vimentin and nestin (a neural stem cell marker) were also present. The percentages of cells positive for Oct4, fibronectin and vimentin were 12.3%, 67.9% and 53.7%, respectively. Differentiation assays analyzed by immunocytochemistry showed the appearance of tubulin β-III-positive (39.4%) and GFAP-positive (42.6%) cells in culture, which share the characteristics of neurons and glial cells, respectively. Thus, we confirm the multiple lineage potential and neural crest origin of pSKP cells at the level of marker gene expression. This work was funded by National Institutes of Health National Center for Research Resources RR013438.


2016 ◽  
Author(s):  
Vincent Gardeux ◽  
Fabrice David ◽  
Adrian Shajkofci ◽  
Petra C Schwalie ◽  
Bart Deplancke

Abstract Motivation: Single-cell RNA-sequencing (scRNA-seq) allows whole transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups in the world, yet these groups often lack the expertise to handle complex scRNA-seq data sets. Results: We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering, and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, researchers (including those lacking computational expertise) are able to interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how ASAP can be used to readily reproduce the results from a single-cell study of 91 mouse cells involving five distinct cell types. Availability: The tool is freely available at http://[email protected]


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1522 ◽  
Author(s):  
Brendan T. Innes ◽  
Gary D. Bader

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.


Author(s):  
Irene Papatheodorou ◽  
Pablo Moreno ◽  
Jonathan Manning ◽  
Alfonso Muñoz-Pomer Fuentes ◽  
Nancy George ◽  
...  

Abstract Expression Atlas is EMBL-EBI’s resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.


2021 ◽  
Author(s):  
Nan Zhao ◽  
Yanhui Zhang ◽  
Runfen Cheng ◽  
Danfang Zhang ◽  
Fan Li ◽  
...  

Abstract Background: Hepatocellular carcinoma (HCC) often presents with satellite nodules, rendering current curative treatments ineffective in many patients. The heterogeneity of HCC is a major challenge in personalized medicine. The emergence of spatial transcriptomics (ST) provides a powerful strategy for delineating the complex molecular landscapes of tumors. Methods: In this study, we investigated tissue-wide gene expression heterogeneity in tumor and adjacent nonneoplastic tissues using ST technology. We analyzed the transcriptomes of nearly 10820 tissue regions and identified the main gene expression clusters and their specific marker genes (differentially expressed genes, DEGs) in patients. The DEGs were analyzed from two perspectives. First, we identified two distinct gene profiles associated with satellite nodules and conducted a more comprehensive analysis of both gene profiles. Their clinical relevance for human HCC was validated with KM Plotter. Second, we screened DEGs against the TCGA database to divide the HCC cohort into high- and low-risk groups according to Cox analysis. HCC patients from the ICGC cohort were used for validation. Kaplan-Meier analysis was used to compare overall survival (OS) between the high- and low-risk groups. Univariate and multivariate Cox analyses were applied to determine the independent predictors of OS. Results: We identified novel markers for the prediction of satellite nodules and constructed a signature model of six tumor cluster-specific marker genes for HCC prognosis. Conclusion: The establishment of marker gene profiles may be an important step towards an unbiased view of HCC, and the 6-gene signature can be used for prognostic prediction in HCC. This analysis will help to clarify one of the possible sources of HCC heterogeneity and to uncover pathogenic mechanisms and novel anti-tumor drug targets.
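The abstract does not state how the Cox analysis is turned into high- and low-risk groups. A common approach, shown here purely as an assumption, is to compute a per-patient risk score as the coefficient-weighted sum of signature gene expression and to split the cohort at the median score. The gene names, coefficients and expression values below are hypothetical placeholders, not the authors' 6-gene signature.

```python
from statistics import median


def risk_scores(expression_by_patient, coefficients):
    """Weighted sum of signature-gene expression for each patient.

    expression_by_patient: dict patient -> dict gene -> expression value
    coefficients:          dict gene -> Cox regression coefficient (assumed given)
    """
    return {
        patient: sum(coefficients[gene] * expression[gene] for gene in coefficients)
        for patient, expression in expression_by_patient.items()
    }


def split_by_median(scores):
    """Label patients above the median score 'high', all others 'low'.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(scores.values())
    return {patient: ("high" if score > cutoff else "low")
            for patient, score in scores.items()}


# Hypothetical two-gene signature and four patients.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
cohort = {
    "P1": {"GENE_A": 4.0, "GENE_B": 2.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 4.0},
    "P3": {"GENE_A": 2.0, "GENE_B": 0.0},
    "P4": {"GENE_A": 0.0, "GENE_B": 0.0},
}
scores = risk_scores(cohort, coefficients)
print(split_by_median(scores))
```

The resulting group labels are what a Kaplan-Meier comparison of overall survival would then be stratified on.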


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hongyu Guo ◽  
Jun Li

Abstract On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on an observation that the expected over-expression of marker genes is often absent in a nonnegligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to express at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power compared to existing methods.


2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Abstract Background In recent years, the use of multiple genomic and proteomic sources has become immensely popular among researchers investigating challenging bioinformatics problems. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population. Results In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. We propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space while compromising sample classification efficiency minimally or not at all, and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The results obtained indicate the superiority of the proposed method. Reported results are also validated through a biological significance test and heatmap plotting.


2020 ◽  
Vol 21 (8) ◽  
pp. 2748 ◽  
Author(s):  
Ruth Barral-Arca ◽  
Alberto Gómez-Carballa ◽  
Miriam Cebey-López ◽  
María José Currás-Tuala ◽  
Sara Pischedda ◽  
...  

There is a growing interest in unraveling the gene expression mechanisms leading to viral host invasion and infection progression. Current findings reveal that long non-coding RNAs (lncRNAs) are implicated in the regulation of the immune system by influencing gene expression through a wide range of mechanisms. By mining whole-transcriptome shotgun sequencing (RNA-seq) data using machine learning approaches, we detected two lncRNAs (ENSG00000254680 and ENSG00000273149) that are downregulated in a wide range of viral infections and different cell types, including blood mononuclear cells, umbilical vein endothelial cells, and dermal fibroblasts. The efficiency of these two lncRNAs was positively validated in different viral phenotypic scenarios. These two lncRNAs showed a strong downregulation in virus-infected patients when compared to healthy control transcriptomes, indicating that these biomarkers are promising targets for infection diagnosis. To the best of our knowledge, this is the very first study using host lncRNA biomarkers for the diagnosis of human viral infections.

