Detection of differentially expressed genes in discrete single‐cell RNA sequencing data using a hurdle model with correlated random effects

Biometrics ◽  
2019 ◽  
Vol 75 (4) ◽  
pp. 1051-1062 ◽  
Author(s):  
Michael Sekula ◽  
Jeremy Gaskins ◽  
Susmita Datta
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bobby Ranjan ◽  
Florian Schmidt ◽  
Wenjie Sun ◽  
Jinyu Park ◽  
Mohammad Amin Honardoost ◽  
...  

Abstract Background Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
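To make step (1) concrete, here is a minimal Python sketch of cross-tabulating an unsupervised and a supervised labelling of the same cells. It is not the scConsensus implementation: the function name `consensus_labels`, the `min_size` threshold, and the rule of falling back to the supervised label for rare label pairs are hypothetical choices made only for illustration.

```python
from collections import Counter


def consensus_labels(unsup, sup, min_size=2):
    """Combine two labellings of the same cells into consensus labels.

    Hypothetical rule: a cell keeps the joint label "unsup|sup" when at least
    `min_size` cells share that pair; otherwise it falls back to its
    supervised (reference-based) label.
    """
    if len(unsup) != len(sup):
        raise ValueError("both labellings must cover the same cells")
    # Contingency counts: how many cells fall into each (unsup, sup) pair.
    pair_counts = Counter(zip(unsup, sup))
    labels = []
    for u, s in zip(unsup, sup):
        if pair_counts[(u, s)] >= min_size:
            labels.append(f"{u}|{s}")
        else:
            labels.append(str(s))
    return labels


# Toy example: six cells, two unsupervised clusters, two reference labels.
print(consensus_labels([0, 0, 0, 1, 1, 1], ["T", "T", "B", "B", "B", "B"]))
# prints ['0|T', '0|T', 'B', '1|B', '1|B', '1|B']
```

The joint label keeps both sources of evidence visible, so a downstream step can decide whether an unsupervised cluster that splits across reference types should itself be split.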


2019 ◽  
Author(s):  
Florian Klimm ◽  
Enrique M. Toledo ◽  
Thomas Monfeuga ◽  
Fang Zhang ◽  
Charlotte M. Deane ◽  
...  

Abstract Recent advances in single-cell RNA sequencing (scRNA-seq) have allowed researchers to explore transcriptional function at a cellular level. In this study, we present scPPIN, a method for integrating single-cell RNA sequencing data with protein–protein interaction networks (PPINs) that detects active modules in cells of different transcriptional states. We achieve this by clustering RNA-sequencing data, identifying differentially expressed genes, constructing node-weighted PPINs, and finding the maximum-weight connected subgraphs with an exact Steiner-tree approach. As a case study, we investigate RNA-sequencing data from human liver spheroids, but the techniques described here are applicable to other organisms and tissues. scPPIN allows us to expand the output of differential expression analysis with information from protein interactions. We find that different transcriptional states have different significantly enriched subnetworks of the PPIN, which represent biological pathways. In these pathways, scPPIN also identifies proteins that are not differentially expressed but have a crucial biological function (e.g., as receptors) and therefore reveals biology beyond a standard differential expression analysis.
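Constructing a node-weighted PPIN requires turning per-gene differential-expression p-values into node scores. The minimal sketch below assumes a simple log-ratio score w = log(tau / p) with a hypothetical significance cutoff `tau`. The abstract does not specify how scPPIN scores nodes, and the published method may use a more elaborate statistical model; the gene names and p-values are made up.

```python
import math


def node_weights(pvalues, tau=0.01):
    """Map per-gene p-values to node weights w = log(tau / p).

    Genes with p < tau get positive weight (attractive for a module);
    genes with p > tau get negative weight (a cost for including them).
    `tau` is a hypothetical cutoff, not a value taken from scPPIN.
    """
    weights = {}
    for gene, p in pvalues.items():
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value for {gene!r} must lie in (0, 1]")
        weights[gene] = math.log(tau / p)
    return weights


# Hypothetical genes and p-values.
w = node_weights({"geneA": 0.001, "geneB": 0.01, "geneC": 0.5})
for gene, weight in w.items():
    print(f"{gene}: {weight:+.3f}")
```

Giving non-significant genes a negative rather than zero weight matters for the subgraph search: it makes including a gene a cost that must be repaid by the significant genes it connects.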


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Florian Klimm ◽  
Enrique M. Toledo ◽  
Thomas Monfeuga ◽  
Fang Zhang ◽  
Charlotte M. Deane ◽  
...  

Abstract Background Recent advances in single-cell RNA sequencing have allowed researchers to explore transcriptional function at a cellular level. In particular, single-cell RNA sequencing reveals that there exist clusters of cells with similar gene expression profiles, representing different transcriptional states. Results In this study, we present scPPIN, a method for integrating single-cell RNA sequencing data with protein–protein interaction networks that detects active modules in cells of different transcriptional states. We achieve this by clustering RNA-sequencing data, identifying differentially expressed genes, constructing node-weighted protein–protein interaction networks, and finding the maximum-weight connected subgraphs with an exact Steiner-tree approach. As case studies, we investigate two RNA-sequencing data sets, from human liver spheroids and human adipose tissue, respectively. With scPPIN we expand the output of differential expression analysis with information from protein interactions. We find that different transcriptional states have different significantly enriched subnetworks of the protein–protein interaction networks, which represent biological pathways. In these pathways, scPPIN identifies proteins that are not differentially expressed but have a crucial biological function (e.g., as receptors) and therefore reveals biology beyond a standard differential expression analysis. Conclusions The introduced scPPIN method can be used to systematically analyse differentially expressed genes in single-cell RNA sequencing data by integrating them with protein interaction data. The detected modules that characterise each cluster help to identify and hypothesise a biological function associated with those cells. Our analysis suggests the participation of unexpected proteins in these pathways that are undetectable from the single-cell RNA sequencing data alone. The techniques described here are applicable to other organisms and tissues.
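The maximum-weight connected subgraph (MWCS) problem can be illustrated on a toy network. The sketch below enumerates every node subset by brute force, which is only feasible for a handful of nodes; it stands in for, and is not, the exact Steiner-tree solver described in the abstract. The node names, weights, and edges are made up, and the helper names are hypothetical.

```python
from itertools import combinations


def adjacency(edge_list):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(nodes, adj):
    """Depth-first search restricted to `nodes`; True if they form one component."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbour in adj.get(current, ()):
            if neighbour in nodes and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def max_weight_connected_subgraph(weights, adj):
    """Brute-force MWCS: try every non-empty node subset (exponential time)."""
    best, best_score = set(), float("-inf")
    genes = list(weights)
    for size in range(1, len(genes) + 1):
        for subset in combinations(genes, size):
            if is_connected(subset, adj):
                score = sum(weights[g] for g in subset)
                if score > best_score:
                    best, best_score = set(subset), score
    return best, best_score


# Toy path network A-B-C-D-E with made-up node weights.
weights = {"A": 3.0, "B": -1.0, "C": 2.0, "D": -5.0, "E": 1.0}
adj = adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
module, score = max_weight_connected_subgraph(weights, adj)
print(sorted(module), score)  # prints ['A', 'B', 'C'] 4.0
```

In the toy example, node B has a negative weight yet is kept because it connects A and C. This mirrors how a connected module can include proteins that are not themselves differentially expressed.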


2020 ◽  
Author(s):  
Bobby Ranjan ◽  
Florian Schmidt ◽  
Wenjie Sun ◽  
Jinyu Park ◽  
Mohammad Amin Honardoost ◽  
...  

Clustering is a crucial step in the analysis of single-cell data. Clusters identified using unsupervised clustering are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering strategies have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. We present scConsensus, an R framework for generating a consensus clustering by (i) integrating the results from both unsupervised and supervised approaches and (ii) refining the consensus clusters using differentially expressed (DE) genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. scConsensus is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
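Step (ii) relies on genes that differ between consensus clusters. The minimal sketch below collects the union of pairwise "DE" genes across all cluster pairs. As a simplifying assumption, it flags a gene when the difference in mean log expression between two clusters reaches a hypothetical `min_diff` cutoff rather than applying a statistical test. It is not the scConsensus code, and the data are made up.

```python
from itertools import combinations
from statistics import mean


def pairwise_de_union(expr, labels, min_diff=1.0):
    """Union of genes that separate at least one pair of clusters.

    expr: dict mapping gene -> list of log-expression values, one per cell.
    labels: consensus cluster label for each cell (same order as expr lists).
    A gene is kept when |mean_a - mean_b| >= min_diff for some cluster pair;
    this mean-difference rule is a stand-in for a proper DE test.
    """
    clusters = sorted(set(labels))
    selected = set()
    for a, b in combinations(clusters, 2):
        for gene, values in expr.items():
            in_a = [v for v, lab in zip(values, labels) if lab == a]
            in_b = [v for v, lab in zip(values, labels) if lab == b]
            if abs(mean(in_a) - mean(in_b)) >= min_diff:
                selected.add(gene)
    return sorted(selected)


# Hypothetical data: six cells in three consensus clusters.
labels = ["x", "x", "y", "y", "z", "z"]
expr = {
    "g1": [5.0, 5.0, 0.0, 0.0, 0.0, 0.0],  # separates x from y and z
    "g2": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # uniform, never selected
    "g3": [0.0, 0.0, 0.0, 0.0, 3.0, 3.0],  # separates z from x and y
}
print(pairwise_de_union(expr, labels))  # prints ['g1', 'g3']
```

A gene set built this way could serve as the feature space for re-clustering; the abstract does not describe how scConsensus carries out that refinement.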


2021 ◽  
Author(s):  
Shuang Gao ◽  
Fazhan Li ◽  
Minghai Zhao ◽  
Wanqing Wu ◽  
Yuming Fu ◽  
...  

Abstract Background: Due to the lack of effective drugs, gastric cancer (GC) has a high mortality rate compared with other cancers, with a low 5-year survival rate and a poor prognosis. Thus, screening for meaningful tumor biomarkers or therapeutic targets could play a vital role in the diagnosis, treatment, prognosis, and follow-up of GC. Methods: Gene expression profiles and comprehensive clinical information of 407 patients with GC were downloaded from The Cancer Genome Atlas (TCGA) database. GC-related single-cell RNA sequencing data from the GSE118916 dataset were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between GC and normal samples were screened from the transcriptomic data using R. The DAVID database was used to analyze the functions and pathways of the DEGs. Target genes were identified by combining the DEGs with patient survival information. Interactions among the DEGs were also studied in a protein-protein interaction (PPI) network. Results: Our study identified a total of 209 DEGs that may be positively related to GC. Gene Ontology (GO) analysis indicated that the DEGs were enriched in extracellular matrix organization, extracellular structure organization, and muscle contraction. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that the DEGs were mainly enriched in focal adhesion, protein digestion and absorption, and the AGE-RAGE signaling pathway in diabetic complications. Further analysis showed that higher expression of the carboxypeptidase vitellogenic-like gene (CPVL) was associated with a better prognosis of GC patients in both the TCGA and GEO databases. FAM3 metabolism regulating signaling molecule D (FAM3D) and oxidized low-density lipoprotein receptor 1 (OLR1) were significantly associated with GC patients' prognosis only in the GEO database.
Lastly, the PPI network shows the proteins that interact most closely with the CPVL protein. Conclusion: Our study revealed that the CPVL gene could be a promising target for the diagnosis and treatment of GC, which is of great significance for future research on GC. In addition, we were the first to find a close relationship between FAM3D and GC.
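DEG screening of this kind usually combines a fold-change cutoff with multiple-testing correction. The sketch below applies Benjamini-Hochberg adjustment and then filters genes. The cutoffs (|log2FC| >= 1, adjusted p < 0.05), function names, gene names, and values are hypothetical, since the abstract does not report the thresholds or R packages used.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def screen_degs(genes, log2fc, pvalues, fc_cutoff=1.0, q_cutoff=0.05):
    """Keep genes with |log2 fold change| >= fc_cutoff and BH q < q_cutoff."""
    qvalues = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, fc, q in zip(genes, log2fc, qvalues)
        if abs(fc) >= fc_cutoff and q < q_cutoff
    ]


# Hypothetical tumour-vs-normal results for four genes.
genes = ["g1", "g2", "g3", "g4"]
log2fc = [2.0, 0.3, -1.5, 1.2]
pvalues = [0.001, 0.0001, 0.02, 0.3]
print(screen_degs(genes, log2fc, pvalues))  # prints ['g1', 'g3']
```

Here g2 has the smallest p-value but is dropped for its small fold change, and g4 passes the fold-change cutoff but not the adjusted p-value cutoff. Requiring both filters guards against statistically significant but biologically negligible changes.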

