Post-modified non-negative matrix factorization for deconvoluting the gene expression profiles of specific cell types from heterogeneous clinical samples based on RNA-sequencing data

2017 ◽  
Vol 32 (11) ◽  
pp. e2929 ◽  
Author(s):  
Yuan Liu ◽  
Yu Liang ◽  
Qifan Kuang ◽  
Fanfan Xie ◽  
Yingyi Hao ◽  
...  
BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yanan Ren ◽  
Ting-You Wang ◽  
Leah C. Anderton ◽  
Qi Cao ◽  
Rendong Yang

Abstract Background Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering pathways influenced by lncRNAs is important to understand their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to address this problem, these experimental data are not available for a majority of the annotated lncRNAs. Results As a surrogate, we present lncGSEA, a convenient tool to predict the lncRNA associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.


2019 ◽  
Author(s):  
Arnav Moudgil ◽  
Michael N. Wilkinson ◽  
Xuhua Chen ◽  
June He ◽  
Alex J. Cammack ◽  
...  

AbstractIn situ measurements of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.


2019 ◽  
Author(s):  
Chen-Hao Chen ◽  
Rongbin Zheng ◽  
Jingyu Fan ◽  
Myles Brown ◽  
Jun S. Liu ◽  
...  

AbstractTo characterize the genomic distances over which transcription factors (TFs) influence gene expression, we examined thousands of TF and histone modification ChIP-seq datasets and thousands of gene expression profiles. A model integrating these data revealed two classes of TF: one with short-range regulatory influence, the other with long-range regulatory influence. The two TF classes also had distinct chromatin-binding preferences and auto-regulatory properties. The regulatory range of a single TF bound within different topologically associating domains (TADs) depended on intrinsic TAD properties such as local gene density and G/C content, but also on the TAD chromatin state in specific cell types. Our results provide evidence that most TFs belong to one of these two functional classes, and that the regulatory range of long-range TFs is chromatin-state dependent. Thus, consideration of TF type, distance-to-target, and chromatin context is likely important in identifying TF regulatory targets and interpreting GWAS and eQTL SNPs.


2020 ◽  
Author(s):  
Weimiao Wu ◽  
Qile Dai ◽  
Yunqing Liu ◽  
Xiting Yan ◽  
Zuoheng Wang

AbstractSingle-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.


2016 ◽  
Vol 22 (6) ◽  
pp. 579-592 ◽  
Author(s):  
Xiaomin Dong ◽  
Yanan You ◽  
Jia Qian Wu

The composition and function of the central nervous system (CNS) is extremely complex. In addition to hundreds of subtypes of neurons, other cell types, including glia (astrocytes, oligodendrocytes, and microglia) and vascular cells (endothelial cells and pericytes) also play important roles in CNS function. Such heterogeneity makes the study of gene transcription in CNS challenging. Transcriptomic studies, namely the analyses of the expression levels and structures of all genes, are essential for interpreting the functional elements and understanding the molecular constituents of the CNS. Microarray has been a predominant method for large-scale gene expression profiling in the past. However, RNA-sequencing (RNA-Seq) technology developed in recent years has many advantages over microarrays, and has enabled building more quantitative, accurate, and comprehensive transcriptomes of the CNS and other systems. The discovery of novel genes, diverse alternative splicing events, and noncoding RNAs has remarkably expanded the complexity of gene expression profiles and will help us to understand intricate neural circuits. Here, we discuss the procedures and advantages of RNA-Seq technology in mammalian CNS transcriptome construction, and review the approaches of sample collection as well as recent progress in building RNA-Seq-based transcriptomes from tissue samples and specific cell types.


BMC Biology ◽  
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Gabriele Partel ◽  
Markus M. Hilscher ◽  
Giorgia Milli ◽  
Leslie Solorzano ◽  
Anna H. Klemm ◽  
...  

Abstract Background Neuroanatomical compartments of the mouse brain are identified and outlined mainly based on manual annotations of samples using features related to tissue and cellular morphology, taking advantage of publicly available reference atlases. However, this task is challenging since sliced tissue sections are rarely perfectly parallel or angled with respect to sections in the reference atlas and organs from different individuals may vary in size and shape and requires manual annotation. With the advent of in situ sequencing technologies and automated approaches, it is now possible to profile the gene expression of targeted genes inside preserved tissue samples and thus spatially map biological processes across anatomical compartments. Results Here, we show how in situ sequencing data combined with dimensionality reduction and clustering can be used to identify spatial compartments that correspond to known anatomical compartments of the brain. We also visualize gradients in gene expression and sharp as well as smooth transitions between different compartments. We apply our method on mouse brain sections and show that a fully unsupervised approach can computationally define anatomical compartments, which are highly reproducible across individuals, using as few as 18 gene markers. We also show that morphological variation does not always follow gene expression, and different spatial compartments can be defined by various cell types with common morphological features but distinct gene expression profiles. Conclusion We show that spatial gene expression data can be used for unsupervised and unbiased annotations of mouse brain spatial compartments based only on molecular markers, without the need of subjective manual annotations based on tissue and cell morphology or matching reference atlases.


2021 ◽  
Vol 17 (5) ◽  
pp. e1009029
Author(s):  
Weimiao Wu ◽  
Yunqing Liu ◽  
Qile Dai ◽  
Xiting Yan ◽  
Zuoheng Wang

Single-cell RNA sequencing technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses in single-cell transcriptomic studies. We propose a new method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and ten existing imputation methods to eight single-cell transcriptomic datasets and compared their performance. Our results demonstrated that G2S3 has superior overall performance in recovering gene expression, identifying cell subtypes, reconstructing cell trajectories, identifying differentially expressed genes, and recovering gene regulatory and correlation relationships. Moreover, G2S3 is computationally efficient for imputation in large-scale single-cell transcriptomic datasets.


2020 ◽  
Author(s):  
Mikio Shiga ◽  
Shigeto Seno ◽  
Makoto Onizuka ◽  
Hideo Matsuda

AbstractUnsupervised cell clustering is important in discovering cell diversity and subpopulations. Single-cell clustering using gene expression profiles is known to show different results depending on the method of expression quantification; nevertheless, most single-cell clustering methods do not consider the method.In this article, we propose a robust and highly accurate clustering method using joint non-negative matrix factorization (joint NMF) based on multiple gene expression profiles quantified using different methods. Matrix factorization is an excellent method for dimension reduction and feature extraction of data. In particular, NMF approximates the data matrix as the product of two matrices in which all factors are non-negative. Our joint NMF can extract common factors among multiple gene expression profiles by applying each NMF to them under the constraint that one of the factorized matrices is shared among the multiple NMFs. The joint NMF determines more robust and accurate cell clustering results by leveraging multiple quantification methods compared to the conventional clustering methods, which uses only a single quantification method. In conclusion, our study showed that our clustering method using multiple gene expression profiles is more accurate than other popular methods.


Sign in / Sign up

Export Citation Format

Share Document