Post-modified non-negative matrix factorization for deconvoluting the gene expression profiles of specific cell types from heterogeneous clinical samples based on RNA-sequencing data

Yuan Liu; Yu Liang; Qifan Kuang; Fanfan Xie; Yingyi Hao; Zhining Wen; Menglong Li

doi:10.1002/cem.2929

LncGSEA: a versatile tool to infer lncRNA associated pathways from large-scale cancer transcriptome sequencing data

BMC Genomics ◽

10.1186/s12864-021-07900-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yanan Ren ◽

Ting-You Wang ◽

Leah C. Anderton ◽

Qi Cao ◽

Rendong Yang

Keyword(s):

Gene Expression ◽

Large Scale ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Clinical Samples ◽

Sequencing Data ◽

Multiple Cancer ◽

Regulatory Pathways ◽

Cancer Transcriptome ◽

Versatile Tool

Abstract Background Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering pathways influenced by lncRNAs is important to understand their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to address this problem, these experimental data are not available for a majority of the annotated lncRNAs. Results As a surrogate, we present lncGSEA, a convenient tool to predict the lncRNA associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.
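The idea in the abstract above (rank genes by their association with an lncRNA across patient samples, then test gene sets for enrichment in that ranking) can be illustrated with a minimal Python sketch. This is not the lncGSEA implementation, which is written in R. The Pearson-correlation ranking, the function names, and the toy inputs are all assumptions made for illustration; the enrichment score is the classic weighted running-sum statistic used in Gene Set Enrichment Analysis.

```python
import numpy as np

def rank_genes_by_correlation(expr, gene_names, lnc_expr):
    """Rank genes by Pearson correlation with an lncRNA (assumed ranking metric).

    expr: genes x samples array; lnc_expr: per-sample lncRNA expression.
    Rows of expr must not be constant, otherwise the correlation is undefined.
    """
    expr = np.asarray(expr, dtype=float)
    lnc_expr = np.asarray(lnc_expr, dtype=float)
    expr_c = expr - expr.mean(axis=1, keepdims=True)
    lnc_c = lnc_expr - lnc_expr.mean()
    r = expr_c @ lnc_c / (np.linalg.norm(expr_c, axis=1) * np.linalg.norm(lnc_c))
    order = np.argsort(-r)  # most positively correlated genes first
    return [gene_names[i] for i in order], r[order]

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score over a pre-ranked gene list.

    Assumes at least one gene in the set has a non-zero score and at least
    one ranked gene lies outside the set.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    weights = np.abs(np.asarray(scores, dtype=float)) ** p
    hit = np.where(in_set, weights, 0.0)
    hit_cum = np.cumsum(hit) / hit.sum()            # step up at set members
    miss_cum = np.cumsum(~in_set) / (~in_set).sum() # step down elsewhere
    running = hit_cum - miss_cum
    return running[np.argmax(np.abs(running))]      # maximum deviation from zero
```

A positive score suggests the gene set is concentrated among genes positively correlated with the lncRNA, a negative score the opposite. Assessing significance would additionally require a permutation test, which is omitted here.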


Two subclasses of lung squamous cell carcinoma with different gene expression profiles and prognosis identified by hierarchical clustering and non-negative matrix factorization

Oncogene ◽

10.1038/sj.onc.1208858 ◽

2005 ◽

Vol 24 (47) ◽

pp. 7105-7113 ◽

Cited By ~ 57

Author(s):

Kentaro Inamura ◽

Takeshi Fujiwara ◽

Yujin Hoshida ◽

Takayuki Isagawa ◽

Michael H Jones ◽

...

Keyword(s):

Gene Expression ◽

Squamous Cell Carcinoma ◽

Cell Carcinoma ◽

Hierarchical Clustering ◽

Squamous Cell ◽

Matrix Factorization ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Lung Squamous Cell Carcinoma ◽

Non Negative Matrix Factorization


Self-reporting transposons enable simultaneous readout of gene expression and transcription factor binding in single cells

10.1101/538553 ◽

2019 ◽

Cited By ~ 3

Author(s):

Arnav Moudgil ◽

Michael N. Wilkinson ◽

Xuhua Chen ◽

June He ◽

Alex J. Cammack ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Single Cell ◽

Binding Sites ◽

Expression Profiles ◽

Single Cells ◽

Gene Expression Profiles ◽

Cell Types ◽

Specific Cell

Abstract In situ measurements of transcription factor (TF) binding are confounded by cellular heterogeneity and represent averaged profiles in complex tissues. Single cell RNA-seq (scRNA-seq) is capable of resolving different cell types based on gene expression profiles, but no technology exists to directly link specific cell types to the binding pattern of TFs in those cell types. Here, we present self-reporting transposons (SRTs) and their use in single cell calling cards (scCC), a novel assay for simultaneously capturing gene expression profiles and mapping TF binding sites in single cells. First, we show how the genomic locations of SRTs can be recovered from mRNA. Next, we demonstrate that SRTs deposited by the piggyBac transposase can be used to map the genome-wide localization of the TFs SP1, through a direct fusion of the two proteins, and BRD4, through its native affinity for piggyBac. We then present the scCC method, which maps SRTs from scRNA-seq libraries, thus enabling concomitant identification of cell types and TF binding sites in those same cells. As a proof-of-concept, we show recovery of cell type-specific BRD4 and SP1 binding sites from cultured cells. Finally, we map Brd4 binding sites in the mouse cortex at single cell resolution, thus establishing a new technique for studying TF biology in situ.


Determinants of transcription factor regulatory range

10.1101/582270 ◽

2019 ◽

Author(s):

Chen-Hao Chen ◽

Rongbin Zheng ◽

Jingyu Fan ◽

Myles Brown ◽

Jun S. Liu ◽

...

Keyword(s):

Gene Expression ◽

Long Range ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cell Types ◽

Chromatin State ◽

Specific Cell ◽

Regulatory Influence ◽

State Dependent ◽

Functional Classes

Abstract To characterize the genomic distances over which transcription factors (TFs) influence gene expression, we examined thousands of TF and histone modification ChIP-seq datasets and thousands of gene expression profiles. A model integrating these data revealed two classes of TF: one with short-range regulatory influence, the other with long-range regulatory influence. The two TF classes also had distinct chromatin-binding preferences and auto-regulatory properties. The regulatory range of a single TF bound within different topologically associating domains (TADs) depended on intrinsic TAD properties such as local gene density and G/C content, but also on the TAD chromatin state in specific cell types. Our results provide evidence that most TFs belong to one of these two functional classes, and that the regulatory range of long-range TFs is chromatin-state dependent. Thus, consideration of TF type, distance-to-target, and chromatin context is likely important in identifying TF regulatory targets and interpreting GWAS and eQTL SNPs.


G2S3: a gene graph-based imputation method for single-cell RNA sequencing data

10.1101/2020.04.01.020586 ◽

2020 ◽

Author(s):

Weimiao Wu ◽

Qile Dai ◽

Yunqing Liu ◽

Xiting Yan ◽

Zuoheng Wang

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sequencing Data ◽

High Data ◽

Study Gene Expression ◽

Single Cell Rna Sequencing ◽

Novel Method

Abstract Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.


Building an RNA Sequencing Transcriptome of the Central Nervous System

The Neuroscientist ◽

10.1177/1073858415610541 ◽

2016 ◽

Vol 22 (6) ◽

pp. 579-592 ◽

Cited By ~ 12

Author(s):

Xiaomin Dong ◽

Yanan You ◽

Jia Qian Wu

Keyword(s):

Gene Expression ◽

Central Nervous System ◽

Nervous System ◽

Rna Sequencing ◽

Large Scale ◽

Expression Profiles ◽

Cell Types ◽

Specific Cell ◽

Rna Seq ◽

The Central Nervous System

The composition and function of the central nervous system (CNS) are extremely complex. In addition to hundreds of subtypes of neurons, other cell types, including glia (astrocytes, oligodendrocytes, and microglia) and vascular cells (endothelial cells and pericytes), also play important roles in CNS function. Such heterogeneity makes the study of gene transcription in the CNS challenging. Transcriptomic studies, namely the analyses of the expression levels and structures of all genes, are essential for interpreting the functional elements and understanding the molecular constituents of the CNS. Microarrays were the predominant method for large-scale gene expression profiling in the past. However, RNA-sequencing (RNA-Seq) technology developed in recent years has many advantages over microarrays, and has enabled building more quantitative, accurate, and comprehensive transcriptomes of the CNS and other systems. The discovery of novel genes, diverse alternative splicing events, and noncoding RNAs has remarkably expanded the complexity of gene expression profiles and will help us to understand intricate neural circuits. Here, we discuss the procedures and advantages of RNA-Seq technology in mammalian CNS transcriptome construction, and review the approaches of sample collection as well as recent progress in building RNA-Seq-based transcriptomes from tissue samples and specific cell types.


Automated identification of the mouse brain’s spatial compartments from in situ sequencing data

BMC Biology ◽

10.1186/s12915-020-00874-5 ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Gabriele Partel ◽

Markus M. Hilscher ◽

Giorgia Milli ◽

Leslie Solorzano ◽

Anna H. Klemm ◽

...

Keyword(s):

Gene Expression ◽

Mouse Brain ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cell Types ◽

Sequencing Data ◽

Tissue Samples ◽

Gene Markers ◽

Smooth Transitions

Abstract Background Neuroanatomical compartments of the mouse brain are identified and outlined mainly based on manual annotations of samples using features related to tissue and cellular morphology, taking advantage of publicly available reference atlases. However, this task is challenging and requires manual annotation, since sliced tissue sections are rarely perfectly parallel or angled with respect to sections in the reference atlas, and organs from different individuals may vary in size and shape. With the advent of in situ sequencing technologies and automated approaches, it is now possible to profile the gene expression of targeted genes inside preserved tissue samples and thus spatially map biological processes across anatomical compartments. Results Here, we show how in situ sequencing data combined with dimensionality reduction and clustering can be used to identify spatial compartments that correspond to known anatomical compartments of the brain. We also visualize gradients in gene expression and sharp as well as smooth transitions between different compartments. We apply our method to mouse brain sections and show that a fully unsupervised approach can computationally define anatomical compartments, which are highly reproducible across individuals, using as few as 18 gene markers. We also show that morphological variation does not always follow gene expression, and different spatial compartments can be defined by various cell types with common morphological features but distinct gene expression profiles. Conclusion We show that spatial gene expression data can be used for unsupervised and unbiased annotations of mouse brain spatial compartments based only on molecular markers, without the need for subjective manual annotations based on tissue and cell morphology or matching reference atlases.


Non-negative matrix factorization of gene expression profiles: a plug-in for BRB-ArrayTools

Bioinformatics ◽

10.1093/bioinformatics/btp009 ◽

2009 ◽

Vol 25 (4) ◽

pp. 545-547 ◽

Cited By ~ 26

Author(s):

Q. Qi ◽

Y. Zhao ◽

M. Li ◽

R. Simon

Keyword(s):

Gene Expression ◽

Matrix Factorization ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Non Negative Matrix Factorization


G2S3: A gene graph-based imputation method for single-cell RNA sequencing data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009029 ◽

2021 ◽

Vol 17 (5) ◽

pp. e1009029

Author(s):

Weimiao Wu ◽

Yunqing Liu ◽

Qile Dai ◽

Xiting Yan ◽

Zuoheng Wang

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sequencing Data ◽

High Data ◽

Study Gene Expression ◽

Single Cell Rna Sequencing

Single-cell RNA sequencing technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses in single-cell transcriptomic studies. We propose a new method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and ten existing imputation methods to eight single-cell transcriptomic datasets and compared their performance. Our results demonstrated that G2S3 has superior overall performance in recovering gene expression, identifying cell subtypes, reconstructing cell trajectories, identifying differentially expressed genes, and recovering gene regulatory and correlation relationships. Moreover, G2S3 is computationally efficient for imputation in large-scale single-cell transcriptomic datasets.
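The phrase "borrowing information from adjacent genes in a sparse gene graph" can be made concrete with a small Python sketch. It is not the G2S3 algorithm: the published method learns its graph from the data, whereas the k-nearest-neighbour correlation graph, the single smoothing step, the `self_weight` parameter, and the function names below are all simplifying assumptions chosen for illustration.

```python
import numpy as np

def knn_gene_graph(X, k=2):
    """Build a symmetric k-nearest-neighbour gene graph from correlations.

    X: genes x cells array. This construction is an assumption; it stands in
    for a learned sparse graph. Rows of X must not be constant.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float))
    np.fill_diagonal(corr, -np.inf)      # a gene is not its own neighbour
    n_genes = corr.shape[0]
    A = np.zeros((n_genes, n_genes))
    for g in range(n_genes):
        neighbours = np.argsort(corr[g])[-k:]  # k most correlated genes
        A[g, neighbours] = 1.0
    return np.maximum(A, A.T)            # symmetrize the adjacency matrix

def impute_by_graph(X, A, self_weight=0.5):
    """Smooth each gene toward the average of its graph neighbours.

    Every gene needs at least one neighbour so that row sums are non-zero.
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    return self_weight * X + (1.0 - self_weight) * (P @ X)
```

With this scheme a zero count (a possible dropout) is pulled toward the expression of neighbouring genes in the same cell, while `self_weight` controls how much of the observed value is retained.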


SC-JNMF: Single-cell clustering integrating multiple quantification methods based on joint non-negative matrix factorization

10.1101/2020.09.30.319921 ◽

2020 ◽

Author(s):

Mikio Shiga ◽

Shigeto Seno ◽

Makoto Onizuka ◽

Hideo Matsuda

Keyword(s):

Gene Expression ◽

Single Cell ◽

Matrix Factorization ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Clustering Methods ◽

Clustering Method ◽

Multiple Gene ◽

Cell Clustering ◽

Non Negative Matrix Factorization

Abstract Unsupervised cell clustering is important in discovering cell diversity and subpopulations. Single-cell clustering using gene expression profiles is known to show different results depending on the method of expression quantification; nevertheless, most single-cell clustering methods do not take the quantification method into account. In this article, we propose a robust and highly accurate clustering method using joint non-negative matrix factorization (joint NMF) based on multiple gene expression profiles quantified using different methods. Matrix factorization is an excellent method for dimension reduction and feature extraction of data. In particular, NMF approximates the data matrix as the product of two matrices in which all factors are non-negative. Our joint NMF can extract common factors among multiple gene expression profiles by applying each NMF to them under the constraint that one of the factorized matrices is shared among the multiple NMFs. The joint NMF determines more robust and accurate cell clustering results by leveraging multiple quantification methods compared to the conventional clustering methods, which use only a single quantification method. In conclusion, our study showed that our clustering method using multiple gene expression profiles is more accurate than other popular methods.
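The shared-factor constraint described in the abstract above can be sketched in Python as follows. This is a generic joint NMF under stated assumptions, not the SC-JNMF implementation: each expression matrix X_k (genes x cells) is approximated as W_k H with one cell-factor matrix H shared by all matrices, and the factors are fitted with standard multiplicative updates for the summed squared error. The function name, the choice of which factor is shared, and the default parameters are illustrative.

```python
import numpy as np

def joint_nmf(matrices, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize several non-negative matrices X_k ~ W_k @ H with a shared H.

    matrices: list of (genes_k x cells) non-negative arrays that share the
    same cells (columns). Returns the per-matrix W_k, the shared H, and the
    total squared reconstruction error after each iteration.
    """
    matrices = [np.asarray(X, dtype=float) for X in matrices]
    rng = np.random.default_rng(seed)
    n_cells = matrices[0].shape[1]
    Ws = [rng.random((X.shape[0], rank)) + eps for X in matrices]
    H = rng.random((rank, n_cells)) + eps
    losses = []
    for _ in range(n_iter):
        # Update each W_k separately with H held fixed.
        for k, X in enumerate(matrices):
            Ws[k] *= (X @ H.T) / (Ws[k] @ (H @ H.T) + eps)
        # Update the shared H using contributions from every matrix.
        numer = sum(W.T @ X for W, X in zip(Ws, matrices))
        denom = sum(W.T @ W for W in Ws) @ H + eps
        H *= numer / denom
        losses.append(sum(np.linalg.norm(X - W @ H) ** 2
                          for W, X in zip(Ws, matrices)))
    return Ws, H, losses
```

One simple way to read clusters off the result is to assign each cell to its dominant shared factor, for example `H.argmax(axis=0)`; whether that matches the assignment rule used in the paper is not stated in the abstract.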
