The Role of Scale in the Estimation of Cell-type Proportions

Mapping Intimacies

10.1101/857805

2019

Author(s):

Gregory J. Hunt, Johann A. Gagnon-Bartsch

Keyword(s):

Gene Expression, Gene Expression Data, Hybrid Approach, Expression Data, Cell Type, Statistical Efficiency, Gene Expressions, New Approach, Cell Type Composition, Type Composition

Complex tissues are composed of a large number of different types of cells, each involved in a multitude of biological processes. Consequently, an important part of understanding such processes is knowing the cell-type composition of the tissues. Estimating cell-type composition from high-throughput gene expression data is known as cell-type deconvolution. In this paper, we first summarize the extensive deconvolution literature by identifying a common regression-like approach to deconvolution, which we call the Unified Deconvolution-as-Regression (UDAR) framework. While the methods that fall under this framework all use a similar model, they are fit to data on different scales. Two popular scales for gene expression data are logarithmic and linear. Unfortunately, each of these scales has problems in the UDAR framework: using log-scale gene expression implies a biologically implausible model, and using linear-scale gene expression leads to statistically inefficient estimators. To overcome these problems, we propose a new approach to cell-type deconvolution that works on a hybrid of the two scales. This new approach is biologically plausible and improves statistical efficiency. We compare the hybrid approach with other methods on simulations as well as on a collection of eleven real benchmark datasets, and find it to be accurate and robust.

Keywords: deconvolution, gene expression, microarray, RNA-seq
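
The contrast between the two scales can be made concrete with a small sketch. It assumes a reference matrix `X` (genes by cell types) of pure-cell-type expression and one bulk sample `y`, both on the linear scale; the function names are hypothetical. `fit_linear` is ordinary least squares of y on X (the UDAR regression on the linear scale), clipped and rescaled to proportions. `fit_hybrid` is an assumed reading of "a hybrid of the two scales": it keeps linear-scale mixing but measures error on the log scale, solved by Gauss-Newton. The paper's actual estimator may differ. The last lines show why a purely log-scale model is implausible: mRNA adds on the linear scale, so the log of a mixture is not the mixture of the logs.

```python
import numpy as np


def fit_linear(y, X):
    """Least-squares fit of y ~ X @ beta on the linear scale.

    Negative coefficients are clipped to zero and the result is rescaled
    so the estimated proportions sum to 1.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta = np.clip(beta, 0.0, None)
    return beta / beta.sum()


def fit_hybrid(y, X, n_iter=100, tol=1e-12):
    """Linear-scale mixing model with log-scale error (assumed 'hybrid').

    Minimises sum_i (log y_i - log (X @ beta)_i)^2 by Gauss-Newton, starting
    from the linear-scale least-squares solution. Requires y > 0 and X >= 0
    with no all-zero rows.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta = np.clip(beta, 1e-8, None)          # keep X @ beta strictly positive
    for _ in range(n_iter):
        mu = X @ beta                          # fitted bulk expression, linear scale
        resid = np.log(y) - np.log(mu)         # residuals measured on the log scale
        jac = X / mu[:, None]                  # Jacobian of log(X @ beta) w.r.t. beta
        step, *_ = np.linalg.lstsq(jac, resid, rcond=None)
        beta = np.clip(beta + step, 1e-8, None)
        if np.max(np.abs(step)) < tol:
            break
    return beta / beta.sum()


# Two cell types mixed 50/50 for one gene expressed at 100 and 1 units.
cell_a, cell_b, p = 100.0, 1.0, 0.5
linear_mix = p * cell_a + (1 - p) * cell_b                       # 50.5
log_mix = np.exp(p * np.log(cell_a) + (1 - p) * np.log(cell_b))  # about 10

if __name__ == "__main__":
    X = np.array([[10.0, 1.0, 1.0],
                  [1.0, 8.0, 2.0],
                  [2.0, 1.0, 12.0],
                  [5.0, 5.0, 5.0]])
    p_true = np.array([0.2, 0.5, 0.3])
    y = 10.0 * X @ p_true                      # noise-free bulk sample
    print(fit_linear(y, X), fit_hybrid(y, X))
    print(linear_mix, log_mix)
```

On noise-free data both fits recover the true proportions; they differ once noise is added, because the linear fit is dominated by the most highly expressed genes while the log-scale error weights genes more evenly.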

The Gene Expression Deconvolution Interactive Tool (GEDIT): accurate cell type quantification from gene expression data

GigaScience

10.1093/gigascience/giab002

2021

Vol 10 (2)

Author(s):

Brian B Nadel, David Lopez, Dennis J Montoya, Feiyang Ma, Hannah Waddel, ...

Keyword(s):

Gene Expression, Gene Expression Data, Reference Data, Expression Data, Cell Type, Tissue Samples, Interactive Tool, Cell Type Composition, Type Composition, Human And Mouse

Background: The cell type composition of heterogeneous tissue samples can be a critical variable in both clinical and laboratory settings. However, current experimental methods of cell type quantification (e.g., flow cytometry) are costly and time-consuming, and they can introduce bias. Computational approaches that use expression data to infer cell type abundance offer an alternative. While these methods have gained popularity, most fail to produce accurate predictions across the full range of platforms currently used by researchers or the wide variety of tissue types often studied. Results: We present the Gene Expression Deconvolution Interactive Tool (GEDIT), a flexible tool that uses gene expression data to accurately predict cell type abundances. Using both simulated and experimental data, we extensively evaluate the performance of GEDIT and demonstrate that it returns robust results under a wide variety of conditions, including multiple platforms (microarray and RNA-seq), tissue types (blood and stromal), and species (human and mouse). Finally, we provide reference data from 8 sources spanning a broad range of stromal and hematopoietic types in both human and mouse. GEDIT also accepts user-submitted reference data, allowing the estimation of any cell type or subtype for which reference data are available. Conclusions: GEDIT is a powerful method for evaluating the cell type composition of tissue samples and provides excellent accuracy and versatility compared with similar tools. The reference database provided here also allows users to obtain estimates for a wide variety of tissue samples without having to provide their own data.

Accurate estimation of cell-type composition from gene expression data

Nature Communications

10.1038/s41467-019-10802-z

2019

Vol 10 (1)

Cited By ~ 18

Author(s):

Daphne Tsoucas, Rui Dong, Haide Chen, Qian Zhu, Guoji Guo, ...

Keyword(s):

Gene Expression, Gene Expression Data, Accurate Estimation, Expression Data, Cell Type, Cell Type Composition, Type Composition

ImmQuant: a user-friendly tool for inferring immune cell-type composition from gene-expression data

Bioinformatics

10.1093/bioinformatics/btw535

2016

Vol 32 (24)

pp. 3842-3843

Cited By ~ 25

Author(s):

Amit Frishberg, Avital Brodt, Yael Steuerman, Irit Gat-Viks

Keyword(s):

Gene Expression, Gene Expression Data, Immune Cell, Expression Data, Cell Type, Immune Cell Type, Cell Type Composition, Type Composition, User Friendly

Mining The Cancer Genome Atlas gene expression data for lineage markers in distinguishing bladder urothelial carcinoma and prostate adenocarcinoma

Scientific Reports

10.1038/s41598-021-85993-x

2021

Vol 11 (1)

Author(s):

Ewe Seng Ch’ng

Keyword(s):

Gene Expression, Gene Expression Data, The Cancer Genome Atlas, Relative Importance, Expression Data, Gene Expressions, Urothelial Carcinomas, Cancer Genome Atlas, Lineage Markers, Genome Atlas

In poorly differentiated carcinomas arising from the bladder neck, distinguishing bladder urothelial carcinomas from prostate adenocarcinomas requires a panel of lineage markers. Publicly available gene expression data from The Cancer Genome Atlas (TCGA) provide an opportunity to examine the utility of these markers. This study aimed to verify the expression of urothelial and prostate lineage markers in the respective carcinomas and to determine the relative importance of these markers in making this distinction. Expression data for these markers in bladder and prostate carcinomas were downloaded from the TCGA Pan-Cancer database, and their differential expression was analyzed. Standard linear discriminant analyses were applied to establish the relative importance of the markers in lineage determination and to construct the model that best makes the distinction. All urothelial lineage genes except the gene for uroplakin III were significantly expressed in bladder urothelial carcinomas (p < 0.001). In descending order of importance for distinguishing them from prostate adenocarcinomas, the genes for uroplakin II, S100P, GATA3 and thrombomodulin had high discriminant loadings (> 0.3). All prostate lineage genes were significantly expressed in prostate adenocarcinomas (p < 0.001). In descending order of importance for distinguishing them from bladder urothelial carcinomas, the genes for NKX3.1, prostate-specific antigen (PSA), prostate-specific acid phosphatase, prostein and prostate-specific membrane antigen had high discriminant loadings (> 0.3). Combined expression of uroplakin II, S100P, NKX3.1 and PSA approached 100% accuracy in tumor classification in both the training and validation sets. Mining gene expression data thus shows that a combination of four lineage markers helps distinguish bladder urothelial carcinomas from prostate adenocarcinomas.
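
The abstract reports "discriminant loadings" without defining them. A common convention, assumed here, is the structure matrix: the pooled within-group correlation between each variable and the discriminant score. The sketch below fits a two-class Fisher linear discriminant with NumPy and returns those loadings; the function names and the simulated data are illustrative, not the study's code or TCGA data.

```python
import numpy as np


def fisher_lda(X, labels):
    """Two-class Fisher linear discriminant with structure-matrix loadings.

    X: (n_samples, n_genes) expression matrix; labels: 0/1 class per sample.
    Returns the discriminant weights, a classification threshold, and the
    pooled within-group correlation of each gene with the discriminant score.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    g0, g1 = X[labels == 0], X[labels == 1]
    m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
    # Centre each group on its own mean, then pool (within-group scatter).
    Xc = np.vstack([g0 - m0, g1 - m1])
    Sw = Xc.T @ Xc / (len(X) - 2)
    w = np.linalg.solve(Sw, m1 - m0)           # Fisher discriminant direction
    threshold = w @ (m0 + m1) / 2              # midpoint of projected class means
    z = Xc @ w                                 # within-group-centred scores
    loadings = (Xc * z[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((z ** 2).sum()))
    return w, threshold, loadings


def predict(X, w, threshold):
    """Assign class 1 when the discriminant score exceeds the threshold."""
    return (np.asarray(X, dtype=float) @ w > threshold).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    # Gene 0 separates the classes; gene 1 is pure noise.
    X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(n, 2)),
                   rng.normal([3.0, 0.0], 1.0, size=(n, 2))])
    labels = np.repeat([0, 1], n)
    w, t, loadings = fisher_lda(X, labels)
    print("loadings:", loadings.round(2))
    print("training accuracy:", (predict(X, w, t) == labels).mean())
```

A cutoff such as the abstract's 0.3 would then be applied to `loadings`; the informative gene scores near 1 and the noise gene near 0.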

Cox Survival Analysis of Microarray Gene Expression Data Using Correlation Principal Component Regression

Statistical Applications in Genetics and Molecular Biology

10.2202/1544-6115.1153

2007

Vol 6 (1)

Cited By ~ 4

Author(s):

Qiang Zhao, Jianguo Sun

Keyword(s):

Gene Expression, Gene Expression Data, Principal Component Regression, Predictive Ability, Principal Component, Microarray Gene Expression Data, Expression Data, Microarray Gene Expression, New Approach, Microarray Gene

Statistical analysis of microarray gene expression data has recently attracted a great deal of attention. One problem of interest is relating gene expression to patients' survival outcomes in order to build regression models that predict the survival of future patients from their gene expression data. To this end, several authors have discussed applying the proportional hazards (Cox) model after reducing the dimension of the gene expression data. This paper presents a new approach to Cox survival analysis of microarray gene expression data, with a focus on the models' predictive ability. The method modifies correlation principal component regression (Sun, 1995) to handle the censoring inherent in survival data. Results on simulated data and on a publicly available diffuse large B-cell lymphoma dataset show that the proposed method performs well in terms of robustness and predictive ability compared with some existing partial least squares approaches. The new approach is also simpler and easier to implement.
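
The general recipe of reducing dimension first and then fitting a Cox model can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: it keeps the top-k principal components ranked by variance, not the correlation-based component selection of Sun (1995) nor the paper's censoring modification, and it assumes no tied event times. `pc_scores`, `cox_loglik` and `cox_fit` are hypothetical names.

```python
import numpy as np


def pc_scores(X, k):
    """Scores on the top-k principal components of column-centred X."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def cox_loglik(beta, Z, time, event):
    """Cox partial log-likelihood; assumes no tied event times."""
    order = np.argsort(-np.asarray(time, dtype=float))   # descending time
    eta = (np.asarray(Z, dtype=float) @ beta)[order]
    d = np.asarray(event, dtype=float)[order]
    # With descending times the risk set {j: t_j >= t_i} is a prefix, so a
    # running log-sum-exp gives log sum_{j in risk set} exp(eta_j).
    log_risk = np.logaddexp.accumulate(eta)
    return float(np.sum(d * (eta - log_risk)))


def cox_fit(Z, time, event, n_iter=50, tol=1e-9):
    """Newton-Raphson maximisation of the Cox partial log-likelihood."""
    order = np.argsort(-np.asarray(time, dtype=float))
    zo = np.asarray(Z, dtype=float)[order]
    d = np.asarray(event, dtype=float)[order]
    beta = np.zeros(zo.shape[1])
    for _ in range(n_iter):
        eta = zo @ beta
        w = np.exp(eta - eta.max())                       # rescaled; ratios unchanged
        s0 = np.cumsum(w)                                 # risk-set sums (prefixes)
        s1 = np.cumsum(w[:, None] * zo, axis=0)
        s2 = np.cumsum(w[:, None, None] * zo[:, :, None] * zo[:, None, :], axis=0)
        zbar = s1 / s0[:, None]                           # risk-set weighted means
        grad = (d[:, None] * (zo - zbar)).sum(axis=0)
        cov = s2 / s0[:, None, None] - zbar[:, :, None] * zbar[:, None, :]
        info = (d[:, None, None] * cov).sum(axis=0)       # observed information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Usage: with an expression matrix `X` (patients by genes), observed times `time` and event indicators `event`, call `cox_fit(pc_scores(X, k), time, event)`. Working with a handful of component scores instead of thousands of genes keeps the information matrix small and invertible.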

Seasonal changes in gene expression represent cell-type composition in whole blood

Human Molecular Genetics

10.1093/hmg/ddt665

2014

Vol 23 (10)

pp. 2721-2728

Cited By ~ 27

Author(s):

S. De Jong, M. Neeleman, J. J. Luykx, M. J. Ten Berg, E. Strengman, ...

Keyword(s):

Gene Expression, Whole Blood, Seasonal Changes, Cell Type, Cell Type Composition, Type Composition

Assessing the Contribution of Relative Macrophage Frequencies to Subcutaneous Adipose Tissue

Frontiers in Nutrition

10.3389/fnut.2021.675935

2021

Vol 8

Author(s):

Marianthi Kalafati, Michael Lenz, Gökhan Ertaylan, Ilja C. W. Arts, Chris T. Evelo, ...

Keyword(s):

Gene Expression, Adipose Tissue, Subcutaneous Adipose Tissue, Tissue Cell, Cell Type, Expression Of Genes, Cell Type Composition, Type Composition, Adipose Tissue Cell, Subcutaneous Adipose

Background: Macrophages play an important role in regulating adipose tissue function, and their frequencies in adipose tissue vary between individuals. Adipose tissue infiltration by high frequencies of macrophages has been linked to changes in adipokine levels and to low-grade inflammation, frequently associated with the progression of obesity. The objective of this project was to assess the contribution of relative macrophage frequencies to overall subcutaneous adipose tissue gene expression using publicly available datasets. Methods: Seven publicly available microarray gene expression datasets from human subcutaneous adipose tissue biopsies (n = 519) were used together with TissueDecoder to determine the adipose tissue cell-type composition of each sample. We divided the subjects into four groups based on their relative macrophage frequencies. Differential gene expression analysis between the high and low relative macrophage frequency groups was performed, adjusting for sex and study. Finally, biological processes were identified using pathway enrichment and network analysis. Results: We observed lower frequencies of adipocytes and higher frequencies of adipose stem cells in individuals characterized by high macrophage frequencies. We additionally studied whether, within subcutaneous adipose tissue, interindividual differences in the relative frequencies of macrophages were reflected in transcriptional differences in metabolic and inflammatory pathways. Adipose tissue of individuals with high macrophage frequencies showed higher expression of genes involved in complement activation, chemotaxis, focal adhesion, and oxidative stress. In addition, we observed lower expression of genes involved in lipid metabolism, fatty acid synthesis and oxidation, and mitochondrial respiration. Conclusion: We present an approach that combines publicly available subcutaneous adipose tissue gene expression datasets with a deconvolution algorithm to calculate subcutaneous adipose tissue cell-type composition. The results showed the expected increase in inflammatory gene expression, accompanied by decreased expression in pathways related to lipid metabolism and mitochondrial respiration, in the subcutaneous adipose tissue of individuals characterized by high macrophage frequencies. This approach demonstrates the hidden strength of reusing publicly available data to gain cell-type-specific insights into adipose tissue function.
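
The grouping and covariate-adjusted comparison in the Methods can be sketched as follows. The abstract does not say how the four groups were formed or which model was used, so this sketch assumes quartiles of the estimated macrophage fraction and an ordinary least-squares model per gene (expression ~ high-vs-low indicator + sex + study). It does not implement moderated statistics such as those in limma, and `freq` merely stands in for TissueDecoder's macrophage estimates; all names are hypothetical.

```python
import numpy as np


def frequency_groups(freq, n_groups=4):
    """Split samples into equal-count groups by relative macrophage frequency.

    Returns labels 0 (lowest) .. n_groups - 1 (highest); quantile cut-offs
    are an assumption of this sketch.
    """
    cuts = np.quantile(freq, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.digitize(freq, cuts)


def high_vs_low(expr, group, sex, study, n_groups=4):
    """Per-gene OLS: expression ~ high-vs-low indicator + sex + study.

    expr: (n_samples, n_genes). Only the lowest and highest groups are used.
    Returns the indicator coefficient and its t-statistic for every gene.
    """
    keep = (group == 0) | (group == n_groups - 1)
    y = expr[keep]
    high = (group[keep] == n_groups - 1).astype(float)
    st = study[keep]
    # One dummy column per study except the first (reference) study.
    study_dummies = [(st == s).astype(float) for s in np.unique(st)[1:]]
    design = np.column_stack([np.ones(keep.sum()), high, sex[keep]] + study_dummies)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = design.shape[0] - design.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof          # residual variance per gene
    var_high = np.linalg.inv(design.T @ design)[1, 1]
    return coef[1], coef[1] / np.sqrt(sigma2 * var_high)
```

Because the design matrix is shared across genes, a single `lstsq` call fits every gene at once; only the residual variance differs from gene to gene.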

A Hybrid Approach to Estimate True Density Function for Gene Expression Data

Advances in Digital Image Processing and Information Technology - Communications in Computer and Information Science

10.1007/978-3-642-24055-3_5

2011

pp. 44-54

Author(s):

Ganesh Kumar Pugalendhi, Mahibha David, Aruldoss Albert Victoire

Keyword(s):

Gene Expression, Density Function, Gene Expression Data, Hybrid Approach, Expression Data, True Density

Building Gene Networks by Analyzing Gene Expression Profiles

Advanced Methodologies and Technologies in Medicine and Healthcare - Advances in Medical Diagnosis, Treatment, and Care

10.4018/978-1-5225-7489-7.ch003

2019

pp. 27-44

Author(s):

Crescenzio Gallo

Keyword(s):

Gene Expression, Gene Expression Data, Gene Networks, DNA Microarrays, Expression Profiles, Expression Patterns, Gene Expression Profiles, Expression Data, Gene Expressions, Over Time

The possible applications of modeling and simulation in bioinformatics are very broad, ranging from understanding basic metabolic pathways to exploring genetic variability. DNA microarray experiments allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is detecting groups of genes that show similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them according to their relationships, and (3) classifying samples based on gene expression.
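
The three topics listed in the chapter summary can each be illustrated with one simple technique. The choices below are assumptions for illustration, not necessarily the methods the chapter examines: a Welch t-statistic to rank differential expression, connected components of a thresholded |Pearson correlation| graph to group co-expressed genes, and a nearest-centroid rule to classify samples. The function names and the 0.8 threshold are hypothetical.

```python
import numpy as np


def welch_t(expr, labels):
    """Welch t-statistic per gene; expr is (n_genes, n_samples), labels 0/1."""
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] +
                 b.var(axis=1, ddof=1) / b.shape[1])
    return (b.mean(axis=1) - a.mean(axis=1)) / se


def coexpression_modules(expr, threshold=0.8):
    """Group genes into connected components of a |Pearson r| >= threshold graph.

    Rows of expr are genes; a constant row would yield NaN correlations.
    """
    adj = np.abs(np.corrcoef(expr)) >= threshold
    module = np.full(len(adj), -1)
    label = 0
    for start in range(len(adj)):
        if module[start] >= 0:
            continue
        module[start] = label
        stack = [start]
        while stack:                                  # depth-first search
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if module[j] < 0:
                    module[j] = label
                    stack.append(j)
        label += 1
    return module


def nearest_centroid(train, train_labels, test):
    """Assign each test sample (column) to the class with the closest mean profile."""
    classes = np.unique(train_labels)
    centroids = np.stack([train[:, train_labels == c].mean(axis=1) for c in classes])
    dist = ((test.T[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]
```

Using absolute correlation places genes with mirror-image patterns in the same module; dropping `np.abs` would separate positively and negatively co-regulated genes instead.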

Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data

eLife

10.7554/elife.26476

2017

Vol 6

Cited By ~ 107

Author(s):

Julien Racle, Kaat de Jonge, Petra Baumgaertner, Daniel E Speiser, David Gfeller

Keyword(s):

Gene Expression, Gene Expression Data, Immune Cell, Expression Profiles, Cell Types, Response To Therapy, Expression Data, Cell Type, Tumor Gene Expression, Tumor Gene

Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fractions of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles for each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to account for uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).
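
Renormalization by mRNA content matters because a cell type with more mRNA per cell contributes more to bulk expression than its share of cells would suggest. A minimal sketch of the general idea: divide each estimated mRNA proportion by a per-cell mRNA content and renormalize, attributing whatever mRNA is left over to uncharacterized cells. The function name and the content values in the example are illustrative assumptions, not EPIC's code or published constants; the actual tool is at http://epic.gfellerlab.org.

```python
import numpy as np


def mrna_to_cell_fractions(mrna_props, mrna_content, other_content):
    """Convert per-cell-type mRNA proportions into cell fractions.

    mrna_props: estimated share of bulk mRNA from each characterised cell type;
        whatever is left (1 - sum) is attributed to uncharacterised cells.
    mrna_content: relative mRNA amount per cell for each characterised type.
    other_content: assumed mRNA content of the uncharacterised cells.
    Returns cell fractions, with the uncharacterised fraction as the last entry.
    """
    props = np.asarray(mrna_props, dtype=float)
    remainder = 1.0 - props.sum()
    if remainder < 0:
        raise ValueError("mRNA proportions must sum to at most 1")
    all_props = np.append(props, remainder)
    content = np.append(np.asarray(mrna_content, dtype=float), other_content)
    cells = all_props / content          # relative number of cells per type
    return cells / cells.sum()           # renormalise so fractions sum to 1
```

For example, `mrna_to_cell_fractions([0.3, 0.2], [1.0, 0.5], 2.5)` gives cell fractions of 1/3, 4/9 and 2/9. The second type supplies less mRNA than the first, yet it ends up with more cells because each of its cells carries half as much mRNA.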
