The Role of Scale in the Estimation of Cell-type Proportions

2019 ◽  
Author(s):  
Gregory J. Hunt ◽  
Johann A. Gagnon-Bartsch

Complex tissues are composed of a large number of different types of cells, each involved in a multitude of biological processes. Consequently, an important component of understanding such processes is understanding the cell-type composition of the tissues. Estimating cell-type composition using high-throughput gene expression data is known as cell-type deconvolution. In this paper, we first summarize the extensive deconvolution literature by identifying a common regression-like approach to deconvolution. We call this approach the Unified Deconvolution-as-Regression (UDAR) framework. While methods that fall under this framework all use a similar model, they fit it using data on different scales. Two popular scales for gene expression data are logarithmic and linear. Unfortunately, each of these scales has problems in the UDAR framework. Using log-scale gene expressions implies a biologically implausible model, and using linear-scale gene expressions leads to statistically inefficient estimators. To overcome these problems, we propose a new approach for cell-type deconvolution that works on a hybrid of the two scales. This new approach is biologically plausible and improves statistical efficiency. We compare the hybrid approach to other methods on simulations as well as a collection of eleven real benchmark datasets. Here, we find the hybrid approach to be accurate and robust.

Keywords: deconvolution, gene expression, microarray, RNA-seq
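The regression view of deconvolution described in this abstract can be illustrated with a minimal sketch on the linear scale only: bulk expression is modeled as a signature matrix times non-negative cell-type proportions. This is not the authors' hybrid-scale method. The function name `deconvolve`, the projected-gradient solver, and all numbers below are illustrative assumptions.

```python
def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions p >= 0 minimizing ||signature @ p - bulk||^2.

    signature: list of rows, one per gene, each with one value per cell type
               (hypothetical linear-scale reference expression).
    bulk:      list with one linear-scale mixture expression value per gene.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its reciprocal is a safe gradient step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signature[g][k] * resid[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    # Renormalize so the estimates sum to one.
    return [v / total for v in p] if total > 0 else p


# Made-up example: 3 genes, 2 cell types, true proportions 0.3 and 0.7.
signature = [[10.0, 1.0], [2.0, 8.0], [5.0, 5.0]]
true_p = [0.3, 0.7]
bulk = [sum(s * t for s, t in zip(row, true_p)) for row in signature]
estimated = deconvolve(signature, bulk)
print(estimated)
```

Because the toy mixture is noise-free, the estimate should recover the true proportions closely; with real data, the choice of scale discussed in the abstract affects how well such a fit behaves.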

GigaScience ◽  
2021 ◽  
Vol 10 (2) ◽  
Author(s):  
Brian B Nadel ◽  
David Lopez ◽  
Dennis J Montoya ◽  
Feiyang Ma ◽  
Hannah Waddel ◽  
...  

Background: The cell type composition of heterogeneous tissue samples can be a critical variable in both clinical and laboratory settings. However, current experimental methods of cell type quantification (e.g., cell flow cytometry) are costly, time consuming and have potential to introduce bias. Computational approaches that use expression data to infer cell type abundance offer an alternative solution. While these methods have gained popularity, most fail to produce accurate predictions for the full range of platforms currently used by researchers or for the wide variety of tissue types often studied. Results: We present the Gene Expression Deconvolution Interactive Tool (GEDIT), a flexible tool that utilizes gene expression data to accurately predict cell type abundances. Using both simulated and experimental data, we extensively evaluate the performance of GEDIT and demonstrate that it returns robust results under a wide variety of conditions. These conditions include multiple platforms (microarray and RNA-seq), tissue types (blood and stromal), and species (human and mouse). Finally, we provide reference data from 8 sources spanning a broad range of stromal and hematopoietic types in both human and mouse. GEDIT also accepts user-submitted reference data, thus allowing the estimation of any cell type or subtype, provided that reference data are available. Conclusions: GEDIT is a powerful method for evaluating the cell type composition of tissue samples and provides excellent accuracy and versatility compared to similar tools. The reference database provided here also allows users to obtain estimates for a wide variety of tissue samples without having to provide their own data.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Daphne Tsoucas ◽  
Rui Dong ◽  
Haide Chen ◽  
Qian Zhu ◽  
Guoji Guo ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ewe Seng Ch’ng

Distinguishing bladder urothelial carcinomas from prostate adenocarcinomas among poorly differentiated carcinomas derived from the bladder neck entails the use of a panel of lineage markers. Publicly available The Cancer Genome Atlas (TCGA) gene expression data provide an avenue to examine the utility of these markers. This study aimed to verify the expression of urothelial and prostate lineage markers in the respective carcinomas and to determine the relative importance of these markers in making this distinction. Gene expression values for these markers were downloaded from the TCGA Pan-Cancer database for bladder and prostate carcinomas. Differential gene expression of these markers was analyzed. Standard linear discriminant analyses were applied to establish the relative importance of these markers in lineage determination and to construct the model that best makes the distinction. This study shows that all urothelial lineage genes except for the gene for uroplakin III were significantly expressed in bladder urothelial carcinomas (p < 0.001). In descending order of importance for distinguishing them from prostate adenocarcinomas, genes for uroplakin II, S100P, GATA3 and thrombomodulin had high discriminant loadings (> 0.3). All prostate lineage genes were significantly expressed in prostate adenocarcinomas (p < 0.001). In descending order of importance for distinguishing them from bladder urothelial carcinomas, genes for NKX3.1, prostate specific antigen (PSA), prostate-specific acid phosphatase, prostein, and prostate-specific membrane antigen had high discriminant loadings (> 0.3). A combination of gene expression values for uroplakin II, S100P, NKX3.1 and PSA approached 100% accuracy in tumor classification in both the training and validation sets. Mining of gene expression data thus shows that a combination of four lineage markers helps distinguish between bladder urothelial carcinomas and prostate adenocarcinomas.


Author(s):  
Qiang Zhao ◽  
Jianguo Sun

Statistical analysis of microarray gene expression data has recently attracted a great deal of attention. One problem of interest is to relate genes to survival outcomes of patients with the purpose of building regression models for the prediction of future patients' survival based on their gene expression data. For this, several authors have discussed the use of the proportional hazards or Cox model after reducing the dimension of the gene expression data. This paper presents a new approach to conduct the Cox survival analysis of microarray gene expression data with a focus on the models' predictive ability. The method modifies the correlation principal component regression (Sun, 1995) to handle the censoring problem of survival data. The results based on simulated data and a set of publicly available data on diffuse large B-cell lymphoma show that the proposed method works well in terms of the models' robustness and predictive ability in comparison with some existing partial least squares approaches. Also, the new approach is simpler and easier to implement.


2014 ◽  
Vol 23 (10) ◽  
pp. 2721-2728 ◽  
Author(s):  
S. De Jong ◽  
M. Neeleman ◽  
J. J. Luykx ◽  
M. J. Ten Berg ◽  
E. Strengman ◽  
...  

2021 ◽  
Vol 8 ◽  
Author(s):  
Marianthi Kalafati ◽  
Michael Lenz ◽  
Gökhan Ertaylan ◽  
Ilja C. W. Arts ◽  
Chris T. Evelo ◽  
...  

Background: Macrophages play an important role in regulating adipose tissue function, while their frequencies in adipose tissue vary between individuals. Adipose tissue infiltration by high frequencies of macrophages has been linked to changes in adipokine levels and low-grade inflammation, frequently associated with the progression of obesity. The objective of this project was to assess the contribution of relative macrophage frequencies to the overall subcutaneous adipose tissue gene expression using publicly available datasets. Methods: Seven publicly available microarray gene expression datasets from human subcutaneous adipose tissue biopsies (n = 519) were used together with TissueDecoder to determine the adipose tissue cell-type composition of each sample. We divided the subjects into four groups based on their relative macrophage frequencies. Differential gene expression analysis between the high and low relative macrophage frequency groups was performed, adjusting for sex and study. Finally, biological processes were identified using pathway enrichment and network analysis. Results: We observed lower frequencies of adipocytes and higher frequencies of adipose stem cells in individuals characterized by high macrophage frequencies. We additionally studied whether, within subcutaneous adipose tissue, interindividual differences in the relative frequencies of macrophages were reflected in transcriptional differences in metabolic and inflammatory pathways. Adipose tissue of individuals with high macrophage frequencies had a higher expression of genes involved in complement activation, chemotaxis, focal adhesion, and oxidative stress. Similarly, we observed a lower expression of genes involved in lipid metabolism, fatty acid synthesis, and oxidation and mitochondrial respiration. Conclusion: We present an approach that combines publicly available subcutaneous adipose tissue gene expression datasets with a deconvolution algorithm to calculate subcutaneous adipose tissue cell-type composition. The results showed the expected increased inflammation gene expression profile accompanied by decreased gene expression in pathways related to lipid metabolism and mitochondrial respiration in subcutaneous adipose tissue in individuals characterized by high macrophage frequencies. This approach demonstrates the hidden strength of reusing publicly available data to gain cell-type-specific insights into adipose tissue function.
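The grouping step in the Methods can be sketched as follows. The abstract does not say how the four groups were formed, so splitting at quartiles is an assumption here, as are the function name `assign_frequency_groups` and the made-up frequencies.

```python
import bisect
import statistics


def assign_frequency_groups(frequencies, n_groups=4):
    """Assign each sample a group index 0..n_groups-1 by quantile of its value.

    frequencies: estimated relative macrophage frequencies, one per sample
                 (hypothetical values; any deconvolution output would do).
    """
    # statistics.quantiles returns the n_groups - 1 interior cut points.
    cuts = statistics.quantiles(frequencies, n=n_groups)
    # bisect_right counts how many cut points lie at or below each value.
    return [bisect.bisect_right(cuts, f) for f in frequencies]


# Made-up macrophage percentages for eight samples.
macrophage_pct = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
groups = assign_frequency_groups(macrophage_pct)
print(groups)
```

Under this assumption, group 0 and group 3 would correspond to the low and high macrophage frequency groups compared in the differential expression analysis.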


Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic paths to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expressions.


eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Julien Racle ◽  
Kaat de Jonge ◽  
Petra Baumgaertner ◽  
Daniel E Speiser ◽  
David Gfeller

Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).
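The renormalization by cell-type-specific mRNA content mentioned above can be illustrated with a small sketch: proportions of mRNA attributed to each cell type are divided by an assumed per-cell mRNA content and rescaled to sum to one. This is a simplified illustration, not the published implementation; the function name `mrna_to_cell_fractions` and all values are hypothetical.

```python
def mrna_to_cell_fractions(mrna_proportions, mrna_per_cell):
    """Convert mRNA proportions into cell fractions.

    mrna_proportions: dict mapping cell type to its share of total mRNA.
    mrna_per_cell:    dict mapping cell type to relative mRNA content per cell
                      (hypothetical values, same keys as mrna_proportions).
    """
    # Cell types with more mRNA per cell contribute fewer cells per unit of mRNA.
    scaled = {
        cell_type: share / mrna_per_cell[cell_type]
        for cell_type, share in mrna_proportions.items()
    }
    total = sum(scaled.values())
    return {cell_type: value / total for cell_type, value in scaled.items()}


# Made-up example: tumor cells carry four times the mRNA of T cells.
fractions = mrna_to_cell_fractions(
    {"T_cell": 0.5, "tumor": 0.5},
    {"T_cell": 1.0, "tumor": 4.0},
)
print(fractions)
```

In this toy case, equal mRNA shares translate into unequal cell fractions, because each tumor cell is assumed to contribute more mRNA than each T cell.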

