Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients

10.21203/rs.2.21701/v1 ◽

2020 ◽

Author(s):

Shaoke LOU ◽

Tianxiao Li ◽

Daniel Spakowicz ◽

Xiting Yan ◽

Geoffrey Lowell Chupp ◽

...

Keyword(s):

Gene Expression ◽

Random Forest ◽

Gene Expression Data ◽

Asthma Severity ◽

Support Vector ◽

Expression Data ◽

Denoising Autoencoder ◽

Gene Signatures ◽

Latent Space ◽

Key Genes

Abstract Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. In this work, we developed a framework that incorporates a denoising autoencoder and a supervised learning approach to identify gene signatures related to asthma severity. The autoencoder embeds high-dimensional gene expression data into a lower-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from gene expression data. Results: Using the trained autoencoder model, we found that the weights on hidden units in the latent space correlate well with previously defined and clinically relevant clusters of patients. Moreover, pathway analysis based on each gene's contribution to the hidden units showed significant enrichment in known asthma-related pathways. We then used the genes that contribute most to the hidden units to develop a secondary supervised classifier (based on random forest) for directly predicting asthma severity. The random-forest importance metric from this classifier identified a signature based on 50 key genes, which can predict severity with an AUROC of 0.81 and thus have potential as diagnostic biomarkers. Furthermore, the key genes could also be used to estimate, via support-vector-machine regression, the FEV1/FVC ratios across patients, achieving pre- and post-treatment correlations of 0.56 and 0.65, respectively (between predicted and observed values). Conclusions: The denoising autoencoder framework could extract meaningful functional genes and patient groups from the gene expression profiles of asthma patients. These patterns may provide potential sources of biomarkers for asthma severity.
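
For readers unfamiliar with the AUROC figure quoted in the abstract above, the following is a minimal sketch of how such a value can be computed from class labels and classifier scores. It uses the pairwise (Mann-Whitney) definition and only the Python standard library. The function name `auroc` and the toy labels and scores are made up for illustration; they are not the authors' code or data.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration: 1 = severe, 0 = not severe.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
print(round(toy_auroc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.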

Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients

10.21203/rs.2.21701/v3 ◽

2020 ◽

Author(s):

Shaoke LOU ◽

Tianxiao Li ◽

Daniel Spakowicz ◽

Xiting Yan ◽

Geoffrey Lowell Chupp ◽

...

Keyword(s):

Gene Expression ◽

Asthma Severity ◽

Support Vector ◽

Expression Data ◽

Denoising Autoencoder ◽

Gene Signatures ◽

Asthmatic Patients ◽

Latent Space ◽

Key Genes ◽

Low Dimensional

Abstract Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression. Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.

Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients

BMC Bioinformatics ◽

10.1186/s12859-020-03785-y ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Shaoke Lou ◽

Tianxiao Li ◽

Daniel Spakowicz ◽

Xiting Yan ◽

Geoffrey Lowell Chupp ◽

...

Keyword(s):

Gene Expression ◽

Asthma Severity ◽

Support Vector ◽

Expression Data ◽

Denoising Autoencoder ◽

Gene Signatures ◽

Asthmatic Patients ◽

Latent Space ◽

Key Genes ◽

Low Dimensional

Abstract Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression. Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.
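
The step of picking the "genes that contribute most to the hidden units" can be illustrated with a small sketch. The abstract does not say how contribution is measured, so this example assumes it is the summed absolute encoder weight of a gene across all hidden units. That measure, the function name `top_contributing_genes`, and the toy weight matrix are all hypothetical and are not taken from the paper.

```python
def top_contributing_genes(weights, gene_names, k):
    """Rank genes by total absolute encoder weight across hidden units.

    weights[i][j] is the (assumed) weight from gene i to hidden unit j.
    Returns the names of the k genes with the largest totals.
    """
    if len(weights) != len(gene_names):
        raise ValueError("one weight row is required per gene")
    totals = [sum(abs(w) for w in row) for row in weights]
    order = sorted(range(len(gene_names)), key=lambda i: totals[i], reverse=True)
    return [gene_names[i] for i in order[:k]]


# Toy encoder weights: 4 genes x 3 hidden units (invented numbers).
toy_genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
toy_weights = [
    [0.1, -0.2, 0.0],   # gene_a: total |w| = 0.3
    [0.9, 0.4, -0.7],   # gene_b: total |w| = 2.0
    [-0.5, 0.5, 0.1],   # gene_c: total |w| = 1.1
    [0.0, 0.05, 0.0],   # gene_d: total |w| = 0.05
]
selected = top_contributing_genes(toy_weights, toy_genes, k=2)
print(selected)
```

In a pipeline like the one described, the selected genes would then become the input features of the downstream random-forest classifier.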

Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients

10.21203/rs.2.21701/v4 ◽

2020 ◽

Author(s):

Shaoke LOU ◽

Tianxiao Li ◽

Daniel Spakowicz ◽

Xiting Yan ◽

Geoffrey Lowell Chupp ◽

...

Keyword(s):

Gene Expression ◽

Asthma Severity ◽

Support Vector ◽

Expression Data ◽

Denoising Autoencoder ◽

Gene Signatures ◽

Asthmatic Patients ◽

Latent Space ◽

Key Genes ◽

Low Dimensional

Abstract Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression. Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.

Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients

10.21203/rs.2.21701/v2 ◽

2020 ◽

Author(s):

Shaoke LOU ◽

Tianxiao Li ◽

Daniel Spakowicz ◽

Xiting Yan ◽

Geoffrey Lowell Chupp ◽

...

Keyword(s):

Gene Expression ◽

Biological Networks ◽

Asthma Severity ◽

Support Vector ◽

Expression Data ◽

Denoising Autoencoder ◽

Gene Signatures ◽

Latent Space ◽

Key Genes ◽

Low Dimensional

Abstract Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data. Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which can predict severity with an AUROC of 0.81 and thus have potential as diagnostic biomarkers. Furthermore, we could use these key genes for successfully estimating, via support-vector-machine regression, the FEV1/FVC ratios across patients, achieving pre- and post-treatment correlations of 0.56 and 0.65, respectively (between predicted and observed values). Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients. Specifically, from the top-weighted gene set in the hidden units, we identified 50 genes that are predictive of asthma severity and other relevant clinical traits. Furthermore, we found that these genes play central roles in biological networks and pathways. Thus, we believe that they could provide a potential source for biomarkers for asthma severity.
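
The correlations reported between predicted and observed FEV1/FVC ratios can be read with the help of a short sketch. The abstract does not name the correlation coefficient, so this example assumes Pearson's r. The function name `pearson_r` and the observed and predicted values are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented FEV1/FVC ratios: observed values and a regressor's predictions.
observed = [0.60, 0.70, 0.80, 0.90]
predicted = [0.65, 0.68, 0.85, 0.88]
r = pearson_r(predicted, observed)
print(round(r, 3))
```

Under this reading, a correlation of 0.56 or 0.65 indicates a moderate linear agreement between the regression output and the measured ratios.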

Random-Forest (RF) and Support Vector Machine (SVM) Implementation for Analysis of Gene Expression Data in Chronic Kidney Disease (CKD)

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/546/5/052066 ◽

2019 ◽

Vol 546 ◽

pp. 052066 ◽

Cited By ~ 1

Author(s):

Zuherman Rustam ◽

Ely Sudarsono ◽

Devvi Sarwinda

Keyword(s):

Gene Expression ◽

Chronic Kidney Disease ◽

Support Vector Machine ◽

Random Forest ◽

Kidney Disease ◽

Gene Expression Data ◽

Support Vector ◽

Expression Data

Inference of Genetic Networks From Time-Series and Static Gene Expression Data: Combining a Random-Forest-Based Inference Method With Feature Selection Methods

Frontiers in Genetics ◽

10.3389/fgene.2020.595912 ◽

2020 ◽

Vol 11 ◽

Author(s):

Shuhei Kimura ◽

Ryo Fukutomi ◽

Masato Tokuhisa ◽

Mariko Okada

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Random Forest ◽

Gene Expression Data ◽

Computational Cost ◽

Expression Data ◽

Selection Methods ◽

Inference Method ◽

Combined Application ◽

Inference Methods

Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have a useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None have been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses outputs from the feature selection methods to adjust the confidence values of all of the candidate regulations that have been computed by the random-forest-based inference method. Numerical experiments showed that the combined application with the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. The combined application with the feature selection methods moreover makes the computational cost higher. While a bigger improvement at a lower computational cost would be ideal, we see no impediments to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
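
The idea of combining random-forest confidence values with feature selection output can be sketched as follows. The abstract does not give the adjustment rule, so this example uses a hypothetical one: each confidence value is scaled by the fraction of feature selection methods that chose the regulator, and candidates chosen by no method are treated as unpromising and dropped. The function name `adjust_confidences`, the rule, and the toy values are assumptions for illustration only, not the authors' method.

```python
def adjust_confidences(confidences, selections):
    """Scale each confidence by the share of selection methods that chose it.

    confidences: dict mapping candidate regulator -> confidence value
                 (assumed to come from a random-forest-based method).
    selections:  list of sets, one per feature selection method, each holding
                 the regulators that method selected.
    """
    if not selections:
        raise ValueError("at least one feature selection result is required")
    adjusted = {}
    for regulator, confidence in confidences.items():
        votes = sum(1 for chosen in selections if regulator in chosen)
        adjusted[regulator] = confidence * votes / len(selections)
    return adjusted


# Toy inputs: three candidate regulators of one target gene (invented values).
rf_confidences = {"g1": 0.50, "g2": 0.30, "g3": 0.20}
selections = [{"g1", "g2"}, {"g1"}]

adjusted = adjust_confidences(rf_confidences, selections)
ranked = sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
# Candidates with an adjusted confidence of zero are removed as unpromising.
kept = [regulator for regulator, confidence in ranked if confidence > 0]
print(adjusted, kept)
```

Running several selection methods in addition to the random forest is also where the extra computational cost mentioned in the abstract would arise.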

Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data

Nucleic Acids Research ◽

10.1093/nar/gkq149 ◽

2010 ◽

Vol 38 (11) ◽

pp. e120-e120 ◽

Cited By ~ 134

Author(s):

Ahmed Essaghir ◽

Federica Toffalini ◽

Laurent Knoops ◽

Anders Kallin ◽

Jacques van Helden ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Gene Expression Data ◽

Target Gene ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Gene Signatures ◽

Transcription Factor Regulation ◽

Microarray Gene

Multidimensional support vector machines for visualization of gene expression data

Proceedings of the 2004 ACM symposium on Applied computing - SAC '04 ◽

10.1145/967900.967936 ◽

2004 ◽

Cited By ~ 5

Author(s):

Daisuke Komura ◽

Hiroshi Nakamura ◽

Shuichi Tsutsumi ◽

Hiroyuki Aburatani ◽

Sigeo Ihara

Keyword(s):

Gene Expression ◽

Support Vector Machines ◽

Gene Expression Data ◽

Support Vector ◽

Expression Data ◽

Vector Machines

Feature Selection and Ranking of Key Genes for Tumor Classification: Using Microarray Gene Expression Data

Artificial Intelligence and Soft Computing – ICAISC 2006 - Lecture Notes in Computer Science ◽

10.1007/11785231_100 ◽

2006 ◽

pp. 951-961 ◽

Cited By ~ 4

Author(s):

Srinivas Mukkamala ◽

Qingzhong Liu ◽

Rajeev Veeraghattam ◽

Andrew H. Sung

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Tumor Classification ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene ◽

Key Genes ◽

Selection And Ranking
