PhenotypeXpression: sub-classification of disease states using public gene expression data and literature

Mapping Intimacies, DOI: 10.1101/461301, 2018

Author(s): Lucy Lu Wang, Huaiying Lin, Xiaojun Bao, Subhajit Sengupta, Ben Busby, ...

Keyword(s): Gene Expression, Gene Expression Data, Disease Classification, Expression Data, Phenotypic Profile, Disease States, Impact On Treatment, Differential Gene, Disease Subtypes

Abstract: The success of personalized medicine relies on proper disease classification and subtyping. Differential gene expression among disease subtypes can have a significant impact on treatment effect. This complicates the role of clinicians seeking more tailored diagnoses in cases where granular disease subtypes are not well defined. PhenotypeXpression (PhenoX) is a tool for rapid disease subtyping using publicly available gene expression data and literature. PhenoX aggregates and clusters gene expression data to determine potential disease subtypes, and develops a phenotypic profile for each subtype using term co-occurrences in published literature. Although the availability of public gene expression data is limited, we are able to observe clearly defined subtypes for several conditions.

Collaborative Cross Founder Expression Analysis (CCFEA)

DOI: 10.1101/2021.03.04.422591, 2021

Author(s): Richard R Green, Renee C Ireton, Martin Ferris, Kathleen Muenzen, David R Crosslin, ...

Keyword(s): Gene Expression, Genetic Variation, Gene Expression Data, Viral Infections, Collaborative Cross, Visualization Tool, Expression Data, Individual Gene, Rnaseq Data

To understand the role of genetic variation in SARS and influenza infections, we developed CCFEA, a Shiny visualization tool using public RNAseq data from the Collaborative Cross (CC) founder strains (A/J, C57BL/6J, 129S1/SvImJ, NOD/ShiLtJ, NZO/HlLtJ, CAST/EiJ, PWK/PhJ, and WSB/EiJ). Individual gene expression data are displayed across founders, viral infections, and days post infection.

Gene Expression Mining Guided by Background Knowledge

Data Mining and Medical Knowledge Management, DOI: 10.4018/978-1-60566-218-3.ch013, 2011, pp. 268-292

Author(s): Jiří Kléma, Filip Železný, Igor Trajkovski, Filip Karel, Bruno Crémilleux

Keyword(s): Gene Expression, Gene Expression Data, Descriptive Analysis, Background Knowledge, Heterogeneous Data, Expression Data, Heterogeneous Data Sources, Biological Entities, Quantitative Association Rule

This chapter points out the role of genomic background knowledge in gene expression data mining. The authors demonstrate its application in several tasks such as relational descriptive analysis, constraint-based knowledge discovery, feature selection and construction, and quantitative association rule mining. The chapter also accentuates the diversity of background knowledge. In genomics, it can be stored in formats such as free text, ontologies, pathways, links among biological entities, and many others. The authors hope that an understanding of the automated integration of heterogeneous data sources helps researchers reach compact and transparent, as well as biologically valid and plausible, results in their gene expression data analysis.

Research on Disease Classification Model and Algorithms Based on Gene Expression Data

2019 3rd International Conference on Data Science and Business Analytics (ICDSBA), DOI: 10.1109/icdsba48748.2019.00055, 2019

Author(s): Yue Li, Changyin Zhou

Keyword(s): Gene Expression, Gene Expression Data, Disease Classification, Classification Model, Expression Data

Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data

BMC Bioinformatics, DOI: 10.1186/1471-2105-8-67, 2007, Vol 8 (1), Cited By ~ 13

Author(s): Xin Zhao, Leo Wang-Kit Cheung

Keyword(s): Gene Expression, Gene Expression Data, Gaussian Processes, Microarray Gene Expression Data, Disease Classification, Expression Data, Microarray Gene Expression, Microarray Gene

Analysis on Differential Gene Expression Data for Prediction of New Biological Features in Permanent Atrial Fibrillation

PLoS ONE, DOI: 10.1371/journal.pone.0076166, 2013, Vol 8 (10), pp. e76166, Cited By ~ 7

Author(s): Feng Ou, Nini Rao, Xudong Jiang, Mengyao Qian, Wei Feng, ...

Keyword(s): Gene Expression, Atrial Fibrillation, Differential Gene Expression, Gene Expression Data, Expression Data, Biological Features, Permanent Atrial Fibrillation, Differential Gene

SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data

DOI: 10.1101/2021.01.29.428754, 2021

Author(s): Magdalena Navarro, T Ian Simpson

Keyword(s): Gene Expression, Autism Spectrum Disorder, Differential Gene Expression, Gene Expression Data, Gene List, Autism Spectrum, Spectrum Disorder, Expression Data, Link Type, Differential Gene

Abstract:

Motivation: Autism spectrum disorder (ASD) has a strong, yet heterogeneous, genetic component. Among the various methods that are being developed to help reveal the underlying molecular aetiology of the disease, one that is gaining popularity is the combination of gene expression and clinical genetic data. For ASD, the SFARI-gene database comprises lists of curated genes in which presumed causative mutations have been identified in patients. In order to predict novel candidate SFARI-genes we built classification models combining differential gene expression data for ASD patients and unaffected individuals with a gene's status in the SFARI-gene list.

Results: SFARI-genes were not found to be significantly associated with differential gene expression patterns, nor were they enriched in gene co-expression network modules that had a strong correlation with ASD diagnosis. However, network analysis and machine learning models that incorporate information from the whole gene co-expression network were able to predict novel candidate genes that share features of existing SFARI genes and have support for roles in ASD in the literature. We found a statistically significant bias related to the absolute level of gene expression for existing SFARI genes and their scores. It is essential that this bias be taken into account when studies interpret ASD gene expression data at gene, module and whole-network levels.

Availability: Source code is available from GitHub (https://doi.org/10.5281/zenodo.4463693) and the accompanying data from The University of Edinburgh DataStore (https://doi.org/10.7488/ds/2980).

Maximizing the Reusability of Public Gene Expression Data by Predicting Missing Metadata

DOI: 10.1101/792382, 2019

Author(s): Pei-Yau Lung, Xiaodong Pang, Yan Li, Jinfeng Zhang

Keyword(s): Gene Expression, Machine Learning, Gene Expression Data, Missing Values, Expression Data, New Approach, Machine Learning Methods, Differential Gene, Missing Variables, Better Than

Abstract: Reusability is part of the FAIR data principle, which aims to make data Findable, Accessible, Interoperable, and Reusable. One of the current efforts to increase the reusability of public genomics data has been to focus on the inclusion of quality metadata associated with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we develop a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We propose a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically-designed machine learning pipeline. The new approach performed better than pipelines using commonly used metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we show that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.

The ant colony algorithm for feature selection in high-dimension gene expression data for disease classification

Mathematical Medicine and Biology A Journal of the IMA, DOI: 10.1093/imammb/dqn001, 2007, Vol 24 (4), pp. 413-426, Cited By ~ 30

Author(s): K. R. Robbins, W. Zhang, J. K. Bertrand, R. Rekaya

Keyword(s): Gene Expression, Feature Selection, Gene Expression Data, High Dimension, Ant Colony Algorithm, Ant Colony, Disease Classification, Expression Data

A Systems Biology Strategy on Differential Gene Expression Data Discloses Some Biological Features of Atrial Fibrillation

PLoS ONE, DOI: 10.1371/journal.pone.0013668, 2010, Vol 5 (10), pp. e13668, Cited By ~ 11

Author(s): Federica Censi, Giovanni Calcagnini, Pietro Bartolini, Alessandro Giuliani

Keyword(s): Gene Expression, Atrial Fibrillation, Systems Biology, Differential Gene Expression, Gene Expression Data, Expression Data, Biological Features, Differential Gene

Directly labeled mRNA produces highly precise and unbiased differential gene expression data

Nucleic Acids Research, DOI: 10.1093/nar/gng013, 2003, Vol 31 (4), pp. 13e-13, Cited By ~ 19

Author(s): V. Gupta

Keyword(s): Gene Expression, Differential Gene Expression, Gene Expression Data, Expression Data, Differential Gene
