PhenotypeXpression: sub-classification of disease states using public gene expression data and literature

2018 ◽  
Author(s):  
Lucy Lu Wang ◽  
Huaiying Lin ◽  
Xiaojun Bao ◽  
Subhajit Sengupta ◽  
Ben Busby ◽  
...  

Abstract: The success of personalized medicine relies on proper disease classification and subtyping. Differential gene expression among disease subtypes can have a significant impact on treatment effect. This complicates the role of clinicians seeking more tailored diagnoses in cases where granular disease subtypes are not well defined. PhenotypeXpression (PhenoX) is a tool for rapid disease subtyping using publicly available gene expression data and literature. PhenoX aggregates and clusters gene expression data to determine potential disease subtypes, and develops a phenotypic profile for each subtype using term co-occurrences in published literature. Although the availability of public gene expression data is limited, we are able to observe clearly defined subtypes for several conditions.
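The profiling step described in this abstract (counting term co-occurrences per subtype) can be illustrated with a minimal sketch. It is not PhenoX's implementation: the function name `subtype_term_profiles`, the dataset identifiers, the cluster labels, and the phenotype terms are all hypothetical, and the cluster assignments are assumed to come from an earlier clustering step that is not shown.

```python
from collections import Counter, defaultdict


def subtype_term_profiles(cluster_of, terms_in):
    """Count, per cluster, how many linked documents mention each term.

    cluster_of: dataset ID -> cluster label (assumed output of a clustering step)
    terms_in:   dataset ID -> phenotype terms found in the linked literature
    """
    profiles = defaultdict(Counter)
    for dataset_id, cluster in cluster_of.items():
        # set() so a term repeated within one document is counted once
        for term in set(terms_in.get(dataset_id, [])):
            profiles[cluster][term] += 1
    return dict(profiles)


# Hypothetical toy input, for illustration only
cluster_of = {"ds1": 0, "ds2": 0, "ds3": 1}
terms_in = {
    "ds1": ["pruritus", "erythema"],
    "ds2": ["pruritus"],
    "ds3": ["scaling", "erythema", "scaling"],
}

profiles = subtype_term_profiles(cluster_of, terms_in)
for cluster, counts in sorted(profiles.items()):
    # Most frequent terms form a rough phenotypic profile of the cluster
    print(cluster, counts.most_common(2))
```

Counting each term once per document keeps a single long abstract from dominating a cluster's profile; a real pipeline would likely also normalize by cluster size.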

2021 ◽  
Author(s):  
Richard R Green ◽  
Renee C Ireton ◽  
Martin Ferris ◽  
Kathleen Muenzen ◽  
David R Crosslin ◽  
...  

To understand the role of genetic variation in SARS and influenza infections, we developed CCFEA, a Shiny visualization tool that uses public RNA-seq data from the Collaborative Cross (CC) founder strains (A/J, C57BL/6J, 129S1/SvImJ, NOD/ShiLtJ, NZO/HlLtJ, CAST/EiJ, PWK/PhJ, and WSB/EiJ). Expression data for individual genes are displayed across founders, viral infections, and days post infection.


Author(s):  
Jirí Kléma ◽  
Filip Železný ◽  
Igor Trajkovski ◽  
Filip Karel ◽  
Bruno Crémilleux

This chapter points out the role of genomic background knowledge in gene expression data mining. The authors demonstrate its application in several tasks, such as relational descriptive analysis, constraint-based knowledge discovery, feature selection and construction, and quantitative association rule mining. The chapter also emphasizes the diversity of background knowledge. In genomics, it can be stored in formats such as free text, ontologies, pathways, links among biological entities, and many others. The authors hope that an understanding of the automated integration of heterogeneous data sources will help researchers reach results in their gene expression data analysis that are compact and transparent as well as biologically valid and plausible.


2021 ◽  
Author(s):  
Magdalena Navarro ◽  
T Ian Simpson

Abstract
Motivation: Autism spectrum disorder (ASD) has a strong, yet heterogeneous, genetic component. Among the various methods that are being developed to help reveal the underlying molecular aetiology of the disease, one that is gaining popularity is the combination of gene expression and clinical genetic data. For ASD, the SFARI-gene database comprises lists of curated genes in which presumed causative mutations have been identified in patients. In order to predict novel candidate SFARI-genes we built classification models combining differential gene expression data for ASD patients and unaffected individuals with a gene’s status in the SFARI-gene list.
Results: SFARI-genes were not found to be significantly associated with differential gene expression patterns, nor were they enriched in gene co-expression network modules that had a strong correlation with ASD diagnosis. However, network analysis and machine learning models that incorporate information from the whole gene co-expression network were able to predict novel candidate genes that share features of existing SFARI genes and have support for roles in ASD in the literature. We found a statistically significant bias related to the absolute level of gene expression for existing SFARI genes and their scores. It is essential that this bias be taken into account when studies interpret ASD gene expression data at gene, module and whole-network levels.
Availability: Source code is available from GitHub (https://doi.org/10.5281/zenodo.4463693) and the accompanying data from The University of Edinburgh DataStore (https://doi.org/10.7488/ds/2980).


2019 ◽  
Author(s):  
Pei-Yau Lung ◽  
Xiaodong Pang ◽  
Yan Li ◽  
Jinfeng Zhang

Abstract: Reusability is one of the FAIR data principles, which aim to make data Findable, Accessible, Interoperable, and Reusable. One current effort to increase the reusability of public genomics data focuses on including quality metadata with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we develop a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We propose a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically designed machine learning pipeline. The new approach performed better than pipelines using common metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we show that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.

