OncoMX: A Knowledgebase for Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data

JCO Clinical Cancer Informatics ◽ 10.1200/cci.19.00117 ◽ 2020 ◽ pp. 210-220 ◽ Cited By ~ 2

Author(s): Hayley M. Dingerdissen ◽ Frederic Bastian ◽ K. Vijay-Shanker ◽ Marc Robinson-Rechavi ◽ Amanda Bell ◽ ...

Keyword(s): Gene Expression ◽ Differential Expression ◽ Cancer Biomarkers ◽ Cancer Biomarker ◽ Research Network ◽ Multidimensional Data ◽ Literature Mining ◽ Data Types ◽ Cancer Mutation ◽ Biomarker Data

PURPOSE The purpose of OncoMX knowledgebase development was to integrate cancer biomarker and relevant data types into a meta-portal, enabling the research of cancer biomarkers side by side with other pertinent multidimensional data types. METHODS Cancer mutation, cancer differential expression, cancer expression specificity, healthy gene expression from human and mouse, literature mining for cancer mutation and cancer expression, and biomarker data were integrated, unified by relevant biomedical ontologies, and subjected to rule-based automated quality control before ingestion into the database. RESULTS OncoMX provides integrated data encompassing more than 1,000 unique biomarker entries (939 from the Early Detection Research Network [EDRN] and 96 from the US Food and Drug Administration) mapped to 20,576 genes that have either mutation or differential expression in cancer. Sentences reporting mutation or differential expression in cancer were extracted from more than 40,000 publications, and healthy gene expression data with samples mapped to organs are available for both human genes and their mouse orthologs. CONCLUSION OncoMX has prioritized user feedback as a means of guiding development priorities. By mapping to and integrating data from several cancer genomics resources, it is hoped that OncoMX will foster a dynamic engagement between bioinformaticians and cancer biomarker researchers. This engagement should culminate in a community resource that substantially improves the ability and efficiency of exploring cancer biomarker data and related multidimensional data.


A guide to creating design matrices for gene expression experiments

F1000Research ◽ 10.12688/f1000research.27893.1 ◽ 2020 ◽ Vol 9 ◽ pp. 1444

Author(s): Charity W. Law ◽ Kathleen Zeglinski ◽ Xueyi Dong ◽ Monther Alhamdoosh ◽ Gordon K. Smyth ◽ ...

Keyword(s): Gene Expression ◽ Data Analysis ◽ Differential Expression ◽ RNA Sequencing ◽ Expression Analysis ◽ Graphical Representation ◽ Differential Expression Analysis ◽ Data Types ◽ Software Packages ◽ Set Up

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by analysis pipelines that are well described. However, there are two crucial steps in the analysis process that can be a stumbling block for many: the setup of an appropriate model via design matrices and the setup of comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue for design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation and differences between models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
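
To make the two steps named in this abstract concrete, here is a minimal sketch of a design matrix and a contrast for a single explanatory variable with two groups. It is an illustration only, not code from the article, which targets a limma-style pipeline in R. The sample labels, expression values, and variable names are hypothetical, and plain NumPy least squares stands in for a dedicated differential expression package.

```python
import numpy as np

# Hypothetical toy data: one gene measured in six samples, three per group.
groups = ["control", "control", "control", "treated", "treated", "treated"]
expression = np.array([5.0, 5.2, 4.8, 7.1, 6.9, 7.0])

# Column order of the design matrix: ["control", "treated"].
levels = sorted(set(groups))

# Design matrix without an intercept: one indicator column per group,
# so each fitted coefficient is that group's mean expression.
design = np.array([[1.0 if g == lvl else 0.0 for lvl in levels] for g in groups])

# Ordinary least squares fit of expression on the design matrix.
coef, *_ = np.linalg.lstsq(design, expression, rcond=None)

# Contrast vector for the comparison of interest: treated minus control.
contrast = np.array([-1.0, 1.0])
group_difference = contrast @ coef

print(dict(zip(levels, coef)), group_difference)
```

With this parameterization the contrast is the difference between the two group means. A design with an intercept column would encode the same comparison directly as a coefficient, which is the kind of modelling choice the article discusses.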


Subtyping of common complex diseases and disorders by integrating heterogeneous data. Identifying clusters among women with lower urinary tract symptoms in the LURN study

10.1101/2021.09.17.21263124 ◽ 2021

Author(s): Victor P. Andreev ◽ Margaret E. Helmuth ◽ Gang Liu ◽ Abigail R. Smith ◽ Robert M. Merion ◽ ...

Keyword(s): Urinary Tract ◽ Lower Urinary Tract Symptoms ◽ Lower Urinary Tract ◽ Heterogeneous Data ◽ Research Network ◽ Lower Urinary Tract Dysfunction ◽ Multidimensional Data ◽ Data Types ◽ Urinary Tract Symptoms ◽ Lower Urinary

ABSTRACT We present a novel methodology for subtyping of persons with a common clinical symptom complex by integrating heterogeneous continuous and categorical data. We illustrate it by clustering women with lower urinary tract symptoms (LUTS), who represent a heterogeneous cohort with overlapping symptoms and multifactorial etiology. Identifying subtypes within this group would potentially lead to better diagnosis and treatment decision-making. Data collected in the Symptoms of Lower Urinary Tract Dysfunction Research Network (LURN), a multi-center prospective observational cohort study, included self-reported urinary and non-urinary symptoms, bladder diaries, and physical examination data for 545 women. Heterogeneity in these multidimensional data required thorough and non-trivial preprocessing, including scaling by controls and weighting to mitigate data redundancy, while the various data types (continuous and categorical) required novel methodology using a weighted Tanimoto indices approach. Data domains only available on a subset of the cohort were integrated using a semi-supervised clustering approach. A novel contrast criterion for determination of the optimal number of clusters in consensus clustering was introduced and compared with existing criteria. Distinctiveness of the clusters was confirmed by using multiple criteria for cluster quality, and by testing for significantly different variables in pairwise comparisons of the clusters. Cluster dynamics were explored by analyzing longitudinal data at 3- and 12-month follow-up. Five distinct clusters of women with LUTS were identified using the developed methodology. The clinical relevance of the identified clusters is discussed and compared with the current conventional approaches to the evaluation of LUTS patients. Rationale and thought process are described for selection of procedures for data preprocessing, clustering, and cluster evaluation. Suggestions are provided for minimum reporting requirements in publications utilizing clustering methodology with multiple heterogeneous data domains.
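
The abstract mentions a weighted Tanimoto indices approach without defining it. As a rough illustration of the general idea only, the sketch below computes a weighted Tanimoto (Jaccard) similarity between two binary feature vectors. The formula, the function name `weighted_tanimoto`, the example vectors, and the weights are all assumptions chosen for illustration and may differ from the definition used in the study.

```python
def weighted_tanimoto(a, b, weights):
    """Weighted Tanimoto similarity between two binary vectors.

    Assumed form (not taken from the abstract): the weight of features
    present in both vectors divided by the weight of features present
    in at least one of them.
    """
    shared = sum(w for x, y, w in zip(a, b, weights) if x and y)
    either = sum(w for x, y, w in zip(a, b, weights) if x or y)
    # Convention chosen here: two all-zero vectors get similarity 0.0.
    return shared / either if either else 0.0


# Hypothetical binary symptom indicators for two participants.
person_a = [1, 1, 0, 0]
person_b = [1, 0, 1, 0]

# Equal weights reproduce the ordinary Tanimoto index: 1 shared / 3 present.
unweighted = weighted_tanimoto(person_a, person_b, [1, 1, 1, 1])

# Up-weighting the first (shared) feature raises the similarity: 2 / 4.
weighted = weighted_tanimoto(person_a, person_b, [2, 1, 1, 1])

print(unweighted, weighted)
```

Down-weighting groups of redundant features in this way is one plausible route to the "weighting to mitigate data redundancy" that the abstract describes, though the abstract does not say how the weights were chosen.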


Prediction of cancer mutation states using multiple data modalities reveals the utility and consistency of gene expression and DNA methylation

10.1101/2021.10.27.466140 ◽ 2021

Author(s): Jake Crawford ◽ Brock C Christensen ◽ Maria Chikina ◽ Casey S Greene

Keyword(s): Gene Expression ◽ DNA Methylation ◽ RNA Sequencing ◽ Genetic Alterations ◽ Cellular Function ◽ Data Type ◽ Omics Data ◽ Data Types ◽ Cancer Mutation ◽ Combining Data

In studies of cellular function in cancer, researchers are increasingly able to choose from many -omics assays as functional readouts. Choosing the correct readout for a given study can be difficult, and which layer of cellular function is most suitable to capture the relevant signal may be unclear. In this study, we consider prediction of cancer mutation status (presence or absence) from functional -omics data as a representative problem. Since functional signatures of cancer mutation have been identified across many data types, this problem presents an opportunity to quantify and compare the ability of different -omics readouts to capture signals of dysregulation in cancer. The TCGA Pan-Cancer Atlas contains genetic alteration data including somatic mutations and copy number variants (CNVs), as well as several -omics data types. From TCGA, we focus on RNA sequencing, DNA methylation arrays, reverse phase protein arrays (RPPA), microRNA, and somatic mutational signatures as -omics readouts. Across a collection of cancer-associated genetic alterations, RNA sequencing and DNA methylation were the most effective predictors of alteration state. Surprisingly, we found that for most alterations, they were approximately equally effective predictors. The target gene was the primary driver of performance, rather than the data type, and there was little difference between the top data types for the majority of genes. We also found that combining data types into a single multi-omics model often provided little or no improvement in predictive ability over the best individual data type. Based on our results, for the design of studies focused on the functional outcomes of cancer mutations, we recommend focusing on gene expression or DNA methylation as first-line readouts.


MiR-21 in the Cancers of the Digestive System and Its Potential Role as a Diagnostic, Predictive, and Therapeutic Biomarker

Biology ◽ 10.3390/biology10050417 ◽ 2021 ◽ Vol 10 (5) ◽ pp. 417

Author(s): Ha Thi Nguyen ◽ Salah Eddine Oussama Kacimi ◽ Truc Ly Nguyen ◽ Kamrul Hassan Suman ◽ Roselyn Lemus-Martin ◽ ...

Keyword(s): Digestive System ◽ Potential Role ◽ Target Genes ◽ Prognostic Biomarker ◽ Cancer Biomarkers ◽ Cancer Biomarker ◽ Molecular Networks ◽ Therapeutic Tool ◽ Comprehensive Review ◽ Non Coding RNAs

MicroRNAs (miRNAs) are small non-coding RNAs. They can regulate the expression of their target genes, and thus, their dysregulation significantly contributes to the development of cancer. Growing evidence suggests that miRNAs could be used as cancer biomarkers. As an oncogenic miRNA, the roles of miR-21 as a diagnostic and prognostic biomarker, and its therapeutic applications have been extensively studied. In this review, the roles of miR-21 are first demonstrated via its different molecular networks. Then, a comprehensive review on the potential targets and the current applications as a diagnostic and prognostic cancer biomarker and the therapeutic roles of miR-21 in six different cancers in the digestive system is provided. Lastly, a brief discussion on the challenges for the use of miR-21 as a therapeutic tool for these cancers is added.


The Detection and Bioinformatic Analysis of Alternative 3′ UTR Isoforms as Potential Cancer Biomarkers

International Journal of Molecular Sciences ◽ 10.3390/ijms22105322 ◽ 2021 ◽ Vol 22 (10) ◽ pp. 5322

Author(s): Nitika Kandhari ◽ Calvin A. Kraupner-Taylor ◽ Paul F. Harrison ◽ David R. Powell ◽ Traude H. Beilharz

Keyword(s): Gene Expression ◽ Cell Transformation ◽ Alternative Polyadenylation ◽ Cancer Biomarkers ◽ Bioinformatic Analysis ◽ Alternative Transcript ◽ Clinical Parameters ◽ Transcript Cleavage ◽ Cleavage And Polyadenylation ◽ Potential Cancer

Alternative transcript cleavage and polyadenylation is linked to cancer cell transformation, proliferation and outcome. This has led researchers to develop methods to detect and bioinformatically analyse alternative polyadenylation as potential cancer biomarkers. If incorporated into standard prognostic measures such as gene expression and clinical parameters, these could advance cancer prognostic testing and possibly guide therapy. In this review, we focus on the existing methodologies, both experimental and computational, that have been applied to support the use of alternative polyadenylation as cancer biomarkers.


High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis

Human Genomics ◽ 10.1186/s40246-021-00308-5 ◽ 2021 ◽ Vol 15 (1)

Author(s): Weitong Cui ◽ Huaru Xue ◽ Lei Wei ◽ Jinghua Jin ◽ Xuewen Tian ◽ ...

Keyword(s): Gene Expression ◽ Differential Expression ◽ Small Sample ◽ Differentially Expressed ◽ Cancer Type ◽ RNA Seq ◽ Sample Sizes ◽ Large Sample ◽ Expression Levels ◽ Gene Expression Levels

Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated.


Discovery of prostate cancer biomarkers by microarray gene expression profiling

Expert Review of Molecular Diagnostics ◽ 10.1586/erm.09.74 ◽ 2010 ◽ Vol 10 (1) ◽ pp. 49-64 ◽ Cited By ~ 49

Author(s): Karina Dalsgaard Sørensen ◽ Torben Falck Ørntoft

Keyword(s): Gene Expression ◽ Prostate Cancer ◽ Gene Expression Profiling ◽ Expression Profiling ◽ Cancer Biomarkers ◽ Microarray Gene Expression ◽ Microarray Gene


Content-based search of gene expression databases using binary fingerprints of differential expression profiles

Network Modeling Analysis in Health Informatics and Bioinformatics ◽ 10.1007/s13721-015-0076-3 ◽ 2015 ◽ Vol 4 (1)

Author(s): Francis Bell ◽ Ahmet Sacan

Keyword(s): Gene Expression ◽ Differential Expression ◽ Expression Profiles ◽ Binary Fingerprints


Identification of unique venous thromboembolism-susceptibility variants in African-Americans

Thrombosis and Haemostasis ◽ 10.1160/th16-08-0652 ◽ 2017 ◽ Vol 117 (04) ◽ pp. 758-768 ◽ Cited By ~ 16

Author(s): Sebastian Armasu ◽ Bryan McCauley ◽ Iftikhar Kullo ◽ Hugues Sicotte ◽ Jyotishman Pathak ◽ ...

Keyword(s): Gene Expression ◽ Venous Thromboembolism ◽ African Americans ◽ Differential Expression ◽ White Women ◽ Genome Wide Association Study ◽ Expression Data ◽ Significant Differential Expression ◽ Genome Wide ◽ A Genome

Summary: To identify novel single nucleotide polymorphisms (SNPs) associated with venous thromboembolism (VTE) in African-Americans (AAs), we performed a genome-wide association study (GWAS) of VTE in AAs using the Electronic Medical Records and Genomics (eMERGE) Network, comprised of seven sites each with DNA biobanks (total ~39,200 unique DNA samples) with genome-wide SNP data (imputed to 1000 Genomes Project cosmopolitan reference panel) and linked to electronic health records (EHRs). Using a validated EHR-driven phenotype extraction algorithm, we identified VTE cases and controls and tested for an association between each SNP and VTE using unconditional logistic regression, adjusted for age, sex, stroke, site-platform combination and sickle cell risk genotype. Among 393 AA VTE cases and 4,941 AA controls, three intragenic SNPs reached genome-wide significance: LEMD3 rs138916004 (OR=3.2; p=1.3E-08), LY86 rs3804476 (OR=1.8; p=2E-08) and LOC100130298 rs142143628 (OR=4.5; p=4.4E-08); all three SNPs validated using internal cross-validation, parametric bootstrap and meta-analysis methods. LEMD3 rs138916004 and LOC100130298 rs142143628 are only present in Africans (1000G data). LEMD3 showed a significant differential expression in both NCBI Gene Expression Omnibus (GEO) and the Mayo Clinic gene expression data, LOC100130298 showed a significant differential expression only in the GEO expression data, and LY86 showed a significant differential expression only in the Mayo expression data. LEMD3 encodes for an antagonist of TGF-β-induced cell proliferation arrest. LY86 encodes for MD-1 which down-regulates the pro-inflammatory response to lipopolysaccharide; LY86 variation was previously associated with VTE in white women; LOC100130298 is a non-coding RNA gene with unknown regulatory activity in gene expression and epigenetics. Supplementary Material to this article is available online at www.thrombosis-online.com.


Microarray analysis reveals novel gene expression changes associated with erectile dysfunction in diabetic rats

Physiological Genomics ◽ 10.1152/physiolgenomics.00112.2005 ◽ 2005 ◽ Vol 23 (2) ◽ pp. 192-205 ◽ Cited By ~ 34

Author(s): Chris J. Sullivan ◽ Thomas H. Teal ◽ Ian P. Luttrell ◽ Khoa B. Tran ◽ Mette A. Peters ◽ ...

Keyword(s): Gene Expression ◽ Smooth Muscle ◽ Erectile Dysfunction ◽ Splice Variants ◽ Full Range ◽ Diabetic Rats ◽ Differentially Expressed ◽ Diabetic Rat ◽ Literature Mining ◽ mRNA Levels

To investigate the full range of molecular changes associated with erectile dysfunction (ED) in Type 1 diabetes, we examined alterations in penile gene expression in streptozotocin-induced diabetic rats and littermate controls. With the use of Affymetrix GeneChip arrays and statistical filtering, 529 genes/transcripts were considered to be differentially expressed in the diabetic rat cavernosum compared with control. Gene Ontology (GO) classification indicated that there was a decrease in numerous extracellular matrix genes (e.g., collagen and elastin related) and an increase in oxidative stress-associated genes in the diabetic rat cavernosum. In addition, PubMatrix literature mining identified differentially expressed genes previously shown to mediate vascular dysfunction [e.g., ceruloplasmin (Cp), lipoprotein lipase, and Cd36] as well as genes involved in the modulation of the smooth muscle phenotype (e.g., Kruppel-like factor 5 and chemokine C-X3-C motif ligand 1). Real-time PCR was used to confirm changes in expression for 23 relevant genes. Further validation of Cp expression in the diabetic rat cavernosum demonstrated increased mRNA levels of the secreted and anchored splice variants of Cp. CP protein levels showed a 1.9-fold increase in tissues from diabetic rats versus controls. Immunohistochemistry demonstrated localization of CP protein in cavernosal sinusoids of control and diabetic animals, including endothelial and smooth muscle layers. Overall, this study broadens the scope of candidate genes and pathways that may be relevant to the pathophysiology of diabetes-induced ED as well as highlights the potential complexity of this disorder.
