On triangular inequalities of correlation-based distances for gene expression profiles

Mapping Intimacies ◽

10.1101/582106 ◽

2019 ◽

Cited By ~ 2

Author(s):

Chen Jiaxing ◽

Yen Kaow Ng ◽

Lu Lin ◽

Yiqi Jiang ◽

Shuaicheng Li

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Pearson Correlation ◽

Gene Expression Profiles ◽

Biological Data ◽

Distance Functions ◽

Gene Clustering ◽

Spearman Correlation ◽

Correlation Distance ◽

Absolute Correlation

Various distance functions for evaluating the differences between gene expression profiles have been proposed in the past. Such a function outputs a low value if the profiles are strongly correlated, whether negatively or positively, and a high value otherwise. One popular distance function is the absolute correlation distance, da = 1 − |ρ|, where ρ is a similarity measure such as Pearson or Spearman correlation. However, the absolute correlation distance does not satisfy the triangular inequality, a property that would guarantee better performance in vector quantization, allow fast data localization, and speed up data clustering. In this work, we propose dr = √(1 − |ρ|) as an alternative. We prove that dr satisfies the triangular inequality when ρ represents Pearson correlation, Spearman correlation, or cosine similarity. We empirically compared dr with da in gene clustering and sample clustering experiments using real biological data. The two distances performed similarly in both gene clustering and sample clustering, under both hierarchical clustering and PAM clustering. However, dr produced more robust clusterings. In the bootstrap experiments, the number of times dr generated a more robust sample pair partition was significantly larger (p-value < 0.05). This advantage in robustness is also supported by the class "dissolved" event.
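The contrast between the two distances can be checked numerically. The sketch below is not taken from the paper: the three vectors are a hypothetical example chosen so that x and z are uncorrelated while y correlates equally with both, and the function names `d_a` and `d_r` are illustrative stand-ins for da and dr with ρ taken as Pearson correlation.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D profiles."""
    return float(np.corrcoef(x, y)[0, 1])

def d_a(x, y):
    """Absolute correlation distance, 1 - |rho|."""
    return 1.0 - abs(pearson(x, y))

def d_r(x, y):
    """Square-root variant, sqrt(1 - |rho|); clipped at 0 to absorb rounding."""
    return float(np.sqrt(max(0.0, 1.0 - abs(pearson(x, y)))))

# Hypothetical profiles: x and z are mean-zero and orthogonal (rho = 0),
# and y lies "halfway" between them (rho = 1/sqrt(2) with each).
x = np.array([1.0, -1.0, 0.0])
z = np.array([1.0, 1.0, -2.0])
y = x / np.linalg.norm(x) + z / np.linalg.norm(z)

for name, d in (("d_a", d_a), ("d_r", d_r)):
    direct = d(x, z)
    detour = d(x, y) + d(y, z)
    print(f"{name}: d(x,z)={direct:.4f}  d(x,y)+d(y,z)={detour:.4f}  "
          f"holds={direct <= detour}")
```

For this triple the direct distance under `d_a` exceeds the detour through y, so the inequality fails, while under `d_r` the detour is the longer path.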

Expression Concordance of 325 Novel RNA Biomarkers between Data Generated by NanoString nCounter and Affymetrix GeneChip

Disease Markers ◽

10.1155/2019/1940347 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Lucas Delmonico ◽

Said Attiya ◽

Joan W. Chen ◽

John C. Obenauer ◽

Edward C. Goodwin ◽

...

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Pearson Correlation ◽

Gene Expression Profiles ◽

Standard Operating Procedure ◽

Driver Genes ◽

Technology Platforms ◽

Formalin Fixed Paraffin ◽

Formalin Fixed Paraffin Embedded ◽

Rna Biomarkers

Background. With the development of new drug combinations and targeted treatments for multiple types of cancer, the ability to stratify categories of patient populations and to develop companion diagnostics has become increasingly important. A panel of 325 RNA biomarkers was selected based on cancer-related biological processes of healthy cells and gene expression changes over time during nonmalignant epithelial cell organization. This “cancer in reverse” approach resulted in a panel of biomarkers relevant for at least 7 cancer types, providing gene expression profiles representing key cellular signaling pathways beyond mutations in “driver genes.” Objective. To further investigate this biomarker panel, the objective of the current study is to (1) validate the assay reproducibility for the 325 RNA biomarkers and (2) compare gene expression profiles side by side using two technology platforms. Methods and Results. We mapped the 325 RNA transcripts in a custom NanoString nCounter expression panel so that they could be compared to all potential probe sets in the Affymetrix Human Genome U133 Plus 2.0. The experiments were conducted with 10 unique biological formalin-fixed paraffin-embedded (FFPE) breast tumor samples. Each site extracted RNA from four sections of 10-micron thick FFPE tissue over three different days by two different operators using an optimized standard operating procedure and quality control criteria. Samples were analyzed using mas5 in BioConductor and NanoStringNorm in R. Pearson correlation showed reproducibility between sites for all 60 samples, with r = 0.995 for Affymetrix and r = 0.999 for NanoString. Across multiple days and multiple operators, the correlation was r = 0.962–0.999 for Affymetrix and r = 0.982–0.991 for NanoString. Conclusion. The 325 RNA biomarkers showed reproducibility in two technology platforms with moderate to high concordance. Future directions include performing clinical validation studies and generating rationale for patient selection in clinical trials using the technically validated assay.

Comparison between Pearson Correlation Coefficient and Mutual Information as a Similarity Measure of Gene Expression Profiles

Japanese Journal of Biometrics ◽

10.5691/jjb.33.125 ◽

2013 ◽

Vol 33 (2) ◽

pp. 125-143 ◽

Cited By ~ 3

Author(s):

Daisuke Horyu ◽

Takeshi Hayashi

Keyword(s):

Gene Expression ◽

Mutual Information ◽

Correlation Coefficient ◽

Similarity Measure ◽

Expression Profiles ◽

Pearson Correlation ◽

Gene Expression Profiles ◽

Pearson Correlation Coefficient

Random Subspace Aggregation for Cancer Prediction with Gene Expression Profiles

BioMed Research International ◽

10.1155/2016/4596326 ◽

2016 ◽

Vol 2016 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Liying Yang ◽

Zhimin Liu ◽

Xiguo Yuan ◽

Jianhua Wei ◽

Junying Zhang

Keyword(s):

Gene Expression ◽

Nearest Neighbor ◽

Signal To Noise Ratio ◽

Small Sample Size ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Small Sample ◽

Biological Data ◽

Superior Performance ◽

Random Subspaces

Background. Precisely predicting cancer is crucial for cancer treatment. Gene expression profiles make it possible to analyze patterns between genes and cancers on a genome-wide scale. Gene expression data analysis, however, faces enormous challenges because of characteristics such as high dimensionality, small sample size, and low signal-to-noise ratio. Results. This paper proposes a method, termed RS_SVM, for prediction with gene expression profiles by aggregating SVMs trained on random subspaces. After choosing gene features through statistical analysis, RS_SVM randomly selects feature subsets to yield random subspaces, trains SVM classifiers accordingly, and then aggregates the SVM classifiers to capture the advantage of ensemble learning. Experiments on eight real gene expression datasets are performed to validate the RS_SVM method. Experimental results show that RS_SVM achieved better classification accuracy and generalization performance than a single SVM, K-nearest neighbor, decision tree, Bagging, AdaBoost, and state-of-the-art methods. Experiments also explored the effect of subspace size on prediction performance. Conclusions. The proposed RS_SVM method yielded superior performance in analyzing gene expression profiles, which demonstrates that RS_SVM provides a good channel for such biological data.
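The general idea of training classifiers on random feature subsets and aggregating them can be sketched as follows. This is an illustration under stated assumptions, not the RS_SVM implementation: the abstract does not give the kernel, the feature-selection step, or the aggregation rule, so the sketch uses a plain linear SVM fitted by subgradient descent, skips feature selection, aggregates by majority vote, and runs on synthetic data. All function names and parameter values are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.
    Labels in y must be -1 or +1. Stand-in for whatever SVM the paper uses."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1          # samples inside the margin
        grad_w = lam * w - X[active].T @ y[active] / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def fit_random_subspace_svm(X, y, n_models=25, subspace_size=5, seed=0):
    """Train one linear SVM per randomly drawn feature subset."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        feats = rng.choice(X.shape[1], size=subspace_size, replace=False)
        w, b = train_linear_svm(X[:, feats], y)
        models.append((feats, w, b))
    return models

def predict(models, X):
    """Aggregate the per-subspace classifiers by majority vote (assumed rule)."""
    votes = np.zeros(X.shape[0])
    for feats, w, b in models:
        votes += np.sign(X[:, feats] @ w + b)
    return np.where(votes >= 0, 1, -1)

# Synthetic stand-in for expression data: 120 samples, 20 features,
# with the two classes shifted apart by one unit in every feature.
data_rng = np.random.default_rng(1)
y = np.where(np.arange(120) % 2 == 0, 1, -1)
X = data_rng.normal(size=(120, 20)) + y[:, None] * 1.0

models = fit_random_subspace_svm(X, y)
accuracy = float(np.mean(predict(models, X) == y))
print(f"training accuracy of the ensemble: {accuracy:.3f}")
```

The subspace size is the knob the abstract says was studied. Here it is fixed at 5 purely for illustration.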

Differential Gene Expression Profiles in CD34+ Myelodysplastic Syndrome Marrow Cells.

Blood ◽

10.1182/blood.v106.11.3424.3424 ◽

2005 ◽

Vol 106 (11) ◽

pp. 3424-3424 ◽

Cited By ~ 2

Author(s):

Kunju Sridhar ◽

Patrick O. Brown ◽

Robert Tibshirani ◽

Catriona Jamieson ◽

Irv Weissman ◽

...

Keyword(s):

Gene Expression ◽

Myelodysplastic Syndrome ◽

Clinical Outcomes ◽

Expression Profiles ◽

False Positive Rate ◽

Gene Expression Profiles ◽

Gene Clustering ◽

Cd34 Cells ◽

Positive Rate ◽

Marrow Cells

Abstract. Gene expression profiles (GEPs) were obtained from marrow hematopoietic precursor cells (HPC) (CD34+ cells) from 30 myelodysplastic syndrome (MDS) patients (RARS 2, RA 15, RAEB 9, RAEBT 4; IPSS Low 11, Int-1 10, Int-2 5, High 4) and 6 Normal individuals. Fluorescently labeled cDNA was prepared from CD34+ cells (>90% purity), isolated by immunomagnetic column separation, after reverse transcription of high fidelity PCR-amplified poly(A) RNA (aRNA). The Cy-conjugated nucleotides for aRNA were hybridized to 40,000 gene chip microarrays obtained from the Stanford Functional Genomics Microarray Facility. aRNA from pooled normal CD34+ marrow cells was used as a Reference standard. High resolution scans were obtained to compile a dataset for each microarray, through files submitted to the Stanford Microarray Database. Dendrograms generated by unsupervised hierarchical gene clustering indicated major differences of GEP between Normal and MDS patients. Significance Analysis for Microarray (SAM) yielded 2327 genes significantly differentially expressed by MDS vs Normal: 2269 genes overexpressed, 58 underexpressed, with a false positive rate of ~10%. Prediction Analysis of Microarray (PAM) distinctly separated the MDS and Normal patients, requiring a minimum of 31 genes (which were also SAM significant). Class analysis by PAM correctly predicted 29 of the 30 to be MDS and 5 of the 6 to be Normal. Four disparate differential GEP regions in the dendrograms, comprising predominantly genes of differing functional categories, provided signatures associated with differing MDS clinical subgroups. Nine of 10 patients with poor clinical outcomes were associated with a GEP signature different from that which occurred in 14 of 20 patients with relatively good outcomes. Compared to the remainder of MDS patients, those with 5q- syndrome (n=5) had a differing GEP signature, with under-expression of 1018 genes, 11 of which were within the 5q31–32 CDS. Two of these genes (antioxidant protein 1 and interferon regulatory factor 1) have previously been proffered as candidate genes for this syndrome. Analysis of FACS-sorted, highly purified marrow HPC subsets, CD34+38+ (late HPCs) and CD34+38- (early HPCs), indicated their ratios to be 4.3±2.1 (n=2) for MDS and 3.2±1.2 (n=12) for Normals. These findings suggest that the differing GEPs between the MDS and Normal CD34+ cells were not due to major differences in their proportions of CD38 cell subsets. SAM and PAM significant differential GEPs were noted between these cell subsets (also differing between MDS and Normal), indicating alteration of gene expression during differentiation. Wnt1 and β-catenin1 (genes involved in cell self-renewal) were over-expressed in both MDS CD38- and CD38+ cells compared to Normal. These data demonstrate: (1) molecular differences between MDS and Normal HPCs and within HPC subsets; (2) GEP signatures characterizing MDS patients with differing cytogenetic abnormalities (e.g., 5q-) and clinical outcomes; (3) molecular criteria refining the prognostic categorization of MDS; and (4) gene expression data aiding characterization of the heterogeneous nature of this spectrum of diseases.

Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes

Journal of Integrative Bioinformatics ◽

10.1515/jib-2010-134 ◽

2010 ◽

Vol 7 (3) ◽

Author(s):

Wim De Mulder ◽

Martin Kuiper ◽

René Boel

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Profiles ◽

Clustering Algorithms ◽

Gene Expression Profiles ◽

Biological Data ◽

Biological Knowledge ◽

Expression Data ◽

Data Set ◽

Cluster Membership

Summary. Clustering is an important approach in the analysis of biological data, and often a first step to identify interesting patterns of coexpression in gene expression data. Because of the high complexity and diversity of gene expression data, many genes cannot be easily assigned to a cluster, but even if the dissimilarity of these genes with all other gene groups is large, they will finally be forced to become members of a cluster. In this paper we show how to detect such elements, called unstable elements. We have developed an approach for iterative clustering algorithms in which unstable elements are deleted, making the iterative algorithm less dependent on initial centers. Although the approach is unsupervised, it is less likely that the clusters into which the reduced data set is subdivided contain false positives. This clustering yields a more differentiated approach for biological data, since the cluster analysis is divided into two parts: the pruned data set is divided into highly consistent clusters in an unsupervised way, and the removed, unstable elements, for which no meaningful cluster exists in unsupervised terms, can be given a cluster with the use of biological knowledge and information about the likelihood of cluster membership. We illustrate our framework on both an artificial and a real biological data set.

1327: Gene Expression Profiles in Benign Prostatic Hyperplasia

The Journal of Urology ◽

10.1016/s0022-5347(18)38552-5 ◽

2004 ◽

Vol 171 (4S) ◽

pp. 349-350

Author(s):

Gaelle Fromont ◽

Michel Vidaud ◽

Alain Latil ◽

Guy Vallancien ◽

Pierre Validire ◽

...

Keyword(s):

Gene Expression ◽

Benign Prostatic Hyperplasia ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Prostatic Hyperplasia

cDNA microarray analysis of gene expression profiles in human placenta: up-regulation of the transcript encoding muscle subunit of glycogen phosphorylase in preeclampsia

Journal of the Society for Gynecologic Investigation ◽

10.1016/s1071-5576(03)00154-0 ◽

2003 ◽

Vol 10 (8) ◽

pp. 496-502 ◽

Cited By ~ 21

Author(s):

S Tsoi

Keyword(s):

Gene Expression ◽

Microarray Analysis ◽

Cdna Microarray ◽

Human Placenta ◽

Glycogen Phosphorylase ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cdna Microarray Analysis

Intrinsic Gene Expression Profiles of Gliomas Are a Better Predictor of Survival than Histology

Yearbook of Neurology and Neurosurgery ◽

10.1016/s0513-5117(10)79306-6 ◽

2010 ◽

Vol 2010 ◽

pp. 113-114

Author(s):

J. Uhm

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Intrinsic Gene

Stromal Cells Derived from Non-Small Cell Lung Cancer and Normal Lung Tissue Display Mesenchymal Stem Cell Characteristics and Differ in Their Gene Expression Profiles and Functional Behaviour

Pneumologie ◽

10.1055/s-0029-1213954 ◽

2009 ◽

Vol 63 (S 01) ◽

Author(s):

S Gottschling ◽

A Jauch ◽

M Granzow ◽

R Kuner ◽

T Muley ◽

...

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Stem Cell ◽

Mesenchymal Stem Cell ◽

Stromal Cells ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Small Cell ◽

Normal Lung ◽

Small Cell Lung

Gene Expression Profiles in CML Non-responders

Case Medical Research ◽

10.31525/ct1-nct04219111 ◽

2020 ◽

Author(s):

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles
