Random Subspace Aggregation for Cancer Prediction with Gene Expression Profiles

BioMed Research International ◽

10.1155/2016/4596326 ◽

2016 ◽

Vol 2016 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Liying Yang ◽

Zhimin Liu ◽

Xiguo Yuan ◽

Jianhua Wei ◽

Junying Zhang

Keyword(s):

Gene Expression ◽

Nearest Neighbor ◽

Signal To Noise Ratio ◽

Small Sample Size ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Small Sample ◽

Biological Data ◽

Superior Performance ◽

Random Subspaces

Background. Precisely predicting cancer is crucial for cancer treatment. Gene expression profiles make it possible to analyze patterns between genes and cancers on the genome-wide scale. Gene expression data analysis, however, is confronted with enormous challenges owing to its characteristics, such as high dimensionality, small sample size, and low signal-to-noise ratio. Results. This paper proposes a method, termed RS_SVM, to predict cancer from gene expression profiles by aggregating SVMs trained on random subspaces. After choosing gene features through statistical analysis, RS_SVM randomly selects feature subsets to yield random subspaces, trains an SVM classifier on each, and then aggregates the SVM classifiers to capture the advantage of ensemble learning. Experiments on eight real gene expression datasets are performed to validate the RS_SVM method. Experimental results show that RS_SVM achieved better classification accuracy and generalization performance than a single SVM, K-nearest neighbor, decision tree, Bagging, AdaBoost, and state-of-the-art methods. Experiments also explored the effect of subspace size on prediction performance. Conclusions. The proposed RS_SVM method yielded superior performance in analyzing gene expression profiles, which demonstrates that RS_SVM provides a good channel for such biological data.
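The random-subspace aggregation described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' RS_SVM: a nearest-centroid classifier is swapped in for the SVM base learner so the example needs only the standard library, the statistical gene pre-selection step is omitted, and majority voting is assumed as the aggregation rule (the abstract does not specify one). All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Compute the per-class mean vector over the chosen feature indices."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row, features):
    """Return the class whose centroid is closest within the subspace."""
    def sqdist(centroid):
        return sum((row[f] - centroid[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def fit_random_subspace_ensemble(X, y, n_models=25, subspace_size=3, seed=0):
    """Train one base learner per randomly drawn feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Aggregate the base learners by majority vote (an assumed rule)."""
    votes = Counter(predict_centroid(c, row, f) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy "expression" matrix: 4 samples x 6 genes (made-up values).
X = [
    [0.1, 0.2, 0.0, 0.3, 0.1, 0.2],
    [0.3, 0.1, 0.2, 0.0, 0.2, 0.1],
    [5.1, 4.9, 5.2, 5.0, 4.8, 5.1],
    [4.9, 5.2, 5.0, 5.1, 5.0, 4.9],
]
y = ["normal", "normal", "tumor", "tumor"]

ensemble = fit_random_subspace_ensemble(X, y, n_models=25, subspace_size=3)
prediction_low = predict_ensemble(ensemble, [0.2] * 6)
prediction_high = predict_ensemble(ensemble, [5.0] * 6)
```

The `subspace_size` parameter corresponds to the subspace size whose effect the abstract says was studied; replacing `fit_centroids` and `predict_centroid` with an SVM would bring the sketch closer to the method as described.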

Combining Nearest Neighbor Classifiers Versus Cross-Validation Selection

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1054 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-19 ◽

Cited By ~ 8

Author(s):

Minhui Paik ◽

Yuhong Yang

Keyword(s):

Gene Expression ◽

Cross Validation ◽

Nearest Neighbor ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Data Sets ◽

Weighting Method ◽

Considerable Uncertainty ◽

Combined Classifier ◽

Nearest Neighbor Classifiers

Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN method. However, CV can perform poorly when there is considerable uncertainty in choosing the best candidate classifier. As an alternative to selecting a single “winner,” we propose a weighting method to combine the multiple NN rules. Four gene expression data sets are used to compare its performance with CV methods. The results show that when the CV selection is unstable, the combined classifier performs much better.

Biomarkers in Parkinson Disease: global gene expression analysis in peripheral blood from patients with and without mutations in PARK2 and PARK8

Einstein (São Paulo) ◽

10.1590/s1679-45082010ao1674 ◽

2010 ◽

Vol 8 (3) ◽

pp. 291-297 ◽

Cited By ~ 8

Author(s):

Patricia Maria de Carvalho Aguiar ◽

Patricia Severino

Keyword(s):

Gene Expression ◽

Parkinson Disease ◽

Expression Analysis ◽

Peripheral Blood ◽

Gene Expression Analysis ◽

Small Sample Size ◽

Expression Profiles ◽

Global Gene Expression ◽

Small Sample ◽

Global Gene Expression Analysis

ABSTRACT Objective: To evaluate the performance of gene expression analysis in the peripheral blood of Parkinson disease patients with different genetic profiles, using microarrays as a tool to identify possible disease-related biomarkers that could contribute to the elucidation of the pathological process and be useful in diagnosis. Methods: Global gene expression analysis by means of DNA microarrays was performed in peripheral blood of Parkinson disease patients with previously identified mutations in PARK2 or PARK8 genes, Parkinson disease patients without known mutations in these genes, and normal controls. Each group consisted of five individuals. Results: Global gene expression profiles were heterogeneous among patients and controls, and it was not possible to detect a consistent pattern between groups. However, analyzing genes with differential expression of p < 0.005 and fold change ≥ 1.2, we were able to identify a small group of well-annotated genes. Conclusions: Despite the small sample size, the identification of differentially expressed genes suggests that the microarray technique may be useful in identifying potential biomarkers in the peripheral blood of Parkinson disease patients or in people at risk of developing the disease. This will be important once neuroprotective therapies become available, and may contribute to the identification of new pathways involved in the disease physiopathology. Results presented here should be further validated in larger groups of patients.

Multiplatform biomarker identification using a data-driven approach enables single-sample classification

BMC Bioinformatics ◽

10.1186/s12859-019-3140-7 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Ling Zhang ◽

Ishwor Thapa ◽

Christian Haas ◽

Dhundy Bastola

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Blood Platelets ◽

Gene Expression Profiles ◽

Housekeeping Genes ◽

Disease Classification ◽

Single Sample ◽

Data Driven ◽

Superior Performance

Abstract. Background: High-throughput gene expression profiles have allowed discovery of potential biomarkers enabling early diagnosis, prognosis and development of individualized treatment. However, it remains a challenge to identify a set of reliable and reproducible biomarkers across various gene expression platforms and laboratories for single-sample diagnosis and prognosis. We address this need with our Data-Driven Reference (DDR) approach, which employs stably expressed housekeeping genes as references to eliminate platform-specific biases and non-biological variabilities. Results: Our method identifies biomarkers with “built-in” features, and these features can be interpreted consistently regardless of profiling technology, which enables classification of single samples independent of platform. Validation with RNA-seq data of blood platelets shows that DDR achieves superior performance in classification of six different tumor types as well as molecular target statuses (such as MET or HER2-positive, and mutant KRAS, EGFR or PIK3CA) with smaller sets of biomarkers. We demonstrate on three microarray datasets that our method is capable of identifying robust biomarkers for subgrouping medulloblastoma samples with data perturbation due to different microarray platforms. In addition to identifying the majority of subgroup-specific biomarkers in the CodeSet of nanoString, some potential new biomarkers for subgrouping medulloblastoma were detected by our method. Conclusions: In this study, we present a simple yet powerful data-driven method which contributes significantly to the identification of robust cross-platform gene signatures for single-patient disease classification to facilitate precision medicine. In addition, our method provides a new strategy for transcriptome analysis.

Multiplatform Biomarker Identification using a Data-driven Approach Enables Single-sample Classification

10.1101/581686 ◽

2019 ◽

Author(s):

Ling Zhang ◽

Ishwor Thapa ◽

Christian Haas ◽

Dhundy Bastola

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Blood Platelets ◽

Gene Expression Profiles ◽

Housekeeping Genes ◽

Single Sample ◽

Data Driven ◽

Superior Performance ◽

Sample Classification

Abstract. High-throughput gene expression profiles have allowed discovery of potential biomarkers enabling early diagnosis, prognosis and development of individualized treatment. However, it remains a challenge to identify a set of reliable and reproducible biomarkers across various gene expression platforms and laboratories for single-sample diagnosis and prognosis. We address this need with our Data-Driven Reference (DDR) approach, which employs stably expressed housekeeping genes as references to eliminate platform-specific biases and non-biological variabilities. Our method identifies biomarkers with “built-in” features, and these features can be interpreted consistently regardless of profiling technology, which enables classification of single samples independent of platform. Validation with RNA-seq data of blood platelets shows that DDR achieves superior performance in classification of six different tumor types as well as molecular target statuses (such as MET or HER2-positive, and mutant KRAS, EGFR or PIK3CA) with smaller sets of biomarkers. We demonstrate on three microarray datasets that our method is capable of identifying robust biomarkers for subgrouping medulloblastoma samples with data perturbation due to different microarray platforms. In addition to identifying the majority of subgroup-specific biomarkers in the CodeSet of nanoString, some potential new biomarkers for subgrouping medulloblastoma were detected by our method. Our results show that the DDR method contributes significantly to single-sample classification of disease and sheds light on personalized medicine.

On triangular inequalities of correlation-based distances for gene expression profiles

10.1101/582106 ◽

2019 ◽

Cited By ~ 2

Author(s):

Chen Jiaxing ◽

Yen Kaow Ng ◽

Lu Lin ◽

Yiqi Jiang ◽

Shuaicheng Li

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Pearson Correlation ◽

Gene Expression Profiles ◽

Biological Data ◽

Distance Functions ◽

Gene Clustering ◽

Spearman Correlation ◽

Correlation Distance ◽

Absolute Correlation

Various distance functions for evaluating the differences between gene expression profiles have been proposed in the past. Such a function would output a low value if the profiles are strongly correlated (either negatively or positively) and a high value otherwise. One popular distance function is the absolute correlation distance, d_a = 1 − |ρ|, where ρ is a similarity measure such as Pearson or Spearman correlation. However, absolute correlation distance fails to fulfill the triangular inequality, which would have guaranteed better performance at vector quantization, allowed fast data localization, and sped up data clustering. In this work, we propose d_r = √(1 − |ρ|) as an alternative. We prove that d_r satisfies the triangular inequality when ρ represents Pearson correlation, Spearman correlation, or cosine similarity. We empirically compared d_r with d_a in gene clustering and sample clustering experiments, using real biological data. The two distances performed similarly in both gene clustering and sample clustering, with hierarchical clustering and PAM clustering. However, d_r demonstrated more robust clustering. According to a bootstrap experiment, the number of times that d_r generated a more robust sample pair partition is significantly (p-value < 0.05) larger. This advantage in robustness is also supported by the class "dissolved" event.
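The claim about the triangular inequality can be checked numerically on a small example. The sketch below computes both distances with Pearson correlation for three hand-picked vectors; the vectors and helper names are illustrative assumptions, not taken from the paper. Here ρ(x, z) = 0 while ρ(x, y) = ρ(y, z) ≈ 0.707, so the absolute correlation distance from x to z (1.0) exceeds the detour through y (about 0.586), whereas the square-root form does not (1.0 against about 1.082).

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def d_a(a, b):
    """Absolute correlation distance: 1 - |rho|."""
    return 1.0 - abs(pearson(a, b))


def d_r(a, b):
    """Square-root variant: sqrt(1 - |rho|), clamped against rounding error."""
    return math.sqrt(max(0.0, 1.0 - abs(pearson(a, b))))


def satisfies_triangle(dist, a, b, c, tol=1e-12):
    """Check dist(a, c) <= dist(a, b) + dist(b, c) for one triple."""
    return dist(a, c) <= dist(a, b) + dist(b, c) + tol


# Illustrative zero-mean profiles: x and z are uncorrelated, y = x + z.
x = [1, -1, 0, 0]
y = [1, -1, 1, -1]
z = [0, 0, 1, -1]

d_a_holds = satisfies_triangle(d_a, x, y, z)  # False for this triple
d_r_holds = satisfies_triangle(d_r, x, y, z)  # True for this triple
```

A single triple only demonstrates that d_a can violate the inequality; it does not establish that d_r always satisfies it, which is what the abstract says the authors prove.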

Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes

Journal of Integrative Bioinformatics ◽

10.1515/jib-2010-134 ◽

2010 ◽

Vol 7 (3) ◽

Author(s):

Wim De Mulder ◽

Martin Kuiper ◽

René Boel

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Expression Profiles ◽

Clustering Algorithms ◽

Gene Expression Profiles ◽

Biological Data ◽

Biological Knowledge ◽

Expression Data ◽

Data Set ◽

Cluster Membership

Summary. Clustering is an important approach in the analysis of biological data, and often a first step to identify interesting patterns of coexpression in gene expression data. Because of the high complexity and diversity of gene expression data, many genes cannot be easily assigned to a cluster, but even if the dissimilarity of these genes with all other gene groups is large, they will finally be forced to become members of a cluster. In this paper we show how to detect such elements, called unstable elements. We have developed an approach for iterative clustering algorithms in which unstable elements are deleted, making the iterative algorithm less dependent on initial centers. Although the approach is unsupervised, it is less likely that the clusters into which the reduced data set is subdivided contain false positives. This clustering yields a more differentiated approach for biological data, since the cluster analysis is divided into two parts: the pruned data set is divided into highly consistent clusters in an unsupervised way, and the removed, unstable elements, for which no meaningful cluster exists in unsupervised terms, can be assigned to a cluster with the use of biological knowledge and information about the likelihood of cluster membership. We illustrate our framework on both an artificial and a real biological data set.

1327: Gene Expression Profiles in Benign Prostatic Hyperplasia

The Journal of Urology ◽

10.1016/s0022-5347(18)38552-5 ◽

2004 ◽

Vol 171 (4S) ◽

pp. 349-350

Author(s):

Gaelle Fromont ◽

Michel Vidaud ◽

Alain Latil ◽

Guy Vallancien ◽

Pierre Validire ◽

...

Keyword(s):

Gene Expression ◽

Benign Prostatic Hyperplasia ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Prostatic Hyperplasia

cDNA microarray analysis of gene expression profiles in human placenta: up-regulation of the transcript encoding muscle subunit of glycogen phosphorylase in preeclampsia

Journal of the Society for Gynecologic Investigation ◽

10.1016/s1071-5576(03)00154-0 ◽

2003 ◽

Vol 10 (8) ◽

pp. 496-502 ◽

Cited By ~ 21

Author(s):

S Tsoi

Keyword(s):

Gene Expression ◽

Microarray Analysis ◽

Cdna Microarray ◽

Human Placenta ◽

Glycogen Phosphorylase ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cdna Microarray Analysis

Intrinsic Gene Expression Profiles of Gliomas Are a Better Predictor of Survival than Histology

Yearbook of Neurology and Neurosurgery ◽

10.1016/s0513-5117(10)79306-6 ◽

2010 ◽

Vol 2010 ◽

pp. 113-114

Author(s):

J. Uhm

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Intrinsic Gene

Stromal Cells Derived from Non-Small Cell Lung Cancer and Normal Lung Tissue Display Mesenchymal Stem Cell Characteristics and Differ in Their Gene Expression Profiles and Functional Behaviour

Pneumologie ◽

10.1055/s-0029-1213954 ◽

2009 ◽

Vol 63 (S 01) ◽

Author(s):

S Gottschling ◽

A Jauch ◽

M Granzow ◽

R Kuner ◽

T Muley ◽

...

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Stem Cell ◽

Mesenchymal Stem Cell ◽

Stromal Cells ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Small Cell ◽

Normal Lung ◽

Small Cell Lung
