Applying Cost-Sensitive Extreme Learning Machine and Dissimilarity Integration to Gene Expression Data Classification

Computational Intelligence and Neuroscience ◽

10.1155/2016/8056253 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 12

Author(s):

Yanqiu Liu ◽

Huijuan Lu ◽

Ke Yan ◽

Haixia Xia ◽

Chunlin An

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Imbalanced Data ◽

Expression Data ◽

Classification Problems ◽

High Scale ◽

Rejection Cost ◽

Misclassification Costs ◽

Learning Machine ◽

The Cost

Embedding cost-sensitive factors into classifiers increases classification stability and reduces classification cost on high-scale, redundant, and imbalanced datasets such as gene expression data. In this study, we extend our previous work, Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier; we call the resulting algorithm cost-sensitive D-ELM (CS-D-ELM). We further embed a rejection cost into CS-D-ELM to increase its classification stability. Experimental results show that CS-D-ELM with the embedded rejection cost effectively reduces the average and overall cost of classification while its accuracy remains competitive. The proposed method can be extended to classification problems on other redundant and imbalanced data.
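The abstract does not spell out how misclassification and rejection costs enter the decision, so the following is only a generic sketch of a minimum-expected-cost rule with a reject option, not the CS-D-ELM algorithm itself. The function name `min_cost_decision`, the cost matrix values, and the assumption that the classifier outputs class probabilities are all illustrative.

```python
import numpy as np

REJECT = -1  # hypothetical label used here for "refuse to classify"

def min_cost_decision(probs, cost, reject_cost):
    """Pick the class with the lowest expected cost, or reject.

    probs[n, i]  : estimated probability that sample n belongs to class i
    cost[i, j]   : assumed cost of predicting class j when the truth is i
    reject_cost  : assumed fixed cost of rejecting a sample
    """
    probs = np.asarray(probs, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # expected[n, j] = sum_i probs[n, i] * cost[i, j]
    expected = probs @ cost
    best = expected.argmin(axis=1)
    best_cost = expected.min(axis=1)
    # Reject whenever even the cheapest prediction costs more than rejecting.
    return np.where(best_cost > reject_cost, REJECT, best)

# Toy example: missing class 1 is five times as costly as missing class 0.
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.10, 0.90]])
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])
decisions = min_cost_decision(probs, cost, reject_cost=0.5)
print(decisions.tolist())
```

In this toy setting the ambiguous middle sample is rejected because its cheapest prediction still has an expected cost above the rejection cost, which is the kind of behaviour a rejection cost is meant to produce.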

Regularised extreme learning machine with misclassification cost and rejection cost for gene expression data classification

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2015.069657 ◽

2015 ◽

Vol 12 (3) ◽

pp. 294 ◽

Cited By ~ 4

Author(s):

Huijuan Lu ◽

Shasha Wei ◽

Zili Zhou ◽

Yanzi Miao ◽

Yi Lu

Keyword(s):

Gene Expression ◽

Extreme Learning Machine ◽

Gene Expression Data ◽

Data Classification ◽

Expression Data ◽

Misclassification Cost ◽

Rejection Cost ◽

Learning Machine

Learning misclassification costs for imbalanced classification on gene expression data

BMC Bioinformatics ◽

10.1186/s12859-019-3255-x ◽

2019 ◽

Vol 20 (S25) ◽

Author(s):

Huijuan Lu ◽

Yige Xu ◽

Minchao Ye ◽

Ke Yan ◽

Zhigang Gao ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Accurate Method ◽

Classification Problem ◽

Expression Data ◽

Optimal Cost ◽

Imbalanced Classification ◽

Function Fitting ◽

Misclassification Costs ◽

Learning Machine

Background: Cost-sensitive learning is an effective strategy for imbalanced classification problems. However, misclassification costs are usually set empirically from user expertise, which leads to unstable performance of cost-sensitive classification. An efficient and accurate method is therefore needed to calculate the optimal cost weights. Results: In this paper, two approaches are proposed to search for the optimal cost weights, targeting the highest weighted classification accuracy (WCA). One is a grid search over the cost weights and the other is function fitting. The two approaches are compared. In experiments, we classify imbalanced gene expression data using an extreme learning machine to test the cost weights obtained by the two approaches. Conclusions: Comprehensive experimental results show that the function fitting method is generally more efficient and finds the optimal cost weights with acceptable WCA.
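The grid-search idea can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: a small regularised ELM with per-sample weights, synthetic two-class data, a single cost weight applied to the minority class, and the mean of per-class recalls standing in for WCA (the abstract does not define WCA). All function names are hypothetical.

```python
import numpy as np

def train_weighted_elm(X, y, minority_weight, n_hidden=20, reg=1.0, seed=0):
    """Fit a tiny ELM; class 1 samples get weight `minority_weight`."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))  # random, untrained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_in + b)                       # hidden-layer outputs
    T = np.where(y == 1, 1.0, -1.0)                 # +/-1 targets
    w = np.where(y == 1, minority_weight, 1.0)      # per-sample cost weights
    # Weighted ridge solution: (H' W H + I/reg) beta = H' W T
    A = H.T @ (w[:, None] * H) + np.eye(n_hidden) / reg
    beta = np.linalg.solve(A, H.T @ (w * T))
    return W_in, b, beta

def predict_elm(model, X):
    W_in, b, beta = model
    return (np.tanh(X @ W_in + b) @ beta > 0).astype(int)

def mean_class_recall(y_true, y_pred):
    """Stand-in for WCA (assumption): average of per-class recalls."""
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in (0, 1)]))

def grid_search_cost_weight(X_tr, y_tr, X_val, y_val, grid):
    """Try every candidate weight and keep the best validation score."""
    scores = [mean_class_recall(y_val, predict_elm(
        train_weighted_elm(X_tr, y_tr, w), X_val)) for w in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

def make_data(n_major, n_minor, seed):
    """Synthetic imbalanced data: two shifted Gaussian clouds."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 1.0, size=(n_major, 5)),
                   rng.normal(1.0, 1.0, size=(n_minor, 5))])
    y = np.concatenate([np.zeros(n_major, dtype=int),
                        np.ones(n_minor, dtype=int)])
    return X, y

X_tr, y_tr = make_data(200, 20, seed=1)
X_val, y_val = make_data(100, 10, seed=2)
grid = [1.0, 2.0, 5.0, 10.0, 20.0]
best_w, best_score = grid_search_cost_weight(X_tr, y_tr, X_val, y_val, grid)
print(best_w, best_score)
```

Grid search retrains the classifier once per candidate weight, which is presumably why the abstract describes the function fitting alternative as more efficient.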

Bootstrapping Consistency Method for Optimal Gene Selection from Microarray Gene Expression Data for Classification Problems

Machine Learning in Bioinformatics ◽

10.1002/9780470397428.ch4 ◽

2009 ◽

pp. 89-110 ◽

Cited By ~ 1

Author(s):

Shaoning Pang ◽

Ilkka Havukkala ◽

Yingjie Hu ◽

Nikola Kasabov

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Selection ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Classification Problems ◽

Microarray Gene Expression ◽

Microarray Gene ◽

Consistency Method

Multi-class HingeBoost

Methods of Information in Medicine ◽

10.3414/me11-02-0020 ◽

2012 ◽

Vol 51 (02) ◽

pp. 162-167 ◽

Cited By ~ 7

Author(s):

Z. Wang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Cancer Classification ◽

Alternative Methods ◽

Expression Data ◽

Classification Problems ◽

Benchmark Data ◽

Hinge Loss ◽

Selection Behavior

Background: Multi-class molecular cancer classification has great potential clinical implications. Such applications require statistical methods to accurately classify cancer types with a small subset of genes from thousands of genes in the data. Objectives: This paper presents a new functional gradient descent boosting algorithm that directly extends the HingeBoost algorithm from the binary case to the multi-class case without reducing the original problem to multiple binary problems. Methods: Minimizing a multi-class hinge loss with boosting technique, the proposed HingeBoost has good theoretical properties by implementing the Bayes decision rule and providing a unifying framework with either equal or unequal misclassification costs. Furthermore, we propose Twin HingeBoost, which has better feature selection behavior than HingeBoost by reducing the number of ineffective covariates. Simulated data, benchmark data and two cancer gene expression data sets are utilized to evaluate the performance of the proposed approach. Results: Simulations and the benchmark data showed that the multi-class HingeBoost generated accurate predictions when compared with the alternative methods, especially with high-dimensional covariates. The multi-class HingeBoost also produced more accurate prediction or comparable prediction in two cancer classification problems using gene expression data. Conclusions: This work has shown that the HingeBoost provides a powerful tool for multi-classification problems. In many applications, the classification accuracy and feature selection behavior can be further improved when using Twin HingeBoost.
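The abstract does not say which multi-class hinge loss HingeBoost minimizes. As one common formulation, the sketch below computes the Crammer-Singer multi-class hinge loss with equal misclassification costs; it is an illustrative stand-in, not necessarily the loss used in the paper, and the function name is hypothetical.

```python
import numpy as np

def multiclass_hinge_loss(scores, labels):
    """Crammer-Singer hinge loss per sample (assumed formulation).

    scores[n, k] : real-valued score of class k for sample n
    labels[n]    : index of the true class of sample n
    loss_n = max(0, max_{k != y_n} (1 + scores[n, k] - scores[n, y_n]))
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    rows = np.arange(scores.shape[0])
    true_scores = scores[rows, labels]
    margins = 1.0 + scores - true_scores[:, None]
    margins[rows, labels] = 0.0  # the true class never contributes
    return np.maximum(margins.max(axis=1), 0.0)

# First sample is classified with a margin above 1, second is misclassified.
scores = np.array([[2.0, 0.5, -1.0],
                   [0.2, 0.4, 0.1]])
labels = np.array([0, 2])
losses = multiclass_hinge_loss(scores, labels)
print(losses)
```

The loss is zero only when the true class beats every other class by at least the margin, so a boosting procedure that minimizes it is pushed to separate all classes jointly rather than through several binary subproblems.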

Analyzing gene expression data for pediatric and adult cancer diagnosis using logic learning machine and standard supervised methods

BMC Bioinformatics ◽

10.1186/s12859-019-2953-8 ◽

2019 ◽

Vol 20 (S9) ◽

Cited By ~ 1

Author(s):

Damiano Verda ◽

Stefano Parodi ◽

Enrico Ferrari ◽

Marco Muselli

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Cancer Diagnosis ◽

Cancer Biology ◽

Nearest Neighbor ◽

Support Vector ◽

Expression Data ◽

Network Support ◽

Supervised Methods ◽

Learning Machine

Background: Logic Learning Machine (LLM) is an innovative method of supervised analysis capable of constructing models based on simple and intelligible rules. In this investigation the performance of LLM in classifying patients with cancer was evaluated using a set of eight publicly available gene expression databases for cancer diagnosis. LLM accuracy was assessed by summary ROC curve (sROC) analysis and estimated by the area under an sROC curve (sAUC). Its performance was compared in cross validation with that of standard supervised methods, namely: decision tree, artificial neural network, support vector machine (SVM) and k-nearest neighbor classifier. Results: LLM showed an excellent accuracy (sAUC = 0.99, 95% CI: 0.98–1.0) and outperformed any other method except SVM. Conclusions: LLM is a new powerful tool for the analysis of gene expression data for cancer diagnosis. Simple rules generated by LLM could contribute to a better understanding of cancer biology, potentially addressing therapeutic approaches.

Analysis of complexity indices for classification problems: Cancer gene expression data

Neurocomputing ◽

10.1016/j.neucom.2011.03.054 ◽

2012 ◽

Vol 75 (1) ◽

pp. 33-42 ◽

Cited By ~ 26

Author(s):

Ana C. Lorena ◽

Ivan G. Costa ◽

Newton Spolaôr ◽

Marcilio C.P. de Souto

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Cancer Gene ◽

Expression Data ◽

Classification Problems

Robust gene expression-based classification of cancers without normalization

10.1101/2020.04.28.051953 ◽

2020 ◽

Author(s):

Aixiang Jiang ◽

Laura K. Hilton ◽

Jeffrey Tang ◽

Christopher K. Rushton ◽

Bruno M. Grande ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Binary Classification ◽

B Cell Lymphoma ◽

Data Sets ◽

Expression Data ◽

Classification Problems ◽

Data Types ◽

Normalization Methods

Binary classification using gene expression data is commonly used to stratify cancers into molecular subgroups that may have distinct prognoses and therapeutic options. A limitation of many such methods is the requirement for comparable training and testing data sets. Here, we describe and demonstrate a self-training implementation of probability ratio-based classification prediction score (PRPS-ST) that facilitates the porting of existing classification models to other gene expression data sets. We demonstrate its robustness through application to two binary classification problems in diffuse large B-cell lymphoma using a diverse variety of gene expression data types and normalization methods.

A Construction Method of Gene Expression Data Based on Information Gain and Extreme Learning Machine Classifier on Cloud Platform

International Journal of Database Theory and Application ◽

10.14257/ijdta.2014.7.2.10 ◽

2014 ◽

Vol 7 (2) ◽

pp. 99-108 ◽

Cited By ~ 4

Author(s):

Sha-Sha Wei ◽

Hui-Juan Lu ◽

Wei Jin ◽

Chao Li

Keyword(s):

Gene Expression ◽

Extreme Learning Machine ◽

Gene Expression Data ◽

Information Gain ◽

Construction Method ◽

Expression Data ◽

Cloud Platform ◽

Learning Machine

Dissimilarity based ensemble of extreme learning machine for gene expression data classification

Neurocomputing ◽

10.1016/j.neucom.2013.02.052 ◽

2014 ◽

Vol 128 ◽

pp. 22-30 ◽

Cited By ~ 32

Author(s):

Hui-juan Lu ◽

Chun-lin An ◽

En-hui Zheng ◽

Yi Lu

Keyword(s):

Gene Expression ◽

Extreme Learning Machine ◽

Gene Expression Data ◽

Data Classification ◽

Expression Data ◽

Learning Machine

Learning Misclassification Costs for Imbalanced Datasets, Application in Gene Expression Data Classification

Intelligent Computing Theories and Application - Lecture Notes in Computer Science ◽

10.1007/978-3-319-95930-6_47 ◽

2018 ◽

pp. 513-519 ◽

Cited By ~ 1

Author(s):

Huijuan Lu ◽

Yige Xu ◽

Minchao Ye ◽

Ke Yan ◽

Qun Jin ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Data Classification ◽

Expression Data ◽

Imbalanced Datasets ◽

Misclassification Costs
