Using support vector regression in gene selection and fuzzy rule generation for relapse time prediction of breast cancer

Comparing two datasets by their attributes with classification algorithms requires selecting those attributes by rules. A system that does this is known as a rule-based reasoning system: it classifies a given test instance into a particular outcome using the learned rules. The test instance carries multiple attributes, which are usually the values of diagnostic tests. In this article, the authors propose a classifier ensemble-based method for comparing two breast cancer datasets. Ensemble data mining methods are applied to rule generation, and a multi-criterion evaluation approach is used to select reliable rules from the results of the ensemble methods. The efficacy of the proposed methodology is illustrated with an example of two breast cancer datasets. The article introduces a novel fuzzy rule-based classification method called FURIA to obtain a relationship between the two breast cancer datasets and hence to find the similarity between them. The new method is compared with classical statistical approaches such as correlation and mutual information gain.
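The two classical measures named at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' comparison procedure: the function names and the toy attribute vectors are hypothetical, and the attributes are assumed to be discrete so that mutual information can be estimated from simple counts.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mutual_information(x, y):
    """Mutual information (in bits) of two discrete sequences, from empirical counts."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), with probabilities written as counts
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


# Hypothetical toy attributes: y depends on x, but not linearly (y = x squared).
x = [-1, 0, 1]
y = [1, 0, 1]
corr = pearson(x, y)             # linear association only
mi = mutual_information(x, y)    # any statistical dependence
```

On this toy pair the correlation vanishes while the mutual information stays positive, which is the usual reason for reporting both measures side by side.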


Regulatory genes identification within functional genomics experiments for tissue classification into binary classes via machine learning techniques

Journal of the Pakistan Medical Association ◽ 10.47391/jpma.201 ◽ 2020 ◽ pp. 1-21

Author(s): Bushra Wazir ◽ Dost Muhammad Khan ◽ Umair Khalil ◽ Muhammad Hamraz ◽ Naz Gul ◽ ...

Keyword(s): Breast Cancer ◽ Random Forest ◽ Classification Accuracy ◽ Gene Selection ◽ Machine Learning Techniques ◽ Brier Score ◽ Support Vector ◽ Target Tissue ◽ Selection Methods ◽ Microarray Gene Expression

Abstract. Objectives: The aim of this study is to filter out the most informative genes that mainly regulate the target tissue class, increase classification accuracy, reduce the curse of dimensionality, and discard redundant and irrelevant genes. Methods: This paper presents the idea of gene selection using bagging sub-forest (BSF). The proposed method derives gene importance from the idea specified in the standard random forest algorithm. The new method is compared with three state-of-the-art methods: Wilcoxon, masked painter, and proportional overlapped score (POS). These methods were applied to five datasets: Colon, Lymph node breast cancer, Leukemia, Serrated colorectal carcinomas, and Breast Cancer. For the comparison, each gene selection method was used to select the top 20 genes, and random forest (RF) and support vector machine (SVM) classifiers were then applied to the datasets restricted to the selected genes to assess predictive performance. Classification accuracy, Brier score, and sensitivity were used as performance measures. Results: The proposed method gave better results than the other gene selection methods with both the random forest and SVM classifiers on all the datasets. Conclusion: The proposed method showed improved performance in terms of classification accuracy, Brier score, and sensitivity, and hence could be used as a novel method for gene selection to classify tissue samples into their correct classes. Key Words: Gene selection, classification, random forest, cancer, microarray gene expression.
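The evaluation protocol described in the Methods can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: it ranks genes with a plain Wilcoxon rank-sum statistic (one of the baseline methods named above, not BSF), omits the RF and SVM classifiers, and uses a tiny made-up expression matrix. All function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def top_genes(expression, labels, k=20):
    """Indices of the k genes whose class-1 rank sum deviates most from its null mean."""
    n, n1 = len(labels), sum(labels)
    expected = n1 * (n + 1) / 2  # mean of the rank sum if the gene carries no signal
    scores = []
    for g in range(len(expression[0])):
        ranks = average_ranks([row[g] for row in expression])
        w = sum(r for r, y in zip(ranks, labels) if y == 1)
        scores.append(abs(w - expected))
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN), with class 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def brier_score(y_true, prob_pos):
    """Mean squared difference between predicted P(class 1) and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, prob_pos)) / len(y_true)


# Made-up data: 6 samples (rows) x 3 genes (columns), binary tissue labels.
labels = [0, 0, 0, 1, 1, 1]
expression = [
    [1.0, 1.0, 7.0],
    [5.0, 2.0, 7.0],
    [2.0, 3.0, 7.0],
    [4.0, 4.0, 7.0],
    [3.0, 5.0, 7.0],
    [6.0, 6.0, 7.0],
]
selected = top_genes(expression, labels, k=2)  # [1, 0]: gene 1 separates the classes perfectly
```

In a full experiment the selected columns would be passed to a classifier, and `accuracy`, `sensitivity`, and `brier_score` would be computed on held-out predictions; lower Brier scores indicate better-calibrated probabilities.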
