Comparative Study of Classification Models with Genetic Search Based Feature Selection Technique

2018 ◽  
Vol 9 (3) ◽  
pp. 1-11
Author(s):  
Sanat Kumar Sahu ◽  
A. K. Shrivas

Feature selection plays a very important role in retrieving the relevant features from a dataset and computationally improves the performance of a model. The objective of this study is to evaluate the most important features of a chronic kidney disease (CKD) dataset and diagnose the CKD problem. In this research work, the authors used a genetic search with the Wrapper Subset Evaluator method for feature selection to increase the overall performance of the classification model. They also used Bayes Network, Classification and Regression Tree (CART), Radial Basis Function Network (RBFN), and J48 classifiers for classification of CKD and non-CKD data. The proposed genetic search based feature selection technique (GSBFST) selects the best features from the CKD dataset, and the performance of the classifiers is compared with the proposed and existing genetic search feature selection techniques (FSTs). All classification models give better results with the proposed GSBFST than without any FST or with existing genetic search FSTs.
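A genetic search wrapped around a classifier can be sketched as follows. This is a generic GA-wrapper illustration, not the authors' exact Weka pipeline: the synthetic data, population size, mutation rate, and the decision tree (standing in for J48) are all assumptions. Each individual is a bit mask over the features, and its fitness is the cross-validated accuracy of a classifier trained on the masked columns — the wrapper evaluation the abstract describes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the CKD dataset: 24 features, 6 informative.
X, y = make_classification(n_samples=200, n_features=24, n_informative=6,
                           random_state=0)

def fitness(mask):
    """Wrapper evaluation: CV accuracy of a tree on the selected columns."""
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)  # stand-in for J48
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Initial population of random feature subsets (bit masks).
pop = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)
for gen in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    pop = pop[np.argsort(scores)[::-1]]        # rank by wrapper accuracy
    elite = pop[:10]                           # keep the fitter half
    children = []
    for _ in range(10):
        a, b = elite[rng.integers(0, 10, size=2)]
        cut = rng.integers(1, X.shape[1])      # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05   # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([elite, children])

scores = np.array([fitness(ind) for ind in pop])
best = pop[scores.argmax()]
print("selected features:", np.flatnonzero(best))
```

The selection/crossover/mutation loop is deliberately minimal; a real run would tune the population size, generations, and mutation rate.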



Author(s):  
Norsyela Muhammad Noor Mathivanan ◽  
Nor Azura Md.Ghani ◽  
Roziah Mohd Janor

The curse of dimensionality and the empty-space phenomenon have emerged as critical problems in text classification. One way of dealing with them is to apply a feature selection technique before building a classification model; this reduces time complexity and can increase classification accuracy. This study introduces a feature selection technique using K-Means clustering to overcome the weaknesses of traditional techniques such as principal component analysis (PCA), which requires considerable time to transform all the input data. The proposed technique decides which features to retain based on the significance value of each feature in a cluster. The study found that K-Means clustering increases the efficiency of a KNN model on large data sets, while a KNN model without feature selection is suitable for small data sets. A comparison between K-Means clustering and PCA as feature selection techniques shows that the proposed technique outperforms PCA, especially in terms of computation time. Hence, K-Means clustering helps to reduce the data dimensionality with less time complexity than PCA, without affecting the accuracy of the KNN model on high-frequency data.
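The idea of clustering the features themselves and keeping one representative per cluster can be sketched with scikit-learn. The synthetic data, the choice of k, and the "closest to its cluster centroid" significance rule are assumptions, since the abstract does not spell out its exact significance value; the point is that, unlike PCA, the retained columns are original features, with no transformation of the input data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))  # 100 samples, 30 features

k = 5
# Cluster the *features*: each column of X becomes a point in sample space.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X.T)

keep = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    # Significance rule (assumed): retain the feature closest to its centroid.
    d = np.linalg.norm(X.T[members] - km.cluster_centers_[c], axis=1)
    keep.append(members[d.argmin()])

X_reduced = X[:, sorted(keep)]  # original columns, untransformed
print(X_reduced.shape)
```

Because K-Means over 30 feature vectors is cheap compared to a full PCA decomposition plus projection, this is one way to see where the computation-time advantage claimed above could come from.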


2013 ◽  
Vol 22 (05) ◽  
pp. 1360010 ◽  
Author(s):  
HUANJING WANG ◽  
TAGHI M. KHOSHGOFTAAR ◽  
QIANHUI (ALTHEA) LIANG

Software metrics (features or attributes) are collected during the software development cycle. Metric selection is one of the most important preprocessing steps in building defect prediction models and may improve the final prediction result. However, the addition or removal of program modules (instances or samples) can alter the subsets chosen by a feature selection technique, rendering the previously selected feature sets invalid. Very limited research has been done considering both stability (or robustness) and defect prediction model performance together in the software engineering domain, despite the importance of both aspects when choosing a feature selection technique. In this paper, we test the stability and classification model performance of eighteen feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on sixteen datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability, while two different versions of ReliefF show the most stability, followed by the PRC- and AUC-based threshold-based feature selection techniques. Results also show that the signal-to-noise ranker performed moderately in terms of robustness and was the best ranker in terms of model performance. Finally, we conclude that while for some rankers stability and classification performance are correlated, this is not true for other rankers, and therefore performance according to one scheme (stability or model performance) cannot be used to predict performance according to the other.
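A stability measure of the kind tested above can be sketched by perturbing the data, re-running a ranker, and comparing the subsets it selects. The bootstrap perturbation, the simple correlation-based ranker, and the pairwise Jaccard similarity used here are illustrative assumptions, not the paper's exact protocol; a stable ranker such as ReliefF would score near 1.0, an unstable one near 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 20, 5
X = rng.standard_normal((n, p))
# Only features 0 and 1 carry signal; a stable ranker should keep finding them.
y = (X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n) > 0).astype(int)

def top_k(Xs, ys, k):
    # Simple filter ranker (assumed): |Pearson correlation| with the label.
    scores = np.abs([np.corrcoef(Xs[:, j], ys)[0, 1] for j in range(Xs.shape[1])])
    return set(np.argsort(scores)[-k:])

subsets = []
for _ in range(10):                      # perturb the data by bootstrapping
    idx = rng.integers(0, n, size=n)
    subsets.append(top_k(X[idx], y[idx], k))

# Average pairwise Jaccard similarity: 1.0 = perfectly stable selection.
pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
stability = np.mean([len(a & b) / len(a | b) for a, b in pairs])
print("stability:", round(stability, 3))
```

Varying the bootstrap size would correspond to varying "the magnitude of change to the datasets", and varying k to varying the selected subset size.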


Microarray technology is one of the powerful tools that have attracted many researchers to analyze gene expression levels for a given organism. Gene expression data typically have a very large number of features (in the thousands) and far fewer samples (in the hundreds). This characteristic makes it difficult to analyze gene expression data, so an efficient feature selection technique must be applied before any analysis. Feature selection plays a vital role in the classification of gene expression data. Several feature selection techniques have been introduced in this field, but Support Vector Machine with Recursive Feature Elimination (SVM-RFE) has proven to be among the most promising. SVM-RFE ranks the genes (features) by training an SVM classification model and, in combination with the RFE method, selects the key genes. The main issue with SVM-RFE is its huge time consumption. We introduce an efficient implementation of linear SVM to overcome this problem and improve RFE with a variable step size. The combined method is then used to select informative genes. An effective resampling method is proposed to preprocess the datasets; it balances the distribution of samples, which gives more reliable classification results. In this paper, we also study the applicability of common classifiers. Detailed experiments are conducted on four commonly used microarray gene expression datasets. The results show that the proposed method achieves comparable classification performance.
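The SVM-RFE loop described above — train a linear SVM, rank genes by squared weight, discard the lowest-ranked, repeat — can be sketched with scikit-learn. The synthetic data, the concrete variable-step schedule (halve the gene set while many genes remain, then eliminate one at a time), and the target subset size of 10 are illustrative assumptions, not the paper's exact settings; the large early steps are what cuts the run time relative to one-gene-per-iteration RFE.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic microarray-like data: many genes, few samples.
X, y = make_classification(n_samples=120, n_features=500, n_informative=10,
                           random_state=0)

remaining = np.arange(X.shape[1])
while len(remaining) > 10:
    svm = LinearSVC(C=1.0, dual=False, max_iter=5000).fit(X[:, remaining], y)
    w2 = (svm.coef_ ** 2).sum(axis=0)       # RFE ranking criterion: w_j^2
    # Variable step (assumed schedule): halve while many genes remain,
    # then remove one gene at a time near the end for a finer ranking.
    step = len(remaining) // 2 if len(remaining) > 40 else 1
    drop = np.argsort(w2)[:step]            # least-informative genes
    remaining = np.delete(remaining, drop)

print("selected genes:", remaining)
```

The balanced-resampling preprocessing step mentioned in the abstract would run before this loop, so that the SVM weights are not biased toward the majority class.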


A healthcare diagnosis system is a very important and critical task in medical science for doctors and medical students. Chronic kidney disease is a very serious and dangerous problem directly related to human life. In this research work, we use data mining and feature selection techniques to develop a robust and computationally efficient model for classifying chronic and non-chronic kidney disease. An ensemble model is constructed by combining two or more similar trained models, which helps to improve performance. Feature selection is frequently used in machine learning to build a model with a small number of features, which increases classification accuracy. The proposed feature selection techniques are based on the principles of Genetic Search (GS) and Greedy Stepwise Search (GSW). The proposed technique called GS-NB uses a search strategy embedded in the Genetic Algorithm to select features through natural selection, the procedure that drives biological evolution. The proposed technique called GSW-NB uses a search strategy included in Greedy Stepwise search, selecting relevant features with a problem-solving heuristic that makes the locally optimal decision at each stage. The performance of the suggested techniques was evaluated on the Chronic Kidney Disease (CKD) classification problem and compared with existing feature selection methods. The classification techniques, namely Single Rule Classification (SRC), Conditional Inference Tree (CIT), and their ensemble model (SRC, CIT), were used for classification of CKD. The proposed ensemble model uses a stacking learning technique that combines multiple classifiers, thereby improving classifier performance. Classifier performance is measured with observed accuracy, sensitivity, and specificity.
The experimental results demonstrate that the ensemble model (SRC, CIT) with GS-NB and GSW-NB recognizes CKD better than existing models. The proposed model can be beneficial and useful in medical science for identifying and diagnosing chronic kidney disease.
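The stacking ensemble can be sketched with scikit-learn's `StackingClassifier`. The base learners here are stand-in assumptions: a depth-1 decision tree plays the role of the single-rule classifier (SRC), a deeper tree plays the conditional inference tree (CIT), and a logistic-regression meta-learner combines their out-of-fold predictions; synthetic data replaces the CKD dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CKD dataset.
X, y = make_classification(n_samples=300, n_features=24, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Stacking: base learners' cross-validated predictions become the
# meta-learner's inputs, which is how the combination can beat either base model.
stack = StackingClassifier(
    estimators=[("rule", DecisionTreeClassifier(max_depth=1, random_state=0)),
                ("tree", DecisionTreeClassifier(max_depth=5, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5)
stack.fit(Xtr, ytr)
acc = stack.score(Xte, yte)
print("stacked accuracy:", round(acc, 3))
```

In the pipeline described above, GS-NB or GSW-NB would first reduce the feature set, and sensitivity/specificity would be reported alongside accuracy from the test-set confusion matrix.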


Author(s):  
Hua Tang ◽  
Chunmei Zhang ◽  
Rong Chen ◽  
Po Huang ◽  
Chenggang Duan ◽  
...  
