Comparative Study of Classification Models with Genetic Search Based Feature Selection Technique

2018 ◽  
Vol 9 (3) ◽  
pp. 1-11
Author(s):  
Sanat Kumar Sahu ◽  
A. K. Shrivas

Feature selection plays a very important role in retrieving the relevant features from a dataset and computationally improves the performance of a model. The objective of this study is to evaluate the most important features of a chronic kidney disease (CKD) dataset and diagnose the CKD problem. In this research work, the authors used a genetic search with the Wrapper Subset Evaluator method for feature selection to increase the overall performance of the classification model. They also used Bayes Network, Classification and Regression Tree (CART), Radial Basis Function Network (RBFN), and J48 classifiers for classification of CKD and non-CKD data. The proposed genetic search based feature selection technique (GSBFST) selects the best features from the CKD dataset, and the performance of the classifiers is compared with the proposed and existing genetic search feature selection techniques (FSTs). All classification models give better results with the proposed GSBFST than without any FST or with existing genetic search FSTs.
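A genetic search wrapped around a classifier can be sketched as follows. This is a generic GA-wrapper illustration, not the authors' exact Weka pipeline: the synthetic data, population size, mutation rate, and the decision tree (standing in for J48) are all assumptions. Each individual is a bit mask over the features, and its fitness is the cross-validated accuracy of a classifier trained on the masked columns — the wrapper evaluation the abstract describes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the CKD dataset: 24 features, 6 informative.
X, y = make_classification(n_samples=200, n_features=24, n_informative=6,
                           random_state=0)

def fitness(mask):
    """Wrapper evaluation: CV accuracy of a tree on the selected columns."""
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)  # stand-in for J48
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Initial population of random feature subsets (bit masks).
pop = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)
for gen in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    pop = pop[np.argsort(scores)[::-1]]        # rank by wrapper accuracy
    elite = pop[:10]                           # keep the fitter half
    children = []
    for _ in range(10):
        a, b = elite[rng.integers(0, 10, size=2)]
        cut = rng.integers(1, X.shape[1])      # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05   # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([elite, children])

scores = np.array([fitness(ind) for ind in pop])
best = pop[scores.argmax()]
print("selected features:", np.flatnonzero(best))
```

The selection/crossover/mutation loop is deliberately minimal; a real run would tune the population size, generations, and mutation rate.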



Author(s):  
Norsyela Muhammad Noor Mathivanan ◽  
Nor Azura Md.Ghani ◽  
Roziah Mohd Janor

The curse of dimensionality and the empty-space phenomenon have emerged as critical problems in text classification. One way of dealing with them is to apply a feature selection technique before building a classification model; this reduces time complexity and can increase classification accuracy. This study introduces a feature selection technique using K-Means clustering to overcome the weaknesses of traditional techniques such as principal component analysis (PCA), which requires considerable time to transform all the input data. The proposed technique decides which features to retain based on the significance value of each feature in a cluster. The study found that K-Means clustering increases the efficiency of a KNN model on large data sets, while a KNN model without feature selection is suitable for small data sets. A comparison between K-Means clustering and PCA as feature selection techniques shows that the proposed technique outperforms PCA, especially in terms of computation time. Hence, K-Means clustering helps to reduce the data dimensionality with less time complexity than PCA, without affecting the accuracy of the KNN model on high-frequency data.
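The idea of clustering the features themselves and keeping one representative per cluster can be sketched with scikit-learn. The synthetic data, the choice of k, and the "closest to its cluster centroid" significance rule are assumptions, since the abstract does not spell out its exact significance value; the point is that, unlike PCA, the retained columns are original features, with no transformation of the input data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))  # 100 samples, 30 features

k = 5
# Cluster the *features*: each column of X becomes a point in sample space.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X.T)

keep = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    # Significance rule (assumed): retain the feature closest to its centroid.
    d = np.linalg.norm(X.T[members] - km.cluster_centers_[c], axis=1)
    keep.append(members[d.argmin()])

X_reduced = X[:, sorted(keep)]  # original columns, untransformed
print(X_reduced.shape)
```

Because K-Means over 30 feature vectors is cheap compared to a full PCA decomposition plus projection, this is one way to see where the computation-time advantage claimed above could come from.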


2013 ◽  
Vol 22 (05) ◽  
pp. 1360010 ◽  
Author(s):  
HUANJING WANG ◽  
TAGHI M. KHOSHGOFTAAR ◽  
QIANHUI (ALTHEA) LIANG

Software metrics (features or attributes) are collected during the software development cycle. Metric selection is one of the most important preprocessing steps in building defect prediction models and may improve the final prediction result. However, the addition or removal of program modules (instances or samples) can alter the subsets chosen by a feature selection technique, rendering the previously selected feature sets invalid. Very limited research has been done considering both stability (or robustness) and defect prediction model performance together in the software engineering domain, despite the importance of both aspects when choosing a feature selection technique. In this paper, we test the stability and classification model performance of eighteen feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on sixteen datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability, while two different versions of ReliefF show the most stability, followed by the PRC- and AUC-based threshold-based feature selection techniques. Results also show that the signal-to-noise ranker performed moderately in terms of robustness and was the best ranker in terms of model performance. Finally, we conclude that while for some rankers stability and classification performance are correlated, this is not true for other rankers, and therefore performance according to one scheme (stability or model performance) cannot be used to predict performance according to the other.
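A stability measure of the kind tested above can be sketched by perturbing the data, re-running a ranker, and comparing the subsets it selects. The bootstrap perturbation, the simple correlation-based ranker, and the pairwise Jaccard similarity used here are illustrative assumptions, not the paper's exact protocol; a stable ranker such as ReliefF would score near 1.0, an unstable one near 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 20, 5
X = rng.standard_normal((n, p))
# Only features 0 and 1 carry signal; a stable ranker should keep finding them.
y = (X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n) > 0).astype(int)

def top_k(Xs, ys, k):
    # Simple filter ranker (assumed): |Pearson correlation| with the label.
    scores = np.abs([np.corrcoef(Xs[:, j], ys)[0, 1] for j in range(Xs.shape[1])])
    return set(np.argsort(scores)[-k:])

subsets = []
for _ in range(10):                      # perturb the data by bootstrapping
    idx = rng.integers(0, n, size=n)
    subsets.append(top_k(X[idx], y[idx], k))

# Average pairwise Jaccard similarity: 1.0 = perfectly stable selection.
pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
stability = np.mean([len(a & b) / len(a | b) for a, b in pairs])
print("stability:", round(stability, 3))
```

Varying the bootstrap size would correspond to varying "the magnitude of change to the datasets", and varying k to varying the selected subset size.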


Microarray technology is one of the powerful tools that have attracted many researchers to analyze gene expression levels for a given organism. Gene expression data typically have a very large number of features (in the thousands) and far fewer samples (in the hundreds). This characteristic makes it difficult to analyze gene expression data, so an efficient feature selection technique must be applied before any analysis. Feature selection plays a vital role in the classification of gene expression data. Several feature selection techniques have been introduced in this field, but Support Vector Machine with Recursive Feature Elimination (SVM-RFE) has proven to be among the most promising. SVM-RFE ranks the genes (features) by training an SVM classification model and, in combination with the RFE method, selects the key genes. The main issue with SVM-RFE is its huge time consumption. We introduce an efficient implementation of linear SVM to overcome this problem and improve RFE with a variable step size. The combined method is then used to select informative genes. An effective resampling method is proposed to preprocess the datasets; it balances the distribution of samples, which gives more reliable classification results. In this paper, we also study the applicability of common classifiers. Detailed experiments are conducted on four commonly used microarray gene expression datasets. The results show that the proposed method achieves comparable classification performance.
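The SVM-RFE loop described above — train a linear SVM, rank genes by squared weight, discard the lowest-ranked, repeat — can be sketched with scikit-learn. The synthetic data, the concrete variable-step schedule (halve the gene set while many genes remain, then eliminate one at a time), and the target subset size of 10 are illustrative assumptions, not the paper's exact settings; the large early steps are what cuts the run time relative to one-gene-per-iteration RFE.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic microarray-like data: many genes, few samples.
X, y = make_classification(n_samples=120, n_features=500, n_informative=10,
                           random_state=0)

remaining = np.arange(X.shape[1])
while len(remaining) > 10:
    svm = LinearSVC(C=1.0, dual=False, max_iter=5000).fit(X[:, remaining], y)
    w2 = (svm.coef_ ** 2).sum(axis=0)       # RFE ranking criterion: w_j^2
    # Variable step (assumed schedule): halve while many genes remain,
    # then remove one gene at a time near the end for a finer ranking.
    step = len(remaining) // 2 if len(remaining) > 40 else 1
    drop = np.argsort(w2)[:step]            # least-informative genes
    remaining = np.delete(remaining, drop)

print("selected genes:", remaining)
```

The balanced-resampling preprocessing step mentioned in the abstract would run before this loop, so that the SVM weights are not biased toward the majority class.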


A healthcare diagnosis system is a very important and critical task in medical science for doctors and medical students. Chronic kidney disease is a very serious and dangerous problem directly related to human life. In this research work, we use data mining and feature selection techniques to develop a robust and computationally efficient model for classifying chronic and non-chronic kidney disease. An ensemble model is constructed by combining two or more similar trained models, which helps to improve performance. Feature selection is frequently used in machine learning to build a model with a small number of features, which increases classification accuracy. The proposed feature selection techniques are based on the principles of Genetic Search (GS) and Greedy Stepwise Search (GSW). The proposed technique called GS-NB uses a search strategy embedded in the Genetic Algorithm to select features through natural selection, the procedure that drives biological evolution. The proposed technique called GSW-NB uses a search strategy included in Greedy Stepwise search, selecting relevant features with a problem-solving heuristic that makes the locally optimal decision at each stage. The performance of the suggested techniques was evaluated on the Chronic Kidney Disease (CKD) classification problem and compared with existing feature selection methods. The classification techniques, namely Single Rule Classification (SRC), Conditional Inference Tree (CIT), and their ensemble model (SRC, CIT), were used for classification of CKD. The proposed ensemble model uses a stacking learning technique that combines multiple classifiers, thereby improving classifier performance. Classifier performance is measured with observed accuracy, sensitivity, and specificity.
The experimental results demonstrate that the ensemble model (SRC, CIT) with GS-NB and GSW-NB recognizes CKD better than existing models. The proposed model can be beneficial and useful in medical science for identifying and diagnosing chronic kidney disease.
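The stacking ensemble can be sketched with scikit-learn's `StackingClassifier`. The base learners here are stand-in assumptions: a depth-1 decision tree plays the role of the single-rule classifier (SRC), a deeper tree plays the conditional inference tree (CIT), and a logistic-regression meta-learner combines their out-of-fold predictions; synthetic data replaces the CKD dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CKD dataset.
X, y = make_classification(n_samples=300, n_features=24, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Stacking: base learners' cross-validated predictions become the
# meta-learner's inputs, which is how the combination can beat either base model.
stack = StackingClassifier(
    estimators=[("rule", DecisionTreeClassifier(max_depth=1, random_state=0)),
                ("tree", DecisionTreeClassifier(max_depth=5, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5)
stack.fit(Xtr, ytr)
acc = stack.score(Xte, yte)
print("stacked accuracy:", round(acc, 3))
```

In the pipeline described above, GS-NB or GSW-NB would first reduce the feature set, and sensitivity/specificity would be reported alongside accuracy from the test-set confusion matrix.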


Author(s):  
Hua Tang ◽  
Chunmei Zhang ◽  
Rong Chen ◽  
Po Huang ◽  
Chenggang Duan ◽  
...  
