Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining

2018 ◽  
Vol 12 ◽  
pp. 1-12 ◽  
Author(s):  
Lakshmipadmaja D ◽  
B. Vishnuvardhan
2013 ◽  
Vol 22 (04) ◽  
pp. 1350027
Author(s):  
JAGANATHAN PALANICHAMY ◽  
KUPPUCHAMY RAMASAMY

Feature selection is essential in data mining and pattern recognition, especially for database classification. In recent years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum relevance and minimum redundancy criterion. Mutual information is used to measure the relevancy of each feature to the class variable, and redundancy is calculated from the relationship between candidate features, selected features and class variables. The effectiveness of the algorithm is tested on ten benchmark datasets available in the UCI Machine Learning Repository. The experimental results show better performance than some existing algorithms.
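The maximum relevance, minimum redundancy idea in this abstract can be illustrated with a small sketch. This is the classic greedy difference form (relevance minus mean redundancy with already selected features), not the novel redundancy term the paper proposes, which the abstract does not specify. The function names, the toy data and the assumption of discrete features are all illustrative.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def mrmr_select(features, labels, k):
    """Greedy selection; features maps a name to a list of discrete values."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, float("-inf")
        for name, col in features.items():
            if name in selected:
                continue
            # Mean mutual information with the features chosen so far.
            redundancy = (
                sum(mutual_information(col, features[s]) for s in selected)
                / len(selected)
                if selected else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected

# Hypothetical toy data: "a_dup" duplicates "a", so it is relevant but redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a":     [0, 0, 0, 1, 1, 1, 1, 1],
    "a_dup": [0, 0, 0, 1, 1, 1, 1, 1],
    "b":     [0, 1, 0, 0, 1, 0, 1, 1],
}
chosen = mrmr_select(features, labels, 2)
print(chosen)  # ['a', 'b']
```

On this toy data the duplicate feature is skipped in favour of the weaker but less redundant feature "b", which is the behaviour the criterion is meant to produce.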


2020 ◽  
Author(s):  
Esra Sarac Essiz ◽  
Murat Oturakci

Artificial bee colony (ABC) is a nature-inspired optimization algorithm modelled on the search behaviour of honey bees. The main aim of this study is to examine the effects of an ABC-based feature selection algorithm on classification performance for cyberbullying, which has become a significant worldwide social issue in recent years. To this end, the classification performance of the proposed ABC-based feature selection method is compared with three traditional methods: information gain, ReliefF and chi-square. Experimental results show that the ABC-based feature selection method outperforms the three traditional methods for the detection of cyberbullying. The macro-averaged F-measure on the data set increased from 0.659 to 0.8 using the proposed ABC-based feature selection method.


Webology ◽  
2021 ◽  
Vol 18 (SI02) ◽  
pp. 01-20
Author(s):  
S. Bharani Nayagi ◽  
T.S. Shiny Angel

Data mining is the extraction of relevant information from a very large volume of data. Discriminative knowledge is extracted through features, and the process of choosing those features is known as feature selection. Feature selection picks the subset of features that carries the most information. It is essential before data mining in order to reduce high-dimensional data. Without feature selection as a pre-processing step, classification requires a very long computation time, which adds complexity. The main aim of this analysis is to give a summary of the feature selection approaches adopted to evaluate very large feature sets.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 704
Author(s):  
Jiucheng Xu ◽  
Kanglin Qu ◽  
Meng Yuan ◽  
Jie Yang

Feature selection is one of the core topics of rough set theory and its applications. Since the reduction ability and classification performance of many feature selection algorithms based on rough set theory and its extensions are not ideal, this paper proposes a feature selection algorithm that combines the information theory view and the algebraic view in the neighborhood decision system. First, the neighborhood relationship in the neighborhood rough set model is used to retain the classification information of continuous data, and some uncertainty measures of neighborhood information entropy are studied. Second, to fully reflect the decision ability and classification performance of the neighborhood system, the neighborhood credibility and neighborhood coverage are defined and introduced into the neighborhood joint entropy. Third, a feature selection algorithm based on neighborhood joint entropy is designed, which overcomes the drawback that most feature selection algorithms consider only the information-theoretic definition or the algebraic definition. Finally, experiments and statistical analyses on nine data sets show that the algorithm can effectively select the optimal feature subset, and that the selection result can maintain or improve the classification performance of the data set.


Data scientists focus on high-dimensional data to predict and reveal interesting patterns and the most useful information. Feature selection is a preprocessing technique that improves the accuracy and efficiency of mining algorithms. Numerous feature selection algorithms exist, but most fail to give good mining results as the scale increases. This paper considers feature selection for supervised algorithms in data mining and gives an overview of existing machine learning algorithms for supervised feature selection. It introduces an enhanced supervised feature selection algorithm that selects the best feature subset by eliminating irrelevant features using distance correlation and redundant features using symmetric uncertainty. The experimental results show that the proposed algorithm provides better classification accuracy and selects a minimum number of features.
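The two measures named in this abstract have standard textbook definitions, sketched below in plain Python. How the paper combines them (thresholds, ordering, search strategy) is not stated in the abstract, so the sketch covers only the measures themselves. The function names are illustrative; symmetric uncertainty assumes discrete values and distance correlation assumes one-dimensional numeric samples.

```python
from collections import Counter
from math import log2, sqrt

def entropy(xs):
    """Shannon entropy in bits of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    mi = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mi / (hx + hy)

def _centered_distances(xs):
    """Pairwise |x_i - x_j| matrix, double-centered by row, column and grand mean."""
    n = len(xs)
    d = [[abs(a - b) for b in xs] for a in xs]
    row = [sum(r) / n for r in d]  # equals the column means, since d is symmetric
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(xs, ys):
    """Sample distance correlation of two equal-length numeric sequences."""
    n = len(xs)
    a, b = _centered_distances(xs), _centered_distances(ys)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    if dvar_x * dvar_y == 0:
        return 0.0
    return sqrt(max(dcov2, 0.0) / sqrt(dvar_x * dvar_y))
```

A feature with distance correlation near zero against the class carries little information about it, while two features with symmetric uncertainty near one are close to interchangeable, which is why the first measure suits relevance filtering and the second suits redundancy filtering.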


2019 ◽  
Vol 2019 ◽  
pp. 1-19 ◽  
Author(s):  
Muhammad Hammad Memon ◽  
Jian Ping Li ◽  
Amin Ul Haq ◽  
Muhammad Hunain Memon ◽  
Wang Zhou

Accurate and efficient diagnosis of breast cancer in its early stages is essential for treatment and recovery in the IoT healthcare environment. Over the last few years the Internet of Things has changed daily life, and together with artificial intelligence and data mining techniques it provides a way to analyze both real-time and past data. Current state-of-the-art methods do not effectively diagnose breast cancer in the early stages, and many women suffer from this dangerous disease. Early detection of breast cancer therefore poses a great challenge for medical experts and researchers. To solve the problem of early-stage detection of breast cancer, we propose a machine learning-based diagnostic system that effectively classifies patients as malignant or benign in the IoT environment. In the proposed system, a support vector machine (SVM) classifier is used to separate malignant from benign cases. To improve classification performance, we use a recursive feature selection algorithm to select more suitable features from the breast cancer dataset. A training/testing split is applied to train and test the classifier and obtain the best predictive model. Additionally, classifier performance is checked using evaluation metrics such as classification accuracy, specificity, sensitivity, Matthews's correlation coefficient, F1-score, and execution time. The "Wisconsin Diagnostic Breast Cancer" dataset is used to test the proposed method. The experimental results demonstrate that the recursive feature selection algorithm selects the best subset of features, and that the SVM classifier achieves optimal classification performance on this subset. The SVM with a linear kernel achieved high classification accuracy (99%), specificity (99%), and sensitivity (98%), and a Matthews's correlation coefficient of 99%. From these experimental results, we conclude that the proposed system performs well because of the more appropriate features selected by the recursive feature selection algorithm. Furthermore, we suggest the proposed system for effective and efficient early-stage diagnosis of breast cancer, which in turn makes treatment and recovery more effective. Lastly, the implementation of the proposed system is very reliable in all aspects of IoT healthcare for breast cancer.
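The evaluation metrics listed in this abstract follow directly from the binary confusion matrix. The sketch below shows the standard formulas in plain Python; the function name, the choice of 1 as the positive (malignant) label and the toy predictions are illustrative assumptions, not data from the study.

```python
from math import sqrt

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, F1 and MCC from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical predictions: 3 TP, 1 FN, 3 TN, 1 FP.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores["accuracy"], scores["mcc"])  # 0.75 0.5
```

Note that Matthews's correlation coefficient ranges from -1 to 1 rather than 0 to 1, so it is usually noticeably lower than accuracy on the same predictions unless the classifier is nearly perfect.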

