Naive Bayes-Guided Bat Algorithm for Feature Selection

The Scientific World JOURNAL ◽

10.1155/2013/325973 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9 ◽

Cited By ~ 23

Author(s):

Ahmed Majid Taha ◽

Aida Mustapha ◽

Soong-Der Chen

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Naive Bayes ◽

Bat Algorithm ◽

Naïve Bayes ◽

Bayes Classifier ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Benchmark Datasets ◽

Selection Algorithms

With the amount of data and information said to double roughly every 20 months, feature selection has become highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, and signal processing. This work presents a bio-inspired method, the Bat Algorithm, hybridized with a Naive Bayes classifier (BANB). The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared to three other well-known feature selection algorithms. The discussion focused on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results showed that BANB significantly outperformed the other algorithms in selecting a smaller number of features, removing irrelevant, redundant, or noisy features while maintaining classification accuracy. BANB also proved to be more stable than the other methods and capable of producing more general feature subsets.


High-Accuracy Power Quality Disturbance Classification Using the Adaptive ABC-PSO as Optimal Feature Selection Algorithm

Energies ◽

10.3390/en14051238 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1238

Author(s):

Supanat Chamchuen ◽

Apirat Siritaratiwat ◽

Pradit Fuangfoo ◽

Puripong Suthisopapan ◽

Pirat Khunkitti

Keyword(s):

Feature Selection ◽

Power Quality ◽

Distribution System ◽

Classification Accuracy ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Electrical Distribution ◽

Power Quality Disturbance ◽

Optimal Feature Selection ◽

Optimal Feature

Power quality disturbance (PQD) is an important issue in electrical distribution systems that must be detected and identified promptly to prevent the degradation of system reliability. This work proposes a PQD classification using a novel algorithm, composed of the artificial bee colony (ABC) and the particle swarm optimization (PSO) algorithms, called “adaptive ABC-PSO” as the feature selection algorithm. The proposed adaptive technique is applied to a combination of the ABC and PSO algorithms, which is then used as the feature selection algorithm. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved with nine optimally selected features out of all 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as in a real distribution system. Compared with previous studies, the PQD classification accuracy obtained using adaptive ABC-PSO as the optimal feature selection algorithm falls in the high range; therefore, the adaptive ABC-PSO algorithm can be used to classify PQD in a practical electrical distribution system.


A hybrid feature selection algorithm combining ReliefF and Particle swarm optimization for high-dimensional medical data

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202948 ◽

2021 ◽

pp. 1-15

Author(s):

Zhaozhao Xu ◽

Derong Shen ◽

Yue Kou ◽

Tiezheng Nie

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

Random Forest ◽

Classification Accuracy ◽

Particle Swarm ◽

Medical Data ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Swarm Optimization

Due to high-dimensional features and strong correlation among features, the classification accuracy on medical data is not as good as expected. Feature selection is a common approach to this problem: it selects effective features by reducing the dimensionality of high-dimensional data. However, traditional feature selection algorithms set thresholds blindly, and their search algorithms are liable to fall into a local optimum. To address this, this paper proposes a hybrid feature selection algorithm combining ReliefF and Particle Swarm Optimization. The algorithm is divided into three main parts. First, ReliefF is used to calculate the feature weights, and the features are ranked by weight. Then the ranked features are grouped according to density equalization, so that the density of features in each group is the same. Finally, the Particle Swarm Optimization algorithm is used to search the ranked feature groups, and feature selection is performed according to a new fitness function. Experimental results show that random forest has the highest classification accuracy on the selected features. More importantly, it uses the fewest features. In addition, experimental results on 2 medical datasets show that the average accuracy of random forest reaches 90.20%, which indicates that the hybrid algorithm has application value.


Feature selection for chemical compound extraction using wrapper approach with Naive Bayes classifier

2017 6th International Conference on Electrical Engineering and Informatics (ICEEI) ◽

10.1109/iceei.2017.8312421 ◽

2017 ◽

Cited By ~ 2

Author(s):

Basel Alshaikhdeeb ◽

Kamsuriah Ahmad

Keyword(s):

Feature Selection ◽

Chemical Compound ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Wrapper Approach ◽

Selection For


Improving classification accuracy using Fuzzy Clustering Coefficients of Variations (FCCV) feature selection algorithm

2014 IEEE 15th International Symposium on Computational Intelligence and Informatics (CINTI) ◽

10.1109/cinti.2014.7028666 ◽

2014 ◽

Cited By ~ 2

Author(s):

Simon Fong ◽

Justin Liang ◽

Yan Zhuang

Keyword(s):

Feature Selection ◽

Fuzzy Clustering ◽

Classification Accuracy ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Coefficients Of Variations ◽

Clustering Coefficients


Improved Measures of Redundancy and Relevance for mRMR Feature Selection

Computers ◽

10.3390/computers8020042 ◽

2019 ◽

Vol 8 (2) ◽

pp. 42 ◽

Cited By ~ 1

Author(s):

Insik Jo ◽

Sangbum Lee ◽

Sejong Oh

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Computing Time ◽

Performance Comparison ◽

Medical Data ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Learning Tasks ◽

R Value ◽

Measure Of Performance

Many biological and medical datasets have numerous features. Feature selection is a data preprocessing step that can remove noise from the data and save computing time when the dataset has several hundred thousand or more features. Another goal of feature selection is to improve the classification accuracy in machine learning tasks. Minimum Redundancy Maximum Relevance (mRMR) is a well-known feature selection algorithm that selects features by calculating the redundancy between features and the relevance between features and the class vector. mRMR uses mutual information to measure redundancy and relevance. In this research, we propose a method to improve the performance of mRMR feature selection. We apply Pearson’s correlation coefficient as a measure of redundancy and the R-value as a measure of relevance. To compare the original mRMR and the proposed method, features were selected from various datasets using both methods, and a classification test was then performed. Classification accuracy was used as the measure for performance comparison. In many cases, the proposed method showed higher accuracy than the original mRMR.
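The redundancy/relevance trade-off described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the R-value relevance measure is not defined here, so absolute Pearson correlation with the class label stands in for it, and the "relevance minus mean redundancy" score and the function names `pearson` and `greedy_select` are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(vx * vy)

def greedy_select(columns, labels, k):
    """Pick k feature indices, trading relevance against mean redundancy."""
    # Assumption: |Pearson(feature, label)| is a stand-in relevance score.
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        def score(j):
            # Redundancy: mean |Pearson| against features already chosen.
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy
        candidates = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

With a near-duplicate pair of features, the second pick skips the duplicate in favor of a less redundant column, which is the behavior a redundancy term is meant to produce.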


Enhancing the Classification Accuracy in Sentiment Analysis using Joint Sentiment Topic Detection with Naive Bayes Classifier

Asian Journal of Research in Social Sciences and Humanities ◽

10.5958/2249-7315.2016.01280.6 ◽

2016 ◽

Vol 6 (12) ◽

pp. 105

Author(s):

PCD Kalaivaani ◽

R. Thangarajan

Keyword(s):

Sentiment Analysis ◽

Classification Accuracy ◽

Naive Bayes ◽

Naïve Bayes ◽

Topic Detection ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier


A NOVEL FEATURE SELECTION ALGORITHM WITH SUPERVISED MUTUAL INFORMATION FOR CLASSIFICATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500279 ◽

2013 ◽

Vol 22 (04) ◽

pp. 1350027

Author(s):

JAGANATHAN PALANICHAMY ◽

KUPPUCHAMY RAMASAMY

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Class A ◽

Selection Algorithms ◽

The Relationship ◽

Class Variable

Feature selection is essential in data mining and pattern recognition, especially for database classification. In recent years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum relevance and minimum redundancy criterion. Mutual information is used to measure the relevancy of each feature to the class variable, and redundancy is calculated from the relationship between candidate features, selected features, and class variables. The effectiveness is tested on ten benchmark datasets available in the UCI Machine Learning Repository. The experimental results show better performance compared with some existing algorithms.
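As a minimal sketch of the relevance side of this criterion, mutual information between a discrete feature and the class variable can be estimated from co-occurrence counts. The function name `mutual_information` is hypothetical, and the abstract's supervised redundancy term is not reproduced because its formula is not given here.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired discrete observations."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with probabilities as counts / n
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )
```

A feature that determines a balanced binary class exactly carries one bit; a feature independent of the class carries zero.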


Effective Feature Selection for 5G IM Applications Traffic Classification

Mobile Information Systems ◽

10.1155/2017/6805056 ◽

2017 ◽

Vol 2017 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Muhammad Shafiq ◽

Xiangzhan Yu ◽

Asif Ali Laghari ◽

Dawei Wang

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Statistical Test ◽

Traffic Classification ◽

Features Selection ◽

Traffic Flows ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Wrapper Method ◽

Selection For

Recently, machine learning (ML) algorithms have been widely applied in Internet traffic classification. However, because of inappropriate feature selection, ML-based classifiers are prone to misclassifying Internet flows as the traffic that occupies the majority of traffic flows. To address this problem, a novel feature selection metric named weighted mutual information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which first filters most of the features with the WMI metric and then uses a wrapper method to select features for ML classifiers with the accuracy (ACC) metric. We evaluate our approach using five ML classifiers on traces captured in two different network environments. Furthermore, we apply the Wilcoxon pairwise statistical test to the results of our proposed algorithm to find the robust features within the selected set. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision, achieving 99% flow accuracy.


Sentiment Analysis Of Full Day School Policy Comment Using Naïve Bayes Classifier Algorithm

SinkrOn ◽

10.33395/sinkron.v5i1.10564 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Miftahul Kahfi Al Fath ◽

Arini Arini ◽

Nasrul Hakiem

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

School Policy ◽

Naïve Bayes ◽

Day School ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Full Day

Sentiment analysis is an important and emerging research topic today. Sentiment analysis is done to determine whether someone's opinion on a problem or object tends to be negative or positive. The main purpose of this study is to find out public sentiment on the Full Day School policy from comments on the Facebook Page of Kemendikbud RI and to find out the performance of the Naïve Bayes Classifier algorithm. In this study, the authors used the Naïve Bayes Classifier algorithm with character trigram and quadgram feature selection, with two different training data models and with training data labeled using a lexicon-based method, to classify public sentiment toward the Full Day School policy. The results show that negative public sentiment toward the Full Day School policy outweighs positive or neutral sentiment. The highest accuracy, 80%, was obtained by the Naïve Bayes Classifier algorithm with trigram feature selection on the model with 300 training data. The amount of training data and the feature selection used with the Naïve Bayes Classifier algorithm affected the accuracy.
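Character trigram and quadgram features of the kind mentioned in this abstract can be sketched as overlapping substrings of length 3 or 4. The helper name `char_ngrams` and the lowercasing step are assumptions for illustration; the abstract does not describe the preprocessing actually used.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count overlapping character n-grams in a lowercased string."""
    text = text.lower()  # assumption: case is ignored
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

Calling `char_ngrams(comment, 3)` gives trigram counts and `char_ngrams(comment, 4)` gives quadgram counts; either could then feed a Naïve Bayes classifier as a bag of features.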


MapReduce Implementation of a Multinomial and Mixed Naive Bayes Classifier

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2020040101 ◽

2020 ◽

Vol 16 (2) ◽

pp. 1-23 ◽

Cited By ~ 1

Author(s):

Sikha Bagui ◽

Keerthi Devulapalli ◽

Sharon John

Keyword(s):

Big Data ◽

Gaussian Distribution ◽

Classification Accuracy ◽

Naive Bayes ◽

Multinomial Distribution ◽

Naïve Bayes ◽

Probability Estimation ◽

Bayes Classifier ◽

Discrete Values ◽

Block Sizes

This study presents an efficient way to deal with discrete as well as continuous values in Big Data in a parallel Naïve Bayes implementation on Hadoop's MapReduce environment. Two approaches were taken: (i) discretizing continuous values using a binning method; and (ii) using a multinomial distribution for probability estimation of discrete values and a Gaussian distribution for probability estimation of continuous values. The models were analyzed and compared for performance with respect to run time and classification accuracy for varying data sizes, data block sizes, and map memory sizes.
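The second approach, mixing frequency-based estimates for discrete values with Gaussian estimates for continuous values, can be sketched on a single machine as follows. This is an illustrative sketch under stated assumptions, not the MapReduce implementation: the names `fit` and `predict`, the add-one smoothing, and the variance floor are all choices made here.

```python
import math
from collections import Counter, defaultdict

def fit(rows, labels, discrete_idx):
    """Per class: value counts for discrete columns, mean/variance for the rest."""
    discrete = set(discrete_idx)
    model = {"prior": Counter(labels), "n": len(labels), "discrete": discrete,
             "levels": {j: len({r[j] for r in rows}) for j in discrete},
             "stats": {}}
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    for label, members in by_class.items():
        for j in range(len(rows[0])):
            column = [r[j] for r in members]
            if j in discrete:
                model["stats"][label, j] = Counter(column)
            else:
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                # Assumption: a small floor keeps the variance strictly positive.
                model["stats"][label, j] = (mean, max(var, 1e-9))
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one row."""
    def log_posterior(label):
        count = model["prior"][label]
        total = math.log(count / model["n"])
        for j, value in enumerate(row):
            stat = model["stats"][label, j]
            if j in model["discrete"]:
                # Assumption: add-one smoothing over the observed levels.
                total += math.log((stat[value] + 1) / (count + model["levels"][j]))
            else:
                mean, var = stat
                total += (-0.5 * math.log(2 * math.pi * var)
                          - (value - mean) ** 2 / (2 * var))
        return total
    return max(model["prior"], key=log_posterior)
```

For example, after `model = fit(rows, labels, discrete_idx=[0])`, column 0 is treated as discrete and all other columns as continuous when calling `predict(model, row)`.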
