Feature Selection for High-Dimensional Datasets through a Novel Artificial Bee Colony Framework

Algorithms ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 324
Author(s):  
Yuanzi Zhang ◽  
Jing Wang ◽  
Xiaolin Li ◽  
Shiguo Huang ◽  
Xiuli Wang

High-dimensional datasets generally contain many redundant and irrelevant features, which degrade classification performance and lengthen execution time. To tackle this problem, feature selection techniques are used to screen out redundant and irrelevant features. The artificial bee colony (ABC) algorithm is a popular meta-heuristic with a high exploration but low exploitation capacity. To balance these two capacities, a novel ABC framework is proposed in this paper. Specifically, the solutions are first updated in the employed bee phase, which retains the original exploration ability so that the algorithm can explore the solution space extensively. Then, in the onlooker bee phase, the solutions are modified by the updating mechanism of an algorithm with strong exploitation ability. Finally, the scout bee phase is removed from the framework, which not only curbs excess exploration but also speeds up the algorithm. To verify this idea, the operators of the grey wolf optimization (GWO) algorithm and the whale optimization algorithm (WOA) are introduced into the framework to enhance the exploitation capability of the onlooker bees, yielding BABCGWO and BABCWOA, respectively. Experiments on 12 high-dimensional datasets show that these two algorithms are superior to four state-of-the-art feature selection algorithms in terms of classification error rate, feature subset size and execution speed.
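The employed bee phase this framework retains is the standard ABC neighbour search, which perturbs one dimension of a solution toward or away from a random peer. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import random

def employed_bee_update(x, population, lb, ub):
    """Classic ABC candidate update: perturb one randomly chosen dimension
    of solution x using a randomly chosen neighbour, then clip to bounds."""
    j = random.randrange(len(x))                     # dimension to modify
    k = random.choice([i for i in range(len(population))
                       if population[i] is not x])   # a neighbour other than x
    phi = random.uniform(-1.0, 1.0)                  # step scale in [-1, 1]
    v = list(x)
    v[j] = x[j] + phi * (x[j] - population[k][j])
    v[j] = min(max(v[j], lb), ub)                    # keep within search bounds
    return v
```

In the proposed framework this update is kept for employed bees, while onlooker bees instead apply a GWO or WOA update (not shown here) to strengthen exploitation.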

2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Gürcan Yavuz ◽  
Doğan Aydin

Optimal feature subset selection is an important and difficult task for pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select an optimal feature subset and increase accuracy. A large number of features in a dataset increases computational complexity, leading to performance degradation. In this paper, to overcome this problem, the angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem, instead of representing it as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation and the efficiency of the proposed method, six variants of the Artificial Bee Colony (ABC) algorithm employ angle modulation for feature selection. Experimental results on six high-dimensional datasets show that the angle modulated ABC algorithms improve classification accuracy with smaller feature subsets.
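Angle modulation works by tuning only four parameters (a, b, c, d) of a trigonometric generating function and sampling its sign at each feature index to decode a full-length bit mask. A sketch under the commonly used generating function from the angle-modulation literature (the exact form used by the six ABC variants is an assumption here):

```python
import math

def angle_modulated_bits(a, b, c, d, n_features):
    """Decode a 4-D continuous solution (a, b, c, d) into an n-bit feature
    mask: feature x is selected when the generating function is positive."""
    def g(x):
        return math.sin(2 * math.pi * (x - a) * b *
                        math.cos(2 * math.pi * (x - a) * c)) + d
    # sample g at feature indices 0..n-1; positive values select the feature
    return [1 if g(x) > 0 else 0 for x in range(n_features)]
```

The continuous optimizer thus searches a fixed 4-D space regardless of how many features the dataset has, which is the source of the dimensionality reduction the abstract describes.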


2021 ◽  
Author(s):  
Jing Wang ◽  
Yuanzi Zhang ◽  
Minglin Hong ◽  
Haiyang He ◽  
Shiguo Huang

Abstract Feature selection is an important data preprocessing method in data mining and machine learning, yet it faces the challenge of the "curse of dimensionality" when dealing with high-dimensional data. In this paper, a self-adaptive level-based learning artificial bee colony (SLLABC) algorithm is proposed for the high-dimensional feature selection problem. The SLLABC algorithm includes three new mechanisms: (1) A novel level-based learning mechanism is introduced to accelerate the convergence of the basic artificial bee colony algorithm; it divides the population into several levels, and the individuals on each level learn from individuals on higher levels, while individuals on the highest level learn from each other. (2) A self-adaptive method is proposed to keep the balance between exploration and exploitation, which takes the diversity of the population into account when determining the number of levels: the lower the diversity, the fewer levels are used. (3) A new update mechanism is proposed to reduce the number of selected features: if the error rate of an offspring is higher than or equal to that of its parent while it selects more features, the offspring is discarded and the parent is retained; otherwise, the offspring replaces its parent. Further, we discuss and analyze the contribution of these novelties to population diversity and classification performance. Finally, comparisons with 8 state-of-the-art algorithms on 12 high-dimensional datasets confirm the competitive performance of the proposed SLLABC in both classification accuracy and feature subset size.
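The update mechanism in point (3) is a simple acceptance rule and can be written directly from the abstract's description (function name is illustrative):

```python
def accept_offspring(parent_err, parent_nfeat, child_err, child_nfeat):
    """SLLABC update rule (3): discard the offspring when it is no more
    accurate than its parent yet selects more features; otherwise the
    offspring replaces the parent."""
    if child_err >= parent_err and child_nfeat > parent_nfeat:
        return False   # keep the parent
    return True        # offspring replaces the parent
```

Note that the rule only penalises larger subsets when accuracy fails to improve, so it biases the search toward smaller feature subsets without blocking exploratory moves of equal size.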


2020 ◽  
Author(s):  
Mumine Kaya Keles ◽  
Umit Kilic ◽  
Abdullah Emre Keles

Abstract Datasets contain relevant and irrelevant features, whose evaluation is fundamental for classification or clustering processes. Relevant features make classification accuracy higher and more stable. Optimization methods are therefore used for the feature selection process, a feature reduction process that finds the most relevant feature subset without decreasing the accuracy obtained with the original feature set. Various nature-inspired optimization algorithms have been proposed as feature selectors. The density of data in construction projects and the difficulty of extracting these data cause various losses in field studies; in this respect, the behavior of leaders is important in selecting and efficiently using these data. The objective of this study is to implement the Artificial Bee Colony (ABC) algorithm as a feature selection method to predict the leadership perception of construction employees. With Random Forest, Sequential Minimal Optimization and K-Nearest Neighbors (KNN) used as classifiers, the highest accuracy of 84.1584% and the highest F-measure of 0.805 were obtained using the KNN and Random Forest classifiers with the proposed ABC algorithm as feature selector. The results show that a nature-inspired optimization algorithm such as ABC is satisfactory as a feature selector for predicting construction employees' leadership perception.
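Wrapper-style selectors like the one described here score each candidate subset by training a classifier on it. A common fitness form trades the classifier's accuracy against the fraction of features kept; the linear form and the weight below are illustrative assumptions, not taken from the paper:

```python
def wrapper_fitness(accuracy, n_selected, n_total, alpha=0.9):
    """Illustrative wrapper fitness for feature selection: reward the
    classifier's accuracy, with a small bonus for dropping features.
    alpha controls the accuracy-vs-subset-size trade-off."""
    return alpha * accuracy + (1 - alpha) * (1 - n_selected / n_total)
```

In an ABC-based selector, each bee's food source encodes a feature subset, and this fitness guides the employed/onlooker phases toward subsets that keep accuracy high with few features.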


2020 ◽  
Vol 43 (1) ◽  
pp. 103-125
Author(s):  
Yi Zhong ◽  
Jianghua He ◽  
Prabhakar Chalise

With the advent of high-throughput technologies, high-dimensional datasets are increasingly available. This has not only opened up new insight into biological systems but also posed analytical challenges. One important problem is the selection of an informative feature subset and prediction of the future outcome. It is crucial that models are not overfitted and give accurate results with new data. In addition, reliable identification of informative features with high predictive power (feature selection) is of interest in clinical settings. We propose a two-step framework for feature selection and classification model construction, which utilizes a nested and repeated cross-validation method. We evaluated our approach using both simulated data and two publicly available gene expression datasets. The proposed method showed comparatively better predictive accuracy for new cases than the standard cross-validation method.
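The key property of nested, repeated cross-validation is that feature selection and tuning see only the outer training indices, so the outer test fold gives an honest estimate. A sketch of the index bookkeeping (fold counts and names are illustrative, not the authors' exact protocol):

```python
import random

def nested_cv_splits(n_samples, outer_k=5, inner_k=3, repeats=2, seed=0):
    """Yield (outer_train, outer_test, inner_folds) index lists for nested,
    repeated CV: inner folds are built only from the outer training set."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        outer_folds = [idx[i::outer_k] for i in range(outer_k)]
        for t in range(outer_k):
            outer_test = outer_folds[t]
            outer_train = [i for f in outer_folds
                           if f is not outer_folds[t] for i in f]
            # inner CV for feature selection / tuning, outer test untouched
            inner_folds = [outer_train[i::inner_k] for i in range(inner_k)]
            yield outer_train, outer_test, inner_folds
```

Repeating the whole procedure with fresh shuffles reduces the variance of the performance estimate, which is the "repeated" part of the method.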


Author(s):  
Shunta Imamura ◽  
Toshiya Kaihara ◽  
Nobutada Fujii ◽  
Daisuke Kokuryo ◽  
...  

The artificial bee colony (ABC) algorithm, inspired by the foraging behavior of honey bees, is a swarm intelligence method. It can efficiently explore for optimal solutions using three different types of agents on optimization problems with multimodal functions. However, the performance of the conventional ABC algorithm degrades on high-dimensional problems. In this study, we propose an improved algorithm that uses a network structure among agents to enhance global search ability. The efficacy of the proposed algorithm is evaluated through computer experiments on high-dimensional benchmark functions.


2021 ◽  
Author(s):  
E Hancer ◽  
Bing Xue ◽  
Mengjie Zhang ◽  
D Karaboga ◽  
B Akay

© 2015 IEEE. Feature selection often involves two conflicting objectives: minimizing the feature subset size and maximizing the classification accuracy. In this paper, a multi-objective artificial bee colony (MOABC) framework is developed for feature selection in classification, and a new fuzzy mutual information based criterion is proposed to evaluate the relevance of feature subsets. Three new multi-objective feature selection approaches are proposed by integrating MOABC with three filter fitness evaluation criteria: mutual information, fuzzy mutual information and the proposed fuzzy mutual information. The proposed multi-objective feature selection approaches are examined by comparing them with three single-objective ABC-based feature selection approaches on six commonly used datasets. The results show that the proposed approaches achieve better performance than the original feature set in terms of classification accuracy and the number of features. Under the same evaluation criterion, the proposed multi-objective algorithms generally perform better than the single-objective methods, especially in reducing the number of features. Furthermore, the proposed fuzzy mutual information criterion outperforms mutual information and the original fuzzy mutual information in both single-objective and multi-objective settings. This work is the first study on multi-objective ABC for filter feature selection in classification, and it shows that multi-objective ABC can be effectively used to address feature selection problems.
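The proposed fuzzy mutual information criterion is not reproducible from the abstract, but the plain mutual information baseline it is compared against is standard: the empirical mutual information between a discretised feature column and the class labels. A minimal sketch:

```python
from collections import Counter
from math import log2

def mutual_information(feature, labels):
    """Empirical mutual information I(F; C), in bits, between a discrete
    feature column and the class labels, from their joint frequencies."""
    n = len(labels)
    p_f = Counter(feature)                 # marginal counts of feature values
    p_c = Counter(labels)                  # marginal counts of class labels
    p_fc = Counter(zip(feature, labels))   # joint counts
    return sum((cnt / n) * log2((cnt / n) / ((p_f[f] / n) * (p_c[c] / n)))
               for (f, c), cnt in p_fc.items())
```

A filter criterion like this scores feature subsets without training a classifier, which is what makes the MOABC approaches above filter (rather than wrapper) methods.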


2020 ◽  
Author(s):  
Amjad Osmani ◽  
Jamshid Bagherzadeh Mohasefi ◽  
Farhad Soleimanian Gharehchopogh

Abstract Artificial bee colony (ABC) optimization and the imperialist competitive algorithm (ICA) are two well-known metaheuristic methods. In ABC, exploration is strong because each bee moves toward random neighbors in the first and second phases, but exploitation is weak because the algorithm does not carefully examine promising regions of the search space for good local minima. In this study, ICA is used to improve ABC's exploitation, and two novel swarm-based hybrid methods, ABC–ICA and ABC–ICA1, are proposed that combine the characteristics of ABC and ICA; the second method further improves on the first. The proposed methods improve evaluation results in both continuous and discrete environments compared to the baseline methods. Feature selection can be viewed as an optimization problem, since choosing an appropriate feature subset has a great influence on the efficiency of classifier algorithms in supervised methods, making it a key issue. In this study, different discrete versions of the proposed methods are introduced for feature selection and feature scoring problems, and they perform well in evaluations. A problem called cold start is also introduced, together with a solution that has a great impact on the efficiency of the proposed methods on the feature scoring problem. A total of 16 UCI datasets and 2 Amazon datasets are used to evaluate the proposed methods on the feature selection problem; the compared measures are classification accuracy and the number of features required for classification. The proposed methods can also be used to create a proper sentiment dictionary. Evaluation results confirm the better performance of the proposed methods in most experiments.
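The abstract does not specify how the continuous hybrids were discretised, but a common way to derive a binary variant of a continuous swarm method is a sigmoid transfer function that turns each coordinate into a selection probability. A hypothetical sketch of that standard technique, not necessarily the authors' scheme:

```python
import math

def binarize(position, rand):
    """Sigmoid transfer binarisation: map each continuous coordinate to a
    selection probability and draw a bit per feature. `rand` is a callable
    returning a uniform sample in [0, 1), injected for reproducibility."""
    return [1 if rand() < 1.0 / (1.0 + math.exp(-x)) else 0
            for x in position]
```

Large positive coordinates almost always select the feature, large negative ones almost never do, so the continuous search dynamics carry over to the binary feature mask.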



