Swarm Intelligence Algorithms for Feature Selection: A Review

Lucija Brezočnik; Iztok Fister; Vili Podgorelec

doi:10.3390/app8091521

Applied Sciences ◽

10.3390/app8091521 ◽

2018 ◽

Vol 8 (9) ◽

pp. 1521 ◽

Cited By ~ 47

Author(s):

Lucija Brezočnik ◽

Iztok Fister ◽

Vili Podgorelec

Keyword(s):

Feature Selection ◽

Swarm Intelligence ◽

Optimization Problems ◽

Relevant Information ◽

High Dimensional ◽

Comprehensive Literature Review ◽

Open Questions ◽

Common Application ◽

High Dimensional Datasets ◽

Taxonomic Categories

The increasingly rapid creation, sharing, and exchange of information confronts researchers and data scientists with the challenging task of analyzing data and extracting relevant information from it. To learn from data effectively, its dimensionality should be reduced first. Feature selection (FS) can help reduce the amount of data, but it is a complex and computationally demanding task, especially for high-dimensional datasets. Swarm intelligence (SI) has proven to be a technique that can tackle NP-hard (non-deterministic polynomial-time hard) computational problems. It is gaining popularity for solving various optimization problems and has been used successfully for FS in some applications. Given the lack of comprehensive surveys in this field, our objective was to fill the gap in coverage of SI algorithms for FS. We performed a comprehensive literature review of SI algorithms and provide a detailed overview of 64 different SI algorithms for FS, organized into eight major taxonomic categories. We propose a unified SI framework and use it to explain different approaches to FS. We explain the methods, techniques, and settings that have been used for various aspects of FS. The datasets used most frequently to evaluate SI algorithms for FS are presented, as well as the most common application areas. Guidelines for developing SI approaches for FS are provided to support researchers and analysts in their data mining tasks, and existing issues and open questions are discussed. Using the proposed framework and the accompanying explanations, one should be able to design an SI approach for a specific FS problem.
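To make the idea of an SI approach to FS concrete, here is a minimal sketch of one well-known SI algorithm, binary particle swarm optimization, used as a wrapper around a simple classifier. It is not the unified framework proposed in the review. The toy data generator, the nearest-centroid classifier, the weighted fitness function, and all parameter values (`alpha`, `w`, `c1`, `c2`) are assumptions chosen for illustration.

```python
import math
import random

def make_data(rng, n_samples=60, n_features=8):
    """Toy data (assumption): only features 0 and 1 carry class signal."""
    X, y = [], []
    for i in range(n_samples):
        label = (i // 2) % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Nearest-centroid accuracy on the selected features.
    Even-indexed rows are used for training, odd-indexed rows for testing."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    train, test = range(0, len(X), 2), range(1, len(X), 2)
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in train if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(test)

def fitness(X, y, mask, alpha=0.9):
    """Weighted sum of accuracy and subset sparsity (an assumed, common choice)."""
    sparsity = 1.0 - sum(mask) / len(mask)
    return alpha * centroid_accuracy(X, y, mask) + (1.0 - alpha) * sparsity

def binary_pso(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Return the best feature mask found and its fitness."""
    rng = random.Random(seed)
    n = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n):
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                # Sigmoid transfer function: velocity becomes a bit probability.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

X, y = make_data(random.Random(1))
mask, score = binary_pso(X, y)
print("selected features:", [j for j, bit in enumerate(mask) if bit])
print("fitness:", round(score, 3))
```

Each particle is a bit mask over the features, so the search runs directly over feature subsets. The fitness function is the only place that touches the classifier, which is what makes this a wrapper approach.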

A Hybrid Swarm Intelligence Based Feature Selection Algorithm for High Dimensional Datasets

IJCI. International Journal of Computers and Information ◽

10.21608/ijci.2021.62499.1040 ◽

2021 ◽

Vol 0 (0) ◽

pp. 0-0

Author(s):

Jomana Yousef ◽

Anas Youssef ◽

Arabi Keshk

Keyword(s):

Feature Selection ◽

Swarm Intelligence ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Hybrid Swarm ◽

High Dimensional Datasets

Ranking Based Unsupervised Feature Selection Methods: An Empirical Comparative Study in High Dimensional Datasets

Advances in Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-04491-6_16 ◽

2018 ◽

pp. 205-218

Author(s):

Saúl Solorio-Fernández ◽

J. Ariel Carrasco-Ochoa ◽

José Fco. Martínez-Trinidad

Keyword(s):

Feature Selection ◽

Comparative Study ◽

High Dimensional ◽

Selection Methods ◽

Unsupervised Feature Selection ◽

High Dimensional Datasets

Nested and Repeated Cross Validation for Classification Model With High-Dimensional Data

Revista Colombiana de Estadística ◽

10.15446/rce.v43n1.80000 ◽

2020 ◽

Vol 43 (1) ◽

pp. 103-125

Author(s):

Yi Zhong ◽

Jianghua He ◽

Prabhakar Chalise

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Predictive Accuracy ◽

Simulated Data ◽

Classification Model ◽

High Dimensional ◽

Clinical Settings ◽

Feature Subset ◽

Validation Method ◽

High Dimensional Datasets

With the advent of high-throughput technologies, high-dimensional datasets are increasingly available. This has not only opened up new insights into biological systems but also posed analytical challenges. One important problem is the selection of an informative feature subset and the prediction of future outcomes. It is crucial that models are not overfitted and give accurate results on new data. In addition, reliable identification of informative features with high predictive power (feature selection) is of interest in clinical settings. We propose a two-step framework for feature selection and classification model construction, which utilizes a nested and repeated cross-validation method. We evaluated our approach using both simulated data and two publicly available gene expression datasets. The proposed method showed better predictive accuracy for new cases than the standard cross-validation method.
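The general idea of nested and repeated cross-validation can be sketched as follows. This is a generic illustration, not the two-step framework from the paper. The toy data, the univariate ranking by class-mean difference, the nearest-centroid classifier, and the grid of subset sizes `k_grid` are all assumptions. The point it illustrates is that feature ranking and the choice of subset size happen only on training folds, so the outer test folds stay untouched.

```python
import random

def make_data(rng, n_samples=40, n_features=6):
    """Toy data (assumption): only feature 0 separates the two classes."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def k_folds(indices, k):
    """Split a list of indices into k disjoint folds."""
    return [indices[i::k] for i in range(k)]

def _mean(values):
    return sum(values) / len(values) if values else 0.0

def rank_features(X, y, rows):
    """Rank features by absolute difference of class means on `rows` only."""
    def gap(j):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        return abs(_mean(a) - _mean(b))
    return sorted(range(len(X[0])), key=gap, reverse=True)

def accuracy(X, y, train, test, k_feat):
    """Select the top k_feat features on `train`, then score on `test`."""
    cols = rank_features(X, y, train)[:k_feat]
    cent = {c: [_mean([X[i][j] for i in train if y[i] == c]) for j in cols]
            for c in (0, 1)}
    def predict(i):
        d = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, cent[c]))
             for c in (0, 1)}
        return min(d, key=d.get)
    return sum(predict(i) == y[i] for i in test) / len(test)

def nested_repeated_cv(X, y, k_grid=(1, 2, 4), outer_k=5, inner_k=3,
                       repeats=3, seed=0):
    """Return the mean outer-fold accuracy and the list of outer scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):              # repeat with a fresh shuffle
        idx = list(range(len(X)))
        rng.shuffle(idx)
        outer = k_folds(idx, outer_k)
        for o, test in enumerate(outer):  # outer loop: performance estimate
            train = [i for f, fold in enumerate(outer) if f != o for i in fold]
            inner = k_folds(train, inner_k)
            def inner_score(k_feat):      # inner loop: choose the subset size
                accs = []
                for v, val in enumerate(inner):
                    fit = [i for f, fold in enumerate(inner) if f != v
                           for i in fold]
                    accs.append(accuracy(X, y, fit, val, k_feat))
                return sum(accs) / len(accs)
            best_k = max(k_grid, key=inner_score)
            scores.append(accuracy(X, y, train, test, best_k))
    return sum(scores) / len(scores), scores

X, y = make_data(random.Random(1))
avg, scores = nested_repeated_cv(X, y)
print("mean outer accuracy:", round(avg, 3), "over", len(scores), "folds")
```

Averaging over repeats reduces the dependence of the estimate on one particular random split, which matters most when samples are few and features are many.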

Modified Firefly Algorithm With Chaos Theory for Feature Selection

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2019040101 ◽

2019 ◽

Vol 10 (2) ◽

pp. 1-20 ◽

Cited By ~ 2

Author(s):

Sujata Dash ◽

Ruppa Thulasiram ◽

Parimala Thulasiraman

Keyword(s):

Feature Selection ◽

Firefly Algorithm ◽

Optimization Problems ◽

Search Algorithm ◽

Heuristic Method ◽

High Dimensional ◽

Support Vector ◽

Combinatorial Optimization Problems ◽

Meta Search ◽

Combined Algorithm

Conventional algorithms such as gradient-based optimization methods usually struggle with high-dimensional non-linear problems and often end up in local minima. Recently developed nature-inspired optimization algorithms are among the most promising approaches for finding global solutions to combinatorial optimization problems such as feature selection on microarray datasets. In this article, a novel hybrid swarm intelligence-based meta-search algorithm is proposed by combining a heuristic method called conditional mutual information maximization with a chaos-based firefly algorithm. The combined algorithm is run iteratively to boost the sharing of information between fireflies, enhancing the search efficiency of the chaos-based firefly algorithm and reducing the computational complexity of feature selection. The meta-search model is implemented using a well-established classifier, the support vector machine, as the modeler in a wrapper approach. The chaos-based firefly algorithm increases the global search mobility of the fireflies. The efficiency of the model is studied on high-dimensional disease datasets and compared with the standard firefly algorithm, particle swarm optimization, and a genetic algorithm in the same experimental environment to establish its superiority in feature selection over these counterparts.
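The abstract does not say which chaotic map is used or how it enters the firefly update, so the following is only a sketch under stated assumptions: a logistic map (with r = 4) replaces the uniform random draw in the standard firefly movement rule, and the objective is a continuous toy function rather than a feature-selection fitness. The names `logistic_map`, `move_firefly`, and `chaotic_firefly` and the parameters `alpha`, `beta0`, and `gamma` are illustrative.

```python
import math

def logistic_map(x, r=4.0):
    """One step of the logistic map; for r = 4 the sequence is chaotic on (0, 1)."""
    return r * x * (1.0 - x)

def attractiveness(xi, xj, beta0=1.0, gamma=1.0):
    """Standard firefly attractiveness: beta0 * exp(-gamma * distance^2)."""
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return beta0 * math.exp(-gamma * r2)

def move_firefly(xi, xj, chaos, alpha=0.2, beta0=1.0, gamma=1.0):
    """Move firefly xi toward the brighter firefly xj.
    The chaotic value stands in for the usual uniform random number."""
    beta = attractiveness(xi, xj, beta0, gamma)
    new = []
    for a, b in zip(xi, xj):
        chaos = logistic_map(chaos)
        new.append(a + beta * (b - a) + alpha * (chaos - 0.5))
    return new, chaos

def chaotic_firefly(objective, swarm, n_iter=50, chaos=0.3):
    """Minimise `objective`; lower objective value means a brighter firefly."""
    swarm = [list(p) for p in swarm]
    for _ in range(n_iter):
        for i in range(len(swarm)):
            for j in range(len(swarm)):
                if objective(swarm[j]) < objective(swarm[i]):
                    swarm[i], chaos = move_firefly(swarm[i], swarm[j], chaos)
    return min(swarm, key=objective)

def sphere(x):
    """Toy objective (assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)

start = [[2.0, -1.5], [-1.0, 1.0], [0.5, 2.5], [-2.0, -2.0]]
best = chaotic_firefly(sphere, start)
print("best position:", [round(v, 3) for v in best])
```

For feature selection, the continuous positions would still need to be mapped to bit masks (for example through a transfer function) and scored with a classifier in a wrapper, as the abstract describes for the support vector machine.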

A Novel Algorithm for Clustering and Feature Selection of High Dimensional Datasets

Advances in Modelling and Analysis B ◽

10.18280/ama_b.600301 ◽

2017 ◽

Vol 60 (3) ◽

pp. 525-538

Author(s):

Thulasi Bikku ◽

Alapati Priya

Keyword(s):

Feature Selection ◽

High Dimensional ◽

High Dimensional Datasets ◽

Selection Of ◽

Novel Algorithm

A Master Slave Parallel Genetic Algorithm for Feature Selection in High Dimensional Datasets

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c4184.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 379-384

Keyword(s):

Genetic Algorithm ◽

Genetic Algorithms ◽

Feature Selection ◽

Information Gain ◽

Optimal Number ◽

Good Choice ◽

High Dimensional ◽

Parallel Genetic Algorithm ◽

Efficient Manner ◽

High Dimensional Datasets

Feature selection in high-dimensional datasets is a combinatorial problem, as it selects the optimal subset from N-dimensional data having 2^N possible subsets. Genetic algorithms are generally a good choice for feature selection in large datasets, though for some high-dimensional problems the running time varies widely, from a few seconds to a few hours or even a few days. Therefore, it is important to use genetic algorithms that can give quality results within a reasonable time limit, which makes an efficient implementation necessary. In this paper, a Master-Slave Parallel Genetic Algorithm is implemented as a feature selection procedure to reduce the time complexity of the sequential genetic algorithm. The paper describes the speed gains of the parallel Master-Slave Genetic Algorithm and presents a theoretical analysis of the optimal number of slaves required for an efficient master-slave implementation. The experiments are performed on three high-dimensional gene expression datasets. Because the genetic algorithm is a wrapper technique and takes more time to assess the importance of each feature, the Information Gain technique is applied first as a pre-processing step to remove irrelevant features.
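The pre-filtering step can be illustrated with a small sketch of information gain, IG(Y; X) = H(Y) - H(Y | X), computed for discrete feature values. This is an assumption-laden illustration, not the paper's implementation: gene expression values are continuous and would need to be discretized first, and the helper names `entropy`, `information_gain`, and `prefilter` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) minus the entropy remaining after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def prefilter(columns, labels, keep):
    """Return the indices of the `keep` features with the highest gain."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: information_gain(columns[j], labels),
                    reverse=True)
    return ranked[:keep]

labels = [0, 0, 1, 1]
columns = [
    [0, 1, 0, 1],  # independent of the label
    [0, 0, 1, 1],  # identical to the label
    [0, 0, 0, 1],  # partially informative
]
print(prefilter(columns, labels, keep=2))
```

Dropping low-gain features before the genetic algorithm runs shrinks N, and since the search space has 2^N subsets, even removing a few features cuts the space substantially (removing 10 features divides it by 1024).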

A sequential cosine similarity based feature selection technique for high dimensional datasets

2015 39th National Systems Conference (NSC) ◽

10.1109/natsys.2015.7489113 ◽

2015 ◽

Cited By ~ 3

Author(s):

Vimal Kumar Dubey ◽

Amit Kumar Saxena

Keyword(s):

Feature Selection ◽

Cosine Similarity ◽

High Dimensional ◽

Feature Selection Technique ◽

Selection Technique ◽

High Dimensional Datasets

Missing Data Estimation in High-Dimensional Datasets: A Swarm Intelligence-Deep Neural Network Approach

Lecture Notes in Computer Science - Advances in Swarm Intelligence ◽

10.1007/978-3-319-41000-5_26 ◽

2016 ◽

pp. 259-270 ◽

Cited By ~ 13

Author(s):

Collins Leke ◽

Tshilidzi Marwala

Keyword(s):

Neural Network ◽

Missing Data ◽

Swarm Intelligence ◽

Deep Neural Network ◽

High Dimensional ◽

Network Approach ◽

Neural Network Approach ◽

Missing Data Estimation ◽

High Dimensional Datasets ◽

Data Estimation

A novel hybrid feature selection based on ReliefF and binary dragonfly for high dimensional datasets

2019 9th International Conference on Computer and Knowledge Engineering (ICCKE) ◽

10.1109/iccke48569.2019.8965106 ◽

2019 ◽

Cited By ~ 1

Author(s):

Atefe Asadi Karizaki ◽

Mohammadreza Tavassoli

Keyword(s):

Feature Selection ◽

High Dimensional ◽

High Dimensional Datasets

Enhanced Filter Feature Selection Methods for Arabic Text Categorization

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2018040101 ◽

2018 ◽

Vol 8 (2) ◽

pp. 1-24 ◽

Cited By ~ 1

Author(s):

Abdullah Saeed Ghareb ◽

Azuraliza Abu Bakara ◽

Qasem A. Al-Radaideh ◽

Abdul Razak Hamdan

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Selection Process ◽

High Dimensional Data ◽

Relevant Information ◽

High Dimensional ◽

Arabic Text ◽

Relevant Feature ◽

Associative Classification ◽

Selection Methods

Filtering large amounts of data is an important process in data mining tasks, particularly for the categorization of unstructured high-dimensional data. A feature selection process is therefore needed to reduce the space of high-dimensional data to a small subset of relevant dimensions that represent the best features for text categorization. In this article, three enhanced filter feature selection methods, Category Relevant Feature Measure, Modified Category Discriminated Measure, and Odd Ratio2, are proposed. These methods combine the relevant information about features at both the inter- and intra-category levels. The effectiveness of the proposed methods with Naïve Bayes and associative classification is evaluated using traditional measures of text categorization, namely macro-averaged precision, recall, and F-measure. Experiments are conducted on three Arabic text datasets used for text categorization. The experimental results showed that the proposed methods achieve better or comparable results when compared to 12 well-known traditional methods.
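The evaluation measures named above can be sketched as follows. Macro-averaging computes precision, recall, and F-measure per category and then takes the unweighted mean, so small categories count as much as large ones. The sketch assumes the macro F-measure is the mean of the per-category F values (some authors instead compute F from the macro precision and recall), and the function name `macro_scores` is illustrative.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F-measure)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f_values = [], [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Undefined ratios (no predictions or no true items) are set to 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_value = 2 * precision * recall / (precision + recall)
        else:
            f_value = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_values.append(f_value)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_values) / n

y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
p, r, f = macro_scores(y_true, y_pred)
print(round(p, 4), round(r, 4), round(f, 4))
```

In this example category "a" has precision 1 and recall 0.5, and category "b" has precision 2/3 and recall 1, so the macro averages are about 0.8333, 0.75, and 0.7333.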
