scholarly journals Swarm Intelligence Algorithms for Feature Selection: A Review

2018 ◽  
Vol 8 (9) ◽  
pp. 1521 ◽  
Author(s):  
Lucija Brezočnik ◽  
Iztok Fister ◽  
Vili Podgorelec

The increasingly rapid creation, sharing and exchange of information nowadays put researchers and data scientists ahead of a challenging task of data analysis and extracting relevant information out of data. To be able to learn from data, the dimensionality of the data should be reduced first. Feature selection (FS) can help to reduce the amount of data, but it is a very complex and computationally demanding task, especially in the case of high-dimensional datasets. Swarm intelligence (SI) has been proved as a technique which can solve NP-hard (Non-deterministic Polynomial time) computational problems. It is gaining popularity in solving different optimization problems and has been used successfully for FS in some applications. With the lack of comprehensive surveys in this field, it was our objective to fill the gap in coverage of SI algorithms for FS. We performed a comprehensive literature review of SI algorithms and provide a detailed overview of 64 different SI algorithms for FS, organized into eight major taxonomic categories. We propose a unified SI framework and use it to explain different approaches to FS. Different methods, techniques, and their settings are explained, which have been used for various FS aspects. The datasets used most frequently for the evaluation of SI algorithms for FS are presented, as well as the most common application areas. The guidelines on how to develop SI approaches for FS are provided to support researchers and analysts in their data mining tasks and endeavors while existing issues and open questions are being discussed. In this manner, using the proposed framework and the provided explanations, one should be able to design an SI approach to be used for a specific FS problem.

2020 ◽  
Vol 43 (1) ◽  
pp. 103-125
Author(s):  
Yi Zhong ◽  
Jianghua He ◽  
Prabhakar Chalise

With the advent of high throughput technologies, the high-dimensional datasets are increasingly available. This has not only opened up new insight into biological systems but also posed analytical challenges. One important problem is the selection of informative feature-subset and prediction of the future outcome. It is crucial that models are not overfitted and give accurate results with new data. In addition, reliable identification of informative features with high predictive power (feature selection) is of interests in clinical settings. We propose a two-step framework for feature selection and classification model construction, which utilizes a nested and repeated cross-validation method. We evaluated our approach using both simulated data and two publicly available gene expression datasets. The proposed method showed comparatively better predictive accuracy for new cases than the standard cross-validation method.


2019 ◽  
Vol 10 (2) ◽  
pp. 1-20 ◽  
Author(s):  
Sujata Dash ◽  
Ruppa Thulasiram ◽  
Parimala Thulasiraman

Conventional algorithms such as gradient-based optimization methods usually struggle to deal with high-dimensional non-linear problems and often land up with local minima. Recently developed nature-inspired optimization algorithms are the best approaches for finding global solutions for combinatorial optimization problems like microarray datasets. In this article, a novel hybrid swarm intelligence-based meta-search algorithm is proposed by combining a heuristic method called conditional mutual information maximization with chaos-based firefly algorithm. The combined algorithm is computed in an iterative manner to boost the sharing of information between fireflies, enhancing the search efficiency of chaos-based firefly algorithm and reduces the computational complexities of feature selection. The meta-search model is implemented using a well-established classifier, such as support vector machine as the modeler in a wrapper approach. The chaos-based firefly algorithm increases the global search mobility of fireflies. The efficiency of the model is studied over high-dimensional disease datasets and compared with standard firefly algorithm, particle swarm optimization, and genetic algorithm in the same experimental environment to establish its superiority of feature selection over selected counterparts.


Feature Selection in High Dimensional Datasets is a combinatorial problem as it selects the optimal subsets from N dimensional data having 2N possible subsets. Genetic Algorithms are generally a good choice for feature selection in large datasets, though for some high dimensional problems it may take varied amount of time - few seconds, few hours or even few days. Therefore, it is important to use Genetic Algorithms that can give quality results in reasonably acceptable time limit. For this purpose, it is becoming necessary to implement Genetic Algorithms in an efficient manner. In this paper, a Master Slave Parallel Genetic Algorithm is implemented as a Feature Selection procedure to diminish the time intricacies of sequential genetic algorithm. This paper describes the speed gains in parallel Master-Slave Genetic Algorithm and also discusses the theoretical analysis of optimal number of slaves required for an efficient master slave implementation. The experiments are performed on three high-dimensional gene expression data. As Genetic Algorithm is a wrapper technique and takes more time to find the importance of any feature, Information Gain technique is used first as pre-processing task to remove the irrelevant features.


2018 ◽  
Vol 8 (2) ◽  
pp. 1-24 ◽  
Author(s):  
Abdullah Saeed Ghareb ◽  
Azuraliza Abu Bakara ◽  
Qasem A. Al-Radaideh ◽  
Abdul Razak Hamdan

The filtering of a large amount of data is an important process in data mining tasks, particularly for the categorization of unstructured high dimensional data. Therefore, a feature selection process is desired to reduce the space of high dimensional data into small relevant subset dimensions that represent the best features for text categorization. In this article, three enhanced filter feature selection methods, Category Relevant Feature Measure, Modified Category Discriminated Measure, and Odd Ratio2, are proposed. These methods combine the relevant information about features in both the inter- and intra-category. The effectiveness of the proposed methods with Naïve Bayes and associative classification is evaluated by traditional measures of text categorization, namely, macro-averaging of precision, recall, and F-measure. Experiments are conducted on three Arabic text datasets used for text categorization. The experimental results showed that the proposed methods are able to achieve better and comparable results when compared to 12 well known traditional methods.


Sign in / Sign up

Export Citation Format

Share Document