A Scalable Memetic Algorithm for Simultaneous Instance and Feature Selection

Nicolás García-Pedrajas; Aida de Haro-García; Javier Pérez-Rodríguez

doi:10.1162/evco_a_00102

Evolutionary Computation ◽

10.1162/evco_a_00102 ◽

2014 ◽

Vol 22 (1) ◽

pp. 1-45 ◽

Cited By ~ 20

Author(s):

Nicolás García-Pedrajas ◽

Aida de Haro-García ◽

Javier Pérez-Rodríguez

Keyword(s):

Feature Selection ◽

Memetic Algorithm ◽

Fitness Function ◽

Large Datasets ◽

Support Vector ◽

Instance Selection ◽

Huge Amount ◽

Explanatory Variables ◽

Vector Machines ◽

Testing Error

Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly produced in many fields of research. At the same time, most of the recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly harms classification or recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is largely improved when the complexity of the data is reduced. One of the approaches to address problems that have too many features or instances is feature or instance selection, respectively. Although most methods address instance and feature selection separately, both problems are interwoven, and benefits are expected from facing these two tasks jointly. This paper proposes a new memetic algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection. The proposed method performs four different local search procedures with the aim of obtaining the most relevant subsets of instances and features to perform an accurate classification. A new fitness function is also proposed that enforces instance selection but avoids putting too much pressure on removing features. We prove experimentally that this fitness function improves the results in terms of testing error. Regarding the scalability of the method, an extension of the stratification approach is developed for simultaneous instance and feature selection. This extension allows the application of the proposed algorithm to large datasets. An extensive comparison using 55 medium to large datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 30 large problems, with very good results. The accuracy of the method for class-imbalanced problems in a set of 40 datasets is shown. The usefulness of the method is also tested using decision trees and support vector machines as classification methods.
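
The idea of a fitness function that rewards dropping instances without pressing for fewer features can be illustrated with a minimal sketch. This is not the function from the paper, which the abstract does not state. The two binary masks, the 1-NN evaluator, the weight `alpha`, and the names `one_nn_predict` and `fitness` are all assumptions made for illustration.

```python
def one_nn_predict(train_X, train_y, x, feat_idx):
    """Label of the nearest training row, using only the selected features."""
    best_label, best_dist = None, float("inf")
    for row, label in zip(train_X, train_y):
        dist = sum((row[j] - x[j]) ** 2 for j in feat_idx)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def fitness(X, y, inst_mask, feat_mask, alpha=0.75):
    """Hypothetical fitness: weighted accuracy plus instance reduction.

    There is deliberately no term rewarding feature removal, so features
    are dropped only when doing so helps accuracy. alpha is an assumed weight.
    """
    inst_idx = [i for i, keep in enumerate(inst_mask) if keep]
    feat_idx = [j for j, keep in enumerate(feat_mask) if keep]
    if not inst_idx or not feat_idx:
        return 0.0  # an empty selection cannot classify anything
    sub_X = [X[i] for i in inst_idx]
    sub_y = [y[i] for i in inst_idx]
    # Classify the full training set with the selected subset as the 1-NN model.
    correct = sum(
        one_nn_predict(sub_X, sub_y, X[i], feat_idx) == y[i]
        for i in range(len(X))
    )
    accuracy = correct / len(X)
    reduction = 1.0 - len(inst_idx) / len(X)
    return alpha * accuracy + (1.0 - alpha) * reduction


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [0.9, -4.0]]
y = [0, 0, 1, 1]
score_informative = fitness(X, y, [1, 0, 1, 0], [1, 0])
score_all_features = fitness(X, y, [1, 0, 1, 0], [1, 1])
```

On this toy data, the mask that drops the noisy feature scores higher only because accuracy improves, not because the feature count is penalised.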

feamiR: Feature selection based on Genetic Algorithms for predicting miRNA-mRNA interactions

10.1101/2020.12.23.424130 ◽

2020 ◽

Author(s):

Eleanor C. Williams ◽

Anisoara Calinescu ◽

Irina Mohorianu

Keyword(s):

Genetic Algorithms ◽

Feature Selection ◽

Large Datasets ◽

Short Length ◽

Support Vector ◽

Tissue Cell ◽

Seed Region ◽

Translation Inhibition ◽

Vector Machines ◽

Flanking Regions

MicroRNAs play a key role in RNA interference, the sequence-driven targeting of mRNAs that regulates their translation to proteins, through translation inhibition or the degradation of the mRNA. Around 30% of animal genes may be tuned by microRNAs. The prediction of miRNA/mRNA interactions is hindered by the short length of the interaction (seed) region (~7-8 nt). We collate several large datasets overviewing validated interactions and propose feamiR, a novel pipeline comprising optimised classification approaches (Decision Trees/Random Forests and an efficient feature selection based on embryonic Genetic Algorithms used in conjunction with Support Vector Machines) aimed at identifying discriminative nucleotide features, on the seed, compensatory and flanking regions, that increase the prediction accuracy for interactions. Common and specific combinations of features illustrate differences between reference organisms, validation techniques or tissue/cell localisation. feamiR revealed new key positions that drive the miRNA/mRNA interactions, leading to novel questions on the mode-of-action of miRNAs.

Minimax feature selection problem for constructing a classifier using support vector machines

Computational Mathematics and Mathematical Physics ◽

10.1134/s0965542510050143 ◽

2010 ◽

Vol 50 (5) ◽

pp. 917-925

Author(s):

Yu. V. Goncharov

Keyword(s):

Feature Selection ◽

Support Vector Machines ◽

Selection Problem ◽

Support Vector ◽

Feature Selection Problem ◽

Vector Machines

High dimensional data classification and feature selection using support vector machines

European Journal of Operational Research ◽

10.1016/j.ejor.2017.08.040 ◽

2018 ◽

Vol 265 (3) ◽

pp. 993-1004 ◽

Cited By ~ 63

Author(s):

Bissan Ghaddar ◽

Joe Naoum-Sawaya

Keyword(s):

Feature Selection ◽

Support Vector Machines ◽

High Dimensional Data ◽

Data Classification ◽

High Dimensional ◽

Support Vector ◽

Vector Machines

Radar Emitter Signal Recognition Based on Feature Selection and Support Vector Machines

Lecture Notes in Computer Science - Advances in Intelligent Computing ◽

10.1007/11538059_74 ◽

2005 ◽

pp. 707-716 ◽

Cited By ~ 2

Author(s):

Gexiang Zhang ◽

Zhexin Cao ◽

Yajun Gu ◽

Weidong Jin ◽

Laizhao Hu

Keyword(s):

Feature Selection ◽

Support Vector Machines ◽

Support Vector ◽

Signal Recognition ◽

Vector Machines

Optimizing Training Data and Hyperparameters of Support Vector Machines Using a Memetic Algorithm

Advances in Intelligent Systems and Computing - Man-Machine Interactions 6 ◽

10.1007/978-3-030-31964-9_22 ◽

2019 ◽

pp. 229-238

Author(s):

Wojciech Dudzik ◽

Michal Kawulok ◽

Jakub Nalepa

Keyword(s):

Support Vector Machines ◽

Memetic Algorithm ◽

Training Data ◽

Support Vector ◽

Vector Machines

Purity Filtering: An Instance Selection Method for Support Vector Machines

Lecture Notes in Computer Science - Artificial Intelligence XXXVI ◽

10.1007/978-3-030-34885-4_2 ◽

2019 ◽

pp. 21-35

Author(s):

David Morán-Pomés ◽

Lluís A. Belanche-Muñoz

Keyword(s):

Support Vector Machines ◽

Selection Method ◽

Support Vector ◽

Instance Selection ◽

Vector Machines

Inspeção Automática de Defeitos em Madeiras de Pinus usando Visão Computacional (Automatic Inspection of Defects in Pine Wood Using Computer Vision)

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.7033 ◽

2008 ◽

Vol 15 (2) ◽

pp. 203-218

Author(s):

Luiz E. S. Oliveira ◽

Paulo R. Cavalin ◽

Alceu S. Britto Jr ◽

Alessandro L. Koerich

Keyword(s):

Neural Networks ◽

Genetic Algorithms ◽

Feature Selection ◽

Defect Detection ◽

Color Image ◽

Support Vector ◽

Grayscale Image ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Vector Machines

This paper addresses the issue of detecting defects in pine wood using features extracted from grayscale images. The feature set proposed here is based on the concept of texture and is computed from co-occurrence matrices. The features provide measures of properties such as smoothness, coarseness, and regularity. Comparative experiments using a color-image-based feature set extracted from percentile histograms are carried out to demonstrate the efficiency of the proposed feature set. Two different learning paradigms, neural networks and support vector machines, and a feature selection algorithm based on multi-objective genetic algorithms were considered in our experiments. The experimental results show that, after feature selection, the grayscale-image-based feature set achieves very competitive performance for the problem of wood defect detection relative to the color-image-based features.
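
As a rough illustration of texture features computed from a co-occurrence matrix, the sketch below builds a normalised gray-level co-occurrence matrix for one pixel offset and derives three standard descriptors. The abstract does not say which descriptors or offsets the authors used, so the choice of contrast, homogeneity, and energy, the default offset, and the function names `glcm` and `texture_features` are assumptions.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalised co-occurrence matrix for the pixel offset (dr, dc).

    image is a list of rows of integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, homogeneity, and energy of a normalised co-occurrence matrix."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells)
    energy = sum(p[i][j] ** 2 for i, j in cells)
    return {"contrast": contrast, "homogeneity": homogeneity, "energy": energy}


# A flat patch versus a checkerboard patch, both with two gray levels.
smooth = texture_features(glcm([[1, 1], [1, 1]], levels=2))
coarse = texture_features(glcm([[0, 1], [1, 0]], levels=2))
```

The flat patch yields zero contrast and maximal homogeneity, while the checkerboard yields higher contrast, which is the kind of smoothness-versus-coarseness signal the abstract describes.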

A Survey on Phishing Detection and The Importance of Feature Selection In Data Mining Classification Algorithms

Issue 4 - Journal of Science and Technology ◽

10.46243/jst.2020.v5.i6.pp11-18 ◽

2020 ◽

pp. 11-18

Keyword(s):

Data Mining ◽

Feature Selection ◽

Support Vector ◽

Classification Algorithms ◽

End User ◽

Preparation Methods ◽

Survey Paper ◽

Vector Machines ◽

Feature Selection Techniques ◽

Phishing Detection

In this era of the Internet, the issue of information security is at its peak. One of the main threats in this cyber world is the phishing attack, an email or website fraud method that targets a genuine webpage or email and hacks it without the consent of the end user. There are various techniques that help to classify whether a website or an email is legitimate or fake. The major contributors to the detection of these phishing frauds include the classification algorithms, the feature selection techniques or dataset preparation methods, and the feature extraction, which plays an important role in the detection as well as the prevention of these attacks. This survey paper studies the effect of all these contributors and the approaches applied in recent papers. The classification algorithms implemented include Decision Tree, Random Forest, Support Vector Machines, Logistic Regression, Lazy K Star, Naive Bayes, and J48.
