Ensemble of Clustering Approaches for Feature Selection of High Dimensional Data

Clustering multi- and high-dimensional data is especially difficult because a subset of features must be selected from all the features present in categorical data sources. Feature subset selection is an effective way to reduce feature dimensionality in data mining and pattern identification. Its main aims are to select an optimal feature subset and to reduce redundancy. This document describes a feature selection computation based on data granularity for dealing with redundant and irrelevant features when exploring high-dimensional sample data. We propose a Novel Granular Feature Multi-variant Clustering based Genetic Algorithm (NGFMCGA) model and evaluate its performance. The model consists of two phases. In the first phase, a graph-theoretic grouping procedure divides the features into clusters. In the second phase, the most strongly related, representative feature is selected from each cluster to form the feature subset. Because the selected features come from different clusters, they are largely independent, so the clustering-based approach has a high probability of producing independent and useful features. Optimal feature subset selection improves the accuracy of clustering and feature classification. Applied to publicly available data sets, the proposed approach achieves better accuracy than traditional supervised evolutionary approaches.
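The two-phase idea in this abstract (group the features, then keep one representative per group) can be illustrated with a minimal sketch. It is not the NGFMCGA model: the abstract does not specify the graph procedure or the genetic algorithm, so the sketch assumes a simple stand-in in which features are linked when their absolute Pearson correlation reaches a hypothetical `threshold`, clusters are the connected components of that graph, and the representative is the feature most correlated with the target. The function names are illustrative only.

```python
import numpy as np

def cluster_features(X, threshold=0.7):
    """Phase 1 (assumed stand-in): connected components of a correlation graph.

    Two features are linked when |Pearson correlation| >= threshold.
    Returns a list with one cluster label per feature (column of X).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = corr.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a cluster
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and corr[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def select_representatives(X, y, labels):
    """Phase 2 (assumed criterion): per cluster, keep the feature with the
    highest absolute correlation to the target y."""
    relevance = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    selected = []
    for c in sorted(set(labels)):
        members = [j for j, lab in enumerate(labels) if lab == c]
        selected.append(max(members, key=lambda j: relevance[j]))
    return selected

# Synthetic example: two underlying signals, each duplicated with small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0], base[:, 0] + 0.01 * rng.normal(size=200),
    base[:, 1], base[:, 1] + 0.01 * rng.normal(size=200),
])
y = base[:, 0] + 2.0 * base[:, 1]
labels = cluster_features(X)
print(labels, select_representatives(X, y, labels))
```

Because one feature is taken from each cluster, the selected features are weakly correlated with one another by construction, which is the independence property the abstract describes.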

Feature Selection of High Dimensional Data by Adaptive Potential Particle Swarm Optimization

2019 IEEE Congress on Evolutionary Computation (CEC) ◽

10.1109/cec.2019.8790366 ◽

2019 ◽

Cited By ~ 3

Author(s):

Xingyue Huang ◽

Yizhou Chi ◽

Yu Zhou

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

High Dimensional Data ◽

Particle Swarm ◽

High Dimensional ◽

Adaptive Potential ◽

Swarm Optimization ◽

Selection Of

A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data

Computational and Mathematical Methods in Medicine ◽

10.1155/2017/7907163 ◽

2017 ◽

Vol 2017 ◽

pp. 1-18 ◽

Cited By ~ 5

Author(s):

Andrea Bommert ◽

Jörg Rahnenführer ◽

Michel Lang

Keyword(s):

Feature Selection ◽

Predictive Model ◽

Predictive Accuracy ◽

Pearson Correlation ◽

High Dimensional Data ◽

High Dimensional ◽

Sparse Models ◽

Data Set ◽

The Stability ◽

Selection Of

Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is important not only to find a model with high predictive accuracy, but also that this model uses only a few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the stability assessment behaviour it is most important that a measure contains a correction for chance or large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
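As a rough illustration of a Pearson-correlation stability measure, the sketch below encodes each run's selected features as a 0/1 indicator vector and averages the pairwise Pearson correlations between runs. The averaging scheme and the function name `pearson_stability` are assumptions made for this example; the abstract does not give the exact definition used in the paper.

```python
from itertools import combinations

import numpy as np

def pearson_stability(selections, n_features):
    """Mean pairwise Pearson correlation between selection indicator vectors.

    selections: list of iterables of selected feature indices, one per run.
    The correlation is undefined (NaN) for a run that selects no features
    or all features, because its indicator vector is constant.
    """
    Z = np.zeros((len(selections), n_features))
    for run, chosen in enumerate(selections):
        Z[run, list(chosen)] = 1.0
    pairs = combinations(range(len(selections)), 2)
    return float(np.mean([np.corrcoef(Z[a], Z[b])[0, 1] for a, b in pairs]))

# Identical selections in every run give the maximum value of 1.
print(pearson_stability([{0, 1}, {0, 1}, {0, 1}], n_features=6))
# Complementary selections give the minimum value of -1.
print(pearson_stability([{0, 1, 2}, {3, 4, 5}], n_features=6))
```

Since the correlation is centred on each vector's mean, two runs that overlap only as much as random selections of the same size would score close to zero, which is one form of the correction for chance mentioned in the abstract.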

Feature Selection of High Dimensional Data Using Hybrid FSA-IG

IOP Conference Series: Materials Science and Engineering ◽

10.1088/1757-899x/864/1/012066 ◽

2020 ◽

Vol 864 ◽

pp. 012066

Author(s):

Nur Fatin Liyana Mohd Rosely ◽

Azlan Mohd Zain ◽

Yusliza Yusoff

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Of

BagMeLiF: stable boosting-based hybrid-ensemble feature selection algorithm for high-dimensional data

2020 International Conference on Control, Robotics and Intelligent System ◽

10.1145/3437802.3437835 ◽

2020 ◽

Author(s):

Nikita Pilnenskiy ◽

Ivan Smetannikov

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm

On fuzzy feature selection in designing fuzzy classifiers for high-dimensional data

Evolving Systems ◽

10.1007/s12530-015-9142-4 ◽

2015 ◽

Vol 7 (4) ◽

pp. 255-265 ◽

Cited By ~ 6

Author(s):

Eghbal G. Mansoori ◽

Khadijeh S. Shafiee

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Fuzzy Classifiers ◽

Fuzzy Feature Selection

Risk of Selection of Irrelevant Features from High-Dimensional Data with Small Sample Size

Springer Proceedings in Mathematics & Statistics - Stochastic Models, Statistics and Their Applications ◽

10.1007/978-3-319-13881-7_44 ◽

2015 ◽

pp. 399-405

Author(s):

Henryk Maciejewski

Keyword(s):

Sample Size ◽

Small Sample Size ◽

High Dimensional Data ◽

Small Sample ◽

High Dimensional ◽

Selection Of

A Hybrid Feature Selection Method Based on Symmetrical Uncertainty and Support Vector Machine for High-Dimensional Data Classification

Intelligent Information and Database Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-319-54472-4_67 ◽

2017 ◽

pp. 721-727 ◽

Cited By ~ 2

Author(s):

Yongjun Piao ◽

Keun Ho Ryu

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

High Dimensional Data ◽

Feature Selection Method ◽

Data Classification ◽

Selection Method ◽

High Dimensional ◽

Support Vector ◽

Symmetrical Uncertainty

High dimensional data classification and feature selection using support vector machines

European Journal of Operational Research ◽

10.1016/j.ejor.2017.08.040 ◽

2018 ◽

Vol 265 (3) ◽

pp. 993-1004 ◽

Cited By ~ 63

Author(s):

Bissan Ghaddar ◽

Joe Naoum-Sawaya

Keyword(s):

Feature Selection ◽

Support Vector Machines ◽

High Dimensional Data ◽

Data Classification ◽

High Dimensional ◽

Support Vector ◽

Vector Machines

Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

BioMed Research International ◽

10.1155/2015/703768 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 11

Author(s):

Jin-Jia Wang ◽

Fang Xue ◽

Hui Li

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Group Lasso ◽

High Dimensional ◽

Test Accuracy ◽

Gradient Descent Method ◽

Feature Subset ◽

EEG Signals ◽

Sparse Group Lasso ◽

Selection Of

Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are obtained first; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and parameter estimates are obtained by modified blockwise coordinate descent and coordinate gradient descent methods. The best parameters and feature subset are selected using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on data from the international BCI Competition IV reached 84.72%.
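To make the objective concrete, here is a minimal sketch of a logistic loss with a sparse group lasso penalty. It assumes the common textbook form of the penalty, `lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2)`, and it assumes that each group collects the features of one EEG channel. The paper's exact weighting, grouping, and solver are not given in the abstract, and the function names and the parameters `lam` and `alpha` are illustrative.

```python
import numpy as np

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Assumed standard form: an L1 term for feature-level sparsity plus a
    group-wise L2 term (weighted by sqrt of group size) for group sparsity."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    l1_term = np.sum(np.abs(beta))
    group_term = 0.0
    for g in np.unique(groups):
        block = beta[groups == g]  # coefficients of one group (e.g. one channel)
        group_term += np.sqrt(block.size) * np.linalg.norm(block)
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)

def penalized_logistic_loss(beta, X, y, groups, lam, alpha):
    """Mean negative log-likelihood for labels y in {0, 1}, plus the penalty."""
    beta = np.asarray(beta, dtype=float)
    z = X @ beta
    # log(1 + exp(z)) computed stably with logaddexp
    nll = np.mean(np.logaddexp(0.0, z) - y * z)
    return nll + sparse_group_lasso_penalty(beta, groups, lam, alpha)

# Four features in two hypothetical channels; the second channel is inactive.
beta = np.array([3.0, 4.0, 0.0, 0.0])
groups = np.array([0, 0, 1, 1])
print(sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5))
```

The group term can drive every coefficient of a channel to zero at once, while the L1 term removes individual features inside the channels that remain, which is how a single penalty can select channels and features simultaneously.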
