A Hybrid Scheme for Feature Selection of High Dimensional Educational Data

Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

BioMed Research International ◽

10.1155/2015/703768 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 11

Author(s):

Jin-Jia Wang ◽

Fang Xue ◽

Hui Li

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Group Lasso ◽

High Dimensional ◽

Test Accuracy ◽

Gradient Descent Method ◽

Feature Subset ◽

Eeg Signals ◽

Sparse Group Lasso ◽

Selection Of

Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are firstly obtained, which include the power spectrum, time-domain statistics, AR model, and the wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses the logistical regression model with Sparse Group Lasso penalized function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent method. The best parameters and feature subset are selected by using a 10-fold cross-validation. Finally, the test data is classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from international BCI Competition IV reached 84.72%.

Download Full-text

A Novel Algorithm for Clustering and Feature Selection of High Dimensional Datasets

Advances in Modelling and Analysis B ◽

10.18280/ama_b.600301 ◽

2017 ◽

Vol 60 (3) ◽

pp. 525-538

Author(s):

Thulasi Bikku ◽

Alapati Priya

Keyword(s):

Feature Selection ◽

High Dimensional ◽

High Dimensional Datasets ◽

Selection Of ◽

Novel Algorithm

Download Full-text

A Novel Granularity Optimal Feature Selection based on Multi-Variant Clustering for High Dimensional Data

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.2031 ◽

2021 ◽

Vol 12 (3) ◽

pp. 5051-5062

Author(s):

Srinivas Kolli Et. al.

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

Classification Performance ◽

High Dimensional ◽

Second Phase ◽

Data Sets ◽

Aggressive Approach ◽

Related Data ◽

Optimal Feature ◽

Selection Of

Clustering is the most complex in multi/high dimensional data because of sub feature selection from overall features present in categorical data sources. Sub set feature be the aggressive approach to decrease feature dimensionality in mining of data, identification of patterns. Main aim behind selection of feature with respect to selection of optimal feature and decrease the redundancy. In-order to compute with redundant/irrelevant features in high dimensional sample data exploration based on feature selection calculation with data granular described in this document. Propose aNovel Granular Feature Multi-variant Clustering based Genetic Algorithm (NGFMCGA) model to evaluate the performance results in this implementation. This model main consists two phases, in first phase, based on theoretic graph grouping procedure divide features into different clusters, in second phase, select strongly representative related feature from each cluster with respect to matching of subset of features. Features present in this concept are independent because of features select from different clusters, proposed approach clustering have high probability in processing and increasing the quality of independent and useful features.Optimal subset feature selection improves accuracy of clustering and feature classification, performance of proposed approach describes better accuracy with respect to optimal subset selection is applied on publicly related data sets and it is compared with traditional supervised evolutionary approaches

Download Full-text

A hybrid algorithm based on binary chemical reaction optimization and tabu search for feature selection of high-dimensional biomedical data

Tsinghua Science & Technology ◽

10.26599/tst.2018.9010101 ◽

2018 ◽

Vol 23 (6) ◽

pp. 733-743 ◽

Cited By ~ 6

Author(s):

Chaokun Yan ◽

Jingjing Ma ◽

Huimin Luo ◽

Jianxin Wang

Keyword(s):

Feature Selection ◽

Tabu Search ◽

Chemical Reaction ◽

Hybrid Algorithm ◽

High Dimensional ◽

Biomedical Data ◽

Chemical Reaction Optimization ◽

Reaction Optimization ◽

Selection Of

Download Full-text

Feature Selection of High Dimensional Data by Adaptive Potential Particle Swarm Optimization

2019 IEEE Congress on Evolutionary Computation (CEC) ◽

10.1109/cec.2019.8790366 ◽

2019 ◽

Cited By ~ 3

Author(s):

Xingyue Huang ◽

Yizhou Chi ◽

Yu Zhou

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

High Dimensional Data ◽

Particle Swarm ◽

High Dimensional ◽

Adaptive Potential ◽

Swarm Optimization ◽

Selection Of

Download Full-text

A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data

Computational and Mathematical Methods in Medicine ◽

10.1155/2017/7907163 ◽

2017 ◽

Vol 2017 ◽

pp. 1-18 ◽

Cited By ~ 5

Author(s):

Andrea Bommert ◽

Jörg Rahnenführer ◽

Michel Lang

Keyword(s):

Feature Selection ◽

Predictive Model ◽

Predictive Accuracy ◽

Pearson Correlation ◽

High Dimensional Data ◽

High Dimensional ◽

Sparse Models ◽

Data Set ◽

The Stability ◽

Selection Of

Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but it is also important that this model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the stability assessment behaviour it is most important that a measure contains a correction for chance or large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.

Download Full-text

An Efficient hybrid filter-wrapper metaheuristic-based gene selection method for high dimensional datasets

Scientific Reports ◽

10.1038/s41598-019-54987-1 ◽

2019 ◽

Vol 9 (1) ◽

Author(s):

Jamshid Pirgazi ◽

Mohsen Alimoradi ◽

Tahereh Esmaeili Abharian ◽

Mohammad Hossein Olyaee

Keyword(s):

Feature Selection ◽

Large Scale ◽

Gene Selection ◽

Data Classification ◽

Convergence Time ◽

High Dimensional ◽

Compact Set ◽

Feature Selection Problem ◽

High Dimensional Datasets ◽

Selection Of

AbstractFeature selection problem is one of the most significant issues in data classification. The purpose of feature selection is selection of the least number of features in order to increase accuracy and decrease the cost of data classification. In recent years, due to appearance of high-dimensional datasets with low number of samples, classification models have encountered over-fitting problem. Therefore, the need for feature selection methods that are used to remove the extensions and irrelevant features is felt. Recently, although, various methods have been proposed for selecting the optimal subset of features with high precision, these methods have encountered some problems such as instability, high convergence time, selection of a semi-optimal solution as the final result. In other words, they have not been able to fully extract the effective features. In this paper, a hybrid method based on the IWSSr method and Shuffled Frog Leaping Algorithm (SFLA) is proposed to select effective features in a large-scale gene dataset. The proposed algorithm is implemented in two phases: filtering and wrapping. In the filter phase, the Relief method is used for weighting features. Then, in the wrapping phase, by using the SFLA and the IWSSr algorithms, the search for effective features in a feature-rich area is performed. The proposed method is evaluated by using some standard gene expression datasets. The experimental results approve that the proposed approach in comparison to similar methods, has been achieved a more compact set of features along with high accuracy. The source code and testing datasets are available at https://github.com/jimy2020/SFLA_IWSSr-Feature-Selection.

Download Full-text

Effects of Random Forest Parameters in the Selection of Biomarkers

The Computer Journal ◽

10.1093/comjnl/bxz161 ◽

2020 ◽

Author(s):

Utkarsh Mahadeo Khaire ◽

R Dhanalakshmi

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Random Forest ◽

Microarray Data ◽

Selection Procedure ◽

Microarray Dataset ◽

Random Permutation ◽

Classification Algorithm ◽

High Dimensional ◽

Selection Of

Abstract A microarray dataset contains thousands of DNA spots covering almost every gene in the genome. Microarray-based gene expression helps with the diagnosis, prognosis and treatment of cancer. The nature of diseases frequently changes, which in turn generates a considerable volume of data. The main drawback of microarray data is the curse of dimensionality. It hinders useful information and leads to computational instability. The main objective of feature selection is to extract and remove insignificant and irrelevant features to determine the informative genes that cause cancer. Random forest is a well-suited classification algorithm for microarray data. To enhance the importance of the variables, we proposed out-of-bag (OOB) cases in every tree of the forest to count the number of votes for the exact class. The incorporation of random permutation in the variables of these OOB cases enables us to select the crucial features from high-dimensional microarray data. In this study, we analyze the effects of various random forest parameters on the selection procedure. ‘Variable drop fraction’ regulates the forest construction. The higher variable drop fraction value efficiently decreases the dimensionality of the microarray data. Forest built with 800 trees chooses fewer important features under any variable drop fraction value that reduces microarray data dimensionality.

Download Full-text

Feature selection of high-dimensional biomedical data using improved SFLA for disease diagnosis

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2015.7359728 ◽

2015 ◽

Cited By ~ 4

Author(s):

Yongqiang Dai ◽

Bin Hu ◽

Yun Su ◽

Chengsheng Mao ◽

Jing Chen ◽

...

Keyword(s):

Feature Selection ◽

Disease Diagnosis ◽

High Dimensional ◽

Biomedical Data ◽

Selection Of

Download Full-text

The Optimization of Feature Selection Based on Chaos Clustering Strategy and Niche Particle Swarm Optimization

Mathematical Problems in Engineering ◽

10.1155/2020/3138659 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Longzhen Duan ◽

Shuqing Yang ◽

Dongbo Zhang

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

Particle Swarm ◽

High Dimensional ◽

Swarm Optimization ◽

New Model ◽

Selection Algorithms ◽

Np Problem ◽

Selection Of ◽

Better Than

With the rapid increase of the data size, there are increasing demands for feature selection which has been a powerful tool to handle high-dimensional data. In this paper, we propose a novel feature selection of niche particle swarm optimization based on the chaos group, which is used for evaluating the importance of feature selection algorithms. An iterative algorithm is proposed to optimize the new model. It has been proved that solving the new model is equivalent to solving a NP problem with a flexible and adaptable norm regularization. First, the whole population is divided into two groups: NPSO group and chaos group. The two groups are iterated, respectively, and the global optimization is updated. Secondly, the cross-iteration of NPSO group and chaos group avoids the particles falling into the local optimization. Finally, three representative algorithms are selected to be compared with each other in 10 UCI datasets. The experimental results show that the feature selection performance of the algorithm is better than that of the comparison algorithm, and the classification accuracy is significantly improved.

Download Full-text