Optimal Feature Aggregation and Combination for Two-Dimensional Ensemble Feature Selection

Information ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 38 ◽  
Author(s):  
Machmud Roby Alhamidi ◽  
Wisnu Jatmiko

Feature selection reduces the number of features in a dataset so that a classification algorithm produces better accuracy. In general, conventional feature selection is quite unstable when data characteristics change, and in some cases applying an individual feature selection method is inefficient. Ensemble feature selection exists to overcome this problem. However, despite its advantages, issues such as stability, threshold selection, and feature aggregation still need to be addressed. We propose a new framework to deal with stability and feature aggregation. We also tested an automatic threshold to see whether it was efficient. The results showed that the proposed method always produced the best performance in both accuracy and feature reduction. Its accuracy exceeded that of other methods by 0.5–14%, and it reduced more features than other methods by 50%. The stability of the proposed method was also excellent, with an average of 0.9. However, applying the automatic threshold brought no improvement over running without it. Overall, the proposed method showed excellent performance compared to previous work and standard ReliefF.
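As an illustration of what feature aggregation and stability can mean in an ensemble setting, the following minimal sketch combines several feature rankings by mean rank and measures stability as the mean pairwise Jaccard similarity of the selected subsets. Both choices are assumptions made for illustration. The abstract does not state which aggregation rule or stability index the authors use, and the feature names and rankings are hypothetical.

```python
from itertools import combinations
from statistics import mean


def aggregate_by_mean_rank(rankings, k):
    """Combine several feature rankings (best first) and keep the top k.

    Every ranking must list the same features. Mean-rank aggregation is an
    assumption here, not the aggregation rule of the paper.
    """
    features = rankings[0]
    mean_rank = {f: mean(r.index(f) for r in rankings) for f in features}
    # Sort by mean rank, breaking ties by feature name for determinism.
    ordered = sorted(features, key=lambda f: (mean_rank[f], f))
    return ordered[:k]


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity of selected subsets (1.0 = identical)."""
    pairs = combinations([set(s) for s in subsets], 2)
    return mean(len(a & b) / len(a | b) for a, b in pairs)


# Hypothetical rankings produced by three different selectors.
rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
    ["f1", "f3", "f2", "f4"],
]
selected = aggregate_by_mean_rank(rankings, k=2)
stability = jaccard_stability([r[:2] for r in rankings])
print(selected, round(stability, 3))
```

A stability value near 1 means the individual selectors agree on which features to keep. Values near 0 mean their selections barely overlap.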

Author(s):  
Jerlin Rubini Lambert ◽  
Eswaran Perumal

Aim: The classification of medical data has recently gained importance as a means of identifying the existence of disease. Background: Numerous classification algorithms for chronic kidney disease (CKD) have been developed and have produced better classification results. However, including many different factors in the identification of CKD reduces the effectiveness of the classification algorithm employed. Objective: To overcome this issue, feature selection (FS) approaches have been proposed to minimize computational complexity and to improve classification performance in the identification of CKD. Since numerous bio-inspired FS methodologies have been developed, a need arises to examine how different feature selection algorithms perform in the identification of CKD. Method: This paper proposes a new framework for the classification and prediction of CKD. Three feature selection approaches are used in the classification process, namely the Ant Colony Optimization (ACO) algorithm, the Genetic Algorithm (GA), and Particle Swarm Optimization (PSO). Finally, a logistic regression (LR) classifier is employed for classification. Results: The effectiveness of ACO-FS, GA-FS, and PSO-FS is validated by testing them against a benchmark CKD dataset. Conclusion: The empirical results show that the ACO-FS algorithm performs well and that classification performance is improved by including feature selection methodologies in CKD classification.


2005 ◽  
Vol 128 (2) ◽  
pp. 276-283 ◽  
Author(s):  
Evan M. Griffing ◽  
S. George Bankoff ◽  
Michael J. Miksis ◽  
Robert A. Schluter

Thin films of oil flowing down a nearly-vertical plate were subjected to a strong normal electrostatic field. Steady-state height profiles were measured by fluorescence imaging. For electrode potentials less than that required to produce an instability, the two-dimensional response of the interface was <1%. Calculations of the fluid height coupled with the electric field solution were identical to uncoupled calculations for electric fields below the stability threshold. Pressure profiles under the film and three-dimensional effects are also discussed.


2020 ◽  
Author(s):  
Chie Ikeda ◽  
Karim Ouazzane ◽  
Qicheng Yu

Financial fraud activities have soared despite the advancement of fraud detection models empowered by machine learning (ML). To address this issue, we propose a new framework of feature engineering for ML models. The framework consists of feature creation that combines feature aggregation and feature transformation, and feature selection that accommodates a variety of ML algorithms. To illustrate the effectiveness of the framework, we conduct an experiment using an actual financial transaction dataset and show that the framework significantly improves the performance of ML fraud detection models. Specifically, all the ML models complemented by a feature set generated from our framework surpass the same models without such a feature set by nearly 40% on the F1-measure and 20% on the Area Under the Curve (AUC) value.


2021 ◽  
Vol 15 (4) ◽  
pp. 1-46
Author(s):  
Kui Yu ◽  
Lin Liu ◽  
Jiuyong Li

In this article, we aim to develop a unified view of causal and non-causal feature selection methods. The unified view will fill in the gap in the research of the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective. That is to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify the assumptions by mapping them to the restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximations employed by the methods in their search, which then result in the approximations in the feature sets found by the methods with respect to the optimal feature set. With the unified view, we can interpret the output of non-causal methods from a causal perspective and derive the error bounds of both types of methods. Finally, we present practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data.
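The Markov blanket mentioned above has a simple structural definition in a Bayesian network: the parents of the target, its children, and the other parents of those children. The sketch below reads the blanket off a known graph. It is not one of the feature selection methods the article analyzes, which must estimate the blanket from data. The toy network and the function name are hypothetical.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps each node to the set of its parents. The blanket is the
    target's parents, its children, and the other parents of those children.
    """
    children = {node for node, ps in parents.items() if target in ps}
    spouses = set()
    for child in children:
        spouses |= parents[child]
    blanket = set(parents.get(target, set())) | children | spouses
    blanket.discard(target)  # the target is not part of its own blanket
    return blanket


# Hypothetical toy network: A -> C <- B, C -> D <- E, F -> E
dag = {
    "A": set(),
    "B": set(),
    "F": set(),
    "C": {"A", "B"},
    "E": {"F"},
    "D": {"C", "E"},
}
blanket_of_c = markov_blanket(dag, "C")
print(sorted(blanket_of_c))
```

In this toy graph, F influences C's neighborhood only through E, so it falls outside the blanket of C.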


2020 ◽  
Vol 83 (1) ◽  
pp. 91-114
Author(s):  
Adrian Blau

This paper proposes a new framework for categorizing approaches to the history of political thought. Previous categorizations exclude much research; political theory, if included, is often caricatured. And previous categorizations are one-dimensional, presenting different approaches as alternatives. My framework is two-dimensional, distinguishing six kinds of end (two empirical, four theoretical) and six kinds of means. Importantly, these choices are not alternatives: studies may have more than one end and typically use several means. Studies with different ends often use some of the same means. And all studies straddle the supposed empirical/theoretical “divide.” Quentin Skinner himself expertly combines empirical and theoretical analysis, yet the latter is often overlooked, not least because of Skinner's own methodological pronouncements. This highlights a curious disjuncture in methodological writings, between what they say we do, and what we should do. What we should do is much broader than existing categorizations imply.


In the first part of this paper opportunity has been taken to make some adjustments in certain general formulae of previous papers, the necessity for which appeared in discussions with other workers on this subject. The general results thus amended are then applied to a general discussion of the stability problem including the effect of the trailing wake which was deliberately excluded in the previous paper. The general conclusion is that to a first approximation the wake, as usually assumed, has little or no effect on the reality of the roots of the period equation, but that it may introduce instability of the oscillations, if the centre of gravity of the element is not sufficiently far forward. During the discussion contact is made with certain partial results recently obtained by von Karman and Sears, which are shown to be particular cases of the general formulae. An Appendix is also added containing certain results on the motion of a vortex behind a moving cylinder, which were obtained to justify certain of the assumptions underlying the trail theory.


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1238
Author(s):  
Supanat Chamchuen ◽  
Apirat Siritaratiwat ◽  
Pradit Fuangfoo ◽  
Puripong Suthisopapan ◽  
Pirat Khunkitti

Power quality disturbance (PQD) is an important issue in electrical distribution systems that needs to be detected promptly and identified to prevent the degradation of system reliability. This work proposes a PQD classification using a novel algorithm, comprised of the artificial bee colony (ABC) and the particle swarm optimization (PSO) algorithms, called “adaptive ABC-PSO” as the feature selection algorithm. The proposed adaptive technique is applied to a combination of ABC and PSO algorithms, and then used as the feature selection algorithm. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved through nine optimally selected features out of all 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as the real distribution system. When comparing the presented PQD classification system’s performance to previous studies, PQD classification accuracy using adaptive ABC-PSO as the optimal feature selection algorithm is considered to be at a high-range scale; therefore, the adaptive ABC-PSO algorithm can be used to classify the PQD in a practical electrical distribution system.


2021 ◽  
pp. 1-34
Author(s):  
Kadam Vikas Samarthrao ◽  
Vandana M. Rohokale

Email continues to be an essential part of our lives and a means of better communication on the internet. One challenge is that spam emails occupy a large amount of space and bandwidth. A defect of state-of-the-art spam filtering methods, the misclassification of genuine emails as spam (false positives), is a rising challenge to the internet world. The literature provides various algorithms for email spam classification, depending on the classification technique. This paper aims to develop a novel spam detection model for improved cybersecurity. The proposed model involves several phases: dataset acquisition, feature extraction, optimal feature selection, and detection. Initially, a benchmark email dataset is collected that includes both text and image data. Next, feature extraction is performed using two sets of features: text features and visual features. For the text features, Term Frequency-Inverse Document Frequency (TF-IDF) is extracted. For the visual features, the color correlogram and Gray-Level Co-occurrence Matrix (GLCM) are determined. Since the extracted feature vector is long, optimal feature selection is performed by a new meta-heuristic algorithm called the Fitness Oriented Levy Improvement-based Dragonfly Algorithm (FLI-DA). Once the optimal features are selected, detection is performed by a hybrid learning technique composed of two deep learning approaches, a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). To improve the performance of existing deep learning approaches, the number of hidden neurons of the RNN and CNN is optimized by the same FLI-DA. Finally, the optimized hybrid learning technique, comprising the CNN and RNN, classifies the data into spam and ham. The experimental outcomes show the ability of the proposed method to perform spam email classification based on improved deep learning.
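For readers unfamiliar with the TF-IDF text features mentioned above, here is a minimal sketch of one common variant: term frequency normalized by document length, multiplied by the logarithm of the inverse document frequency. The abstract does not say which TF-IDF variant the authors use, so the formula is an assumption, and the sample emails are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / document frequency),
    which is one common variant among several.
    """
    n_docs = len(documents)
    # Count in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized emails.
emails = [
    "win a free prize now".split(),
    "meeting agenda for monday".split(),
    "free tickets for the meeting".split(),
]
weights = tf_idf(emails)
print(weights[0])
```

Terms that occur in a single email, such as "win", receive a higher weight than terms shared across emails, such as "free".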


2021 ◽  
Vol 11 (15) ◽  
pp. 6983
Author(s):  
Maritza Mera-Gaona ◽  
Diego M. López ◽  
Rubiel Vargas-Canas

Identifying relevant data to support the automatic analysis of electroencephalograms (EEG) has become a challenge. Although there are many proposals to support the diagnosis of neurological pathologies, the current challenge is to improve the reliability of the tools to classify or detect abnormalities. In this study, we used an ensemble feature selection approach to integrate the advantages of several feature selection algorithms to improve the identification of the characteristics with high power of differentiation in the classification of normal and abnormal EEG signals. Discrimination was evaluated using several classifiers, i.e., decision tree, logistic regression, random forest, and Support Vector Machine (SVM); furthermore, performance was assessed by accuracy, specificity, and sensitivity metrics. The evaluation results showed that Ensemble Feature Selection (EFS) is a helpful tool to select relevant features from the EEGs. Thus, the stability calculated for the EFS method proposed was almost perfect in most of the cases evaluated. Moreover, the assessed classifiers evidenced that the models improved in performance when trained with the EFS approach’s features. In addition, the classifier of epileptiform events built using the features selected by the EFS method achieved an accuracy, sensitivity, and specificity of 97.64%, 96.78%, and 97.95%, respectively; finally, the stability of the EFS method evidenced a reliable subset of relevant features. Moreover, the accuracy, sensitivity, and specificity of the EEG detector are equal to or greater than the values reported in the literature.
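The three metrics reported above follow from the counts of a binary confusion matrix. The short sketch below shows the standard definitions. The counts are made-up numbers for illustration and are not data from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of abnormal cases detected
        "specificity": tn / (tn + fp),     # share of normal cases recognized
    }


# Hypothetical counts, not taken from the paper.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```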

