Feature Optimization of Exhaled Breath Signals Based on Pearson-BPSO

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Lijun Hao ◽  
Min Zhang ◽  
Gang Huang

Feature optimization, the theme of this paper, is the selective choice of input variables when building a predictive model. This paper proposes an improved feature optimization algorithm for breath signals based on Pearson-BPSO and applies it to distinguishing hepatocellular carcinoma with an electronic nose (eNose). First, multidimensional features were extracted from the breath curves of hepatocellular carcinoma patients and healthy controls in the training samples; then, features with little relevance to the classification were removed according to the Pearson correlation coefficient; next, a fitness function was constructed from the K-Nearest Neighbor (KNN) classification error and the feature dimension, and a feature optimization transformation matrix was obtained with BPSO. This transformation matrix was then applied to optimize the features of the test samples. Finally, the performance of the optimization algorithm was evaluated with a classifier. Experimental results show that the Pearson-BPSO algorithm effectively improves classification performance compared with BPSO and PCA optimization methods: the accuracy of the SVM and RF classifiers was 86.03% and 90%, respectively, and the sensitivity and specificity were about 90% and 80%. Consequently, the Pearson-BPSO feature optimization algorithm should help improve the accuracy of hepatocellular carcinoma detection by eNose and promote the clinical application of intelligent detection.
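The Pearson filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the threshold of 0.2 and the toy data are assumed values for demonstration only.

```python
import numpy as np

def pearson_filter(X, y, threshold=0.2):
    """Keep features whose absolute Pearson correlation with the class
    label exceeds `threshold` (the filtering step before BPSO)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        r = np.corrcoef(X[:, j], y)[0, 1]  # Pearson correlation with labels
        if abs(r) > threshold:
            keep.append(j)
    return keep

# Toy example: feature 0 tracks the label, feature 1 is uncorrelated noise.
X = np.array([[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(pearson_filter(X, y))  # prints [0]: only feature 0 survives the filter
```

The surviving feature indices would then be passed to the BPSO search, whose fitness trades off KNN classification error against feature dimension.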

2008 ◽  
Vol 18 (06) ◽  
pp. 459-467 ◽  
Author(s):  
ROBERTO GIL-PITA ◽  
XIN YAO

The k-nearest neighbor method is a classifier based on evaluating the distances to each pattern in the training set. The edited version of this method applies the classifier to a subset of the complete training set from which some training patterns have been excluded, in order to reduce the classification error rate. In recent works, genetic algorithms have been successfully applied to determine which patterns must be included in the edited subset. In this paper we propose a novel implementation of a genetic algorithm for designing edited k-nearest neighbor classifiers. It includes the definition of a novel mean-square-error-based fitness function, a novel clustered crossover technique, and a fast smart mutation scheme. To evaluate the performance of the proposed method, results are included for the breast cancer, diabetes and letter recognition databases from the UCI machine learning benchmark repository. Both error rate and computational cost are considered in the analysis. The results show the improvement achieved by the proposed editing method.
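The core of any GA-based editing scheme is a fitness function that scores a candidate subset (one chromosome) by the error of the edited classifier. A minimal sketch of such an evaluation, using leave-one-out 1-NN error rather than the paper's MSE-based fitness, and a toy dataset with one deliberately mislabelled outlier:

```python
import numpy as np

def edited_1nn_error(X, y, mask):
    """Leave-one-out error of a 1-NN classifier that may only consult
    training patterns where mask[i] is True (one GA chromosome).
    A sketch of the fitness evaluation, not the full genetic loop."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    kept = np.flatnonzero(mask)
    errors = 0
    for i in range(len(X)):
        refs = kept[kept != i]              # edited set, excluding the query itself
        if len(refs) == 0:
            return 1.0
        d = np.linalg.norm(X[refs] - X[i], axis=1)
        if y[refs[np.argmin(d)]] != y[i]:
            errors += 1
    return errors / len(X)

X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [0.15]])
y = np.array([0, 0, 0, 1, 1, 1])            # index 5 is a mislabelled outlier
full = edited_1nn_error(X, y, np.ones(6, bool))
edited = edited_1nn_error(X, y, np.array([1, 1, 1, 1, 1, 0], bool))
print(full, edited)  # editing out the outlier lowers the LOO error
```

A GA would evolve a population of such binary masks, selecting and recombining those with the lowest fitness.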


Author(s):  
Amit Saxena ◽  
John Wang

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach selects a subset of features: the GA stochastically selects a reduced number of features with the Sammon error as the fitness function, yielding different feature subsets. In the second phase, each reduced feature set is used to test the CA of the dataset, validated with the supervised k-nearest neighbor (k-NN) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is evaluated for CA using k-NN classification under different Minkowski metrics, i.e., non-Euclidean norms, instead of the conventional Euclidean norm (L2). Final results are presented with extensive simulations on seven real and one synthetic data sets. The investigation reveals that using different norms produces better CA, and hence offers scope for better feature subset selection.
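The idea of varying the Minkowski norm in the k-NN distance computation can be illustrated with a small sketch (an assumed toy dataset; p=2 recovers the conventional Euclidean case, p=1 the Manhattan norm):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, p=2):
    """k-NN prediction under a Minkowski L_p norm: the distance between
    two points is (sum |x_i - y_i|^p)^(1/p)."""
    d = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(d)[:k]                      # indices of k closest patterns
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), p=1))  # prints 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5]), p=2))  # prints 1
```

In the paper's scheme, each GA-reduced feature subset would be scored by this classifier for several values of p, and the best-performing norm retained.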


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1206
Author(s):  
Hui Xu ◽  
Krzysztof Przystupa ◽  
Ce Fang ◽  
Andrzej Marciniak ◽  
Orest Kochan ◽  
...  

With the widespread use of the Internet, network security issues have attracted increasing attention, and network intrusion detection has become one of the main security technologies. In network intrusion detection, the raw data source typically has a high dimension and a large volume, which greatly affects both efficiency and accuracy. Feature selection and the choice of classifier therefore play significant roles in the performance of network intrusion detection. This paper combines the classification optimization of weighted K-nearest neighbor (KNN) with a feature selection algorithm, and proposes a combination strategy of feature selection based on an integrated optimization algorithm and weighted KNN, in order to improve the performance of network intrusion detection. Experimental results show that weighted KNN can increase efficiency at the expense of a small amount of accuracy. The proposed combination strategy of feature selection based on an integrated optimization algorithm and weighted KNN can thus improve both the efficiency and the accuracy of network intrusion detection.
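Distance-weighted KNN, the classifier component above, gives closer training patterns a larger vote. A minimal sketch (the 1/(distance+eps) weighting is one common choice, assumed here rather than taken from the paper), on a toy example where weighting flips the plain-majority verdict:

```python
import numpy as np

def weighted_knn(X_train, y_train, x, k=3, eps=1e-9):
    """Distance-weighted k-NN: each of the k neighbours votes with
    weight 1/(distance + eps), so closer patterns count for more."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    votes = {}
    for i in nearest:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (d[i] + eps)
    return max(votes, key=votes.get)

X_train = np.array([[0.0], [2.0], [2.1]])
y_train = np.array([0, 1, 1])
# With k=3 a plain majority vote would say class 1 (two neighbours to one),
# but the single class-0 pattern is much closer, so its weight dominates.
print(weighted_knn(X_train, y_train, np.array([0.1])))  # prints 0
```

In the paper's pipeline, an integrated optimization algorithm would first select the feature subset on which these weighted distances are computed.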


2022 ◽  
Vol 000 (000) ◽  
pp. 000-000
Author(s):  
Chuanli Liu ◽  
Hongli Yang ◽  
Yuemin Feng ◽  
Cuihong Liu ◽  
Fajuan Rui ◽  
...  

2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

This research addresses the feature selection problem for sentiment classification with an ensemble-based classifier. It uses a hybrid approach combining the minimum redundancy maximum relevance (mRMR) technique with the Forest Optimization Algorithm (FOA), i.e., mRMR-FOA-based feature selection. Before being applied to sentiment analysis, the FOA was used as a feature selection technique on 10 different classification datasets publicly available in the UCI machine learning repository. The classifiers, namely k-Nearest Neighbor (k-NN), Support Vector Machine (SVM) and Naïve Bayes (NB), were combined in an ensemble-based algorithm on the available datasets. mRMR-FOA selects the significant features from Blitzer's dataset (customer reviews on electronic products). Sentiment classification was observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB and SVM, with an accuracy of 88.47% on the sentiment classification task.
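The ensemble of k-NN, NB and SVM named in the abstract can be sketched with scikit-learn's hard-voting combiner. This is an assumed illustration on the bundled iris dataset as a stand-in (Blitzer's review corpus and the mRMR-FOA selection step are not reproduced here), with arbitrary default hyperparameters:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Hard-vote ensemble of the three base classifiers from the abstract:
# each model predicts a class, and the majority label wins.
ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=5)),
                ("nb", GaussianNB()),
                ("svm", SVC())],
    voting="hard")
ensemble.fit(X_tr, y_tr)
print(round(ensemble.score(X_te, y_te), 2))
```

In the paper's setting, the input features to such an ensemble would first be reduced by the mRMR-FOA selection stage.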


2021 ◽  
Vol 6 (4) ◽  
Author(s):  
Aminat B. Yusuf ◽  
Ogar O. Austin ◽  
Shinaigo Y. Tadi ◽  
Fatsuma Jauro

The medical industry contains a large amount of sensitive data that must be evaluated to gain insight into records. The nonlinearity, non-normality, correlation structures and complexity of diabetic medical records, however, make accurate prediction difficult. The Pima Indian Diabetes dataset is one such case, owing to its class imbalance, large number of missing values, and the difficulty of identifying high-risk factors. Some of these challenges have been addressed with computational approaches such as machine learning, but results have not been ideal, and pre-processing techniques are recognized as critical to achieving correct findings. The goal of this work is to apply multiple pre-processing approaches to increase the accuracy of some simple models. These pre-processing techniques are: median imputation, in which null values are replaced by the median of the input variable computed separately depending on whether or not the patient is diabetic; followed by oversampling and under-sampling procedures on the minority and majority classes, respectively, to address the class imbalance problem highlighted in the literature. Finally, Pearson correlation is used for dimension reduction to detect high-risk features, since it effectively quantifies the information shared between attributes and their labels. In this study, these techniques are applied in the same order to Linear Regression, Naive Bayes, Decision Tree, K-Nearest Neighbor, Random Forest and Gaussian Boosting classifiers. Their utility on these classifiers is validated using performance measures such as accuracy, precision and recall. The Random Forest classifier is found to be the best-improved model, with 95% accuracy, 94.25% precision and 95.35% recall. Medical practitioners may find the presented strategies beneficial in improving the efficiency of diabetes analysis.
Keywords— Classifiers, diabetes, Pima Indian Diabetes dataset, pre-processing techniques
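The class-conditional median imputation step described above can be sketched as follows (a minimal illustration on assumed toy data, not the authors' code; the resampling and Pearson-selection stages are omitted):

```python
import numpy as np

def class_median_impute(X, y):
    """Replace NaNs in each column with the median computed within the
    sample's own class (e.g. diabetic vs non-diabetic), as the abstract
    describes for the Pima Indian Diabetes pre-processing."""
    X = np.array(X, dtype=float)
    for c in np.unique(y):
        rows = (y == c)
        for j in range(X.shape[1]):
            med = np.nanmedian(X[rows, j])   # per-class median, ignoring NaNs
            col = X[:, j]
            col[rows & np.isnan(col)] = med  # fill only this class's gaps
    return X

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 10.0],
              [7.0, 12.0]])
y = np.array([0, 0, 1, 1])
print(class_median_impute(X, y))  # NaNs filled with each class's own median
```

Filling by class rather than globally keeps the imputed values consistent with each group's distribution, which matters when the two classes differ systematically.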

