The classification of motor imagery response: an accuracy enhancement through the ensemble of random subspace k-NN

PeerJ Computer Science ◽

10.7717/peerj-cs.374 ◽

2021 ◽

Vol 7 ◽

pp. e374

Author(s):

Mamunur Rashid ◽

Bifta Sama Bari ◽

Md Jahid Hasan ◽

Mohd Azraai Mohd Razman ◽

Rabiu Muazu Musa ◽

...

Keyword(s):

Feature Selection ◽

Random Forest ◽

Motor Imagery ◽

Communication Strategy ◽

Support Vector ◽

Random Subspace ◽

Common Spatial Pattern ◽

Feature Selection Technique ◽

Linear Discriminant ◽

Feature Dimension

Brain-computer interface (BCI) is a viable alternative communication strategy for patients with neurological disorders, as it facilitates the translation of human intent into device commands. The performance of BCIs primarily depends on the efficacy of the feature extraction and feature selection techniques, as well as the classification algorithms employed. More often than not, a high-dimensional feature set contains redundant features that may degrade a given classifier’s performance. In the present investigation, an ensemble learning-based classification algorithm, namely random subspace k-nearest neighbour (k-NN), is proposed to classify motor imagery (MI) data. The common spatial pattern (CSP) was applied to extract features from the MI response, and the effectiveness of a random forest (RF)-based feature selection algorithm was also investigated. To evaluate the efficacy of the proposed method, an experimental study was carried out using four publicly available MI datasets (BCI Competition III dataset 1 (data-1), dataset IIIA (data-2), dataset IVA (data-3) and BCI Competition IV dataset II (data-4)). The ensemble-based random subspace k-NN approach achieved superior classification accuracy (CA) of 99.21%, 93.19%, 93.57% and 90.32% for data-1, data-2, data-3 and data-4, respectively, against the other models evaluated, namely linear discriminant analysis, support vector machine, random forest, Naïve Bayes and conventional k-NN. In comparison with other classification approaches reported in recent studies, the proposed method enhanced the accuracy by 2.09% for data-1, 1.29% for data-2, 4.95% for data-3 and 5.71% for data-4. Moreover, the RF feature selection technique employed in the present study significantly reduced the feature dimension without compromising the overall CA. The outcome of the present study implies that the proposed method may significantly enhance the accuracy of MI data classification.
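
The random subspace idea in the abstract above can be illustrated with a minimal sketch. It assumes squared Euclidean distance, majority voting, and toy four-dimensional data; the function names (`knn_predict`, `random_subspace_knn`) and all parameter values are hypothetical and are not taken from the paper, which also applies CSP feature extraction and RF feature selection beforehand.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, feats):
    """Majority vote among the k nearest training points, using only the features in `feats`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in feats), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def random_subspace_knn(train_X, train_y, x, n_learners=15, subspace_size=2, k=3, seed=0):
    """Ensemble of k-NN learners, each restricted to a random subset of features."""
    rng = random.Random(seed)
    n_features = len(train_X[0])
    votes = Counter()
    for _ in range(n_learners):
        # Each base learner sees only `subspace_size` randomly chosen features.
        feats = rng.sample(range(n_features), subspace_size)
        votes[knn_predict(train_X, train_y, x, k, feats)] += 1
    # The ensemble prediction is the class chosen by most base learners.
    return votes.most_common(1)[0][0]

# Toy data (an assumption for illustration): two well-separated classes.
train_X = [
    [0.1, 0.2, 0.0, 0.3], [0.3, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.1],
    [5.1, 4.9, 5.2, 5.0], [4.8, 5.2, 5.0, 5.1], [5.0, 5.1, 4.9, 4.8],
]
train_y = [0, 0, 0, 1, 1, 1]

pred_a = random_subspace_knn(train_X, train_y, [0.2, 0.1, 0.1, 0.2])
pred_b = random_subspace_knn(train_X, train_y, [4.9, 5.0, 5.1, 5.0])
print(pred_a, pred_b)
```

Restricting each learner to a feature subset is what distinguishes this from plain bagging, which resamples training examples instead.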

Random forest–based feature selection and detection method for drunk driving recognition

International Journal of Distributed Sensor Networks ◽

10.1177/1550147720905234 ◽

2020 ◽

Vol 16 (2) ◽

pp. 155014772090523

Author(s):

ZhenLong Li ◽

HaoXin Wang ◽

YaoWei Zhang ◽

XiaoHua Zhao

Keyword(s):

Feature Selection ◽

Random Forest ◽

Driving Simulator ◽

Characteristic Curve ◽

Area Under The Curve ◽

Drunk Driving ◽

Support Vector ◽

Linear Discriminant ◽

Dummy Variable ◽

University Of Technology

A method for drunk driving detection using Feature Selection based on the Random Forest was proposed. First, driving behavior data were collected using a driving simulator at Beijing University of Technology. Second, the features were selected according to the Feature Importance in the random forest. Third, a dummy variable was introduced to encode the geometric characteristics of different roads so that drunk driving under different road conditions can be detected with the same classifier based on the random forest. Finally, the linear discriminant analysis, support vector machine, and AdaBoost classifiers were used and compared with the random forest. The accuracy, F1 score, receiver operating characteristic curve, and area under the curve value were used to evaluate the performance of the classifiers. The results show that Accelerator Depth, Speed, Distance to the Center of the Lane, Acceleration, Engine Revolution, Brake Depth, and Steering Angle have important influences on identifying the drivers’ states and can be used to detect drunk driving. Specifically, the classifiers with Accelerator Depth outperformed the other classifiers without Accelerator Depth. This means that Accelerator Depth is an important feature. Both the AdaBoost and random forest classifiers have an accuracy of 81.48%, which verified the effectiveness of the proposed method.

APPLICATION OF QUANTUM-BEHAVED PARTICLE SWARM OPTIMIZATION TO MOTOR IMAGERY EEG CLASSIFICATION

International Journal of Neural Systems ◽

10.1142/s0129065713500263 ◽

2013 ◽

Vol 23 (06) ◽

pp. 1350026 ◽

Cited By ~ 35

Author(s):

WEI-YEN HSU

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

Motor Imagery ◽

Particle Swarm ◽

Recognition System ◽

Support Vector ◽

Data Sets ◽

Swarm Optimization ◽

Linear Discriminant ◽

Trial Analysis

In this study, we propose a recognition system for single-trial analysis of motor imagery (MI) electroencephalogram (EEG) data. Applied to event-related brain potential (ERP) data acquired from the sensorimotor cortices, the system chiefly consists of automatic artifact elimination, feature extraction, feature selection and classification. In addition to the use of independent component analysis, a similarity measure is proposed to further remove the electrooculographic (EOG) artifacts automatically. Several potential features, such as wavelet-fractal features, are then extracted for subsequent classification. Next, quantum-behaved particle swarm optimization (QPSO) is used to select features from the feature combination. Finally, the selected sub-features are classified by a support vector machine (SVM). Compared with processing without artifact elimination, feature selection using a genetic algorithm (GA), and classification with Fisher's linear discriminant (FLD) on MI data from two data sets for eight subjects, the results indicate that the proposed method is promising in brain–computer interface (BCI) applications.

Feature Selection and Scaling for Random Forest Powered Malware Detection System

10.21203/rs.3.rs-778333/v1 ◽

2021 ◽

Author(s):

Ashutosh Tripathi ◽

Naman Bhoj ◽

Mayank Khari ◽

Bishwajeet Pandey

Keyword(s):

Feature Selection ◽

Random Forest ◽

Detection System ◽

Malware Detection ◽

Feature Space ◽

Conclusive Evidence ◽

Security And Privacy ◽

Gradient Boosting ◽

Support Vector ◽

Feature Selection Technique

With the rise of internet usage, malware poses a great threat to user security and privacy. Therefore, to mitigate the problem, it is essential to develop an efficient malware detection framework. In our research, we experimented with various machine learning and feature scaling algorithms. Chi-Square was used as the feature selection technique, which selected a set of 48 features from a feature space of 128 features. The empirical results provide conclusive evidence that Random Forest is the best algorithm for detection of malware, achieving an accuracy of 91.300% on our dataset, followed by Gradient Boosting and Support Vector Machines.
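
As a rough illustration of chi-square feature selection, the sketch below scores categorical features with the Pearson chi-square statistic of their contingency table against the class label and keeps the top k. The abstract does not say which chi-square variant or library was used, so the helper names (`chi_square_score`, `select_top_k`) and the toy columns are assumptions for illustration only.

```python
from collections import Counter

def chi_square_score(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    f_counts = Counter(feature)
    l_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n  # count expected if feature and label were independent
            score += (observed[(f, l)] - expected) ** 2 / expected
    return score

def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (hypothetical): one feature that tracks the label, one that does not.
labels = [0, 0, 1, 1]
columns = {
    "tracks_label": [0, 0, 1, 1],
    "unrelated": [0, 1, 0, 1],
}
selected = select_top_k(columns, labels, k=1)
print(selected)
```

In the study's setting, the same ranking step would keep 48 of 128 features rather than 1 of 2.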

Multiscale Supervised Classification of Point Clouds with Urban and Forest Applications

Sensors ◽

10.3390/s19204523 ◽

2019 ◽

Vol 19 (20) ◽

pp. 4523 ◽

Cited By ~ 1

Author(s):

Carlos Cabo ◽

Celestino Ordóñez ◽

Fernando Sánchez-Lasheras ◽

Javier Roca-Pardiñas ◽

Javier de Cos-Juez

Keyword(s):

Random Forest ◽

Laser Scanning ◽

Supervised Classification ◽

Computing Time ◽

Principal Component ◽

Point Clouds ◽

Support Vector ◽

Linear Discriminant ◽

Vector Machines ◽

Input Variables

We analyze the utility of multiscale supervised classification algorithms for object detection and extraction from laser scanning or photogrammetric point clouds. Only the geometric information (the point coordinates) was considered, thus making the method independent of the systems used to collect the data. A maximum of five features (input variables) was used, four of them related to the eigenvalues obtained from a principal component analysis (PCA). PCA was carried out at six scales, defined by the diameter of a sphere around each observation. Four multiclass supervised classification models were tested (linear discriminant analysis, logistic regression, support vector machines, and random forest) in two different scenarios, urban and forest, formed by artificial and natural objects, respectively. The results obtained were accurate (overall accuracy over 80% for the urban dataset, and over 93% for the forest dataset), in the range of the best results found in the literature, regardless of the classification method. For both datasets, the random forest algorithm provided the best solution/results when discrimination capacity, computing time, and the ability to estimate the relative importance of each variable are considered together.

Feature Selection Method Based on Mutual Information and Support Vector Machine

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800142150021x ◽

2021 ◽

pp. 2150021

Author(s):

Gang Liu ◽

Chunlei Yang ◽

Sen Liu ◽

Chunbao Xiao ◽

Bin Song

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Mutual Information ◽

Classification Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

SVM Classifier ◽

Standard Data ◽

Feature Dimension

A feature selection method based on mutual information and support vector machine (SVM) is proposed in order to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated using mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated by analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined, and the degree of influence of the input variables on the output variables of the SVM network is calculated based on MIV. The importance weights of the features, described by MIV, are sorted in descending order. Finally, the SVM classifier is used to perform feature selection according to the classification accuracy of feature combinations, taking the MIV order of the features as a reference. Simulation experiments are carried out with three standard UCI data sets, and the results show that this method not only effectively reduces the feature dimension and achieves high classification accuracy, but also ensures good robustness.
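
The mutual-information step can be sketched for discrete features as follows. This is a generic estimate of I(X;Y) from empirical frequencies, not the paper's exact local or overall correlation measure, which the abstract does not define; the function name `mutual_information` and the toy sequences are assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y).
        ratio = c * n / (count_x[x] * count_y[y])
        mi += (c / n) * math.log2(ratio)
    return mi

# Toy features (hypothetical): `dup` carries the same information as `base`,
# so one of the two is redundant; `other` is independent of `base`.
base = [0, 0, 1, 1]
dup = [0, 0, 1, 1]
other = [0, 1, 0, 1]

mi_redundant = mutual_information(base, dup)
mi_independent = mutual_information(base, other)
print(mi_redundant, mi_independent)
```

A high value between two features suggests that one of them adds little information and is a candidate for elimination; any threshold for "high" would have to be chosen by the user.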

Systematic Framework to Predict Early-Stage Liver Carcinoma Using Hybrid of Feature Selection Techniques and Regression Techniques

Complexity ◽

10.1155/2022/7816200 ◽

2022 ◽

Vol 2022 ◽

pp. 1-11

Author(s):

Marium Mehmood ◽

Nasser Alshammari ◽

Saad Awadh Alanazi ◽

Fahad Ahmad

Keyword(s):

Feature Selection ◽

Random Forest ◽

Liver Diseases ◽

Early Stage ◽

Support Vector ◽

Liver Carcinoma ◽

Random Forest Regression ◽

Soft Computing Techniques ◽

Regression Algorithms ◽

Regression Techniques

The liver is a vital organ of the human body, but detecting liver disease at an early stage is very difficult because its symptoms are hidden. Liver diseases may cause loss of energy or weakness once irregularities in liver function become visible. Cancer is one of the most common diseases of the liver and also the most fatal. It develops through the uncontrolled growth of harmful cells inside the liver and, if diagnosed late, may cause death. Treating liver diseases at an early stage is therefore an important issue, as is designing a model to diagnose early disease. First, the features that play the most significant part in detecting liver cancer at an early stage should be identified, which requires extracting the essential features from thousands of unwanted ones. These features are mined using data mining and soft computing techniques, which give optimized results that are helpful in diagnosing the disease at an early stage. Within these techniques, we use feature selection methods, namely Filter, Wrapper, and Embedded methods, to reduce the number of features in the dataset. Different Regression algorithms are then applied to each of these methods individually to evaluate the result. The Regression algorithms include Linear Regression, Ridge Regression, LASSO Regression, Support Vector Regression, Decision Tree Regression, Multilayer Perceptron Regression, and Random Forest Regression. We evaluated our results based on the accuracy and error rates generated by these Regression algorithms. The results show that, of all the deployed Regression techniques, Random Forest Regression with the Wrapper Method is the best, giving the highest R2-Score of 0.8923 and the lowest MSE of 0.0618.
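
For reference, the two metrics reported above are conventionally defined as follows. The abstract does not state its formulas, so these standard definitions are an assumption, with $y_i$ the observed values, $\hat{y}_i$ the predictions, $\bar{y}$ the mean of the observed values, and $n$ the number of samples.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under these definitions, a lower MSE and an $R^2$ closer to 1 both indicate a better fit.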

Predicting discontinuation of docetaxel treatment for metastatic castration-resistant prostate cancer (mCRPC) with random forest

F1000Research ◽

10.12688/f1000research.8353.1 ◽

2016 ◽

Vol 5 ◽

pp. 2673 ◽

Cited By ~ 1

Author(s):

Daniel Kristiyanto ◽

Kevin E. Anderson ◽

Ling-Hong Hung ◽

Ka Yee Yeung

Keyword(s):

Prostate Cancer ◽

Feature Selection ◽

Random Forest ◽

Developed Countries ◽

Hill Climbing ◽

Support Vector ◽

Castration Resistant Prostate Cancer ◽

K Nearest Neighbor ◽

Missing Data Imputation ◽

Docetaxel Treatment

Prostate cancer is the most common cancer among men in developed countries. Androgen deprivation therapy (ADT) is the standard treatment for prostate cancer. However, approximately one third of all patients with metastatic disease treated with ADT develop resistance to ADT. This condition is called metastatic castrate-resistant prostate cancer (mCRPC). Patients who do not respond to hormone therapy are often treated with a chemotherapy drug called docetaxel. Sub-challenge 2 of the Prostate Cancer DREAM Challenge aims to improve the prediction of whether a patient with mCRPC would discontinue docetaxel treatment due to adverse effects. Specifically, a dataset containing three distinct clinical studies of patients with mCRPC treated with docetaxel was provided. We applied the k-nearest neighbor method for missing data imputation, the hill climbing algorithm and random forest importance for feature selection, and the random forest algorithm for classification. We also empirically studied the performance of many classification algorithms, including support vector machines and neural networks. Additionally, we found that using random forest importance for feature selection provided slightly better results than the more computationally expensive method of hill climbing.

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detecting malware, such as packet content analysis, are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features such as packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyperparameters of the algorithms in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems.
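
The area under the curve values quoted above are typically computed from classifier scores on the testing set. The sketch below uses the pairwise (Mann-Whitney) formulation of ROC AUC: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the paper's data.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: 1 = malicious, 0 = benign (made-up scores).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

This quadratic-time version favours clarity; a rank-based computation gives the same value more efficiently on large test sets.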

Feature Selection Using Random Forest Algorithm to Diagnose Tuberculosis From Lung CT Images

AI Innovation in Medical Imaging Diagnostics - Advances in Medical Technologies and Clinical Practice ◽

10.4018/978-1-7998-3092-4.ch005 ◽

2021 ◽

pp. 92-100

Author(s):

Beaulah Jeyavathana Rajendran ◽

Kanimozhi K. V.

Keyword(s):

Feature Selection ◽

Random Forest ◽

The Body ◽

Support Vector ◽

Feature Descriptor ◽

Feature Sets ◽

Modified Particle Swarm Optimization ◽

Tuberculosis Disease ◽

Optimal Feature ◽

Lung CT

Tuberculosis is a hazardous infectious disease characterized by the development of tubercles in the tissues. The disease mainly affects the lungs but also other parts of the body, and it can be easily diagnosed by radiologists. The main objective of this chapter is to obtain the best solution, selected by means of modified particle swarm optimization, which is regarded as the optimal feature descriptor. Five stages are used to detect tuberculosis: image pre-processing, lung segmentation, feature extraction, feature selection and classification. These stages are commonly used in medical image processing to identify tuberculosis. In the feature extraction stage, the GLCM approach is used to extract the features, and the optimal features are selected from the extracted feature sets by random forest. Finally, a support vector machine classifier is used for image classification. The experimentation is carried out, and intermediate results are obtained. The accuracy of the proposed system is better than that of the existing classification method.

EEG Feature Extraction and Pattern Classification Based on Motor Imagery in Brain-Computer Interface

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2011070104 ◽

2011 ◽

Vol 3 (3) ◽

pp. 43-56 ◽

Cited By ~ 2

Author(s):

Ling Zou ◽

Xinguang Wang ◽

Guodong Shi ◽

Zhenghua Ma

Keyword(s):

Support Vector Machine ◽

Motor Imagery ◽

Brain Computer Interface ◽

Average Power ◽

Computer Interface ◽

Support Vector ◽

Discrete Wavelet ◽

Classification Rate ◽

Linear Discriminant ◽

Reconstructed Signal

Accurate classification of left- and right-hand motor imagery EEG is an important issue in brain-computer interfaces. First, the discrete wavelet transform was used to decompose the average power of the C3 and C4 electrodes during left- and right-hand motor imagery over certain periods of time. The reconstructed signal of the approximation coefficient A6 at the sixth level was selected to build a feature signal. Second, the performance of Fisher Linear Discriminant Analysis with two different threshold calculation methods was compared with that of the Support Vector Machine. The final classification results showed that the Support Vector Machine had a lower misclassification rate and achieved ideal classification results.