Machine learning approaches for binary classification using brain signals

10.36227/techrxiv.12089496.v1 ◽

2020 ◽

Author(s):

Tejas Wadiwala ◽

Vikas Trikha ◽

Jinan Fiaidhi

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

EEG Signals ◽

Brain Wave ◽

Brain Signals ◽

Learning Practices ◽

The Brain

This paper performs a comparative analysis of a brain-signal dataset using several machine learning classifiers: random forest, gradient boosting, support vector machine, and extra trees. The classifiers are compared on accuracy, area under the ROC curve (AUC), specificity, recall, and precision. The key focus of this paper is to apply machine learning practices to an electroencephalogram (EEG) signal dataset provided by the Rochester Institute of Technology and to derive meaningful results from it. EEG signals are usually captured to diagnose problems related to the electrical activity of the brain, since EEG tracks and records brain wave patterns and can produce a definitive report on seizure activity. Various data preprocessing techniques were applied to obtain cleansed and organized data, with the aim of better predictions and higher accuracy. Section II surveys existing work on the topic, and Section III describes the dataset used in this research.
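A minimal sketch of how the five metrics listed in this abstract can be computed for a binary classifier, assuming labels coded as 1 (positive) and 0 (negative) and a per-sample score from any of the compared classifiers. The 0.5 threshold and the helper names are illustrative assumptions, not the authors' code. AUC is computed through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_report(y_true, scores, threshold=0.5):
    """Threshold the scores, then derive the threshold-based metrics plus AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn),        # a.k.a. sensitivity
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "auc": roc_auc(y_true, scores),  # threshold-independent
    }
```

For example, `binary_report([1, 0, 1, 1, 0, 0], [0.9, 0.2, 0.4, 0.8, 0.6, 0.1])` gives an accuracy of 4/6, recall, specificity and precision of 2/3 each, and an AUC of 8/9. The pairwise AUC loop is O(n²), which is fine for small evaluation sets; precision and specificity divide by zero if the classifier predicts no positives or the data contain no negatives.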

A Comprehensive Analysis of 2D&3D Video Watching of EEG Signals by Increasing PLSR and SVM Classification Results

The Computer Journal ◽

10.1093/comjnl/bxz043 ◽

2019 ◽

Vol 63 (3) ◽

pp. 425-434 ◽

Cited By ~ 3

Author(s):

Negin Manshouri ◽

Temel Kayikcioglu

Keyword(s):

3D Video ◽

Support Vector ◽

Least Squares Regression ◽

EEG Signals ◽

3D Technology ◽

SVM Classification ◽

Brain Signals ◽

Short Period ◽

The Impact ◽

The Brain

The development of two- and three-dimensional (2D&3D) technology has attracted the attention of researchers in recent years. This research aims to reveal the detailed effects of 2D compared with 3D technology on human brain waves by studying the impact of 2D&3D video watching on electroencephalography (EEG) brain signals. A group of eight healthy volunteers with an average age of 31 ± 3.06 years participated in this three-stage test. EEG recording consisted of three stages: (a) a short relaxation period, (b) display of a 2D video, and (c) a short rest period during which recording continued, after which the trial ended. Exactly the same steps were repeated for the 3D video. Power spectral density (PSD) based on the short-time Fourier transform (STFT) was used to analyze the brain signals of 2D&3D video viewers. After testing all the EEG frequency bands, delta and theta were extracted as the features. Partial least squares regression (PLSR) and support vector machine (SVM) classification algorithms were used to classify the EEG signals recorded during 2D&3D video watching. Successful classification results were obtained by selecting the correct combinations of effective channels representing the brain regions.
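The STFT-based PSD feature extraction described above can be sketched as follows. The abstract gives none of the parameters, so the 128 Hz sampling rate, 1-second Hann windows with 50% overlap, and the conventional band edges of 0.5 to 4 Hz (delta) and 4 to 8 Hz (theta) are placeholders. A plain DFT keeps the sketch dependency-free; a real pipeline would use an FFT library.

```python
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_psd(signal, fs, win_len, hop):
    """Average |DFT|^2 over Hann-windowed, overlapping frames (Welch-style PSD).
    Returns (freqs, psd) for bins 0..win_len//2. The one-sided factor of 2 is
    omitted, which does not affect comparisons between bands."""
    w = hann(win_len)
    norm = fs * sum(v * v for v in w)
    n_bins = win_len // 2 + 1
    acc = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + i] * w[i] for i in range(win_len)]
        for k in range(n_bins):
            re = sum(x * math.cos(2 * math.pi * k * i / win_len) for i, x in enumerate(frame))
            im = sum(x * math.sin(2 * math.pi * k * i / win_len) for i, x in enumerate(frame))
            acc[k] += (re * re + im * im) / norm
        n_frames += 1
    freqs = [k * fs / win_len for k in range(n_bins)]
    return freqs, [a / n_frames for a in acc]


def band_power(freqs, psd, lo, hi):
    """Integrate the PSD over [lo, hi) Hz with a rectangle rule."""
    df = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, psd) if lo <= f < hi) * df


def delta_theta_features(signal, fs=128):
    """(delta, theta) band power of one channel; assumes len(signal) >= fs."""
    freqs, psd = stft_psd(signal, fs, win_len=fs, hop=fs // 2)
    return band_power(freqs, psd, 0.5, 4.0), band_power(freqs, psd, 4.0, 8.0)
```

A 2 Hz sine wave puts almost all of its power in the delta band and a 6 Hz sine in the theta band. One (delta, theta) pair per selected channel would then form the feature vector passed to a classifier such as PLSR or SVM.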

A Comparative Analysis of Enhanced Machine Learning Algorithms for Smart Grid Stability Prediction

10.36227/techrxiv.16863145.v1 ◽

2021 ◽

Author(s):

Ankit Ghosh ◽

Alok Kole

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Smart Grid ◽

Smart Grids ◽

Machine Learning Algorithms ◽

Electricity Sector ◽

Stochastic Gradient Descent ◽

Energy Output ◽

Gradient Boosting ◽

Support Vector

Smart grid is an essential concept in the transformation of the electricity sector into an intelligent, digitalized energy network that can deliver optimal energy from the source to the consumers. Smart grids, being self-sufficient systems, are constructed by integrating information, telecommunication, and advanced power technologies with existing electricity systems. Artificial Intelligence (AI) is an important technology driver in smart grids, and its application is becoming more prominent because traditional modelling, optimization, and control techniques have their own limitations. Machine Learning (ML), a subset of AI, enables intelligent decision-making and response to sudden changes in customer energy demand, unexpected disruptions of power supply, sudden variations in renewable energy output, or other catastrophic events in a smart grid. This paper compares several state-of-the-art ML algorithms for predicting smart grid stability, using a dataset that contains results from simulations of smart grid stability. Enhanced ML algorithms, namely Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbour (KNN), Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), Stochastic Gradient Descent (SGD) classifier, XGBoost, and Gradient Boosting classifiers, have been implemented to forecast smart grid stability. The models are compared using accuracy, precision, recall, F1-score, AUC-ROC, and AUC-PR. The test results are promising, with the XGBoost classifier outperforming all the other models with an accuracy of 97.5%, recall of 98.4%, precision of 97.6%, F1-score of 97.9%, AUC-ROC of 99.8%, and AUC-PR of 99.9%.
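F1-score and AUC-PR, two of the metrics in this comparison, can be sketched as below. F1 is the harmonic mean of precision and recall, and average precision is one common step-wise estimate of the area under the precision-recall curve. The abstract does not say whether this estimator or trapezoidal interpolation was used, so that choice is an assumption here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision measured at the rank of each positive sample."""
    # Sort by descending score; Python's stable sort keeps ties in input order.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos
```

For labels `[1, 0, 1, 1, 0, 0]` and scores `[0.9, 0.2, 0.4, 0.8, 0.6, 0.1]`, `average_precision` returns 11/12, because the positives sit at ranks 1, 2 and 4 with precisions 1, 1 and 3/4. Tied scores are kept in input order, which can shift the result slightly.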

Diagnosis of Brain Diseases using Neural Networks

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1036.1292s19 ◽

2019 ◽

Vol 9 (2S) ◽

pp. 402-407

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Neurological Diseases ◽

Brain Diseases ◽

Learning Approaches ◽

Learning Practices ◽

Computer Aided Analysis ◽

Computer Aided ◽

The Brain

The increasing occurrence of brain diseases and the need for early diagnosis of ailments such as tumors, Alzheimer’s, epilepsy, and Parkinson’s have drawn the attention of researchers. Machine learning, specifically deep learning, is considered a beneficial diagnostic tool. Deep learning approaches to neuroimaging will assist the computer-aided analysis of neurological diseases, and feature extraction from neuroimages using artificial neural networks leads to better diagnoses. In this study, these brain diseases are revisited to consolidate the methodologies used by various authors in the literature.

Ensemble Machine Learning Approach Improves Predicted Spatial Variation of Surface Soil Organic Carbon Stocks in Data-Limited Northern Circumpolar Region

Frontiers in Big Data ◽

10.3389/fdata.2020.528441 ◽

2020 ◽

Vol 3 ◽

Author(s):

Umakant Mishra ◽

Sagar Gautam ◽

William J. Riley ◽

Forrest M. Hoffman

Keyword(s):

Machine Learning ◽

Environmental Factors ◽

Soil Properties ◽

Spatial Variation ◽

Prediction Accuracy ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Regression Kriging ◽

SOC Stocks

Various approaches of differing mathematical complexity are being applied for spatial prediction of soil properties. Regression kriging is a widely used hybrid approach to predicting spatial variation that combines the correlation between soil properties and environmental factors with the spatial autocorrelation between soil observations. In this study, we compared four machine learning approaches (gradient boosting machine, multivariate adaptive regression splines, random forest, and support vector machine) with regression kriging to predict the spatial variation of surface (0–30 cm) soil organic carbon (SOC) stocks at 250-m spatial resolution across the northern circumpolar permafrost region. We combined 2,374 soil profile observations (calibration datasets) with georeferenced datasets of environmental factors (climate, topography, land cover, bedrock geology, and soil types) to predict the spatial variation of surface SOC stocks. We evaluated the prediction accuracy at randomly selected sites (validation datasets) across the study area. We found that different techniques inferred different numbers of environmental factors and different relative importances for the prediction of SOC stocks. Regression kriging produced lower prediction errors than multivariate adaptive regression splines and support vector machine, and prediction accuracy comparable to gradient boosting machine and random forest. However, the ensemble median prediction of SOC stocks obtained from all four machine learning techniques showed the highest prediction accuracy. Although the choice of approach for spatial prediction of soil properties will depend on the availability of soil and environmental datasets and computational resources, we conclude that the ensemble median prediction obtained from multiple machine learning approaches provides greater spatial detail and the highest prediction accuracy. Thus, an ensemble prediction approach can be a better choice than any single prediction technique for predicting the spatial variation of SOC stocks.
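The ensemble median described above takes, at each validation site, the median of the predictions from all models. A minimal sketch follows; the example numbers are made up and not from the study, and RMSE is assumed here as the error metric for comparing the ensemble with individual models.

```python
import math
from statistics import median


def ensemble_median(predictions_by_model):
    """Per-site median across models.
    predictions_by_model: dict of model name -> list of predictions,
    all lists in the same site order."""
    per_site = zip(*predictions_by_model.values())
    return [median(site_preds) for site_preds in per_site]


def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

For four hypothetical models predicting `[10, 20, 30]`, `[14, 18, 26]`, `[11, 21, 29]` and `[8, 25, 35]` at three sites, the per-site medians are `[10.5, 20.5, 29.5]`. Unlike the mean, the median is insensitive to a single model that is badly off at a given site.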

Machine Learning Approaches to Predict Peak Demand Days of Cardiovascular Admissions Considering Environmental Exposure

10.21203/rs.2.19636/v2 ◽

2020 ◽

Author(s):

Hang Qiu ◽

Lin Luo ◽

Ziqi Su ◽

Li Zhou ◽

Liya Wang ◽

...

Keyword(s):

Machine Learning ◽

Loss Function ◽

Ambient Air ◽

Quality Data ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

Peak Demand ◽

Logarithmic Loss

Background: Accumulating evidence has linked environmental exposures, such as ambient air pollution and meteorological factors, to the development and severity of cardiovascular diseases (CVDs), resulting in increased healthcare demand. Effective prediction of demand for healthcare services, particularly demand associated with peak events of CVDs, can help optimize the allocation of medical resources. However, few studies have adopted machine learning approaches with excellent predictive abilities to forecast healthcare demand for CVDs. This study aims to develop and compare several machine learning models for predicting the peak demand days of CVD admissions using hospital admissions data, air quality data, and meteorological data from Chengdu, China, from 2015 to 2017. Methods: Six machine learning algorithms, including logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), were applied to build the predictive models with a unique feature set. The area under the receiver operating characteristic curve (AUC), logarithmic loss, accuracy, sensitivity, specificity, precision, and F1 score were used to evaluate the predictive performance of the six models. Results: The LightGBM model exhibited the highest AUC (0.940, 95% CI: 0.900-0.980), which was significantly higher than that of LR (0.842, 95% CI: 0.783-0.901), SVM (0.834, 95% CI: 0.774-0.894), and ANN (0.890, 95% CI: 0.836-0.944), but did not differ significantly from that of RF (0.926, 95% CI: 0.879-0.974) and XGBoost (0.930, 95% CI: 0.878-0.982). In addition, the LightGBM model had the best logarithmic loss (0.218), accuracy (91.3%), specificity (94.1%), precision (0.695), and F1 score (0.725). Feature importance analysis indicated that the contribution rates of meteorological conditions and air pollutants to the prediction were 32% and 43%, respectively. Conclusion: This study suggests that ensemble learning models, especially the LightGBM model, can effectively predict peak events of CVD admissions and could therefore be a very useful decision-making tool for medical resource management.
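Logarithmic loss, one of the metrics used above, penalizes confident but wrong probability estimates; lower values are better, which is why 0.218 is reported as the optimum. A minimal sketch, assuming binary labels and predicted probabilities for the peak-demand class; the clipping constant `eps` is an assumption that avoids `log(0)`.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted P(y = 1).
    Probabilities are clipped to [eps, 1 - eps] so log() never sees 0."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

`log_loss([1, 0], [0.5, 0.5])` returns ln 2 ≈ 0.693, the loss of an uninformative model, while `log_loss([1, 0], [0.9, 0.1])` returns -ln 0.9 ≈ 0.105.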

Performance of Statistical and Machine Learning-Based Methods for Predicting Biogeographical Patterns of Fungal Productivity in Forest Ecosystems

10.21203/rs.3.rs-122045/v1 ◽

2020 ◽

Author(s):

Albert Morera ◽

Juan Martínez de Aragón ◽

José Antonio Bonet ◽

Jingjing Liang ◽

Sergio de-Miguel

Keyword(s):

Machine Learning ◽

Random Forest ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models ◽

Modelling Approaches

Background: The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations, and non-linear dependences requires advanced analytical methods and modelling tools. This study compares different statistical and machine learning models for predicting biogeographical patterns of fungal productivity, as a case study for thoroughly assessing how well alternative modelling approaches provide accurate and ecologically consistent predictions. Methods: We evaluated and compared the performance of two statistical modelling techniques, namely generalized linear mixed models and geographically weighted regression, and four machine learning models, namely random forest, extreme gradient boosting, support vector machine, and deep learning, to predict fungal productivity. We used a systematic methodology based on substitution, random, spatial, and climatic blocking combined with principal component analysis, together with an evaluation of the ecological consistency of spatially explicit model predictions. Results: Fungal productivity predictions were sensitive to the modelling approach and complexity. Moreover, the importance assigned to different predictors varied between machine learning modelling approaches. Decision tree-based models increased prediction accuracy by ~7% compared to other machine learning approaches and by more than 25% compared to statistical ones, and resulted in higher ecological consistency at the landscape level. Conclusions: Whereas a large number of predictors are often used in machine learning algorithms, this study shows that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. When dealing with spatio-temporal data in the analysis of biogeographical patterns, climatic blocking is proposed as a highly informative technique to be used in cross-validation to assess the prediction error over larger scales. Random forest was the best approach for prediction both in sampling-like environments and in extrapolation beyond the spatial and climatic range of the modelling data.
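Spatial and climatic blocking, as used above for cross-validation, keep all samples from one block (for example a spatial tile or a climate cluster) on the same side of every train/test split, so the test error reflects extrapolation to unseen areas. A minimal sketch, assuming block identifiers are already assigned; how the blocks are built (for instance, by clustering principal-component scores of climate variables) and the round-robin assignment of blocks to folds are assumptions, not the authors' procedure.

```python
from collections import defaultdict


def blocked_folds(block_ids, n_folds):
    """Assign whole blocks to folds so that no block contributes samples to
    both the training and the test set of the same fold.
    Returns a list of (train_indices, test_indices) tuples."""
    members = defaultdict(list)
    for idx, block in enumerate(block_ids):
        members[block].append(idx)
    blocks = sorted(members)  # deterministic block order
    folds = []
    for f in range(n_folds):
        test_blocks = set(blocks[f::n_folds])  # round-robin over blocks
        test = [i for b in test_blocks for i in members[b]]
        train = [i for b in blocks if b not in test_blocks for i in members[b]]
        folds.append((sorted(train), sorted(test)))
    return folds
```

With blocks `["A", "A", "B", "C", "C", "D"]` and two folds, the first fold tests on samples 0, 1, 3 and 4 (blocks A and C) and trains on samples 2 and 5. Random K-fold splitting would instead let neighbouring, highly correlated samples fall on both sides of the split, which tends to make the error estimate optimistic for extrapolation.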

Predicting Breast Cancer: A Comparative Analysis of Machine Learning Algorithms

Proceeding International Conference on Science and Engineering ◽

10.14421/icse.v3.545 ◽

2020 ◽

Vol 3 ◽

pp. 455-459

Author(s):

Pulung Hendro Prastyo ◽

I Gede Yudi Paramartha ◽

Michael S. Moses Pakpahan ◽

Igi Ardiyanto

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Confusion Matrix ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbors ◽

Common Cancer

Breast cancer is the most common cancer among women (43.3 cases per 100,000 women) and has the highest mortality (14.3 deaths per 100,000 women). Early detection is critical for survival. Using machine learning approaches, the problem can be effectively classified, predicted, and analyzed. In this study, we compared eight machine learning algorithms: Gaussian Naïve Bayes (GNB), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Random Forest (RF), AdaBoost, Gradient Boosting (GB), XGBoost, and Multi-Layer Perceptron (MLP). The experiment was conducted on the Breast Cancer Wisconsin dataset, using a confusion matrix and 5-fold cross-validation for evaluation. Experimental results showed that XGBoost provides the best performance, with an accuracy of 97.19%, recall of 96.75%, precision of 97.28%, F1-score of 96.99%, and AUC of 99.61%. Our results show that XGBoost is the most effective method for predicting breast cancer on the Breast Cancer Wisconsin dataset.
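The 5-fold cross-validation mentioned above can be sketched with stratified folds, which keep the benign/malignant ratio roughly equal in every fold. The abstract does not say whether the folds were stratified; stratification and the fixed shuffling seed are assumptions here.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k test folds that each keep roughly the
    overall class ratio: every class is shuffled, then dealt round-robin."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)  # continue dealing where the last class stopped
    return [sorted(f) for f in folds]


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    for test in stratified_kfold(labels, k, seed):
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test
```

With 10 negatives and 5 positives, each of the five test folds receives two negatives and one positive. A model would then be trained on `train` and evaluated on `test` once per fold, with the confusion-matrix metrics averaged across folds.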

Applying Economic Measures to Lapse Risk Management with Machine Learning Approaches

Astin Bulletin ◽

10.1017/asb.2021.10 ◽

2021 ◽

pp. 1-33

Author(s):

Stéphane Loisel ◽

Pierrick Piette ◽

Cheng-Hsien Jason Tsai

Keyword(s):

Machine Learning ◽

Risk Management ◽

Regression Tree ◽

Classification Problem ◽

Point of View ◽

Classification and Regression Tree ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Extreme Gradient Boosting

Modeling policyholders’ lapse behavior is important to a life insurer, since lapses affect pricing, reserving, profitability, liquidity, risk management, and the solvency of the insurer. In this paper, we apply two machine learning methods to lapse modeling. We then evaluate the performance of these two methods, along with two popular statistical methods, in terms of statistical accuracy and a profitability measure. Moreover, we adopt an innovative point of view on the lapse prediction problem that comes from churn management: we transform the classification problem into a regression problem and then perform optimization, which is new to lapse risk management. We apply the four methods to a large real-world insurance dataset. The results show that Extreme Gradient Boosting (XGBoost) and support vector machine outperform logistic regression (LR) and classification and regression tree with respect to statistical accuracy, while LR performs as well as XGBoost in terms of retention gains. This highlights the importance of a proper validation metric when comparing different methods. The optimization after the transformation brings significant and consistent increases in economic gains. Therefore, the insurer should optimize its economic objective to achieve optimal lapse management.
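The abstract does not give the exact formulation of the optimization step, but the underlying idea of ranking policyholders by an economic objective instead of by predicted lapse class can be illustrated. In this hypothetical sketch, a retention incentive costs `incentive_cost`, prevents a lapse with probability `retention_rate`, and saves the policy's `value`; all three quantities and the gain formula are assumptions for illustration only.

```python
def expected_gain(p_lapse, value, incentive_cost, retention_rate):
    """Hypothetical expected gain of offering an incentive to one policyholder:
    with probability p_lapse the policy would lapse, the incentive prevents
    that with probability retention_rate (saving `value`), and it always
    costs incentive_cost once offered."""
    return p_lapse * retention_rate * value - incentive_cost


def select_targets(policies, incentive_cost, retention_rate):
    """policies: list of (policy_id, p_lapse, value).
    Return ids with positive expected gain, highest gain first."""
    scored = [
        (expected_gain(p, v, incentive_cost, retention_rate), pid)
        for pid, p, v in policies
    ]
    return [pid for gain, pid in sorted(scored, reverse=True) if gain > 0]
```

With an incentive cost of 10 and a retention rate of 0.5, a policy with lapse probability 0.1 but value 1000 (expected gain 40) is ranked above one with lapse probability 0.8 and value 100 (gain 30), and a low-value policy with negative gain is not targeted at all. An accuracy-oriented classifier would order these cases differently, which mirrors the abstract's point about choosing the validation metric.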
