Comparison of Regression and Classification Models for User-Independent and Personal Stress Detection

Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4402
Author(s):  
Pekka Siirtola ◽  
Juha Röning

In this article, regression and classification models are compared for stress detection. Both personal and user-independent models are studied. The article is based on a publicly available dataset called AffectiveROAD, which contains data gathered using the Empatica E4 sensor and which, unlike most other stress detection datasets, contains continuous target variables. The classification model used is a Random Forest, and the regression model is a bagged-tree-based ensemble. Based on the experiments, regression models outperform classification models when classifying observations as stressed or not-stressed. The best user-independent results are obtained using a combination of blood volume pulse and skin temperature features; with these, the average balanced accuracy was 74.1% for the classification model and 82.3% for the regression model. In addition, regression models can be used to estimate the level of stress. However, the results based on models trained using personal data are not encouraging, showing that biosignals vary greatly not only between study subjects but also between sessions gathered from the same person. On the other hand, it is shown that with subject-wise feature selection for the user-independent model, recognition models can be improved more than by using personal training data to build personal models. In fact, with subject-wise feature selection, the average detection rate can be improved by as much as 4 percentage points, and the technique is especially useful for reducing the variance in recognition rates between study subjects.
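The core comparison in this abstract, using a regression model as a classifier by thresholding its continuous output, can be sketched as follows. This is an illustrative stand-in, not the authors' code: the synthetic features replace the AffectiveROAD biosignals, and a default bagged-tree ensemble replaces their exact regressor.

```python
# Compare a Random Forest classifier against a bagged-tree regressor whose
# continuous stress estimate is thresholded into stressed / not-stressed.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, BaggingRegressor
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))  # stand-in biosignal features
stress = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600)
y_cont = (stress - stress.min()) / np.ptp(stress)  # continuous target in [0, 1]
y_cls = (y_cont > 0.5).astype(int)                 # binarised target

X_tr, X_te, yc_tr, yc_te, yb_tr, yb_te = train_test_split(
    X, y_cont, y_cls, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, yb_tr)
reg = BaggingRegressor(random_state=0).fit(X_tr, yc_tr)  # bagged trees by default

acc_clf = balanced_accuracy_score(yb_te, clf.predict(X_te))
acc_reg = balanced_accuracy_score(yb_te, (reg.predict(X_te) > 0.5).astype(int))
print(f"classifier: {acc_clf:.3f}  regressor-as-classifier: {acc_reg:.3f}")
```

Both models are scored on the same binary labels, so their balanced accuracies are directly comparable, mirroring the 74.1% vs. 82.3% comparison reported above.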

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5483
Author(s):  
Monika Chuchro ◽  
Wojciech Sarlej ◽  
Marta Grzegorczyk ◽  
Karolina Nurzyńska

The study was undertaken in Krakow, which is situated in Lesser Poland Voivodeship, where bad PM10 air-quality indicators occurred on more than 100 days in the years 2010–2019. Krakow has continuous air quality measurement in seven locations that are run by the Province Environmental Protection Inspectorate. The research aimed to create regression and classification models for PM10 and PM2.5 estimation based on sky photos and basic weather data. For this research, one short video with a resolution of 1920 × 1080 px was captured each day. From each film, only five frames were used, and the information from them was averaged. Then, texture analysis was performed on each averaged photo frame, and the resulting texture features were used in the regression and classification models. The regression models' quality on the test datasets equals 0.85 and 0.73 for PM10 and 0.63 for PM2.5. The classification models' quality values are 0.86 and 0.73 for PM10, and 0.80 for PM2.5. The obtained results show that the created classification models could be used in PM10 and PM2.5 air quality assessment. Moreover, the character of the obtained regression models indicates that their quality could be enhanced, and thus improved results could be obtained.


2017 ◽  
Vol 7 (1) ◽  
pp. 1-7 ◽  
Author(s):  
SUNDARAM N

In this paper, an attempt is made to model censored survival data using Bayesian regression with Markov Chain Monte Carlo (MCMC) methods. The Bayesian LogNormal (LN) regression model is found to provide a better fit than the other Bayesian regression models considered, namely the Exponential (E), Generalized Exponential (GE), Weibull (W), LogLogistic (LL), and Gamma (G) models.
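The likelihood structure behind such models, density contributions for observed failure times and survival-function contributions for right-censored ones, can be illustrated with a minimal hand-rolled Metropolis sampler for the LogNormal case. This is a sketch with flat priors and simulated data; the paper does not specify its MCMC machinery, so none of these details are the authors'.

```python
# Bayesian LogNormal survival model for right-censored data via a minimal
# random-walk Metropolis sampler (flat priors; illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mu, true_sigma = 1.0, 0.5
t = rng.lognormal(true_mu, true_sigma, size=200)  # latent survival times
c = rng.uniform(2, 8, size=200)                   # censoring times
obs = np.minimum(t, c)
event = t <= c                                    # True = observed, False = censored

def log_post(mu, log_sigma):
    d = stats.lognorm(s=np.exp(log_sigma), scale=np.exp(mu))
    # observed events contribute the density, censored ones the survival function
    return d.logpdf(obs[event]).sum() + d.logsf(obs[~event]).sum()

theta = np.array([0.0, 0.0])                      # (mu, log sigma)
lp = log_post(*theta)
samples = []
for i in range(4000):
    prop = theta + rng.normal(scale=0.05, size=2) # random-walk proposal
    lp_prop = log_post(*prop)
    if np.log(rng.uniform()) < lp_prop - lp:      # Metropolis accept/reject
        theta, lp = prop, lp_prop
    if i >= 1000:                                 # discard burn-in
        samples.append(theta.copy())

samples = np.array(samples)
mu_hat = samples[:, 0].mean()
sigma_hat = np.exp(samples[:, 1]).mean()
print(f"posterior means: mu≈{mu_hat:.2f}, sigma≈{sigma_hat:.2f}")
```

Swapping the LogNormal for Weibull, Gamma, or the other candidate distributions only changes the `logpdf`/`logsf` terms, which is what makes a model comparison like the one above straightforward.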


2021 ◽  
Vol 2139 (1) ◽  
pp. 012001
Author(s):  
J D Arango ◽  
V H Aristizabal ◽  
J F Carrasquilla ◽  
J A Gomez ◽  
J C Quijano ◽  
...  

Abstract Fiber optic specklegram sensors use the modal interference pattern (or specklegram) to determine the magnitude of a disturbance. The most commonly used interrogation methods for these sensors have focused on point measurements of intensity or correlations between specklegrams, with limitations in sensitivity and useful measurement range. To investigate alternative specklegram interrogation methods that improve the performance of fiber specklegram sensors, we implemented and compared two deep learning models: a classification model and a regression model. To train and test the models, we used physical-optical models and finite element simulations to create a database of specklegram images covering the temperature range between 0 °C and 100 °C. With the prediction tests, we showed that both models can cover the entire proposed temperature range, achieving an accuracy of 99.5% for the classification model and a mean absolute error of 2.3 °C for the regression model. We believe these results show that the strategies implemented can improve the metrological capabilities of this type of sensor.


Author(s):  
Kehan Gao ◽  
Taghi M. Khoshgoftaar

In the process of software defect prediction, a classification model is first built using software metrics and fault data gathered from a past software development project, then that model is applied to data in a similar project or a new release of the same project to predict new program modules as either fault-prone (fp) or not-fault-prone (nfp). The benefit of such a model is to facilitate the optimal use of limited financial and human resources for software testing and inspection. The predictive power of a classification model constructed from a given data set is affected by many factors. In this paper, we are interested in two problems that often arise in software measurement data: high dimensionality and unequal example set size for the two types of modules (e.g., many more nfp modules than fp modules in a data set). These directly result in longer learning times and a decline in the predictive performance of classification models. We consider using data sampling followed by feature selection (FS) to deal with these problems. Six data sampling strategies (three sampling techniques, each paired with two post-sampling proportion ratios) and six commonly used feature ranking approaches are employed in this study. We evaluate the FS techniques by means of: (1) a general method, i.e., assessing the classification performance after the training data is modified, and (2) studying the stability of an FS method, specifically with the goal of understanding the effect of data sampling techniques on the stability of FS when using the sampled data. The experiments were performed on nine data sets from a real-world software project. The results demonstrate that the FS techniques that most enhance the models' classification performance do not also show the best stability, and vice versa. In addition, classification performance is affected more by the sampling techniques themselves than by the post-sampling proportions, whereas the opposite holds for stability.
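The "sampling followed by feature selection" pipeline described here can be sketched in a few lines. The data is synthetic, random undersampling stands in for the three sampling techniques, and mutual information stands in for the six feature rankers; none of these specific choices are from the paper.

```python
# Balance fp/nfp modules by random undersampling, then rank features on the
# balanced sample with a filter-style scorer (mutual information).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=30, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)  # ~10% fp modules

# random undersampling of the majority (nfp) class to a 50:50 ratio
rng = np.random.default_rng(0)
fp_idx = np.flatnonzero(y == 1)
nfp_idx = rng.choice(np.flatnonzero(y == 0), size=fp_idx.size, replace=False)
keep = np.concatenate([fp_idx, nfp_idx])
X_bal, y_bal = X[keep], y[keep]

# rank features on the balanced sample and keep the top k
scores = mutual_info_classif(X_bal, y_bal, random_state=0)
top_k = np.argsort(scores)[::-1][:8]
print("selected features:", sorted(top_k.tolist()))
```

Running the ranker several times on different undersampled draws, and comparing the resulting feature subsets, is one simple way to probe the FS stability question the study raises.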


Author(s):  
Yinghui Yang ◽  
Balaji Padmanabhan

Classification is a form of data analysis that can be used to extract models to predict categorical class labels (Han & Kamber, 2001). Data classification has proven to be very useful in a wide variety of applications. For example, a classification model can be built to categorize bank loan applications as either safe or risky. In order to build a classification model, training data containing multiple independent variables and a dependent variable (the class label) is needed. If a data record has a known value for its class label, the record is termed "labeled"; if the value is unknown, it is "unlabeled". There are situations with a large amount of unlabeled data and a small amount of labeled data. Using only labeled data to build classification models can ignore useful information contained in the unlabeled data. Furthermore, unlabeled data is often much cheaper and more plentiful than labeled data, so if useful information can be extracted from it that reduces the need for labeled examples, this can be a significant benefit (Balcan & Blum 2005). The default practice is to use only the labeled data to build a classification model and then assign class labels to the unlabeled data. However, when the amount of labeled data is insufficient, a classification model built only from the labeled data can be biased and far from accurate, and the class labels it assigns to the unlabeled data can then be inaccurate. How to leverage the information contained in the unlabeled data to help improve the accuracy of the classification model is an important research question. There are two streams of research that address the challenging issue of how to appropriately use unlabeled data for building classification models. The details are discussed below.
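One concrete way to exploit unlabeled data is self-training, in which a model fitted on the labeled records iteratively labels its most confident unlabeled records and refits. The sketch below uses scikit-learn's implementation on synthetic data; it illustrates the general idea, not the specific research streams the article goes on to survey.

```python
# Self-training: 50 labeled + 450 unlabeled records (marked with -1).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y_partial = y.copy()
rng = np.random.default_rng(0)
unlabeled = rng.choice(500, size=450, replace=False)
y_partial[unlabeled] = -1  # scikit-learn's marker for "unlabeled"

base = LogisticRegression(max_iter=1000)
model = SelfTrainingClassifier(base, threshold=0.9).fit(X, y_partial)
acc = model.score(X, y)
print(f"accuracy with 50 labels + 450 unlabeled: {acc:.2f}")
```

The `threshold` parameter controls how confident a prediction must be before an unlabeled record is pseudo-labeled, which is the main lever against the bias problem described above.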


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3723 ◽  
Author(s):  
Jacob Thorson ◽  
Ashley Collier-Oxandale ◽  
Michael Hannigan

An array of low-cost sensors was assembled and tested in a chamber environment wherein several pollutant mixtures were generated. The four classes of sources that were simulated were mobile emissions, biomass burning, natural gas emissions, and gasoline vapors. A two-step regression and classification method was developed and applied to the sensor data from this array. We first applied regression models to estimate the concentrations of several compounds and then trained classification models to use those estimates to identify the presence of each of those sources. The regression models that were used included forms of multiple linear regression, random forests, Gaussian process regression, and neural networks. The regression models with human-interpretable outputs were investigated to understand the utility of each sensor signal. The classification models that were trained included logistic regression, random forests, support vector machines, and neural networks. The best combination of models was determined by maximizing the F1 score on ten-fold cross-validation data. The highest F1 score, as calculated on testing data, was 0.72 and was produced by the combination of a multiple linear regression model utilizing the full array of sensors and a random forest classification model.
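The winning two-step combination, multiple linear regression feeding a random forest classifier, can be sketched as below. Synthetic data replaces the chamber measurements, and the compound/sensor dimensions are invented for illustration.

```python
# Step 1: regress compound concentrations from raw sensor signals.
# Step 2: classify the pollution source from the concentration estimates.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 800
source = rng.integers(0, 4, size=n)  # 4 source classes
conc = np.eye(4)[source] * rng.uniform(1, 5, size=(n, 1))  # per-class concentrations
signals = conc @ rng.normal(size=(4, 12)) + rng.normal(scale=0.1, size=(n, 12))

S_tr, S_te, c_tr, c_te, y_tr, y_te = train_test_split(
    signals, conc, source, test_size=0.25, random_state=0)

reg = LinearRegression().fit(S_tr, c_tr)  # step 1: signals -> concentrations
clf = RandomForestClassifier(random_state=0).fit(reg.predict(S_tr), y_tr)  # step 2
acc = clf.score(reg.predict(S_te), y_te)
print(f"two-step source identification accuracy: {acc:.2f}")
```

Keeping the intermediate concentration estimates human-interpretable, as the study emphasizes, is a side benefit of splitting the pipeline this way instead of classifying directly from raw signals.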


2019 ◽  
Vol 20 (3) ◽  
pp. 274-309
Author(s):  
Agnese Maria Di Brisco ◽  
Sonia Migliorati ◽  
Andrea Ongaro

This article addresses the issue of building regression models for bounded responses, which are robust in the presence of outliers. To this end, a new distribution on (0,1) and a regression model based on it are proposed and some properties are derived. The distribution is a mixture of two beta components. One of them, showing a higher variance (variance inflated) is expected to capture outliers. Within a Bayesian approach, an extensive robustness study is performed to compare the new model with three competing ones present in the literature. A broad range of inferential tools are considered, aimed at measuring the influence of various outlier patterns from diverse perspectives. It emerges that the new model displays a better performance in terms of stability of regression coefficients’ posterior distributions and of regression curves under all outlier patterns. Moreover, it exhibits an adequate behaviour under all considered settings, unlike the other models.
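A plausible form of the two-component beta mixture described above can be written as follows (the notation is mine, not the article's): a shared mean μ, a precision φ, a mixing weight p, and an inflation factor k > 1 that lowers the second component's precision so it can absorb outliers.

```latex
% Mixture of a "regular" beta component and a variance-inflated one:
f(y \mid \mu, \phi, p, k)
  = p \,\mathrm{Beta}\!\bigl(y \mid \mu\phi,\ (1-\mu)\phi\bigr)
  + (1-p)\,\mathrm{Beta}\!\Bigl(y \mid \tfrac{\mu\phi}{k},\ \tfrac{(1-\mu)\phi}{k}\Bigr),
  \qquad 0 < y < 1 .
% The regression structure then links the common mean to covariates, e.g.
\operatorname{logit}(\mu_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta}.
```

Because both components share the mean μ, the inflated component changes only the tail behaviour, which is why the regression coefficients' posteriors stay stable when outliers appear.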


2018 ◽  
Vol 9 (3) ◽  
pp. 1-11
Author(s):  
Sanat Kumar Sahu ◽  
A. K. Shrivas

Feature selection plays a very important role in retrieving the relevant features from a dataset and computationally improves the performance of a model. The objective of this study is to evaluate the most important features of a chronic kidney disease (CKD) dataset and diagnose the CKD problem. In this research work, the authors used a genetic search with the Wrapper Subset Evaluator method for feature selection to increase the overall performance of the classification model. They also used Bayes Network, Classification and Regression Tree (CART), Radial Basis Function Network (RBFN) and J48 classifiers for classification of CKD and non-CKD data. The proposed genetic search based feature selection technique (GSBFST) selects the best features from the CKD dataset, and the performance of the classifiers is compared between the proposed and existing genetic search feature selection techniques (FSTs). All classification models give better results with the proposed GSBFST than without an FST or with existing genetic search FSTs.
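Wrapper-style genetic feature selection of the kind described here can be sketched with bit-strings encoding feature subsets and cross-validated accuracy as the fitness. The data is synthetic and the GA operators are deliberately simple; this is not the Wrapper Subset Evaluator's actual implementation.

```python
# Genetic feature selection: evolve bit-masks over 20 features, scoring each
# mask by 3-fold CV accuracy of a decision tree trained on the masked data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, 20)).astype(bool)   # 20 random subsets
for gen in range(10):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]            # elitist selection: keep top half
    children = []
    for _ in range(10):
        a, b = parents[rng.choice(10, 2, replace=False)]
        cut = rng.integers(1, 19)
        child = np.concatenate([a[:cut], b[cut:]])     # one-point crossover
        flip = rng.random(20) < 0.05                   # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(m) for m in pop])]
print(f"best subset uses {best.sum()} of 20 features")
```

Because the top half of each generation is carried over unchanged, the best subset found so far is never lost, a common safeguard in wrapper GAs.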


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Guangzhou An ◽  
Kazuko Omodaka ◽  
Satoru Tsuda ◽  
Yukihiro Shiga ◽  
Naoko Takada ◽  
...  

This study develops an objective machine-learning classification model for classifying glaucomatous optic discs and reveals the classificatory criteria to assist in clinical glaucoma management. In this study, 163 glaucoma eyes were labelled with four optic disc types by three glaucoma specialists and then randomly separated into training and test data. All the images of these eyes were captured using optical coherence tomography and laser speckle flowgraphy to quantify the ocular structure and blood-flow-related parameters. A total of 91 parameters were extracted from each eye along with the patients’ background information. Machine-learning classifiers, including the neural network (NN), naïve Bayes (NB), support vector machine (SVM), and gradient boosted decision trees (GBDT), were trained to build the classification models, and a hybrid feature selection method that combines minimum redundancy maximum relevance and genetic-algorithm-based feature selection was applied to find the most valid and relevant features for NN, NB, and SVM. A comparison of the performance of the three machine-learning classification models showed that the NN had the best classification performance with a validated accuracy of 87.8% using only nine ocular parameters. These selected quantified parameters enabled the trained NN to classify glaucomatous optic discs with relatively high performance without requiring color fundus images.


Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5207 ◽  
Author(s):  
Anton Gradišek ◽  
Marion van Midden ◽  
Matija Koterle ◽  
Vid Prezelj ◽  
Drago Strle ◽  
...  

We used a 16-channel e-nose demonstrator based on micro-capacitive sensors with functionalized surfaces to measure the response of 30 different sensors to the vapours from 11 different substances, including the explosives 1,3,5-trinitro-1,3,5-triazinane (RDX), 1-methyl-2,4-dinitrobenzene (DNT) and 2-methyl-1,3,5-trinitrobenzene (TNT). Classification models were developed using the Random Forest machine-learning algorithm and trained on a set of signals in which the concentration and flow of a selected single vapour were varied independently. We demonstrate that our classification models successfully recognize the signal patterns of different sets of substances. An excellent accuracy of 96% was achieved for identifying the explosives among the other substances. These experiments clearly demonstrate that the silane monolayers used in our sensors as receptor layers are particularly well suited to selecting and recognizing TNT and similar types of explosives from among other substances.
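The pattern-recognition step, a Random Forest identifying substances from a multi-channel response with concentration varied as a nuisance factor, can be sketched as below. The per-substance channel fingerprints are invented stand-ins for the real functionalised-surface responses.

```python
# Random Forest substance identification from a 30-channel sensor array,
# with vapour concentration varied independently of the substance identity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_classes, n_channels = 11, 30
patterns = rng.normal(size=(n_classes, n_channels))  # per-substance fingerprints

n = 1100
substance = rng.integers(0, n_classes, size=n)
concentration = rng.uniform(0.2, 1.0, size=(n, 1))   # nuisance factor
X = patterns[substance] * concentration + rng.normal(scale=0.1, size=(n, n_channels))

X_tr, X_te, y_tr, y_te = train_test_split(X, substance, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"substance identification accuracy: {acc:.2f}")
```

Training across a range of concentrations and flows, as the study does, is what forces the model to learn the channel *pattern* rather than the absolute signal magnitude.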

