Stacked Regression and Poststratification

Joseph T. Ornstein

doi:10.1017/pan.2019.43

Stacked Regression and Poststratification

Political Analysis ◽

10.1017/pan.2019.43 ◽

2019 ◽

Vol 28 (2) ◽

pp. 293-301

Author(s):

Joseph T. Ornstein

Keyword(s):

Public Opinion ◽

Nearest Neighbors ◽

Local Area ◽

Parametric Model ◽

Large Datasets ◽

Gradient Boosting ◽

Multilevel Regression ◽

K Nearest Neighbors ◽

Data Generating Process ◽

Diverse Ensemble

I develop a procedure for estimating local-area public opinion called stacked regression and poststratification (SRP), a generalization of classical multilevel regression and poststratification (MRP). This procedure employs a diverse ensemble of predictive models—including multilevel regression, LASSO, k-nearest neighbors, random forest, and gradient boosting—to improve the cross-validated fit of the first-stage predictions. In a Monte Carlo simulation, SRP significantly outperforms MRP when there are deep interactions in the data generating process, without requiring the researcher to specify a complex parametric model in advance. In an empirical application, I show that SRP produces superior local public opinion estimates on a broad range of issue areas, particularly when trained on large datasets.

Download Full-text

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric criseis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.

Download Full-text

Predicting Hospitalization following Psychiatric Crisis Care using Machine Learning

10.21203/rs.2.12338/v1 ◽

2019 ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Predictor Variables ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Psychiatric Crisis ◽

Crisis Care

Abstract Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized. Machine learning may be helpful to improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms including the commonly used generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared. We also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. Target variable for the prediction models was whether or not the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts. Results: We found Gradient Boosting to perform the best (AUC=0.774) and K-Nearest Neighbors performing the least (AUC=0.702). The performance of GLM/logistic regression (AUC=0.76) was above average among the tested algorithms. Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors in a Net Reclassification Improvement analysis, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider to combine multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing suboptimal performing algorithms.

Download Full-text

A semi-empirical approach using gradient boosting and k-nearest neighbors regression for GEFCom2014 probabilistic solar power forecasting

International Journal of Forecasting ◽

10.1016/j.ijforecast.2015.11.002 ◽

2016 ◽

Vol 32 (3) ◽

pp. 1081-1086 ◽

Cited By ~ 34

Author(s):

Jing Huang ◽

Matthew Perry

Keyword(s):

Solar Power ◽

Nearest Neighbors ◽

Empirical Approach ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Semi Empirical ◽

Solar Power Forecasting ◽

Power Forecasting

Download Full-text

Towards Optimization of Malware Detection using Chi-square Feature Selection on Ensemble Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d2359.0410421 ◽

2021 ◽

Vol 10 (4) ◽

pp. 254-262

Author(s):

*Fadare Oluwaseun Gbenga ◽

Adetunmbi Adebayo Olusola ◽

(Mrs) Oyinloye Oghenerukevwe Eloho ◽

Mogaji Stephen Alaba

Keyword(s):

Feature Selection ◽

Malware Detection ◽

Feature Selection Method ◽

Ensemble Methods ◽

Nearest Neighbors ◽

Selection Method ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Chi Square ◽

Extreme Gradient Boosting

The multiplication of malware variations is probably the greatest problem in PC security and the protection of information in form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection and ensemble technique has been established to be highly effective in terms of detection accuracy. This paper proposes a framework that combines combining the exploit of both Chi-square as the feature selection method and eight ensemble learning classifiers on five base learners- K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. K-Nearest Neighbors returns the highest accuracy of 95.37%, 87.89% on chi-square, and without feature selection respectively. Extreme Gradient Boosting Classifier ensemble accuracy is the highest with 97.407%, 91.72% with Chi-square as feature selection, and ensemble methods without feature selection respectively. Extreme Gradient Boosting Classifier and Random Forest are leading in the seven evaluative measures of chi-square as a feature selection method and ensemble methods without feature selection respectively. The study results show that the tree-based ensemble model is compelling for malware classification.

Download Full-text

Classification of Individual Finger Movements from Right Hand Using fNIRS Signals

Sensors ◽

10.3390/s21237943 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7943

Author(s):

Haroon Khan ◽

Farzan M. Noori ◽

Anis Yazidi ◽

Md Zia Uddin ◽

M. N. Afzal Khan ◽

...

Keyword(s):

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Finger Tapping ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbors ◽

Physiological Noise ◽

Individual Finger ◽

Extreme Gradient Boosting ◽

Depth Analysis

Functional near-infrared spectroscopy (fNIRS) is a comparatively new noninvasive, portable, and easy-to-use brain imaging modality. However, complicated dexterous tasks such as individual finger-tapping, particularly using one hand, have been not investigated using fNIRS technology. Twenty-four healthy volunteers participated in the individual finger-tapping experiment. Data were acquired from the motor cortex using sixteen sources and sixteen detectors. In this preliminary study, we applied standard fNIRS data processing pipeline, i.e. optical densities conversation, signal processing, feature extraction, and classification algorithm implementation. Physiological and non-physiological noise is removed using 4th order band-pass Butter-worth and 3rd order Savitzky–Golay filters. Eight spatial statistical features were selected: signal-mean, peak, minimum, Skewness, Kurtosis, variance, median, and peak-to-peak form data of oxygenated haemoglobin changes. Sophisticated machine learning algorithms were applied, such as support vector machine (SVM), random forests (RF), decision trees (DT), AdaBoost, quadratic discriminant analysis (QDA), Artificial neural networks (ANN), k-nearest neighbors (kNN), and extreme gradient boosting (XGBoost). The average classification accuracies achieved were 0.75±0.04, 0.75±0.05, and 0.77±0.06 using k-nearest neighbors (kNN), Random forest (RF) and XGBoost, respectively. KNN, RF and XGBoost classifiers performed exceptionally well on such a high-class problem. The results need to be further investigated. In the future, a more in-depth analysis of the signal in both temporal and spatial domains will be conducted to investigate the underlying facts. The accuracies achieved are promising results and could open up a new research direction leading to enrichment of control commands generation for fNIRS-based brain-computer interface applications.

Download Full-text

Comparación del nivel de precisión de los clasificadores Support Vector Machines, k Nearest Neighbors, Random Forests, Extra Trees y Gradient Boosting en el reconocimiento de actividades infantiles utilizando sonido ambiental

Research in Computing Science ◽

10.13053/rcs-147-5-21 ◽

2018 ◽

Vol 147 (5) ◽

pp. 281-290 ◽

Cited By ~ 1

Author(s):

Diego M. Blanco-Murillo ◽

Antonio García-Domínguez ◽

Carlos E. Galván-Tejada ◽

José M. Celaya-Padilla

Keyword(s):

Support Vector Machines ◽

Random Forests ◽

Nearest Neighbors ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbors ◽

Vector Machines

Download Full-text

Food Detection Using Histogram of Oriented Gradient (HOG) as Feature Extraction and K-Nearest Neighbors (K-NN) as Classifier

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/3191.52020 ◽

2020 ◽

Vol 9 (1.5) ◽

pp. 219-225

Author(s):

Diah Rahmadani

Keyword(s):

Feature Extraction ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Histogram Of Oriented Gradient ◽

Food Detection

Download Full-text

The Implementation of Subspace Outlier Detection in K-Nearest Neighbors to Improve Accuracy in Bank Marketing Data

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2020/44822020 ◽

2020 ◽

Vol 8 (2) ◽

pp. 545-550

Author(s):

Dimas Aryo Anggoro

Keyword(s):

Outlier Detection ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Improve Accuracy ◽

Marketing Data ◽

Bank Marketing

Download Full-text

Evolutionary Feature Scaling in K-Nearest Neighbors Based on Label Dispersion Minimization

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/smc42975.2020.9282834 ◽

2020 ◽

Author(s):

Suryoday Basak ◽

Manfred Huber

Keyword(s):

Nearest Neighbors ◽

K Nearest Neighbors ◽

Feature Scaling

Download Full-text

Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics ◽

10.3390/math9070779 ◽

2021 ◽

Vol 9 (7) ◽

pp. 779

Author(s):

Ruriko Yoshida

Keyword(s):

Supervised Learning ◽

Phylogenetic Trees ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

High Dimensional ◽

Learning Method ◽

Dimensional Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.

Download Full-text