Improved Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)

Is the ‘Local Candidate’ Advantage a Myth? Analysing the Effects of Localism in the 2015 UK General Election

10.31235/osf.io/6a3nt ◽

2016 ◽

Author(s):

Javier Sajuria

Keyword(s):

Mediation Analysis ◽

General Election ◽

Data Sources ◽

Election Study ◽

Multilevel Regression ◽

Factual Information ◽

Diverse Range ◽

Local Candidate ◽

Post Stratification

This paper studies the question of the so-called electoral advantage of local candidates. We use a diverse range of data sources to estimate whether a candidate residing in the same constituency in which they compete has any advantage at all. We then compare the effect of the factual information against a measure of perception of residence, taken from the British Election Study Internet Panel. We propose several methodological innovations relative to traditional analyses of this issue. We first concentrate on the top two candidates of the most competitive constituencies, and use a measurement of perception calculated using Multilevel Regression with Post-stratification. We use mediation analysis to estimate the overall effects. Our findings show that local candidates have an advantage only if they are perceived as local, and that incumbents are usually perceived as more local than challengers.

An Introduction to Multilevel Regression and Post-Stratification for Estimating Constituency Opinion

Political Studies Review ◽

10.1177/1478929919864773 ◽

2019 ◽

Vol 18 (4) ◽

pp. 630-645 ◽

Cited By ~ 1

Author(s):

Chris Hanretty

Keyword(s):

Data Sources ◽

Multilevel Regression ◽

Small Areas ◽

Worked Example ◽

The UK ◽

Post Stratification

This article provides an overview of multilevel regression and post-stratification. It reviews the stages in estimating opinion for small areas, identifies circumstances in which multilevel regression and post-stratification can go wrong, or go right, and provides a worked example for the UK using publicly available data sources and a previously published post-stratification frame.
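The post-stratification stage mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a model (multilevel regression or any other learner) has already produced a predicted proportion for each demographic cell in each area, and that the post-stratification frame supplies a population count per cell. The function name `poststratify`, the cell labels, and all numbers are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def poststratify(cell_predictions, cell_counts):
    """Population-weighted average of cell-level predictions, per area.

    cell_predictions: {(area, cell): predicted proportion in [0, 1]}
    cell_counts:      {(area, cell): population count from the frame}
    """
    weighted = defaultdict(float)  # sum of prediction * count, per area
    totals = defaultdict(float)    # sum of counts, per area
    for key, count in cell_counts.items():
        area, _cell = key
        weighted[area] += cell_predictions[key] * count
        totals[area] += count
    # Areas with zero total population are skipped to avoid division by zero.
    return {a: weighted[a] / totals[a] for a in totals if totals[a] > 0}


# Hypothetical toy inputs: two areas, two age cells each.
predictions = {
    ("A", "18-34"): 0.60, ("A", "35+"): 0.40,
    ("B", "18-34"): 0.50, ("B", "35+"): 0.30,
}
counts = {
    ("A", "18-34"): 1000, ("A", "35+"): 3000,
    ("B", "18-34"): 2000, ("B", "35+"): 2000,
}
estimates = poststratify(predictions, counts)
```

The sketch covers only the weighting step. Fitting the model that produces the cell-level predictions, and building the frame itself, are separate stages.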

Polling India via regression and post-stratification of non-probability online samples

PLoS ONE ◽

10.1371/journal.pone.0260092 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0260092

Author(s):

Roberto Cerina ◽

Raymond Duch

Keyword(s):

Large Scale ◽

Indian Population ◽

Absolute Error ◽

Training Data ◽

Simultaneous Estimation ◽

Multilevel Regression ◽

Integration Approach ◽

Technological Advances ◽

Core Components ◽

Post Stratification

Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these, we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modified MrP model of Indian vote preferences that proposes innovations to each of its three core components: stratification frame, training data, and a learner. For the post-stratification frame we propose a novel Data Integration approach that allows the simultaneous estimation of counts from multiple complementary sources, such as census tables and auxiliary surveys. For the training data we assemble panels of respondents from two unorthodox online populations: Amazon Mechanical Turk workers and Facebook users. And as a modeling tool, we replace the Bayesian multilevel regression learner with Random Forests. Our 2019 pre-election forecasts for the two largest Lok Sabha coalitions were very close to actual outcomes: we predicted 41.8% for the NDA, against an observed value of 45.0%, and 30.8% for the UPA, against an observed vote share of just under 31.3%. Our uniform-swing seat projection outperforms other pollsters: we had the lowest absolute error of 89 seats (along with a poll from ‘Jan Ki Baat’), the lowest error on the NDA-UPA lead (a mere 8 seats), and we are the only pollster that can capture real-time preference shifts due to salient campaign events.

Estimating Bycatch From Non-representative Samples (II): A Case Study of Pair Trawlers and Common Dolphins in the Bay of Biscay

Frontiers in Marine Science ◽

10.3389/fmars.2021.795942 ◽

2022 ◽

Vol 8 ◽

Author(s):

Etienne Rouby ◽

Laurent Dubroca ◽

Thomas Cloâtre ◽

Sebastien Demanèche ◽

Mathieu Genu ◽

...

Keyword(s):

Current Knowledge ◽

Anthropogenic Activities ◽

Mitigation Measures ◽

Common Dolphin ◽

Bay Of Biscay ◽

Multilevel Regression ◽

North East ◽

Wide Range ◽

Common Dolphins ◽

Post Stratification

Marine megafauna play an important functional role in marine ecosystems as top predators but are threatened by a wide range of anthropogenic activities. Bycatch, the incidental capture of non-targeted species in commercial and recreational fisheries, is of particular concern for small cetacean species, such as dolphins and porpoises. In the North-East Atlantic, common dolphin (Delphinus delphis, Linné 1758) bycatch has been increasing and associated with large numbers of animals stranding during winter on the French Atlantic seashore since at least 2017. However, uncertainties around the true magnitude of common dolphin bycatch and the fisheries involved have led to delays in the implementation of mitigation measures. Current data collection on dolphin bycatch in France relies on non-dedicated observers deployed on vessels for the purpose of national fisheries sampling programmes. These data cannot be assumed representative of the whole fisheries' bycatch events. This feature makes it difficult to use classic ratio estimators since they require a truly randomised sample of the fishery by dedicated observers. We applied a newly developed approach, regularised multilevel regression with post-stratification, to estimate total bycatch from unrepresentative samples and total fishing effort. The latter is needed for post-stratification and the former is analysed in a Bayesian framework with multilevel regression to regularise and better predict bycatch risk. We estimated the number of bycaught dolphins for each week and 10 International Council for the Exploration of the Sea (ICES) divisions from 2004 to 2020 by estimating jointly bycatch risk, haul duration, and the number of hauls per day at sea (DaS). Bycatch risk in pair trawlers flying the French flag was the highest in winter 2017 and 2019 and was associated with the longest haul durations. ICES divisions 8.a and 8.b (shelf part of the Bay of Biscay) were estimated to have the highest common dolphin bycatch.
Our results were consistent with independent estimates of common dolphin bycatch from strandings. Our method showcases how non-representative observer data can nevertheless be analysed to estimate fishing duration, bycatch risk and, ultimately, the number of bycaught dolphins. These weekly estimates improve upon current knowledge of the nature of common dolphin bycatch and can be used to inform management and policy decisions at a finer spatio-temporal scale than has been possible to date. Our results suggest that limiting haul duration, especially in winter, could serve as an effective mitigation strategy.

BARP: Improving Mister P Using Bayesian Additive Regression Trees

American Political Science Review ◽

10.1017/s0003055419000480 ◽

2019 ◽

Vol 113 (4) ◽

pp. 1060-1065 ◽

Cited By ~ 1

Author(s):

James Bisbee

Keyword(s):

Multilevel Model ◽

Regression Trees ◽

R Package ◽

Substantial Improvement ◽

Regularization Methods ◽

Multilevel Regression ◽

Nonparametric Approach ◽

Additive Regression ◽

Post Stratification ◽

Bayesian Additive Regression Trees

Multilevel regression and post-stratification (MRP) is the current gold standard for extrapolating opinion data from nationally representative surveys to smaller geographic units. However, innovations in nonparametric regularization methods can further improve the researcher’s ability to extrapolate opinion data to a geographic unit of interest. I test an ensemble of regularization algorithms and find that there is room for substantial improvement on the multilevel model via more flexible methods of regularization. I propose a modified version of MRP that replaces the multilevel model with a nonparametric approach called Bayesian additive regression trees (BART or, when combined with post-stratification, BARP). I compare both methods across a number of data contexts, demonstrating the benefits of applying more powerful regularization methods to extrapolate opinion data to target geographical units. I provide an R package that implements the BARP method.

Devotion at Sub-National Level: Ramadan, Nighttime Lights, and Religiosity in the Egyptian Governorates

International Journal of Public Opinion Research ◽

10.1093/ijpor/edaa019 ◽

2020 ◽

Author(s):

Sabri Ciftci ◽

Michael Robbins ◽

Sofya Zaytseva

Keyword(s):

Survey Data ◽

Satellite Imagery ◽

National Level ◽

Multilevel Regression ◽

Nighttime Lights ◽

Post Stratification

This study aims to construct reliable measures of religiosity and to cross-validate survey-based measures in operationalization of this central variable. We obtain measures of sub-national religiosity in the Egyptian governorates from the Arab Barometer surveys using disaggregation and multilevel regression and post-stratification techniques. Then, we use satellite imagery to compare these measures to the intensity of nighttime lights during the holy month of Ramadan. The analysis reveals that survey data at the sub-national level, although not designed to be fully representative, can provide approximate measures when aggregated. These findings contribute to scholarship by introducing a novel measure of religiosity based on nighttime activity during Ramadan and by cross-validating the reliability of survey-based measures of aggregate religiosity.

Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences ◽

10.1017/s0140525x1900311x ◽

2020 ◽

Vol 43 ◽

Author(s):

Myrthe Faber

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Mental Content ◽

Mind Wandering ◽

Theoretical Framework ◽

Important Addition

Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.
