Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data

2016 ◽  
Vol 24 (1) ◽  
pp. 87-103 ◽  
Author(s):  
David Muchlinski ◽  
David Siroky ◽  
Jingrui He ◽  
Matthew Kocher

The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, even in their rare event and regularized forms, perform poorly at prediction. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare events logistic regression, and L1-regularized logistic regression), and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can help predict rare events in conflict data more accurately.

2004 ◽  
Vol 42 (2) ◽  
pp. 494-500 ◽  
Author(s):  
Robert H Bates

Stephen Haber et al. explore economic growth in key sectors of the Mexican economy from 1876 to 1929, an era of political instability and, in 1914–17, civil war. The authors demonstrate that economic growth continued amid political instability and offer an explanation for this counterintuitive finding. Reviewing the evidence the authors advance, Robert Bates summarizes and comments on their argument and applies it to “out of sample” data from Africa.


2012 ◽  
Vol 12 (6) ◽  
pp. 1937-1947 ◽  
Author(s):  
M. Guns ◽  
V. Vanacker

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that ordinary rare event logistic regression, as now commonly used in geomorphologic studies, does not always lead to robust detection of controlling factors, because the results can be strongly sample-dependent. In this paper, we introduce concepts from Monte Carlo simulation into rare event logistic regression. This technique, called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods and overcomes some of the limitations of previous developments through robust variable selection. The technique was developed here to analyze the factors controlling landslides, but the concept is widely applicable to statistical analyses of natural hazards.
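The replication idea can be sketched under stated assumptions: each Monte Carlo replication keeps every event, draws a fresh random set of non-events, refits an ordinary logistic regression, and records the sign of each coefficient. A predictor whose sign flips across replications is sample-dependent in the sense described above. The subsampling ratio, the gradient-ascent fitter, and the sign-stability summary (`replicated_signs`, `fit_logit`) are hypothetical choices; the published method's variable-selection criteria are not reproduced here.

```python
import math
import random

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logit(X, y, lr=0.5, iters=1000):
    """Plain maximum-likelihood logistic regression by batch gradient ascent.

    X rows include a leading 1.0 for the intercept; y holds 0/1 outcomes.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for row, yi in zip(X, y):
            err = yi - _sigmoid(sum(b * x for b, x in zip(beta, row)))
            for j in range(p):
                grad[j] += err * row[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def replicated_signs(X, y, n_rep=50, controls_per_event=4, seed=0):
    """Refit on Monte Carlo replications (all events plus a random draw of non-events).

    Returns, for each coefficient, the share of replications in which it is positive.
    """
    rng = random.Random(seed)
    events = [i for i, v in enumerate(y) if v == 1]
    non_events = [i for i, v in enumerate(y) if v == 0]
    k = min(len(non_events), controls_per_event * len(events))
    positive = [0] * len(X[0])
    for _ in range(n_rep):
        idx = events + rng.sample(non_events, k)
        beta = fit_logit([X[i] for i in idx], [y[i] for i in idx])
        for j, b in enumerate(beta):
            positive[j] += b > 0
    return [c / n_rep for c in positive]
```

Shares near 0 or 1 indicate a coefficient whose direction is stable across resamples; shares near 0.5 flag a predictor whose apparent effect depends on which non-events happened to be sampled.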


2018 ◽  
Vol 38 (1) ◽  
pp. 87-108 ◽  
Author(s):  
Eelco van der Maat

Scholars who have sought to identify the triggers of rare political events have met with limited success. With respect to civil war, studies teach us to expect conflict where it is feasible. However, although we understand where civil conflict occurs, we do not quite understand when it occurs. Focusing on civil conflict, I argue that time-variant and time-invariant explanations relate to the outcome through two distinct causal processes, which has implications for identifying the triggers of rare events. I provide an easily implementable approach to improving rare event estimation that uses matching to leverage constant attributes to estimate the effects of rare predictors. I demonstrate the utility of this procedure with aggregate and disaggregated examples of civil conflict onset estimation.
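As a generic illustration of matching on constant attributes (not the author's specific procedure, whose details the abstract does not give), the sketch below groups observations into strata with identical time-invariant attributes and compares onset rates between observations with and without a rare predictor, only within strata that contain both. The row layout and the name `matched_effect` are hypothetical.

```python
from collections import defaultdict

def matched_effect(rows):
    """Exact-matching estimate of the effect of a rare binary predictor on onset.

    rows: iterable of (constant_attrs, exposed, onset), where constant_attrs is a
    hashable tuple of time-invariant attributes (a hypothetical encoding).
    Returns the exposed-weighted mean difference in onset rates across strata that
    contain both exposed and unexposed observations, or None if no stratum does.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for attrs, exposed, onset in rows:
        strata[attrs][1 if exposed else 0].append(onset)
    total, weight = 0.0, 0
    for groups in strata.values():
        treated, control = groups[1], groups[0]
        if treated and control:
            # Difference in onset rates among units sharing the same constant attributes.
            diff = sum(treated) / len(treated) - sum(control) / len(control)
            total += diff * len(treated)
            weight += len(treated)
    return total / weight if weight else None
```

Each stratum is weighted by its number of exposed observations, giving an effect-on-the-exposed style average; strata with no unexposed counterpart are dropped, which is where matching trades sample size for comparability.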


2021 ◽  
Vol 8 (1) ◽  
pp. 1497-1506
Author(s):  
Aba Dio ◽  
El Hadji Dème ◽  
Idrissa Sy ◽  
Aliou Diop

The logistic regression model is widely used to investigate the relationship between a binary response variable Y and a set of potential predictors X. The binary response may represent, for example, the occurrence of some outcome of interest (Y = 1 if the outcome occurred and Y = 0 otherwise). When Y represents a rare event, the logistic regression model has notable drawbacks. To overcome them, we propose the Generalized Extreme Value (GEV) regression model; in particular, we use the quantile function of the GEV distribution as the link function. Stroke is a serious pathology and a neurological emergency that affects both vital and functional prognosis. In Senegal, strokes account for more than 30% of hospitalizations and are responsible for nearly two thirds of mortality. In this work, we use the GEV regression model for binary data to determine the risk factors for stroke and to develop a predictive model of life-threatening outcomes in central Senegal.
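The abstract does not spell out the link. A minimal sketch, assuming one common parameterization in which the response probability is π = 1 − exp{−[1 − ξη]₊^(−1/ξ)} for linear predictor η and shape parameter ξ, so that the link is η = [1 − (−log(1 − π))^(−ξ)]/ξ. The function names `gev_inverse_link` and `gev_link` are hypothetical. As ξ → 0 the model reduces to the complementary log-log link, and the sign of ξ controls the asymmetry of the response curve, the flexibility that motivates its use for rare outcomes.

```python
import math

def gev_inverse_link(eta, xi):
    """Response probability pi = 1 - exp(-[1 - xi*eta]_+^(-1/xi)) (assumed parameterization)."""
    if abs(xi) < 1e-12:
        # Limit xi -> 0: complementary log-log model.
        return 1.0 - math.exp(-math.exp(eta))
    t = 1.0 - xi * eta
    if t <= 0.0:
        # Outside the support the probability is pinned at 1 (xi > 0) or 0 (xi < 0).
        return 1.0 if xi > 0 else 0.0
    return 1.0 - math.exp(-t ** (-1.0 / xi))

def gev_link(pi, xi):
    """Inverse of gev_inverse_link: eta = [1 - (-log(1 - pi))^(-xi)] / xi, for 0 < pi < 1."""
    if abs(xi) < 1e-12:
        return math.log(-math.log(1.0 - pi))
    return (1.0 - (-math.log(1.0 - pi)) ** (-xi)) / xi
```

Round-tripping `gev_link(gev_inverse_link(eta, xi), xi)` recovers `eta` inside the support, and setting `xi = 0` gives the complementary log-log model exactly.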


2020 ◽  
Vol 39 (6) ◽  
pp. 8463-8475
Author(s):  
Palanivel Srinivasan ◽  
Manivannan Doraipandian

Rare event detection is performed using spatial-domain and frequency-domain procedures. The volume of footage from omnipresent surveillance cameras is growing exponentially over time, and monitoring all events manually is impractical and time-consuming. An automated rare event detection mechanism is therefore required to make this process manageable. In this work, a Context-Free Grammar (CFG) is developed for detecting rare events from a video stream, and an Artificial Neural Network (ANN) is used to train the CFG. A set of dedicated algorithms performs frame splitting, edge detection, and background subtraction and converts the processed data into a CFG. The CFG is then converted into nodes and edges to form a graph, which is given to the input layer of an ANN that classifies events as normal or rare; graphs derived from the input video stream are used to train the ANN. The performance of the resulting Artificial Neural Network Based Context-Free Grammar – Rare Event Detection (ACFG-RED) model is compared with other existing techniques using accuracy, precision, sensitivity, recall, average processing time, and average processing power. The ANN-CFG model achieved better values on these metrics than the other techniques, and the developed model thus provides a better solution for detecting rare events in video streams.
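Background subtraction appears among the preprocessing steps without the method being named. A minimal sketch, assuming a running-average background model on grayscale frames stored as nested lists; `background_subtract`, `alpha`, and `threshold` are illustrative assumptions, and a production pipeline would typically use an image-processing library instead.

```python
def background_subtract(frames, alpha=0.1, threshold=30):
    """Flag foreground pixels using a running-average background model.

    frames: list of 2-D lists of grayscale intensities (0-255).
    Returns one binary mask per frame (1 = pixel differs from the current
    background estimate by more than `threshold`).
    """
    background = [[float(v) for v in row] for row in frames[0]]
    masks = []
    for frame in frames:
        mask = [[1 if abs(v - b) > threshold else 0 for v, b in zip(row, brow)]
                for row, brow in zip(frame, background)]
        masks.append(mask)
        # Blend the current frame into the background so slow changes are absorbed.
        background = [[(1 - alpha) * b + alpha * v for v, b in zip(row, brow)]
                      for row, brow in zip(frame, background)]
    return masks
```

A small `alpha` makes the background adapt slowly, so a sudden change such as an object entering the scene stands out for several frames before it is absorbed.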


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test whether remission from BDD can be reliably predicted in a sample of 88 individuals who had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared with traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower at subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in predicting outcomes for patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.
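Rankings of predictors such as depressive symptoms and treatment credibility by importance typically rest on a variable-importance measure. One common choice for random forests is permutation importance: shuffle one predictor at a time and record how much prediction accuracy drops. The abstract does not say which measure the study used, so this is only a standard-library sketch of the generic technique; `model` stands in for any fitted classifier, and the function names are hypothetical.

```python
import random

def accuracy(model, X, y):
    # Share of rows where the model's 0/1 prediction matches the label.
    return sum(model(row) == yi for row, yi in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each predictor column is randomly shuffled.

    model: callable mapping a feature row (a list) to a 0/1 prediction, e.g. a
    wrapper around a fitted random forest (hypothetical stand-in).
    """
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled copy, leaving other columns intact.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(model, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A predictor the model ignores gets an importance of exactly zero, while one the model relies on shows a clear accuracy drop; in practice the score is computed on held-out data to avoid rewarding overfitting.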


Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

In conjunction with recent advances in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We aimed to develop a prehospital stroke scale with ML. We conducted a multicenter retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three ML algorithms (logistic regression, random forests, and XGBoost) to develop models. The main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. Predictive ability was validated in the test cohort using accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracy was 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. Classification ability was also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and type of stroke at the prehospital stage had superior predictive ability.
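The validation metrics listed above can all be computed from binary labels and predicted scores. A minimal sketch for one outcome at a time (for example, LVO versus everything else), assuming a 0.5 decision threshold and reading "F score" as F1; AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. `binary_metrics` is a hypothetical name.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, PPV, sensitivity, specificity and F1 at a threshold, plus ROC AUC."""
    pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    # AUC: probability that a random positive outranks a random negative (ties = 1/2).
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"accuracy": (tp + tn) / len(y_true), "ppv": ppv, "sensitivity": sens,
            "specificity": spec, "f1": f1, "auc": auc}
```

Unlike the threshold-based metrics, AUC depends only on how the scores rank cases, which is why it is often reported alongside accuracy when outcomes such as SAH are rare.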


Cells ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 826
Author(s):  
Rafael Kretschmer ◽  
Marcelo Santos de Souza ◽  
Ivanete de Oliveira Furo ◽  
Michael N. Romanov ◽  
Ricardo José Gunski ◽  
...  

Interchromosomal rearrangements involving microchromosomes are rare events in birds. To date, they have been found mostly in Psittaciformes, Falconiformes, and Cuculiformes, although only a few orders have been analyzed. Hence, cytogenomic studies focusing on microchromosomes in species belonging to different bird orders are essential to shed more light on avian chromosome and karyotype evolution. We therefore performed comparative chromosome mapping of chicken microchromosomes 10 to 28 using interspecies BAC-based FISH in five species representing four Neoaves orders (Caprimulgiformes, Piciformes, Suliformes, and Trogoniformes). Our results suggest that the ancestral microchromosomal syntenies are conserved in Pteroglossus inscriptus (Piciformes), Ramphastos tucanus tucanus (Piciformes), and Trogon surrucura surrucura (Trogoniformes). On the other hand, chromosome reorganization in Phalacrocorax brasilianus (Suliformes) and Hydropsalis torquata (Caprimulgiformes) included fusions involving both macro- and microchromosomes. Fissions in macrochromosomes were observed in P. brasilianus and H. torquata. Relevant hypothetical Neognathae and Neoaves ancestral karyotypes were reconstructed to trace these rearrangements. We found no interchromosomal rearrangement involving microchromosomes shared between the avian orders in which rearrangements were detected. Our findings suggest that convergent evolution involving microchromosomal change is a rare event in birds and may be useful for cytotaxonomic inference in orders where these rearrangements occurred.

