Analyzing Accident Injury Severity via an Extreme Gradient Boosting (XGBoost) Model

2021 ◽  
Vol 2021 ◽  
pp. 1-11 ◽  
Author(s):  
Shubo Wu ◽  
Quan Yuan ◽  
Zhongwei Yan ◽  
Qing Xu

Vehicle-to-vulnerable-road-user (VRU) crashes account for a large proportion of traffic crashes in China, and crash injury severity analysis can help traffic managers understand the patterns behind these crashes. To this end, 554 VRU-involved crashes were collected from January 2017 to February 2021 in a city in northern China, including 322 vehicle-pedestrian crashes and 232 vehicle-bicycle crashes. First, a descriptive statistical analysis is conducted to investigate the characteristics of VRU-involved crashes. Second, the extreme gradient boosting (XGBoost) model is introduced to rank the importance of risk factors (i.e., time of day, day of week, rush hour, crash position, weather, and crash involvements) in VRU-involved crashes. The statistical analysis demonstrates that the risk factors are closely related to VRU-involved crash injury severity. Moreover, the XGBoost results reveal that time of day has the greatest impact on VRU-involved crashes, while crash position is the least important of these risk factors.
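The importance ranking described in this abstract can be illustrated with a much simpler stand-in. The sketch below is not XGBoost and is not the authors' model: it is a toy gradient booster built from one-split trees with squared-error loss, and it scores each feature by the total split gain it accumulates. The feature names (`night`, `weekend`, `rain`) and all labels are hypothetical.

```python
# Toy gradient boosting with one-split trees (stumps) and squared-error loss.
# This is NOT XGBoost itself; the data and feature names are hypothetical.

FEATURES = ["night", "weekend", "rain"]  # hypothetical binary risk factors
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
     [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 0, 1, 1, 1, 1, 1]  # 1 = severe injury (made-up labels)


def best_stump(X, residuals):
    """Find the binary feature whose split most reduces squared error."""
    n = len(residuals)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        left = [r for row, r in zip(X, residuals) if row[j] == 0]
        right = [r for row, r in zip(X, residuals) if row[j] == 1]
        if not left or not right:
            continue
        # Squared-error reduction obtained by fitting one mean per side.
        gain = (sum(left) ** 2 / len(left)
                + sum(right) ** 2 / len(right)
                - total ** 2 / n)
        if best is None or gain > best[0]:
            best = (gain, j, sum(left) / len(left), sum(right) / len(right))
    return best


def gain_importance(X, y, rounds=20, learning_rate=0.5):
    """Boost stumps on residuals and accumulate split gain per feature."""
    pred = [sum(y) / len(y)] * len(y)
    gains = [0.0] * len(X[0])
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None or stump[0] < 1e-12:
            break
        gain, j, left_value, right_value = stump
        gains[j] += gain
        pred = [p + learning_rate * (right_value if row[j] else left_value)
                for p, row in zip(pred, X)]
    total = sum(gains)
    return [g / total for g in gains]  # normalized so the shares sum to 1


importance = gain_importance(X, y)
ranking = sorted(zip(FEATURES, importance), key=lambda t: -t[1])
for name, share in ranking:
    print(f"{name}: {share:.3f}")
```

In this made-up data the label is mostly determined by `night`, so that feature collects the largest share of gain. A real analysis would use the XGBoost library and the study's own crash records.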

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Biao Wu ◽  
Xingyu Wang ◽  
Tuo Liu ◽  
Naibao Dong ◽  
Yun Li

To analyze the risk factors influencing crash injury severity in rural-urban fringes, crash data were collected from rural-urban fringes in Harbin, China. Four risk factors, namely, time of day, vehicle type, road feature, and crash type, were investigated for their association with the severity of rural-urban fringe crashes. Crash injury severity was divided into two categories: fatal and nonfatal. Logistic regression was applied to explore the relationships between the severity outcome and these four factors. Goodness-of-fit and badness-of-fit tests were conducted to examine the validity of the estimation results. The results show close agreement between the calculated and actual numbers of crashes of each type. Compared with the other influencing factors, time of day is a significant factor for crash injury severity. As such, the proposed calibration procedure and choice of factors are recommended as a validated approach to analyze and identify the main factors influencing crash injury severity in rural-urban fringes.
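For a binary outcome such as fatal versus nonfatal, a logistic regression with a single binary predictor has a coefficient whose exponential equals the odds ratio of the corresponding 2x2 table. The sketch below computes that odds ratio with an approximate 95% confidence interval (Woolf's log method). The counts are invented for illustration and are not taken from the Harbin data.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a: exposed and fatal      b: exposed and nonfatal
    c: unexposed and fatal    d: unexposed and nonfatal
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts: nighttime crashes (30 fatal, 70 nonfatal)
# versus daytime crashes (20 fatal, 180 nonfatal).
estimate, lower, upper = odds_ratio(30, 70, 20, 180)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests the factor is associated with the outcome. With several predictors the coefficients have to be estimated jointly, which this sketch does not attempt.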


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Mingyue Xue ◽  
Yinxia Su ◽  
Chen Li ◽  
Shuxia Wang ◽  
Hua Yao

Background. An estimated 425 million people globally have diabetes, accounting for 12% of the world’s health expenditures, and the number continues to grow, placing a huge burden on healthcare systems, especially in remote, underserved areas. Methods. A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we trained a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. Results. XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F1 = 0.906, and AUC = 0.968). The variable importance scores in XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). Conclusions. We proposed an LR-XGBoost classifier that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for diabetes risk at an early phase, and the variable importance scores offer clues for preventing diabetes.
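The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name `f1_score` is only a label chosen here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for XGBoost in the abstract.
f1 = f1_score(0.910, 0.902)
print(round(f1, 3))  # 0.906
```

The result agrees with the F1 of 0.906 stated in the abstract.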


2021 ◽  
Author(s):  
Anmin Hu ◽  
Hui-Ping Li ◽  
Zhen Li ◽  
Zhongjun Zhang ◽  
Xiong-Xiong Zhong

Purpose: The aim of this study was to use machine learning to construct a model for the analysis of risk factors and prediction of delirium among ICU patients. Methods: We developed a set of real-world data to enable the comparison of the reliability and accuracy of delirium prediction models from the MIMIC-III database, the MIMIC-IV database and the eICU Collaborative Research Database. Significance tests, correlation analysis, and factor analysis were used to individually screen 80 potential risk factors. The predictive algorithms were run using the following models: logistic regression, naive Bayesian, K-nearest neighbors, support vector machine, random forest, and eXtreme Gradient Boosting. Conventional E-PRE-DELIRIC and eighteen models, including all-factor (AF) models with all potential variables, characteristic variable (CV) models with principal component factors, and rapid predictive (RP) models without laboratory test results, were used to construct the risk prediction model for delirium. The performance of these machine learning models was measured by the area under the receiver operating characteristic curve (AUC) under tenfold cross-validation. The VIMs and SHAP algorithms, which interpret features and individual sample predictions of black-box machine learning models, were implemented. Results: A total of 78,365 patients were enrolled in this study, 22,159 of whom (28.28%) had positive delirium records. The E-PRE-DELIRIC model (AUC, 0.77), AF models (AUC, 0.77-0.93), CV models (AUC, 0.77-0.88) and RP models (AUC, 0.75-0.87) had discriminatory value. The random forest CV model found that the top five factors accounting for the weight of delirium were length of ICU stay, verbal response score, APACHE-III score, urine volume and hemoglobin. The SHAP values in the eXtreme Gradient Boosting CV model showed that the top three features that were negatively correlated with outcomes were verbal response score, urine volume, and hemoglobin; the top three features that were positively correlated with outcomes were length of ICU stay, APACHE-III score, and alanine transaminase. Conclusion: Even with a small number of variables, machine learning has a good ability to predict delirium in critically ill patients. Characteristic variables provide direction for early intervention to reduce the risk of delirium.
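The AUC used to compare these models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison on made-up labels and scores. It is an illustration of the metric only, not of the study's tenfold cross-validation pipeline.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical delirium labels (1 = delirium) and predicted risk scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
value = auc(labels, scores)
print(round(value, 3))
```

In a cross-validated setting this function would be applied to the held-out fold of each split and the per-fold values summarized. The pairwise loop is quadratic, so a rank-based formula is preferable for large samples.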


2021 ◽  
Author(s):  
Sang Min Nam ◽  
Thomas A Peterson ◽  
Kyoung Yul Seo ◽  
Hyun Wook Han ◽  
Jee In Kang

BACKGROUND In epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large. OBJECTIVE Our study had two aims. First, we aimed to identify essential depression-associated factors using the extreme gradient boosting (XGBoost) machine learning algorithm from big survey data (the Korea National Health and Nutrition Examination Survey, 2012-2016). Second, we aimed to achieve a comprehensive understanding of multifactorial features in depression using network analysis. METHODS An XGBoost model was trained and tested to classify “current depression” and “no lifetime depression” for a data set of 120 variables for 12,596 cases. The optimal XGBoost hyperparameters were set by an automated machine learning tool (TPOT), and a high-performance sparse model was obtained by feature selection using the feature importance value of XGBoost. We performed statistical tests on the model and nonmodel factors using survey-weighted multiple logistic regression and drew a correlation network among factors. We also adopted statistical tests for the confounder or interaction effect of selected risk factors when it was suspected on the network. RESULTS The XGBoost-derived depression model consisted of 18 factors with an area under the weighted receiver operating characteristic curve of 0.86. Two nonmodel factors could be found using the model factors, and the factors were classified into direct (P<.05) and indirect (P≥.05), according to the statistical significance of the association with depression. Perceived stress and asthma were the most remarkable risk factors, and urine specific gravity was a novel protective factor. The depression-factor network showed clusters of socioeconomic status and quality of life factors and suggested that educational level and sex might be predisposing factors. Indirect factors (eg, diabetes, hypercholesterolemia, and smoking) were involved in confounding or interaction effects of direct factors. Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking had a significant risk in females, and weight gain was associated with depression involving diabetes. CONCLUSIONS XGBoost and network analysis were useful to discover depression-related factors and their relationships and can be applied to epidemiological studies using big survey data.
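A correlation network of the kind mentioned in this abstract can be sketched by computing pairwise Pearson correlations and keeping an edge wherever the absolute correlation exceeds a threshold. The variable names, values, and the 0.7 cutoff below are all hypothetical, and the study's own network construction may differ.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical survey variables for six respondents.
data = {
    "stress": [1, 2, 3, 4, 5, 6],
    "sleep_hours": [8, 7, 7, 6, 5, 4],
    "height": [170, 160, 175, 165, 172, 168],
}

THRESHOLD = 0.7  # assumed cutoff for drawing an edge
edges = {}
for a, b in combinations(data, 2):
    r = pearson(data[a], data[b])
    if abs(r) >= THRESHOLD:
        edges[(a, b)] = r

for (a, b), r in edges.items():
    print(f"{a} -- {b}: r = {r:.2f}")
```

Only the stress and sleep pair survives the cutoff here. Survey-weighted data would call for weighted correlations, which this sketch ignores.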


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 113-113
Author(s):  
Shuangshuang Wang ◽  
Nina Silverstein ◽  
Chae Man Lee ◽  
Frank Porell ◽  
Beth Dugan

The number of pedestrian crashes in the United States increased by 35 percent from 2008 to 2017. Among all pedestrian fatalities in 2017, 48% were pedestrians aged 50 and older, which suggests a disproportionate threat to older residents’ health and safety. Massachusetts has a large older population and is experiencing increased numbers of older pedestrian crashes. This research identified risk factors and community characteristics contributing to older pedestrian crashes and suggests leveraging the state’s age-friendly efforts to speed the implementation of countermeasures. Based on ten-year statewide crash data (2006-2015) and community indicators from the 2018 Massachusetts Healthy Aging Data Report, this study examined 4,472 crashes across Massachusetts that involved pedestrians aged 55 and over. The leading reasons for crashes were driver’s inattention, driver’s failure to yield right of way, and driver’s issues with visibility. Older pedestrians were hit while walking in the road, often in crosswalks at intersections. Many factors were found to contribute to older pedestrian crashes: time of day (rush hour), time of year (winter), and community factors (higher rates of disabilities, higher percentage of racial minority residents, higher number of cultural amenities, and lack of dementia-friendly community efforts). Greater awareness of older pedestrian safety risks is needed. Communities highlighted in this research warrant priority attention from planning, health, aging services, and transportation authorities to improve older pedestrian safety.


2021 ◽  
Vol 12 ◽  
Author(s):  
Ze Yu ◽  
Huanhuan Ji ◽  
Jianwen Xiao ◽  
Ping Wei ◽  
Lin Song ◽  
...  

The aim of this study was to apply machine learning methods to explore in depth the risk factors associated with adverse drug events (ADEs) and to predict the occurrence of ADEs in Chinese pediatric inpatients. Data from 1,746 patients aged between 28 days and 18 years (mean age = 3.84 years), admitted to the Children’s Hospital of Chongqing Medical University from January 1, 2013, to December 31, 2015, were included in the study. There were 247 cases of ADE occurrence, and the drugs that most commonly induced ADEs were antibacterials. Seven algorithms, including eXtreme Gradient Boosting (XGBoost), CatBoost, AdaBoost, LightGBM, Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and TPOT, were used to select the important risk factors, and GBDT, which had the best predictive performance (precision = 44%, recall = 25%, F1 = 31.88%), was chosen to establish the prediction model. The GBDT model performed better than Global Trigger Tools (GTTs) for ADE prediction (precision 44% vs. 13.3%). In addition, multiple risk factors were identified via GBDT, such as the number of trigger true (TT) (+), number of doses, BMI, number of drugs, number of admissions, height, length of hospital stay, weight, age, and number of diagnoses. The direction of each risk factor's influence on ADEs was displayed through Shapley Additive exPlanations (SHAP). This study provides a novel method to predict adverse drug events in Chinese pediatric inpatients together with the associated risk factors, which may be applicable in clinical practice in the future.
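SHAP attributions are based on Shapley values, which average a feature's marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values by brute-force enumeration for a tiny hand-written risk function. The function `toy_risk`, its coefficients, and the all-zero baseline are assumptions made for illustration. This is not the SHAP library and not the study's GBDT model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a subset take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Weight of a subset of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score with one interaction term."""
    doses, drugs, age = z
    return 0.02 * doses + 0.05 * drugs - 0.01 * age + 0.01 * doses * drugs


x = [10, 4, 3]          # hypothetical patient: doses, drugs, age in years
baseline = [0, 0, 0]    # assumed reference input
phi = shapley_values(toy_risk, x, baseline)
for name, contribution in zip(["doses", "drugs", "age"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved. Enumeration grows exponentially with the number of features, so it is only practical for a handful of them.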


2019 ◽  
Vol 11 (19) ◽  
pp. 5194 ◽  
Author(s):  
Natalia Casado-Sanz ◽  
Begoña Guirao ◽  
Antonio Lara Galera ◽  
Maria Attard

According to the Spanish General Traffic Accident Directorate, in 2017 a total of 351 pedestrians were killed and 14,322 pedestrians were injured in motor vehicle crashes in Spain. However, very few studies have analysed the main factors that contribute to pedestrian injury severity. This study analyses accidents involving a single vehicle and a single pedestrian on Spanish crosstown roads from 2006 to 2016 (1535 crashes). The factors that explain these accidents include infractions committed by the pedestrian and the driver, crash profiles, and infrastructure characteristics. As a preliminary tool for segmenting the 1535 pedestrian crashes, a k-means cluster analysis was applied. In addition, multinomial logit (MNL) models were used to analyse the crash data, with three possible outcomes for the pedestrian: fatality, severe injury, and minor injury. According to the results of these models, the risk factors associated with pedestrian injury severity are as follows: visibility restricted by weather conditions or glare, infractions committed by the pedestrian (such as not using crossings, crossing unlawfully, or walking on the road), infractions committed by the driver (such as distracted driving and not respecting a light or a crossing), and finally, speed infractions committed by drivers (such as inadequate speed). This study proposes specific safety countermeasures that in turn will improve overall road safety on this particular type of road.
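A multinomial logit model assigns each outcome a linear utility and converts the utilities to probabilities with a softmax, with one outcome serving as the reference category. The sketch below shows that calculation for the three severity levels named in the abstract. All coefficients and feature names are hypothetical and were not estimated from the Spanish data.

```python
import math

# Hypothetical coefficients per outcome; "minor" is the reference category
# and therefore has utility 0 for every input.
COEFS = {
    "minor": {},
    "severe": {"const": -0.5, "restricted_visibility": 0.6, "driver_speeding": 0.8},
    "fatal": {"const": -2.0, "restricted_visibility": 0.9, "driver_speeding": 1.5},
}


def mnl_probabilities(features, coefs=COEFS):
    """Outcome probabilities from linear utilities via a softmax."""
    utilities = {
        outcome: c.get("const", 0.0)
        + sum(c.get(name, 0.0) * value for name, value in features.items())
        for outcome, c in coefs.items()
    }
    top = max(utilities.values())  # subtract the max for numerical stability
    exps = {o: math.exp(u - top) for o, u in utilities.items()}
    total = sum(exps.values())
    return {o: e / total for o, e in exps.items()}


baseline = mnl_probabilities({"restricted_visibility": 0, "driver_speeding": 0})
speeding = mnl_probabilities({"restricted_visibility": 0, "driver_speeding": 1})
print({k: round(v, 3) for k, v in baseline.items()})
print({k: round(v, 3) for k, v in speeding.items()})
```

With these assumed coefficients, switching on the speeding indicator shifts probability from the minor outcome toward the severe and fatal outcomes. Fitting real coefficients requires maximum likelihood estimation, which is outside this sketch.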


10.2196/27344 ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. e27344
Author(s):  
Sang Min Nam ◽  
Thomas A Peterson ◽  
Kyoung Yul Seo ◽  
Hyun Wook Han ◽  
Jee In Kang

Background In epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large. Objective Our study had two aims. First, we aimed to identify essential depression-associated factors using the extreme gradient boosting (XGBoost) machine learning algorithm from big survey data (the Korea National Health and Nutrition Examination Survey, 2012-2016). Second, we aimed to achieve a comprehensive understanding of multifactorial features in depression using network analysis. Methods An XGBoost model was trained and tested to classify “current depression” and “no lifetime depression” for a data set of 120 variables for 12,596 cases. The optimal XGBoost hyperparameters were set by an automated machine learning tool (TPOT), and a high-performance sparse model was obtained by feature selection using the feature importance value of XGBoost. We performed statistical tests on the model and nonmodel factors using survey-weighted multiple logistic regression and drew a correlation network among factors. We also adopted statistical tests for the confounder or interaction effect of selected risk factors when it was suspected on the network. Results The XGBoost-derived depression model consisted of 18 factors with an area under the weighted receiver operating characteristic curve of 0.86. Two nonmodel factors could be found using the model factors, and the factors were classified into direct (P<.05) and indirect (P≥.05), according to the statistical significance of the association with depression. Perceived stress and asthma were the most remarkable risk factors, and urine specific gravity was a novel protective factor. The depression-factor network showed clusters of socioeconomic status and quality of life factors and suggested that educational level and sex might be predisposing factors. Indirect factors (eg, diabetes, hypercholesterolemia, and smoking) were involved in confounding or interaction effects of direct factors. Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking had a significant risk in females, and weight gain was associated with depression involving diabetes. Conclusions XGBoost and network analysis were useful to discover depression-related factors and their relationships and can be applied to epidemiological studies using big survey data.

