Identify Risk Pattern of E-Bike Riders in China Based on Machine Learning Framework

Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1084 ◽  
Author(s):  
Wang ◽  
Kou ◽  
Song

In this paper, the risk pattern of e-bike riders in China was examined based on tree-structured machine learning techniques. Three-year crash/violation data were acquired from the Kunshan traffic police department, China. Firstly, high-risk (HR) electric bicycle (e-bike) riders were defined as those with at-fault crash involvement, while others (i.e., non-at-fault or without crash involvement) were considered non-high-risk (NHR) riders, based on quasi-induced exposure theory. Then, for e-bike riders, demographic and previous violation-related features were developed from the crash/violation records. After that, a systematic machine learning (ML) framework was proposed to capture the complex risk patterns of these e-bike riders. An ensemble sampling method was selected to deal with the imbalanced dataset. Four tree-structured machine learning methods were compared, and the gradient boosting decision tree (GBDT) performed best. Feature importance and partial dependence were further examined. Interesting findings include the following: (1) tree-structured ML models are able to capture complex risk patterns and interpret them properly; (2) spatial-temporal violation features were found to be important indicators of high-risk e-bike riders; and (3) violation behavior features appeared to be more effective than violation punishment-related features in identifying high-risk e-bike riders. In general, the proposed ML framework is able to identify the complex crash risk pattern of e-bike riders. This paper provides useful insights for policy-makers and traffic practitioners regarding e-bike safety improvement in China.
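The quasi-induced exposure labeling described above reduces to a simple rule: a rider is high-risk if at fault in any crash, and non-high-risk otherwise. A minimal sketch of that rule (the record field `at_fault` is illustrative, not from the paper's dataset):

```python
# Quasi-induced exposure labeling sketch; field names are invented.
def label_rider(crash_records):
    """Label a rider HR if at fault in any crash, otherwise NHR."""
    if any(rec.get("at_fault", False) for rec in crash_records):
        return "HR"
    return "NHR"  # non-at-fault, or no crash involvement at all

riders = {
    "A": [{"at_fault": True}],
    "B": [{"at_fault": False}],
    "C": [],  # no crash involvement
}
labels = {rid: label_rider(recs) for rid, recs in riders.items()}
```

Riders B and C both land in the NHR class, which is what makes the labeling "quasi-induced": non-at-fault crash involvement serves as a proxy for ordinary exposure.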

Author(s):  
Ke Wang ◽  
Qingwen Xue ◽  
Jian John Lu

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalanced nature of driving data, high-risk samples, as the minority class, are usually ill-treated by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling method, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in recognizing risky drivers. The hyperparameters that control the sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle’s longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve recognition ability and reduce the error of the predicted probability.
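The cost-sensitive cross-entropy loss named above can be written down directly: the positive-class weight is exactly the kind of hyperparameter the framework tunes by Bayesian optimization. A NumPy sketch (the weighting scheme is a common formulation, assumed here rather than taken from the paper):

```python
import numpy as np

def weighted_cross_entropy(y_true, p_pred, pos_weight):
    """Cross-entropy that up-weights the minority (risky) class.
    pos_weight > 1 penalizes missed risky drivers more heavily."""
    eps = 1e-12
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    loss = -(pos_weight * y_true * np.log(p)
             + (1 - y_true) * np.log(1 - p))
    return loss.mean()

y = np.array([0, 0, 0, 1])          # imbalanced toy labels
p = np.array([0.1, 0.2, 0.1, 0.6])  # predicted risky-driver probabilities
base = weighted_cross_entropy(y, p, pos_weight=1.0)
weighted = weighted_cross_entropy(y, p, pos_weight=5.0)
```

Raising `pos_weight` increases the loss contribution of the single risky driver, steering the optimizer toward recalling the minority class.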


Author(s):  
Mario W. L. Moreira ◽  
Joel J. P. C. Rodrigues ◽  
Vasco Furtado ◽  
Constandinos X. Mavromoustakis ◽  
Neeraj Kumar ◽  
...  

2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Jay Jacobs ◽  
Sasha Romanosky ◽  
Idris Adjerid ◽  
Wade Baker

Abstract Despite significant innovations in IT security products and research over the past 20 years, the information security field is still immature and struggling. Practitioners lack the ability to properly assess cyber risk, and decision-makers continue to be paralyzed by vulnerability scanners that overload their staff with mountains of scan results. In order to cope, firms prioritize vulnerability remediation using crude heuristics and limited data, yet they are still too often breached by known vulnerabilities for which patches have existed for months or years. The key challenge firms face is thus identifying a remediation strategy that best balances two competing forces. On one hand, a firm could attempt to patch all vulnerabilities on its network. While this would provide the greatest ‘coverage’ of vulnerabilities patched, it would inefficiently consume resources by fixing low-risk vulnerabilities. On the other hand, patching a few high-risk vulnerabilities would be highly ‘efficient’, but may leave the firm exposed to many other high-risk vulnerabilities. Using a large collection of datasets together with machine learning techniques, we construct a series of vulnerability remediation strategies and compare how each performs with regard to trading off coverage and efficiency. We expand and improve upon the small body of literature that uses predictions of ‘published exploits’ by instead using ‘exploits in the wild’ as our outcome variable. We implement the machine learning models by classifying vulnerabilities as high- or low-risk, where we consider high-risk vulnerabilities to be those that have been exploited in actual firm networks.
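The coverage/efficiency trade-off described above maps onto recall and precision over the set of exploited vulnerabilities. A small illustration with invented CVE identifiers:

```python
def coverage_and_efficiency(patched, exploited):
    """Coverage: fraction of exploited vulns that were patched (recall).
    Efficiency: fraction of patched vulns that were exploited (precision)."""
    patched, exploited = set(patched), set(exploited)
    hit = patched & exploited
    coverage = len(hit) / len(exploited) if exploited else 1.0
    efficiency = len(hit) / len(patched) if patched else 1.0
    return coverage, efficiency

# Patch-everything strategy vs. patch-a-few-high-risk strategy
all_vulns = ["CVE-1", "CVE-2", "CVE-3", "CVE-4", "CVE-5"]
exploited = ["CVE-2", "CVE-5"]
cov_all, eff_all = coverage_and_efficiency(all_vulns, exploited)  # (1.0, 0.4)
cov_few, eff_few = coverage_and_efficiency(["CVE-2"], exploited)  # (0.5, 1.0)
```

Patching everything maximizes coverage at low efficiency; patching one correctly-predicted vulnerability is perfectly efficient but leaves half the exploited vulnerabilities open, which is precisely the tension the remediation strategies trade off.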


2017 ◽  
Vol 29 (2) ◽  
pp. 190-209 ◽  
Author(s):  
Jennifer Helsby ◽  
Samuel Carton ◽  
Kenneth Joseph ◽  
Ayesha Mahmud ◽  
Youngsoo Park ◽  
...  

Adverse interactions between police and the public hurt police legitimacy, cause harm to both officers and the public, and result in costly litigation. Early intervention systems (EISs) that flag the officers considered most likely to be involved in one of these adverse events are an important tool for police supervision and for targeting interventions such as counseling or training. However, existing EISs are not data-driven; they are based on supervisor intuition. We have developed a data-driven EIS that uses a diverse set of data sources from the Charlotte-Mecklenburg Police Department and machine learning techniques to more accurately predict the officers who will have an adverse event. Our approach significantly improves accuracy compared with the department's existing EIS: preliminary results indicate a 20% reduction in false positives and a 75% increase in true positives.


2021 ◽  
Author(s):  
Kalum J. Ost ◽  
David W. Anderson ◽  
David W. Cadotte

With the common adoption of electronic health records and new technologies capable of producing an unprecedented scale of data, a shift must occur in how we practice medicine in order to utilize these resources. We are entering an era in which the capacity of even the most clever human doctor is simply insufficient. As such, realizing “personalized” or “precision” medicine requires new methods that can leverage the massive amounts of data now available. Machine learning techniques provide one important toolkit in this venture, as they are fundamentally designed to deal with (and, in fact, benefit from) massive datasets. The clinical applications for such machine learning systems are still in their infancy, however, and the field of medicine presents a unique set of design considerations. In this chapter, we walk through how we selected and adjusted the “Progressive Learning framework” to account for these considerations in the case of Degenerative Cervical Myelopathy. We additionally compare a model designed with these techniques to similar static models run in “perfect world” scenarios (free of the clinical issues addressed), and we use simulated clinical data acquisition scenarios to demonstrate the advantages of our machine learning approach in providing personalized diagnoses.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Koen I. Neijenhuijs ◽  
Carel F. W. Peeters ◽  
Henk van Weert ◽  
Pim Cuijpers ◽  
Irma Verdonck-de Leeuw

Abstract Purpose Knowledge regarding symptom clusters may inform targeted interventions. The current study investigated symptom clusters among cancer survivors, using machine learning techniques on a large dataset. Methods Data consisted of self-reports of cancer survivors who used ‘Oncokompas’, a fully automated online application that supports them in their self-management by (1) monitoring their symptoms through patient-reported outcome measures (PROMs); and (2) providing a personalized overview of supportive care options tailored to their scores, aiming to reduce symptom burden and improve health-related quality of life. In the present study, data on 26 generic symptoms (physical and psychosocial) were used. The PROM result for each symptom is presented to the user as a no, moderate, or high well-being risk score. Data of 1032 cancer survivors were analysed using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) on high risk scores and moderate-to-high risk scores separately. Results When analyzing the high risk scores, seven clusters were extracted: one main cluster containing the most frequently occurring physical and psychosocial symptoms, and six subclusters with different combinations of these symptoms. When analyzing moderate-to-high risk scores, three clusters were extracted: two main clusters that separated physical symptoms (and their consequences) from psychosocial symptoms, and one subcluster with only body weight issues. Conclusion There appears to be an inherent difference in the co-occurrence of symptoms depending on symptom severity. Among survivors with high risk scores, the data showed more connections between physical and psychosocial symptoms, clustered in separate subclusters. Among survivors with moderate-to-high risk scores, we observed fewer connections between physical and psychosocial symptoms in the clustering.
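The two clustering runs described in the abstract differ only in how the three-level PROM risk scores are binarized before HDBSCAN is applied. A sketch of that preprocessing step (the score labels are illustrative stand-ins for the application's actual categories):

```python
def binarize(scores, include_moderate=False):
    """Turn PROM risk labels into 0/1 features for clustering.
    include_moderate=False reproduces the high-risk-only analysis;
    include_moderate=True reproduces the moderate-to-high analysis."""
    positive = {"high"} | ({"moderate"} if include_moderate else set())
    return [1 if s in positive else 0 for s in scores]

# One survivor's scores across four of the 26 generic symptoms
survivor = ["no", "moderate", "high", "no"]
high_only = binarize(survivor)                                # [0, 0, 1, 0]
moderate_to_high = binarize(survivor, include_moderate=True)  # [0, 1, 1, 0]
```

Running a density-based clusterer on the two resulting binary matrices yields different co-occurrence structure, which is the severity-dependent difference the study reports.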


PLoS ONE ◽  
2019 ◽  
Vol 14 (6) ◽  
pp. e0217639 ◽  
Author(s):  
Jun Su Jung ◽  
Sung Jin Park ◽  
Eun Young Kim ◽  
Kyoung-Sae Na ◽  
Young Jae Kim ◽  
...  

2019 ◽  
Vol 491 (3) ◽  
pp. 3805-3819 ◽  
Author(s):  
K B Johnston ◽  
S M Caballero-Nieves ◽  
V Petit ◽  
A M Peter ◽  
R Haber

ABSTRACT Comprehensive observations of variable stars can include time domain photometry in a multitude of filters, spectroscopy, estimates of colour (e.g. U-B), etc. When the objective is to classify variable stars, traditional machine learning techniques distill these various representations (or views) into a single feature vector and attempt to discriminate among desired categories. In this work, we propose an alternative approach that inherently leverages multiple views of the same variable star. Our multiview metric learning framework enables robust characterization of star categories by directly learning to discriminate in a multifaceted feature space, thus eliminating the need to combine feature representations prior to fitting the machine learning model. We also demonstrate how to extend standard multiview learning, which employs multiple vectorized views, to the matrix-variate case, which allows novel variable star signature representations. The performance of our proposed methods is evaluated on the UCR Starlight and LINEAR data sets. Both the vector and matrix-variate versions of our multiview learning framework perform favourably, demonstrating the ability to discriminate among variable star categories.
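One way to read "learning to discriminate in a multifaceted feature space" is a distance that keeps one learned metric per view and sums the per-view contributions, rather than learning a single metric on a concatenated vector. A simplified NumPy sketch of that idea (an assumption for illustration, not the authors' exact formulation):

```python
import numpy as np

def multiview_distance(x_views, y_views, metrics):
    """Sum of squared Mahalanobis-style distances, one learned
    metric matrix M_v per view, instead of one metric on a
    concatenated feature vector."""
    total = 0.0
    for x, y, M in zip(x_views, y_views, metrics):
        d = x - y
        total += float(d @ M @ d)
    return total

# Two toy views (e.g. time-series features and colour indices)
x = [np.array([1.0, 0.0]), np.array([0.5])]
y = [np.array([0.0, 0.0]), np.array([0.0])]
metrics = [np.eye(2), np.array([[2.0]])]
dist = multiview_distance(x, y, metrics)  # 1.0 + 0.5 = 1.5
```

Because each `M_v` is learned on its own view, the colour view here can be weighted more heavily than the time-series view without any feature concatenation.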


2020 ◽  
Author(s):  
Patrick E McSharry ◽  
Charles Mutai ◽  
Innocent Ngaruye ◽  
Edouard Musabanganji

Abstract Aim: HIV prevention measures in sub-Saharan Africa are still short of attaining the UNAIDS 90-90-90 fast-track targets set in 2014. Identifying predictors of HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors as well as to predict persons at high risk of infection. Method: We applied six machine learning approaches to build models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 24 and 29 variables respectively, from four sub-Saharan African countries. We trained and validated the six algorithms on 80% of the data and tested on the remaining 20%, rotating around the left-out country. The algorithm with the best mean f1 score was retained and trained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. Results: The XGBoost algorithm significantly improved identification of HIV positivity over the other five algorithms, with mean f1 scores of 78.9% and 92.8% for males and females respectively. The eight most predictive features in both sexes were: age, relationship with family head, highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at first experience of sex, and wealth quintile. Model performance using these variables increased significantly compared to having all variables included. We found that five male and seven female individuals would require testing to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection. Conclusion: Our findings demonstrate the potential of the XGBoost algorithm, applied to socio-behavioural data, to identify HIV predictors and individuals at high risk of infection for targeted screening.
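The validation scheme described ("rotated around the left-out country") is leave-one-country-out cross-validation. A minimal sketch of the split logic (country codes and record fields are illustrative):

```python
def leave_one_country_out(records):
    """Yield (held_out, train, test) splits, holding out one country per fold."""
    countries = sorted({r["country"] for r in records})
    for held_out in countries:
        train = [r for r in records if r["country"] != held_out]
        test = [r for r in records if r["country"] == held_out]
        yield held_out, train, test

# Toy respondent records from three countries
data = [{"country": c, "hiv_status": i % 2}
        for i, c in enumerate(["RW", "RW", "ZM", "MW"])]
folds = list(leave_one_country_out(data))
```

Testing on a country never seen during training gives a more honest estimate of how the model transfers across national survey populations than a random 80/20 split would.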


2020 ◽  
Vol 634 ◽  
pp. A57 ◽  
Author(s):  
W. Dobbels ◽  
M. Baes ◽  
S. Viaene ◽  
S. Bianchi ◽  
J. I. Davies ◽  
...  

Context. Dust plays an important role in shaping a galaxy’s spectral energy distribution (SED). It absorbs ultraviolet (UV) to near-infrared radiation and re-emits this energy in the far-infrared (FIR). The FIR is essential to understand dust in galaxies. However, deep FIR observations require a space mission, and no such mission is active today. Aims. We aim to infer the FIR emission across the six Herschel bands, along with the dust luminosity, mass, and effective temperature, based on the available UV to mid-infrared (MIR) observations. We also want to estimate the uncertainties of these predictions, compare our method to energy balance SED fitting, and determine possible limitations of the model. Methods. We propose a machine learning framework to predict the FIR fluxes from 14 UV–MIR broadband fluxes. We used a low-redshift sample by combining DustPedia and H-ATLAS, and extracted Bayesian flux posteriors through SED fitting. We trained shallow neural networks to predict the far-infrared fluxes, uncertainties, and dust properties. We evaluated them on a test set using the root mean square error (RMSE) in log-space. Results. Our results (RMSE = 0.19 dex) significantly outperform UV–MIR energy balance SED fitting (RMSE = 0.38 dex), and are inherently unbiased. We can identify when the predictions are off, for example when the input has large uncertainties on WISE 22 μm, or when the input does not resemble the training set. Conclusions. The galaxies for which we have UV–FIR observations can be used as a blueprint for galaxies that lack FIR data. This results in a “virtual FIR telescope”, which can be applied to large optical-MIR galaxy samples. This helps bridge the gap until the next FIR mission.
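The log-space RMSE quoted above (0.19 dex) is simply the root mean square of the log10 flux residuals. A NumPy sketch with invented fluxes:

```python
import numpy as np

def rmse_dex(flux_true, flux_pred):
    """Root mean square error of log10 fluxes, in dex."""
    resid = np.log10(flux_pred) - np.log10(flux_true)
    return float(np.sqrt(np.mean(resid ** 2)))

true = np.array([1.0, 10.0, 100.0])  # toy FIR fluxes
pred = true * 10 ** 0.19             # every flux off by +0.19 dex
err = rmse_dex(true, pred)           # ≈ 0.19
```

Working in dex makes the error multiplicative: an RMSE of 0.19 dex corresponds to predicted fluxes typically within a factor of about 1.5 of the true values.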

