Boosting Poisson regression models with telematics car driving data

Machine Learning ◽

10.1007/s10994-021-05957-0 ◽

2021 ◽

Author(s):

Guangyuan Gao ◽

He Wang ◽

Mario V. Wüthrich

Keyword(s):

Regression Models ◽

Data Driven ◽

Insurance Companies ◽

Sources Of Information ◽

Claim Frequency ◽

One Step ◽

Actuarial Risk ◽

Network Approaches ◽

To Receive ◽

Car Driving

Abstract: With the emergence of telematics car driving data, insurance companies have started to boost classical actuarial regression models for claim frequency prediction with telematics car driving information. In this paper, we propose two data-driven neural network approaches that process telematics car driving data to complement classical actuarial pricing with a driving behavior risk factor from telematics data. Our neural networks simultaneously accommodate feature engineering and regression modeling which allows us to integrate telematics car driving data in a one-step approach into the claim frequency regression models. We conclude from our numerical analysis that both classical actuarial risk factors and telematics car driving data are necessary to receive the best predictive models. This emphasizes that these two sources of information interact and complement each other.
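As a rough illustration of the classical starting point mentioned in this abstract, the sketch below fits a Poisson claim frequency regression with an exposure offset by Newton's method. It is not the authors' neural network approach. The single covariate `score` (a stand-in for a driving-behavior risk factor), the toy portfolio, and the function name `fit_poisson_glm` are all assumptions made for illustration.

```python
import math

def fit_poisson_glm(x, exposure, claims, n_iter=25):
    """Fit log(mu_i) = log(exposure_i) + b0 + b1 * x_i by Newton's method."""
    # Start from the portfolio-wide claim frequency with no covariate effect.
    b0 = math.log(sum(claims) / sum(exposure))
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ei, yi in zip(x, exposure, claims):
            mu = ei * math.exp(b0 + b1 * xi)  # expected claim count
            g0 += yi - mu                     # score with respect to b0
            g1 += (yi - mu) * xi              # score with respect to b1
            h00 += mu                         # Fisher information entries
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy portfolio (not data from the paper).
score = [0.0, 0.5, 1.0, 1.5, 2.0]        # illustrative driving-behavior score
exposure = [1.0, 0.5, 2.0, 1.0, 0.75]    # policy-years at risk
# Noise-free expected counts generated from b0 = -2.0, b1 = 0.5.
claims = [e * math.exp(-2.0 + 0.5 * s) for s, e in zip(score, exposure)]

b0, b1 = fit_poisson_glm(score, exposure, claims)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Because the toy claims are generated without noise, the fit should recover the generating coefficients. With real claim counts the estimates would carry sampling error.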

COVID-19 Brings Data Equity Challenges to the Fore

Digital Government: Research and Practice (DGOV) ◽

10.1145/3440889 ◽

2021 ◽

Author(s):

H.V. Jagadish ◽

Julia Stoyanovich ◽

Bill Howe

Keyword(s):

Quality Control ◽

Government Policy ◽

Data Driven ◽

Contact Tracing ◽

Control Mechanisms ◽

Sources Of Information ◽

Decision Systems ◽

Physical Resources ◽

Multiple Levels ◽

Data Driven Decisions

The COVID-19 pandemic is compelling us to make crucial data-driven decisions quickly, bringing together diverse and unreliable sources of information without the usual quality control mechanisms we may employ. These decisions are consequential at multiple levels: they can inform local, state and national government policy, be used to schedule access to physical resources such as elevators and workspaces within an organization, and inform contact tracing and quarantine actions for individuals. In all these cases, significant inequities are likely to arise, and to be propagated and reinforced by data-driven decision systems. In this article, we propose a framework, called FIDES, for surfacing and reasoning about data equity in these systems.

Research on a novel data-driven aging estimation method for battery systems in real-world electric vehicles

Advances in Mechanical Engineering ◽

10.1177/16878140211027735 ◽

2021 ◽

Vol 13 (7) ◽

pp. 168781402110277

Author(s):

Yankai Hou ◽

Zhaosheng Zhang ◽

Peng Liu ◽

Chunbao Song ◽

Zhenpo Wang

Keyword(s):

Electric Vehicles ◽

Real World ◽

Regression Models ◽

Estimation Method ◽

Recursive Least Squares ◽

Data Driven ◽

Accurate Estimation ◽

Support Vector ◽

Battery Degradation ◽

Operational Data

Accurate estimation of the degree of battery aging is essential to ensure safe operation of electric vehicles. In this paper, using real-world vehicles and their operational data, a battery aging estimation method is proposed based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models. The DPEC model and the forgetting factor recursive least-squares method are used to determine the battery system’s ohmic internal resistance, with outliers being filtered using boxplots. Furthermore, eight common data-driven models are used to describe the relationship between battery degradation and the factors influencing this degradation, and these models are analyzed and compared in terms of both estimation accuracy and computational requirements. The results show that the gradient descent tree regression, XGBoost regression, and LightGBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as alternative groups because of their relative instability. The linear regression, support vector machine regression, and k-nearest neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.
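The forgetting factor recursive least-squares step named in this abstract can be sketched as follows. This is a minimal illustration under a strong simplifying assumption: it uses a plain internal-resistance model, V = OCV - R * I, rather than the dual-polarization circuit of the paper. The function name `rls_update`, the forgetting factor 0.98, and the synthetic voltage and current values are hypothetical.

```python
def rls_update(theta, P, x, y, lam=0.98):
    """One recursive least-squares step with forgetting factor lam.

    theta: current parameter estimates, P: covariance matrix (list of lists),
    x: regressor vector, y: new measurement.
    """
    n = len(theta)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [Px[i] / denom for i in range(n)]
    error = y - sum(x[i] * theta[i] for i in range(n))  # prediction error
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so x^T P equals (P x)^T; dividing by lam discounts old data.
    P_new = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new

# Synthetic, noise-free measurements from assumed values (volts, ohms).
ocv_true, r_true = 3.7, 0.05
theta = [0.0, 0.0]                        # estimates of [OCV, R]
P = [[1000.0, 0.0], [0.0, 1000.0]]        # large initial uncertainty
for k in range(200):
    current = 1.0 + (k % 5)               # amps, varied so both parameters are identifiable
    voltage = ocv_true - r_true * current
    theta, P = rls_update(theta, P, [1.0, -current], voltage)

ocv_est, r_est = theta
print(f"OCV estimate = {ocv_est:.4f} V, R estimate = {r_est:.4f} ohm")
```

A forgetting factor below 1 lets the estimate track slow drift in the resistance, at the cost of more sensitivity to measurement noise.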

A DATA-DRIVEN APPROACH TO USER-EXPERIENCE-FOCUSED MODEL-BASED ROADMAPPING FOR NEW PRODUCT PLANNING

Proceedings of the Design Society ◽

10.1017/pds.2021.7 ◽

2021 ◽

Vol 1 ◽

pp. 61-70

Author(s):

Ilia Iuskevich ◽

Andreas-Makoto Hein ◽

Kahina Amokrane-Ferka ◽

Abdelkrim Doufene ◽

Marija Jankovic

Keyword(s):

New Product Development ◽

User Experience ◽

Planning Process ◽

New Product ◽

Data Driven ◽

User Needs ◽

User Testing ◽

Model Based ◽

Data Driven Approach ◽

To Receive

Abstract: A user experience (UX) focused business needs to survive and plan its new product development (NPD) activities in a highly turbulent environment. The latter is a function of volatile UX and technology trends, competition, unpredictable events, and user needs uncertainty. To address this problem, the concept of design roadmapping has been proposed in the literature. It was argued that tools built on the idea of design roadmapping have to be very flexible and data-driven (i.e., be able to receive feedback from users in an iterative manner). At the same time, a model-based approach to roadmapping has emerged, promising to achieve such flexibility. In this work, we propose to incorporate design roadmapping into model-based roadmapping and integrate it with various user testing approaches into a single tool to support a flexible data-driven NPD planning process.

HISTORICAL RETE NETWORKS TO SUPPORT THE DEBUGGING OF FORWARD-CHAINING RULE-BASED PROGRAMS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213093000059 ◽

1993 ◽

Vol 02 (01) ◽

pp. 47-70

Author(s):

SHARON M. TUTTLE ◽

CHRISTOPH F. EICK

Keyword(s):

Working Memory ◽

Data Driven ◽

Rule Based ◽

Historical Information ◽

Forward Chaining ◽

Changing Environments ◽

Time Performance ◽

Inference Network ◽

One Step ◽

Explanation System

Forward-chaining rule-based programs, being data-driven, can function in changing environments in which backward-chaining rule-based programs would have problems. But debugging forward-chaining programs can be tedious; to debug a forward-chaining rule-based program, certain ‘historical’ information about the program run is needed. Programmers should be able to directly request such information, instead of having to rerun the program one step at a time or search a trace of run details. As a first step in designing an explanation system for answering such questions, this paper discusses how a forward-chaining program run’s ‘historical’ details can be stored in its Rete inference network, used to match rule conditions to working memory. This can be done without seriously affecting the network’s run-time performance. We call this generalization of the Rete network a historical Rete network. Various algorithms for maintaining this network are discussed, along with how it can be used during debugging, and a debugging tool, MIRO, that incorporates these techniques is also discussed.

V The Verification of Technical Reserves in Non-Life Insurance

Journal of the Staple Inn Actuarial Society ◽

10.1017/s0020269x00008549 ◽

1971 ◽

Vol 20 (01) ◽

pp. 51-53

Author(s):

C. M. Stewart

Keyword(s):

Life Insurance ◽

Risk Groups ◽

Insurance Companies ◽

Life Assurance ◽

Careful Scrutiny ◽

Sufficient Detail ◽

Satisfactory Method ◽

Claim Frequency ◽

The Government ◽

Do So

The reader of this note will know well the method used in the U.K. for the verification of technical reserves (i.e. the net liability) in life assurance. The net liability must be calculated by a qualified actuary and the methods and bases used must be described in sufficient detail in Schedule 4 of The Insurance Companies (Accounts and Forms) Regulations 1968 for their suitability to be apparent from a careful scrutiny of these and the other financial statistics submitted in accordance with the Regulations. As the data are made public, this scrutiny can be made not only by the Government Actuary in advising the supervisory authorities at the Department of Trade and Industry, but also by any other qualified actuary who cares to do so, which is an equally important discipline. Under this system, the maximum freedom can be allowed to the company and its actuary, but there has hitherto been no equally satisfactory method available for the objective scrutiny of non-life technical reserves. However, the new Claim Frequency Analyses and Claim Settlement Analyses prescribed in Parts II and III of Schedule 3 to the 1968 Regulations should go a long way towards remedying this deficiency. These analyses are to be supplied separately for each class of insurance in each of a company's main markets, and separately for such risk groups within each class as the company decides to be appropriate.

NeMoR: a New Method Based on Data-Driven for Neonatal Mortality Rate Forecasting

10.1101/2021.04.22.21255916 ◽

2021 ◽

Author(s):

Carlos Eduardo Beluzo ◽

Luciana Correia Alves ◽

Natália Martins Arruda ◽

Cátia Sepetauskas ◽

Everton Silva ◽

...

Keyword(s):

Public Health ◽

Machine Learning ◽

Neonatal Mortality ◽

Regression Models ◽

Mortality Rates ◽

Data Driven ◽

Health Policies ◽

Neonatal Mortality Rate ◽

Policy Makers ◽

Public Health Policies

Abstract: Reduction in child mortality is one of the United Nations Sustainable Development Goals for 2030. In Brazil, despite a reduction in child mortality in recent decades, neonatal mortality is a persistent problem, and it is associated with the quality of prenatal and childbirth care and with social-environmental factors. In a proper health system, the effect of some of these factors could be minimized by an appropriate number of newborn intensive care units, health care units, and neonatal incubators, and even by an adequate level of instruction of mothers, which can lead to proper care throughout the prenatal period. With the intent of providing knowledge resources for planning public health policies focused on neonatal mortality reduction, we propose a new data-driven machine learning method for Neonatal Mortality Rate (NMR) forecasting called NeMoR, which predicts neonatal mortality rates 4 months ahead using NeoDeathForecast, a monthly time series dataset composed of these factors and of the neonatal mortality rate history (2006-2016), with 57,816 samples covering all 438 Brazilian administrative health regions. To build the model, Extra-Tree, XGBoost Regressor, Gradient Boosting Regressor and Lasso machine learning regression models were evaluated, and a hyperparameter search was performed as a fine-tuning step. The method has been validated using São Paulo city data, mainly because of data quality. In the best configuration, the method predicted the neonatal mortality rates with a Mean Square Error lower than 0.18. In addition, the forecast results may be useful because they give policy makers a way to anticipate trends in neonatal mortality rate curves, an important resource for planning public health policies.

Highlights: a new data-driven approach for neonatal mortality rate forecasting, which gives policy-makers a way to anticipate trends in neonatal mortality rate curves and makes better planning of health policies focused on NMR reduction possible; a method for NMR forecasting with an MSE lower than 0.18; an extensive evaluation of different Machine Learning (ML) regression models, as well as a hyperparameter search, which accounts for the last stage in NeMoR; a new time series database for NMR prediction problems; a new feature projection space for NMR forecasting problems, which considerably reduces errors in NMR prediction.

Regional regression models of percentile flows for the contiguous US: Expert versus data-driven independent variable selection

10.5194/hess-2016-639 ◽

2016 ◽

Author(s):

Geoffrey Fouad ◽

André Skupin ◽

Christina L. Tague

Keyword(s):

Regression Model ◽

Regression Models ◽

Predictive Performance ◽

Data Driven ◽

Mean Annual Precipitation ◽

Expert Assessment ◽

Independent Variables ◽

Regional Regression ◽

Data Driven Approach ◽

Small Set

Abstract. Percentile flows are statistics derived from the flow duration curve (FDC) that describe the flow equaled or exceeded for a given percent of time. These statistics provide important information for managing rivers, but are often unavailable since most basins are ungauged. A common approach for predicting percentile flows is to deploy regional regression models based on gauged percentile flows and related independent variables derived from physical and climatic data. The first step of this process identifies groups of basins through a cluster analysis of the independent variables, followed by the development of a regression model for each group. This entire process hinges on the independent variables selected to summarize the physical and climatic state of basins. Distributed physical and climatic datasets now exist for the contiguous United States (US). However, it remains unclear how to best represent these data for the development of regional regression models. The study presented here developed regional regression models for the contiguous US, and evaluated the effect of different approaches for selecting the initial set of independent variables on the predictive performance of the regional regression models. An expert assessment of the dominant controls on the FDC was used to identify a small set of independent variables likely related to percentile flows. A data-driven approach was also applied to evaluate two larger sets of variables that consist of either (1) the averages of data for each basin or (2) both the averages and statistical distribution of basin data distributed in space and time. The small set of variables from the expert assessment of the FDC and two larger sets of variables for the data-driven approach were each applied for a regional regression procedure. Differences in predictive performance were evaluated using 184 validation basins withheld from regression model development. 
The small set of independent variables selected through expert assessment produced similar, if not better, performance than the two larger sets of variables. A parsimonious set of variables only consisted of mean annual precipitation, potential evapotranspiration, and baseflow index. Additional variables in the two larger sets of variables added little to no predictive information. Regional regression models based on the parsimonious set of variables were developed using 734 calibration basins, and were converted into a tool for predicting 13 percentile flows in the contiguous US. Supplementary Material for this paper includes an R graphical user interface for predicting the percentile flows of basins within the range of conditions used to calibrate the regression models. The equations and performance statistics of the models are also supplied in tabular form.
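The definition of a percentile flow given at the start of this abstract (the flow equaled or exceeded for a given percent of time) can be illustrated with a short sketch. The function name `percentile_flow`, the linear interpolation between ranked flows, and the synthetic flow record are assumptions. Other plotting-position conventions exist, and the paper's own procedure is not specified here.

```python
def percentile_flow(flows, exceedance_pct):
    """Return the flow equaled or exceeded `exceedance_pct` percent of the time.

    Uses linear interpolation between ranked flows (one of several conventions).
    """
    if not flows:
        raise ValueError("flows must be non-empty")
    if not 0 <= exceedance_pct <= 100:
        raise ValueError("exceedance_pct must be between 0 and 100")
    ordered = sorted(flows)  # ascending
    # A flow exceeded p% of the time is the (100 - p)th percentile of the record.
    pos = (100.0 - exceedance_pct) / 100.0 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

# Hypothetical daily flow record: 101, 100, ..., 1 (arbitrary units).
daily_flows = [float(q) for q in range(101, 0, -1)]

q10 = percentile_flow(daily_flows, 10)   # high flow, rarely exceeded
q50 = percentile_flow(daily_flows, 50)   # median flow
q90 = percentile_flow(daily_flows, 90)   # low flow, exceeded most of the time
print(q10, q50, q90)
```

Evaluating the function at a series of exceedance percentages traces out the flow duration curve for a gauged basin. Regional regression is needed precisely where no such record exists.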

Remaining Useful Life Prediction of the Concrete Piston Based on Probability Statistics and Data Driven

Applied Sciences ◽

10.3390/app11188482 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8482

Author(s):

Jie Li ◽

Yuejin Tan ◽

Bingfeng Ge ◽

Hua Zhao ◽

Xin Lu

Keyword(s):

Inventory Management ◽

Regression Models ◽

Life Prediction ◽

Remaining Useful Life ◽

Data Driven ◽

Distribution Fitting ◽

Useful Life ◽

Concrete Pump Truck ◽

Concrete Pump ◽

Actual Life

This paper proposes a method for predicting the remaining useful life (RUL) of a concrete piston of a concrete pump truck based on probability statistics and data-driven approaches. Firstly, the average useful life of the concrete piston is determined by probability distribution fitting using actual life data. Secondly, according to condition monitoring data of the concrete pump truck, a concept of life coefficient of the concrete piston is proposed to represent the influence of the loading condition on the actual useful life of individual concrete pistons, and different regression models are established to predict the RUL of the concrete pistons. Finally, according to the prediction result of the concrete piston at different life stages, a replacement warning point is established to provide support for the inventory management and replacement plan of the concrete piston.

Assessment of groundwater nitrate contamination hazard in a semi-arid region by using integrated parametric IPNOA and data-driven logistic regression models

Environmental Monitoring and Assessment ◽

10.1007/s10661-018-7013-8 ◽

2018 ◽

Vol 190 (11) ◽

Cited By ~ 17

Author(s):

Hossein Mojaddadi Rizeei ◽

Omer Saud Azeez ◽

Biswajeet Pradhan ◽

Hayder Hassan Khamees

Keyword(s):

Logistic Regression ◽

Arid Region ◽

Regression Models ◽

Nitrate Contamination ◽

Data Driven ◽

Logistic Regression Models ◽

Semi Arid Region ◽

Groundwater Nitrate Contamination ◽

Semi Arid

Abstract P258: Factors Associated With Imaging Choice of Acute Ischemic Stroke Patients and the Related Health Outcomes

Circulation ◽

10.1161/circ.141.suppl_1.p258 ◽

2020 ◽

Vol 141 (Suppl_1) ◽

Author(s):

Jason J Wang ◽

Artem Boltyenkov ◽

Gabriela Martinez ◽

Jeffrey M Katz ◽

Angela Hoang ◽

...

Keyword(s):

Ischemic Stroke ◽

Health Outcomes ◽

Acute Ischemic Stroke ◽

Regression Models ◽

Critical Role ◽

Clear Understanding ◽

Negatively Associated ◽

Factors Associated ◽

To Receive ◽

Iv Tpa

Introduction: Acute ischemic stroke (AIS) presents an ongoing challenge for population health and availability of healthcare resources. Imaging plays a critical role in both diagnosis and treatment decisions in AIS, but optimal utilization of advanced imaging with angiography and perfusion using either CTAP or MRAP remains uncertain according to national guidelines. Consequently, wide variation in AIS imaging exists in clinical practice, mostly defaulting to physician preferences and institutional factors, without a clear understanding of the benefits and risks involved in stroke care. Although CTAP and MRAP each have unique benefits and risks in the AIS setting, the effect of this risk-benefit tradeoff on health outcomes and utilization of resources is unknown. This study analyses the factors associated with imaging preferences and the related health outcomes. Method: We performed a retrospective study on an AIS registry consisting of consecutive patients admitted to our institution from November 1, 2011, through October 1, 2018. Imaging and treatment selections and modified Rankin Score (mRS) at discharge were the main outcomes. Independent variables include age, gender, race-ethnicity, and NIH stroke score (NIHSS) at admission. Multivariable logistic regression models were performed. P<0.05 was considered statistically significant. Results: 1884 patients with curated imaging data during hospitalization were included. Among them, 32% were ≥80 years old, 47.4% female, 15.53% black, 60.3% white, and 24.4% with NIHSS≥10 at admission. CTAP and MRAP were performed in 21.1% and 72.2% patients, respectively. 46.1% received thrombolytics (IV-tPA), 1.3% had endovascular therapy (EVT), and 52.7% were not treated. The two clinical outcomes were independent functionality at discharge (mRS0-2) at 48.4%, and patients expired in hospital at 7.1%. 
After adjusting for all the factors, regression models showed that patients with NIHSS≥10 were more likely to receive CTAP (p<0.0001, OR=3.39) and less likely to receive MRAP (p<0.0001, OR=0.48), whereas patients aged ≥80 were less likely to receive CTAP (p<0.0001, OR=0.37) or MRAP (p<0.0001, OR=0.37). NIHSS≥10 (p<0.0001, OR=0.15) and IV-tPA (p=0.0006, OR=0.69) were negatively related to independent functionality at discharge, and MRAP (p<0.0001, OR=1.97) was positively related to it. NIHSS≥10 (p=0.0212, OR=1.69) was positively related to mortality, while utilization of MRAP showed a negative relationship (p<0.0001, OR=0.26) with it. Conclusion: Higher NIHSS was positively associated with mortality and utilization of CTAP, while it was negatively associated with MRAP. MRAP was positively related to independent functionality at discharge. Older age was negatively associated with CTAP or MRAP utilization.
