Simulator Pre-Screening of Underprepared Drivers Prior to Licensing On-Road Examination: Clustering of Virtual Driving Test Time Series Data (Preprint)

2019 ◽  
Author(s):  
David Grethlein ◽  
Flaura Koplin Winston ◽  
Elizabeth Walshe ◽  
Sean Tanner ◽  
Venk Kandadai ◽  
...  

BACKGROUND A large Midwestern state commissioned a virtual driving test (VDT) to assess driving skills preparedness before the on-road examination (ORE). Since July 2017, a pilot deployment of the VDT in state licensing centers (VDT pilot) has collected both VDT and ORE data from new license applicants with the aim of creating a scoring algorithm that could predict those who were underprepared. OBJECTIVE Leveraging data collected from the VDT pilot, this study aimed to develop and conduct an initial evaluation of a novel machine learning (ML)–based classifier using limited domain knowledge and minimal feature engineering to reliably predict applicant pass/fail on the ORE. Such methods, if proven useful, could be applicable to the classification of other time series data collected within medical and other settings. METHODS We analyzed an initial dataset that comprised 4308 drivers who completed both the VDT and the ORE, in which 1096 (25.4%) drivers went on to fail the ORE. We studied 2 different approaches to constructing feature sets to use as input to ML algorithms: the standard method of reducing the time series data to a set of manually defined variables that summarize driving behavior and a novel approach using time series clustering. We then fed these representations into different ML algorithms to compare their ability to predict a driver’s ORE outcome (pass/fail). RESULTS The new method using time series clustering performed similarly compared with the standard method in terms of overall accuracy for predicting pass or fail outcome (76.1% vs 76.2%) and area under the curve (0.656 vs 0.682). However, the time series clustering slightly outperformed the standard method in differentially predicting failure on the ORE. The novel clustering method yielded a risk ratio for failure of 3.07 (95% CI 2.75-3.43), whereas the standard variables method yielded a risk ratio for failure of 2.68 (95% CI 2.41-2.99). 
In addition, the time series clustering method with logistic regression produced the lowest ratio of false alarms (those who were predicted to fail but went on to pass the ORE; 27.2%). CONCLUSIONS Our results provide initial evidence that the clustering method is useful for feature construction in classification tasks involving time series data when resources are limited to create multiple, domain-relevant variables.
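The clustering-based feature construction the abstract describes can be sketched in outline. This is an illustration of the general technique, not the authors' exact pipeline: pool fixed-length subsequences of a driving signal across drivers, cluster them with k-means, and represent each driver by a normalized histogram of cluster memberships fed to a classifier. All window sizes, cluster counts, and the toy data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def windows(series, width=20, step=10):
    """Slice one driver's signal into fixed-length subsequences."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

# Toy data: 100 drivers, each a 200-sample trace, with pass/fail labels.
drivers = [rng.normal(size=200).cumsum() for _ in range(100)]
labels = rng.integers(0, 2, size=100)

# 1) Pool subsequences from all drivers and cluster them.
pool = np.array([w for s in drivers for w in windows(s)])
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pool)

# 2) Represent each driver as a normalized histogram of cluster memberships.
def featurize(series):
    counts = np.bincount(km.predict(np.array(windows(series))), minlength=8)
    return counts / counts.sum()

X = np.array([featurize(s) for s in drivers])

# 3) Feed the cluster-membership features to a classifier (pass/fail).
clf = LogisticRegression().fit(X, labels)
print(X.shape)  # (100, 8): one 8-bin histogram per driver
```

The appeal of this construction, as the abstract notes, is that the feature set requires no manually defined, domain-specific summary variables.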

10.2196/13995 ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. e13995


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ari Wibisono ◽  
Petrus Mursanto ◽  
Jihan Adibah ◽  
Wendy D. W. T. Bayu ◽  
May Iffah Rizki ◽  
...  

Abstract Real-time information mining of a big dataset consisting of time series data is a very challenging task. For this purpose, we propose using the mean distance and the standard deviation to enhance the accuracy of the existing fast incremental model tree with drift detection (FIMT-DD) algorithm. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion. We propose additionally using the mean distance and the standard deviation, which split the tree more accurately than the standard method. We verify our proposed method on the large Traffic Demand dataset, which consists of 4,000,000 instances; TenneT's big wind power plant dataset, which consists of 435,268 instances; and a road weather dataset, which consists of 30,000,000 instances. The results show that our proposed FIMT-DD algorithm improves accuracy compared with the standard method and the Chernoff bound approach. The measured errors demonstrate that our approach yields a lower Mean Absolute Percentage Error (MAPE) at every stage of learning, by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method.
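The Hoeffding bound that standard FIMT-DD uses as its splitting criterion has a simple closed form: with probability 1 − δ, the observed mean of n samples of a variable with range R lies within ε = √(R² ln(1/δ) / 2n) of the true mean. A minimal sketch of how it gates a split decision (function names and default parameters are illustrative, not the paper's implementation):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n samples of a
    variable with range `value_range` is within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range=1.0, delta=1e-7, n=200):
    """Split a leaf when the best candidate split beats the runner-up by more
    than the bound, i.e. the ranking is unlikely to change with more data."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# A clear winner triggers a split; a near-tie waits for more instances.
print(should_split(0.9, 0.3))  # True
print(should_split(0.5, 0.4))  # False
```

The paper's contribution, per the abstract, is to supplement this criterion with the mean distance and standard deviation of the arriving data; that extension is not sketched here.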


2014 ◽  
Vol 2014 ◽  
pp. 1-19 ◽  
Author(s):  
Seyedjamal Zolhavarieh ◽  
Saeed Aghabozorgi ◽  
Ying Wah Teh

Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of its useful application areas is pattern recognition, which operates on sequences of time series data. This paper reviews definitions and background related to subsequence time series clustering. The literature is categorized into three periods: preproof, interproof, and postproof. Various state-of-the-art approaches to subsequence time series clustering are discussed under each of these categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.
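Subsequence time series clustering, as surveyed above, starts by sliding a fixed-width window over a longer series to extract the subsequences that are then clustered. A minimal sketch of that extraction step (window width and step are illustrative parameters):

```python
import numpy as np

def subsequences(series, width, step=1):
    """All fixed-width subsequences of a 1-D series via a sliding window."""
    series = np.asarray(series)
    starts = range(0, len(series) - width + 1, step)
    return np.array([series[i:i + width] for i in starts])

# Width-3 windows over a length-5 series yield three overlapping subsequences:
subs = subsequences([1, 2, 3, 4, 5], width=3)
print(subs.shape)  # (3, 3): [1 2 3], [2 3 4], [3 4 5]
```

With step=1 adjacent windows overlap heavily, which is precisely what makes the downstream clustering problem subtle and, per the review, still open.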


2011 ◽  
Vol 38 (9) ◽  
pp. 11891-11900 ◽  
Author(s):  
Xiaohang Zhang ◽  
Jiaqi Liu ◽  
Yu Du ◽  
Tingjie Lv

Author(s):  
Pēteris Grabusts ◽  
Arkady Borisov

Clustering Methodology for Time Series Mining

A time series is a sequence of real-valued data representing measurements of a variable at time intervals. Time series analysis is a well-established task; in recent years, however, research has explored using clustering for the purposes of time series analysis. The main motivation for representing a time series in the form of clusters is to better capture the main characteristics of the data. The central goal of this paper was to investigate a clustering methodology for time series data mining, to explore time series similarity measures, and to use them in analyzing time series clustering results. More sophisticated similarity measures include the Longest Common Subsequence (LCSS) method. Two tasks were completed in this paper. The first was to define time series similarity measures; it was established that the LCSS method detects time series similarity better than the Euclidean distance. The second was to explore the classical k-means clustering algorithm for time series clustering. The experiment led to the conclusion that the results of time series clustering with the k-means algorithm correspond to those obtained with the LCSS method, and thus the clustering results for the specific time series are adequate.
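The LCSS measure mentioned above counts the longest run of approximately matching points shared by two series. A minimal dynamic-programming sketch, with the matching tolerance eps as an illustrative parameter (the paper's exact parameterization is not given in this summary):

```python
def lcss(a, b, eps=0.5):
    """Longest Common Subsequence similarity for real-valued series: two
    points match when they differ by at most eps. Returns the match count
    normalized by the shorter series' length (1.0 = fully similar)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / min(m, n)

# Two of three points match within eps, so similarity is 2/3.
print(lcss([1.0, 2.0, 3.0], [1.1, 2.2, 9.0]))  # 0.666...
```

Unlike the Euclidean distance, LCSS tolerates gaps, outliers, and series of unequal length, which is consistent with the paper's finding that it detects time series similarity better.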


Kybernetes ◽  
2019 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Hossein Abbasimehr ◽  
Mostafa Shabani

Purpose The purpose of this paper is to propose a new methodology that handles the issue of the dynamic behavior of customers over time. Design/methodology/approach A new methodology is presented based on time series clustering to extract dominant behavioral patterns of customers over time. This methodology is implemented using bank customers' transaction data, which are in the form of time series. The data comprise the recency (R), frequency (F), and monetary (M) attributes of businesses that use the bank's point-of-sale (POS) devices, and were obtained from the bank's data analysis department. Findings After an empirical study on the acquired transaction data of 2,531 business customers using the bank's POS devices, the dominant trends of behavior were discovered with the proposed methodology and analyzed from a marketing viewpoint. Based on the analysis of the monetary attribute, customers were divided into four main segments: high-value growing customers, middle-value growing customers, customers prone to churn, and churners. For each resulting group of customers with a distinctive trend, effective and practical marketing recommendations were devised to improve the bank's relationship with that group. The prone-to-churn segment contains most of the customers; therefore, the bank should offer attractive promotions to retain this segment. Practical implications The discovered trends of customer behavior and proposed marketing recommendations can help banks devise segment-specific marketing strategies, as they illustrate the dynamic behavior of customers over time. The obtained trends are visualized so that they can be easily interpreted and used by banks. This paper contributes to the literature on customer relationship management (CRM), as the proposed methodology can be applied to different businesses to reveal trends in customer behavior. 
Originality/value In the current business condition, customer behavior is changing continually over time and customers are churning due to the reduced switching costs. Therefore, choosing an effective customer segmentation methodology which can consider the dynamic behaviors of customers is essential for every business. This paper proposes a new methodology to capture customer dynamic behavior using time series clustering on time-ordered data. This is an improvement over previous studies, in which static segmentation approaches have often been adopted. To the best of the authors’ knowledge, this is the first study that combines the recency, frequency, and monetary model and time series clustering to reveal trends in customer behavior.
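The combination of RFM attributes with time series clustering can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' exact procedure: build, for each customer, a monthly series of the monetary attribute, z-normalize it so clustering groups series by trend shape rather than scale, and cluster with k-means; each centroid then represents a dominant behavioral trend (e.g. growing vs declining). The data here are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Toy data: for each customer, a 12-month series of the monetary (M)
# attribute, i.e. total transaction value per month, with an upward or
# downward trend plus noise.
n_customers, n_months = 200, 12
trend = np.linspace(0.0, 1.0, n_months)
monetary = np.vstack([
    rng.normal(scale=0.1, size=n_months) + rng.choice([1, -1]) * trend
    for _ in range(n_customers)
])

# Z-normalize each series so clustering compares trend shapes, not amounts.
z = (monetary - monetary.mean(axis=1, keepdims=True)) \
    / monetary.std(axis=1, keepdims=True)

# Cluster the normalized series; each centroid is a dominant trend.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(z)
segments = km.labels_
print(np.bincount(segments))  # customers per behavioral segment
```

Plain k-means with Euclidean distance on z-normalized series is a simplifying assumption; the abstract does not specify the paper's distance measure or clustering algorithm.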

