Effective resolving of false-alarms in subsequence matching of time-series data: a method and its performance results

Author(s):  
Sang-Wook Kim ◽  
Se-Bong Oh
Author(s):  
Noura Alghamdi ◽  
Liang Zhang ◽  
Huayi Zhang ◽  
Elke A. Rundensteiner ◽  
Mohamed Y. Eltabakh

Author(s):  
Huanmei Wu ◽  
Betty Salzberg ◽  
Gregory C Sharp ◽  
Steve B Jiang ◽  
Hiroki Shirato ◽  
...  

1996 ◽  
Vol 24 (3) ◽  
pp. 247-261 ◽  
Author(s):  
Ian A. James ◽  
Paul S. Smith ◽  
Derek Milne

Visual analysis, or “eyeballing”, of single-subject (N=1) data is the commonest technique for analysing time series data. The present study examined, first, psychologists' abilities to detect significant change between baseline (A) and therapeutic (B) phases and, second, the decision-making process in relation to the visual components of such graphs. Third, it examined the effect of a training programme on psychologists' abilities to identify significant A−B change. The results revealed that participants were poor at distinguishing significant effects from non-significant changes. In particular, the study found a high rate of false alarms (Type I errors) and a low rate of misses (Type II errors), i.e. high sensitivity but poor specificity. The only visual components that significantly altered decisions were the degree of serial dependency and the mean-shift component. The teaching influenced participants' judgements: in general, participants became more conservative, but there was limited evidence of a significant improvement in their judgements following the teaching.
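The sensitivity/specificity framing above can be made concrete with a short sketch. The counts below are invented to match the reported pattern (many false alarms, few misses), not data from the study.

```python
def signal_detection_rates(hits, misses, false_alarms, correct_rejections):
    """Sensitivity and specificity from raw judgment counts."""
    sensitivity = hits / (hits + misses)                  # 1 - miss (Type II error) rate
    specificity = correct_rejections / (correct_rejections + false_alarms)  # 1 - false-alarm (Type I error) rate
    return sensitivity, specificity

# Hypothetical counts matching the reported pattern: few misses, many false alarms
sens, spec = signal_detection_rates(hits=45, misses=5, false_alarms=30, correct_rejections=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # → sensitivity=0.90, specificity=0.40
```

High sensitivity with low specificity is exactly the "over-calling change" pattern the study reports.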


Author(s):  
Sura Rodpongpun ◽  
Vit Niennattrakul ◽  
Chotirat Ann Ratanamahatana

Many algorithms have been proposed to deal with the subsequence similarity search problem in time series data streams. Dynamic Time Warping (DTW), widely accepted as the best distance measure for time series similarity search, has been used in many research works. SPRING and its variants were proposed to solve this problem by mitigating the complexity of DTW. Unfortunately, these algorithms produce meaningless results, since no normalization is applied before the distance calculation. Recently, GPUs and FPGAs have been used to accelerate similarity search with subsequence normalization, but such approaches remain far from practical use. In this work, we propose a novel Meaningful Subsequence Matching (MSM) algorithm, which produces meaningful subsequence matching results by considering a global constraint, uniform scaling, and normalization. Our method significantly outperforms the existing algorithms in terms of both computational cost and accuracy.
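Two of the ingredients named above, z-normalization and a global constraint, can be sketched in a few lines: normalize each subsequence so matching is invariant to offset and amplitude, then compute DTW restricted to a Sakoe-Chiba band. This is a minimal illustration, not the MSM algorithm itself; the example sequences are invented.

```python
import math

def znorm(s):
    """Z-normalize so matching ignores offset and amplitude differences."""
    m = sum(s) / len(s)
    sd = math.sqrt(sum((x - m) ** 2 for x in s) / len(s)) or 1.0
    return [(x - m) / sd for x in s]

def dtw(a, b, window):
    """DTW distance under a Sakoe-Chiba band (global constraint) of half-width `window`."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

query = znorm([1.0, 2.0, 3.0, 2.0, 1.0])
candidate = znorm([10.0, 20.0, 30.0, 20.0, 10.0])  # same shape, 10x the amplitude
print(dtw(query, candidate, window=2))  # ≈ 0: normalization makes the shapes comparable
```

Without the `znorm` step, the raw DTW distance between these two sequences would be large even though their shapes are identical, which is the "meaningless result" problem the abstract describes.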


2016 ◽  
Vol 16 (12) ◽  
pp. 2603-2622
Author(s):  
Jun-Whan Lee ◽  
Sun-Cheon Park ◽  
Duk Kee Lee ◽  
Jong Ho Lee

Abstract. Timely detection of tsunamis from water level records is a critical but logistically challenging task because of outliers and gaps. Since tsunami detection algorithms require several hours of past data, outliers can cause false alarms, and gaps can stall a detection algorithm even after recording is restarted. To avoid such false alarms and time delays, we propose the Tsunami Arrival time Detection System (TADS), which can be applied to discontinuous time series data with outliers. TADS consists of three algorithms: outlier removal, gap filling, and tsunami detection, each designed to update whenever new data are acquired. After calibrating the thresholds and parameters for the Ulleung-do surge gauge located in the East Sea (Sea of Japan), Korea, we evaluated the performance of TADS on a 1-year dataset containing historical and synthetic tsunamis. The results show that TADS is effective in detecting a tsunami signal superimposed on both outliers and gaps.
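The first two stages, outlier removal and gap filling updated as each sample arrives, can be illustrated with a toy streaming cleaner that compares each new sample against the recent median. This is a generic sketch under invented thresholds and data, not the TADS algorithms.

```python
from statistics import median

def clean_point(history, value, threshold):
    """Replace outliers and fill gaps using the median of recent cleaned samples."""
    if value is None:                          # gap: fill with the recent median
        return median(history)
    if abs(value - median(history)) > threshold:
        return median(history)                 # outlier: replace rather than break detection
    return value

stream = [0.1, 0.2, 0.1, 9.9, None, 0.2]       # a spike outlier and a recording gap
history = [0.1, 0.2, 0.1]                      # a few already-cleaned past samples
cleaned = []
for v in stream:
    c = clean_point(history[-10:], v, threshold=1.0)
    cleaned.append(c)
    history.append(c)
print(cleaned)  # the 9.9 spike and the gap are both replaced by typical values
```

Replacing, rather than merely flagging, bad samples is what keeps a window-based detector running through gaps, which is the failure mode the abstract highlights.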


2019 ◽  
Author(s):  
David Grethlein ◽  
Flaura Koplin Winston ◽  
Elizabeth Walshe ◽  
Sean Tanner ◽  
Venk Kandadai ◽  
...  

BACKGROUND A large Midwestern state commissioned a virtual driving test (VDT) to assess driving skills preparedness before the on-road examination (ORE). Since July 2017, a pilot deployment of the VDT in state licensing centers (VDT pilot) has collected both VDT and ORE data from new license applicants with the aim of creating a scoring algorithm that could predict those who were underprepared. OBJECTIVE Leveraging data collected from the VDT pilot, this study aimed to develop and conduct an initial evaluation of a novel machine learning (ML)–based classifier using limited domain knowledge and minimal feature engineering to reliably predict applicant pass/fail on the ORE. Such methods, if proven useful, could be applicable to the classification of other time series data collected within medical and other settings. METHODS We analyzed an initial dataset of 4308 drivers who completed both the VDT and the ORE, of whom 1096 (25.4%) went on to fail the ORE. We studied 2 different approaches to constructing feature sets to use as input to ML algorithms: the standard method of reducing the time series data to a set of manually defined variables that summarize driving behavior, and a novel approach using time series clustering. We then fed these representations into different ML algorithms to compare their ability to predict a driver's ORE outcome (pass/fail). RESULTS The new method using time series clustering performed similarly to the standard method in terms of overall accuracy for predicting pass or fail (76.1% vs 76.2%) and area under the curve (0.656 vs 0.682). However, time series clustering slightly outperformed the standard method in differentially predicting failure on the ORE: the clustering method yielded a risk ratio for failure of 3.07 (95% CI 2.75-3.43), whereas the standard variables method yielded 2.68 (95% CI 2.41-2.99).
In addition, the time series clustering method with logistic regression produced the lowest ratio of false alarms (drivers predicted to fail who went on to pass the ORE; 27.2%). CONCLUSIONS Our results provide initial evidence that the clustering method is useful for feature construction in classification tasks involving time series data when resources are limited for creating multiple domain-relevant variables.
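The clustering-as-feature-construction idea can be sketched in miniature: cluster short, equal-length traces and treat the cluster id as a categorical input to a downstream classifier. The toy k-means below, the invented speed traces, and the naive centroid initialization are illustrative only, not the study's pipeline.

```python
def cluster_labels(series, centroids, iters=10):
    """Toy k-means on equal-length series; cluster ids become categorical features."""
    for _ in range(iters):
        # Assign each series to its nearest centroid (squared Euclidean distance)
        labels = [min(range(len(centroids)),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(s, centroids[c])))
                  for s in series]
        # Recompute each centroid as the pointwise mean of its members
        for c in range(len(centroids)):
            members = [s for s, l in zip(series, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels

# Hypothetical speed traces: steady driving vs. erratic speed changes
traces = [[30, 31, 30, 29], [30, 30, 31, 30], [10, 50, 5, 55], [12, 48, 8, 52]]
labels = cluster_labels(traces, centroids=[traces[0][:], traces[2][:]])
print(labels)  # → [0, 0, 1, 1]: the cluster id can then feed a pass/fail classifier
```

The appeal, as the abstract notes, is that no domain expert has to hand-define "erratic speed" as a variable; the clusters surface the behavioral groupings directly from the raw traces.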


10.2196/13995 ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. e13995
Author(s):  
David Grethlein ◽  
Flaura Koplin Winston ◽  
Elizabeth Walshe ◽  
Sean Tanner ◽  
Venk Kandadai ◽  
...  



2021 ◽  
Author(s):  
Michael Yi ◽  
Pradeepkumar Ashok ◽  
Dawson Ramos ◽  
Taylor Thetford ◽  
Spencer Bohlander ◽  
...  

Abstract Kick and lost circulation events are large contributors to non-productive time, so early detection of these events is crucial. In the absence of good flow-in and flow-out sensors, pit volume trends offer the best possibility for influx/loss detection, but errors occur because external mud addition to, and removal from, the pits is not monitored or sensed. The goal is to reduce false alarms caused by such mud additions and removals. Data from hundreds of wells in North America show that mud addition and removal produce certain unique pit volume gain/loss trends, and these trends are quite different from the trends of a kick, a lost circulation, or a wellbore breathing event. Additionally, drillers enter text memos into the data aggregation system (EDR), and these memos often provide information about pit operations. In this paper, we introduce a method that uses a Bayesian network to aggregate trends detected in time-series data with events identified by natural language processing (NLP) of driller memos, greatly improving the accuracy and robustness of kick and lost circulation detection. The methodology was implemented in software that is currently running on rigs in North America. During the test phase, we applied it to several historical wells with lost circulation events and several with kick events. We were able to identify and quantify losses even during connections and mud additions, where pit volume was usually increasing despite continual losses. The real-time analysis of driller memos also provides context for pit volume trends and further reduces false alarms. The algorithm also accounts for pit volume reduced by drilling itself. Quantification of the losses offers more insight into what lost circulation material to use and into changes in the rate of loss while drilling.
This approach was also very robust in discovering kicks and differentiating them from mud removal and wellbore breathing events. These historical case studies are detailed in the paper. This is the first time that patterns of mud volume addition and removal detected from time-series data have been combined with NLP of driller memos to reduce false alerts in kick and lost circulation detection. The approach is particularly useful for identifying kick and lost circulation events from pit volume data when good flow-in and flow-out sensors are not available. The paper provides guidance on how real-time sensor data can be combined with textual data to improve the outputs of an advisory system.
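The core fusion idea, letting a memo about pit operations "explain away" a pit-volume gain that would otherwise look like a kick, can be sketched as a naive Bayesian update over two binary observations. Every prior and likelihood value below is an invented placeholder, not the paper's calibrated network.

```python
def posterior_kick(p_prior, trend_detected, memo_mentions_mud_add,
                   p_trend_given_kick=0.9, p_trend_given_no=0.2,
                   p_memo_given_kick=0.05, p_memo_given_no=0.6):
    """Naive Bayes fusion: a pit-volume gain trend raises kick belief,
    while a driller memo about mud addition explains the gain away."""
    like_kick = ((p_trend_given_kick if trend_detected else 1 - p_trend_given_kick)
                 * (p_memo_given_kick if memo_mentions_mud_add else 1 - p_memo_given_kick))
    like_no = ((p_trend_given_no if trend_detected else 1 - p_trend_given_no)
               * (p_memo_given_no if memo_mentions_mud_add else 1 - p_memo_given_no))
    evidence = like_kick * p_prior + like_no * (1 - p_prior)
    return like_kick * p_prior / evidence

# The same pit-volume gain trend, with and without an "added mud to pits" memo
print(posterior_kick(0.05, trend_detected=True, memo_mentions_mud_add=False))  # belief rises sharply
print(posterior_kick(0.05, trend_detected=True, memo_mentions_mud_add=True))   # memo suppresses the alarm
```

The memo observation is what cuts the false alarms: the identical sensor trend yields a high kick posterior on its own but a near-prior posterior once the textual context is folded in.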


2013 ◽  
Author(s):  
Stephen J. Tueller ◽  
Richard A. Van Dorn ◽  
Georgiy Bobashev ◽  
Barry Eggleston

Author(s):  
Rizki Rahma Kusumadewi ◽  
Wahyu Widayat

The exchange rate is one tool for measuring a country's economic conditions: stable growth in a currency's value indicates that the country's economy is relatively good or stable. This study analyzes the factors that affected the exchange rate of the Indonesian Rupiah against the United States Dollar over the period 2000-2013. The data are secondary quarterly time series, comprising exports, imports, inflation, the BI rate, Gross Domestic Product (GDP), and the money supply (M1), from the first quarter of 2000 to the fourth quarter of 2013. The time series were modeled with ARCH-GARCH regression; the selected ARCH model indicates that the variables that significantly influence the exchange rate are exports, inflation, the central bank rate, and the money supply (M1), whereas imports and GDP had no significant influence.
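The ARCH mechanism behind the model choice can be illustrated with the conditional variance recursion of ARCH(1): a large residual, such as an exchange-rate shock, raises the next period's variance. The parameter values and residuals below are invented for the illustration, not the study's fitted model.

```python
def arch1_variance(residuals, omega, alpha):
    """ARCH(1) conditional variance: sigma2_t = omega + alpha * eps_{t-1}^2."""
    sigma2 = [omega / (1 - alpha)]        # start at the unconditional variance
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2)
    return sigma2

# Hypothetical quarterly exchange-rate residuals with one large shock in period 2
eps = [0.1, -0.2, 1.5, 0.3, -0.1]
print(arch1_variance(eps, omega=0.05, alpha=0.4))  # variance spikes right after the shock
```

This time-varying variance is why ARCH-GARCH models suit exchange-rate series, whose volatility clusters around shocks rather than staying constant as ordinary least squares would assume.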

