Effective resolving of false-alarms in subsequence matching of time-series data: a method and its performance results

Author(s):  
Sang-Wook Kim ◽  
Se-Bong Oh
Author(s):  
Noura Alghamdi ◽  
Liang Zhang ◽  
Huayi Zhang ◽  
Elke A. Rundensteiner ◽  
Mohamed Y. Eltabakh

Author(s):  
Huanmei Wu ◽  
Betty Salzberg ◽  
Gregory C Sharp ◽  
Steve B Jiang ◽  
Hiroki Shirato ◽  
...  

1996 ◽  
Vol 24 (3) ◽  
pp. 247-261 ◽  
Author(s):  
Ian A. James ◽  
Paul S. Smith ◽  
Derek Milne

Visual analysis, or “eyeballing”, of single-subject (N=1) data is the commonest technique for analysing time series data. The present study examined, first, psychologists' abilities to detect significant change between baseline (A) and therapeutic (B) phases and, second, the decision-making process in relation to the visual components of such graphs. Third, it examined the effect of a training programme on psychologists' abilities to identify significant A−B change. The results revealed that participants were poor at distinguishing significant effects from non-significant changes. In particular, the study found a high rate of false alarms (Type I errors) and a low rate of misses (Type II errors), i.e. high sensitivity but poor specificity. The only visual components that significantly altered decisions were the degree of serial dependency and the mean-shift component. The teaching influenced participants' judgements: in general, participants became more conservative, but there was limited evidence of a significant improvement in their judgements following the teaching.
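The sensitivity/specificity framing above can be made concrete with a short sketch. The counts below are invented to match the reported pattern (many false alarms, few misses), not data from the study.

```python
def signal_detection_rates(hits, misses, false_alarms, correct_rejections):
    """Sensitivity and specificity from raw judgment counts."""
    sensitivity = hits / (hits + misses)                  # 1 - miss (Type II error) rate
    specificity = correct_rejections / (correct_rejections + false_alarms)  # 1 - false-alarm (Type I error) rate
    return sensitivity, specificity

# Hypothetical counts matching the reported pattern: few misses, many false alarms
sens, spec = signal_detection_rates(hits=45, misses=5, false_alarms=30, correct_rejections=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # → sensitivity=0.90, specificity=0.40
```

High sensitivity with low specificity is exactly the "over-calling change" pattern the study reports.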


Author(s):  
Sura Rodpongpun ◽  
Vit Niennattrakul ◽  
Chotirat Ann Ratanamahatana

Many algorithms have been proposed to deal with the subsequence similarity search problem in time series data streams. Dynamic Time Warping (DTW), widely accepted as the best distance measure for time series similarity search, has been used in many research works. SPRING and its variants were proposed to solve this problem by mitigating the complexity of DTW. Unfortunately, these algorithms produce meaningless results, since no normalization is applied before the distance calculation. Recently, GPUs and FPGAs have been used to accelerate similarity search with subsequence normalization, but such approaches remain far from practical use. In this work, we propose a novel Meaningful Subsequence Matching (MSM) algorithm, which produces meaningful subsequence matching results by considering a global constraint, uniform scaling, and normalization. Our method significantly outperforms the existing algorithms in terms of both computational cost and accuracy.
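Two of the ingredients named above, z-normalization and a global constraint, can be sketched in a few lines: normalize each subsequence so matching is invariant to offset and amplitude, then compute DTW restricted to a Sakoe-Chiba band. This is a minimal illustration, not the MSM algorithm itself; the example sequences are invented.

```python
import math

def znorm(s):
    """Z-normalize so matching ignores offset and amplitude differences."""
    m = sum(s) / len(s)
    sd = math.sqrt(sum((x - m) ** 2 for x in s) / len(s)) or 1.0
    return [(x - m) / sd for x in s]

def dtw(a, b, window):
    """DTW distance under a Sakoe-Chiba band (global constraint) of half-width `window`."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

query = znorm([1.0, 2.0, 3.0, 2.0, 1.0])
candidate = znorm([10.0, 20.0, 30.0, 20.0, 10.0])  # same shape, 10x the amplitude
print(dtw(query, candidate, window=2))  # ≈ 0: normalization makes the shapes comparable
```

Without the `znorm` step, the raw DTW distance between these two sequences would be large even though their shapes are identical, which is the "meaningless result" problem the abstract describes.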


2016 ◽  
Vol 16 (12) ◽  
pp. 2603-2622
Author(s):  
Jun-Whan Lee ◽  
Sun-Cheon Park ◽  
Duk Kee Lee ◽  
Jong Ho Lee

Abstract. Timely detection of tsunamis from water level records is a critical but logistically challenging task because of outliers and gaps. Since tsunami detection algorithms require several hours of past data, outliers can cause false alarms, and gaps can stall a detection algorithm even after recording is restarted. To avoid such false alarms and time delays, we propose the Tsunami Arrival time Detection System (TADS), which can be applied to discontinuous time series data with outliers. TADS consists of three algorithms: outlier removal, gap filling, and tsunami detection, each designed to update whenever new data are acquired. After calibrating the thresholds and parameters for the Ulleung-do surge gauge located in the East Sea (Sea of Japan), Korea, we evaluated the performance of TADS on a 1-year dataset containing historical and synthetic tsunamis. The results show that TADS is effective in detecting a tsunami signal superimposed on both outliers and gaps.
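The first two stages, outlier removal and gap filling updated as each sample arrives, can be illustrated with a toy streaming cleaner that compares each new sample against the recent median. This is a generic sketch under invented thresholds and data, not the TADS algorithms.

```python
from statistics import median

def clean_point(history, value, threshold):
    """Replace outliers and fill gaps using the median of recent cleaned samples."""
    if value is None:                          # gap: fill with the recent median
        return median(history)
    if abs(value - median(history)) > threshold:
        return median(history)                 # outlier: replace rather than break detection
    return value

stream = [0.1, 0.2, 0.1, 9.9, None, 0.2]       # a spike outlier and a recording gap
history = [0.1, 0.2, 0.1]                      # a few already-cleaned past samples
cleaned = []
for v in stream:
    c = clean_point(history[-10:], v, threshold=1.0)
    cleaned.append(c)
    history.append(c)
print(cleaned)  # the 9.9 spike and the gap are both replaced by typical values
```

Replacing, rather than merely flagging, bad samples is what keeps a window-based detector running through gaps, which is the failure mode the abstract highlights.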


2019 ◽  
Author(s):  
David Grethlein ◽  
Flaura Koplin Winston ◽  
Elizabeth Walshe ◽  
Sean Tanner ◽  
Venk Kandadai ◽  
...  

BACKGROUND A large Midwestern state commissioned a virtual driving test (VDT) to assess driving skills preparedness before the on-road examination (ORE). Since July 2017, a pilot deployment of the VDT in state licensing centers (VDT pilot) has collected both VDT and ORE data from new license applicants with the aim of creating a scoring algorithm that could predict those who were underprepared. OBJECTIVE Leveraging data collected from the VDT pilot, this study aimed to develop and conduct an initial evaluation of a novel machine learning (ML)–based classifier using limited domain knowledge and minimal feature engineering to reliably predict applicant pass/fail on the ORE. Such methods, if proven useful, could be applicable to the classification of other time series data collected within medical and other settings. METHODS We analyzed an initial dataset of 4308 drivers who completed both the VDT and the ORE, of whom 1096 (25.4%) went on to fail the ORE. We studied 2 different approaches to constructing feature sets to use as input to ML algorithms: the standard method of reducing the time series data to a set of manually defined variables that summarize driving behavior, and a novel approach using time series clustering. We then fed these representations into different ML algorithms to compare their ability to predict a driver's ORE outcome (pass/fail). RESULTS The new method using time series clustering performed similarly to the standard method in terms of overall accuracy for predicting pass or fail (76.1% vs 76.2%) and area under the curve (0.656 vs 0.682). However, time series clustering slightly outperformed the standard method in differentially predicting failure on the ORE: the clustering method yielded a risk ratio for failure of 3.07 (95% CI 2.75-3.43), whereas the standard variables method yielded 2.68 (95% CI 2.41-2.99).
In addition, the time series clustering method with logistic regression produced the lowest ratio of false alarms (drivers predicted to fail who went on to pass the ORE; 27.2%). CONCLUSIONS Our results provide initial evidence that the clustering method is useful for feature construction in classification tasks involving time series data when resources are limited for creating multiple domain-relevant variables.
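The clustering-as-feature-construction idea can be sketched in miniature: cluster short, equal-length traces and treat the cluster id as a categorical input to a downstream classifier. The toy k-means below, the invented speed traces, and the naive centroid initialization are illustrative only, not the study's pipeline.

```python
def cluster_labels(series, centroids, iters=10):
    """Toy k-means on equal-length series; cluster ids become categorical features."""
    for _ in range(iters):
        # Assign each series to its nearest centroid (squared Euclidean distance)
        labels = [min(range(len(centroids)),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(s, centroids[c])))
                  for s in series]
        # Recompute each centroid as the pointwise mean of its members
        for c in range(len(centroids)):
            members = [s for s, l in zip(series, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels

# Hypothetical speed traces: steady driving vs. erratic speed changes
traces = [[30, 31, 30, 29], [30, 30, 31, 30], [10, 50, 5, 55], [12, 48, 8, 52]]
labels = cluster_labels(traces, centroids=[traces[0][:], traces[2][:]])
print(labels)  # → [0, 0, 1, 1]: the cluster id can then feed a pass/fail classifier
```

The appeal, as the abstract notes, is that no domain expert has to hand-define "erratic speed" as a variable; the clusters surface the behavioral groupings directly from the raw traces.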


10.2196/13995 ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. e13995
Author(s):  
David Grethlein ◽  
Flaura Koplin Winston ◽  
Elizabeth Walshe ◽  
Sean Tanner ◽  
Venk Kandadai ◽  
...  



2021 ◽  
Author(s):  
Michael Yi ◽  
Pradeepkumar Ashok ◽  
Dawson Ramos ◽  
Taylor Thetford ◽  
Spencer Bohlander ◽  
...  

Abstract Kick and lost circulation events are large contributors to non-productive time, so early detection of these events is crucial. In the absence of good flow-in and flow-out sensors, pit volume trends offer the best possibility for influx/loss detection, but errors occur because external mud addition to, and removal from, the pits is not monitored or sensed. The goal is to reduce false alarms caused by such mud additions and removals. Data from hundreds of wells in North America show that mud addition and removal produce certain unique pit volume gain/loss trends, and these trends are quite different from the trends of a kick, a lost circulation, or a wellbore breathing event. Additionally, drillers enter text memos into the data aggregation system (EDR), and these memos often provide information about pit operations. In this paper, we introduce a method that uses a Bayesian network to aggregate trends detected in time-series data with events identified by natural language processing (NLP) of driller memos, greatly improving the accuracy and robustness of kick and lost circulation detection. The methodology was implemented in software that is currently running on rigs in North America. During the test phase, we applied it to several historical wells with lost circulation events and several with kick events. We were able to identify and quantify losses even during connections and mud additions, where pit volume was usually increasing despite continual losses. The real-time analysis of driller memos also provides context for pit volume trends and further reduces false alarms. The algorithm also accounts for pit volume reduced by drilling itself. Quantification of the losses offers more insight into what lost circulation material to use and into changes in the rate of loss while drilling.
This approach was also very robust in discovering kicks and differentiating them from mud removal and wellbore breathing events. These historical case studies are detailed in the paper. This is the first time that patterns of mud volume addition and removal detected from time-series data have been combined with NLP of driller memos to reduce false alerts in kick and lost circulation detection. The approach is particularly useful for identifying kick and lost circulation events from pit volume data when good flow-in and flow-out sensors are not available. The paper provides guidance on how real-time sensor data can be combined with textual data to improve the outputs of an advisory system.
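The core fusion idea, letting a memo about pit operations "explain away" a pit-volume gain that would otherwise look like a kick, can be sketched as a naive Bayesian update over two binary observations. Every prior and likelihood value below is an invented placeholder, not the paper's calibrated network.

```python
def posterior_kick(p_prior, trend_detected, memo_mentions_mud_add,
                   p_trend_given_kick=0.9, p_trend_given_no=0.2,
                   p_memo_given_kick=0.05, p_memo_given_no=0.6):
    """Naive Bayes fusion: a pit-volume gain trend raises kick belief,
    while a driller memo about mud addition explains the gain away."""
    like_kick = ((p_trend_given_kick if trend_detected else 1 - p_trend_given_kick)
                 * (p_memo_given_kick if memo_mentions_mud_add else 1 - p_memo_given_kick))
    like_no = ((p_trend_given_no if trend_detected else 1 - p_trend_given_no)
               * (p_memo_given_no if memo_mentions_mud_add else 1 - p_memo_given_no))
    evidence = like_kick * p_prior + like_no * (1 - p_prior)
    return like_kick * p_prior / evidence

# The same pit-volume gain trend, with and without an "added mud to pits" memo
print(posterior_kick(0.05, trend_detected=True, memo_mentions_mud_add=False))  # belief rises sharply
print(posterior_kick(0.05, trend_detected=True, memo_mentions_mud_add=True))   # memo suppresses the alarm
```

The memo observation is what cuts the false alarms: the identical sensor trend yields a high kick posterior on its own but a near-prior posterior once the textual context is folded in.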


2013 ◽  
Author(s):  
Stephen J. Tueller ◽  
Richard A. Van Dorn ◽  
Georgiy Bobashev ◽  
Barry Eggleston

Author(s):  
Rizki Rahma Kusumadewi ◽  
Wahyu Widayat

The exchange rate is one tool for measuring a country's economic conditions: stable growth in a currency's value indicates that the country's economy is relatively good or stable. This study analyzes the factors that affected the exchange rate of the Indonesian Rupiah against the United States Dollar over the period 2000-2013. The data are secondary quarterly time series, comprising exports, imports, inflation, the BI rate, Gross Domestic Product (GDP), and the money supply (M1), from the first quarter of 2000 to the fourth quarter of 2013. The time series were modeled with ARCH-GARCH regression; the selected ARCH model indicates that the variables that significantly influence the exchange rate are exports, inflation, the central bank rate, and the money supply (M1), whereas imports and GDP had no significant influence.
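The ARCH mechanism behind the model choice can be illustrated with the conditional variance recursion of ARCH(1): a large residual, such as an exchange-rate shock, raises the next period's variance. The parameter values and residuals below are invented for the illustration, not the study's fitted model.

```python
def arch1_variance(residuals, omega, alpha):
    """ARCH(1) conditional variance: sigma2_t = omega + alpha * eps_{t-1}^2."""
    sigma2 = [omega / (1 - alpha)]        # start at the unconditional variance
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2)
    return sigma2

# Hypothetical quarterly exchange-rate residuals with one large shock in period 2
eps = [0.1, -0.2, 1.5, 0.3, -0.1]
print(arch1_variance(eps, omega=0.05, alpha=0.4))  # variance spikes right after the shock
```

This time-varying variance is why ARCH-GARCH models suit exchange-rate series, whose volatility clusters around shocks rather than staying constant as ordinary least squares would assume.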

