Cloudet

Author(s):  
George Baciu ◽  
Chenhui Li ◽  
Yunzhe Wang ◽  
Xiujun Zhang

Streaming data cognition has become a dominant problem in interactive visual analytics for event detection, meteorology, cosmology, security, and smart city applications. To interact with streaming data patterns in an elastic cloud environment, we present a new elastic framework for big data visual analytics in the cloud, the Cloudet. The Cloudet is a self-adaptive cloud-based platform that treats both data and compute nodes as elastic objects. The main objective is to harness the scalability and elasticity of cloud computing platforms in order to process large streaming data and adapt to potential interactions between data stream features. Our main contribution is the Cloudet itself: a robust cloud-based framework with a cloud profile manager that optimizes resource parameters to achieve expressivity, scalability, and reliability, and that aggregates compute nodes and data streams into several density maps for dynamic visualization.

2020 ◽  
Vol 8 (4) ◽  
pp. 63-73
Author(s):  
Sikha Bagui ◽  
Katie Jin

This survey performs a thorough enumeration and analysis of existing methods for data stream processing, focusing on the challenges facing streaming data: preprocessing of streaming data, detecting and dealing with concept drift, data reduction for data streams, and approximate queries and blocking operations in streaming data.


2021 ◽  
Author(s):  
Christian Nordahl ◽  
Veselka Boeva ◽  
Håkan Grahn ◽  
Marie Persson Netz

Data has become an integral part of our society in recent years, arriving faster and in larger quantities than ever before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements cannot be met in the data stream clustering scenario, where data arrives continuously and must be analyzed as it comes. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities in clustering evolving data streams. Our results show that EvolveCluster captures evolving data stream behaviors and adapts accordingly.


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5829 ◽  
Author(s):  
Jen-Wei Huang ◽  
Meng-Xun Zhong ◽  
Bijay Prasad Jaysawal

Outlier detection in data streams is crucial to successful data mining. However, this task is made increasingly difficult by the enormous growth in the quantity of data generated by the expansion of the Internet of Things (IoT). Recent advances in outlier detection based on the density-based local outlier factor (LOF) algorithm do not consider variations in data that change over time; for example, a new cluster of data points may appear in the stream over time. We therefore present a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF), to overcome this issue. In addition, we have developed a means of estimating the LOF score, termed "approximate LOF," based on historical information retained after the removal of outdated data. Experimental results demonstrate that TADILOF outperforms current state-of-the-art methods in terms of AUC while achieving similar execution times. Moreover, we present an application of the proposed scheme to the development of an air-quality monitoring system.
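The density-based LOF score at the heart of such methods can be illustrated with a minimal plain-Python sketch (k-nearest-neighbour distances on a tiny point set). This is the classical batch LOF, not TADILOF's time-aware incremental variant, and all names and parameters here are illustrative:

```python
from math import dist

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (excluding itself)."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: dist(points[i], points[j]))
    return order[:k]

def lrd(points, i, k):
    """Local reachability density: inverse of the mean reachability distance."""
    reach = [max(dist(points[j], points[knn(points, j, k)[-1]]),  # k-distance of j
                 dist(points[i], points[j]))
             for j in knn(points, i, k)]
    return len(reach) / sum(reach)

def lof(points, i, k=3):
    """Local outlier factor: close to 1 for inliers, much larger for outliers."""
    nbrs = knn(points, i, k)
    return sum(lrd(points, j, k) for j in nbrs) / (len(nbrs) * lrd(points, i, k))

# A tight cluster plus one isolated point; the isolated point gets a large LOF.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = [lof(pts, i) for i in range(len(pts))]
```

A streaming variant must additionally update these quantities incrementally as points arrive and expire, which is exactly the part TADILOF addresses.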


Author(s):  
Taegong Kim ◽  
Cheong Hee Park

Anomaly pattern detection in a data stream aims to detect the time point at which outliers begin to occur abnormally. Recently, a method for anomaly pattern detection was proposed that combines binary classification for outliers with statistical tests on the resulting stream of binary labels (normal or outlier). It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, since this method relies on binary classification, most well-known outlier detection methods, which output real-valued outlier scores, cannot be used directly. In this paper, we propose an anomaly pattern detection method for data streams that transforms real-valued outlier scores into multiple binary-valued data streams. Using three outlier detection methods, Isolation Forest (IF), autoencoder-based outlier detection, and the local outlier factor (LOF), the proposed anomaly pattern detection method is tested on artificial and real data sets. The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.
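The core transformation — threshold real-valued scores into a binary stream, then run a statistical test on windows of the binary labels — can be sketched in plain Python. The fixed threshold, window size, and one-sided z-test below are illustrative stand-ins, not the paper's exact procedure:

```python
from statistics import NormalDist

def to_binary(scores, threshold):
    """Turn real-valued outlier scores into a binary stream (1 = outlier)."""
    return [1 if s > threshold else 0 for s in scores]

def anomaly_pattern(binary_stream, p0, window=20, alpha=0.01):
    """Return the first index whose trailing window shows a significantly
    higher outlier rate than the baseline rate p0 (one-sided z-test)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    for t in range(window, len(binary_stream) + 1):
        k = sum(binary_stream[t - window:t])
        z = (k / window - p0) / ((p0 * (1 - p0) / window) ** 0.5)
        if z > z_crit:
            return t - 1
    return None

# A quiet period followed by a burst of outliers triggers a detection.
b = to_binary([0.1] * 50 + [0.9] * 20, 0.5)
alarm = anomaly_pattern(b, p0=0.05)
```

Using several thresholds would yield the "multiple binary-valued data streams" the paper describes, each monitored by the same test.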


2013 ◽  
Vol 10 (5) ◽  
pp. 1580-1586
Author(s):  
V. Sidda Reddy ◽  
T. V. Rao ◽  
A. Govardhan

Data stream mining algorithms operate under space and time constraints imposed by the streaming property; the slack in these constraints is inversely proportional to the streaming speed of the data. Since caching and mining streaming data is sensitive to these limits, this paper devises a scalable, memory-efficient caching and frequent itemset mining model. The proposed model is an incremental approach that builds single-level, multi-node trees, called bushes, from each window of the streaming data; we therefore refer to the proposed algorithm as Tree (bush) based Incremental Frequent Itemset Mining (TIFIM) over data streams.
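While the bush structure itself is only described at a high level here, the underlying task — frequent itemset mining over windows of a transaction stream — can be sketched in a few lines of plain Python (tumbling windows and brute-force counting; `min_support` is an absolute count, and all names are illustrative, not TIFIM's data structure):

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(window, min_support, max_size=2):
    """Count itemsets (up to max_size items) in one window of transactions
    and keep those meeting the minimum support threshold."""
    counts = Counter()
    for transaction in window:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

def mine_stream(stream, window_size, min_support):
    """Process a transaction stream window by window (tumbling windows)."""
    return [frequent_itemsets(stream[start:start + window_size], min_support)
            for start in range(0, len(stream), window_size)]

stream = [["a", "b"], ["a", "c"], ["a", "b"],
          ["b", "c"], ["a", "b", "c"], ["a", "b"]]
per_window = mine_stream(stream, window_size=3, min_support=2)
```

An incremental scheme such as TIFIM avoids recounting from scratch by carrying compact per-window summaries (the bushes) forward instead of the raw transactions.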


2018 ◽  
Vol 7 (12) ◽  
pp. 475 ◽  
Author(s):  
Bolelang H Sibolla ◽  
Serena Coetzee ◽  
Terence L Van Zyl

Sensor networks generate substantial amounts of frequently updated, highly dynamic data that are transmitted as packets in a data stream. The high frequency and continuous, unbounded nature of data streams leads to challenges when deriving knowledge from the underlying observations. This paper presents (1) a state-of-the-art review of visual analytics of geospatial, spatio-temporal streaming data, and (2) a framework based on the gaps identified in the review. The framework consists of (1) the data model, which characterizes the sensor observation data; (2) the user model, which addresses user queries and manages domain knowledge; (3) the design model, which handles the patterns that can be uncovered from the data and their corresponding visualizations; and (4) the visualization model, which handles the rendering of the data. The conclusion from the visualization model is that streaming sensor observations require tools that can handle multivariate, multiscale, and time series displays. The design model reveals that the most useful patterns are those that show relationships, anomalies, and aggregations of the data. The user model highlights the need to handle missing data and high-frequency changes, as well as the ability to review retrospective changes.


Author(s):  
MOHAMED MEDHAT GABER ◽  
PHILIP S. YU

Data stream mining has attracted considerable attention over the past few years owing to the significance of its applications. Streaming data often evolves over time, and capturing changes can serve to detect an event or a phenomenon in a wide range of applications, including weather conditions, economic changes, and astronomical and scientific phenomena. Because of the high volume and speed of data streams, it is computationally hard to capture these changes from raw data in real time. In this paper, we propose a novel algorithm, termed STREAM-DETECT, that captures changes in data stream distribution and/or domain using clustering result deviation. STREAM-DETECT is followed by an offline classification process, CHANGE-CLASS, which associates the history of change characteristics with the observed event or phenomenon. Experimental results show the efficiency of the proposed framework in both detecting changes and classification accuracy.
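The clustering-deviation idea can be approximated with a deliberately simple sketch: summarize each window with a single statistic (the mean, standing in for a full clustering result) and flag windows whose summary deviates sharply from the previous one. This is a hypothetical simplification for illustration, not the authors' algorithm:

```python
from statistics import mean, pstdev

def detect_changes(stream, window=50, threshold=3.0):
    """Flag window boundaries where the summary statistic (here the mean)
    jumps by more than `threshold` pooled standard deviations relative to
    the previous window. Returns one boolean per consecutive window pair."""
    windows = [stream[i:i + window]
               for i in range(0, len(stream) - window + 1, window)]
    flags = []
    for prev, cur in zip(windows, windows[1:]):
        pooled = (pstdev(prev) + pstdev(cur)) / 2 or 1e-9  # guard zero spread
        flags.append(abs(mean(cur) - mean(prev)) / pooled > threshold)
    return flags

# A level shift halfway through the stream is flagged at one boundary only.
flags = detect_changes([0, 1] * 50 + [10, 11] * 50, window=50)
```

STREAM-DETECT compares full clustering results rather than a single mean, which lets it pick up changes in shape and domain, not just in level.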


Energies ◽  
2020 ◽  
Vol 13 (4) ◽  
pp. 924 ◽  
Author(s):  
Krzysztof Gajowniczek ◽  
Marcin Bator ◽  
Tomasz Ząbkowski ◽  
Arkadiusz Orłowski ◽  
Chu Kiong Loo

Thanks to the rapid development of wireless sensor networks and network traffic monitoring, the data stream is gradually becoming one of the most common data-generating processes, and it differs from traditional static data. Cluster analysis is an important data mining technique, which is why many researchers pay attention to grouping streaming data. The literature offers many data stream clustering techniques; unfortunately, very few of them address the problem of clustering data streams coming from multiple sources. In this article, we present an algorithm with a tree structure for grouping data streams (in the form of time series) that have similar properties and behaviors. We evaluated our algorithm on real multivariate data streams generated by smart meter sensors: the Irish Commission for Energy Regulation data set. Several measures were used to analyze the characteristics of the tree-like clustering structure (from a computer science perspective), together with measures that are important from a business standpoint. The proposed method was able to cluster the data flows and identified customers with similar behavior during the analyzed period.


Author(s):  
Aderonke B. Sakpere ◽  
Anne V. D. M. Kayem

Streaming data emerges from different electronic sources and needs to be processed in real time with minimal delay. Data streams can yield hidden and useful knowledge patterns when mined and analyzed. In spite of these benefits, the issue of privacy needs to be addressed before streaming data is released for mining and analysis. To address data privacy concerns, several techniques have emerged. K-anonymity has received considerable attention over other privacy-preserving techniques because of its simplicity and efficiency in protecting data. Yet k-anonymity cannot be applied directly to continuous data (data streams) because of their transient nature. In this chapter, the authors discuss the challenges faced by k-anonymity algorithms in enforcing privacy on data streams and review existing privacy techniques for handling data streams.
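A minimal sketch of the generalization step that k-anonymity relies on, applied to one buffered window of a stream: a single quasi-identifier (age) is coarsened into ever-wider ranges until every equivalence class holds at least k records. Names and bucket sizes are illustrative, and a real stream anonymizer must also bound how long records wait in the buffer:

```python
from collections import Counter

def generalize_ages(records, k, bucket=5, max_bucket=80):
    """Widen age ranges until every equivalence class in this buffered
    window contains at least k records; suppress fully if k cannot be met."""
    while bucket <= max_bucket:
        classes = Counter(age // bucket for age in records)
        if all(count >= k for count in classes.values()):
            return [f"{(age // bucket) * bucket}-{(age // bucket) * bucket + bucket - 1}"
                    for age in records]
        bucket *= 2  # coarsen: wider ranges -> larger equivalence classes
    return ["*"] * len(records)

# Six buffered records: 5-year buckets leave a singleton class, so the
# window is released with 10-year ranges instead.
released = generalize_ages([21, 22, 23, 34, 36, 38], k=3)
```

The tension the chapter discusses is visible even here: waiting for enough records to fill each class conflicts with the real-time, minimal-delay requirement of streams.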


Author(s):  
Alfredo Cuzzocrea ◽  
Filippo Furfaro ◽  
Elio Masciari ◽  
Domenico Saccà

Sensor networks represent a leading case of data stream sources arising from real-life application scenarios. Sensors are non-reactive elements used to monitor real-life phenomena, such as live weather conditions and network traffic. They are usually organized into networks whose readings are transmitted using low-level protocols. A relevant problem in dealing with data streams is that they are intrinsically multi-level and multidimensional in nature, so they need to be analyzed by means of a multi-level, multi-resolution analysis model such as OLAP, beyond the traditional solutions provided by primitive SQL-based DBMS interfaces. A significant issue with OLAP, however, is the so-called curse of dimensionality: as the number of dimensions of the target data cube increases, multidimensional data cannot be accessed and queried efficiently because of their enormous size. Starting from this practical evidence, several data cube compression techniques have been proposed over the years, with varying success. Briefly, the main idea of these techniques is to compute compressed representations of input data cubes and evaluate time-consuming OLAP queries against them, thus obtaining approximate answers. As with static data, approximate query answering techniques can be applied to streaming data to improve OLAP analysis of such data. Unfortunately, the data cube compression paradigm becomes even more challenging when OLAP aggregations must be computed on top of a continuously flowing multidimensional data stream.
To deal efficiently with the curse of dimensionality and achieve high efficiency in processing and querying multidimensional data streams, thereby supporting OLAP analysis of such data, in this chapter we propose novel compression techniques over data stream readings that are materialized for OLAP purposes. This allows us to tame the unbounded nature of streaming data and cope with the bounded-memory limits of conventional DBMS tools. Overall, this chapter introduces an innovative, complex technique for efficiently supporting OLAP analysis of multidimensional data streams.
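The pre-aggregation principle underlying such compression can be sketched generically: collapse raw readings into a bounded grid of per-cell sums and counts, discard the readings, and answer aggregate queries approximately from the cells alone. This is a hypothetical illustration of the idea, not the chapter's technique:

```python
from collections import defaultdict

def build_cube(stream, time_bucket=60):
    """Compress (timestamp, sensor, value) readings into a small cube keyed
    by (time cell, sensor); raw readings can be dropped after this step."""
    cube = defaultdict(lambda: [0.0, 0])  # cell -> [sum, count]
    for ts, sensor, value in stream:
        cell = (ts // time_bucket, sensor)
        cube[cell][0] += value
        cube[cell][1] += 1
    return cube

def approx_avg(cube, sensor):
    """Approximate per-sensor average computed only from the compressed cube."""
    total = sum(s for (_, sid), (s, _) in cube.items() if sid == sensor)
    count = sum(c for (_, sid), (_, c) in cube.items() if sid == sensor)
    return total / count if count else None

readings = [(0, "s1", 1.0), (30, "s1", 2.0), (70, "s1", 3.0), (10, "s2", 5.0)]
cube = build_cube(readings, time_bucket=60)
```

Memory is bounded by the number of occupied cells rather than the number of readings, which is precisely what makes the approach viable on unbounded streams; queries at the cell granularity are exact, while finer-grained ones are approximate.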

