Cloudet

Author(s):  
George Baciu ◽  
Chenhui Li ◽  
Yunzhe Wang ◽  
Xiujun Zhang

Streaming data cognition has become a dominant problem in interactive visual analytics for event detection, meteorology, cosmology, security, and smart city applications. To interact with streaming data patterns in an elastic cloud environment, we present a new elastic framework for big data visual analytics in the cloud, the Cloudet. The Cloudet is a self-adaptive cloud-based platform that treats both data and compute nodes as elastic objects. The main objective is to harness the scalability and elasticity of cloud computing platforms in order to process large streaming data and adapt to potential interactions between data stream features. Our main contribution is the Cloudet itself: a robust cloud-based framework with a cloud profile manager that optimizes resource parameters to achieve expressivity, scalability, and reliability, and that aggregates compute nodes and data streams into several density maps for dynamic visualization.

2020 ◽  
Vol 8 (4) ◽  
pp. 63-73
Author(s):  
Sikha Bagui ◽  
Katie Jin

This survey performs a thorough enumeration and analysis of existing methods for data stream processing, focusing on the challenges facing streaming data: preprocessing of streaming data, detecting and dealing with concept drift, data reduction for data streams, and approximate queries and blocking operations in streaming data.


2021 ◽  
Author(s):  
Christian Nordahl ◽  
Veselka Boeva ◽  
Håkan Grahn ◽  
Marie Persson Netz

Data has become an integral part of our society in recent years, arriving faster and in larger quantities than ever before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements cannot be met in the data stream clustering scenario, where data arrives continuously and must be analyzed as it comes. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities in clustering evolving data streams. Our results show that EvolveCluster captures evolving data stream behaviors and adapts accordingly.


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5829 ◽  
Author(s):  
Jen-Wei Huang ◽  
Meng-Xun Zhong ◽  
Bijay Prasad Jaysawal

Outlier detection in data streams is crucial to successful data mining. However, this task is made increasingly difficult by the enormous growth in the quantity of data generated by the expansion of the Internet of Things (IoT). Recent advances in outlier detection based on the density-based local outlier factor (LOF) algorithm do not consider variations in data that change over time; for example, a new cluster of data points may appear in the stream over time. We therefore present a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF), to overcome this issue. In addition, we have developed a means of estimating the LOF score, termed "approximate LOF," based on historical information retained after the removal of outdated data. Experimental results demonstrate that TADILOF outperforms current state-of-the-art methods in terms of AUC while achieving similar execution times. Moreover, we present an application of the proposed scheme to the development of an air-quality monitoring system.
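The density-based LOF score at the heart of such methods can be illustrated with a minimal plain-Python sketch (k-nearest-neighbour distances on a tiny point set). This is the classical batch LOF, not TADILOF's time-aware incremental variant, and all names and parameters here are illustrative:

```python
from math import dist

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (excluding itself)."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: dist(points[i], points[j]))
    return order[:k]

def lrd(points, i, k):
    """Local reachability density: inverse of the mean reachability distance."""
    reach = [max(dist(points[j], points[knn(points, j, k)[-1]]),  # k-distance of j
                 dist(points[i], points[j]))
             for j in knn(points, i, k)]
    return len(reach) / sum(reach)

def lof(points, i, k=3):
    """Local outlier factor: close to 1 for inliers, much larger for outliers."""
    nbrs = knn(points, i, k)
    return sum(lrd(points, j, k) for j in nbrs) / (len(nbrs) * lrd(points, i, k))

# A tight cluster plus one isolated point; the isolated point gets a large LOF.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = [lof(pts, i) for i in range(len(pts))]
```

A streaming variant must additionally update these quantities incrementally as points arrive and expire, which is exactly the part TADILOF addresses.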


Author(s):  
Taegong Kim ◽  
Cheong Hee Park

Anomaly pattern detection in a data stream aims to detect the time point at which outliers begin to occur abnormally. Recently, a method for anomaly pattern detection was proposed that combines binary classification for outliers with statistical tests on the resulting stream of binary labels (normal or outlier). It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, since this method relies on binary classification, most well-known outlier detection methods, which output real-valued outlier scores, cannot be used directly. In this paper, we propose an anomaly pattern detection method for data streams that transforms real-valued outlier scores into multiple binary-valued data streams. Using three outlier detection methods, Isolation Forest (IF), autoencoder-based outlier detection, and the local outlier factor (LOF), the proposed anomaly pattern detection method is tested on artificial and real data sets. The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.
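The core transformation — threshold real-valued scores into a binary stream, then run a statistical test on windows of the binary labels — can be sketched in plain Python. The fixed threshold, window size, and one-sided z-test below are illustrative stand-ins, not the paper's exact procedure:

```python
from statistics import NormalDist

def to_binary(scores, threshold):
    """Turn real-valued outlier scores into a binary stream (1 = outlier)."""
    return [1 if s > threshold else 0 for s in scores]

def anomaly_pattern(binary_stream, p0, window=20, alpha=0.01):
    """Return the first index whose trailing window shows a significantly
    higher outlier rate than the baseline rate p0 (one-sided z-test)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    for t in range(window, len(binary_stream) + 1):
        k = sum(binary_stream[t - window:t])
        z = (k / window - p0) / ((p0 * (1 - p0) / window) ** 0.5)
        if z > z_crit:
            return t - 1
    return None

# A quiet period followed by a burst of outliers triggers a detection.
b = to_binary([0.1] * 50 + [0.9] * 20, 0.5)
alarm = anomaly_pattern(b, p0=0.05)
```

Using several thresholds would yield the "multiple binary-valued data streams" the paper describes, each monitored by the same test.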


2013 ◽  
Vol 10 (5) ◽  
pp. 1580-1586
Author(s):  
V. Sidda Reddy ◽  
T. V. Rao ◽  
A. Govardhan

Data stream mining algorithms operate under space and time constraints imposed by the streaming property; the slack in these constraints is inversely proportional to the streaming speed of the data. Since caching and mining streaming data is sensitive to these limits, this paper devises a scalable, memory-efficient caching and frequent itemset mining model. The proposed model is an incremental approach that builds single-level, multi-node trees, called bushes, from each window of the streaming data; we therefore refer to the proposed algorithm as Tree (bush) based Incremental Frequent Itemset Mining (TIFIM) over data streams.
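While the bush structure itself is only described at a high level here, the underlying task — frequent itemset mining over windows of a transaction stream — can be sketched in a few lines of plain Python (tumbling windows and brute-force counting; `min_support` is an absolute count, and all names are illustrative, not TIFIM's data structure):

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(window, min_support, max_size=2):
    """Count itemsets (up to max_size items) in one window of transactions
    and keep those meeting the minimum support threshold."""
    counts = Counter()
    for transaction in window:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

def mine_stream(stream, window_size, min_support):
    """Process a transaction stream window by window (tumbling windows)."""
    return [frequent_itemsets(stream[start:start + window_size], min_support)
            for start in range(0, len(stream), window_size)]

stream = [["a", "b"], ["a", "c"], ["a", "b"],
          ["b", "c"], ["a", "b", "c"], ["a", "b"]]
per_window = mine_stream(stream, window_size=3, min_support=2)
```

An incremental scheme such as TIFIM avoids recounting from scratch by carrying compact per-window summaries (the bushes) forward instead of the raw transactions.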


2018 ◽  
Vol 7 (12) ◽  
pp. 475 ◽  
Author(s):  
Bolelang H Sibolla ◽  
Serena Coetzee ◽  
Terence L Van Zyl

Sensor networks generate substantial amounts of frequently updated, highly dynamic data that are transmitted as packets in a data stream. The high frequency and continuous, unbounded nature of data streams leads to challenges when deriving knowledge from the underlying observations. This paper presents (1) a state-of-the-art review of visual analytics of geospatial, spatio-temporal streaming data, and (2) a framework based on the gaps identified in the review. The framework consists of (1) the data model, which characterizes the sensor observation data; (2) the user model, which addresses user queries and manages domain knowledge; (3) the design model, which handles the patterns that can be uncovered from the data and their corresponding visualizations; and (4) the visualization model, which handles the rendering of the data. The conclusion from the visualization model is that streaming sensor observations require tools that can handle multivariate, multiscale, and time series displays. The design model reveals that the most useful patterns are those that show relationships, anomalies, and aggregations of the data. The user model highlights the need to handle missing data and high-frequency changes, as well as the ability to review retrospective changes.


Author(s):  
MOHAMED MEDHAT GABER ◽  
PHILIP S. YU

Data stream mining has attracted considerable attention over the past few years owing to the significance of its applications. Streaming data often evolves over time, and capturing changes can serve to detect an event or a phenomenon in a wide range of applications, including weather conditions, economic changes, and astronomical and scientific phenomena. Because of the high volume and speed of data streams, it is computationally hard to capture these changes from raw data in real time. In this paper, we propose a novel algorithm, termed STREAM-DETECT, that captures changes in data stream distribution and/or domain using clustering result deviation. STREAM-DETECT is followed by an offline classification process, CHANGE-CLASS, which associates the history of change characteristics with the observed event or phenomenon. Experimental results show the efficiency of the proposed framework in both detecting changes and classification accuracy.
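The clustering-deviation idea can be approximated with a deliberately simple sketch: summarize each window with a single statistic (the mean, standing in for a full clustering result) and flag windows whose summary deviates sharply from the previous one. This is a hypothetical simplification for illustration, not the authors' algorithm:

```python
from statistics import mean, pstdev

def detect_changes(stream, window=50, threshold=3.0):
    """Flag window boundaries where the summary statistic (here the mean)
    jumps by more than `threshold` pooled standard deviations relative to
    the previous window. Returns one boolean per consecutive window pair."""
    windows = [stream[i:i + window]
               for i in range(0, len(stream) - window + 1, window)]
    flags = []
    for prev, cur in zip(windows, windows[1:]):
        pooled = (pstdev(prev) + pstdev(cur)) / 2 or 1e-9  # guard zero spread
        flags.append(abs(mean(cur) - mean(prev)) / pooled > threshold)
    return flags

# A level shift halfway through the stream is flagged at one boundary only.
flags = detect_changes([0, 1] * 50 + [10, 11] * 50, window=50)
```

STREAM-DETECT compares full clustering results rather than a single mean, which lets it pick up changes in shape and domain, not just in level.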


Energies ◽  
2020 ◽  
Vol 13 (4) ◽  
pp. 924 ◽  
Author(s):  
Krzysztof Gajowniczek ◽  
Marcin Bator ◽  
Tomasz Ząbkowski ◽  
Arkadiusz Orłowski ◽  
Chu Kiong Loo

Thanks to the rapid development of wireless sensor networks and network traffic monitoring, the data stream is gradually becoming one of the most common data-generating processes, and it differs from traditional static data. Cluster analysis is an important data mining technique, which is why many researchers pay attention to grouping streaming data. The literature offers many data stream clustering techniques; unfortunately, very few of them address the problem of clustering data streams coming from multiple sources. In this article, we present an algorithm with a tree structure for grouping data streams (in the form of time series) that have similar properties and behaviors. We evaluated our algorithm on real multivariate data streams generated by smart meter sensors: the Irish Commission for Energy Regulation data set. Several measures were used to analyze the characteristics of the tree-like clustering structure (from a computer science perspective), together with measures that are important from a business standpoint. The proposed method was able to cluster the data flows and identified customers with similar behavior during the analyzed period.


Author(s):  
Aderonke B. Sakpere ◽  
Anne V. D. M. Kayem

Streaming data emerges from different electronic sources and needs to be processed in real time with minimal delay. Data streams can yield hidden and useful knowledge patterns when mined and analyzed. In spite of these benefits, the issue of privacy needs to be addressed before streaming data is released for mining and analysis. To address data privacy concerns, several techniques have emerged. K-anonymity has received considerable attention over other privacy-preserving techniques because of its simplicity and efficiency in protecting data. Yet k-anonymity cannot be applied directly to continuous data (data streams) because of their transient nature. In this chapter, the authors discuss the challenges faced by k-anonymity algorithms in enforcing privacy on data streams and review existing privacy techniques for handling data streams.
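A minimal sketch of the generalization step that k-anonymity relies on, applied to one buffered window of a stream: a single quasi-identifier (age) is coarsened into ever-wider ranges until every equivalence class holds at least k records. Names and bucket sizes are illustrative, and a real stream anonymizer must also bound how long records wait in the buffer:

```python
from collections import Counter

def generalize_ages(records, k, bucket=5, max_bucket=80):
    """Widen age ranges until every equivalence class in this buffered
    window contains at least k records; suppress fully if k cannot be met."""
    while bucket <= max_bucket:
        classes = Counter(age // bucket for age in records)
        if all(count >= k for count in classes.values()):
            return [f"{(age // bucket) * bucket}-{(age // bucket) * bucket + bucket - 1}"
                    for age in records]
        bucket *= 2  # coarsen: wider ranges -> larger equivalence classes
    return ["*"] * len(records)

# Six buffered records: 5-year buckets leave a singleton class, so the
# window is released with 10-year ranges instead.
released = generalize_ages([21, 22, 23, 34, 36, 38], k=3)
```

The tension the chapter discusses is visible even here: waiting for enough records to fill each class conflicts with the real-time, minimal-delay requirement of streams.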


Author(s):  
Alfredo Cuzzocrea ◽  
Filippo Furfaro ◽  
Elio Masciari ◽  
Domenico Saccà

Sensor networks represent a leading case of data stream sources arising from real-life application scenarios. Sensors are non-reactive elements used to monitor real-life phenomena, such as live weather conditions and network traffic. They are usually organized into networks whose readings are transmitted using low-level protocols. A relevant problem in dealing with data streams is that they are intrinsically multi-level and multidimensional in nature, so they need to be analyzed by means of a multi-level, multi-resolution analysis model such as OLAP, beyond the traditional solutions provided by primitive SQL-based DBMS interfaces. A significant issue with OLAP, however, is the so-called curse of dimensionality: as the number of dimensions of the target data cube increases, multidimensional data cannot be accessed and queried efficiently because of their enormous size. Starting from this practical evidence, several data cube compression techniques have been proposed over the years, with varying success. Briefly, the main idea of these techniques is to compute compressed representations of input data cubes and evaluate time-consuming OLAP queries against them, thus obtaining approximate answers. As with static data, approximate query answering techniques can be applied to streaming data to improve OLAP analysis of such data. Unfortunately, the data cube compression paradigm becomes even more challenging when OLAP aggregations must be computed on top of a continuously flowing multidimensional data stream.
To deal efficiently with the curse of dimensionality and achieve high efficiency in processing and querying multidimensional data streams, thereby supporting OLAP analysis of such data, in this chapter we propose novel compression techniques over data stream readings that are materialized for OLAP purposes. This allows us to tame the unbounded nature of streaming data and cope with the bounded-memory limits of conventional DBMS tools. Overall, this chapter introduces an innovative, complex technique for efficiently supporting OLAP analysis of multidimensional data streams.
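The pre-aggregation principle underlying such compression can be sketched generically: collapse raw readings into a bounded grid of per-cell sums and counts, discard the readings, and answer aggregate queries approximately from the cells alone. This is a hypothetical illustration of the idea, not the chapter's technique:

```python
from collections import defaultdict

def build_cube(stream, time_bucket=60):
    """Compress (timestamp, sensor, value) readings into a small cube keyed
    by (time cell, sensor); raw readings can be dropped after this step."""
    cube = defaultdict(lambda: [0.0, 0])  # cell -> [sum, count]
    for ts, sensor, value in stream:
        cell = (ts // time_bucket, sensor)
        cube[cell][0] += value
        cube[cell][1] += 1
    return cube

def approx_avg(cube, sensor):
    """Approximate per-sensor average computed only from the compressed cube."""
    total = sum(s for (_, sid), (s, _) in cube.items() if sid == sensor)
    count = sum(c for (_, sid), (_, c) in cube.items() if sid == sensor)
    return total / count if count else None

readings = [(0, "s1", 1.0), (30, "s1", 2.0), (70, "s1", 3.0), (10, "s2", 5.0)]
cube = build_cube(readings, time_bucket=60)
```

Memory is bounded by the number of occupied cells rather than the number of readings, which is precisely what makes the approach viable on unbounded streams; queries at the cell granularity are exact, while finer-grained ones are approximate.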

