Distributed Publish/Subscribe Query Processing on the Spatio-Textual Data Stream

With the rapid progress of computer technology and the maturing of networks, more and more data are generated and distributed over the network as data streams. Over the last few years, many researchers have turned their attention to data stream management, which differs from conventional database management. The data stream management system (DSMS), a new type of data management system, has become one of the most active research areas in data engineering, and many research projects have made great progress in it. Since current DSMSs do not support queries on sequence data, this project will study the issues related to two types of data. First, we will focus on content filtering on single-attribute streams, such as sensor data. Second, we will focus on multi-attribute streams, such as video. We will discuss related issues such as how to build an efficient index for all queries over different streams, and the corresponding query processing mechanisms.
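
As a rough illustration of content filtering on a single-attribute stream, the sketch below indexes registered range queries so that each incoming sensor reading is checked only against queries whose lower bound it reaches. The class name `RangeQueryIndex`, the `(query_id, low, high)` tuple layout, and the sorted-list design are assumptions made for this example; the abstract does not describe the project's actual index.

```python
import bisect


class RangeQueryIndex:
    """Hypothetical index of range queries over one numeric stream attribute."""

    def __init__(self, queries):
        # queries: iterable of (query_id, low, high), bounds inclusive.
        # Sorting by lower bound lets a binary search skip every query
        # whose lower bound is above the incoming value.
        self._queries = sorted(queries, key=lambda q: q[1])
        self._lows = [q[1] for q in self._queries]

    def match(self, value):
        """Return the ids of all registered queries that contain value."""
        upper = bisect.bisect_right(self._lows, value)
        return [qid for qid, _low, high in self._queries[:upper] if value <= high]


index = RangeQueryIndex([("hot", 30, 100), ("mild", 15, 29), ("cold", -50, 14)])
for reading in (20, 30, 200):
    print(reading, index.match(reading))
```

A sorted list keeps the sketch short; an interval tree would be the usual choice when many queries overlap, since it also bounds the work on the upper-bound check.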

Correction to: Semantic annotation of summarized sensor data stream for effective query processing

The Journal of Supercomputing ◽ 10.1007/s11227-017-2212-6 ◽ 2018 ◽ Vol 76 (6) ◽ pp. 4040-4040

Author(s): Shobharani Pacha ◽ Suresh Ramalingam Murugan ◽ R. Sethukarasi

Keyword(s): Query Processing ◽ Data Stream ◽ Semantic Annotation ◽ Sensor Data

The Social Spiders in the Clustering of Texts

International Journal of Artificial Life Research ◽ 10.4018/jalr.2012070101 ◽ 2012 ◽ Vol 3 (3) ◽ pp. 1-14 ◽ Cited By ~ 7

Author(s): Reda Mohamed Hamou ◽ Abdelmalek Amine ◽ Ahmed Chaouki Lokbani

Keyword(s): Data Stream ◽ Combinatorial Problem ◽ Large Data ◽ Social Spiders ◽ Stream Flows ◽ The Social ◽ Textual Data ◽ Biomimetic Approach ◽ N Gram

In this paper, the authors experiment with and test a new biomimetic approach based on social spiders to solve a combinatorial problem, i.e., the automatic classification of texts, motivated by the very large data streams that flow, particularly on the web. Textual data were represented by a language-independent method, i.e., character and word n-grams, because there is currently no learning method that can directly represent unstructured data (text). To validate the classification, the authors used an evaluation measure based on recall and precision (F-measure). During the experiment, the authors found social spiders to be a powerful visualization tool, which they exploit to perform visual classification.
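
The two building blocks named in this abstract, character n-gram representation and the F-measure, can be sketched as follows. This is only a minimal illustration under assumptions: the function names `char_ngrams` and `f_measure`, the whitespace and lowercase normalization, and the choice of n = 3 are not taken from the paper, and the social-spider classifier itself is not reproduced.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams; no language-specific tokenizer needed."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def f_measure(predicted, relevant):
    """Harmonic mean of precision and recall for two sets of document ids."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


print(char_ngrams("abcab", n=2))
print(f_measure({1, 2, 3}, {2, 3, 4}))
```

Because the n-grams are taken over raw characters, the same code applies unchanged to texts in any language, which is the property the abstract relies on.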

A Method for Processing Top-k Continuous Query on Uncertain Data Stream in Sliding Window Model

WSEAS TRANSACTIONS ON SYSTEMS AND CONTROL ◽ 10.37394/23203.2021.16.22 ◽ 2021 ◽ Vol 16 ◽ pp. 261-269

Author(s): Raja Azhan Syah Raja Wahab ◽ Siti Nurulain Mohd Rum ◽ Hamidah Ibrahim ◽ Fatimah Sidi ◽ Iskandar Ishak

Keyword(s): Query Processing ◽ Data Streams ◽ Data Stream ◽ Uncertain Data ◽ Research Work ◽ Computational Cost ◽ Sliding Window ◽ Possible World ◽ Processing Methods ◽ Uncertain Data Streams

A data stream is a series of data generated sequentially over time from different sources. Processing such data is very important in many contemporary applications such as sensor networks, RFID technology, mobile computing and many more. The huge amount of data generated and its frequent changes over short periods make conventional processing methods insufficient. The Sliding Window Model (SWM) was introduced by Datar et al. to handle this problem. The main challenges are avoiding multiple scans of the whole data set, optimizing memory usage, and processing only the most recent tuples. In uncertain data, the number of possible world instances grows exponentially, which makes it very difficult to answer top-k queries in the shortest amount of time. Following the generation of rules and the probability theory of this model, a framework was anticipated to sustain a top-k processing algorithm over the SWM until the candidates expire. Based on the literature review, no existing work tackles the issues that arise from top-k query processing over the possible world instances of uncertain data streams within the SWM. The major issues resulting from these scenarios need to be addressed, especially the computational redundancy that increases the computational cost within the SWM. Therefore, the main objective of this research work is to propose top-k query processing methods over uncertain data streams in the SWM utilizing the score and the Possible World (PW) setting. In this study, a novel expiration and object indexing method is introduced to address the computational redundancy issues. We believe the proposed method can reduce computational costs by managing the insertion and exit policy for the right tuple candidates within a specified window frame. This research work will contribute to the area of computational query processing.
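
To make the cost problem concrete, the sketch below combines a count-based sliding window with a brute-force possible-world evaluation of a top-k query. It is a naive baseline written for illustration, not the expiration and indexing method proposed in the paper. The class name `UncertainWindow`, the `(name, score, prob)` tuple layout, and the assumptions that tuples exist independently and carry unique names are all introduced here.

```python
from collections import deque
from itertools import product


class UncertainWindow:
    """Hypothetical count-based sliding window over uncertain tuples."""

    def __init__(self, size):
        # A bounded deque drops the oldest tuple when a new one arrives,
        # which models expiration from the window.
        self.items = deque(maxlen=size)

    def insert(self, name, score, prob):
        # prob is the probability that the tuple actually exists.
        self.items.append((name, score, prob))

    def topk_probabilities(self, k):
        """Probability that each tuple ranks in the top k, over all possible worlds."""
        items = list(self.items)
        result = {name: 0.0 for name, _score, _prob in items}
        # Every tuple is either present or absent: 2**len(items) worlds.
        for world in product([False, True], repeat=len(items)):
            world_prob = 1.0
            present = []
            for exists, (name, score, prob) in zip(world, items):
                world_prob *= prob if exists else 1.0 - prob
                if exists:
                    present.append((score, name))
            for _score, name in sorted(present, reverse=True)[:k]:
                result[name] += world_prob
        return result


window = UncertainWindow(size=2)
window.insert("a", 10, 0.5)
window.insert("b", 20, 0.5)
window.insert("c", 30, 1.0)  # "a" expires here
print(window.topk_probabilities(k=1))
```

The enumeration visits 2^n worlds for a window of n tuples and is repeated from scratch after every insertion, which is the kind of redundant computation the abstract aims to reduce.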

AIS-Clus: A Bio-Inspired Method for Textual Data Stream Clustering

Vietnam Journal of Computer Science ◽ 10.1142/s2196888819500143 ◽ 2019 ◽ Vol 06 (02) ◽ pp. 223-256

Author(s): Amal Abid ◽ Salma Jamoussi ◽ Abdelmajid Ben Hamadou

Keyword(s): Data Mining ◽ Data Stream ◽ Concept Drift ◽ Novelty Detection ◽ Infinite Length ◽ Stream Clustering ◽ Textual Data ◽ Data Stream Clustering ◽ Feature Evolution ◽ Over Time

The spread of real-time applications has led to a huge amount of data shared between users. This vast volume of data, rapidly evolving over time, is referred to as a data stream. Clustering and processing such data poses many challenges to the data mining community. Indeed, traditional data mining techniques become infeasible for mining such a continuous flow of data, where characteristics, features, and concepts change rapidly over time. This paper presents a novel method for data stream clustering. In this context, major challenges of data stream processing are addressed, namely, infinite length, concept drift, novelty detection, and feature evolution. To handle these issues, the proposed method uses the Artificial Immune System (AIS) meta-heuristic. The latter has been widely used for data mining tasks and has the adaptability required by data stream clustering algorithms. Our method, called AIS-Clus, is able to detect novel concepts using the performance of the learning process of the AIS meta-heuristic. Furthermore, AIS-Clus can adapt its model to handle concept drift and feature evolution for textual data streams. Experiments were performed on textual datasets, and efficient and promising results were obtained.

Unbounded Spatial Data Stream Query Processing using Spatial Semijoins

Journal of Ubiquitous Systems and Pervasive Networks ◽ 10.5383/juspn.15.02.005 ◽ 2021 ◽ Vol 15 (02) ◽ pp. 33-41

Author(s): Wendy Osborn

Keyword(s): Query Processing ◽ Data Streams ◽ Spatial Data ◽ Data Stream ◽ Spatial Extent ◽ Common Region ◽ Data Set ◽ Spatial Join ◽ The Common ◽ Spatial Data Stream

In this paper, the problem of query processing in spatial data streams is explored, with a focus on the spatial join operation. Although the spatial join has been utilized in many proposed centralized and distributed query processing strategies, it has received very little attention in its application to spatial data streams. One identified limitation of existing strategies is that a bounded region of space (i.e., spatial extent) from which the spatial objects are generated needs to be known in advance. However, this information may not be available. Therefore, two strategies for spatial data stream join processing are proposed where the spatial extent of the spatial object stream is not required to be known in advance. Both strategies estimate the common region that is shared by two or more spatial data streams in order to process the spatial join. An evaluation of both strategies includes a comparison with a recently proposed approach in which the spatial extent of the data set is known. Experimental results show that one of the strategies performs very well at estimating the common region of space using only incoming objects on the spatial data streams. Other limitations of this work are also identified.
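
One simple way to estimate a common region without knowing the spatial extent in advance is to keep a running bounding rectangle per stream and intersect the rectangles. The sketch below shows that idea only; it is an assumption made for illustration and is not claimed to be either of the two strategies evaluated in the paper. The names `StreamExtent` and `common_region` and the `(min_x, min_y, max_x, max_y)` rectangle layout are hypothetical.

```python
class StreamExtent:
    """Running bounding rectangle of the objects seen so far on one stream."""

    def __init__(self):
        self.box = None  # (min_x, min_y, max_x, max_y), or None before any object

    def observe(self, rect):
        """Grow the estimated extent to cover a newly arrived object rectangle."""
        if self.box is None:
            self.box = rect
        else:
            a, b = self.box, rect
            self.box = (min(a[0], b[0]), min(a[1], b[1]),
                        max(a[2], b[2]), max(a[3], b[3]))


def common_region(extents):
    """Intersect the estimated extents; None if any stream is empty or they are disjoint."""
    boxes = [extent.box for extent in extents]
    if not boxes or any(box is None for box in boxes):
        return None
    min_x = max(box[0] for box in boxes)
    min_y = max(box[1] for box in boxes)
    max_x = min(box[2] for box in boxes)
    max_y = min(box[3] for box in boxes)
    if min_x > max_x or min_y > max_y:
        return None
    return (min_x, min_y, max_x, max_y)


roads, parks = StreamExtent(), StreamExtent()
roads.observe((0, 0, 2, 2))
roads.observe((1, 1, 5, 5))
parks.observe((3, 3, 8, 8))
print(common_region([roads, parks]))
```

Only objects that overlap the estimated common region can take part in a join result, so the estimate can be used to discard other objects early; the estimate changes as more objects arrive on each stream.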
