PHUIMUS: A Potential High Utility Itemsets Mining Algorithm Based on Stream Data with Uncertainty

Mathematical Problems in Engineering ◽

10.1155/2017/8576829 ◽

2017 ◽

Vol 2017 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Ju Wang ◽

Fuxian Liu ◽

Chunjie Jin

Keyword(s):

Data Stream ◽

Uncertain Data ◽

Real Life ◽

Stream Data ◽

Memory Consumption ◽

Sliding Windows ◽

Uncertain Data Streams ◽

High Utility ◽

High Utility Itemsets ◽

Weighted Probability

High utility itemset (HUI) mining has recently been a hot topic; it finds profitable itemsets by considering both quantity and profit. So far, HUI mining over uncertain datasets and HUI mining over data streams have been studied separately. However, to the best of our knowledge, HUI mining over uncertain data streams has seldom been studied. In this paper, the PHUIMUS (potential high utility itemsets mining over uncertain data stream) algorithm is proposed to mine potential high utility itemsets (PHUIs), that is, itemsets with both high utility and high existential probability, over uncertain data streams based on sliding windows. To realize the algorithm, a potential utility list over uncertain data stream (PUS-list) is designed to mine PHUIs without rescanning the analyzed uncertain data stream. A transaction weighted probability and utility tree over uncertain data stream (TWPUS-tree) is also designed to decrease the number of candidate itemsets generated by the PHUIMUS algorithm. Extensive experiments are conducted in terms of run-time, number of discovered PHUIs, memory consumption, and scalability on real-life and synthetic databases. The results show that the proposed algorithm is reasonable and acceptable for mining meaningful PHUIs from uncertain data streams.
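
To make the notion of a potential high utility itemset over a sliding window concrete, here is a minimal brute-force sketch. It is not the PHUIMUS algorithm and uses neither the PUS-list nor the TWPUS-tree. The definitions are assumptions made for illustration, since the abstract does not state them: each transaction carries one existential probability, an itemset's utility is the sum of quantity times unit profit over the transactions that contain it, and its probability is the sum of those transactions' probabilities. The names `PROFIT`, `potential_huis`, and `mine_stream`, and the absolute thresholds, are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical unit profits (assumption; not taken from the paper).
PROFIT = {"a": 5, "b": 2, "c": 1}


def potential_huis(window, min_util, min_prob):
    """Brute-force scan of one window.

    Each transaction is (quantities, probability), where quantities maps
    item -> purchased quantity. Assumed definitions: utility is the sum of
    qty * unit profit over transactions containing the whole itemset, and
    probability is the sum of those transactions' probabilities.
    """
    items = sorted({item for quantities, _ in window for item in quantities})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            util = 0
            prob = 0.0
            for quantities, p in window:
                if all(item in quantities for item in itemset):
                    util += sum(quantities[item] * PROFIT[item] for item in itemset)
                    prob += p
            if util >= min_util and prob >= min_prob:
                result[itemset] = (util, prob)
    return result


def mine_stream(stream, window_size, min_util, min_prob):
    """Yield the qualifying itemsets for every full count-based window."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            yield potential_huis(window, min_util, min_prob)
```

The scan enumerates every itemset in every window, so its cost grows exponentially with the number of distinct items. That cost is what list and tree structures such as those named in the abstract are designed to avoid.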


Mining top-k high-utility itemsets from a data stream under sliding window model

Applied Intelligence ◽

10.1007/s10489-017-0939-7 ◽

2017 ◽

Vol 47 (4) ◽

pp. 1240-1255 ◽

Cited By ~ 12

Author(s):

Siddharth Dawar ◽

Veronica Sharma ◽

Vikram Goyal

Keyword(s):

Data Stream ◽

Sliding Window ◽

High Utility ◽

High Utility Itemsets


Efficient Mining of Uncertain Data for High-Utility Itemsets

Web-Age Information Management - Lecture Notes in Computer Science ◽

10.1007/978-3-319-39937-9_2 ◽

2016 ◽

pp. 17-30

Author(s):

Jerry Chun-Wei Lin ◽

Wensheng Gan ◽

Philippe Fournier-Viger ◽

Tzung-Pei Hong ◽

Vincent S. Tseng

Keyword(s):

Uncertain Data ◽

High Utility ◽

High Utility Itemsets


Clustering on Uncertain Data Stream over Sliding Windows

2015 Third International Conference on Advanced Cloud and Big Data ◽

10.1109/cbd.2015.32 ◽

2015 ◽

Author(s):

Li Tu

Keyword(s):

Data Stream ◽

Uncertain Data ◽

Sliding Windows


A Method for Processing Top-k Continuous Query on Uncertain Data Stream in Sliding Window Model

WSEAS TRANSACTIONS ON SYSTEMS AND CONTROL ◽

10.37394/23203.2021.16.22 ◽

2021 ◽

Vol 16 ◽

pp. 261-269

Author(s):

Raja Azhan Syah Raja Wahab ◽

Siti Nurulain Mohd Rum ◽

Hamidah Ibrahim ◽

Fatimah Sidi ◽

Iskandar Ishak

Keyword(s):

Query Processing ◽

Data Streams ◽

Data Stream ◽

Uncertain Data ◽

Research Work ◽

Computational Cost ◽

Sliding Window ◽

Possible World ◽

Processing Methods ◽

Uncertain Data Streams

A data stream is a series of data generated sequentially over time from different sources. Processing such data is very important in many contemporary applications such as sensor networks, RFID technology, mobile computing and many more. The huge amount of data generated, and its frequent changes within a short time, make conventional processing methods insufficient. The Sliding Window Model (SWM) was introduced by Datar et al. to handle this problem. Avoiding multiple scans of the whole data set, optimizing memory usage, and processing only the most recent tuples are the main challenges. In uncertain data, the number of possible world instances grows exponentially, which makes it highly difficult to process top-k queries in the shortest amount of time. Following the generation of rules and the probability theory of this model, a framework was anticipated to sustain the top-k processing algorithm over the SWM approach until the candidates expired. Based on the literature review, no existing work has tackled the issues arising from top-k query processing over the possible world instances of uncertain data streams within the SWM. The major issues resulting from these scenarios need to be addressed, especially the computational redundancy that increases computational cost within the SWM. Therefore, the main objective of this research work is to propose top-k query processing methods over uncertain data streams in the SWM utilizing the score and the Possible World (PW) setting. In this study, a novel expiration and object indexing method is introduced to address the computational redundancy issues. We believe the proposed method can reduce computational costs by managing the insertion and exit policy for the right tuple candidates within a specified window frame. This research work will contribute to the area of computational query processing.
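
The exponential growth of possible worlds mentioned above can be illustrated with a small sketch. It is not the method proposed in the paper. It assumes independent tuples of the form (id, score, existence probability) with unique ids, a count-based window, and a query that asks for each tuple's probability of ranking in the top-k; all of these, and the names `topk_membership` and `slide`, are assumptions made for illustration.

```python
from collections import deque
from itertools import product


def topk_membership(window, k):
    """Probability that each tuple ranks in the top-k.

    Enumerates every possible world: 2**n worlds for n independent tuples.
    Each tuple is (tuple_id, score, existence_probability).
    """
    probs = {tuple_id: 0.0 for tuple_id, _, _ in window}
    for world in product([False, True], repeat=len(window)):
        weight = 1.0
        present = []
        for exists, (tuple_id, score, p) in zip(world, window):
            weight *= p if exists else 1.0 - p
            if exists:
                present.append((score, tuple_id))
        # Credit this world's probability to its k highest-scoring tuples.
        for _, tuple_id in sorted(present, reverse=True)[:k]:
            probs[tuple_id] += weight
    return probs


def slide(stream, window_size, k):
    """Re-evaluate the query after each arrival; old tuples expire."""
    window = deque(maxlen=window_size)
    for item in stream:
        window.append(item)
        yield topk_membership(window, k)
```

Each arrival triggers a full re-enumeration of the window. This is the kind of redundant computation that expiration and indexing schemes aim to reduce.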


Time-Fading Based High Utility Pattern Mining from Uncertain Data Streams

Smart Innovation, Systems and Technologies - Advanced Computing, Networking and Informatics - Volume 1 ◽

10.1007/978-3-319-07353-8_61 ◽

2014 ◽

pp. 529-536 ◽

Cited By ~ 2

Author(s):

Chiranjeevi Manike ◽

Hari Om

Keyword(s):

Data Streams ◽

Pattern Mining ◽

Uncertain Data ◽

Uncertain Data Streams ◽

High Utility


HUPSMT: AN EFFICIENT ALGORITHM FOR MINING HIGH UTILITY-PROBABILITY SEQUENCES IN UNCERTAIN DATABASES WITH MULTIPLE MINIMUM UTILITY THRESHOLDS

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/35/1/13234 ◽

2019 ◽

Vol 35 (1) ◽

pp. 1-20 ◽

Cited By ~ 2

Author(s):

Truong Chi Tin ◽

Tran Ngoc Anh ◽

Duong Van Hai ◽

Le Hoai Bac

Keyword(s):

Experimental Study ◽

Real World ◽

Efficient Algorithm ◽

Real Life ◽

Sequence Mining ◽

Memory Consumption ◽

Uncertain Databases ◽

The Real ◽

Low Probability ◽

High Utility

The problem of high utility sequence mining (HUSM) in quantitative sequence databases (QSDBs) is more general than that of frequent sequence mining in sequence databases. An important limitation of HUSM is that a user-predefined minimum utility threshold is commonly used to decide if a sequence is high utility. However, this is not convincing in many real-life applications as sequences may have different importance. Another limitation of HUSM is that data in QSDBs are assumed to be precise. But in the real world, collected data, such as those from sensors, may be uncertain. Thus, this paper proposes a framework for mining high utility-probability sequences (HUPSs) in uncertain QSDBs (UQSDBs) with multiple minimum utility thresholds using a minimum utility. Two new width and depth pruning strategies are also introduced to eliminate low utility or low probability sequences and their extensions early, and to reduce the sets of candidate items for extensions during the mining process. Based on these strategies, a novel efficient algorithm named HUPSMT is designed for discovering HUPSs. Finally, an experimental study conducted on both real-life and synthetic UQSDBs shows the performance of HUPSMT in terms of time and memory consumption.


Mining High Utility Itemsets with Hill Climbing and Simulated Annealing

ACM Transactions on Management Information Systems ◽

10.1145/3462636 ◽

2022 ◽

Vol 13 (1) ◽

pp. 1-22

Author(s):

M. Saqib Nawaz ◽

Philippe Fournier-Viger ◽

Unil Yun ◽

Youxi Wu ◽

Wei Song

Keyword(s):

Simulated Annealing ◽

Heuristic Algorithms ◽

Real Life ◽

Search Space ◽

Population Diversity ◽

Hill Climbing ◽

Target Values ◽

High Utility ◽

High Utility Itemsets ◽

Search Space Pruning

High utility itemset mining (HUIM) is the task of finding all sets of items, purchased together, that generate a high profit in a transaction database. In the past, several algorithms have been developed to mine high utility itemsets (HUIs). However, most of them cannot properly handle the exponential search space while finding HUIs when the size of the database and the total number of items increase. Recently, evolutionary and heuristic algorithms were designed to mine HUIs, which provided considerable performance improvement. However, they can still have a long runtime and some may miss many HUIs. To address this problem, this article proposes two algorithms for HUIM based on Hill Climbing (HUIM-HC) and Simulated Annealing (HUIM-SA). Both algorithms transform the input database into a bitmap for efficient utility computation and for search space pruning. To improve population diversity, HUIs discovered by evolution are used as target values for the next population instead of keeping the current optimal values in the next population. Through experiments on real-life datasets, it was found that the proposed algorithms are faster than state-of-the-art heuristic and evolutionary HUIM algorithms, that HUIM-SA discovers similar HUIs, and that HUIM-SA evolves linearly with the number of iterations.
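
The bitmap idea and the hill-climbing search can be sketched in a few lines. This is a generic, single-solution hill climber written for illustration, not HUIM-HC or HUIM-SA. It has no population, target values, or pruning, and it maximizes utility rather than collecting every itemset above a threshold. The toy database `DB` and the function names are hypothetical.

```python
# Hypothetical toy database: one dict per transaction, item -> utility of
# that item in the transaction (quantity times unit profit).
DB = [
    {"a": 5, "b": 4},
    {"a": 10, "c": 3},
    {"b": 2, "c": 1},
    {"a": 5, "b": 2, "c": 2},
]
ITEMS = sorted({item for tx in DB for item in tx})

# One integer bitmap per item: bit t is set when transaction t has the item.
BITMAP = {
    item: sum(1 << t for t, tx in enumerate(DB) if item in tx)
    for item in ITEMS
}


def utility(itemset):
    """Utility of an itemset, using bitwise AND to find its transactions."""
    if not itemset:
        return 0
    cover = (1 << len(DB)) - 1
    for item in itemset:
        cover &= BITMAP[item]
    return sum(
        DB[t][item]
        for t in range(len(DB))
        if (cover >> t) & 1
        for item in itemset
    )


def hill_climb(start):
    """Flip one item in or out per step; stop when no neighbour is better."""
    current = frozenset(start)
    while True:
        neighbours = [current ^ {item} for item in ITEMS]
        best = max(neighbours, key=utility)
        if utility(best) <= utility(current):
            return current, utility(current)
        current = best
```

Starting from `{"b"}`, the climber moves to `{"a", "b"}` and then to `{"a"}`, where it stops with utility 20. A greedy search like this can stall at a local optimum, which is one reason to consider simulated annealing as an alternative.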


Mining High Utility Sequential Patterns with Negative Item Values

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001417500355 ◽

2017 ◽

Vol 31 (10) ◽

pp. 1750035 ◽

Cited By ~ 8

Author(s):

Tiantian Xu ◽

Xiangjun Dong ◽

Jianliang Xu ◽

Xue Dong

Keyword(s):

Real Life ◽

Search Space ◽

Sequential Patterns ◽

Negative Item ◽

Novel Method ◽

Complete Set ◽

High Utility ◽

High Utility Itemsets ◽

Positive Return ◽

Pruning Methods

High utility sequential patterns (HUSP) refer to those sequential patterns with high utility (such as profit), which play a crucial role in many real-life applications. Relevant studies of HUSP only consider positive values of sequence utility. In some applications, however, a sequence consists of items with negative values (NIV). For example, a supermarket sells a cartridge with negative profit in a package with a printer at a higher positive return. Although a few methods have been proposed to mine high utility itemsets (HUI) with NIV, they are not suitable for mining HUSP with NIV because an item may occur more than once in a sequence and its utility may have multiple values. In this paper, we propose a novel method, High Utility Sequential Patterns with Negative Item Values (HUSP-NIV), to efficiently mine HUSP with NIV from sequential utility-based databases. HUSP-NIV works as follows: (1) using the lexicographic quantitative sequence tree (LQS-tree) to extract the complete set of high utility sequences and using I-Concatenation and S-Concatenation mechanisms to generate newly concatenated sequences; (2) using three pruning methods to reduce the search space in the LQS-tree; (3) traversing the LQS-tree and outputting all the high utility sequential patterns. To the best of our knowledge, HUSP-NIV is the first method to mine HUSP with NIV, and it is shown to be efficient on both synthetic and real datasets.
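
The two concatenation mechanisms named in step (1) can be sketched as follows. The sketch assumes the usual meaning of the terms in sequential pattern mining: I-Concatenation adds an item to the last itemset of a sequence, and S-Concatenation appends a new single-item itemset. The abstract does not define them, so this reading, the tuple-of-tuples representation, and the function names are assumptions.

```python
def i_concatenation(sequence, item):
    """Add item to the last itemset of a non-empty sequence."""
    *head, last = sequence
    return (*head, last + (item,))


def s_concatenation(sequence, item):
    """Append a new itemset holding only item to the end of the sequence."""
    return (*sequence, (item,))
```

For the sequence `(("a",), ("b",))` and item `"c"`, I-Concatenation gives `(("a",), ("b", "c"))` and S-Concatenation gives `(("a",), ("b",), ("c",))`.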


Queueing Analysis of Continuous Queries for Uncertain Data Streams Over Sliding Windows

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001416600016 ◽

2016 ◽

Vol 30 (09) ◽

pp. 1660001 ◽

Cited By ~ 6

Author(s):

Guoqing Xiao ◽

Kenli Li ◽

Xu Zhou ◽

Keqin Li

Keyword(s):

Data Streams ◽

Uncertain Data ◽

Continuous Queries ◽

System Capacity ◽

Queueing Model ◽

Loss Ratio ◽

Data Loss ◽

Data Stream Management ◽

Sliding Windows ◽

Uncertain Data Streams

With the rapid development of data collection methods and their practical applications, the management of uncertain data streams has drawn wide attention in both academia and industry. System capacity planning and quality of service (QoS) metrics are two very important problems for data stream management systems (DSMSs) to process streams efficiently, due to unpredictable input characteristics and limited memory resources in the system. Motivated by this, in this paper, we explore an effective approach to estimate the memory requirement, data loss ratio, and tuple latency of continuous queries for uncertain data streams over sliding windows in a DSMS. More specifically, we propose a queueing model to address these problems. We study the average number of tuples, the average tuple latency in the queue, and the distribution of the number of tuples and tuple latency in the queue under the Poisson arrival of input data streams in our queueing model. Furthermore, we also determine the maximum capacity of the queueing system based on the data loss ratio. The solutions to the above problems are very important to help researchers design, manage, and optimize a DSMS, including allocating the buffer needed for a queue and admitting a continuous uncertain query to the system without violating the pre-specified QoS requirements.
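
As a rough illustration of how a queueing model yields these quantities, the sketch below uses the textbook M/M/1/K queue: Poisson arrivals, exponentially distributed service times, one server, and room for at most K tuples. Only the Poisson arrivals come from the abstract; the service distribution, the single server, and the function names `mm1k` and `min_capacity` are assumptions, and the paper's model may differ. Mean latency follows from Little's law applied to the tuples that are actually admitted.

```python
def mm1k(arrival_rate, service_rate, capacity):
    """Return (loss ratio, mean tuples in system, mean latency) for M/M/1/K."""
    rho = arrival_rate / service_rate
    # Stationary probabilities are proportional to rho**n for n = 0..K.
    weights = [rho ** n for n in range(capacity + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]
    loss = probs[capacity]  # an arrival is dropped when the system is full
    mean_tuples = sum(n * p for n, p in enumerate(probs))
    # Little's law with the effective (admitted) arrival rate.
    mean_latency = mean_tuples / (arrival_rate * (1.0 - loss))
    return loss, mean_tuples, mean_latency


def min_capacity(arrival_rate, service_rate, max_loss):
    """Smallest capacity whose loss ratio does not exceed max_loss."""
    if arrival_rate >= service_rate or max_loss <= 0:
        raise ValueError("requires arrival_rate < service_rate and max_loss > 0")
    capacity = 1
    while mm1k(arrival_rate, service_rate, capacity)[0] > max_loss:
        capacity += 1
    return capacity
```

With an arrival rate of 1 and a service rate of 2, a loss target of 10% gives `min_capacity(1.0, 2.0, 0.1) == 3`. This links a QoS requirement to a buffer size, which is the kind of sizing decision the abstract describes.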


An Algorithm of Top-k High Utility Itemsets Mining over Data Stream

Journal of Software ◽

10.4304/jsw.9.9.2342-2347 ◽

2014 ◽

Vol 9 (9) ◽

Cited By ~ 4

Author(s):

Tianjun Lu ◽

Yang Liu ◽

Le Wang

Keyword(s):

Data Stream ◽

High Utility ◽

High Utility Itemsets
