PHUIMUS: A Potential High Utility Itemsets Mining Algorithm Based on Stream Data with Uncertainty

2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Ju Wang ◽  
Fuxian Liu ◽  
Chunjie Jin

High utility itemset (HUI) mining has recently been a hot topic; it finds profitable itemsets by considering both quantity and profit. So far, HUI mining over uncertain datasets and HUI mining over data streams have been studied separately. However, to the best of our knowledge, HUI mining over uncertain data streams has seldom been studied. In this paper, the PHUIMUS (potential high utility itemsets mining over uncertain data stream) algorithm is proposed to mine potential high utility itemsets (PHUIs), that is, itemsets with high utilities and high existential probabilities, over uncertain data streams based on sliding windows. To realize the algorithm, the potential utility list over uncertain data stream (PUS-list) is designed to mine PHUIs without rescanning the analyzed uncertain data stream. The transaction weighted probability and utility tree over uncertain data stream (TWPUS-tree) is also designed to decrease the number of candidate itemsets generated by the PHUIMUS algorithm. Substantial experiments are conducted in terms of run time, number of discovered PHUIs, memory consumption, and scalability on real-life and synthetic databases. The results show that the proposed algorithm is reasonable and acceptable for mining meaningful PHUIs from uncertain data streams.
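To make the notion of a potential high utility itemset concrete, the following is a minimal brute-force sketch in Python. It is not the PHUIMUS algorithm and uses neither the PUS-list nor the TWPUS-tree. It assumes that each transaction carries a single existential probability, that an itemset's probability in a window is the sum of the probabilities of the transactions containing it, and that both thresholds are absolute values. The profit table, the stream, and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical unit profits per item (illustrative values only).
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_stats(window, itemset):
    """Return (utility, summed probability) of an itemset over the window."""
    utility, prob = 0, 0.0
    for quantities, p in window:
        if all(item in quantities for item in itemset):
            utility += sum(quantities[i] * PROFIT[i] for i in itemset)
            prob += p
    return utility, prob


def potential_huis(window, min_util, min_prob):
    """Enumerate every itemset and keep those meeting both thresholds."""
    items = sorted({i for quantities, _ in window for i in quantities})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u, p = itemset_stats(window, itemset)
            if u >= min_util and p >= min_prob:
                result[itemset] = (u, p)
    return result


# Each transaction: (item -> purchased quantity, existential probability).
stream = [
    ({"a": 1, "b": 2}, 0.9),
    ({"a": 2, "c": 3}, 0.6),
    ({"b": 1, "c": 1}, 0.4),
    ({"a": 1, "b": 1, "c": 2}, 0.8),
]

# Count-based sliding window: the oldest transaction expires automatically.
window = deque(maxlen=3)
for transaction in stream:
    window.append(transaction)

phuis = potential_huis(window, min_util=10, min_prob=1.3)
print(sorted(phuis))
```

The exhaustive enumeration grows exponentially with the number of items, which is the cost that candidate-pruning structures such as those named in the abstract aim to avoid.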

2017 ◽  
Vol 47 (4) ◽  
pp. 1240-1255 ◽  
Author(s):  
Siddharth Dawar ◽  
Veronica Sharma ◽  
Vikram Goyal

Author(s):  
Jerry Chun-Wei Lin ◽  
Wensheng Gan ◽  
Philippe Fournier-Viger ◽  
Tzung-Pei Hong ◽  
Vincent S. Tseng

2021 ◽  
Vol 16 ◽  
pp. 261-269
Author(s):  
Raja Azhan Syah Raja Wahab ◽  
Siti Nurulain Mohd Rum ◽  
Hamidah Ibrahim ◽  
Fatimah Sidi ◽  
Iskandar Ishak

A data stream is a series of data generated sequentially over time from different sources. Processing such data is very important in many contemporary applications such as sensor networks, RFID technology, and mobile computing. The huge amount of data generated and its frequent changes within a short time make conventional processing methods insufficient. The Sliding Window Model (SWM) was introduced by Datar et al. to handle this problem. The main challenges are avoiding multiple scans of the whole data set, optimizing memory usage, and processing only the most recent tuples. In uncertain data, the number of possible world instances grows exponentially, which makes it highly difficult to answer top-k queries in the shortest amount of time. Following the generation of rules and the probability theory of this model, a framework was anticipated to sustain a top-k processing algorithm over the SWM until the candidates expire. Based on the literature review, no existing work tackles the issues arising from top-k query processing over the possible world instances of uncertain data streams within the SWM. The major issues resulting from these scenarios need to be addressed, especially the computational redundancy that increases computational cost within the SWM. Therefore, the main objective of this research is to propose top-k query processing methods over uncertain data streams in the SWM utilizing the score and the Possible World (PW) setting. In this study, a novel expiration and object indexing method is introduced to address the computational redundancy issues. We believe the proposed method can reduce computational costs by managing the insertion and exit policy for the right tuple candidates within a specified window frame. This research will contribute to the area of computational query processing.
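The exponential growth of possible worlds mentioned above can be illustrated with a small Python sketch. It is not the expiration and indexing method of the study. It assumes independent tuples of the form (name, score, existential probability), a count-based window, and a query that asks for the probability that each tuple is among the k highest-scoring tuples present. All names and values are hypothetical.

```python
from collections import deque
from itertools import product


def topk_probabilities(window, k):
    """P(tuple is among the k highest-scoring existing tuples),
    computed by enumerating all 2**n possible worlds of the window."""
    result = {name: 0.0 for name, _, _ in window}
    for world in product([False, True], repeat=len(window)):
        weight = 1.0
        present = []
        for exists, (name, score, p) in zip(world, window):
            # A world's probability is the product over tuples of
            # p (tuple exists) or 1 - p (tuple is absent).
            weight *= p if exists else 1 - p
            if exists:
                present.append((score, name))
        for _, name in sorted(present, reverse=True)[:k]:
            result[name] += weight
    return result


# Count-based sliding window of size 2: "t1" expires when "t3" arrives.
window = deque(maxlen=2)
for tup in [("t1", 10, 0.5), ("t2", 5, 0.5), ("t3", 8, 0.5)]:
    window.append(tup)

top1 = topk_probabilities(window, k=1)
print(top1)
```

Because the loop visits 2**n worlds for a window of n tuples, the sketch is only usable for very small windows; it shows why redundant recomputation on every slide becomes the dominant cost.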


2019 ◽  
Vol 35 (1) ◽  
pp. 1-20 ◽  
Author(s):  
Truong Chi Tin ◽  
Tran Ngoc Anh ◽  
Duong Van Hai ◽  
Le Hoai Bac

The problem of high utility sequence mining (HUSM) in quantitative sequence databases (QSDBs) is more general than that of frequent sequence mining in sequence databases. An important limitation of HUSM is that a single user-predefined minimum utility threshold is commonly used to decide whether a sequence is high utility. However, this is not convincing in many real-life applications, as sequences may have different importance. Another limitation of HUSM is that data in QSDBs are assumed to be precise, but in the real world, collected data, such as sensor data, may be uncertain. Thus, this paper proposes a framework for mining high utility-probability sequences (HUPSs) in uncertain QSDBs (UQSDBs) with multiple minimum utility thresholds using a minimum utility. Two new width and depth pruning strategies are also introduced to eliminate low utility or low probability sequences and their extensions early, and to reduce the sets of candidate items for extensions during the mining process. Based on these strategies, a novel efficient algorithm named HUPSMT is designed for discovering HUPSs. Finally, an experimental study conducted on both real-life and synthetic UQSDBs shows the performance of HUPSMT in terms of time and memory consumption.


2022 ◽  
Vol 13 (1) ◽  
pp. 1-22
Author(s):  
M. Saqib Nawaz ◽  
Philippe Fournier-Viger ◽  
Unil Yun ◽  
Youxi Wu ◽  
Wei Song

High utility itemset mining (HUIM) is the task of finding all sets of items, purchased together, that generate a high profit in a transaction database. In the past, several algorithms have been developed to mine high utility itemsets (HUIs). However, most of them cannot properly handle the exponential search space when the size of the database and the total number of items increase. Recently, evolutionary and heuristic algorithms were designed to mine HUIs, which provided considerable performance improvement. However, they can still have a long runtime, and some may miss many HUIs. To address this problem, this article proposes two algorithms for HUIM based on Hill Climbing (HUIM-HC) and Simulated Annealing (HUIM-SA). Both algorithms transform the input database into a bitmap for efficient utility computation and for search space pruning. To improve population diversity, HUIs discovered by evolution are used as target values for the next population instead of keeping the current optimal values in the next population. Through experiments on real-life datasets, it was found that the proposed algorithms are faster than state-of-the-art heuristic and evolutionary HUIM algorithms, that HUIM-SA discovers similar HUIs, and that HUIM-SA evolves linearly with the number of iterations.
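The bitmap idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the HUIM-HC or HUIM-SA algorithm: each item gets an integer bitmask over transactions, the cover of an itemset is the bitwise AND of its item masks, and a plain steepest-ascent hill climb flips one item at a time. The toy database and all names are hypothetical.

```python
# Toy database: each transaction maps item -> utility of that item
# in the transaction (quantity times unit profit, precomputed).
DB = [
    {"a": 5, "b": 4},
    {"a": 10, "c": 3},
    {"b": 2, "c": 1},
    {"a": 5, "b": 2, "c": 2},
]
ITEMS = sorted({item for t in DB for item in t})

# Bitmap per item: bit k is set when transaction k contains the item.
BITMAP = {i: sum(1 << k for k, t in enumerate(DB) if i in t) for i in ITEMS}


def cover(itemset):
    """Bitwise AND of item bitmaps: transactions containing every item."""
    bits = (1 << len(DB)) - 1
    for item in itemset:
        bits &= BITMAP[item]
    return bits


def utility(itemset):
    """Sum the utilities of the itemset's items over its cover."""
    if not itemset:
        return 0
    bits = cover(itemset)
    return sum(
        DB[k][item]
        for k in range(len(DB))
        if (bits >> k) & 1
        for item in itemset
    )


def hill_climb(start):
    """Steepest ascent: flip one item in or out until no flip improves."""
    current = frozenset(start)
    while True:
        neighbours = [current ^ {item} for item in ITEMS]
        best = max(neighbours, key=utility)
        if utility(best) <= utility(current):
            return current
        current = best


best = hill_climb({"b"})
print(sorted(best), utility(best))
```

An itemset whose cover is zero appears in no transaction, so it and all of its supersets can be skipped, which is one way a bitmap supports search space pruning. A simulated annealing variant would additionally accept some worsening moves with a probability that decreases over time.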


Author(s):  
Tiantian Xu ◽  
Xiangjun Dong ◽  
Jianliang Xu ◽  
Xue Dong

High utility sequential patterns (HUSP) are sequential patterns with high utility (such as profit), which play a crucial role in many real-life applications. Relevant studies of HUSP only consider positive values of sequence utility. In some applications, however, a sequence contains items with negative item values (NIV). For example, a supermarket sells a cartridge at a negative profit in a package with a printer that yields a higher positive return. Although a few methods have been proposed to mine high utility itemsets (HUI) with NIV, they are not suitable for mining HUSP with NIV, because an item may occur more than once in a sequence and its utility may have multiple values. In this paper, we propose a novel method, High Utility Sequential Patterns with Negative Item Values (HUSP-NIV), to efficiently mine HUSP with NIV from sequential utility-based databases. HUSP-NIV works as follows: (1) it uses the lexicographic quantitative sequence tree (LQS-tree) to extract the complete set of high utility sequences and uses I-Concatenation and S-Concatenation mechanisms to generate newly concatenated sequences; (2) it uses three pruning methods to reduce the search space in the LQS-tree; (3) it traverses the LQS-tree and outputs all the high utility sequential patterns. To the best of our knowledge, HUSP-NIV is the first method to mine HUSP with NIV, and it is shown to be efficient on both synthetic and real datasets.
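The two concatenation mechanisms named in step (1) can be sketched in Python as below. The abstract does not define them, so this follows the usual convention in high utility sequential pattern mining and should be read as an assumption: I-Concatenation adds an item to the last itemset of a pattern, and S-Concatenation appends the item as a new itemset. A pattern is represented as a non-empty tuple of frozensets, and the lexicographic-order check is likewise an assumption. Function names are hypothetical.

```python
def i_concatenation(sequence, item):
    """Extend the LAST itemset of a non-empty pattern with a new item.

    Assumes items are added in lexicographic order, so the new item
    must sort after every item already in the last itemset.
    """
    *head, last = sequence
    if item <= max(last):
        raise ValueError("item must follow the last itemset lexicographically")
    return tuple(head) + (last | frozenset([item]),)


def s_concatenation(sequence, item):
    """Append the item as a new single-item itemset at the end."""
    return tuple(sequence) + (frozenset([item]),)


pattern = (frozenset("a"), frozenset("b"))

# <(a)(b)> becomes <(a)(b c)>: same number of itemsets, last one grows.
grown_itemset = i_concatenation(pattern, "c")

# <(a)(b)> becomes <(a)(b)(a)>: one more itemset; items may repeat.
grown_sequence = s_concatenation(pattern, "a")

print(grown_itemset)
print(grown_sequence)
```

Because S-Concatenation may reuse an item that already occurs earlier in the pattern, the same item can appear several times with different utilities, which is the situation the abstract identifies as the obstacle to reusing itemset-based methods.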


Author(s):  
Guoqing Xiao ◽  
Kenli Li ◽  
Xu Zhou ◽  
Keqin Li

With the rapid development of data collection methods and their practical applications, the management of uncertain data streams has drawn wide attention in both academia and industry. System capacity planning and quality of service (QoS) metrics are two very important problems for data stream management systems (DSMSs) that must process streams efficiently despite unpredictable input characteristics and limited memory resources. Motivated by this, in this paper, we explore an effective approach to estimate the memory requirement, data loss ratio, and tuple latency of continuous queries for uncertain data streams over sliding windows in a DSMS. More specifically, we propose a queueing model to address these problems. We study the average number of tuples, the average tuple latency in the queue, and the distributions of the number of tuples and of the tuple latency in the queue under Poisson arrival of input data streams in our queueing model. Furthermore, we also determine the maximum capacity of the queueing system based on the data loss ratio. The solutions to the above problems are very important to help researchers design, manage, and optimize a DSMS, including allocating the buffer needed for a queue and admitting a continuous uncertain query to the system without violating the pre-specified QoS requirements.
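As a rough illustration of the quantities listed above, the following Python sketch uses the textbook M/M/1/K queue. This is an assumption: the abstract states only that arrivals are Poisson, so the exponential service time, the single server, and the finite capacity K (counting the tuple in service) are choices made here, not the paper's model. The latency is the mean time in the system, obtained from Little's law with the effective arrival rate. Function names are hypothetical.

```python
def mm1k_metrics(lam, mu, K):
    """Return (loss ratio, mean number of tuples, mean latency)
    for an M/M/1/K queue with arrival rate lam and service rate mu."""
    rho = lam / mu
    if rho == 1.0:
        probs = [1.0 / (K + 1)] * (K + 1)
    else:
        norm = (1 - rho ** (K + 1)) / (1 - rho)
        probs = [rho ** n / norm for n in range(K + 1)]
    loss = probs[K]  # an arrival is dropped when the system is full
    mean_n = sum(n * p for n, p in enumerate(probs))
    latency = mean_n / (lam * (1 - loss))  # Little's law, accepted tuples only
    return loss, mean_n, latency


def min_capacity(lam, mu, max_loss, k_limit=10_000):
    """Smallest K whose loss ratio does not exceed max_loss, or None."""
    for K in range(1, k_limit + 1):
        if mm1k_metrics(lam, mu, K)[0] <= max_loss:
            return K
    return None


loss, mean_n, latency = mm1k_metrics(lam=1.0, mu=2.0, K=1)
capacity = min_capacity(lam=1.0, mu=2.0, max_loss=0.01)
print(loss, mean_n, latency, capacity)
```

Sizing a buffer then amounts to picking the smallest capacity that keeps the loss ratio under the QoS target, and admission control amounts to checking whether the added arrival rate of a new query still satisfies that target.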

