An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams

Author(s):  
Ye-In Chang ◽  
Chia-En Li ◽  
Wei-Hau Peng
2010 ◽  
Vol 44-47 ◽  
pp. 3159-3163
Author(s):  
Ke Ming Tang ◽  
Cai Yan Dai ◽  
Ling Chen

Mining closed frequent itemsets in data streams is an important task in stream data mining. Most traditional algorithms for mining closed frequent itemsets are Apriori-based: they find the frequent itemsets among a large number of candidates and therefore need a great deal of time and space. In this paper, an algorithm, ItemListFCI, for mining closed frequent itemsets in data streams is proposed. The algorithm is based on the sliding window model and uses an ItemList in which the transactions and itemsets are recorded by the column and row vectors, respectively. The algorithm first builds the ItemList for the first sliding window. Frequent closed itemsets can be detected by pair-test operations on the binary numbers in the table. After building the first ItemList, the algorithm updates it for each subsequent sliding window. The frequent closed itemsets in a sliding window can then be identified from the ItemList. Algorithms are also proposed for modifying the ItemList when a transaction is added or deleted. Experimental results on synthetic and real data sets indicate that the proposed algorithm needs less CPU time and memory than other similar methods.
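To make the sliding-window idea concrete, the following is a minimal sketch and not the ItemListFCI algorithm itself, whose details the abstract does not give. It assumes one integer bit vector per item, with bit j marking the transaction stored in window slot j, and it finds closed frequent itemsets by brute-force enumeration rather than by the pair-test operations the abstract mentions. The class name `SlidingWindow` and its methods are hypothetical.

```python
from itertools import combinations


class SlidingWindow:
    """Hypothetical sketch: one integer bit vector per item over window slots."""

    def __init__(self, size):
        self.size = size
        self.rows = {}      # item -> bit vector; bit j set if slot j holds the item
        self.next_slot = 0  # slot that the next transaction overwrites
        self.filled = 0     # number of occupied slots

    def add(self, transaction):
        mask = 1 << self.next_slot
        # Delete the oldest transaction: clear its slot bit in every row.
        for item in list(self.rows):
            self.rows[item] &= ~mask
            if self.rows[item] == 0:
                del self.rows[item]
        # Add the new transaction: set the slot bit for each of its items.
        for item in transaction:
            self.rows[item] = self.rows.get(item, 0) | mask
        self.next_slot = (self.next_slot + 1) % self.size
        self.filled = min(self.filled + 1, self.size)

    def tidset(self, itemset):
        # AND the rows together: the slots whose transaction contains every item.
        bits = (1 << self.filled) - 1
        for item in itemset:
            bits &= self.rows.get(item, 0)
        return bits

    def closed_frequent(self, min_support):
        # Brute force over all item combinations; only viable for few items.
        items = sorted(self.rows)
        result = {}
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                tids = self.tidset(combo)
                support = bin(tids).count("1")
                if support < min_support:
                    continue
                # Closure: every item present in all transactions that contain combo.
                closure = {i for i in items if self.rows[i] & tids == tids}
                if closure == set(combo):
                    result[frozenset(combo)] = support
        return result
```

An itemset is reported only when its closure equals itself, that is, when no additional item appears in every transaction that contains it. Adding a transaction to a full window overwrites the oldest slot, so insertion and deletion each cost one bit operation per item row.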


2012 ◽  
Vol 263-266 ◽  
pp. 231-240
Author(s):  
Yi Min Mao ◽  
Zhi Gang Chen ◽  
Li Xin Liu

With the emergence of large-volume, high-speed streaming data, traditional techniques for mining closed frequent itemsets have become inefficient. Online mining of closed frequent itemsets over streaming data is one of the most important issues in mining data streams. In this paper, a combined data structure is designed that uses an effective bit vector to represent items and an extended dictionary frequent item list to record the current closed frequent information in the stream. To reduce the search space substantially, new search strategies are proposed that avoid generating a large number of intermediate itemsets. New pruning strategies are also proposed for efficiently and dynamically maintaining all the closure check operations. Experimental results show that the proposed method is efficient in time, scales well as the number of transactions processed increases, and adapts rapidly to changes in data streams.
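As an illustration of pairing a bit-vector item encoding with a dictionary of closed itemsets, the sketch below uses the generic intersection-based update rule for closed itemsets. It is an assumption for illustration, not the search or pruning strategies of the paper, which the abstract does not specify, and it handles insertions only. The function names `encode` and `add_transaction` and the `index` mapping are hypothetical.

```python
def encode(transaction, index):
    """Map a transaction to an integer bit vector; index assigns one bit per item."""
    bits = 0
    for item in transaction:
        bits |= 1 << index.setdefault(item, len(index))
    return bits


def add_transaction(closed, t_bits):
    """Update closed (bit vector -> support) in place for one new transaction.

    After a transaction T arrives, the closed itemsets are the old ones,
    T itself, and every non-empty intersection of T with an old closed itemset.
    """
    if t_bits == 0:
        return
    updates = {}
    for c_bits, support in closed.items():
        inter = c_bits & t_bits
        if inter == 0:
            continue
        # The prior support of inter is the largest support among the
        # closed itemsets whose intersection with T equals inter.
        if support > updates.get(inter, 0):
            updates[inter] = support
    # T is closed after insertion; its prior support is 0 if never covered.
    updates.setdefault(t_bits, 0)
    for bits, old_support in updates.items():
        closed[bits] = old_support + 1


def frequent(closed, min_support):
    """Return only the closed itemsets that meet the support threshold."""
    return {bits: s for bits, s in closed.items() if s >= min_support}
```

Because each itemset is a single integer, an intersection is one bitwise AND and the dictionary lookup needs no set hashing. A sliding-window variant would also need a deletion rule, which this sketch omits.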

