Apriori-based High Efficiency Load Balancing Parallel Data Mining Algorithms on Multi-core Architectures

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2015040106 ◽

2015 ◽

Vol 7 (2) ◽

pp. 77-99 ◽

Cited By ~ 4

Author(s):

Kun-Ming Yu ◽

Sheng-Hui Liu ◽

Li-Wei Zhou ◽

Shu-Hao Wu

Keyword(s):

Data Mining ◽

Load Balancing ◽

Pattern Mining ◽

High Efficiency ◽

Computation Time ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Parallel Data ◽

Parallel Data Mining ◽

Mining Methods

Frequent pattern mining plays an essential role in knowledge discovery and data mining tasks that try to find usable patterns in databases. Efficiency is especially crucial for an algorithm that finds frequent itemsets in a large database. Numerous methods have been proposed to solve this problem, such as Apriori and FP-growth, which are regarded as fundamental frequent pattern mining methods. In addition, parallel computing architectures, such as cloud platforms, grid systems, and multi-core and GPU platforms, have become popular in data mining. However, most of the algorithms have been proposed without considering the prevalent multi-core architectures. This study proposes two high-efficiency, load-balancing parallel data mining methods for multi-core architectures, both based on the Apriori algorithm. The main goal of the proposed algorithms is to reduce the massive number of duplicate candidates generated by previous methods. This goal was achieved: in a detailed experimental study, the algorithms performed better than the previous methods. The experimental results demonstrated that the proposed algorithms dramatically reduced computation time as more threads were used. Moreover, the observations showed that the workload was equally balanced among the computing units.
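
The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the two ideas it names: generating each candidate once, and spreading support counting evenly over threads. It assumes a prefix join for candidate generation and a round-robin split of candidates across workers. The names `join_candidates`, `count_chunk`, and `parallel_apriori` are hypothetical and are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def join_candidates(frequent_k):
    """Prefix join: merge two sorted k-itemsets that share their first k-1 items.

    Each (k+1)-candidate is built from exactly one pair, so no duplicate
    candidates are generated.
    """
    level = sorted(frequent_k)
    known = set(level)
    out = []
    for i, a in enumerate(level):
        for b in level[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares this prefix
            cand = a + (b[-1],)
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if all(s in known for s in combinations(cand, len(a))):
                out.append(cand)
    return out


def count_chunk(chunk, transactions):
    """Count the support of every candidate in one worker's chunk."""
    return {c: sum(1 for t in transactions if t.issuperset(c)) for c in chunk}


def parallel_apriori(transactions, min_count, workers=4):
    """Level-wise Apriori with candidates dealt round-robin to worker threads."""
    transactions = [frozenset(t) for t in transactions]
    candidates = [(i,) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while candidates:
            # Assumption: counting costs about the same per candidate, so
            # equal-sized chunks give an approximately balanced workload.
            chunks = [candidates[w::workers] for w in range(workers)]
            counts = {}
            for part in pool.map(count_chunk, chunks, [transactions] * workers):
                counts.update(part)
            level = {c: n for c, n in counts.items() if n >= min_count}
            frequent.update(level)
            candidates = join_candidates(level)
    return frequent
```

In CPython the global interpreter lock keeps pure-Python threads from running in parallel, so this sketch shows the partitioning scheme rather than a real multi-core speedup; a process pool or a compiled language would be needed for that.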

Clustering of Time Series Data

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch042 ◽

2011 ◽

pp. 258-263

Author(s):

Anne Denton

Keyword(s):

Data Mining ◽

Time Series ◽

Pattern Mining ◽

Time Series Data ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Series Data ◽

Science And Engineering ◽

Data Mining Algorithms ◽

Mining Algorithms

Time series data is of interest to most science and engineering disciplines and analysis techniques have been developed for hundreds of years. There have, however, in recent years been new developments in data mining techniques, such as frequent pattern mining, that take a different perspective of data. Traditional techniques were not meant for such pattern-oriented approaches. There is, as a result, a significant need for research that extends traditional time-series analysis, in particular clustering, to the requirements of the new data mining algorithms.

Research of Data Graph Mining Based on Telecommunication Customers

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.443.402 ◽

2013 ◽

Vol 443 ◽

pp. 402-406 ◽

Cited By ~ 1

Author(s):

Shang Gao ◽

Mei Mei Li

Keyword(s):

Data Mining ◽

Graph Mining ◽

Pattern Mining ◽

Rapid Development ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Practical Significance ◽

Research Progress ◽

Graph Data ◽

Data Graph

The rapid growth in the number of mobile phone users has produced a large amount of graph data, and graph data mining has gradually become an active area of research. Traditional data mining tasks such as clustering, classification, and frequent pattern mining have gradually been extended to graph data. This paper introduces current research progress in graph data mining technology, summarizes the characteristics, practical significance, main problems, and application scenarios of graph data mining, and discusses the outlook for the field, in particular research on uncertain graph data, which is becoming a trend and a hot spot.

BIG DATA MINING FOR INTERESTING PATTERNS WITH MAP REDUCE TECHNIQUE

Asian Journal of Pharmaceutical and Clinical Research ◽

10.22159/ajpcr.2017.v10s1.19634 ◽

2017 ◽

Vol 10 (13) ◽

pp. 191

Author(s):

Nikhil Jamdar ◽

A Vijayalakshmi

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Uncertain Data ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Map Reduce ◽

Frequent Patterns ◽

Precise Data ◽

Big Data Mining ◽

Transactional Databases

Many data mining algorithms are available for finding interesting patterns in transactional databases of precise data. Frequent pattern mining is a technique for finding frequently occurring items. Most techniques find all the interesting patterns in a collection of precise data, where the items occurring in each transaction are known to the system with certainty. In many real-time applications, however, users are interested in only a tiny portion of the large set of frequent patterns. The proposed user-constrained mining approach helps to find the frequent patterns in which the user is interested. It efficiently finds these patterns by applying user constraints to collections of uncertain data. Users specify their interest in the form of constraints, and the approach uses the Map Reduce model to find the uncertain frequent patterns that satisfy the user-specified constraints.
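
As a rough illustration of constrained mining over uncertain data, the sketch below assumes the common expected-support model, in which each item carries an existence probability, items are independent, and the expected support of an itemset is the sum over transactions of the product of its item probabilities. The abstract does not specify its model or its constraint language, so the item-level `constraint` predicate, the `max_len` cap, and all function names are hypothetical. The map and reduce steps are plain Python functions rather than calls into any MapReduce framework.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def map_transaction(transaction, constraint, max_len=2):
    """Map step: emit (itemset, probability) pairs for one uncertain transaction.

    `transaction` maps each item to the probability that it is really present.
    Items that fail the user constraint are dropped before enumeration.
    """
    items = sorted(i for i in transaction if constraint(i))
    for size in range(1, max_len + 1):
        for itemset in combinations(items, size):
            yield itemset, prod(transaction[i] for i in itemset)


def reduce_expected_support(pairs):
    """Reduce step: sum the emitted probabilities per itemset."""
    totals = defaultdict(float)
    for itemset, p in pairs:
        totals[itemset] += p
    return totals


def mine(db, constraint, min_esup, max_len=2):
    """Return itemsets whose expected support reaches `min_esup`."""
    pairs = (kv for t in db for kv in map_transaction(t, constraint, max_len))
    totals = reduce_expected_support(pairs)
    return {k: v for k, v in totals.items() if v >= min_esup}
```

Filtering items inside the map step keeps unwanted items from ever being enumerated, which is where the saving from a user constraint would come from in this simplified setting.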

Research into the Algorithm of Frequent Pattern Mining Based on across Linker

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.195-196.984 ◽

2012 ◽

Vol 195-196 ◽

pp. 984-986

Author(s):

Ming Ru Zhao ◽

Yuan Sun ◽

Jian Guo ◽

Ping Ping Dong

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Itemsets ◽

Frequent Pattern ◽

Apriori Algorithm ◽

Important Data ◽

Classical Algorithm ◽

Frequent Itemsets Mining ◽

Mining Frequent Itemsets

Frequent itemset mining is an important data mining task and a central theme in data mining research. The Apriori algorithm is one of the most important algorithms for mining frequent itemsets. However, the Apriori algorithm scans the database too many times, so its efficiency is relatively low. This paper therefore studies a frequent itemset mining algorithm based on an across linker. Compared with the classical algorithm, the improved algorithm has obvious advantages.
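
To make the scanning cost concrete, here is a small sketch of the classical level-wise Apriori loop that also counts how many support-counting passes it makes over the database. It is the baseline the abstract criticizes, not the across-linker algorithm, which the abstract does not describe. The function name `apriori_with_scan_count` is hypothetical.

```python
from itertools import combinations


def apriori_with_scan_count(transactions, min_count):
    """Classical Apriori; returns (frequent itemsets with counts, number of scans)."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, scans = {}, 0
    while candidates:
        scans += 1  # one full pass over the database per itemset length
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= min_count}
        frequent.update((c, counts[c]) for c in level)
        # Next level: unions of two frequent k-itemsets that have k+1 items
        # and whose k-item subsets are all frequent.
        k = len(next(iter(level))) + 1 if level else 0
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent, scans
```

The scan count grows with the length of the longest candidate itemset, which is why methods that avoid repeated passes can be attractive on large databases.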

Distributed frequent hierarchical pattern mining for robust and efficient large-scale association discovery

10.32469/10355/63867 ◽

2017 ◽

Author(s):

Michael Phinney

Keyword(s):

Data Mining ◽

Distributed Computing ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Generation Process ◽

Computing Environment ◽

Wide Range ◽

Mining Algorithms ◽

Hierarchical Pattern

Frequent pattern mining is a classic data mining technique, generally applicable to a wide range of application domains, and a mature area of research. The fundamental challenge arises from the combinatorial nature of frequent itemsets, scaling exponentially with respect to the number of unique items. Apriori-based and FPTree-based algorithms have dominated the space thus far. Initial phases of this research relied on the Apriori algorithm and utilized a distributed computing environment; we proposed the Cartesian Scheduler to manage Apriori's candidate generation process. To address the limitation of bottom-up frequent pattern mining algorithms such as Apriori and FPGrowth, we propose the Frequent Hierarchical Pattern Tree (FHPTree): a tree structure and new frequent pattern mining paradigm. The classic problem is redefined as frequent hierarchical pattern mining where the goal is to detect frequent maximal pattern covers. Under the proposed paradigm, compressed representations of maximal patterns are mined using a top-down FHPTree traversal, FHPGrowth, which detects large patterns before their subsets, thus yielding significant reductions in computation time. The FHPTree memory footprint is small; the number of nodes in the structure scales linearly with respect to the number of unique items. Additionally, the FHPTree serves as a persistent, dynamic data structure to index frequent patterns and enable efficient searches. When the search space is exponential, efficient targeted mining capabilities are paramount; this is one of the key contributions of the FHPTree. This dissertation will demonstrate the performance of FHPGrowth, achieving a 300x speedup over state-of-the-art maximal pattern mining algorithms and approximately a 2400x speedup when utilizing FHPGrowth in a distributed computing environment. In addition, we allude to future research opportunities, and suggest various modifications to further optimize the FHPTree and FHPGrowth. Moreover, the methods we offer will have an impact on other data mining research areas including contrast set mining as well as spatial and temporal mining.
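
The notion of a maximal pattern can be illustrated with a short sketch. It is a brute-force filter over an already-mined collection of frequent itemsets, not FHPGrowth or the FHPTree, whose construction the abstract does not detail. Both function names are hypothetical.

```python
def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    sets = [frozenset(s) for s in frequent]
    return {s for s in sets if not any(s < other for other in sets)}


def covered_subsets(n):
    """Number of non-empty subsets of an n-item pattern.

    All of them are frequent whenever the n-item pattern is frequent.
    """
    return 2 ** n - 1
```

Because every subset of a frequent itemset is itself frequent, one maximal pattern of n items stands in for `covered_subsets(n)` frequent itemsets. A bottom-up miner visits those subsets before it reaches the pattern itself, which is the cost the abstract says a top-down traversal avoids.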

Scalable frequent-pattern mining methods

10.1145/502786.502792 ◽

2001 ◽

Cited By ~ 7

Author(s):

Jiawei Han ◽

Laks V. S. Lakshmanan ◽

Jian Pei

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Mining Methods

Comparative Study of Frequent Pattern Mining Techniques

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.403-408.1022 ◽

2011 ◽

Vol 403-408 ◽

pp. 1022-1027 ◽

Cited By ~ 1

Author(s):

Gauravjeet Singh ◽

Sandeep Bal ◽

Poonamjeet Kaur ◽

Kanwaljit Kaur

Keyword(s):

Data Mining ◽

Comparative Study ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Mining Algorithms

Frequent pattern mining has been a focused theme in data mining research. Many techniques have been proposed to improve the performance of frequent pattern mining algorithms. This paper presents a review of different frequent pattern mining techniques and provides a brief description of each. It concludes with a comparison of the techniques.

Apriori Algorithm through RapidMiner for Age Patterns of Homeless and Beggars

Indonesian Journal of Artificial Intelligence and Data Mining ◽

10.24014/ijaidm.v1i2.5670 ◽

2018 ◽

Vol 1 (2) ◽

pp. 86

Author(s):

Wirta Agustin ◽

Yulya Muharmi

Keyword(s):

Data Mining ◽

Urban Areas ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Data Sets ◽

Apriori Algorithm ◽

Age Patterns ◽

Rule Method ◽

Algorithm Implementation

Homeless people and beggars are one of the problems in urban areas because they can interfere with public order, security, stability, and urban development. Efforts so far have focused on how to manage homeless people and beggars rather than on prevention. One way to address this problem is to determine the age pattern of homeless people and beggars by applying the Apriori algorithm. The Apriori algorithm is an association rule method in data mining that determines frequent itemsets, which help in finding patterns in data (frequent pattern mining). The manual calculation with the Apriori algorithm obtains a combination pattern of 11 rules with a minimum support value of 25% and a highest confidence value of 100%. The Apriori implementation is evaluated using RapidMiner. RapidMiner is data mining software that supports text analysis, extracts patterns from data sets, and combines them with statistical methods, artificial intelligence, and databases to obtain high-quality information from processed data. The test results compare the age patterns of people with the potential to become homeless or beggars obtained from the RapidMiner application with those obtained from manual calculation using the Apriori algorithm.
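
The support and confidence values quoted in the abstract follow the standard association rule definitions, which the sketch below computes by hand. The records and attribute labels in the usage note are invented for illustration and are not the study's data. The function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)
```

With four hypothetical records, `[{"age_30_39", "male"}, {"age_30_39", "male"}, {"age_40_49", "female"}, {"age_30_39", "female"}]`, the itemset `{"age_30_39", "male"}` has support 2/4, which would pass a 25% minimum support, and the rule `male -> age_30_39` has confidence 100% because both records containing `male` also contain `age_30_39`.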

Preference-Based Frequent Pattern Mining

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch073 ◽

2008 ◽

pp. 1280-1299

Author(s):

Moonjung Cho ◽

Jian Pei ◽

Haixun Wang ◽

Wei Wang

Keyword(s):

Data Mining ◽

General Framework ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Frequent Patterns ◽

Performance Study ◽

Important Data ◽

Mining Algorithms ◽

Extensive Performance

Frequent pattern mining is an important data-mining problem with broad applications. Although there are many in-depth studies on efficient frequent pattern mining algorithms and constraint pushing techniques, the effectiveness of frequent pattern mining remains a serious concern: it is non-trivial and often tricky to specify appropriate support thresholds and proper constraints. In this paper, we propose a novel theme of preference-based frequent pattern mining. A user can simply specify a preference instead of setting detailed parameters in constraints. We identify the problem of preference-based frequent pattern mining and formulate the preferences for mining. We develop an efficient framework to mine frequent patterns with preferences. Interestingly, many preferences can be pushed deep into the mining by properly employing the existing efficient frequent pattern mining techniques. We conduct an extensive performance study to examine our method. The results indicate that preference-based frequent pattern mining is effective and efficient. Furthermore, we extend our discussion from pattern-based frequent pattern mining to preference-based data mining in principle and draw a general framework.

Frequent Pattern Mining over Unstructured Data using Semi-Structured Doc-Model and Pattern Ranking

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206216 ◽

2020 ◽

pp. 36-42

Author(s):

Sudhir Tirumalasetty ◽

A. Divya ◽

D. Rahitya Lakshmi ◽

Ch. Durga Bhavani ◽

D. Anusha

Keyword(s):

Data Mining ◽

Big Data ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Unstructured Data ◽

Frequent Pattern ◽

Frequent Patterns ◽

Innovative Methods ◽

Mining Algorithms ◽

Doc Model

Frequent pattern mining is an essential data-mining task, with a goal of discovering knowledge in the form of repeated patterns. Many efficient pattern-mining algorithms have been discovered in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called “Big Data”. Scalable parallel algorithms hold the key to solving the problem in this context. This paper reviews recent advances in parallel frequent pattern mining, analysing them through the Big Data lens. Load balancing and work partitioning are the major challenges to be conquered. These challenges continually call for innovative methods, as Big Data evolves without limit. A bigger challenge than before is handling unstructured data when finding frequent patterns. To accomplish this, a Semi-Structured Doc-Model and ranking of patterns are used.
