MISFP-Growth: Hadoop-Based Frequent Pattern Mining with Multiple Item Support

Chen-Shu Wang; Jui-Yen Chang

doi:10.3390/app9102075

Applied Sciences ◽

10.3390/app9102075 ◽

2019 ◽

Vol 9 (10) ◽

pp. 2075 ◽

Cited By ~ 4

Author(s):

Chen-Shu Wang ◽

Jui-Yen Chang

Keyword(s):

Big Data ◽

Data Analytics ◽

Pattern Mining ◽

High Efficiency ◽

Big Data Analytics ◽

Frequent Pattern Mining ◽

Experimental Results ◽

Frequent Pattern ◽

Multiple Item ◽

Two Phases

In practice, a single item support threshold cannot adequately address the complexity of items in large datasets. In this study, we propose a big data analytics framework, the Multiple Item Support Frequent Patterns (MISFP-growth) algorithm, that uses Hadoop-based parallel computing to efficiently mine itemsets with multiple item supports (MIS). The proposed architecture consists of two phases. First, in the counting support phase, a Hadoop MapReduce architecture is employed to determine the support for each item. Next, in the analytics phase, sub-transaction blocks are generated according to MIS and the MISFP-growth algorithm determines the frequency of patterns. To help decision makers set MIS, we also propose the concept of classification of item (COI), which groups items of higher homogeneity into the same class so that each item inherits its class support as its item support. Three experiments were conducted to validate the proposed Hadoop-based MISFP-growth algorithm. The experimental results show an approximately 38% reduction in execution time on parallel architectures. The proposed MISFP-growth algorithm can be implemented on a distributed computing framework. Furthermore, the enhanced performance observed in the experiments indicates that the algorithm could be applied to big data analytics.
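The COI idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the toy transactions, the class names, the thresholds, and the rule that an itemset is judged against the lowest MIS among its items (the usual convention in the multiple-minimum-support literature) are all assumptions made for illustration.

```python
# Hypothetical toy data: transactions plus a classification of items (COI).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "caviar"},
    {"bread"},
    {"milk", "caviar"},
]
item_class = {"milk": "staple", "bread": "staple", "caviar": "luxury"}
class_support = {"staple": 0.5, "luxury": 0.25}  # assumed class thresholds


def item_mis(item):
    """An item inherits the support threshold of its class."""
    return class_support[item_class[item]]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)


def is_frequent(itemset):
    """Assumed MIS rule: compare against the lowest MIS among the items."""
    threshold = min(item_mis(i) for i in itemset)
    return support(itemset) >= threshold


rare_pair = {"bread", "caviar"}
rare_pair_support = support(rare_pair)
rare_pair_frequent = is_frequent(rare_pair)
```

Under a single threshold of 0.5 the pair containing the rare item would be discarded; with class-inherited thresholds it is retained, which is the motivation the abstract gives for MIS.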

An Innovative Framework for Supporting Cognitive-Based Big Data Analytics for Frequent Pattern Mining

2018 IEEE International Conference on Cognitive Computing (ICCC) ◽

10.1109/iccc.2018.00014 ◽

2018 ◽

Cited By ~ 3

Author(s):

Deyu Deng ◽

Carson K. Leung ◽

Bryan H. Wodi ◽

Jialiang Yu ◽

Hao Zhang ◽

...

Keyword(s):

Big Data ◽

Data Analytics ◽

Pattern Mining ◽

Big Data Analytics ◽

Frequent Pattern Mining ◽

Frequent Pattern

Big Data Analytics and Mining for Knowledge Discovery

Encyclopedia of Organizational Knowledge, Administration, and Technology - Advances in Logistics, Operations, and Management Science ◽

10.4018/978-1-7998-3473-1.ch125 ◽

2021 ◽

pp. 1817-1830

Author(s):

Carson K. Leung

Keyword(s):

Big Data ◽

Data Analytics ◽

Pattern Mining ◽

Real Life ◽

Big Data Analytics ◽

Organizational Knowledge ◽

Frequent Pattern ◽

Data Sets ◽

Technology Applications ◽

Rich Data

Big data analytics and mining aim to discover implicit, previously unknown, and potentially useful information and knowledge from big data sets that contain huge volumes of valuable veracious data collected or generated at a high velocity from a wide variety of rich data sources. Among different big data analytic and mining tasks, this chapter focuses on frequent pattern mining. By relying on the MapReduce programming model, researchers only need to specify the “map” and “reduce” functions to discover (organizational) knowledge from (i) big data sets of precise data in a breadth-first manner or depth-first manner and/or from (ii) big data sets of uncertain data. Such a big data analytics process can be sped up by focusing the mining according to the user-specified constraints that express the user interests. The resulting (constrained or unconstrained) frequent patterns mined from big data sets provide users with new insights and a sound understanding of users' patterns. Such (organizational) knowledge is useful in many real-life information science and technology applications.
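As a rough illustration of the “map” and “reduce” functions mentioned in the abstract above, the following sketch counts item supports and keeps the frequent items. It runs in a single process and simulates the shuffle step with a dictionary; the function names `map_fn`, `reduce_fn`, and `run_local` are hypothetical and do not correspond to any Hadoop API.

```python
from collections import defaultdict


def map_fn(transaction):
    """Emit (item, 1) once for each distinct item in one transaction."""
    for item in set(transaction):
        yield item, 1


def reduce_fn(item, counts):
    """Sum the partial counts collected for one item."""
    return item, sum(counts)


def run_local(transactions, min_count):
    """Simulate map, shuffle, and reduce in-process, then filter by support."""
    groups = defaultdict(list)
    for transaction in transactions:
        for key, value in map_fn(transaction):
            groups[key].append(value)  # stand-in for the shuffle phase
    reduced = dict(reduce_fn(key, values) for key, values in groups.items())
    return {item: count for item, count in reduced.items() if count >= min_count}


frequent_items = run_local(
    [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]],
    min_count=3,
)
```

On a real cluster the framework would distribute the map calls and group keys across nodes; only the two small functions would need to be supplied by the analyst.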

A parallel approach for high utility-based frequent pattern mining in a big data environment

Iran Journal of Computer Science ◽

10.1007/s42044-021-00083-5 ◽

2021 ◽

Author(s):

Krishna Kumar Mohbey ◽

Sunil Kumar

Keyword(s):

Big Data ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Data Environment ◽

High Utility

Swarm intelligent based online feature selection (OFS) and weighted entropy frequent pattern mining (WEFPM) algorithm for big data analysis

Cluster Computing ◽

10.1007/s10586-017-1489-9 ◽

2017 ◽

Vol 22 (S5) ◽

pp. 11791-11803

Author(s):

S. Gayathri Devi ◽

M. Sabrigiriraj

Keyword(s):

Feature Selection ◽

Big Data ◽

Data Analysis ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Big Data Analysis ◽

Frequent Pattern ◽

Online Feature Selection ◽

Weighted Entropy

Big Data Frequent Pattern Mining

Frequent Pattern Mining ◽

10.1007/978-3-319-07821-2_10 ◽

2014 ◽

pp. 225-259 ◽

Cited By ~ 8

Author(s):

David C. Anastasiu ◽

Jeremy Iverson ◽

Shaden Smith ◽

George Karypis

Keyword(s):

Big Data ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern

Apriori-based High Efficiency Load Balancing Parallel Data Mining Algorithms on Multi-core Architectures

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2015040106 ◽

2015 ◽

Vol 7 (2) ◽

pp. 77-99 ◽

Cited By ~ 4

Author(s):

Kun-Ming Yu ◽

Sheng-Hui Liu ◽

Li-Wei Zhou ◽

Shu-Hao Wu

Keyword(s):

Data Mining ◽

Load Balancing ◽

Pattern Mining ◽

High Efficiency ◽

Computation Time ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Parallel Data ◽

Parallel Data Mining ◽

Mining Methods

Frequent pattern mining plays an essential role in knowledge discovery and data mining tasks that try to find usable patterns in databases. Efficiency is especially crucial for an algorithm that finds frequent itemsets in a large database. Numerous methods have been proposed to solve this problem, such as Apriori and FP-growth, which are regarded as fundamental frequent pattern mining methods. In addition, parallel computing architectures, such as cloud platforms, grid systems, and multi-core and GPU platforms, have become popular in data mining. However, most algorithms have been proposed without considering the prevalent multi-core architectures. This study proposes two high-efficiency, load-balancing parallel data mining methods based on the Apriori algorithm for multi-core architectures. The main goal of the proposed algorithms was to reduce the massive number of duplicate candidates generated by previous methods. This goal was achieved: in a detailed experimental study, the algorithms performed better than the previous methods. The experimental results demonstrated that the proposed algorithms dramatically reduced computation time when using more threads. Moreover, the observations showed that the workload was equally balanced among the computing units.
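For readers unfamiliar with Apriori, the sketch below shows the classical level-wise procedure on which the abstract above builds. It is a single-threaded illustration only, not the load-balancing multi-core methods the study proposes; the prefix join used here is the textbook way to generate each candidate exactly once, and the function name and toy data are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {k: c for k, c in counts.items() if c >= min_count}
    result = dict(level)

    while level:
        prev = sorted(level)
        candidates = []
        for i, a in enumerate(prev):
            for b in prev[i + 1:]:
                if a[:-1] != b[:-1]:
                    break  # sorted order: no later tuple shares this prefix
                cand = a + (b[-1],)
                # Prune: every subset one item shorter must be frequent.
                if all(sub in level for sub in combinations(cand, len(a))):
                    candidates.append(cand)
        # Count surviving candidates against the data.
        level = {}
        for cand in candidates:
            count = sum(1 for t in transactions if t.issuperset(cand))
            if count >= min_count:
                level[cand] = count
        result.update(level)
    return result


patterns = apriori(
    [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]],
    min_count=3,
)
```

The counting loop over candidates is the part that a parallel variant would typically split across threads, since each candidate can be counted independently.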

Constrained Frequent Pattern Mining from Big Data Via Crowdsourcing

Big Data Applications and Services 2017 - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-13-0695-2_9 ◽

2018 ◽

pp. 69-79 ◽

Cited By ~ 2

Author(s):

Calvin S. H. Hoi ◽

Daniyal Khowaja ◽

Carson K. Leung

Keyword(s):

Big Data ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern

An Improved Eclat Algorithm Based on Tissue-Like P System with Active Membranes

Processes ◽

10.3390/pr7090555 ◽

2019 ◽

Vol 7 (9) ◽

pp. 555

Author(s):

Linlin Jia ◽

Laisheng Xiang ◽

Xiyu Liu

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Experimental Results ◽

Frequent Pattern ◽

P System ◽

Vertical Data ◽

Active Membranes ◽

Pruning Strategy ◽

Mining Algorithms ◽

Rewriting Rules

The Eclat algorithm is a typical frequent pattern mining algorithm using vertical data. This study proposes an improved Eclat algorithm called ETPAM, based on the tissue-like P system with active membranes. The active membranes are used to run evolution rules, i.e., object rewriting rules, in parallel. Moreover, ETPAM utilizes subsume indices and an early pruning strategy to reduce the number of frequent pattern candidates and subsumes. The time complexity of ETPAM is decreased from O(t^2) to O(t) as compared with the original Eclat algorithm through the parallelism of the P system. The experimental results using two databases indicate that ETPAM performs very well in mining frequent patterns, and the experimental results using four databases prove that ETPAM is computationally very efficient as compared with three other existing frequent pattern mining algorithms.
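The vertical data layout that Eclat relies on can be sketched as follows. This is the plain sequential Eclat idea (intersecting transaction-id sets depth-first), not ETPAM: it contains no P system, subsume indices, or early pruning, and the function name and toy data are assumptions.

```python
def eclat(transactions, min_count):
    """Return {itemset tuple: support count} using tidset intersections."""
    # Vertical layout: each item maps to the ids of transactions containing it.
    vertical = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, items):
        """Depth-first extension of a prefix with its frequent siblings."""
        for i, (item, tids) in enumerate(items):
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            suffix = []
            for other, other_tids in items[i + 1:]:
                shared = tids & other_tids  # support of itemset + other
                if len(shared) >= min_count:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return result


vertical_patterns = eclat(
    [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]],
    min_count=3,
)
```

Because support is obtained from set intersections rather than repeated database scans, each extension step is independent of the others under the same prefix, which is what makes the approach attractive for parallel execution.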

A fast and parallel algorithm for frequent pattern mining from big data in many-task environments

International Journal of High Performance Computing and Networking ◽

10.1504/ijhpcn.2017.084244 ◽

2017 ◽

Vol 10 (3) ◽

pp. 157 ◽

Cited By ~ 1

Author(s):

Wei Tee Lin ◽

Chih Ping Chu

Keyword(s):

Big Data ◽

Parallel Algorithm ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Task Environments

Frequent Pattern Mining over Unstructured Data using Semi-Structured Doc-Model and Pattern Ranking

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206216 ◽

2020 ◽

pp. 36-42

Author(s):

Sudhir Tirumalasetty ◽

A. Divya ◽

D. Rahitya Lakshmi ◽

Ch. Durga Bhavani ◽

D. Anusha

Keyword(s):

Data Mining ◽

Big Data ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Unstructured Data ◽

Frequent Pattern ◽

Frequent Patterns ◽

Innovative Methods ◽

Mining Algorithms ◽

Doc Model

Frequent pattern mining is an essential data-mining task whose goal is to discover knowledge in the form of repeated patterns. Many efficient pattern-mining algorithms have been developed in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called “Big Data”. Scalable parallel algorithms hold the key to solving the problem in this context. This paper reviews recent advances in parallel frequent pattern mining, analysing them through the Big Data lens. Load balancing and work partitioning are the major challenges to be overcome. These challenges continually call for innovative methods, as Big Data evolves without limits. The biggest challenge now is handling unstructured data when finding frequent patterns. To accomplish this, a Semi-Structured Doc-Model and ranking of patterns are used.
