New and Efficient Algorithms for Producing Frequent Itemsets with the Map-Reduce Framework

Big Data is a collection of structured, unstructured, and semi-structured data gathered from various sources, so it is important both to mine it and to protect the privacy of individual records. Differential privacy is one of the best measures for providing a strong privacy guarantee. The chapter proposes differentially private frequent itemset mining using MapReduce, which requires less time for privately mining large datasets. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a big data environment, and data privacy techniques and their applications to unstructured data. An analysis of experimental results on structured and unstructured data sets is also presented.

Issues of K Means Clustering While Migrating to Map Reduce Paradigm with Big Data: A Survey

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i6.11207 ◽

2016 ◽

Vol 6 (6) ◽

pp. 3047 ◽

Cited By ~ 1

Author(s):

Khyati R Nirmal ◽

K.V.V. Satyanarayana

Keyword(s):

Data Mining ◽

Big Data ◽

Distinct Group ◽

Map Reduce ◽

Data Mining Algorithm ◽

Distributed Environment ◽

Significant Information ◽

User Influence ◽

Initial Cluster ◽

Machine Learning Approach

In recent times, Big Data analysis has emerged as an essential area of computer science. Extracting significant information from Big Data by separating the data into distinct groups is a crucial task, and it is beyond the capacity of a commonly used personal machine. It is necessary to adopt a distributed environment such as the MapReduce paradigm and to migrate data mining algorithms to it. In data mining, partition-based K-means clustering is one of the most widely used algorithms for grouping data according to the degree of similarity between data points. It requires the number of clusters K and the initial cluster centroids as input. The survey shows that these parameters, whether preferred by the algorithm or chosen by the user, influence the algorithm's behavior. It is therefore necessary to migrate K-means clustering to MapReduce and to predict the value of K using a machine learning approach. An efficient method for selecting the initial clusters also needs to be devised and combined with it. This paper surveys several methods for predicting the value of K in K-means clustering, as well as different methodologies for finding the initial cluster centers. Along with the selection of K and the initial centroids, the objective of the proposed work is to deal with the analysis of categorical data.
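As a rough illustration of how K-means fits the map/reduce pattern described in this abstract, the sketch below runs one iteration in plain Python. It is not the surveyed authors' implementation: the function names (`assign_point`, `recompute_centroid`, `kmeans_iteration`) and the toy data are assumptions made for the example, and a real job would run the map and reduce steps on a cluster framework rather than in a single process.

```python
from collections import defaultdict

def assign_point(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                 for centroid in centroids]
    return distances.index(min(distances)), point

def recompute_centroid(points):
    """Reduce step: average all points assigned to one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_iteration(points, centroids):
    """One K-means iteration expressed as map, shuffle, reduce."""
    groups = defaultdict(list)
    for point in points:                       # map phase
        key, value = assign_point(point, centroids)
        groups[key].append(value)              # shuffle: group by cluster index
    # Reduce phase; a cluster that received no points keeps its old centroid.
    return [recompute_centroid(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]

# Hypothetical toy data: K = 2 and hand-picked initial centroids.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
print(new_centroids)
```

Both K and the initial centroids are inputs here, which is why the choice of each affects the clusters the iteration converges to.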

Issues of K Means Clustering While Migrating to Map Reduce Paradigm with Big Data: A Survey

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i6.pp3047-3051 ◽

2016 ◽

Vol 6 (6) ◽

pp. 3047

Author(s):

Khyati R Nirmal ◽

K.V.V. Satyanarayana

Keyword(s):

Data Mining ◽

Big Data ◽

Distinct Group ◽

Map Reduce ◽

Data Mining Algorithm ◽

Distributed Environment ◽

Significant Information ◽

User Influence ◽

Initial Cluster ◽

Machine Learning Approach

Map Reduce clustering in Incremental Big Data processing

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b6606.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4205-4211

Keyword(s):

Data Mining ◽

Big Data ◽

Social Network ◽

Data Processing ◽

Online Shopping ◽

Processing Technique ◽

Map Reduce ◽

Data Set ◽

Computation Procedure ◽

Incremental Processing

An advanced incremental processing technique is proposed for data analysis in order to keep clustering results up to date. Data arrives continuously from different sources such as social networks, online shopping, sensors, and e-commerce [1]. With such Big Data, the results of data mining applications become stale and obsolete over time. Cloud intelligence applications often perform iterative computations (e.g., PageRank) on constantly changing data sets. Although previous studies extend MapReduce for efficient iterative computations, it is still too expensive to run an entirely new large-scale MapReduce iterative job to accommodate new changes to the underlying data sets in a timely manner. Our implementation of MapReduce runs [4] on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes several terabytes of data on thousands of machines. Programmers find the system easy to use. Across many MapReduce applications, we observe that in many cases the changes affect only a very small part of the data set, and the newly converged state is very close to the previously converged state. I2MapReduce clustering exploits this observation to save re-computation by starting from the previously converged state [2] and by performing incremental updates on the changing data. The approach helps improve job running time and reduces the time needed to refresh the results of big data processing.
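The idea of restarting an iterative computation from the previously converged state can be sketched with a small single-machine PageRank. This is only an illustration of the warm-start idea and not the I2MapReduce system: the `pagerank` function, its parameters, and the toy graph are assumptions made for the example.

```python
def pagerank(graph, start=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; returns (ranks, number of iterations used).

    `start` lets the caller resume from a previously converged state.
    """
    nodes = sorted(graph)
    n = len(nodes)
    rank = dict(start) if start else {v: 1.0 / n for v in nodes}
    for iteration in range(1, max_iter + 1):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph[v] if graph[v] else nodes  # dangling node: spread evenly
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            return rank, iteration
    return rank, max_iter

# Hypothetical graph, then a small update that adds the edge d -> a.
old_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
new_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c", "a"]}

old_ranks, _ = pagerank(old_graph)
cold_ranks, cold_iters = pagerank(new_graph)                   # recompute from scratch
warm_ranks, warm_iters = pagerank(new_graph, start=old_ranks)  # resume from old state
print(cold_iters, warm_iters)
```

Both runs converge to the same ranks. When the update touches only a small part of the graph, the warm start begins closer to the answer and so typically needs fewer iterations.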

Big Data Mining using Map Reduce: A Survey Paper

IOSR Journal of Computer Engineering ◽

10.9790/0661-16673740 ◽

2014 ◽

Vol 16 (6) ◽

pp. 37-40 ◽

Cited By ~ 2

Author(s):

Shital Suryawanshi ◽

Prof. V.S Wadne

Keyword(s):

Data Mining ◽

Big Data ◽

Map Reduce ◽

Survey Paper ◽

Big Data Mining

Data Mining with Big Data e-Health Service Using Map Reduce

IJARCCE ◽

10.17148/ijarcce.2015.4227 ◽

2015 ◽

pp. 123-127

Author(s):

Abinaya. K

Keyword(s):

Data Mining ◽

Big Data ◽

Health Service ◽

Map Reduce

Closed Frequent Itemsets Mining Based on It-Tree

Journal of Medical Informatics and Decision Making ◽

10.14302/issn.2641-5526.jmid-20-3424 ◽

2020 ◽

Vol 1 (2) ◽

pp. 44-52

Author(s):

Youssef Fakir ◽

Chaima Ahle Touate ◽

Rachid Elayachi ◽

Mohamed Fakir

Keyword(s):

Data Mining ◽

Association Rule ◽

Computing Time ◽

Frequent Itemsets ◽

Closed Frequent Itemsets ◽

Hidden Knowledge ◽

Closed Itemsets ◽

Frequent Itemsets Mining ◽

Direct Counting ◽

Very High

In the last decade, the amount of data collected in various computer science applications has grown considerably. These large volumes of data need to be analysed in order to extract useful hidden knowledge. This work focuses on association rule extraction, one of the most popular techniques in data mining. Nevertheless, the number of extracted association rules is often very high, and many of them are redundant. In this paper, we propose an algorithm for mining closed itemsets based on the construction of an IT-tree. This algorithm is compared with the DCI (direct counting & intersect) algorithm in terms of minimum support and computing time. CHARM is not memory-efficient: it needs to store all closed itemsets in memory. The lower the min-sup, the more frequent closed itemsets there are, so the amount of memory used by CHARM increases.
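To make the notion of a closed frequent itemset concrete, the following brute-force sketch uses the standard definition: a frequent itemset is closed when no proper superset has the same support. It is not the IT-tree, DCI, or CHARM algorithm discussed in the abstract. The function names and the toy transactions are assumptions made for the example, and the exhaustive enumeration is only practical for very small item domains.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Enumerate every itemset contained in at least `min_count` transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(itemset <= t for t in transactions)
            if count >= min_count:
                result[itemset] = count
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support count."""
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in frequent.items())
    }

# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
frequent = frequent_itemsets(transactions, min_count=2)
closed = closed_itemsets(frequent)
print(sorted("".join(sorted(s)) for s in closed))
```

In this example five itemsets are frequent but only three are closed, which shows how closed itemsets give a smaller, non-redundant summary of the frequent ones.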

An Accurate Privacy-Preserving Data Mining Algorithm for Frequent Itemsets in Distributed Databases

Information Computing and Automation ◽

10.1142/9789812799524_0327 ◽

2008 ◽

Author(s):

Xiaodan Hu ◽

Yongchu Wang

Keyword(s):

Data Mining ◽

Distributed Databases ◽

Privacy Preserving ◽

Frequent Itemsets ◽

Data Mining Algorithm ◽

Privacy Preserving Data Mining ◽

Mining Algorithm

Flexible Mining of Association Rules

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch137 ◽

2011 ◽

pp. 890-894

Author(s):

Hong Shen

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Strong Association ◽

Frequent Itemsets ◽

Efficient Algorithms ◽

Multiple Level ◽

Rule Mining ◽

Unique Identifier

The discovery of association rules showing conditions of data co-occurrence has attracted the most attention in data mining. An example of an association rule is the rule “the customer who bought bread and butter also bought milk,” expressed by $T(\text{bread}, \text{butter}) \Rightarrow T(\text{milk})$. Let $I = \{x_1, x_2, \ldots, x_m\}$ be a set of (data) items, called the domain; let $D$ be a collection of records (transactions), where each record, $T$, has a unique identifier and contains a subset of items in $I$. We define an itemset to be a set of items drawn from $I$ and call an itemset containing $k$ items a $k$-itemset. The support of itemset $X$, denoted by $\sigma(X/D)$, is the ratio of the number of records (in $D$) containing $X$ to the total number of records in $D$. An association rule is an implication rule $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The confidence of $X \Rightarrow Y$ is the ratio of $\sigma(X \cup Y/D)$ to $\sigma(X/D)$, indicating the percentage of records containing $X$ that also contain $Y$. Based on the user-specified minimum support (minsup) and confidence (minconf), the following statements are true: an itemset $X$ is frequent if $\sigma(X/D) > \text{minsup}$, and an association rule $X \Rightarrow Y$ is strong if $X \cup Y$ is frequent and $\sigma(X \cup Y/D) / \sigma(X/D) \geq \text{minconf}$. The problem of mining association rules is to find all strong association rules, which can be divided into two subproblems: 1. Find all the frequent itemsets. 2. Generate all strong rules from all frequent itemsets. Because the second subproblem is relatively straightforward (we can solve it by extracting every subset from an itemset and examining the ratio of its support), most of the previous studies (Agrawal, Imielinski, & Swami, 1993; Agrawal, Mannila, Srikant, Toivonen, & Verkamo, 1996; Park, Chen, & Yu, 1995; Savasere, Omiecinski, & Navathe, 1995) emphasized developing efficient algorithms for the first subproblem. This article introduces two important techniques for association rule mining: (a) finding the N most frequent itemsets and (b) mining multiple-level association rules.
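The support and confidence definitions in this abstract can be checked on a small example. The sketch below is a minimal illustration, not code from the article: the function names (`support`, `confidence`, `is_strong`), the five toy transactions, and the threshold values are assumptions made for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Support of X union Y divided by support of X, for the rule X => Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def is_strong(x, y, transactions, minsup, minconf):
    """A rule is strong if X union Y is frequent and its confidence reaches minconf."""
    frequent = support(set(x) | set(y), transactions) > minsup
    return frequent and confidence(x, y, transactions) >= minconf

# Hypothetical transaction database D.
transactions = [frozenset(t) for t in (
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
)]

sup_x = support({"bread", "butter"}, transactions)               # 3 of 5 records
conf = confidence({"bread", "butter"}, {"milk"}, transactions)   # (2/5) / (3/5)
strong = is_strong({"bread", "butter"}, {"milk"}, transactions,
                   minsup=0.3, minconf=0.6)
print(sup_x, conf, strong)
```

Here the rule from bread and butter to milk has support 2/5 and confidence 2/3, so it counts as strong under the assumed thresholds and would be rejected if minconf were raised above 2/3.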

Closed frequent itemsets mining based on It-Tree

Global Journal of Computer Sciences Theory and Research ◽

10.18844/gjcs.v11i1.4912 ◽

2021 ◽

Vol 11 (1) ◽

pp. 01-11

Author(s):

Youssef Fakir ◽

Chaima Ahle Touate ◽

Rachid Elayachi

Keyword(s):

Data Mining ◽

Association Rule ◽

Computing Time ◽

Frequent Itemsets ◽

Closed Frequent Itemsets ◽

Hidden Knowledge ◽

Closed Itemsets ◽

Frequent Itemsets Mining ◽

Direct Counting ◽

Very High

In the last decade, the amount of data collected in various computer science applications has grown considerably. These large volumes of data need to be analysed in order to extract useful hidden knowledge. This work focuses on association rule extraction, one of the most popular techniques in data mining. Nevertheless, the number of extracted association rules is often very high, and many of them are redundant. In this paper, we propose an algorithm for mining closed itemsets based on the construction of an IT-tree. This algorithm is compared with the DCI (direct counting & intersect) algorithm in terms of minimum support and computing time. CHARM is not memory-efficient: it needs to store all closed itemsets in memory. The lower the min-sup, the more frequent closed itemsets there are, so the amount of memory used by CHARM increases.
