Extracting Functional Dependencies in Large Datasets Using MapReduce Model
Over the last few years, data have been generated in ever larger volumes at an ever faster rate, and the need for large-scale data processing systems has grown remarkably. As datasets grow in size, data quality is often compromised. Functional dependencies, which represent semantic constraints in data, are important for data quality assessment. Executing functional dependency discovery algorithms on a single computer is hard and laborious for large datasets. MapReduce provides an enabling technology for large-scale data processing, and the open-source Hadoop implementation of MapReduce has given researchers a powerful tool for tackling large-data problems in a distributed manner. The objective of this study is to extract functional dependencies between attributes of large datasets using the MapReduce programming model. Attribute entropy is used to measure inter-attribute correlations and is exploited to discover functional dependencies hidden in the data.
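The entropy-based test behind this approach can be sketched on a single machine before distributing it. A functional dependency X → Y holds exactly when knowing X determines Y, i.e. when the conditional entropy H(Y|X) is zero, which is equivalent to H(X) = H(XY). The following minimal Python sketch illustrates that check; the function names (`entropy`, `holds_fd`) and the dict-per-row data layout are illustrative assumptions, not the paper's implementation, and a real MapReduce version would compute the value-count histograms in the map/reduce phases instead.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def holds_fd(rows, lhs, rhs):
    """Test whether the functional dependency lhs -> rhs holds.

    Uses the entropy criterion: X -> Y holds iff H(X) == H(XY),
    i.e. appending Y's values to X's adds no uncertainty.
    rows: list of dicts mapping attribute name -> value (illustrative layout).
    """
    x = [tuple(r[a] for a in lhs) for r in rows]
    xy = [tuple(r[a] for a in lhs + rhs) for r in rows]
    return math.isclose(entropy(x), entropy(xy))

# Hypothetical example table: zip determines city, but not vice versa.
rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94103", "city": "SF"},
    {"zip": "94110", "city": "SF"},
]
print(holds_fd(rows, ["zip"], ["city"]))   # zip -> city holds
print(holds_fd(rows, ["city"], ["zip"]))   # city -> zip does not
```

In a distributed setting, the per-attribute(-set) value counts that feed `entropy` are exactly the kind of aggregation MapReduce computes naturally: mappers emit (attribute-value, 1) pairs and reducers sum the counts.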