Extracting Functional Dependencies in Large Datasets Using MapReduce Model
Over the last few years, data have been generated in ever larger volumes at an ever faster rate, and the need for large-scale data processing systems has grown remarkably. As datasets grow in size, data quality is often compromised. Functional dependencies, which represent semantic constraints in data, are important for data quality assessment. Executing functional dependency discovery algorithms on a single computer is hard and laborious for large datasets. MapReduce provides an enabling technology for large-scale data processing, and the open-source Hadoop implementation of MapReduce has given researchers a powerful tool for tackling large-data problems in a distributed manner. The objective of this study is to extract functional dependencies between attributes of large datasets using the MapReduce programming model. Attribute entropy is used to measure inter-attribute correlations and is exploited to discover functional dependencies hidden in the data.
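The entropy-based test behind this approach can be sketched on a single machine before distributing it. A functional dependency X → Y holds exactly when knowing X determines Y, i.e. when the conditional entropy H(Y|X) is zero, which is equivalent to H(X) = H(XY). The following minimal Python sketch illustrates that check; the function names (`entropy`, `holds_fd`) and the dict-per-row data layout are illustrative assumptions, not the paper's implementation, and a real MapReduce version would compute the value-count histograms in the map/reduce phases instead.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def holds_fd(rows, lhs, rhs):
    """Test whether the functional dependency lhs -> rhs holds.

    Uses the entropy criterion: X -> Y holds iff H(X) == H(XY),
    i.e. appending Y's values to X's adds no uncertainty.
    rows: list of dicts mapping attribute name -> value (illustrative layout).
    """
    x = [tuple(r[a] for a in lhs) for r in rows]
    xy = [tuple(r[a] for a in lhs + rhs) for r in rows]
    return math.isclose(entropy(x), entropy(xy))

# Hypothetical example table: zip determines city, but not vice versa.
rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94103", "city": "SF"},
    {"zip": "94110", "city": "SF"},
]
print(holds_fd(rows, ["zip"], ["city"]))   # zip -> city holds
print(holds_fd(rows, ["city"], ["zip"]))   # city -> zip does not
```

In a distributed setting, the per-attribute(-set) value counts that feed `entropy` are exactly the kind of aggregation MapReduce computes naturally: mappers emit (attribute-value, 1) pairs and reducers sum the counts.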