Idempotent Task Cache System for Handling Intermediate Data Skew in MapReduce on Cloud Computing

Multi-area and multi-faceted remote sensing (RS) datasets are widely used due to the increasing demand for accurate and up-to-date information on resources and the environment for regional and global monitoring. In general, the processing of RS data involves a complex multi-step sequence that includes several independent processing steps depending on the type of RS application. Processing RS data for regional disaster and environmental monitoring is recognized as both computationally intensive and data intensive. By combining cloud computing and HPC technology, we propose a large-scale RS data processing system that is suitable for various applications and provides real-time, on-demand service. The ubiquity, elasticity, and high-level transparency of the cloud computing model make it possible to run massive RS data management and data processing, and to monitor dynamic environments, in any cloud via a web interface. Hilbert-based data indexing methods are used to optimally query and access RS images, RS data products, and intermediate data. The core of the cloud service provides a parallel file system for large RS data and an interface for accessing RS data that improves data locality. It also collects data and optimizes I/O performance. Our experimental analysis demonstrated the effectiveness of our platform.
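
The abstract above mentions Hilbert-based indexing but gives no details. As a rough illustration only, the sketch below maps a 2D grid cell to its position along a Hilbert curve, so that cells close together on the curve are also close together in space. The function name `hilbert_index` and the tiling of images into a square grid are assumptions made for this example; they are not taken from the paper.

```python
def hilbert_index(order, x, y):
    """Return the distance along a Hilbert curve of cell (x, y).

    The grid is 2**order cells wide and tall. This is the standard
    iterative xy-to-distance conversion; how the paper actually
    builds its index is not stated in the abstract.
    """
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant at this level contributes s*s cells to the distance.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the next level uses the same pattern.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


if __name__ == "__main__":
    # Hypothetical use: order image tiles by curve position so that
    # spatially adjacent tiles tend to be stored near each other.
    tiles = [(x, y) for x in range(4) for y in range(4)]
    tiles.sort(key=lambda cell: hilbert_index(2, *cell))
    print(tiles)
```

Sorting tiles by this index is one common way to turn a 2D range query into a small number of 1D range scans, which is presumably why a curve of this kind suits image and intermediate-data lookups.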

SP-Partitioner: A novel partition method to handle intermediate data skew in spark streaming

Future Generation Computer Systems ◽

10.1016/j.future.2017.07.014 ◽

2018 ◽

Vol 86 ◽

pp. 1054-1063 ◽

Cited By ~ 11

Author(s):

Guipeng Liu ◽

Xiaomin Zhu ◽

Ji Wang ◽

Deke Guo ◽

Weidong Bao ◽

...

Keyword(s):

Data Skew ◽

Intermediate Data ◽

Partition Method

Smart Intermediate Data Transfer for MapReduce on Cloud Computing

2013 International Conference on Cloud Computing and Big Data ◽

10.1109/cloudcom-asia.2013.97 ◽

2013 ◽

Author(s):

Tzu-Chi Huang ◽

Kuo-Chih Chu ◽

Yu-Ruei Rao

Keyword(s):

Cloud Computing ◽

Data Transfer ◽

Intermediate Data

Workload Alleviation Scheduling Framework to Alleviate Negative Performance Impact of Intermediate Data Skew in Small-Scale MapReduce Cloud

2018 International Conference on System Science and Engineering (ICSSE) ◽

10.1109/icsse.2018.8520003 ◽

2018 ◽

Author(s):

Tzu-Chi Huang ◽

Kuo-Chih Chu ◽

Jia-Huei Lin ◽

Guo-Hao Huang ◽

Ce-Kuen Shieh

Keyword(s):

Small Scale ◽

Performance Impact ◽

Data Skew ◽

Intermediate Data

Handling Imbalance Data in Reduce task of MapReduce in Cloud Environment

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i11.498 ◽

2017 ◽

Vol 7 (11) ◽

pp. 168

Author(s):

Chetana Tukkoji ◽

Seetharam K

Keyword(s):

Ad Hoc ◽

Programming Model ◽

Low Cost ◽

Large Data ◽

Map Reduce ◽

Data Sets ◽

Parallel Database ◽

Data Skew ◽

Intermediate Data ◽

The People

There is a growing need for ad hoc analysis of extremely large data sets, especially at web-based companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products offer a solution, but are usually prohibitively expensive at this scale. Moreover, most of the people who analyze this data are procedural programmers. The success of the more procedural map-reduce programming model, and of its associated scalable implementations on low-cost commodity hardware, is evidence of this. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain and reuse. MapReduce is an effective tool for parallel data processing. One significant issue in practical MapReduce applications is data skew: an imbalance in the amount of data assigned to each task causes some tasks to take much longer to finish than others. We therefore propose a framework to solve the data skew problem for reduce-side applications in MapReduce. It uses an innovative sampling method that obtains an accurate approximation of the distribution of the intermediate data by sampling only a small fraction of that data. It does not contain any type of data, to prevent overlap between the map and reduce stages.
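
The sampling idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: the function names are hypothetical, and the greedy "heaviest key to least-loaded reducer" assignment is a technique swapped in for this example, since the abstract does not say how partitions are chosen once the distribution has been estimated.

```python
import heapq
import random
import zlib
from collections import Counter


def estimate_key_weights(keys, sample_rate, rng):
    """Estimate per-key record counts from a random sample of the keys."""
    sampled = Counter(k for k in keys if rng.random() < sample_rate)
    # Scale sampled counts back up to approximate the full distribution.
    return {k: c / sample_rate for k, c in sampled.items()}


def assign_keys_to_reducers(weights, num_reducers):
    """Greedy balancing (an assumption): heaviest key goes to the lightest reducer."""
    heap = [(0.0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {}
    for key, weight in sorted(weights.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + weight, reducer))
    return assignment


def reducer_for(key, assignment, num_reducers):
    """Look up a key; keys missed by the sample fall back to hash partitioning."""
    if key in assignment:
        return assignment[key]
    return zlib.crc32(repr(key).encode("utf-8")) % num_reducers


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up skewed intermediate keys: one hot key and many rare ones.
    keys = ["hot"] * 5000 + [f"k{i}" for i in range(500)] * 4
    weights = estimate_key_weights(keys, sample_rate=0.05, rng=rng)
    assignment = assign_keys_to_reducers(weights, num_reducers=4)
    loads = Counter(reducer_for(k, assignment, 4) for k in keys)
    print(sorted(loads.items()))
```

With plain hash partitioning the hot key would land on whichever reducer its hash selects, alongside its share of the rare keys; estimating weights first lets the partitioner give that key a reducer largely to itself.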

Applying Data Mining Techniques to Improve Information Security in the Cloud: A Single Cache System Approach

Scientific Programming ◽

10.1155/2016/2385654 ◽

2016 ◽

Vol 2016 ◽

pp. 1-5 ◽

Cited By ~ 2

Author(s):

Amany AlShawi

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Information Security ◽

System Approach ◽

Specific Reference ◽

Apriori Algorithm ◽

Data Mining Techniques ◽

Data Objects ◽

Cache System ◽

Day By Day

The popularity of cloud computing is increasing day by day. The purpose of this research was to enhance the security of the cloud using techniques such as data mining, with specific reference to the single cache system. The findings show that security in the cloud could be enhanced with the single cache system. In future work, an Apriori algorithm can be applied to the single cache system. This can be applied by all cloud providers, vendors, data distributors, and others. Further, data objects entered into the single cache system can be extended into 12 components. Database and SPSS modelers can be used to implement the same.

Distributed Caching Technology Research Based on Cloud Computing Model

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.2208 ◽

2014 ◽

Vol 926-930 ◽

pp. 2208-2212

Author(s):

Yao Qin Liu

Keyword(s):

Cloud Computing ◽

High Performance ◽

Rapid Development ◽

Data Access ◽

Internet Technology ◽

Application Performance ◽

Distributed Caching ◽

Important Means ◽

And Storage ◽

Cache System

With the rapid development of Internet technology, the volume of Web data is growing at an alarming rate, which poses great challenges for traditional data access and storage. Distributed caching technology based on cloud computing nodes provides a high-performance storage service through a large cloud caching service. In a distributed cache system, the cache servers coordinate and work together effectively to share resources, which is an important means for a cloud-based storage platform to improve application performance in the cloud.

Privacy and Memory Concerned Intermediate Data Handling in Cloud Computing Environment

Journal of Advanced Research in Dynamical and Control Systems ◽

10.5373/jardcs/v12sp1/20201080 ◽

2020 ◽

Vol 12 (01-Special Issue) ◽

pp. 337-347

Author(s):

Sini S Nair

Keyword(s):

Cloud Computing ◽

Data Handling ◽

Computing Environment ◽

Cloud Computing Environment ◽

Intermediate Data
