Idempotent Task Cache System for Handling Intermediate Data Skew in MapReduce on Cloud Computing

Author(s): Tzu-Chi Huang, Kuo-Chih Chu, Jia-Hui Lin, Ce-Kuen Shieh

Author(s): Yassine Sabri, Aouad Siham

Multi-area and multi-faceted remote sensing (RS) datasets are widely used because of the increasing demand for accurate and up-to-date information on resources and the environment for regional and global monitoring. In general, RS data processing involves a complex multi-step sequence that includes several independent processing steps, depending on the type of RS application. Processing RS data for regional disaster and environmental monitoring is recognized as both computationally demanding and data intensive. By combining cloud computing and HPC technology, we propose a large-scale RS data processing system that addresses these problems efficiently, suits a variety of applications, and offers real-time, on-demand service. The ubiquity, elasticity, and high-level transparency of the cloud computing model make it possible to run massive RS data management and processing for monitoring dynamic environments in any cloud via a web interface. Hilbert-based data indexing methods are used to optimally query and access RS images, RS data products, and intermediate data. The core of the cloud service provides a parallel file system for large RS datasets and an interface for accessing RS data that improves data locality. It collects data and optimizes I/O performance. Our experimental analysis demonstrated the effectiveness of the proposed platform.
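Hilbert-curve indexing maps 2-D tile coordinates to a single integer key so that tiles that are close on the ground tend to receive close keys, which keeps spatially related data together on disk. The sketch below is a minimal illustration of the classic coordinate-to-distance mapping, assuming RS image tiles are addressed on an n-by-n grid with n a power of two. The function names and the tile layout are hypothetical and are not taken from the authors' system.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its distance along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def order_tiles(n, tiles):
    """Sort hypothetical (x, y) tile coordinates by Hilbert index for locality-preserving storage."""
    return sorted(tiles, key=lambda t: hilbert_index(n, t[0], t[1]))


if __name__ == "__main__":
    grid = 4
    tiles = [(x, y) for y in range(grid) for x in range(grid)]
    for x, y in order_tiles(grid, tiles):
        print((x, y), hilbert_index(grid, x, y))
```

Because consecutive Hilbert indices always refer to neighbouring cells, a range query over a rectangular area usually touches only a few contiguous key ranges. That is the usual reason for choosing this curve over simple row-major ordering.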


2018, Vol 86, pp. 1054-1063
Author(s): Guipeng Liu, Xiaomin Zhu, Ji Wang, Deke Guo, Weidong Bao, ...

Author(s): Chetana Tukkoji, Seetharam K

There is a growing need for ad-hoc analysis of extremely large data sets, especially at web-based companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products offer a solution, but they are usually prohibitively expensive at this scale. Moreover, most of the people who analyze this data are procedural programmers. The success of the more procedural MapReduce programming model and its scalable implementations on low-cost commodity hardware is evidence of this. However, the MapReduce paradigm is too low-level and rigid, and it leads to a great deal of custom user code that is hard to maintain and reuse. MapReduce is an effective tool for parallel data processing, but one significant issue in practical MapReduce applications is data skew: an imbalance in the amount of data assigned to each task causes some tasks to take much longer to finish than others. We propose a framework that addresses the data skew problem for reduce-side applications in MapReduce. It uses an innovative sampling method that achieves an accurate approximation of the intermediate data distribution by sampling only a small fraction of the intermediate data. It does not require any pre-run sampling of the input data, nor does it prevent the overlap between the map and reduce stages.
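The abstract does not spell out how the sampled distribution is used, so the following is only a hedged illustration of the general idea, not the framework's actual algorithm. It estimates per-key record counts from a small random sample of intermediate keys, then assigns keys to reducers with the standard greedy longest-processing-time (LPT) heuristic instead of plain hash partitioning. The function names, the sampling fraction, and the use of LPT are all assumptions.

```python
import heapq
import random
from collections import Counter


def estimate_key_weights(intermediate_keys, fraction, seed=0):
    """Estimate how many records each key has from a Bernoulli sample of the keys."""
    rng = random.Random(seed)
    sample = [k for k in intermediate_keys if rng.random() < fraction]
    counts = Counter(sample)
    # Scale sampled counts back up to estimated totals.
    return {k: c / fraction for k, c in counts.items()}


def assign_keys(weights, num_reducers):
    """Greedy LPT heuristic: give the heaviest remaining key to the least-loaded reducer."""
    heap = [(0.0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {}
    for key, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        load, r = heapq.heappop(heap)
        assignment[key] = r
        heapq.heappush(heap, (load + w, r))
    return assignment


def reducer_loads(weights, assignment, num_reducers):
    """Total estimated records per reducer under a given key-to-reducer assignment."""
    loads = [0.0] * num_reducers
    for key, r in assignment.items():
        loads[r] += weights[key]
    return loads


if __name__ == "__main__":
    # Hypothetical skewed intermediate keys: one hot key dominates.
    keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)] * 2
    weights = estimate_key_weights(keys, fraction=0.05)
    plan = assign_keys(weights, num_reducers=4)
    print(reducer_loads(weights, plan, 4))
```

Keys that never appear in the sample get no entry in the plan. In a real system they would need a fallback, such as ordinary hash partitioning. A single very hot key still lands entirely on one reducer here, because this sketch does not split large keys.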


2016, Vol 2016, pp. 1-5
Author(s): Amany AlShawi

The popularity of cloud computing is steadily increasing. The purpose of this research was to enhance cloud security using techniques such as data mining, with specific reference to the single cache system. The findings showed that cloud security could be enhanced with the single cache system. In future work, the Apriori algorithm could be applied to the single cache system. This approach can be used by cloud providers, vendors, data distributors, and others. Further, the data objects entered into the single cache system can be extended into 12 components. Databases and SPSS Modeler can be used to implement this.
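The abstract names the Apriori algorithm as future work for the single cache system but gives no details. For illustration only, the sketch below shows textbook Apriori mining frequently co-accessed objects. It assumes hypothetical cache access transactions, where each transaction is the set of object IDs requested in one session. The transaction format, the object names, and the support threshold are assumptions and are not part of the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset seen in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every (k-1)-subset of a candidate must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count the surviving candidates in one pass over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical sessions of cached object IDs.
    sessions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
    for itemset, count in apriori(sessions, min_support=3).items():
        print(sorted(itemset), count)
```

Frequent itemsets like these could, for example, suggest which objects to prefetch or co-locate in a cache. Whether the study intends that use is not stated.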


2014, Vol 926-930, pp. 2208-2212
Author(s): Yao Qin Liu

With the rapid development of Internet technology, the volume of Web data of all kinds is growing at an alarming rate, which poses great challenges for traditional data access and storage. Distributed caching technology based on cloud computing nodes provides a high-performance storage service through a large cloud caching service. In a distributed cache system, the cache servers coordinate and work together effectively to share resources, which is an important means for a cloud-based storage platform to improve the performance of cloud applications.
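The abstract does not say how cached objects are spread across cache servers. Consistent hashing is one widely used technique for this, and the following is a hypothetical illustration of it, not the paper's design. Server names, the number of virtual nodes per server, and the MD5-based hash are assumptions. The key property is that removing a server only remaps the keys that server held.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps cache keys to cache servers; adding or removing a server only remaps nearby keys."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual nodes per server, to smooth the key distribution
        self._ring = []           # sorted list of (hash, server) pairs
        for s in servers:
            self.add_server(s)

    @staticmethod
    def _hash(value):
        # Stable 64-bit hash; Python's built-in hash() is salted per process.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def add_server(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove_server(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def server_for(self, key):
        if not self._ring:
            raise LookupError("no cache servers available")
        h = self._hash(key)
        # First virtual node at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (h,))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
    print(ring.server_for("user:42"))
```

With plain `hash(key) % num_servers`, removing one server would remap almost every key. Here only the keys that lived on the removed server move, which keeps the remaining caches warm.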

