A survey of dynamic replication strategies for improving data availability in data grids

Tehmina Amjad; Muhammad Sher; Ali Daud

doi:10.1016/j.future.2011.06.009

Efficient Dynamic Replication Algorithm Using Agent for Data Grid

The Scientific World Journal, Vol. 2014, pp. 1-10, 2014. doi:10.1155/2014/767016

Cited by ~7

Author(s): Priyanka Vashisht; Rajesh Kumar; Anju Sharma

Keyword(s): Data Replication; Data Access; Data Grid; Data Availability; Access Time; Test Bed; Data Grids; Dynamic Replication; Data Files

In data grids, scientific and business applications produce huge volumes of data that must be transferred among the distributed and heterogeneous nodes of the grid. Data replication provides a solution for managing data files efficiently in large grids: it enhances data availability, which reduces the overall file access time. This paper proposes and implements EDRA, an efficient dynamic replication algorithm that uses agents for data grids. EDRA performs dynamic replication over a hierarchical structure and selects the best replica based on scheduling parameters, namely the bandwidth, load gauge, and computing capacity of each node. Scheduling in the data grid reduces data access time, and taking these parameters into account distributes the load evenly across the nodes of the grid. EDRA is implemented in the data grid simulator OptorSim, using the European Data Grid CMS testbed topology. The simulation compares EDRA with BHR, LRU, and no replication, and the results show the efficiency of EDRA in terms of mean job execution time, network usage, and storage usage of nodes.
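As a rough illustration of multi-criteria replica selection, the Python sketch below scores each replica holder on the three scheduling parameters named in the abstract (bandwidth, load gauge, computing capacity) and picks the highest score. The weights, the normalization by the maximum value, and the names `ReplicaSite`, `score`, and `select_best_replica` are assumptions for illustration only; the abstract does not give EDRA's actual formula.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSite:
    """A site holding a replica of the requested file (hypothetical fields)."""
    name: str
    bandwidth_mbps: float  # bandwidth from this site to the requesting node
    load: float            # load gauge, assumed normalized to [0, 1]; 1 = fully loaded
    capacity: float        # computing capacity in arbitrary relative units


def score(site: ReplicaSite, max_bw: float, max_cap: float,
          w_bw: float = 0.4, w_load: float = 0.3, w_cap: float = 0.3) -> float:
    """Hypothetical weighted score: higher bandwidth and capacity, lower load is better."""
    return (w_bw * site.bandwidth_mbps / max_bw
            + w_load * (1.0 - site.load)
            + w_cap * site.capacity / max_cap)


def select_best_replica(sites: list[ReplicaSite]) -> ReplicaSite:
    """Pick the replica holder with the highest score (values normalized by the maxima)."""
    max_bw = max(s.bandwidth_mbps for s in sites)
    max_cap = max(s.capacity for s in sites)
    return max(sites, key=lambda s: score(s, max_bw, max_cap))


# Illustrative, made-up sites: the fastest link belongs to a heavily loaded node.
sites = [
    ReplicaSite("site-A", bandwidth_mbps=1000, load=0.95, capacity=8),
    ReplicaSite("site-B", bandwidth_mbps=622, load=0.20, capacity=6),
    ReplicaSite("site-C", bandwidth_mbps=155, load=0.10, capacity=10),
]
best = select_best_replica(sites)
print(best.name)  # site-B
```

With these weights the lightly loaded site-B (score about 0.669) beats site-A (about 0.655) despite site-A's faster link, which is the kind of load-spreading effect the abstract describes.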


A Collaborative Replication Approach for Mobile-P2P Networks

Emergent Trends in Personal, Mobile, and Handheld Computing Technologies, pp. 160-182, 2012. doi:10.4018/978-1-4666-0921-1.ch010

Author(s): Anirban Mondal; Sanjay Kumar Madria; Masaru Kitsuregawa

Keyword(s): Ad Hoc; Response Times; Data Availability; P2P Networks; Allocation Mechanism; Memory Space; Mobile Hosts; Dynamic Replication; Replica Allocation; Communication Traffic

This paper proposes CADRE (Collaborative Allocation and De-allocation of Replicas with Efficiency), a dynamic replication scheme for improving the typically low data availability in dedicated and cooperative mobile ad-hoc peer-to-peer (M-P2P) networks. In CADRE, replica allocation and de-allocation are performed collaboratively and in tandem to make replication effective. This collaboration is supported by a hybrid super-peer architecture in which some of the mobile hosts act as ‘gateway nodes’ (GNs) for a given region; GNs facilitate both search and replication. The main contributions of CADRE are as follows. First, its collaborative replica allocation and de-allocation mechanism helps prevent ‘thrashing’ conditions. Second, it considers replicating images at different resolutions to make better use of the generally limited memory space of the mobile hosts (MHs). Third, it addresses fair replica allocation across the MHs. Fourth, it helps optimize the use of the MHs' limited energy resources during replication. The authors’ performance evaluation demonstrates that CADRE is effective in improving data availability in M-P2P networks, with significantly reduced query response times and low communication traffic during replication, compared with both a recent existing scheme and a baseline approach that performs no replication.


Collaborative Services for Fault Tolerance in Hierarchical Data Grid

International Journal of Distributed Systems and Technologies, Vol. 5 (1), pp. 1-21, 2014. doi:10.4018/ijdst.2014010101

Cited by ~3

Author(s): B. Meroufel; G. Belalem

Keyword(s): Fault Tolerance; System Reliability; High Performance; Data Grid; Data Availability; Fault Prediction; Grid Systems; Adaptive Dynamic; Dynamic Replication; Multiple Copies

Fault tolerance is the ability of a system to perform its function correctly even in the presence of faults, so fault tolerance techniques are critical for the efficient use of expensive resources in high-performance data grid systems. One of the most popular fault tolerance strategies is replication, which creates multiple copies of resources in the system and has proved to be an effective way to achieve data availability and system reliability. In this paper the authors propose a new adaptive dynamic replication strategy that combines availability-based replication with popularity-based replication. It uses two types of replicas (primary and ordinary) and two types of placement nodes (best client and best responsible nodes) for new replicas. In addition to replication, the authors use other strategies such as fault detection, fault prediction, dynamicity management, and self-stabilization. All these services are grouped into one fault tolerance box named Collaborative Services for Fault Tolerance (CSFT), which structures them as hierarchical services and organizes the relationships between them.
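A common way to reason about availability-based replication is the independent-failure model: if each hosting node is up with probability p, then n replicas give availability 1 - (1 - p)^n. The Python sketch below uses that model for the availability side and a hypothetical access-count threshold for the popularity side, then keeps whichever criterion requires more replicas. It is a generic illustration under those assumptions, not the authors' CSFT strategy; `popularity_threshold` and the max() combination rule are made up for the example.

```python
def replicas_for_availability(p: float, target: float, max_replicas: int = 64) -> int:
    """Smallest n such that 1 - (1 - p)**n >= target, assuming independent node failures."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    n = 1
    while 1.0 - (1.0 - p) ** n < target:
        n += 1
        if n > max_replicas:
            raise ValueError("target availability needs too many replicas")
    return n


def replicas_for_popularity(accesses: int, popularity_threshold: int) -> int:
    """Hypothetical rule: one replica per `popularity_threshold` recent accesses."""
    return accesses // popularity_threshold


def target_replica_count(p: float, target: float, accesses: int,
                         popularity_threshold: int) -> int:
    """Satisfy both criteria by taking the larger of the two replica counts."""
    return max(replicas_for_availability(p, target),
               replicas_for_popularity(accesses, popularity_threshold))


# Nodes up 80% of the time, 99% availability wanted:
print(target_replica_count(0.8, 0.99, accesses=50, popularity_threshold=100))   # 3
print(target_replica_count(0.8, 0.99, accesses=500, popularity_threshold=100))  # 5
```

A rarely accessed file stays at the three replicas its availability target requires, while a frequently accessed file receives extra copies driven by popularity.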


A Two-Level Fuzzy Value-Based Replica Replacement Algorithm in Data Grids

International Journal of Grid and High Performance Computing, Vol. 8 (4), pp. 78-99, 2016. doi:10.4018/ijghpc.2016100105

Cited by ~3

Author(s): Nazanin Saadat; Amir Masoud Rahmani

Keyword(s): Data Grid; Data Availability; Distributed Data; Similar Data; Data Grids; Replacement Algorithm; Minimum Latency; Network Usage; Effective Network

One of the challenges in data grids is to access widely distributed data quickly and efficiently while providing maximum data availability with minimum latency. Data replication addresses this challenge by creating and storing replicas, which makes the same data accessible at different locations in the data grid and can shorten file retrieval time. However, because the number and storage capacity of grid sites are limited, an optimized and effective replacement algorithm is needed to improve the efficiency of replication. In this paper, the authors propose a novel two-level replacement algorithm that uses a Fuzzy Replica Preserving Value Evaluator System (FRPVES) to evaluate the value of each replica. The algorithm was tested using OptorSim, a grid simulator developed by the European Data Grid project. Simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, total number of replications, and effective network usage.
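Value-based replacement generally means scoring each stored replica and evicting the least valuable ones until an incoming file fits. The Python sketch below is a single-level, crisp (non-fuzzy) stand-in for that idea: the value formula combining access count, recency, and size is an assumption, and it does not reproduce FRPVES's fuzzy evaluation or the paper's two-level structure. `Replica`, `replica_value`, and `choose_victims` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size_mb: float
    access_count: int
    last_access: float  # time of the most recent access (e.g. simulation seconds)


def replica_value(r: Replica, now: float) -> float:
    """Hypothetical crisp value: frequent, recent, small replicas are worth keeping."""
    recency = 1.0 / (1.0 + (now - r.last_access))
    return r.access_count * recency / r.size_mb


def choose_victims(replicas: list[Replica], needed_mb: float, now: float) -> list[Replica]:
    """Evict replicas in ascending order of value until `needed_mb` is freed."""
    victims: list[Replica] = []
    freed = 0.0
    for r in sorted(replicas, key=lambda r: replica_value(r, now)):
        if freed >= needed_mb:
            break
        victims.append(r)
        freed += r.size_mb
    if freed < needed_mb:
        raise ValueError("cannot free enough space on this site")
    return victims


store = [
    Replica("a", size_mb=1000, access_count=50, last_access=99.0),
    Replica("b", size_mb=500, access_count=2, last_access=10.0),
    Replica("c", size_mb=200, access_count=10, last_access=90.0),
]
print([v.name for v in choose_victims(store, needed_mb=600, now=100.0)])  # ['b', 'c']
```

Ranking by `last_access` alone (oldest first) would reduce this to plain LRU, one of the baselines named in the EDRA entry above; the value function is what lets frequency and size influence eviction.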


Combination of data replication and scheduling algorithm for improving data availability in Data Grids

Journal of Network and Computer Applications, Vol. 36 (2), pp. 711-722, 2013. doi:10.1016/j.jnca.2012.12.021

Cited by ~38

Author(s): Najme Mansouri; Gholam Hosein Dastghaibyfard; Ehsan Mansouri

Keyword(s): Scheduling Algorithm; Data Replication; Data Availability; Data Grids


Improving Job Scheduling Performance with Dynamic Replication Strategy in Data Grids

Lecture Notes in Computer Science (Parallel Computing Technologies), pp. 194-199, 2007. doi:10.1007/978-3-540-73940-1_20

Cited by ~2

Author(s): Nguyen Dang Nhan; Soon Wook Hwang; Sang Boem Lim

Keyword(s): Job Scheduling; Data Grids; Dynamic Replication; Replication Strategy


The State of the Art and Open Problems in Data Replication in Grid Environments

Handbook of Research on Scalable Computing Technologies, pp. 486-516, 2010. doi:10.4018/978-1-60566-661-7.ch022

Cited by ~2

Author(s): Mohammad Shorfuzzaman; Rasit Eskicioglu; Peter Graham

Keyword(s): Data Replication; Data Access; Data Availability; Data Grids; Open Problems; Data Intensive; Grid Environments; Bandwidth Savings

Data Grids provide services and infrastructure for distributed data-intensive applications that need to access, transfer and modify massive datasets stored at distributed locations around the world. For example, the next generation of scientific applications, such as many in high-energy physics, molecular modeling, and earth sciences, will involve large collections of data created from simulations or experiments. These data collections are expected to reach multi-terabyte or even petabyte scale in many applications. Ensuring efficient, reliable, secure and fast access to such large data is hindered by the high latencies of the Internet. Managing and accessing multiple petabytes of data in Grid environments, ensuring data availability, and optimizing access are challenges that must be addressed. To improve data access efficiency, data can be replicated at multiple locations so that a user can access the data from a site near where it will be processed. Besides reducing data access time, replication in Data Grids also uses network and storage resources more efficiently. This chapter reviews the state of current research on data replication and the challenges arising for the new generation of data-intensive grid environments, and identifies open problems. First, it reviews fundamental data replication strategies that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Then, it discusses specific algorithms for selecting appropriate replicas and maintaining replica consistency, and analyzes the impact of data replication on job scheduling performance in Data Grids. It also discusses a set of metrics, including access latency, bandwidth savings, server load, and storage overhead, for making critical comparisons of various data replication techniques.
Overall, this chapter provides a comprehensive study of replication techniques in Data Grids that not only serves as a tool for understanding this evolving research area but also provides a reference to which future efforts may be mapped.
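The benefit of reading from a nearby replica can be made concrete with a simple transfer-time model, time = latency + size / bandwidth. The Python sketch below applies it to pick the replica with the lowest estimated access time. The model is an assumption that ignores congestion and server load, and the site names and link figures are invented for illustration.

```python
def transfer_time_s(file_mb: float, latency_s: float, bandwidth_mbps: float) -> float:
    """Estimated access time: one latency term plus size (MB -> megabits) over bandwidth."""
    return latency_s + (file_mb * 8.0) / bandwidth_mbps


def nearest_replica(file_mb: float,
                    sites: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Return (site, estimated seconds) for the replica with the lowest access time.

    `sites` maps a site name to (latency in seconds, bandwidth in Mbit/s).
    """
    name = min(sites, key=lambda s: transfer_time_s(file_mb, *sites[s]))
    return name, transfer_time_s(file_mb, *sites[name])


links = {
    "origin": (0.15, 100.0),          # distant master copy over a 100 Mbit/s path
    "local-replica": (0.01, 1000.0),  # replica on the same campus network
}
site, seconds = nearest_replica(1000.0, links)
print(site, round(seconds, 2))  # local-replica 8.01
```

For a 1000 MB file the local replica cuts the estimate from about 80 s to about 8 s, illustrating the kind of access-time reduction the chapter attributes to replication; metrics such as server load or storage overhead would need a richer model.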


A survey of dynamic replication and replica selection strategies based on data mining techniques in data grids

Engineering Applications of Artificial Intelligence, Vol. 48, pp. 140-158, 2016. doi:10.1016/j.engappai.2015.11.002

Cited by ~15

Author(s): T. Hamrouni; S. Slimani; F. Ben Charrada

Keyword(s): Data Mining; Data Grids; Data Mining Techniques; Selection Strategies; Dynamic Replication; Replica Selection


Integrated Method for Dynamic Replication of Services in Software-Defined Networks

Telecommunications and Radio Engineering, Vol. 76 (5), pp. 417-432, 2017. doi:10.1615/telecomradeng.v76.i5.30

Author(s): E. Tkachova; A. T. Abu Jassar

Keyword(s): Software Defined Networks; Integrated Method; Dynamic Replication


A Dynamic Replication Mechanism in Data Grid Based on a Weighted Priority-Based Scheme

i-manager’s Journal on Cloud Computing, Vol. 6 (1), pp. 9, 2019. doi:10.26634/jcc.6.1.15897

Author(s): Mohammad Samadi Gharajeh

Keyword(s): Data Grid; Replication Mechanism; Dynamic Replication; Grid Based
