Managing the Execution of Large Scale MPI Applications on Computational Grids

Author(s):  
A. de P. Nascimento ◽  
A. da C. Sena ◽  
J.A. Da Silva ◽  
D.Q.C. Vianna ◽  
C. Boeres ◽  
...  
Author(s):  
Zahid Raza ◽  
Deo P. Vidyarthi

A computational grid, with its distributed load sharing, has evolved into a platform for large-scale problem solving. A grid is a collection of heterogeneous resources offering services of varying natures, in which jobs may be submitted to any of the participating nodes. Scheduling these jobs in such a complex and dynamic environment poses many challenges. Reliability analysis of the grid gains paramount importance because the grid involves a large number of resources that may fail at any time, making it unreliable. These failures waste both computational power and money on the scarce grid resources. It is normally desired that a job be scheduled in an environment that ensures maximum reliability for its execution. This work presents a reliability-based scheduling model for jobs on the computational grid. The model considers the failure rates of both the software and hardware grid constituents: the application demanding execution, the nodes executing the job, and the network links supporting data exchange between the nodes. Job allocation under the proposed scheme becomes trusted, as it schedules the job based on an a priori reliability computation.
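The reliability computation described above can be sketched minimally as follows, assuming an exponential failure model (each component with failure rate λ survives time t with probability e^(−λt)); the function names, rates, and candidate nodes are illustrative, not taken from the paper.

```python
import math

def job_reliability(node_rate, link_rates, sw_rate, exec_time):
    """Reliability of one allocation: independent exponential failures
    of the node, its network links, and the software multiply, so the
    survival probability is exp(-(sum of rates) * execution time)."""
    total_rate = node_rate + sum(link_rates) + sw_rate
    return math.exp(-total_rate * exec_time)

def most_reliable_node(candidates):
    """Pick the candidate maximizing a priori job reliability.
    candidates: list of (name, node_rate, link_rates, sw_rate, exec_time)."""
    return max(candidates, key=lambda c: job_reliability(*c[1:]))[0]

# Hypothetical candidates: a fast node with higher failure rates vs. a
# slower but more stable one.
nodes = [
    ("fast-but-flaky", 1e-3, [5e-4], 1e-4, 100.0),
    ("slow-but-stable", 1e-4, [1e-4], 1e-4, 300.0),
]
print(most_reliable_node(nodes))
```

Note that the longer execution time on the slower node is penalized by the exponential model, so the trade-off between speed and failure rate falls out of the same formula.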


2007 ◽  
Vol 18 (01) ◽  
pp. 45-61 ◽  
Author(s):  
LIMOR FIX ◽  
ORNA GRUMBERG ◽  
AMNON HEYMAN ◽  
TAMIR HEYMAN ◽  
ASSAF SCHUSTER

Recent advances in scheduling and networking have paved the way for efficient exploitation of large-scale distributed computing platforms such as computational grids and huge clusters. Such infrastructures hold great promise for the highly resource-demanding task of verifying and checking large models, provided that model checkers are designed with a high degree of scalability and flexibility in mind. In this paper we focus on the mechanisms required to execute a high-performance, distributed, symbolic model checker on top of a large-scale distributed environment. We develop a hybrid algorithm for slicing the state space and dynamically distributing the work among the worker processes. We show that the new approach is faster, more effective, and thus much more scalable than previous slicing algorithms. We then present a checkpoint-restart module that has very low overhead. This module can be used to combat failures, the likelihood of which increases with the size of the computing platform. However, checkpoint-restart is even handier for the scheduling system: it can be used to avoid reserving large numbers of workers, thus making the distributed computation work-efficient. Finally, we discuss for the first time the effect of reordering on the distributed model checker and show how the distributed system performs reordering more efficiently than the sequential one. We implemented our contributions on a network of 200 processors, using a distributed scalable scheme that employs a high-performance industrial model checker from Intel. Our results show that the system was able to verify real-life models much larger than was previously possible.
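As a toy illustration of the slicing idea, the sketch below partitions an explicit set of states among workers and crudely rebalances an overfull slice; the paper's slicer operates on symbolic (BDD) window functions with a hybrid static/dynamic strategy, so this is only an explicit-state analogue with invented names.

```python
def slice_states(states, num_workers):
    """Toy explicit-state slicing: each worker owns the states that
    hash into its slice. (The paper slices symbolic BDD state sets,
    not explicit state collections.)"""
    slices = [set() for _ in range(num_workers)]
    for s in states:
        slices[hash(s) % num_workers].add(s)
    return slices

def rebalance(slices, threshold):
    """Crude dynamic redistribution: shift states from the fullest
    slice to the emptiest one, mimicking dynamic work distribution."""
    big = max(slices, key=len)
    small = min(slices, key=len)
    while len(big) > threshold and len(big) - len(small) > 1:
        small.add(big.pop())
    return slices
```

The point of the illustration is the invariant a slicer must keep: the slices are disjoint and their union is the full state set, whatever the balancing policy does.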


Author(s):  
MALARVIZHI NANDAGOPAL ◽  
S. GAJALAKSHMI ◽  
V. RHYMEND UTHARIARAJ

Computational grids have the potential for solving large-scale scientific applications using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling these applications, reliability challenges arise because of the unreliable nature of the grid infrastructure. Two major problems that are critical to the effective utilization of computational resources are efficient scheduling of jobs and providing fault tolerance in a reliable manner. This paper addresses these problems by combining a checkpoint-replication-based fault tolerance mechanism with the minimum total time to release (MTTR) job scheduling algorithm. TTR includes the service time of the job, the waiting time in the queue, and the transfer of input and output data to and from the resource. The MTTR algorithm minimizes the response time by selecting a computational resource based on job requirements, job characteristics, and the hardware features of the resources. The fault tolerance mechanism used here sets the job checkpoints based on the resource failure rate. If a resource failure occurs, the job is restarted from its last successful state using a checkpoint file from another grid resource. The Globus Toolkit is used as the grid middleware to set up a grid environment and evaluate the performance of the proposed approach. The monitoring tools Ganglia and Network Weather Service are used to gather hardware and network details, respectively. The experimental results demonstrate that the proposed approach effectively schedules grid jobs in a fault-tolerant way, thereby reducing the TTR of the jobs submitted to the grid. It also increases the percentage of jobs completed within the specified deadline, making the grid trustworthy.
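The TTR definition above (service time + queue wait + data transfer) directly suggests the selection rule, sketched here with invented resource/job fields; the checkpoint-interval helper uses Young's first-order approximation, a common choice for failure-rate-based checkpoint placement, not necessarily the paper's exact rule.

```python
import math

def ttr(resource, job):
    """Total time to release on one resource:
    queue wait + service time + input/output transfer time."""
    service = job["length"] / resource["speed"]
    transfer = (job["in_bytes"] + job["out_bytes"]) / resource["bandwidth"]
    return resource["wait"] + service + transfer

def mttr_schedule(job, resources):
    """MTTR rule: pick the resource minimizing TTR for this job."""
    return min(resources, key=lambda r: ttr(r, job))["name"]

def checkpoint_interval(overhead, failure_rate):
    """Young's approximation: interval = sqrt(2 * checkpoint_cost / lambda).
    Illustrative only; the paper sets checkpoints from the failure rate
    by its own scheme."""
    return math.sqrt(2.0 * overhead / failure_rate)

# Hypothetical job and resources.
job = {"length": 1000.0, "in_bytes": 8e6, "out_bytes": 2e6}
resources = [
    {"name": "A", "speed": 10.0, "bandwidth": 1e6, "wait": 5.0},
    {"name": "B", "speed": 5.0, "bandwidth": 1e7, "wait": 0.0},
]
print(mttr_schedule(job, resources))
```

Resource A wins here despite its slower network, because its service time dominates the comparison; the rule weighs all three TTR components jointly rather than optimizing any one of them.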


2013 ◽  
Vol 28 (2) ◽  
pp. 211-231 ◽  
Author(s):  
Joshua Peraza ◽  
Ananta Tiwari ◽  
Michael Laurenzano ◽  
Laura Carrington ◽  
Allan Snavely
2013 ◽  
Vol 5 (2) ◽  
pp. 72-91 ◽  
Author(s):  
Ashiqur Md. Rahman ◽  
Rashedur M Rahman

Computational grids are a promising platform for executing large-scale, resource-intensive applications. This paper identifies challenges in managing resources in a grid computing environment and proposes computational economy as a metaphor for effective management of resources and application scheduling. It identifies distributed resource management challenges and the requirements of economy-based grid systems, and proposes an economy-based negotiation protocol for cooperative and competitive trading of resources. Dynamic pricing for services and a good level of Pareto optimality make auctions more attractive for resource allocation than other economic models. In a complex grid environment, the communication demand can become a bottleneck, since many messages must be exchanged to match suitable service providers with consumers. The fuzzy-trust-integrated hybrid Capital Asset Pricing Model (CAPM) shows higher user-centric satisfaction and provides the equilibrium relationship between the expected return and the risk on investments. This paper also presents an analysis of the communication requirements and the necessity of the CAPMAuction in the grid environment.
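The equilibrium relationship the abstract refers to is the standard CAPM formula, E[R_i] = R_f + β_i (E[R_m] − R_f), with β estimated as cov(R_i, R_m) / var(R_m). A minimal sketch of just this standard relation (the paper's fuzzy-trust integration and auction protocol are not modeled here):

```python
def beta(asset_returns, market_returns):
    """Estimate beta_i = cov(R_i, R_m) / var(R_m) from return samples."""
    n = len(asset_returns)
    ma = sum(asset_returns) / n
    mm = sum(market_returns) / n
    cov = sum((a - ma) * (m - mm)
              for a, m in zip(asset_returns, market_returns)) / n
    var = sum((m - mm) ** 2 for m in market_returns) / n
    return cov / var

def capm_expected_return(risk_free, beta_i, market_return):
    """CAPM equilibrium: E[R_i] = R_f + beta_i * (E[R_m] - R_f)."""
    return risk_free + beta_i * (market_return - risk_free)

# Illustrative numbers: 3% risk-free rate, beta 1.5, 8% expected market return.
print(capm_expected_return(0.03, 1.5, 0.08))
```

Under CAPM, a resource (asset) whose returns co-move strongly with the market demands a higher expected return, which is the risk/return equilibrium the proposed pricing scheme builds on.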


2007 ◽  
Vol 591 ◽  
pp. 183-213 ◽  
Author(s):  
M. LANDRINI ◽  
A. COLAGROSSI ◽  
M. GRECO ◽  
M. P. TULIN

The generation and evolution of two-dimensional bores in water of uniform depth and on sloping beaches are simulated through numerical solution of the Euler equations using the smoothed particle hydrodynamics (SPH) method, wherein particles are followed in Lagrangian fashion, avoiding the need for computational grids. In water of uniform depth, a piston wavemaker produces cyclically breaking bores in the Froude number range 1.37–1.82, which were shown to move at time-averaged speeds in very good agreement with the requirements of global mass and momentum conservation. A single Strouhal number for the breaking period was discovered. Complex repetitive splashing patterns are observed and described, involving forward jet formation, growth, impact, and ricochet, and similarly backward jet formation and impact. Observed consequences were the creation of vortical regions of both signs, dipole creation through pairing, large-scale transport of surface water downward, and high tangential scouring velocities on the bed, which are quantified. These bores are further allowed to rise on linear slopes to the shoreline, where they are seen to collapse into a tongue-like flow resembling dam-break evolution. This essentially inviscid calculation is able to reproduce the development of a highly vortical flow in excellent agreement with experimental observations and theoretical concepts. The turbulent flow behaviour is partially described by the numerical solution.
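The grid-free character of SPH comes from estimating field quantities as kernel-weighted sums over neighbouring particles, e.g. the density ρ_i = Σ_j m_j W(|x_i − x_j|, h). A minimal 2-D sketch with the standard cubic spline kernel (a common choice; the paper's exact kernel and corrections may differ):

```python
import math

def cubic_spline_w(r, h):
    """Standard 2-D cubic spline SPH kernel with support radius 2h;
    sigma = 10 / (7 * pi * h^2) normalizes it to unit integral in 2-D."""
    q = r / h
    sigma = 10.0 / (7.0 * math.pi * h * h)
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q)**3
    return 0.0

def density(particles, masses, h):
    """SPH density estimate rho_i = sum_j m_j W(|x_i - x_j|, h).
    O(N^2) for clarity; production codes use neighbour lists."""
    rho = []
    for xi, yi in particles:
        s = 0.0
        for (xj, yj), mj in zip(particles, masses):
            r = math.hypot(xi - xj, yi - yj)
            s += mj * cubic_spline_w(r, h)
        rho.append(s)
    return rho
```

Because every quantity is carried by the moving particles themselves, the free surface and breaking jets need no special grid treatment, which is what makes SPH attractive for the bore-breaking flows studied here.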

