Grid Computing 4513

Load Balancing, Beowulf, and Grid Computing
David Finkel
Computer Science Department, Worcester Polytechnic Institute
References
“The Anatomy of the Grid”, Ian Foster, Carl Kesselman, and Steven Tuecke, International Journal of Supercomputer Applications, 2001
“A Performance Oriented Migration Framework for the Grid”, Satish S. Vadhiyar and Jack J. Dongarra, Proceedings of CCGrid 2003, Third IEEE/ACM International Symposium on Cluster Computing and the Grid
Numerous papers by PEDS members (Finkel; Wills and Finkel; Claypool and Finkel), with additional co-authors
Grid computing runs over the Internet, potentially worldwide
Several approaches have emerged; the Foster, Kesselman, and Tuecke paper discusses the Globus Toolkit
Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.
Sharing is highly controlled: resource providers and consumers define what is shared and the conditions of sharing.
Issues to address: Protocols, privacy, security, costs, …
Resources: Computational, storage, network
Enquiry functions: determine the characteristics and state of a resource
Management functions: start and stop computations, reserve bandwidth
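As an illustration of the enquiry/management split, the Python sketch below models a single compute resource. It is a hypothetical toy, not the Globus Toolkit interface. The class `ComputeResource` and its `enquire`, `start`, and `stop` methods are invented names, and network resources and bandwidth reservation are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeResource:
    """Hypothetical Grid compute resource (not a Globus API)."""
    name: str
    cpus: int
    running: set = field(default_factory=set)

    def enquire(self) -> dict:
        """Enquiry function: report the resource's characteristics and current state."""
        busy = len(self.running)
        return {"name": self.name, "cpus": self.cpus, "busy": busy, "free": self.cpus - busy}

    def start(self, job_id: str) -> bool:
        """Management function: start a computation if a CPU is free."""
        if len(self.running) >= self.cpus:
            return False  # no free CPU, so the request is refused
        self.running.add(job_id)
        return True

    def stop(self, job_id: str) -> None:
        """Management function: stop a computation (no-op if it is not running)."""
        self.running.discard(job_id)


if __name__ == "__main__":
    node = ComputeResource("node-a", cpus=2)
    node.start("job-1")
    print(node.enquire())  # {'name': 'node-a', 'cpus': 2, 'busy': 1, 'free': 1}
```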
Directory services for discovery of resources
Co-allocation, scheduling, brokering
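Discovery and brokering can be sketched in the same way. In this hypothetical example, `Directory` stands in for a directory service, and `broker` performs a simple all-or-nothing co-allocation by greedily taking the resources with the most free CPUs first. Real Grid brokers weigh many more factors, such as cost, policy, and reservations. The greedy rule is an assumption chosen only to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    """Hypothetical directory entry advertising one Grid resource."""
    name: str
    site: str
    free_cpus: int
    mem_gb: int


class Directory:
    """Toy directory service: providers register resources, clients discover them."""

    def __init__(self) -> None:
        self._records: list[ResourceRecord] = []

    def register(self, record: ResourceRecord) -> None:
        self._records.append(record)

    def discover(self, min_cpus: int = 1, min_mem_gb: int = 0) -> list[ResourceRecord]:
        return [r for r in self._records
                if r.free_cpus >= min_cpus and r.mem_gb >= min_mem_gb]


def broker(directory: Directory, cpus_needed: int) -> list[str]:
    """Greedy co-allocation: take resources with the most free CPUs first until the
    request is covered. All-or-nothing: returns [] if the request cannot be met."""
    chosen, total = [], 0
    for r in sorted(directory.discover(), key=lambda r: r.free_cpus, reverse=True):
        if total >= cpus_needed:
            break
        chosen.append(r.name)
        total += r.free_cpus
    return chosen if total >= cpus_needed else []


if __name__ == "__main__":
    d = Directory()
    d.register(ResourceRecord("a", "site-1", free_cpus=4, mem_gb=8))
    d.register(ResourceRecord("b", "site-2", free_cpus=8, mem_gb=16))
    d.register(ResourceRecord("c", "site-2", free_cpus=2, mem_gb=4))
    print(broker(d, 10))  # ['b', 'a']
```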
Load Sharing - Overview
Transferring work from a heavily loaded node to a lightly loaded node
Purpose: To improve application performance
Transferring processes is not suitable for fine-grain parallelism
Also known as: Load Balancing, Process Migration.
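One classic way to decide when to transfer work is a threshold policy. The sketch below assumes that each node's load is simply its count of queued jobs. The names `pick_transfers`, `high`, and `low` are hypothetical. The function only plans (source, destination) transfers and does not move any processes.

```python
def pick_transfers(loads: dict[str, int], high: int, low: int) -> list[tuple[str, str]]:
    """Threshold-based load sharing sketch.

    While a node's load (number of queued jobs) is above `high`, move one job
    to the currently least-loaded node, but only if that node is below `low`.
    Returns the planned (source, destination) transfers; `loads` is not modified.
    """
    loads = dict(loads)  # plan on a copy
    transfers = []
    # Consider the most heavily loaded nodes first.
    for src in sorted(loads, key=loads.get, reverse=True):
        while loads[src] > high:
            dst = min(loads, key=loads.get)
            if dst == src or loads[dst] >= low:
                break  # no lightly loaded node left to receive work
            loads[src] -= 1
            loads[dst] += 1
            transfers.append((src, dst))
    return transfers


if __name__ == "__main__":
    print(pick_transfers({"a": 6, "b": 1, "c": 3}, high=4, low=3))
    # [('a', 'b'), ('a', 'b')]
```

Keeping two thresholds rather than one gives a gap between "overloaded" and "underloaded", which avoids bouncing jobs between nodes whose loads are nearly equal.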
Measuring load (policy, implementation)
Which jobs to transfer
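Each of the two design questions above can be illustrated with a small rule. The sketch assumes a load index built from periodic run-queue-length samples, smoothed by an exponentially weighted average. It also assumes a job-selection rule that moves only jobs expected to run longer than the migration cost. Both rules, and the functions `smoothed_load` and `choose_jobs`, are illustrative assumptions rather than the policies of any particular system.

```python
def smoothed_load(samples: list[float], alpha: float = 0.5) -> float:
    """Load measure sketch: exponentially weighted moving average of periodic
    run-queue-length samples (oldest first). A higher alpha reacts faster."""
    load = float(samples[0])
    for q in samples[1:]:
        load = alpha * q + (1 - alpha) * load
    return load


def choose_jobs(remaining: dict[str, float], migration_cost: float) -> list[str]:
    """Job selection sketch: only jobs expected to run longer than the cost of
    moving them are candidates, longest remaining first.
    `remaining` maps job id -> estimated remaining seconds (an assumed input)."""
    worth_moving = [j for j, t in remaining.items() if t > migration_cost]
    return sorted(worth_moving, key=lambda j: remaining[j], reverse=True)


if __name__ == "__main__":
    print(smoothed_load([2, 4]))                                        # 3.0
    print(choose_jobs({"a": 5, "b": 120, "c": 40}, migration_cost=30))  # ['b', 'c']
```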
Load Sharing in the Grid
“A Performance Oriented Migration Framework for the Grid”, Vadhiyar and Dongarra
Part of the GrADS project – Grid Application Development System – based at Univ. of Tennessee and other institutions
Designed for long-running computations
Load Sharing in the Grid - 2
Basic idea – the load sharing system can run a performance model of a computation to estimate running time and resource requirements.
The application programmer is responsible for providing a performance model for the application, plus hooks to stop the application, checkpoint its state, and restart it.
Based on the MPI programming library and the Globus Toolkit
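The hooks the programmer provides might look like the abstract interface below. This is a hypothetical Python rendering for illustration only. The framework itself is built on MPI and the Globus Toolkit. The names `MigratableApp`, `predict_time`, `stop`, `checkpoint`, and `restart`, along with the toy cost model in `ToyIterativeApp`, are assumptions and not the framework's actual API.

```python
from abc import ABC, abstractmethod


class MigratableApp(ABC):
    """Hypothetical interface for the hooks the application programmer supplies."""

    @abstractmethod
    def predict_time(self, n_procs: int) -> float:
        """Performance model: estimated remaining run time (s) on n_procs processors."""

    @abstractmethod
    def stop(self) -> None:
        """Pause the computation at a safe point."""

    @abstractmethod
    def checkpoint(self) -> dict:
        """Return the state needed to resume the computation elsewhere."""

    @abstractmethod
    def restart(self, state: dict, n_procs: int) -> None:
        """Resume from a saved state on n_procs processors."""


class ToyIterativeApp(MigratableApp):
    """Illustrative iterative solver: work divides evenly, plus a fixed startup overhead."""

    def __init__(self, total_iters: int = 1000, secs_per_iter: float = 0.5,
                 overhead: float = 5.0) -> None:
        self.total_iters = total_iters
        self.secs_per_iter = secs_per_iter
        self.overhead = overhead
        self.done_iters = 0
        self.n_procs = 1
        self.running = True

    def predict_time(self, n_procs: int) -> float:
        remaining = self.total_iters - self.done_iters
        return self.overhead + remaining * self.secs_per_iter / n_procs

    def stop(self) -> None:
        self.running = False

    def checkpoint(self) -> dict:
        return {"done_iters": self.done_iters}

    def restart(self, state: dict, n_procs: int) -> None:
        self.done_iters = state["done_iters"]
        self.n_procs = n_procs
        self.running = True


if __name__ == "__main__":
    app = ToyIterativeApp()
    print(app.predict_time(4))  # 130.0
```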
Before the application begins, the Application Manager runs the performance model to predict execution times and the number of processors needed.
It determines whether an appropriate set of processors is available and schedules the job.
It monitors the progress of the application as it runs.
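The Application Manager's two jobs can be sketched as follows. The first is choosing a processor count by evaluating the performance model. The second is checking progress against the model's prediction. The functions `plan_launch` and `progress_ratio` are hypothetical, as is the example cost model, which combines divisible work with a per-processor communication term. The actual framework's scheduling is more involved.

```python
from typing import Callable, Optional


def plan_launch(predict_time: Callable[[int], float], available_procs: int,
                max_procs: Optional[int] = None) -> Optional[tuple[int, float]]:
    """Application Manager sketch: run the performance model for every feasible
    processor count and pick the count with the smallest predicted time.
    Returns (n_procs, predicted_seconds), or None if no processors are available."""
    limit = available_procs if max_procs is None else min(available_procs, max_procs)
    if limit < 1:
        return None
    return min(((n, predict_time(n)) for n in range(1, limit + 1)),
               key=lambda choice: choice[1])


def progress_ratio(done_fraction: float, elapsed: float, predicted_total: float) -> float:
    """Monitoring sketch: actual progress divided by the progress the model predicted
    for this much elapsed time; values well below 1.0 mean the run is behind."""
    expected_fraction = min(elapsed / predicted_total, 1.0)
    return done_fraction / expected_fraction if expected_fraction > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical model: 1000 s of divisible work plus 10 s of communication per processor.
    model = lambda n: 1000 / n + 10 * n
    print(plan_launch(model, available_procs=16))  # (10, 200.0)
    print(plan_launch(model, available_procs=4))   # (4, 290.0)
```

With a communication term in the model, adding processors stops paying off beyond a certain point. This is why the manager evaluates the model rather than simply grabbing every free processor.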
Load sharing can occur if
Application progress is delayed
Additional resources become available
To migrate: stop the application, checkpoint its state, and restart it on the new resources
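The migration decision can be sketched as a cost/benefit test with two conditions. First, a trigger must fire: either progress is delayed or new resources have become available. Second, the predicted remaining time on the new resources, plus the cost of stopping, checkpointing, and restarting, must beat the predicted remaining time where the application currently runs. The function `should_migrate`, its parameters, and the 1.2 tolerance are illustrative assumptions.

```python
def should_migrate(elapsed: float, predicted_elapsed: float,
                   remaining_predicted: float, remaining_on_candidate: float,
                   migration_cost: float, new_resources_available: bool = False,
                   tolerance: float = 1.2) -> bool:
    """Rescheduling sketch for the two load-sharing triggers.

    elapsed / predicted_elapsed: actual vs. model-predicted time to reach the
        current point in the computation.
    remaining_predicted: model's remaining time on the current processors.
    remaining_on_candidate: model's remaining time on the candidate processors.
    migration_cost: time to stop, checkpoint, and restart the application.
    """
    slowdown = elapsed / predicted_elapsed
    delayed = slowdown > tolerance                 # trigger 1: progress is delayed
    if not (delayed or new_resources_available):   # trigger 2: new resources
        return False                               # no trigger: keep running as is
    # If the run is behind schedule, assume it stays that much slower here.
    remaining_here = remaining_predicted * max(slowdown, 1.0)
    # Move only if the expected gain outweighs the cost of migrating.
    return remaining_on_candidate + migration_cost < remaining_here


if __name__ == "__main__":
    # 50% behind schedule; the candidate would finish in 400 s after a 50 s migration.
    print(should_migrate(150, 100, 500, 400, 50))  # True
```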
Load sharing on the Grid:
There’s a large body of pre-Grid research on load balancing in distributed systems
Can the results of this research be used to design load balancing systems for the Grid?