GridSolve: A Network Enabled Solver
Presented by
Asim YarKhan and Jack Dongarra
University of Tennessee
2 2 YarKhan_GridSolve_0611
GridSolve
Grid-based software-hardware-data server
Based on a Remote Procedure Call model but with …
resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronous calls, security, …
Ease of use is paramount
It’s about providing transparent access to resources.
Make it easy to wrap legacy codes into services
Evolution of the successful NetSolve project
GridSolve Architecture
[Diagram labels: Client; Agent; servers (Cluster, Cluster, Single processor, Batch queue); arrows labeled request, server list, data, result]
[x,y,z,info] = gridsolve('dgesv', A, B)
GridSolve clients: Matlab, C, Fortran [NetSolve clients: Java, Mathematica, Excel, IDL, Octave]
Resource discovery, scheduling, load balancing, fault tolerance
GridSolve Client
Dynamic service bindings
Client does not need to have stubs for the services that it wishes to use
Opaque networking interactions
API provides a variety of methods
Blocking, non-blocking, task farms, …
Intuitive and easy to use
Matlab: Solve using dgesv
[x,y,z,info]=gs_call('dgesv',m,1,a,m,b,m)
C: Call dgesv using GridRPC
grpc_initialize()
grpc_function_handle_default(&handle, "dgesv")
status = grpc_call(&handle, n, nrhs, a, lda, ipiv, b, ldb, &info);
GridRPC – Grid Remote Procedure Call
GGF proposed standard
Global Grid Forum Research Group on Programming Models
Implementations: Ninf-G (AIST), GridSolve/NetSolve (UTK), DIET (INRIA, ENS)
GridRPC API
grpc_initialize, grpc_finalize
Function handle create, initialize, destroy, get
grpc_call blocking, grpc_call_async non-blocking
grpc_probe, cancel, wait, wait_and/or/any
GridSolve uses GridRPC as primary API
Older NetSolve API available as wrapper
Added calls based on GridRPC API to support fault tolerance, dynamic scheduling, …
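The blocking and non-blocking call styles listed above can be illustrated without a GridSolve installation. The sketch below is an analogy only: it uses Python's standard concurrent.futures thread pool in place of remote servers, and solve_scaled is a hypothetical stand-in service, not part of GridRPC. Submitting and waiting immediately mirrors the blocking grpc_call; starting several calls and then waiting on the whole set mirrors grpc_call_async followed by one of the wait calls.

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED


def solve_scaled(a, b):
    """Hypothetical stand-in for a remote service: solves a*x = b for scalars."""
    return b / a


# The thread pool plays the role of the remote servers (an assumption of this sketch).
with ThreadPoolExecutor(max_workers=4) as pool:
    # Blocking style: submit one request and wait for its result right away.
    x_blocking = pool.submit(solve_scaled, 2.0, 8.0).result()

    # Non-blocking style: start several requests without waiting on each one...
    sessions = [pool.submit(solve_scaled, float(k), 12.0) for k in (1, 2, 3)]

    # ...then wait until every outstanding request has finished.
    wait(sessions, return_when=ALL_COMPLETED)
    x_async = [s.result() for s in sessions]

print(x_blocking, x_async)
```

The non-blocking form is what makes task farms practical: the client keeps many requests in flight and collects results as a group.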
GridSolve Agent
Agent acts as name server and information service
Client users and administrators can query the hardware and software services available.
Interactions mediated by agent
Scheduling, tracking, server fault tolerance, etc.
Resource scheduler
Maintains both static and dynamic information regarding server components
Can use execution history to build performance models for services
Can simulate multi-service executions to predict best server
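One way to picture how static and dynamic server information could combine into a "best server" prediction is the minimal sketch below. It is not the GridSolve scheduler: the ServerInfo fields, the cost formula (compute time at the currently available speed plus transfer time), and all numbers are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    name: str
    peak_flops: float        # static information: peak floating-point rate
    load: float              # dynamic information: fraction of the CPU already busy
    bytes_per_second: float  # dynamic information: measured link speed to the client


def estimated_seconds(server, flops, data_bytes):
    """Assumed cost model: compute time at the available speed plus transfer time."""
    available_flops = server.peak_flops * (1.0 - server.load)
    return flops / available_flops + data_bytes / server.bytes_per_second


def pick_best(servers, flops, data_bytes):
    """Return the server with the smallest estimated completion time."""
    return min(servers, key=lambda s: estimated_seconds(s, flops, data_bytes))


servers = [
    ServerInfo("fast_busy", peak_flops=4.0e9, load=0.75, bytes_per_second=1.0e8),
    ServerInfo("slow_idle", peak_flops=2.0e9, load=0.0, bytes_per_second=1.0e8),
]
best = pick_best(servers, flops=2.0e9, data_bytes=1.0e8)
print(best.name)
```

In this example the nominally faster machine loses because three quarters of it is already in use, which is the kind of decision static information alone cannot make.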
Adding Services to GridSolve Server
[Diagram labels: GSIDL Parser/Compiler; Server with existing Services; New Service; caption "New Service Added!"]
Fortran ROUTINE dgesv(IN int N, IN int NRHS, INOUT double A[LDA][N], IN int LDA, OUT int IPIV[N], INOUT double B[LDB][NRHS], IN int LDB, OUT int INFO)
"Solves a general system of linear equations AX = B"
LIBS = "/usr/local/lib/liblapack.a /usr/local/lib/libf77blas.a /usr/local/lib/libatlas.a"
LANGUAGE = "FORTRAN"
LIBS = "$(LAPACK_LIBS) $(BLAS_LIBS)"
COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"
MAJOR="COLUMN"
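The COMPLEXITY line in the service description gives an operation count as a function of the call arguments. The sketch below evaluates that same expression in Python and turns it into a rough runtime prediction. The function names and the sustained rate of 1e9 operations per second are assumptions for illustration; how GridSolve itself consumes the expression is not shown on the slide.

```python
def dgesv_complexity(n, nrhs):
    """Operation count copied from the slide: 2.0*pow(N,3.0)*(double)NRHS."""
    return 2.0 * pow(n, 3.0) * float(nrhs)


def predicted_seconds(n, nrhs, flops_per_second):
    """Rough runtime estimate: operation count divided by an assumed sustained rate."""
    return dgesv_complexity(n, nrhs) / flops_per_second


ops = dgesv_complexity(1000, 1)
secs = predicted_seconds(1000, 1, flops_per_second=1.0e9)
print(ops, secs)
```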
GridSolve Backends
Scripts encapsulate service management for PBS, MS Compute Cluster (job submit, probe, cancel)
[Diagram labels: GridSolve clients; GridSolve system (Agent, Servers); backends: MS Compute Cluster, PBS [Condor, ScaLAPACK, LFC, etc.]]
User may be unaware of parallel processing
Distributed Storage Infrastructure
Client optionally pushes argument data to DSI
[Diagram labels: client; DSI data caches; Server; Server cluster; Server]
DSI API currently instantiated using IBP (Internet Backplane Protocol)
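The idea of pushing argument data to a storage layer once and reusing it across calls can be sketched as follows. This is a toy model, not the DSI or IBP API: DataCache, its put/get methods, the handle format, and the two services are hypothetical, and the assumption that servers fetch the data by handle is an inference from the slide rather than something it states.

```python
class DataCache:
    """In-memory stand-in for a DSI data cache (hypothetical interface)."""

    def __init__(self):
        self._store = {}
        self._next_id = 0

    def put(self, data):
        """Store the data once and return a small handle that refers to it."""
        handle = f"dsi-{self._next_id}"
        self._next_id += 1
        self._store[handle] = bytes(data)
        return handle

    def get(self, handle):
        """Fetch previously stored data by handle."""
        return self._store[handle]


def server_sum(cache, handle):
    """Hypothetical service that pulls its argument from the cache."""
    return sum(cache.get(handle))


def server_max(cache, handle):
    """Second hypothetical service reusing the same cached argument."""
    return max(cache.get(handle))


cache = DataCache()
handle = cache.put([1, 2, 3, 4])   # the client pushes the argument data once
total = server_sum(cache, handle)  # each call passes only the handle
largest = server_max(cache, handle)
print(handle, total, largest)
```

The benefit in this model is that repeated calls on the same large argument move only a handle between client and server.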
GridSolve: Benefits
Domain scientists use SCEs (scientific computing environments)
GridSolve provides the ability for SCEs to easily access and use grid resources [Ease of use!]
Libraries
GridSolve can provide easy access to high-performance libraries, so that end users do not have to install them
Scheduling
GridSolve can choose the software/hardware resource appropriate for the problem
Resource aggregation
GridSolve agent provides a single access point for multiple resources/clusters
GridSolve In-Progress
Scheduling work
Work with Emmanuel Jeannot, INRIA
History-based performance estimation
Adds more accurate server/service performance model based on prior history
Communication cost estimates
Client estimates communication costs for a subset of servers via a simple probe
Perturbation model for scheduling
The agent uses a model of the currently executing jobs on the servers to schedule jobs (includes estimated completion times)
Client interfaces
IDL – Interactive Data Language
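The history-based performance estimation item above can be illustrated with a very small model. The slide does not say which model is used, so the sketch below assumes the simplest one: runtime is proportional to operation count, with the proportionality constant fitted by least squares through the origin from past runs. The function names and the history values are made up for the example.

```python
def fit_seconds_per_op(history):
    """Least-squares fit of seconds = c * ops through the origin.

    history is a list of (operation_count, measured_seconds) pairs.
    """
    numerator = sum(ops * secs for ops, secs in history)
    denominator = sum(ops * ops for ops, _ in history)
    return numerator / denominator


def predict_seconds(history, ops):
    """Predict the runtime of a new request from prior executions."""
    return fit_seconds_per_op(history) * ops


# Made-up execution history for one service on one server.
history = [(1.0e9, 1.0), (2.0e9, 2.0), (4.0e9, 4.0)]
estimate = predict_seconds(history, 3.0e9)
print(estimate)
```

A fitted constant of this kind reflects what a server actually delivered for a given service, which is why it can be more accurate than a rating based on peak speed.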
GridSolve Status
Version 0.15 (Sept 2006)
http://icl.cs.utk.edu/netsolve
Supported Platforms
Linux, Solaris, BSDs, MacOS X
Should work in most POSIX environments
Windows native client (MSVC)
Windows Compute Cluster backend
PBS backend
Contacts
Asim YarKhan and Jack Dongarra
University of Tennessee