ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.
![Page 1: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/1.jpg)
ECECS Lecture 18
Grid Computing
Citation: B.Ramamurthy/Suny-Buffalo
![Page 2: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/2.jpg)
Globus Material
The presentation is based on two main publications on grid computing:
1. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, by Ian Foster, Carl Kesselman, Jeffrey Nick, and Steven Tuecke, 2002.
2. The Anatomy of the Grid: Enabling Scalable Virtual Organizations, by Ian Foster, Carl Kesselman, and Steven Tuecke, 2001.
Both are available at http://www.globus.org/research/papers.html
![Page 3: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/3.jpg)
Grid Technology
• Grid technologies and infrastructures support the sharing and coordinated use of diverse resources in dynamic, distributed “virtual organizations”.
• Grid technologies are distinct from technology trends such as Internet, enterprise, distributed and peer-to-peer computing. But these technologies can benefit from growing into the “problem space” addressed by grid technologies.
![Page 4: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/4.jpg)
Virtual Organization: Problem Space
• An industrial consortium formed to develop a feasibility study for a next generation supersonic aircraft undertakes a highly accurate multidisciplinary simulation of the entire aircraft.
• A crisis management team responds to a chemical spill by using local weather and soil models to estimate the spread of the spill, planning and coordinating evacuation, notifying hospitals and so forth.
• Thousands of physicists come together to design, create, operate and analyze products by pooling together computing, storage, networking resources to create a Data Grid.
![Page 5: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/5.jpg)
Resource Sharing Requirements
• Members should be trustful and trustworthy.
• Sharing is conditional.
• Should be secure.
• Sharing should be able to change dynamically over time.
• Need for discovery and registering of resources.
• Can be peer to peer or client/server.
• Same resource may be used in different ways.
• All these point to well defined architecture and protocols.
![Page 6: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/6.jpg)
Grid Definition
• An architecture identifies the fundamental system components, specifies the purpose and function of these components, and indicates how they interact with each other.
• Grid architecture is a protocol architecture, with protocols defining the basic mechanisms by which VO users and resources negotiate, establish, manage and exploit sharing relationships.
• Grid architecture is also a standards-based, open services architecture that facilitates extensibility, interoperability, portability and code sharing.
• APIs and toolkits are also being developed.
![Page 7: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/7.jpg)
Grid Services Architecture
[Figure: layered Grid services architecture, top to bottom]
• Applications: high-energy physics data analysis, regional climate studies, collaborative engineering, parameter studies, on-line instrumentation
• Application Toolkit Layer: high throughput, data-intensive, collaborative design, remote visualization, remote control
• Grid Services Layer: information, resource management, security, data access, fault detection, ...
• Grid Fabric Layer: transport, multicast, instrumentation, control interfaces, QoS mechanisms, ...
![Page 8: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/8.jpg)
Architecture
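[Figure: the Grid protocol architecture shown beside the Internet protocol architecture]
• Grid layers, top to bottom: Application, Collective, Resource, Connectivity, Fabric
• Internet layers, top to bottom: Application, Transport, Internet, Link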
![Page 9: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/9.jpg)
Fabric Layer
• Fabric layer: Provides the resources to which shared access is mediated by Grid protocols.
• Examples: computational resources, storage systems, catalogs, network resources, and sensors.
• Fabric components implement local, resource-specific operations.
• Richer fabric functionality enables more sophisticated sharing operations.
• Sample resources: computational resources, storage resources, network resources, code repositories, catalogs.
![Page 10: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/10.jpg)
Connectivity Layer
• Communicating easily and securely.
• The connectivity layer defines the core communication and authentication protocols required for grid-specific network functions.
• This enables the exchange of data between fabric layer resources.
• Support for this layer is drawn from the TCP/IP stack: IP, TCP and DNS.
• Authentication solutions: single sign-on, etc.
![Page 11: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/11.jpg)
Resources Layer
• The resource layer defines protocols, APIs, and SDKs for secure negotiation, initiation, monitoring, control, accounting and payment of sharing operations on individual resources.
• Two classes of protocol define this layer: information protocols and management protocols.
• Information protocols are used to obtain information about the structure and state of a resource, e.g., configuration, current load and usage policy.
• Management protocols are used to negotiate access to a shared resource, specifying, for example, QoS and advance reservation.
![Page 12: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/12.jpg)
Collective Layer
• Coordinating multiple resources.
• Contains protocols and services that capture interactions among a collection of resources.
• It supports a variety of sharing behaviors without placing new requirements on the resources being shared.
• Sample services: directory services, co-allocation, brokering and scheduling services, data replication services, workload management services, collaboratory services.
![Page 13: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/13.jpg)
Applications Layer
• These are user applications that operate within VO environment.
• Applications are constructed by calling upon services defined at any layer.
• Each layer is well defined by protocols and provides access to useful services.
• Well-defined APIs also exist to work with these services.
• The Globus Toolkit implements all these layers and supports grid application development.
![Page 14: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/14.jpg)
Globus Toolkit Services
• Security (GSI)
  – PKI-based Security (Authentication) Service
• Job submission and management (GRAM)
  – Uniform Job Submission
• Information services (MDS)
  – LDAP-based Information Service
• Remote file management (GASS)
  – Remote Storage Access Service
• Remote Data Catalogue and Management Tools
  – Supported by Globus 2.0, released in 2002
![Page 15: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/15.jpg)
High-level services
Part II
![Page 16: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/16.jpg)
Sample of High-Level Services
• Resource brokers and co-allocators
  – DUROC, Nimrod/G, Condor-G, GridbusBroker
• Communication & I/O libraries
  – MPICH-G, PAWS, RIO (MPI-IO), PPFS, MOL
• Parallel languages
  – HPC++, CC++, Nimrod Parameter Specification
• Collaborative environments
  – CAVERNsoft, ManyWorlds
• Others
  – MetaNEOS, NetSolve, LSA, AutoPilot, WebFlow
![Page 17: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/17.jpg)
The Nimrod-G Grid Resource Broker
• A resource broker for managing, steering, and executing task farming (parameter sweep/SPMD model) applications on the Grid based on deadline and computational economy.
• Based on users’ QoS requirements, the broker dynamically leases services at runtime depending on their quality, cost, and availability.
• Key Features
  – A single window to manage & control experiment
  – Persistent and Programmable Task Farming Engine
  – Resource Discovery
  – Resource Trading
  – Scheduling & Predictions
  – Generic Dispatcher & Grid Agents
  – Transportation of data & results
  – Steering & data management
  – Accounting
• Uses Globus: MDS, GRAM, GSI, GASS
![Page 18: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/18.jpg)
Condor-G: Condor for the Grid
• Condor is a high-throughput scheduler
• Condor-G uses Globus Toolkit libraries for:
  – Security (GSI)
  – Managing remote jobs on Grid (GRAM)
  – File staging & remote I/O (GSI-FTP)
• Grid job management interface & scheduling
  – Robust replacement for Globus Toolkit programs
    • Globus Toolkit focus is on libraries and services, not end user vertical solutions
  – Supports single or high-throughput apps on Grid
    • Personal job manager which can exploit Grid resources
![Page 19: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/19.jpg)
Production Grids & Testbeds
• Production deployments underway at:
  – NSF PACIs National Technology Grid
  – NASA Information Power Grid
  – DOE ASCI
  – European Grid
• Research testbeds
  – EMERGE: Advance reservation & QoS
  – GUSTO: Globus Ubiquitous Supercomputing Testbed Organization
  – Particle Physics Data Grid
  – World-Wide Grid (WWG)
![Page 20: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/20.jpg)
Production Grids & Testbeds
[Figures: NASA’s Information Power Grid; The Alliance National Technology Grid; GUSTO Testbed]
![Page 21: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/21.jpg)
World Wide Grid (WWG)
[Figure: WWG resources connected over the Internet, shown at SC 2002, Baltimore, together with GMonitor, the Grid Market Directory and MEG Visualisation]
• Australia: Melbourne+Monash U: VPAC, Physics; Solaris WS; Gridbus+Nimrod-G
• Europe: ZIB: T3E/Onyx; AEI: Onyx; CNR: Cluster; CUNI/CZ: Onyx; Poznan: SGI/SP2; Vrije U: Cluster; Cardiff: Sun E6500; Portsmouth: Linux PC; Manchester: O3K; Cambridge: SGI; many others
• Asia: AIST, Japan: Solaris Cluster; Osaka University: Cluster; Doshia: Linux cluster; Korea: Linux cluster
• North America: ANL: SGI/Sun/SP2; NCSA: Cluster; Wisc: PC/cluster; NRC, Canada; many others
![Page 22: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/22.jpg)
Example Applications Projects (via Nimrod-G or Gridbus)
• Molecular Docking for Drug Discovery
  – Docking molecules from chemical databases with target protein
• Neuro Science
  – Brain Activity Analysis
• High Energy Physics
  – Belle Detector Data Analysis
• Natural Language Engineering
  – Analyzing audio data (e.g., to identify emotional state of a person!)
![Page 23: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/23.jpg)
Example Application Projects
• Computed microtomography (ANL, ISI)
  – Real-time, collaborative analysis of data from X-Ray source (and electron microscope)
• Hydrology (ISI, UMD, UT; also NCSA, Wisc.)
  – Interactive modeling and data analysis
• Collaborative engineering (“tele-immersion”)
  – CAVERNsoft @ EVL
• OVERFLOW (NASA)
  – Large CFD simulations for aerospace vehicles
![Page 24: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/24.jpg)
Example Application Experiments
• Distributed interactive simulation (CIT, ISI)
  – Record-setting SF-Express simulation
• Cactus
  – Astrophysics simulation, viz, and steering
  – Including trans-Atlantic experiments
• Particle Physics Data Grid
  – High Energy Physics distributed data analysis
• Earth Systems Grid
  – Climate modeling data management
![Page 25: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/25.jpg)
The Globus Advantage
• Flexible Resource Specification Language which provides the necessary power to express the required constraints
• Services for resource co-allocation, executable staging, remote data access and I/O streaming
• Integration of these services into high-level tools
  – MPICH-G: grid-enabled MPI
  – globus-job-*: flexible remote execution commands
  – Nimrod-G Grid Resource broker
  – Gridbus: Grid Business Infrastructure
  – Condor-G: high-throughput broker
  – PBS, GRD: meta-schedulers
![Page 26: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/26.jpg)
Resource Management
• Resource Specification Language (RSL) is used to communicate requirements
• The Globus Resource Allocation Manager (GRAM) API allows programs to be started on remote resources, despite local heterogeneity
• A layered architecture allows application-specific resource brokers and co-allocators to be defined in terms of GRAM services
![Page 27: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/27.jpg)
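Resource Management Architecture
[Figure: resource management architecture]
• The application issues RSL to a broker, which specializes the RSL using queries to, and info from, the Information Service.
• The broker passes ground RSL to a co-allocator, which sends simple ground RSL to individual GRAMs.
• Each GRAM fronts a local resource manager: LSF, EASY-LL, NQE.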
![Page 28: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/28.jpg)
GRAM Components
[Figure: GRAM components on either side of a site boundary]
• Client: makes MDS client API calls to locate resources (MDS: Grid Index Info Server) and to get resource info (MDS: Grid Resource Info Server); makes GRAM client API calls to request resource allocation and process creation; receives GRAM client API state change callbacks.
• Gatekeeper: receives the request, relies on the Globus Security Infrastructure, and creates the Job Manager.
• Job Manager: parses the request with the RSL Library, asks the Local Resource Manager to allocate and create processes, then monitors and controls them.
• MDS: Grid Resource Info Server: queries the current status of the resource.
![Page 29: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/29.jpg)
A simple run
• [raj@belle raj]$ globus-job-run belle.anu.edu.au /bin/date
• Mon May 3 15:05:42 EST 2004
![Page 30: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/30.jpg)
Resource Specification Language (RSL)
• Common notation for exchange of information between components
  – Syntax similar to MDS/LDAP filters
• RSL provides two types of information:
  – Resource requirements: Machine type, number of nodes, memory, etc.
  – Job configuration: Directory, executable, args, environment
• API provided for manipulating RSL
![Page 31: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/31.jpg)
RSL Syntax
• Elementary form: parenthesis clauses
  – (attribute op value [ value … ] )
• Operators Supported:
  – <, <=, =, >=, >, !=
• Some supported attributes:
  – executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
• Unknown attributes are passed through
  – May be handled by subsequent tools
![Page 32: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/32.jpg)
Constraints: “&”
• globusrun -o -r belle.anu.edu.au "&(executable=/bin/date)"
• For example:
& (count>=5) (count<=10)
(max_time=240) (memory>=64)
(executable=myprog)
“Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
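The conjunction above can be assembled mechanically from individual relations. The following is a minimal sketch in Python, assuming nothing beyond the syntax shown on the slides; the helper names `relation` and `conjunction` are hypothetical and are not part of any Globus API.

```python
def relation(attribute, op, value):
    """Format one RSL relation, e.g. (count>=5)."""
    return f"({attribute}{op}{value})"


def conjunction(*relations):
    """Join relations under the '&' operator: all of them must hold."""
    return "&" + "".join(relations)


# Rebuild the example request from the slide (hypothetical helper usage).
request = conjunction(
    relation("count", ">=", 5),
    relation("count", "<=", 10),
    relation("max_time", "=", 240),
    relation("memory", ">=", 64),
    relation("executable", "=", "myprog"),
)
print(request)
```

The resulting string has the same form as the quoted argument passed to globusrun in the first bullet of this slide.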
![Page 33: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/33.jpg)
Disjunction: “|”
• For example:
& (executable=myprog)
  ( | (&(count=5)(memory>=64))
      (&(count=10)(memory>=32)))
• Create 5 instances of myprog on a machine that has at least 64 MB of memory, or 10 instances on a machine with at least 32 MB of memory
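One way to read the disjunction is as an ordered list of alternatives, any one of which satisfies the request. The sketch below is an illustration of that reading only: the data layout and the function `first_satisfiable` are hypothetical, and the slides do not say how a real broker chooses between branches.

```python
# Each entry mirrors one branch of the "|" clause on the slide.
ALTERNATIVES = [
    {"count": 5, "min_memory_mb": 64},
    {"count": 10, "min_memory_mb": 32},
]


def first_satisfiable(machine_memory_mb, alternatives=ALTERNATIVES):
    """Return the first branch whose memory bound the machine meets, else None."""
    for alt in alternatives:
        if machine_memory_mb >= alt["min_memory_mb"]:
            return alt
    return None


print(first_satisfiable(128))  # machine meets the 64 MB branch
print(first_satisfiable(48))   # machine only meets the 32 MB branch
print(first_satisfiable(16))   # machine meets neither branch
```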
![Page 34: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/34.jpg)
Multirequest: “+”
• A multi-request allows us to specify multiple resource needs, for example
+ (& (count=5)(memory>=64)
     (executable=p1))
  (&(network=atm) (executable=p2))
  – Execute 5 instances of p1 on a machine with at least 64 MB of memory
  – Execute p2 on a machine with an ATM connection
• Multirequests are central to co-allocation
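A multi-request is a "+" followed by one parenthesized "&" subrequest per resource need. The sketch below builds the example above from plain Python data; `clause` and `multirequest` are hypothetical helper names chosen for illustration, not Globus functions.

```python
def clause(relations):
    """Render [(attribute, op, value), ...] as one parenthesized '&' subrequest."""
    body = "".join(f"({attr}{op}{value})" for attr, op, value in relations)
    return f"(&{body})"


def multirequest(subrequests):
    """Combine subrequests under '+', one per resource need."""
    return "+" + "".join(clause(relations) for relations in subrequests)


rsl = multirequest([
    [("count", "=", 5), ("memory", ">=", 64), ("executable", "=", "p1")],
    [("network", "=", "atm"), ("executable", "=", "p2")],
])
print(rsl)
```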
![Page 35: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/35.jpg)
Co-allocation
• Simultaneous allocation of a resource set
  – Handled via optimistic co-allocation based on free nodes or queue prediction
  – In the future, advance reservations will also be supported
• globusrun and globus-job-* will co-allocate specific multi-requests
  – Uses a Globus component called the Dynamically Updated Request Online Co-allocator (DUROC)
![Page 36: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/36.jpg)
DUROC Functions
• Submit a multi-request
• Edit a pending request
  – Add new nodes, edit out failed nodes
• Commit to configuration
  – Delay to last possible minute
  – Barrier synchronization
• Initialize computation
  – Bootstrap library
• Monitor and control collection
![Page 37: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/37.jpg)
DUROC Architecture
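[Figure: a controlling application sends an RSL multi-request and edit requests, and receives subjob status. The controlled jobs run under four resource managers (Job 1 on RM1, Job 2 on RM2, Job 3 on RM3, Jobs 4 and 5 on RM4) and synchronize at a barrier.]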
![Page 38: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/38.jpg)
RSL Creation Using globus-job-run
• globus-job-run can be used to generate RSL from command-line args:
    globus-job-run -dumprsl \
        -: host1 -np N1 [-s] executable1 args1 \
        -: host2 -np N2 [-s] executable2 args2 \
        ... > rslfile
  – -np: number of processors
  – -s: stage file
  – argument options for all RSL keywords
  – -help: description of all options
![Page 39: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/39.jpg)
Job Submission Interfaces
• Globus Toolkit includes several command line programs for job submission
  – globus-job-run: Interactive jobs
  – globus-job-submit: Batch/offline jobs
  – globusrun: Flexible scripting infrastructure
• Other High Level Interfaces
  – General purpose
    • Nimrod-G, Condor-G, PBS, GRD, etc.
  – Application specific
    • ECCE’, Cactus, Web portals
![Page 40: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/40.jpg)
globus-job-run
• For running of interactive jobs
• Additional functionality beyond rsh
  – Ex: Run 2 process job w/ executable staging
      globus-job-run -: host -np 2 -s myprog arg1 arg2
  – Ex: Run 5 processes across 2 hosts
      globus-job-run \
          -: host1 -np 2 -s myprog.linux arg1 \
          -: host2 -np 3 -s myprog.aix arg2
  – For list of arguments run:
      globus-job-run -help
![Page 41: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/41.jpg)
globus-job-submit
• For running of batch/offline jobs
  – globus-job-submit: Submit job
    • Same interface as globus-job-run
    • Returns immediately
  – globus-job-status: Check job status
  – globus-job-cancel: Cancel job
  – globus-job-get-output: Get job stdout/err
  – globus-job-clean: Cleanup after job
![Page 42: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/42.jpg)
globusrun
• Flexible job submission for scripting
  – Uses an RSL string to specify job request
  – Contains an embedded globus-gass-server
    • Defines GASS URL prefix in RSL substitution variable:
      (stdout=$(GLOBUSRUN_GASS_URL)/stdout)
  – Supports both interactive and offline jobs
• Complex to use
  – Must write RSL by hand
  – Must understand its esoteric features
  – Generally you should use globus-job-* commands instead
![Page 43: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/43.jpg)
Resource Brokers
[Figure: a hierarchy of resource brokers refining requests, supported by the Information Service]
• Application-level requests: “Run a distributed interactive simulation involving 100,000 entities”; “Perform a parameter study involving 10,000 separate trials”; “Create a shared virtual space with participants X, Y, and Z”
• Application-specific brokers: DIS-specific broker, parameter study specific broker, collaborative environment-specific resource broker
• Supercomputer resource broker: “Supercomputers providing 100 GFLOPS, 100 GB, < 100 msec latency” is refined to “80 nodes on Argonne SP, 256 nodes on CIT Exemplar, 300 nodes on NCSA O2000”
• Simultaneous start co-allocator: “Run SF-Express on 80 nodes”, “Run SF-Express on 256 nodes” and “Run SF-Express on 300 nodes” go to the Argonne, CIT and NCSA resource managers
![Page 44: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/44.jpg)
Brokering via Lowering
• Resource location by refining an RSL expression (RSL lowering), each step more specific than the last:
  (MFLOPS=1000)
  → (& (arch=sp2)(count=200))
  → (+ (& (arch=sp2) (count=120)
          (resourceManagerContact=anlsp2))
       (& (arch=sp2) (count=80)
          (resourceManagerContact=uhsp2)))
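The last lowering step, from a node count to a multi-request over named resource managers, can be sketched as a greedy split. This is an illustration under stated assumptions: the function `lower` is hypothetical, and the free-node figures are invented so that the output matches the slide; a real broker would obtain them from the information service.

```python
def lower(arch, count, free_nodes):
    """Split `count` nodes across resource managers, in the order given.

    free_nodes maps a resource manager contact to its free node count
    (assumed input). Raises ValueError if the request cannot be covered.
    """
    remaining = count
    parts = []
    for contact, free in free_nodes.items():
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            parts.append(
                f"(&(arch={arch})(count={take})(resourceManagerContact={contact}))"
            )
            remaining -= take
    if remaining > 0:
        raise ValueError(f"{remaining} nodes could not be placed")
    return "+" + "".join(parts)


# Assumed availability: 120 free nodes at anlsp2, 80 at uhsp2.
print(lower("sp2", 200, {"anlsp2": 120, "uhsp2": 80}))
```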
![Page 45: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/45.jpg)
Remote I/O and Staging
• Tell GRAM to pull executable from remote location
• Access files from a remote location
• stdin/stdout/stderr from a remote location
![Page 46: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/46.jpg)
What is GASS?
(a) GASS file access API
  – Replace open/close with globus_gass_open/close; read/write calls can then proceed directly
(b) RSL extensions
  – URLs used to name executables, stdout, stderr
(c) Remote cache management utility
(d) Low-level APIs for specialized behaviors
![Page 47: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/47.jpg)
GASS Architecture
[Figure: GASS architecture. Programs and GRAM each use a cache and reach GASS, HTTP and FTP servers.]
(a) GASS file access API:
    main( ) { fd = globus_gass_open(…) … read(fd,…) … globus_gass_close(fd) }
(b) RSL extensions, passed to GRAM: &(executable=https://…)
(c) Remote cache management: % globus-gass-cache
(d) Low-level APIs for customizing cache & GASS server
![Page 48: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/48.jpg)
GASS File Naming
• URL encoding of resource names
    https://quad.mcs.anl.gov:9991/~bester/myjob
    (protocol: https; server address: quad.mcs.anl.gov:9991; file name: ~bester/myjob)
• Other examples
    https://pitcairn.mcs.anl.gov/tmp/input_dataset.1
    https://pitcairn.mcs.anl.gov:2222/./output_data
    http://www.globus.org/~bester/input_dataset.2
• Supports http & https
• Supports ftp & gsiftp
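The three parts of a GASS file name (protocol, server address, file name) are ordinary URL components. As a small illustration, Python's standard `urllib.parse.urlsplit` separates them for the first example URL; nothing here is specific to GASS.

```python
from urllib.parse import urlsplit

# First example URL from the slide.
parts = urlsplit("https://quad.mcs.anl.gov:9991/~bester/myjob")

print("protocol:      ", parts.scheme)    # https
print("server address:", parts.hostname)  # quad.mcs.anl.gov
print("port:          ", parts.port)      # 9991
print("file name:     ", parts.path)      # /~bester/myjob
```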
![Page 49: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/49.jpg)
GASS RSL Extensions
• executable, stdin, stdout, stderr can be local files or URLs
• executable and stdin loaded into local cache before job begins (on front-end node)
• stdout, stderr handled via GASS append mode
• Cache cleaned after job completes
![Page 50: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/50.jpg)
GASS/RSL Example
&(executable=https://quad:1234/~/myexe)
 (stdin=https://quad:1234/~/myin)
 (stdout=/home/bester/output)
 (stderr=https://quad:1234/dev/stdout)
![Page 51: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/51.jpg)
Example GASS Applications
• On-demand, transparent loading of data sets
• Caching of data sets
• Automatic staging of code and data to remote supercomputers
• (Near) real-time logging of application output to remote server
![Page 52: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/52.jpg)
GASS File Access API
• Minimum changes to application
• globus_gass_open(), globus_gass_close()
  – Same as open(), close() but use URLs instead of filenames
  – Caches URL in case of multiple opens
  – Return descriptors to files in local cache or sockets to remote server
• globus_gass_fopen(), globus_gass_fclose()
![Page 53: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/53.jpg)
GASS File Access API (cont)
• Support for different access patterns
  – Read-only (from local cache)
  – Write-only (to local cache)
  – Read-write (to/from local cache)
  – Write-only, append (to remote server)
![Page 54: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/54.jpg)
globus_gass_open()/close()
[Figure: flowchart]
• globus_gass_open(): URL in cache? If no, download the file into the cache. Then open the cached file and add a cache reference.
• globus_gass_close(): Modified? If yes, upload the changes. Then remove the cache reference.
![Page 55: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/55.jpg)
GASS File API Semantics
• Copy on open to the cache, unless the file is opened for truncate or write-only append, or is already in the cache
• Copy on close from the cache, unless the file is read-only or other copies are still open
• Multiple globus_gass_open() calls share the local copy of the file
• Append to the remote file if opened write-only append, e.g., for stdout and stderr
• Reference counting keeps track of open files
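The copy-on-open, copy-on-close and reference-counting rules can be illustrated with a toy in-memory model. This is a simplified sketch, not the GASS implementation: the class `CacheModel` and its methods are hypothetical, a dictionary stands in for the remote server, and the truncate and append modes are left out.

```python
class CacheModel:
    """Toy model of a reference-counted file cache (illustrative only)."""

    def __init__(self, remote):
        self.remote = remote   # url -> contents; stands in for the remote server
        self.cache = {}        # url -> local copy
        self.refs = {}         # url -> number of open handles
        self.dirty = set()     # urls modified since download
        self.downloads = 0
        self.uploads = 0

    def open(self, url):
        # Copy on open only if the URL is not already cached.
        if url not in self.cache:
            self.cache[url] = self.remote[url]
            self.downloads += 1
        self.refs[url] = self.refs.get(url, 0) + 1

    def write(self, url, data):
        # Writes go to the shared local copy.
        self.cache[url] = data
        self.dirty.add(url)

    def close(self, url):
        # Copy on close only when modified and no other copies are open.
        self.refs[url] -= 1
        if self.refs[url] == 0:
            if url in self.dirty:
                self.remote[url] = self.cache[url]
                self.uploads += 1
                self.dirty.discard(url)
            del self.refs[url]


model = CacheModel({"https://quad:1234/~/myin": "old"})
model.open("https://quad:1234/~/myin")
model.open("https://quad:1234/~/myin")       # second open shares the local copy
model.write("https://quad:1234/~/myin", "new")
model.close("https://quad:1234/~/myin")      # another handle is still open
model.close("https://quad:1234/~/myin")      # last close uploads the change
print(model.downloads, model.uploads)
```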
![Page 56: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/56.jpg)
globus-gass-server
• Simple file server
  – Run by user wherever necessary
  – Secure https protocol, using GSI
  – APIs for embedding server into other programs
• Example
    globus-gass-server -r -w -t
  – -r: Allow files to be read from this server
  – -w: Allow files to be written to this server
  – -t: Tilde expand (~/… → $(HOME)/…)
  – -help: For list of all options
![Page 57: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/57.jpg)
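GRAM & GASS: Putting It Together
[Figure: globus-job-run working with a GASS server, the gatekeeper, the jobmanager and the program]
1. Derive the contact string from the host name
2. Build the RSL string from the command-line args
3. Start up the GASS server
4. Submit the request, which passes through the gatekeeper and jobmanager to the program
5. Return output: the program's stdout comes back through the GASS server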
![Page 58: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/58.jpg)
Example: A Simple Broker
• Select machines based on availability
  – Use MDS queries to get current host loads
  – Look at output and figure out what machines to use
• Generate RSL based on selection
  – globus-job-run -dumprsl can assist
• Execute globusrun, feeding it the RSL generated in previous step
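The selection and RSL generation steps of such a broker can be sketched in a few lines. This is an illustration under stated assumptions: the host names and load figures are invented and stand in for MDS query output, and `select_hosts` and `build_rsl` are hypothetical helpers. The final step, handing the string to globusrun, is not shown.

```python
def select_hosts(loads, n):
    """Pick the n least-loaded hosts from a {host: load} mapping."""
    return sorted(loads, key=loads.get)[:n]


def build_rsl(hosts, executable):
    """Build a multi-request with one subrequest per selected host."""
    subrequests = "".join(
        f"(&(resourceManagerContact={host})(executable={executable}))"
        for host in hosts
    )
    return "+" + subrequests


# Assumed load data; a real broker would obtain this from MDS queries.
loads = {
    "host1.example.org": 0.9,
    "host2.example.org": 0.1,
    "host3.example.org": 0.4,
}

hosts = select_hosts(loads, 2)
rsl = build_rsl(hosts, "/bin/date")
print(hosts)
print(rsl)
```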
![Page 59: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/59.jpg)
GRAM & GASS
• Using RSL with globusrun
• Running globus-gass-server
• Modifying a program to use globus_gass_open() to read files remotely from a GASS server
![Page 60: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/60.jpg)
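Globus Components In Action
[Figure: components on a local machine and two remote machines]
• Local machine: grid-proxy-init (X509 User Cert, User Proxy Cert); mpirun with a Machines list; globusrun with the RSL parser, DUROC and a GASS Server; two GRAM Clients, each with GSI. The RSL string becomes an RSL multi-request, which is split into RSL single requests.
• Remote machine (AIX, PBS): GRAM Gatekeeper with GSI, GRAM Job Manager, GASS Client; App over Nexus and MPI.
• Remote machine (Solaris, Unix Fork): GRAM Gatekeeper with GSI, GRAM Job Manager, GASS Client; App over Nexus and MPI.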
![Page 61: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/61.jpg)
GRAM Components
[Figure: GRAM components on either side of a site boundary]
• Client: makes MDS client API calls to locate resources (MDS: Grid Index Info Server) and to get resource info (MDS: Grid Resource Info Server); makes GRAM client API calls to request resource allocation and process creation; receives GRAM client API state change callbacks.
• Gatekeeper: receives the request, relies on the Globus Security Infrastructure, and creates the Job Manager.
• Job Manager: parses the request with the RSL Library, asks the Local Resource Manager to allocate and create processes, then monitors and controls them.
• MDS: Grid Resource Info Server: queries the current status of the resource.
![Page 62: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/62.jpg)
MDS: Monitoring and Discovery Service
• Learn how to use the MDS to locate and determine characteristics of resources
• Locate resources
  – Where are resources with required architecture, installed software, available capacity, network bandwidth, etc.?
• Determine resource characteristics
  – What are the physical characteristics, connectivity, capabilities of a resource?
![Page 63: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/63.jpg)
The Need for Information
• System information is critical to operation of the grid and construction of applications
  – How does an application determine what resources are available?
  – What is the “state” of the computational grid?
  – How can we optimize an application based on configuration of the underlying system?
• We need a general information infrastructure to answer these questions
![Page 64: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/64.jpg)
Using Information for Resource Brokering
[Figure: A request such as “10 GFlops, EOS data, 20 Mb/sec -- for 20 mins” goes to a resource broker. The broker asks the Metacomputing Directory Service (info service: location + selection) “What computers?”, “What speed?”, “When available?”, then sends requests such as “50 processors + storage from 10:20 to 10:40 pm” and “20 Mb/sec” to GRAMs (Globus Resource Allocation Managers), which sit in front of Fork, LSF, EASYLL, Condor, etc.]
![Page 65: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/65.jpg)
Examples of Useful Information
• Characteristics of a compute resource
  – IP address, software available, system administrator, networks connected to, OS version, load
• Characteristics of a network
  – Bandwidth and latency, protocols, logical topology
• Characteristics of the Globus infrastructure
  – Hosts, resource managers
![Page 66: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/66.jpg)
Grid Information Service
• Provide access to static and dynamic information regarding system components
• A basis for configuration and adaptation in heterogeneous, dynamic environments
• Requirements and characteristics
  – Uniform, flexible access to information
  – Scalable, efficient access to dynamic data
  – Access to multiple information sources
  – Decentralized maintenance
![Page 67: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/67.jpg)
MDS
• Store information in a distributed directory
  – Directory stored in collection of LDAP servers
  – Each server optimized for particular function
• Directory can be updated by
  – Information providers and tools
  – Applications (i.e., users)
  – Backend tools which generate info on demand
• Information dynamically available to
  – Tools
  – Applications
![Page 68: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/68.jpg)
Directory Service Functions
• White Pages
  – Look up the IP number, amount of memory, etc., associated with a particular machine
• Yellow Pages
  – Find all the computers of a particular class or with a particular property
• Temporary inconsistencies are often considered okay
  – In a distributed system, you often do not know the state of a resource until you actually use it
  – Information is often used as “hints”
  – Information itself can contain ttl, etc.
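The difference between the two lookup styles, and the idea of treating entries as hints with a time-to-live, can be illustrated with a small in-memory directory. This is a toy model, not MDS or LDAP: the `Directory` class, its method names, and the attribute names are all hypothetical.

```python
import time


class Directory:
    """Toy directory whose entries expire after a per-entry TTL."""

    def __init__(self):
        self._entries = {}  # host -> (attributes, expiry timestamp)

    def publish(self, host, attrs, ttl, now=None):
        """Store attributes for a host, valid for `ttl` seconds."""
        now = time.time() if now is None else now
        self._entries[host] = (dict(attrs), now + ttl)

    def white_pages(self, host, now=None):
        """Look up one named machine; None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(host)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]

    def yellow_pages(self, predicate, now=None):
        """Find all unexpired machines whose attributes match a property."""
        now = time.time() if now is None else now
        return sorted(host for host, (attrs, expiry) in self._entries.items()
                      if expiry > now and predicate(attrs))


if __name__ == "__main__":
    d = Directory()
    d.publish("node-a", {"arch": "i686", "memory_mb": 512}, ttl=60)
    d.publish("node-b", {"arch": "sparc", "memory_mb": 1024}, ttl=60)
    print(d.white_pages("node-a"))
    print(d.yellow_pages(lambda a: a["arch"] == "i686"))
```

Even an unexpired answer is only a hint: the resource may have changed by the time it is used, so callers still have to handle failure at use time.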
![Page 69: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/69.jpg)
MDS Approach
• Based on LDAP
  – Lightweight Directory Access Protocol v3 (LDAPv3)
  – Standard data model
  – Standard query protocol
• Globus specific schema
  – Host-centric representation
• Globus specific tools
  – GRIS, GIIS
  – Data discovery, publication, …
[Figure: Applications and middleware use the LDAP API to query GRIS and GIIS over LDAP; GRIS draws on sources such as SNMP, NIS, NWS, …]
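Because MDS uses the standard LDAP query protocol, a resource query is an ordinary LDAP search filter. The sketch below builds such filter strings in Python, following the RFC 4515 string form. The helper names are hypothetical, the `Mds-` attribute names are taken from the grid-info-search output in this lecture, and whether a server supports ordering comparisons on a given attribute depends on its schema, which is assumed here.

```python
def escape_filter_value(value):
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28",
                    ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in str(value))


def equals(attr, value):
    """Equality match: (attr=value)."""
    return f"({attr}={escape_filter_value(value)})"


def at_least(attr, value):
    """Greater-or-equal match: (attr>=value)."""
    return f"({attr}>={escape_filter_value(value)})"


def all_of(*filters):
    """AND of sub-filters: (&(f1)(f2)...)."""
    return "(&" + "".join(filters) + ")"


if __name__ == "__main__":
    # "Yellow pages" style query: i686 machines with at least one free CPU.
    print(all_of(equals("Mds-Computer-isa", "i686"),
                 at_least("Mds-Cpu-Free-1minX100", 100)))
```

The resulting string can be passed to any LDAP client as the search filter.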
![Page 70: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/70.jpg)
MDS Components
• Uses standard LDAP servers
  – OpenLDAP, Netscape, Oracle, etc.
• Tools for populating & maintaining MDS
  – Integrated with Globus Toolkit server release, not of concern to most Globus users
  – Discover/update static and dynamic info
• APIs for accessing & updating MDS contents
  – C, Java, PERL (LDAP API, JNDI)
• Various tools for manipulating MDS contents
  – Command line tools, Shell scripts & GUIs
![Page 71: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/71.jpg)
Anonymous Grid info search
• grid-info-search -x -h belle.anu.edu.au
….
Mds-Computer-isa: i686
Mds-Computer-platform: i686
Mds-Computer-Total-nodeCount: 1
Mds-Cpu-Cache-l2kB: 512
Mds-Cpu-features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
Mds-Cpu-Free-15minX100: 400
Mds-Cpu-Free-1minX100: 400
Mds-Cpu-Free-5minX100: 400
Mds-Cpu-model: Intel(R) Xeon(TM) CPU 2…
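Output like this is easy to post-process in a script. The sketch below parses `name: value` lines into a dictionary and reads a free-CPU figure from it. Both functions are hypothetical helpers, and interpreting the `X100` suffix as "value multiplied by 100" is an assumption based on the attribute name. Wrapped LDIF continuation lines are not handled.

```python
def parse_attributes(text):
    """Parse 'name: value' lines into a dict, skipping anything else."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line (blank line, elision, etc.)
        attrs[name.strip()] = value.strip()
    return attrs


def free_cpus(attrs, window="1min"):
    """Free CPUs over a window, assuming the X100 attribute is scaled by 100."""
    return int(attrs[f"Mds-Cpu-Free-{window}X100"]) / 100


if __name__ == "__main__":
    sample = (
        "Mds-Computer-isa: i686\n"
        "Mds-Computer-Total-nodeCount: 1\n"
        "Mds-Cpu-Free-1minX100: 400\n"
        "Mds-Cpu-Free-5minX100: 400\n"
    )
    attrs = parse_attributes(sample)
    print(attrs["Mds-Computer-isa"], free_cpus(attrs), free_cpus(attrs, "5min"))
```

A broker could feed values such as `free_cpus(attrs)` into its host selection step.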
![Page 72: ECECS Lecture 18 Grid Computing Citation: B.Ramamurthy/Suny-Buffalo.](https://reader030.fdocuments.net/reader030/viewer/2022032708/56649e6c5503460f94b6be66/html5/thumbnails/72.jpg)
Summary
• MDS provides the information needed to perform dynamic resource discovery and configuration
  – Critical component of resource brokers
• MDS is based on existing directory service standards (LDAPv3)