EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
International Summer School on Grid Computing 2006
gLite Information System and Workload Management System
Diego Scardaci, INFN Catania
International Summer School on Grid Computing, Ischia, 9-21 July 2006
Outline
• Information System Architecture
  − Berkeley DB Information Index (BDII)
  − The Relational Grid Monitoring Architecture (R-GMA)
• Workload Management System
  − WMS Architecture
  − Job Description Language Overview
  − WMProxy Overview
  − Special Jobs: DAG, Collections, Parametric and MPI
Information System
• What is it?
  – A system that collects information on the state of resources.
• Why?
  – To discover the resources of the Grid and their nature.
  – To give whoever is in charge of managing the workload the data needed to do it more efficiently.
  – To check the health status of resources.
• How?
  – By monitoring the state of resources locally and publishing fresh data on the information system.
  – By adopting a data model that MUST be known to all components that want to access the monitored information.
  – By using different approaches, which are investigated in the next slides.
Adopted Information Systems
• The BDII (Berkeley DB Information Index)
  – has been adopted in the LCG middleware as the Information System provider;
  – is an evolution of the Globus MDS (Monitoring and Discovery Service);
  – is based on Lightweight Directory Access Protocol (LDAP) servers.
• The Relational Grid Monitoring Architecture (R-GMA)
  – is a relational implementation of the Grid Monitoring Architecture (GMA) standardized by the Global Grid Forum (GGF, now OGF);
  – is strongly Web Services oriented;
  – uses standard SQL query syntax.
GRISs, local BDII and BDII
• Local GRISes run on the CEs and SEs at each site and report dynamic and static information.
• At each site, a *local* BDII collects the information given by the GRISes.
• Each site can also run a BDII that collects the information given by the local BDIIs.
Abbreviations:
BDII: Berkeley Database Information Index
GIIS: Grid Index Information Server
GRIS: Grid Resource Information Server
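The three-level hierarchy (GRIS, local BDII, BDII) can be pictured as repeated aggregation of entries. The following is a simplified model, not BDII code. The helpers `gris_publish` and `aggregate`, the host names and the `freeCPUs` field are hypothetical, and for brevity only CEs are shown. A real BDII stores LDAP entries that follow the GLUE schema and is queried through LDAP.

```python
# Hypothetical sketch of the GRIS -> local (site) BDII -> BDII hierarchy.
# Entries are plain dicts keyed by an LDAP-style DN; a real BDII holds LDAP
# entries described by the GLUE schema.
from typing import Dict, List

Entry = Dict[str, str]
Directory = Dict[str, Entry]


def gris_publish(ce_id: str, free_cpus: int) -> Directory:
    """A local GRIS on a CE reports static (id) and dynamic (free CPUs) data."""
    return {f"GlueCEUniqueID={ce_id}": {"id": ce_id, "freeCPUs": str(free_cpus)}}


def aggregate(sources: List[Directory]) -> Directory:
    """A site BDII and a top-level BDII both merge the entries of their sources."""
    merged: Directory = {}
    for source in sources:
        merged.update(source)
    return merged


# Site level: each local BDII collects what its GRISes publish.
site1_bdii = aggregate([gris_publish("ce1.site1.example.org", 8),
                        gris_publish("ce2.site1.example.org", 0)])
site2_bdii = aggregate([gris_publish("ce1.site2.example.org", 16)])

# Top level: a BDII collects the information given by the local BDIIs.
top_bdii = aggregate([site1_bdii, site2_bdii])

for dn, attrs in sorted(top_bdii.items()):
    print(dn, attrs["freeCPUs"])
```

Because every level uses the same merge step, a client such as the RB needs to query only one top-level BDII to see the resources of all sites beneath it.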
The IS in gLite
[Figure: the Information System hierarchy in gLite. In each of Site 1, Site 2 and Site 3, Local GRISes on the CEs and SEs feed a Site BDII running on a CE. Top-level BDIIs (BDII-A, BDII-B, BDII-C) collect the site information and are queried by the RB.]
R-GMA
• The Relational Grid Monitoring Architecture (R-GMA)
  – is the relational implementation of the GMA defined by the GGF;
  – adopts a database model, with tables and relations between tables;
  – implements a virtual database: the user queries R-GMA as if it were a classical database (SQL string);
  – implements different types of queries.
• Information flow
  – The Producer stores its location (URL) in the Registry.
  – The Consumer looks up producer URLs in the Registry.
  – The Consumer contacts the Producer to get all the data or to listen for new data.
[Figure: the PRODUCER stores its location in the REGISTRY; the CONSUMER looks up that location in the REGISTRY; data is then transferred from the PRODUCER to the CONSUMER.]
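The Producer/Registry/Consumer flow can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: the classes `Producer`, `Registry` and `Consumer`, the table `HostLoad` and the URLs are hypothetical, and an in-memory `sqlite3` database stands in for each producer's data. Only a one-off query is shown, not the "listen for new data" mode.

```python
# Toy model of the R-GMA interaction. Producer, Registry and Consumer are
# hypothetical classes, not the R-GMA API; an in-memory sqlite3 database
# stands in for the table each producer publishes.
import sqlite3
from typing import Dict, List, Tuple


class Producer:
    def __init__(self, url: str, table: str) -> None:
        self.url = url  # location the producer advertises
        self.table = table
        self.db = sqlite3.connect(":memory:")
        self.db.execute(f"CREATE TABLE {table} (host TEXT, cpu_load REAL)")

    def insert(self, host: str, cpu_load: float) -> None:
        self.db.execute(f"INSERT INTO {self.table} VALUES (?, ?)", (host, cpu_load))

    def query(self, sql: str) -> List[Tuple]:
        return self.db.execute(sql).fetchall()


class Registry:
    def __init__(self) -> None:
        self.producers: Dict[str, List[Producer]] = {}

    def register(self, producer: Producer) -> None:
        # "Store location": the producer records itself under its table name.
        self.producers.setdefault(producer.table, []).append(producer)

    def lookup(self, table: str) -> List[Producer]:
        # "Lookup location": the consumer asks who publishes a table.
        return self.producers.get(table, [])


class Consumer:
    def __init__(self, registry: Registry) -> None:
        self.registry = registry

    def query(self, table: str, sql: str) -> List[Tuple]:
        # "Transfer data": contact each producer and merge the answers, so
        # the user sees a single virtual table.
        rows: List[Tuple] = []
        for producer in self.registry.lookup(table):
            rows.extend(producer.query(sql))
        return sorted(rows)


registry = Registry()
p1 = Producer("https://site1.example.org/producer", "HostLoad")
p2 = Producer("https://site2.example.org/producer", "HostLoad")
p1.insert("wn01.site1", 0.5)
p2.insert("wn07.site2", 2.0)
registry.register(p1)
registry.register(p2)

consumer = Consumer(registry)
print(consumer.query("HostLoad", "SELECT host, cpu_load FROM HostLoad WHERE cpu_load < 1.0"))
# → [('wn01.site1', 0.5)]
```

The consumer writes ordinary SQL and never needs to know how many producers answer, which is what the slide means by a virtual database.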
Workload Management System
Outline
• Overview of WMS Architecture
• Job Description Language Overview
• WMProxy Overview
• Special Jobs
  – DAG jobs
  – Job collections
  – Parametric jobs
  – MPI jobs
WMS Objectives
• The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
• The purpose of the Workload Manager (WM) is to accept and satisfy job management requests coming from its clients.
  – Submitting a request means passing the responsibility for the job to the WM. The WM then passes the job to an appropriate CE for execution, taking into account the requirements and preferences expressed in the job description file.
• The decision of which resource should be used is the outcome of a matchmaking process.
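Matchmaking can be pictured as a filter followed by a ranking: discard the CEs that do not satisfy the job's requirements, then choose the best of the remaining ones. The sketch below shows only that shape. The function `matchmake`, the resource fields and the CE names are hypothetical; the real WMS evaluates ClassAd expressions against information published in the Information System.

```python
# Simplified matchmaking sketch: keep the CEs that satisfy the job's
# requirements, then pick the one with the best rank.
# Field names and the CE list are hypothetical.
from typing import Any, Callable, Dict, List, Optional

Resource = Dict[str, Any]


def matchmake(ces: List[Resource],
              requirements: Callable[[Resource], bool],
              rank: Callable[[Resource], float]) -> Optional[Resource]:
    """Return the best-ranked CE that satisfies the requirements, if any."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None  # no match for now
    return max(candidates, key=rank)


ces = [
    {"id": "ce01.example.org", "free_cpus": 0, "tags": {"GLITE-3_0_0"}},
    {"id": "ce02.example.org", "free_cpus": 12, "tags": {"GLITE-3_0_0"}},
    {"id": "ce03.example.org", "free_cpus": 40, "tags": set()},
]

best = matchmake(
    ces,
    requirements=lambda ce: "GLITE-3_0_0" in ce["tags"],  # hard constraint
    rank=lambda ce: ce["free_cpus"],                       # preference
)
print(best["id"])  # → ce02.example.org
```

ce03 has the most free CPUs but fails the requirement, so the preference only decides among CEs that already match.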
WMS Architecture
[Figure: WMS architecture diagram. Its callouts read:]
• Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL).
• Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources.
• Keeps submission requests: requests are kept for a while if no resources are immediately available.
• Repository of resource information available to the matchmaker, updated via notifications and/or active polling on resources.
• Performs the actual job submission and monitoring.
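Keeping requests "for a while" when no resource is immediately available can be sketched as a queue that retries matchmaking on each pass and gives up after a time limit. This is a minimal sketch under stated assumptions: `TaskQueue`, `find_ce`, the 600-second expiry and the notion of a "pass" are all hypothetical and stand in for whatever retry policy the WMS actually applies.

```python
# Hypothetical sketch of a queue that keeps unmatched requests for a while.
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


class TaskQueue:
    def __init__(self, expiry: float) -> None:
        self.expiry = expiry  # seconds a request may wait for a resource
        self.pending: Deque[Tuple[str, float]] = deque()  # (job id, enqueue time)

    def submit(self, job_id: str, now: float) -> None:
        self.pending.append((job_id, now))

    def dispatch(self, now: float, find_ce: Callable[[str], Optional[str]]
                 ) -> Tuple[List[Tuple[str, str]], List[str]]:
        """One pass over the queue: returns (dispatched (job, CE) pairs, expired ids)."""
        dispatched: List[Tuple[str, str]] = []
        expired: List[str] = []
        for _ in range(len(self.pending)):
            job_id, since = self.pending.popleft()
            ce = find_ce(job_id)
            if ce is not None:
                dispatched.append((job_id, ce))
            elif now - since >= self.expiry:
                expired.append(job_id)  # waited too long: give up
            else:
                self.pending.append((job_id, since))  # keep for a later pass
        return dispatched, expired


free: Dict[str, int] = {"ce01.example.org": 0}  # hypothetical free slots per CE


def find_ce(job_id: str) -> Optional[str]:
    for ce, slots in free.items():
        if slots > 0:
            free[ce] -= 1
            return ce
    return None


q = TaskQueue(expiry=600.0)
q.submit("job-1", now=0.0)
first = q.dispatch(now=60.0, find_ce=find_ce)    # no free slot: job-1 stays queued
free["ce01.example.org"] = 1                     # a slot becomes available
second = q.dispatch(now=120.0, find_ce=find_ce)  # job-1 is dispatched
q.submit("job-2", now=200.0)
third = q.dispatch(now=900.0, find_ce=find_ce)   # no slot after 700 s: job-2 expires
print(first, second, third)
```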
Job Description Language
• In gLite, the Job Description Language (JDL) is used to describe jobs for execution on the Grid.
• The JDL adopted within the gLite middleware is based upon Condor’s CLASSified Advertisement language (ClassAd).
• A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;).
• A ClassAd is highly flexible and can be used to represent arbitrary services.
The JDL is used in gLite to specify the job’s characteristics and constraints, which are used during the matchmaking process to select the best resources that satisfy the job’s requirements.
Job Description Language (cont.)
• The JDL syntax consists of statements like:
  Attribute = value;
• Comments must be preceded by a sharp character (#) or follow the C++ syntax.
WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
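The syntax rules above (one `Attribute = value;` statement per line, `#` or `//` comments, no blanks after the final semicolon) are easy to check mechanically. The reader below is a hypothetical helper for illustration only. It handles flat, single-line statements and keeps values as raw text; it does not parse nested ClassAds, multi-line values, `/* */` comments or expressions.

```python
# Hypothetical reader for flat JDL files: one "Attribute = value;" per line.
from typing import Dict, List, Tuple


def read_flat_jdl(text: str) -> Tuple[Dict[str, str], List[int]]:
    """Return (attributes, line numbers with blanks/tabs after the final ';')."""
    attrs: Dict[str, str] = {}
    warnings: List[int] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "//")):
            continue  # blank line or comment
        if not stripped.endswith(";"):
            raise ValueError(f"line {lineno}: statement must end with ';'")
        if line != line.rstrip():
            warnings.append(lineno)  # trailing blank or tab after the ';'
        name, sep, value = stripped[:-1].partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'Attribute = value;'")
        attrs[name.strip()] = value.strip()
    return attrs, warnings


sample = (
    "# a comment\n"
    'Executable = "/bin/hostname";\n'
    "// another comment\n"
    'StdOutput = "std.out"; \n'
)
attrs, warnings = read_flat_jdl(sample)
print(attrs)     # → {'Executable': '"/bin/hostname"', 'StdOutput': '"std.out"'}
print(warnings)  # → [4]
```

Line 4 of the sample ends with a blank after the semicolon, which is exactly what the warning on this slide forbids.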
JDL: an example
Type = "Job";
JobType = "Normal";
Executable = "startGen4.sh";
Environment = {"CLASSPATH=./gfal.jar:./gint.jar","LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH","LCG_GFAL_VO=gilda","LCG_RFIO_TYPE=dpm"};
Arguments = " 0 0 10 4 10000 aliserv6.ct.infn.it lfn:/grid/gilda/valeria/2000pillar.dat /gilda/ischia06/vardizzo";
StdOutput = "sample.out";
StdError = "sample.err";
InputSandbox = {"startGen4.sh","gint.jar","gfal.jar","libGFalFile.so"};
OutputSandbox = {"sample.err","sample.out"};
Requirements = Member("GLITE-3_0_0",other.GlueHostApplicationSoftwareRunTimeEnvironment);
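In the `Requirements` expression of this example, `other` refers to the resource ClassAd the job is being matched against, and `Member(x, list)` is true when `x` appears in the list. The sketch below mimics that evaluation over a few invented CE ads. The function names are hypothetical. Strings are compared exactly here, and a missing attribute is treated as an empty list, whereas real ClassAds yield UNDEFINED, which likewise fails the match.

```python
# Hypothetical sketch of evaluating the example's Requirements expression.
from typing import Any, Dict, List


def member(value: Any, values: List[Any]) -> bool:
    """Like the ClassAd Member() function: is value an element of values?"""
    return value in values


def requirements(other: Dict[str, Any]) -> bool:
    # Requirements = Member("GLITE-3_0_0", other.GlueHostApplicationSoftwareRunTimeEnvironment);
    tags = other.get("GlueHostApplicationSoftwareRunTimeEnvironment", [])
    return member("GLITE-3_0_0", tags)


ce_ads: Dict[str, Dict[str, Any]] = {
    "ce01.example.org": {"GlueHostApplicationSoftwareRunTimeEnvironment": ["LCG-2_7_0", "GLITE-3_0_0"]},
    "ce02.example.org": {"GlueHostApplicationSoftwareRunTimeEnvironment": ["LCG-2_7_0"]},
    "ce03.example.org": {},  # attribute not published
}

matching = [ce for ce, ad in ce_ads.items() if requirements(ad)]
print(matching)  # → ['ce01.example.org']
```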
Workload Manager Proxy
WMProxy
• WMProxy (Workload Manager Proxy)
  – is a new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services based interface;
  – has been designed to handle a large number of job submission requests: with gLite 1.5, ~180 s for 500 jobs; the short-term goal is ~60 s for 1000 jobs;
  – provides additional features such as bulk submission and support for shared and compressed sandboxes for compound jobs;
  – is the natural replacement for the Network Server (NS) in the move to a Service Oriented Architecture (SOA).
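The two submission figures are easier to compare as rates. The arithmetic below uses only the numbers quoted on this slide and shows that the short-term goal corresponds to roughly a sixfold increase in throughput.

```python
# Compare the quoted gLite 1.5 measurement with the short-term goal.
measured_jobs, measured_secs = 500, 180
goal_jobs, goal_secs = 1000, 60

measured_rate = measured_jobs / measured_secs  # about 2.8 jobs/s
goal_rate = goal_jobs / goal_secs              # about 16.7 jobs/s
speedup = goal_rate / measured_rate            # (1000 * 180) / (60 * 500) = 6

print(f"measured: {measured_rate:.1f} jobs/s, {measured_secs / measured_jobs:.2f} s/job")
print(f"goal:     {goal_rate:.1f} jobs/s, {goal_secs / goal_jobs:.2f} s/job")
print(f"speed-up: {speedup:.1f}x")
```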
New request types
• Support for the new request types relies heavily on newly developed JDL converters and on the DAG submission support
  – all JDL conversions are performed on the server;
  – several jobs are submitted with a single submission.
• All new request types can be monitored and controlled through a single handle (the request ID)
  – each sub-job can, however, also be followed up and controlled independently through its own ID.
• “Smarter” WMS client commands/API
  – allow submission of DAGs, collections and parametric jobs exploiting the concept of a “shared sandbox”;
  – allow automatic generation and submission of collections and DAGs from sets of JDL files located in user-specified directories on the UI.
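Generating a collection from a directory of JDL files amounts to writing one node per file. The helper below is hypothetical and is not the gLite client implementation. It only illustrates the idea, emitting nodes with the same `file = "...";` form used in the collection examples of this tutorial and taking every `*.jdl` file in the directory in sorted order.

```python
# Hypothetical sketch: build a collection JDL from all *.jdl files in a directory.
import tempfile
from pathlib import Path


def collection_from_dir(directory: str) -> str:
    """Return a collection JDL with one `file = "...";` node per *.jdl file."""
    jdl_files = sorted(Path(directory).glob("*.jdl"))
    if not jdl_files:
        raise ValueError(f"no .jdl files in {directory}")
    nodes = ",\n".join(f'    [ file = "{p.as_posix()}"; ]' for p in jdl_files)
    return (
        "[\n"
        '  Type = "collection";\n'
        "  nodes = {\n"
        f"{nodes}\n"
        "  };\n"
        "]\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    for name in ("job1.jdl", "job2.jdl", "notes.txt"):  # notes.txt is ignored
        (Path(tmp) / name).write_text('Executable = "/bin/hostname";\n')
    text = collection_from_dir(tmp)
print(text)
```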
Special Jobs
Outline
• DAG
• Job Collection
• Parametric jobs
• MPI jobs on gLite
DAG job
• A DAG job is a set of jobs where input, output, or execution of one or more jobs can depend on other jobs
• Dependencies are represented through Directed Acyclic Graphs, where the nodes are jobs, and the edges identify the dependencies
[Figure: example DAG whose nodes are nodeA, nodeB, nodeC, NodeF and nodeD, with nodeA at the top and nodeD at the bottom.]
JDL structure
Attribute: Nodes
Attribute: Dependencies
DAG jdl
[
  type = "dag";
  max_nodes_running = 4;
  nodes = [
    nodeA = [ file ="nodes/nodeA.jdl" ; ];
    nodeB = [ file ="nodes/nodeB.jdl" ; ];
    nodeC = [ file ="nodes/nodeC.jdl" ; ];
    nodeD = [ file ="nodes/nodeD.jdl" ; ];
    dependencies = {
      {nodeA, nodeB},
      {nodeA, nodeC},
      { {nodeB, nodeC}, nodeD }
    };
  ];
]
Node descriptions could also be given here, instead of using separate files.
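Reading each dependency pair {X, Y} as "Y can start only after X has completed", the DAG JDL example implies an execution order. The sketch below computes it with a level-by-level topological ordering, capping each round at `max_running` nodes as a rough stand-in for `max_nodes_running`. It assumes every node takes the same time; `schedule` and `as_set` are hypothetical helpers, not the WMS scheduler.

```python
# Hypothetical level-by-level topological ordering of the DAG example.
from typing import Dict, List, Sequence, Set, Tuple, Union

Side = Union[str, Tuple[str, ...]]  # a single node or a group such as {nodeB, nodeC}


def as_set(side: Side) -> Set[str]:
    return {side} if isinstance(side, str) else set(side)


def schedule(nodes: Sequence[str], dependencies: List[Tuple[Side, Side]],
             max_running: int) -> List[List[str]]:
    """Return rounds of nodes whose parents have all completed."""
    parents: Dict[str, Set[str]] = {n: set() for n in nodes}
    for before, after in dependencies:
        for child in as_set(after):
            parents[child] |= as_set(before)
    done: Set[str] = set()
    rounds: List[List[str]] = []
    while len(done) < len(nodes):
        ready = sorted(n for n in nodes if n not in done and parents[n] <= done)
        if not ready:
            raise ValueError("cycle or unknown node in dependencies")
        batch = ready[:max_running]  # respect the concurrency cap
        rounds.append(batch)
        done.update(batch)
    return rounds


# The dependencies of the DAG JDL example, written as Python tuples.
nodes = ["nodeA", "nodeB", "nodeC", "nodeD"]
dependencies = [
    ("nodeA", "nodeB"),
    ("nodeA", "nodeC"),
    (("nodeB", "nodeC"), "nodeD"),
]
print(schedule(nodes, dependencies, max_running=4))
# → [['nodeA'], ['nodeB', 'nodeC'], ['nodeD']]
```

With `max_running=1`, nodeB and nodeC would run in separate rounds, which shows how the cap trades parallelism for lower load on the Grid.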
Job Collection
• A job collection is a set of independent jobs that a user wants to submit and monitor via a single request.
• The jobs of a collection are submitted as DAG nodes without dependencies.
• The JDL contains a list of ClassAds, each describing one sub-job:
[
Type = "collection";
VirtualOrganisation = "gilda";
nodes = {
[ <job descr 1 >],
[ <job descr 2 >],
…
};
]
‘Scattered’ Input Sandboxes
• The InputSandbox can contain
  – file paths on the UI machine (i.e. the usual way);
  – URIs pointing to files on a remote gridFTP/HTTPS server.
• A base URI to be applied to all sandbox files can also be specified.
• Only local files (file://) are uploaded to the WMS node.
• Files pointed to by URIs are downloaded directly onto the WN by the JobWrapper just before the job is started.
InputSandbox = {
"gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
"https://ghemon.cnaf.infn.it:8443/data/idat_1",
"file:///home/pacio/myconf" };
InputSandboxBaseURI = "gsiftp://matrix.datamat.it:2811/var";
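The rules on this slide split sandbox entries into two groups: local files that are uploaded to the WMS node, and URIs that the JobWrapper fetches on the WN. The sketch below applies that split to the slide's entries plus one hypothetical relative name (`extra.dat`). The helper `plan_input_sandbox` is hypothetical. Resolving bare relative names against `InputSandboxBaseURI` is an assumption of this sketch, not a confirmed WMS behaviour.

```python
# Hypothetical sketch: decide how each InputSandbox entry reaches the WN.
from typing import Dict, List, Optional
from urllib.parse import urlparse


def plan_input_sandbox(entries: List[str], base_uri: Optional[str] = None) -> Dict[str, List[str]]:
    plan: Dict[str, List[str]] = {"upload_to_wms": [], "fetch_on_wn": []}
    for entry in entries:
        parsed = urlparse(entry)
        if parsed.scheme == "file":
            plan["upload_to_wms"].append(parsed.path)  # local file on the UI
        elif parsed.scheme:  # gsiftp://, https://, ...
            plan["fetch_on_wn"].append(entry)  # downloaded by the JobWrapper
        elif base_uri and not entry.startswith("/"):
            # Assumption: a bare relative name is resolved against the base URI.
            plan["fetch_on_wn"].append(base_uri.rstrip("/") + "/" + entry)
        else:
            plan["upload_to_wms"].append(entry)  # plain path on the UI
    return plan


sandbox = [
    "gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
    "https://ghemon.cnaf.infn.it:8443/data/idat_1",
    "file:///home/pacio/myconf",
    "extra.dat",  # hypothetical relative entry
]
plan = plan_input_sandbox(sandbox, "gsiftp://matrix.datamat.it:2811/var")
print(plan["upload_to_wms"])
print(plan["fetch_on_wn"])
```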
‘Scattered’ Output Sandboxes
• JDL has been enriched with new attributes for specifying the destinations for the files listed in the OutputSandbox attribute list
• A base URI to be applied to all sandbox files can also be specified
• When the job has completed execution, the JobWrapper copies the files directly to the specified destinations, without passing through the WMS node.
OutputSandbox = { "jobOutput",
"run1/event1",
"jobError" };
OutputSandboxDestURI = {
"gsiftp://matrix.datamat.it/var/jobOutput",
"https://grid003.ct.infn.it:8443/home/cms/event1",
"gsiftp://matrix.datamat.it/var/jobError" };
OutputSandboxBaseDestURI = "gsiftp://neo.datamat.it/home/run1/";
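The mapping from OutputSandbox files to their destinations can be sketched as below, using the file names and URIs of the example. The helper `output_destinations` is hypothetical, and its rules are assumptions. An explicit `OutputSandboxDestURI` list is paired with the files by position and takes precedence. With only a base URI, the file's basename is appended to it. With neither, the file goes back through the WMS as usual (`None`).

```python
# Hypothetical sketch: where does the JobWrapper copy each output file?
from typing import Dict, List, Optional


def output_destinations(files: List[str],
                        dest_uris: Optional[List[str]] = None,
                        base_dest_uri: Optional[str] = None) -> Dict[str, Optional[str]]:
    if dest_uris is not None:  # assumption: explicit list wins, paired by position
        if len(dest_uris) != len(files):
            raise ValueError("expected one destination URI per OutputSandbox file")
        return dict(zip(files, dest_uris))
    if base_dest_uri is not None:  # assumption: basename appended to the base URI
        base = base_dest_uri.rstrip("/")
        return {f: f"{base}/{f.rsplit('/', 1)[-1]}" for f in files}
    return {f: None for f in files}  # no destination: retrieved via the WMS


files = ["jobOutput", "run1/event1", "jobError"]
explicit = output_destinations(files, dest_uris=[
    "gsiftp://matrix.datamat.it/var/jobOutput",
    "https://grid003.ct.infn.it:8443/home/cms/event1",
    "gsiftp://matrix.datamat.it/var/jobError",
])
from_base = output_destinations(files, base_dest_uri="gsiftp://neo.datamat.it/home/run1/")
print(explicit["run1/event1"])   # → https://grid003.ct.infn.it:8443/home/cms/event1
print(from_base["run1/event1"])  # → gsiftp://neo.datamat.it/home/run1/event1
```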
Job collection example
[
  type = "collection";
  InputSandbox = {"date.sh"};
  RetryCount = 0;
  nodes = {
    [ file ="jobs/job1.jdl" ; ],
    [
      [
        Executable = "/bin/sh";
        Arguments = "date.sh";
        Stdoutput = "date.out";
        StdError = "date.err";
        OutputSandbox = {"date.out", "date.err"};
      ]
    ],
    [ file ="jobs/job3.jdl" ; ]
  };
]
All nodes will share the top-level InputSandbox ({"date.sh"}).
Parametric Job
• A parametric job is a job where one or more of its attributes are parameterized.
• The values of these attributes vary according to a parameter.
• Job monitoring/management is always done through a unique jobID, as if the job were a single job (as with the submission of a collection).
[
  JobType = "Parametric";
  Executable = "/bin/sh";
  Arguments = "md5.sh input_PARAM_.txt";
  InputSandbox = {"md5.sh", "input_PARAM_.txt"};
  StdOutput = "out_PARAM_.txt";
  StdError = "err_PARAM_.txt";
  Parameters = 4;
  ParameterStart = 1;
  ParameterStep = 1;
  OutputSandbox = {"out_PARAM_.txt", "err_PARAM_.txt"};
]
Parametric job / 2
• Parameters can also be a list of strings.
• The InputSandbox (if present) has to be coherent with the parameters.
[ui-test] /home/giorgio/param > cat param2.jdl
[
  JobType = "Parametric";
  Executable = "/bin/cat";
  Arguments = "input_PARAM_.txt";
  InputSandbox = "input_PARAM_.txt";
  StdOutput = "myoutput_PARAM_.txt";
  StdError = "myerror_PARAM_.txt";
  Parameters = {earth,moon,mars};
  OutputSandbox = {"myoutput_PARAM_.txt"};
]
[ui-test] /home/giorgio/param > ls
inputEARTH.txt inputMARS.txt inputMOON.txt param2.jdl
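Conceptually, a parametric job expands into one job per parameter value, with `_PARAM_` replaced in every attribute. The sketch below performs that expansion for the list form shown in `param2.jdl`. `expand_parametric` is a hypothetical helper that treats substitution as a plain, case-sensitive text replacement. Under that assumption the InputSandbox names must match the substituted values exactly, which is what "coherent with the parameters" requires. The numeric form (`Parameters`, `ParameterStart`, `ParameterStep`) only changes how the list of values is produced; its exact range is defined by the JDL specification and is not reproduced here.

```python
# Hypothetical sketch: expand a parametric job description into single jobs.
from typing import Dict, List, Union

Value = Union[str, List[str]]

# Attributes that drive the expansion and are dropped from the generated jobs.
CONTROL_ATTRS = {"JobType", "Parameters", "ParameterStart", "ParameterStep"}


def substitute(value: Value, param: str) -> Value:
    """Replace every _PARAM_ placeholder in a string or list of strings."""
    if isinstance(value, list):
        return [item.replace("_PARAM_", param) for item in value]
    return value.replace("_PARAM_", param)


def expand_parametric(template: Dict[str, Value], params: List[str]) -> List[Dict[str, Value]]:
    """Return one job description per parameter value."""
    return [
        {name: substitute(value, p) for name, value in template.items()
         if name not in CONTROL_ATTRS}
        for p in params
    ]


template: Dict[str, Value] = {
    "JobType": "Parametric",
    "Executable": "/bin/cat",
    "Arguments": "input_PARAM_.txt",
    "InputSandbox": "input_PARAM_.txt",
    "StdOutput": "myoutput_PARAM_.txt",
    "StdError": "myerror_PARAM_.txt",
    "OutputSandbox": ["myoutput_PARAM_.txt"],
}
jobs = expand_parametric(template, ["earth", "moon", "mars"])
for job in jobs:
    print(job["Arguments"], job["OutputSandbox"])
```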
MPI Overview
• The execution of parallel jobs is an essential issue for modern computing and applications.
• The most widely used library for supporting parallel jobs is MPI (Message Passing Interface).
• At present, parallel jobs can run only within a single Computing Element (CE).
  – Several projects are studying the possibility of executing parallel jobs on Worker Nodes (WNs) belonging to different CEs.