
EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

International Summer School on Grid Computing 2006

gLite Information System and Workload Management System

Diego Scardaci
INFN Catania
International Summer School on Grid Computing
Ischia, 9-21 July, 2006


Outline

• Information System Architecture
  – Berkeley DB Information Index (BDII)
  – The Relational Grid Monitoring Architecture (R-GMA)
• Workload Management System
  – WMS Architecture
  – Job Description Language Overview
  – WMProxy Overview
  – Special Jobs: DAG, Collections, Parametric and MPI


Information System


Information System

• What is it?
  – A system that collects information on the state of resources
• Why?
  – To discover the resources of the grid and their nature
  – To give whoever is in charge of managing the workload the data needed to do it more efficiently
  – To check the health status of resources
• How?
  – By monitoring the state of resources locally and publishing fresh data to the information system
  – By adopting a data model that MUST be well known to all components that want to access the monitored information
  – By using different approaches, which are investigated in the next slides


Adopted Information Systems

• The BDII (Berkeley DB Information Index)
  – It has been adopted in the LCG middleware as the Information System provider.
  – It is an evolution of the Globus Monitoring and Discovery Service (MDS)
  – It is based on Lightweight Directory Access Protocol (LDAP) servers.
• The Relational Grid Monitoring Architecture (R-GMA)
  – It is an implementation of the Grid Monitoring Architecture (GMA) standardized by the Global Grid Forum (GGF, now OGF)
  – It is a relational implementation of the GMA
  – It is strongly Web Services oriented
  – It uses standard SQL query syntax


GRISs, local BDII and BDII

• Each site can run a BDII. It collects the information given by the local BDIIs
• At each site, a *local* BDII collects the information given by the GRISes
• Local GRISes run on CEs and SEs at each site and report dynamic and static information

Abbreviations:
BDII: Berkeley DataBase Information Index
GIIS: Grid Index Information Server
GRIS: Grid Resource Information Server
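The three-level collection described on this slide can be sketched in a few lines of Python. This is only an in-memory model under stated assumptions: real GRISes and BDIIs are LDAP servers, and the host names, record fields and the `collect` helper below are hypothetical, chosen for illustration.

```python
def collect(sources):
    """Merge the records published by lower-level sources into one view."""
    merged = {}
    for records in sources:
        merged.update(records)
    return merged

# Hypothetical records published by local GRISes, keyed by host name.
ce_gris_site1 = {"ce01.site1.example": {"type": "CE", "free_cpus": 12}}
se_gris_site1 = {"se01.site1.example": {"type": "SE", "free_gb": 500}}
ce_gris_site2 = {"ce01.site2.example": {"type": "CE", "free_cpus": 3}}

# Each local BDII collects from the GRISes of its own site.
local_bdii_site1 = collect([ce_gris_site1, se_gris_site1])
local_bdii_site2 = collect([ce_gris_site2])

# A BDII collects from the local BDIIs.
bdii = collect([local_bdii_site1, local_bdii_site2])
print(sorted(bdii))
```

A client that queries only `bdii` sees every resource without contacting the individual GRISes, which is the point of the hierarchy.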


The IS in gLite

[Figure: three sites (Site 1, Site 2, Site 3), each with local GRISes on its CEs and SEs and a site BDII on a CE; an RB with its own local GRIS; and the BDIIs BDII-A, BDII-B and BDII-C.]


R-GMA

• The Relational Grid Monitoring Architecture (R-GMA)
  – It is the relational implementation of the GMA defined by the GGF
  – It adopts a database model with tables and relations between tables
  – It implements a virtual database
  – The user queries R-GMA as if he/she were querying a classical database (SQL string)
  – It implements different types of queries
• The information
  – The Producer stores its location (URL) in the Registry.
  – The Consumer looks up producer URLs in the Registry.
  – The Consumer contacts the Producer to get all the data or to listen for new data.

[Figure: PRODUCER, REGISTRY and CONSUMER. The Producer stores its location in the Registry, the Consumer looks up the location there, and data is transferred between Producer and Consumer.]
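The store / lookup / transfer interaction above can be modelled with a minimal Python sketch. It is not the R-GMA API: the class names, the `cpuLoad` table, the URL and the in-process `network` dictionary standing in for real connections are all assumptions made for illustration.

```python
class Registry:
    """Maps a table name to the URLs of the producers that publish it."""

    def __init__(self):
        self._locations = {}

    def register(self, table, url):
        self._locations.setdefault(table, []).append(url)

    def lookup(self, table):
        return list(self._locations.get(table, []))


class Producer:
    def __init__(self, url, table, registry):
        self.url = url
        self.rows = []
        registry.register(table, url)  # step 1: store location

    def publish(self, row):
        self.rows.append(row)


class Consumer:
    def __init__(self, registry, network):
        self.registry = registry
        self.network = network  # hypothetical stand-in: URL -> Producer

    def query(self, table):
        rows = []
        for url in self.registry.lookup(table):   # step 2: look up location
            rows.extend(self.network[url].rows)   # step 3: transfer data
        return rows


registry = Registry()
producer = Producer("https://producer.example/cpu", "cpuLoad", registry)
producer.publish({"host": "wn01.example", "load": 0.7})
consumer = Consumer(registry, {producer.url: producer})
print(consumer.query("cpuLoad"))
```

The Registry never holds the data itself; it only tells the Consumer where to find the Producers.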


Workload Management System


Outline

• Overview of WMS Architecture
• Job Description Language Overview
• WMProxy Overview
• Special Jobs
  – DAG jobs
  – Job collections
  – Parametric jobs
  – MPI jobs


WMS Objectives

• The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
• The purpose of the Workload Manager (WM) is to accept and satisfy requests for job management coming from its clients
  – The meaning of a submission request is to pass responsibility for the job to the WM.
  – The WM will pass the job to an appropriate CE for execution, taking into account the requirements and preferences expressed in the job description file
• The decision of which resource should be used is the outcome of a matchmaking process.
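As a rough illustration of matchmaking, the Python sketch below filters resources by a job's requirements and then picks the one the job prefers most. It is a simplification under stated assumptions, not the WMS algorithm: the `matchmake` function, the resource fields and the CE names are hypothetical.

```python
def matchmake(job, resources):
    """Return the best resource for the job, or None if nothing matches."""
    # Keep only the resources that satisfy the job's requirements.
    candidates = [r for r in resources if job["requirements"](r)]
    if not candidates:
        return None  # no suitable resource at the moment
    # Among the candidates, choose the one the job's preference ranks highest.
    return max(candidates, key=job["rank"])


# Hypothetical view of the CEs as seen by the matchmaker.
resources = [
    {"name": "ce-a.example", "runtime_env": ["GLITE-3_0_0"], "free_cpus": 4},
    {"name": "ce-b.example", "runtime_env": ["GLITE-3_0_0"], "free_cpus": 20},
    {"name": "ce-c.example", "runtime_env": [], "free_cpus": 50},
]

job = {
    "requirements": lambda r: "GLITE-3_0_0" in r["runtime_env"],
    "rank": lambda r: r["free_cpus"],
}

print(matchmake(job, resources)["name"])
```

Here `ce-c.example` has the most free CPUs but fails the requirement, so it is never considered by the ranking step.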


WMS Architecture

[Figure: WMS architecture diagram; its callouts read:]

• Job management requests (submission, cancellation) expressed via a Job Description Language (JDL)
• Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, utilization policies on resources
• Keeps submission requests. Requests are kept for a while if no resources are immediately available
• Repository of resource information available to matchmaker. Updated via notifications and/or active polling on resources
• Performs the actual job submission and monitoring


Job Description Language


Job Description Language

• In gLite, the Job Description Language (JDL) is used to describe jobs for execution on the Grid.
• The JDL adopted within the gLite middleware is based upon Condor’s CLASSified ADvertisement language (ClassAd).
• A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;)
• A ClassAd is highly flexible and can be used to represent arbitrary services

The JDL is used in gLite to specify the job’s characteristics and constraints, which are used during the match-making process to select the best resources that satisfy the job’s requirements.


Job Description Language (cont.)

• The JDL syntax consists of statements like:

Attribute = value;

• Comments must be preceded by a sharp character (#) or follow the C++ syntax

WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
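The warning about blanks after the semicolon is easy to check mechanically before submitting. The following Python sketch reports the offending lines; the function name and the sample text are hypothetical, and it only looks for spaces or tabs after a line-final semicolon, nothing else.

```python
import re

# A semicolon followed by one or more spaces or tabs at the end of a line.
_TRAILING_BLANKS = re.compile(r";[ \t]+$")


def lines_with_trailing_blanks(jdl_text):
    """Return the 1-based numbers of lines with blanks after the final semicolon."""
    return [
        number
        for number, line in enumerate(jdl_text.splitlines(), start=1)
        if _TRAILING_BLANKS.search(line)
    ]


sample = 'Type = "Job";\nJobType = "Normal"; \nExecutable = "run.sh";\t\n'
print(lines_with_trailing_blanks(sample))
```

In the sample, the second line ends with a space and the third with a tab, so both are reported.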


JDL: an example

Type = "Job";
JobType = "Normal";
Executable = "startGen4.sh";
Environment = {"CLASSPATH=./gfal.jar:./gint.jar","LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH","LCG_GFAL_VO=gilda","LCG_RFIO_TYPE=dpm"};
Arguments = " 0 0 10 4 10000 aliserv6.ct.infn.it lfn:/grid/gilda/valeria/2000pillar.dat /gilda/ischia06/vardizzo";
StdOutput = "sample.out";
StdError = "sample.err";
InputSandbox = {"startGen4.sh","gint.jar","gfal.jar","libGFalFile.so"};
OutputSandbox = {"sample.err","sample.out"};
Requirements = Member("GLITE-3_0_0",other.GlueHostApplicationSoftwareRunTimeEnvironment);


Workload Manager Proxy


WMProxy

• WMProxy (Workload Manager Proxy)
  – is a new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services based interface.
  – has been designed to handle a large number of requests for job submission
    · gLite 1.5 => ~180 secs for 500 jobs
    · the goal is to get in the short term to ~60 secs for 1000 jobs
  – provides additional features such as bulk submission and support for shared and compressed sandboxes for compound jobs.
  – is the natural replacement of the NS (Network Server) in the passage to the SOA approach.


New request types

• Support for new types strongly relies on newly developed JDL converters and on the DAG submission support
  – all JDL conversions are performed on the server
  – a single submission for several jobs
• All new request types can be monitored and controlled through a single handle (the request id)
  – each sub-job can, however, be followed up and controlled independently through its own id
• “Smarter” WMS client commands/API
  – allow submission of DAGs, collections and parametric jobs exploiting the concept of “shared sandbox”
  – allow automatic generation and submission of collections and DAGs from sets of JDL files located in user-specified directories on the UI


Special Jobs


Outline

• DAG

• Job Collection

• Parametric jobs

• MPI jobs on gLite


DAG job

• A DAG job is a set of jobs where input, output, or execution of one or more jobs can depend on other jobs

• Dependencies are represented through Directed Acyclic Graphs, where the nodes are jobs, and the edges identify the dependencies

[Figure: example DAG with the nodes nodeA, nodeB, nodeC, nodeD and NodeF.]


JDL structure


Attribute: Nodes


Attribute: Dependencies


DAG jdl

[
  type = "dag";
  max_nodes_running = 4;
  nodes = [
    nodeA = [ file = "nodes/nodeA.jdl"; ];
    nodeB = [ file = "nodes/nodeB.jdl"; ];
    nodeC = [ file = "nodes/nodeC.jdl"; ];
    nodeD = [ file = "nodes/nodeD.jdl"; ];
    dependencies = {
      {nodeA, nodeB},
      {nodeA, nodeC},
      { {nodeB, nodeC}, nodeD }
    }
  ];
]

Note: the node description could also be written inline in the nodes attribute, instead of using separate files.
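To see what the dependencies attribute implies for execution order, the Python sketch below groups the four nodes into waves that could run concurrently. It assumes each pair reads as (parents, child), which is how the example above is laid out; the `execution_waves` helper is hypothetical, and the `max_nodes_running` limit is not modelled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each entry is (parents, child): the child starts only after all parents finish.
dependencies = [
    (["nodeA"], "nodeB"),
    (["nodeA"], "nodeC"),
    (["nodeB", "nodeC"], "nodeD"),
]


def execution_waves(deps):
    """Group nodes into waves; all nodes in one wave can run at the same time."""
    sorter = TopologicalSorter()
    for parents, child in deps:
        sorter.add(child, *parents)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(dependencies))
# [['nodeA'], ['nodeB', 'nodeC'], ['nodeD']]
```

nodeB and nodeC share a wave because both depend only on nodeA, while nodeD has to wait for both of them.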


Job Collection

• A job collection is a set of independent jobs that a user wants to submit and monitor via a single request
• Jobs of a collection are submitted as DAG nodes without dependencies
• The JDL is a list of ClassAds, which describe the sub-jobs

[
  Type = "collection";
  VirtualOrganisation = "gilda";
  nodes = {
    [ <job descr 1 > ],
    [ <job descr 2 > ],
  };
]


‘Scattered’ Input Sandboxes

• Input Sandbox can contain
  – file paths on the UI machine (i.e. the usual way)
  – URIs pointing to files on a remote gridFTP/HTTPS server
• A base URI to be applied to all sandbox files can also be specified
• Only local files (file://) are uploaded to the WMS node
• Files pointed to by URIs are downloaded directly to the WN by the JobWrapper just before the job is started

InputSandbox = {
  "gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
  "https://ghemon.cnaf.infn.it:8443/data/idat_1",
  "file:///home/pacio/myconf"
};
InputSandboxBaseURI = "gsiftp://matrix.datamat.it:2811/var";
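The rule for which files pass through the WMS node can be sketched in Python by looking at the URI scheme of each entry. This is an illustration, not WMS code: the `split_input_sandbox` helper is hypothetical, entries without a scheme are assumed to be plain paths on the UI, and InputSandboxBaseURI is deliberately left out.

```python
from urllib.parse import urlsplit


def split_input_sandbox(entries):
    """Return (uploaded_to_wms, fetched_by_wn) for a list of sandbox entries."""
    uploaded, fetched = [], []
    for entry in entries:
        scheme = urlsplit(entry).scheme
        if scheme in ("", "file"):
            uploaded.append(entry)  # local file: uploaded to the WMS node
        else:
            fetched.append(entry)   # remote URI: downloaded on the WN
    return uploaded, fetched


input_sandbox = [
    "gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
    "https://ghemon.cnaf.infn.it:8443/data/idat_1",
    "file:///home/pacio/myconf",
]
uploaded, fetched = split_input_sandbox(input_sandbox)
print(uploaded)
print(fetched)
```

For the slide's example only `file:///home/pacio/myconf` would be uploaded; the two remote entries bypass the WMS node.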


‘Scattered’ Output Sandboxes

• JDL has been enriched with new attributes for specifying the destinations for the files listed in the OutputSandbox attribute list

• A base URI to be applied to all sandbox files can also be specified

• When the job has completed execution, the JobWrapper copies the files to the specified destinations without passing through the WMS node

OutputSandbox = {
  "jobOutput",
  "run1/event1",
  "jobError"
};
OutputSandboxDestURI = {
  "gsiftp://matrix.datamat.it/var/jobOutput",
  "https://grid003.ct.infn.it:8443/home/cms/event1",
  "gsiftp://matrix.datamat.it/var/jobError"
};
OutputSandboxBaseDestURI = "gsiftp://neo.datamat.it/home/run1/";
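In the example, the entries of OutputSandbox and OutputSandboxDestURI appear to correspond by position (first file to first URI, and so on). Assuming that reading is right, a short Python sketch of the pairing looks like this; the helper name is hypothetical and OutputSandboxBaseDestURI is not handled.

```python
def pair_output_destinations(output_sandbox, dest_uris):
    """Pair each output file with the destination URI at the same position."""
    if len(output_sandbox) != len(dest_uris):
        raise ValueError("each output file needs exactly one destination URI")
    return dict(zip(output_sandbox, dest_uris))


output_sandbox = ["jobOutput", "run1/event1", "jobError"]
dest_uris = [
    "gsiftp://matrix.datamat.it/var/jobOutput",
    "https://grid003.ct.infn.it:8443/home/cms/event1",
    "gsiftp://matrix.datamat.it/var/jobError",
]
destinations = pair_output_destinations(output_sandbox, dest_uris)
print(destinations["run1/event1"])
```

The length check catches a list that is one entry short before any file is copied to the wrong place.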


Job collection example

[
  type = "collection";
  InputSandbox = {"date.sh"};
  RetryCount = 0;
  nodes = {
    [ file = "jobs/job1.jdl"; ],
    [ [
        Executable = "/bin/sh";
        Arguments = "date.sh";
        StdOutput = "date.out";
        StdError = "date.err";
        OutputSandbox = {"date.out", "date.err"};
    ] ],
    [ file = "jobs/job3.jdl"; ]
  };
]

Note: all nodes will share the top-level InputSandbox.


Parametric Job

• A parametric job is a job where one or more of its attributes are parameterized

• Values of attributes vary according to a parameter

• Job monitoring / managing is always done through a unique jobID, as if the job were a single one (see submission of collections)

[
  JobType = "Parametric";
  Executable = "/bin/sh";
  Arguments = "md5.sh input_PARAM_.txt";
  InputSandbox = {"md5.sh", "input_PARAM_.txt"};
  StdOutput = "out_PARAM_.txt";
  StdError = "err_PARAM_.txt";
  Parameters = 4;
  ParameterStart = 1;
  ParameterStep = 1;
  OutputSandbox = {"out_PARAM_.txt", "err_PARAM_.txt"};
]
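The substitution of `_PARAM_` can be illustrated with a small Python sketch. One assumption needs stating loudly: the sketch treats Parameters as an exclusive upper bound, so the values run from ParameterStart up to, but not including, Parameters. That is a reading to verify against the JDL attribute specification, and the `expand_parametric` helper is hypothetical.

```python
def expand_parametric(template, start, stop, step):
    """Replace _PARAM_ in the template with each value in range(start, stop, step).

    ASSUMPTION: 'stop' (the JDL Parameters attribute) is exclusive.
    """
    return [template.replace("_PARAM_", str(value)) for value in range(start, stop, step)]


# Values from the example: Parameters = 4, ParameterStart = 1, ParameterStep = 1.
print(expand_parametric("input_PARAM_.txt", 1, 4, 1))
```

Under that assumption the example produces one sub-job each for `input1.txt`, `input2.txt` and `input3.txt`, and the same substitution applies to StdOutput, StdError and the sandboxes.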


Parametric job / 2

• Parameters can also be a list of strings
• InputSandbox (if present) has to be coherent with the parameters

[ui-test] /home/giorgio/param > cat param2.jdl
[
  JobType = "Parametric";
  Executable = "/bin/cat";
  Arguments = "input_PARAM_.txt";
  InputSandbox = "input_PARAM_.txt";
  StdOutput = "myoutput_PARAM_.txt";
  StdError = "myerror_PARAM_.txt";
  Parameters = {earth,moon,mars};
  OutputSandbox = {"myoutput_PARAM_.txt"};
]

[ui-test] /home/giorgio/param > ls
inputEARTH.txt inputMARS.txt inputMOON.txt param2.jdl
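The coherence requirement between the parameter list and the InputSandbox can be checked before submission. The Python sketch below expands the template for each string parameter and reports the file names that are not present; the helper name and the directory listing used here are hypothetical, and the comparison is case-sensitive.

```python
def missing_inputs(template, parameters, available_files):
    """Return the expanded input file names that are absent from available_files."""
    expected = [template.replace("_PARAM_", p) for p in parameters]
    present = set(available_files)
    return [name for name in expected if name not in present]


# Hypothetical directory listing in which the file for "mars" is absent.
listing = ["inputearth.txt", "inputmoon.txt", "param2.jdl"]
print(missing_inputs("input_PARAM_.txt", ["earth", "moon", "mars"], listing))
```

An empty result means every sub-job would find its input file.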


MPI Overview

• The execution of parallel jobs is essential for modern computing and its applications.
• The most widely used library for parallel job support is MPI (Message Passing Interface)
• At present, parallel jobs can run inside a single Computing Element (CE) only
  – several projects are studying the possibility of executing parallel jobs on Worker Nodes (WNs) belonging to different CEs.


References

• gLite 3.0 User Guide
  – https://edms.cern.ch/file/722398/1.1/gLite-3-UserGuide.pdf
• R-GMA overview page
  – http://www.r-gma.org/
• GLUE Schema
  – http://infnforge.cnaf.infn.it/glueinfomodel/
• JDL attributes specification for WM proxy
  – https://edms.cern.ch/document/590869/1
• WMProxy quickstart
  – http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/wmproxy_client_quickstart.shtml
• WMS user guides
  – https://edms.cern.ch/document/572489/1