EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org Grid application development with gLite and P-GRADE Portal Miklos Kozlovszky MTA SZTAKI [email protected]

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

Grid application development with gLite and P-GRADE Portal

Miklos Kozlovszky, MTA SZTAKI, [email protected]


Contents

• P-GRADE Portal in a nutshell

• Workflow development with the Portal

• Workflow execution with the Portal

• Scaling up to a parametric workflow


Short History of P-GRADE portal

• Parallel Grid Application and Development Environment
• Initial development started in the Hungarian SuperComputing Grid project in 2003
• It has been continuously developed since 2003
• Detailed information: http://www.portal.p-grade.hu/
• Open Source community development since January 2008: https://sourceforge.net/projects/pgportal/


Download of OSS P-GRADE portal

110 downloads within the first month

~697 total downloads until now


Main P-GRADE related projects

• EU SEE-GRID-1 (2004-2006)
  – Integration with LCG-2 and gLite
• EU SEE-GRID-2, SEE-GRID-SCI (2006-2008 / 2008-2010)
  – Parameter sweep extension
• EU CoreGrid (2005-2008)
  – To solve grid interoperation for job submission
  – To solve grid interoperation for data handling: SRB, OGSA-DAI
• GGF GIN (2006)
  – Providing the GIN Resource Testing portal
• EGEE 2, 3 (2006-2010)
  – RESPECT program tool used for training and application development
• ICEAGE (2006-2008)
  – P-GRADE portal is used for training as the official portal of the GILDA training infrastructure
• EU EDGeS (2008-2009)
  – Transparent access to any EGEE and Desktop Grid systems


Portal installations


Multi-Grid service portal

To be used today!


Motivations for developing P-GRADE portal

• P-GRADE portal should
  – Hide the complexity of the underlying grid middlewares
  – Provide a high-level graphical user interface that is easy to use for e-scientists
  – Support many different grid programming approaches:
      Simple Scripts & Control (sequential and MPI job execution)
      Scientific Application Plug-ins
      Complex Workflows
      Parameter sweep applications: both on job and workflow level
      Interoperability: transparent access to grids based on different middleware technology (both computing and data resources)
  – Support several levels of parallelism


Layers in a Grid system

Basic Grid services: AA, job submission, info, …

Higher-level grid services (brokering, …)

Application toolkits, standards

Application

Grid middleware: command line tools

P-GRADE Portal services: graphical interface


What is a P-GRADE Portal workflow?

• A directed acyclic graph where
  – Nodes represent jobs (batch programs to be executed on a computing element)
  – Ports represent input/output files the jobs expect/produce
  – Arcs represent file transfer operations
• Semantics of the workflow:
  – A job can be executed if all of its input files are available
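The execution rule above (a job may run once all of its input files are available) can be illustrated with a small sketch. This is not the portal's implementation; the `Job` class, the `execution_waves` function and the file names are hypothetical and only model the semantics described on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A workflow node: a batch program with input and output ports (files)."""
    name: str
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def execution_waves(jobs, initial_files):
    """Group jobs into waves; a job is ready once all of its input files exist."""
    available = set(initial_files)
    pending = list(jobs)
    waves = []
    while pending:
        ready = [job for job in pending if job.inputs <= available]
        if not ready:
            # No job can start: an input is never produced, or the graph has a cycle.
            raise ValueError("unsatisfiable inputs or cycle in workflow")
        waves.append([job.name for job in ready])
        for job in ready:
            available |= job.outputs
        pending = [job for job in pending if job not in ready]
    return waves


# Hypothetical four-job workflow: one producer, two parallel branches, one merge.
jobs = [
    Job("prepare", {"raw.dat"}, {"a.in", "b.in"}),
    Job("solve_a", {"a.in"}, {"a.out"}),
    Job("solve_b", {"b.in"}, {"b.out"}),
    Job("merge", {"a.out", "b.out"}, {"result.dat"}),
]
waves = execution_waves(jobs, {"raw.dat"})
print(waves)
```

Jobs that land in the same wave have no file dependency on each other, so they could be submitted at the same time.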


Three levels of parallelism

– PS workflow level: Parameter study execution of the workflow
    Multiple instances of the same workflow can process different data files

– Workflow level: Parallel execution among workflow nodes (WF branch parallelism)
    Multiple jobs can run in parallel

– Job level: Parallel execution inside a workflow node (MPI job as workflow component)
    Each job can be a parallel program
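At the parameter study (PS) level, several instances of the same workflow process different data files. A minimal sketch of how such instances could be enumerated follows. The cross-product rule, the function name and the port and file names are assumptions made for illustration; the slides do not say how the portal combines parameter files.

```python
from itertools import product


def expand_parameter_study(parameter_files):
    """Return one input binding per workflow instance.

    parameter_files maps an input port name to the list of files to sweep over.
    Assumption: instances are formed as the cross product of all file lists.
    """
    ports = sorted(parameter_files)
    combos = product(*(parameter_files[port] for port in ports))
    return [dict(zip(ports, combo)) for combo in combos]


# Hypothetical sweep: 2 geometry files x 3 energy files = 6 workflow instances.
instances = expand_parameter_study({
    "geometry": ["g1.dat", "g2.dat"],
    "energy": ["e1.dat", "e2.dat", "e3.dat"],
})
print(len(instances))
```

Each binding describes one independent workflow instance, so the instances can run in parallel on top of the workflow-level and job-level parallelism.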


Example 1: Computational Chemistry

Department of Chemistry, University of Perugia

• Solution of the Schrödinger equation for triatomic systems using the time-dependent (RWAVEPR) or time-independent (ABC) method
• Sequential Fortran 90
• A single execution can take between 5 and 10 hours
• Many simulations at the same time

(Workflow diagram label: 25 times)


Example 2: Ultra-short range weather forecast

Hungarian Meteorology Service

Forecasting dangerous weather situations (storms, fog, etc.), a crucial task in the protection of life and property

Processed information: surface level measurements, high-altitude measurements, radar, satellite, lightning, results of previously computed models

Requirements:
• Execution time < 10 min
• High resolution (1 km)

(Workflow diagram labels: 25 x, 10 x, 25 x, 5 x)


Grid interoperation by P-GRADE

Accessing Globus, gLite and ARC based grids simultaneously

(Diagram labels: P-GRADE GEMLCA Portal, GEMLCA Repository, P-GRADE portal)


Typical user scenario: Compilation phase

Components: Portal server, Grid services, Certificate servers

• UPLOAD SOURCE(S)
• COMPILE – EDIT
• DOWNLOAD BINARY(IES)


Typical user scenario: Application development phase

Components: Portal server, Grid services, Certificate servers

• START EDITOR
• OPEN & EDIT or DEVELOP WORKFLOW
• SAVE WORKFLOW


Typical user scenario: Workflow execution phase

Components: Portal server, Grid services, Certificate servers

• DOWNLOAD PROXY CERTIFICATES
• TRANSFER FILES, SUBMIT JOBS
• MONITOR JOBS
• VISUALIZE JOBS and WORKFLOW PROGRESS
• DOWNLOAD (SMALL) RESULTS


P-GRADE Portal structural overview

• Client: Web browser, Java Webstart workflow editor
• P-GRADE Portal server
  – User interface layer: presents the user interface
  – Internal layer (Java classes): represents the internal concepts
  – Grid layer (gLite and Globus command line tools): interfaces with grid services
• Grid: EGEE and Globus Grid services (gLite WMS, LFC, …; Globus GRAM, GridFTP, …)


Interface layer

• Client: Web browser, Java Webstart workflow editor
• P-GRADE Portal server, user interface layer:
  – Web server
  – GridSphere Web portal framework
  – GridSphere portlets
  – P-GRADE portlets
  – Workflow monitor: Java applet generator
  – Workflow editor: Java Webstart application


Interface layer functionalities

• Workflow portlet
  – Workflow manager, Storage, Upload
• Certificate portlet
  – Upload, download and other operations
• Settings portlet
  – Grid settings, Quota settings
• File management
  – Manage files in the grid
• Compiler portlet
  – Compile jobs on portal server
• Login
• Welcome
• ...


P-GRADE vs. Non-P-GRADE portlets

(Screenshot labels: P-GRADE Portal portlets; GridSphere 2.x Grid Portal framework)


Portlets/functionalities of P-GRADE portal

• Settings (portlet)
• Certificate and proxy management (portlet)
• Information system visualization (portlet)
• Graphical workflow editing
• Workflow manager (portlet)
• LFC (EGEE) file management (portlet)
• Compilation support (portlet)
• Fault-tolerance support


Settings Portlet

• Portal administrator can
  – connect the portal to several grids
  – register the basic resources of the connected grids


Settings Portlet

User can customize the connected grids by adding and removing resources


Certificate and proxy management Portlet

• User can upload their certificates for various grids to the MyProxy server
• User can download proxies and allocate them to grids
• User can simultaneously use as many proxies as there are grids connected to the portal
• As a result, parallel branches of a workflow can be executed simultaneously in several grids

(Screenshot labels: SEE-GRID access, HUNGRID access)


MyProxy interaction in P-GRADE: Certificate Manager

• To start your session on the Grid you must create a proxy certificate on the portal server
• “Certificates” portlet:
  – to upload a proxy into MyProxy servers
  – to download a proxy from MyProxy into the portal server


Certificate Manager: Downloading a proxy

1. MyProxy server access details:
  • Hostname
  • Port number
  • User name (from upload)
  • Password (from upload)
2. Proxy parameters:
  • Lifetime
  • Comment
3. Grid association


Certificate Manager: Associating the proxy with a grid

This operation displays the details of the certificate and the list of available Grids (defined by the portal administrator)


Solving Grid interoperation by P-GRADE Portal

Different jobs can be executed in parallel in different grids

(Diagram labels: P-GRADE Portal, EGEE Grid, UK NGS; sites: London, Paris, Athens)


Interoperation vs. Interoperability

Interoperation:
– short-term solution that defines what needs to be done to achieve interoperation between current production grids using existing technologies

Interoperability:
– native ability of Grids and Grid middleware to interact directly via common open standards

As defined by the GIN (Grid Interoperation Now) CG (Community Group) of the OGF (Open Grid Forum)

(Diagram: two panels labelled Interoperation and Interoperability, each showing Grid 1, Grid 2 and Grid 3; the P-GRADE Portal appears in the Interoperation panel)


Information system Portlet


Graphical workflow editing

• The aim is to define a DAG of batch jobs:
  1. Drag & drop components: jobs and ports
  2. Define their properties
  3. Connect ports by channels (no cycles, no loops, no conditions)
  4. The JDL file is generated automatically


Workflow Editor: Properties of a job

• Binary executable
• Type of executable
• Number of required processors
• Command line parameters
• The resource to be used for the execution:
  – Grid/VO
  – (Computing element)


Direct resource selection: Which computing element to use?

The information system portlet queries BDII and GIIS servers

I still don’t know which resource to use!


Automatic resource selection

1. Select a broker Grid/VO for the job (e.g. GILDA_LCG2_broker/GILDA_gLite_broker)

2. (Describe the ranks & requirements of the job in JDL)

3. The portal will use the broker to find the best resource for the job!
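Step 2 mentions describing the rank and requirements of a job in JDL. The sketch below assembles a small JDL description as text, to show roughly what such a file contains. It is an illustration, not the portal's generator: the function names, the file names and the chosen `Requirements` and `Rank` expressions are assumptions, and the JDL that the portal actually produces is not shown in these slides.

```python
def quote(value):
    """Wrap a value in double quotes, as JDL string attributes require."""
    return '"%s"' % value


def string_list(items):
    """Render a JDL list of strings, e.g. {"a", "b"}."""
    return "{" + ", ".join(quote(item) for item in items) + "}"


def build_jdl(executable, arguments="", input_files=(), output_files=(),
              requirements=None, rank=None):
    """Build a minimal JDL text for one job (hypothetical helper)."""
    attributes = [
        ("Executable", quote(executable)),
        ("Arguments", quote(arguments)),
        ("StdOutput", quote("std.out")),
        ("StdError", quote("std.err")),
        ("InputSandbox", string_list([executable, *input_files])),
        ("OutputSandbox", string_list(["std.out", "std.err", *output_files])),
    ]
    # Requirements and Rank are expressions, so they are not quoted.
    if requirements:
        attributes.append(("Requirements", requirements))
    if rank:
        attributes.append(("Rank", rank))
    return "\n".join("%s = %s;" % (name, value) for name, value in attributes)


jdl = build_jdl(
    "solver",
    arguments="-n 100",
    input_files=["file.in"],
    output_files=["result.dat"],
    requirements="other.GlueCEPolicyMaxCPUTime > 600",
    rank="-other.GlueCEStateEstimatedResponseTime",
)
print(jdl)
```

The broker discards resources that fail `Requirements` and prefers the resource with the highest `Rank` among the rest, which is why the estimated response time is negated here.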


Workflow Editor: Defining broker jobs

Select a Grid with a broker! (*_BROKER)

Ignore the resource field!

If the default JDL is not sufficient, use the built-in JDL editor!


Workflow Editor: Built-in JDL editor

For JDL, look at the gLite Users’ manual!


Workflow Editor: Defining input-output files

File properties

• Type:
  – input: the job reads
  – output: the job generates
• File type:
  – local: comes from my desktop
  – remote: comes from an SE
• File: location of the file
• Internal file name: the executable reads the file under this name, e.g. fopen(“file.in”, …)
• File storage type (output files only):
  – Permanent: final result
  – Volatile: only data channel


How to refer to an I/O file?

Input file

• Local file
  – Client side location: c:\experiments\11-04.dat
• Remote file
  – LFC logical file name (LFC file catalog is required – EGEE VOs): lfn:/grid/gilda/kozlovszky/11-04.dat
  – GridFTP address (in Globus Grids): gsiftp://somengshost.ac.uk/mydir/11-04.dat

Output file

• Local file
  – Client side location: result.dat
• Remote file
  – LFC logical file name (LFC file catalog is required – EGEE VOs): lfn:/grid/gilda/kozlovszky/11-04_-_result.dat
  – GridFTP address (in Globus Grids): gsiftp://somengshost.ac.uk/mydir/result.dat
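The three reference styles on this slide (client side path, `lfn:` logical file name, `gsiftp://` address) can be told apart by their prefix. The following sketch shows one way to do that; the function name and the returned dictionary layout are hypothetical and not part of the portal.

```python
from urllib.parse import urlparse


def classify_file_reference(ref):
    """Classify an I/O file reference as local or remote by its prefix."""
    if ref.startswith("lfn:"):
        # LFC logical file name: needs an LFC file catalog (EGEE VOs).
        return {"kind": "remote", "scheme": "lfn", "path": ref[len("lfn:"):]}
    if ref.startswith("gsiftp://"):
        # GridFTP address, as used in Globus grids.
        parsed = urlparse(ref)
        return {"kind": "remote", "scheme": "gsiftp",
                "host": parsed.hostname, "path": parsed.path}
    # Anything else is treated as a path on the user's own machine.
    return {"kind": "local", "scheme": None, "path": ref}


for reference in (
    r"c:\experiments\11-04.dat",
    "lfn:/grid/gilda/kozlovszky/11-04.dat",
    "gsiftp://somengshost.ac.uk/mydir/result.dat",
):
    print(classify_file_reference(reference))
```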


Local vs. remote files

(Diagram: the portal server on one side; computing elements and storage elements behind the Grid services on the other)

• LOCAL INPUT FILES & EXECUTABLES
• LOCAL OUTPUT FILES
• REMOTE INPUT FILES
• REMOTE OUTPUT FILES

Only the permanent files!

Your binary can access data services directly too:
• GridFTP API
• GFAL API
• lfc-*, lcg-* commands


Workflow manager

• Lists available workflows
• Enables submitting, aborting and deleting existing workflows
• Shows status, logs and results of workflow executions
• Orchestrates job executions inside a workflow


Workflow Management (workflow portlet)

• The portlet presents the status, size and output of the available workflows in the “Workflow” list
• It has a Quota manager to control the users’ storage space on the server
• The portlet also contains the “Abort”, “Attach”, “Details”, “Delete” and “Delete all” buttons to handle execution of workflows
• The “Attach” button opens the workflow in the Workflow Editor
• The “Details” button gives an overview of the jobs of the workflow

Workflow Execution (observation by the workflow portlet)

White/Red/Green color means the job is in the initial/running/finished state


LFC (EGEE) file management


Compilation support


Logs provided for each job


Analysis of the log

• 2008.01.09 09:32:19 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 09:32:19 - Job submission in progress...
• 2008.01.09 09:32:23 - Job has been submitted successfully!
• 2008.01.09 09:32:23 - Job identifier is: "https://skurut1.cesnet.cz:9000/mD_8VzPhm8AmIToTJKtigg"
• 2008.01.09 09:32:26 - EGEE job's status has changed to "Waiting" (host is ).
• 2008.01.09 09:33:00 - EGEE job's status has changed to "Ready" (host is ce1-egee.srce.hr).
• 2008.01.09 09:35:46 - EGEE job's status has changed to "Waiting" (host is egee-ce.grid.niif.hu).
• 2008.01.09 09:36:19 - EGEE job's status has changed to "Ready" (host is ce.cyf-kr.edu.pl).
• 2008.01.09 09:36:53 - EGEE job's status has changed to "Waiting" (host is ce.cyf-kr.edu.pl).
• 2008.01.09 09:37:26 - EGEE job's status has changed to "Done" (host is egee-ce.grid.niif.hu).
• 2008.01.09 09:37:26 - Job found to be finished. Checking again if this is really the case.
• 2008.01.09 09:38:03 - EGEE job's status has changed to "Ready" (host is egee-ce1.gup.uni-linz.ac.at).
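A log like the one above can be analysed mechanically. The sketch below extracts the status changes (timestamp, status, host) from a few lines copied from this slide. The regular expression and the function name are assumptions based only on the line format visible here, not on any documented log format of the portal.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2008.01.09 09:33:00 - EGEE job's status has changed to "Ready" (host is ce1-egee.srce.hr).
STATUS_RE = re.compile(
    r'''^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}) - EGEE job's status '''
    r'''has changed to "(\w+)" \(host is ?([^)]*)\)\.$'''
)


def parse_status_changes(lines):
    """Return (timestamp, status, host) tuples; host is None when the log leaves it empty."""
    events = []
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match:
            timestamp = datetime.strptime(match.group(1), "%Y.%m.%d %H:%M:%S")
            events.append((timestamp, match.group(2), match.group(3).strip() or None))
    return events


LOG = """\
2008.01.09 09:32:19 - Job submission in progress...
2008.01.09 09:32:26 - EGEE job's status has changed to "Waiting" (host is ).
2008.01.09 09:33:00 - EGEE job's status has changed to "Ready" (host is ce1-egee.srce.hr).
2008.01.09 09:35:46 - EGEE job's status has changed to "Waiting" (host is egee-ce.grid.niif.hu).
"""
events = parse_status_changes(LOG.splitlines())
for timestamp, status, host in events:
    print(timestamp.time(), status, host)
```

Listing the distinct hosts in order gives the sequence of sites the broker tried, which is the reading of the log given on the following slides.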


Fault-tolerant Grid applications

• Utilizing
  – Condor DAGMan’s rescue mechanism
  – EGEE job resubmission mechanism of WMS
• If the EGEE broker leaves a job stuck in a CE’s queue, the portal automatically
  – kills the job on this site and
  – resubmits the job to the broker, prohibiting this site.
• As a result
  – the portal guarantees the correct submission of a job as long as there exists at least one matching resource
  – job submission is reliable even in an unreliable grid
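The resubmission rule described above (kill the job, then resubmit while prohibiting the failing site) can be sketched as a retry loop that grows an exclusion list. Everything here is a simplified model: the broker is replaced by a fake function, the site names are invented, and the limit of three resubmissions is an assumed default.

```python
def submit_with_exclusions(submit, max_resubmissions=3):
    """Resubmit a job, excluding every site on which it failed or got stuck.

    submit(excluded) must return (site, ok); site is None if nothing matches.
    Returns (site, number_of_resubmissions, excluded_sites) on success.
    """
    excluded = set()
    for attempt in range(max_resubmissions + 1):
        site, ok = submit(frozenset(excluded))
        if ok:
            return site, attempt, excluded
        if site is not None:
            excluded.add(site)  # prohibit this site for the next try
    raise RuntimeError("job failed after %d resubmissions" % max_resubmissions)


def make_fake_broker(sites, stuck_sites):
    """Stand-in for a real broker: picks the first site that is not excluded."""
    def submit(excluded):
        for site in sites:
            if site not in excluded:
                return site, site not in stuck_sites
        return None, False
    return submit


# Hypothetical sites; the first two leave the job stuck in their queues.
broker = make_fake_broker(
    ["ce-a.example.org", "ce-b.example.org", "ce-c.example.org"],
    stuck_sites={"ce-a.example.org", "ce-b.example.org"},
)
site, resubmissions, excluded = submit_with_exclusions(broker)
print(site, resubmissions, sorted(excluded))
```

Because every failed site is excluded from later tries, the loop reaches a working resource whenever one exists and the retry limit allows it.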


Fault-tolerance by P-GRADE portal

• 09:33: The broker assigned the job to a site: ce1-egee.srce.hr
• 09:35: The broker moved the job to another site: egee-ce.grid.niif.hu
• 09:36: Again the broker moved the job to another site: ce.cyf-kr.edu.pl
• 09:37: The broker indicated that the job is Done, but ...
• 09:38: ... it turned out that the job was not finished (Done - Failed status); it was only moved to another site: egee-ce1.gup.uni-linz.ac.at
• 09:39: Again the broker moved the job to another site: ares02.cyf-kr.edu.pl
• 09:39: Again the broker moved the job to another site: ce.cyf-kr.edu.pl
• 09:40: After trying 10 different sites the VOCE broker gave up and aborted the job (the ShallowRetryCount was set to 10):
• 2008.01.09 09:40:16 - The job has been aborted!


Fault-tolerance by P-GRADE portal

• Our fault-tolerant portal did not give up:
• 2008.01.09 09:40:16 - The job can be submitted again (try 1 out of 3, excluding host(s): ce.cyf-kr.edu.pl)
• 2008.01.09 09:40:17 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 09:40:17 - Job submission in progress...
• 2008.01.09 09:40:27 - Job has been submitted successfully!
• 2008.01.09 09:40:27 - Job identifier is: "https://skurut1.cesnet.cz:9000/o22BTVqQsvwzj2wn5KP8_A"
• 2008.01.09 09:40:30 - EGEE job's status has changed to "Waiting" (host is ).
• 2008.01.09 09:41:04 - EGEE job's status has changed to "Ready" (host is eszakigrid66.inf.elte.hu).


Fault-tolerance by P-GRADE portal

• 2008.01.09 09:41:37 - EGEE job's status has changed to "Scheduled" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 09:44:57 - EGEE job's status has changed to "Done" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 09:44:57 - Job found to be finished. Checking again if this is really the case.
• 2008.01.09 09:45:34 - EGEE job's status has changed to "Waiting" (host is eszakigrid66.inf.elte.hu).
• 2008.01.09 10:06:06 - The job's status hasn't changed for 20 minutes, resubmitting...

It is a frequently occurring problem in EGEE-like grids that the broker leaves jobs stuck in CEs’ queues. In such a case the portal automatically kills the job on this site and resubmits it to the broker.

• 2008.01.09 10:06:06 - Proxy with VOMS extensions created for VO "voce" with accounting group "".
• 2008.01.09 10:06:06 - Job submission in progress...
• 2008.01.09 10:06:12 - Job has been submitted successfully!
• 10:10: The job successfully finished with exit code 0 on site: ce.ui.savba.sk
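The log shows the portal resubmitting a job whose status had not changed for 20 minutes. A minimal sketch of such a check follows; the function name and its parameters are hypothetical, and only the 20-minute threshold and the timestamps are taken from the log on this slide.

```python
from datetime import datetime, timedelta

# Threshold taken from the log message "hasn't changed for 20 minutes".
STUCK_TIMEOUT = timedelta(minutes=20)


def should_resubmit(last_status_change, now, finished, timeout=STUCK_TIMEOUT):
    """True if an unfinished job has shown no status change for at least `timeout`."""
    return (not finished) and (now - last_status_change >= timeout)


# Last status change in the log: "Waiting" at 09:45:34.
last_change = datetime(2008, 1, 9, 9, 45, 34)
early = should_resubmit(last_change, datetime(2008, 1, 9, 10, 0, 0), finished=False)
late = should_resubmit(last_change, datetime(2008, 1, 9, 10, 6, 6), finished=False)
print(early, late)
```

At 10:00:00 fewer than 20 minutes have passed, so the job is left alone; at 10:06:06, the time of the resubmission in the log, the threshold has been exceeded.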


Lessons learnt

• P-GRADE portal provides
  – Easy-to-use but powerful workflow system (graphical editor, wf manager, etc.)
  – Three levels of parallelism
      MPI job level
      Workflow branch level
      Parameter sweep at workflow level
  – Multi-grid/multi-VO access mechanism for various grids (LCG-2, gLite and GT2)
      Simultaneous access
      Transparent access
      Migrating a workflow from one grid to another requires no modification in the workflow


Thank you!

[email protected]

Learn once, use everywhere

Develop once, execute anywhere