P-GRADE Portal and GEMLCA: A workflow-oriented portal and application hosting environment

Gergely Sipos, [email protected]
MTA SZTAKI (Hungarian Academy of Sciences)
www.portal.p-grade.hu
www.cpc.wmin.ac.uk/gemlca


Page 1: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

P-GRADE Portal and GEMLCA: A workflow-oriented portal and application hosting environment

Gergely Sipos, [email protected]
MTA SZTAKI (Hungarian Academy of Sciences)

www.portal.p-grade.hu
www.cpc.wmin.ac.uk/gemlca

Page 2: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Contents

• Motivation for creating the tools
• P-GRADE Portal and GEMLCA in a nutshell
• Lifecycle of GEMLCA / P-GRADE applications
• Services provided for application developers
• Introduction of the hands-on
• Hands-on
• How to use P-GRADE / GEMLCA Portal for training and dissemination

Page 3: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Context

Diagram labels:
• Application
• Application toolkits, standards
• Higher-level grid services (brokering, …)
• Basic Grid services: AA, job submission, info, …
• Graphical interface
• Middleware independent services and interfaces of P-GRADE/GEMLCA
• Middleware specific clients
• Grid middleware services

Page 4: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Current situation and trends in Grid computing

• Fast evolution of Grid systems and middleware:
– GT2, OGSA, GT3 (OGSI), GT4 (WSRF), LCG-2, gLite, …
• Many production Grid systems are built with them
– EGEE (LCG-2, gLite), UK NGS (GT2), Open Science Grid (GT2, GT4), NorduGrid (~GT2)
• Although the same set of core services is available everywhere, they are implemented in different ways
– Data services (file management)
– Computation services (job submission)
– Security services (proxy based single sign-on)
– Brokers (not in every middleware)

Page 5: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

P-GRADE Portal in a nutshell

• General purpose, workflow-oriented computational Grid portal. Supports the development and execution of workflow based Grid applications: a Grid orchestration environment
• Based on GridSphere web portal framework
– Functionalities are accessed through portlets
– Easy to expand with new portlets (e.g. application-specific portlets)
– Easy to tailor to end-user or community needs
• Developed by SZTAKI (1.0 in 2003, now 2.5)
• Grid services supported by P-GRADE Portal 2.5:

Service | EGEE grids (LCG/gLite) | Globus 2 grids
Job execution | Computing Element | GRAM
File storage | Storage Element, File catalog | GridFTP server
Certificate management | MyProxy server, VOMS server
Information system | BDII | MDS-2, MDS-4
Brokering | Workload Management System
Job monitoring | Mercury
Workflow & job visualization | PROVE

Solves Grid interoperability problem at the workflow level

TODAY’S FOCUS

Page 6: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

GEMLCA extension of the P-GRADE Portal

• P-GRADE Portal extended with GEMLCA Grid service back-end
– To share jobs and legacy codes as application components with others
– A step towards collaborative e-Science
• Developed by the University of Westminster (London)
• Support for Globus 4 grids (besides GT2 and EGEE)
• Available on the NGS and OGF GIN

[Diagram: jobs from the P-GRADE Portal and GEMLCA go to Globus 4 VOs, Globus 2 VOs and LCG / gLite VOs]

Page 7: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Related projects

The development, operation and training of P-GRADE Portal and GEMLCA are supported by the following projects:
– SEE-GRID (www.see-grid.eu): development, application support
– Coregrid (www.coregrid.net): research, development
– EGEE (www.eu-egee.org): gLite training, application development
– ICEAGE (www.iceage-eu.org): Grid training and education

Page 8: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

A Grid application in the GEMLCA / P-GRADE Portal

• A directed acyclic graph where
– Nodes represent jobs or services (a batch program executed on a computing resource)
– Ports represent input/output files the components expect/produce
– Arcs represent file transfer operations
• Semantics of the workflow:
– A job can be executed if all of its input files are available
– Responsibility of the built-in workflow manager
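The execution rule above (a job can start once all of its input files are available) can be sketched in a few lines of Python. This is only an illustration of the rule, not the portal's workflow manager: the function `run_workflow`, the job names and the file names are all hypothetical, and jobs are "executed" simply by marking their outputs as available.

```python
def run_workflow(jobs, initial_files):
    """Group jobs into waves; every job in a wave has all inputs available.

    jobs: dict mapping job name -> (set of input files, set of output files)
    initial_files: files that exist before the workflow starts
    """
    available = set(initial_files)
    pending = dict(jobs)
    waves = []
    while pending:
        # A job is ready when its input set is a subset of the available files.
        ready = sorted(name for name, (inputs, _) in pending.items()
                       if inputs <= available)
        if not ready:
            raise RuntimeError(f"jobs can never start: {sorted(pending)}")
        for name in ready:
            # "Running" a job here just publishes its output files.
            available |= pending.pop(name)[1]
        waves.append(ready)
    return waves


# Hypothetical four-job workflow: split, two solvers, merge.
workflow = {
    "split": ({"input.dat"}, {"part1.dat", "part2.dat"}),
    "solve1": ({"part1.dat"}, {"res1.dat"}),
    "solve2": ({"part2.dat"}, {"res2.dat"}),
    "merge": ({"res1.dat", "res2.dat"}, {"result.dat"}),
}
waves = run_workflow(workflow, {"input.dat"})
print(waves)
```

Jobs that land in the same wave have no data dependency on each other, which is where parallel execution among workflow nodes comes from.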

Page 9: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Three levels of parallelism within a P-GRADE Portal application

• The workflow concept of the GEMLCA / P-GRADE Portal enables the efficient parallelization of complex problems
• Semantics of the workflow enables two levels of parallelism:
– Parallel execution inside a workflow node: the job/service can be a parallel code
– Parallel execution among workflow nodes: multiple nodes can run in parallel
• Parametric sweep execution of the workflow (SIMD) adds the third level: multiple instances of the same workflow process different data files

Page 10: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow-level Grid interoperability: The GIN Resource Testing portal

OGF effort to demonstrate workflow level grid interoperability between major production Grids and to monitor OGF GIN VO resources

[Diagram: P-GRADE GEMLCA Portal and GEMLCA Repository]

Page 11: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

The typical user scenario
Part 1 - development phase

[Diagram: the user, the portal server, MyProxy servers and Grid services]
• START EDITOR
• OPEN & EDIT or DEVELOP WORKFLOW or PS WF
• SAVE WF / PS
• REUSE WORKFLOW COMPONENTS

Page 12: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

The typical user scenario
Part 2 - execution phase

[Diagram: the user, the portal server, MyProxy servers and Grid services]
• DOWNLOAD PROXY CERTIFICATES
• TRANSFER FILES, SUBMIT JOBS
• MONITOR JOBS
• VISUALIZE JOBS and WORKFLOW PROGRESS
• DOWNLOAD (SMALL) RESULTS

Keep large files on Grid storage resources

Page 13: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

The typical user scenario
Part 3 - collaborative phase

[Diagram: the portal server, MyProxy servers and Grid services]
• Share workflow components with other users of the same portal
• Export and share workflows with users of the same, or another portal

Page 14: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Inside the portal server

Client:
• Web browser
• Java Webstart workflow editor

P-GRADE Portal server:
• Tomcat
• P-GRADE Portal portlets (JSR-168, Gridsphere 2): Workflow, Certificates, Information System, Settings, GEMLCA
• DAGMan workflow manager
• shell scripts
• CoG API & scripts
• Information system clients
• Grid middleware clients
• Mercury API

Grid:
• Grid middleware services (WMS, LFC, gridFTP, GRAM, …)
• Information systems
• MyProxy server & VOMS
• Mercury monitor service
• GEMLCA service (WSRF)

Optional plug-in:
• Technology specific gateways
• File transfer
• Proxy management
• Load monitoring

Page 15: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow Editor
Defining the graph

Define a Directed Acyclic Graph (DAG) of jobs and services (GEMLCA jobs):
1. Drag & drop components: nodes and ports
2. Define their properties
3. Connect ports by channels (no cycles, no loops, no conditions…)
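The "no cycles" restriction on channels can be checked mechanically. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to reject a cyclic set of channels; it does not show how the Workflow Editor itself validates graphs, and `check_acyclic` and the single-letter job names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def check_acyclic(channels):
    """Return one valid execution order for the jobs, or raise on a cycle.

    channels: iterable of (producer_job, consumer_job) pairs.
    """
    sorter = TopologicalSorter()
    for producer, consumer in channels:
        # The consumer depends on the producer's output file.
        sorter.add(consumer, producer)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"workflow contains a cycle: {err.args[1]}") from err


# A diamond-shaped workflow is a valid DAG.
order = check_acyclic([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])

# Two jobs feeding each other are rejected.
try:
    check_acyclic([("A", "B"), ("B", "A")])
    cycle_rejected = False
except ValueError:
    cycle_rejected = True
```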

Page 16: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow Editor
Properties of a job component

Properties of a job:
• Type of executable
• Client side location of the binary
• Number of required processors
• Command line parameters
• The resource to be used for the execution:
– Grid (VO)
– Resource / broker

Page 17: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow Editor
Properties of a service component (GEMLCA job)

Properties of a service:
• The location of the service:
– Grid (VO)
– Resource / broker
• An application (binary) associated with that resource
• Input parameter values for the service

Page 18: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow Editor
Defining job / service input-output data

File properties
• Type:
– input: the component reads
– output: the component writes
• File type:
– local: originates from my desktop
– remote: originates from a grid storage element
• File: location of the file
• File storage type (for outputs only):
– Permanent: final result
– Volatile: used only for inter-component data transfer

Page 19: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

How to refer to an I/O file?

Input file, local:
• Client side location: c:\experiments\11-04.dat

Input file, remote:
• LFC logical file name (LFC file catalog is required, EGEE VOs): lfn:/grid/gilda/sipos/11-04.dat
• GridFTP address (in Globus Grids): gsiftp://somengshost.ac.uk/mydir/11-04.dat

Output file, local:
• Client side location: result.dat

Output file, remote:
• LFC logical file name (LFC file catalog is required, EGEE VOs): lfn:/grid/gilda/sipos/11-04_-_result.dat
• GridFTP address (in Globus Grids): gsiftp://somengshost.ac.uk/mydir/result.dat
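The three ways of referring to a file differ only in their prefix, so a reference can be classified with a simple check. The function below is a hypothetical helper written for illustration, not part of the portal; it treats anything without an `lfn:` or `gsiftp://` prefix as a client side location.

```python
def classify_file_reference(ref):
    """Return (file type, kind of reference) for an I/O file reference."""
    if ref.startswith("lfn:"):
        return ("remote", "LFC logical file name")
    if ref.startswith("gsiftp://"):
        return ("remote", "GridFTP address")
    # Assumption: everything else is a path on the user's machine.
    return ("local", "client side location")


examples = [
    r"c:\experiments\11-04.dat",
    "lfn:/grid/gilda/sipos/11-04.dat",
    "gsiftp://somengshost.ac.uk/mydir/result.dat",
]
kinds = [classify_file_reference(ref) for ref in examples]
```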

Page 20: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow level file transfer by the workflow manager

[Diagram: file movements between the portal server, the GEMLCA repository and the Grid infrastructure (computing elements, storage elements)]
• LOCAL INPUT FILES & BINARIES
• LOCAL OUTPUT FILES
• REMOTE INPUT FILES
• REMOTE OUTPUT FILES
• User level storage
• Binaries of GEMLCA jobs (GEMLCA repository)

Page 21: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Job / service level file transfer by the workflow manager

[Diagram: numbered steps 0 to 3 between the portal server and the Grid infrastructure (Computing Element, Storage Elements)]
• Pre script (generated by the portal)
• binary
• Post script (generated by the portal)
• Custom file transfer
• LOCAL INPUT FILE, REMOTE INPUT FILE
• LOCAL OUTPUT FILE, REMOTE OUTPUT FILE

Page 22: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Information system portlet to browse computing elements

Graphical interface for BDII servers

Page 23: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow execution

Main steps:
1. Download proxies
2. Submit workflow
3. Observe workflow progress
4. If an error occurs, correct the graph
5. Download result

Page 24: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Certificate Manager
Certificates portlet

• To start your session on the Grid you must create a proxy certificate on the portal server
• “Certificates” portlet:
– to upload a proxy into MyProxy servers
– to download a proxy from MyProxy into the portal server

Page 25: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Certificate Manager
Multi-grid portal, multi-proxy environment

Multiple proxies can be available on the portal server at the same time!
• Certificate from EGEE CA: SEE-GRID CEs and SEs
• Certificate from Hungarian CA: HUNGRID CEs and SEs

Page 26: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Certificates, proxies with gLite VOs: Download

[Diagram: MyProxy server, VOMS server, portal server and Grid services; labels Proxy1, Proxy2 and Proxy2 with VOMS ext.]

I have to do this every time I want to execute workflows

Page 27: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow Management (workflow portlet)

• The portlet presents the status, size and output of the available workflows in the “Workflow” list
• It has a Quota manager to control the users’ storage space on the server
• The portlet also contains the “Abort”, “Attach”, “Details”, “Delete” and “Delete all” buttons to handle execution of workflows
• The “Attach” button opens the workflow in the Workflow Editor
• The “Details” button gives an overview of the jobs of the workflow

Page 28: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Workflow Execution (observation by the workflow portlet)

White/Red/Green color means the job is initialised/running/finished


Page 33: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

On-Line Monitoring both at the workflow and job levels (workflow portlet)

– The portal monitors and visualizes workflow progress
– The portal monitors and visualizes parallel jobs (if they are prepared for Mercury monitor)

Page 34: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Downloading the results…

Page 35: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Sharing a successfully finished job with other users: GEMLCA repository

Mkdir Legacy Code exposed as a Grid Service
Folder: /../.gemlca/legacycodes/mkdir
Content:
i) mkdir binary or link
ii) config.xml

Legacy Code Interface Description File: config.xml

    <?xml version="1.0"?>
    <!DOCTYPE GLCEnvironment SYSTEM "gemlcaconfig.dtd">
    <GLCEnvironment id="mkdir" executable="LINUX/mkdir" jobManager="Fork"
        maximumJob="11" minimumProcessors="1" maximumProcessors="1"
        universe="PVM">
      <Description>Unix mkdir program</Description>
      <GLCParameters>
        <Parameter name="-p" friendlyName="Folder to be created" fixed="No"
            inputOutput="Input" order="0" mandatory="No"
            fileCommandline="Commandline">
          <initialValue> </initialValue>
        </Parameter>
      </GLCParameters>
    </GLCEnvironment>
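To show how such an interface description could be consumed, the sketch below reads a trimmed copy of the mkdir config.xml (XML declaration and DOCTYPE omitted) with Python's standard `xml.etree.ElementTree` and assembles a command line. How GEMLCA itself builds the command is not described on the slide, so `build_command`, the ordering by the `order` attribute and the treatment of `mandatory="Yes"` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the mkdir interface description shown on the slide.
CONFIG_XML = """
<GLCEnvironment id="mkdir" executable="LINUX/mkdir" jobManager="Fork"
    maximumJob="11" minimumProcessors="1" maximumProcessors="1"
    universe="PVM">
  <Description>Unix mkdir program</Description>
  <GLCParameters>
    <Parameter name="-p" friendlyName="Folder to be created" fixed="No"
        inputOutput="Input" order="0" mandatory="No"
        fileCommandline="Commandline">
      <initialValue> </initialValue>
    </Parameter>
  </GLCParameters>
</GLCEnvironment>
"""


def build_command(config_xml, values):
    """Hypothetical helper: turn parameter values into an argument list."""
    root = ET.fromstring(config_xml.strip())
    command = [root.get("executable")]
    # Assumption: the "order" attribute gives the position on the command line.
    params = sorted(root.iter("Parameter"), key=lambda p: int(p.get("order")))
    for param in params:
        name = param.get("name")
        default = (param.findtext("initialValue") or "").strip()
        value = values.get(name, default)
        if not value:
            # Assumption: "Yes" marks a parameter the user must supply.
            if param.get("mandatory") == "Yes":
                raise ValueError(f"missing mandatory parameter {name}")
            continue
        command += [name, value]
    return command


command = build_command(CONFIG_XML, {"-p": "results/run1"})
empty_command = build_command(CONFIG_XML, {})
```

With no value supplied, the optional `-p` parameter is simply left out, because its initial value in the file is blank.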

Page 36: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Collaborative grid applications

Combine services and your code in the same workflow!

[Diagram: a workflow whose nodes are labelled “Service invocation” or “Job submission”]

Page 37: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Support for parametric study workflows

Since P-GRADE Portal v2.5

Page 38: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

General structure of a PS application

[Diagram: many instances of Algorithm 2 running side by side]

Page 39: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Advanced PS applications

• Algorithm 1: cut input into smaller units
• Algorithm 2: many instances running side by side
• Algorithm 3: aggregate result; perform “global operation”
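The three-stage structure (cut the input, process each unit independently, aggregate) can be sketched on a single machine. In the sketch below a thread pool stands in for the Grid, and the three functions are hypothetical placeholders named after the labels on the slide; the sum-of-squares workload is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def algorithm_1(data, unit_size):
    """First stage: cut the input into smaller units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def algorithm_2(unit):
    """Middle stage, run once per unit: here, a sum of squares."""
    return sum(x * x for x in unit)


def algorithm_3(partials):
    """Last stage: aggregate the partial results (the "global operation")."""
    return sum(partials)


def run_parameter_study(data, unit_size):
    units = algorithm_1(data, unit_size)
    # Each unit is independent, so the middle stage can run in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(algorithm_2, units))
    return algorithm_3(partials)


total = run_parameter_study(list(range(10)), unit_size=3)
```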

Page 40: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

PS applications in P-GRADE Portal 2.5

• Complete workflow
• Files stored in an LFC catalog (e.g. /grid/gilda/sipos/myinputs)
• Results will be generated in the same catalog

Page 41: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Advanced PS applications in P-GRADE Portal 2.5

• Initial input data
• Generator component(s): generate or cut input into smaller units
• Files stored in an LFC catalog (e.g. /grid/gilda/sipos/myinputs)
• Complete workflow
• Results will be generated in the same catalog
• Collector component(s): aggregate result

Page 42: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Third level of parallelism

– Parallel execution inside a workflow node (SIMD/MIMD/MISD): each job can be a parallel program
– Parallel execution among workflow nodes (SIMD/MIMD/MISD): multiple jobs run in parallel
– Parameter study execution of the workflow (SIMD): multiple instances of the same workflow process different data files

Page 43: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Turning a WF into a parameter study

Turning at least one of the open input ports into a “PS Input port” turns the WF into a Parameter Study

Page 44: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Turning a WF into a parameter study

Example LFC directories and the files in them:
• /grid/gilda/sipos/InputImages: Image.0, Image.1
• /grid/gilda/sipos/XCoordinates: XCoordinate.0, XCoordinate.1
• /grid/gilda/sipos/YCoordinates: YCoordinate.0, YCoordinate.1
• /grid/gilda/sipos/Output: ImagePart.0, ImagePart.1, . . .

Page 45: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Generating multi-grid eWFs

• PS port: 4 instances of the input file
• PS port: 3 instances of the input file
• 1 PS workflow execution
• Assign to broker of Grid X: XA XB XC XD XI XJ XK XL XR XS XT XV
• Assign to broker of Grid Y: YE YF YG YH YM YN YO YP YU YX YY YZ

Assigns the 24 jobs to 24 Grid resources within 2 Grids
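The figure of 24 jobs is consistent with combining the two PS ports as a cross product: 4 × 3 = 12 workflow instances, each containing one job for Grid X and one for Grid Y. The slide does not state the combination rule, so the cross product is an assumption here, and `expand_parameter_study`, the port names and the file names are hypothetical.

```python
from itertools import product


def expand_parameter_study(ps_ports, jobs):
    """List (instance index, job, input files) for every generated job.

    ps_ports: dict mapping PS port name -> list of input file instances
    jobs: names of the jobs inside one workflow instance
    """
    port_names = sorted(ps_ports)
    # Assumption: every combination of input instances gives one workflow.
    instances = [dict(zip(port_names, combo))
                 for combo in product(*(ps_ports[n] for n in port_names))]
    return [(index, job, files)
            for index, files in enumerate(instances)
            for job in jobs]


ps_ports = {
    "port_a": ["a.0", "a.1", "a.2", "a.3"],  # 4 instances of the input file
    "port_b": ["b.0", "b.1", "b.2"],         # 3 instances of the input file
}
generated = expand_parameter_study(ps_ports, ["job_on_grid_x", "job_on_grid_y"])
workflow_count = len({index for index, _, _ in generated})
```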

Page 46: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Generators

• Generate input files for parameter study workflows
• Save these files into LFC catalogs
• Two types:

Auto generator
• Predefined program logic
• To generate text files
• User controls file content by templates and parameters

Custom generator
• User provides program logic
• To generate binary file content (e.g. image, audio, …)
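The idea behind an auto generator (a fixed program that fills a user-supplied template with parameter values) can be sketched with Python's `string.Template`. This is not the portal's generator and it writes to a local directory rather than an LFC catalog; `auto_generate`, the template text and the `Coordinate.N` naming are assumptions modelled on the numbered file names used elsewhere in this deck.

```python
import tempfile
from pathlib import Path
from string import Template


def auto_generate(template_text, parameter_sets, out_dir, base_name):
    """Write one text file per parameter set, named base_name.0, .1, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    template = Template(template_text)
    paths = []
    for index, params in enumerate(parameter_sets):
        path = out_dir / f"{base_name}.{index}"
        # substitute() raises KeyError if the template needs a missing value.
        path.write_text(template.substitute(params))
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    files = auto_generate(
        "x=$x\ny=$y\n",
        [{"x": 0, "y": 10}, {"x": 5, "y": 20}],
        tmp,
        "Coordinate",
    )
    names = [p.name for p in files]
    first = files[0].read_text()
```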

Page 47: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Collectors

• Collect output units and perform a collective operation on them. E.g.
– Standard deviation
– Average
– Statistics
– Evaluation
– Selecting the “best” point of the parameter space
– …
• User provides the program logic
• Portal provides data transfer
– No need to use any Grid API in your code
– Open and write I/O files as local files
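Because the portal handles data transfer, a collector can be an ordinary program that reads local files. The sketch below assumes each output unit is a text file holding one number and that the "best" point is the smallest value; both are assumptions made for the example, as are the function name `collect` and the `Result.N` file names.

```python
import statistics
import tempfile
from pathlib import Path


def collect(result_dir, pattern="Result.*"):
    """Read every output unit as a local file and aggregate the values."""
    values = [float(p.read_text())
              for p in sorted(Path(result_dir).glob(pattern))]
    return {
        "count": len(values),
        "average": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "best": min(values),                 # assumption: smaller is better
    }


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for the files that the workflow instances would produce.
    for index, value in enumerate([1.0, 2.0, 3.0]):
        (Path(tmp) / f"Result.{index}").write_text(f"{value}\n")
    summary = collect(tmp)
```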

Page 48: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

References

• P-GRADE Portal service for:
– SEE-GRID infrastructure
– Central European VO of EGEE
– GILDA: Training VO of EGEE
– Many national Grids (UK National Grid Service, HunGrid, Turkish Grid, etc.)
– US Open Science Grid, TeraGrid
– Economy-Grid, Swiss BioGrid, Bio and Biomed EGEE VOs, BalticGrid
– OGF Grid Interoperability Now (GIN) VO

portal.p-grade.hu/index.php?m=5&s=0

Page 49: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Summary and conclusion

• P-GRADE Portal hides the complexity of Grid systems
– Globus 2, Globus 4, LCG, gLite
• Various components can be integrated into workflows
– Sequential codes
– MPI codes
– Legacy code services (with the GEMLCA-specific version)
• Workflows can be executed as parameter studies
– Storage management
– Generators
– Collectors
• Your code does not have to contain grid specific calls
• Graphical interfaces for
– grid application development
– certificate management
– application execution and monitoring
• Support for collaborative work
– Share workflow components
– Share workflows
• Built on the standard portlet API, customizable to specific needs

Page 50: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Questions, hands-on

www.portal.p-grade.hu

[email protected]

Learn once, use everywhere
Develop once, execute anywhere

Page 51: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Additional slides

Page 52: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Rescuing a failed workflow 1.

• A job failed during workflow execution
• Read the error log to find out why

Page 53: P-GRADE Portal and GEMLCA:  A workflow-oriented portal and application hosting environment

Rescuing a failed workflow 2.

• Map the failed job onto a different resource or download a new proxy for it
• Don’t touch the finished jobs!
• The execution can continue from the point of failure
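The rescue rule (leave finished jobs alone, re-run only what failed or never started) can be written down as a small planning function. This is a sketch of the rule only: the state names, the function `rescue_plan` and the resource names are hypothetical, and the portal's own rescue mechanism is not shown.

```python
def rescue_plan(job_states, current_resources, remap):
    """Return job -> resource for every job that still has to run.

    job_states: job -> "finished", "failed" or "init"
    current_resources: job -> resource the job was originally mapped to
    remap: failed job -> new resource chosen by the user
    """
    plan = {}
    for job, state in job_states.items():
        if state == "finished":
            continue  # finished jobs are never touched or re-run
        plan[job] = remap.get(job, current_resources[job])
    return plan


states = {"split": "finished", "solve1": "failed", "merge": "init"}
resources = {"split": "ce-a", "solve1": "ce-a", "merge": "ce-b"}
plan = rescue_plan(states, resources, remap={"solve1": "ce-c"})
```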