Supporting MPI Applications on EGEE Grids
Zoltán Farkas, MTA SZTAKI


Page 1

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

Supporting MPI Applications on EGEE Grids

Zoltán Farkas, MTA SZTAKI

Page 2: Contents

• MPI
  – Standards
  – Implementations
• EGEE and MPI
  – History
  – Current status
  – Working/research groups in EGEE
  – Future work
• P-GRADE Grid Portal
  – Workflow execution, file handling
  – Direct job submission
  – Brokered job submission

Page 3: MPI

• MPI stands for Message Passing Interface
  – Standards 1.1 and 2.0
• MPI Standard features:
  – Collective communication (1.1+)
  – Point-to-Point communication (1.1+)
  – Group management (1.1+)
  – Dynamic Processes (2.0)
  – Programming Language APIs
  – …

Page 4: MPI Implementations

• MPICH
  – Freely available implementation of MPI
  – Runs on many architectures (even on Windows)
  – Implements Standard 1.1 (MPICH) and Standard 2.0 (MPICH2)
  – Supports Globus (MPICH-G2)
  – Nodes are allocated upon application execution
• LAM/MPI
  – Open-source implementation of MPI
  – Implements Standard 1.1 and parts of 2.0
  – Many interesting features (checkpoint)
  – Nodes are allocated before application execution
• Open MPI
  – Implements Standard 2.0
  – Uses technologies of other projects

Page 5: MPICH execution on x86 clusters

• The application can be started …
  – … using ‘mpirun’
  – … specifying:
      the number of requested nodes (-np <nodenumber>),
      a file containing the nodes to be allocated (-machinefile <arg>) [OPTIONAL],
      the executable,
      the executable's arguments.
  – $ mpirun -np 7 ./cummu -N -M -p 32

• Processes are spawned using ‘rsh’ or ‘ssh’, depending on the configuration
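As a hedged illustration, the optional machinefile simply lists the hosts to allocate, one per line (the host names below are invented):

  $ cat machines
  node01.example.org
  node02.example.org
  node03.example.org
  $ mpirun -np 3 -machinefile machines ./cummu -N -M -p 32

Without -machinefile, MPICH typically falls back to a default machines list set up at installation time.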

Page 6: MPICH x86 execution – requirements

• Executable (and input files) must be present on the worker nodes:
  – using a shared filesystem, or
  – the user distributes the files before invoking ‘mpirun’.
• Accessing worker nodes from the host running ‘mpirun’:
  – using ‘rsh’ or ‘ssh’
  – without user interaction (host-based authentication)
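A minimal pre-staging sketch under these assumptions (no shared filesystem; host-based ssh authentication, so no password prompts; all file and host names invented):

  # copy the executable and input files to every node listed in the machinefile
  for host in $(cat machines); do
      ssh "$host" mkdir -p /tmp/job
      scp ./cummu input.dat "$host":/tmp/job/
  done
  # now mpirun can spawn the processes on the remote nodes
  mpirun -np 3 -machinefile machines /tmp/job/cummu -N -M -p 32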

Page 7: EGEE and MPI

• MPI became important at the end of 2005 / beginning of 2006:
  – Instructions about CE/jobmanager/WN configuration
  – The user has to start a wrapper script:
      the input sandbox isn’t distributed to the worker nodes
      a sample wrapper script is provided, which works for PBS and LSF and assumes ‘ssh’
• Current status (according to experiments):
  – No need to use wrapper scripts
  – MPI jobs fail if there is no shared filesystem
  – Remote file handling is not supported, so the user has to take care of it

Page 8: EGEE and MPI - II

• Research/working groups formed:
  – MPI TCG WG:
      User requirements:
      • „Shared” filesystem: distribute the executable and input files
      • Storage Element handling
      Site requirements:
      • The solution must be compatible with a large number of jobmanagers
      • Infosystem extensions (max. number of concurrent CPUs used by a job, …)
  – MSc research group (1-month project, 2 students):
      Created wrapper scripts for MPICH, LAM/MPI and Open MPI
      The application source is compiled before execution
      The executable and input files are distributed to the allocated worker nodes, ‘ssh’ is assumed
      No remote file support
      (a sketch of such a wrapper follows below)
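The slides do not show the scripts themselves; the following is only a rough sketch of a PBS-flavoured wrapper in the spirit described above (compile, distribute over ‘ssh’, run; file names are invented, and PBS_NODEFILE is assumed to list the allocated hosts):

  #!/bin/sh
  # compile the shipped source on the master worker node
  mpicc -o app app.c
  # distribute the executable and input files to every allocated node
  for host in $(sort -u "$PBS_NODEFILE"); do
      ssh "$host" mkdir -p "$PWD"
      scp app input.dat "$host:$PWD/"
  done
  # start one process per allocated slot
  mpirun -np "$(wc -l < "$PBS_NODEFILE")" -machinefile "$PBS_NODEFILE" ./app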

Page 9: EGEE and MPI – Future work

• Add support for:
  – all possible jobmanagers
  – all possible MPI implementations
  – Storage Element handling for legacy applications
  – input sandbox distribution before application execution when there is no shared filesystem
  – output file collection after application execution when there is no shared filesystem

Page 10: P-GRADE Grid Portal

• Workflow execution:
  – DAGMan as the workflow scheduler
  – pre and post scripts perform tasks around job execution (see the DAG sketch below)
  – Direct job execution using GT-2:
      GridFTP, GRAM
      pre: create a temporary storage directory, copy input files
      job: Condor-G executes a wrapper script
      post: download results
  – Job execution using the EGEE broker (both LCG/gLite):
      pre: create the application context as the input sandbox
      job: a Scheduler universe Condor job executes a script which does job submission, status polling and output downloading; a wrapper script is submitted to the broker
      post: error checking
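As a sketch of the DAGMan side (node and script names invented), each workflow node is a Condor job with optional PRE/POST scripts, and edges order the jobs:

  # workflow.dag
  JOB A jobA.submit
  SCRIPT PRE  A pre.sh     # create storage directory, copy input files
  SCRIPT POST A post.sh    # download results, check for errors
  JOB B jobB.submit
  PARENT A CHILD B         # B is scheduled after A has finished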

Page 11: Portal: File handling

• „Local” files:
  – The user has access to these files through the Portal
  – Local input files are uploaded from the user's machine
  – Local output files are downloaded to the user's machine
• „Remote” files:
  – Files reside on EGEE Storage Elements or are accessible using GridFTP
  – EGEE SE files:
      lfn:/…
      guid:…
  – GridFTP files:
      gsiftp://…
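For illustration, fully written-out references of the three kinds could look like this (all names invented):

  lfn:/grid/seegrid/user/input.dat
  guid:9a2c1f3e-0d2b-4c77-9a1e-6d5e4f3a2b1c
  gsiftp://se.example.org/data/user/input.dat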

Page 12: Portal: Direct job execution

• The resource to be used is known before job execution

• The user must have a valid, accepted certificate

• Local files are supported

• Remote GridFTP files are supported, even in case of grid-unaware applications

• Jobs may be sequential or MPI applications

Page 13: Direct exec: step-by-step I.

1. Pre script:
   • creates a storage directory on the selected site's front-end node, using the ‘fork’ jobmanager
   • local input files are copied to this directory from the Portal machine using GridFTP
   • remote input files are copied using GridFTP (in case of errors, a two-phase copy via the Portal machine is tried)
2. Condor-G job:
   • a wrapper script (wrapperp) is specified as the real executable
   • a single job is submitted to the requested jobmanager; for MPI jobs the ‘hostcount’ RSL attribute specifies the number of requested nodes (a hedged submit-file sketch follows below)
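A sketch of such a Condor-G submit file, using the legacy ‘globus’ universe syntax as far as we can reconstruct it (host name, node count and file names are invented; treat attribute names as an assumption, not the Portal's exact output):

  # wrapperp.submit
  universe        = globus
  globusscheduler = ce.example.org/jobmanager-pbs
  globusrsl       = (hostCount=4)(jobType=single)
  executable      = wrapperp.sh
  output          = job.out
  error           = job.err
  log             = job.log
  queue

jobType=single matches the approach described here: a single wrapper process is started, and the wrapper itself launches ‘mpirun’.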

Page 14: Direct exec: step-by-step II.

3. LRMS:
   • allocates the requested number of nodes (if needed)
   • starts wrapperp on one of the allocated nodes (the master worker node)
4. Wrapperp (running on the master worker node):
   • copies the executable and input files from the front-end node (‘scp’ or ‘rcp’)
   • in case of PBS jobmanagers, the executable and input files are copied to the allocated nodes (PBS_NODEFILE); with non-PBS jobmanagers a shared filesystem is required, as the host names of the allocated nodes cannot be determined
   • wrapperp searches for ‘mpirun’
   • the real executable is started using the found ‘mpirun’
   • in case of PBS jobmanagers, output files are copied from the allocated worker nodes to the master worker node
   • output files are copied to the front-end node
   (a rough sketch of wrapperp follows below)
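A rough sketch of wrapperp for the PBS case (FRONTEND, paths and file names are invented; the real script is more elaborate):

  #!/bin/sh
  FRONTEND=frontend.example.org   # front-end node holding the temporary storage
  DIR=$PWD
  # 1. fetch the executable and input files from the front-end node
  scp "$FRONTEND:/var/tmp/job123/*" "$DIR/"
  # 2. distribute them to every allocated node (host list from PBS)
  for host in $(sort -u "$PBS_NODEFILE"); do
      ssh "$host" mkdir -p "$DIR"
      scp "$DIR"/* "$host:$DIR/"
  done
  # 3. locate mpirun and start the real executable with it
  MPIRUN=$(command -v mpirun)
  "$MPIRUN" -np "$(wc -l < "$PBS_NODEFILE")" -machinefile "$PBS_NODEFILE" "$DIR/app" "$@"
  # 4. collect outputs from the slaves, then copy everything back to the front-end
  for host in $(sort -u "$PBS_NODEFILE"); do
      scp "$host:$DIR/out.*" "$DIR/" 2>/dev/null
  done
  scp "$DIR"/out.* "$FRONTEND:/var/tmp/job123/"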

Page 15: Direct exec: step-by-step III.

5. Post script:
   • local output files are copied from the temporary working directory created by the pre script to the Portal machine using GridFTP
   • remote output files are copied using GridFTP (in case of errors, a two-phase copy via the Portal machine is tried; see the sketch below)
6. DAGMan: schedules the next jobs…
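The GridFTP copies could look like the following globus-url-copy calls (host names and paths invented); the two-phase fallback stages the file through the Portal machine:

  # direct third-party copy: CE front-end -> remote storage
  globus-url-copy gsiftp://frontend.example.org/tmp/job123/out.dat \
      gsiftp://storage.example.org/data/out.dat

  # two-phase fallback via the Portal machine
  globus-url-copy gsiftp://frontend.example.org/tmp/job123/out.dat file:///tmp/out.dat
  globus-url-copy file:///tmp/out.dat gsiftp://storage.example.org/data/out.dat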

Page 16: Direct execution: animated

[Diagram: direct execution. (1) The Portal machine copies input files to temporary storage on the front-end node (fork jobmanager, GridFTP) and exchanges remote files with remote storage; (2) the job is submitted to PBS; (3) wrapperp starts on the master WN; (4) inputs and the executable are distributed to the slave WNs and ‘mpirun’ starts the processes; (5) output files are collected and copied back.]

Page 17: Direct Submission Summary

• Pros:
  – Users can add remote file support to legacy applications
  – Works for both sequential and MPI(CH) applications
  – For PBS jobmanagers there is no need for a shared filesystem (support for other jobmanagers can be added, depending on the information provided by the jobmanagers)
  – Works with jobmanagers that do not support MPI
  – Faster than submitting through the broker
• Cons:
  – doesn't integrate into the EGEE middleware
  – the user needs to specify the execution resource
  – currently doesn't work on non-PBS jobmanagers without shared filesystems

Page 18: Portal: Brokered job submission

• The resource to be used is unknown before job execution

• The user must have a valid, accepted certificate

• Local files are supported

• Remote files residing on Storage Elements are supported, even in case of grid-unaware applications

• Jobs may be sequential or MPI applications

Page 19: Broker exec: step-by-step I.

1. Pre script:
   • creates the Scheduler universe Condor submit file
2. Scheduler universe Condor job:
   • the job is a shell script
   • the script is responsible for:
       • job submission: a wrapper script (wrapperrb) is specified as the real executable in the JDL file (a hedged JDL sketch follows below)
       • job status polling
       • job output downloading
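A sketch of the generated JDL (attribute values invented; JobType "MPICH" and NodeNumber are the standard LCG/gLite way to request an MPI job, the rest is an assumption about what the Portal emits):

  Type          = "Job";
  JobType       = "MPICH";
  NodeNumber    = 4;
  Executable    = "wrapperrb.sh";
  Arguments     = "input.dat output.dat";
  InputSandbox  = {"wrapperrb.sh", "app", "input.dat"};
  OutputSandbox = {"std.out", "std.err"};
  StdOutput     = "std.out";
  StdError      = "std.err";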

Page 20: Broker exec: step-by-step II.

3. Resource Broker:
   • handles the requests of the Scheduler universe Condor job
   • sends the job to a CE
   • watches its execution
   • reports errors
   • …
4. LRMS on the CE:
   • allocates the requested number of nodes
   • starts wrapperrb on the master worker node using ‘mpirun’

Page 21: Broker exec: step-by-step III.

5. Wrapperrb:
   • the script is started by ‘mpirun’, so it starts on every allocated worker node like an MPICH process
   • checks whether the remote input files are already present; if not, they are downloaded from the Storage Element
   • if the user specified any remote output files, they are removed from the storage
   • the real executable is started with the arguments passed to the script; these arguments already contain the MPICH-specific ones
   • after the executable has finished, remote output files are uploaded to the Storage Element (only in the case of gLite)
   (a rough sketch of wrapperrb follows below)
6. Post script:
   • nothing special…
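A rough sketch of wrapperrb (the tool names are from the lcg_util suite; flags, VO and LFNs are invented, and the real script certainly differs):

  #!/bin/sh
  # started by mpirun on every allocated worker node
  # 1. fetch remote input files if they are not already present
  if [ ! -f input.dat ]; then
      lcg-cp --vo myvo lfn:/grid/myvo/input.dat file://$PWD/input.dat
  fi
  # 2. run the real executable; "$@" already carries the MPICH-specific arguments
  ./app "$@"
  RC=$?
  # 3. upload remote output files to the Storage Element (gLite case)
  lcg-cr --vo myvo -l lfn:/grid/myvo/output.dat file://$PWD/output.dat
  exit $RC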

Page 22: Broker execution: animated

[Diagram: broker execution. (2) The Portal machine submits the job to the Resource Broker; (3) the broker sends it via Globus to PBS on the CE front-end node; (4) the LRMS starts ‘mpirun’, which launches wrapperrb on the master and slave WNs; (5) each wrapperrb instance exchanges remote files with the Storage Element and runs the real executable.]

Page 23: Broker Submission Summary

• Pros:
  – adds support for remote file handling for legacy applications
  – extends the functionality of the EGEE broker
  – one solution supports both sequential and MPI applications
• Cons:
  – slow application execution
  – status polling generates high load with 500+ jobs

Page 24: Experimental results

• Tested selected SEEGRID CEs, using the broker from the command line and direct job submission from the P-GRADE Portal, with a job requesting 3 nodes:

  CE Name                   Broker Result            Portal Direct Result
  ce.phy.bg.ac.yu           Failed (exe not found)   OK
  ce.ulakbim.gov.tr         Scheduled                OK
  ce01.isabella.grnet.gr    OK                       Failed (job held)
  ce02.grid.acad.bg         OK                       OK
  cluster1.csk.kg.ac.yu     Failed                   OK
  grid01.rcub.bg.ac.yu      Failed                   OK
  grid2.cs.bilkent.edu.tr   Failed (exe not found)   OK

Page 25: Thank you for your attention

?