Middleware components in EGEE, Mike Mineter, NeSC Training team, mjm@nesc.ac.uk


Page 1: Middleware components in EGEE Mike Mineter NeSC Training team mjm@nesc.ac.uk

EGEE is a project funded by the European Union under contract IST-2003-508833

Middleware components in EGEE

Mike Mineter, NeSC Training team, mjm@nesc.ac.uk

http://egee-intranet.web.cern.ch

Page 2

Acknowledgements

This presentation includes slides and information from many sources:

• Roberto Barbera (slides on middleware are based on presentations given in Edinburgh, April 2004)
• Other colleagues in EGEE
• The European DataGrid training team
• Authors of the LCG-2 User Guide v. 2.0: Antonio Delgado Peris, Patricia Méndez Lorenzo, Flavia Donno, Andrea Sciabà, Simone Campana, Roberto Santinelli (https://edms.cern.ch/file/454439//LCG-2-UserGuide.html)

Additional slides and preparation by Mike Mineter

Page 3

Outline

• Overview
• Major components
• Data management
• Lifecycle of a job
• Summary

Page 4

Towards a European e-Infrastructure

• To underpin European science and technology in the service of society
• To link with and build on
  – National, regional and international initiatives
  – Emerging technologies (e.g. fibre optic networks)
• To foster international cooperation both in the creation and the use of the e-infrastructure

[Diagram labels: Network infrastructure (GÉANT); Operations, Support and training; Collaboration; Pan-European Grid]

Page 5

User-view of EGEE: a multi-VO Grid

[Diagram labels: User Interface; Grid services; User Interface]

Page 6

1997-Present: Globus

• A software toolkit addressing certain technical problems in the development of Grid-enabled tools, services, and applications
  – Offers a modular “bag of technologies”
  – Made available under a liberal open source license
• Not turnkey solutions, but building blocks and tools for application developers and system integrators

Page 7

Globus: Key components

• Grid Security Infrastructure (GSI)
  – X.509 authentication with delegation and single sign-on
• Grid Resource Allocation Mgmt (GRAM)
  – Remote allocation, monitoring of jobs, control of compute resources
• GridFTP protocol (FTP extensions)
  – High-performance data access & transport
• Grid Resource Information Service (GRIS) + Monitoring and Discovery Service (MDS)
  – Access to structure & state information
• Others…

Page 8

VDT

• “The Virtual Data Toolkit (VDT) is an ensemble of grid middleware that can be easily installed and configured. In our experience, installing grid software is challenging and time consuming. The goal of the VDT is to make it as easy as possible for users to deploy, maintain and use grid middleware.” http://www.cs.wisc.edu/vdt/

Page 9

Part of the Grid “ecosystem”

[Diagram labels: timeline 2001 … 2004; Globus, Condor, MyProxy, ...; VDT; DataTAG; EDG; AliEn; CrossGrid, ...; SRM; LCG; EGEE; Used in USA, EU; NextGrid, GridCC, …; Future e-Infrastructure]

Page 10

Part of the Grid “ecosystem”

[Diagram labels as on Page 9, with DEISA added next to NextGrid and GridCC]

LCG: Large Hadron Collider Compute Grid

“hardened” EDG – with strong focus on LCG challenges

Page 11

Current production mware: LCG-2

Layers shown on the slide, with their components and the example software or source given for each:

• Hardware: Computing cluster, Network resources, Data storage (HPSS, CASTOR…)
• System software: Operating system, Local scheduler, File system (RedHat Linux; NFS, …; PBS, Condor, LSF, …)
• “Basic” services: User access, Security, Data transfer, Information schema (VDT: Condor, Globus, GLUE)
• “Collective” services: Workload management, Data management, App monitoring system (EU DataGrid)
• Application level services: User interfaces, Applications

[Additional diagram label: Information system]

Page 12

Outline

• Overview
• Major components
• Data management
• Lifecycle of a job
• Summary

Page 13

Major components

[Diagram components: “User interface”; Resource Broker; Replica Catalogue; Information Service; Logging & Book-keeping; Computing Element; Storage Element]

[Diagram arrow labels: Job Submit; Job Query; Job Status; Event; Author. & Authen.; DataSets info; Input “sandbox”; Input “sandbox” + Broker Info; Output “sandbox”; Publish; SE & CE info]

Page 14

User Interface node

• The user’s interface to the Grid
• Command-line interface to
  – Proxy server
  – Job operations: to submit a job, monitor its status, retrieve output
  – Data operations: upload file to SE, access file, …
  – Other grid services
• Also C++ and Java APIs
• To run a job, the user creates a JDL (Job Description Language) file

[Diagram labels: UI; JDL]

Page 15

Authentication, Authorisation

• Authentication
  – User obtains certificate from CA
  – Connects to UI by ssh
  – Downloads certificate
  – Invokes Proxy server
  – Single logon to UI; then Secure Socket Layer with proxy identifies user to other nodes
• Authorisation (currently)
  – User joins Virtual Organisation
  – VO negotiates access to Grid nodes and resources (CE, SE)
  – Authorisation tested by CE, SE: gridmapfile maps user to local account

[Diagram labels: UI; CA; VO mgr; Personal; VO database; VO service; Gridmapfiles on CE, SE nodes; Daily update; SSL (proxy)]
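The last authorisation step, where a gridmapfile maps a user to a local account, can be sketched in Python. This is only an illustration: it assumes a file layout of one quoted certificate subject (DN) per line followed by a local account name, and the DNs, account names and the helpers `parse_gridmap` and `local_account` are hypothetical.

```python
import shlex

def parse_gridmap(text):
    """Parse gridmapfile-style text into a {subject DN: local account} dict.

    Assumes each non-comment line holds a quoted DN followed by an account name.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            continue  # skip malformed lines rather than guess
        dn, account = parts
        mapping[dn] = account
    return mapping

def local_account(mapping, dn):
    """Return the local account for a DN, or None if the DN is not listed."""
    return mapping.get(dn)

# Hypothetical entries, for illustration only.
EXAMPLE = '''
# subject DN                                        local account
"/C=UK/O=eScience/OU=Edinburgh/CN=alice example" cms001
"/C=CH/O=CERN/CN=bob example" dteam002
'''

gridmap = parse_gridmap(EXAMPLE)
print(local_account(gridmap, "/C=CH/O=CERN/CN=bob example"))   # dteam002
print(local_account(gridmap, "/C=IT/O=INFN/CN=unknown user"))  # None
```

A user whose DN is absent from the file gets no local account, which is how a CE or SE would refuse the request in this simplified picture.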

Page 16

“Compute element”

A CE is a grid batch queue with a “grid gate” front-end:

[Diagram labels: Job request; Grid gate node; Globus gatekeeper; gridmapfile; Info system; Logging; Local resource management system: Condor / PBS / LSF master; Homogeneous set of worker nodes; I.S.; Logging]

Page 17

Storage elements and files

• Storage elements hold files: write once, read many

[Diagram labels: File transfer requests; Globus gatekeeper; gridmapfile; GridFTP; Local Info; Event Logging; Disk arrays or tapes; Info system; Logging]

Page 18

Workload Management System (WMS)

• Distributed scheduling
  – multiple UIs where you submit your job
  – multiple RBs from where the job is sent to a CE
  – multiple CEs where the job can be put in a queuing system
• Distributed resource management
  – multiple information systems that monitor the state of the grid
  – information from SE, CE, sites

Page 19

Resource Broker nodes

• Run the Workload Management System
  – To accept job submissions
  – To dispatch jobs to an appropriate Compute Element (CE)
  – To allow users to get information about the status of their jobs and to retrieve their output
• A configuration file on each UI node determines which RB node(s) will be used
• When a user submits a job, JDL options are to:
  – Specify CE
  – Allow RB to choose CE (using optional tags to define requirements)
  – Specify SE (then RB finds “nearest” appropriate CE, after interrogating Replica Location Service)
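The three JDL options above can be sketched as a small selection function. This is a hedged illustration, not the Resource Broker's actual logic: the dictionary keys (`ce`, `se`, `min_cpu_time`, `close_ses`, and so on), the host names, and the rule "pick the candidate with the most free CPUs" are all assumptions, and the Replica Location Service lookup is not modelled.

```python
def choose_ce(job, ces):
    """Pick a CE name for a job, following the three options on the slide.

    job: dict that may carry 'ce', 'se' and 'min_cpu_time' (all hypothetical keys).
    ces: list of dicts with 'name', 'max_cpu_time', 'free_cpus', 'close_ses'.
    """
    # Option 1: the user names the CE explicitly.
    if "ce" in job:
        return job["ce"]

    # Requirements are optional; without them every CE is a candidate.
    min_cpu = job.get("min_cpu_time", 0)
    candidates = [ce for ce in ces if ce["max_cpu_time"] >= min_cpu]

    # Option 3: the user names an SE; keep only CEs that list it as "close".
    if "se" in job:
        candidates = [ce for ce in candidates if job["se"] in ce["close_ses"]]

    # Option 2 (and the final step of option 3): let the broker choose.
    if not candidates:
        return None
    return max(candidates, key=lambda ce: ce["free_cpus"])["name"]

# Hypothetical CE descriptions.
ces = [
    {"name": "ce1.example.org", "max_cpu_time": 20000, "free_cpus": 4,
     "close_ses": ["se1.example.org"]},
    {"name": "ce2.example.org", "max_cpu_time": 50000, "free_cpus": 12,
     "close_ses": ["se2.example.org"]},
    {"name": "ce3.example.org", "max_cpu_time": 5000, "free_cpus": 40,
     "close_ses": ["se1.example.org"]},
]

print(choose_ce({"ce": "ce3.example.org"}, ces))
print(choose_ce({"min_cpu_time": 10000}, ces))
print(choose_ce({"se": "se1.example.org", "min_cpu_time": 10000}, ces))
```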

Page 20

Information System

• Based on the Globus “Monitoring and Discovery Service”

• Receives periodic (~5 minutes) updates from CE, SE

• Used by RB node to determine resources to be used by a job

• Uses “GLUE schema”

Page 21

Information System

Page 22

Page 23

Outline

• Overview
• Major components
• Data management
• Lifecycle of a job
• Summary

Page 24

Data management

• User data: generally file-oriented (some RDBMS exceptions exist)
• Small files: on UI; passed to/from CE via sandbox
• Large files: require SE
  – Replica files on different SEs, for fault tolerance and for performance (run job on CE “close” to data; share load on SE)
  – Replica Catalogue: what replicas exist for a file?
  – Replica Location Service: where are they?

Page 25

Naming Conventions

• Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g. “lfn:cms/20030203/run2/track1”
• Site URL (SURL), or Physical File Name (PFN): the location of an actual piece of data on a storage system, e.g. “srm://pcrd24.cern.ch/flatfiles/cms/output10_1”
• Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”

[Diagram: Logical File Name 1, 2, … n map to one GUID, which maps to Physical File SURL 1 … n]

Page 26

Replica Metadata Catalog (RMC), Replica Location Service (RLS)

• RMC: stores LFN-GUID mappings
• RLS: stores GUID-SURL mappings

[Diagram: Logical File Name 1, 2, … n map to one GUID (RMC), which maps to Physical File SURL 1 … n (RLS); further labels: RM, RLS, RMC]
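The two mappings can be pictured as two dictionaries chained together. The sketch below is a toy in-memory model, not the RMC or RLS interface: the class and method names are hypothetical, the second alias and second SURL are invented, and pairing the slide's example LFN, GUID and SURL as one file is an assumption made only for illustration.

```python
class ReplicaCatalogs:
    """Toy stand-in for the RMC (LFN -> GUID) and the RLS (GUID -> SURLs)."""

    def __init__(self):
        self.rmc = {}   # LFN -> GUID
        self.rls = {}   # GUID -> list of SURLs

    def add_alias(self, lfn, guid):
        """Record a user-chosen alias for a GUID (RMC side)."""
        self.rmc[lfn] = guid

    def add_replica(self, guid, surl):
        """Record one physical location for a GUID (RLS side)."""
        self.rls.setdefault(guid, []).append(surl)

    def resolve(self, lfn):
        """Return every SURL reachable from an LFN (empty list if unknown)."""
        guid = self.rmc.get(lfn)
        if guid is None:
            return []
        return list(self.rls.get(guid, []))

catalogs = ReplicaCatalogs()
guid = "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
catalogs.add_alias("lfn:cms/20030203/run2/track1", guid)
catalogs.add_alias("lfn:cms/my-alias", guid)                       # invented alias
catalogs.add_replica(guid, "srm://pcrd24.cern.ch/flatfiles/cms/output10_1")
catalogs.add_replica(guid, "srm://se.example.org/cms/output10_1")  # invented replica

print(catalogs.resolve("lfn:cms/my-alias"))
```

Both aliases resolve to the same two SURLs because they share one GUID, which is the many-to-one-to-many shape of the diagram.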

Page 27

Data Replication Services: Basic Functionality

Files have replicas stored at many Grid sites on Storage Elements.

Each file has a unique GUID. Locations corresponding to the GUID are kept in the Replica Location Service.

Users may assign aliases to the GUIDs. These are kept in the Replica Metadata Catalog.

The Replica Manager provides atomicity for file operations, assuring consistency of SE and catalog contents.

[Diagram labels: Replica Manager; Replica Location Service; Replica Metadata Catalog; Storage Element; Storage Element]
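One way to picture the atomicity mentioned above is a copy-and-register operation that undoes the copy if the catalog update fails. The following is a minimal sketch under that assumption; the class, its in-memory "storage" and "rls" dictionaries, and the `fail_registration` switch are hypothetical and do not reflect the real Replica Manager API.

```python
class ReplicaManager:
    """Toy replica manager: copy a file to an SE and register it, or do neither."""

    def __init__(self):
        self.storage = {}   # SE name -> {path: bytes}, stands in for real SEs
        self.rls = {}       # GUID -> set of SURLs, stands in for the RLS

    def copy_and_register(self, guid, data, se, path, fail_registration=False):
        surl = f"srm://{se}/{path}"
        files = self.storage.setdefault(se, {})
        files[path] = data                      # step 1: write the replica
        try:
            if fail_registration:               # switch to simulate a catalog outage
                raise RuntimeError("catalog unavailable")
            self.rls.setdefault(guid, set()).add(surl)   # step 2: register it
        except Exception:
            del files[path]                     # roll back step 1
            raise
        return surl

rm = ReplicaManager()
first = rm.copy_and_register("guid:1", b"payload", "se1.example.org", "cms/f1")

try:
    rm.copy_and_register("guid:1", b"payload", "se1.example.org", "cms/f2",
                         fail_registration=True)
except RuntimeError:
    pass  # the failed copy must leave no trace on the SE or in the catalog

print(sorted(rm.storage["se1.example.org"]))
print(sorted(rm.rls["guid:1"]))
```

After the simulated failure the SE holds only the first file and the catalog lists only its SURL, so the two stay consistent.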

Page 28

Outline

• Overview
• Major components
• Data management
• Lifecycle of a job
• Summary

Page 29

[Diagram labels: UI; RB node; Network Server; Workload Manager; Job Contr.; Replica Location Server; Inform. Service; Computing Element; Storage Element; Characts. & status]

Page 30

[Diagram labels: UI; RB node; Network Server; Workload Manager; Job Contr. - CondorG; Replica Location Server; Inform. Service; Computing Element; Storage Element; CE characts & status; SE characts & status]

Job Status: submitted

UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)

Page 31

[Diagram: same components as on Page 30]

edg-job-submit myjob.jdl

myjob.jdl:

JobType = "Normal";
Executable = "$(CMS)/exe/sum.exe";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" && other.GlueHostOperatingSystemRelease == "Red Hat 7.3" && other.GlueCEPolicyMaxCPUTime > 10000;
Rank = other.GlueCEStateFreeCPUs;

Job Status: submitted

Job Description Language (JDL) to specify job characteristics and requirements
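To show what the Requirements and Rank attributes in myjob.jdl express, here is a small Python rendering of the two expressions applied to a few CE records. It is an illustrative sketch only: the CE records and their values are invented, the `name` key and the function names are hypothetical, and treating a higher Rank as better is the working assumption.

```python
def requirements(ce):
    """Python rendering of the Requirements expression in myjob.jdl."""
    return (ce["GlueHostOperatingSystemName"] == "linux"
            and ce["GlueHostOperatingSystemRelease"] == "Red Hat 7.3"
            and ce["GlueCEPolicyMaxCPUTime"] > 10000)

def rank(ce):
    """Python rendering of the Rank expression: the number of free CPUs."""
    return ce["GlueCEStateFreeCPUs"]

def match(ces):
    """Return the names of CEs that satisfy the requirements, best rank first."""
    suitable = [ce for ce in ces if requirements(ce)]
    return [ce["name"] for ce in sorted(suitable, key=rank, reverse=True)]

# Invented CE records keyed by the GLUE attribute names used in the JDL.
ces = [
    {"name": "ce-a", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 20000, "GlueCEStateFreeCPUs": 3},
    {"name": "ce-b", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 172800, "GlueCEStateFreeCPUs": 17},
    {"name": "ce-c", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 3600, "GlueCEStateFreeCPUs": 64},
]

print(match(ces))  # ['ce-b', 'ce-a']
```

Here ce-c has the most free CPUs but is filtered out first because its CPU-time limit is too low; ranking only orders the CEs that pass the requirements.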

Page 32

[Diagram: same components as on Page 30, plus RB storage; Input Sandbox files; Job]

Job Status: submitted, waiting

NS: network daemon responsible for accepting incoming requests

Page 33

Job submission

[Diagram: same components as on Page 30, plus RB storage; Job]

Job Status: submitted, waiting

WM: acts to satisfy the request

Page 34

Job submission

[Diagram: same components as on Page 30, plus RB storage; Match-Maker/Broker]

Job Status: submitted, waiting

Where must this job be executed?

Page 35

Job submission

[Diagram: same components as on Page 30, plus RB storage; Match-Maker/Broker]

Job Status: submitted, waiting

Matchmaker: responsible for finding the “best” CE for a job

Page 36

Job submission

[Diagram: same components as on Page 30, plus RB storage; Match-Maker/Broker]

Job Status: submitted, waiting

Where are the needed data (on which SEs)?

What is the status of the Grid?

Page 37

Job submission

[Diagram: same components as on Page 30, plus RB storage; Match-Maker/Broker; CE choice]

Job Status: submitted, waiting

Page 38

Job submission

[Diagram: same components as on Page 30, plus RB storage; JobAdapter]

Job Status: submitted, waiting

Job Adapter: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)

Page 39

Job submission

[Diagram: same components as on Page 30, plus RB storage; Job]

Job Status: submitted, waiting, ready

Job Controller: responsible for the actual job management operations (done via CondorG)

Page 40

Job submission

[Diagram: same components as on Page 30, plus RB storage; Job]

Job Status: submitted, waiting, ready, scheduled

Page 41

“Compute element” – reminder!

[Diagram labels: Job request; Grid gate node; Globus gatekeeper; gridmapfile; Info system; Logging; Local resource management system: Condor / PBS / LSF master; Homogeneous set of worker nodes; I.S.; Logging]

Page 42

Job submission

[Diagram: same components as on Page 30, plus RB storage; Job; Input Sandbox files; “Grid enabled” data transfers/accesses]

Job Status: submitted, waiting, ready, scheduled, running

Page 43

Job submission

[Diagram: same components as on Page 30, plus RB storage; Output Sandbox files]

Job Status: submitted, waiting, ready, scheduled, running, done

Page 44

Job submission

[Diagram: same components as on Page 30, plus RB storage]

Job Status: submitted, waiting, ready, scheduled, running, done

edg-job-get-output <dg-job-id>

Page 45

Job submission

[Diagram: same components as on Page 30, plus RB storage; Output Sandbox files]

Job Status: submitted, waiting, ready, scheduled, running, done, cleared
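The sequence of job states built up over the preceding slides can be summarised as a simple linear state machine. The sketch below models only the successful path shown on the slides; failure or abort states are not covered, and the `Job` class and `advance` method are hypothetical names used for illustration.

```python
# Job states in the order the slides introduce them.
STATES = ["submitted", "waiting", "ready", "scheduled", "running", "done", "cleared"]

class Job:
    """Toy job that walks through the states shown on the slides, in order."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.history = [STATES[0]]   # every job starts as "submitted"

    @property
    def status(self):
        return self.history[-1]

    def advance(self):
        """Move to the next state; refuse to go past "cleared"."""
        index = STATES.index(self.status)
        if index == len(STATES) - 1:
            raise ValueError(f"job {self.job_id} is already cleared")
        self.history.append(STATES[index + 1])
        return self.status

job = Job("example-job-id")   # placeholder identifier
while job.status != "cleared":
    job.advance()

print(" -> ".join(job.history))
```

In this picture, retrieving the output with edg-job-get-output corresponds to the final step from "done" to "cleared".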

Page 46

Job monitoring

[Diagram labels: UI; RB node; Network Server; Workload Manager; Job Contr. - CondorG; Log Monitor; Logging & Bookkeeping; Computing Element; Log of job events; Job status]

LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies LB

LB: receives and stores job events; processes corresponding job status

edg-job-status <dg-job-id>

edg-job-get-logging-info <dg-job-id>
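The division of labour between the Log Monitor and Logging & Bookkeeping can be sketched as a parser feeding an event store. Everything concrete here is an assumption for illustration: the one-line "job id, event" log format is invented and is not CondorG's real log format, and the class and function names are hypothetical.

```python
from collections import defaultdict

class LoggingAndBookkeeping:
    """Toy LB: stores job events and derives the current status from them."""

    def __init__(self):
        self.events = defaultdict(list)   # job id -> list of event names

    def notify(self, job_id, event):
        """Store one event for a job."""
        self.events[job_id].append(event)

    def status(self, job_id):
        """Current status is taken to be the most recent event, if any."""
        history = self.events.get(job_id)
        return history[-1] if history else "unknown"

    def logging_info(self, job_id):
        """Full event history for a job."""
        return list(self.events.get(job_id, []))

def log_monitor(logfile_lines, lb):
    """Toy LM: parse '<job id> <event>' lines and notify the LB."""
    for line in logfile_lines:
        fields = line.split()
        if len(fields) == 2:          # ignore lines that do not fit the toy format
            lb.notify(fields[0], fields[1])

lb = LoggingAndBookkeeping()
log_monitor(
    ["job-1 submitted", "job-2 submitted", "job-1 waiting", "malformed", "job-1 ready"],
    lb,
)

print(lb.status("job-1"))
print(lb.logging_info("job-1"))
```

The two query methods loosely mirror the two commands on the slide: one returns the current status, the other the full event history.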

Page 47

Interfaces

• Command-line: ssh onto user interface machine
• Portals, e.g. GENIUS: access from browser
• APIs: functions invoked from programs
  – Job submission
  – Details: see talks in Madrid via EGEE training website!

Page 48

About jobs…

• Where is the exe?
  – On the UI, downloaded in the sandbox
  – OR: on the Worker Nodes, downloaded for a VO
• Can MPI be run?
  – On some compute elements
  – NOT across compute elements
  – EGEE-DEISA links for HPC are intended!
• Can they be interactive?
  – It’s been seen… BUT it is not supported…

Page 49

Current production mware: LCG-2

[Layer diagram repeated from Page 11: Hardware; System software; “Basic” services; “Collective” services; Application level services]

Page 50

Summary: EGEE components

[Diagram components: “User interface”; Resource Broker; Replica Catalogue; Information Service; Logging & Book-keeping; Computing Element; Storage Element]

[Diagram arrow labels: Job Submit; Job Query; Job Status; Event; Author. & Authen.; DataSets info; Input “sandbox”; Input “sandbox” + Broker Info; Output “sandbox”; Publish; SE & CE info]