Faculty of Sciences, University of Novi Sad Architecture...

63
EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE Feb. 06, 2009 www.eu-egee.org Introduction to High Performance and Grid Computing Faculty of Sciences, University of Novi Sad Dusan Vudragovic <[email protected]> Scientific Computing Laboratory Institute of Physics Belgrade Serbia Architecture and Services of gLite Middleware Introduction to High Performance and Grid Computing

Transcript of Faculty of Sciences, University of Novi Sad Architecture...

Page 1: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Enabling Grids for E-sciencE

Feb. 06, 2009 www.eu-egee.org

Introduction to High Performance and Grid Computing Faculty of Sciences, University of Novi Sad

Dusan Vudragovic <[email protected]> Scientific Computing Laboratory Institute of Physics Belgrade Serbia

Architecture and Services of gLite Middleware

Introduction to High Performance and Grid Computing

Page 2: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Set of basic Grid services

•  Job submission/management •  File transfer (individual, queued) •  Database access •  Data management (replication, metadata) •  Monitoring/Indexing system information

Introduction to High Performance and Grid Computing 2

Page 3: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Grid services

•  Authentication (CA) •  Authorization (VOMS) •  Information System •  User Interface (UI) •  Computing Element (CE) •  Storage Element (SE) •  Workload Management System (WMS)

Introduction to High Performance and Grid Computing 3

Page 4: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (1/10)

•  Cryptography –  To implement the security infrastructure, cryptography

uses mathematical algorithms that provide important building blocks

–  Corresponding definitions for the above symbols:   Plaintext: M   Cyphertext: C   Encryption with key K1 : EK1(M) = C   Decryption with key K2 : DK2(C) = M

–  Algorithms   Symmetric: K1 = K2   Asymmetric: K1 ≠ K2

Introduction to High Performance and Grid Computing 4

K2 K1

Encryption Decryption M C M

Page 5: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (2/10)

•  Cryptography :: Symmetric Algorithms –  The same key is used for encryption and decryption (no

public key, only secret keys available.) –  Advantages

  Fast

–  Disadvantages   Exchange of secret keys needed:

–  how to distribute the keys?

  the number of keys is O(n2) –  Examples:

  DES   3DES   AES

Introduction to High Performance and Grid Computing 5

A B

Hi! 3$r Hi!

A B

Hi! 3$r Hi! 3$r

3$r

Page 6: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (3/10)

•  Cryptography :: Public Key Algorithms (Asymmetric) –  Every user has two keys: one private (secret) and one public:

  it is impossible to derive the private key from the public one

  a message encrypted by one key can be decrypted only by the other one.

–  No exchange of private key is possible.   the sender cyphers using

the public key of the receiver   the receiver decrypts using

his own private key;   the number of keys is O(n).

–  Examples: RSA (1978)

Introduction to High Performance and Grid Computing 6

B keys

public private

A keys

public private

A B Hi! 3$r Hi!

A B Hi! cy7 Hi!

3$r

cy7

Page 7: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (4/10)

•  Cryptography :: Digital Signature –  A calculates the hash of

the message (with a one-way hash function)

–  A encrypts the hash using his private key: the encrypted hash is the digital signature

–  A sends the signed message to B –  B calculates the hash of the

message and verifies it with A, decyphered with A’s public key

–  If two hashes equal: message wasn’t modified; A cannot repudiate it.

Introduction to High Performance and Grid Computing 7

B

This is some

message

Digital Signature

A

This is some

message

Digital Signature

This is some

message

Digital Signature

Hash(A)

A keys

public private

Hash(B)

Hash(A)

= ?

Page 8: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (5/10)

•  Digital Certificates –  A’s digital signature is safe if:

  A’s private key is not compromised   B knows A’s public key

–  How can B be sure that A’s public key is really A’s public key and not someone else’s?   A third party guarantees the correspondence between public key

and owner’s identity.   Both A and B must trust this third party

–  Two models proposed to build trust:   X.509: hierarchical organization (used in Grid)   PGP: “web of trust” (person to person)

Introduction to High Performance and Grid Computing 8

Page 9: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (6/10)

•  Certification Authorities –  The “third party” is called Certification Authority (CA). –  Responsibilities of CA:

  Issue Digital Certificates (containing public key and owner’s identity) for users, programs and machines

  Check identity and the personal data of the requestor   Registration Authorities (RAs) do the actual validation   Revoke certificates in case of a compromise   Renew certificates in case of expiration   Periodically publish a list of revoked certificates through web

repository   Certificate Revocation Lists (CRL): contain all the revoked

certificates –  CA certificates are self-signed

Introduction to High Performance and Grid Computing 9

Page 10: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (7/10)

•  X.509 Certificates –  An X.509 Certificate contains:

  owner’s public key;   identity of the owner (DN);   info on the CA;   time of validity;   Serial number;   digital signature of the CA

Introduction to High Performance and Grid Computing 10

Structure of a X.509 certificate

Page 11: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (8/10) •  The Grid Security Infrastructure (GSI)

–  Based on X.509 PKI:   every user/host/service has

an X.509 certificate;   certificates are signed

by trusted (by the local sites) CA’s;   every Grid transaction is mutually

authenticated:   Ali sends his certificate;

•  B verifies signature in A’s certificate;

•  B sends A a challenge string; •  A encrypts the challenge string

with his private key; •  A sends encrypted challenge to B •  B uses A’s public key to decrypt

the challenge. •  B compares the decrypted string

with the original challenge •  If they match, B verifies A’s identity

and A can not repudiate it.

Introduction to High Performance and Grid Computing 11

Page 12: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (9/10)

•  X.509 Proxy Certificate –  Proxy: GSI extension to X.509 Identity Certificates

  signed by the normal end entity cert (or by another proxy). –  It enables single sign-on. –  It supports some important features:

  Delegation   Mutual authentication

–  It has a limited lifetime (minimized risk of “compromised credentials”)

–  User enters pass phrase, which is used to decrypt private key –  Private key is used to sign a proxy certificate with its own, new

public/private key pair.

Introduction to High Performance and Grid Computing 12

User certificate file

Private Key (Encrypted)

Pass Phrase

User Proxy certificate file

Page 13: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authentication (10/10)

•  Delegation –  Delegation = remote creation of a (second level) proxy credential

  New key pair generated remotely on server   Client signs proxy cert and returns it

–  Allows remote process to authenticate on behalf of the user

Introduction to High Performance and Grid Computing 13

Page 14: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authorization (1/7)

•  Multi-institution issues

Introduction to High Performance and Grid Computing 14

Trust Mismatch

Mechanism Mismatch

Certification Authority

Certification Authority

Domain A

Server X Server Y

Policy Authority

Policy Authority

Task

Domain B

Sub-Domain A1 Sub-Domain B1

No Cross-

Domain Trust

Page 15: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authorization (2/7)

•  Grid solution: use of VOs

Introduction to High Performance and Grid Computing 15

Certification

Domain A

Server X Server Y

Policy Authority

Policy Authority

Task Domain B

Sub-Domain A1

GSI

Certification Authority

Sub-Domain B1

Authority

Federation Service

Virtual Organization

Domain

No Cross-

Domain Trust

Page 16: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authorization (3/7)

•  Use delegation to establish dynamic distributed system

Introduction to High Performance and Grid Computing 16

Computing Center

VO

Rights

Computing Center

Service

Page 17: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authorization (4/7)

•  VOMS server –  Virtual organizations (VOs) are groups of Grid users

(authenticated through digital certificates) –  VO Management Service (VOMS) serves as a central repository

for user authorization information, providing support for sorting users into a general group hierarchy, keeping track of their roles,etc.

–  VO Manager, according to VO policies and rules, authorizes authenticated users to become VO members

–  Resource centers (RCs) may support one or more VOs, and this is how users are authorized to use computing, storage and other Grid resources

–  VOMS allows flexible approach to A&A on the Grid

Introduction to High Performance and Grid Computing 17

Page 18: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authorization (5/7)

•  VOMS Ingredients –  Attribute Certificates: AC is a PKI container, defined in RFC

3281, capable of containing a set of attributes tied to a specific identity. It is the system used by VOMS to issue its attributes.

–  VOMS groups: /aegis/scl –  VOMS roles: /Role=VO-Admin

  Roles can be defined for groups as well –  FQAN (Fully Qualified Attribute Name) is a compact way to

represent user’s membership in a group, along with its role holdership, if any   Syntax: <groupname>/Role=<rolename>/Capability=NULL where

the /Capability=NULL may be omitted, since it refers to a deprecated feature of VOMS

  /aegis/scl/Role=NULL/Capability=NULL

Introduction to High Performance and Grid Computing 18

Page 19: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authorization (6/7)

•  Attribute Certificate –  FQAN are included in an Attribute Certificate –  Attribute Certificates are used to bind a set of attributes (like

membership, roles, authorization info etc) with an identity –  ACs are digitally signed –  VOMS uses AC to include the attributes of a user in a proxy

certificat

Introduction to High Performance and Grid Computing 19

Page 20: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Authorization (7/7)

•  VOMS Architecture

Introduction to High Performance and Grid Computing 20

VOMS Core Service (vomsd)

VOMS Admin Service

Admin Service SOAP

Web User Interface

Authorization Database

voms-proxy-init

voms-admin CLI

Web browser

GSI

SOAP + SSL

HTTPS

VOMS Server

Page 21: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (1/10) •  Collect information of grid resources

–  Discovering new added resources –  Monitoring load and health status

•  Publish these information –  Periodically updated –  Well know data model

•  Used by –  Users searching a concrete resource –  WMS allocating and managing jobs –  Other monitoring services

•  Basic data model –  Grid Laboratory Uniform Environment (GLUE) Schema.

•  Two architectures in glite3 –  gLite Information System (BDII)

  BDII over Globus MDS (Monitoring and Discovery System).   OpenLDAP interface.

–  Relational Grid Monitoring Architecture (R-GMA)   Based on the GMA (Grid Monitoring Architecture) standard from the Grid Global

Forum   Information in SQL relational databases   Web Services.

Introduction to High Performance and Grid Computing 21

Page 22: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (2/10)

•  GLUE Schema :: Overview –  A schema of objects and attributes describing Grid resources

and its relationships.   Originally a EU-DataTAG and US-iVDGL coordinated effort.   Current participants: EGEE, OSG, Globus and NorduGrid.   A way to describe Grid info

•  Statically and dynamically supplied •  Hierarchically represented •  Independently of the framework (LDAP, XML, SQL…)

Introduction to High Performance and Grid Computing 22

Page 23: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (3/10)

•  GLUE Schema :: Site Element

Introduction to High Performance and Grid Computing 23

Page 24: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (4/10)

•  GLUE Schema :: Cluster Element

Introduction to High Performance and Grid Computing 24

Page 25: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (5/10)

•  GLUE Schema :: Computing Element

Introduction to High Performance and Grid Computing 25

Page 26: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (6/10)

•  gLite Information System Levels –  Resource level: Grid Resource Information Server (GRIS)

  One GRIS on top of each CE, SE, WMS, MyProxy (no WNs).   Sensors and scripts get status of concrete resources statically (e.g.

GlueCEUniqueID) or dynamically (e.g. GlueCEStateWaitingJobs) –  Site level: Grid Index Information Server (GIIS)

  Compiles all the information of the different GRISes in a site.   gLite recommends using a BDII instead of a GIIS

•  Improves robustness and stability. •  Called the site BDII.

–  Top level: Berkeley DB Information Index (BDII)   Keeps all Grid information about the VOs (generally only one).   Stores information from local BDIIs or GIISes in its database.   Only queries sites that are included in a configuration file.

Introduction to High Performance and Grid Computing 26

Page 27: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (7/10)

Introduction to High Performance and Grid Computing 27

Page 28: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (8/10)

•  LDAP Model –  Way of collecting info

  Pull model (higher level servers periodically query lower level servers)   All servers are based on LDAP

•  Inherit hierarchical structure (tree-like) •  LDAP Data Information Format (LDIF)

–  Users get info with   Generic applications

•  ldapsearch (BDII:2170 ports) •  Graphical UIs (BDII web; LDAP GUIs) •  Always can get information about specific resources (maybe more up-to-

date) by querying directly the site BDIIs, GIISes or GRISes.   Querying VO info with lcg-infosites or lcg-info tools

Introduction to High Performance and Grid Computing 28

Page 29: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (9/10)

•  R-GMA Overview –  Added from EDG Project –  Based on the GMA standard from the GGF –  Information in SQL relational databases (a DB per VO) –  Query syntax is a SQL subset –  Simple consumer-producer model –  Web Services oriented –  CLI and Web user interface –  Allows self-logging applications –  R-GMA offers a global view of the VO information

  In one large relational DB: virtual database.   Registry stores localization tuples (database rows) published by

producers: •  Standard Tables: CE state in GLUE Schema (by R-GMA-GIN) •  Applications specific tables (e.g. self-logging with Log4j) •  Access by SQL queries through a WS interface.

Introduction to High Performance and Grid Computing 29

Page 30: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Information System (10/10)

Introduction to High Performance and Grid Computing 30

Page 31: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

User Interface (UI)

•  UI is the user’s interface to the Grid - Command-line interface to –  Attribute/Proxy certificate –  Job operations

  To submit a job   Monitor its status   Retrieve output

–  Data operations   Upload file to SE   Create replica   Discover replicas

–  Other grid services •  To run a job user creates a JDL (Job Description

Language) file

Introduction to High Performance and Grid Computing 31

Page 32: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Computing Element (CE)

Introduction to High Performance and Grid Computing 32

Homogeneous set of worker nodes (WNs)

Grid gate node

Local resource management system: Condor / PBS / LSF master

Gatekeeper

Job request

Loc. Info system

Logging

A&A

Information system

L&B

A CE is a grid batch queue with a “grid gate” front-end:

Page 33: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Storage Element (SE) •  Storage elements hold files: write once, read many •  Replica files can be held on different SE:

–  “close” to CE; share load on SE •  File Catalogue - what replicas exist for a file and where are they?

Introduction to High Performance and Grid Computing 33

Loc. Info System

Event Logging

A&A

GridFTP

Info system

L&B

Gatekeeper

File transfer Requests

Page 34: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Workload Management System

Introduction to High Performance and Grid Computing 34

Page 35: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 35

Page 36: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 36

Page 37: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 37

Page 38: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 38

Page 39: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 39

Page 40: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 40

Computing Element is the place where you jobs run

Page 41: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

•  Workload Manager Proxy WMProxy –  Provides access to WMS functionality through a Web Services based

interface –  Each job submitted to a WMProxy Service is given the delegated

credentials of the user who submitted it. –  These credentials can then be used to perform operations requiring

interactions with other services –  WMProxy advantages:

  web service, SOAP   job collections, DAG jobs, shared and compressed   sandboxes

–  WMProxy caveats:   needs delegated credentials   Delegate once,submit many

Introduction to High Performance and Grid Computing 41

Page 42: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

•  Workload Manager (WM) –  Is responsible for

  Calls Matchmaker to find the resource which best matches the job requirements.

  Interacting with Information System and File catalog.   Calculates the ranking of all the matchmaked resource

•  Information Supermarket (ISM) –  is responsible for

  basically consists of a repository of resource information that is available in read only mode to the matchmaking engine

•  Job Adapter –  is responsible for

  making the final touches to the JDL expression for a job, before it is passed to CondorC for the actual submission

  creating the job wrapper script that creates the appropriate execution environment in the CE worker node

•  transfer of the input and of the output sandboxes

Introduction to High Performance and Grid Computing 42

Page 43: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

•  Job Controller (JC) –  Is responsible for

  Converts the condor submit file into ClassAd   hands over the job to CondorC

•  Condor –  responsible for

  performing the actual job management operations: job submission, removal

•  Log Monitor –  is responsible for

  watching the Condor log file   intercepting interesting events concerning active jobs

•  events affecting the job state machine   triggering appropriate actions.

Introduction to High Performance and Grid Computing 43

Page 44: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

•  Task Queue –  Gives the possibility to keep track of the requests if no

resources are immediatelly avalaible –  Non-matching requests will be retried periodically (eager

scheduling) –  Or wait for notification of avalaible resources (lazy

scheduling)

Introduction to High Performance and Grid Computing 44

Page 45: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 45

Computing Element is built on a homogeneous farm of computing nodes (called Worker Nodes)

Also there are many components inside CE such as gatekeeper, globus-jobmanager, ..

Page 46: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 46

Gatekeeper Grants access to the CE and map grid user to a local user id.

Page 47: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 47

Batch System A cluster of compute nodes controlled by a head node. handles the job execution Example: Torque (Open PBS), PBS

Page 48: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 48

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

WMS

Location of files

Characteristics of resources

Page 49: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 49

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

Input Sandbox files

JDL waiting

submitted

Daemon responsible for accepting incoming requests

WMS

glite-wms-job-submit myjob.jdl

Page 50: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 50

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

waiting

submitted

Job Status

WM: responsible to take the appropriate actions to satisfy the request

Job

WMS

Page 51: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 51

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

waiting

submitted

Match- Maker/ Broker

Where this job can be executed ?

WMS

Page 52: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 52

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

waiting

submitted

Match- Maker/ Broker

Matchmaker: responsible to find the “best” CE where to submit a job

WMS

Page 53: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 53

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

waiting

submitted

Match- Maker/ Broker

Where is the needed InputData ?

What is the status of the

Grid ?

WMS

Page 54: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 54

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

waiting

submitted

Match- Maker/ Broker

CE choice

WMS

Page 55: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 55

WMS

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

waiting

submitted

Job Adapter

JA: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, etc.)

Page 56: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 56

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

JC: responsible for the actual job management operations (done via CondorG)

Job

submitted

waiting

ready

WMS

Page 57: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 57

WMS

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

CE characts & status

SE characts & status

RB storage

Job

Input Sandbox files

submitted

waiting

ready

scheduled

Page 58: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 58

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

RB storage

Input Sandbox

submitted

waiting

ready

scheduled

running

“Grid enabled” data transfers/

accesses

Job

WMS

Page 59: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 59

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

RB storage

Output Sandbox files

submitted

waiting

ready

scheduled

running

done WMS

Page 60: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 60

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

RB storage

Output Sandbox

submitted

waiting

ready

scheduled

running

done

glite-wms-get-output <jobID>

WMS

Page 61: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 61

UI

Network Daemon

Job Contr. -

CondorG

Workload Manager

LFC

Inform. Service

Computing Element

Storage Element

RB storage

Output Sandbox files

submitted

waiting

ready

scheduled

running

done

cleared

WMS

Page 62: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing 62

UI

Logging & Bookkeeping

Network Daemon

Job Contr. -

CondorG

Workload Manager

Computing Element

LB: receives and stores job events; processes corresponding job status

Log of job events

Job status

glite-wms-job-status <jobID> glite-wms-job-logging-info <jobID>

WMS

LB proxy

Page 63: Faculty of Sciences, University of Novi Sad Architecture ...wiki.ipb.ac.rs/images/2/26/Introduction_to_High_Performance_and_G… · EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

Other Grid services

•  PX (MyProxy) •  FTS (File Transfer Service) •  LFC (Logical File Catalog) •  AMGA (ARDA Metadata Grid Application)

Introduction to High Performance and Grid Computing 63