
Page 1: EGEE Project and Middleware Overview Marco Verlato CYCLOPS Second Training Workshop 5-7 May 2008 Chania, Greece.

EGEE Project and Middleware Overview

Marco Verlato

CYCLOPS Second Training Workshop

5-7 May 2008

Chania, Greece

Page 2

Outline

Introduction
The EGEE project
– Infrastructure
– Applications
– Operations and Support
The EGEE Middleware: gLite
– Grid access services
– Security services
– Information & Monitoring services
– Data Management services
– Job Management services
Further information

Page 3

What is a Grid?

“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities”

Ian Foster -- Carl Kesselman, 1998

“A grid is a combination of networked resources and the corresponding middleware, which provides services for the user”

Erwin Laure, EGEE T.D., ISSGC2007

The users of a Grid are divided into Virtual Organisations (VOs), abstract entities grouping users, institutions and resources, e.g.: the 4 LHC experiments, the community of biomedical researchers, etc

Page 4

What is a Grid?

It relies on advanced software, called middleware

Middleware automatically finds the data the scientist needs, and the computing power to analyse it

Middleware balances the load on different resources. It also handles security, accounting, monitoring and much more

Page 5

Enabling Grids for E-sciencE project

Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …

>250 sites, 48 countries, >50,000 CPUs, >20 PetaBytes, >10,000 users, >150 VOs, >150,000 jobs/day

Flagship Grid infrastructure project co-funded by the European Commission, started in April 2004
Now entering its 3rd phase

Page 6

Disciplines and users

Disciplines: Astrophysics and astroparticle physics; Biomedical and bioinformatics; Computational chemistry; Earth sciences; Finance; Fusion; Geophysics; High Energy Physics; Infrastructure; Others (digital libraries, disaster recovery, computational sciences, etc.)

Registered VOs (in the order they appear on the slide): argo, libi, enmr.eu, aegis, inaf, bio, trgrida, apesci, pamela, biomed, compchem, astron, astro.vo.eu-egee.org, embrace, gaussian, cesga, planck, enea, virgo, grid-it, magic, calice, edteam, gridmosi.ici.ro, auger, hone, euindia, lights.infn.it, ific, ops, ncf, ildg, pvier, vo.agata.org, trgridc, minos.vo.gridpp.ac.uk, rdteam, vo.ipno.in2p3.fr, esr, pheno, rgstest, vo.northgrid.ac.uk, supernemo.vo.eu-egee.org, swetest, webcom, vo.lal.in2p3.fr, vo.deploymenttest.cea.fr, geant4, egeode, vo.llr.in2p3.fr, vo.e-ca.es, imath.cesga.es, vo.lpnhe.in2p3.fr, vo.grif.fr, proactive, vo.sbg.in2p3.fr, infngrid, cosmo, egrid, hermes, eela, crypto.swing-grid.ch, vo.dapnia.cea.fr, eumed, diligent, alice, dteam, cyclops, fusion, atlas, vo.plgrid.pl, geclipse, babar, balticgrid, gridcc, belle, dech, cdf, see, cms, seegrid, dzero, twgrid, gridpp, trgrida/b/c/d/e, ilc, voce, lhcb, na48, zeus, ghep, desy

~8000 users listed in registered VOs

http://cic.gridops.org/index.php?section=home&page=volist

Page 7

Types of applications

Simulation
– LHC Monte Carlo simulations; Fusion; WISDOM
– Jobs needing significant processing power; large number of independent jobs; limited input data; significant output data
Bulk Processing
– HEP; processing of satellite data
– Distributed input data; large amount of input and output data; job management (WMS); metadata services; complex data structures
Parallel Jobs
– Climate models, computational chemistry
– Large number of independent but communicating jobs; need for simultaneous access to large number of CPUs; MPI libraries
Short-response delays
– Prototyping new applications; Grid monitoring; interactivity
– Limited input & output data and processing needs, but fast response and quality of service required
Workflow
– Medical imaging; flood analysis
– Complex analysis algorithms; complex dependencies between jobs
Commercial Applications
– Non-open source software; Geocluster (seismic platform); FlexX (molecular docking); Matlab, Mathematica; IDL, …
– License server associated to an application deployment model

Page 8

High Energy Physics Applications

[Detector figure labels: muon chambers, calorimeter, tracker]

pp @ √s = 14 TeV, L: 10^34 /cm^2/s

L: 2×10^32 /cm^2/s

2.5 million collisions per second; LVL1: 10 kHz, LVL3: 50-100 Hz; 25 MB/sec digitized recording

40 million collisions per second; LVL1: 1 kHz, LVL3: 100 Hz; 0.1 to 1 GB/sec digitized recording

Page 9

In silico drug discovery

Diseases such as HIV/AIDS, SARS, bird flu etc. are a threat to public health due to worldwide exchanges and circulation of persons
Grids open new perspectives to in silico drug discovery
– Reduced cost and an accelerating factor in the search for new drugs
International collaboration is required for:
• Early detection
• Epidemiological watch
• Prevention
• Search for new drugs
• Search for vaccines

[Figure: avian influenza, bird casualties]

Page 10

Wide In Silico Docking On Malaria

http://wisdom.healthgrid.org/

Page 11

Earth Sciences Applications

– ESA, UTV (IT), KNMI (NL), IPSL (FR): production and validation of 7 years of ozone profiles from GOME
– Rapid earthquake analysis (mechanism and epicenter), 50-100 CPUs, IPGP (FR)
– Modelling seawater intrusion in coastal aquifer (SWIMED): CRS4 (IT), INAT (TU), Univ. Neuchâtel (CH)
– Geocluster for academy and industry: CGG (FR)
– Flood of a Danube river: cascade of models (meteorology, hydraulic, hydrodynamic…), UISAV (SK)
– Specfem3D: seismic application, benchmark for MPI (2 to 2000 CPUs) (IPGP, FR)
– DKRZ (DE): data access studies, climate impacts on agriculture
– Data mining, meteorology & space weather (GCRAS, RU)
– Air pollution model: BAS (BG)
– Mars atmosphere: CETP (FR)

Page 12

EGEE workload in 2007

CPU: 114 million hours
Data:
– 25 PB stored
– 11 PB transferred

Estimated cost if performed with Amazon’s EC2 and S3: € 47,486,548
http://gridview.cern.ch/GRIDVIEW/same_index.php
http://calculator.s3.amazonaws.com/calc5.html?

[Pie chart of the cost split. Legend: storage, CPU, Xfer. Slice values: 16%, 82%, 2% (pairing not preserved in the transcript)]

Page 13

EGEE-II to EGEE-III

EGEE-III
– To be co-funded under European Commission call INFRA-2007-1.2.3
– 32M€ EC funds compared to 36M€ for EGEE-II
Key objectives
– Expand/optimise existing EGEE infrastructure, include more resources and user communities
– Prepare migration from a project-based model to a sustainable federated infrastructure based on National Grid Initiatives
2 year period: May 2008 to April 2010
– No gap between EGEE-II and EGEE-III (1 month extension to EGEE-II)
Similar consortium
– Now structured on a national basis (National Grid Initiatives/Joint Research Units)

Networking activities
– NA1: Management
– NA2: Dissemination
– NA3: Training
– NA4: Applications
– NA5: Inter. Coop. & Policy
Specific Service Activities
– SA1: Operations
– SA2: Networking Support
– SA3: Integ., testing & Cert.
Joint Research Activities
– JRA1: Middleware engineering

Page 14

European Grid Initiative (EGI)

Need to prepare permanent, common Grid infrastructure
Ensure the long-term sustainability of the European e-Infrastructure independent of short project funding cycles
Coordinate the integration and interaction between National Grid Infrastructures (NGIs)
Operate the production Grid infrastructure on a European level for a wide range of scientific disciplines

Must be no gap in the support of the production grid

Page 15

EGEE operations

Operations Coordination Centre (OCC)
- management, oversight of all operational and support activities
Regional Operations Centres (ROC)
- providing the core of the support infrastructure, each supporting a number of resource centres within its region
Resource Centres (RC)
- providing resources (computing, storage, network…)
Global Grid User Support (GGUS)
- at FZK, coordination and management of user support, single point of contact for users

Page 16

Monitoring Visualization

Page 17

The EGEE support infrastructure

[Diagram labels: GGUS Central System; TPM; VO TPM A, B, C; ROC B, ROC C, … ROC N, each with RC A, RC B, RC C; Middleware support; Deployment support; VO Support A, B, C; Network Support; Other Grids; COD; CIC Portal]

Page 18

The GILDA t-Infrastructure (https://gilda.ct.infn.it)

• 20 sites in 3 continents
• > 11000 certificates issued, >20% renewed at least once
• > 250 courses, training events, official university curricula
• > 2,000,000 hits on the web site from >100 different countries
• > 4.5 TB of training material downloaded from the web site

Page 19

e-Infrastructure projects & other Grids

e-Infrastructures adopting gLite
~80 countries “linked” together!
e-Infrastructures interoperable or in progress to be made interoperable with gLite

Page 20

EGEE Middleware Distribution

Combines components from different providers
– Condor and Globus (via VDT)
– LCG (LHC Computing Grid)
– EDG (European Data Grid)
– Others
After prototyping phases in 2004 and 2005, convergence with LCG-2 distribution reached in May 2006
– gLite 3.0 released in May 2006, current release is 3.1
Develop a lightweight stack of generic middleware useful to EGEE applications
– Pluggable components: cater for different implementations
– Follow SOA approach, WS-I compliant where possible
Focus now is on re-engineering and hardening
Business friendly open source license: Apache 2.0

[Timeline figure: LCG-2 (product) and gLite (prototyping) in 2004 and 2005; gLite 3.0 in 2006]

Page 21

The middleware structure

Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
Higher-Level Grid Services are supposed to help users build their computing infrastructure but should not be mandatory
Foundation Grid Middleware will be deployed on the EGEE infrastructure
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services

Page 22

gLite services orchestration

[Diagram components: User Interface; Workload Management; Logging & Bookkeeping; Information System; File and Replica Catalogs; Authorization Service; Site X with its Computing Element and Storage Element. Arrow labels: submit, query, retrieve, publish state, update credential, discover services]

Page 23

gLite services decomposition

Access: CLI, API
Security Services: Authentication, Authorization, Auditing
Information & Monitoring Services: Information & Monitoring, Job Monitoring
Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
Job Mgmt. Services: Computing Element, Workload Management, Accounting, Job Provenance, Package Manager

Overview paper http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf

Page 24

Grid Access

The access point to the EGEE Grid is the User Interface (UI)
It provides the CLI tools to access the functionalities offered by the gLite Services
They allow users to perform some basic Grid operations:
– create the user proxy needed for authentication/authorization
– retrieve the status of different resources from the Information System
– copy, replicate and delete files from the Grid
– list all the resources suitable to execute a given job
– submit jobs for execution
– cancel jobs
– retrieve the output of finished jobs
– show the status of submitted jobs
– retrieve the logging and bookkeeping information of jobs
It provides the APIs to allow the development of Grid-enabled applications
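As a rough illustration of the basic operations available from the UI, the sketch below maps each one to a command line as commonly found on a gLite 3.1 UI. Only voms-proxy-init and glite-wms-job-submit appear in these slides; the other command names and options are assumptions recalled from typical gLite usage and should be checked against the installed release. The VO name, JDL file, LFN and job identifier are placeholders. The code only builds argument lists; it does not run anything.

```python
# Sketch only: typical gLite UI commands for the basic Grid operations.
# Every command except voms-proxy-init and glite-wms-job-submit is an
# assumption about a gLite 3.1 UI; verify names and options locally.

def ui_commands(vo, jdl, job_id, lfn):
    """Return {operation: argv list} for a hypothetical user session."""
    return {
        "create proxy": ["voms-proxy-init", "--voms", vo],
        "query information system": ["lcg-infosites", "--vo", vo, "ce"],
        "copy file from the Grid": ["lcg-cp", "--vo", vo, lfn, "file:/tmp/copy.dat"],
        "list matching resources": ["glite-wms-job-list-match", "-a", jdl],
        "submit job": ["glite-wms-job-submit", "-a", jdl],
        "cancel job": ["glite-wms-job-cancel", job_id],
        "retrieve output": ["glite-wms-job-output", job_id],
        "job status": ["glite-wms-job-status", job_id],
        "job logging info": ["glite-wms-job-logging-info", job_id],
    }

commands = ui_commands(
    vo="cyclops",                                   # placeholder VO
    jdl="myjob.jdl",                                # placeholder JDL file
    job_id="https://wms.example.org:9000/abc123",   # placeholder job id
    lfn="lfn:/grid/cyclops/demo.dat",               # placeholder LFN
)
for operation, argv in commands.items():
    print(f"{operation:26s} {' '.join(argv)}")
```

Keeping the commands as argument lists rather than shell strings makes it straightforward to pass them to subprocess.run later without quoting problems.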

Page 25

Security Services

GSI Authentication based on PKI X.509 SSL infrastructure
• Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport)
• to reduce vulnerability, on the Grid user identification is done by using (short lived) proxies of their certificates (they can be stored on MyProxy servers)
• users belong to VOs, to groups inside a VO and may have special roles
VOMS provides a way to add attributes to a certificate proxy
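The attributes that VOMS adds to a proxy are usually written as Fully Qualified Attribute Names (FQANs) of the form /vo/group/Role=role/Capability=cap. That syntax is not shown on the slide and is stated here as background. The sketch below splits such a string into VO, group and role. The group and role names are hypothetical, and the parser is a simplified illustration, not the VOMS API.

```python
# Sketch only: split a VOMS FQAN into VO, group path and role.
# The example FQANs are hypothetical.

def parse_fqan(fqan):
    """Return a dict with the VO, the full group path and the role (or None)."""
    parts = [p for p in fqan.split("/") if p]
    groups = [p for p in parts if "=" not in p]          # VO and subgroups
    fields = dict(p.split("=", 1) for p in parts if "=" in p)
    role = fields.get("Role")
    return {
        "vo": groups[0],
        "group": "/" + "/".join(groups),
        "role": None if role in (None, "NULL") else role,
    }

attrs = parse_fqan("/cyclops/italy/Role=production/Capability=NULL")
print(attrs)

plain = parse_fqan("/cyclops/Role=NULL/Capability=NULL")
print(plain)
```

A service can then base its authorization decision on the VO, the group and the role, instead of on the identity of each individual user.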

Page 26

Information & Monitoring Services / 1

Berkeley Database Information Index (BDII)
– Based on LDAP
– Standardized information provider (GIP)
– GLUE-1.3 schema
– Top level used with 230+ sites
– Roughly 60 instances in EGEE

[Diagram labels: top-level BDII, site-level BDII, resource BDII, MDS GRIS, providers, Site; queries from WMS, WN, UI, FTS; 2 minutes]
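Because the BDII is an LDAP server publishing the GLUE 1.3 schema, it can be queried with any LDAP client. The sketch below builds an LDAP filter for the Computing Elements open to one VO and the matching ldapsearch command line. The host name is hypothetical. Port 2170 and the base o=grid are the values commonly used for a top-level BDII and are assumptions here, since the slide does not give them. Nothing is executed.

```python
# Sketch only: build (not run) an anonymous LDAP query against a BDII.
# Host name is hypothetical; port 2170 and base "o=grid" are assumed defaults.

def glue_ce_filter(vo):
    """LDAP filter selecting GLUE 1.3 Computing Elements that accept a VO."""
    return f"(&(objectClass=GlueCE)(GlueCEAccessControlBaseRule=VO:{vo}))"

def ldapsearch_argv(host, ldap_filter, attributes, port=2170, base="o=grid"):
    """Return the argv list for an ldapsearch call with simple anonymous bind."""
    url = f"ldap://{host}:{port}"
    return ["ldapsearch", "-x", "-LLL", "-H", url, "-b", base,
            ldap_filter, *attributes]

argv = ldapsearch_argv(
    "bdii.example.org",
    glue_ce_filter("cyclops"),
    ["GlueCEUniqueID", "GlueCEStateFreeCPUs"],
)
print(" ".join(argv))
```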

Page 27

Information & Monitoring Services / 2

For users R-GMA appears similar to a single relational database
Implementation of OGF’s Grid Monitoring Architecture (GMA)
Rich set of APIs (web browsers, Java, C/C++, Python)
Typical deployment consists of Producer and Consumer Services on a one per site basis (MON box), and a centralized Registry and Schema

[Diagram: a Producer application uses the API of the Producer Service to publish tuples (SQL “INSERT”); a Consumer application uses the API of the Consumer Service to send a query (SQL “SELECT”) and receive tuples; the Producer Service registers with the Registry Service, the Consumer Service locates producers through it, and the Schema Service holds the table definitions (SQL “CREATE TABLE”)]
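The "single relational database" view can be illustrated with an analogy. The sketch below uses Python's built-in sqlite3 module, not the R-GMA API, to play the three roles: a table definition standing in for the Schema, INSERT statements standing in for producers publishing tuples, and a SELECT standing in for a consumer query. The table, site and host names are invented for the example.

```python
import sqlite3

# Analogy only: sqlite3 stands in for the virtual database that R-GMA presents.
db = sqlite3.connect(":memory:")

# Schema role: the table definition shared by producers and consumers.
db.execute("CREATE TABLE cpu_load (site TEXT, host TEXT, load_avg REAL)")

# Producer role: each site publishes tuples with SQL INSERT.
published = [
    ("site-a", "wn01", 0.42),
    ("site-a", "wn02", 0.87),
    ("site-b", "wn01", 0.15),
]
db.executemany("INSERT INTO cpu_load VALUES (?, ?, ?)", published)

# Consumer role: one SQL SELECT sees the tuples from all sites.
rows = db.execute(
    "SELECT site, host FROM cpu_load WHERE load_avg > 0.4 ORDER BY site, host"
).fetchall()
print(rows)
```

In R-GMA the tuples stay distributed across the producers and the Registry tells the consumer where to find them; the single in-memory table here only mimics what the user sees.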

Page 28

GridICE monitoring tool

Page 29

Data Services /1

Heterogeneity
– Data is stored on different storage systems using different access technologies
Distribution
– Data is stored in different locations; in most cases there is no shared file system or common namespace
– Data needs to be moved between different locations
Data description
– Data are stored as files: need a way to describe files and locate them according to their contents

– Need common interface to storage resources: Storage Resource Manager (SRM)
– Need to keep track where data is stored: File and Replica Catalogs
– Need scheduled, reliable file transfer: File transfer services
– Need a way to describe files’ content and query them: Metadata catalog

Page 30

Data Services /2

The Storage Resource Manager interface is the basis for the gLite Storage Elements (SE)
– hides the storage system implementation
– handles the authorization based on VOMS credentials
– POSIX-like access to SRM via GFAL (Grid File Access Layer)
The LCG File Catalogue (LFC) keeps track of file replicas on the grid
– Logical File Name (LFN): an alias created by a user to refer to some item of data
– Global Unique Identifier (GUID): a non-human-readable unique identifier for an item of data
– Site URL (SURL): indicates where (on which Storage Element) the file is actually found; understood by the SRM interface
– Transport URL (TURL): temporary locator of a replica plus access protocol, understood by the backend MSS
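The relationship between the file names can be sketched as a small data structure: one GUID, any number of LFN aliases, and one SURL per replica. All names, hosts and paths below are invented for illustration, and the class is not the LFC API. The TURL is left out because it is obtained from the SRM at access time rather than stored in the catalogue.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Sketch only: a toy catalogue entry; every name below is hypothetical.
@dataclass
class CatalogEntry:
    guid: str                                    # unique, not human readable
    lfns: list = field(default_factory=list)     # user-chosen aliases
    surls: list = field(default_factory=list)    # one per replica

def storage_element(surl):
    """Host part of a SURL: the Storage Element holding that replica."""
    return urlparse(surl).hostname

entry = CatalogEntry(
    guid="guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d",
    lfns=["lfn:/grid/cyclops/run42/events.dat"],
    surls=[
        "srm://se01.example.org/dpm/example.org/home/cyclops/file1",
        "srm://se02.example.net/dpm/example.net/home/cyclops/file1",
    ],
)

# Resolve an LFN to the Storage Elements that hold a replica.
catalog = {lfn: entry for lfn in entry.lfns}
replica_hosts = [storage_element(s)
                 for s in catalog["lfn:/grid/cyclops/run42/events.dat"].surls]
print(replica_hosts)
```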

Page 31

Job Management Services /1

The Computing Element (CE) is the front-end to the local farm (cluster, batch system)
– several implementations: Torque/Maui, PBS, LSF, Condor, SGE
– the CE is usually installed on the master node of the farm; slave nodes run the Worker Node (WN) software
– typically the CE also runs the site BDII, providing information to the top BDII
– application software is installed on the CE in a shared area
The CE receives users’ jobs from the WMS
– there are different queues with different priorities
– jobs are sent to the batch system, which executes them on the WNs
Output is then copied back to the WMS

Page 32

Job Management Services /2

CREAM: Web Service Computing Element
– CREAM WSDL allows defining custom user interface
– C++ CLI interface allows direct submission
Lightweight
Fast notification of job status changes
– via CEMon
Improved security
– no “fork-scheduler”
Will support bulk jobs on the CE
– optimization of staging of input sandboxes for jobs with shared files
ICE: Interface to CREAM Environment
– being integrated in WMS for submissions to CREAM

Page 33

ENEA-Grid approach to provide access to AIX

A solution to current known limitations:
1) gLite must be installed on each WN: only Intel/SL machines
2) gLite WN must communicate with RB: security/firewall management issues

it also works with NFS or GPFS
it also works with rsh or ssh

Invasiveness of the grid middleware and firewall requirements are minimized!

Page 34

Job Management Services /3

WMS: Resource brokering, workflow management, I/O data management
Web Service interface: WMProxy
– Task Queue: keep non matched jobs
– Information SuperMarket: optimized cache of information system
– Match Maker: assigns jobs to resources according to user requirements (possibly including data location)
– Job submission & monitoring: Condor-G, ICE (to CREAM)
– External interactions: Information System, Data Catalogs, Logging & Bookkeeping, Policy Management systems
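The idea behind the Match Maker can be sketched in a few lines: drop the resources that fail the job's requirements, then order the rest by a rank expression. This is a simplified illustration under assumed data, not the WMS implementation. The resource descriptions and attribute names are hypothetical stand-ins for what the Information SuperMarket would cache.

```python
# Sketch only: requirements filter plus rank ordering, as a toy match maker.
# Resource records and attribute names are hypothetical.

def match_make(resources, requirements, rank):
    """Return the resources that satisfy the requirements, best rank first."""
    matching = [r for r in resources if requirements(r)]
    return sorted(matching, key=rank, reverse=True)

resources = [
    {"ce": "ce-a.example.org", "arch": "INTEL", "os": "LINUX", "benchmark": 1200},
    {"ce": "ce-b.example.org", "arch": "INTEL", "os": "LINUX", "benchmark": 1800},
    {"ce": "ce-c.example.org", "arch": "POWER", "os": "AIX", "benchmark": 2500},
]

ranked = match_make(
    resources,
    requirements=lambda r: r["arch"] == "INTEL" and r["os"] == "LINUX",
    rank=lambda r: r["benchmark"],
)
print([r["ce"] for r in ranked])
```

A job for which no resource passes the filter would stay in the Task Queue until a later matching attempt succeeds.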

Page 35

Advanced scheduling

A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
A Collection is a group of jobs with no dependencies
– basically a collection of JDLs
A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
– Submission time reduction: single call to WMProxy server; single Authentication and Authorization process; sharing of files between jobs
– Availability of both a single Job Id to manage the group as a whole and an Id for each single job in the group

[DAG figure with nodes nodeA, nodeB, nodeC, nodeD, nodeE]
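A DAG job constrains the order in which its nodes may run. The sketch below uses the five node names from the figure with hypothetical dependencies (the edges are not recoverable from the transcript) and derives a valid execution order with Python's standard graphlib module (Python 3.9 or later). It illustrates the concept only and is unrelated to how the WMS schedules DAGs internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# node -> set of nodes that must finish first (hypothetical edges)
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

nodeB and nodeC do not depend on each other, so either may come first and both could run in parallel. A Collection corresponds to the special case in which every dependency set is empty.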

Page 36

Logging & Bookkeeping (LB)

Tracks jobs in terms of events gathered from various gLite components
Processes them to give a higher-level view of the job states
Provides interfaces for querying L&B and registering for notifications
Often deployed on the same machine as the WMS, but can be remote
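How low-level events turn into a job state can be sketched as a lookup over the event stream. The state names below follow the usual gLite job states. The event names are deliberately simplified, hypothetical labels and not the real L&B event types. The real service also copes with events that arrive late or out of order, which this sketch ignores.

```python
# Sketch only: derive a job state from an ordered list of events.
# Event names are hypothetical simplifications of the real L&B events.
EVENT_TO_STATE = {
    "registered": "SUBMITTED",
    "accepted_by_wms": "WAITING",
    "matched": "READY",
    "queued_on_ce": "SCHEDULED",
    "started_on_wn": "RUNNING",
    "finished": "DONE",
    "output_retrieved": "CLEARED",
}

def job_state(events):
    """Return the state implied by the last recognised event, or None."""
    state = None
    for event in events:
        state = EVENT_TO_STATE.get(event, state)  # unknown events change nothing
    return state

current = job_state(["registered", "accepted_by_wms", "matched", "queued_on_ce"])
print(current)
```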

Page 37

Job submission example

[Diagram: User Interface, Authorization Service, Resource Broker, Job Submission Service, Information Service, Replica Catalog, Logging & Book-keeping, Computing Element, Storage Element; labels: JDL, Input Sandbox, Output Sandbox, Job Submit Event, Job Status, Job, GSI data acc/transf]

voms-proxy-init
glite-wms-job-submit myjob.jdl

myjob.jdl

Executable = "gridTest";
StdError = "stderr.log";
StdOutput = "stdout.log";
InputSandbox = {"/home/joda/test/gridTest"};
OutputSandbox = {"stderr.log", "stdout.log"};
InputData = "lfn:testbed0-00019";
DataAccessProtocol = "gridftp";
Requirements = other.Architecture=="INTEL" && other.OpSys=="LINUX";
Rank = "other.GlueHostBenchmarkSF00";
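JDL files such as the one above are plain text, so they are easy to generate from a script when many similar jobs are needed. The sketch below is a hypothetical helper, not part of gLite, that renders a dictionary as JDL: strings are quoted, lists become brace-delimited lists, and expressions such as Requirements are written unquoted.

```python
# Sketch only: render a JDL description from Python data (hypothetical helper).

def jdl_value(value):
    """Quote a string, or render a list as a JDL list of quoted strings."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return '"' + str(value) + '"'

def render_jdl(attributes, expressions=None):
    """Return JDL text; 'expressions' holds values written without quotes."""
    lines = [f"{name} = {jdl_value(value)};" for name, value in attributes.items()]
    lines += [f"{name} = {expr};" for name, expr in (expressions or {}).items()]
    return "\n".join(lines)

jdl = render_jdl(
    {
        "Executable": "gridTest",
        "StdError": "stderr.log",
        "StdOutput": "stdout.log",
        "InputSandbox": ["/home/joda/test/gridTest"],
        "OutputSandbox": ["stderr.log", "stdout.log"],
    },
    expressions={
        "Requirements": 'other.Architecture=="INTEL" && other.OpSys=="LINUX"',
    },
)
print(jdl)
```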

Page 38

Further information

2nd Iberian Grid Infrastructure Conference: 12-14 May 2008, Porto (Portugal), joint with CYCLOPS Project Conference
www.ibergrid.eu/2008
EGEE’08 Conference: 22-26 September 2008, Istanbul (Turkey)
www.eu-egee.org/egee08
EGEE digital library: egee.lib.ed.ac.uk
– Needs certificate (GILDA or national CA in browser)

EGEE www.eu-egee.org
gLite www.glite.org
GILDA https://gilda.ct.infn.it/
LCG lcg.web.cern.ch/LCG
Open Grid Forum www.gridforum.org
Globus Alliance www.globus.org
VDT www.cs.wisc.edu/vdt/