ARDA status and plans


Page 1: ARDA status and plans

INFSO-RI-508833

Enabling Grids for E-sciencE

http://arda.cern.ch

ARDA status and plans
Massimo Lamanna / CERN

Page 2: ARDA status and plans

Overview

• ARDA prototypes
  – 4 experiments
• ARDA feedback
  – Middleware components on the development test bed
• ARDA workshops
• ARDA in Den Haag
  – 2nd EGEE conference
• ARDA personnel and milestones
  – Status
  – 2005 (proposal)
• Conclusions

Page 3: ARDA status and plans

The ARDA project

• ARDA is an LCG project
  – Its main activity is to enable LHC analysis on the grid
  – ARDA contributes to EGEE NA4 and uses the entire CERN NA4-HEP resource
• Interface with the new EGEE middleware (gLite)
  – By construction, use the new middleware
  – Use the grid software as it matures
  – Verify the components in an analysis environment (users!)
  – Provide early and continuous feedback

Page 4: ARDA status and plans

ARDA prototypes

Page 5: ARDA status and plans

Prototype overview

LHC experiment | Main focus                          | Basic prototype component | Experiment analysis application framework
LHCb           | GUI to Grid                         | GANGA                     | DaVinci
ALICE          | Interactive analysis                | PROOF, ROOT               | AliROOT
ATLAS          | High-level services                 | DIAL                      | Athena
CMS            | Exploit native gLite functionality  | Aligned with APROM        | ORCA

Page 6: ARDA status and plans

ARDA contributions

• Integrating with gLite
  – Enabling job submission through GANGA to gLite
  – Job splitting and merging
  – Result retrieval
• Enabling real analysis jobs to run on gLite
  – Running DaVinci jobs on gLite (custom code: user algorithms)
  – Installation of LHCb software using the gLite package manager
• Participating in the overall development of Ganga
  – Software process (initially): CVS, Savannah, release management
  – Major contribution to new versions: CLI, “Ganga clients”
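The job splitting and merging mentioned above can be sketched in Python (Ganga itself is Python-based). The function names below are illustrative only, not the actual Ganga API:

```python
# Illustrative sketch of job splitting/merging as described above;
# split_job and merge_outputs are hypothetical names, not Ganga API.

def split_job(input_files, files_per_subjob):
    """Split a master job's input file list into sub-job chunks."""
    return [input_files[i:i + files_per_subjob]
            for i in range(0, len(input_files), files_per_subjob)]

def merge_outputs(subjob_outputs):
    """Collect the outputs of all sub-jobs back into one result list."""
    merged = []
    for out in subjob_outputs:
        merged.extend(out)
    return merged

subjobs = split_job([f"data_{i}.dst" for i in range(10)], files_per_subjob=4)
# 3 sub-jobs of 4 + 4 + 2 files; after the sub-jobs run, merge_outputs
# gathers their results back under the master job
```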

Page 7: ARDA status and plans

Related activities

• GANGA-DIRAC (LHCb production system)
  – Convergence with GANGA/components/experience
  – Submitting jobs to DIRAC using GANGA
• GANGA-Condor
  – Enabling submission of jobs through GANGA to Condor
• LHCb metadata catalogue performance tests
  – In collaboration with colleagues from Taiwan
  – New activity started using the ARDA metadata prototype (new version, collaboration with GridPP people)

Page 8: ARDA status and plans

Current Status

• GANGA job submission handler for gLite is developed
• DaVinci job runs on gLite, submitted through GANGA

Demos in Rio and Den Haag
Presented in the LHCb software week

Page 9: ARDA status and plans

Ganga clients

Page 10: ARDA status and plans

ALICE prototype

• ROOT and PROOF
• ALICE provides
  – the UI
  – the analysis application (AliROOT)
• The gLite Grid middleware provides all the rest
• ARDA/ALICE is evolving the ALICE analysis system end to end: UI shell, application, middleware

Page 11: ARDA status and plans

[Diagram: a user session connects to the PROOF master server, which drives PROOF slaves distributed over Sites A, B and C]

Page 12: ARDA status and plans

Interactive Session

Demo in the ALICE software week
Demo at Supercomputing 04 and Den Haag

Page 13: ARDA status and plans

Current Status

• Developed gLite C++ API and API Service – providing generic interface to any GRID service

• C++ API is integrated into ROOT – will be added to the next ROOT release – job submission and job status query for batch analysis can be done from inside ROOT

• Bash interface for gLite commands with catalogue expansion is developed– More powerful than the original shell– Ready for integration– Considered a “generic” mw contribution (essential for ALICE, interesting in general)

• First version of the interactive analysis prototype is ready

• Batch analysis model is improved
  – submission and status query are integrated into ROOT
  – job splitting based on XML query files
  – the application (AliROOT) reads files using xrootd without prestaging
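The XML-query-file splitting above might look roughly like this in Python; the file format and tag names are assumptions for illustration, not the actual ALICE/gLite schema:

```python
# Hypothetical sketch: split a batch analysis job using an XML query
# file listing input files (tag names are illustrative, not the real schema).
import xml.etree.ElementTree as ET

QUERY_XML = """
<collection>
  <file lfn="/alice/run1/AliESDs_001.root"/>
  <file lfn="/alice/run1/AliESDs_002.root"/>
  <file lfn="/alice/run1/AliESDs_003.root"/>
</collection>
"""

def split_from_query(xml_text, files_per_subjob):
    """Parse the query file and chunk the LFNs into sub-job inputs."""
    lfns = [f.get("lfn") for f in ET.fromstring(xml_text).iter("file")]
    return [lfns[i:i + files_per_subjob]
            for i in range(0, len(lfns), files_per_subjob)]

subjobs = split_from_query(QUERY_XML, files_per_subjob=2)
```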

Page 14: ARDA status and plans

ATLAS/ARDA

• Main component:
  – Contribute to the DIAL evolution (gLite analysis server)
• “Embedded in the experiment”
  – AMI tests and interaction
  – Production and CTB tools
  – Job submission (ATHENA jobs)
  – Integration of the gLite Data Management within Don Quijote
• Benefit from the other experiments’ prototypes
  – First look at interactivity/resiliency issues: “agent-based” approach (à la DIRAC)
  – GANGA (principal component of the LHCb prototype, key component of the overall ATLAS strategy)

Presentations tomorrow (ADA meeting)

Page 15: ARDA status and plans

Data Management: Don Quijote

[Diagram: a DQ client talks to DQ servers for NorduGrid, gLite, LCG and GRID3; each grid has its own RLS and SE. Don Quijote locates and moves data over grid boundaries.]

ARDA has connected gLite

Presentation tomorrow (ADA meeting)

Page 16: ARDA status and plans

ATCOM @ CTB

• Combined Testbeam
  – Various extensions were made to accommodate the new database schema used for CTB data analysis.
  – New panes to edit transformations, datasets and partitions were implemented.
• Production System
  – A first step is to provide a prototype with limited functionality, but support for the new production system.

Presentation tomorrow (ADA meeting)

Page 17: ARDA status and plans

Combined Test Beam

Example: ATLAS TRT data analysis done by PNPI St. Petersburg (number of straw hits per layer)

• Real data processed on gLite
• Standard Athena for testbeam
• Data from CASTOR
• Processed on a gLite worker node

Page 18: ARDA status and plans

ATLAS: first look in interactivity matters

Presentation tomorrow

(ADA meeting)

Page 19: ARDA status and plans

CMS Prototype

• Aims at an end-to-end prototype for CMS analysis jobs on gLite
• Uses the native middleware functionality of gLite
• Custom code only for a few CMS-specific tasks on top of the middleware

[Workflow diagram: a workflow planner with a gLite back-end and command-line UI ties the components together]
  – RefDB: dataset and owner name defining a CMS data collection; points to the corresponding PubDB where the POOL catalog for a given data collection is published
  – PubDB: POOL catalog and a set of COBRA META files
  – The planner registers the required info in the gLite catalog, creates and submits jobs to gLite, queries their status and retrieves the output
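The planner's steps can be sketched as a small driver. Every class and attribute name here is hypothetical, standing in for the RefDB/PubDB lookups and gLite calls described above:

```python
# Hypothetical sketch of the workflow-planner steps described above.
# The RefDB/PubDB catalogs are stubbed with plain dicts; a real planner
# would register the info in the gLite catalog and submit jobs there.

class WorkflowPlanner:
    def __init__(self, refdb, pubdb):
        self.refdb = refdb    # (dataset, owner) -> PubDB key
        self.pubdb = pubdb    # PubDB key -> POOL catalog entries

    def plan(self, dataset, owner):
        """Resolve the data collection and build one job per catalog entry."""
        pubdb_key = self.refdb[(dataset, owner)]
        pool_entries = self.pubdb[pubdb_key]
        return [{"input": entry, "status": "submitted"} for entry in pool_entries]

refdb = {("bt03_DST", "analysis_group"): "pubdb_cern"}
pubdb = {"pubdb_cern": ["pool_file_1.root", "pool_file_2.root"]}
jobs = WorkflowPlanner(refdb, pubdb).plan("bt03_DST", "analysis_group")
```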

Page 20: ARDA status and plans

CMS – using MonALISA for user job monitoring

• A single job is submitted to gLite; the JDL contains job-splitting instructions
• The master job is split by gLite into sub-jobs
• Dynamic monitoring of the total number of events processed by all sub-jobs belonging to the same master job

Demo at Supercomputing 04

Page 21: ARDA status and plans

CMS – getting output from gLite

• When the jobs are over, the output files created by all sub-jobs belonging to the same master are retrieved by the Workflow Planner to the directory defined by the user.
• On user request, output files are merged by the Workflow Planner (currently implemented for ROOT trees and histograms).
• A ROOT session is then started by the Workflow Planner.
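Merging the sub-job outputs per master job can be sketched like this. This is a toy stand-in: real CMS merging operates on ROOT trees and histograms through ROOT, which is not shown here:

```python
# Toy sketch of per-master output merging, as described above.
# Sub-job "histograms" are plain dicts of bin counts; the real
# implementation merges ROOT trees/histograms.
from collections import defaultdict

def merge_histograms(subjob_hists):
    """Sum bin contents across all sub-jobs of one master job."""
    merged = defaultdict(int)
    for hist in subjob_hists:
        for bin_label, count in hist.items():
            merged[bin_label] += count
    return dict(merged)

merged = merge_histograms([
    {"pt_0_10": 5, "pt_10_20": 2},
    {"pt_0_10": 3, "pt_20_30": 1},
])
```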

Presentation and demo

Friday (APROM meeting)

Page 22: ARDA status and plans

Related Activities

• Job submission to gLite by PhySH
  – Physicist Shell
  – Integrates Grid tools
  – Collaboration with CLARENS
• ARDA also participates in…
  – the evolution of PubDB (effective access to data)
  – the redesign of RefDB (metadata catalog)

Page 23: ARDA status and plans

Prototype overview

LHC experiment | Middleware | Main concern of the prototype activity | Users (not belonging to the developer circles!)
LHCb           | OK         | Prototype size                          | Ready to have users (1 so far)
ALICE          | OK         | Prototype size                          | Ready to have users
ATLAS          | OK         | Prototype size                          | Ready to have users (2 for the CTB data)
CMS            | OK         | Prototype size                          | Ready to have users

Page 24: ARDA status and plans

Middleware feedback

Page 25: ARDA status and plans

Prototype Deployment

• Currently 34 worker nodes are available at CERN:
  – 10 nodes (RH7.3, PBS)
  – 20 nodes (low end, SLC, LSF)
  – 4 nodes (high end, SLC, LSF)
• 1 node is available in Wisconsin
• The number of CPUs will increase; the number of sites will increase
• FZK Karlsruhe is preparing to connect another site
  – Basic middleware components already installed
  – One person hired (6-month contract), up and running
  – One person to arrive in January
• Further extensions are under discussion right now

Access granted on May 18th!

Page 26: ARDA status and plans

Access &Authorization

• gLite uses Globus Grid certificates (X.509) to authenticate and authorize; the session is not encrypted
• VOMS is used for VO management
• Getting access to gLite for a new user is often painful due to registration problems
• It takes a minimum of one day… for some it can take up to two weeks!

Page 27: ARDA status and plans

Accessing gLite

Easy access to gLite is considered very important.

Three shells are available:
• Alien shell
• ARDA shell
• gLiteIO shell

Too many…

Page 28: ARDA status and plans

Alien shell

• Access through the gLite-Alien shell
  – User-friendly shell implemented in Perl
  – The shell provides a set of Unix-like commands and a set of gLite-specific commands
• Perl API
  – No API to compile against, but the Perl API is sufficient for tests, though it is poorly documented

[Diagram: the Perl shell talks via GAS over GSI to the file catalogue, storage manager and compute element]

Page 29: ARDA status and plans

ARDA shell + C/C++ API

[Diagram: a client application uses the C API (POSIX) through a security wrapper (GSI, SSL, UUEnc) and gSOAP to reach the server-side service]

A C++ access library for gLite has been developed by ARDA:
• High performance
• Protocol quite proprietary…
It is essential for the ALICE prototype, yet generic enough for general use. Using this API, grid commands have been added seamlessly to the standard shell.

Page 30: ARDA status and plans

gLiteIO shell

• Integrates gLite IO as a virtual file system
  – Traps POSIX IO function calls and redirects them
  – No root access necessary
  – No recompilation of programs
• Not obvious which programs will work
  – Basic file IO works
  – Some standard programs work
  – Editors don’t work
  – Postscript viewers don’t work
• Only data access
  – No job submission
  – No data management per se

Page 31: ARDA status and plans

ARDA Feedback

• A lightweight shell is important
  – Ease of installation
  – No root access
  – Works behind NAT routers
• The shell goes together with the GAS
  – It should present the user a simplified picture of the grid
  – A strong aspect of the architecture: not everybody liked it when it was presented, but “not everybody” implies that the rest liked the idea
  – The role of the GAS should be clarified

Page 32: ARDA status and plans

Work Load Management

ARDA has been evaluating two WMSs:

• WMS derived from Alien (Task Queue)
  – available since April
  – pull model
  – integrated with the gLite shell, file catalog and package manager
• WMS derived from EDG
  – available since the middle of October
  – currently push model (pull model not yet possible but foreseen)
  – not yet integrated with other gLite components (file catalogue, package manager, gLite shell)

Page 33: ARDA status and plans

Stability

• Job queues are monitored at CERN every hour by ARDA
  – 80% success rate (the jobs don't do anything real)
  – Component support should not depend on single key persons

Page 34: ARDA status and plans

Job submission

Submitting a user job to gLite:
- Register the executable in the user bin directory
- Create a JDL file with requirements
- Submit the JDL

Straightforward; we did not experience any problems except system stability.

Advanced features tested by ARDA:
• Job splitting based on the gLite file catalogue LFN hierarchy
• Collection of the outputs of split jobs in a master job directory

This functionality is widely used in the ARDA prototypes
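The submission steps above revolve around a JDL file. A minimal generator might look like this; the attribute set below follows common ClassAd/JDL style and is a plausible sketch, not the exact gLite JDL schema:

```python
# Sketch: generate a minimal JDL description for a user job.
# The attribute names follow common ClassAd/JDL style; treat them
# as illustrative rather than the exact gLite schema.

def make_jdl(executable, arguments, stdout="std.out", stderr="std.err"):
    """Build a JDL-style job description string."""
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        f'StdOutput = "{stdout}";',
        f'StdError = "{stderr}";',
    ]
    return "[\n  " + "\n  ".join(lines) + "\n]"

jdl = make_jdl("davinci.sh", "-n 1000")
```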

Page 35: ARDA status and plans

ARDA Feedback

• Usage of the WMS should be transparent for the user
  – same JDL syntax
  – worker nodes should be accessible through both systems
  – same functionality
  – integration with other gLite services
• The JDL should be standardized at the design level
  – An API with submitJob(string) leaves room for a lot of interpretation
  – There is clearly a place for obligatory and optional parameters
• Debugging features are essential for the user
  – Access to stdout/stderr for running jobs
  – Access to system logging information

Page 36: ARDA status and plans

Data Management

ARDA has been evaluating two DMSs:

• gLite File Catalog (deployed in April)
  – Allowed us to access experiment data from CERN CASTOR and (with low efficiency) from the Wisconsin installation
  – The LFN name space is organized as a very intuitive hierarchical structure
  – MySQL backend
• Local File Catalogue (Fireman) (deployed in November)
  – Just delivered to us
  – gLiteIO
  – Oracle backend

Page 37: ARDA status and plans

Performance

• gLite File Catalog
  – Good performance due to streaming
  – 80 concurrent queries, 0.35 s/query, 2.6 s startup time
• Fireman catalog
  – First attempt to use the catalog: quite a high entrance fee
  – Good performance
  – Not yet stable results due to unexpected crashes

We are interacting with the developers.

[Plots: file-catalog insertion and metadata-attach rates (files/s) for 1000 to 100000 files, with errors marked; time to completion (s) vs number of clients (0 to 100) when selecting 2.5k files out of 10k]

Page 38: ARDA status and plans

Fireman tests

• Single entries up to 100000
  – Successful, but no stable performance numbers yet
  – Timeouts in reading back (ls)
  – Erratic values for bulk insertion
• Bulk registration
  – After some crashes, it seems to work more stably
  – No statistics yet
• Bulk registration as a transaction
  – In case of error, no file is registered (OK)
• First draft note ready (ARDA site)

Page 39: ARDA status and plans

gLiteIO

• Simple test procedure
  – Create a small random file
  – Copy it to the SE and read it back
  – Check if it is still OK
  – Repeat until a problem is observed
• A number of crashes observed
  – From the client side the problem cannot be understood
  – In one case, data corruption has been observed
  – We are interacting with the developers

gLiteIO tests
• Tried to copy 1000 files of 0 to 10 KB
• On average an error occurred after 64 files
• About 10 different error messages observed…

[Plot: number of files copied before an error, per trial (20 trials, up to ~300 files)]
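The test procedure above is easy to reproduce as a checksum round-trip. In this sketch a local directory stands in for the storage element, so it only illustrates the loop, not gLiteIO itself:

```python
# Sketch of the gLiteIO round-trip test described above. A local
# directory stands in for the storage element (SE); the real tests
# used gLiteIO copy commands instead of shutil.
import hashlib, os, shutil, tempfile

def checksum(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def round_trip_ok(se_dir, size):
    """Write a random file, 'copy it to the SE', read back, compare."""
    src = os.path.join(se_dir, "src.bin")
    dst = os.path.join(se_dir, "se_copy.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(size))
    shutil.copyfile(src, dst)          # stands in for the SE copy
    return checksum(src) == checksum(dst)

with tempfile.TemporaryDirectory() as d:
    results = [round_trip_ok(d, size=1024) for _ in range(10)]
```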

Page 40: ARDA status and plans

ARDA Feedback

• We keep on testing the catalogs
  – We are in contact with the developers
• Consider a “clean” C++ API for the catalogs
  – Hide the SOAP toolkit
  – Probably handcrafted
  – Or is there a better toolkit?
• gLiteIO has to be rock stable

Page 41: ARDA status and plans

Package management

• Multiple approaches exist for handling the experiment software and user private packages on the Grid
  – Pre-installation of the experiment software by a site manager, with subsequent publishing of the installed software; a job can run only on a site where the required package is preinstalled.
  – Installation on demand at the worker node; the installation can be removed as soon as job execution is over.
• The current gLite package management implementation can handle lightweight installations, close to the second approach
• Clearly more work has to be done to satisfy the different use cases

Page 42: ARDA status and plans

Metadata

• gLite has provided a prototype interface and implementation, mainly for the Biomed community
• The gLite file catalog has some metadata functionality and has been tested by ARDA
  – Information describing file properties (file metadata attributes) can be defined in a tag attached to a directory in the file catalog.
  – Access to the metadata attributes is via the gLite shell
  – Knowledge of the schema is required
  – No schema evolution
• Can these limitations be overcome?
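The directory-tag model above can be mimicked with a small in-memory catalog. The class and method names here are illustrative only, not the gLite API:

```python
# Toy model of directory-level metadata tags as described above:
# a tag attached to a directory defines the attribute schema for the
# files inside it. Names are illustrative, not the gLite API.

class MetadataCatalog:
    def __init__(self):
        self.schemas = {}   # directory -> list of attribute names
        self.values = {}    # (directory, file) -> {attr: value}

    def attach_tag(self, directory, attributes):
        """Define the metadata schema (tag) for one directory."""
        self.schemas[directory] = list(attributes)

    def set_attr(self, directory, filename, attr, value):
        """Attach an attribute value; schema knowledge is required."""
        if attr not in self.schemas.get(directory, []):
            raise KeyError(f"{attr} not in the schema of {directory}")
        self.values.setdefault((directory, filename), {})[attr] = value

    def query(self, directory, attr, value):
        """Return files in a directory whose attribute matches a value."""
        return [f for (d, f), attrs in self.values.items()
                if d == directory and attrs.get(attr) == value]

cat = MetadataCatalog()
cat.attach_tag("/lhcb/runs", ["run", "quality"])
cat.set_attr("/lhcb/runs", "f1.dst", "quality", "good")
cat.set_attr("/lhcb/runs", "f2.dst", "quality", "bad")
good = cat.query("/lhcb/runs", "quality", "good")
```

Note how the schema is fixed per directory: this mirrors the "knowledge of schema is required / no schema evolution" limitations listed above.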

Page 43: ARDA status and plans

ARDA Metadata

• ARDA preparatory work
  – Stress testing of the existing experiment metadata catalogues was performed
  – The existing implementations were shown to share similar problems
• ARDA technology investigation
  – The usage of extended file attributes in modern file systems (NTFS, NFS, EXT2/3 SCL3, ReiserFS, JFS, XFS) was analyzed: a sound POSIX standard exists!
  – Presentation in the LCG GAG and discussion with gLite
  – As a result of the metadata studies, a prototype metadata catalogue was developed

Page 44: ARDA status and plans

ARDA metadata prototype: performances

• Tested operations:
  – querying the catalogue by metadata attributes
  – attaching metadata attributes to files
• LHCb is starting to use it

[Plots comparing the ARDA prototype with gLite (Alien): time to completion (s) vs number of clients (0 to 100) when selecting 2.5k files out of 10k, and attach-metadata rate (files/s) vs files per directory (1000 to 100000); crashes of the Alien-based catalogue are marked]

Page 45: ARDA status and plans

Other activities

Page 46: ARDA status and plans

ARDA workshops and related activities

• ARDA workshop (January 2004 at CERN; open)
• ARDA workshop (June 21-23 at CERN; by invitation)
  – “The first 30 days of EGEE middleware”
• NA4 meeting (15 July 2004 in Catania; EGEE open event)
• ARDA workshop (October 20-22 at CERN; open)
  – “LCG ARDA Prototypes”
• NA4 meeting, 24 November (EGEE conference in Den Haag)
• ARDA workshop (early 2005; open)
• Sharing of the AA meeting (Wednesday afternoon) to start soon (a recommendation of the ARDA workshop)
• gLite document discussions fostered by ARDA (review process, workshop, invitation of the experiments to the EGEE PTF…)
• GAG meetings

Page 47: ARDA status and plans

Den Haag

Discussion summary (from “Summary Application Requirements for gLite”, slides 13 and 14, EGEE / INFSO-RI-508833):

• Discussion on the gLite components and services proposed for Release Candidate 1, from an application point of view (1)
  – Definition of what should be in Release Candidate 1.0
  – Discussion focused mostly on the ALICE proposal to extend the prototype: starting point is the current snapshot of the gLite middleware installed on the prototype, installed on ALICE resources; deployment effort: ALICE resources
  – As a matter of fact, we discussed almost exclusively this last issue: almost unanimous concern about altering the current schedule and procedure (delays, lower final quality), and about the possibility of such a deployment without side effects (effort) in the gLite team

• Discussion on the gLite components and services proposed for Release Candidate 1, from an application point of view (2)
  – The only real input at this point in time “from an application point of view” should be extracted from the transparencies of the 3 applications
  – There is general agreement within NA4 that our main interest is to expose the different user communities to the gLite system on its way to maturity
  – Within NA4, the enlargement of the gLite prototype up to the scale of the ALICE offer would be interesting provided that: this process gets a precise definition and is then managed by EGEE to avoid divergence and entropy, and NA4 user communities are allowed to access this infrastructure to continue their activity on a larger scale

ARDA is preparing (after having discussed with the experiment interfaces) its wish list for RC 1.0

Page 48: ARDA status and plans

People

• Massimo Lamanna
• (EGEE NA4: Frank Harris)
• Birger Koblitz
• Derek Feichtinger, Andreas Peters (ALICE)
• Dietrich Liko, Frederik Orellana (ATLAS)
• Julia Andreeva, Juha Herrala (CMS)
• Andrew Maier, Kuba Moscicki (LHCb)
• Visitors:
  – Andrey Demichev, Viktor Pose, Alex Berejnoi (CMS) (Russia)
  – Wei-Long Ueng, Tao-Sheng Chen (Taiwan)
• 2 PhD students (just starting)
• Many student requests

Experiment interfaces:
Piergiorgio Cerello (ALICE), David Adams (ATLAS), Lucia Silvestris (CMS), Ulrik Egede (LHCb)

Page 49: ARDA status and plans

Milestone table

WBS     Task                                                        Date due   Date done
1.6.2   E2E ALICE prototype definition agreed with the experiment   31-05-04   31-05-04
1.6.3   E2E ATLAS prototype definition agreed with the experiment   31-05-04   31-05-04
1.6.4   E2E CMS prototype definition agreed with the experiment     31-05-04   06-12-04
1.6.5   E2E LHCb prototype definition agreed with the experiment    31-05-04   31-05-04
1.6.6   E2E ALICE prototype using basic EGEE middleware             30-09-04   30-09-04
1.6.7   E2E ATLAS prototype using basic EGEE middleware             30-09-04   30-09-04
1.6.8   E2E CMS prototype using basic EGEE middleware               30-09-04   30-09-04
1.6.9   E2E LHCb prototype using basic EGEE middleware              30-09-04   30-09-04
1.6.14  E2E prototype for ALICE, capable of analysis                31-12-04
1.6.15  E2E prototype for ATLAS, capable of analysis                31-12-04
1.6.16  E2E prototype for CMS, capable of analysis                  31-12-04
1.6.17  E2E prototype for LHCb, capable of analysis                 31-12-04

Page 50: ARDA status and plans

Milestones 2005

• Agreed with F. Hemmer and E. Laure
  – Fix the “misalignment” problem

For each experiment:
• End of March: use the gLite middleware (beta) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments)
• End of June: use the gLite middleware (version 1.0) on the extended prototype (eventually the pre-production service) and provide feedback (as above)
• End of September: use the gLite middleware (version 1.1) on the extended prototype (eventually the pre-production service) and provide feedback (as above)
• End of December: use the gLite middleware (version 1.2, release candidate 2) on the extended prototype (eventually the pre-production service) and provide feedback (as above)

• ARDA will continue to organise workshops and facilitate meetings across the different components (gLite middleware, experiments, other software providers). The suggested format is a full workshop every 6 months and a regular (fortnightly) video conference. The workshops will be scheduled according to need (the 6-month pace is a general guideline). As in 2004, no real milestone is associated with these tasks.

Realistically, our horizon is here: rediscuss at the end of Q1.

Page 51: ARDA status and plans

Conclusions

• ARDA has been set up to enable distributed HEP analysis on gLite
  – Contacts have been established with the experiments and with the middleware team
• Experiment activities are progressing rapidly
  – Prototypes for LHCb, ALICE, ATLAS & CMS are on the way
  – Complementary aspects are studied
  – Good interaction with the experiments' environment
  – Desperately seeking users! (more interested in physics than in middleware… we support them!)
• ARDA is providing early feedback to the development team
  – First use of components
  – Trying to run real-life HEP applications
  – Following the development on the prototype
• Some of the experiment-related ARDA activities could be of general use
  – Shell access (originally in ALICE/ARDA)
  – Metadata catalog (under test in LHCb/ARDA)
  – (Pseudo-)interactivity is an interesting issue (something in/from all experiments)