ARDA status and plans
Transcript of ARDA status and plans
INFSO-RI-508833
Enabling Grids for E-sciencE
http://arda.cern.ch
Massimo Lamanna / CERN
Overview
• ARDA prototypes
  – 4 experiments
• ARDA feedback
  – Middleware components on the development test bed
• ARDA workshops
  – ARDA in Den Haag (2nd EGEE conference)
• ARDA personnel and milestones
  – Status
  – 2005 (proposal)
• Conclusions
The ARDA project
• ARDA is an LCG project
  – Main activity is to enable LHC analysis on the grid
  – ARDA contributes to EGEE NA4 and uses the entire CERN NA4-HEP resource
• Interface with the new EGEE middleware (gLite)
  – By construction, use the new middleware
  – Use the grid software as it matures
  – Verify the components in an analysis environment (with real users!)
  – Provide early and continuous feedback
ARDA prototypes
Prototype overview

One prototype per LHC experiment (main focus; basic prototype components: experiment analysis application framework / middleware):
• LHCb – main focus: GUI to Grid; components: GANGA, DaVinci
• ALICE – main focus: interactive analysis; components: PROOF, ROOT, AliROOT
• ATLAS – main focus: high-level services; components: DIAL, Athena
• CMS – main focus: exploit native gLite functionality (aligned with APROM); component: ORCA
ARDA contributions
• Integrating with gLite
  – Enabling job submission through GANGA to gLite
  – Job splitting and merging
  – Result retrieval
• Enabling real analysis jobs to run on gLite
  – Running DaVinci jobs on gLite (custom code: user algorithms)
  – Installation of LHCb software using the gLite package manager
• Participating in the overall development of Ganga
  – Software process (initially): CVS, Savannah, release management
  – Major contribution to new versions: CLI, “Ganga clients”
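The split/submit/merge flow mentioned above can be sketched as follows. This is a minimal illustration of the idea only; the function names and the LFN paths are invented for the example and do not reflect the actual GANGA or gLite APIs.

```python
# Sketch of job splitting and result merging as described above.
# Names and paths are illustrative, not the real GANGA/gLite interfaces.

def split_job(input_files, files_per_subjob):
    """Split a list of input files into chunks, one chunk per sub-job."""
    return [input_files[i:i + files_per_subjob]
            for i in range(0, len(input_files), files_per_subjob)]

def merge_results(subjob_outputs):
    """Collect the per-sub-job outputs back into a single result list."""
    merged = []
    for output in subjob_outputs:
        merged.extend(output)
    return merged

# Example: 10 input files, 3 files per sub-job gives 4 sub-jobs
subjobs = split_job([f"lfn:/lhcb/data/{i}.dst" for i in range(10)], 3)
assert len(subjobs) == 4
assert merge_results(subjobs) == [f"lfn:/lhcb/data/{i}.dst" for i in range(10)]
```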
Related activities
• GANGA-DIRAC (LHCb production system)
  – Convergence with GANGA components/experience
  – Submitting jobs to DIRAC using GANGA
• GANGA-Condor
  – Enabling submission of jobs through GANGA to Condor
• LHCb metadata catalogue performance tests
  – In collaboration with colleagues from Taiwan
  – New activity started using the ARDA metadata prototype (new version, collaboration with GridPP people)
Current Status
• A GANGA job-submission handler for gLite has been developed
• DaVinci jobs run on gLite, submitted through GANGA
Demos shown in Rio and Den Haag; presented in the LHCb software week (“Ganga clients”).
ALICE prototype
ROOT and PROOF
• ALICE provides
  – the UI
  – the analysis application (AliROOT)
• The gLite grid middleware provides all the rest (UI shell, application, middleware: end to end)
• ARDA/ALICE is evolving the ALICE analysis system
[Diagram: an interactive user session; a PROOF master server coordinates PROOF slaves spread over sites A, B and C]
Demos shown in the ALICE software week, at Supercomputing 04 and in Den Haag.
Current Status
• Developed the gLite C++ API and API service
  – providing a generic interface to any grid service
• The C++ API is integrated into ROOT
  – will be added to the next ROOT release
  – job submission and job status queries for batch analysis can be done from inside ROOT
• A bash interface for gLite commands with catalogue expansion has been developed
  – More powerful than the original shell
  – Ready for integration
  – Considered a “generic” middleware contribution (essential for ALICE, interesting in general)
• The first version of the interactive analysis prototype is ready
• The batch analysis model has been improved
  – submission and status queries are integrated into ROOT
  – job splitting based on XML query files
  – the application (AliRoot) reads files using xrootd without prestaging
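“Catalogue expansion” here means expanding shell wildcard patterns against the logical file catalogue rather than the local filesystem. A minimal sketch of the idea, using an invented in-memory catalogue (the LFN paths are examples, not real catalogue content):

```python
# Sketch of catalogue expansion: a wildcard pattern is matched against
# the LFN namespace of a grid file catalogue instead of local files.
import fnmatch

# Invented example content of a file catalogue
catalogue = [
    "/alice/sim/2004/run001/galice.root",
    "/alice/sim/2004/run002/galice.root",
    "/alice/sim/2004/run002/Kinematics.root",
]

def expand(pattern):
    """Return all LFNs in the catalogue matching a shell-style pattern."""
    return [lfn for lfn in catalogue if fnmatch.fnmatch(lfn, pattern)]

print(expand("/alice/sim/2004/run*/galice.root"))  # matches 2 LFNs
```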
ATLAS/ARDA
• Main component: contribute to the DIAL evolution (gLite analysis server)
• “Embedded in the experiment”
  – AMI tests and interaction
  – Production and CTB tools
  – Job submission (Athena jobs)
  – Integration of the gLite data management within Don Quijote
• Benefit from the other experiments’ prototypes
  – First look at interactivity/resiliency issues (“agent-based” approach, à la DIRAC)
  – GANGA (principal component of the LHCb prototype, key component of the overall ATLAS strategy)
Presentations tomorrow (ADA meeting)
Data Management: Don Quijote
• Locate and move data over grid boundaries
• ARDA has connected gLite
[Diagram: a DQ client talks to DQ servers, one per grid flavour (gLite, LCG, GRID3, NorduGrid), each backed by its own RLS and SE]
Presentation tomorrow (ADA meeting)
ATCOM @ CTB
• Combined Testbeam
  – Various extensions were made to accommodate the new database schema used for CTB data analysis
  – New panes to edit transformations, datasets and partitions were implemented
• Production system
  – A first step is to provide a prototype with limited functionality, but with support for the new production system
Presentation tomorrow (ADA meeting)
Combined Test Beam
Example: ATLAS TRT data analysis done by PNPI St. Petersburg (number of straw hits per layer)
• Real data processed on gLite; standard Athena for the testbeam
• Data from CASTOR, processed on a gLite worker node
ATLAS: a first look at interactivity matters
Presentation tomorrow (ADA meeting)
CMS Prototype
• Aims at an end-to-end prototype for CMS analysis jobs on gLite
• Uses the native middleware functionality of gLite
• Only a few CMS-specific tasks on top of the middleware
[Diagram: a workflow planner with a gLite back-end and command-line UI. A dataset and owner name define a CMS data collection in RefDB, which points to the corresponding PubDB where the POOL catalog for that collection is published (POOL catalog and a set of COBRA META files). The planner registers the required information in the gLite catalog, creates and submits jobs to gLite, queries their status and retrieves the output.]
CMS – using MonALISA for user job monitoring
• A single job is submitted to gLite; the JDL contains job-splitting instructions
• The master job is split by gLite into sub-jobs
• Dynamic monitoring of the total number of events processed by all sub-jobs belonging to the same master job
Demo at Supercomputing 04
CMS – getting output from gLite
• When the jobs are over, the output files created by all sub-jobs belonging to the same master are retrieved by the workflow planner into the directory defined by the user.
• On user request, output files are merged by the workflow planner (currently implemented for ROOT trees and histograms).
• A ROOT session is then started by the workflow planner.
Presentation and demo Friday (APROM meeting)
Related Activities
• Job submission to gLite by PhySH
  – Physicist Shell
  – Integrates grid tools
  – Collaboration with CLARENS
• ARDA also participates in…
  – Evolution of PubDB (effective access to data)
  – Redesign of RefDB (metadata catalog)
Prototype overview

For each experiment prototype: the middleware is OK, the main concern is the prototype size, and the prototype is ready to have users (users not belonging to the developer circles!):
• LHCb – ready to have users (1 so far)
• ALICE – ready to have users
• ATLAS – ready to have users (2 for the CTB data)
• CMS – ready to have users
Middleware feedback
Prototype Deployment
• Currently 34 worker nodes are available at CERN:
  – 10 nodes (RH7.3, PBS)
  – 20 nodes (low end, SLC, LSF)
  – 4 nodes (high end, SLC, LSF)
• 1 node is available in Wisconsin
• The number of CPUs will increase
• The number of sites will increase
• FZK Karlsruhe is preparing to connect another site
  – Basic middleware components already installed
  – One person hired (6-month contract), up and running
  – One person to arrive in January
• Further extensions are under discussion right now
Access granted on May 18th!
Access & Authorization
• gLite uses Globus grid certificates (X.509) to authenticate and authorize; sessions are not encrypted
• VOMS is used for VO management
• Getting access to gLite is often painful for a new user due to registration problems
• It takes a minimum of one day… for some it can take up to two weeks!
Accessing gLite
• Easy access to gLite is considered very important
• Three shells are available: the AliEn shell, the ARDA shell and the gLiteIO shell
• Too many…
AliEn shell
• Access through the gLite/AliEn shell
  – User-friendly shell implemented in Perl
  – The shell provides a set of Unix-like commands and a set of gLite-specific commands
• Perl API
  – No API to compile against, but the Perl API is sufficient for tests, though it is poorly documented
[Diagram: the Perl shell talks via GAS/GSI to the file catalogue, the storage manager and the compute element]
ARDA shell + C/C++ API
[Diagram: a client application linked against the C API (POSIX); a security wrapper (GSI/SSL, UUEnc-encoded text) carries gSOAP calls to the server-side service]
• A C++ access library for gLite has been developed by ARDA
  – High performance
  – Protocol quite proprietary...
• Essential for the ALICE prototype, yet generic enough for general use
• Using this API, grid commands have been added seamlessly to the standard shell
gLiteIO shell
• Integrates gLite IO as a virtual file system
  – Traps POSIX IO function calls and redirects them
  – No root access necessary
  – No recompilation of programs
• Not obvious which programs will work
  – Basic file IO works
  – Some standard programs work
  – Editors don’t work
  – Postscript viewers don’t work
• Only data access
  – No job submission
  – No data management per se
ARDA Feedback
• A lightweight shell is important
  – Ease of installation
  – No root access
  – Works behind NAT routers
• The shell goes together with the GAS
  – Should present the user with a simplified picture of the grid
  – A strong aspect of the architecture: not everybody liked it when it was presented, but “not everybody” implies that the rest liked the idea
  – The role of the GAS should be clarified
Workload Management
ARDA has been evaluating two WMSs
• WMS derived from AliEn (Task Queue)
  – available since April
  – pull model
  – integrated with the gLite shell, file catalog and package manager
• WMS derived from EDG
  – available since the middle of October
  – currently push model (pull model not yet possible, but foreseen)
  – not yet integrated with the other gLite components (file catalogue, package manager, gLite shell)
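The pull model of the AliEn-derived WMS can be sketched: jobs wait in a central task queue, and worker agents pull the next job matching their resources when they have free capacity, instead of a broker pushing jobs at them. A minimal illustration (all names invented, not the real AliEn/gLite interfaces):

```python
# Minimal sketch of a pull-model task queue: the worker initiates the
# match, pulling the first queued job whose requirements it can satisfy.
from collections import deque

class TaskQueue:
    def __init__(self):
        self.jobs = deque()

    def add(self, job):
        self.jobs.append(job)

    def pull(self, worker_tags):
        """Called by a worker agent: hand out the first matching job."""
        for job in list(self.jobs):
            if job["requires"] <= worker_tags:   # subset test on capabilities
                self.jobs.remove(job)
                return job
        return None                              # nothing suitable queued

tq = TaskQueue()
tq.add({"name": "sim", "requires": {"SLC", "LSF"}})
tq.add({"name": "reco", "requires": {"SLC"}})

job = tq.pull({"SLC"})           # a worker offering only SLC
assert job["name"] == "reco"     # the LSF-requiring job stays queued
```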
Stability
• Job queues are monitored at CERN every hour by ARDA
  – 80% success rate (the jobs don’t do anything real)
  – Component support should not depend on single key persons
Job submission
Submitting a user job to gLite:
  – Register the executable in the user bin directory
  – Create a JDL file with the requirements
  – Submit the JDL
Straightforward; we did not experience any problems except system stability.
Advanced features tested by ARDA:
• Job splitting based on the gLite file catalogue LFN hierarchy
• Collection of the outputs of split jobs in a master job directory
This functionality is widely used in the ARDA prototypes.
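For the second step of the recipe above, a minimal JDL file might look roughly like the following. The attribute names follow the ClassAd-style JDL used by EDG/gLite, but the file names, the value of Requirements and the exact attribute set are illustrative, not taken from the prototype.

```jdl
[
  # Illustrative job description; names and values are examples only
  Executable    = "myAnalysis.sh";
  Arguments     = "run001";
  StdOutput     = "std.out";
  StdError      = "std.err";
  InputSandbox  = {"myAnalysis.sh"};
  OutputSandbox = {"std.out", "std.err"};
  Requirements  = other.GlueCEPolicyMaxCPUTime > 600;
]
```

The Requirements expression is what the WMS matches against the worker-node resources when deciding where the job can run.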
ARDA Feedback
• Usage of the WMS should be transparent for the user
  – same JDL syntax
  – worker nodes should be accessible through both systems
  – same functionality
  – integration with the other gLite services
• The JDL should be standardized at the design level
  – An API with submitJob(string) leaves room for a lot of interpretation
  – There is clearly a place for obligatory and optional parameters
• Debugging features are essential for the user
  – Access to stdout/stderr of running jobs
  – Access to system logging information
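The design point about submitJob(string) can be illustrated: a structured job description with explicit obligatory and optional fields leaves far less room for interpretation than a single opaque string. A sketch under invented names (this is not a real gLite API):

```python
# Sketch of the API-design point above: obligatory fields are enforced,
# optional ones carry defined defaults. All names here are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class JobDescription:
    executable: str                                      # obligatory
    arguments: List[str] = field(default_factory=list)   # optional
    stdout: str = "std.out"                              # optional, defined default
    requirements: Optional[str] = None                   # optional

def submit_job(job: JobDescription) -> str:
    """Validate the obligatory fields before (pretend) submission."""
    if not job.executable:
        raise ValueError("executable is obligatory")
    return f"submitted: {job.executable}"

print(submit_job(JobDescription(executable="myAnalysis.sh")))
```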
Data Management
ARDA has been evaluating two DMSs
• gLite file catalog (deployed in April)
  – Allowed access to experiment data from CERN CASTOR and (with low efficiency) from the Wisconsin installation
  – The LFN name space is organized as a very intuitive hierarchical structure
  – MySQL backend
• Local file catalogue (Fireman) (deployed in November)
  – Just delivered to us
  – gLiteIO
  – Oracle backend
Performance
• gLite file catalog
  – Good performance due to streaming
  – 80 concurrent queries, 0.35 s/query, 2.6 s startup time
• Fireman catalog
  – First attempt to use the catalog: quite a high entrance fee
  – Good performance
  – Not yet stable results due to unexpected crashes
We are interacting with the developers.
[Plots: “FC insertion” and “Attach MD” (files/s vs. number of files, 1000 to 100000, errors marked); “Selecting 2.5k files of 10k” (time to completion [s] vs. number of clients, 0 to 100)]
Fireman tests
• Single entries up to 100 000
  – Successful, but no stable performance numbers yet
  – Timeouts when reading back (ls)
  – Erratic values for bulk insertion
• Bulk registration
  – After some crashes, it seems to work more stably
  – No statistics yet
• Bulk registration as a transaction
  – In case of error, no file is registered (OK)
• First draft note ready (on the ARDA site)
gLiteIO
• Simple test procedure
  – Create a small random file
  – Copy it to the SE and read it back
  – Check that it is still OK
  – Repeat until a problem is observed
• A number of crashes observed
  – From the client side the problem cannot be understood
  – In one case, data corruption was observed
  – We are interacting with the developers
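The write/copy/read-back procedure above can be sketched as a checksum loop. In this sketch a local directory stands in for the SE and the gLiteIO transfer, so it only illustrates the test logic, not the real data path:

```python
# Sketch of the gLiteIO test procedure: write a small random file,
# "copy it to the SE" (a local directory stands in here), read it back
# and compare checksums; stop at the first mismatch.
import hashlib
import os
import shutil
import tempfile

def checksum(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def run_trials(n_trials, workdir):
    """Return the index of the first failing trial, or n_trials if none fail."""
    se_dir = os.path.join(workdir, "se")
    os.makedirs(se_dir, exist_ok=True)
    for i in range(n_trials):
        src = os.path.join(workdir, f"random_{i}.dat")
        with open(src, "wb") as f:
            f.write(os.urandom(1024))          # small random file
        dst = os.path.join(se_dir, f"random_{i}.dat")
        shutil.copy(src, dst)                  # stands in for "copy to SE"
        if checksum(src) != checksum(dst):     # read back and verify
            return i                           # first observed problem
    return n_trials

with tempfile.TemporaryDirectory() as d:
    assert run_trials(10, d) == 10   # local copies should not corrupt
```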
gLiteIO tests
• Tried to copy 1000 files of 0 to 10 KB
• On average an error occurred after 64 files
• About 10 different error messages were observed…
[Plot: number of files copied before an error, per trial (20 trials, up to ~300 files)]
ARDA Feedback
• We keep on testing the catalogs
  – We are in contact with the developers
• Consider a “clean” C++ API for the catalogs
  – Hide the SOAP toolkit
  – Probably handcrafted
  – Or is there a better toolkit?
• gLiteIO has to be rock stable
Package management
• Multiple approaches exist for handling the experiment software and user private packages on the Grid
  – Pre-installation of the experiment software by a site manager, with subsequent publishing of the installed software. A job can run only on a site where the required package is preinstalled.
  – Installation on demand on the worker node. The installation can be removed as soon as the job execution is over.
• The current gLite package management implementation can handle lightweight installations, close to the second approach
• Clearly more work has to be done to satisfy the different use cases
Metadata
• gLite has provided a prototype interface and implementation, mainly for the Biomed community
• The gLite file catalog has some metadata functionality and has been tested by ARDA
  – Information describing file properties (file metadata attributes) can be defined in a tag attached to a directory in the file catalog
  – Access to the metadata attributes is via the gLite shell
  – Knowledge of the schema is required
  – No schema evolution
• Can these limitations be overcome?
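The directory-tag model described above can be sketched: a directory declares a fixed attribute schema (the tag), and every file under it carries values for exactly those attributes, which makes the schema-knowledge and no-evolution limitations visible. All names are invented for illustration; this is not the gLite catalog API:

```python
# Sketch of the directory-tag metadata model: the directory's tag fixes
# the attribute schema for all files it contains. Names are illustrative.

class Directory:
    def __init__(self, schema):
        self.schema = schema          # the "tag": a fixed set of attribute names
        self.files = {}

    def add_file(self, lfn, **attrs):
        if set(attrs) != set(self.schema):
            # knowledge of the schema is required; no schema evolution
            raise ValueError(f"attributes must match schema {self.schema}")
        self.files[lfn] = attrs

    def query(self, **conditions):
        """Return LFNs whose attributes match all given conditions."""
        return [lfn for lfn, attrs in self.files.items()
                if all(attrs.get(k) == v for k, v in conditions.items())]

d = Directory(schema=["run", "quality"])
d.add_file("event01.root", run=1, quality="good")
d.add_file("event02.root", run=2, quality="bad")
assert d.query(quality="good") == ["event01.root"]
```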
ARDA Metadata
• ARDA preparatory work
  – Stress testing of the existing experiment metadata catalogues was performed
  – The existing implementations were shown to share similar problems
• ARDA technology investigation
  – The usage of extended file attributes in modern file systems (NTFS, NFS, EXT2/3 SCL3, ReiserFS, JFS, XFS) was analyzed: a sound POSIX standard exists!
  – Presentation in LCG-GAG and discussion with gLite
  – As a result of the metadata studies, a prototype metadata catalogue was developed
ARDA metadata prototype: performance
• Tested operations:
  – querying the catalogue by metadata attributes
  – attaching metadata attributes to files
• LHCb is starting to use it
[Plots comparing gLite(AliEn) and ARDA: “Selecting 2.5k files of 10k” (time to completion [s] vs. number of clients, 0 to 100, crashes marked) and “Attach metadata to files” (files/s vs. files per directory, 1000 to 100000)]
Other activities
ARDA workshops and related activities
• ARDA workshop (January 2004 at CERN; open)
• ARDA workshop (June 21-23 at CERN; by invitation): “The first 30 days of EGEE middleware”
• NA4 meeting (15 July 2004 in Catania; EGEE open event)
• ARDA workshop (October 20-22 at CERN; open): “LCG ARDA Prototypes”
• NA4 meeting, 24 November (EGEE conference in Den Haag)
• ARDA workshop (early 2005; open)
• Sharing of the AA meeting (Wednesday afternoon) to start soon (recommendation of the ARDA workshop)
• gLite document discussions fostered by ARDA (review process, workshop, invitation of the experiments to the EGEE PTF…)
• GAG meetings
Den Haag
Summary: Application Requirements for gLite
Discussion summary
• Discussion on the gLite components and services proposed for Release Candidate 1, from an application point of view (2)
  – The only real input at this point in time “from an application point of view” should be extracted from the transparencies of the 3 applications
  – There is general agreement within NA4 that our main interest is to expose the different user communities to the gLite system on its way to maturity
  – Within NA4, the enlargement of the gLite prototype up to the scale of the ALICE offer would be interesting, provided that:
    - the process is precisely defined and then managed by EGEE to avoid divergence and entropy
    - the NA4 user communities are allowed to access this infrastructure to continue their activity on a larger scale
Summary: Application Requirements for gLite
Discussion summary
• Discussion on the gLite components and services proposed for Release Candidate 1, from an application point of view (1)
  – Definition of what should be in Release Candidate 1.0
  – Discussion focused mostly on the ALICE proposal to extend the prototype
    - Starting point: the current snapshot of the gLite middleware installed on the prototype, installed on ALICE resources
    - Deployment effort: ALICE resources
  – As a matter of fact, we discussed almost exclusively this last issue
    - Almost unanimous concern about altering the current schedule and procedure (delays, lower final quality)
    - Almost unanimous concern about the possibility of such a deployment without side effects (effort) in the gLite team
ARDA is preparing (after discussion with the experiment interfaces) its wish list for RC 1.0.
People
• Massimo Lamanna
• (EGEE NA4: Frank Harris)
• Birger Koblitz
• Derek Feichtinger, Andreas Peters (ALICE)
• Dietrich Liko, Frederik Orellana (ATLAS)
• Julia Andreeva, Juha Herrala (CMS)
• Andrew Maier, Kuba Moscicki (LHCb)
• Visitors
  – Andrey Demichev, Viktor Pose, Alex Berejnoi (CMS) (Russia)
  – Wei-Long Ueng, Tao-Sheng Chen (Taiwan)
• 2 PhD students (just starting)
• Many student requests
• Experiment interfaces: Piergiorgio Cerello (ALICE), David Adams (ATLAS), Lucia Silvestris (CMS), Ulrik Egede (LHCb)
Milestone table
WBS     Task                                                       Date due   Date done
1.6.2   E2E ALICE prototype definition agreed with the experiment  31-05-04   31-05-04
1.6.3   E2E ATLAS prototype definition agreed with the experiment  31-05-04   31-05-04
1.6.4   E2E CMS prototype definition agreed with the experiment    31-05-04   06-12-04
1.6.5   E2E LHCb prototype definition agreed with the experiment   31-05-04   31-05-04
1.6.6   E2E ALICE prototype using basic EGEE middleware            30-09-04   30-09-04
1.6.7   E2E ATLAS prototype using basic EGEE middleware            30-09-04   30-09-04
1.6.8   E2E CMS prototype using basic EGEE middleware              30-09-04   30-09-04
1.6.9   E2E LHCb prototype using basic EGEE middleware             30-09-04   30-09-04
1.6.14  E2E prototype for ALICE, capable of analysis               31-12-04
1.6.15  E2E prototype for ATLAS, capable of analysis               31-12-04
1.6.16  E2E prototype for CMS, capable of analysis                 31-12-04
1.6.17  E2E prototype for LHCb, capable of analysis                31-12-04
• Agreed with F. Hemmer and E. Laure
  – Fix the “misalignment” problem
For each experiment
• End of March
  – Use the gLite middleware (beta) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments)
• End of June
  – Use the gLite middleware (version 1.0) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments)
• End of September
  – Use the gLite middleware (version 1.1) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments)
• End of December
  – Use the gLite middleware (version 1.2, release candidate 2) on the extended prototype (eventually the pre-production service) and provide feedback (technical issues; collect high-level comments and experience from the experiments)
• ARDA will continue to organise workshops and facilitate meetings across the different components (gLite middleware, experiments, other software providers). The suggested format is a full workshop every 6 months and a regular (fortnightly) video conference. Workshops will be scheduled according to need (the 6-month pace is a general guideline). As in 2004, no real milestone is associated with these tasks.
Milestone 2005
Realistically, our horizon is here:
rediscuss at the end of Q1
Conclusions
• ARDA has been set up to enable distributed HEP analysis on gLite
  – Contacts have been established with the experiments and with the middleware team
• Experiment activities are progressing rapidly
  – Prototypes for LHCb, ALICE, ATLAS & CMS are on the way
  – Complementary aspects are studied
  – Good interaction with the experiments’ environments
  – Desperately seeking users! (more interested in physics than in middleware… we support them!)
• ARDA is providing early feedback to the development team
  – First use of the components
  – Trying to run real-life HEP applications
  – Following the development on the prototype
• Some of the experiment-related ARDA activities could be of general use
  – Shell access (originally in ALICE/ARDA)
  – Metadata catalog (under test in LHCb/ARDA)
  – (Pseudo-)interactivity is an interesting issue (something in/from all experiments)