The ATLAS Tier2 Federation INFN


Page 1: The ATLAS Tier2 Federation  INFN

8-Giugno-2006 L.Perini Workshop CCR @ Otranto


The ATLAS Tier2 Federation INFN

Aims, functions, structure

Schedule

Services and INFN Grid

Page 2: The ATLAS Tier2 Federation  INFN


Layout

• The Tier2s for ATLAS in Italy
  – 3 slides from our dear Referee at CSN1 in April
• Structure and functions of the Federation
• The schedule for the near future
  – Mostly SC but not only
• The Grid tools and services
  – Relation with INFN production Grid
  – Status of specific tools and services
• No mention of money whatsoever…

Page 3: The ATLAS Tier2 Federation  INFN


Tier2 ATLAS (referee Forti @ CSN1, April)

• Full approval
  – Roma1
  – Napoli, which has no infrastructure costs and a solid project
• Approval SJ (sub judice)
  – Milano, which is asked for
    • improvement and clarification of the infrastructure project
    • reassessment of the LHC schedule (expected in June 2006) and actual start of the machine
• Incubator (Proto-Tier2)
  – LNF, whose weaknesses are:
    • significant funding needed; technical and technologist manpower somewhat limited; Grid experience to be improved
• Both the approved sites and the others will be subject to periodic reviews
  – If a site does not work, the Tier2 label is removed

Page 4: The ATLAS Tier2 Federation  INFN


Referees' proposal @ CSN1, April

• The computing model proposed by the experiments is reasonable
• The total infrastructure cost is lower than might have been feared
• Prudence and the uncertainties lead us to approve no more than 2 Tier2s now
• INFN resources are limited and, as of today, not well known
  – They are a question mark over everything that follows
• We propose three levels of approval:
  – Full approval
  – Approval SJ
  – Tier2 incubator (Proto-Tier2)
• The conditions for removing the SJ are:
  – the site must resolve its weak points
  – reassessment of the LHC schedule (expected in June 2006) and actual start of the machine
  – timescale O(6 months)
• The conditions for leaving the incubator are:
  – the site must resolve its weak points
  – the schedule of the experiment's computing needs is maintained
  – the experiment's distributed computing model is validated
  – timescale O(12 months)

Page 5: The ATLAS Tier2 Federation  INFN


Referees' proposal @ CSN1, April

• Computing resources
  – will have to be assigned to all sites
    • to meet the needs of the experiment
    • to keep the community active and take part in Grid and in the service/data challenges
    • to be ready when the data arrive
  – will have to be planned carefully
    • to avoid premature purchases
    • to allow the Italian groups to take on the sw responsibilities that follow from the hw commitment
• Size of the funding to be discussed
  – the experiments must at this point present an updated plan

Page 6: The ATLAS Tier2 Federation  INFN


Tier2 Federation Structure

• Given the referee recommendation in the previous slide, the ATLAS federation also includes the Tier2 sj and the Tier2 inc
  – This choice is needed for organizing the practical work at hand
  – Organizing Italian participation in SC4 (June-November) and in the first ATLAS large test of distributed analysis (October-November) is the nearest major function of the federation (see next slides)
    • The analysis phase will require user training and the opening of user accounts (also for remote users) with some disk space, for experimenting with implementations of the analysis model
• ATLAS Italy expects a decision about sj in September
  – About 20 users (more than half of them from Genova, Pavia, Pisa, Udine) will be active in the analysis phase and had proposed to rely on the Milan Tier2
  – Using the Milan resources (experienced people and hw) to support them therefore looks to us the only rational course while the decision about sj is pending
• In case Referees/CSN1 etc. think we should proceed otherwise, we expect to be told and to have the opportunity to discuss with them how to proceed

Page 7: The ATLAS Tier2 Federation  INFN


Structure and setting up

• ATLAS-Italy is setting up a Tier2 federation now
  – Some aspects are already defined, some are being defined
  – Some of the material in these slides is fully agreed, some are proposals by me
• A Federation Representative: L.Perini (Mi)
  – Typically 1 year mandate, rotating among the Tier2s
• A pool of federation referents for specific items:
  – Network: G. Lo Re (Na)
  – Sw distribution and related matters: A. De Salvo (Roma1)
  – SE and data architecture: still to be found…
  – Other areas may be identified in the near future
  – For each area, local referents in all candidate Tier2s
    • Defaulting to the local Tier2 responsible
• A regular (bi-weekly) short phone conf. between the Fed. Rep., the local Tier2 responsibles (or deputies) and the fed. experts is being considered

Page 8: The ATLAS Tier2 Federation  INFN


Aims and Functions - 1

• Facilitate the interface with LCG, ATLAS-Grid, INFN Grid
  – The relation with INFN as Funding Agency stays primarily with the National representative (and with the national computing rep.)
    • L.Mandelli and L. Luminari
• Foster common solutions in the areas where choices are still to be made
  – E.g. choose how to implement the analysis model in Italy, as well as which storage system and which local monitoring tools
• Represent the Federation when a single voice is required
• The functions on the next slide will be coordinated by the Grid area coordinator (in the ATLAS-Italy Computing structure it is L.Perini) but will require the active support of the Tier2 federation, especially in the initial phase

Page 9: The ATLAS Tier2 Federation  INFN


Aims and Functions - 2

• Organize the ATLAS-specific “Computing operation” work as far as Grid/Tier2 is concerned
  – E.g. operate the continuous ATLAS production via ProdSys efficiently, thus freeing some more expert manpower for the needed tasks of new sw-mw testing and development
• Organize the training required for the above step
• Coordinate the ATLAS-Italy contribution to the deployment and development effort in the area of interfacing ATLAS-LCG-EGEE mw to the ATLAS sw
  – ATLAS use of VOMS, LCG-executor in ProdSys, ATLAS DDM
• On the first 2 items, the INFN effort is already the biggest one in ATLAS, but more is needed
  – To be done in close contact with ATLAS global and the ATLAS-Italy Computing representative
  – See next slide for the needs

Page 10: The ATLAS Tier2 Federation  INFN


Status of ATLAS developments in the LCG-EGEE area

• ATLAS is about to start an action via the International Computing Board and the National Representatives to address a situation felt as increasingly risky
  – Manpower shortage on the ATLAS collaboration side to make full use of the LCG-EGEE mw, and to be able to proactively integrate and validate new functionality into the ATLAS applications running on the LCG-EGEE Grid
  – “Hero model”
• INFN is today one of the major contributors, but we are relying on too few overloaded people, part of them shared with the EGEE work (which funds them)
• Enlarging the pool of Grid developers, expert deployers and operators is mandatory also for ATLAS-Italy

Page 11: The ATLAS Tier2 Federation  INFN


Schedule for the near future

• Largely determined by the global schedule set up by ATLAS
  – SC4 is the next big engagement (see next slides)
    • All the 4 existing sites are willing to participate
    • We count on SE certificates for Naples and LNF coming soon
      – For Naples it came last Monday
  – ATLAS continuous production is part of it
  – The first Distributed Analysis tests, scheduled for October-November, are of extreme interest for our community
• Some Italy-specific work is scheduled too
  – The most important is Calibration, not going into any detail here…

Page 12: The ATLAS Tier2 Federation  INFN

HEPiX Rome 05apr06


SC4 – the Pilot LHC Service from June 2006

• A stable service on which experiments can make a full demonstration of the experiment offline chain
  – DAQ → Tier-0 → Tier-1: data recording, calibration, reconstruction
  – Offline analysis, Tier-1 ↔ Tier-2 data exchange: simulation, batch and end-user analysis
• And sites can test their operational readiness
  – Service metrics → MoU service levels
  – Grid services
  – Mass storage services, including magnetic tape
• Extension to most Tier-2 sites
• Evolution of SC3 rather than lots of new functionality
• In parallel
  – Development and deployment of distributed database services (3D project)
  – Testing and deployment of new mass storage services (SRM 2.1)

Page 13: The ATLAS Tier2 Federation  INFN


LCG Service Deadlines

[Timeline figure spanning 2006, 2007 and 2008, with the LHC milestones: cosmics, first physics, full physics run]

• Pilot Services: stable service from 1 June 06
• LHC Service in operation: 1 Oct 06; over the following six months ramp up to full operational capacity & performance
• LHC service commissioned: 1 Apr 07

Page 14: The ATLAS Tier2 Federation  INFN


ATLAS SC4 Schedule

• June: from 19 June till 7 July, send 772 MB/s in total, made of "Raw" (at 320 MB/s), ESD (at 252 MB/s) and AOD (at 200 MB/s), from Tier 0 to the ATLAS Tier 1 sites, a total of 90K files per day. The "raw" goes to tape. The Tier2s subscribe to fake AOD (20 MB/s). CDP = Continuous Distributed Production of 2M MC events/week, requiring 2700 kSI2K.
  – CDP has been active in the last months (next slide by Ian Bird, from the SA1 talk at the May EGEE Final EU review). All INFN Tier2s involved
  – Operated for > 50% by INFN people (<3!) on LCG resources
• July: setting up of distributed reconstruction using local stage-in from tape (1-2 drives required). CDP
• August: two 3-day slots of distributed reconstruction using local stage-in from tape (1-2 drives required). Distributed analysis tests, 20 MB/s incoming at each Tier 1. No CDP?
• September: Tier 0 internal tests. CDP
• October: distributed reprocessing tests, 20 MB/s incoming at each Tier 1. AOD to Tier2s. CDP
• November: distributed analysis tests, 20 MB/s incoming at each Tier 1, at the same time as distributed reprocessing continues. Massive Tier2 involvement. CDP
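The June figures can be cross-checked with a little arithmetic. The sketch below only uses the numbers quoted on the slide; the daily volume and the mean file size are derived values, not figures from the talk, and decimal units (1 TB = 10^6 MB) are an assumption.

```python
# Rates quoted on the slide for the Tier 0 -> Tier 1 export, in MB/s.
RATES_MB_S = {"RAW": 320, "ESD": 252, "AOD": 200}
FILES_PER_DAY = 90_000          # "90K files per day" from the slide
SECONDS_PER_DAY = 24 * 60 * 60

# The three streams should add up to the quoted 772 MB/s.
total_rate = sum(RATES_MB_S.values())

# Derived quantities (not on the slide); decimal units are assumed.
daily_volume_mb = total_rate * SECONDS_PER_DAY
daily_volume_tb = daily_volume_mb / 1e6
mean_file_size_mb = daily_volume_mb / FILES_PER_DAY

print(f"total rate: {total_rate} MB/s")
print(f"daily export volume: {daily_volume_tb:.1f} TB")
print(f"implied mean file size: {mean_file_size_mb:.0f} MB")
```

Under these assumptions the export corresponds to several tens of TB per day in files of a few hundred MB each.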

Page 15: The ATLAS Tier2 Federation  INFN

Ian Bird, SA1, EGEE Final Review 23-24th May 2006

Use of the infrastructure

[Figure: three plots. No. jobs/day (Total and non-LCG series), Jan-05 to Apr-06; CPU in cpu-years/month, Jun-05 to Apr-06; CPU time delivered in SI2K-hours/month, Jun-05 to Apr-06. Legend: lhcb, geant4, cms, biomed, atlas, alice]

• Sustained & regular workloads of >30K jobs/day
  – spread across full infrastructure
  – doubling/tripling in last 6 months, with no effect on operations

Page 16: The ATLAS Tier2 Federation  INFN


Phase 19-6 to 8-7

• Basically the first distributed test of ATLAS DDM (DQ2)
• All Tier1s involved + some Tier2s (many?)
• VOBOX only at the Tier1s (DQ2 servers)
  – Data is shipped from Castor @ CERN, using FTS, to a storage area at a site. This is dummy data (no physics value, fake ESD), so sites may scratch it later. Sites must report the SRM host/path where this data is to be written. In addition, we will use the LFC catalogs already available per Tier1 to catalog this dummy data, as with the real system.
  – DQ2 will be used to submit, manage and monitor the Tier1 export, hopefully without significant user intervention. DQ2 is based on the concept of dataset subscriptions: a site is subscribed by the Tier0 management system @ CERN to a dataset that has been reprocessed. The DQ2 site service running at the site's VO BOX will then pick up subscriptions, and submit and manage the corresponding FTS requests.
  – Tier2s will subscribe to fake AOD (20 MB/s target)
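The subscription idea can be illustrated with a toy model. This is a sketch only and not the DQ2 API: the class names, the site and dataset names and the "queued"/"submitted" states are all hypothetical, and the step that would create an FTS request in the real system is reduced to appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Hypothetical record: a site is subscribed to a dataset."""
    dataset: str
    site: str
    state: str = "queued"  # toy states: queued -> submitted


@dataclass
class SiteService:
    """Toy stand-in for a site service polling for its own subscriptions."""
    site: str
    submitted: list = field(default_factory=list)

    def poll(self, subscriptions):
        for sub in subscriptions:
            if sub.site == self.site and sub.state == "queued":
                # In the real system a transfer request would be created here.
                self.submitted.append(sub.dataset)
                sub.state = "submitted"
        return self.submitted


# Hypothetical subscriptions registered by a central management system.
subs = [
    Subscription("fake.ESD.0001", "TIER1_X"),
    Subscription("fake.ESD.0002", "TIER1_Y"),
    Subscription("fake.AOD.0001", "TIER1_X"),
]

service = SiteService("TIER1_X")
picked = service.poll(subs)
print(picked)
```

The point of the design is that the central system only records which site wants which dataset, while each site service pulls its own work, so a second call to `poll` submits nothing new.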

Page 17: The ATLAS Tier2 Federation  INFN


Tier2 INFN in SC4

• All 4 sites are involved
• Also important for gaining experience with ATLAS DDM (new!)
• In the phase up to 8 July the data are fake, but plentiful
  – 1.6 TB per day if the target is reached
  – The disk space free on average today is not enough even for 2 days…
    • The data are fake, so we will clean them up continuously and survive
• From October the data will be real
  – Having additional disk will then become indispensable
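The 1.6 TB per day figure follows from the 20 MB/s fake-AOD target quoted for the Tier2s. A minimal sketch of the arithmetic, assuming the slide uses binary units (1 TB = 1024 x 1024 MB); the free-disk value is a hypothetical number chosen only to illustrate the "less than 2 days" remark.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TARGET_RATE_MB_S = 20  # fake AOD target per Tier2, from the SC4 slides


def daily_volume_tb(rate_mb_s, mb_per_tb):
    """Volume written in one day at a constant rate, in TB."""
    return rate_mb_s * SECONDS_PER_DAY / mb_per_tb


def days_until_full(free_tb, rate_mb_s, mb_per_tb=1024 * 1024):
    """Days before the free disk fills up if nothing is deleted."""
    return free_tb / daily_volume_tb(rate_mb_s, mb_per_tb)


binary_tb = daily_volume_tb(TARGET_RATE_MB_S, 1024 * 1024)  # about 1.6
decimal_tb = daily_volume_tb(TARGET_RATE_MB_S, 1e6)         # about 1.7

# Hypothetical free space at a site, for illustration only.
HYPOTHETICAL_FREE_TB = 3.0
days = days_until_full(HYPOTHETICAL_FREE_TB, TARGET_RATE_MB_S)

print(f"{binary_tb:.2f} TB/day (binary), {decimal_tb:.2f} TB/day (decimal)")
print(f"{HYPOTHETICAL_FREE_TB} TB free lasts {days:.1f} days")
```

With either unit convention the daily volume is well above 1.5 TB, which is why continuous cleanup of the fake data is needed.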

Page 18: The ATLAS Tier2 Federation  INFN


ATLAS SC4

• ATLAS intends to use SC4
  1. As a test of data transfer over the network (first phase)
  2. But above all as a test of the various aspects of its computing model (in particular in the second phase)
• For ATLAS, point 2 requires sw and mw that as of 1 June are not yet in “production”
  – Mw: gLite RB, new FTS, VOMS-enabled fair share…
  – ATLAS sw: various parts of DDM (DQ2), analysis system with a friendly interface (we do have the Production System)
• Delay (both gLite 3.0 and DQ2) with respect to the original schedule
• Work is under way to have a “production-like” system from October
  – Not easy but possible, maybe with a shift of a couple of months?
  – After that, a lot of development and effort will still be needed to bring the new features to production level

Page 19: The ATLAS Tier2 Federation  INFN


Services-tools and INFN Grid

• Rely on the tools and services developed by EGEE-LCG (INFN Grid) as much as possible
• Take advantage of all the possible synergies with the INFN Grid Operation structure
• Full integration in the ATLAS-LCG-EGEE system
  – Relatively easy in ATLAS, as some insulation of Eu-Grid from US-Grid and Nord-Grid is built into the ATLAS system
• Specific tools and services are dealt with in the next slides

Page 20: The ATLAS Tier2 Federation  INFN


RB, CE, SE, FTS, LFC, VObox

• ATLAS is using RB and Condor-G on the LCG resources
  – US and NorduGrid use different submission systems
  – The 3 interfaces (“executors”) are part of the same ATLAS ProdSys
  – Friendly competition with Condor-G; INFN people are fully engaged in RB use as developers and operators
  – The Condor-G workers are even fewer than our people… which is helping us win the competition… not good…
• Our WMS interface (“Lexor executor”) is now adapted to the new gLite RB
• Test RB servers with all the latest fixes at Milan and CNAF seem ok NOW
  – Ready to start production with them in the next days!
  – Thanks also to the work of the ATLAS-LCG-EGEE task force

Page 21: The ATLAS Tier2 Federation  INFN


RB, CE, SE, FTS, LFC, VObox

• We are using the LCG CE
  – gLite and CREAM CE have some interesting features
  – Plan to test them on the Pre-Prod TB in the Task Force
• Different SEs are in use
  – For SC4 the INFN Tier2s will use DPM, as SRM is needed
• ATLAS DDM uses FTS, and LFC both as central and distributed catalogue
  – ATLAS VOboxes are only at the Tier1s and include only “less risk category services” (= class 1)
  – FTS plugins are being explored as a possibility for making the VObox “thinner”… still a way to go…

Page 22: The ATLAS Tier2 Federation  INFN


VOMS, Accounting, HLR, Job Priority

• ATLAS needs a system that
  – acknowledges the existence of VOMS groups and roles as defined by the VO;
  – uses the priorities as defined by sites and VO to distribute jobs;
  – uses the VOMS groups as a basis for data storage.
  – The CPU and storage usage has to be accounted at the group and user level
• These functions should not rely on a unique central DB
• The accounting tool we plan for is the merged APEL+DGAS
  – Site HLR needed
  – Test in the ATLAS TF asap, exploiting the setup already done in INFN GRID (HLR etc.). In production in October?
• For job priority and fair share the only promising tool I know of is GPbox
  – Preview TB testing foreseen in the TF, production timing to be understood
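To make the requirements concrete, here is a toy sketch of group-level accounting and a share-based priority. It is not how GPbox, DGAS or APEL work: the group names, usage records, target shares and the priority formula (target share divided by used fraction) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical usage records: (VOMS group, user, CPU hours).
records = [
    ("/atlas/it", "user_a", 40.0),
    ("/atlas/it", "user_b", 20.0),
    ("/atlas/production", "prod_1", 140.0),
]

# Hypothetical target shares per group (fractions summing to 1).
shares = {"/atlas/it": 0.4, "/atlas/production": 0.6}


def usage_by_group(recs):
    """Sum CPU hours per VOMS group (accounting at the group level)."""
    totals = defaultdict(float)
    for group, _user, cpu_hours in recs:
        totals[group] += cpu_hours
    return dict(totals)


def priorities(target_shares, usage):
    """Toy fair share: a group below its target gets a priority above 1."""
    total = sum(usage.values())
    result = {}
    for group, share in target_shares.items():
        used_fraction = usage.get(group, 0.0) / total if total else 0.0
        result[group] = share / used_fraction if used_fraction else float("inf")
    return result


group_usage = usage_by_group(records)
prio = priorities(shares, group_usage)
next_group = max(prio, key=prio.get)
print(group_usage, next_group)
```

Here the group that has consumed less than its target share is served next; a per-user breakdown would follow the same pattern with the user name as the key.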

Page 23: The ATLAS Tier2 Federation  INFN


Monitoring and (local) management tools

• The only ATLAS-specific monitoring tools are now for job monitoring using the ProdSys DB
  – Understand what needs to be developed in addition to GridIce and DGAS
  – Favour adopting solutions already in use in INFN, and common development if needed
• Storage monitoring looks a general need…
  – Participate in DGAS testing…
• In any case it would be difficult to find ATLAS manpower for developing new solutions here…

Page 24: The ATLAS Tier2 Federation  INFN


Conclusion

• The months from now to the end of 2006 are critical for setting up the ATLAS data and analysis system
  – And for having Italian users start exploiting it
• A lot of work to be done
• The federation will have an important role in helping organise the ATLAS-Italy effort in these areas
  – as well as in setting up the tools, services and structures needed for managing and running the Tier2s themselves
• Our plan intends to use all our human and hw resources in the most efficient way for ATLAS-Italy as a whole