18-sep-02. Computing for CDF: Status and Requests for 2003. Stefano Belforte, INFN – Trieste.



CSN1, 18-sep-02. Stefano Belforte – INFN Trieste


The CDF-Italy Computing Plan

Presented on June 24, 2002
  Referees (and CSN1) postponed discussion/approval until November 2002: decide based on experience
  Collecting experience now
  No reason to modify plan so far

Today:
  Status report on analysis farm at FNAL
  Update on work toward de-centralization
    GRID - CNAF
  Progress toward MOU/MOF
  Rationale for 2003 requests


Status of CAF

FNAL Central Analysis Farm (CAF): a big success so far
  Easy to use
  Effective
  Convenient

Measure of success
  100% used now
  Upgrade in progress
  Many institutions spending their $$$ there
  Cloning started (Korea)


CDF Central Analysis Farm

Compile/link/debug everywhere
Submit from everywhere
Execute @ FNAL
  Submission of N parallel jobs with single command
  Access data from CAF disks now
  Access tape data via transparent cache soon now
Get job output everywhere
Store small output on local scratch area for later analysis
  Access to scratch area from everywhere

IT WORKS NOW

[Diagram of the CAF layout. Labels: FNAL; Local Data servers; A pile of PCs; My Desktop; My favorite Computer; gateway; ftp; switch; job; Log; out; NFS; rootd; N jobs; scratch server]


Tape to Disk to CPU

[Plot over the days of September 2002. Labels: 2TB/day; From disk; From tape]

“Spec. from 2000 review”:

Disk cache should satisfy 80% of all data requests
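The 80% spec can be checked against daily transfer volumes with a few lines of Python. This is a minimal sketch, not a CAF tool: the function names and the sample volumes are hypothetical, and it measures the disk fraction by data volume, whereas the spec is stated in terms of data requests.

```python
def disk_hit_fraction(from_disk_tb: float, from_tape_tb: float) -> float:
    """Fraction of the delivered data volume that came from the disk cache."""
    total = from_disk_tb + from_tape_tb
    if total <= 0:
        raise ValueError("no data delivered")
    return from_disk_tb / total


def meets_spec(from_disk_tb: float, from_tape_tb: float, target: float = 0.80) -> bool:
    """True if the disk cache served at least `target` of the volume."""
    return disk_hit_fraction(from_disk_tb, from_tape_tb) >= target


# Hypothetical daily volumes in TB (not read off the plot).
for disk_tb, tape_tb in [(1.8, 0.2), (1.0, 1.0)]:
    fraction = disk_hit_fraction(disk_tb, tape_tb)
    print(f"disk {disk_tb} TB, tape {tape_tb} TB: {fraction:.0%} from disk, "
          f"spec met: {meets_spec(disk_tb, tape_tb)}")
```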


CAF promise fulfilled

Giorgio Chiarelli runs 100-section jobs and integrates 120 x 7 x 24 x 3% ≈ 600 CPU hours in a few days, using up to more than half of the full CAF at the same time

Go through 1TB of data in a few hours

All of this with a single script of a few lines that automatically divides the input among the various job sections

Made in Italy
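The CPU-hour figure on this slide can be reproduced as a quick check. The reading of the factors (120 CPUs, 7 days, 24 hours per day, a 3% share) is an assumption; the slide gives only the product.

```python
# Factors as quoted on the slide; their interpretation is an assumption.
cpus = 120           # assumed: number of CPUs in the farm
days = 7             # assumed: one week
hours_per_day = 24
share = 0.03         # assumed: average fraction of the farm used

cpu_hours = cpus * days * hours_per_day * share
print(f"{cpu_hours:.1f} CPU hours")  # 604.8, which the slide rounds to 600
```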


Monitoring jobs and sections on the Web

Made in Italy


Managing user’s area on CAF O(100GB)

Made in Italy


CAF this summer

CAF stage 1 saved the day for summer conferences
  61 duals (10 INFN, 16 Pitt/CMU)
  15 fileservers (4 INFN, 1 MIT)

CPU usage ~90% since June

Users happy

Made in Italy


CAF today

Wait times get longer

Users want more

Ready for Stage 2

New hardware ready this fall for ski conferences

Made in Italy


CAF Stage 2 (Stage1 x4)

FNAL/CD centralized bid ~ two times/year
CDF procurement for Stage 2 this summer
  Just in time to catch INFN funds released in June (x3)
  Bids are in
  Hope for HW up and running in November
  CSN1 → users = 6 months

Many others will join CAF in Stage 2
  KEK-Japan: 2 fileservers, 38 duals
  Korea: 0.5 fileserver (+ 2 later)
  Spain: 1 fileserver
  Canada: 1 fileserver
  US (8 universities): 10 fileservers, 4 duals
  More to come


Why is CAF a success

CAF is more than a pile of PCs
Integrated hw/sw design for farm and access tools
Designed for optimized access to data
  Lots of disk-resident data
  Large transparent disk cache in front of tape robot
  Tuning of disk access (data striping, minimal NFS, …)
Designed for users' convenience
  Simple GUIs, Kerberos-based authentication, large local user areas
Professional system management and close loop with vendors
  Several hw/firmware/sw problems solved so far
    RAID controller, defective RAM, file system or kernel bugs …
  Plus the normal failure rate of disks, power supplies etc.
  2 FTE on CAF infrastructure


Will CAF success last ?

User community: ramping up in these days: 20 → 200
  From the pioneers to the masses
  Exposure to all kinds of access patterns

Hardware expansion: up to a factor of 10 over the next 2 years

Only experience will tell
  CAF is built with the cheapest hardware
  Will have to learn to live with 10~20% of hardware broken at any given time


Beyond CAF

FERMILAB wants to join the GRID
  FNAL will be Tier1 for CMS-US

Foreign CDF institutions want to integrate their local farms
  Spain, Korea, UK, Germany, Canada, Italy
  In many cases to exploit LHC/GRID hardware
  So far no big offer of help for common work, unlike D0
    Exception: Canada: 224 nodes “now” for CDF MC

No software tool to do this integration “transparently” yet

Not clear how much this will help CDF analysis


Decentralizing analysis computing

FNAL-CD working hard to promote SAM for remote work
  SAM: Metadata catalog + distributed disk caches
    Run analysis locally
    Copy data as needed (only 1st time)
    Works in Trieste (as in other places)
  SAM to become “the” CDF data access tool
  SAM integration with (EuroData)GRID being tried
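The "copy data as needed (only 1st time)" behaviour of a local disk cache can be illustrated with a short Python sketch. This is not SAM code and does not use any SAM interface: the class `FirstUseCache`, the `fetch` callable and the file names are hypothetical, and the remote transfer is replaced by a stand-in function.

```python
import tempfile
from pathlib import Path


class FirstUseCache:
    """Hypothetical local disk cache: transfer a file on first use, reuse it afterwards."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch      # callable(name, destination_path) doing the remote copy
        self.transfers = 0      # number of remote copies actually performed

    def open_path(self, name):
        """Return the local path of `name`, copying it in only if it is missing."""
        local = self.cache_dir / name
        if not local.exists():
            self.fetch(name, local)
            self.transfers += 1
        return local


def fake_fetch(name, destination):
    """Stand-in for a remote transfer: just writes a small local file."""
    destination.write_text(f"contents of {name}")


with tempfile.TemporaryDirectory() as tmp:
    cache = FirstUseCache(tmp, fake_fetch)
    cache.open_path("file_0001.root")   # first access: copied
    cache.open_path("file_0001.root")   # second access: served from local disk
    print(cache.transfers)              # prints 1
```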

CDF working on “packaging CAF for export”
  Decentralized CAFs
  Each handling data independently
  Cloning FNAL CAF is the easiest way (Korea's choice)

Remote farms = extra costs for FNAL


CDF computing outside US (approx)

Country        2002 TB  2002 duals  2003 TB  2003 duals  Notes
Spain          -        -           10       50          Shared with CMS, plan for EDG tools. No plan for shared access.
Germany        3 +1     20 +10      20 +20   50 +40      Tier1 (shared with LHC) + Tier3 (CDF). No plan for shared access. Testing SAM on Tier3.
UK (4 sites)   24       16          80       64          Maybe 5x the CPU if 8-way duals. No EDG, Kerberos for user access, SAM for data. Maybe open.
Korea          1        20          7        40          Want to clone CAF by end of 2002. Kerberos for user access, open to all. Start w/o SAM.
Canada         1        8           28       224         No GRID tools. Run official CDF MC and copy to FNAL.
Italy          1        5           7        29          No plan for shared access. Exploring SAM on single node.


MOU/MOF

Moving to a way to recognize foreign contribution
  IFC and Scrutiny Group to work on this
    INFN present in both

Issues being talked about:
  Computing will have to enter MOF somehow
  Allow and encourage contribution
  Take into account history and present situation

No indication of a “crisis” that has to be dumped on the collaborators for help


2003 requests detailed: 5 items

Stick to June plan:
  1) Invest majority of resources on FNAL CAF
  2) Modest growth in Italy for interactive work

Summer experience: needs do not scale down with luminosity
  No reason to expect large variation from June numbers
  Requested resources well within June forecast
  Nevertheless, prudent, incremental approach (referees)

New in 2003
  3) Start MC
  4) Interactive work at FNAL
  5) Start transition to CNAF


Tevatron keeps us busy

By next summer tune analysis to same level as Run1:
  Alignments, precision tracking, secondary vertexes, B-tag
  Jet energy corrections, underlying event

Do interesting physics in the meanwhile

Example: all-Italian Dhh
  By end of year (100pb-1)
    10^6 events in the mass peak, 10^7 in the histogram
  4TB of data by spring, 16TB by end 2003
  This channel alone saturates disk financed so far (15TB)
  Learning field for Bhh


Monte Carlo

CDF has talked about central production
  But no overall estimate of needs yet

Next year safe bet: everybody on his/her own
  Just the same as Run 1

Italian groups starting on this now
  Plan for capacity of 10^7 events/month
  Modest hw need: 10 dual-CPU nodes
  Adequate for most analyses (10x a given dataset)
  Future growth should be small

Further requests only on basis of clear “cases”
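As a rough cross-check of the Monte Carlo plan, the stated capacity and node count imply a CPU-time budget per event. The sketch below assumes a 30-day month and 100% farm utilization; neither assumption comes from the slide, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def cpu_seconds_per_event(events_per_month, dual_nodes,
                          days_per_month=30, utilization=1.0):
    """CPU seconds available per generated event on a farm of dual-CPU nodes."""
    cpus = 2 * dual_nodes
    available = cpus * days_per_month * SECONDS_PER_DAY * utilization
    return available / events_per_month


# Numbers from the slide: 10^7 events/month on 10 dual-CPU nodes.
budget = cpu_seconds_per_event(events_per_month=1e7, dual_nodes=10)
print(f"{budget:.2f} CPU seconds per event")  # 5.18 under the assumptions above
```

A lower utilization shrinks the budget proportionally, for example `utilization=0.8` gives about 4.1 CPU seconds per event.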


Interactive work at FNAL

When at FNAL cannot run root on Italy’s machines
  Need “some” “better than desktop” PCs (cf. June’s talk)

Referees asked for central management:
  Defined total cap at 10 “power PCs”
  Asked for 5 in 2003
  4 full-time physicists doing analysis at FNAL
    P. Azzi, R. Carosi, S. Giagu, M. Rescigno

Explore central alternative in 2003
  Interactive login pool in CAF
  Some ideas so far, will try and see


Moving CAF to CNAF

PROs
  Spend money in Italy
  Join INFN effort in building world class computing center
  Easier access to 3rd data and/or interactive resources
    GARR vs WAN
  Tap on GRID/LHC hardware pool for peak needs
  Import here tools and experience learnt on CAF

CONs
  Not an “experiment need”
    FNAL CAF may be enough
  Costs more
  Poor access to main data repository (FNAL tapes)
  Need to replicate ease of use and operation of FNAL CAF
  Different hardware = different problems
  Have to divert time and effort from data analysis


Moving CAF to CNAF: the proposal

Start with limited, but significant hardware
  2003 at CNAF: ½ of private share of CAF in 2002
  7TB of disk and 29 dual processors, estimated on the basis of expected data needs for top6j and Zbbar

Explore effectiveness of work environment
  Don’t give up on CAF features
  Look for added value
  Will need help (manpower)

Will try and see; decision to leave FNAL will have to be based on proof of existence of a valid alternative here


Summary of requests

ANALYSIS FARM

June 24 “plan”
year   disk (TB)   CPU (duals)   cost/y (Keuro)
2001   0.6         0
2002   20          80            336
2003   40          140           266
[Luminosity column, as extracted: 2.0, 1.0, commissioning, Planned (Church), 0.3, Target (adjusted), 1.2]

After CSN1’s June decision
year   disk (TB)   CPU (duals)   cost/y (Keuro)
2001   0.6         0
2002   15          60            263
2003   37          130           305
[Luminosity column, as extracted: 2.0, 0.8, commissioning, 0.3, 1.2]

Analysis at FNAL

FNAL CAF: 22TB disk + 63 dual nodes = 132+173=306KEu

Monte Carlo: 10 dual nodes = 28KEu (FNAL price)

CNAF: 7TB disk + 29 dual nodes = 70+96=166KEu

Interactive FNAL: 5 “power PC” = 22.5KEu

Interactive Italy: disk and cpu Pd/Pi/Rm/Ts/… = 50KEu total
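The request list above can be tallied with a short Python sketch. The cost components are copied from the list; the dictionary layout and variable names are hypothetical. The FNAL CAF components add up to 305 KEu, while the slide quotes 306, possibly because the components are rounded; the sketch sums the components as printed.

```python
# Cost components in KEu, copied from the request list above.
requests_keu = {
    "FNAL CAF": {"disk": 132.0, "cpu": 173.0},
    "Monte Carlo": {"cpu": 28.0},
    "CNAF": {"disk": 70.0, "cpu": 96.0},
    "Interactive FNAL": {"power PCs": 22.5},
    "Interactive Italy": {"disk and cpu": 50.0},
}

# Subtotal per request item, then the grand total.
subtotals = {item: sum(parts.values()) for item, parts in requests_keu.items()}
total_keu = sum(subtotals.values())

for item, value in subtotals.items():
    print(f"{item:18s} {value:6.1f} KEu")
print(f"{'Total':18s} {total_keu:6.1f} KEu")
```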


SPARE

Spare slides from here on


Working on CDF CAF is easy

1. Pick a dataset by name
2. Decide how many parallel execution threads (sections)
3. Prepare 1 executable, 1 tcl and 1 script file

Submit from anywhere via simple GUI

Query CAF status at any time via web monitor

Retrieve log/data anywhere via simple GUI

2-step submission of 100 sections

1) In the script:
   setenv TOT_SECT 100
   @ section = $1 - 1
   setenv CAF_SECTION $section

2) In the tcl file (only one tcl file)
   module talk DHInput
      include dataset bhmu03
      setInput cache=DCACHE
      splitInput slots=$env(TOT_SECT) this=$env(CAF_SECTION)
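The idea behind the two steps, each section deriving a zero-based index from its job number and processing only its own share of the input, can be sketched in Python. How `splitInput` actually partitions a dataset is not stated on the slide; the round-robin slicing below is an assumption, and `split_input` and the file names are hypothetical.

```python
def split_input(files, slots, this):
    """Return the share of `files` for section `this` (zero-based) out of `slots`.

    Round-robin assignment is an assumption made for illustration only.
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    if not 0 <= this < slots:
        raise ValueError("this must satisfy 0 <= this < slots")
    return files[this::slots]


# Hypothetical dataset of 250 files split over 100 sections.
files = [f"file_{i:03d}" for i in range(250)]
TOT_SECT = 100

for job_number in (1, 2, 100):        # job numbers start at 1, like $1 in the script
    caf_section = job_number - 1      # mirrors "@ section = $1 - 1"
    mine = split_input(files, TOT_SECT, caf_section)
    print(job_number, len(mine), mine[:2])
```

With this scheme every file is handled by exactly one section, and section sizes differ by at most one file.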


Working on CAF is effective

Quickly go through any CDF dataset (disk or tape)
Create personalized output and store it locally
Run on that output (data file or root ntuple)
  Locally on CAF nodes
  Remotely via rootd (e.g. Root from desktop)


CAF is convenient: can work from anywhere

All needed code and tools for CDF offline via anonymous ftp or simply from /afs/infn.it
Everything runs on plain RedHat 6.x, 7.x
  even on GRID testbed
  no need for customized system install

Need Kerberos ticket to talk to FNAL, but...
  One-click install of Kerberos client from the web
    No need for system manager
  Just type “kinit” and your Fermilab password

Many people work from their laptop!


CAF future


Little data? No way!

DAQ runs at full speed
Typical luminosity better than Run1
2-track trigger from SVT is full of charm
We are refocusing attention on samples that in the default scenario would have been limited in statistics
  Low Pt jets (20GeV) and leptons (8GeV)
  Charm
Interesting for physics
  improve on PDG in charm sector
Fundamental control samples
  Particle ID on Dhh as learning field for Bhh
  Heavy flavor content in jets
  B-jet tagging
  Jets resolution
  …