M. Paganoni, HCP2007 1
Computing tools and analysis architectures:
the CMS computing strategy
M. Paganoni, HCP2007
La Biodola, 23/5/2007
M. Paganoni, HCP2007 2
Outline
• CMS Computing and Analysis Model
• CMS workflow components
• 25% capacity test (CSA06 challenge)
• CMSSW validation
• LoadTest07, Site Availability Monitor and Grid gLite 3.1
• Analysis workflow

The goals for 2007
• Physics validation with high statistics
• Full detector readout during commissioning
• 50% capacity test (CSA07 challenge)
M. Paganoni, HCP2007 3
CMS schedule
[Timeline figure, March to November 2007, with two tracks of milestones]
1) Detector Installation, Commissioning & Operation:
• First Global Readout Test
• Barrel ECAL Inserted
• Tracker Inserted
• Trigger/DAQ Ready for System Commissioning
• CMS Ready to Close
2) Preparation of Software, Computing and Physics Analysis:
• HLT exercise complete
• Pre-CSA07 Computing Software Analysis Challenge
• CSA07
• 2007 Physics Analyses completed
End point: All CMS Systems Ready for Global Data Taking
M. Paganoni, HCP2007 4
The present status of CMS computing
From development:
• service/data challenges (both WLCG-wide and experiment-specific) of increasing scale and complexity
to operations:
• data distribution
• MC production
• physics analysis
Primary needs:
• smoothly running Tier-1s and Tier-2s, concurrent with other experiments
• streamlined and automatic operations to ease the operational load
• full monitoring, for early detection of Grid and site problems and to reach stability
• sustainable operations in terms of data management (DM), workload management (WM), user support, site configuration and availability, under continuous significant load
M. Paganoni, HCP2007 5
The CMS computing model
Tier-0: accepts data from the DAQ; prompt reconstruction; data archiving and distribution to the T1s
Tier-1s: data and MC archiving; re-processing; skimming and other data-intensive analysis tasks; data serving to the T2s
Tier-2s (~30 sites): user data analysis; MC production; calibration/alignment and detector studies
M. Paganoni, HCP2007 6
CMS data formats and data flow
RAW: ~1.5 MB/ev; 2 copies, 1 at T0 and 1 spread over the T1s; 4.5 PB/yr
RECO: ~250 kB/ev; 1 copy spread over the T1s; 2.1 PB/yr
AOD: ~50 kB/ev; 1 copy at each T1, served to the T2s; 2.6 PB/yr
TAG: ~1-10 kB/ev
MC is produced in a 1:1 ratio with data.
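As a rough consistency check (mine, not from the slide): assuming ~1e7 live seconds per year, the quoted RAW volume implies an event rate of order 150 Hz. The RECO and AOD yearly totals are larger than a single processing pass would give, presumably because they fold in re-processing passes and MC, so the sketch below checks the RAW number only.

```python
# Back-of-the-envelope check of the RAW volume quoted above.
# Assumption (mine, not from the slide): ~1e7 live seconds per year.
raw_size = 1.5e6             # ~1.5 MB/ev
copies = 2                   # 1 copy at T0 + 1 spread over the T1s
raw_volume = 4.5e15          # 4.5 PB/yr

events_per_year = raw_volume / (raw_size * copies)
print(f"implied events/year: {events_per_year:.1e}")             # 1.5e+09

live_seconds = 1e7
print(f"implied rate: {events_per_year / live_seconds:.0f} Hz")  # 150 Hz
```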
M. Paganoni, HCP2007 7
The MC production
Production of 200M events (50M/month) for the HLT exercise and the Physics Notes has started at the T2s with the new MC Production System:
• less man-power consuming, better handling of Grid-site unreliability, better use of resources, automatic retries, better error reporting/handling
• a more flexible and automated architecture:
• ProdManager (PM), plus the policy piece: manages the assignment of requests to one or more ProdAgents and tracks the global completion of the task
• ProdAgent (PA): job creation, submission and tracking; management of merges, failures, resubmissions
[Diagram: a policy/scheduling controller feeds PMs for official and development MC production, which drive several PAs at the Tier-0/1 and Tier-1/2 sites]
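A minimal sketch of the division of labour described above, assuming nothing beyond the slide (the class and method names are hypothetical, not the real ProdManager/ProdAgent code): the manager assigns request chunks to agents and tracks global completion, while each agent owns submission and automatic resubmission of failures.

```python
from dataclasses import dataclass, field

@dataclass
class ProdAgent:
    name: str
    max_retries: int = 3

    def run_chunk(self, request_id: str, n_events: int) -> int:
        """Submit jobs for a chunk, resubmitting failures automatically."""
        done = 0
        for _attempt in range(self.max_retries + 1):
            done += self.submit_jobs(request_id, n_events - done)
            if done >= n_events:
                break
        return done

    def submit_jobs(self, request_id: str, n_events: int) -> int:
        # Placeholder for Grid submission; a real agent would create,
        # submit and track jobs, then merge their outputs.
        return n_events

@dataclass
class ProdManager:
    agents: list
    completed: dict = field(default_factory=dict)

    def process(self, request_id: str, total_events: int) -> None:
        # Assign the request to the agents and track global completion.
        chunk = total_events // len(self.agents)
        for agent in self.agents:
            self.completed[request_id] = (
                self.completed.get(request_id, 0)
                + agent.run_chunk(request_id, chunk))

pm = ProdManager([ProdAgent("T2_A"), ProdAgent("T2_B")])
pm.process("HLT-sample", 1_000_000)
print(pm.completed)   # {'HLT-sample': 1000000}
```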
M. Paganoni, HCP2007 8
CMS Remote Analysis Builder (CRAB)
CRAB is a user-oriented tool for Grid submission and handling of physics analysis jobs:
• data discovery (DBS/DLS)
• interactions with the Grid, including error handling and resubmission
• output retrieval
Routinely used since 2004 on both EGEE and OSG (MTCC, PTDR, CSA06, tracker commissioning, ...).
New client-server architecture:
• improve scalability
• increase automation
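A hypothetical sketch of the CRAB job lifecycle listed above (not the real CRAB code or API): discover the file blocks of a dataset via DBS/DLS, submit one Grid job per block, and resubmit failures automatically before retrieving the output.

```python
# Placeholder names throughout; the real client also splits blocks into
# many jobs and fetches each job's output sandbox at the end.
import random

def discover_blocks(dataset: str) -> list[str]:
    # Stand-in for a DBS/DLS query: which file blocks exist,
    # and which sites host them.
    return [f"{dataset}#block{i}" for i in range(4)]

def submit_to_grid(block: str) -> bool:
    # Stand-in for gLite/OSG submission; pretend ~10% of jobs fail.
    return random.random() > 0.1

def analyse(dataset: str, max_retries: int = 3) -> None:
    pending = discover_blocks(dataset)
    for _attempt in range(max_retries + 1):
        pending = [b for b in pending if not submit_to_grid(b)]
        if not pending:
            break
    status = "all done" if not pending else f"{len(pending)} blocks failed"
    print(f"{dataset}: {status}")

analyse("/Zmumu/CSA06/RECO")
```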
M. Paganoni, HCP2007 9
The data placement system (PhEDEx)
The data placement system for CMS, in production for more than 3 years:
• large-scale, reliable dataset/fileblock replication
• multi-hop routing following a transfer topology (T0 → T1s → T2s), data pre-staging from tape, data archiving to tape, monitoring, bookkeeping, priorities and policy, fail-over tactics
PhEDEx is made of a set of independent agents, integrated with the gLite File Transfer Service (FTS).
It works with both EGEE and OSG.
Automatic subscription to DBS/DLS.
M. Paganoni, HCP2007 10
Data processing workflow
[Workflow diagram not recoverable from the transcript]
M. Paganoni, HCP2007 11
Computing, Software and Analysis challenge 2006 (CSA06)
The first test of the complete CMS workflow and dataflow: a 60M-event exercise to ramp up to 25% of the 2008 capacity.
T0: prompt reconstruction
• 207M events reconstructed (RECO, AOD), applying alignment/calibration constants from the offline DB
• 0.5 PB transferred to 7 T1s
T1s: skimming (to get manageable datasets) and re-reconstruction
• automatic data serving to the T2s via injection into PhEDEx and registration in DBS/DLS
T2s: access to the skimmed data, alignment/calibration jobs, physics analysis jobs
• submission of analysis jobs to the Grid with CRAB, by single users and groups
• insertion of new constants into the offline DB
M. Paganoni, HCP2007 12
CSA06: T0 and T0 → T1
Prompt reconstruction at the T0:
• peak rate: >300 Hz for >10 hours
• uptime: 100% over 4 weeks
• best efficiency: 96% (1400 CPUs) for ~12 h
T0 → T1 transfers:
• average rate: 250 MB/s
• peak rate: 650 MB/s
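A quick sanity check (mine; it assumes the transfers were spread over the full four weeks) ties these rates to the 0.5 PB quoted earlier:

```python
# 0.5 PB over 4 weeks, expressed as a sustained rate:
seconds = 4 * 7 * 24 * 3600              # ~2.4e6 s
print(0.5e15 / seconds / 1e6, "MB/s")    # ~207 MB/s, near the 250 MB/s average
```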
M. Paganoni, HCP2007 13
CSA06: job submission
• >50K jobs/day in the final week, of which 30K/day were robot jobs
• production jobs managed by the ProdAgent
• analysis jobs submitted via CRAB to the Grid
• 90% job efficiency
[Plots: job flow on a typical CSA06 day; CRAB submissions]
M. Paganoni, HCP2007 14
CSA06: calibration
ECAL calibration exercised along the full calibration workflow:
• φ-symmetry of energy deposits in minimum-bias events (a few hours of data)
• single electrons from W → eν
• Z mass reconstruction
[Plots: calibration workflow; minimum-bias φ-symmetry; W → eν single electrons; reconstructed Z mass]
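A toy version of the φ-symmetry method, under simplifying assumptions of my own (uniform φ acceptance, no noise model; not CMS code): since the minimum-bias energy flow is φ-independent at fixed η, each crystal's constant is the ratio of its ring's average energy sum to its own sum. The method only fixes the relative constants within a ring; ring-to-ring calibration needs the electron channels.

```python
import numpy as np

rng = np.random.default_rng(0)
n_eta, n_phi = 10, 360
true_miscalib = rng.normal(1.0, 0.05, size=(n_eta, n_phi))

# Accumulated energy sums per crystal over many min-bias events:
ring_flux = rng.uniform(50, 100, size=(n_eta, 1))   # eta-dependent flux
energy_sum = ring_flux * true_miscalib              # measured sums

# Phi-symmetry constants: ring average divided by each crystal's sum.
constants = energy_sum.mean(axis=1, keepdims=True) / energy_sum

corrected = constants * true_miscalib
print(corrected.std(axis=1).max())   # ~1e-16: flat in phi, ring by ring
```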
M. Paganoni, HCP2007 15
CSA06: alignment
Determine the new alignment:
• run the HIP algorithm on multiple CPUs over a dedicated alignment skim from the T0 (1M events in ~4 h on 20 CPUs)
• write the new alignment into the offline DB at the T0
• distribute the offline DB to the T1/T2s for re-reconstruction
Closing the loop: analysis of the re-reconstructed Z → μμ data at a T1/T2 site.
[Plots: TIB module positions; reconstructed Z mass]
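A schematic, one-dimensional illustration of the HIP idea (my simplification, not the real implementation): each module's position is corrected by the mean of its hit residuals and the procedure is iterated. Because the update is per-module, it parallelises naturally over many CPUs, as in the 20-CPU exercise above.

```python
import numpy as np

rng = np.random.default_rng(1)
true_shift = rng.normal(0, 0.1, size=50)    # unknown module misalignments

positions = np.zeros(50)                    # assumed module positions
for _iteration in range(5):
    # residual = measured hit - prediction; the hit smearing here plays
    # the role of the track-fit uncertainty.
    residuals = (true_shift - positions)[:, None] + rng.normal(0, 0.02, (50, 1000))
    positions += residuals.mean(axis=1)     # HIP-style per-module update

print(np.abs(positions - true_shift).max())  # ~1e-3: modules aligned
```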
M. Paganoni, HCP2007 16
CMSSW validation: tracking
Reproduce with the CMSSW framework (1.2M lines of simulation, reconstruction and analysis software) the detector performance reported in PTDR Vol. 1.
[Plots: muon reconstruction (CMSSW); CMSSW pixel seeding]
M. Paganoni, HCP2007 17
CMSSW validation: electrons
[Plots: electron classification; momentum at the vertex; electron/supercluster matching]
Already improving on the PTDR results in many areas (forward tracking, electron reconstruction, ...).
M. Paganoni, HCP2007 18
Site Availability Monitor
Measure site availability by testing:
• analysis submission
• production
• database caching
• data transfer
using the Site Availability Monitor (SAM) infrastructure, developed in collaboration with LCG and CERN/IT.
The goal is 90% availability for the T1s and 80% for the T2s.
Tests currently run at every EGEE site every 2 hours: 5 CMS-specific tests, with more under development. The results feed back to the site administrators, targeting individual components.
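A hypothetical sketch of how a SAM-style availability figure can be derived (the test names and the all-tests-must-pass rule are my assumptions, not the real SAM logic): a site counts as available in a given 2-hour round only if every critical test passed.

```python
def availability(rounds: list[dict[str, bool]]) -> float:
    """Fraction of test rounds in which all critical tests passed."""
    ok = sum(all(r.values()) for r in rounds)
    return ok / len(rounds)

site_history = [
    {"analysis": True, "production": True,  "db_cache": True, "transfer": True},
    {"analysis": True, "production": False, "db_cache": True, "transfer": True},
    {"analysis": True, "production": True,  "db_cache": True, "transfer": True},
    {"analysis": True, "production": True,  "db_cache": True, "transfer": True},
]
print(f"{availability(site_history):.0%}")   # 75%, below the 80% T2 target
```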
M. Paganoni, HCP2007 19
WMS acceptance tests on gLite 3.1
• 115,000 jobs submitted in 7 days to a single WMS instance: ~16,000 jobs/day, well exceeding the acceptance criteria
• ~0.3% of jobs with problems, well below the required threshold; these are recoverable by the user with the proper command
• the WMS dispatched jobs to the computing elements with no noticeable delay
M. Paganoni, HCP2007 20
CMS LoadTest 2007
An infrastructure by CMS to help the Tiers exercise transfers:
• based on a new traffic load generator
• coordinated within the CMS Facilities/Infrastructure project
Exercises: T0 → T1 (tape), T1 → T1, T1 → T2 ('regional'), T1 → T2 ('non-regional')
Important achievements:
• routine transfers: ~2.5 PB moved in 1.5 months of LoadTest cycles (CSA06 exercised the T0-T1 routes only)
• all Tiers report it is useful
• higher participation of the Tiers
• less effort, improved stability
• automatic, streamlined operations
[Plot: transfer volume over time, CMS LoadTest cycles vs. CMS CSA06]
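A hypothetical sketch of what a traffic load generator of this kind does (my own illustration, not the actual CMS tool): cycle over the links under exercise and keep injecting synthetic transfer requests, so each link sees a steady, known load.

```python
import itertools, time

LINKS = [("T0", "T1_DE"), ("T1_DE", "T1_US"), ("T1_DE", "T2_DESY")]

def inject(src: str, dst: str, size_gb: int = 100) -> None:
    # Placeholder: a real generator would subscribe a synthetic dataset
    # to the destination in the data placement system.
    print(f"injecting {size_gb} GB {src} -> {dst}")

def run_cycles(n_injections: int = 6, pause_s: float = 0.0) -> None:
    for _, (src, dst) in zip(range(n_injections), itertools.cycle(LINKS)):
        inject(src, dst)
        time.sleep(pause_s)

run_cycles()
```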
M. Paganoni, HCP2007 21
Goals of Computing in 2007
Support global data taking during detector commissioning:
• commissioning of the end-to-end chain: P5 → T0 → T1s (tape)
• data transfers and access through the complete DM system
• 3-4 days every month, starting in May
Demonstrate physics analysis performance using the final software with high statistics:
• major MC production of up to 200M events started in March
• analysis starts in June and finishes by September
Ramp up the distributed computing at scale (CSA07):
• a 50% challenge of the 2008 system scale
• adding new functionality: the HLT farm (DAQ storage manager → T0), T1 → T1 and non-regional T1 → T2 transfers
• increased user load for physics analysis
M. Paganoni, HCP2007 22
CSA07 workflow
M. Paganoni, HCP2007 23
CSA07 success metrics
M. Paganoni, HCP2007 24
CSA07 and Physics Analysis
Roughly 10-15 T2s have sufficient storage and CPU resources to support multiple datasets:
• skims in CSA06 were about ~500 GB; the largest of the raw samples was ~8 TB
Improve site availability with SAM, and the non-regional Tier-1 → Tier-2 transfers.
Publish data-hosting proposals for the Tier-1 and Tier-2 sites.
User analysis:
• distributed analysis through CRAB at the Tier-2 centres
• dynamic use of the Tier-2 storage
Calibration workflow activities.
M. Paganoni, HCP2007 25
Ingredients for analysis workflows
Event filters:
• pre-select the analysis output
Event producers:
• can create new content to be included in the analysis output
EDM output configurability:
• can keep or drop any collection
Filters, producers and configurable outputs can be mixed in any combination, giving flexibility in the event content and in the different steps of data reduction.
[Diagram: an analysis job with its inputs and outputs]
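A sketch of the keep/drop idea in a CMSSW-style Python configuration. Configurations of this era used a dedicated .cfg language; the later Python form is shown here for readability, and the selector module and collection names are made up for illustration, though PoolOutputModule, outputCommands and SelectEvents are the real mechanisms.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("SKIM")

# Event filter: pre-selects events ("MuonSelector" and its cut are
# hypothetical placeholders).
process.goodMuons = cms.EDFilter("MuonSelector",
    src = cms.InputTag("muons"),
    cut = cms.string("pt > 20"))

# EDM output configurability: start by dropping everything, then keep
# only the collections this analysis needs; write only selected events.
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("skim.root"),
    SelectEvents = cms.untracked.PSet(SelectEvents = cms.vstring("p")),
    outputCommands = cms.untracked.vstring(
        "drop *",
        "keep *_goodMuons_*_*",
        "keep recoTracks_generalTracks_*_*"))

process.p = cms.Path(process.goodMuons)
process.e = cms.EndPath(process.out)
```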
M. Paganoni, HCP2007 26
Analysis workflow at Tier-0/CAF
[Diagram: from the HLT output (one in-time processed stream, or the HLT primary streams), RAW, RECO and optionally AOD are produced and feed:
• an early-discovery express stream
• physics data quality monitoring
• Standard Model 'candles'
• object-ID efficiency measurements
• calibration with control samples, with dedicated stream(s) for the initial fast calibration]
The actual output of the HLT farm is still to be detailed.
M. Paganoni, HCP2007 27
Conclusions
Commissioning and integration remain the major tasks in 2007:
• balancing the needs of physics, computing and the detector will be a logistics challenge
The transition to operations has started. Scaling to production level while keeping high efficiency is the critical point:
• a continuous effort, to be monitored in detail
Keep the analysis model as flexible as possible.
An increasing number of CMS people will be involved in the facilities, commissioning and operations to prepare for CMS physics analysis.