LHCb Computing
Nick Brook
DESY Computing seminar, December 2007
LHCb detector
Introduction
Computing Model
2008 needs
Physics software
Harnessing the Grid
DIRAC
GANGA
Experience, Future Plans & Readiness
The LHCb Experiment
• LHCb is dedicated to the search for New Physics in CP violation and rare B decays
• LHCb Collaboration: 14 countries, 47 institutions, ~600 physicists
[Figure: location of the LHCb interaction point]
[Figure: the LHC ring; LHCb is located at interaction point 8]
The LHCb detector (December 2006)
[Figure: side view of the LHCb detector: VELO, RICH1, Magnet, Trackers, RICH2, Calorimeters, Muon system]
Dataflow
• RAW data is reconstructed, producing e.g. calorimeter energy clusters, particle ID, tracks, …
• At reconstruction time only enough information is stored to allow a physics pre-selection to run at a later stage: the reduced DST (rDST), stored separately from the RAW data
• Reconstruction is performed twice a year: in quasi real time, and again after the LHC shutdown
Dataflow - Stripping
• The rDST is analysed in production mode; events are streamed for further analysis (20-30 streams)
• The algorithms are developed by the physics working groups and use the rDST & RAW as input
• Events to be output have additional reconstructed information added: (full) DST + RAW data
• Event Tag Collections are created to allow "quick" access to the data; they contain "metadata"
• Stripping is performed 4 times per year
• Event sizes: RAW 35 kB/evt, rDST 20 kB/evt, DST 110 kB/evt
Dataflow - Analysis
• User physics analysis is primarily performed on the output of the stripping
• The output from stripping is self-contained, i.e. no need to navigate between files
• Analysis generates quasi-private data, e.g. Ntuples and/or personal DSTs
• Data publicly accessible, to enable remote collaboration
Use of computing centres
• Main user analysis supported at CERN + 6 "Tier-1" centres
• Tier-2 centres are essentially Monte Carlo production facilities
• Plan to make use of the LHCb online farm for re-processing
2008 Resource Summary
• Estimated 4 × 10^6 seconds of physics (including machine efficiency)
• Assume 8 × 10^9 events from the event filter farm to the CERN computer centre, i.e. a ~2 kHz output rate

              CPU (2.8 GHz P4 years)   Disk (TB)   Tape (TB)
  CERN                 360                 350         631
  Tier-1's            1770                1025         860
  Tier-2's            4550                   -           -
LHCb software framework
[Figure: object diagram of the Gaudi architecture]
LHCb software framework
• Gaudi is an architecture-centric, customisable framework
• Adopted by ATLAS; used by GLAST & HARP
• Same framework used both online & offline
• The algorithmic part of data processing is a set of OO objects
  • decoupling between the objects describing the data and the algorithms allows programmers to concentrate on each separately
  • allows longer stability for the data objects (the LHCb event model), as algorithms evolve much more rapidly
• An important design choice has been to distinguish between a transient and a persistent representation of the data objects
  • persistency solutions can be changed without the algorithms being affected
• Event model classes only contain enough basic internal functionality to give algorithms access to their content and derived information
• Algorithms and tools perform the actual data transformations
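The transient-store design is easiest to see in code. The following is an illustrative Python sketch of the pattern described above, not the actual Gaudi C++ API: event-model objects carry data plus simple accessors, algorithms own the transformations, and the two meet only through a transient event store, so the persistency technology behind the store can change without touching either side.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "p")              # event-model data: no processing logic

class Track:                              # event-model class: content + derived info only
    def __init__(self, hits):
        self.hits = hits
    def momentum(self):                   # derived information accessor
        return sum(h.p for h in self.hits)

class TransientEventStore:                # algorithms never see the persistent form
    def __init__(self):
        self._objects = {}
    def get(self, path):
        return self._objects[path]
    def put(self, path, obj):
        self._objects[path] = obj

class TrackBuildingAlgorithm:             # algorithms own the data transformations
    def execute(self, evt):
        hits = evt.get("/Event/Raw/Hits")
        evt.put("/Event/Rec/Tracks", [Track(hits)])  # stand-in for real pattern recognition

evt = TransientEventStore()
evt.put("/Event/Raw/Hits", [Hit(1.0), Hit(2.5)])
TrackBuildingAlgorithm().execute(evt)
print(evt.get("/Event/Rec/Tracks")[0].momentum())    # 3.5
```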
LHCb software
[Figure: LHCb data processing applications and data flow, all built on Gaudi and sharing the event model / physics event model, the detector description and the conditions database. Gauss (simulation) produces GenParts, MCParts and MCHits; Boole (digitisation) produces Digits and RawData; Brunel (reconstruction & HLT) produces the DST; DaVinci (analysis) produces MiniDST and AOD.]
LHCb software
• Each application is a producer and/or consumer of data for the other applications
• The applications are all based on the Gaudi framework
  • they communicate via the LHCb event model and make use of the unique LHCb detector description
  • this ensures consistency between the applications and allows algorithms to migrate from one application to another as necessary
• The subdivision between the different applications has been driven by their different scopes as well as the CPU consumption and repetitiveness of the tasks performed
Conditions DB
[Figure: condition versions as a function of time T for data sources VELO alignment, HCAL calibration, RICH pressure and ECAL temperature, with interval boundaries t1 … t11. A production version tag selects, e.g.: VELO: v3 for T<t3, v2 for t3<T<t5, v3 for t5<T<t9, v1 for T>t9; HCAL: v1 for T<t2, v2 for t2<T<t8, v1 for T>t8; RICH: v1 everywhere; ECAL: v1 everywhere.]
• Tools and framework to deal with the conditions DB and a non-perfect detector geometry are in place
• The LCG COOL project provides the underlying infrastructure for the conditions DB
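The production-version logic above is a pure interval-of-validity lookup. Here is a minimal sketch of that mechanism (an illustration of the idea, not the COOL API), reproducing the VELO alignment tag from the figure, with t3, t5, t9 standing in for the real boundary times:

```python
import bisect

class Condition:
    """Versioned condition: each version is valid from its 'since' time onwards."""
    def __init__(self):
        self._boundaries = []          # sorted interval start times
        self._versions = []            # version valid from that time on
    def add(self, since, version):
        i = bisect.bisect(self._boundaries, since)
        self._boundaries.insert(i, since)
        self._versions.insert(i, version)
    def lookup(self, t):
        # last interval whose start time is <= t
        i = bisect.bisect_right(self._boundaries, t) - 1
        return self._versions[i]

t3, t5, t9 = 3.0, 5.0, 9.0
velo = Condition()
velo.add(float("-inf"), "v3")          # v3 for T < t3
velo.add(t3, "v2")                     # v2 for t3 < T < t5
velo.add(t5, "v3")                     # v3 for t5 < T < t9
velo.add(t9, "v1")                     # v1 for T > t9
assert velo.lookup(4.0) == "v2"
assert velo.lookup(10.0) == "v1"
```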
DIRAC - A community Grid solution
• The DIRAC Workload & Data Management System (WMS & DMS) is made up of Central Services and Distributed Agents
• The main aims of DIRAC are:
  – to integrate all of the heterogeneous compute resources available to LHCb
  – to minimize human intervention at sites
  – to use worldwide LHC Computing Grid services wherever possible
• DIRAC realizes these goals via:
  – the Pilot Agent paradigm
  – the Overlay Network paradigm
DIRAC Overlay Network Paradigm
• DIRAC Agents are deployed close to the resources
• They form an overlay network of Agents, masking the underlying diversity of the available compute resources
• Services interact with the Agents
[Figure, built up over four slides: heterogeneous computing resources (Grid sites, site clusters, individual PCs), each hosting an Agent (A); central Services 1-3 communicate only with the Agents.]
DIRAC Workload Management System
• Heterogeneous groupings of resources such as clusters / Grids become homogeneous via DIRAC
• DIRAC can therefore be viewed as a (very) large batch system, with:
  – accounting
  – a priority mechanism
  – fair shares
DIRAC Architecture
• The DIRAC core components are:
  – Clients
  – Services
  – Agents
  – Resources
• The DIRAC WMS is not LHCb specific:
  – GSI authentication
  – standard JDL
Running a Production
• Define a production workflow
• Register the data to be processed by the workflow
[Figure: the Production Manager creates production definitions with the Production Workflow Editor, and the Data Manager registers the data to process; both feed the Processing Database, from which the Production Agent submits jobs to the DIRAC WMS; the Data Distribution System and the DIRAC File Catalog complete the picture.]
DIRAC Pilot Agent Paradigm
• DIRAC is a PULL scheduling system
  – Agents first occupy a resource and then request jobs from a central task queue
  – this 'late binding' allows the execution environment to be checked in advance
• Pilot Agents are sent to the gLite Resource Broker as normal jobs
  – facilitates the PULL approach on a PUSH system
• LCG jobs are Pilot jobs in the context of the DIRAC WMS
  – the actual workload management is performed by DIRAC
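As a concrete illustration of the pull model, here is a schematic of the loop a pilot runs once it occupies a worker node (hypothetical interfaces such as request_job, not DIRAC's actual code): the environment is checked first, and work is only bound to the resource at the last moment.

```python
import subprocess
import time

def environment_ok():
    # the 'late binding' advantage: verify the execution environment
    # (e.g. required software is present) before any job is matched here
    return subprocess.call(["which", "root"]) == 0

def pilot(task_queue, cpu_seconds_left):
    if not environment_ok():
        return                              # broken node: no job ever gets pulled here
    while cpu_seconds_left > 0:
        job = task_queue.request_job()      # PULL: the agent asks the central queue
        if job is None:
            break                           # nothing eligible: release the slot
        started = time.time()
        job.run()
        cpu_seconds_left -= time.time() - started
```

The while loop is also what the later 'Filling Mode' slide refers to: one pilot can run several short jobs in a single CPU slot.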
Job Matching in DIRAC
• DIRAC matches Agent requirements to job requirements in a 'double' matching mechanism
  – job requirements are compared to the requirements of the Agent
  – ClassAd based mechanism
• This allows Agents to be either:
  – fully generic: no requirements on jobs
  – specialized: request particular jobs, or request jobs from one user
Job Prioritization and Policy
• Job prioritization in DIRAC can be applied by WMS Agents working on the central Task Queue
• The Matcher service assigns jobs to the requirements presented by the Agents
  – the highest priority job is dispatched first
• Standard batch system components can be plugged in for this, e.g. the Maui scheduler
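A rough sketch of the 'double' matching and priority dispatch described on the last two slides (plain dictionaries standing in for ClassAds; not DIRAC's implementation): the agent must satisfy the job's requirements and the job must satisfy the agent's, and among the eligible jobs the highest priority wins.

```python
def satisfies(requirements, attributes):
    """Every required key must be present with the required value."""
    return all(attributes.get(k) == v for k, v in requirements.items())

def match(agent, task_queue):
    eligible = [
        job for job in task_queue
        if satisfies(job["requirements"], agent["attributes"])    # job -> agent
        and satisfies(agent["requirements"], job["attributes"])   # agent -> job
    ]
    # priority dispatch: highest priority eligible job first
    return max(eligible, key=lambda j: j["priority"], default=None)

# a fully generic agent places no requirements on jobs:
agent = {"attributes": {"site": "CERN", "platform": "slc4"},
         "requirements": {}}
queue = [
    {"attributes": {"owner": "alice"}, "requirements": {"site": "CERN"}, "priority": 5},
    {"attributes": {"owner": "bob"},   "requirements": {"site": "RAL"},  "priority": 9},
]
print(match(agent, queue))   # the priority-5 job: the other one needs RAL
```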
Generic Pilot Agents
• Optimizing workloads at the level of the user is effective
  – Agents can request multiple eligible jobs if CPU is available
• User jobs currently run under the user's credentials
• Optimization at the level of the VO offers significant performance gains
  – centrally managed productions for LHCb are run under a single credential
• Generic Pilot Agents would be submitted under one credential for the VO
  – after reserving a resource, the highest priority task for the community can be delivered
  – need to allow a "super" DN in a VO to provide services to other users in the same VO
gLexec & Generic Agents
• Generic pilot agents provide an elegant solution to job prioritization
  – Agents are sent on behalf of the VO
  – eligible to run the tasks of any VO member
  – job priority applied in the central Task Queue
• Agents can work in an optimized 'Filling Mode'
  – multiple jobs can run in the same CPU slot
  – significant performance gains for short, high priority tasks
  – also reduces the load on LCG, since fewer pilots are submitted
• Simplified Grid site requirements
  – can request long (e.g. 24 hr) queues everywhere
  – masks local batch queue waiting times
Registering data on the Grid
• Use DIRAC on the Gateway between the online farm & the CERN site
  – RAW replicated to Castor pools
  – registered in the Online Integrity DB
• Files remain online till 'safe'
• Checksums calculated:
  – online at write time
  – by CASTOR at migration
• The Online Integrity Agent interrogates Castor for the checksum
  – if 'safe': a removal request is placed with DIRAC at the Gateway
  – the request is passed to the Online system
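The integrity check lends itself to a short sketch. Assuming an Adler-32 checksum (an assumption for illustration; the slide does not name the algorithm), the online side might do something like:

```python
import zlib

def adler32_of(path, chunk_size=1 << 20):
    # stream the file so arbitrarily large RAW files fit in memory
    checksum = 1                          # Adler-32 seed value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF

def safe_to_remove(path, checksum_reported_by_castor):
    # only place the removal request once the write-time checksum and the
    # checksum computed by the tape system at migration agree
    return adler32_of(path) == checksum_reported_by_castor
```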
Bulk Data Replication
• Transfer requests are centrally managed
  – maintained in the TransferDB
  – failover at the Tier-1 VO boxes
• Bulk transfers are created from aggregated requests
• The Transfer Agent polls for requests
• Bulk transfers are submitted to FTS
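Schematically, the Transfer Agent's polling loop might look as follows (hypothetical TransferDB and FTS interfaces, purely to illustrate the aggregation step): pending per-file requests are grouped by source/destination channel, and each group goes to FTS as one bulk job.

```python
from collections import defaultdict
import time

def transfer_agent(transfer_db, fts, poll_interval=60):
    while True:
        requests = transfer_db.fetch_pending()        # [(lfn, source_se, target_se)]
        batches = defaultdict(list)
        for lfn, source, target in requests:
            batches[(source, target)].append(lfn)     # aggregate per channel
        for (source, target), lfns in batches.items():
            job_id = fts.submit_bulk(lfns, source, target)
            transfer_db.mark_submitted(lfns, job_id)  # recorded centrally for failover
        time.sleep(poll_interval)
```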
Data Driven RAW Replication
• Dataflow for RAW files:
  – master copy kept at CERN
  – replicated across the Tier-1 sites based on the resource pledges
  – 40 MB/s aggregated out of CERN
• Data driven replication uses the AdtDB (Auto data transfer DB) as the hook
  – a file is registered once 'safely' migrated
• The Replication Agent:
  – splits the files according to need
  – places transfer requests into the TransferDB
  – the physical replication is then scheduled
Data Driven RAW Replication
• The AutoDataTransferDB (AdtDB) contains a pseudo file catalogue
  – based on 'transformations' contained in the DB
    • transformations are defined for each DM operation
    • each defines source and target SEs
    • and a file mask (based on the LFN namespace), which can select files of given properties and locations
• The Replication Agent manipulates the AdtDB:
  – checks the active files in the AdtDB
  – applies the mask based on file type
  – checks the location of each file
  – files which pass the mask and match the SourceSE are selected for the transformation
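A sketch of that selection logic (hypothetical structures; the transformation and SE names below are invented for illustration): a transformation carries source/target SEs and an LFN mask, and active files matching both the mask and the SourceSE are picked up.

```python
from fnmatch import fnmatch

def select_for_transformation(active_files, transformation):
    """active_files: [(lfn, current_se)]; returns the LFNs to replicate."""
    return [
        lfn for lfn, se in active_files
        if fnmatch(lfn, transformation["lfn_mask"])    # file-type/namespace mask
        and se == transformation["source_se"]          # must sit at the SourceSE
    ]

raw_to_tier1 = {"lfn_mask": "/lhcb/data/*/RAW/*",      # invented example mask
                "source_se": "CERN-RAW",
                "target_se": "RAL-RAW"}
files = [("/lhcb/data/2008/RAW/run1234/file1.raw", "CERN-RAW"),
         ("/lhcb/data/2008/DST/file9.dst", "CERN-DST")]
print(select_for_transformation(files, raw_to_tier1)) # only the RAW file passes
```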
Data Driven Reconstruction/Stripping
• Data driven replication is performed using the AdtDB and the Replication Agent
• Similar components exist for job creation and submission:
  – the ProcessingDB
  – the Transformation Agent
• Files registered in the ProcessingDB may be selected for processing
  – transformations define specific processing activities (reconstruction / stripping / re-processing), based on file properties
• The DMS registers files in the ProcessingDB to initiate RAW processing
Optimized DST Replication
• Processing activities produce user analysis files (DST)
  – these must be present at all Tier-1 sites, so DSTs are replicated to all Tier-1s
• Production of DSTs is weighted by the pledged resources
  – network traffic in/out varies: ~11 MB/s in and out on average
  – shared 10 Gb network, provisioned through FTS
• Replication is initiated by the processing job:
  – the file is uploaded to the associated Tier-1 SE
  – a replication request is put into the TransferDB
DIRAC Data Management - Stager
• DIRAC pre-stages files before the submission of Pilot Agents to the Grid
  – avoids wasting resources
• Could be extended to cache management via 'pinning' of files
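Schematically (hypothetical storage and WMS interfaces, purely to illustrate the ordering): the storage system is asked to recall the inputs from tape first, and pilots are only submitted once everything is on disk, so CPU slots never sit waiting on tape.

```python
import time

def stage_then_submit(storage, wms, job, poll=300):
    # ask the storage system to recall each input file from tape to disk
    requests = [storage.bring_online(lfn) for lfn in job.input_files]
    while not all(storage.is_staged(r) for r in requests):
        time.sleep(poll)                  # poll until every file is on disk
    wms.submit_pilots(job)                # pilots now never block on tape
    # possible extension: storage.pin(lfn, lifetime) for cache management
```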
GANGA - user interface to the Grid
• Goal: simplify the management of analysis for end-user physicists by developing a tool for accessing Grid services with built-in knowledge of how Gaudi works
• Required user functionality:
  – job preparation and configuration
  – job submission, monitoring and control
  – bookkeeping browsing, etc.
• Developed in collaboration with ATLAS
• Uses Grid middleware services
  – interfaces to the Grid via DIRAC, creating synergy between the two projects
Ganga jobs
• A job in Ganga is constructed from a set of building blocks, not all required for every job:
  – Application: what to run
  – Backend: where to run
  – Input Dataset: data read by the application
  – Output Dataset: data written by the application
  – Splitter: rule for dividing the job into subjobs
  – Merger: rule for combining the outputs
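Schematically, inside a Ganga session such a job is assembled block by block (component names here follow later public Ganga releases and may differ in detail from the 2007 version):

```python
# only Application and Backend are strictly needed; the rest are optional
j = Job()
j.application = Executable(exe="/bin/echo", args=["hello"])   # what to run
j.backend = Dirac()                                           # where to run: the Grid via DIRAC
j.splitter = ArgSplitter(args=[["a"], ["b"], ["c"]])          # one subjob per argument set
j.merger = TextMerger(files=["stdout"])                       # combine the subjob outputs
j.submit()
```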
Ganga: Architecture
• Ganga has built-in support for ATLAS and LHCb
• The component architecture allows customisation for other user groups
LHCb Simulation Production
• A typical MC production job lasts 24 hrs
• Recently achieved 10k concurrent production jobs
  – throughput only limited by the available capacity of LCG
• ~80 distinct sites accessed, via the Grid or directly
• Sustained resource usage over extended periods of time
  – the system is stable for simulation
Breakdown of Production
[Figure: breakdown of the production across sites]
LHCb Reconstruction Results
• April 2007: reconstruction jobs successfully running at all LHCb Tier-1 sites
  – CERN (Switzerland), IN2P3 (France), GridKa (Germany), CNAF (Italy), NIKHEF (Netherlands), PIC (Spain), RAL (U.K.)
• The reconstruction challenge is ongoing
  – current issues include site service instability and tape failures
LHCb Analysis
• 596 unique GANGA users, 99 of them from LHCb
• ~41k GANGA sessions since the start of the year, ~10k of them LHCb sessions
[Figure: number of users and number of sessions per month, for all GANGA users and for LHCb]
LHCb Analysis
• 393k jobs have passed through the DIRAC analysis system since the start of the year
• Users are happy with the efficiency
• Access to a large amount of resources
LHCb Plans for 2008 - CCRC'08
• RAW data distribution from the pit to the T0 centre
  – use of (native) rfcp into CASTOR from the pit; Tape1Disk0
• RAW data distribution from T0 to the T1 centres
  – use of FTS; Tape1Disk0
• Reconstruction of RAW data at CERN & the T1 centres
  – production of rDST data; T1D0
  – use of SRM 2.2
• Stripping of data at CERN & the T1 centres
  – input data: RAW & rDST (Tape1Disk0)
  – output data: DST (Tape1Disk1)
  – use of SRM 2.2
• Distribution of DST data to all other centres
  – use of FTS; Tape0Disk1 (except CERN: Tape1Disk1)
All tasks envisaged during data taking in 2008
February's CCRC08 Activities
• Runs for 2 weeks
• 42 TB of data from the pit to the CERN T0
  – corresponding to 23k files
• The same 23k RAW files are distributed from CERN over the T1 centres
• 14% of the rDST production at CERN, the remaining 86% at the T1 centres
  – LHCb's responsibility to ensure unique files are reconstructed across CERN & the T1 centres
  – an additional 23k (rDST) files produced (integrated across all sites)
  – corresponding to an additional 21 TB of data
February's Activities
• Stripping on the rDST files
  – 8k DST files produced during the process (and stored on T1D1), corresponding to 8 TB of data
• All DST files are distributed to the other sites
  – 7 × 8k files, 7 × 8 TB
• Total number of jobs accessing the data:
  – reconstruction: 23k
  – stripping: ~8k
Nos of jobs/site

               Total jobs              Simultaneous jobs
            Recons  Strip  Total     Recons  Strip  Total
  CERN       3300   1100   4400        236     79    315
  FZK        1700    600   2300        122     43    165
  IN2P3      2700    900   3600        193     65    258
  CNAF       1800    600   2400        129     43    172
  NIKHEF     5700   2000   7700        408    143    551
  PIC         900    300   1200         65     22     87
  RAL        6900   2400   9300        493    172    665
  Total     23000   8000  31000       1643    572   2215
Amount of data/site (TB)

            T0D1   T1D1   T1D0    Disk   Tape
  CERN       0      8     45       8.0   53.0
  FZK        7.4    0.6    5.2     8.0    5.8
  IN2P3      7.1    0.9    8.1     8.0    9.1
  CNAF       7.4    0.6    5.6     8.0    6.2
  NIKHEF     6.0    2.0   17.2     8.0   19.2
  PIC        7.7    0.3    2.8     8.0    3.2
  RAL        5.6    2.4   21.0     8.0   23.4
  Total     41.1   14.9  105.0    56.0  119.9
Summary
• Computing model being finalised
  – currently under stress test, particularly access to data
• Software framework robust & mature
  – final version of the reconstruction s/w available
  – 3 different persistency solutions supported without major upheaval
• LHCb DIRAC system
  – allows efficient use of Grid resources
  – Monte Carlo production now routine
  – reconstruction under stress test
• User analysis on the Grid
  – seeing an increase in use
  – GANGA is the interface between the s/w framework & DIRAC
Confident LHCb computing will be ready for data taking