The Computing Infrastructure for the LHC: The ATLAS Point of View
Transcript of the presentation "L'infrastructure de calcul pour le LHC, le point de vue d'ATLAS"
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
The Computing Infrastructure for the LHC: The ATLAS Point of View
Simone Campana
CERN IT/GS
ATLAS Event Data Model
• RAW (1.6 MB/ev), Raw Data: output of the Event Filter Farm (HLT) in byte-stream format
• ESD (1 MB/ev), Event Summary Data: output of the event reconstruction (tracks, hits, calorimeter cells and clusters, combined reconstruction objects, etc.). Used for calibration, alignment, refitting ...
• AOD (150 KB/ev), Analysis Object Data: reduced representation of the events, suitable for analysis. Reconstructed "physics objects" (electrons, muons, jets ...)
• DPD (20 KB/ev), Derived Physics Data: reduced information for ROOT-specific analysis
ATLAS Tiers Organization
"Tier Cloud Model". Unit: 1 T1 + n T2/T3.
All Tier-1s have a predefined (software) channel with CERN and with each other Tier-1. Tier-2s are associated with one Tier-1 and form a "cloud". Tier-2s have a predefined channel with their parent Tier-1 only.
[World map of ATLAS sites: the Tier-1s (NG, LYON, BNL, FZK, TRIUMF, ASGC, PIC, SARA, RAL, CNAF, plus CERN) and their associated Tier-2/Tier-3 sites (Tokyo, GRIF, Beijing, Clermont, LAPP, CPPM, Romania, NET2, NW, SW, GL, SLAC, TWT2, Melbourne, ...), with the FR cloud and the BNL cloud highlighted as examples.]
Detector Data Distribution
Original processing: T0 -> T1
• Raw data: mass storage at CERN
• Raw data: Tier-1 centers; the complete dataset is distributed among the T1s
• ESD: Tier-1 centers; 2 copies of the ESD distributed worldwide
• AOD: each Tier-1 center; 1 full set per T1
• T2: 100% of the AOD, a small fraction of ESD and RAW
[Map of the transfer paths from CERN (T0) to the Tier-1s: NG, LYON, BNL, FZK, TRIUMF, ASGC, PIC, SARA, RAL, CNAF.]
Reprocessed Data Distribution
Reprocessing: T1 -> T1
• Each T1 reconstructs its own RAW
• Produces new ESD, AOD
• Ships:
  – ESD to the associated T1
  – AOD to all other T1s
[Map of the transfer paths between the Tier-1s: NG, LYON, BNL, FZK, TRIUMF, ASGC, PIC, SARA, RAL, CNAF.]
ATLAS Tier-2 Activities
• Monte Carlo production (ESD, AOD)
  – Ships RAW, ESD, AOD to the associated T1
• Physics analysis
  – Gets (ESD) AOD from the associated T1
[Map of example Tier-2 sites (Tokyo, GRIF, Beijing, Clermont, Romania) and their associated Tier-1s.]
ATLAS and Grid Middleware
• ATLAS resources are distributed across different Grid infrastructures: EGEE, OSG, NorduGrid
• Most of the Grid services are shared across the different Grids:
  – SRM interface for Storage Elements, with different backend storage implementations
  – LCG File Catalog (LFC): at all ATLAS T1s, holds the information on file replicas in the cloud
  – File Transfer Service (FTS) at every T1: baseline transfer service to import data at any site of the cloud
  – VOMS: to administer VO membership
  – CondorG: for job dispatching
• The ATLAS computing framework guarantees Grid interoperability
The DDM in a nutshell
The Distributed Data Management system ...
• ... enforces the concept of dataset
  – Logical collection of files
  – Dataset contents and locations are stored in central catalogs
  – File information is stored in local File Catalogs (LFC) at the T1s
• ... is based on a subscription model (sketched below)
  – Datasets are subscribed to sites
  – A series of services enforce the subscription:
    • look up the data location in the LFC
    • trigger the data movement via FTS
    • validate the data transfer
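A minimal, self-contained sketch of that subscription flow, assuming hypothetical stand-ins (lfc_replicas, fts_transfer) for the LFC lookup and the FTS call; it only illustrates the look-up / move / validate sequence and is not the actual ATLAS DDM (DQ2) code.

```python
# Minimal, illustrative sketch of the DDM subscription flow described above.
# lfc_replicas() and fts_transfer() are hypothetical stand-ins for the LFC lookup
# and the FTS call; this is NOT the actual ATLAS DDM (DQ2) code.
from typing import Dict, List

def lfc_replicas(lfn: str) -> List[str]:
    """Hypothetical stand-in: return the sites currently holding a replica of this file."""
    return ["CERN"]

def fts_transfer(lfn: str, source: str, destination: str) -> bool:
    """Hypothetical stand-in: trigger an FTS transfer and report whether it validated."""
    return True

def enforce_subscription(dataset_files: List[str], destination: str) -> Dict[str, bool]:
    """Apply one dataset subscription: look up, move and validate every file."""
    status: Dict[str, bool] = {}
    for lfn in dataset_files:
        sources = lfc_replicas(lfn)          # 1. look up the data location in the LFC
        if destination in sources:
            status[lfn] = True               # replica already at the destination
            continue
        # 2. trigger the data movement via FTS; 3. validation is folded into the call here
        status[lfn] = fts_transfer(lfn, sources[0], destination)
    return status

# Example: subscribe a two-file dataset to the LYON Tier-1
print(enforce_subscription(["RAW.file.0001", "RAW.file.0002"], "LYON"))
```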
Testing Data Distribution: CCRC08
• Week 1: Data Distribution Functional Test
  – make sure all files get where we want them to go
  – between Tier-0 and the Tier-1s, for disk and tape
• Week 2: Tier-1 to Tier-1 tests
  – similar rates as between Tier-0 and Tier-1
  – more difficult to control and monitor centrally
• Week 3: Throughput test
  – try to maximize throughput while still following the model
  – Tier-0 to Tier-1 and Tier-1 to Tier-2
• Week 4: Final test, all of the above together
  – plus artificial extra load from simulation production
Week-4: Full Exercise
Transfer ramp-up
Test of backlog recovery: data were first generated over 12 hours and then subscribed in bulk.
The 12-hour backlog was recovered in 90 minutes!
[Plot: T0->T1s throughput in MB/s.]
Week-4: T0->T1s data distribution
• Suspect datasets: the dataset is complete (OK) but suffered from the double-registration problem
• Incomplete datasets: effect of the power cut at CERN on Friday morning
Week-4: T1-T1 transfer matrix
• YELLOW boxes: effect of the power cut
• DARK GREEN boxes: double-registration problem
• Compared with week-2 (3 problematic sites), a very good improvement
Week-4: T1->T2s transfers
• SIGNET: ATLAS DDM configuration issue (LFC vs RLS)
• CSTCDIE: joined very late; still a prototype
• Many T2s were oversubscribed (they should get 1/3 of the AOD)
Throughputs
• T1->T2 transfers show a time structure: datasets are subscribed upon completion at the T1 and every 4 hours
• T0->T1 transfers: problem at the load generator on the 27th; power cut on the 30th
[Plots: throughput in MB/s versus time, with the expected rate indicated.]
Week-4: Concurrent Production
[Plots: number of running jobs and number of jobs per day.]
Week-4: metrics
• We said:
  – T0->T1: sites should demonstrate that they can import 90% of the subscribed datasets (complete datasets) within 6 hours from the end of the exercise
  – T1->T2: a complete copy of the AODs at the T1 should be replicated among the T2s within 6 hours from the end of the exercise
  – T1-T1 functional challenge: sites should demonstrate that they can import 90% of the subscribed datasets (complete datasets) within 6 hours from the end of the exercise
  – T1-T1 throughput challenge: sites should demonstrate that they can sustain the rate of reprocessing at nominal rate, i.e. F * 200 Hz, where F is the MoU share of the T1 (worked example below)
• Every site (cloud) met the metrics
  – despite the power cut
  – despite the "double registration" problem
  – despite the competition from production activities
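As a rough illustration of the throughput metric (the 10% share is an assumed example, not a figure from the talk): a Tier-1 with MoU share F = 0.10 has to reprocess 0.10 * 200 Hz = 20 Hz of events; with ESD at 1 MB/ev that already corresponds to about 20 MB/s of ESD shipped to the associated T1, before adding the AOD copies (150 KB/ev each) sent to all the other T1s.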
Disk Space (month)
• ATLAS "moved" 1.4 PB of data in May 2008
• 1 PB deleted in EGEE+NDGF in << 1 day; possibly another 250 TB deleted in OSG
• The deletion agent at work: it uses SRM and LFC bulk methods (sketched below). The deletion rate is more than good (but those were big files)
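A minimal sketch of what a bulk-deletion agent of this kind might look like; srm_bulk_delete and lfc_bulk_unregister are hypothetical stand-ins for the SRM and LFC bulk methods, so this is illustrative only and not the actual ATLAS deletion agent.

```python
# Illustrative bulk-deletion sketch (NOT the real ATLAS deletion agent).
# srm_bulk_delete() and lfc_bulk_unregister() are hypothetical stand-ins for the
# SRM and LFC bulk methods mentioned above.
from typing import List

BATCH_SIZE = 1000   # one bulk call per batch is what keeps the deletion rate high

def srm_bulk_delete(replicas: List[str]) -> None:
    """Hypothetical stand-in: remove a batch of physical replicas in a single SRM call."""

def lfc_bulk_unregister(replicas: List[str]) -> None:
    """Hypothetical stand-in: drop the corresponding catalogue entries in one LFC call."""

def delete_replicas(replicas: List[str]) -> int:
    """Delete all given replicas in bulk batches, keeping storage and catalogue consistent."""
    for start in range(0, len(replicas), BATCH_SIZE):
        batch = replicas[start:start + BATCH_SIZE]
        srm_bulk_delete(batch)        # physical deletion at the storage element
        lfc_bulk_unregister(batch)    # then remove the entries from the LFC
    return len(replicas)

print(delete_replicas([f"srm://some-se/atlas/file{i}" for i in range(2500)]), "replicas deleted")
```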
Lessons learned from CCRC08
• The Data Distribution framework seems in good shape and ready for data taking
• A few things need attention:
  – FTS servers at the T1s need a global tuning of their parameters
  – Some bugs were found in the ATLAS DDM services (now fixed)
  – In at least 3 cases, a network problem or inefficiency was discovered
• Monitoring ...
A few words about the FDR
• FDR = Full Dress Rehearsal
  – Tests the full chain, from the HLT to the analysis at the T2s
  – The same set of Monte Carlo data (approx. 8 TB) in byte-stream format is injected every day into the T0 machinery
  – Data (RAW and reprocessed) are distributed and handled as real data
• FDR2 data exports (June 2008)
  – Much less challenging than CCRC08 in terms of distributed computing: 6 hours of data per day to be distributed within 24 h
  – Three days of RAW data were distributed in less than 4 hours
  – All datasets (RAW and derived) complete at every T1 and T2 (one exception for a T2)
Data Export after CCRC08 and FDR
• Data Distribution functional test
  – To test data transfers:
    • Tier-0 to all Tier-1s, tape and disk (RAW, ESD, AOD)
    • all Tier-1s to all other Tier-1s (AOD, DPD)
    • each Tier-1 to all Tier-2s in the same cloud (AOD, DPD)
    • muon calibration streams from Tier-0 to some special Tier-2s
  – Completely automated:
    • at 5% of the nominal rate, with fake data generated from the T0
    • starts every Monday at midday, stops the next Sunday at midnight
    • central deletion of the test data everywhere
    • weekly statistics reports
• Data taking
  – Mostly cosmics ...
  – RAW data exported to the T1s (for custodial storage)
  – ESD exported to 2 T1s, following the Computing Model
  – Some data kept permanently on disk at CERN
Activity after CCRC08
Most inefficiencies were due to scheduled downtimes.
Detector Data Replication
Simulation Production
• Bursty activity, mainly depending on software readiness
• Main samples: fdr2, 10 TeV, 900 GeV and validations
• Runs in Tier-2s but also in Tier-1s
  – no competition yet with analysis (T2) and reprocessing (T1)
• Average of 10k simultaneous jobs, peaks of 25k jobs
• All production is now submitted through the Panda system
Monte Carlo Production
[Architecture diagram: job definitions sit in ProdDB; Bamboo pulls them and feeds the Panda server; a condor-g / gLite scheduler places pilots on the worker nodes at each site (site A, site B); the pilots pull jobs from the server over https and run them.]
Panda in a nutshell
• Job definitions are hosted in the Production Database (ProdDB)
• The "Bamboo" agent polls jobs from ProdDB and feeds the Panda server
• The Panda server manages all job information centrally
  – priority control
  – resource allocation
  – job scheduling
• A job scheduler dispatches pilot jobs to the sites
  – using various mechanisms: local batch system commands, gLite WMS, CondorG
  – pilot jobs are pre-scheduled to the Grid sites
  – pilots pull "real jobs" from the Panda server as soon as suitable CPUs become available (sketched below)
• Output data are aggregated at the T1s using DDM
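A minimal, self-contained sketch of that pull model; the Panda server is modelled by a simple in-memory queue and the payload is a trivial command, so this illustrates the pattern only and is not the real Panda or pilot code.

```python
# Illustrative sketch of the pilot pull model; the Panda server is modelled by a
# simple in-memory queue, so this is NOT the real Panda or pilot code.
import queue
import subprocess

panda_server = queue.Queue()                 # stand-in for the central Panda server
panda_server.put(["echo", "simulating event batch 1"])
panda_server.put(["echo", "simulating event batch 2"])

def pilot(worker_name: str) -> None:
    """A pilot lands on a worker node, then pulls and runs jobs while any remain."""
    while True:
        try:
            job = panda_server.get_nowait()  # pull a "real job" only when a CPU is free
        except queue.Empty:
            print(f"{worker_name}: no more jobs, pilot exits")
            return
        print(f"{worker_name}: running {job}")
        subprocess.run(job, check=True)      # execute the payload on the worker node

pilot("worker-node-01")
```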
Simulation Production
[Plots: running jobs (monthly statistics), number of jobs per day, and errors.]
Simulation Production Functional Test
• Submits one real MC task as a test to each cloud every Monday
  – 5000 events, 25 events/job: 200 jobs of ~6 hours each
  – jobs should run in each of the Tier-2s (and the Tier-1) of the cloud
  – low priority, so as not to interfere with real production
• The task is aborted on Thursday
  – remaining jobs are killed and all output is removed
  – statistics are generated: efficiency, brokering, problem sites
Reprocessing
• Reprocessing is "just" a special case of a production system job
  – handled by Panda
  – runs at the T1s only (to first approximation)
• However ...
  – it needs to prestage the files (RAW data) from tape at the T1s (see the sketch after this list)
  – it needs to access the detector conditions data on the Oracle racks at the T1s
• Current issues:
  – pre-staging is still not quite working
    • the software exists and is being tested
    • every T1 has a different storage setup, performance, etc.
  – conditions database access is not quite working yet
    • each job opens several connections to the database at the beginning of the job
    • too many concurrent and simultaneous jobs overload the database; being investigated
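A minimal sketch of that prestage-then-process pattern, with the tape recall and the conditions database replaced by trivial stand-ins and an explicit cap on concurrent DB connections (the overload issue above); this is illustrative only, not the actual ATLAS reprocessing machinery.

```python
# Illustrative prestage-then-process pattern for reprocessing RAW data at a T1.
# The tape system and the conditions database are trivial stand-ins; this is NOT
# the actual ATLAS reprocessing machinery.
from typing import Dict, List

MAX_DB_CONNECTIONS = 5        # cap on concurrent conditions-DB connections (the overload issue)
open_db_connections = 0

def bring_online(raw_files: List[str]) -> List[str]:
    """Stand-in for bulk SRM bring-online requests: recall all RAW files from tape first."""
    return list(raw_files)    # pretend everything staged successfully

def read_conditions() -> Dict[str, str]:
    """Stand-in for the Oracle conditions-DB access, guarded by a connection budget."""
    global open_db_connections
    if open_db_connections >= MAX_DB_CONNECTIONS:
        raise RuntimeError("conditions DB overloaded")   # what happens without throttling
    open_db_connections += 1
    try:
        return {"alignment": "v2", "calibration": "v7"}
    finally:
        open_db_connections -= 1

def reprocess(raw_file: str) -> str:
    """Reconstruct one staged RAW file into a new ESD, using the conditions data."""
    conditions = read_conditions()
    return raw_file.replace("RAW", "ESD") + f" (alignment {conditions['alignment']})"

staged = bring_online([f"run90272.RAW._{i:04d}" for i in range(3)])  # 1. prestage from tape
for f in staged:                                                     # 2. then reconstruct
    print(reprocess(f))
```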
Analysis
• The ATLAS analysis model is "jobs go to data"
  – Analysis mostly runs on DPDs and AODs
  – Initially, large access to ESD and possibly RAW
• Currently, 2 frameworks for analysis: Ganga and pAthena
  – Both fully integrated with ATLAS DDM for data co-location
  – They will possibly be merged into a single tool
  – There is now a single support team
Ganga
• Client-based analysis framework
  – Central core component
  – Multiple plug-ins to benefit from various job submission systems (illustrated below):
    • gLite WMS
    • CondorG
    • Local batch systems (LSF, PBS)
• Multi-VO project
• Analysis functional tests
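The core-plus-plug-ins idea can be sketched generically as below; the class names are invented for illustration and do not reflect the actual Ganga API.

```python
# Generic illustration of the "core + submission plug-ins" design; NOT the Ganga API.
from abc import ABC, abstractmethod

class Backend(ABC):
    """A job submission plug-in (gLite WMS, CondorG, a local batch system, ...)."""
    @abstractmethod
    def submit(self, executable: str) -> str: ...

class LocalBatch(Backend):
    def submit(self, executable: str) -> str:
        return f"submitted {executable} to the local batch system"

class GliteWMS(Backend):
    def submit(self, executable: str) -> str:
        return f"submitted {executable} via the gLite WMS"

class Job:
    """The core component: identical user interface whatever backend is plugged in."""
    def __init__(self, executable: str, backend: Backend):
        self.executable = executable
        self.backend = backend

    def submit(self) -> str:
        return self.backend.submit(self.executable)

# The same analysis job can be sent to different infrastructures by swapping the plug-in.
print(Job("analysis.py", LocalBatch()).submit())
print(Job("analysis.py", GliteWMS()).submit())
```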
pAthena
• Server-based analysis framework
  – Full usage of the Panda infrastructure
  – Very advanced monitoring
  – Offers job prioritization and user shares
[Plots: monitoring per user; worldwide pAthena activity over the last month.]
User Storage Space
• ATLAS now uses the SRM v2 interface everywhere
  – It offers the possibility to partition the space (space tokens) depending on the use case
• For central activities
  – DATADISK and DATATAPE for real data
  – MCDISK, MCTAPE and PRODDISK for simulation production
• For group analysis (GROUPDISK)
  – Ideally, quota management per group
  – In reality, only a global quota and little possibility to configure group-based ACLs; needs policing
• For user analysis
  – USERDISK: scratch space for job output, lifetime cannot be guaranteed
  – LOCALGROUPDISK: not ATLAS-pledged resources, "home" space for users; same limitations as GROUPDISK
Experience from one week of beam data
Day 1: we were ready
Data arrived …
We started exporting … and we saw issues.
Effect of concurrent data access from centralized transfers and user activity (overload of a disk server).
[Plots: data export throughput in MB/s; number of errors.]
Conclusions
• Computing for an LHC experiment is extremely challenging
  – Very demanding use cases
  – The system is complex and relies on many external components
• Centralized data distribution works reliably
  – Tested in many challenges and in real life
• The Monte Carlo production framework is also reliable
  – But this is not yet true for data reprocessing
  – Database access and data prestaging need attention
• Data analysis by users is the real challenge now
  – It does not follow a particular pattern (non-organized by definition)
  – It is not always possible to protect production from users, or users from other users
  – It has never been "tested" at the real scale
• The EGEE Grid offers the necessary baseline services and infrastructure for ATLAS data taking
  – Improvements in the storage area are foreseen in the near future, based on the experiments' input and lessons learned