Transcript of Project Status Report
Project Status Report
Ian Bird
WLCG Overview Board
CERN, 29th October 2013
15 Nov 2013
Outline
- WLCG Collaboration & MoU update
- WLCG status report
- Summary from RRB
WLCG MoU Topics
Additional signatures since the last RRB:
- Greece, Kavala Inst.: signed in June; Tier 2 for ATLAS and CMS
- Thailand, National e-Science Infrastructure Inst.: Tier 2 for ALICE and CMS
- Latin America, Federation (CLAF): Tier 2 for all 4 experiments
  o Initially CBPF (Brazil) for LHCb and CMS
New Tier 1s: update at this meeting
Data 2008-2013
[Figures: CERN Tape Archive; CERN Tape Writes; CERN Tape Verification; Tape Usage Breakdown (15 PB, 23 PB, 27 PB)]
- Data loss: ~65 GB over 69 tapes; duration: ~2.5 years
Data transfers
- CERN export rates driven (mostly) by LHC data export
- Global transfers continuing at a high (but somewhat reduced) rate
[Figures: CERN to Tier 1s; Global transfers]
Use of HLT farms during LS1
- LHCb, ATLAS, and CMS have all commissioned their HLT farms for use during LS1 and beyond: simulation, re-processing, analysis
- ATLAS and CMS use OpenStack (open-source cloud software) to manage their farms
  o Allows simplified configuration and allocation of resources
  o Identical to what is being done in the CERN CC to provide IaaS
- LHCb uses DIRAC to manage its HLT farm
HLT Farm use
- ATLAS: up to 15k jobs (≈ a large Tier 2)
- CMS: using 6k cores; the HLT upgrade will need a ~100 Gb/s link
- LHCb: the HLT farm is its largest simulation site
ALICE
- Processed all of the Run 1 data (~7.5 PB)
- Moved a lot of the analysis load into analysis trains; now sees periods where analysis is >50% of jobs
  o Making bigger trains improves efficiency
- CPU efficiency problems resolved: now ~80% even for analysis
- Introducing a data popularity service: helps to optimize the number of replicas and allows disk space to be reclaimed
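The replica-trimming idea behind such a popularity service can be sketched as follows. This is an illustrative assumption, not ALICE code: the thresholds, the 90-day window, and all names are hypothetical, but the mechanism (fewer replicas for rarely accessed datasets, reclaiming the freed disk) is the one described above.

```python
# Hypothetical popularity-based replica policy: datasets that are
# rarely accessed keep fewer replicas, freeing disk space.
# All thresholds and dataset names below are illustrative.

def target_replicas(accesses_last_90d: int) -> int:
    """Map a dataset's recent access count to a desired replica count."""
    if accesses_last_90d == 0:
        return 1          # cold data: keep a single custodial copy
    if accesses_last_90d < 10:
        return 2          # warm data
    return 3              # hot data: extra copies help analysis throughput

def reclaimable(datasets: dict[str, tuple[int, int, float]]) -> float:
    """Disk (TB) freed by trimming each dataset to its target replica count.

    datasets: name -> (current_replicas, accesses_last_90d, size_tb)
    """
    freed = 0.0
    for name, (replicas, accesses, size_tb) in datasets.items():
        excess = max(0, replicas - target_replicas(accesses))
        freed += excess * size_tb
    return freed

sample = {
    "LHC13b_pass2": (3, 0, 120.0),    # cold: trim from 3 copies to 1
    "LHC12h_pass1": (3, 4, 80.0),     # warm: trim from 3 copies to 2
    "LHC13d_pass1": (2, 250, 200.0),  # hot: already at or below target
}
print(reclaimable(sample))  # 2*120 + 1*80 = 320.0 TB reclaimable
```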
ATLAS
- Good use made of all resources available (> pledges)
- Effort ongoing in software to optimize resource utilization for Run 2: reducing CPU use, event sizes, memory use, etc.
- More aggressive replication and data placement policies for Run 2 starting now:
  o Reduce AOD replicas to 1 at Tier 1 and Tier 2
  o Popularity-based dynamic data placement and automated clean-up of secondary copies
  o Expect more use of tape retrieval as a consequence
CMS
- Some reduction in resource usage in recent months, following completion of the 2012 re-reconstruction and analysis
- Starting 2011 reprocessing, using the HLT farm
- Starting some preparations for 2015, e.g. kinematic steps for simulations
- Goal is to commission new activities during LS1:
  o Disk/tape separation ongoing
  o Data federation deployment in hand: all sites use fallback; testing remote access
  o Next production software releases will support multi-core
  o Dynamic data placement also in development
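The "fallback" mentioned above can be sketched as follows. This is a hedged illustration of the general idea, not CMS software: when a job cannot find its input file in local storage, it falls back to reading the file remotely through the data federation. The redirector hostname and function names are assumptions for the example.

```python
# Illustrative sketch of data-federation fallback: prefer a local
# read, fall back to remote access via a federation redirector.
# The redirector URL below is a placeholder, not a real endpoint.

def open_with_fallback(lfn: str, local_files: set[str],
                       redirector: str = "root://redirector.example/") -> str:
    """Return the URL a job should use to read logical file name `lfn`."""
    if lfn in local_files:
        return f"file://{lfn}"              # local read: fastest path
    return redirector + lfn.lstrip("/")     # remote fallback read

local = {"/store/data/run2012/evts.root"}
print(open_with_fallback("/store/data/run2012/evts.root", local))
# file:///store/data/run2012/evts.root
print(open_with_fallback("/store/mc/2015/sim.root", local))
# root://redirector.example/store/mc/2015/sim.root
```

The design point is that jobs no longer fail outright on a missing local replica; they trade some I/O latency for robustness, which is why the slide notes that sites are testing remote access.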
LHCb
- Completed the 1st incremental stripping of 2011+2012 data; starting the 2nd campaign
  o Will keep the full.DST on disk in 2014 to simplify this
- 2012 simulation campaign ongoing: factor-2 reduction in event size (compression + reduction of stored information)
- Disk use increasing, but received more disk than pledged for 2013
- Starting to do analysis at Tier 2s with disk (T2-D): goal of 10 such sites; 4 commissioned in the summer
- Work in hand on the use of virtualisation techniques
Assumptions for resource needs
- LHC live time
- Pile-up for ATLAS and CMS
- Resource usage efficiency
- Yearly technology growth assumed achievable with flat budgets:
  o CPU: 20%
  o Disk: 15%
  o Tape: 15%
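The compound effect of those flat-budget growth rates can be made concrete with a short calculation. The function below is only an illustration of the arithmetic implied by the slide's assumptions (a constant budget buying 20% more CPU or 15% more disk each year); the 2013 baseline of 1.0 and the 4-year horizon are assumptions for the example.

```python
# Capacity achievable under a flat budget, compounding the yearly
# technology gains assumed in the slides (CPU +20%/yr, disk +15%/yr).

def projected_capacity(base: float, yearly_gain: float, years: int) -> float:
    """Capacity after `years` of reinvesting a constant budget,
    where each year's money buys (1 + yearly_gain) times more."""
    return base * (1 + yearly_gain) ** years

# Relative capacity in 2017 versus a (hypothetical) 2013 baseline of 1.0:
cpu_2017 = projected_capacity(1.0, 0.20, 4)   # 1.2**4  ≈ 2.07x
disk_2017 = projected_capacity(1.0, 0.15, 4)  # 1.15**4 ≈ 1.75x
print(f"CPU: {cpu_2017:.2f}x, disk: {disk_2017:.2f}x")
```

So under these assumptions a flat budget roughly doubles CPU capacity over four years but grows disk by only about 75%, which is why the requirement curves for disk are the tighter constraint.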
Evolution of requirements
Estimated evolution of requirements 2015-2017 (NB: does not reflect the outcome of the current RSG scrutiny)
- 2008-2013: actual deployed capacity
- Line: extrapolation of 2008-2012 actual resources
- Curves: expected potential growth of technology with a constant budget (CPU: 20% yearly growth; disk: 15% yearly growth)
- Higher trigger (data) rates driven by physics needs
- Based on an understanding of likely LHC parameters and foreseen technology evolution (CPU, disk, tape)
- Experiments work hard to fit within the constant-budget scenario
CRSG comments / recommendations
- Run 2 requests have become more definite since Spring; flat budgets assumed.
- ALICE and LHCb scrutinised requests have not always been met at Tier 1; the RRB is requested to help find a way to resolve this.
- The CRSG strongly supports ongoing efforts to improve software efficiency, and notes that the resulting gains are already assumed in the requests for Run 2.
- Effectiveness of disk use is only partly reflected in occupancy. The CRSG welcomes efforts (popularity service, etc.) but would like a metric that takes access frequency into account.
- Networks have been exploited to reduce disk use and to move processing between tiers. Concern that poorly networked sites will be underused, and about the cost implications of providing network capacity.
Updated scrutiny schedule
Spring of year n:
- Final scrutiny of requests for year n+1, and a look beyond
- Review of the use of resources in the previous calendar year, n-1
Autumn of year n:
- Look forward to requests for year n+2 and beyond
- If necessary, consider year n+1 requests:
  o For individual experiments if they want significant changes
  o Or for all experiments if, say, LHC running parameters change significantly
CRSG asks experiments to submit documentation on 1 February and 1 August
Summary
- During LS1, activities have slowed somewhat after the conference season
- Evolution of the computing models: document close to final (see later talk); implementations under way in preparation for Run 2
- The October RRB generally welcomed the strategy of working hard to fit within flat budgets, although there are clearly some concerns about this