Transcript of Project Status Report
Project Status Report
Ian Bird
WLCG Overview Board
CERN, 29th October 2013
15 Nov 2013
Outline
- WLCG Collaboration & MoU update
- WLCG status report
- Summary from RRB
WLCG MoU Topics
Additional signatures since the last RRB:
- Greece, Kavala Inst.: signed in June; Tier 2 for ATLAS and CMS
- Thailand, National e-Science Infrastructure Inst.: Tier 2 for ALICE and CMS
- Latin America, Federation (CLAF): Tier 2 for all 4 experiments
  o Initially CBPF (Brazil) for LHCb and CMS
New Tier 1s: update at this meeting
Data 2008-2013
[Figures: CERN Tape Archive; CERN Tape Writes; CERN Tape Verification; Tape Usage Breakdown (15 PB, 23 PB, 27 PB)]
- Data loss: ~65 GB over 69 tapes; duration: ~2.5 years
Data transfers
- CERN export rates driven (mostly) by LHC data export
- Global transfers continuing at a high (but somewhat reduced) rate
[Figures: CERN to Tier 1s; Global transfers]
Use of HLT farms during LS1
- LHCb, ATLAS, and CMS have all commissioned their HLT farms for use during LS1 and beyond: simulation, re-processing, analysis
- ATLAS and CMS use OpenStack (open-source cloud software) to manage their farms
  o Allows simplified configuration and allocation of resources
  o Identical to what is being done in the CERN CC to provide IaaS
- LHCb uses DIRAC to manage its HLT farm
HLT Farm use
- ATLAS: up to 15k jobs (≈ a large Tier 2)
- CMS: using 6k cores; the HLT upgrade will need a ~100 Gb/s link
- LHCb: the HLT farm is its largest simulation site
ALICE
- Processed all of the Run 1 data (~7.5 PB)
- Moved a lot of the analysis load into analysis trains; now sees periods where analysis is >50% of jobs
  o Making bigger trains improves efficiency
- CPU efficiency problems resolved: now ~80% even for analysis
- Introducing a data popularity service: helps to optimize the number of replicas and allows disk space to be reclaimed
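The replica-trimming idea behind such a popularity service can be sketched as follows. This is an illustrative assumption, not ALICE code: the thresholds, the 90-day window, and all names are hypothetical, but the mechanism (fewer replicas for rarely accessed datasets, reclaiming the freed disk) is the one described above.

```python
# Hypothetical popularity-based replica policy: datasets that are
# rarely accessed keep fewer replicas, freeing disk space.
# All thresholds and dataset names below are illustrative.

def target_replicas(accesses_last_90d: int) -> int:
    """Map a dataset's recent access count to a desired replica count."""
    if accesses_last_90d == 0:
        return 1          # cold data: keep a single custodial copy
    if accesses_last_90d < 10:
        return 2          # warm data
    return 3              # hot data: extra copies help analysis throughput

def reclaimable(datasets: dict[str, tuple[int, int, float]]) -> float:
    """Disk (TB) freed by trimming each dataset to its target replica count.

    datasets: name -> (current_replicas, accesses_last_90d, size_tb)
    """
    freed = 0.0
    for name, (replicas, accesses, size_tb) in datasets.items():
        excess = max(0, replicas - target_replicas(accesses))
        freed += excess * size_tb
    return freed

sample = {
    "LHC13b_pass2": (3, 0, 120.0),    # cold: trim from 3 copies to 1
    "LHC12h_pass1": (3, 4, 80.0),     # warm: trim from 3 copies to 2
    "LHC13d_pass1": (2, 250, 200.0),  # hot: already at or below target
}
print(reclaimable(sample))  # 2*120 + 1*80 = 320.0 TB reclaimable
```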
ATLAS
- Good use made of all resources available (> pledges)
- Effort ongoing in software to optimize resource utilization for Run 2: reducing CPU use, event sizes, memory use, etc.
- More aggressive replication and data placement policies for Run 2 starting now:
  o Reduce AOD replicas to 1 at Tier 1 and Tier 2
  o Popularity-based dynamic data placement and automated clean-up of secondary copies
  o Expect more use of tape retrieval as a consequence
CMS
- Some reduction in resource usage in recent months, following completion of the 2012 re-reconstruction and analysis
- Starting 2011 reprocessing, using the HLT farm
- Starting some preparations for 2015, e.g. kinematic steps for simulations
- Goal is to commission new activities during LS1:
  o Disk/tape separation ongoing
  o Data federation deployment in hand: all sites use fallback; testing remote access
  o Next production software releases will support multi-core
  o Dynamic data placement also in development
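The "fallback" mentioned above can be sketched as follows. This is a hedged illustration of the general idea, not CMS software: when a job cannot find its input file in local storage, it falls back to reading the file remotely through the data federation. The redirector hostname and function names are assumptions for the example.

```python
# Illustrative sketch of data-federation fallback: prefer a local
# read, fall back to remote access via a federation redirector.
# The redirector URL below is a placeholder, not a real endpoint.

def open_with_fallback(lfn: str, local_files: set[str],
                       redirector: str = "root://redirector.example/") -> str:
    """Return the URL a job should use to read logical file name `lfn`."""
    if lfn in local_files:
        return f"file://{lfn}"              # local read: fastest path
    return redirector + lfn.lstrip("/")     # remote fallback read

local = {"/store/data/run2012/evts.root"}
print(open_with_fallback("/store/data/run2012/evts.root", local))
# file:///store/data/run2012/evts.root
print(open_with_fallback("/store/mc/2015/sim.root", local))
# root://redirector.example/store/mc/2015/sim.root
```

The design point is that jobs no longer fail outright on a missing local replica; they trade some I/O latency for robustness, which is why the slide notes that sites are testing remote access.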
LHCb
- Completed the 1st incremental stripping of 2011+2012 data; starting the 2nd campaign
  o Will keep the full.DST on disk in 2014 to simplify this
- 2012 simulation campaign ongoing: factor-2 reduction in event size (compression + reduction of stored information)
- Disk use increasing, but received more disk than pledged for 2013
- Starting to do analysis at Tier 2s with disk (T2-D): goal of 10 such sites; 4 commissioned in the summer
- Work in hand on the use of virtualisation techniques
Assumptions for resource needs
- LHC live time
- Pile-up for ATLAS and CMS
- Resource usage efficiency
- Yearly technology growth assumed achievable with flat budgets:
  o CPU: 20%
  o Disk: 15%
  o Tape: 15%
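The compound effect of those flat-budget growth rates can be made concrete with a short calculation. The function below is only an illustration of the arithmetic implied by the slide's assumptions (a constant budget buying 20% more CPU or 15% more disk each year); the 2013 baseline of 1.0 and the 4-year horizon are assumptions for the example.

```python
# Capacity achievable under a flat budget, compounding the yearly
# technology gains assumed in the slides (CPU +20%/yr, disk +15%/yr).

def projected_capacity(base: float, yearly_gain: float, years: int) -> float:
    """Capacity after `years` of reinvesting a constant budget,
    where each year's money buys (1 + yearly_gain) times more."""
    return base * (1 + yearly_gain) ** years

# Relative capacity in 2017 versus a (hypothetical) 2013 baseline of 1.0:
cpu_2017 = projected_capacity(1.0, 0.20, 4)   # 1.2**4  ≈ 2.07x
disk_2017 = projected_capacity(1.0, 0.15, 4)  # 1.15**4 ≈ 1.75x
print(f"CPU: {cpu_2017:.2f}x, disk: {disk_2017:.2f}x")
```

So under these assumptions a flat budget roughly doubles CPU capacity over four years but grows disk by only about 75%, which is why the requirement curves for disk are the tighter constraint.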
Evolution of requirements
Estimated evolution of requirements 2015-2017 (NB: does not reflect the outcome of the current RSG scrutiny)
- 2008-2013: actual deployed capacity
- Line: extrapolation of 2008-2012 actual resources
- Curves: expected potential growth of technology with a constant budget (CPU: 20% yearly growth; disk: 15% yearly growth)
- Higher trigger (data) rates driven by physics needs
- Based on an understanding of likely LHC parameters and foreseen technology evolution (CPU, disk, tape)
- Experiments work hard to fit within the constant-budget scenario
CRSG comments / recommendations
- Run 2 requests have become more definite since Spring; flat budgets assumed.
- ALICE and LHCb scrutinised requests have not always been met at Tier 1; the RRB is requested to help find a way to resolve this.
- The CRSG strongly supports ongoing efforts to improve software efficiency, and notes that the resulting gains are already assumed in the requests for Run 2.
- Effectiveness of disk use is only partly reflected in occupancy. The CRSG welcomes efforts (popularity service, etc.) but would like a metric that takes access frequency into account.
- Networks have been exploited to reduce disk use and to move processing between tiers. Concern that poorly networked sites will be underused, and about the cost implications of providing network capacity.
Updated scrutiny schedule
Spring of year n:
- Final scrutiny of requests for year n+1, and a look beyond
- Review of the use of resources in the previous calendar year, n-1
Autumn of year n:
- Look forward to requests for year n+2 and beyond
- If necessary, consider year n+1 requests:
  o For individual experiments if they want significant changes
  o Or for all experiments if, say, LHC running parameters change significantly
CRSG asks experiments to submit documentation on 1 February and 1 August
Summary
- During LS1, activities have slowed somewhat after the conference season
- Evolution of the computing models: document close to final (see later talk); implementations under way in preparation for Run 2
- The October RRB generally welcomed the strategy of working hard to fit within flat budgets, although there are clearly some concerns about this