ATLAS Computing Status and Plans

LHCC Referees Meeting, Richard P Mount, March 5, 2014



ATLAS Computing Status and Plans

Richard P Mount

SLAC National Accelerator Laboratory


Topics

• Organization from March 1

• Status – the last 12 months

• Plans – Preparation for Run 2

• Vision of the Future


Software and Computing Organization – From March 1

Software and Computing Coordination: Richard Mount, Eric Lancon (deputy)

Distributed Computing (ADC): Simone Campana and Torre Wenaus

Software: Rolf Seuster and Markus Elsing

Major software effort underway preparing for Run 2:

• Implementation Task Forces

• Reorganization and rationalization of existing activities


CPU Usage March 2013 to January 2014

Tier 1s:

• Consistent above-pledge performance

• Saturation most of the time

Tier 2s:

• Consistent delivery of above-pledge and opportunistic resources

• Saturation most of the time

Plot legend: MC Simulation, User Analysis, MC Reco, Group Production, Group Analysis


High Level Trigger Farm Exploitation

CERN T0 and CAF usage for grid jobs

ATLAS HLT usage for grid jobs (bursts of over 15k jobs)

• The HLT has about 10% of the total ATLAS CPU capacity

• Its time-averaged availability for simulation is expected to be no more than 30%
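Taken together, the two figures above bound how much the HLT farm can add to simulation capacity. The short calculation below is only a sketch: it treats "about 10%" and "no more than 30%" as exact inputs, which is an assumption, and the function and variable names are illustrative.

```python
# Figures quoted on the slide; treating them as exact values is an assumption.
hlt_share_of_total_cpu = 0.10    # "about 10%" of total ATLAS CPU capacity
max_availability_for_sim = 0.30  # "no more than 30%" time-averaged availability


def effective_share(share, availability):
    """Time-averaged fraction of total CPU capacity that is actually usable."""
    return share * availability


upper_bound = effective_share(hlt_share_of_total_cpu, max_availability_for_sim)
print(f"HLT adds at most about {upper_bound:.0%} of total ATLAS CPU to simulation")
```

Under these assumptions the HLT contributes at most roughly 3% of total ATLAS CPU capacity to simulation when averaged over time, even though individual bursts are much larger.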


Pending Jobs and Volume of Data Processed

Plot legend: MC Simulation, User Analysis, MC Reco, Group Production, Group Analysis

Total: > 1 Exabyte

Must limit simulation to keep analysis turnaround acceptable; there are always many pending requests, prioritized via physics coordination

Analysis is the main driver of storage+network I/O capacity


Disk Space

Charts: Tier 1 and Tier 2 disk usage by activity (MC Production, Real Data, Group Analysis, User Analysis) and by category (Primary (pinned), Default (pinned), Secondary (dynamically managed), Input)

T1 and T2 disks are full, requiring regular deletion of less-recently-accessed data

T1 dynamically managed space is currently too small (need to pin less data)
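The deletion policy described above can be sketched as follows. This is a minimal illustration, not the ATLAS data management implementation: the `Replica` class, the `select_for_deletion` function, and the dataset names and sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size_tb: float
    last_access_day: int  # larger value means more recently accessed
    pinned: bool          # pinned replicas are never deletion candidates


def select_for_deletion(replicas, space_needed_tb):
    """Pick unpinned replicas, least recently accessed first, until enough space is freed."""
    candidates = sorted(
        (r for r in replicas if not r.pinned),
        key=lambda r: r.last_access_day,
    )
    chosen, freed = [], 0.0
    for replica in candidates:
        if freed >= space_needed_tb:
            break
        chosen.append(replica)
        freed += replica.size_tb
    return chosen, freed


# Hypothetical site contents, for illustration only.
replicas = [
    Replica("data.A", 40.0, last_access_day=10, pinned=True),
    Replica("mc.B", 30.0, last_access_day=3, pinned=False),
    Replica("group.C", 20.0, last_access_day=7, pinned=False),
    Replica("user.D", 10.0, last_access_day=1, pinned=False),
]
chosen, freed = select_for_deletion(replicas, space_needed_tb=35.0)
print([r.name for r in chosen], freed)
```

Only unpinned replicas can be chosen, so the more data is pinned, the smaller the pool the cleanup can draw on. That is the reason pinning less data enlarges the dynamically managed space.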


Tier 1 Tape

On track to saturate the 41 PB pledge

Simulated Hits to be kept for ~1 year in future

ESD no longer written in most cases

Expect major growth of Group Data on tape.

Plot legend: Raw Data, Simulated Hits, AOD, ESD, NTUP


Preparation for Run 2

Guiding Principle:

Maximize physics capability while requiring resources that grow more slowly than LHC luminosity

Key disk and CPU efficiency improvements:

• Improve reconstruction efficiency (target factor 2 to 3 in speed)

• Improve full simulation efficiency

• Implement the Integrated Simulation Framework supporting an optimal mix of full and fast simulation

• Rationalize analysis workflow (less CPU/luminosity and less Disk/luminosity¹ for the same physics)

¹ Smaller data formats, fewer versions of the largest datasets


Major improvements to analysis environment

Task forces

• TF1, TF4: Define and implement new Event Data Model (xAOD); migrate to new vector/matrix library Eigen

• TF2: Define and implement Reduction framework and train model

• TF3: New Analysis framework and tools; generic tools for “Combined Performance” recommendations.
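The slide does not spell out the "train model" in TF2. One common reading, assumed here, is that several reduction selections ("carriages") share a single pass over the input so that each event is read only once. The sketch below illustrates that idea; `run_train`, the event fields, and the selections are hypothetical and are not the ATLAS Reduction framework API.

```python
def run_train(events, carriages):
    """Read each event once and offer it to every carriage (selection)."""
    outputs = {name: [] for name in carriages}
    reads = 0
    for event in events:  # single pass over the input
        reads += 1
        for name, keep in carriages.items():
            if keep(event):
                outputs[name].append(event["id"])
    return outputs, reads


# Hypothetical toy events and selections, for illustration only.
events = [{"id": i, "n_muons": i % 3, "met": 10.0 * i} for i in range(6)]
carriages = {
    "dimuon": lambda e: e["n_muons"] >= 2,
    "high_met": lambda e: e["met"] > 30.0,
}
outputs, reads = run_train(events, carriages)
print(outputs, reads)
```

Running the two selections as separate jobs would read the input twice. In the train they share six reads, which is the I/O saving this reading of the model aims for.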


Preparation for Run 2 (continued)

Other key improvements:

• Rucio – new scalable distributed data management system

• ProdSys-2 (DEFT and JEDI)

  • More formalized and automated production management

  • Jobs automatically defined to meet the needs of computing resources

Ongoing developments

• Federated ATLAS Xrootd (and http) access

  • Potential optimization of disk use – test in DC14

• Radical reduction of pinned disk data

  • Test during 2014

  • Stay clear of thrashing the tape system

• “Event Server” technology to facilitate exploitation of opportunistic resources with unpredictable availability.
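The slide gives no design details for the "Event Server". The sketch below assumes the core idea is dispatching work at the granularity of single events, so that a resource that disappears without warning loses very little work. The `run_worker` function and the queue layout are hypothetical.

```python
from collections import deque


def run_worker(queue, events_before_preemption):
    """Process events one at a time until the queue is empty or the resource is lost."""
    done = []
    while queue and len(done) < events_before_preemption:
        done.append(queue.popleft())  # one event is one small unit of work
    return done


queue = deque(range(10))  # ten event numbers awaiting processing
first = run_worker(queue, events_before_preemption=4)     # resource vanishes after 4 events
second = run_worker(queue, events_before_preemption=100)  # another resource picks up the rest
print(first, second)
```

Because unprocessed events stay in the shared queue, the second worker continues where the first stopped and nothing is repeated or lost.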


ATLAS Computing Longer Term Vision

Computing:

• Major shifts in relative costs of CPU/Disk/Tape/Networks will continue

• Need to be flexible in “store versus recompute” and “store locally versus get quickly or access directly from somewhere else”
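The "store versus recompute" trade-off can be made concrete with a simple cost comparison. The sketch below is illustrative only: the function, its parameters, and every price and volume in the example are hypothetical and do not come from the talk.

```python
def cheaper_option(size_tb, disk_cost_per_tb_year, years_kept,
                   cpu_hours_to_recompute, cpu_cost_per_hour, expected_reuses):
    """Compare the cost of keeping a dataset on disk with regenerating it on demand."""
    store = size_tb * disk_cost_per_tb_year * years_kept
    recompute = cpu_hours_to_recompute * cpu_cost_per_hour * expected_reuses
    choice = "store" if store <= recompute else "recompute"
    return choice, store, recompute


# Hypothetical numbers in arbitrary cost units, for illustration only.
rarely_used, _, _ = cheaper_option(100, 50.0, 2, 200_000, 0.02, expected_reuses=1)
often_used, _, _ = cheaper_option(100, 50.0, 2, 200_000, 0.02, expected_reuses=3)
print(rarely_used, often_used)
```

With these made-up inputs the rarely used dataset is cheaper to recompute and the frequently used one is cheaper to store. A shift in the relative price of CPU and disk moves the crossover point, which is why the slide asks for flexibility rather than a fixed policy.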

Software:

• Multi-threading (100s of threads) seems inevitable

• Quality, intelligibility and supportability of software will be vital

• Software for the Upgrade(s)

And Finally:

• Beware of optimizing away our ability to discover the unexpected