
Transcript of Cern intro 2010-10-27-snw

Page 1: Cern intro 2010-10-27-snw

Grid Computing at the Large Hadron Collider:

Massive Computing at the Limit of Scale, Space, Power and Budget

Dr Helge Meinhard, CERN IT Department

SNW Frankfurt, 27 October 2010

Page 2: Cern intro 2010-10-27-snw

CERN (1)

§ Conseil européen pour la recherche nucléaire, also known as the European Laboratory for Particle Physics
§ Facilities for fundamental research
§ Between Geneva and the Jura mountains, straddling the Swiss-French border
§ Founded in 1954

Page 3: Cern intro 2010-10-27-snw

CERN (2)

§ 20 member states
§ ~3’300 staff members, fellows, students, apprentices
§ 10’000 registered users (~7’000 on site) from more than 550 institutes in more than 80 countries
§ 1’026 MCHF (~790 MEUR) annual budget
§ http://cern.ch/

Page 4: Cern intro 2010-10-27-snw

Physics at the LHC (1)

Matter particles: fundamental building blocks

Force particles: bind matter particles

Page 5: Cern intro 2010-10-27-snw

Physics at the LHC (2)

§ Four known forces: strong force, weak force, electromagnetism, gravitation
§ The Standard Model unifies three of them
  § Verified to the 0.1 percent level
  § Too many free parameters, e.g. particle masses
§ Higgs particle
  § The Higgs condensate fills the vacuum
  § Acts like ‘molasses’: slows other particles down, gives them mass

Page 6: Cern intro 2010-10-27-snw

Physics at the LHC (3)

§ Open questions in particle physics:
  § Why do the parameters have the sizes we observe?
  § What gives the particles their masses?
  § How can gravity be integrated into a unified theory?
  § Why is there only matter and no anti-matter in the universe?
  § Are there more space-time dimensions than the 4 we know of?
  § What are dark energy and dark matter, which make up 98% of the universe?
§ Finding the Higgs and possible new physics with the LHC will give the answers!

Page 7: Cern intro 2010-10-27-snw

The Large Hadron Collider (1)

§ Accelerator colliding protons against protons – 14 TeV collision energy
§ By far the world’s most powerful accelerator
§ Tunnel of 27 km circumference, 4 m diameter, 50…150 m below ground
§ Detectors at four collision points

Page 8: Cern intro 2010-10-27-snw

The Large Hadron Collider (2)

§ Approved 1994, first circulating beams on 10 September 2008
§ Protons are bent by superconducting magnets (8 Tesla, operating at 2 K = –271 °C) all around the tunnel
§ Each beam: 3’000 bunches of 100 billion protons each
§ Up to 40 million bunch collisions per second at the centre of each of the four detectors

Page 9: Cern intro 2010-10-27-snw

LHC Status and Future Plans

Date             Event
10-Sep-2008      First beam in LHC
19-Sep-2008      Leak when magnets ramped to full field for 7 TeV/beam
20-Nov-2009      First circulating beams since Sep-2008
30-Nov-2009      World record: 2 * 1.18 TeV, collisions soon after
19-Mar-2010      Another world record: 2 * 3.5 TeV
30-Mar-2010      First collisions at 2 * 3.5 TeV, special day for the press
26-Jul-2010      Experiments present first results at ICHEP conference
14-Oct-2010      Target luminosity for 2010 reached (10^32)
Until end 2011   Run at 2 * 3.5 TeV to collect 1 fb-1
2012             Shutdown to prepare machine for 2 * 7 TeV
2013 - …(?)      Run at 2 * 7 TeV

Page 10: Cern intro 2010-10-27-snw

LHC Detectors (1)

ATLAS

CMS

LHCb

Page 11: Cern intro 2010-10-27-snw

LHC Detectors (2)

3’000 physicists (including 1’000 students) from 173 institutes in 37 countries

Page 12: Cern intro 2010-10-27-snw

LHC Data (1)

The accelerator generates 40 million bunch collisions (“events”) every second at the centre of each of the four experiments’ detectors
§ Per bunch collision, typically ~20 proton-proton interactions
§ Particles from the previous bunch collision are only 7.5 m away from the detector centre (a quick back-of-the-envelope check is sketched below)
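A quick arithmetic check of the bunch spacing implied by the 40 MHz crossing rate (plain arithmetic, no CERN software assumed):

```python
# 40 million bunch crossings per second -> 25 ns between crossings;
# the protons travel essentially at the speed of light, so the previous
# bunch is about 7.5 m away when the next one collides.
CROSSING_RATE_HZ = 40e6
SPEED_OF_LIGHT_M_PER_S = 299_792_458

spacing_s = 1.0 / CROSSING_RATE_HZ                 # 25 ns
distance_m = SPEED_OF_LIGHT_M_PER_S * spacing_s    # ~7.5 m

print(f"time between crossings: {spacing_s * 1e9:.0f} ns")
print(f"distance between bunches: {distance_m:.1f} m")
```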

Page 13: Cern intro 2010-10-27-snw

LHC Data (2)

Reduced by online computers that filter out a few hundred “good” events per second …

… which are recorded on disk and magnetic tape at 100…1’000 Megabytes/sec

15 Petabytes per year for four experiments

15’000 Terabytes = 3 million DVDs

1 event = few Megabytes
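A rough consistency check of these figures, using assumed illustrative values within the ranges quoted above (a few hundred events per second, a few MB per event, ~10^7 seconds of effective data taking per year):

```python
# Order-of-magnitude check of the recording rate and yearly data volume.
# The concrete values are assumptions chosen within the ranges quoted above.
EVENT_SIZE_MB = 2.0          # "1 event = few Megabytes"
EVENTS_PER_SECOND = 200      # "a few hundred good events per second"
LIVE_SECONDS_PER_YEAR = 1e7  # assumed effective data-taking time per year

rate_mb_per_s = EVENT_SIZE_MB * EVENTS_PER_SECOND             # within 100...1000 MB/s
petabytes_per_year = rate_mb_per_s * LIVE_SECONDS_PER_YEAR / 1e9

print(f"recording rate: {rate_mb_per_s:.0f} MB/s per experiment")
print(f"~{petabytes_per_year:.0f} PB/year per experiment, "
      f"roughly {4 * petabytes_per_year:.0f} PB/year for four experiments")
```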

Page 14: Cern intro 2010-10-27-snw

LHC Data (3)

Page 15: Cern intro 2010-10-27-snw

[Pie charts: share of 2008 resource requirements by tier. CPU: CERN 18%, all Tier-1s 39%, all Tier-2s 43%. Disk: CERN 12%, all Tier-1s 55%, all Tier-2s 33%. Tape: CERN 34%, all Tier-1s 66%.]

30’000 CPU servers, 110’000 disks: too much for CERN!

Summary of Computing Resource Requirements
All experiments, 2008 (from LCG TDR, June 2005)

                      CERN   All Tier-1s   All Tier-2s   Total
CPU (MSPECint2000s)     25            56            61     142
Disk (Petabytes)         7            31            19      57
Tape (Petabytes)        18            35             –      53

Page 16: Cern intro 2010-10-27-snw

Worldwide LHC Computing Grid (1)

§ Tier 0: CERN
  § Data acquisition and initial processing
  § Data distribution
  § Long-term curation
§ Tier 1: 11 major centres
  § Managed mass storage
  § Data-heavy analysis
  § Dedicated 10 Gbps lines to CERN
§ Tier 2: more than 200 centres in more than 30 countries
  § Simulation
  § End-user analysis
§ Tier 3: from physicists’ desktops to small workgroup clusters
  § Not covered by MoU

[Diagram of the WLCG hierarchy: CERN Tier 0 at the centre, linked to Tier 1 centres in Germany, the USA, the UK, France, Italy, Spain, Taiwan, the Nordic countries and the Netherlands; Tier 2 labs and universities attach to the Tier 1s, forming grids for regional groups and for physics study groups; Tier 3 resources range from physics-department clusters to desktops.]

Page 17: Cern intro 2010-10-27-snw

Worldwide LHC Computing Grid (2)

§ Grid middleware for “seamless” integration of services
§ Aim: looks like a single huge compute facility
§ Projects: EDG/EGEE/EGI, OSG
§ Big step from proof of concept to stable, large-scale production
§ Centres are autonomous, but with lots of commonalities
  § Commodity hardware (e.g. x86 processors)
  § Linux (a Red Hat Enterprise Linux variant)

Page 18: Cern intro 2010-10-27-snw

CERN Computer Centre

Functions:
§ WLCG: Tier 0, some T1/T2
§ Support for smaller experiments at CERN
§ Infrastructure for the laboratory
§ …

Page 19: Cern intro 2010-10-27-snw

Requirements and Boundaries (1)

§ High Energy Physics applications require mostly integer processor performance
§ Large amount of processing power and storage needed for aggregate performance
§ No need for parallelism / low-latency high-speed interconnects
§ Can use large numbers of components with performance below the optimum level (“coarse-grain parallelism”; see the sketch after this list)
§ Infrastructure (building, electricity, cooling) is a concern
  § Refurbished two machine rooms (1500 + 1200 m2) for a total air-cooled power consumption of 2.5 MW
  § Will run out of power in about 2014…
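A minimal sketch of the coarse-grain parallelism mentioned in the list above: events are independent, so they can be farmed out to many modest workers with no inter-process communication (illustrative Python; reconstruct() and the event IDs are made-up stand-ins for real HEP workloads):

```python
# Coarse-grain parallelism: every event is processed independently, so the
# work spreads over many modest machines with no low-latency interconnect.
# Illustrative only; reconstruct() is a made-up stand-in for real HEP code.
from multiprocessing import Pool

def reconstruct(event_id: int) -> float:
    """Stand-in for CPU-heavy, integer-dominated event reconstruction."""
    return float(sum((event_id * k) % 7919 for k in range(10_000)))

if __name__ == "__main__":
    event_ids = range(1_000)                        # hypothetical batch of events
    with Pool() as pool:                            # one worker per core; across a farm,
        results = pool.map(reconstruct, event_ids)  # a batch system plays this role
    print(f"processed {len(results)} events independently")
```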

Page 20: Cern intro 2010-10-27-snw

Requirements and Boundaries (2)

§ Major boundary condition: cost
  § Getting maximum resources with a fixed budget…
  § … then dealing with cuts to the “fixed” budget
§ Only choice: commodity equipment as far as possible, minimising TCO / performance
  § This is not always the solution with the cheapest investment cost!

[Photo caption: Purchased in 2004, now retired]

Page 21: Cern intro 2010-10-27-snw

The Bulk Resources – Event Data

[Diagram (simplified network topology): tape servers, disk servers and CPU servers attached via 10GigE to routers on an Ethernet backbone of multiple 10GigE links. Permanent storage is on tape, with disk as a temporary buffer; data paths are tape ↔ disk and disk ↔ CPU.]

Page 22: Cern intro 2010-10-27-snw

CERN CC currently (September 2010)

§ 8’500 systems, 54’000 processing cores
  § CPU servers, disk servers, infrastructure servers
§ 49’900 TB raw on 58’500 disk drives
§ 25’000 TB used, 50’000 tape cartridges total (70’000 slots), 160 tape drives
§ Tenders in progress or planned (estimates):
  § 800 systems, 11’000 processing cores
  § 16’000 TB raw on 8’500 disk drives

Page 23: Cern intro 2010-10-27-snw

Disk Servers for Bulk Storage (1)

§ Target: temporary event data storage
  § More than 95% of disk storage capacity
§ Best TCO / performance: integrated PC server
  § One or two x86 processors, 8…16 GB RAM, PCI RAID card(s)
  § 16…24 hot-swappable 7’200 rpm SATA disks in the server chassis
  § Gigabit or 10Gig Ethernet
  § Linux (of course)
§ Adjudication based on total usable capacity with constraints (see the sketch after this list)
  § Power consumption taken into account
  § Systems procured recently: depending on specs, 5…20 TB usable
  § Looking at software RAID, external iSCSI disk enclosures
§ Home-made optimised protocol (rfcp) and HSM software (Castor)
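A minimal sketch of an adjudication on total usable capacity with a power element folded in; the bid figures, the 4-year lifetime and the electricity price are invented assumptions, not CERN's actual tender formula:

```python
# Rank disk-server bids by effective cost per usable TB, with a power
# adjudication element included. All figures and the cost model are
# illustrative assumptions.
from dataclasses import dataclass

CHF_PER_KW_LIFETIME = 0.10 * 24 * 365 * 4   # assumed 0.10 CHF/kWh over a 4-year lifetime

@dataclass
class DiskBid:
    name: str
    servers: int
    usable_tb_per_server: float     # after RAID overhead, hot spares, filesystem
    price_per_server_chf: float
    avg_power_w_per_server: float   # weighted between full load and idle

def chf_per_usable_tb(bid: DiskBid) -> float:
    power_element = bid.avg_power_w_per_server / 1000 * CHF_PER_KW_LIFETIME
    total_cost = bid.servers * (bid.price_per_server_chf + power_element)
    return total_cost / (bid.servers * bid.usable_tb_per_server)

bids = [
    DiskBid("bid A", 100, 18.0, 9_000, 450),
    DiskBid("bid B", 120, 14.0, 6_500, 380),
]
for bid in sorted(bids, key=chf_per_usable_tb):
    print(f"{bid.name}: {chf_per_usable_tb(bid):.0f} CHF per usable TB")
```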

Page 24: Cern intro 2010-10-27-snw

Disk Servers for Bulk Storage (2)

Page 25: Cern intro 2010-10-27-snw

Disk Servers for Bulk Storage (3)

Page 26: Cern intro 2010-10-27-snw

Other Disk-based Storage

§ For dedicated applications (not physics bulk data):
  § SAN/FC storage
  § NAS storage
  § iSCSI storage
§ Total represents well below 5% of disk capacity
§ Consolidation project ongoing

Page 27: Cern intro 2010-10-27-snw

Procurement Guidelines

§ Qualify companies to participate in calls for tender
  § A-brands and their resellers
  § Highly qualified assemblers/integrators
§ Specify performance rather than box counts
  § Some constraints on choices for the solution
  § Leave detailed system design to the bidder
§ Decide based on TCO
  § Purchase price
  § Box count, network connections
  § Total power consumption

Page 28: Cern intro 2010-10-27-snw

The Power Challenge – why bother?

§ Infrastructure limitations
  § E.g. CERN: 2.5 MW for IT equipment
  § Need to fit maximum capacity into the given power envelope
§ Electricity costs money
  § Costs likely to rise (steeply) over the next few years
§ IT is responsible for a significant fraction of world energy consumption
  § Server farms in 2008: 1…2% of the world’s energy consumption (annual growth rate: 16…23%)
  § CERN’s data centre is 0.1 per mille of this…
§ Responsibility towards mankind demands using energy as efficiently as possible
§ Saving a few percent of energy consumption makes a big difference

Page 29: Cern intro 2010-10-27-snw

CERN’s Approach

§ Don’t look in detail at PSUs, fans, CPUs, chipset, RAM, disk drives, VRMs, RAID controllers, …
§ Rather: measure apparent (VA) power consumption in the primary AC circuit
  § CPU servers: weighted 80% full load, 20% idle
  § Storage and infrastructure servers: weighted 50% full load, 50% idle
§ Add an element reflecting power consumption to the purchase price
§ Adjudicate on the sum of purchase price and power adjudication element (see the sketch below)
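A minimal sketch of the power adjudication element described above; the electricity tariff and service lifetime are assumed values for illustration, not CERN's actual tender parameters:

```python
# Power adjudication element: weight the measured apparent power at full
# load and idle, convert it to an energy-cost estimate over an assumed
# service lifetime, and add it to the purchase price. The tariff and
# lifetime are assumptions for illustration.
CHF_PER_KWH = 0.10          # assumed electricity price
LIFETIME_HOURS = 4 * 8760   # assumed 4-year service life

def weighted_power_va(full_load_va: float, idle_va: float, kind: str) -> float:
    """CPU servers: 80% full load / 20% idle; storage and infrastructure: 50/50."""
    full_fraction = 0.8 if kind == "cpu" else 0.5
    return full_fraction * full_load_va + (1.0 - full_fraction) * idle_va

def adjudication_price(purchase_chf: float, full_load_va: float,
                       idle_va: float, kind: str) -> float:
    power_kva = weighted_power_va(full_load_va, idle_va, kind) / 1000
    power_element_chf = power_kva * LIFETIME_HOURS * CHF_PER_KWH
    return purchase_chf + power_element_chf

# Hypothetical CPU-server offer: 2'500 CHF, 350 VA at full load, 200 VA idle
print(f"adjudication price: {adjudication_price(2500, 350, 200, 'cpu'):.0f} CHF")
```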

Page 30: Cern intro 2010-10-27-snw

Power Efficiency: Lessons Learned

§ CPU servers: power efficiency increased by a factor of 12 in a little over four years
  § Need to benchmark concrete servers
  § Generic statements on a platform are void
§ Fostering energy-efficient solutions makes a difference
  § Power supplies feeding more than one system are usually more power-efficient
  § Redundant power supplies are inefficient

Page 31: Cern intro 2010-10-27-snw

Future (1)

§ Is IT growth sustainable?
  § Demands continue to rise exponentially
  § Even if Moore’s law continues to apply, data centres will need to grow in number and size
  § IT already consumes 2% of the world’s energy – where do we go?
§ How to handle growing demands within a given data centre?
  § Demands evolve very rapidly, technologies less so, infrastructure at an even slower pace – how best to match these three?

Page 32: Cern intro 2010-10-27-snw

Future (2)

§ IT: ecosystem of
  § Hardware
  § OS software and tools
  § Applications
§ Evolving at different paces: hardware fastest, applications slowest
§ How to make sure that at any given time they match reasonably well?

Page 33: Cern intro 2010-10-27-snw

Future (3)

§ Example: single-core to multi-core to many-core
  § Most HEP applications are currently single-threaded
  § Consider a server with two quad-core CPUs as eight independent execution units (see the sketch after this list)
  § This model does not scale much further
§ Need to adapt applications to many-core machines
  § Large, long effort
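A minimal illustration of the "eight independent execution units" model described above: fill each core with its own single-threaded job instead of parallelising one job across cores (illustrative only; the job contents and the use of a plain script instead of a batch scheduler are assumptions):

```python
# Treat a dual quad-core server as eight independent single-threaded job
# slots: launch one process per core, with no communication between them.
# The "jobs" are placeholders; in production a batch scheduler fills the
# slots rather than a script like this.
import os
import subprocess
import sys

SLOTS = os.cpu_count() or 8   # e.g. 8 on a server with two quad-core CPUs
jobs = [[sys.executable, "-c", f"print('event batch {i} done')"] for i in range(SLOTS)]

procs = [subprocess.Popen(cmd) for cmd in jobs]   # one independent job per core
for proc in procs:
    proc.wait()
print(f"ran {len(procs)} single-threaded jobs side by side")
```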

Page 34: Cern intro 2010-10-27-snw

Summary

§ The Large Hadron Collider (LHC) and its experiments are a very data- (and compute-) intensive project
§ The LHC has triggered or pushed new technologies, e.g. Grid middleware, WANs
§ High-end or bleeding-edge technology is not necessary everywhere
  § That’s why we can benefit from the cost advantages of commodity hardware
§ Scaling computing to the requirements of the LHC is hard work
§ IT power consumption/efficiency is a prime concern
§ We are steadily taking collision data at 2 * 3.5 TeV, and have the capacity in place to deal with it
§ We are on track for further ramp-ups of computing capacity for future requirements

Page 35: Cern intro 2010-10-27-snw

Summary of Computing Resource Requirements
All experiments, 2008 (from LCG TDR, June 2005)

                      CERN   All Tier-1s   All Tier-2s   Total
CPU (MSPECint2000s)     25            56            61     142
Disk (Petabytes)         7            31            19      57
Tape (Petabytes)        18            35             –      53

Thank you

Page 36: Cern intro 2010-10-27-snw

BACKUP SLIDES

Page 37: Cern intro 2010-10-27-snw

CPU Servers (1)

§ Simple, stripped-down, “HPC-like” boxes
  § No fast low-latency interconnects
§ EM64T or AMD64 processors (usually 2), 2 or 3 GB/core, 1 disk/processor
§ Open to multiple systems per enclosure
§ Adjudication based on total performance (SPECcpu2006 – all_cpp subset; see the sketch below)
  § Power consumption taken into account
§ Linux (of course)
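A minimal sketch of an adjudication on total performance; the benchmark scores, prices and power element below are invented assumptions, not real tender data:

```python
# Rank CPU-server bids by cost per unit of benchmark performance, with the
# power adjudication element folded into the cost. Figures are invented.
def chf_per_spec(price_chf: float, power_element_chf: float,
                 spec_per_server: float, servers: int) -> float:
    total_cost = servers * (price_chf + power_element_chf)
    total_performance = servers * spec_per_server   # e.g. a SPECcpu2006-style score
    return total_cost / total_performance

bids = {
    "bid A": chf_per_spec(2_400, 1_100, 120.0, 200),
    "bid B": chf_per_spec(2_100, 1_350, 105.0, 230),
}
for name, score in sorted(bids.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.1f} CHF per unit of performance")
```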

Page 38: Cern intro 2010-10-27-snw

CPU Servers (2)

Page 39: Cern intro 2010-10-27-snw

Tape Infrastructure (1)

§ 15 Petabytes per year
  § … and in 10 or 15 years’ time physicists will want to go back to the 2010 data!
§ Requirements for permanent storage:
  § Large capacity
  § Sufficient bandwidth
  § Proven for long-term data curation
  § Cost-effective
§ Solution: high-end tape infrastructure

Page 40: Cern intro 2010-10-27-snw

Tape Infrastructure (2)

Page 41: Cern intro 2010-10-27-snw

Mass Storage System (1)

§ Interoperation challenge locally at CERN
  § 100+ tape drives
  § 1’000+ RAID volumes on disk servers
  § 10’000+ processing slots on worker nodes
§ HSM required
  § Commercial options carefully considered and rejected: OSM, HPSS
  § CERN development: CASTOR (CERN Advanced Storage Manager), http://cern.ch/castor

Page 42: Cern intro 2010-10-27-snw

Mass Storage System (2)

§ Key CASTOR features
  § Database-centric layered architecture
  § Stateless agents; can restart easily on error
  § No direct connection from users to critical services
§ Scheduled access to I/O (see the sketch after this list)
  § No overloading of disk servers
  § Per-server limit set according to the type of transfer
  § Servers can support many random-access-style accesses, but only a few sustained data transfers
  § I/O requests can be scheduled according to priority
  § Fair-share access to I/O just as for CPU
  § Prioritise requests from privileged users
§ Performance and stability proven at the level required for Tier 0 operation
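A minimal sketch of the scheduled-I/O idea (per-server slot limits by transfer type plus priorities); this toy scheduler is illustrative Python, not CASTOR's actual scheduler or API:

```python
# Toy I/O scheduler in the spirit described above: each disk server has
# separate slot limits for sustained (streaming) and random-access
# transfers, and queued requests are served in priority order.
# Entirely illustrative; not CASTOR code.
import heapq
from collections import defaultdict

SLOT_LIMITS = {"sustained": 2, "random": 20}    # assumed per-server limits

class ToyIOScheduler:
    def __init__(self) -> None:
        self.queue = []                          # (priority, seq, server, kind)
        self.active = defaultdict(lambda: defaultdict(int))
        self.seq = 0

    def submit(self, server: str, kind: str, priority: int) -> None:
        """Lower priority value = more privileged request."""
        heapq.heappush(self.queue, (priority, self.seq, server, kind))
        self.seq += 1

    def dispatch(self) -> list:
        """Start every queued request whose server still has a free slot."""
        started, deferred = [], []
        while self.queue:
            priority, _, server, kind = heapq.heappop(self.queue)
            if self.active[server][kind] < SLOT_LIMITS[kind]:
                self.active[server][kind] += 1
                started.append((server, kind, priority))
            else:
                deferred.append((priority, self.seq, server, kind))
                self.seq += 1
        for request in deferred:                 # blocked requests stay queued
            heapq.heappush(self.queue, request)
        return started

sched = ToyIOScheduler()
for _ in range(4):
    sched.submit("diskserver01", "sustained", priority=2)
sched.submit("diskserver01", "random", priority=1)
print(sched.dispatch())
# -> the priority-1 random request and two of the four sustained requests start
```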

Page 43: Cern intro 2010-10-27-snw

Box Management (1)

§ Many thousand boxes
  § Hardware management (install, repair, move, retire)
  § Software installation
  § Configuration
  § Monitoring and exception handling
  § State management
§ 2001…2002: review of available packages
  § Commercial: full Linux support rare, insufficient reduction in staff effort to justify licence fees
  § Open source: lack of features considered essential, didn’t scale to the required level

Page 44: Cern intro 2010-10-27-snw

Box Management (2)

§ ELFms (http://cern.ch/ELFms)
  § CERN development in collaboration with many HEP sites and in the context of the European DataGrid (EDG) project
§ Components:
  § Quattor: installation and configuration
  § Lemon: monitoring and corrective actions
  § Leaf: workflow and state management

Page 45: Cern intro 2010-10-27-snw

Box Management (3): ELFms Overview

[Diagram: ELFms overview, covering node configuration management (Quattor), performance and exception monitoring (Lemon), and logistical/state management of the nodes (Leaf).]

Page 46: Cern intro 2010-10-27-snw

Box Management (4): Quattor

[Diagram: Quattor architecture. A central Configuration Database (CDB) with SQL and XML backends, fed via CLI, GUI and scripts (SOAP), serves XML configuration profiles over HTTP to the managed nodes. On each node the Node Configuration Manager (NCM) runs configuration components for the local services, and the Software Package Manager (SPMA) installs RPMs/PKGs fetched over HTTP from software repository servers; an install manager and install server deploy the base OS via HTTP/PXE.]

Used by 18 organisations besides CERN, including two distributed implementations with 5 and 18 sites.
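A minimal sketch of the profile-driven idea in the diagram above: a node compares the packages requested by its configuration profile with what is installed and derives the actions needed to converge. The toy XML layout and helper names are invented; real Quattor profiles, NCM components and the SPMA work differently:

```python
# Illustrative only: parse a toy "configuration profile" and work out which
# packages to install or remove so the node converges on it, in the spirit
# of Quattor's SPMA. Not Quattor code, and not its real profile schema.
import xml.etree.ElementTree as ET

PROFILE_XML = """
<profile host="node001.example.org">
  <package name="castor-client"/>
  <package name="lemon-agent"/>
  <package name="openssh"/>
</profile>
"""  # stand-in for an XML profile fetched over HTTP from the configuration server

def desired_packages(profile_xml: str) -> set:
    root = ET.fromstring(profile_xml)
    return {pkg.get("name") for pkg in root.iter("package")}

def plan(installed: set, desired: set) -> tuple:
    """Packages to install and to remove to converge the node on its profile."""
    return desired - installed, installed - desired

to_install, to_remove = plan(installed={"openssh", "obsolete-tool"},
                             desired=desired_packages(PROFILE_XML))
print("install:", sorted(to_install), "remove:", sorted(to_remove))
```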

Page 47: Cern intro 2010-10-27-snw

Box Management (5): Lemon

[Diagram: Lemon architecture. On each node a monitoring agent runs a set of sensors and sends measurements over TCP/UDP to a central monitoring repository with an SQL backend; correlation engines and the Lemon CLI access the repository via SOAP, and users view RRDTool/PHP pages served over HTTP by Apache.]
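A minimal sketch of the agent-plus-sensors pattern shown in the diagram; the sensors, UDP port and message format are invented for illustration, and the real Lemon agent and its protocol are different:

```python
# Illustrative monitoring agent: run a couple of sensors periodically and
# push readings over UDP to a stand-in for the monitoring repository.
# Not Lemon code; the sensors are Unix-only examples.
import json
import os
import socket
import time

REPOSITORY = ("127.0.0.1", 12409)   # stand-in address for the repository

def sensor_loadavg() -> dict:
    return {"metric": "loadavg1", "value": os.getloadavg()[0]}

def sensor_rootfs_free_gb() -> dict:
    st = os.statvfs("/")
    return {"metric": "rootfs_free_gb", "value": st.f_bavail * st.f_frsize / 1e9}

SENSORS = [sensor_loadavg, sensor_rootfs_free_gb]

def run_agent(cycles: int = 3, interval_s: float = 1.0) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    host = socket.gethostname()
    for _ in range(cycles):
        for sensor in SENSORS:
            sample = {"host": host, "time": time.time(), **sensor()}
            sock.sendto(json.dumps(sample).encode(), REPOSITORY)  # fire and forget
        time.sleep(interval_s)

if __name__ == "__main__":
    run_agent()
```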

Page 48: Cern intro 2010-10-27-snw

Box Management (6): Lemon

§ Apart from node parameters, non-node parameters are monitored as well
  § Power, temperatures, …
  § Higher-level views of Castor, batch queues on worker nodes, etc.
§ Complemented by a user view of service availability: Service Level Status

Page 49: Cern intro 2010-10-27-snw

Box Management (7): Leaf

§ HMS (Hardware Management System)
  § Track systems through their lifecycle
  § Automatic ticket creation
  § GUI to physically find systems by host name
§ SMS (State Management System)
  § Automatic handling and tracking of high-level configuration steps
  § E.g. reconfigure, drain and reboot all cluster nodes for a new kernel version (see the sketch below)
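A minimal sketch of the kind of high-level operation SMS automates ("reconfigure, drain and reboot all cluster nodes for a new kernel version"); the node names and steps are invented, and this is not Leaf's actual interface:

```python
# Toy state machine for "drain, reboot with a new kernel, return to
# production", in the spirit of a state-management system. Node names and
# steps are invented; purely illustrative.
from enum import Enum, auto

class State(Enum):
    PRODUCTION = auto()
    DRAINING = auto()
    MAINTENANCE = auto()

def drain(node: str) -> None:
    print(f"{node}: stop accepting new jobs, wait for running jobs to finish")

def reboot_with_new_kernel(node: str) -> None:
    print(f"{node}: install kernel, reboot, run health checks")

def rolling_kernel_update(nodes: list) -> dict:
    states = {node: State.PRODUCTION for node in nodes}
    for node in nodes:                    # one node at a time limits capacity loss
        states[node] = State.DRAINING
        drain(node)
        states[node] = State.MAINTENANCE
        reboot_with_new_kernel(node)
        states[node] = State.PRODUCTION   # every transition would be tracked
    return states

print(rolling_kernel_update(["node001", "node002"]))
```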

Page 50: Cern intro 2010-10-27-snw

Box Management (8): Status

§ Many thousands of boxes managed successfully by ELFms, both at CERN and elsewhere, despite decreasing staff levels
  § No indication of problems scaling up further
  § Changes being applied wherever necessary, e.g. support for virtual machines
§ Large-scale farm operation remains a challenge
  § Purchasing, hardware failures, …