LHC@Home: A BOINC-based volunteer computing infrastructure...


Page 1: LHC@Home: A BOINC-based volunteer computing infrastructure ...boincfast.ru/files/5815/0522/1474/PRES_v1.1.pdf · A BOINC-BASED VOLUNTEER COMPUTING INFRASTRUCTURE FOR PHYSICS STUDIES

LHC@HOME: A BOINC-BASED VOLUNTEER COMPUTING INFRASTRUCTURE FOR PHYSICS STUDIES AT CERN

BOINC:FAST 2017 Conference – Petrozavodsk 28-30/08

Igor Zacharov, EPFL

J. Barranco, Y. Cai, D. Cameron, M. Crouch, R. De Maria, L. Field, M. Giovannozzi, P. Hermes, N. Høimyr, D. Kaltchev, N. Karastathis, C. Luzzi, E. Maclean, E. McIntosh, A. Mereghetti, J. Molson, Y. Nosochkov, T. Pieloni, I.D. Reid, L. Rivkin, B. Segal, K. Sjobak, P. Skands, C. Tambasco, F. F. Van der Veken

Igor Zacharov – CERN/EPFL - August 2017

Page 2

CERN facts

European Physics Laboratory in Switzerland (Geneva)
• Focused on Particle Physics and Accelerator Engineering

21 Member states, 7 Observer states
• Member states: Austria, Belgium, Bulgaria, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Israel, Italy, Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland, United Kingdom
• Observers: European Union, India, Japan, JINR, Russian Federation, UNESCO and United States of America

10 Departments
• Beams (BE), Engineering (EN), Experimental Physics (EP), Finance & Admin (FAP), HR, Industry & Procurement (IPT), Information Technology (IT), Site (SMB), Technology (TE), Theoretical Physics (TH)

Members of the personnel (as of January 2016)
• Staff: 2531
• Fellows: 645
• Users: 13128

Energy Frontier
• Large Hadron Collider (LHC)
• Detectors: ALICE, ATLAS, CMS, LHCb

Page 3

LHC Experiments

Main physics instruments:
• CMS: Compact Muon Solenoid
• ATLAS: A Toroidal LHC ApparatuS
• ALICE: A Large Ion Collider Experiment
• LHCb: LHC beauty experiment

Page 4

CERN LHC Experiments: Data Processing

Experimental data processing
• Large volume of data for analysis
• Compare measured data with Monte-Carlo modelling of particle collisions, including apparatus response

Monte-Carlo simulation of particles passing through the detectors is suitable for volunteers’ processing:
• Low data volume to transfer
• Large number of simultaneous jobs
• Simulation campaigns running for months

Volunteers’ processing is used by
• ATLAS, CMS, LHCb
• ALICE is running a “proof of concept” with CernVM on a desktop grid

Page 5

Theory Division

Event generator: theoretical models of particle interactions
• Low data volume to transfer
• Large number of simultaneous jobs with different theoretical models
• Reference calculations for experimental measurements

Example: comparison of event generators to the archived measurements
• Colored lines: models for particle collisions
• Black squares: 1996 ALEPH measurement
• Yellow band: uncertainty of the measurement

[Figure: probability distribution for observing N particles in electron-positron collisions at the LEP collider; x-axis: number of charged particles; y-axis: probability; lower panel: ratio of theory divided by data]

Page 6

Beam Dynamics: Accelerator study

LHC main magnets are superconducting
• Complicated field structure: not an ideal magnet
• Beam dynamics is non-linear: particles can be lost on magnets with the risk of quenching them

Main goals of the Beam Dynamics calculations:
• Study the field quality and Dynamic Aperture (DA)
  → numerical simulations with the SixTrack program
• Protect the magnets from quenches
  → design the collimation system

[Figure: collimation hierarchy with primary (robust) and secondary (robust) collimators, absorber (W metal), tertiary (W metal) collimators and physics absorbers (Cu metal), shown relative to the arcs and the IP & triplets; settings labelled 6.0+ σ, 7.0+ σ, 10.0+ σ, 8.5+ σ, 10.0+ σ]

Page 7

IT – CERN Comp. Centre & WLCG

CERN Servers
• 2×10^5 cores; 2-4 GB memory per core
• Linux OS
• 45 PB disk, 200 PB tape
• 3.5 MW Power

Worldwide LHC Computing Grid
• 170 computer centers in 41 countries
• Collaboration of computing centers organized in Tiers:
• Tier 0: CERN Computing Centre
  o Store & pre-compute raw data
  o 15% computing capacity
• Tier 1: 13 Computing Centers
  o GE, NL, RF (RRC-KI, JINR), …
  o Raw & reconstructed data storage
• Tier 2: Universities, Science Institutes
  o 155 sites around the world
• Tier 3: Individual computing resources
  o (no contract with WLCG)

Page 8

Requirements of Computing

Beam Dynamics (SixTrack)
• ~10^5 - 10^6 jobs to establish a parameter scan for an accelerator study
• Several accelerator studies per year
• Accelerator upgrades and improvements
• Beam dynamics profits greatly from volunteers’ computing, which is used to run most studies

Experiments (ATLAS, CMS, LHCb)
• Raw data processing at CERN (Tier-0) and at Tier-1
• Monte-Carlo simulation at Tier-2 (suitable for volunteers’ computing)

Processing requirements:
• IEEE 754 Floating Point compliance, double precision
• Access to numerical libraries
  o CRLIBM for SixTrack
  o Experiments: ~10 M lines of code, CERN libraries, Linux environment
• Virtualization to run on non-Linux hosts

Page 9

Volunteers’ computing history @CERN

LHC@Home started in 2004 using BOINC
• SixTrack, Garfield (gas chamber detector simulation)

Test4Theory in production since 2011 using VM technology
• Oracle VirtualBox hypervisor and CernVM reproduce the CERN Linux environment
• Open the VM solution to other CERN experiments to run on all BOINC platforms

ATLAS, CMS, LHCb applications adapted, each with its own submission
• Each successfully run under BOINC-VM
• CernVM and CernVM-FS used by all of the experiments

Consolidation project led by CERN IT Department
• Bring all projects under a single LHC@Home framework
• Project-specific credit
• A specific LHC@Home project can be selected from the project preferences
• Total of about 7.5 PFlop computing power available to LHC@Home
• HTCondor for job submission to BOINC or VM run under BOINC (same as CERN's batch system)

Page 10

ATLAS@home experience with the Monte-Carlo simulation

Application has large memory requirement
• Job on 1 core may require VM with up to 2.5 GB memory
• Initial version of ATLAS@home: not possible to fill all cores in a PC with ATLAS tasks

Sharing memory within multi-core VM:
• Performance limit to max 8 cores/job
• BOINC changed to adjust VM memory usage to the number of cores
• Two new parameters added to the plan class: base + per_core
• Pushed upstream for standard BOINC
• Memory = 2.5 GB + 0.8 GB × ncores
  o 2 cores: 4.1 GB
  o 12 cores: 12.1 GB

Volunteers’ computing provides up to 2% of ATLAS processing

[Figure: production version with shared memory; color code for different core numbers. Source: David Cameron, presentation at CHEP 2016]
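The memory rule quoted on this slide can be written as a short function. This is only a sketch of the arithmetic: the constant and function names are illustrative and are not the identifiers used in the BOINC plan class.

```python
# Memory rule from the slide: Memory = 2.5 GB + 0.8 GB x ncores.
# BASE_GB / PER_CORE_GB are illustrative names, not BOINC plan-class keys.
BASE_GB = 2.5       # fixed part of the VM memory ("base")
PER_CORE_GB = 0.8   # additional memory per core ("per_core")


def vm_memory_gb(ncores: int) -> float:
    """Return the VM memory in GB for a task using `ncores` cores."""
    if ncores < 1:
        raise ValueError("ncores must be at least 1")
    return BASE_GB + PER_CORE_GB * ncores


for n in (2, 4, 8, 12):
    print(f"{n:2d} cores -> {vm_memory_gb(n):.1f} GB")
```

The two values given on the slide (2 and 12 cores) follow directly from the formula.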

Page 11

CMS@home experience with the Monte-Carlo simulation

MC simulation of collision events, job parameters:
• 1-3 hour duration
• 10-50 MB output file

BOINC server
• Submits to volunteers’ VM
• Uploads results

HTCondor server
• Returns results to the CMS computing infrastructure

Timing the result files received from GRID and CMS@home:
• GRID has fast hosts and results start to flow back quickly
• Slower volunteers’ hosts running VM return results at a constant rate

BOINC is suitable for long studies where results are collected over several months

[Figure: 𝑡𝑡 production test, 2×10^3 jobs]

Page 12

Volunteers’ computing activity

CERN Data Centre (DC) is fully loaded with Experiments’ raw data reconstruction processing and data analysis
• High volume of data movement, unsuitable for remote processing
• Limited capacity for Accelerator studies in addition to analysis

Individual processing capacities on volunteers’ machines:

Experiment   Sustained BOINC simultaneous jobs   Comments
ATLAS        7×10^3                              Requires VM; native version available in Beta on some Linux platforms
CMS          1×10^3                              Requires VM
LHCb         3.5×10^3                            Requires VM
Theory       6×10^3                              Requires VM
Sixtrack     3.5×10^5                            Fortran and C, compiled for every OS flavor and processor type individually

Compare to the average 2.5×10^5 jobs running/queued in CERN DC

BOINC runs 2× redundancy for verification and error checking
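As a rough sketch of the comparison made on this slide, the sustained job counts can be summed and set against the CERN DC figure. Only the job counts come from the slide. Jobs differ in size and BOINC runs them redundantly, so the ratio compares job counts, not delivered capacity.

```python
# Sustained simultaneous BOINC jobs per application, as listed on the slide.
sustained_jobs = {
    "ATLAS": 7_000,
    "CMS": 1_000,
    "LHCb": 3_500,
    "Theory": 6_000,
    "SixTrack": 350_000,
}
CERN_DC_JOBS = 250_000  # average jobs running/queued in the CERN DC (slide)

total_jobs = sum(sustained_jobs.values())
ratio_to_dc = total_jobs / CERN_DC_JOBS
sixtrack_share = sustained_jobs["SixTrack"] / total_jobs

print(f"total simultaneous BOINC jobs: {total_jobs}")
print(f"ratio to CERN DC job count:    {ratio_to_dc:.2f}")
print(f"SixTrack share of BOINC jobs:  {sixtrack_share:.1%}")
```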

Page 13

Sixtrack on volunteers’ machines

Some statistics since 2004:
• Volunteers: 150000
• PCs: 300000

Delivering sustained processing capacity of ~45 TFlop

Essential for CERN’s Accelerator studies

[Figure: time evolution of volunteers, active tasks and cumulative number of WUs since Feb. 2017; axes: tasks in progress / total WUs [10^6] and volunteers [10^6]; the Pentathlon of May 2017 is marked]

Page 14

The BOINC Pentathlon (May 2017)

Organized by SETI.Germany, won by SETI.USA

CERN LHC@Home chosen for the Sprint event

Over 350,000 active tasks

Page 15

Sixtrack basics

Computation of the trajectories of ultra-relativistic particles in the presence of static and variable electric & magnetic fields

Multiple particles moving through the accelerator probe the stability regions in 6-D phase space

[Figure: accelerator with head-on and long-range beam-beam interactions]

Page 16

SixTrack tracking

SixTrack is used to determine the Dynamic Aperture (DA)
• Particle motion is deterministic but there is no theory for predicting the onset of chaotic motion
• DA determines the stable beam conditions
• Boundary between chaotic & non-chaotic motion
• Is there a link between DA and the beam lifetime?

Collimation studies
• Protecting superconducting magnets from quenches

Only tracking allows computing of DA
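To illustrate why the DA has to be found by tracking, here is a minimal sketch on a toy model. It is not SixTrack: it uses a 2-D Hénon-type map (a linear rotation plus a quadratic kick) instead of the 6-D LHC lattice, and the tune, loss radius and scan step are assumed values chosen only for illustration. The DA estimate is the largest scanned starting amplitude below the first particle loss.

```python
import math

TUNE = 0.205         # assumed linear tune of the toy one-turn map
LOSS_RADIUS = 10.0   # assumed amplitude beyond which a particle counts as lost


def survives(x0: float, p0: float, n_turns: int) -> bool:
    """Track one particle; return False as soon as it exceeds LOSS_RADIUS."""
    c = math.cos(2.0 * math.pi * TUNE)
    s = math.sin(2.0 * math.pi * TUNE)
    x, p = x0, p0
    for _ in range(n_turns):
        kicked = p + x * x                       # thin quadratic (sextupole-like) kick
        x, p = c * x + s * kicked, -s * x + c * kicked   # linear rotation
        if x * x + p * p > LOSS_RADIUS ** 2:
            return False
    return True


def dynamic_aperture(n_turns: int, step: float = 0.05, n_steps: int = 60) -> float:
    """Largest scanned amplitude (along p = 0) below the first loss."""
    last_stable = 0.0
    for i in range(1, n_steps + 1):
        amplitude = i * step
        if not survives(amplitude, 0.0, n_turns):
            break
        last_stable = amplitude
    return last_stable


da = dynamic_aperture(n_turns=1000)
print(f"toy dynamic aperture after 1000 turns: {da:.2f}")
```

A real study repeats such a scan over many angles, seeds and far more turns, which is where the large job counts come from.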

It is a DIVERGENT application in that even a 1 ULP difference will grow exponentially with time, giving significantly different results at the onset of chaotic motion (cf. non-linear, Lorenz, “butterfly” effect)

Key issue: numerical compatibility of results obtained on heterogeneous architectures: solved for SixTrack!
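The growth of a 1 ULP difference can be seen on any chaotic map. The sketch below uses the logistic map x → 4x(1−x) as a stand-in; it is not the SixTrack physics, and the starting value 0.9 is arbitrary. It needs Python 3.9 or later for `math.nextafter`.

```python
import math


def logistic(x: float) -> float:
    """One step of the chaotic logistic map (toy stand-in for tracking)."""
    return 4.0 * x * (1.0 - x)


def max_separation(x0: float, n_steps: int) -> float:
    """Largest gap seen between two orbits that start exactly 1 ULP apart."""
    a = x0
    b = math.nextafter(x0, 1.0)   # the next double above x0
    largest = abs(a - b)
    for _ in range(n_steps):
        a, b = logistic(a), logistic(b)
        largest = max(largest, abs(a - b))
    return largest


initial_gap = math.nextafter(0.9, 1.0) - 0.9
print(f"initial gap (1 ULP): {initial_gap:.3e}")
for n in (20, 40, 60, 200):
    print(f"after {n:3d} steps, largest gap so far: {max_separation(0.9, n):.3e}")
```

The gap grows by a roughly constant factor per step until it reaches the size of the orbit itself, which is why bit-identical arithmetic on every host matters for validation.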


Page 17

SixTrack processing

Massive numerical simulations for volunteers’ processing
• Low data volume to transfer (detailed accelerator description)
• Large number of simultaneous jobs
  o Scans over phase-space variables (particle amplitudes & angles)
  o Magnetic field errors distribution and accelerator settings
  o Beam parameters (intensity, emittance): single set for each LHC Study

A typical LHC Study might require ~10^5 jobs, each job:
• 10^5 - 10^6 turns, 10^3 - 10^4 initial conditions, 60 lattice realizations (seeds)
• 10 hours each job, unless particles are “lost”
• All jobs are independent and the results are combined at the end: this is an ideal feature for a BOINC application
• Simulation campaigns running for months

SixTrack produces identical (0 ULP difference) results:
• on the three principal Operating Systems (Linux, Windows, Mac)
• using any of five different Fortran compilers with compliant optimization level
• cmake compiler script

[Callout: probing the magnet errors distribution]
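The figures above give an upper bound on the CPU time of one study. The sketch below does that arithmetic with ~10^5 jobs at 10 hours each, plus the 2× BOINC redundancy quoted elsewhere in the talk. The slot count in the wall-clock estimate is a hypothetical input, and the estimate ignores scheduling delays, slow hosts and jobs that end early when particles are lost.

```python
JOBS_PER_STUDY = 10**5   # "~10^5 jobs" from the slide
HOURS_PER_JOB = 10       # upper bound: jobs end earlier when particles are lost
REDUNDANCY = 2           # each case is run twice under BOINC

cpu_hours = JOBS_PER_STUDY * HOURS_PER_JOB * REDUNDANCY
cpu_years = cpu_hours / (24 * 365)


def ideal_wall_hours(n_slots: int) -> float:
    """Idealized wall-clock time if `n_slots` hosts were busy all the time."""
    return cpu_hours / n_slots


print(f"CPU time per study: {cpu_hours:,} h (about {cpu_years:.0f} CPU-years)")
print(f"ideal wall-clock time on 1000 slots: {ideal_wall_hours(1000):.0f} h")
```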

Page 18

Methodology for 0-ULP identity

Source code and compilation:
• Add parentheses to fix the order of execution
• Disable Extended Precision (all proprietary >64 bit formats)
• Use the library from École normale supérieure (ENS) de Lyon: CRLIBM for elementary functions
• Use the identity a**b = exp(b*ln(a)), NINT (nearest integer function)
• Disable Fused MADD (fused multiply-add)
• Use D. M. Gay routines for formatted input/output
• Performance reduction due to disabled optimizations: ~2%
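Two of the points above can be shown in a few lines. Floating-point addition is not associative, so a compiler that reorders a sum can change the last bit, and a power can be routed through `exp` and `log` so that only those two functions need to be reproducible. This is a sketch in Python, not the SixTrack Fortran. Python's `math.exp` and `math.log` come from the platform libm, not from CRLIBM, and the helper names are illustrative.

```python
import math
import struct


def ulp_distance(x: float, y: float) -> int:
    """Number of representable doubles between two positive finite doubles."""
    ix = struct.unpack("<q", struct.pack("<d", x))[0]
    iy = struct.unpack("<q", struct.pack("<d", y))[0]
    return abs(ix - iy)


def pow_via_exp_log(a: float, b: float) -> float:
    """a**b written as exp(b*ln(a)); valid for a > 0."""
    return math.exp(b * math.log(a))


# The same three numbers summed in two different orders.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(f"(a + b) + c = {left!r}")
print(f"a + (b + c) = {right!r}")
print(f"difference in ULP: {ulp_distance(left, right)}")

# The power identity agrees with ** to within rounding, not necessarily bit for bit.
print(f"2**0.5 via exp/log: {pow_via_exp_log(2.0, 0.5)!r}")
print(f"2**0.5 directly:    {2.0 ** 0.5!r}")
```

Explicit parentheses in the source pin down one evaluation order, so every compiler produces the same sum.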

BOINC heterogeneous redundancy:
• Each case is run twice (or more in case of a difference)
• Bitwise comparison of the ASCII output record for each particle
• Error rate ~2%
• Overclocking is recognized as one of the sources of the errors
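The redundancy scheme can be sketched as a quorum check on the raw output bytes. This is a hypothetical helper to show the idea, not the BOINC validator API: a result is accepted once two returned outputs are byte-for-byte identical, otherwise another replica is needed.

```python
from collections import Counter
from typing import List, Optional


def validate(outputs: List[bytes], quorum: int = 2) -> Optional[bytes]:
    """Return the output that at least `quorum` replicas agree on bitwise.

    Returns None when no output has reached the quorum yet, meaning that
    one more replica of the case has to be run.
    """
    if not outputs:
        return None
    candidate, count = Counter(outputs).most_common(1)[0]
    return candidate if count >= quorum else None


good = b"particle 1  0.123456789012345E+01\n"
bad = b"particle 1  0.123456789012346E+01\n"   # differs in the last digit

print(validate([good, good]))        # agreement after two runs
print(validate([good, bad]))         # mismatch: a third run is needed
print(validate([good, bad, good]))   # third run settles the case
```

Because the comparison is bitwise, it only works if correct hosts produce identical output, which is what the 0-ULP methodology provides.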


Copyright F. McIntosh and CERN

Page 19

SixTrack study results

Comparison between simulated and measured DA of the LHC at injection.

Extrapolated DA of the LHC at 30 minutes after injection as a function of different chromaticity & octupole settings

Accelerator design: DA at fixed time is used to specify the required magnetic field quality

[Figure: DA (axis from 6 to 12) versus number of turns N (10^4 to 10^7) for SixTrack simulations and measured DA; time annotations ~1 s and ~10 h]

Page 20

Challenges for SixTrack (1): use of BOINC

Varying length of WU
• Not known when “particles” will be “lost” (1 min – 10 h jobs)
• Definition of the “Outliers” in BOINC

Scheduling of the WUs
• Serious tuning work may still be necessary to always distribute the WUs to free volunteers’ PCs

Coping with varying load
• SixTrack work comes in batches to support the studies
• No work is submitted between studies

Better responding to errors
• When running the SixTrack application
• In the network and CERN infrastructure

Page 21

Challenges for SixTrack (2): new features

Split a study with N turns into M studies each of N/M turns
• More efficient use of the computing resources
• Implement the capability of storing the end-state of a study to make it the initial state of the following one
• Longer time-scales of simulated analysis: 10^6 → 10^7 turns (factor of 10)

More complex physics
• Radiation effects
• New structural elements (e.g. electron lenses for collimation)
• Evolution of beam distribution for computation of stability diagram

New physics features (requires code restructuring)
• Internal analysis of loss location comparing particle’s trajectory against accelerator mechanical aperture. This would open up the use of BOINC infrastructure for collimation studies!
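Splitting N turns into M chained studies can be sketched on a toy one-turn map. This is not SixTrack: the map, tune and state format are assumptions for illustration. The end-state is written as hexadecimal floats so that it reloads bit for bit, and the chained run then reproduces the single long run exactly.

```python
import math
from typing import Tuple

State = Tuple[float, float, int]   # (x, p, turns already tracked)

TUNE = 0.205                       # assumed tune of the toy one-turn map
C = math.cos(2.0 * math.pi * TUNE)
S = math.sin(2.0 * math.pi * TUNE)


def track(state: State, n_turns: int) -> State:
    """Advance the toy map by n_turns, starting from a saved state."""
    x, p, turn = state
    for _ in range(n_turns):
        kicked = p + x * x
        x, p = C * x + S * kicked, -S * x + C * kicked
    return (x, p, turn + n_turns)


def save(state: State) -> str:
    """Serialize the end-state exactly (hex floats lose no bits)."""
    x, p, turn = state
    return f"{x.hex()} {p.hex()} {turn}"


def load(text: str) -> State:
    xs, ps, ts = text.split()
    return (float.fromhex(xs), float.fromhex(ps), int(ts))


def track_in_segments(state: State, n_turns: int, m: int) -> State:
    """Run n_turns as m chained studies of n_turns // m turns each."""
    if n_turns % m != 0:
        raise ValueError("n_turns must be divisible by m")
    for _ in range(m):
        state = load(save(track(state, n_turns // m)))
    return state


start: State = (0.1, 0.0, 0)
one_go = track(start, 1000)
chained = track_in_segments(start, 1000, 10)
print("identical end-state:", one_go == chained)
```

A decimal text format with too few digits would break the chain, which is one reason exact formatted input/output matters for this feature.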


Page 22

Challenges for SixTrack (3): more resources

Increase the number of volunteers participating

Opportunity to use the GPU in volunteers’ machines:
• Rewrite CPU intensive loop in subset of C, called from Fortran main/subr
• Compile with OpenCL and/or CUDA
• Essential use of Double Precision IEEE 754 FP

Type                 Cores [#]   Clock [MHz]   FP64 [GF/s]   Year   Bench [μs/part/turn]
i7 920               1           2670          5.2           2009   545
Xeon E5-2630         1           2200          17            2016   364
2 x Xeon E5-2630     2 x 10      2200          340           2016   16
Nvidia P100 (16GB)   3584        1480          5300          2016   1.8
Nvidia GTX 1080      2560        1700          288           2016   12.8
Nvidia K20x          2680        732           1312          2015   10.8
AMD R9 280x          2048        1000          1024          2013   4.3
AMD W8100            2560        824           2110          2014   4.0
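The benchmark column can be turned into speed-up factors relative to the single-core i7 920. The sketch uses only the numbers in the table; choosing the i7 920 as the baseline is an arbitrary choice made here.

```python
# Benchmark results from the table, in microseconds per particle per turn.
bench_us = {
    "i7 920 (1 core)": 545.0,
    "Xeon E5-2630 (1 core)": 364.0,
    "2 x Xeon E5-2630 (20 cores)": 16.0,
    "Nvidia P100 (16GB)": 1.8,
    "Nvidia GTX 1080": 12.8,
    "Nvidia K20x": 10.8,
    "AMD R9 280x": 4.3,
    "AMD W8100": 4.0,
}

baseline = bench_us["i7 920 (1 core)"]
speedup = {name: baseline / t for name, t in bench_us.items()}

# Fastest device first.
for name, factor in sorted(speedup.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {factor:7.1f}x")
```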

Page 23

LHC@HOME searching for resources

LHC@Home database:
• Total number of hosts in db: 896,356 with a total of 4,224,211 cores
• About 20% of hosts have one or more GPU(s) that may be used
• Work is just starting to characterize and prepare to use these resources
• Hopefully more volunteers with GPUs will join when we are ready

AMD GPU card        # entries
Radeon HD 7xxx      6204
Radeon HD 6xxx      66865
Radeon HD 5xxx      8064
Radeon HD 4xxx      6520
Radeon HD 3/2xxx    4252
AMD Other           9298
Total               105785 (11.8%)

NVIDIA GPU card     # entries
GeForce GTX         30478
GeForce Other       26360
GeForce 9xxx/8xxx   10126
GeForce GTX 10xx    3235
Quadro              4537
NVIDIA Other        1906
Total               84293 (9.4%)
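The percentages in the two tables follow from the totals and the host count, and together they give the "about 20%" quoted above. The sketch uses only the totals from the slide. The combined share is an upper bound, assuming no host appears in both tables.

```python
TOTAL_HOSTS = 896_356     # hosts in the LHC@Home database (slide)
AMD_ENTRIES = 105_785     # "Total" row of the AMD table
NVIDIA_ENTRIES = 84_293   # "Total" row of the NVIDIA table

amd_pct = 100.0 * AMD_ENTRIES / TOTAL_HOSTS
nvidia_pct = 100.0 * NVIDIA_ENTRIES / TOTAL_HOSTS
# Upper bound on the GPU share: a host with both vendors is counted twice.
gpu_pct = amd_pct + nvidia_pct

print(f"AMD:    {amd_pct:.1f}%")
print(f"NVIDIA: {nvidia_pct:.1f}%")
print(f"either: {gpu_pct:.1f}% (upper bound)")
```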

Page 24

Next challenge: Future Circular Collider

FCC parameters and LHC comparison

Parameters                      LHC / HL-LHC   HE-LHC   FCC
CM [TeV]                        14             27       100
Circumference [km]              27             27       80 - 100
Dipole Fields [T]               8.33           16       16
Lattice Elements [#]            23000          30000    100000
Luminosity [10^34 cm^-2 s^-1]   1 - 5          25       5 - 30
Events/bunch crossing           77 - 135       800      170 - 1000

Computational Complexity: × 5
• LHC: ~500 µs/particle/turn
• FCC: ~2500 µs/particle/turn

LHC@Home ideal framework!
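The factor of 5 translates directly into job length. The sketch below scales the tracking time with the per-particle, per-turn costs from the slide. The particle count per job is a hypothetical value chosen for illustration; only the two cost figures and the 10^6 turns appear in the talk.

```python
LHC_US = 500.0    # ~500 us/particle/turn for the LHC (slide)
FCC_US = 2500.0   # ~2500 us/particle/turn for the FCC (slide)


def tracking_hours(n_particles: int, n_turns: int, us_per_particle_turn: float) -> float:
    """Tracking time in hours for one job on a single core."""
    seconds = n_particles * n_turns * us_per_particle_turn * 1e-6
    return seconds / 3600.0


complexity_factor = FCC_US / LHC_US

# Hypothetical job: 60 particles (assumed) tracked for 10^6 turns.
lhc_hours = tracking_hours(60, 10**6, LHC_US)
fcc_hours = tracking_hours(60, 10**6, FCC_US)

print(f"complexity factor: x{complexity_factor:.0f}")
print(f"LHC job: {lhc_hours:.1f} h, FCC job: {fcc_hours:.1f} h")
```

With the same number of particles and turns, a job that takes hours for the LHC takes days for the FCC. That fits volunteer computing as long as the work can be split into shorter chained jobs.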

Page 25

Conclusions

CERN is set up for the advancement of science
• Physics of Elementary Particles and “origin of it all”
• Results of experiments and scientific conclusions are obtained after a lot of computation
• Some computations are suitable for volunteers’ processing

CERN IT Dep has a dedicated effort to support volunteers’ computing
• for the Experiments (ATLAS, CMS, LHCb)
• for the Theory Department
• for the Accelerator physics studies

The calculation of the Beam Dynamics using the SixTrack program is essential

There are plans to expand volunteers’ computing use for future studies

Volunteers’ help is essential for the advancement of science, and CERN greatly values this contribution of the volunteers

Page 26

Thank you very much for your contribution and for the continuous support you have given us over the years

We count on your participation for the future and would like to advance science with your help