Summary of the HEPiX Autumn 2013 Meeting

Arne Wiebalck, Afroditi Xafi, Thomas Oulevey
CERN ITTF, November 22, 2013
CERN IT Department, CH-1211 Genève 23, Switzerland
www.cern.ch/it



Outline

• Miscellaneous
• Site reports
• Storage
• Basic IT Services
• Computing & Batch Systems
• IT facilities
• End User Services
• Clouds & Virtualisation
• Networking & Security

Presenters: Arne, Afroditi, Thomas


HEPiX – www.hepix.org

• Global organization of service managers and support staff providing computing facilities for the HEP community
• Participating sites include BNL, CERN, DESY, FNAL, IN2P3, NIKHEF, RAL, SLAC, TRIUMF, …
• Meetings are held twice per year
  – Spring: Europe, Autumn: U.S./Asia
• Exchange of experiences, reports on recent work, work in progress & future plans
  – Usually no showing-off


Next HEPiX Meetings

• Spring 2014
  – LAPP, Annecy, France
  – May 19-23, 2014
• Autumn 2014
  – University of Nebraska (NE), U.S.
  – Final approval needed, dates to be determined
• Spring 2015
  – U.K. discussed as an option


HEPiX Autumn 2013

• Oct 28 - Nov 1 at U Michigan, Ann Arbor (MI)
  – Very well organized, pretty rich program
  – Network access: eduroam (as in Bologna)
• 115 (!) registered participants
  – Europe: 48, U.S./Canada: 47, Asia: 3, Australia: 2 (CERN: 13)
  – Many first timers, several North American WLCG Tier-2 universities
  – DoE labs could mostly participate, only a few cancellations (ZFS)
  – 15 participants from 9 companies
• 65 presentations from 35 institutes
  – 26 hours of presentations
  – Many offline discussions
• Sponsors: WD, UMICH, DDN, NetApp, and Univa


Updates from the WGs (1)

• Storage
  – WG terminated, no summary as Andrei could not participate
• Batch
  – WG terminated, updates to Wiki will continue
• IPv6
  – Big ISPs move to IPv6 (CH: >10% of Google traffic already via IPv6)
  – CERN seems well prepared, some smaller labs have not even started
  – IPv6 support in batch systems?
  – A lot of testing ongoing, including the experiments, test bed growing
  – https://indico.cern.ch/getFile.py/access?contribId=26&sessionId=2&resId=1&materialId=slides&confId=247864


Updates from the WGs (2)

• Benchmarking
  – New SPEC CPU benchmark suite planned for Oct 2014
  – Plan is to start working with the experiments early (to identify apps to validate)
• Bit preservation
  – New working group led by CERN (German Cancio) and DESY (Dimitry Ozerov)
  – Follow-up on DPHEP presentation from J. Shiers during Bologna meeting
  – Focus on technical advice on bit preservation
  – https://indico.cern.ch/getFile.py/access?contribId=45&sessionId=3&resId=1&materialId=slides&confId=247864
• Configuration Management
  – No update (chairs could not participate)
• Energy efficiency
  – On hold for now, little feedback, no interest or no resources?
  – To be re-discussed in Annecy


Site reports (1)

• Configuration Management (Puppet) is a “hot topic”
  – Sites come from Rocks, Quattor, home-grown scripts, …
  – Interesting: master-less Puppet at FNAL
  – Other sites discuss similar topics as we do (workflow, secrets, …)
  – Little synergy in the community so far, WG activity needed!
• Batch system reviews ongoing
  – Univa GridEngine & HTCondor take the lead (SLURM did not survive testing at various sites)
  – IPv6 and job authentication remain open issues
• Broad use of cloud services & virtualization
  – Clouds move into production everywhere
  – Complete virtualization of services (e.g. AFS at UMICH)


Site reports (2)

• “Dropbox”-like service at GridKA
  – For 55’000 users from several universities (10GB quota)
  – Powerfolder was picked as their solution
• Lustre/Hadoop established at various sites
  – Lustre: GSI (10PB), IHEP (3PB), FNAL (0.2PB), JLAB, …
  – Hadoop: smaller sites, PB installations
• Interest in & investigations around Ceph
  – Mostly for OpenStack VMs, but also other use cases (RBD), backend for dCache, NFS replacement, CASTOR complement, …
  – Most sites still at an early stage


Site reports (3)

• Scientific Linux 6
  – Many sites finished migration (of batch) to SL6: RAL, GridKA, INFN, …
  – Significantly improved performance on older systems


Storage (1)

• dCache update
  – Support for NFS v4.1/pNFS currently being tested (looks OK)
  – xroot and HTTP/WebDAV federations
  – Backend testing (DDN, Ceph)
• Summary of FNAL USCMS T1 storage investigation
  – Seeking solutions for online (2GB, POSIX) and nearline (1TB w/ tape)
  – Currently on BlueArc & dCache & Lustre & EOS
  – Goal: consolidation of storage solutions
  – Evaluated: the current systems plus NetApp, GPFS, Nexsan, SnapScale
  – Result: dCache for T1 production, EOS for LPC analysis, HNAS for home


Storage (2)

• Western Digital on disk drive technology
  – Giving insights on difficulties when doing macroscopic mechanics on nano-scale
    • Platter ‘non-flatness’ plus unequal lube distribution can cause problems
    • Heads usually fly at 10nm and “descend” to ~2nm for actual I/O (by thermal expansion!)
  – Introducing a new reliability metric (MPbF): disk failure rate dependent on load (not on power-on-hours)
  – http://indico.cern.ch/getFile.py/access?contribId=37&sessionId=3&resId=3&materialId=slides&confId=247864
• 3 presentations on AFS
  – OpenAFS status report
    • 1.6 released in Sep 2011, slow (server-side) uptake
    • Security advisories
  – YFS: new security, new Rx (WAN), IPv4/IPv6, limits removed, …
  – Summary of IPv6 investigations & survey, concluding that dual-stack seems to be the solution to the “IPv6/AFS issue”


Questions?

• “We built the first data centre with heaters!” (from Ulf Tigerstedt’s presentation on building the Kajaani DC)
• “Controlling a disk head is like flying a Jumbo 747 above a highway at a distance of less than 1 inch for 5 years!” (from Amit Chattopadhyay’s presentation on Disk Load Monitoring)