BABAR DATA PRESERVATION AND ACCESS
Concetta Cartaro SLAC
XLDB
May 26th, 2016
Stable matter
… All other particles are created in cosmological events such as the Big Bang, or are man-made using particle accelerators!
Standard Model of fundamental particles
BREAKING THE SYMMETRY The BABAR experiment aims to study the different behavior of matter and antimatter under the same fundamental forces. This difference, called CP violation (CP = charge and parity conjugation), is important for explaining the lack of antimatter in our universe.
http://www-public.slac.stanford.edu/babar/Purpose.aspx
The LINAC injects electrons and anti-electrons (positrons) into the PEP-II storage rings. At 2 miles, it is the longest linear accelerator in the world.
PEP-II storage rings: electrons and anti-electrons travel in opposite directions and finally collide inside the BABAR particle detector. PEP-II rings are 1.4 miles long.
50 megawatts of power are needed to operate the accelerator and the detector.
1700 bunches circulate in the beam pipes at any time. Each bunch contains 80 billion electrons or positrons travelling at 99.99% of the speed of light.
Bunches collide head-on every 4.2 nanoseconds.
SLAC LINAC AND PEP II
The BABAR detector: Nucl. Instrum. Methods Phys. Res. A (2013), pp. 615-701, DOI: 10.1016/j.nima.2013.05.107 (arXiv:1305.3560)
BABAR is a 1200-ton general-purpose particle detector, 20 feet high and 20 feet long. 800 tons of its weight are iron.
From October 1999 to April 2008 BABAR collected billions of physics events, each corresponding to an e+e– annihilation at an energy corresponding to the production threshold of the ϒ(4S) resonance, a bound state of a b quark and a b anti-quark.
The final dataset contains over 9.1×10⁹ physics events and 27×10⁹ simulated events in 1.6×10⁶ files, amounting to over 2.7 PB of storage including the raw data and the latest data reprocessing (all previous data processing cycles amount to 6 PB).
Fun fact: http://www2.slac.stanford.edu/tip/2004/may21/database.htm
BABAR COLLABORATION
• ~270 members from 67 institutions in 13 countries.
– Plus ~100 associates
• The BABAR physics program includes the study of the nature of matter and antimatter, the properties and interactions of the particles known as quarks and leptons, and searches for new physics, including searches for Dark Matter and light Higgs bosons.
http://www-public.slac.stanford.edu/babar/
ENORMOUS WEALTH OF WORLD CLASS PHYSICS
Search for a light Higgs decaying to two gluons or ss̄ in the radiative decays of ϒ(1S), PRD 88, 031701 (2013)
B→D(*)τν result incompatible with the SM; the Type II 2-Higgs-Doublet Model is excluded at >3σ over the whole tanβ-mH plane.
PRL 109, 101802 (2012),
PRD 88, 072012 (2013); also featured in a Sept. 2015 Scientific American article
PRD 88 (2013) 032013
PRL 103 (2009) 231801
ENORMOUS WEALTH OF WORLD CLASS PHYSICS
558 papers published to date, and tens of analyses aiming at publication are still in the pipeline…
… but with very limited funding and support.
BABAR DATA
• BABAR (and Japan's Belle experiment) data will not be superseded by CERN LHC data
  – Match with the Belle II data-taking schedule
• Belle data will become part of the Belle II dataset
  – Some datasets are expected to remain unique for longer:
    • ϒ(3S) dataset for BABAR
    • ϒ(5S) dataset for Belle
http://www-superkekb.kek.jp/documents/luminosityProjection150319.pdf
LONG TERM DATA ACCESS
• The BABAR Long Term Data Access (LTDA) project is dedicated to preserving BABAR data access and the computing environment until at least 2018
  – May be adjusted depending on the Belle II schedule (now aiming at 2020)
  – Provides support for data, code, repositories, databases, documentation, storage, and CPU capacity
• Enclose the BABAR Analysis Framework in a frozen environment
  – Keep the full potential of the Framework, including discovery potential
  – Simpler to maintain documentation and provide support
• Open formats (a minimal access sketch follows after the links below)
  – ROOT data format, MySQL databases
  – Code written in open languages: C/C++, Tcl, Perl, Python
  – Data management and distribution (reach data transparently on disk, tape, or network) via XRootD
http://xrootd.slac.stanford.edu/
http://root.cern.ch/
https://www.python.org/
https://www.perl.org/
https://isocpp.org/
http://www.tcl.tk/
http://www.mysql.com
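As a small illustration of how these open formats work together (the server name, file path, and ntuple name below are placeholders, not actual BABAR datasets), a ROOT file can be read transparently over XRootD from PyROOT:

    # Minimal sketch: read a ROOT file over XRootD with PyROOT.
    # The host and file path are hypothetical placeholders.
    import ROOT

    url = "root://xrootd.example.org//store/babar/AllEvents/sample.root"
    f = ROOT.TFile.Open(url)          # XRootD reaches disk, tape, or network transparently
    if f and not f.IsZombie():
        tree = f.Get("ntp1")          # hypothetical ntuple name
        if tree:
            print("events:", tree.GetEntries())
        f.Close()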
VIRTUALIZATION TO THE RESCUE
• Minimize the effort needed to maintain the system
  – Tool validations, upgrades, documentation, hardware
• Still not very easy in the long run
  – Hardware support and lifecycle
  – Maintaining out-of-date OSs
  – Potential security risks
  – Need to keep know-how about the old OS and the Framework
LTDA CLUSTER
• The central element of the BABAR LTDA project is an integrated cluster of computation and storage resources. The cluster uses virtualization technologies to ensure continued operation with future hardware and software platforms, and utilizes distributed computation and storage methods for scalability.
The LTDA has been in production mode since March 21st, 2012
IN DEPTH: LTDA CLUSTER
• Cisco 6506 network switch with a 2x10Gb link card and 192 x 1Gb ports
• 9 infrastructure servers (Dell R410/R510)
• 54 batch and storage servers
– Dell R510: dual 6-core Intel Xeon X5675, 3.07GHz, 48GB RAM, 12x2TB disks
• 11x2TB disks used to stage data through XROOTD
• 1x2TB used as local scratch
• 24 logical cores with hyper-threading (one VM per core, up to 22 VMs/host)
• 1 physical core reserved for the host itself and XRootD
• 20 batch servers (no XROOTD)
– Dell R410: dual 6-core Intel Xeon X5675, 3.07GHz, 48GB RAM, 2x2TB disks mirrored (for OS + local scratch)
– 24 cores used to run batch jobs (VMs)
• 2 NFS servers
– Sun X4540 Thor server, 12 cores, 32 GB RAM and 32TB
– One for local home directories and code repositories and one for user data
• Robust backup using a combination of ZFS snapshots and tape backup
• All the data is stored on tape (HPSS), and the most used data is staged on demand onto the local disks via XRootD
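The staging is transparent to users; as a rough illustration (host, port, and path below are placeholders, and the pyxrootd Python bindings are assumed), a client can check whether a file is already available on the XRootD cluster before reading it:

    # Sketch: query an XRootD server for a file before reading it.
    # Host, port, and path are placeholders, not the real LTDA endpoints.
    from XRootD import client

    fs = client.FileSystem("root://xrootd.example.org:1094")
    status, info = fs.stat("/store/babar/run5/events-0001.root")
    if status.ok:
        print("staged on disk, size:", info.size)
    else:
        # In a setup like the LTDA, opening the file would normally trigger
        # a stage-in from tape (HPSS) behind the scenes.
        print("not available yet:", status.message)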
• A simple, unique solution
  – Mix CPU and storage resources on the same node
  – Each machine is a simple building block with standalone capacity
• Scalable and portable design
  – All-inclusive design, easy to reproduce at any scale in any of our collaborating institutions in the world
• Security and flexibility all in one
  – Stable against disk or node loss
  – Isolation of all back-versioned elements
  – Completely on demand
A MATCH FOR OUR NEEDS
• 1.2 PB XRootD-based storage for physics data
• 64 TB ZFS-based storage for user data
• 3.5 TB of RAM
• 1668 job slots
• SL4, SL5, SL6 platforms available
LTDA FACTS
[Diagram: LTDA network. The load-balanced login pool (bbrltda01, bbrltda02, bbrltda03) sits behind the SLAC router/firewall; all physical hosts run Red Hat 6.]
Three subnets:
1. Login (BBR-LTDA-LOGIN)
2. Infrastructure (BBR-LTDA-SRV)
3. Virtual machines (BBR-LTDA-VM)
The load-balanced login servers are the only point of access to the cluster for the users.
LTDA NETWORK
The infrastructure servers provide the internal services: identification, batch scheduler, databases, NFS space, …
[Diagram: infrastructure servers and their roles]
• Identification and network services servers: LDAP, NTP, DNS, DHCP (primary and secondary)
• Database servers: MySQL (master and slave)
• Cron and batch server: PBS, Maui, XRootD, cron
• NFS servers: code repositories and home directories; user and production areas
• Test server
The batch servers provide the disks for the local XRootD cluster that distributes the data to the user applications running on the virtual machines, and the cores to run the VMs. The batch servers are connected to the infrastructure network while the VMs are connected through a virtual bridge to the VM network, separated from the others by firewall rules implemented in the router.
[Diagram: batch and XRootD servers (Intel Xeon X5670 @ 2.93 GHz). On each host the VM guests (SL4/SL5/SL6) attach to the VM network through a virtual bridge and a dedicated NIC, while the hosts' local disks serve data through XRootD; the 54 storage+batch servers run up to 22 VMs each and the 20 batch-only servers run up to 24 VMs each.]
[Diagram: the complete LTDA layout, combining the login pool (bbrltda01-03), the infrastructure servers, and the batch/XRootD servers with their VM guests (SL4/SL5/SL6), all behind the SLAC router/firewall.]
VIRTUALIZATION AND SECURITY
• Use of system images on VMs solves the system administration problem
– Easier management of a small number of OS images.
– Physical hosts centrally managed by SLAC Computing Division
• Security threat associated with a VM running an old OS connected to a network
  – Risk-based approach: assume that the frozen, outdated systems, which can be easily compromised, actually are compromised.
• Prevent modification or deletion of the data including modifications that would allow systems outside of the LTDA to be compromised. – Isolation of back versioned components with firewall rules
• On-demand creation of VMs from read-only images adds a small layer of security by preventing compromised elements from persisting beyond the VM's destruction
  – Images are read-only; qcow2 produces a temporary file with any changes to the OS, and that file is deleted when the VM is shut down (a minimal sketch follows)
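A minimal sketch of that copy-on-write mechanism (image paths are hypothetical): a throwaway qcow2 overlay is created on top of the frozen, read-only base image, the VM writes only to the overlay, and deleting the overlay at shutdown discards everything the guest changed.

    # Sketch: copy-on-write overlay on a read-only base image (hypothetical paths).
    import os
    import subprocess

    base = "/images/sl5-babar-base.qcow2"        # frozen, read-only OS image
    overlay = "/scratch/vm-1234-overlay.qcow2"   # per-VM throwaway file

    # Create the overlay; all writes made by the VM go here, not to the base image.
    subprocess.check_call(
        ["qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", overlay])

    # ... boot the VM against 'overlay' and run the job ...

    # When the VM is shut down, deleting the overlay discards every change,
    # so nothing a compromised guest did can persist.
    os.remove(overlay)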
FIREWALL RULES
• VMs are not allowed to connect to the SLAC network or the outside world. The Login network is protected from the VM network
  – Allow one-way ssh from the Login to the VM network
    • Password-less connection via shosts
  – VMs are not allowed to write over the Login network
• Well-defined services between the VM network and the SRV network
  – Infrastructure (DNS, LDAP, NTP), file service (XRootD, NFS), batch scheduling
  – LDAP is a subset of the SLAC Kerberos list mapped onto NFS internal home directories
• SRV and Login networks are allowed to use the SLAC infrastructure
A SIMPLE “CLOUD”
• PBS/Torque is used to manage the batch resources and Maui is the batch scheduler
  – Moved away from Condor and Nimbus due to their instabilities
• The virtualization layer uses qemu with kvm support directly (a much-simplified launch sketch follows below)
  – Moved away from libvirt due to instability
• Resources
  – Each batch server has 12 physical cores, of which one is dedicated to the host itself and the XRootD service. The other 11 cores (22 logical cores) are used to run VMs
  – Each batch server has 12x2TB disks, of which 11 are dedicated to XRootD (no RAID, all data is recoverable from tape) and one is used for the copy-on-write VM images (OS and scratch)
• Hyper-threading
  – Performance tests (a few slides ahead)
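For illustration only, a much-simplified direct qemu/kvm invocation of the kind such a setup relies on (paths, MAC address, memory size, and network options are placeholders, not the actual LTDA configuration):

    # Sketch: launch a KVM guest directly with qemu, no libvirt (illustrative values).
    import subprocess

    cmd = [
        "qemu-system-x86_64",
        "-enable-kvm",                      # use hardware virtualization
        "-m", "2048",                       # 2 GB RAM per VM, as on the LTDA
        "-smp", "1",                        # one VM per (logical) core
        "-drive", "file=/scratch/vm-1234-overlay.qcow2,format=qcow2",
        "-netdev", "tap,id=net0,ifname=tap1234,script=no,downscript=no",
        "-device", "virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56",
        "-nographic",
    ]
    vm = subprocess.Popen(cmd)
    vm.wait()   # the prologue/epilogue machinery would manage this asynchronously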
JOB SUBMISSION
• PBS/Torque is used to manage the batch resources and Maui is the batch scheduler
• PBS prologue and epilogue scripts are used to create and destroy the VMs and the needed network environment
  – Home-grown system developed by BaBar
  – Create the network interface for the VMs
    • 24 MAC addresses per host, with their usage status stored in a local DB (a bookkeeping sketch follows below)
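As an illustration of the MAC-address bookkeeping such prologue/epilogue scripts imply (the table layout, database path, and helper names are hypothetical, not BaBar's actual implementation):

    # Sketch of the MAC-address bookkeeping a prologue/epilogue pair could do.
    import sqlite3

    DB = "/var/lib/ltda/macs.db"   # local per-host database (placeholder path)

    def allocate_mac(job_id):
        """Prologue: reserve one of the host's 24 pre-assigned MAC addresses."""
        # (a real prologue would also need locking against concurrent jobs)
        with sqlite3.connect(DB) as conn:
            row = conn.execute(
                "SELECT mac FROM macs WHERE job_id IS NULL LIMIT 1").fetchone()
            if row is None:
                raise RuntimeError("no free MAC address on this host")
            conn.execute("UPDATE macs SET job_id = ? WHERE mac = ?", (job_id, row[0]))
            return row[0]

    def release_mac(job_id):
        """Epilogue: free the MAC address once the VM has been destroyed."""
        with sqlite3.connect(DB) as conn:
            conn.execute("UPDATE macs SET job_id = NULL WHERE job_id = ?", (job_id,))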
CPU PERFORMANCE
• VMs vs. bare metal (SLAC batch queue)
  – CPU X5670 @ 2.93GHz vs. a mix of X5355 @ 2.66GHz and X5570 @ 2.93GHz
  – RH5, hyper-threading off: 2.7% in favor of the LTDA
• Hyper-threading on/off
  – 40% slower with HT on, but SL6 faster than SL5 by 35% in CPU time and 15% in wall time
I/O PERFORMANCE
• I/O performance test for XRootD
  – Test the LTDA XRootD installation for extreme load (how much data can be delivered to client processes) and scalability, while monitoring resource (memory and CPU) utilization
• The cluster architecture can deliver up to 12 GB/s of data to its clients and exceeds any possible demand of BABAR applications. It can be inferred that the cluster, in its present setup, could scale up by a factor of 5
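For a sense of what such a client-side measurement looks like, here is a rough sketch (the URL is a placeholder, the pyxrootd bindings are assumed, and the real tests ran many such clients in parallel): read a file in fixed-size chunks over XRootD and report the achieved rate.

    # Sketch: measure single-client read throughput from an XRootD server.
    import time
    from XRootD import client

    url = "root://xrootd.example.org//store/babar/run5/events-0001.root"
    chunk = 8 * 1024 * 1024   # 8 MB reads

    with client.File() as f:
        status, _ = f.open(url)
        if not status.ok:
            raise RuntimeError(status.message)
        offset, total = 0, 0
        start = time.time()
        while True:
            status, data = f.read(offset, chunk)
            if not status.ok or len(data) == 0:
                break
            offset += len(data)
            total += len(data)
        elapsed = time.time() - start

    print("read %.1f MB at %.1f MB/s" % (total / 1e6, total / 1e6 / elapsed))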
• 48GB of RAM/server
– 24 VMs with 2GB RAM
– 22 VMs on machines with storage space
• One physical core left for xrootd
• RAM is also needed for the system itself
• Deduplication for identical blocks already used on filesystems
• “Kernel Samepage Merging” (KSM) introduced in kernel 2.6.32
– Identical memory pages are merged into a single one across different processes!
– Most effective when there are many identical processes
• The VMs!
On a sample analysis with about 1500 parallel jobs, the total memory usage was reduced from 2.1 TB to 1.2 TB! Freed memory is used for caching files, but can also be used for memory-intensive jobs. (A sysfs sketch follows below.)
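KSM exposes its counters through sysfs, so the saving can be checked directly on a batch host; a minimal sketch, assuming standard 4 KB pages and following the kernel documentation's description of pages_sharing as a measure of "how much saved":

    # Sketch: estimate memory saved by Kernel Samepage Merging on a host.
    # Values come straight from the kernel's sysfs counters (kernel >= 2.6.32).
    KSM = "/sys/kernel/mm/ksm/"
    PAGE = 4096  # bytes, assuming standard 4 KB pages

    def ksm_value(name):
        with open(KSM + name) as f:
            return int(f.read())

    shared = ksm_value("pages_shared")     # de-duplicated pages actually kept
    sharing = ksm_value("pages_sharing")   # additional page-table entries pointing at them

    # Each "sharing" entry is a page the VMs no longer need a private copy of.
    saved = sharing * PAGE
    print("KSM is saving roughly %.1f GB (ratio %.1f)" % (saved / 1e9, sharing / max(shared, 1)))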
MEMORY USAGE
PROBLEMS, SOLUTIONS AND…
• System updates used to be delivered to all hosts automatically. This caused long outages in the past.
  – For example:
    • A kernel update with a network bug: VMs not reachable
    • An automount bug caused crashes when used with LDAP
  – A validation system now allows the updates to be tested before delivering them to all hosts
    • An ltda-test server is available for testing, validating and releasing updates to the whole cluster
• Remove non-essential software packages to reduce the list of updates
• Intrinsic dependencies between services running on different machines forced the boot order of the servers
  – Often caused delays during outages
  – Removing all the dependencies and using automount on all servers made the cluster independent of the boot order of the single machines
• Monitoring – system and batch queue checks
  – Systems, disks, connectivity, services, queues and users' job stats are regularly tested and monitored, and the results are summarized and displayed on the web
  – In case of problems, email alerts are generated (a minimal check sketch follows below)
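As an illustration of the kind of lightweight check behind those alerts (hostnames, ports, and e-mail addresses below are placeholders, not the actual LTDA monitoring code):

    # Sketch: probe a few cluster services and e-mail an alert on failure.
    import smtplib
    import socket
    from email.mime.text import MIMEText

    SERVICES = {"xrootd": ("bbrltda-srv01", 1094),   # placeholder hostnames
                "mysql":  ("bbrltda-srv02", 3306),
                "nfs":    ("bbrltda-srv03", 2049)}

    failed = []
    for name, (host, port) in SERVICES.items():
        try:
            socket.create_connection((host, port), timeout=5).close()
        except OSError:
            failed.append("%s (%s:%d)" % (name, host, port))

    if failed:
        msg = MIMEText("Unreachable services:\n" + "\n".join(failed))
        msg["Subject"] = "[LTDA] service check failed"
        msg["From"] = "ltda-monitor@example.org"
        msg["To"] = "admins@example.org"
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)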
…STABILITY
• The cluster is able to run for long periods of time without human intervention and it is able to recover even from complete outages with minimal or no human help.
• Build light-weight VMs for each supported platform
  – Latest code release included
  – Runs under the most common virtual machine players on your laptop
• … Just add the data
BABAR POCKET VERSION
BABAR-To-Go
http://xrootd.slac.stanford.edu/
On-demand sharing of BABAR data among collaborating institutions through the creation of a federated XRootD dataset accessible by all our VMs in the world.
• Very similar to what ATLAS and CMS already have
• Test installations ready at SLAC, GridKa (DE), and other BaBar institutions
• Installation package ready and easy to use
• Basic monitoring tools also included in the package
BABAR XROOTD FEDERATION
• Requirements: a small Linux machine with a few hundred GB or a few TB of space
• Easy installation package
• Little or no manpower needed, just make sure that the machine is up and connected to the network!
FEDERATED DATASETS
• All BABAR data are distributed among BABAR institutions
  – Some data are at all institutions (conditions files, for example)
  – No whole physics skims at a single place, for performance reasons
• Jobs don't need to know where the data are
  – Jobs can just access data files instantly, as they always have (a lookup sketch follows below)
• All data files have to be distributed redundantly in the cluster
  – Institutions can use simple desktop machines with large disks, no need for RAID
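To illustrate what location-transparent access means in practice, the sketch below asks a federation redirector where a logical file name is available (the redirector host and file name are placeholders; the pyxrootd bindings are assumed):

    # Sketch: locate a file through an XRootD federation redirector.
    from XRootD import client
    from XRootD.client.flags import OpenFlags

    redirector = client.FileSystem("root://babar-federation.example.org:1094")
    lfn = "/store/babar/skims/BSemiExcl/file-0042.root"

    status, locations = redirector.locate(lfn, OpenFlags.REFRESH)
    if status.ok:
        for loc in locations:
            print("available at:", loc.address)
    else:
        print("not found in the federation:", status.message)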
This IS NOT data preservation!
DOCUMENTATION
• All the most used and fundamental information has been checked, updated and moved to a MediaWiki server, the BABAR WIKI
  – Old pages are clearly marked but kept online for archival purposes
  – Pages that will supposedly never change again are left in their original location
• The wiki pages undergo a Collaboration-wide review process, and experts sign off on the content of migrated/updated pages before they are officially released
  – The Documentation Working Group members act as moderators to avoid duplication of information and proliferation of cross references
  – Pages are reviewed periodically
• Workbook: a step-by-step tutorial from login to the analysis of the final rootuples
• Technical pages: tracking, trigger, particle ID, data and datasets, …
  – Personal pages and notebooks
    • Used as logbooks during an analysis to show plots, make to-do lists, write/receive comments, …
• Solidity of the documentation tested by new users
LONG TERM
• Toward a new operating model
  – All the services now running on physical hardware will soon run on virtual machines
  – This will enable us to prolong the availability of the services beyond the lifetime of the hardware itself
  – OpenStack is being adopted at SLAC, and this could open a number of possibilities for centrally managing all our virtual machines
• Tapes
  – We are currently using 2.7 PB of tape to store our data: raw data + the last two data processing cycles
    • All data is on StorageTek T10000-B (1 TB) tapes, while SLAC has migrated to T10000-C (5 TB)
  – Current finances will allow us to migrate 2 PB to the next generation of media, T10000-D (8.5 TB)
    • Physically the same media as the T10000-C, but reformatted to allow for more capacity
  – SLAC will soon migrate to T10000-D drives, reaching 100 PB of total capacity
  – The oldest data processing will not be migrated (D drives can read A/B/C tapes) and will eventually be dropped
A WIDER HORIZON
• DPHEP, Data Preservation in High Energy Physics, born as a study group, is now the Collaboration for Data Preservation and Long Term Analysis in High Energy Physics
– https://hep-project-dphep-portal.web.cern.ch/
– Events: https://indico.cern.ch/category/4458/
Horizon2020 https://ec.europa.eu/programmes/horizon2020/en/what-horizon-2020
MORE THAN BITS
• Data is not just the “data”, it is really the data, the software (with the environment), and the documentation
• Without preserving all of it, preserving the data could be meaningless
• Other parameters for preservation
  – Use cases and target communities
    • New analyses or cross checks of new discoveries, education, outreach, …
  – Uniqueness
  – Re-use
  – Discoverability, accessibility, …
  – …
FUTURE OF DATA PRESERVATION
• Data storage
  – Primary copy is on tape
    • BaBar data are at SLAC and a second copy is hosted at CC-IN2P3, Lyon, France
  – Migration to new media every ~5 years
    • More efficient, more reliable, not necessarily less expensive
    • For how long will tapes be a solution? One, two decades, or more?
    • What's next? Cloud storage? Hybrid solutions?
• Software environment
  – Virtualization has proved its worth, but it is also evolving
  – What if we face a disruptive change during the next decade or so?
• Documentation
  – Wikis and traditional HTML could actually be a viable solution for short- and longer-term preservation
BECAUSE PHYSICS IS NOT SCALE INVARIANT
• Preserving 2 PB now can be difficult. But will that still be true when the HEP experiments face a problem 1000 times larger, at the exabyte scale?
In more than one way
Computing: new records broken!
[Plots: data volume in Tier-0 vs. time; data transfer rates of 20 GB/s, 10x the design value!]
WLCG today: ~170 sites (40 countries), ~500k CPU cores, 500 PB storage, >2 million jobs/day, 10-100 Gb links
Longer-term future: hybrid cloud model? (commercial cloud services, public e-infrastructure, in-house IT resources)
→ The model will be tested with the HNSciCloud H2020 project (CERN, Tier-1s, many European HEP funding agencies, EMBL, …); 70% of the funds (~4.7 M€) come from the EC. Started 1 Jan 2016.
The present model should work for Runs 2-3.
Courtesy of Jamie Shiers, DPHEP & CERN, Cambridge Big Data Workshop, March 2016
THANKS!
• LTDA developers
  – Coordinator: Tina Cartaro
  – BaBar software: Homer Neal
  – Development and system administration: Marcus Ebert
  – Network design: Steffen Luitz
  – Virtualization: Kyle Fransham, Marcus Ebert
  – System performance and CDB: Igor Gaponenko
  – Databases, tools and production: Douglas Smith, Tim Adye
  – Xrootd: Wilko Kroeger
• Computing Division experts
  – System setup and administration: Booker Bense, Lance Nakata, Randall Radmer and all the SLAC Unix-Admin team
  – Network setup: Antonio Ceseracciu
  – BaBar-SLAC Computing Division liaison: Len Moss, Andrew May
• Particle Windchime
  – Maps detector and particle properties (momentum, detector hits, particle-ID, etc.) onto sonic characteristics (volume, pitch, timbre, etc.). This experience was featured in a BBC science podcast and in Symmetry Magazine.
• The Particle Windchime was originally created by a team of hackers at Science Hack Day SF and continues to be maintained by Matt Bellis
• More sounds: – http://www.mattbellis.com/windchime/related-media/