BABAR DATA PRESERVATION AND ACCESS
Concetta Cartaro SLAC
XLDB
May 26th, 2016
Stable matter
… All other particles are created in cosmological events such as the Big Bang, or are man-made using particle accelerators!
Standard Model of fundamental particles
BREAKING THE SYMMETRY The BABAR experiment aims to study the different behavior of matter and antimatter under the same fundamental forces. This difference, called CP violation (CP = charge and parity conjugation), is important for explaining the lack of antimatter in our universe.
http://www-public.slac.stanford.edu/babar/Purpose.aspx
The LINAC injects electrons and anti-electrons (positrons) into the PEP-II storage rings. At 2 miles, it is the longest linear accelerator in the world.
PEP-II storage rings: electrons and anti-electrons travel in opposite directions and finally collide inside the BABAR particle detector. PEP-II rings are 1.4 miles long.
50 megawatts of power are needed to operate the accelerator and the detector.
1700 bunches circulate in the beam pipes at any time. Each bunch contains 80 billion electrons or positrons travelling at 99.99% of the speed of light.
Bunches collide head-on every 4.2 nanoseconds.
SLAC LINAC AND PEP II
The BABAR detector: Nucl. Instrum. Methods Phys. Res. A (2013), pp. 615-701, DOI: 10.1016/j.nima.2013.05.107 (arXiv:1305.3560)
BABAR is a 1200-ton general-purpose particle detector, 20 feet high and 20 feet long. 800 tons of its weight are iron.
From October 1999 to April 2008 BABAR collected billions of physics events, each corresponding to an e+e– annihilation at an energy corresponding to the production threshold of the ϒ(4S) resonance, a bound state of a b quark and a b anti-quark.
The final dataset contains over 9.1×10⁹ physics events and 27×10⁹ simulated events in 1.6×10⁶ files, amounting to over 2.7 PB of storage including the raw data and the latest data reprocessing (all previous data processing cycles amount to 6 PB).
Fun fact: http://www2.slac.stanford.edu/tip/2004/may21/database.htm
BABAR COLLABORATION
• ~270 members from 67 institutions in 13 countries.
– Plus ~100 associates
• The BABAR physics program includes the study of the nature of matter and antimatter, the properties and interactions of the particles known as quarks and leptons, and searches for new physics, including searches for Dark Matter and light Higgs bosons.
http://www-public.slac.stanford.edu/babar/
ENORMOUS WEALTH OF WORLD CLASS PHYSICS
Search for a light Higgs decaying to two gluons or ss̄ in the radiative decays of ϒ(1S), PRD 88, 031701 (2013)
B→D(*)τν result incompatible with the SM; the Type II 2-Higgs-Doublet Model is excluded at >3σ over the whole tanβ-mH plane.
PRL 109, 101802 (2012),
PRD 88, 072012 (2013); also featured in a Sept. 2015 Scientific American article
PRD 88 (2013) 032013
PRL 103 (2009) 231801
ENORMOUS WEALTH OF WORLD CLASS PHYSICS
558 papers published to date, and tens of analyses aiming at publication are still in the pipeline…
… but with very limited funding and support.
BABAR DATA
• BABAR (and Japan's Belle experiment) data will not be superseded by CERN LHC data
  – Match with the Belle II data-taking schedule
• Belle data will become part of the Belle II dataset
  – Some datasets are expected to remain unique for longer:
    • ϒ(3S) dataset for BABAR
    • ϒ(5S) dataset for Belle
http://www-superkekb.kek.jp/documents/luminosityProjection150319.pdf
LONG TERM DATA ACCESS
• The BABAR Long Term Data Access (LTDA) project is dedicated to preserving BABAR data access and the computing environment until at least 2018
  – May be adjusted depending on the Belle II schedule (now aiming at 2020)
  – Provides support for data, code, repositories, databases, documentation, storage, and CPU capacity
• Enclose the BABAR Analysis Framework in a frozen environment
  – Keep the full potential of the Framework, including discovery potential
  – Simpler to maintain documentation and provide support
• Open formats (a minimal access sketch follows after the links below)
  – ROOT data format, MySQL databases
  – Code written in open languages: C/C++, Tcl, Perl, Python
  – Data management and distribution (reach data transparently on disk, tape, or network) via XRootD
http://xrootd.slac.stanford.edu/
http://root.cern.ch/
https://www.python.org/
https://www.perl.org/
https://isocpp.org/
http://www.tcl.tk/
http://www.mysql.com
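As a small illustration of how these open formats work together (the server name, file path, and ntuple name below are placeholders, not actual BABAR datasets), a ROOT file can be read transparently over XRootD from PyROOT:

    # Minimal sketch: read a ROOT file over XRootD with PyROOT.
    # The host and file path are hypothetical placeholders.
    import ROOT

    url = "root://xrootd.example.org//store/babar/AllEvents/sample.root"
    f = ROOT.TFile.Open(url)          # XRootD reaches disk, tape, or network transparently
    if f and not f.IsZombie():
        tree = f.Get("ntp1")          # hypothetical ntuple name
        if tree:
            print("events:", tree.GetEntries())
        f.Close()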
VIRTUALIZATION TO THE RESCUE
• Minimize the effort needed to maintain the system
  – Tool validations, upgrades, documentation, hardware
• Still not very easy in the long run
  – Hardware support and lifecycle
  – Maintaining out-of-date OSs
  – Potential security risks
  – Need to keep know-how about the old OS and the Framework
LTDA CLUSTER
• The central element of the BABAR LTDA project is an integrated cluster of computation and storage resources. The cluster uses virtualization technologies to ensure continued operation with future hardware and software platforms, and utilizes distributed computation and storage methods for scalability.
The LTDA has been in production mode since March 21st, 2012
IN DEPTH: LTDA CLUSTER
• Cisco 6506 network switch with a 2x10Gb link card and 192 x 1Gb ports
• 9 infrastructure servers (Dell R410/R510)
• 54 batch and storage servers
– Dell R510: dual 6-core Intel Xeon X5675, 3.07GHz, 48GB RAM, 12x2TB disks
• 11x2TB disks used to stage data through XROOTD
• 1x2TB used as local scratch
• 24 logical cores with hyper-threading (one VM per core, up to 22 VMs/host)
• 1 physical core reserved for the host itself and XRootD
• 20 batch servers (no XROOTD)
– Dell R410: dual 6-core Intel Xeon X5675, 3.07GHz, 48GB RAM, 2x2TB disks mirrored (for OS + local scratch)
– 24 cores used to run batch jobs (VMs)
• 2 NFS servers
– Sun X4540 Thor server, 12 cores, 32 GB RAM and 32TB
– One for local home directories and code repositories and one for user data
• Robust backup using a combination of ZFS snapshots and tape backup
• All the data is stored on tape (HPSS), and the most used data is staged on demand onto the local disks via XRootD
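The staging is transparent to users; as a rough illustration (host, port, and path below are placeholders, and the pyxrootd Python bindings are assumed), a client can check whether a file is already available on the XRootD cluster before reading it:

    # Sketch: query an XRootD server for a file before reading it.
    # Host, port, and path are placeholders, not the real LTDA endpoints.
    from XRootD import client

    fs = client.FileSystem("root://xrootd.example.org:1094")
    status, info = fs.stat("/store/babar/run5/events-0001.root")
    if status.ok:
        print("staged on disk, size:", info.size)
    else:
        # In a setup like the LTDA, opening the file would normally trigger
        # a stage-in from tape (HPSS) behind the scenes.
        print("not available yet:", status.message)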
• A simple, unique solution
  – Mix CPU and storage resources on the same node
  – Each machine is a simple building block with standalone capacity
• Scalable and portable design
  – All-inclusive design, easy to reproduce at any scale in any of our collaborating institutions in the world
• Security and flexibility all in one
  – Stable against disk or node loss
  – Isolation of all back-versioned elements
  – Completely on demand
A MATCH FOR OUR NEEDS
• 1.2 PB XRootD-based storage for physics data
• 64 TB ZFS-based storage for user data
• 3.5 TB of RAM
• 1668 job slots
• SL4, SL5, SL6 platforms available
LTDA FACTS
[Diagram: LTDA network. The load-balanced login pool (bbrltda01, bbrltda02, bbrltda03) sits behind the SLAC router/firewall; all physical hosts run Red Hat 6.]
Three subnets:
1. Login (BBR-LTDA-LOGIN)
2. Infrastructure (BBR-LTDA-SRV)
3. Virtual machines (BBR-LTDA-VM)
The load-balanced login servers are the only point of access to the cluster for the users.
LTDA NETWORK
The infrastructure servers provide the internal services: identification, batch scheduler, databases, NFS space, …
[Diagram: infrastructure servers and their roles]
• Identification and network services servers: LDAP, NTP, DNS, DHCP (primary and secondary)
• Database servers: MySQL (master and slave)
• Cron and batch server: PBS, Maui, XRootD, cron
• NFS servers: code repositories and home directories; user and production areas
• Test server
The batch servers provide the disks for the local XRootD cluster that distributes the data to the user applications running on the virtual machines, and the cores to run the VMs. The batch servers are connected to the infrastructure network while the VMs are connected through a virtual bridge to the VM network, separated from the others by firewall rules implemented in the router.
[Diagram: batch and XRootD servers (Intel Xeon X5670 @ 2.93 GHz). On each host the VM guests (SL4/SL5/SL6) attach to the VM network through a virtual bridge and a dedicated NIC, while the hosts' local disks serve data through XRootD; the 54 storage+batch servers run up to 22 VMs each and the 20 batch-only servers run up to 24 VMs each.]
[Diagram: the complete LTDA layout, combining the login pool (bbrltda01-03), the infrastructure servers, and the batch/XRootD servers with their VM guests (SL4/SL5/SL6), all behind the SLAC router/firewall.]
VIRTUALIZATION AND SECURITY
• Use of system images on VMs solves the system administration problem
– Easier management of a small number of OS images.
– Physical hosts centrally managed by SLAC Computing Division
• Security threat associated with a VM running an old OS connected to a network
  – Risk-based approach: assume that the frozen, outdated systems, which can be easily compromised, actually are compromised.
• Prevent modification or deletion of the data including modifications that would allow systems outside of the LTDA to be compromised. – Isolation of back versioned components with firewall rules
• On-demand creation of VMs from read-only images adds a small layer of security by preventing compromised elements from persisting beyond the VM's destruction
  – Images are read-only; qcow2 produces a temporary file with any changes to the OS, and that file is deleted when the VM is shut down (a minimal sketch follows)
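A minimal sketch of that copy-on-write mechanism (image paths are hypothetical): a throwaway qcow2 overlay is created on top of the frozen, read-only base image, the VM writes only to the overlay, and deleting the overlay at shutdown discards everything the guest changed.

    # Sketch: copy-on-write overlay on a read-only base image (hypothetical paths).
    import os
    import subprocess

    base = "/images/sl5-babar-base.qcow2"        # frozen, read-only OS image
    overlay = "/scratch/vm-1234-overlay.qcow2"   # per-VM throwaway file

    # Create the overlay; all writes made by the VM go here, not to the base image.
    subprocess.check_call(
        ["qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", overlay])

    # ... boot the VM against 'overlay' and run the job ...

    # When the VM is shut down, deleting the overlay discards every change,
    # so nothing a compromised guest did can persist.
    os.remove(overlay)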
FIREWALL RULES
• VMs are not allowed to connect to the SLAC network or the outside world. The Login network is protected from the VM network
  – Allow one-way ssh from the Login to the VM network
    • Password-less connection via shosts
  – VMs are not allowed to write over the Login network
• Well-defined services between the VM network and the SRV network
  – Infrastructure (DNS, LDAP, NTP), file service (XRootD, NFS), batch scheduling
  – LDAP is a subset of the SLAC Kerberos list mapped onto NFS internal home directories
• SRV and Login networks are allowed to use the SLAC infrastructure
A SIMPLE “CLOUD”
• PBS/Torque is used to manage the batch resources and Maui is the batch scheduler
  – Moved away from Condor and Nimbus due to their instabilities
• The virtualization layer uses qemu with kvm support directly (a much-simplified launch sketch follows below)
  – Moved away from libvirt due to instability
• Resources
  – Each batch server has 12 physical cores, of which one is dedicated to the host itself and the XRootD service. The other 11 cores (22 logical cores) are used to run VMs
  – Each batch server has 12x2TB disks, of which 11 are dedicated to XRootD (no RAID, all data is recoverable from tape) and one is used for the copy-on-write VM images (OS and scratch)
• Hyper-threading
  – Performance tests (a few slides ahead)
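For illustration only, a much-simplified direct qemu/kvm invocation of the kind such a setup relies on (paths, MAC address, memory size, and network options are placeholders, not the actual LTDA configuration):

    # Sketch: launch a KVM guest directly with qemu, no libvirt (illustrative values).
    import subprocess

    cmd = [
        "qemu-system-x86_64",
        "-enable-kvm",                      # use hardware virtualization
        "-m", "2048",                       # 2 GB RAM per VM, as on the LTDA
        "-smp", "1",                        # one VM per (logical) core
        "-drive", "file=/scratch/vm-1234-overlay.qcow2,format=qcow2",
        "-netdev", "tap,id=net0,ifname=tap1234,script=no,downscript=no",
        "-device", "virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56",
        "-nographic",
    ]
    vm = subprocess.Popen(cmd)
    vm.wait()   # the prologue/epilogue machinery would manage this asynchronously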
JOB SUBMISSION
• PBS/Torque is used to manage the batch resources and Maui is the batch scheduler
• PBS prologue and epilogue scripts are used to create and destroy the VMs and the needed network environment
  – Home-grown system developed by BaBar
  – Create the network interface for the VMs
    • 24 MAC addresses per host, with their usage status stored in a local DB (a bookkeeping sketch follows below)
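As an illustration of the MAC-address bookkeeping such prologue/epilogue scripts imply (the table layout, database path, and helper names are hypothetical, not BaBar's actual implementation):

    # Sketch of the MAC-address bookkeeping a prologue/epilogue pair could do.
    import sqlite3

    DB = "/var/lib/ltda/macs.db"   # local per-host database (placeholder path)

    def allocate_mac(job_id):
        """Prologue: reserve one of the host's 24 pre-assigned MAC addresses."""
        # (a real prologue would also need locking against concurrent jobs)
        with sqlite3.connect(DB) as conn:
            row = conn.execute(
                "SELECT mac FROM macs WHERE job_id IS NULL LIMIT 1").fetchone()
            if row is None:
                raise RuntimeError("no free MAC address on this host")
            conn.execute("UPDATE macs SET job_id = ? WHERE mac = ?", (job_id, row[0]))
            return row[0]

    def release_mac(job_id):
        """Epilogue: free the MAC address once the VM has been destroyed."""
        with sqlite3.connect(DB) as conn:
            conn.execute("UPDATE macs SET job_id = NULL WHERE job_id = ?", (job_id,))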
CPU PERFORMANCE
• VMs vs. bare metal (SLAC batch queue)
  – CPU X5670 @ 2.93GHz vs. a mix of X5355 @ 2.66GHz and X5570 @ 2.93GHz
  – RH5, hyper-threading off: 2.7% in favor of the LTDA
• Hyper-threading on/off
  – 40% slower with HT on, but SL6 faster than SL5 by 35% in CPU time and 15% in wall time
I/O PERFORMANCE
• I/O performance test for XRootD
  – Test the LTDA XRootD installation for extreme load (how much data can be delivered to client processes) and scalability, while monitoring resource (memory and CPU) utilization
• The cluster architecture can deliver up to 12 GB/s of data to its clients and exceeds any possible demand of BABAR applications. It can be inferred that the cluster, in its present setup, could scale up by a factor of 5
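For a sense of what such a client-side measurement looks like, here is a rough sketch (the URL is a placeholder, the pyxrootd bindings are assumed, and the real tests ran many such clients in parallel): read a file in fixed-size chunks over XRootD and report the achieved rate.

    # Sketch: measure single-client read throughput from an XRootD server.
    import time
    from XRootD import client

    url = "root://xrootd.example.org//store/babar/run5/events-0001.root"
    chunk = 8 * 1024 * 1024   # 8 MB reads

    with client.File() as f:
        status, _ = f.open(url)
        if not status.ok:
            raise RuntimeError(status.message)
        offset, total = 0, 0
        start = time.time()
        while True:
            status, data = f.read(offset, chunk)
            if not status.ok or len(data) == 0:
                break
            offset += len(data)
            total += len(data)
        elapsed = time.time() - start

    print("read %.1f MB at %.1f MB/s" % (total / 1e6, total / 1e6 / elapsed))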
• 48GB of RAM/server
– 24 VMs with 2GB RAM
– 22 VMs on machines with storage space
• One physical core left for xrootd
• RAM is also needed for the system itself
• Deduplication for identical blocks already used on filesystems
• “Kernel Samepage Merging” (KSM) introduced in kernel 2.6.32
– Identical memory pages are merged into a single one across different processes!
– Most effective when there are many identical processes
• The VMs!
On a sample analysis with about 1500 parallel jobs, the total memory usage was reduced from 2.1 TB to 1.2 TB! Freed memory is used for caching files, but can also be used for memory-intensive jobs. (A sysfs sketch follows below.)
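KSM exposes its counters through sysfs, so the saving can be checked directly on a batch host; a minimal sketch, assuming standard 4 KB pages and following the kernel documentation's description of pages_sharing as a measure of "how much saved":

    # Sketch: estimate memory saved by Kernel Samepage Merging on a host.
    # Values come straight from the kernel's sysfs counters (kernel >= 2.6.32).
    KSM = "/sys/kernel/mm/ksm/"
    PAGE = 4096  # bytes, assuming standard 4 KB pages

    def ksm_value(name):
        with open(KSM + name) as f:
            return int(f.read())

    shared = ksm_value("pages_shared")     # de-duplicated pages actually kept
    sharing = ksm_value("pages_sharing")   # additional page-table entries pointing at them

    # Each "sharing" entry is a page the VMs no longer need a private copy of.
    saved = sharing * PAGE
    print("KSM is saving roughly %.1f GB (ratio %.1f)" % (saved / 1e9, sharing / max(shared, 1)))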
MEMORY USAGE
PROBLEMS, SOLUTIONS AND…
• System updates used to be delivered to all hosts automatically. This caused long outages in the past.
  – For example:
    • A kernel update with a network bug: VMs not reachable
    • An automount bug caused crashes when used with LDAP
  – A validation system now allows the updates to be tested before delivering them to all hosts
    • An ltda-test server is available for testing, validating and releasing updates to the whole cluster
• Remove non-essential software packages to reduce the list of updates
• Intrinsic dependencies between services running on different machines forced the boot order of the servers
  – Often caused delays during outages
  – Removing all the dependencies and using automount on all servers made the cluster independent of the boot order of the single machines
• Monitoring – system and batch queue checks
  – Systems, disks, connectivity, services, queues and users' job stats are regularly tested and monitored, and the results are summarized and displayed on the web
  – In case of problems, email alerts are generated (a minimal check sketch follows below)
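As an illustration of the kind of lightweight check behind those alerts (hostnames, ports, and e-mail addresses below are placeholders, not the actual LTDA monitoring code):

    # Sketch: probe a few cluster services and e-mail an alert on failure.
    import smtplib
    import socket
    from email.mime.text import MIMEText

    SERVICES = {"xrootd": ("bbrltda-srv01", 1094),   # placeholder hostnames
                "mysql":  ("bbrltda-srv02", 3306),
                "nfs":    ("bbrltda-srv03", 2049)}

    failed = []
    for name, (host, port) in SERVICES.items():
        try:
            socket.create_connection((host, port), timeout=5).close()
        except OSError:
            failed.append("%s (%s:%d)" % (name, host, port))

    if failed:
        msg = MIMEText("Unreachable services:\n" + "\n".join(failed))
        msg["Subject"] = "[LTDA] service check failed"
        msg["From"] = "ltda-monitor@example.org"
        msg["To"] = "admins@example.org"
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)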
…STABILITY
• The cluster is able to run for long periods of time without human intervention and it is able to recover even from complete outages with minimal or no human help.
• Build light-weight VMs for each supported platform
  – Latest code release included
  – Runs under the most common virtual machine players on your laptop
• … Just add the data
BABAR POCKET VERSION
BABAR-To-Go
http://xrootd.slac.stanford.edu/
On-demand sharing of BABAR data among collaborating institutions through the creation of a federated XRootD dataset accessible by all our VMs in the world.
• Very similar to what ATLAS and CMS already have
• Test installations ready at SLAC, GridKa (DE), and other BaBar institutions
• Installation package ready and easy to use
• Basic monitoring tools also included in the package
BABAR XROOTD FEDERATION
• Requirements: a small Linux machine with a few hundred GB or a few TB of space
• Easy installation package
• Little or no manpower needed, just make sure that the machine is up and connected to the network!
FEDERATED DATASETS
• All BABAR data are distributed among BABAR institutions
  – Some data are at all institutions (conditions files, for example)
  – No whole physics skims at a single place, for performance reasons
• Jobs don't need to know where the data are
  – Jobs can just access data files instantly, as they always have (a lookup sketch follows below)
• All data files have to be distributed redundantly in the cluster
  – Institutions can use simple desktop machines with large disks, no need for RAID
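To illustrate what location-transparent access means in practice, the sketch below asks a federation redirector where a logical file name is available (the redirector host and file name are placeholders; the pyxrootd bindings are assumed):

    # Sketch: locate a file through an XRootD federation redirector.
    from XRootD import client
    from XRootD.client.flags import OpenFlags

    redirector = client.FileSystem("root://babar-federation.example.org:1094")
    lfn = "/store/babar/skims/BSemiExcl/file-0042.root"

    status, locations = redirector.locate(lfn, OpenFlags.REFRESH)
    if status.ok:
        for loc in locations:
            print("available at:", loc.address)
    else:
        print("not found in the federation:", status.message)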
This IS NOT data preservation!
DOCUMENTATION
• All the most used and fundamental information has been checked, updated and moved to a MediaWiki server, the BABAR WIKI
  – Old pages are clearly marked but kept online for archival purposes
  – Pages that will supposedly never change again are left in their original location
• The wiki pages undergo a Collaboration-wide review process, and experts sign off on the content of migrated/updated pages before they are officially released
  – The Documentation Working Group members act as moderators to avoid duplication of information and proliferation of cross references
  – Pages are reviewed periodically
• Workbook: a step-by-step tutorial from login to the analysis of the final rootuples
• Technical pages: tracking, trigger, particle ID, data and datasets, …
  – Personal pages and notebooks
    • Used as logbooks during an analysis to show plots, make to-do lists, write/receive comments, …
• Solidity of the documentation tested by new users
LONG TERM
• Toward a new operating model
  – All the services now running on physical hardware will soon run on virtual machines
  – This will enable us to prolong the availability of the services beyond the lifetime of the hardware itself
  – OpenStack is being adopted at SLAC, and this could open a number of possibilities for centrally managing all our virtual machines
• Tapes
  – We are currently using 2.7 PB of tape to store our data: raw data + the last two data processing cycles
    • All data is on StorageTek T10000-B (1 TB) tapes, while SLAC has migrated to T10000-C (5 TB)
  – Current finances will allow us to migrate 2 PB to the next generation of media, T10000-D (8.5 TB)
    • Physically the same media as the T10000-C, but reformatted to allow for more capacity
  – SLAC will soon migrate to T10000-D drives, reaching 100 PB of total capacity
  – The oldest data processing will not be migrated (D drives can read A/B/C tapes) and will eventually be dropped
A WIDER HORIZON
• DPHEP, Data Preservation in High Energy Physics, born as a study group, is now the Collaboration for Data Preservation and Long Term Analysis in High Energy Physics
– https://hep-project-dphep-portal.web.cern.ch/
– Events: https://indico.cern.ch/category/4458/
Horizon2020 https://ec.europa.eu/programmes/horizon2020/en/what-horizon-2020
MORE THAN BITS
• Data is not just the “data”, it is really the data, the software (with the environment), and the documentation
• Without preserving all of it, preserving the data could be meaningless
• Other parameters for preservation
  – Use cases and target communities
    • New analyses or cross checks of new discoveries, education, outreach, …
  – Uniqueness
  – Re-use
  – Discoverability, accessibility, …
  – …
FUTURE OF DATA PRESERVATION
• Data storage
  – Primary copy is on tape
    • BaBar data are at SLAC and a second copy is hosted at CC-IN2P3, Lyon, France
  – Migration to new media every ~5 years
    • More efficient, more reliable, not necessarily less expensive
    • For how long will tapes be a solution? One, two decades, or more?
    • What's next? Cloud storage? Hybrid solutions?
• Software environment
  – Virtualization has proved its worth, but it is also evolving
  – What if we face a disruptive change during the next decade or so?
• Documentation
  – Wikis and traditional HTML could actually be a viable solution for short- and longer-term preservation
BECAUSE PHYSICS IS NOT SCALE INVARIANT
• Preserving 2 PB now can be difficult. But will that still be true when the HEP experiments face a problem 1000 times larger, at the exabyte scale?
In more than one way
Computing: new records broken!
[Plots: data volume in Tier-0 vs. time; data transfer rates of 20 GB/s, 10x the design value!]
WLCG today: ~170 sites (40 countries), ~500k CPU cores, 500 PB storage, >2 million jobs/day, 10-100 Gb links
Longer-term future: hybrid cloud model? (commercial cloud services, public e-infrastructure, in-house IT resources)
→ The model will be tested with the HNSciCloud H2020 project (CERN, Tier-1s, many European HEP funding agencies, EMBL, …); 70% of the funds (~4.7 M€) come from the EC. Started 1 Jan 2016.
The present model should work for Runs 2-3.
Courtesy of Jamie Shiers, DPHEP & CERN, Cambridge Big Data Workshop, March 2016
THANKS!
• LTDA developers
  – Coordinator: Tina Cartaro
  – BaBar software: Homer Neal
  – Development and system administration: Marcus Ebert
  – Network design: Steffen Luitz
  – Virtualization: Kyle Fransham, Marcus Ebert
  – System performance and CDB: Igor Gaponenko
  – Databases, tools and production: Douglas Smith, Tim Adye
  – Xrootd: Wilko Kroeger
• Computing Division experts
  – System setup and administration: Booker Bense, Lance Nakata, Randall Radmer and all the SLAC Unix-Admin team
  – Network setup: Antonio Ceseracciu
  – BaBar-SLAC Computing Division liaison: Len Moss, Andrew May
• Particle Windchime
  – Maps detector and particle properties (momentum, detector hits, particle-ID, etc.) onto sonic characteristics (volume, pitch, timbre, etc.). This experience was featured in a BBC science podcast and in Symmetry Magazine.
• The Particle Windchime was originally created by a team of hackers at Science Hack Day SF and continues to be maintained by Matt Bellis
• More sounds: – http://www.mattbellis.com/windchime/related-media/