Xrootd usage @ LHC
An up-to-date technical survey of xrootd-based storage solutions

F. Furano (CERN IT-DM)
Outline
• Intro
  – Main use cases in the storage arena
• Generic Pure xrootd @ LHC
  – The SLAC way
  – The ALICE way
• CASTOR2
• Roadmap
• Conclusions

Introduction and use cases

The historical problem: data access
• Physics experiments rely on rare events and statistics
  – A huge amount of data must be processed to collect a significant number of events
• The typical data store can reach 5-10 PB… now
  – Millions of files, thousands of concurrent clients
• The transaction rate is very high
  – O(10³) file opens/sec per cluster is not uncommon (average, not peak)
  – Traffic sources: local GRID site, local batch system, WAN
  – Up to O(10⁴) clients per server!
• If these requirements are not met, the outcome is:
  – Crashes, instability, workarounds, a “need” for crazy things
• What is needed: scalable, high-performance direct data access
  – No imposed limits on performance, size or connectivity
  – Higher performance, with support for direct data access over the WAN
  – Avoids worker-node under-utilization
  – No inefficient local copies when they are not needed (see the sketch below)
    • Do we fetch entire websites to browse one page?
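To make the “direct access instead of full copies” point concrete, here is a minimal ROOT/C++ sketch (the host name and paths are made-up examples): the client opens the file in place on an xrootd server and reads only the object it needs, instead of staging the whole file to local disk first.

```cpp
// Minimal ROOT/C++ sketch: direct remote read via the xroot protocol.
// Host name and paths are hypothetical examples.
#include "TFile.h"
#include "TH1F.h"
#include <iostream>

void direct_access()
{
   // Open the file in place on the remote xrootd server: no local copy is made,
   // only the blocks actually read travel over the network.
   TFile *f = TFile::Open("root://xrootd.example.cern.ch//store/data/run123/hists.root");
   if (!f || f->IsZombie()) { std::cerr << "open failed\n"; return; }

   TH1F *h = nullptr;
   f->GetObject("pt_spectrum", h);   // reads just this object, not the whole file
   if (h) h->Draw();

   // f->Close(); // keep the file open while the canvas still uses the histogram
}
```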

The Challenges
• LHC user analysis, via remote access protocols (RAP: root, dcap, rfio, …)
  – Boundary conditions
    • GRID environment: GSI authentication, user-space deployment
    • Computing-centre environment: Kerberos, admin deployment
  – High I/O load, moderate namespace load
  – Many clients: O(1000-10000)
• T0/T3 @ CERN, via mounted file systems (MFS)
  – The preferred interface is MFS: easy, intuitive, fast response, standard applications
  – Moderate I/O load
  – High namespace load: compilation, software startup, searches
  – Fewer clients: O(#users)
• The access patterns span a whole range: from sequential file access (basic analysis, today: RAW, ESD) to sparse file access (advanced analysis, tomorrow: ESD, AOD, ntuples, histograms), and from batch to interactive data access

Main requirement
• Data access has to work reliably at the desired scale
  – This also means: it must not waste resources

A simple use case
• I am a physicist, waiting for the results of my analysis jobs
  – Many bunches of jobs, several outputs
    • They will be saved e.g. to an SE at CERN
  – My laptop is configured to show histograms etc. with ROOT
  – I leave for a conference; the jobs finish while I am on the plane
  – Once there, I want to simply draw the results from my home directory
  – Once there, I want to save my new histograms in the same place
  – I have no time to lose tweaking things to get a copy of everything; I lose copies in the confusion
  – I want to leave things where they are; I know nothing about things to tweak

What can I expect? Can I do it?
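With an xrootd-based SE, this scenario boils down to opening the remote paths directly from the laptop's ROOT session. A hedged sketch follows; the server name and paths are illustrative only, and authentication (e.g. GSI or Kerberos) is assumed to be already configured.

```cpp
// ROOT/C++ sketch of the "laptop at a conference" use case.
// The SE host and the /store/user paths are made-up examples.
#include "TFile.h"
#include "TH1F.h"

void remote_histos()
{
   // Draw a result produced by the finished jobs, directly from the SE.
   TFile *in = TFile::Open("root://se.example.cern.ch//store/user/me/job_output.root");
   TH1F *h = nullptr;
   if (in) in->GetObject("mass", h);
   if (h) h->Draw();

   // Save a new histogram back to the same place, again without local copies.
   TFile *out = TFile::Open("root://se.example.cern.ch//store/user/me/new_histos.root",
                            "RECREATE");
   if (out && h) { out->cd(); h->Write("mass_checked"); out->Close(); }
   if (in) in->Close();
}
```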

Another use case
• ALICE analysis on the GRID
  – Each job reads ~100-150 MB from ALICE::CERN::SE
  – These are conditions data accessed directly, not file copies
    • I.e. VERY efficient: a job reads only what it needs
    • It just works, with no workarounds
  – At 10-20 MB/s it takes 5-10 s (the most common case)
  – At 5 MB/s it takes 20 s; at 1 MB/s it takes 100 s
• Sometimes data are accessed elsewhere
  – AliEn can save a job by making it read data from a different site, with very good performance
• Quite often the results are written/merged elsewhere

Pure Xrootd

xrootd Plugin Architecture
• Protocol driver (XRD)
• Protocol, 1 of n (xrootd)
• File system (ofs, sfs, alice, etc.)
• Storage system (oss, drm/srm, etc.)
• Clustering (cmsd)
• Authentication (gsi, krb5, etc.)
• Authorization (name based)
• lfn2pfn prefix encoding
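To visualize the layering, the following is a purely illustrative C++ sketch; the interfaces and class names are invented for this example and are not the real Scalla/XRootD headers. The point is only that each layer talks to an abstraction of the layer below, which is what lets authentication, clustering, the file system and the storage system be swapped independently.

```cpp
// Purely illustrative sketch of the layered plugin idea; these interfaces are
// NOT the real XRootD classes, just stand-ins to show how the stack composes.
#include <memory>
#include <string>

struct StorageSystem {                 // "oss / drm / srm" layer
   virtual int ReadAt(const std::string &pfn, char *buf, long off, int len) = 0;
   virtual ~StorageSystem() = default;
};

struct FileSystem {                    // "ofs / sfs / alice" layer
   virtual int Open(const std::string &lfn) = 0;
   virtual ~FileSystem() = default;
};

class OfsLikeLayer : public FileSystem {
public:
   explicit OfsLikeLayer(std::unique_ptr<StorageSystem> oss) : fOss(std::move(oss)) {}
   int Open(const std::string &lfn) override {
      fPfn = "/data" + lfn;            // lfn2pfn / prefix encoding would plug in here
      return 0;                        // a name-based authorization check would go here too
   }
   int Read(char *buf, long off, int len) { return fOss->ReadAt(fPfn, buf, off, len); }
private:
   std::unique_ptr<StorageSystem> fOss;
   std::string fPfn;
};

// The protocol layer (xrootd on top of the XRD driver), the clustering layer
// (cmsd) and the authentication plugins sit above/beside this, also behind
// abstract interfaces, so any piece can be replaced without touching the others.
```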

The client side
• Fault tolerance in data access
  – Meets WAN requirements, reduces job mortality
• Connection multiplexing (authenticated sessions)
  – Up to 65536 parallel r/w requests at once per client process
  – Up to 32767 open files per client process
  – Opens bunches of up to O(1000) files at once, in parallel
  – Full support for huge bulk prestages
• Smart r/w caching
  – Supports normal readahead and “informed prefetching”
• Asynchronous background writes
  – Boost writing performance on LAN/WAN
• Sophisticated integration with ROOT (see the sketch below)
  – Reads the “right” chunks in advance while the application computes the preceding ones
  – Boosts read performance on LAN/WAN (up to the same order)
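On the ROOT side, one way this integration is typically exercised is through the TTree cache: once enabled, ROOT learns which branches the analysis touches and requests their baskets ahead of the computation through the xrootd client. A minimal sketch, with hypothetical host, file, tree and branch names:

```cpp
// Sketch: enabling ROOT's TTree cache on a remote file read via xrootd,
// so the "right" chunks are prefetched while the application computes.
#include "TFile.h"
#include "TTree.h"

void prefetch_example()
{
   TFile *f = TFile::Open("root://xrootd.example.cern.ch//store/data/events.root");
   if (!f) return;

   TTree *t = nullptr;
   f->GetObject("Events", t);
   if (!t) return;

   t->SetCacheSize(100 * 1024 * 1024);   // 100 MB read cache
   t->AddBranchToCache("*", kTRUE);      // cache (and prefetch) all used branches

   float pt = 0;
   t->SetBranchAddress("pt", &pt);
   for (Long64_t i = 0; i < t->GetEntries(); ++i) {
      t->GetEntry(i);                    // baskets arrive via cached/vectored reads
      // ... analysis on pt ...
   }
}
```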

The Xrootd “protocol”
• The XRootD protocol is a good one
  – Efficient, clean, supports fault tolerance, etc.
  – It doesn’t do any magic, however
    • It does not multiply your resources
    • It does not overcome hardware bottlenecks
    • BUT it allows the true usage of the hardware resources
• One of the aims of the project is still software quality
  – In the carefully crafted pieces of software that come with the distribution
• What makes the difference with Scalla/XRootD is:
  – The implementation details (performance + robustness)
    • Bad performance can hurt robustness (and vice versa)
  – The Scalla software architecture (scalability + performance + robustness)
    • Designed to fit the HEP requirements; you need a clean design to insert it into
    • Born with efficient direct access in mind, but with the requirements of high-performance computing
    • Copy-like access becomes a particular case (see the sketch below)
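As an example of copy-like access as a particular case: in ROOT, a file opened over the xroot protocol can simply be streamed into a local file with the generic TFile::Cp (the URL below is a made-up example; the xrdcp tool shipped with the distribution does the same job from the command line).

```cpp
// Sketch: copy-like access as a special case of direct access.
// The remote URL is a made-up example.
#include "TFile.h"

void copy_example()
{
   // TFile::Cp streams the remote file into a local one using the same
   // xrootd remote-read machinery used for direct analysis access.
   TFile *f = TFile::Open("root://xrootd.example.cern.ch//store/data/run123/hists.root");
   if (f) {
      f->Cp("local_copy.root");
      f->Close();
   }
}
```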

Pure Xrootd @ LHC

The SLAC way with XROOTD
• Pure Xrootd + an Xrootd-based “filesystem” extension
• Adapters to talk to the BestMan SRM and GridFTP
• More details in A. Hanushevsky’s talk @ CHEP09
[Diagram: an xrootd/cmsd/cnsd cluster exposed through FUSE-based adapters to GridFTP and the BestMan SRM, which sit at the firewall boundary.]

The ALICE way with XROOTD
• Pure Xrootd + the ALICE strong-authorization plugin; no difference between T1 and T2 (only size and QoS)
• WAN-wide globalized deployment, very efficient direct data access
• CASTOR at Tier-0 serving data; Pure Xrootd serving conditions data to the GRID jobs
• “Old” DPM+Xrootd at several Tier-2s
• A globalized cluster: the ALICE global redirector federates the xrootd sites (CERN, GSI, any other), each running xrootd + cmsd
  – Local clients work normally at each site
  – Missing a file? Ask the global redirector, get redirected to the right collaborating cluster, and fetch it. Immediately.
  – A smart client could point directly at the global redirector (see the sketch below)
  – The result is a Virtual Mass Storage System, built on data globalization
More details and complete info in “Scalla/Xrootd WAN globalization tools: where we are.” @ CHEP09
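From a client's point of view, the federation is just another xrootd endpoint to fall back to. A hedged sketch of the “missing file” logic follows; both host names are invented, and in the real deployment the sites themselves can also ask the global redirector through the virtual mass storage system, so this client-side fallback is only one option.

```cpp
// Sketch: if a file is not available at the local site, retry through the
// global redirector, which locates a collaborating cluster that has it.
// Both host names below are made-up examples.
#include "TFile.h"
#include "TString.h"

TFile *OpenWithFallback(const char *lfn)
{
   TString local  = TString::Format("root://xrootd.local-site.example//%s", lfn);
   TString global = TString::Format("root://alice-global-redirector.example//%s", lfn);

   TFile *f = TFile::Open(local);
   if (!f || f->IsZombie()) {
      delete f;
      f = TFile::Open(global);   // redirected to whichever cluster holds the file
   }
   return f;
}
```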

CASTOR2
Putting everything together @ Tier0/1s

The CASTOR way
• The client connects to a redirector node
• The redirector asks CASTOR where the file is
• The client then connects directly to the disk server holding the data
• CASTOR handles the tapes in the back end
[Diagram: the client sends “open file X” to the redirector; the redirector asks CASTOR “where is X?” and gets the answer “on C”; the client is told “go to C” and talks directly to disk server C, while the tape back end triggers migration/recall as needed.]
Credits: S. Ponce (IT-DM)

CASTOR 2.1.8: Improving Latency – Read
• First focus: file (read) open latencies
[Chart (October 2008): read-open latencies on a logarithmic scale from 1 to 1000 ms for CASTOR 2.1.7 (rfio), CASTOR 2.1.8 (xroot) and an estimate for CASTOR 2.1.9 (xroot), compared against the network latency limit.]
Credits: A. Peters (IT-DM)

CASTOR 2.1.8: Improving Latency – Metadata Read
• Next focus: metadata (read) latencies
[Chart (October 2008): stat latencies on a logarithmic scale from 1 to 1000 ms for CASTOR 2.1.7, CASTOR 2.1.8 and an estimate for CASTOR 2.1.9, compared against the network latency limit.]
Credits: A. Peters (IT-DM)

Prototype Architecture: XCFS Overview – xroot + FUSE
[Diagram: a generic application gets POSIX access to /xcfs on the client. The client stack is: client application → glibc → VFS → /dev/fuse → libfuse (FUSE low-level implementation) → xcfsd → XROOT POSIX library (libXrdPosix) → XROOT client library (libXrdClient). The data path goes over the xroot remote access protocol (ROOT plugs in here directly) to the disk servers, which run xrootd server daemons with a libXrdSec<plugin> authentication plugin and libXrdCatalogOfs/libXrdCatalogFs. The metadata server runs xrootd with libXrdSecUnix and the libXrdCatalogAuthz strong-authentication plugin, with an XFS filesystem as the name-space provider (metadata filesystem); access to the data servers is authorized via capabilities.]
Credits: A. Peters (IT-DM)

Early Prototype Evaluation: Metadata Performance
• File creation*: ~1,000/s
• File rewrite: ~2,400/s
• File read: ~2,500/s
• rm: ~3,000/s
• readdir/stat access: Σ = 70,000/s
*These values were measured by executing shell commands on 216 mount clients. Creation performance decreases as the namespace fills on a spinning medium. Using an XFS filesystem over a DRBD block device in a high-availability setup, file-creation performance stabilizes at 400/s (20 million files in the namespace).
Credits: A. Peters (IT-DM)

Network usage (or waste!)
• Network traffic is an important factor: it has to match the ratio IO(CPU server) / IO(disk server)
  – Too much unneeded traffic means fewer clients supported (a serious bottleneck: 1 client works well, 100-1000 clients do not work at all)
  – Lustre doesn’t disable readahead during forward-seeking access and transfers the complete file if reads are found in the buffer cache (the readahead window starts at 1 MB and scales up to 40 MB)
• The XCFS/Lustre/NFSv4 network volume without readahead is based on 4 KiB pages in Linux
  – Most requests are not page-aligned and cause additional pages to be transferred (avg. read size 4 kB), so they transfer twice as much data (but XCFS can now skip this); see the toy calculation below
  – A second execution plays no real role for analysis, since datasets are usually bigger than the client buffer cache
Credits: A. Peters (IT-DM) – ACAT2008
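To illustrate the page-alignment point with a toy calculation (not a measurement): with 4 KiB pages, a small unaligned read usually straddles a page boundary, so two whole pages go over the wire for roughly 4 kB of payload.

```cpp
// Toy illustration of the 4 KiB page-alignment overhead for small reads.
#include <cstdio>

int main()
{
   const long kPage = 4096;
   // A hypothetical 4000-byte read starting 100 bytes past a page boundary:
   long offset = 8192 + 100, length = 4000;

   long firstPage = offset / kPage;
   long lastPage  = (offset + length - 1) / kPage;
   long wireBytes = (lastPage - firstPage + 1) * kPage;   // whole pages are shipped

   std::printf("requested %ld bytes, transferred %ld bytes (%.1fx)\n",
               length, wireBytes, double(wireBytes) / length);   // ~2.0x
   return 0;
}
```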

CASTOR 2.1.8-6: Cross-Pool Redirection
• Why is that useful?
  – Users can access data by LFN without specifying the stager
  – Users are automatically directed to “their” pool with write permissions
• Example configuration
  – T3 pool subscribed: r/w for /castor/user, r/w for /castor/cms/user/
  – T0 pool subscribed: ro for /castor, ro for /castor/cms/data
• There are even more possibilities if a part of the namespace can be assigned to individual pools for write operations
[Diagram: clients contact a meta manager, which redirects them to the T0 or T3 stager pool; each pool runs xrootd plus cmsd (cluster management) on its manager and servers, all sharing the common CASTOR name space.]
Credits: A. Peters (IT-DM)
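Seen from a user, the configuration above means a single endpoint and plain /castor LFNs; pool selection happens behind the scenes. A hedged sketch with a made-up meta-manager host name, following the example configuration (reads can be served by the T0 pool, writes under /castor/user go to the T3 pool):

```cpp
// Sketch: LFN-based access through a single CASTOR xrootd meta manager.
// The host name is a made-up example; pool selection happens server-side.
#include "TFile.h"

void cross_pool_example()
{
   // Read-only access: the meta manager can redirect to the T0 pool.
   TFile *data = TFile::Open(
      "root://castor-meta-manager.example.cern.ch//castor/cms/data/run123/reco.root");

   // Write access under /castor/user: redirected to the user's T3 pool.
   TFile *out = TFile::Open(
      "root://castor-meta-manager.example.cern.ch//castor/user/me/analysis_out.root",
      "RECREATE");

   if (data) data->Close();
   if (out)  out->Close();
}
```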

Towards a Production Version: Further Improvements – Security
• A GSI/VOMS authentication plugin prototype was developed based on pure OpenSSL, additionally using code from mod_ssl and libgridsite
  – Significantly faster than the GLOBUS implementation
• After a security workshop with A. Hanushevsky, a virtual socket layer was introduced into the xrootd authentication plugin base, allowing socket-oriented authentication over the xrootd protocol layer
  – The final version should be based on OpenSSL and the VOMS library

The roadmap

XROOT Roadmap @ CERN
• XROOT is strategic for scalable analysis support with CASTOR at CERN and the T1s
• Other file access protocols will be supported until they become obsolete
• CASTOR
  – Secure RFIO has been released in 2.1.8; the deployment impact in terms of CPU may be significant
  – Secure XROOT is the default in 2.1.8 (Kerberos or X509)
    • A lower CPU cost than rfio is expected, thanks to the session model
  – No plans to provide unauthenticated access via XROOT

XROOTD Roadmap
• CASTOR
  – Secure RFIO has been released in 2.1.8; the deployment impact in terms of CPU may be significant
  – Secure XROOT is the default in 2.1.8 (Kerberos or X509)
    • A lower CPU cost than rfio is expected, thanks to the session model
  – No plans to provide unauthenticated access via XROOT
• DPM
  – Support for authentication via xrootd is scheduled to start certification at the beginning of July
• dCache
  – Relies on a custom full re-implementation of the XROOTD protocol
  – The protocol docs have been updated by A. Hanushevsky
  – In contact with the CASTOR/DPM teams to add authentication/authorisation on the server side
  – Evaluating a common client plug-in / security protocol

Conclusion
• A very dense roadmap
• Many, many technical details
• Heading for:
  – Solid, high-performance data access, for production and analysis
  – More advanced user-analysis scenarios
  – Matching existing architectures, protocols and workarounds

Thank you
Questions?