
Lessons Learned from Managing a Petabyte

Jacek Becla, Stanford Linear Accelerator Center (SLAC)

Daniel Wang, now University of California, Irvine; formerly SLAC


CIDR’05, Asilomar, CA


Roadmap

Who we are
Simplified data processing
Core architecture and migration
Challenges/surprises/problems
Summary

Don’t miss the “lessons”: just look for the yellow stickers


Who We Are

Stanford Linear Accelerator Center
– DoE National Lab, operated by Stanford University

BaBar
– one of the largest High Energy Physics (HEP) experiments online
– in production since 1999
– over a petabyte of production data

HEP
– data-intensive science
– statistical studies
– needle-in-a-haystack searches


Simplified Data Processing


A Typical Day in the Life (SLAC only)

~8 TB accessed in ~100K files
~7 TB in/out of tertiary storage
2-5 TB in/out of SLAC
~35K jobs complete
– 2500 jobs running at any given time
– many long-running jobs (up to a few days)
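For a rough sense of scale (back-of-the-envelope figures derived from the numbers above, not from the original slides): ~8 TB across ~100K files works out to an average of roughly 80 MB per file, and ~8 TB accessed per day corresponds to a sustained rate on the order of 90-100 MB/s around the clock.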


Some of the Data-related Challenges

Finding perfect snowflake(s) in an avalanche

Volume
Organizing data
Dealing with I/O
– sparse reads
– random access
– small object size: o(100) bytes

Providing data for many tens of sites


More Challenges…

Data Distribution

~25 sites worldwide produce data
– many more use it

Distribution pros/cons
+ keeps data close to users
– makes administration tougher
+ works as a backup

Kill two birds with one stone: replicate for availability as well as backup


Core Architecture

Mass Storage (HPSS)
– tapes cost-effective & more reliable than disks

160 TB disk cache, 40+ data servers

Database engine: ODBMS (Objectivity/DB)
– scalable thin-dataserver, thick-client architecture
– gives full control over data placement & clustering
– ODBMS later replaced by a system built within HEP

DB-related code hidden behind a transient-persistent wrapper
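As a rough illustration of the transient-persistent split, here is a minimal C++ sketch (class names are hypothetical, not BaBar's actual code): physics code sees only an in-memory event type and an abstract store interface, so the persistence backend behind it can be replaced without touching application code.

```cpp
// Hypothetical sketch of a transient-persistent wrapper (not BaBar's code).
// Application code uses TransientEvent and the EventStore interface only;
// the ODBMS- or file-specific details live behind the interface.
#include <cstdint>
#include <memory>
#include <vector>

struct TransientEvent {            // in-memory representation used by physics code
    std::uint64_t id;
    std::vector<double> hits;      // simplified payload
};

class EventStore {                 // persistence boundary
public:
    virtual ~EventStore() = default;
    virtual void write(const TransientEvent& ev) = 0;
    virtual std::unique_ptr<TransientEvent> read(std::uint64_t id) = 0;
};

// One trivial backend; an Objectivity/DB-backed or file-backed version
// would derive from EventStore in the same way.
class InMemoryStore : public EventStore {
public:
    void write(const TransientEvent& ev) override { events_.push_back(ev); }
    std::unique_ptr<TransientEvent> read(std::uint64_t id) override {
        for (const auto& ev : events_)
            if (ev.id == id) return std::make_unique<TransientEvent>(ev);
        return nullptr;
    }
private:
    std::vector<TransientEvent> events_;
};
```

Swapping the backend then means adding another EventStore implementation, which is essentially what let the ODBMS later be replaced by the HEP-built system without rewriting client code.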

Consider all factors when choosing software and hardware


Reasons to Migrate

ODBMS not mainstream
– true for HEP and elsewhere
– long-term future a concern

Locked into certain OSes/compilers

Unnecessary DB overhead
– e.g. transactions for immutable data

Maintenance at small institutes

Monetary cost

Build a flexible system and be prepared for non-trivial changes. Bet on simplicity.


xrootd Data Server

Developed in-house
– becoming the de facto HEP standard now

Numerous must-have features, some hard to add to a commercial server
– deferral
– redirection
– fault tolerance
– scalability
– automatic load balancing
– proxy server
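A conceptual C++ sketch of the redirection and load-balancing ideas follows; this is not the real xrootd protocol or API, and all names are invented. A redirector knows which data servers hold a file and how busy they are, and points each client at the least-loaded replica; if that server later fails, the client simply asks the redirector again.

```cpp
// Conceptual sketch only: NOT the real xrootd protocol or API.
// A redirector tracks file replicas and server load and redirects each
// client to the least-loaded server currently holding the file.
#include <map>
#include <optional>
#include <string>
#include <vector>

class Redirector {
public:
    void addReplica(const std::string& path, const std::string& server) {
        replicas_[path].push_back(server);
    }
    void reportLoad(const std::string& server, int activeClients) {
        load_[server] = activeClients;       // servers report load periodically
    }
    // Pick the least-loaded server that has the file; an empty result means
    // the client should retry later or request staging from tape.
    std::optional<std::string> redirect(const std::string& path) const {
        auto it = replicas_.find(path);
        if (it == replicas_.end()) return std::nullopt;
        const std::string* best = nullptr;
        int bestLoad = 0;
        for (const auto& server : it->second) {
            int l = load_.count(server) ? load_.at(server) : 0;
            if (best == nullptr || l < bestLoad) { best = &server; bestLoad = l; }
        }
        if (best == nullptr) return std::nullopt;
        return *best;
    }
private:
    std::map<std::string, std::vector<std::string>> replicas_;
    std::map<std::string, int> load_;
};
```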

Larger systems depend more heavily on automation


More Lessons…

Challenges, Surprises, Problems

Organizing & managing data
– divide into mutable & immutable, separate queryable data
  immutable data is easier to optimize, replicate & scale
– decentralize metadata updates
  contention happens in unexpected places
  makes data mgmt harder
  still need some centralization
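A minimal sketch of the mutable/immutable split, with invented types (not BaBar's actual schema): the bulk event data is append-only and never modified, so it can be replicated and cached freely, while only a small mutable catalog needs locking and transactions.

```cpp
// Illustrative only: separate an append-only store for bulk event data
// from a small mutable catalog that carries the data that actually changes.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct EventBlock {                        // immutable once written
    std::uint64_t firstEventId;
    std::vector<char> payload;             // never updated in place
};

class ImmutableStore {
public:
    void append(EventBlock block) { blocks_.push_back(std::move(block)); }
    const std::vector<EventBlock>& blocks() const { return blocks_; }
private:
    std::vector<EventBlock> blocks_;       // safe to replicate and cache anywhere
};

class MutableCatalog {                     // the small, queryable, mutable part
public:
    void setQualityFlag(std::uint64_t runId, const std::string& flag) {
        quality_[runId] = flag;            // the only data that ever changes
    }
private:
    std::map<std::uint64_t, std::string> quality_;
};
```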

Fault tolerance
– large systems are likely to use commodity hardware, so fault tolerance is essential

Single technology likely not enough to efficiently manage petabytes


Challenges, Surprises, Problems (cont…)

Main bottleneck: disk I/O
– underlying persistency less important than one would expect
– access patterns more important: must understand them to derandomize I/O
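One way to picture the derandomization point is the sketch below (illustrative, not the production code): gather the small object reads a job wants, sort them by file and offset, and only then issue them, so the disks see mostly sequential access instead of o(100)-byte random reads.

```cpp
// Illustrative sketch of derandomizing I/O: batch and sort read requests
// by (file, offset) before issuing them so access becomes mostly sequential.
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

struct ReadRequest {
    std::string file;
    std::uint64_t offset;
    std::uint32_t length;      // individual objects are only o(100) bytes
};

void derandomize(std::vector<ReadRequest>& requests) {
    std::sort(requests.begin(), requests.end(),
              [](const ReadRequest& a, const ReadRequest& b) {
                  return std::tie(a.file, a.offset) < std::tie(b.file, b.offset);
              });
    // Issue the sorted requests here, ideally coalescing adjacent ranges
    // into larger reads before they reach the data servers.
}
```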

Job mgmt/bookkeeping
– better to stall jobs than to kill them (see the retry sketch below)

Power, cooling, floor weight

Admin

Hide disruptive events by stalling data flow
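The stalling idea can be sketched as a simple retry loop (illustrative only; the callback and timing constants are made up): when data is temporarily unavailable, e.g. during a tape recall or a server restart, the client sleeps and retries for a bounded time instead of failing, which would kill a multi-day job.

```cpp
// Minimal sketch of "stall, don't kill": retry a failing read for a bounded
// time instead of aborting a long-running job. readOnce() and the timing
// constants are illustrative placeholders.
#include <chrono>
#include <functional>
#include <optional>
#include <thread>
#include <vector>

std::optional<std::vector<char>> readWithStall(
    const std::function<std::optional<std::vector<char>>()>& readOnce,
    std::chrono::minutes maxStall = std::chrono::minutes(30)) {
    const auto deadline = std::chrono::steady_clock::now() + maxStall;
    while (true) {
        if (auto data = readOnce()) return data;                   // success
        if (std::chrono::steady_clock::now() >= deadline)
            return std::nullopt;                                   // give up eventually
        std::this_thread::sleep_for(std::chrono::seconds(30));     // stall quietly
    }
}
```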


On the Bleeding Edge Since Day 1

Huge collection of interesting challenges…
– Increasing address space
– Improving server code
– Tuning and scaling the whole system
– Reducing lock collisions
– Improving I/O
– …many others

In summary
– we made it work (big success), but…
– continuous improvements were needed for the first several years to keep up

When you push limits, expect many problems everywhere. Normal maxima are too small. Observe, refine, repeat.


Uniqueness of the Scientific Community

Hard to convince the scientific community to use commercial products
– BaBar: 5+ million lines of home-grown, complex C++

Continuously look for better approaches
– system has to be very flexible

Most data immutable

Many smart people who can build almost anything

Specific needs of your community can impact everything, including the system architecture


DB-related Effort

~4-5 core DB developers since 1996
– effort augmented by many physicists, students and visitors

3 DBAs
– from the start of production until recently
– fewer than 3 now: system finally automated and fault tolerant

Automation is the key to a low-maintenance, fault-tolerant system


Lessons Summary

Kill two birds with one stone: replicate for availability as well as backup

Consider all factors when choosing software and hardware

When you push limits, expect many problems everywhere. Normal maxima are too small. Observe, refine, repeat.

Specific needs of your community can impact everything, including the system architecture

Automation is the key to a low-maintenance, fault-tolerant system

Larger systems depend more heavily on automation

Hide disruptive events by stalling data flow

Single technology likely not enough to efficiently manage petabytes

Organize data (mutable, immutable, queryable, …)

Build a flexible system and be prepared for non-trivial changes. Bet on simplicity.


Problems @ the Petabyte Frontier: just a few highlights…

How to cost-effectively back up a PB?

How to provide fault tolerance with 1000s of disks?
– RAID 5 is not good enough (rough arithmetic below)

How to build a low-maintenance system?
– “1 full-time person per 1 TB” does not scale

How to store the data? (tape, anyone?)
– consider all factors: cost, power, cooling, robustness
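A rough illustration of the RAID 5 point, with assumed numbers rather than figures from the slides: a few thousand commodity disks at an annual failure rate of a couple of percent means a failed disk every few days, and each failure starts a many-hour rebuild during which a second failure, or an unrecoverable read error, in the same array loses data; at petabyte scale that exposure is continuous rather than exceptional.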

…YES, there are “new” problems beyond “known problems scaled up”


Summary

Great success
– ODBMS-based system, migration & 2nd generation
– some DoD projects are being built on ODBMS

Lots of useful experience with managing (very) large datasets
– would not be able to achieve all that with any RDBMS (today)
– thin-server, thick-client architecture works well
– starting to help astronomers (LSST) manage their petabytes