
Lessons Learned from Managing a Petabyte

Jacek Becla, Stanford Linear Accelerator Center (SLAC)

Daniel Wang, now University of California, Irvine; formerly SLAC


CIDR’05, Asilomar, CA


Roadmap

Who we are
Simplified data processing
Core architecture and migration
Challenges/surprises/problems
Summary

Don’t miss the “lessons”: just look for the yellow stickers


Who We Are

Stanford Linear Accelerator Center
– DoE National Lab, operated by Stanford University

BaBar
– one of the largest High Energy Physics (HEP) experiments online
– in production since 1999
– over a petabyte of production data

HEP
– data-intensive science
– statistical studies
– needle-in-a-haystack searches


Simplified Data Processing


A Typical Day in the Life (SLAC only)

~8 TB accessed in ~100K files
~7 TB in/out of tertiary storage
2-5 TB in/out of SLAC
~35K jobs complete
– 2500 jobs running at any given time
– many long-running jobs (up to a few days)
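For a rough sense of scale (back-of-the-envelope figures derived from the numbers above, not from the original slides): ~8 TB across ~100K files works out to an average of roughly 80 MB per file, and ~8 TB accessed per day corresponds to a sustained rate on the order of 90-100 MB/s around the clock.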


Some of the Data-related Challenges

Finding perfect snowflake(s) in an avalanche

Volume
Organizing data
Dealing with I/O
– sparse reads
– random access
– small object size: o(100) bytes

Providing data for many tens of sites


More Challenges…

Data Distribution

~25 sites worldwide produce data
– many more use it

Distribution pros/cons
+ keeps data close to users
– makes administration tougher
+ works as a backup

Kill two birds with one stone: replicate for availability as well as backup


Core Architecture

Mass Storage (HPSS)
– tapes cost-effective & more reliable than disks

160 TB disk cache, 40+ data servers

Database engine: ODBMS (Objectivity/DB)
– scalable thin-dataserver, thick-client architecture
– gives full control over data placement & clustering
– ODBMS later replaced by a system built within HEP

DB-related code hidden behind a transient-persistent wrapper
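As a rough illustration of the transient-persistent split, here is a minimal C++ sketch (class names are hypothetical, not BaBar's actual code): physics code sees only an in-memory event type and an abstract store interface, so the persistence backend behind it can be replaced without touching application code.

```cpp
// Hypothetical sketch of a transient-persistent wrapper (not BaBar's code).
// Application code uses TransientEvent and the EventStore interface only;
// the ODBMS- or file-specific details live behind the interface.
#include <cstdint>
#include <memory>
#include <vector>

struct TransientEvent {            // in-memory representation used by physics code
    std::uint64_t id;
    std::vector<double> hits;      // simplified payload
};

class EventStore {                 // persistence boundary
public:
    virtual ~EventStore() = default;
    virtual void write(const TransientEvent& ev) = 0;
    virtual std::unique_ptr<TransientEvent> read(std::uint64_t id) = 0;
};

// One trivial backend; an Objectivity/DB-backed or file-backed version
// would derive from EventStore in the same way.
class InMemoryStore : public EventStore {
public:
    void write(const TransientEvent& ev) override { events_.push_back(ev); }
    std::unique_ptr<TransientEvent> read(std::uint64_t id) override {
        for (const auto& ev : events_)
            if (ev.id == id) return std::make_unique<TransientEvent>(ev);
        return nullptr;
    }
private:
    std::vector<TransientEvent> events_;
};
```

Swapping the backend then means adding another EventStore implementation, which is essentially what let the ODBMS later be replaced by the HEP-built system without rewriting client code.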

Consider all factors when choosing software and hardware


Reasons to Migrate

ODBMS not mainstream
– true for HEP and elsewhere
– long-term future a concern

Locked into certain OSes/compilers

Unnecessary DB overhead
– e.g. transactions for immutable data

Maintenance at small institutes

Monetary cost

Build a flexible system and be prepared for non-trivial changes. Bet on simplicity.


xrootd Data Server

Developed in-house
– becoming the de facto HEP standard now

Numerous must-have features, some hard to add to a commercial server
– deferral
– redirection
– fault tolerance
– scalability
– automatic load balancing
– proxy server
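A conceptual C++ sketch of the redirection and load-balancing ideas follows; this is not the real xrootd protocol or API, and all names are invented. A redirector knows which data servers hold a file and how busy they are, and points each client at the least-loaded replica; if that server later fails, the client simply asks the redirector again.

```cpp
// Conceptual sketch only: NOT the real xrootd protocol or API.
// A redirector tracks file replicas and server load and redirects each
// client to the least-loaded server currently holding the file.
#include <map>
#include <optional>
#include <string>
#include <vector>

class Redirector {
public:
    void addReplica(const std::string& path, const std::string& server) {
        replicas_[path].push_back(server);
    }
    void reportLoad(const std::string& server, int activeClients) {
        load_[server] = activeClients;       // servers report load periodically
    }
    // Pick the least-loaded server that has the file; an empty result means
    // the client should retry later or request staging from tape.
    std::optional<std::string> redirect(const std::string& path) const {
        auto it = replicas_.find(path);
        if (it == replicas_.end()) return std::nullopt;
        const std::string* best = nullptr;
        int bestLoad = 0;
        for (const auto& server : it->second) {
            int l = load_.count(server) ? load_.at(server) : 0;
            if (best == nullptr || l < bestLoad) { best = &server; bestLoad = l; }
        }
        if (best == nullptr) return std::nullopt;
        return *best;
    }
private:
    std::map<std::string, std::vector<std::string>> replicas_;
    std::map<std::string, int> load_;
};
```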

Larger systems depend more heavily on automation


More Lessons…

Challenges, Surprises, Problems

Organizing & managing data
– divide into mutable & immutable, separate queryable data
  immutable data is easier to optimize, replicate & scale
– decentralize metadata updates
  contention happens in unexpected places
  makes data mgmt harder
  still need some centralization
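A minimal sketch of the mutable/immutable split, with invented types (not BaBar's actual schema): the bulk event data is append-only and never modified, so it can be replicated and cached freely, while only a small mutable catalog needs locking and transactions.

```cpp
// Illustrative only: separate an append-only store for bulk event data
// from a small mutable catalog that carries the data that actually changes.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct EventBlock {                        // immutable once written
    std::uint64_t firstEventId;
    std::vector<char> payload;             // never updated in place
};

class ImmutableStore {
public:
    void append(EventBlock block) { blocks_.push_back(std::move(block)); }
    const std::vector<EventBlock>& blocks() const { return blocks_; }
private:
    std::vector<EventBlock> blocks_;       // safe to replicate and cache anywhere
};

class MutableCatalog {                     // the small, queryable, mutable part
public:
    void setQualityFlag(std::uint64_t runId, const std::string& flag) {
        quality_[runId] = flag;            // the only data that ever changes
    }
private:
    std::map<std::uint64_t, std::string> quality_;
};
```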

Fault tolerance
– large systems are likely to use commodity hardware, so fault tolerance is essential

Single technology likely not enough to efficiently manage petabytes


Challenges, Surprises, Problems (cont…)

Main bottleneck: disk I/O
– underlying persistency less important than one would expect
– access patterns more important: must understand them to derandomize I/O
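One way to picture the derandomization point is the sketch below (illustrative, not the production code): gather the small object reads a job wants, sort them by file and offset, and only then issue them, so the disks see mostly sequential access instead of o(100)-byte random reads.

```cpp
// Illustrative sketch of derandomizing I/O: batch and sort read requests
// by (file, offset) before issuing them so access becomes mostly sequential.
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

struct ReadRequest {
    std::string file;
    std::uint64_t offset;
    std::uint32_t length;      // individual objects are only o(100) bytes
};

void derandomize(std::vector<ReadRequest>& requests) {
    std::sort(requests.begin(), requests.end(),
              [](const ReadRequest& a, const ReadRequest& b) {
                  return std::tie(a.file, a.offset) < std::tie(b.file, b.offset);
              });
    // Issue the sorted requests here, ideally coalescing adjacent ranges
    // into larger reads before they reach the data servers.
}
```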

Job mgmt/bookkeeping
– better to stall jobs than to kill them (see the retry sketch below)

Power, cooling, floor weight

Admin

Hide disruptive events by stalling data flow
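The stalling idea can be sketched as a simple retry loop (illustrative only; the callback and timing constants are made up): when data is temporarily unavailable, e.g. during a tape recall or a server restart, the client sleeps and retries for a bounded time instead of failing, which would kill a multi-day job.

```cpp
// Minimal sketch of "stall, don't kill": retry a failing read for a bounded
// time instead of aborting a long-running job. readOnce() and the timing
// constants are illustrative placeholders.
#include <chrono>
#include <functional>
#include <optional>
#include <thread>
#include <vector>

std::optional<std::vector<char>> readWithStall(
    const std::function<std::optional<std::vector<char>>()>& readOnce,
    std::chrono::minutes maxStall = std::chrono::minutes(30)) {
    const auto deadline = std::chrono::steady_clock::now() + maxStall;
    while (true) {
        if (auto data = readOnce()) return data;                   // success
        if (std::chrono::steady_clock::now() >= deadline)
            return std::nullopt;                                   // give up eventually
        std::this_thread::sleep_for(std::chrono::seconds(30));     // stall quietly
    }
}
```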


On the Bleeding Edge Since Day 1

Huge collection of interesting challenges…
– Increasing address space
– Improving server code
– Tuning and scaling the whole system
– Reducing lock collisions
– Improving I/O
– …many others

In summary
– we made it work (big success), but…
– continuous improvements were needed for the first several years to keep up

When you push limits, expect many problems everywhere. Normal maxima are too small. Observe, refine, repeat.


Uniqueness of the Scientific Community

Hard to convince the scientific community to use commercial products
– BaBar: 5+ million lines of home-grown, complex C++

Continuously look for better approaches
– system has to be very flexible

Most data immutable

Many smart people who can build almost anything

Specific needs of your community can impact everything, including the system architecture


DB-related Effort

~4-5 core DB developers since 1996
– effort augmented by many physicists, students and visitors

3 DBAs
– from the start of production until recently
– fewer than 3 now: system finally automated and fault tolerant

Automation is the key to a low-maintenance, fault-tolerant system


Lessons Summary

Kill two birds with one stone: replicate for availability as well as backup

Consider all factors when choosing software and hardware

When you push limits, expect many problems everywhere. Normal maxima are too small. Observe, refine, repeat.

Specific needs of your community can impact everything, including the system architecture

Automation is the key to a low-maintenance, fault-tolerant system

Larger systems depend more heavily on automation

Hide disruptive events by stalling data flow

Single technology likely not enough to efficiently manage petabytes

Organize data (mutable, immutable, queryable, …)

Build a flexible system and be prepared for non-trivial changes. Bet on simplicity.


Problems @ the Petabyte Frontier: just a few highlights…

How to cost-effectively back up a PB?

How to provide fault tolerance with 1000s of disks?
– RAID 5 is not good enough (rough arithmetic below)

How to build a low-maintenance system?
– “1 full-time person per 1 TB” does not scale

How to store the data? (tape, anyone?)
– consider all factors: cost, power, cooling, robustness
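A rough illustration of the RAID 5 point, with assumed numbers rather than figures from the slides: a few thousand commodity disks at an annual failure rate of a couple of percent means a failed disk every few days, and each failure starts a many-hour rebuild during which a second failure, or an unrecoverable read error, in the same array loses data; at petabyte scale that exposure is continuous rather than exceptional.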

…YES, there are “new” problems beyond “known problems scaled up”


Summary

Great success
– ODBMS-based system, migration & 2nd generation
– some DoD projects are being built on ODBMS

Lots of useful experience with managing (very) large datasets
– would not be able to achieve all that with any RDBMS (today)
– thin-server, thick-client architecture works well
– starting to help astronomers (LSST) manage their petabytes