Ceph Day Santa Clara: Ceph at DreamHost

Post on 16-Jan-2015

Dallas Kashuba, co-founder of DreamHost, walks through why the company went with Ceph, in a talk given at Ceph Day Santa Clara.

Transcript of Ceph Day Santa Clara: Ceph at DreamHost

Ceph at DreamHost

A Storage Journey

About Me

• One of the original four founders of DreamHost

• Still active daily at DreamHost

• Have spent a lot of time working on the Ops side.

• Hosting company founded in 1997

• Sage’s other company

• Shared hosting, virtual servers, dedicated servers, cloud storage, cloud computing

• 375k customers, 1.3MM websites

Storage Journey

A long strange trip

His name was Destro

... and then there were more.

The First NetApp

Remote Failover

Remote Failover

Meanwhile...

... and still more.

Lots of NetApps

• Peak of around 125 individual NetApps

• Smallish capacity on each (8TB)

• Internal software continuously moving data between NetApps

• Lots of time spent managing nearly full filers
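The talk does not describe how the internal data-moving software worked. As a rough illustration only, the kind of decision it had to make can be sketched as a greedy planner that moves volumes off nearly full filers onto the emptiest one. The function name, the filer and volume names, and the 90% high-water mark are all hypothetical; only the 8TB capacity comes from the slide.

```python
def plan_moves(filers, capacity_tb=8.0, high_water=0.9):
    """Greedy sketch: relieve filers above the high-water mark.

    `filers` maps a filer name to {volume name: size in TB}.
    Returns a list of (volume, source filer, target filer) moves.
    The 0.9 threshold is an assumption, not a figure from the talk.
    """
    limit = high_water * capacity_tb
    used = {name: sum(vols.values()) for name, vols in filers.items()}
    moves = []
    # Visit the fullest filers first.
    for name in sorted(filers, key=lambda n: used[n], reverse=True):
        # Try the smallest volumes first to keep each move cheap.
        for vol, size in sorted(filers[name].items(), key=lambda kv: kv[1]):
            if used[name] <= limit:
                break  # this filer is no longer over the mark
            target = min(used, key=used.get)  # currently emptiest filer
            if target == name or used[target] + size > limit:
                continue  # the move would not help or would overfill the target
            used[name] -= size
            used[target] += size
            moves.append((vol, name, target))
    return moves


example = {
    "filer-a": {"v1": 3.0, "v2": 2.5, "v3": 2.0},
    "filer-b": {"v4": 1.0},
    "filer-c": {"v5": 4.0},
}
print(plan_moves(example))
```

With around 125 small filers, a planner like this would be running almost constantly, which matches the operational burden the slide describes.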

Ideal

Reality

Hosting Landscape

• Included storage had grown from 50MB to gigabytes, then terabytes.

• Prices stayed the same.

• Eventually went to unlimited storage

• Usage per customer skyrocketed.

Failed Experiments

Failed Experiments

• ATAoE and XFS-based systems

• Performance & Stability issues

• 2006 era gear

Failed Experiments

• High capacity

• Nice features

• Expensive

• 85% full and it failed

Some Success

• First on Sun hardware then Supermicro

• Great stability

• Not enough IO for front-line network storage

Back to Basics

Local RAID

• SATA drives had grown in capacity and were very cheap

• 4-6TB per hosting server

• Less dependence on congested network

• Smaller failure domains

The Good

Local RAID

• No more quotas: scanning the filesystem was too slow

• No more fast failovers

• Multi-hour filesystem checks with ext3

• More failure domains

The Bad

Local RAID

• Complete RAID loss more common than anticipated

• Multiple days to fully restore from backup

The Ugly

Storage Today

Light at the end of the tunnel

Hybrid Mix

• We learned something from every step of the way

• No one size fits all when it comes to storage

• Use whatever is best for the job

• Be ready to change

Best Tool For The Job

A Bit of Everything

• Clustered NetApps and NFS for email

• Local RAID in hosting servers

• ZFS and OpenSolaris backup servers

• Ceph for DreamObjects and DreamCompute

Best Tool For The Job

• Object Storage, S3/Swift compatible

• 2+ Petabytes raw storage

• 3x replication, 900+ OSDs

• RGW behind HAProxy

• Row, rack, node and disk fault tolerant
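To make "row, rack, node and disk fault tolerant" concrete, here is a minimal sketch of placing three replicas so that no two share a failure domain at a chosen level. This is not Ceph's CRUSH algorithm, only an illustration of the constraint it enforces; the `Osd` tuple, the `place_replicas` function and the sample topology are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical description of where an OSD lives in the data center.
Osd = namedtuple("Osd", "row rack node disk")

LEVELS = ("row", "rack", "node", "disk")


def place_replicas(osds, replicas=3, level="row"):
    """Pick `replicas` OSDs with no two in the same failure domain at `level`."""
    depth = LEVELS.index(level) + 1
    chosen, seen = [], set()
    for osd in osds:
        domain = tuple(osd[:depth])  # e.g. ("r1",) for row, ("r1", "k1") for rack
        if domain in seen:
            continue  # already have a replica in this failure domain
        seen.add(domain)
        chosen.append(osd)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct failure domains for the requested replicas")


cluster = [
    Osd("r1", "k1", "n1", "d1"),
    Osd("r1", "k2", "n2", "d1"),
    Osd("r2", "k1", "n3", "d1"),
    Osd("r3", "k1", "n4", "d1"),
]
for osd in place_replicas(cluster, level="row"):
    print(osd)
```

Separating replicas at the row level means that losing an entire row still leaves two copies of every object, at the cost of needing at least three rows.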

• OpenStack-based Public Cloud

• 3+ Petabytes raw storage

• All storage is on Ceph RBD

• Boot and Attachable Volumes

• Nicira SDN + Ceph, Live Migration
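The raw capacity figures above translate into usable space once replication is taken into account. The sketch below works through that arithmetic. It assumes decimal units (1 PB = 1000 TB), treats "2+", "3+" and "900+" as lower bounds, and assumes the OpenStack cloud also uses 3x replication, which the slides only state for the object store.

```python
def usable_pb(raw_pb, replicas=3):
    """Usable capacity when every object is stored `replicas` times."""
    return raw_pb / replicas


def avg_raw_tb_per_osd(raw_pb, osd_count):
    """Average raw capacity behind each OSD, in TB (1 PB = 1000 TB)."""
    return raw_pb * 1000 / osd_count


# Object store: 2+ PB raw, 3x replication, 900+ OSDs.
print(f"object store usable: about {usable_pb(2):.2f} PB")
print(f"raw per OSD: about {avg_raw_tb_per_osd(2, 900):.2f} TB")

# Public cloud: 3+ PB raw; 3x replication is an assumption here.
print(f"cloud usable: about {usable_pb(3):.2f} PB")
```

Roughly two thirds of the raw capacity goes to redundancy under 3x replication, which is the price paid for the row, rack, node and disk fault tolerance listed earlier.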

CephFS & The Future

• The return of Failovers

• No more backup servers

• No more major disk-related outages

• Fault tolerant low cost hosting

Storage Panacea?

Thanks!

@dallas

dallas@dreamhost.com