Post on 16-Jan-2015
Ceph at DreamHost
A Storage Journey
About Me
• One of the original four of DreamHost
• Still active daily at DreamHost
• Have spent a lot of time working on the Ops side.
• Hosting company founded in 1997
• Sage’s other company
• shared hosting, virtual servers, dedicated servers, cloud storage, cloud computing
• 375k customers, 1.3MM websites
Storage Journey
A long strange trip
His name was Destro
... and then there were more.
The First NetApp
Remote Failover
Meanwhile...
... and still more.
Lots of NetApps
• Peak of around 125 individual NetApps
• Smallish capacity on each (8TB)
• Internal software continuously moving data between NetApps
• Lots of time spent managing nearly full filers
Ideal
Reality
Hosting Landscape
• Included storage had grown from 50MB to gigabytes, then terabytes.
• Prices stayed the same.
• Eventually went to unlimited storage
• Usage per customer skyrocketed.
Failed Experiments
• ATAoE and XFS-based systems
• Performance & Stability issues
• 2006 era gear
Failed Experiments
• High capacity
• Nice features
• Expensive
• 85% full and it failed
Some Success
• First on Sun hardware then Supermicro
• Great stability
• Not enough IO for front-line network storage
Back to Basics
Local RAID
• SATA drives had grown in capacity and were very cheap
• 4-6TB per hosting server
• Less dependence on congested network
• Smaller failure domains
The Good
Local RAID
• No more quota, too slow to scan filesystem
• No more fast failovers
• Multiple hour filesystem check with ext3
• More failure domains
The Bad
Local RAID
• Complete RAID loss more common than anticipated
• Multiple days to fully restore from backup
The Ugly
Storage Today
Light at the end of the tunnel
Hybrid Mix
• We learned something from every step of the way
• No one size fits all when it comes to storage
• Use whatever is best for the job
• Be ready to change
Best Tool For The Job
A Bit of Everything
• Clustered NetApps and NFS for
• Local RAID in hosting servers
• ZFS and OpenSolaris backup servers
• Ceph for DreamObjects and DreamCompute
Best Tool For The Job
• Object Storage, S3/Swift compatible
• 2+ Petabytes raw storage
• 3x replication, 900+ OSDs
• RGW behind HAProxy
• Row, rack, node and disk fault tolerant
• OpenStack-based Public Cloud
• 3+ Petabytes raw storage
• All storage is on Ceph RBD
• Boot and Attachable Volumes
• Nicira SDN + Ceph, Live Migration
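As a rough illustration of what the raw-capacity figures above mean in practice, here is a minimal sketch of the replication arithmetic. It assumes decimal units (1 PB = 1000 TB) and treats "2+ Petabytes" and "900+ OSDs" as lower bounds. The function names are hypothetical, and the sketch ignores filesystem overhead and the free-space headroom a real cluster needs. The slides state 3x replication only for the object store, so the replica count is left as a parameter.

```python
def usable_capacity_pb(raw_pb: float, replicas: int = 3) -> float:
    """Approximate usable capacity when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_pb / replicas


def avg_raw_per_osd_tb(raw_pb: float, osd_count: int) -> float:
    """Average raw capacity per OSD in TB, assuming 1 PB = 1000 TB."""
    if osd_count < 1:
        raise ValueError("osd_count must be at least 1")
    return raw_pb * 1000 / osd_count


if __name__ == "__main__":
    # Lower-bound figures from the slide: 2 PB raw, 3x replication, 900 OSDs.
    print(f"usable: about {usable_capacity_pb(2.0):.2f} PB")
    print(f"per OSD: about {avg_raw_per_osd_tb(2.0, 900):.2f} TB raw")
```

With the slide's lower-bound numbers, 3x replication leaves roughly a third of the raw capacity for customer data, which is the price paid for surviving disk, node, rack and row failures.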
CephFS & The Future
• The return of Failovers
• No more backup servers
• No more major disk-related outages
• Fault tolerant low cost hosting
Storage Panacea?
Thanks!
@dallas
dallas@dreamhost.com