BACD LA 2013 - Scaling Storage with Ceph
Transcript of BACD LA 2013 - Scaling Storage with Ceph
![Page 1: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/1.jpg)
SCALING STORAGE WITH CEPH
Ross Turk, Inktank
![Page 2: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/2.jpg)
![Page 3: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/3.jpg)
![Page 4: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/4.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 5: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/5.jpg)
IN THE BEGINNING Magic Madzik, Flickr / CC BY 2.0
![Page 6: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/6.jpg)
EARLY INFORMATION STORAGE Chico.Ferreira, Flickr / CC BY 2.0
![Page 7: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/7.jpg)
WRITING > CAVE PAINTINGS kevingessner, Flickr / CC BY-SA 2.0
![Page 8: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/8.jpg)
x1000
== x1
![Page 9: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/9.jpg)
PEOPLE BEGIN WRITING A LOT Moyan_Brenn, Flickr / CC BY-ND 2.0
![Page 10: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/10.jpg)
WRITING IS TIME-CONSUMING trekkyandy, Flickr / CC BY 2.0
![Page 11: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/11.jpg)
THE INDUSTRIALIZATION OF WRITING FateDenied, Flickr / CC BY 2.0
![Page 12: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/12.jpg)
x1000
== x1
tape + magnet = magnetic tape
![Page 13: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/13.jpg)
STORAGE BECOMES MECHANICAL Erik Pitti, Wikipedia / CC BY-ND 2.0
![Page 14: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/14.jpg)
HUMAN ROCK
HUMAN INK PAPER
HUMAN COMPUTER TAPE
![Page 15: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/15.jpg)
COMPUTERS NEED PEOPLE TO WORK USDAgov, Flickr / CC BY 2.0
![Page 16: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/16.jpg)
HUMAN COMPUTER TAPE
![Page 17: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/17.jpg)
11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010 01010110 01010011
==
![Page 18: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/18.jpg)
THROUGHPUT BECOMES IMPORTANT Zane Luke, Flickr / CC BY-ND 2.0
![Page 19: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/19.jpg)
LAZ0R B3AMS CHANGE EVERYTHING!! Jeff Kubina, Flickr / CC-BY-SA 2.0
![Page 20: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/20.jpg)
HARD DRIVES ARE TOTALLY BETTER
amazing spinny hard drives sucky stupid tape slow
![Page 21: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/21.jpg)
EVERYTHING GETS MESSY Rob!, Flickr / CC BY 2.0
![Page 22: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/22.jpg)
(Diagram: binary data blocks scattered among directory-tree nodes aa, ab, ac, ba, bb, bc, da, db, dc)
![Page 23: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/23.jpg)
file
owner: rturk
created: aug12
last viewed: aug17
size: 42025
perms: 644
11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
![Page 24: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/24.jpg)
(Diagram: the same directory tree of nodes aa through dc, with one file's data labeled 01 10)
![Page 25: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/25.jpg)
WE OUTGROW THE HARD DRIVE Mr. T in DC, Flickr / CC BY 2.0
![Page 26: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/26.jpg)
(Diagram: one HUMAN, one COMPUTER, seven DISKs)
![Page 27: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/27.jpg)
PEOPLE NEED SIMULTANEOUS ACCESS wFourier, Flickr / CC BY 2.0
![Page 28: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/28.jpg)
(Diagram: three HUMANs, one COMPUTER, seven DISKs)
![Page 29: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/29.jpg)
(actually more like this…)
(Diagram: one (COMPUTER), twelve DISKs, and a crowd of HUMANs)
![Page 30: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/30.jpg)
(Diagram: three HUMANs and twelve DISK COMPUTER nodes)
![Page 31: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/31.jpg)
(Diagram: the same directory tree of nodes aa through dc, marked with an X)
![Page 32: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/32.jpg)
object
pace: quick
driver: frog
license: expired
expression: agog
11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
![Page 33: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/33.jpg)
(Diagram: one APP and twelve DISK COMPUTER nodes)
![Page 34: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/34.jpg)
(Diagram: a cluster of DISK COMPUTER nodes, plus one more COMPUTER with its own DISK)
![Page 35: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/35.jpg)
(Diagram: a cluster of DISK COMPUTER nodes, plus one COMPUTER running three VMs)
![Page 36: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/36.jpg)
STORAGE THROUGHOUT HISTORY Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.
(Timeline: Painting, Writing, Computers, Shared storage, Distributed storage, Cloud computing, Ceph)
![Page 37: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/37.jpg)
(Diagram: three HUMANs and twelve DISK COMPUTER nodes)
![Page 38: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/38.jpg)
(Diagram: twelve DISK COMPUTER nodes)
![Page 39: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/39.jpg)
(Diagram: the same twelve nodes, each abbreviated to D C)
![Page 40: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/40.jpg)
(Diagram: three HUMANs and twelve D C nodes)
![Page 41: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/41.jpg)
STORAGE APPLIANCES Michael Moll, Wikipedia / CC BY-SA 2.0
![Page 42: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/42.jpg)
6.4 MILLION SQFT OF FACTORIES Dude94111, Flickr / CC BY 2.0
![Page 43: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/43.jpg)
STORAGE VENDORS HAVE BIG BILLS CarbonNYC, Flickr / CC BY 2.0
![Page 44: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/44.jpg)
STORAGE APPLIANCES ARE EXPENSIVE 401K 2012, Flickr / CC BY-SA 2.0
![Page 45: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/45.jpg)
TECHNOLOGY IS A COMMODITY RaeAllen, Flickr / CC-BY 2.0
![Page 46: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/46.jpg)
COMMODITY PRICES FLUCTUATE
(Chart: x-axis from May-07 to May-12)
![Page 47: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/47.jpg)
GROWING WITH HARDWARE APPLIANCES
§ First PB
  § Proprietary storage hardware
  § Well-known storage vendor
  § $14 b’zillion
§ Second PB
  § Proprietary storage hardware
  § Same storage vendor
  § Another $14 b’zillion
(Diagram: 24 DC nodes)
![Page 48: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/48.jpg)
APPLIANCES ARE OLD TECHNOLOGY Paul Keller, Flickr / CC BY 2.0
![Page 49: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/49.jpg)
Source: http://www.cpubenchmark.net/high_end_cpus.html
![Page 50: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/50.jpg)
FLAGSHIP HARDWARE APPLIANCE
![Page 51: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/51.jpg)
Hardware Appliances are Mysterious Black Boxes Abode of Chaos, Flickr / CC BY 2.0
![Page 52: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/52.jpg)
(Diagram: twelve DC nodes, one split into D and C, with C++ code aimed at it)
![Page 53: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/53.jpg)
(Diagram: twelve DC nodes, one split into D and C, with C++ code aimed at it: X)
![Page 54: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/54.jpg)
(Diagram: twelve DC nodes and a HUMAN [DEVELOPER]: !!)
![Page 55: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/55.jpg)
THE WORLD NEEDS
A STORAGE TECHNOLOGY THAT
SCALES INFINITELY
![Page 56: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/56.jpg)
THE WORLD NEEDS
A STORAGE TECHNOLOGY THAT DOESN’T REQUIRE
AN INDUSTRIAL
MANUFACTURING PROCESS
![Page 57: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/57.jpg)
SAGE WEIL
§ Co-founder of DreamHost
§ Inventor of Ceph
§ CEO of Inktank
![Page 58: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/58.jpg)
OPEN SOURCE
philosophy design
![Page 59: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/59.jpg)
OPEN SOURCE SPREADS IDEAS orchidgalore, Flickr / CC BY 2.0
![Page 60: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/60.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
philosophy design
![Page 61: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/61.jpg)
WE ARE SMARTER TOGETHER rturk, Linkedin Inmap
![Page 62: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/62.jpg)
CEPH BELONGS TO ALL OF US wackybadger, Flickr / CC BY 2.0
![Page 63: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/63.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
SCALABLE
philosophy design
![Page 64: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/64.jpg)
CEPH IS BUILT TO SCALE
Too much for a book
Too much for a drive
Too much for a computer
Too much for a room
Ceph
Too much for a cave
![Page 65: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/65.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
SCALABLE
NO SINGLE POINT OF FAILURE
philosophy design
![Page 66: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/66.jpg)
ARIOLIMAX CALIFORNICUS aroid, Flickr / CC BY 2.0
![Page 67: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/67.jpg)
THE OCTOPUS (A METAPHOR) I love speaking in metaphors.
single point of failure
highly-available replicated
![Page 68: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/68.jpg)
THE BEEHIVE (ANOTHER METAPHOR) blumenbiene, Flickr / CC BY 2.0
![Page 69: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/69.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
SCALABLE
NO SINGLE POINT OF FAILURE
SOFTWARE BASED
philosophy design
![Page 70: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/70.jpg)
(Diagram: twelve DC nodes, one split into D and C, with C++ code aimed at it)
![Page 71: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/71.jpg)
(Diagram: twelve DC nodes, one split into D and C, with C++ code aimed at it: ✔)
![Page 72: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/72.jpg)
OPEN SOURCE
COMMUNITY-FOCUSED
SCALABLE
NO SINGLE POINT OF FAILURE
SOFTWARE BASED
SELF-MANAGING
philosophy design
![Page 73: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/73.jpg)
DISKS = JUST TINY RECORD PLAYERS jon_a_ross, Flickr / CC BY 2.0
![Page 74: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/74.jpg)
(Diagram: D x 1 MILLION = a D failing 55 times / day)
![Page 75: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/75.jpg)
![Page 76: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/76.jpg)
IT ALL STARTED WITH A DREAM
![Page 77: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/77.jpg)
+
![Page 78: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/78.jpg)
NEW MONTHLY CODE COMMITS
(Chart: y-axis from 0 to 700 commits, x-axis from 2004-06 to 2011-07)
![Page 79: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/79.jpg)
CEPH STARTS POPPING UP!
(sorry about all the logo tampering)
![Page 80: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/80.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 81: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/81.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 82: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/82.jpg)
(Diagram: five OSDs, each on an FS (btrfs, xfs, ext4) on a DISK, plus three monitors (M))
![Page 83: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/83.jpg)
M
M
M
HUMAN
![Page 84: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/84.jpg)
Monitors:
§ Maintain cluster map
§ Provide consensus for distributed decision-making
§ Must have an odd number
§ Do not serve stored objects to clients
OSDs:
§ One per disk (recommended)
§ At least three in a cluster
§ Serve stored objects to clients
§ Intelligently peer to perform replication tasks
§ Support object classes
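The "odd number" advice for monitors can be checked with a little arithmetic. This is a minimal sketch, assuming consensus needs a strict majority of monitors to agree; the function names are made up for illustration and are not part of Ceph.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """Monitors that can be lost while a majority can still be formed."""
    return monitors - quorum_size(monitors)


# Print the trade-off for small monitor counts.
for n in range(1, 8):
    print(f"{n} monitors: quorum of {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Under that assumption, going from three monitors to four does not buy any extra failure tolerance, which is one way to read the slide's preference for odd counts.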
![Page 85: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/85.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 86: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/86.jpg)
(Diagram: an APP with LIBRADOS, a native connection to the cluster, three monitors (M))
![Page 87: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/87.jpg)
LIBRADOS
§ Provides direct access to RADOS for applications
§ C, C++, Python, PHP, Java
§ No HTTP overhead
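To make "direct access" concrete, here is a hedged sketch of storing and reading back one object from Python. `store_and_fetch` only relies on the `write_full` and `read` methods that the `rados` Python binding exposes on an I/O context. `InMemoryIoctx` is a hypothetical stand-in so the example runs without a cluster, and the pool name `data` and the config path are assumptions.

```python
def store_and_fetch(ioctx, name: str, data: bytes) -> bytes:
    """Write a whole object, then read it back through the same I/O context."""
    ioctx.write_full(name, data)
    return ioctx.read(name)


class InMemoryIoctx:
    """Hypothetical stand-in that mimics the two ioctx calls used above."""

    def __init__(self):
        self._objects = {}

    def write_full(self, key, data):
        self._objects[key] = bytes(data)

    def read(self, key, length=8192, offset=0):
        return self._objects[key][offset:offset + length]


def demo_against_cluster(conffile="/etc/ceph/ceph.conf", pool="data"):
    """Same call against a real cluster; needs the rados binding and a pool."""
    import rados  # imported here so the sketch still loads without Ceph

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            return store_and_fetch(ioctx, "greeting", b"hello ceph")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()


# Runs anywhere: exercise the helper against the in-memory stand-in.
print(store_and_fetch(InMemoryIoctx(), "greeting", b"hello ceph"))
```

`demo_against_cluster` is deliberately not called here, since it only works on a machine that can reach a running cluster.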
![Page 88: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/88.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 89: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/89.jpg)
(Diagram: two APPs speaking REST to two RADOSGW instances, each with LIBRADOS and a native connection to the cluster; three monitors (M))
![Page 90: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/90.jpg)
RADOS Gateway:
§ REST-based interface to RADOS
§ Supports buckets, accounting
§ Compatible with S3 and Swift applications
![Page 91: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/91.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
![Page 92: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/92.jpg)
(Diagram: a VM in a VIRTUALIZATION CONTAINER with LIBRBD and LIBRADOS; three monitors (M))
![Page 93: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/93.jpg)
(Diagram: two CONTAINERs, each with LIBRBD and LIBRADOS, and one VM; three monitors (M))
![Page 94: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/94.jpg)
(Diagram: HOST, KRBD (KERNEL MODULE), LIBRADOS, three monitors (M))
![Page 95: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/95.jpg)
RADOS Block Device:
§ Storage of virtual disks in RADOS
§ Allows decoupling of VMs and containers
  § Live migration!
§ Images are striped across the cluster
§ Boot support in QEMU, KVM, and OpenStack Nova
§ Mount support in the Linux kernel
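"Images are striped across the cluster" can be pictured as cutting the virtual disk into fixed-size objects. The sketch below assumes a 4 MiB object size and a made-up naming scheme (`image_id.object_number`). Both are illustrative assumptions, not values taken from the talk.

```python
from typing import List, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size in bytes (4 MiB)


def locate(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Map a byte offset in the image to (object number, offset inside it)."""
    return divmod(offset, object_size)


def objects_touched(offset: int, length: int,
                    object_size: int = OBJECT_SIZE) -> List[int]:
    """Object numbers covered by a read or write of `length` bytes."""
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return list(range(first, last + 1))


def object_name(image_id: str, object_no: int) -> str:
    """Hypothetical object name: image id plus zero-padded hex object number."""
    return f"{image_id}.{object_no:016x}"


# A 2 MiB write starting 3 MiB into the image spans two objects.
for number in objects_touched(3 * 1024 * 1024, 2 * 1024 * 1024):
    print(object_name("vm-disk", number))
```

Because each object can be placed independently, a large image ends up spread over many nodes instead of living on one disk.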
![Page 96: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/96.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift
APP APP HOST/VM CLIENT
![Page 97: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/97.jpg)
(Diagram: a CLIENT with separate data (01 10) and metadata paths into the cluster; three monitors (M))
![Page 98: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/98.jpg)
Metadata Server
§ Manages metadata for a POSIX-compliant shared filesystem
  § Directory hierarchy
  § File metadata (owner, timestamps, mode, etc.)
§ Stores metadata in RADOS
§ Does not serve file data to clients
§ Only required for shared filesystem
![Page 99: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/99.jpg)
WHAT MAKES CEPH UNIQUE?
![Page 100: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/100.jpg)
HOW DO YOU FIND YOUR KEYS? azmeen, Flickr / CC BY 2.0
![Page 101: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/101.jpg)
(Diagram: an APP facing twelve D C nodes: ??)
![Page 102: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/102.jpg)
(Diagram: an APP looking for F* among twelve D C nodes grouped by name range: A-G, H-N, O-T, U-Z)
![Page 103: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/103.jpg)
I ALWAYS PUT MY KEYS ON THE HOOK vitamindave, Flickr / CC BY 2.0
![Page 104: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/104.jpg)
(Diagram: an APP and twelve D C nodes)
![Page 105: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/105.jpg)
DEAR DIARY: KEYS = IN THE KITCHEN Barnaby, Flickr / CC BY 2.0
![Page 106: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/106.jpg)
HOW DO YOU FIND YOUR KEYS
WHEN YOUR HOUSE IS
INFINITELY BIG AND
ALWAYS CHANGING?
![Page 107: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/107.jpg)
THE ANSWER: CRUSH!! pasukaru76, Flickr / CC SA 2.0
![Page 108: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/108.jpg)
10 10 01 01 10 10 01 11 01 10
10 10 01 01 10 10 01 11 01 10
hash(object name) % num pg
CRUSH(pg, cluster state, rule set)
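The first of the two steps above, `hash(object name) % num pg`, can be sketched in a few lines. SHA-256 is used here only because it ships with Python and gives the same answer on every machine. It is an assumption for illustration and not the hash Ceph itself uses.

```python
import hashlib
from collections import Counter


def stable_hash(name: str) -> int:
    """Deterministic 64-bit hash of a name (stand-in hash function)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(object_name: str, num_pg: int) -> int:
    """Step one: map an object name to a placement group number."""
    return stable_hash(object_name) % num_pg


# Any client can compute the same answer without asking a lookup service.
NUM_PG = 8
names = [f"object-{i}" for i in range(10_000)]
spread = Counter(object_to_pg(name, NUM_PG) for name in names)
for pg in sorted(spread):
    print(f"pg {pg}: {spread[pg]} objects")
```

The point is that the mapping is a calculation, not a table: there is nothing to look up and nothing to keep in sync.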
![Page 109: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/109.jpg)
10 10 01 01 10 10 01 11 01 10
10 10 01 01 10 10 01 11 01 10
![Page 110: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/110.jpg)
CRUSH
§ Pseudo-random placement algorithm
§ Ensures even distribution
§ Repeatable, deterministic
§ Rule-based configuration
  § Replica count
  § Infrastructure topology
  § Weighting
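The properties listed for CRUSH can be imitated with a much simpler technique. The sketch below is not CRUSH: it swaps in weighted rendezvous hashing as a stand-in, plus a toy rule that no two replicas share a host. The OSD names, hosts, and weights are all made up.

```python
import hashlib
import math


def unit_hash(pg: int, osd: str) -> float:
    """Deterministic pseudo-random number strictly between 0 and 1."""
    digest = hashlib.sha256(f"{pg}:{osd}".encode("utf-8")).digest()
    bits = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (bits + 0.5) / 2 ** 52


def score(pg: int, osd: str, weight: float) -> float:
    """Weighted rendezvous score: heavier OSDs win proportionally more often."""
    return -weight / math.log(unit_hash(pg, osd))


def place(pg: int, osds: dict, replicas: int) -> list:
    """Pick `replicas` OSDs for a placement group, at most one per host.

    `osds` maps an OSD name to a (host, weight) pair.
    """
    ranked = sorted(osds, key=lambda name: score(pg, name, osds[name][1]),
                    reverse=True)
    chosen, used_hosts = [], set()
    for name in ranked:
        host = osds[name][0]
        if host in used_hosts:
            continue  # toy topology rule: spread replicas across hosts
        chosen.append(name)
        used_hosts.add(host)
        if len(chosen) == replicas:
            break
    return chosen


# Hypothetical cluster: four hosts with two OSDs each; host-d has bigger disks.
OSDS = {
    "osd.0": ("host-a", 1.0), "osd.1": ("host-a", 1.0),
    "osd.2": ("host-b", 1.0), "osd.3": ("host-b", 1.0),
    "osd.4": ("host-c", 1.0), "osd.5": ("host-c", 1.0),
    "osd.6": ("host-d", 2.0), "osd.7": ("host-d", 2.0),
}
for pg in range(4):
    print(pg, place(pg, OSDS, replicas=3))
```

Like the slide's description, the result is repeatable (same inputs, same answer), driven by a replica count, a notion of topology, and weights. Real CRUSH works over a full hierarchy with configurable rules, which this sketch does not attempt.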
![Page 111: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/111.jpg)
CLIENT
??
![Page 112: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/112.jpg)
![Page 113: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/113.jpg)
![Page 114: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/114.jpg)
CLIENT
??
![Page 115: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/115.jpg)
(Diagram: a VM in a VIRTUALIZATION CONTAINER with LIBRBD and LIBRADOS; three monitors (M))
![Page 116: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/116.jpg)
HOW DO YOU SPIN UP
THOUSANDS OF VMs INSTANTLY
AND EFFICIENTLY?
![Page 117: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/117.jpg)
(Diagram: instant copy. Original image: 144; four copies: 0 each; total = 144)
![Page 118: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/118.jpg)
(Diagram: a CLIENT writes to a copy. Copy: 4; original: 144; total = 148)
![Page 119: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/119.jpg)
(Diagram: a CLIENT reads from the copy. Copy: 4; original: 144; total = 148)
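The 144 / 148 arithmetic on these slides is what copy-on-write layering produces. Here is a minimal sketch, assuming a copy stores only the blocks written to it and falls back to its parent for everything else. The `Image` class is hypothetical and is not the RBD API.

```python
class Image:
    """Toy block image with optional copy-on-write parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}  # only blocks written to this image live here

    def clone(self):
        """Instant copy: a new, empty image that points at this one."""
        return Image(parent=self)

    def write(self, index, data):
        self.blocks[index] = data

    def read(self, index):
        if index in self.blocks:
            return self.blocks[index]
        if self.parent is not None:
            return self.parent.read(index)  # fall through to the parent
        return b""

    def stored_blocks(self):
        return len(self.blocks)


# A golden image of 144 blocks, copied four times at no extra cost.
golden = Image()
for i in range(144):
    golden.write(i, b"base")
copies = [golden.clone() for _ in range(4)]
total = golden.stored_blocks() + sum(c.stored_blocks() for c in copies)
print("after cloning:", total)

# One copy diverges by four blocks; only those four are stored again.
vm = copies[0]
for i in range(4):
    vm.write(i, b"new")
total = golden.stored_blocks() + sum(c.stored_blocks() for c in copies)
print("after writes:", total)
```

Reads of untouched blocks come from the original, so a thousand copies of one image start out costing about as much as one.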
![Page 120: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/120.jpg)
HOW DO YOU MANAGE
DIRECTORY HIERARCHY WITHOUT
A SINGLE POINT OF FAILURE?
![Page 121: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/121.jpg)
FILESYSTEMS REQUIRE METADATA Barnaby, Flickr / CC BY 2.0
![Page 122: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/122.jpg)
M
M
M
CLIENT
01 10
![Page 123: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/123.jpg)
M
M
M
![Page 124: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/124.jpg)
one tree
three metadata servers
??
![Page 125: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/125.jpg)
![Page 126: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/126.jpg)
![Page 127: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/127.jpg)
![Page 128: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/128.jpg)
![Page 129: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/129.jpg)
DYNAMIC SUBTREE PARTITIONING
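One way to picture splitting one tree across three metadata servers is to hand out subtrees by load and redo the split when load shifts. The sketch below is a heavy simplification under that assumption: it uses a greedy heaviest-first assignment, and the directory names and load figures are invented.

```python
import heapq


def partition(subtree_load: dict, num_mds: int) -> dict:
    """Assign each subtree to a metadata server, heaviest subtree first,
    always onto the server carrying the least load so far."""
    servers = [(0, rank) for rank in range(num_mds)]
    heapq.heapify(servers)
    assignment = {}
    for path, load in sorted(subtree_load.items(), key=lambda kv: -kv[1]):
        total, rank = heapq.heappop(servers)
        assignment[path] = rank
        heapq.heappush(servers, (total + load, rank))
    return assignment


# Hypothetical request rates per top-level directory.
load = {"/home": 50, "/var": 30, "/usr": 20, "/tmp": 10}
before = partition(load, num_mds=3)
print(before)

# The workload changes: /tmp gets hot, so the split is recomputed.
load["/tmp"] = 200
after = partition(load, num_mds=3)
print(after)
```

The "dynamic" part is the second call: the same tree ends up divided differently once the hot spot moves.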
![Page 130: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/130.jpg)
AND NOW BACKPEDALING
![Page 131: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/131.jpg)
ALMOST EVERYTHING
WORKS
![Page 132: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/132.jpg)
RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes [AWESOME]
LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP [AWESOME]
RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver [AWESOME]
RADOSGW A bucket-based REST gateway, compatible with S3 and Swift [AWESOME]
APP APP HOST/VM CLIENT
CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE [NEARLY AWESOME]
![Page 133: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/133.jpg)
LAN SCALE!! *
* OR REALLY REALLY SCARY FAST WAN
![Page 134: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/134.jpg)
CEPH AND CLOUDSTACK tableatny, Flickr / CC BY 2.0
![Page 135: BACD LA 2013 - Scaling Storage with Ceph](https://reader034.fdocuments.net/reader034/viewer/2022042813/544f33c5af7959c4068b5e5d/html5/thumbnails/135.jpg)
RBD SUPPORT IN CLOUDSTACK
§ Just announced two weeks ago!
§ Allows storage of virtual disks inside RADOS
§ Works with KVM only right now
§ No volume snapshots yet
§ Requires the latest version of, um, everything
§ More information can be found on the mailing list:
  § ceph-devel / incubator-cloudstack-dev: http://article.gmane.org/gmane.comp.file-systems.ceph.devel/7505