Keynote: Building Tomorrow's Ceph - Ceph Day Frankfurt
Building Tomorrow's Ceph
Sage Weil
Research beginnings
UCSC research grant
“Petascale object storage” for the US Department of Energy: LANL, LLNL, Sandia
Scalability
Reliability
Performance: raw I/O bandwidth, metadata ops/sec
HPC file system workloads: thousands of clients writing to the same file or directory
Distributed metadata management
Innovative design: subtree-based partitioning for locality and efficiency
Dynamically adapts to the current workload
Embedded inodes
Prototype simulator in Java (2004)
First line of Ceph code: a summer internship at LLNL
A high-security national lab environment
Could write anything, as long as it was OSS
The rest of Ceph
RADOS – distributed object storage cluster (2005)
EBOFS – local object storage (2004/2006)
CRUSH – hashing for the real world (2005)
Paxos monitors – cluster consensus (2006)
→ emphasis on consistent, reliable storage
→ scale by pushing intelligence to the edges
→ a different but compelling architecture
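The core CRUSH idea, deterministic, hash-based placement that every client computes independently instead of consulting a central lookup table, can be sketched with a toy rendezvous-style selection. This is a simplification for illustration only: real CRUSH walks a weighted hierarchy of buckets and failure domains, and the function and object names below are hypothetical.

```python
import hashlib

def toy_crush(object_name: str, osds: list[str], replicas: int) -> list[str]:
    """Deterministically pick `replicas` distinct OSDs for an object.

    Every client computes the same answer from the same inputs, so no
    central placement table is needed -- the key CRUSH insight. Real
    CRUSH additionally honors device weights and failure-domain rules.
    """
    def score(osd: str) -> int:
        # Per-(object, OSD) pseudo-random rank derived from a hash.
        h = hashlib.sha256(f"{object_name}:{osd}".encode()).hexdigest()
        return int(h, 16)

    # Rank all OSDs for this object and take the top `replicas`.
    return sorted(osds, key=score, reverse=True)[:replicas]

osds = [f"osd.{i}" for i in range(8)]
placement = toy_crush("rbd_data.1234", osds, replicas=3)
print(placement)  # the same 3 OSDs on every invocation, on every client
```

Because placement is a pure function of the object name and the cluster map, intelligence moves to the edges: clients locate data themselves, which is what lets the design scale.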
Industry black hole
Many large storage vendors: proprietary solutions that don't scale well
Few open source alternatives (2006): very limited scale, or limited community and architecture (Lustre)
No enterprise feature sets (snapshots, quotas)
PhD grads all built interesting systems... and then went to work for NetApp, DDN, EMC, or Veritas
They want you, not your project
A different path
Change the world with open source Do what Linux did to Solaris, Irix, Ultrix, etc.
What could go wrong?
License: GPL? BSD? Settled on LGPL: share changes back, but okay to link to proprietary code
Avoid community-unfriendly practices: no dual licensing, no copyright assignment
Incubation
DreamHost!
Move back to Los Angeles, continue hacking
Hired a few developers
Pure development
No deliverables
Ambitious feature set
Native Linux kernel client (2007-)
Per-directory snapshots (2008)
Recursive accounting (2008)
Object classes (2009)
librados (2009)
radosgw (2009)
strong authentication (2009)
RBD: rados block device (2010)
The kernel client
ceph-fuse was limited and not very fast
Built a native Linux kernel implementation
Began attending Linux file system developer events (LSF)
Early words of encouragement from ex-Lustre devs
Engaged the Linux fs developer community as a peer
CephFS client eventually merged for v2.6.34 (early 2010)
RBD client merged in 2011
Part of a larger ecosystem
Ceph need not solve every problem as a monolithic stack
Replaced the ebofs object file system with btrfs, which shares the same design goals
Robust and well optimized
Kernel-level cache management
Copy-on-write, checksumming, and other goodness
Contributed some early functionality: file cloning, async snapshots
Budding community
#ceph on irc.oftc.net, [email protected]
Many interested users
A few developers
Many fans
Too unstable for any real deployments
Still mostly focused on the right architecture and technical solutions
Road to product
DreamHost decides to build an S3-compatible object storage service with Ceph
Stability: focus on core RADOS, RBD, and radosgw
Paying back some technical debt: build testing automation
Code review!
Expand the engineering team
The reality
Growing inbound commercial interest: early attempts from organizations large and small
Difficult to engage with a web hosting company
No means to support commercial deployments
The project needed a company to back it: fund the engineering effort
Build and test a product
Support users
Bryan built a framework to spin out of DreamHost
Launch
Do it right
How do we build a strong open source company?
How do we build a strong open source community?
Models? Red Hat, Cloudera, MySQL, Canonical, …
Initial funding from DreamHost, Mark Shuttleworth
Goals
A stable Ceph release for production deployment: DreamObjects
Lay the foundation for widespread adoption: platform support (Ubuntu, Red Hat, SUSE)
Documentation
Build and test infrastructure
Build a sales and support organization
Expand the engineering organization
Branding
Early decision to engage a professional agency: MetaDesign
Terms like “brand core” and “design system”
Keep the project and the company independent: Inktank != Ceph
“The Future of Storage”
Slick graphics, broken PowerPoint template
Today: adoption
Traction
Too many production deployments to count; we don't know about most of them!
Too many customers (for me) to count
An expansive partner list; lots of inbound interest
Lots of press and buzz
Quality
Increased adoption means increased demands on robust testing
Across multiple platforms
Upgrades: rolling upgrades and inter-version compatibility
Developer community
Significant external contributors: many full-time contributors outside of Inktank
First-class feature contributions from the community
Non-Inktank participants in daily stand-ups
External access to build/test lab infrastructure
Common toolset: GitHub
Email (kernel.org)
IRC (oftc.net)
Linux distros
CDS: Ceph Developer Summit
A community process for building the project roadmap
100% online: Google Hangouts
Wikis
Etherpad
Quarterly: our 4th CDS is next week
Great participation
Ongoing indoctrination of Inktank engineers into the open development model
Erasure coding
Replication for redundancy is flexible and fast
For larger clusters, it can be expensive
Erasure-coded data is hard to modify, but ideal for cold or read-only objects
Will be used directly by radosgw
Coexists with the new tiering capability
| Scheme | Storage overhead | Repair traffic | MTTDL (days) |
|---|---|---|---|
| 3x replication | 3x | 1x | 2.3 E10 |
| RS (10, 4) | 1.4x | 10x | 3.3 E13 |
| LRC (10, 6, 5) | 1.6x | 5x | 1.2 E15 |
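The storage-overhead column follows directly from the code parameters: a replicated pool stores n full copies, while an erasure code with k data chunks and m coding chunks stores (k+m)/k times the data. A quick check of the table's figures (helper names are my own):

```python
def replication_overhead(copies: int) -> float:
    """A replicated pool stores `copies` full copies of every object."""
    return float(copies)

def ec_overhead(k: int, m: int) -> float:
    """An erasure code splits each object into k data chunks and adds
    m coding chunks; any k of the k+m chunks can rebuild the object."""
    return (k + m) / k

print(replication_overhead(3))  # 3.0 -> the "3x" row
print(ec_overhead(10, 4))       # 1.4 -> RS (10, 4)
print(ec_overhead(10, 6))       # 1.6 -> LRC (10, 6, 5)
```

The repair-traffic trade-off is the flip side: rebuilding a lost replica copies one chunk's worth of data, while rebuilding a lost Reed-Solomon chunk reads k surviving chunks; locally repairable codes (LRC) sit in between.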
Tiering
Client-side caches are great, but only buy so much.
Separate hot and cold data onto different storage devices
Promote hot objects into a faster (e.g., flash-backed) cache pool
Push cold objects back into a slower (e.g., erasure-coded) base pool
Use bloom filters to track temperature
Common in enterprise solutions; not found in open source scale-out systems
→ new (with erasure coding) in the Firefly release
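The bloom-filter approach to temperature tracking can be sketched as follows: record recent accesses in a compact probabilistic set and treat membership as "recently hot." This is a minimal illustration of the data structure only, not Ceph's implementation (which keeps per-pool hit sets with tunable periods); the class and object names are hypothetical.

```python
import hashlib

class BloomFilter:
    """A tiny bloom filter: space-efficient set with no false negatives
    and a tunable false-positive rate."""

    def __init__(self, size_bits: int = 1 << 16, hashes: int = 4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive `hashes` independent bit positions from one hash family.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

recent = BloomFilter()
recent.add("obj.42")                 # a read was served for this object
print("obj.42" in recent)            # True: a candidate for promotion
print("obj.99" in recent)            # almost certainly False: stays cold
```

The appeal for tiering is that tracking millions of object names costs only a few kilobytes, at the price of occasional false positives (a cold object mistakenly judged warm), which is an acceptable error for a cache-promotion heuristic.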
The Future
Technical roadmap
How do we reach new use cases and users?
How do we better satisfy existing users?
How do we ensure Ceph can succeed in enough markets for supporting organizations to thrive?
Enough breadth to expand and grow the community
Enough focus to do it well
Multi-datacenter, geo-replication
Ceph was originally designed for single-DC clusters: synchronous replication, strong consistency
Growing demand
Enterprise: disaster recovery
ISPs: replicating data across sites for locality
Two strategies: use-case specific (radosgw, RBD)
A low-level capability in RADOS
RGW: Multi-site and async replication
Multi-site, multi-cluster
Regions: east coast, west coast, etc.
Zones: radosgw sub-cluster(s) within a region
Can federate across the same or multiple Ceph clusters
Sync user and bucket metadata across regions: a global bucket/user namespace, like S3
Synchronize objects across zones, within the same region or across regions
Admin control over which zones are master and which are slaves
RBD: block devices
Today: backup capability based on block device snapshots
Efficiently mirror changes between consecutive snapshots across clusters
Now supported/orchestrated by OpenStack
Good for coarse synchronization (e.g., hours or days)
Tomorrow: data journaling for async mirroring; a pending blueprint at next week's CDS
Mirror an active block device to a remote cluster
Possibly with some configurable delay
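The snapshot-based mirroring described above boils down to: diff two consecutive point-in-time images and ship only the changed blocks to the remote cluster (Ceph exposes this as `rbd export-diff` and `rbd import-diff`). A toy model of the idea, with blocks as a dict from offset to data; all names here are illustrative:

```python
def diff(snap_old: dict[int, bytes], snap_new: dict[int, bytes]) -> dict[int, bytes]:
    """Blocks that changed (or appeared) between two consecutive snapshots."""
    return {blk: data for blk, data in snap_new.items()
            if snap_old.get(blk) != data}

def apply_diff(remote: dict[int, bytes], delta: dict[int, bytes]) -> None:
    """Replay a diff onto the remote cluster's copy of the image."""
    remote.update(delta)

snap1 = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
snap2 = {0: b"AAAA", 1: b"BETA", 2: b"CCCC", 3: b"DDDD"}

remote = dict(snap1)        # the remote cluster already holds snap1
delta = diff(snap1, snap2)  # only blocks 1 and 3 cross the WAN
apply_diff(remote, delta)
print(remote == snap2)      # True
print(sorted(delta))        # [1, 3]
```

Because only the delta travels, coarse periodic sync is cheap even for large images; the journaling approach proposed for the future replaces the periodic diff with a continuous stream of writes.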
Async replication in RADOS
One implementation to capture multiple use cases: RBD, CephFS, RGW, and anything else built on RADOS
A harder problem
Scalable: 1000s of OSDs → 1000s of OSDs
Point-in-time consistency
A challenging research problem
→ Ongoing design discussion among developers
CephFS
→ This is where it all started – let's get there
Today: stabilization of multi-MDS, directory fragmentation, and QA
NFS, CIFS, and Hadoop/HDFS bindings are complete but not productized
Needed: greater QA investment
Fsck
Snapshots
Amazing community effort (Intel, NUDT, and Kylin); 2014 is the year
Governance
How do we strengthen the project community?
2014 is the year
Recognized project leads RBD, RGW, RADOS, CephFS, ...
Formalize emerging processes around CDS and the community roadmap
An external foundation?
The larger ecosystem
The enterprise
How do we pay for all of this?
Support legacy and transitional client/server interfaces: iSCSI, NFS, pNFS, CIFS, S3/Swift
VMware, Hyper-V
Identify the beachhead use cases; earn the others later
A single platform: a shared storage resource
Bottom-up: earn the respect of engineers and admins
Top-down: a strong brand and a compelling product
Why Ceph is the Future of Storage
It is hard to compete with free and open source software
Unbeatable value proposition
Ultimately a more efficient development model
It is hard to manufacture community
Strong foundational architecture
Next-generation protocols and Linux kernel support, unencumbered by legacy protocols like NFS
A move from client/server to client/cluster
An ongoing paradigm shift: software-defined infrastructure and data centers
Widespread demand for open platforms
Thank you, and Welcome!