Ceph Day London 2014 - The current state of CephFS development


John Spray, Red Hat

Transcript of Ceph Day London 2014 - The current state of CephFS development

Page 1: Ceph Day London 2014 - The current state of CephFS development

CephFS Update

John Spray, [email protected]

Ceph Day London

Page 2: Ceph Day London 2014 - The current state of CephFS development


Agenda

● Introduction to distributed filesystems

● Architectural overview

● Recent development

● Test & QA

Page 3: Ceph Day London 2014 - The current state of CephFS development


Distributed filesystems... and why they are hard.

Page 4: Ceph Day London 2014 - The current state of CephFS development


Interfaces to storage

● Object: Ceph RGW, S3, Swift

● Block (aka SAN): Ceph RBD, iSCSI, FC, SAS

● File (aka scale-out NAS): CephFS, GlusterFS, Lustre, proprietary filers

Page 5: Ceph Day London 2014 - The current state of CephFS development


Interfaces to storage

[Diagram: the three Ceph interfaces and their features — OBJECT STORAGE (RGW: S3 & Swift, multi-tenant, Keystone integration, geo-replication), BLOCK STORAGE (RBD: snapshots, clones, Linux kernel, iSCSI, OpenStack), and FILE SYSTEM (CephFS: POSIX, Linux kernel, CIFS/NFS, HDFS, distributed metadata) — all built on a native API.]

Page 6: Ceph Day London 2014 - The current state of CephFS development


Object stores scale out well

● Last-writer-wins consistency

● Consistency rules only apply to one object at a time

● Clients are stateless (unless explicitly doing lock ops)

● No relationships exist between objects

● Objects have exactly one name

● Scale-out accomplished by mapping objects to nodes

● Single objects may be lost without affecting others

Page 7: Ceph Day London 2014 - The current state of CephFS development


POSIX filesystems are hard to scale out

● Extents written from multiple clients must win or lose on an all-or-nothing basis → locking

● Inodes depend on one another (directory hierarchy)

● Clients are stateful: holding files open

● Users have local-filesystem latency expectations: applications assume FS client will do lots of metadata caching for them.

● Scale-out requires spanning inode/dentry relationships across servers

● Loss of data can damage whole subtrees

Page 8: Ceph Day London 2014 - The current state of CephFS development


Failure cases increase complexity further

● What should we do when...?
  ● Filesystem is full
  ● Client goes dark
  ● An MDS goes dark
  ● Memory is running low
  ● Clients are competing for the same files
  ● Clients misbehave

● These are hard problems in distributed systems generally, and they are especially hard when we must uphold POSIX semantics designed for local systems.

Page 9: Ceph Day London 2014 - The current state of CephFS development


Terminology

● inode: a file. Has unique ID, may be referenced by one or more dentries.

● dentry: a link between an inode and a directory

● directory: special type of inode that has 0 or more child dentries

● hard link: many dentries referring to the same inode

● These terms originate from the original (local disk) filesystems, where they described how a filesystem was represented on disk.
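A quick illustration of these terms using hard links (generic shell, not CephFS-specific):

touch myfile          # creates an inode, referenced by one dentry
ln myfile mylink      # a second dentry referring to the same inode
ls -i myfile mylink   # both names print the same inode number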

Page 10: Ceph Day London 2014 - The current state of CephFS development


Architectural overview

Page 11: Ceph Day London 2014 - The current state of CephFS development


CephFS architecture

● Dynamically balanced scale-out metadata

● Inherit flexibility/scalability of RADOS for data

● POSIX compatibility

● Beyond POSIX: Subtree snapshots, recursive statistics

Weil, Sage A., et al. "Ceph: A scalable, high-performance distributed file system." Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI '06). USENIX Association, 2006. http://ceph.com/papers/weil-ceph-osdi06.pdf

Page 12: Ceph Day London 2014 - The current state of CephFS development


Components

● Client: kernel, fuse, libcephfs
● Server: MDS daemon
● Storage: RADOS cluster (mons & OSDs)

Page 13: Ceph Day London 2014 - The current state of CephFS development


Components

[Diagram: a Linux host running the ceph.ko kernel client exchanges metadata with the MDS and data with the OSDs; monitors (M) track cluster state.]

Page 14: Ceph Day London 2014 - The current state of CephFS development


From application to disk

[Diagram: the I/O path — Application → client (libcephfs, ceph-fuse, or kernel client) → client network protocol → ceph-mds and RADOS → Disk.]

Page 15: Ceph Day London 2014 - The current state of CephFS development


Scaling out FS metadata

● Options for distributing metadata:
  – by static subvolume
  – by path hash
  – by dynamic subtree

● Consider performance, ease of implementation

Page 16: Ceph Day London 2014 - The current state of CephFS development


DYNAMIC SUBTREE PARTITIONING

Page 17: Ceph Day London 2014 - The current state of CephFS development


Dynamic subtree placement

● Locality: get the dentries in a dir from one MDS

● Support read-heavy workloads by replicating non-authoritative copies (cached with capabilities, just like clients do)

● In practice, we work at the directory-fragment level in order to handle large directories

Page 18: Ceph Day London 2014 - The current state of CephFS development


Data placement

● Stripe file contents across RADOS objects
  ● get full RADOS cluster bandwidth from clients
  ● delegate all placement/balancing to RADOS

● Control striping with layout vxattrs (example below)
  ● layouts also select between multiple data pools

● Deletion is a special case: clients mark deleted files as 'stray', and the MDS sends the RADOS delete ops
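A sketch of working with layout vxattrs from a client mount; the paths and the second pool name (fs_data2) are placeholders:

getfattr -n ceph.file.layout /mnt/ceph/somefile              # show a file's striping
setfattr -n ceph.dir.layout.stripe_count -v 8 /mnt/ceph/dir  # stripe new files more widely
setfattr -n ceph.dir.layout.pool -v fs_data2 /mnt/ceph/dir   # send new files to another pool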

Page 19: Ceph Day London 2014 - The current state of CephFS development


Clients

● Two implementations (mount sketches below):
  ● ceph-fuse/libcephfs
  ● kclient

● Interplay with the VFS page cache; efficiency is harder with FUSE (extraneous stats etc.)

● Client performance matters for single-client workloads

● A slow client can hold up others if it's hogging metadata locks: include clients in troubleshooting

● Future: want more per-client perf stats, and maybe per-client metadata QoS. Clients probably group into jobs or workloads.

● Future: may want to tag client I/O with a job ID (e.g. HPC workload, Samba client ID, container/VM ID)
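For reference, a minimal sketch of mounting with each client (monitor address and mount point are placeholders; with cephx enabled, add name= and secret= options):

mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs   # kernel client
ceph-fuse -m 192.168.0.1:6789 /mnt/cephfs      # FUSE client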

Page 20: Ceph Day London 2014 - The current state of CephFS development


Journaling and caching in MDS

● Metadata ops initially journaled to striped journal "file" in the metadata pool.

● I/O latency on metadata ops is sum of network latency and journal commit latency.

● Metadata remains pinned in in-memory cache until expired from journal.

Page 21: Ceph Day London 2014 - The current state of CephFS development


Journaling and caching in MDS

● In some workloads we expect almost all metadata to stay in cache; in others it's more of a stream.

● Control cache size with mds_cache_size (see the sketch below)

● Cache eviction relies on client cooperation

● MDS journal replay not only recovers data but also warms up cache. Use standby replay to keep that cache warm.
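For example, the cache limit (counted in inodes; 100000 was the default of this era) can be set in ceph.conf or changed at runtime — the daemon name here is a placeholder:

[mds]
    mds cache size = 300000

# or, at runtime via the admin socket:
ceph daemon mds.myserver config set mds_cache_size 300000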

Page 22: Ceph Day London 2014 - The current state of CephFS development


Lookup by inode

● Sometimes we need the inode → path mapping:
  ● Hard links
  ● NFS handles

● Costly to store this: mitigate by piggybacking paths (backtraces) onto data objects
  ● Con: storing metadata to the data pool
  ● Con: extra IOs to set backtraces
  ● Pro: disaster recovery from the data pool

● Future: improve backtrace writing latency?
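Backtraces can be inspected directly, since they live in an xattr on a file's first data object. A sketch, assuming the pool from earlier examples and a hypothetical inode number:

rados -p fs_data listxattr 10000000000.00000000        # data objects are named <inode>.<offset>
rados -p fs_data getxattr 10000000000.00000000 parent  # the binary-encoded backtrace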

Page 23: Ceph Day London 2014 - The current state of CephFS development


Extra features

● Snapshots:
  ● Exploit RADOS snapshotting for file data
  ● … plus some clever code in the MDS
  ● Fast petabyte snapshots

● Recursive statistics:
  ● Lazily updated
  ● Access via vxattr (example below)
  ● Avoid spurious client I/O for df
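A sketch of both features from a client mount (paths are placeholders):

mkdir /mnt/ceph/somedir/.snap/mysnap           # snapshot a subtree via the hidden .snap dir
getfattr -n ceph.dir.rbytes /mnt/ceph/somedir  # recursive bytes under the tree
getfattr -n ceph.dir.rfiles /mnt/ceph/somedir  # recursive file count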


Page 25: Ceph Day London 2014 - The current state of CephFS development


CephFS in practice

ceph-deploy mds create myserver

ceph osd pool create fs_data 64       # a placement-group count is required in this era, e.g. 64

ceph osd pool create fs_metadata 64

ceph fs new myfs fs_metadata fs_data

mount -t ceph x.x.x.x:6789:/ /mnt/ceph   # kernel fstype is "ceph"; note the trailing ":/"

Page 26: Ceph Day London 2014 - The current state of CephFS development


Managing CephFS clients

● New in Giant: see hostnames of connected clients

● Client eviction is sometimes important:
  ● Skip the wait during the reconnect phase on MDS restart
  ● Allow others to access files locked by a crashed client

● Use OpTracker to inspect ongoing operations (sketch below)
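A sketch of both, via the MDS admin socket (the daemon name is a placeholder):

ceph daemon mds.myserver session ls           # connected clients, with hostnames
ceph daemon mds.myserver dump_ops_in_flight   # OpTracker: operations in progress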

Page 27: Ceph Day London 2014 - The current state of CephFS development


CephFS tips

● Choose MDS servers with lots of RAM

● Investigate clients when diagnosing stuck/slow access

● Use recent Ceph and recent kernel

● Use a conservative configuration:
  ● Single active MDS, plus one standby
  ● Dedicated MDS server
  ● Kernel client
  ● No snapshots, no inline data

Page 28: Ceph Day London 2014 - The current state of CephFS development


Development update

Page 29: Ceph Day London 2014 - The current state of CephFS development


[Diagram: the Ceph stack — APP, HOST/VM, and CLIENT sit on top of:
● RADOS: a reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes (AWESOME)
● LIBRADOS: a library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP (AWESOME)
● RBD: a reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver (AWESOME)
● RADOSGW: a bucket-based REST gateway, compatible with S3 and Swift (AWESOME)
● CEPH FS: a POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE (NEARLY AWESOME)]

Page 30: Ceph Day London 2014 - The current state of CephFS development


Towards a production-ready CephFS

● Focus on resilience:

1. Don't corrupt things

2. Stay up

3. Handle the corner cases

4. When something is wrong, tell me

5. Provide the tools to diagnose and fix problems

● Achieve this first within a conservative single-MDS configuration

Page 31: Ceph Day London 2014 - The current state of CephFS development


Giant → Hammer timeframe

● Initial online fsck (a.k.a. forward scrub)

● Online diagnostics (`session ls`, MDS health alerts)

● Journal resilience & tools (cephfs-journal-tool)

● flock in the FUSE client

● Initial soft quota support

● General resilience: full OSDs, full metadata cache

Page 32: Ceph Day London 2014 - The current state of CephFS development


FSCK and repair

● Recover from damage:
  ● Loss of data objects (which files are damaged?)
  ● Loss of metadata objects (what subtree is damaged?)

● Continuous verification:
  ● Are recursive stats consistent?
  ● Does metadata on disk match cache?
  ● Does file size metadata match data on disk?

● Repair:
  ● Automatic where possible
  ● Manual tools to enable support

Page 33: Ceph Day London 2014 - The current state of CephFS development


Client management

● Current eviction is not 100% safe against rogue clients
  ● Update to client protocol to wait for OSD blacklist (sketch below)

● Client metadata:
  ● Initially domain name, mount point
  ● Extension to other identifiers?
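Eviction ends in a RADOS-level blacklist; a sketch of applying one by hand, with a placeholder client address:

ceph osd blacklist add 192.168.0.20:0/3710147553   # blacklist the client's entity address
ceph osd blacklist ls                              # show current entries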

Page 34: Ceph Day London 2014 - The current state of CephFS development


Online diagnostics

● Bugs exposed relate to failures of one client to release resources for another client: "my filesystem is frozen". Introduce new health messages:
  ● "client xyz is failing to respond to cache pressure"
  ● "client xyz is ignoring capability release messages"
  ● Add client metadata to allow us to give domain names instead of IP addrs in messages.

● Opaque behavior in the face of dead clients. Introduce `session ls`:
  ● Which clients does the MDS think are stale?
  ● Identify clients to evict with `session evict` (sketch below)
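A sketch of that workflow (daemon name and session ID are placeholders):

ceph daemon mds.myserver session ls          # spot sessions the MDS considers stale
ceph daemon mds.myserver session evict 4305  # evict by session ID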

Page 35: Ceph Day London 2014 - The current state of CephFS development


Journal resilience

● A bad journal prevents MDS recovery: "my MDS crashes on startup". Causes:
  ● Data loss
  ● Software bugs

● Updated on-disk format to make recovery from damage easier

● New tool: cephfs-journal-tool (usage sketch below)
  ● Inspect the journal, search/filter
  ● Chop out unwanted entries/regions
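A usage sketch (subcommand names as documented for the tool; export a backup before any destructive step):

cephfs-journal-tool journal inspect                  # check integrity
cephfs-journal-tool journal export backup.bin        # save a copy first
cephfs-journal-tool event recover_dentries summary   # salvage metadata from journal events
cephfs-journal-tool journal reset                    # last resort: discard the damage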

Page 36: Ceph Day London 2014 - The current state of CephFS development


Handling resource limits

● Write a test, see what breaks!

● Full MDS cache:
  ● Require some free memory to make progress
  ● Require client cooperation to unpin cache objects
  ● Anticipate tuning required for cache behaviour: what should we evict?

● Full OSD cluster:
  ● Require explicit handling to abort with -ENOSPC

● MDS → RADOS flow control:
  ● Contention between I/O to flush cache and I/O to journal

Page 37: Ceph Day London 2014 - The current state of CephFS development


Test, QA, bug fixes

● The answer to “Is CephFS production ready?”

● teuthology test framework:
  ● Long-running/thrashing tests
  ● Third-party FS correctness tests
  ● Python functional tests

● We dogfood CephFS internally:
  ● Various kclient fixes discovered
  ● Motivation for new health monitoring metrics

● Third party testing is extremely valuable

Page 38: Ceph Day London 2014 - The current state of CephFS development


What's next?

● You tell us!

● Recent survey highlighted:
  ● FSCK hardening
  ● Multi-MDS hardening
  ● Quota support

● Which use cases will matter to the community?
  ● Backup
  ● Hadoop
  ● NFS/Samba gateway
  ● Other?

Page 39: Ceph Day London 2014 - The current state of CephFS development


Reporting bugs

● Does the most recent development release or kernel fix your issue?

● What is your configuration? MDS config, Ceph version, client version, kclient or fuse

● What is your workload?

● Can you reproduce with debug logging enabled?

http://ceph.com/resources/mailing-list-irc/

http://tracker.ceph.com/projects/ceph/issues

http://ceph.com/docs/master/rados/troubleshooting/log-and-debug/

Page 40: Ceph Day London 2014 - The current state of CephFS development


Future

● Ceph Developer Summit:
  ● When: 8 October
  ● Where: online

● Post-Hammer work:
  ● Recent survey highlighted multi-MDS, quota support
  ● Testing with clustered Samba/NFS?

Page 41: Ceph Day London 2014 - The current state of CephFS development


Questions?



Page 57: Ceph Day London 2014 - The current state of CephFS development

A STORAGE REVOLUTION

[Diagram: a storage revolution — from proprietary hardware, proprietary software, and support & maintenance contracts, to standard hardware, open source software, and enterprise products & services, with the same computers and disks underneath.]

Page 58: Ceph Day London 2014 - The current state of CephFS development

ARCHITECTURAL COMPONENTS

● RADOS: a software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
● LIBRADOS: a library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
● RGW: a web services gateway for object storage, compatible with S3 and Swift
● RBD: a reliable, fully-distributed block device with cloud platform integration
● CEPHFS: a distributed file system with POSIX semantics and scale-out metadata management
● Consumers above the stack: APP, HOST/VM, CLIENT


Page 60: Ceph Day London 2014 - The current state of CephFS development

OBJECT STORAGE DAEMONS

[Diagram: each OSD daemon serves one disk via a local filesystem — btrfs, xfs, or ext4 — with monitors (M) alongside.]

Page 61: Ceph Day London 2014 - The current state of CephFS development

RADOS CLUSTER

[Diagram: an application talks directly to the RADOS cluster, which contains OSDs and a small number of monitors (M).]

Page 62: Ceph Day London 2014 - The current state of CephFS development

RADOS COMPONENTS

OSDs:
● 10s to 10000s in a cluster
● One per disk (or one per SSD, RAID group…)
● Serve stored objects to clients
● Intelligently peer for replication & recovery

Monitors (M):
● Maintain cluster membership and state
● Provide consensus for distributed decision-making
● Small, odd number
● Do not serve stored objects to clients
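Both daemon types can be examined on a live cluster; a minimal sketch:

ceph osd tree   # OSDs and their position in the CRUSH hierarchy
ceph mon stat   # monitor quorum membership
ceph -s         # overall cluster state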

Page 63: Ceph Day London 2014 - The current state of CephFS development

WHERE DO OBJECTS LIVE?

[Diagram: an application holds an object — but which node in the cluster should store it?]

Page 64: Ceph Day London 2014 - The current state of CephFS development

A METADATA SERVER?

[Diagram: one option — the application first asks a central metadata server where the object lives (1), then performs the I/O (2).]

Page 65: Ceph Day London 2014 - The current state of CephFS development

CALCULATED PLACEMENT

[Diagram: another option — the application computes placement with a static function F, e.g. hashing names into server ranges A-G, H-N, O-T, U-Z.]

Page 66: Ceph Day London 2014 - The current state of CephFS development

EVEN BETTER: CRUSH!

[Diagram: with CRUSH, an object is hashed into a placement group and mapped onto OSDs spread across the RADOS cluster.]

Page 67: Ceph Day London 2014 - The current state of CephFS development

CRUSH IS A QUICK CALCULATION

[Diagram: the client computes an object's location itself — a quick calculation, with no lookup against the cluster.]

Page 68: Ceph Day London 2014 - The current state of CephFS development

CRUSH: DYNAMIC DATA PLACEMENT

CRUSH:
● Pseudo-random placement algorithm
  ● Fast calculation, no lookup
  ● Repeatable, deterministic
● Statistically uniform distribution
● Stable mapping
  ● Limited data migration on change
● Rule-based configuration
  ● Infrastructure topology aware
  ● Adjustable replication
  ● Weighting
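The placement calculation can be replayed from the command line; the pool and object names here are placeholders:

ceph osd map fs_data myobject   # prints the placement group and up/acting OSD sets; no I/O is performed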


Page 70: Ceph Day London 2014 - The current state of CephFS development

ACCESSING A RADOS CLUSTER

[Diagram: an application links LIBRADOS and talks to the RADOS cluster over a socket to read and write objects.]

Page 71: Ceph Day London 2014 - The current state of CephFS development

LIBRADOS: RADOS ACCESS FOR APPS

LIBRADOS:
● Direct access to RADOS for applications
● C, C++, Python, PHP, Java, Erlang
● Direct access to storage nodes
● No HTTP overhead
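The same direct access is scriptable with the rados CLI; pool and object names are placeholders:

rados lspools                             # list pools
rados -p fs_data put greeting hello.txt   # store a file as an object
rados -p fs_data get greeting -           # read it back to stdout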


Page 73: Ceph Day London 2014 - The current state of CephFS development

THE RADOS GATEWAY

[Diagram: applications speak REST to RADOSGW instances, each of which uses LIBRADOS over a socket to reach the RADOS cluster.]

Page 74: Ceph Day London 2014 - The current state of CephFS development

RADOSGW MAKES RADOS WEBBY

RADOSGW:
● REST-based object storage proxy
● Uses RADOS to store objects
● API supports buckets, accounts
● Usage accounting for billing
● Compatible with S3 and Swift applications
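A sketch of provisioning an S3-capable user (uid and display name are placeholders):

radosgw-admin user create --uid=johndoe --display-name="John Doe"
# the output includes the generated access and secret keys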


Page 76: Ceph Day London 2014 - The current state of CephFS development

STORING VIRTUAL DISKS

[Diagram: a VM on a hypervisor uses LIBRBD to store its virtual disk in the RADOS cluster.]

Page 77: Ceph Day London 2014 - The current state of CephFS development

SEPARATE COMPUTE FROM STORAGE

[Diagram: because images live in RADOS, a VM can move between hypervisors — each runs LIBRBD against the same cluster.]

Page 78: Ceph Day London 2014 - The current state of CephFS development

KERNEL MODULE FOR MAX FLEXIBLE!

[Diagram: a Linux host maps an RBD image directly through the KRBD kernel module.]

Page 79: Ceph Day London 2014 - The current state of CephFS development

RBD STORES VIRTUAL DISKS

RADOS BLOCK DEVICE:
● Storage of disk images in RADOS
● Decouples VMs from host
● Images are striped across the cluster (pool)
● Snapshots
● Copy-on-write clones
● Support in:
  ● Mainline Linux kernel (2.6.39+)
  ● Qemu/KVM; native Xen coming soon
  ● OpenStack, CloudStack, Nebula, Proxmox
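A sketch of the image/snapshot/clone workflow with placeholder names (sizes are in MB for this era's tool):

rbd create mypool/myimage --size 10240         # 10 GB image
rbd snap create mypool/myimage@base            # point-in-time snapshot
rbd snap protect mypool/myimage@base           # required before cloning
rbd clone mypool/myimage@base mypool/myclone   # copy-on-write clone
rbd map mypool/myimage                         # expose it via the kernel module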


Page 81: Ceph Day London 2014 - The current state of CephFS development

SEPARATE METADATA SERVER

[Diagram: a Linux host's kernel module sends metadata operations to the MDS and file data directly to the RADOS cluster.]

Page 82: Ceph Day London 2014 - The current state of CephFS development

SCALABLE METADATA SERVERS

METADATA SERVER:
● Manages metadata for a POSIX-compliant shared filesystem
  ● Directory hierarchy
  ● File metadata (owner, timestamps, mode, etc.)
● Stores metadata in RADOS
● Does not serve file data to clients
● Only required for the shared filesystem

Page 83: Ceph Day London 2014 - The current state of CephFS development

CEPH AND OPENSTACK

[Diagram: OpenStack integration — Keystone and Swift talk to RADOSGW via LIBRADOS; Cinder, Glance, and Nova drive LIBRBD on the hypervisor, all backed by the RADOS cluster.]

Page 84: Ceph Day London 2014 - The current state of CephFS development

GETTING STARTED WITH CEPH

● Read about the latest version of Ceph. The latest stuff is always at http://ceph.com/get
● Deploy a test cluster using ceph-deploy. Read the quick-start guide at http://ceph.com/qsg
● Read the rest of the docs! Find docs for the latest release at http://ceph.com/docs
● Ask for help when you get stuck! Community volunteers are waiting for you at http://ceph.com/help
