XSKY - Ceph Luminous Update

Transcript of "Ceph Luminous Update", Haomai Wang, XSKY, 2017.06.06

Page 1: Title

CEPH LUMINOUS UPDATE
XSKY, Haomai Wang
2017.06.06

Page 2: Who Am I

• Haomai Wang, active Ceph contributor
• Maintainer of multiple components
• CTO of XSKY, a China-based storage startup
• [email protected] / [email protected]

Page 3

Page 4: Releases

• Hammer v0.94.x (LTS) – March '15
• Infernalis v9.2.x – November '15
• Jewel v10.2.x (LTS) – April '16
• Kraken v11.2.x – December '16
• Luminous v12.2.x (LTS) – September '17 (delayed)

Page 5: Ceph Ecosystem

Page 6: RADOS - BlueStore

• BlueStore = Block + NewStore
• Key/value database (RocksDB) for metadata
• All data written directly to raw block device(s)
• Fast on both HDDs (~2x) and SSDs (~1.5x)
  – similar to FileStore on NVMe, where the device is not the bottleneck
• Full data checksums (crc32c, xxhash, etc.)
• Inline compression (zlib, snappy, zstd)
• Stable and the default
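BlueStore's checksum and compression behavior is tunable per OSD. A minimal ceph.conf sketch, assuming the option names documented for Luminous (the values shown are illustrative, not recommendations):

```ini
[osd]
# Checksum algorithm applied to all newly written data (crc32c is the default).
bluestore csum type = crc32c

# Inline compression: "aggressive" compresses everything unless a hint opts out.
bluestore compression mode = aggressive
bluestore compression algorithm = snappy

# Only keep compressed blobs that shrink below this fraction of the original.
bluestore compression required ratio = .875
```

Settings like these take effect for data written after the change; existing blobs keep the checksum and compression state they were written with.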

Page 7: RADOS – RBD Over Erasure Code

• Requires BlueStore to perform reasonably
• Significant improvement in efficiency over 3x replication
  – 2+2 → 2x, 4+2 → 1.5x
• Small writes slower than replication
  – early testing showed 4+2 is about half as fast as 3x replication
• Large writes faster than replication – less IO to the device
• Implementation still does the "simple" thing – all writes update a full stripe
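The efficiency figures above follow directly from the k+m geometry: a k+m erasure code stores (k+m)/k raw bytes per byte of user data, versus 3x for triple replication. A quick sanity check (plain arithmetic, not Ceph code):

```python
def ec_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a k+m erasure code."""
    return (k + m) / k

# Replication stores one full copy per replica.
replication_overhead = 3.0  # 3x replication

for k, m in [(2, 2), (4, 2)]:
    print(f"{k}+{m}: {ec_overhead(k, m):.1f}x vs replication {replication_overhead:.0f}x")
```

So a 4+2 pool survives two failures, like 3x replication, while consuming half the raw capacity.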

Page 8: CEPH-MGR

• ceph-mgr
  – new management daemon to supplement ceph-mon (monitor)
  – easier integration point for Python management logic
  – integrated metrics
• Make ceph-mon scalable again
  – offload PG stats from mon to mgr
  – push to 10K OSDs (planned "big bang 3" @ CERN)
• New REST API
  – pecan
  – based on the previous Calamari API
• Built-in web dashboard
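On a Luminous cluster, mgr functionality such as the dashboard and the REST API is packaged as modules that can be toggled at runtime. An illustrative CLI fragment (requires a running cluster, so shown here only as a sketch):

```
# enable the built-in web dashboard and the REST API modules
ceph mgr module enable dashboard
ceph mgr module enable restful

# show the addresses where enabled mgr modules are serving
ceph mgr services
```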

Page 9: AsyncMessenger

• AsyncMessenger
  – core library included by all components
  – kernel TCP/IP driver
  – epoll/kqueue driven
  – maintains connection lifecycle and session
  – replaces the aging SimpleMessenger
  – fixed-size thread pool (vs 2 threads per socket)
  – scales better to larger clusters
  – healthier relationship with tcmalloc
  – now the default!
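The event-driven model described above (one poller multiplexing many sockets instead of two dedicated threads per socket) can be sketched in a few lines. This is a generic epoll/kqueue illustration using Python's selectors module, not Ceph's actual C++ implementation:

```python
import selectors
import socket

# One poller (epoll/kqueue, behind the selectors abstraction) drives every
# connection; SimpleMessenger instead spent two threads on each socket.
sel = selectors.DefaultSelector()

def serve_one_message(listener: socket.socket) -> None:
    """Accept one connection and echo one message, all on a single thread."""
    sel.register(listener, selectors.EVENT_READ, data="accept")
    done = False
    while not done:
        for key, _mask in sel.select(timeout=1):
            if key.data == "accept":
                conn, _addr = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="echo")
            else:
                msg = key.fileobj.recv(1024)
                key.fileobj.sendall(msg)  # echo it back
                sel.unregister(key.fileobj)
                key.fileobj.close()
                done = True
    sel.unregister(listener)

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
listener.setblocking(False)

# A client in the same process, purely for the demo.
client = socket.create_connection(listener.getsockname())
client.sendall(b"ping")
serve_one_message(listener)
reply = client.recv(1024)
print(reply.decode())  # ping
```

The same loop structure scales to thousands of sockets with a bounded number of threads, which is the property that lets AsyncMessenger handle large clusters.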

Page 10: DPDK Support

• Built for high performance
  – DPDK
  – SPDK
  – full userspace IO path
  – shared-nothing TCP/IP stack (cf. Seastar)

Page 11: RDMA Support

• RDMA backend
  – inherits NetworkStack and implements RDMAStack
  – uses user-space verbs directly
  – TCP as the control path
  – exchanges messages using RDMA SEND
  – uses a shared receive queue
  – multiple connection QPs in a many-to-many topology
  – built into Ceph master
  – all features fully available on Ceph master
• Support:
  – RHEL/CentOS
  – InfiniBand and Ethernet
  – RoCE v2 for cross-subnet traffic
  – front-end TCP and back-end RDMA
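Selecting a messenger backend is a ceph.conf matter. A hedged sketch, assuming the async+rdma option names from the Luminous tree (the device name is a placeholder for your local verbs device):

```ini
[global]
# switch the async messenger transport from posix (the default) to RDMA
ms type = async+rdma
ms async rdma device name = mlx5_0

# for front-end TCP plus back-end RDMA, override only the cluster network:
# ms cluster type = async+rdma
```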

Page 12: Messenger Plugins

| Plugin                  | Default | Hardware Requirement | Performance | Compatibility          | OSD Storage Engine Requirement | OSD Disk Backend Requirement |
|-------------------------|---------|----------------------|-------------|------------------------|--------------------------------|------------------------------|
| Posix (kernel)          | Yes     | None                 | Middle      | TCP/IP compatible      | None                           | None                         |
| DPDK + userspace TCP/IP | No      | DPDK-supported NIC   | High        | TCP/IP compatible      | BlueStore                      | Must be NVMe SSD             |
| RDMA                    | No      | RDMA-supported NIC   | High        | RDMA-supported network | None                           | None                         |

Page 13: Recovery Improvements

Page 14: RBD - iSCSI

• TCMU-RUNNER + LIBRBD
• LIO + kernel RBD

Page 15: RBD Mirror HA

Page 16: RGW Metadata Search

Page 17: RGW Misc

• NFS gateway
  – NFSv4 and v3
  – full object access (not general purpose!)
• Dynamic bucket index sharding
  – automatic (finally!)
• Inline compression
• Encryption
  – follows the S3 encryption APIs
• S3 and Swift API odds and ends

[Diagram: NFS clients reach an nfs-ganesha (NFSv4) NFS server, which uses librgw-file; apps use the S3/Swift APIs through RadosGW. Both paths go through the RadosHandler and the rados API down to RADOS.]
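The NFS path in the diagram is configured on the nfs-ganesha side. A hedged export sketch, assuming the block names from ganesha's RGW FSAL documentation (the export ID, paths, user, and keys below are all placeholders):

```
EXPORT {
    Export_ID = 1;
    Path = "/";
    Pseudo = "/";
    Access_Type = RW;

    FSAL {
        Name = RGW;                     # route this export through librgw-file
        User_Id = "nfsuser";            # an existing RGW user
        Access_Key_Id = "ACCESSKEY";
        Secret_Access_Key = "SECRETKEY";
    }
}
```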

Page 18: CephFS

• Multiple active MDS daemons (finally!)
• Subtree pinning to a specific daemon
• Directory fragmentation on by default
• (snapshots still off by default) – so many tests
• So many bugs fixed
• Kernel client improvements
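Multi-MDS and subtree pinning are both driven from the command line. An illustrative fragment (filesystem name and mount path are placeholders; on Luminous, multiple active MDS daemons must be explicitly allowed before raising max_mds):

```
# allow and request two active MDS daemons for filesystem "cephfs"
ceph fs set cephfs allow_multimds true
ceph fs set cephfs max_mds 2

# pin a directory subtree to MDS rank 1, via an xattr on a mounted client
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects
```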

Page 19: CephFS – Multi MDS

Page 20: Container

Page 21: Future

• RADOS
  – IO path refactor
  – BlueStore performance
• QoS
  – dmclock
• Dedup
  – based on tiering
• Tiering

Page 22: Growing Developer Community

Page 23: How To Help

Page 24: Thank you