XSKY - ceph luminous update

Posted on 22-Jan-2018



CEPH LUMINOUS UPDATE
XSKY
Haomai Wang

2017.06.06

• Haomai Wang, active Ceph contributor
• Maintainer of multiple components
• XSKY CTO, a China-based storage startup
• haomaiwang@gmail.com / haomai@xsky.com

Who Am I

• Hammer v0.94.x (LTS) – March '15
• Infernalis v9.2.x – November '15
• Jewel v10.2.x (LTS) – April '16
• Kraken v11.2.x – December '16
• Luminous v12.2.x (LTS) – September '17 (delayed)

Releases

Ceph Ecosystem

• BlueStore = Block + NewStore
• Key/value database (RocksDB) for metadata
• All data written directly to raw block device(s)

• Fast on both HDDs (~2x) and SSDs (~1.5x)
  – similar to FileStore on NVMe, where the device is not the bottleneck

• Full data checksums (crc32c, xxhash, etc.)
• Inline compression (zlib, snappy, zstd)
• Stable and the default

RADOS – BlueStore
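The checksum-plus-compression idea above can be sketched in a few lines. This is an illustrative model, not BlueStore's actual C++ code path: BlueStore defaults to crc32c per blob, while this sketch uses the stdlib's `zlib.crc32` and zlib compression as stand-ins.

```python
import zlib

def store_blob(data: bytes):
    """Model one BlueStore-style write: compress inline, keep a
    full-data checksum, and only keep the compressed form if it
    actually saves space."""
    compressed = zlib.compress(data)   # inline compression (zlib)
    checksum = zlib.crc32(data)        # full-data checksum
    stored = compressed if len(compressed) < len(data) else data
    return stored, checksum, stored is compressed

def read_blob(stored: bytes, checksum: int, is_compressed: bool) -> bytes:
    """Model a read: decompress if needed, then verify the checksum
    before returning data to the caller."""
    data = zlib.decompress(stored) if is_compressed else stored
    assert zlib.crc32(data) == checksum, "checksum mismatch: corrupt read"
    return data

blob = b"ceph " * 1000
stored, csum, was_compressed = store_blob(blob)
assert read_blob(stored, csum, was_compressed) == blob
```

The checksum is taken over the uncompressed data, so a read verifies end-to-end integrity regardless of whether compression was applied.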

• requires BlueStore to perform reasonably
• significant improvement in efficiency over 3x replication
  – 2+2 → 2x; 4+2 → 1.5x
• small writes slower than replication
  – early testing showed 4+2 is about half as fast as 3x replication

• large writes faster than replication
  – less IO to the device

• implementation still does the "simple" thing
  – all writes update a full stripe

RADOS – RBD Over Erasure Code
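The space-efficiency figures on this slide fall directly out of k+m arithmetic: an erasure-coded pool stores k data chunks plus m coding chunks for every k chunks of user data. A quick sketch:

```python
def ec_overhead(k: int, m: int) -> float:
    """Raw-to-usable space ratio for a k+m erasure-coded pool:
    (k + m) chunks stored per k chunks of user data."""
    return (k + m) / k

# 3x replication stores three full copies (3.0x raw per usable byte)
assert ec_overhead(2, 2) == 2.0   # 2+2 -> 2x
assert ec_overhead(4, 2) == 1.5   # 4+2 -> 1.5x
```

So 4+2 halves the capacity overhead of 3x replication while still tolerating the loss of any two chunks.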

• ceph-mgr
  – new management daemon to supplement ceph-mon (the monitor)
  – easier integration point for Python management logic
  – integrated metrics

• make ceph-mon scalable again
  – offload PG stats from mon to mgr
  – push to 10K OSDs (planned "big bang 3" @ CERN)

• new REST API
  – pecan
  – based on the previous Calamari API

• built-in web dashboard

CEPH-MGR
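In Luminous the dashboard and REST API ship as ceph-mgr modules. A minimal sketch of turning them on, assuming the stock Luminous module and option names (`mgr initial modules`, `dashboard`, `restful`):

```ini
# ceph.conf: modules ceph-mgr loads at startup (Luminous)
[mon]
    mgr initial modules = dashboard restful

# or enable them at runtime instead:
#   ceph mgr module enable dashboard
#   ceph mgr module enable restful
```

The `restful` module serves the new pecan-based REST API; `dashboard` serves the built-in web UI.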

AsyncMessenger

• AsyncMessenger
  – core library included by all components
  – kernel TCP/IP driver
  – epoll/kqueue event drive
  – maintains connection lifecycle and session
  – replaces the aging SimpleMessenger
  – fixed-size thread pool (vs 2 threads per socket)
  – scales better to larger clusters
  – healthier relationship with tcmalloc
  – now the default!
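Since AsyncMessenger is now the default, no configuration is strictly needed, but the knobs above correspond to options like the following (option names as documented for Luminous; the values shown are only illustrative):

```ini
[global]
    # async messenger over the kernel TCP/IP stack (the default)
    ms type = async+posix
    # fixed-size worker thread pool, instead of SimpleMessenger's
    # 2 threads per socket
    ms async op threads = 3
```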

DPDK Support

• Built for high performance
  – DPDK
  – SPDK
  – full userspace IO path
  – shared-nothing TCP/IP stack (Seastar-inspired)

• RDMA backend
  – inherits NetworkStack and implements RDMAStack
  – uses user-space verbs directly
  – TCP as the control path
  – exchanges messages using RDMA SEND
  – uses a shared receive queue
  – multiple connection QPs in a many-to-many topology
  – built into Ceph master
  – all features fully available on Ceph master

• Support:
  – RHEL/CentOS
  – InfiniBand and Ethernet
  – RoCE v2 for cross-subnet traffic
  – front-end TCP and back-end RDMA

RDMA Support
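The RDMA backend is selected through the same messenger plugin mechanism. A sketch using the Luminous option names (the device name is a placeholder for your NIC):

```ini
[global]
    # switch the async messenger to the RDMA stack
    ms type = async+rdma
    # RDMA device to bind (placeholder; e.g. mlx5_0 on Mellanox)
    ms async rdma device name = mlx5_0

    # the "front-end TCP, back-end RDMA" split maps to using RDMA
    # on the cluster network only:
    # ms cluster type = async+rdma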

Plugin | Default | Hardware Requirement | Performance | Compatibility | OSD Storage Engine Requirement | OSD Disk Backend Requirement
Posix (Kernel) | Yes | None | Middle | TCP/IP compatible | None | None
DPDK + userspace TCP/IP | No | DPDK-supported NIC | High | TCP/IP compatible | BlueStore | Must be NVMe SSD
RDMA | No | RDMA-supported NIC | High | RDMA-supported network | None | None

Messenger Plugins

Recovery Improvements

RBD – iSCSI

• tcmu-runner + librbd
• LIO + kernel RBD

RBD Mirror HA

RGW METADATA SEARCH

RGW MISC

• NFS gateway
  – NFSv4 and v3
  – full object access (not general purpose!)

• dynamic bucket index sharding
  – automatic (finally!)

• inline compression
• encryption
  – follows the S3 encryption APIs

• S3 and Swift API odds and ends
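Because RGW follows the S3 encryption APIs, SSE-C requests look exactly like they do against S3: the client supplies the key in request headers. A sketch of how those headers are derived per the S3 SSE-C convention (the key here is a made-up example, never a real one):

```python
import base64
import hashlib

def sse_c_headers(key: bytes) -> dict:
    """Build the S3 SSE-C request headers for a 256-bit customer key,
    following the S3 API that RGW implements: the key is sent
    base64-encoded, along with a base64-encoded MD5 of the raw key."""
    assert len(key) == 32, "SSE-C requires a 256-bit (32-byte) key"
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode(),
        "x-amz-server-side-encryption-customer-key-md5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }

headers = sse_c_headers(b"0" * 32)   # example key, not for real use
assert headers["x-amz-server-side-encryption-customer-algorithm"] == "AES256"
```

The same headers must accompany both the PUT and every subsequent GET, since the server never stores the key.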

[Diagram: NFS clients reach RADOS through nfs-ganesha (NFSv4) and librgw-file; apps reach RADOS through the S3 and Swift APIs of RadosGW (RadosHandler), both ultimately over the rados api.]

• multiple active MDS daemons (finally!)
• subtree pinning to a specific daemon
• directory fragmentation on by default
• (snapshots still off by default)
• so many tests
• so many bugs fixed
• kernel client improvements

CephFS

CephFS – Multi-MDS

Container

• RADOS
  – IO path refactor
  – BlueStore performance

• QoS
  – dmclock

• Dedup
  – based on tiering

• Tiering

Future

Growing Developer Community

How To Help

Thank you