Ceph performance
CephDays Frankfurt 2014
Whoami
💥 Sébastien Han
💥 French Cloud Engineer working for eNovance
💥 Daily job focused on Ceph and OpenStack
💥 Blogger
Personal blog: http://www.sebastien-han.fr/blog/
Company blog: http://techs.enovance.com/
Last Cephdays presentation
How does Ceph perform?
42*
*The Hitchhiker's Guide to the Galaxy
The Good: Ceph IO pattern
CRUSH: deterministic object placement
As soon as a client writes into Ceph, the placement is computed with CRUSH and the client itself decides which OSD the object should belong to.
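As an illustration, the mapping of any object can be queried without touching the data path; a minimal sketch, where the pool name "rbd" and the object name "myobject" are only placeholders:

# Show where an object maps: prints the PG and the up/acting set of OSDs
# chosen by CRUSH for that object
$ ceph osd map rbd myobject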
Aggregation: cluster level
As soon as you write into Ceph, objects get evenly spread across the entire cluster, that is, across machines and disks.
Aggregation: OSD level
As soon as an IO goes into an OSD, no matter what the original pattern was, it becomes sequential.
The Bad: Ceph IO pattern
Journaling
As soon as an IO goes into an OSD, it gets written twice.
Journal and OSD data on the same disk
Journal penalty on the disk
Since we write twice, if the journal is stored on the same disk as the OSD data this will result in the following:
Device              Write MB/s
sdb1 (journal)      50.11
sdb2 (osd_data)     40.25
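These numbers can be observed live with iostat while a write benchmark is running; a small sketch, assuming the journal sits on sdb1 and the OSD data on sdb2 as in the table above:

# Extended, per-partition statistics in MB/s, refreshed every 5 seconds
$ iostat -xm -p sdb 5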
Filesystem fragmentation
• Objects are stored as files on the OSD filesystem
• Several IO patterns with different block sizes increase filesystem fragmentation
• Possible root cause: image sparseness
• A one-year-old cluster ends up with (see the allocsize option for XFS):
$ sudo xfs_db -c frag -r /dev/sdd
actual 196334, ideal 122582, fragmentation factor 37.56%
• RADOS hints: fadvise-like hints that help filesystem allocation
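A hedged example of the allocsize tuning mentioned above, applied through the OSD mount options in ceph.conf (the 4M value is illustrative, not a recommendation):

# ceph.conf: mount XFS-backed OSDs with a larger allocation size to limit
# fragmentation caused by mixed block sizes and sparse images
[osd]
osd mount options xfs = rw,noatime,inode64,allocsize=4M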
No parallelized reads
• Ceph will always serve the read request from the primary OSD
• Room for an N-times speed-up, where N is the replica count
Blueprint from Sage for the Giant release
Scrubbing impact
• Consistent object check at the PG level
• Compares replica versions against each other (an fsck for objects)
• Light scrubbing (daily) checks the object size and attributes
• Deep scrubbing (weekly) reads the data and uses checksums to ensure data integrity
• Corruption happens even with ECC memory; enterprise disks are rated around 1 unrecoverable error per 10^15 bits, roughly one error per ~113 TB read
• No pain, no gain
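To measure or contain that impact, scrubbing can also be driven by hand; a sketch (the PG id is a placeholder and flag availability depends on the release):

# Trigger a light or deep scrub on a single placement group
$ ceph pg scrub 2.3d
$ ceph pg deep-scrub 2.3d
# Temporarily silence scrubbing cluster-wide, e.g. during a benchmark
$ ceph osd set noscrub
$ ceph osd set nodeep-scrub
$ ceph osd unset noscrub
$ ceph osd unset nodeep-scrub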
The Ugly: Ceph IO pattern
IOs to the OSD disk
One IO into Ceph leads to 2 writes, well… the second write is the worst!
The problem
• Several objects map to the same physical disks
• Sequential streams get mixed all together
• Result: The disk seeks like hell
Even worse with erasure coding? This is just an assumption!
• Since erasure coding produces chunks of chunks, this phenomenon could be amplified
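For context, this is roughly how a Firefly erasure-coded pool splits each object into k data chunks plus m coding chunks (the profile name, pool name, k/m values and PG counts below are only examples):

# Define an erasure-code profile and create a pool that uses it
$ ceph osd erasure-code-profile set myprofile k=4 m=2
$ ceph osd pool create ecpool 128 128 erasure myprofile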
CLUSTER
How to build it?
How to start?
Things that you must consider:
• Use case
  • IO profile: bandwidth? IOPS? mixed?
  • How many IOPS or how much bandwidth do I want to deliver per client?
  • Do I use Ceph standalone, or combined with a software solution?
• Amount of data (usable, not raw)
  • Replica count
  • Do I have a data growth plan?
• Leftover
  • How much data am I willing to lose if a node fails? (%)
  • Am I ready to be annoyed by the scrubbing process?
• Budget :-)
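A back-of-the-envelope example with made-up numbers: 6 nodes with 12 x 4 TB disks each give 288 TB raw; with a replica count of 3 that is 288 / 3 = 96 TB, and keeping ~30% free as headroom for re-balancing after a node failure leaves roughly 96 x 0.7 ≈ 67 TB usable.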
Things that you must not do
• Don't put RAID underneath your OSDs
  • Ceph already manages the replication
  • A degraded RAID hurts performance
  • It reduces usable space on the cluster
• Don't build high-density nodes in a tiny cluster
  • Failure considerations and data to re-balance
  • Potential full cluster
• Don't run Ceph on your hypervisors (unless you're broke)
  • Well, maybe…
Firefly: Interesting things going on
Object store multi-backend
• ObjectStore is born
• Aims to support several backends:
  • levelDB (default)
  • RocksDB
  • Fusionio NVMKV
  • Seagate Kinetic
  • Yours!
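To experiment with one of these backends, the object store is selected per OSD in ceph.conf; this is only a sketch, since the key/value backend was experimental in Firefly and its exact name has changed across releases:

# ceph.conf: pick the OSD object store backend (the value is an assumption)
[osd]
osd objectstore = keyvaluestore-dev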
Why is it so good?
• No more journal! Yay!
• Object backends have built-in atomic functions
Firefly leveldb
• Relatively new
• Needs to be tested with your workload first
• Tends to be more efficient with small objects
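A quick way to test it against a small-object workload is rados bench; the pool name, object size and thread count below are just examples:

# 60-second 4 KB write test, then random reads against the same objects
$ rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup
$ rados bench -p testpool 60 rand -t 16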
Many thanks!
Questions?
Contact: sebastien@enovance.com
Twitter: @sebastien_han
IRC: leseb