Transcript of "Red Hat Tech Day: Red Hat Ceph Storage" (redhat.com)

Page 1:

Michael Holzerland
Solution Architect, Telco

Red Hat Tech Day: Red Hat Ceph Storage

Page 2:

Agenda
- Introduction to Software Defined Storage: why SDS
- Ceph in the SDS market
- Overview of Ceph
- Difference between Ceph and Red Hat Ceph Storage
- Demo:
  - Ceph cluster status
  - Storing object data / RADOS layer
  - Creating / deleting a pool
  - Storing data in a pool
  - Details of the stored object
  - Creating a block device / RBD layer
  - Mapping an RBD
  - Formatting and writing data
  - Generating I/O with rados bench
  - Calamari Ceph GUI
- Outlook: Red Hat Ceph Storage 2.0
- Sizing rules & partner solutions

Page 3:

Software Defined Storage

No More Limits

Page 4:

Red Hat Ceph Storage

Page 5:

STORAGE MARKET GROWTH FORECAST

“By 2020, between 70-80% of unstructured data will be held on lower-cost storage

managed by SDS environments.”

“By 2019, 70% of existing storage array products will also be available as software-only versions”

“By 2016, server-based storage solutions will lower storage hardware costs by 50% or more.”

Gartner: “IT Leaders Can Benefit From Disruptive Innovation in the Storage Industry”

Innovation Insight: Separating Hype From Hope for Software-Defined Storage

Market size is projected to increase approximately 20% year-over-year between 2015 and 2019.

[Chart: SDS market size by segment (Block Storage, File Storage, Object Storage, Hyperconverged), 2013–2019: $457B, $592B, $706B, $859B, $1,029B, $1,195B, $1,349B. Source: IDC]

Software-Defined Storage is leading a shift in the global storage industry, with far-reaching effects.

Page 6:

The Problem: Data Big Bang

From 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 ZB to 44 ZB.

It more than doubles every two years.

Source: IDC – The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things - April 2014

→ Data demand is growing disproportionately compared with today's Infrastructure and Software budget!

CAGR = Compound Annual Growth Rate

Page 7:

Diverse Workloads & Cost Drive Need for Distributed Storage

Page 8:

Distributed Storage

• Ceph is the most popular† open source virtual block storage option
• 62% adoption rate of Ceph in OpenStack deployments

† OpenStack User Survey, October 2015

Page 9:

PAST: SCALE UP

FUTURE: DISTRIBUTED STORAGE

THE SOLUTION

Page 10:

CEPH: The Linux of Storage

Page 11:

Historical Timeline – 10+ Years of Ceph

● 2004 – Project began at University of California

● 2006 – Open Source

● 2010 – Mainline Linux Kernel

● 2011 – OpenStack Integration

● 2012 – Launch of Inktank

● 2013 October – Inktank Ceph Enterprise

● 2014 February – RHEL-OSP Certification

● 2015 March – Red Hat Ceph Storage

Page 12:

Proprietary cloud & storage technologies are too expensive and inhibit scaling

Why is Open Source important? Cloud Leaders Rely on Open Source

[Logos of cloud leaders running Linux + Xen, Linux + KVM, and OpenStack]

Page 13:

Open source-based solutions are just as cost-effective and scalable for mainstream use

Many Mainstream Organizations follow

[Logos of mainstream organizations running Linux + KVM, OpenStack, and OpenShift]

Page 14:

Why Open Source drives Innovation

Open Source Development Model

● Development method
● Better quality
● Lower cost
● Distributed peer review
● More flexibility
● Transparency of process
● OSS model adopted outside software (Wikipedia, OpenStreetMap, OpenData, …)

Page 15:

OPEN SOURCE TO THE ENTERPRISE

RED HAT JBOSS MIDDLEWARE

RED HAT STORAGE

RED HAT ENTERPRISE LINUX

RED HAT ENTERPRISE LINUX OPENSTACK PLATFORM

RED HAT ENTERPRISE VIRTUALIZATION

RED HAT SATELLITE

RED HAT CLOUDFORMS

RED HAT ENTERPRISE LINUX ATOMIC HOST

RED HAT MOBILE

1M+ projects*

* www.blackducksoftware.com/oss-logistics/choose

Page 16:

Ceph Approach: Unified Storage

OBJECT STORAGE
- Amazon S3 API
- OpenStack Swift API
- OpenStack Keystone API
- Geo Replication
- Erasure Coding

BLOCK STORAGE
- Snapshots / Clones / Thin Clones
- Storage Tiering
- iSCSI Target
- Active/Active Cluster
- Linux kernel integration
- OpenStack Cinder API

FILE SYSTEM
- NFS Support
- OpenStack Manila Support
- Snapshots
- POSIX-compliant FS
- Linux kernel integration

Page 17:

Red Hat Confidential - NDA Required

RED HAT CEPH STORAGE

concepts

Page 18:

RADOS

RADOS (Reliable Autonomic Distributed Object Store) is the foundation of the Ceph storage cluster. All data in Ceph, regardless of data type, is stored in Pools in the form of objects via the RADOS object store.

The RADOS layer provides:
• No single point of failure
• Data consistency and reliability
• Data replication and migration
• Automatic failure detection and recovery

[Diagram: Clients (Your App, S3 API, Swift API, Host/VM, Admin API) → RADOSGW / LIBRBD → LIBRADOS → RADOS]
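The LIBRADOS path in this diagram can be exercised directly from Python. The sketch below uses the python-rados binding that ships with Ceph to open a session and print cluster-wide usage; the configuration file path is an assumption for a typical admin node with a readable keyring.

```python
# A minimal librados session (python-rados). The conffile path is an
# assumption for a typical admin/client node.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()                      # contact the monitors, obtain the cluster map

stats = cluster.get_cluster_stats()    # cluster-wide usage reported by RADOS
print("kB used: %(kb_used)d, kB avail: %(kb_avail)d, objects: %(num_objects)d" % stats)

cluster.shutdown()
```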

Page 19:

Object Storage Devices (OSDs)

In a Ceph storage cluster, the Object Storage Daemon (OSD) stores data, handles data replication, recovery, backfilling, rebalancing, and provides monitoring information to Ceph Monitors by checking other Ceph OSD Daemons for a heartbeat. A Ceph Storage Cluster requires a minimum number of Ceph OSD Daemons (to achieve an active + clean state). This minimum number is based on the number of replica copies established for the cluster. OSDs roughly correspond to a directory on a physical hard drive (i.e. /var/lib/ceph/osd).

[Diagram: OSD.1 through OSD.4, each backed by a data directory under /var/lib/ceph/osd]

Page 20:

Monitors (MONs)

Before Ceph Clients can read or write data, they must contact a Ceph Monitor to obtain the most recent copy of the cluster map. A Ceph Storage Cluster can operate with a single monitor; however, this introduces a single point of failure (i.e., if the monitor goes down, Ceph Clients cannot read or write data).

For added reliability and fault tolerance, Ceph supports a cluster of monitors. In a cluster of monitors, latency and other faults can cause one or more monitors to fall behind the current state of the cluster. For this reason, Ceph must have agreement among the various monitor instances regarding the state of the cluster. Ceph always uses a majority of monitors (e.g., 1, 2:3, 3:5, 4:6, etc.) and the Paxos algorithm to establish consensus among the monitors about the current state of the cluster. Monitor hosts require NTP to prevent clock drift.

Most deployments will likely only need 3 monitors.

Mon1 Mon2 Mon3
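The majority rule can be made concrete with a small sketch (the helper below is illustrative, not part of Ceph):

```python
# Monitor quorum is a simple majority: a cluster of N monitors stays writable
# as long as floor(N/2) + 1 of them agree (via Paxos) on the cluster map.
def quorum_size(num_mons: int) -> int:
    return num_mons // 2 + 1

for n in (1, 3, 5):
    print(f"{n} monitor(s): quorum of {quorum_size(n)}, "
          f"tolerates {n - quorum_size(n)} monitor failure(s)")
```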

Page 21:

Replicated Pool - WRITE

Pool – Replication size = 3

1. Primary OSD write (W1)
2. Secondary OSD write (W2)
3. Tertiary OSD write (W3)
4. Ack (A1)
5. Ack (A2)
6. Ack (A3)

Replicated write:
1. Ceph client writes data
2. Object created and sent to the primary OSD (W1)
3. The primary OSD finds the number of replicas that it should store
4. Primary OSD forwards the object to the replica OSDs (W2, W3)
5. Replica OSDs write the data and signal write completion to the primary OSD (A2, A3)
6. Primary OSD signals write completion to the Ceph client (A1)
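From the client's side this fan-out is hidden behind a single librados call; a minimal sketch with the Python binding (pool name 'rbd' and the object name are assumptions):

```python
# Writing an object with python-rados: the client issues one call (W1); the
# primary OSD performs the W2/W3 fan-out and returns the final ack.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('rbd')         # i/o context bound to one pool
ioctx.write_full('MyObj', b'hello ceph')  # replaces the whole object atomically
ioctx.close()

cluster.shutdown()
```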

Page 22:

Replicated Pool - READ

Pool – Replication size = 3

Replicated read:
1. Ceph client issues a read request
2. RADOS sends the read request to the primary OSD (R1)
3. Primary OSD reads the data from local disk
4. Primary OSD signals read completion to the Ceph client (A1)

[Diagram: Ceph Client sends R1 to the primary OSD (OSD.1) and receives A1; MyObj is replicated on OSD.1, OSD.2, and OSD.3]
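The read path is symmetric from the client's perspective; a short librados sketch using the same assumed pool and object as in the write example:

```python
# Reading the object back: RADOS routes the request to the PG's primary OSD.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('rbd')
data = ioctx.read('MyObj')            # payload served by the primary OSD
size, mtime = ioctx.stat('MyObj')     # object details: size in bytes, mtime
print(data, size, mtime)
ioctx.close()

cluster.shutdown()
```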

Page 23:

Writing and reading data in a Ceph storage cluster is accomplished using the Ceph client architecture. Ceph clients differ from competitive offerings in how they present data storage interfaces.

• LIBRBD – API layer for block device access to the Ceph cluster (includes a QEMU/KVM driver). LIBRBD enables block storage that mounts like a physical storage drive for use by both physical and virtual systems.

• RADOSGW - A Ceph gateway that presents a bucket-based object storage service with S3 compliant and Swift compliant RESTful interfaces

• LIBRADOS – Provides direct access to RADOS with support for C, C++, Java, Python, Ruby, and PHP

Client Interface Layer
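As a sketch of the LIBRBD path, the python-rbd binding can create and write a block image without mapping it through the kernel; pool and image names here are assumptions.

```python
# LIBRBD via python-rbd: create a thin-provisioned image in a pool and write to it.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

rbd.RBD().create(ioctx, 'myimage', 4 * 1024**3)   # 4 GiB image
image = rbd.Image(ioctx, 'myimage')
image.write(b'hello from librbd', 0)              # byte-addressable, like a disk
image.close()

ioctx.close()
cluster.shutdown()
```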

Page 24:

Pools

The Ceph storage cluster stores data objects in logical partitions called Pools. Pools can be created for particular data types, such as for block devices or object gateways, or simply to separate user groups. From the perspective of a Ceph client, the storage cluster is very simple: when a Ceph client reads or writes data (via an i/o context), it connects to a storage pool in the Ceph cluster. The Ceph pool dictates the number of object replicas and the number of PGs in the pool.

In a Replicated storage pool, Ceph defaults to making 3 copies of an object, with a minimum of two copies clean for write operations. If two of the three OSDs fail, the data will still be preserved but write operations will be interrupted.

In an Erasure Coded storage pool, objects are divided into chunks using the n = k + m equation:
- k: the number of data chunks that will be created
- m: the number of coding chunks that will be created to provide data protection
- n: the total number of chunks created after the erasure coding process

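A minimal sketch of the pool lifecycle through librados, plus the n = k + m arithmetic. Note that erasure coded pools and profiles are normally created via the CLI rather than librados, and pool deletion must be permitted by the monitors; the pool name is an assumption.

```python
# Pool lifecycle through librados, plus the n = k + m arithmetic.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

cluster.create_pool('demo-pool')          # replicated pool with cluster defaults
print(cluster.pool_exists('demo-pool'))   # True
cluster.delete_pool('demo-pool')          # monitors must allow pool deletion

cluster.shutdown()

# Erasure coding: k data chunks + m coding chunks = n chunks on n distinct OSDs
k, m = 4, 3
n = k + m
print(f"k={k}, m={m} -> {n} chunks, tolerates loss of {m} OSDs, "
      f"raw space overhead {n / k:.2f}x")
```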

Page 25:

Data Placement/Retrieval in a Ceph Storage Cluster

In typical Read/Write operations

1. Ceph clients first contact a Ceph Monitor to retrieve the most recent cluster map
2. Ceph clients cache the cluster map and only receive a new map when updates are available
3. Data is then converted into object(s) containing object/pool IDs
4. The CRUSH algorithm determines the PG and primary OSD
5. The client then contacts the primary OSD directly to store/retrieve the data

Note: Steps 1-5 happen very quickly. Once OSD communication is established, data is written directly from the clients to the OSD

6. The primary OSD then performs a CRUSH lookup to determine the secondary PGs and OSDs
7. In a Replicated pool, the primary OSD copies the object(s) and sends them to the secondary OSDs
8. In an Erasure Coded pool, the primary OSD breaks up the object into chunks, encodes the chunks, then writes the chunks to the secondary OSDs as K data chunks and M coding chunks
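Steps 3–4 can be illustrated with a deliberately simplified placement function. Real Ceph uses the rjenkins hash and "stable mod", and CRUSH (not shown) maps the PG to OSDs, so this is only a conceptual sketch.

```python
# Simplified placement: hash the object name, take it modulo the pool's PG
# count, and prefix the pool id; CRUSH then maps that PG to OSDs (not shown).
import zlib

def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    pg = zlib.crc32(object_name.encode()) % pg_num   # Ceph really uses rjenkins + stable mod
    return f"{pool_id}.{pg:x}"                       # e.g. '3.2a', the form shown by `ceph pg dump`

print(object_to_pg(pool_id=3, object_name='MyObj', pg_num=128))
```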

Page 26:

Erasure Code

Much like replicated pools, erasure coded pools depend on the primary OSD in the up set to handle initial writes/reads.

In replicated pools, Ceph makes a copy of each object and utilizes the primary OSD to send object copies to the secondary OSD(s) in the set.

In an erasure coded pool, the process is a bit different. Erasure coded pools store each object as a collection of chunks. The object is divided into data chunks (represented as K) and coded chunks (represented as M). The pool is configured to have a size of K+M (K+M = X) so that each chunk is stored in an OSD in the acting set. The rank of the chunk is stored as an attribute of the object. The primary OSD is responsible for encoding the payload into K+M chunks and forwarding those chunks to the other OSDs in the up set. The primary OSD is also responsible for responding to read requests and rebuilding the chunks into the original object.
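A toy illustration of the chunk layout, using k data chunks and a single XOR coding chunk (m = 1); production Ceph uses the jerasure/Reed-Solomon plugins and supports m > 1, so this only shows the k + m idea.

```python
# Toy erasure code: k data chunks plus one XOR coding chunk (m = 1).
from functools import reduce

def xor_chunks(chunks):
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*chunks))

def encode(payload: bytes, k: int):
    chunk_len = -(-len(payload) // k)                    # ceil division
    data = [payload[i * chunk_len:(i + 1) * chunk_len].ljust(chunk_len, b'\0')
            for i in range(k)]
    return data, [xor_chunks(data)]                      # k data chunks + m=1 coding chunk

def rebuild(data, coding, lost_index):
    # XOR of the surviving chunks and the parity recreates the missing chunk
    survivors = [c for i, c in enumerate(data) if i != lost_index] + coding
    return xor_chunks(survivors)

data, coding = encode(b'catdoghorsefishbirdcowturtlesheepbat', k=4)
print(rebuild(data, coding, lost_index=2) == data[2])    # True
```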

Page 27:

Erasure Coded Pool: WRITE

Erasure Coded Pool: 4,3

Erasure Coded write:
1. Ceph client signals an object write
2. RADOS sends the object to the primary OSD (W1)
3. Primary OSD encodes and divides the object into data/coding chunks
4. Chunks are distributed to the secondary OSDs (W2)
5. Secondary OSDs complete the write and signal completion to the primary OSD (ACK1)
6. Primary OSD completes the write and signals completion to the client (ACK2)

[Diagram: the object "catdoghorsefishbirdcowturtlesheepbat" is split across OSD.1–OSD.7 as Shard_1–Shard_4 (data chunks) and Shard_5–Shard_7 (coding chunks)]

Page 28:

Erasure Coded Pool: READ

Erasure Coded Pool: 4,3

Erasure Coded read:
1. Ceph client signals a read (R1)
2. Primary OSD receives the read request and pulls together the chunks from the participating OSDs (R2)
3. Primary OSD decodes the chunks and assembles a copy of the original object
4. Primary OSD signals read completion to the Ceph client (A1)

[Diagram: Ceph Client ↔ primary OSD.1; data chunks of MyObj on OSD.1–OSD.4, coding chunks on OSD.5–OSD.7]

Page 29:


Placement Groups (PGs)

Placement Groups: Ceph maps objects to placement groups (PGs). Placement groups (PGs) are shards or fragments of a logical object pool that place objects as a group into OSDs. Placement groups reduce the amount of per-object metadata when Ceph stores the data in OSDs. A larger number of placement groups (e.g., 100 per OSD) leads to better balancing.

Ceph PGs per pool calculator: ceph.com/pgcalc
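The rule of thumb behind the calculator can be sketched as follows; the 100 PGs-per-OSD target and the power-of-two rounding are assumptions matching common guidance, so use the pgcalc for real sizing.

```python
# Rule of thumb approximating ceph.com/pgcalc: ~100 PGs per OSD, divided by
# the pool's replica count, rounded up to a power of two.
def suggested_pg_num(num_osds: int, pool_size: int, pgs_per_osd: int = 100) -> int:
    target = num_osds * pgs_per_osd / pool_size
    pg_num = 1
    while pg_num < target:
        pg_num *= 2
    return pg_num

print(suggested_pg_num(num_osds=18, pool_size=3))   # e.g. an 18-OSD cluster -> 1024
```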


Page 30:

Placement Groups (PGs)

1. Client communicates with a pool to write data to the storage cluster
2. The CRUSH algorithm determines in which PGs the data object(s) and its replicas should be stored
3. Objects are written to the primary OSD first
4. The primary OSD is then responsible for sending copies to the other OSDs in the peer group

Page 31:

Pools – Which To Use?

Replicated – full copies of stored objects. Usage examples: very high durability, quicker recovery.
Erasure Coded – one copy plus parity. Usage examples: cold storage, cost-effective durability.

[Diagram: clients view logical objects via Pools; PG1–PG4 distribute the objects across OSD1–OSD4 for both Replicated and Erasure Coded pools]

Page 32:

Placement Groups (PGs)

Graphical representation of the relationships between Pools, PGs, and OSDs

Page 33:

[Diagram: Client Interface Layer (RADOSGW, RBD) → LIBRADOS → RADOS. Objects in Pools are mapped to Placement Groups via Pool ID + (Hash(Object) % Num of PGs); the CRUSH Ruleset and CRUSH Map place the PGs onto the OSD hosts (OSD1–OSD6), while the monitors (Mon1–Mon3) maintain the cluster map.]

Page 34:

CRUSH Overview

CRUSH (Controlled Replication Under Scalable Hashing) – Controlled, Scalable, Decentralized Placement of Replicated Data.

The CRUSH algorithm determines how to store and retrieve data by computing data storage locations. CRUSH empowers Ceph clients to communicate with OSDs directly rather than through a centralized server or broker. With an algorithmically determined method of storing and retrieving data, Ceph avoids a single point of failure, a performance bottleneck, and a physical limit to its scalability.

CRUSH requires a map of your cluster, and uses the CRUSH map to pseudo-randomly store and retrieve data in OSDs with a uniform distribution of data across the cluster.

Page 35:

CRUSH Hierarchy

[Diagram: CRUSH hierarchy – ROOT → DataCenter1 and DataCenter2 → rooms (Room7-N, Room4-N, Room9-S, Room2-S) → Rack1–Rack8 → Host1–Host16 → OSD0–OSD31]

Page 36:

CRUSH Components

Using the previous example diagram, we have a Ceph storage cluster with two geographically dispersed data centers. Each data center contains:• 2 compute rooms• 2 server racks• 4 hosts – each with two dedicated drives hosting OSDs

Page 37:

Failure Domains

A failure domain is a physical container designed such that a component failure will have limited impact on the environment.

A failure could be a stopped daemon on a host, a hard disk failure, an OS crash, a malfunctioning NIC, etc. When planning your hardware needs, you succeed when you find a balance between cost containment (i.e., placing too many responsibilities into too few failure domains) and the added cost of isolating every potential failure domain.

Through the configuration of the CRUSH map, you can define where the object replicas are placed across the entire layout of your environment. This includes the ability to identify your devices, hosts, racks, pods, rooms, etc. as individual domains.

The goal: ensuring that an object's replica (or erasure coded shard) is not placed on the same host or in the same rack. This effectively allows you to define the availability level of your data.
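A conceptual sketch of that goal (this is not CRUSH itself): pick one OSD per rack so that no two replicas share a failure domain. The topology dictionary is invented purely for illustration.

```python
# Not CRUSH, just the idea: one replica per failure domain (here, racks),
# with a deterministic pick inside each rack derived from the object name.
import hashlib

topology = {
    'Rack1': ['osd.0', 'osd.1', 'osd.2', 'osd.3'],
    'Rack2': ['osd.4', 'osd.5', 'osd.6', 'osd.7'],
    'Rack3': ['osd.8', 'osd.9', 'osd.10', 'osd.11'],
}

def place(object_name: str, replicas: int = 3):
    chosen = []
    for rack, osds in list(topology.items())[:replicas]:   # one rack per replica
        digest = hashlib.sha1(f'{object_name}/{rack}'.encode()).hexdigest()
        chosen.append(osds[int(digest, 16) % len(osds)])   # stable pick within the rack
    return chosen

print(place('MyObj'))    # three OSDs, no two in the same rack
```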

Page 38:
Page 39:

How it works

(simple)

Page 40:

Red Hat Confidential - NDA Required

RED HAT CEPH STORAGE

Demo

Page 41:

Lab environment Munich

Three physical servers, each running 4 VMs:
- 3 OSD nodes with 2 OSDs each
- 1 monitor node

Page 42:

Community vs RHCS & Roadmap

Page 43:

Red Hat Ceph Storage and Ceph Community

Support
  Red Hat Ceph Storage: Numerous consulting, service, and training options (recognized industry leadership in support services and online support)
  Ceph (community): Use at own risk

Product features
  Red Hat Ceph Storage: Only production-ready code
  Ceph (community): Some features not fully baked

Product access
  Red Hat Ceph Storage: Consistent quality; packaging available through Red Hat Satellite
  Ceph (community): Via ceph.com & Linux distros

Release lifecycle
  Red Hat Ceph Storage: Well-defined, infrequent, hardened, curated, extensive QE, committed lifespan (3 years) ensuring strict update policies
  Ceph (community): Rapid, experimental, loosely tested, indefinite lifespan

Upgrades
  Red Hat Ceph Storage: Timely, tested patches – unlimited, not tied to version/platform, with a clearly defined, documented, and supported migration path
  Ceph (community): Not coordinated, roll your own, unsupported

Deployment resources & Red Hat subscription benefits
  Red Hat Ceph Storage: Red Hat Knowledgebase (RAs, articles, tech briefs, tutorial videos, documentation), open source community participation clout; automated services: Red Hat Access Labs (troubleshooting), Recommendations (RT analytics), Plug-ins (diagnostics and alerts)
  Ceph (community): N/A

Security / Accountability
  Red Hat Ceph Storage: Stable code backed by the Red Hat Product Security team; Red Hat Certification (HW, SW, cloud providers); Red Hat Open Source and Quality Assurance programs
  Ceph (community): No official certification or single source of accountability

Page 44:

Red Hat Ceph Storage Lifecycle

Page 45:

A few rules of thumb for architecting a Ceph cluster

Page 46:

Rules for OSD Servers

CPU:
- Use dedicated server hardware for Ceph workloads; no converged approach right now
- Standard x86, Intel or AMD
- Hyperthreading can be used from Sandy Bridge up
- 1 GHz per OSD if you use Ceph copies (replication)
- 1.5–2 GHz per OSD if you use erasure coding

RAM:
- Generally more RAM is better (e.g., page caching)
- 1 GB of RAM per 1 TB per OSD. Example: a server with 10 OSDs on 3 TB drives needs 30 GB RAM
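Applied as a quick sanity check, these rules translate into a small calculation (a sketch, assuming the per-OSD figures above):

```python
# Back-of-the-envelope check of an OSD server against the CPU and RAM rules.
def osd_server_requirements(num_osds: int, drive_size_tb: float, erasure_coded: bool = False):
    ghz_per_osd = 2.0 if erasure_coded else 1.0     # 1 GHz (replicas) vs 1.5-2 GHz (EC)
    return {
        'cpu_ghz_total': num_osds * ghz_per_osd,
        'ram_gb': num_osds * drive_size_tb,         # 1 GB RAM per TB per OSD
    }

# The slide's example: 10 OSDs on 3 TB drives -> 30 GB RAM
print(osd_server_requirements(num_osds=10, drive_size_tb=3))
```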

Page 47:

Rules for OSD Servers

Data storage:
- Ceph can operate with heterogeneous systems; CRUSH supports weighting for different sized drives (e.g., 1 TB, 3 TB, etc.)
- Our recommendation: use the same controller and the same drives within each pool
- Use SSDs for journals, with roughly 1 SSD per 5 SATA drives (recommend Intel DC S3700 or equivalent)
- RAID controllers are fine for data drives and journal SSDs

Network:
- On smaller clusters, 1 Gbps networks may be suitable for normal operating conditions, but not for heavy loads or failure recovery scenarios
- In the case of a drive failure, replicating 1 TB of data across a 1 Gbps network takes 3 hours, and 3 TB (a typical drive configuration) takes 9 hours
- By contrast, with a 10 Gbps network, the replication times would be 20 minutes and 1 hour respectively
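The network figures are simple bandwidth arithmetic; the sketch below reproduces the order of magnitude (it ignores protocol overhead and parallel recovery, which is why the slide's numbers include some headroom):

```python
# Raw bandwidth arithmetic behind the recovery-time figures above.
def replication_hours(data_tb: float, link_gbps: float) -> float:
    bytes_total = data_tb * 1e12
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_total / bytes_per_second / 3600

for tb in (1, 3):
    for gbps in (1, 10):
        print(f'{tb} TB over {gbps} Gbps: ~{replication_hours(tb, gbps):.1f} h')
```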

Page 48:

Networking Considerations

Page 49:

Example Ceph Dell Server Configurations

Type            Components

Performance     R720XD; 32 GB DRAM; 20 x 1 TB HDD (data drives); 4 x 200 GB SSD (journal)

Capacity        R720XD; 64 GB DRAM; 10 x 4 TB HDD (data drives); 4 x 400 GB SSD (journal)
                plus MD1200 with 12 x 4 TB HDD (data drives)

Extra Capacity  R720XD; 128 GB DRAM; 10 x 400 GB SSD (journal)
                plus MD3060e (JBOD) with 60 x 4 TB HDD (data drives)

Page 50:

Base

Page 51:

Partner Solutions

Page 52:

Partner CEPH Appliances (examples)

All-flash arrays, optimized for Ceph

SanDisk sells the InfiniFlash storage arrays, designed for use with Red Hat Ceph Storage. Optimizations contributed by SanDisk deliver up to 780,000 IOPS, allowing Ceph customers to service new workloads.

Our relationship includes:

● Engineering and product collaboration

● Community thought leadership

Eternus CD10000 HyperScale Storage System

The ETERNUS CD10000 provides unlimited, modular scalability of storage capacity and performance with zero downtime, for instant and cost-efficient online access to extensive data volumes. Integrating open-source Ceph software into a storage system delivered with end-to-end maintenance from Fujitsu enables IT organizations to fully benefit from open standards without implementation and operational risks.

The Seagate Kinetic Open Storage platform delivers higher capacity, along with improved rack density, by allowing more flexibility than traditional storage server architectures.

Seagate Kinetic Open Storage

Page 53:

Partner CEPH Reference Architectures (examples)

Cisco UCS C3160 high Density Rack Server with Red Hat Ceph Storage

Systems designed with storage in mind

The Red Hat Ceph Storage cluster provides data protection through replication and block device cloning. The PowerEdge R730xd is an exceptionally flexible and scalable two-socket 2U rack server that delivers high-performance processing and a broad range of workload-optimized local storage possibilities, including hybrid tiering.

Dell Red Hat Reference Guide

The Cisco UCS® C3160 and C240 M4 Rack Servers hardware are well suited for object storage solutions like Ceph. The Cisco UCS C3160 is a modular server with high storage density and is directed particularly at the use cases mentioned in this document. It combines industry-leading performance and scalability and is well suited for OpenStack, Ceph-based storage, and other software-defined distributed storage environments.

Supermicro's Red Hat Ceph Storage optimized solutions offer durable, software-defined, scale-out storage platforms in 1U/2U/4U form factors and are designed to maximize performance, density, and capacity.

Customers can expect to see:

● Reference architectures, validated for performance, density, and capacity

● Whitepapers and datasheets that support Red Hat Storage solutions

Page 54: