Ceph Day: Bring Ceph to Enterprise


Bring Ceph to Enterprise: Set up a 50T mobile cluster in 30 minutes

Alex Lau (劉俊賢), Software Consultant, alau@suse.com

Block Storage

File System

Object Storage


How to access Ceph storage? Introduction of iSCSI

[Diagram: a remote cluster with data encrypted at rest; monitor nodes and a management node; heterogeneous OS access through the RADOS gateway (RESTful API) and iSCSI]

SUSE Enterprise Storage 3

The first commercially available iSCSI access to SES3. It allows clients to access Ceph storage remotely over the iSCSI protocol on TCP/IP. SES3 provides an iSCSI target driver on top of RBD (RADOS Block Device), so any iSCSI initiator can access SES3 over the network.

iSCSI Architecture: Technical Background

Protocol: ‒ Block storage access over TCP/IP

‒ Initiators: the clients that access the iSCSI target over TCP/IP

‒ Targets: the servers that provide access to a local block device

SCSI and iSCSI: ‒ iSCSI encapsulates SCSI commands and responses

‒ Each iSCSI TCP packet carries a SCSI command

Remote access: ‒ iSCSI initiators can access a remote block device like a local disk

‒ Attach and format with XFS, Btrfs, etc.

‒ Booting directly from an iSCSI target is supported
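As a minimal sketch of the initiator side (assuming open-iscsi is installed; the portal IP and target IQN below are placeholders, not values from the slides), discovery and login look roughly like this:

```python
import subprocess

PORTAL = "192.168.1.10"                  # placeholder: iSCSI gateway portal IP
TARGET = "iqn.2016-06.org.example:demo"  # placeholder: target IQN exported by the gateway

# Discover the targets exposed by the gateway portal
subprocess.run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL], check=True)

# Log in to the discovered target; the remote block device then appears as /dev/sdX
subprocess.run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"], check=True)
```

After login the device can be partitioned and formatted with XFS or Btrfs just like a local disk.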

Before iSCSI RBD support

[Diagram: OSD1–OSD4 on the public network; a single target system maps an RBD block device into LIO and exports it over iSCSI to an initiator system]

Before iSCSI support, what's wrong? Missing features

LIO over RBD: ‒ It doesn’t support “atomic compare and write”

‒ It doesn’t support “persistent group reservations”

iSCSI: ‒ Active/active multipath I/O (MPIO) is not supported

‒ Supporting all of these in the block layer requires a different approach

Benefits of the iSCSI LIO gateway for RBD

Multi-platform access to Ceph: ‒ Clients do not need to be part of the cluster, just like with radosgw

Standard iSCSI interface: ‒ Most OSes support iSCSI

‒ open-iscsi is available in most Linux distributions

LIO (Linux IO Target): ‒ In-kernel SCSI target implementation

Flexible configuration: ‒ The targetcli utility is available along with lrbd

Configuring the RBD iSCSI gateway: Introduction to lrbd

Easy setup: ‒ Packaged and bundled with iSCSI since SES 2.0

‒ Multi-node configuration support with targetcli

Technical background: ‒ JSON configuration format

‒ Targets, portals, pools, auth

‒ Configuration state is stored in the Ceph cluster

Related Link:‒ https://github.com/swiftgist/lrbd

‒ https://github.com/swiftgist/lrbd/wiki
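The JSON layout follows the targets / portals / pools / auth sections listed above. As a rough sketch only (the exact schema and field names should be checked against the lrbd wiki linked above; the IQN, gateway host, portal address, pool and image names here are placeholders), a single-gateway configuration is shaped roughly like this:

```python
import json

# Rough sketch of an lrbd-style configuration; verify field names against the lrbd wiki.
config = {
    "auth": [
        {"target": "iqn.2016-06.org.example:demo", "authentication": "none"}
    ],
    "targets": [
        {"target": "iqn.2016-06.org.example:demo",
         "hosts": [{"host": "igw1", "portal": "portal1"}]}
    ],
    "portals": [
        {"name": "portal1", "addresses": ["192.168.1.10"]}
    ],
    "pools": [
        {"pool": "rbd",
         "gateways": [{"target": "iqn.2016-06.org.example:demo",
                       "tpg": [{"image": "demo"}]}]}
    ],
}

# lrbd consumes JSON like this and keeps the configuration state in the Ceph cluster
print(json.dumps(config, indent=2))
```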


iSCSI Gateway Optimizations

Efficient handling of certain SCSI operations: ‒ Offload RBD image IO to the OSDs

‒ Avoid locking on the iSCSI gateway nodes

‒ Compare and Write ‒ New cmpext OSD operation to handle RBD data comparison

‒ Dispatched as a compound cmpext+write OSD request

‒ Write Same ‒ New writesame OSD operation to expand duplicate data at the OSD

‒ Reservations ‒ State stored as an RBD image extended attribute

‒ Updated using a compound cmpxattr+setxattr OSD request


Multipath Support with iSCSI on RBD

[Diagram: an iSCSI initiator with multiple paths to two iSCSI gateway nodes, each running the RBD module against the same RBD image; OSD1–OSD4 connected over the public and cluster networks]

How to manage storage growth and the costs of Ceph?

Easily scale and manage data storage

Control storage growth and manage costs

Support today’s investment and adapt to the future


Introduction to openATTIC

Easily scale and manage data storage

SUSE Enterprise Storage Management Vision

Open source: ‒ An alternative to proprietary storage management systems

Enterprise: ‒ Works as expected with a traditional unified storage interface, e.g. NAS, SAN

SDS support: ‒ Provides initial Ceph setup, management and monitoring to ease complicated scale-out scenarios

It will be available in the next SES release, or download it now at

https://build.opensuse.org/package/show/filesystems:openATTIC/openattic

openATTIC Features: Existing Capabilities

Modern Web UI

RESTful API ‒ Software-defined storage

Unified storage ‒ NAS (NFS, CIFS, HTTP) ‒ SAN (iSCSI, Fibre Channel)

Volume mirroring ‒ DRBD

File systems ‒ LVM, XFS, ZFS, Btrfs, ext3/4

Monitoring ‒ Nagios / Icinga built-in ‒ Ceph management (WIP)

openATTIC Architecture: Technical Details

Backend: ‒ Python (Django)

‒ Django REST Framework

‒ Nagios / Icinga & PNP4Nagios

‒ Linux tools ‒ LVM, LIO, DRBD

‒ Ceph API ‒ librados, librbd

Web frontend: ‒ AngularJS

‒ Bootstrap

‒ REST API

Automated test suites: ‒ Python unit tests

‒ Gatling ‒ RESTful API tests

‒ Protractor / Jasmine ‒ Web UI tests

openATTIC Architecture: High-Level Overview

[Diagram: the Web UI and REST clients talk to the openATTIC RESTful API (Django, NoDB, PostgreSQL) over HTTP; the openATTIC systemd backend reaches Linux OS tools via D-Bus and the shell, and the Ceph storage cluster via librados/librbd]

openATTIC Development: Current Status

- Create and map RBDs as block devices (volumes) (see the librados/librbd sketch below)
- Pool management Web UI (table view)
- OSD management Web UI (table view)
- RBD management Web UI (add/delete, table view)
- Monitor cluster health and performance
- Support for managing Ceph with Salt integration (WIP)
- Role management of node, monitor, storage, CephFS, iSCSI, radosgw
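The backend reaches the cluster through librados and librbd, as listed in the architecture slide above. As a minimal sketch, assuming a reachable cluster, a local /etc/ceph/ceph.conf, and an existing "rbd" pool (the image name "demo" is a placeholder), creating an RBD and reading the cluster status from Python looks roughly like this:

```python
import json
import rados
import rbd

# Connect to the cluster using the local ceph.conf and default keyring
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

try:
    # Create a 4 GiB image called "demo" in the (assumed existing) "rbd" pool
    ioctx = cluster.open_ioctx("rbd")
    try:
        rbd.RBD().create(ioctx, "demo", 4 * 1024**3)
        print("images:", rbd.RBD().list(ioctx))
    finally:
        ioctx.close()

    # Ask the monitors for the cluster status (the same data "ceph status" shows)
    ret, out, errs = cluster.mon_command(
        json.dumps({"prefix": "status", "format": "json"}), b"")
    print(json.loads(out)["health"])
finally:
    cluster.shutdown()
```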

Volume Management

Pool Listing

OSD Listing

RBD Listing

oA Ceph Roadmap: the future is in your hands

- Ceph cluster status dashboard incl. performance graphs
- Extend pool management
- OSD monitoring/management
- RBD management/monitoring
- CephFS management
- RGW management (users, buckets, keys)
- Deployment and remote configuration of Ceph nodes (via Salt)
- Public roadmap on the openATTIC wiki to solicit community feedback: http://bit.ly/28PCTWf

How does Ceph control storage costs?

Control storage growth and manage costs


Minimal recommendations

OSD storage node: ‒ 2GB RAM per OSD

‒ 1.5GHz of CPU core per OSD

‒ 10GbE public and backend networks

‒ 4GB RAM for a cache tier

MON monitor node: ‒ 3 MONs minimum

‒ 2GB RAM per node

‒ SSD for the system OS

‒ MON and OSD should not be virtualized

‒ Bonded 10GbE
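A quick sizing helper based purely on the rules of thumb above (the per-OSD figures come from this slide; the 16-disk node is an assumed example, not from the slides):

```python
# Rules of thumb from the slide: 2 GB RAM and 1.5 GHz of CPU per OSD,
# plus extra RAM if the node also serves a cache tier.
RAM_PER_OSD_GB = 2
GHZ_PER_OSD = 1.5
CACHE_TIER_RAM_GB = 4   # additional RAM when cache tiering is used

def size_osd_node(osd_count, cache_tier=False):
    """Return (ram_gb, cpu_ghz) recommended for one OSD storage node."""
    ram = osd_count * RAM_PER_OSD_GB + (CACHE_TIER_RAM_GB if cache_tier else 0)
    cpu = osd_count * GHZ_PER_OSD
    return ram, cpu

# Example: a node with 16 OSD disks and cache tiering enabled
print(size_osd_node(16, cache_tier=True))   # -> (36, 24.0)
```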

SUSE Storage Pricing

[Chart: relative price positioning of SUSE Enterprise Storage against JBOD storage, entry-level disk arrays, mid-range arrays, mid-range NAS, fully featured NAS devices, and high-end disk arrays]

Use storage with multiple tiers

[Diagram: a SUSE Enterprise Storage cluster serving two workload types ‒ write-heavy applications (e.g. video recording, high-volume IoT data) land on a write-tier hot pool backed by a normal-tier cold pool, while read-heavy applications (e.g. video streaming, big data analysis) are served from a read-tier hot pool backed by a normal-tier cold pool]
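Tiering like this is typically built with Ceph's standard cache-tiering commands. A minimal sketch, assuming pools named "hot-pool" and "cold-pool" already exist (the pool names are placeholders, not from the slides):

```python
import subprocess

def ceph(*args):
    """Run a ceph CLI command and fail loudly if it errors."""
    subprocess.run(["ceph", *args], check=True)

# Attach "hot-pool" as a cache tier in front of "cold-pool"
ceph("osd", "tier", "add", "cold-pool", "hot-pool")

# Write-back mode: writes land in the hot pool and are flushed to the cold pool later
ceph("osd", "tier", "cache-mode", "hot-pool", "writeback")

# Route client traffic for "cold-pool" through the cache tier
ceph("osd", "tier", "set-overlay", "cold-pool", "hot-pool")
```

A read-mostly tier would use a read-oriented cache mode instead of writeback; the flush and eviction thresholds still need to be tuned per workload.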

How to create multiple price points?

‒ PCIe flash: 1000$ = 1000G at 2000MB/s rw; 4 PCIe drives = 4000$ = 8000MB/s rw, 4T storage, 400,000 IOPS, 4$ per G

‒ SSD: 250$ = 1000G at 500MB/s rw; 16 drives = 4000$ = 8000MB/s rw, 16T storage, 100,000 IOPS, 1$ per G

‒ HDD: 250$ = 8000G at 150MB/s rw; 16 drives = 4000$ = 2400MB/s rw, 128T storage, 2,000 IOPS, 0.1$ per G

Control Costs

How does EC (erasure coding) reduce storage costs?

[Diagram: in a replicated pool, the SES Ceph cluster stores three full copies of each object; in an erasure-coded pool, it stores data chunks plus parity chunks (e.g. 4 data + 2 parity)]

Multiple copies of the stored data: • 300% of the data size • Low latency, faster recovery

A single copy with parity: • 150% of the data size • The data/parity ratio trades capacity against CPU
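The two percentages follow directly from the layout: 3-way replication stores three full copies (300%), while an erasure-coded pool with k data and m parity chunks stores (k+m)/k of the data size, e.g. k=4, m=2 gives 150%. A tiny sketch of that arithmetic:

```python
def replication_overhead(copies=3):
    """Raw capacity used per byte of data with n-way replication."""
    return copies            # 3 copies -> 300% of the data size

def ec_overhead(k=4, m=2):
    """Raw capacity used per byte of data with k data + m parity chunks."""
    return (k + m) / k       # 4+2 -> 1.5, i.e. 150% of the data size

print(replication_overhead())   # 3   -> 300%
print(ec_overhead(4, 2))        # 1.5 -> 150%
print(ec_overhead(5, 2))        # 1.4 -> the "EC 5+2" profile used below (~140%)
```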

Public Cloud Setup

H270-H70 ‒ 40,000$
‒ 48 cores * 8: 384 cores
‒ 32G * 32: 1T memory
‒ 1T * 16: 16T SSD
‒ 40GbE * 8

R120-T30 ‒ 5,700$ * 7
‒ 48 cores * 7: 336 cores
‒ 8 * 16G * 7: 896G memory
‒ 1T * 2 * 7: 14T SSD
‒ 8T * 6 * 7: 336T HDD
‒ 40GbE * 7
‒ 10GbE * 14

1,000 customers running 5$ web hosting = 5,000$ per month; 8 months = 40,000$

EC 5+2 gives about 250T usable; 2,500 customers with 100GB at 2$ storage = 5,000$ per month; 8 months = 40,000$

For developers?

[Diagram: three nodes, each running one MON and four OSDs (MON1 with OSD1–4, MON2 with OSD5–8, MON3 with OSD9–12), connected over a dual 1G network. Per node: board ≈ 300$, three 6T drives at 220$ each = 660$, and a 512G SSD at 150$]

Pros and Cons of this mobile cluster

Price: ‒ Around 3,200$, versus expensive laptops

Size:

‒ 50T and 20kg is mobile enough to demo a usable cluster

‒ Real HDDs are better for presenting a storage solution

Benchmark:

‒ Apart from networking capability, all features and requirements of a Ceph cluster are met

Features:

‒ A great fit for developers and testers to perform software-based tests that VMs cannot cover

How does the DevOps story fit? Introducing Salt

Support today’s investment and adapt to the future


Salt-enabled Ceph: Existing capabilities

sesceph: ‒ Python API library that helps deploy and manage Ceph

‒ Already upstreamed into Salt, available in the next release

‒ https://github.com/oms4suse/sesceph

python-ceph-cfg: ‒ Python Salt module that uses sesceph to deploy

‒ https://github.com/oms4suse/python-ceph-cfg

Both libraries already ship with SES 3.0

Why Salt? Existing capabilities

Product setup

‒ SUSE OpenStack Cloud, SUSE Manager and SUSE Enterprise Storage all come with Salt enabled

Parallel execution

‒ E.g. compared to ceph-deploy when preparing OSDs

Customizable Python modules

‒ Continuous development on the Python API, easy to manage

Flexible configuration

‒ Default is Jinja2 + YAML (stateconf)

‒ pydsl if you prefer Python directly; JSON, pyobjects, etc.

Create a cluster with a single stage file

https://github.com/AvengerMoJo/Ceph-Saltstack/blob/master/stages/ses/ceph/ceph_create.sls

This is a showcase of a simple way to create a cluster with a single stage file.

It is easy to customize and create your own.

Quick deployment example

Git repo for fast deployment and benchmarking: ‒ https://github.com/AvengerMoJo/Ceph-Saltstack

Demo recording: ‒ https://asciinema.org/a/4hmdsrksn0fd8fgpssdgqsjdb

1) Salt setup
2) Git clone and copy the modules into the Salt _modules directory
3) saltutil.sync_all to push them to all minion nodes (see the sketch below)
4) ntp_update on all nodes
5) Create new MONs and create keys
6) Clean disk partitions and prepare OSDs
7) Update the crushmap
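A rough sketch of driving steps 3 and 4 from Salt's Python client on the master, assuming the minions are already accepted. The custom Ceph-Saltstack module functions for steps 5–7 are not spelled out here because their names should be taken from the repo linked above; plain ntpdate via cmd.run stands in for the repo's ntp_update step.

```python
import salt.client

# LocalClient talks to the salt-master and targets minions by glob
local = salt.client.LocalClient()

# Step 3: push the custom modules (copied into /srv/salt/_modules) to every minion
print(local.cmd("*", "saltutil.sync_all"))

# Step 4: make sure the clocks agree before creating MONs (clock skew breaks quorum);
# shown here with plain ntpdate as a stand-in for the repo's own ntp_update helper
print(local.cmd("*", "cmd.run", ["ntpdate -u pool.ntp.org"]))

# Steps 5-7 call the custom Ceph-Saltstack module functions, e.g.
#   local.cmd("mon*", "<module>.<create_mon_function>")
# -- see the repo above for the actual module and function names.
```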

Reduce storage costs and management with SUSE Enterprise Storage

Manage Less

Adapt Quickly

Control Costs

Scale storage from terabytes to hundreds of petabytes without downtime

SOCIAL MEDIA

BUSINESS OPERATIONS

MOBILE DATA

CUSTOMER DATA

100% UPTIME