Ceph Day Bring Ceph To Enterprise
Bring Ceph to Enterprise: Set up a 50T mobile cluster in 30 min
Alex Lau (劉俊賢), Software [email protected]
Block Storage
File System
Object Storage
How to access Ceph storage? An introduction to iSCSI
[Diagram: heterogeneous OS clients access the Ceph cluster through the RADOS gateway (RESTful API) and iSCSI; monitor nodes and a management node run alongside; data replicated to a remote cluster is encrypted at rest]
SUSE Enterprise Storage 3
The first commercially available iSCSI access to SES3. It allows clients to access Ceph storage remotely over the TCP/IP-based iSCSI protocol. SES3 provides an iSCSI target driver on top of RBD (RADOS Block Device), so any iSCSI initiator can access SES3 over the network.
iSCSI Architecture: Technical Background
Protocol:
‒ Block storage access over TCP/IP
‒ Initiators: the clients that access an iSCSI target over TCP/IP
‒ Targets: the servers that provide access to a local block device
SCSI and iSCSI:
‒ iSCSI encapsulates SCSI commands and responses
‒ Each iSCSI TCP packet carries a SCSI command
Remote access:
‒ iSCSI initiators can access a remote block device like a local disk (see the example after this list)
‒ Attach and format it with XFS, Btrfs, etc.
‒ Booting directly from an iSCSI target is supported
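A minimal sketch of what that looks like from a Linux client, assuming open-iscsi is installed; the portal address and target IQN below are placeholders:

    # Discover targets exposed by a gateway, log in, and use the LUN like a local disk
    iscsiadm -m discovery -t sendtargets -p 192.168.1.101
    iscsiadm -m node -T iqn.2016-06.org.example:demo -p 192.168.1.101 --login
    # The LUN appears as a new block device (e.g. /dev/sdb); format and mount it
    mkfs.xfs /dev/sdb
    mount /dev/sdb /mnt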
Before iSCSI RBD support
[Diagram: a single target system maps an RBD block device and exports it via LIO to iSCSI; the initiator system connects to that target, which reaches OSD1-OSD4 over the public network]
Before iSCSI support, what was wrong? Missing features
LIO over RBD:
‒ It doesn't support "atomic compare and write"
‒ It doesn't support "persistent group reservations"
iSCSI:
‒ Active/Active multipath (MPIO) is not supported
‒ Supporting all of these at the block layer requires a different approach
Benefits of the iSCSI LIO gateway for RBD
Multi-platform access to Ceph:
‒ Clients don't need to be part of the cluster, much like with radosgw
Standard iSCSI interface:
‒ Most OSes support iSCSI
‒ open-iscsi ships with most Linux distributions
LIO (Linux IO Target):
‒ In-kernel SCSI target implementation
Flexible configuration:
‒ The targetcli utility is available alongside lrbd
Configuring the RBD iSCSI gateway: An introduction to lrbd
Easy setup:
‒ Packaged and bundled with iSCSI support since SES 2.0
‒ Multi-node configuration support with targetcli
Technical background:
‒ JSON configuration format
‒ Targets, portals, pools, auth (see the configuration sketch below)
‒ Configuration state is stored in the Ceph cluster
Related links:
‒ https://github.com/swiftgist/lrbd
‒ https://github.com/swiftgist/lrbd/wiki
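A rough sketch of the JSON format; the hostname, portal address, image name, and IQN are placeholders, and the lrbd wiki above has authoritative examples:

    # Write a minimal lrbd configuration; lrbd stores it in the cluster and
    # configures LIO on each listed gateway host when run there.
    cat > lrbd.conf <<'EOF'
    {
      "auth": [
        { "target": "iqn.2016-06.org.example:demo", "authentication": "none" }
      ],
      "targets": [
        { "target": "iqn.2016-06.org.example:demo",
          "hosts": [ { "host": "igw1", "portal": "portal1" } ] }
      ],
      "portals": [
        { "name": "portal1", "addresses": [ "192.168.1.101" ] }
      ],
      "pools": [
        { "pool": "rbd",
          "gateways": [
            { "target": "iqn.2016-06.org.example:demo",
              "tpg": [ { "image": "demo-image" } ] } ] }
      ]
    }
    EOF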
iSCSI Gateway Optimizations
Efficient handling of certain SCSI operations:
‒ Offload RBD image IO to the OSDs
‒ Avoid locking on the iSCSI gateway nodes
Compare and Write:
‒ New cmpext OSD operation handles the RBD data comparison
‒ Dispatched as a compound cmpext+write OSD request
Write Same:
‒ New writesame OSD operation expands duplicate data at the OSD
Reservations:
‒ State is stored as an RBD image extended attribute
‒ Updated using a compound cmpxattr+setxattr OSD request
Multiple Path Support with iSCSI on RBD
[Diagram: an iSCSI initiator connects over the public network to two iSCSI gateway nodes, each running the RBD module; both gateways map the same RBD image and reach OSD1-OSD4 over the cluster network]
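A minimal sketch from the initiator side, assuming two gateway portals (addresses are placeholders) and dm-multipath installed:

    # Log in to the same target through both gateways, then let multipathd
    # aggregate the two sessions into one device for active/active MPIO
    iscsiadm -m discovery -t sendtargets -p 192.168.1.101
    iscsiadm -m discovery -t sendtargets -p 192.168.1.102
    iscsiadm -m node --login
    multipath -ll    # should show one device with a path through each gateway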
How to manage storage growth and costs of Ceph?
Easily scale and manage data storage
Control storage growth and manage costs
Support today’s investment and adapt to the future
Introduction to openATTIC
Easily scale and manage data storage
SUSE Enterprise Storage Management: Vision
Open source:
‒ An alternative to proprietary storage management systems
Enterprise:
‒ Works as expected with traditional unified storage interfaces, e.g. NAS, SAN
SDS support:
‒ Provides initial Ceph setup, management, and monitoring to ease complicated scale-out scenarios
It will be available in the next SES release, or download it now at:
https://build.opensuse.org/package/show/filesystems:openATTIC/openattic
openATTIC Features: Existing capabilities
Modern Web UI
RESTful API:
‒ Software-defined storage
Unified storage:
‒ NAS (NFS, CIFS, HTTP)
‒ SAN (iSCSI, Fibre Channel)
Volume mirroring:
‒ DRBD
File systems:
‒ LVM, XFS, ZFS, Btrfs, ext3/4
Monitoring:
‒ Nagios / Icinga built-in
‒ Ceph management (WIP)
openATTIC Architecture: Technical Detail
Backend:
‒ Python (Django)
‒ Django REST Framework
‒ Nagios / Icinga & PNP4Nagios
‒ Linux tools: LVM, LIO, DRBD
‒ Ceph API: librados, librbd
Web frontend:
‒ AngularJS
‒ Bootstrap
‒ REST API (a sample call follows this list)
Automated test suites:
‒ Python unit tests
‒ Gatling (RESTful API tests)
‒ Protractor / Jasmine (Web UI tests)
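For flavor, a hypothetical REST call; the endpoint path and credentials are purely illustrative, not the documented openATTIC routes:

    # List storage pools through the openATTIC REST API (path is illustrative)
    curl -u admin:password http://openattic-host/openattic/api/pools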
openATTIC Architecture: High-Level Overview
[Diagram: the Web UI and REST clients talk HTTP to the openATTIC RESTful API (Django with PostgreSQL and NoDB); the openATTIC systemd service drives Linux OS tools over D-Bus and the shell, and reaches the Ceph storage cluster via librados/librbd]
openATTIC Development: Current status
- Create and map RBDs as block devices (volumes)
- Pool management Web UI (table view)
- OSD management Web UI (table view)
- RBD management Web UI (add/delete, table view)
- Monitor cluster health and performance
- Support for managing Ceph with Salt integration (WIP)
- Role management of node, monitor, storage, cephfs, iscsi, radosgw
(A sketch of the underlying RBD commands follows.)
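A minimal sketch of the RBD operations the volume workflow corresponds to, assuming a pool named rbd and a placeholder image name:

    # Create a 10G RBD image and map it as a local block device
    rbd create demo-image --size 10240 --pool rbd
    rbd map rbd/demo-image        # returns e.g. /dev/rbd0
    rbd showmapped                # list mapped images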
[Screenshots: volume management, pool listing, OSD listing, RBD listing]
oA Ceph Roadmap: the future is in your hands
- Ceph cluster status dashboard incl. performance graphs
- Extended pool management
- OSD monitoring/management
- RBD management/monitoring
- CephFS management
- RGW management (users, buckets, keys)
- Deployment and remote configuration of Ceph nodes (via Salt)
- Public roadmap on the openATTIC wiki to solicit community feedback: http://bit.ly/28PCTWf
How does Ceph control storage costs?
Control storage growth and manage costs
Minimal recommendations
OSD storage node:
‒ 2GB RAM per OSD
‒ 1.5GHz CPU core per OSD
‒ 10GbE public and backend networks
‒ 4GB RAM for cache tier
MON monitor node:
‒ 3 MONs minimum
‒ 2GB RAM per node
‒ SSD for the system OS
‒ MONs and OSDs should not be virtualized
‒ Bonded 10GbE
SUSE Storage Pricing
[Chart: SUSE Enterprise Storage pricing positioned against JBOD storage, entry-level disk arrays, mid-range arrays, mid-range NAS, fully featured NAS devices, and high-end disk arrays]
Use storage with multiple tiers
Write-heavy applications: e.g. video recording, high-volume IoT data
Read-heavy applications: e.g. video streaming, big data analysis
[Diagram: within the SUSE Enterprise Storage cluster, writes land in a hot-pool write tier in front of a cold-pool normal tier, while reads are served from a hot-pool read tier backed by a cold-pool normal tier]
A cache-tiering sketch follows.
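A minimal sketch of Ceph cache tiering, assuming a fast pool named hot-pool (e.g. SSD-backed) placed in front of cold-pool; pool names and PG counts are placeholders:

    # Attach hot-pool as a writeback cache tier in front of cold-pool
    ceph osd pool create cold-pool 128
    ceph osd pool create hot-pool 128
    ceph osd tier add cold-pool hot-pool
    ceph osd tier cache-mode hot-pool writeback
    ceph osd tier set-overlay cold-pool hot-pool
    ceph osd pool set hot-pool hit_set_type bloom   # track object hits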
How to create multiple price points?
PCIe SSD: $1000 = 1000G at 2000MB/s r/w; 4 PCIe drives = $4000 = 8000MB/s r/w, 4T storage, 400,000 IOPS, $4 per G
SATA SSD: $250 = 1000G at 500MB/s r/w; 16 drives = $4000 = 8000MB/s r/w, 16T storage, 100,000 IOPS, $1 per G
HDD: $250 = 8000G at 150MB/s r/w; 16 drives = $4000 = 2400MB/s r/w, 128T storage, 2,000 IOPS, $0.1 per G
Control Costs
How does erasure coding (EC) reduce storage cost?
Replication pool:
[Diagram: an object stored in the SES Ceph cluster as three full copies]
‒ Multiple copies of the stored data
‒ 300% cost of the data size
‒ Low latency, faster recovery
Erasure coded pool:
[Diagram: an object stored in the SES Ceph cluster as four data chunks plus two parity chunks]
‒ A single copy with parity
‒ 150% cost of the data size
‒ The data/parity ratio trades off against CPU
(A pool-creation sketch follows.)
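A minimal sketch of creating both kinds of pool; profile and pool names, k/m values, and PG counts are placeholders (k=4, m=2 matches the 150% overhead above):

    # Replicated pool: 3 full copies (300% of data size)
    ceph osd pool create replpool 128
    ceph osd pool set replpool size 3
    # Erasure-coded pool: 4 data + 2 parity chunks (150% of data size)
    ceph osd erasure-code-profile set ec42 k=4 m=2
    ceph osd pool create ecpool 128 128 erasure ec42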
Public Cloud Setup
H270-H70: $40,000
- 48 cores × 8: 384 cores
- 32G × 32: 1T memory
- 1T × 16: 16T SSD
- 40GbE × 8
R120-T30: $5,700 × 7
- 48 cores × 7: 336 cores
- 8 × 16G × 7: 896G memory
- 1T × 2 × 7: 14T SSD
- 8T × 6 × 7: 336T HDD
- 40GbE × 7
- 10GbE × 14
1000 customers running $5 web hosting = $5,000; 8 months = $40,000
With EC 5+2, the 336T raw is about 250T usable: 2500 customers × 100GB at $2 storage = $5,000; 8 months = $40,000
For developers?
[Diagram: three nodes, each running one monitor and four OSDs (MON1-MON3, OSD1-OSD12), connected over a dual 1G network]
Per node: $300 for the system, 3 × 6T HDD at $220 each ($660), and a 512G SSD at $150
Pros and Cons of this mobile cluster
Price:
‒ Around $3,200, versus expensive laptops
Size:
‒ 50T at 20kg is mobile enough to demo a usable cluster
‒ Real HDDs are better for presenting a storage solution
Benchmark:
‒ Aside from networking capability, it meets all the features and requirements of a Ceph cluster
Features:
‒ A great fit for developers and testers performing software-based tests that can't be done in VMs
How does the DevOps story fit? Introducing Salt
Support today’s investment and adapt to the future
Salt-enabled Ceph: Existing capabilities
sesceph:
‒ A Python API library that helps deploy and manage Ceph
‒ Already upstreamed into Salt; available in the next release
‒ https://github.com/oms4suse/sesceph
python-ceph-cfg:
‒ A Python Salt module that uses sesceph to deploy
‒ https://github.com/oms4suse/python-ceph-cfg
Both libraries already ship with SES 3.0
Why Salt? Existing capabilities
Product setup:
‒ SUSE OpenStack Cloud, SUSE Manager, and SUSE Enterprise Storage all come with Salt enabled
Parallel execution:
‒ e.g. compared with ceph-deploy when preparing OSDs
Custom Python modules:
‒ Continuous development against the Python API is easy to manage
Flexible configuration:
‒ Jinja2 + YAML by default (stateconf)
‒ pydsl if you prefer Python directly; also JSON, pyobjects, etc.
Create a cluster with a single stage file
https://github.com/AvengerMoJo/Ceph-Saltstack/blob/master/stages/ses/ceph/ceph_create.sls
This showcases a simple way to create a cluster with a single stage file. It is easy to customize and build your own.
Quick deployment example
Git repo for fast deployment and benchmarking:
- https://github.com/AvengerMoJo/Ceph-Saltstack
Demo recording:
- https://asciinema.org/a/4hmdsrksn0fd8fgpssdgqsjdb
1) Salt setup
2) Git clone and copy the modules to the Salt _modules directory
3) saltutil.sync_all to push them to all minion nodes
4) ntp_update all nodes
5) Create new MONs and create keys
6) Clean disk partitions and prepare OSDs
7) Update the crushmap
(A command-line sketch of these steps follows.)
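A minimal sketch of the early steps, assuming a working Salt master with minions attached; the exact repo layout may differ, and the MON/OSD/crushmap steps are driven by the repo's custom modules, so only generic commands are shown:

    # Steps 1-4: fetch the custom modules, sync them to minions, sync clocks
    git clone https://github.com/AvengerMoJo/Ceph-Saltstack.git
    cp Ceph-Saltstack/_modules/*.py /srv/salt/_modules/   # path depends on repo layout
    salt '*' saltutil.sync_all                            # push modules to all minions
    salt '*' cmd.run 'ntpdate pool.ntp.org'               # keep cluster clocks in sync
    # Steps 5-7 (MONs, keys, OSD prep, crushmap) come from the ceph_create.sls
    # stage file referenced above, e.g.:
    salt '*' state.apply ses.ceph.ceph_create             # state path is illustrative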
Reduce storage costs and management overhead with SUSE Enterprise Storage
Manage Less
Adapt Quickly
Control Costs
Scale storage from terabytes to hundreds of petabytes without downtime
SOCIAL MEDIA · BUSINESS OPERATIONS · MOBILE DATA · CUSTOMER DATA
100% UPTIME