Managing and Monitoring SUSE Enterprise Storage
Tim Serong, Senior Clustering Engineer
Eric Jackson, Senior Software Developer, Distributed Storage
2
SUSE Enterprise Storage...in 30 seconds or less
• Massively scalable
• No bottlenecks or single points of failure
• Object storage, block storage
• Based on Ceph
3
SUSE Enterprise Storage...in 30 seconds or less
• Data stored redundantly
• Lots of disks (OSDs) in lots of storage nodes
• A few monitor (MON) nodes
• All on commodity hardware
Deployment
5
DIY Deployment
• Boot a bunch of nodes
• Install SUSE Linux Enterprise Server 12
• Add SUSE Enterprise Storage
• Run ceph-deploy
6
DIY Deployment
# ceph-deploy new node1 node2 node3
# ceph-deploy mon create-initial
# ceph-deploy osd prepare node1:sdb
# ceph-deploy osd prepare node1:sdc
…
# ceph-deploy calamari --master node0 \
connect node1 node2 node3 …
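The command sequence above follows a regular pattern, so it can be generated from a node-to-disk layout. The following is a minimal sketch, not part of the original slides: the function name `ceph_deploy_commands` and the example layout are hypothetical, it assumes every listed node is passed to both `ceph-deploy new` and the Calamari `connect` step, and it only builds the command strings shown on the slide without running them. Any steps hidden behind the "…" on the slide are not reproduced.

```python
from typing import Dict, List


def ceph_deploy_commands(
    osd_disks: Dict[str, List[str]],
    calamari_master: str,
) -> List[str]:
    """Build the ceph-deploy command lines from the slide for a set of nodes.

    osd_disks maps each storage node to the disks that should become OSDs.
    Nothing is executed; the commands are only returned as strings.
    """
    nodes = list(osd_disks)
    commands = [
        # Assumption: all nodes in the layout are named in "new".
        "ceph-deploy new " + " ".join(nodes),
        "ceph-deploy mon create-initial",
    ]
    # One "osd prepare" call per node:disk pair, as on the slide.
    for node, disks in osd_disks.items():
        for disk in disks:
            commands.append(f"ceph-deploy osd prepare {node}:{disk}")
    commands.append(
        f"ceph-deploy calamari --master {calamari_master} connect "
        + " ".join(nodes)
    )
    return commands


if __name__ == "__main__":
    # Hypothetical layout; only node1's sdb and sdc appear on the slide.
    layout = {"node1": ["sdb", "sdc"], "node2": ["sdb"], "node3": ["sdb"]}
    for line in ceph_deploy_commands(layout, "node0"):
        print("# " + line)
```

Printing the commands first, rather than executing them, makes it easy to review the plan before touching any disks.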
7
Or, Take a Crowbar to it
• Same technology as SUSE OpenStack Cloud
• Install one admin node
• PXE boot everything else
• Click to deploy
iSCSI
21
Configuring iSCSI today
• RBD mapped devices
• targetcli
- interactive
- command line
• iblock backstore
22
Configuring iSCSI challenges
• Simple configuration requires a dozen steps
• Sometimes command order matters
• Locally saved configuration
• Synchronizing redundancy across gateways
• Experimentation can be cumbersome
23
lrbd
• Uses rbd and targetcli
• Configuration stored in Ceph
• Synchronization is automatic
• Experimentation is quick
• Configuration format is JSON
• Command line options, man pages
24
lrbd information
• GitHub https://github.com/SUSE/lrbd
• Wiki tutorial https://github.com/SUSE/lrbd.wiki
• 30 configuration samples
- /usr/share/doc/packages/lrbd/samples
Monitoring
26
Is it Working?
# ceph status
    cluster 565bbaaf-11e9-4105-934a-6b468f0b7b7e
     health HEALTH_OK
     monmap e1: 1 mons at {node1=192.168.124.81:6789/0}
            election epoch 1, quorum 0 node1
     osdmap e12: 2 osds: 2 up, 2 in
      pgmap v118: 64 pgs, 1 pools, 1024 kB data, 3 objects
            74192 kB used, 38817 MB / 38889 MB avail
                  64 active+clean
27
Is it Working?
# ceph status
    cluster 565bbaaf-11e9-4105-934a-6b468f0b7b7e
     health HEALTH_WARN 33 pgs degraded; 35 pgs stuck...
     monmap e1: 3 mons at {ceph2=...,ceph3=...,ceph4=...}, election epoch 22, quorum 0,1,2 ceph2,ceph3,...
     osdmap e411: 52 osds: 52 up, 52 in
      pgmap v1014: 4288 pgs, 4 pools, 0 bytes data, ...
            2466 MB used, 12659 GB / 12662 GB avail
                  33 active+degraded
                   2 active+remapped
                4253 active+clean
28
Is it Working?
# ceph status
    cluster c9d3ae97-2f4c-4d91-a3f7-ff42bce754df
     health HEALTH_WARN 2174 pgs backfill; 367 pgs backfilling; 3271 pgs degraded; 23 pgs down; 57 pgs peering; 35 pgs recovering; 188 pgs recovery_wait; 227 pgs stale; 26 pgs stuck inactive; 3065 pgs stuck unclean; recovery 2519083/14004502 objects degraded (17.988%); 1/148 in osds are down
     monmap e3: 3 mons at {a001=172.16.25.1:6789/0,a002=172.168.25.2:6789/0,a003=172.16.25.3:6789/0}, election ...
     osdmap e51357: 168 osds: 147 up, 148 in
      pgmap v7243525: 20480 pgs, 5 pools, 1183 GB data, 5609 objects
            3330 GB used, 88846 GB / 92177 GB avail
            2519083/14004502 objects degraded (17.988%)
                   9 inactive
               16755 active+clean
                  10 degraded+remapped
                   9 active+degraded+remapped
                 361 active+degraded+remapped+backfilling
                  39 stale+active+degraded+remapped+wait_backfill
                  30 peering
                 154 active+recovery_wait
                 171 stale+active+clean
                  22 active+recovery_wait+degraded+remapped
                  14 active+remapped+wait_backfill
                  23 down+peering
                   3 stale+active+degraded+remapped+backfilling
                   3 stale+active+recovery_wait
                   2 active+remapped
                  19 degraded
                 673 active+degraded
                   3 active+remapped+backfilling
                   4 remapped+peering
                   6 active+recovery_wait+remapped
                   3 active+recovery_wait+degraded
                  11 stale+active+degraded
                2121 active+degraded+remapped+wait_backfill
                  35 active+recovering
29
Is it Working?
• What do we care about, right now?
‒ Overall cluster health
‒ MON quorum
‒ OSD status
‒ PG status
‒ Disk used/free
‒ Is anything dead?
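Most of the items in this list can be pulled out of the plain-text `ceph status` output shown on the earlier slides. The following is a rough sketch under stated assumptions: the function name `summarize_status` is hypothetical, the regular expressions are written against the output format shown in this deck only (other Ceph releases print a different layout), and `SAMPLE` is the healthy single-node output from the first "Is it Working?" slide.

```python
import re
from typing import Dict

# Output of the healthy cluster from the first "Is it Working?" slide.
SAMPLE = """\
cluster 565bbaaf-11e9-4105-934a-6b468f0b7b7e
 health HEALTH_OK
 monmap e1: 1 mons at {node1=192.168.124.81:6789/0}
        election epoch 1, quorum 0 node1
 osdmap e12: 2 osds: 2 up, 2 in
  pgmap v118: 64 pgs, 1 pools, 1024 kB data, 3 objects
        74192 kB used, 38817 MB / 38889 MB avail
              64 active+clean
"""

# A PG state line is a count and a state name alone on a line,
# for example "64 active+clean" or "23 down+peering".
PG_STATE_LINE = re.compile(
    r"^[ \t]*(\d+) ([a-z_]+(?:\+[a-z_]+)*)[ \t]*$", re.MULTILINE
)


def summarize_status(text: str) -> Dict[str, object]:
    """Extract health, MON quorum, OSD counts and PG states from the text."""
    summary: Dict[str, object] = {}

    health = re.search(r"health (HEALTH_\w+)", text)
    if health:
        summary["health"] = health.group(1)

    # "quorum 0,1,2 ceph2,ceph3,..." -> ranks, then monitor names.
    quorum = re.search(r"quorum [\d,]+ (\S+)", text)
    if quorum:
        summary["quorum"] = quorum.group(1).split(",")

    osds = re.search(r"(\d+) osds: (\d+) up, (\d+) in", text)
    if osds:
        total, up, in_ = (int(n) for n in osds.groups())
        summary["osds"] = {"total": total, "up": up, "in": in_}

    pg_states = {state: int(count) for count, state in PG_STATE_LINE.findall(text)}
    summary["pg_states"] = pg_states
    # Everything that is not exactly "active+clean" deserves a look.
    summary["pgs_not_clean"] = sum(pg_states.values()) - pg_states.get(
        "active+clean", 0
    )
    return summary


if __name__ == "__main__":
    print(summarize_status(SAMPLE))
```

Scraping human-readable output is fragile, which is one reason a tool with a proper API is preferable for anything beyond a quick check.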
30
Is it Working Well?
• What do we care about, long term?
‒ What's CPU bound?
‒ What's disk bound?
‒ What's network bound?
31
Enter Calamari and Romana
• Ceph GUI, included with SUSE Enterprise Storage
• Calamari is the backend (REST API)
• Romana is the frontend (GUI)
• Provides monitoring and some management
32
Cluster Status
34
Cluster Performance
Management
38
When Everything is Fine
• OSD management
• Pool / Placement Group management
• iSCSI targets
• Users, authentication
• Adding new nodes, disks, etc.
39
Cluster Settings
40
OSD Management
41
Pool / Placement Group Management
43
CLI Tools for...
• iSCSI targets
• Users, authentication
• Adding new nodes, disks, etc.
44
When Everything is Not Fine
• Dead disks
• Dead nodes
• Half the building is on fire...
45
Calamari will tell you about it...
46
...and help you find the problem
47
Again, Other Tools for...
• Redeploying
• Adding new nodes
• Replacing OSDs
Beneath the Surface
49
Calamari Consists Of...
• Romana (the frontend)
• Calamari (the backend REST API)
• Salt (communication and minor configuration)
• Graphite (metrics, graphs)
50
Usually...
# zypper in romana
# calamari-ctl initialize
[INFO] Loading configuration…
[INFO] Starting/enabling salt...
...
Username (leave blank to use 'root'): ...
# ceph-deploy calamari --master node0 \
connect node1 node2 node3 …
51
In Practice, Right Now
• Calamari Node:
‒ Romana, Calamari, Salt Master, Graphite/Carbon
• Storage Nodes:
‒ Salt Minion, Diamond
52
Variations
• More Salt with your Calamari?
• Longer metric retention
‒ /etc/carbon/storage-schemas.conf
• BYO Graphite
• Less Salt with your Calamari?
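Longer retention in `/etc/carbon/storage-schemas.conf` costs disk space on the Calamari node, because Carbon preallocates one Whisper file per metric. The following sketch estimates that cost. It relies on the Whisper on-disk layout (16-byte file header, 12 bytes of archive info per archive, 12 bytes per data point) and handles only the `precision:duration` form with unit suffixes; the function names and the retention string `60s:1d,5m:30d` are hypothetical examples, not the values shipped with Calamari.

```python
from typing import List, Tuple

# Seconds per unit suffix accepted in a "retentions" entry.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(value: str) -> int:
    """Convert a value such as '60s' or '30d' to seconds."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]


def parse_retentions(spec: str) -> List[Tuple[int, int]]:
    """Return (seconds_per_point, number_of_points) for each archive."""
    archives = []
    for part in spec.split(","):
        precision, duration = part.strip().split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


def whisper_file_bytes(spec: str) -> int:
    """Estimate the size of one Whisper file for a retention spec."""
    archives = parse_retentions(spec)
    header = 16 + 12 * len(archives)  # file metadata plus per-archive info
    points = sum(count for _, count in archives)
    return header + 12 * points  # each data point is 12 bytes


if __name__ == "__main__":
    spec = "60s:1d,5m:30d"  # hypothetical retention setting
    print(parse_retentions(spec))
    print(whisper_file_bytes(spec), "bytes per metric")
```

Multiply the per-metric size by the number of metrics Diamond reports across all storage nodes to get a rough total. Note that edits to `storage-schemas.conf` apply only to Whisper files created afterwards; existing files keep their old retention until they are resized.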
Alternatives
54
Alternatives
• Intel Virtual Storage Manager
• InkScope
• openATTIC
55
Virtual Storage Manager
• Intel
• OpenStack Horizon Interface
• Django, Python
• Released 2.0 beta 1 on August 21
• Packages incomplete at OBS
- https://build.opensuse.org/project/show/home:swiftgist:vsm
56
InkScope
• Orange Labs
• AngularJS and Python REST API
• RADOS Gateway user management
• Packages at OBS
- https://build.opensuse.org/package/show/home:swiftgist/inkscope
Images from https://github.com/inkscope/inkscope/tree/master/screenshots
60
openATTIC
• IT-Novum GmbH
• AngularJS and Python REST API
• Released 2.0.1 on July 21
• Complete Linux storage management system
• Demo
- http://demo.openattic.org/openattic/
Questions?
Thank you.
65
For more information about SUSE Enterprise Storage: http://suse.com/storage