-
1
SUSE High Availability for SAP HANA
Tales from the real world, tips, tricks & troubleshooting
BP-1351
Thomas Korber Lars [email protected] [email protected]
-
2
Abstract
SAP HANA is the in-memory database for transactional and analytical
workloads. The SUSE HA cluster is an industry-leading open source high
availability clustering system designed to virtually eliminate unplanned
downtime. Combined, these two technologies are core IT for the world's most
relevant enterprises. Learn best practices from current projects for
setting up and troubleshooting SUSE HA and SAP HANA in scale-up and
scale-out scenarios and related Linux components.
-
3
Agenda
1. Best Practices SUSE HA for SAP HANA
2. Where to look for what
3. SAPHanaSR Scale-Up Demo
4. Conclusion
-
4
Best Practices SUSE HA for SAP HANA
-
5
SAPHanaSR Scenarios
[Figure: four pacemaker cluster diagrams, each protected by fencing]
Performance Optimized - Scale Up
Cost Optimized - Scale Up
Multi Target - Scale Up
Performance Optimized - Scale Out
Scale-up elements shown: SAP HANA primary and secondary (PRD) linked by System Replication, resources SAPHana (Promoted / Demoted) and SAPHanaTopology, vIP. The cost optimized diagram additionally shows SAP HANA QAS as SAPInstance with its own vIP.
Scale-out elements shown: NodeA1 ... NodeA5 and NodeB1 ... NodeB5 (1, 2, 3, ... N), SR sync, majority maker, resources SAPHanaController (Promoted / Demoted) and SAPHanaTopology, vIP.
-
6
Fencing Phenomenology
Node Fencing * (reboot, shutdown)
  In Node
    SBD + Watchdog (disk-based, diskless)
    Watchdogs: hpwdt, iTCO_wdt, ipmi_wdt, softdog, ...
  External
    Remote Mgmt.: iLO, DRAC, IPMI, ...
    vCenter, libvirt, HMC, EC2, ...
IO Fencing (locking, reservation)
  In Cluster
    Built-in Locking: cLVM+DLM, LVM+lvmlockd+DLM, Cluster MD TODO
    Cluster-based: MD-RAID, cluster-handled
  External
    Pure Locking: SFEX, SCSI2 Reservation, SCSI3 Reservation
* mandatory
-
7
Cluster and Fencing Topology
Two Sites Scale-Up
2 nodes, 2 FC SBDs
2 nodes, 2 iSCSI SBDs
Three Sites Scale-Up
2 nodes, 3 FC SBDs
2 nodes, 2 FC SBDs, 1 iSCSI SBD
2 nodes, 3 iSCSI SBDs
( 2 nodes, 1 iSCSI SBD )
2+1 nodes, diskless SBD
Three Sites Scale-Out
2xN+1 nodes, 3 FC SBDs
2xN+1 nodes, 2 FC SBDs, 1 iSCSI SBD
2xN+1 nodes, 3 iSCSI SBDs
2xN+1 nodes, diskless SBD
-
8
It's good NOT to do:
- directly re-use concepts from other cluster solutions
- set cluster resource, STONITH, and SBD timings shorter than SAN timings
- use OCFS2 if no concurrent access is needed
- run without STONITH at all
- manually change the status of cluster-controlled resources
- let other software use the watchdog in parallel to SBD
- issue commands to the cluster while it is in transition
- go live without tests planned and done
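The timing rule in the list above can be turned into a small sanity check. The following is only a sketch: the field names are hypothetical, and the two ratios (msgwait at least twice the watchdog timeout, stonith-timeout at least msgwait plus 20 %) are rules of thumb as commonly found in SUSE HA documentation, assumed here rather than taken from the slides. Measure the SAN failover time on your own infrastructure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Timings:
    """All values in seconds; the field names are hypothetical."""
    san_failover: int     # measured worst-case SAN/MPIO path failover
    sbd_watchdog: int     # SBD watchdog timeout
    sbd_msgwait: int      # SBD msgwait timeout
    stonith_timeout: int  # cluster property stonith-timeout


def check_timings(t: Timings) -> List[str]:
    """Return a list of findings; an empty list means no rule was violated."""
    findings = []
    # The watchdog has to outlast a SAN path failover.
    if t.sbd_watchdog <= t.san_failover:
        findings.append("sbd_watchdog <= san_failover")
    # Assumed rule of thumb: msgwait is at least twice the watchdog timeout.
    if t.sbd_msgwait < 2 * t.sbd_watchdog:
        findings.append("sbd_msgwait < 2 * sbd_watchdog")
    # Assumed rule of thumb: stonith-timeout is at least msgwait plus 20 %.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if 10 * t.stonith_timeout < 12 * t.sbd_msgwait:
        findings.append("stonith_timeout < 1.2 * sbd_msgwait")
    return findings


if __name__ == "__main__":
    example = Timings(san_failover=40, sbd_watchdog=60,
                      sbd_msgwait=120, stonith_timeout=144)
    print(check_timings(example))
```

The example values are invented for illustration; the point is only that every cluster-side timeout is derived from the slowest layer underneath it.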
-
9
It's good to do:
+ two independent LAN links for cluster communication
+ two or three SBD devices, or diskless SBD
+ adapt resource time-outs to the infrastructure, e.g. SAN MPIO or vMotion
+ make the CIB simple, e.g. few groups instead of many constraints
+ resource naming schema, e.g. prefixes rsc_, msl_, grp_, ord_, loc_, col_
+ set up the cluster step-by-step
+ use crm
+ always issue crm resource unmigrate after a migration has completed
+ check the cluster for a clean idle state before triggering actions
+ be patient, respect cluster timings
+ define and perform tests for all failure scenarios
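The naming schema from the list can be checked mechanically. This is a minimal sketch that uses only the six prefixes named on the slide; the resource and constraint ids in the example are hypothetical.

```python
# Prefixes exactly as listed on the slide.
PREFIXES = ("rsc_", "msl_", "grp_", "ord_", "loc_", "col_")


def ids_without_prefix(ids):
    """Return the ids that do not start with one of the agreed prefixes."""
    # str.startswith accepts a tuple of candidate prefixes.
    return [i for i in ids if not i.startswith(PREFIXES)]


if __name__ == "__main__":
    # Hypothetical ids, for illustration only.
    sample = ["rsc_SAPHana_PRD_HDB00", "msl_SAPHana_PRD_HDB00",
              "col_ip_with_primary", "vip1"]
    print(ids_without_prefix(sample))
```

If your own schema uses further prefixes, extend the tuple accordingly.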
-
10
Where to look for what
-
11
Config and Log Files
● Config files
  /etc/hosts
  /etc/ntp.conf
  /etc/multipath.conf
  /etc/modules-load.d/
  /etc/sysconfig/sbd
  /etc/corosync/corosync.conf
  /etc/sudoers
  /usr/sap/sapservices
  /usr/sap/$SID/SYS/profile/$SID_HDB$nr_$host
  /usr/sap/$SID/SYS/global/hdb/custom/config/global.ini
  /hana/shared/myHooks/SAPHanaSR.py
● Log files
  /var/log/messages
  /var/lib/pacemaker/pengine/pe-input-*.bz2
  /usr/sap/$SID/HDB$nr/$host/trace/
See man sbd, stonith_sbd, crm_no_quorum_policy, sudoers, multipath.conf, corosync.conf, SAPHanaSR-ScaleOut_basic_cluster
-
12
Useful Commands online - plain
# sg_persist --read-reservation --device=/dev/...
# cs_show_hana_autofailover --all
# cs_show_error_patterns -c | grep -v "=.0"
# cs_show_hana_info --info $SID $nr
# cs_show_memory
# cs_sum_base_config
# rear --help
~> sapcontrol -nr $nr -function StartSystem
~> sapcontrol -nr $nr -function StopSystem ALL
~> sapcontrol -nr $nr -function GetSystemInstanceList
~> hdbnsutil -sr_state
~> HDBSettings.sh systemOverview.py
~> HDBSettings.sh systemReplicationStatus.py
~> HDBSettings.sh landscapeHostConfiguration.py
-
13
Example master+worker+standby
~> HDBSettings.sh landscapeHostConfiguration.py; echo RC:$?
| Host  | Host | Host   | Failov | Remove | Stor   | Stor   | Failov | Failov | NameSrv  | NameSrv | IndexSrv | IndexSrv | Host    | Host    | ...
|       | Actv | Status | Status | Status | Config | Actual | Config | Actual | Config   | Actual  | Config   | Actual   | Config  | Actual  |
|       |      |        |        |        | Part   | Part   | Group  | Group  | Role     | Role    | Role     | Role     | Roles   | Roles   |
| ----- | ---- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | -------- | ------- | -------- | -------- | ------- | ------- |
| db101 | yes  | ok     |        |        | 1      | 1      | deflt  | deflt  | master 1 | master  | worker   | master   | worker  | worker  |
| db102 | yes  | ok     |        |        | 2      | 2      | deflt  | deflt  | master 2 | slave   | worker   | slave    | worker  | worker  |
| db103 | yes  | ignore |        |        | 0      | 0      | deflt  | deflt  | master 3 | slave   | standby  | standby  | standby | standby | ...

overall host status: ok
RC:4

~> HDBSettings.sh systemReplicationStatus.py; echo RC:$?
| Database | Host  | Port  | ServiceName | VolumeID | SiteID | SiteName | Secondary | Sec   | Sec    | Sec      | Sec    | Repl | Repl   | Repl    |
|          |       |       |             |          |        |          | Host      | Port  | SiteID | SiteName | Active | Mode | Status | Details |
| -------- | ----- | ----- | ----------- | -------- | ------ | -------- | --------- | ----- | ------ | -------- | ------ | ---- | ------ | ------- |
| SYSTEMDB | db101 | 31001 | nameserver  | 1        | 1      | DC1      | db401     | 31001 | 2      | DC2      | YES    | SYNC | ACTIVE |         |
| P04      | db101 | 31007 | xsengine    | 2        | 1      | DC1      | db401     | 31007 | 2      | DC2      | YES    | SYNC | ACTIVE |         |
| P04      | db101 | 31003 | indexserver | 3        | 1      | DC1      | db401     | 31003 | 2      | DC2      | YES    | SYNC | ACTIVE |         |
| P04      | db102 | 31003 | indexserver | 4        | 1      | DC1      | db402     | 31003 | 2      | DC2      | YES    | SYNC | ACTIVE |         |

status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 1
site name: DC1
RC:15

Note: output beautified.
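Both status scripts report their result through the exit code (RC:4 with "overall host status: ok" and RC:15 with "overall system replication status: ACTIVE" in the example). The sketch below maps the codes to text. Only 4 and 15 are confirmed by the output shown here; the remaining values follow the SAP documentation as far as I recall it and should be treated as assumptions. The function names are hypothetical.

```python
# Exit codes of landscapeHostConfiguration.py.
# 4 = ok is shown on the slide; the others are assumed from SAP documentation.
LANDSCAPE_RC = {0: "fatal", 1: "error", 2: "warning", 3: "info", 4: "ok"}

# Exit codes of systemReplicationStatus.py.
# 15 = active is shown on the slide; the others are assumed from SAP documentation.
SR_STATUS_RC = {10: "no system replication", 11: "error", 12: "unknown",
                13: "initializing", 14: "syncing", 15: "active"}


def describe(landscape_rc, sr_rc):
    """Translate both exit codes into short text, 'unexpected' if unknown."""
    return (LANDSCAPE_RC.get(landscape_rc, "unexpected"),
            SR_STATUS_RC.get(sr_rc, "unexpected"))


def looks_healthy(landscape_rc, sr_rc):
    """True only for the combination shown in the example: 4 and 15."""
    return landscape_rc == 4 and sr_rc == 15


if __name__ == "__main__":
    print(describe(4, 15), looks_healthy(4, 15))
```

A wrapper script would typically capture `$?` right after each command, as the `echo RC:$?` in the example does.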
-
14
Useful Commands online - cluster
# crm_mon -1Ar
# crm configure show | grep cli-
# cs_clusterstate -i
# SAPHanaSR-showAttr
# SAPHanaSR-monitor
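The `grep cli-` above looks for constraints that a manual migration left behind in the configuration. A rough equivalent that works on saved `crm configure show` output could look like this. It is a sketch: it only considers `location` lines, and the sample configuration text is hypothetical.

```python
def leftover_cli_constraints(config_text):
    """Return ids of location constraints whose id starts with 'cli-'."""
    ids = []
    for line in config_text.splitlines():
        words = line.split()
        # A location constraint line starts with: location <id> <resource> ...
        if len(words) >= 2 and words[0] == "location" and words[1].startswith("cli-"):
            ids.append(words[1])
    return ids


if __name__ == "__main__":
    # Hypothetical excerpt of "crm configure show" output.
    sample = """\
primitive rsc_ip_PRD_HDB00 IPaddr2 params ip=192.0.2.10
location cli-prefer-rsc_ip_PRD_HDB00 rsc_ip_PRD_HDB00 role=Started inf: node2
location loc_no_mm rsc_ip_PRD_HDB00 -inf: mm888
"""
    print(leftover_cli_constraints(sample))
```

An empty result is what you want to see before triggering the next cluster action.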
-
15
Example master+worker+standby
# SAPHanaSR-showAttr
Global cib-time                prim sec srHook sync_state
-----------------------------------------------------------
global Thu Aug 1 09:39:01 2019 DC1  -   SOK    SOK

Sit lpt        lss mns   srr
-------------------------------------
DC1 1564645080 4   db102 P
DC2 30         4   db403 S

Hosts clone_state node_state roles                         score
--------------------------------------------------------------------------
db101             online     master1:master:worker:master  -33333
db102 PROMOTED    online     master3:slave:worker:slave    110
db103             online     master2:slave:standby:standby
db401             online     master2:master:worker:master
db402             online     master3:slave:worker:slave
db403             online     master1:slave:standby:standby
mm888             online     :shtdown:shtdown:shtdown

Note: output beautified, does not match other examples.
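Two fields of this output are easier to read once decoded. The roles string has four colon-separated parts which, judging by the landscapeHostConfiguration.py columns, correspond to the name server configured and actual role followed by the index server configured and actual role. The lpt value of the primary site looks like a Unix timestamp, while small values such as 30 appear to be status markers. Both interpretations are assumptions, as are the key names and the threshold used below.

```python
from datetime import datetime, timezone

# Hypothetical key names for the four parts of the roles string.
ROLE_KEYS = ("namesrv_config", "namesrv_actual", "indexsrv_config", "indexsrv_actual")


def split_roles(roles):
    """Split e.g. 'master1:master:worker:master' into a dict of four parts."""
    parts = roles.split(":")
    if len(parts) != len(ROLE_KEYS):
        raise ValueError("expected four colon-separated parts: %r" % roles)
    return dict(zip(ROLE_KEYS, parts))


def lpt_to_utc(lpt):
    """Return lpt as a UTC datetime, or None for small marker values."""
    # Assumed threshold: anything below one million is a marker, not a time.
    if lpt < 1000000:
        return None
    return datetime.fromtimestamp(lpt, tz=timezone.utc)


if __name__ == "__main__":
    print(split_roles("master1:master:worker:master"))
    print(lpt_to_utc(1564645080), lpt_to_utc(30))
```

The timestamp from the example decodes to 07:38:00 UTC on 1 August 2019, roughly a minute before the cib-time shown in the Global section if that is read as UTC+2 local time.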
-
16
Useful Commands offline
# cs_show_supportconfig -g $supportconfig_directory
# cs_show_supportconfig -f $supportconfig_directory chk_saphana chk_sleha
# cs_show_hana_autofailover_patterns --all $date message-\*
# crm_simulate -S --xml-file $pengine-input
# crm_mon --xml-file $pengine-input
# SAPHanaSR-showAttr --sid=$SID:$nr --cib=$path/cib.xml
# SAPHanaSR-replay-archive --format script $crm_report | \
    SAPHana-filter --host='Hosts/$host/role' --filterDouble
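The pe-input files fed to crm_simulate and crm_mon are bz2-compressed CIB snapshots. When sorting through many of them offline, it can help to peek at the attributes of the root element first. The following is a sketch using only the Python standard library; the function name is hypothetical, and which attributes are present depends on the Pacemaker version.

```python
import bz2
import xml.etree.ElementTree as ET


def cib_root_attributes(path):
    """Return the attributes of the root element of a bz2-compressed XML file."""
    # bz2.open gives a file-like object that ElementTree can parse directly.
    with bz2.open(path, "rb") as handle:
        root = ET.parse(handle).getroot()
    return dict(root.attrib)


# Usage, with the path notation from the slides:
#   attrs = cib_root_attributes("/var/lib/pacemaker/pengine/pe-input-123.bz2")
#   print(attrs.get("epoch"), attrs.get("cib-last-written"))
```

This only reads the file; the actual replay of the transition remains the job of crm_simulate.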
-
17
SAPHanaSR Scale-Up Demo
-
18
SAPHanaSR Scale-Up Demo
-
19
Conclusion
-
20
Related Sessions
TUT-1092 Bootstrapping SLES for SAP HANA & NetWeaver clusters with Terraform & Salt on public clouds
TUT-1212 Running SAP Data Hub on Kubernetes with SUSE CaaS Platform
BP-1209 Planning, deployment, maintenance, & operations of SAP S/4HANA
FUT-1439 SUSE Linux Enterprise Server for SAP applications: The road ahead
TUT-1226 SAP HA on SUSE – All you need to know
HOL-1225 High Availability for SAP application servers using ENSA2 enqueue replication
BP-1351 SUSE High Availability for SAP HANA: Tales from the real world, tips & tricks, & troubleshooting
-
21
More Information
https://www.suse.com/products/sles-for-sap
https://documentation.suse.com/sbp/all
https://www.suse.com/c/tag/towardszerodowntime/
https://www.suse.com/service/training
https://www.suse.com/c/saphanasr-scaleout-automating-sap-hana-system-replication-scale-installations-sles-sap-applications/
https://www.suse.com/c/tag/supportconfig-analysis-sca-tools/
https://software.opensuse.org/package/python-cluster-preflight-check
https://github.com/Thr3d/supportutils-plugin-suse-sap
https://github.com/SUSE/node_exporter
https://github.com/SUSE/hanadb_exporter
https://github.com/ClusterLabs/ha_cluster_exporter
https://blogs.sap.com/2017/11/19/be-prepared-for-using-pacemaker-cluster-for-sap-hana-part-1-basics/
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.03/en-US/a165e192ba374c2a8b17566f89fe8419.html
https://www.sap.com/documents/2016/06/84ea994f-767c-0010-82c7-eda71af511fa.html
https://www.b1-systems.de/
-
22
Conclusion – what to take away
● HANA System Replication is complex, particularly scale-out.
● Topology of cluster and fencing defines how well failures are handled.
● Understanding of resources is needed for planning, building, running and troubleshooting these clusters.
● Tools and documented procedures can help, e.g. SAPHanaSR-showAttr, SAPHanaSR-replay-archive.
● Cluster and maintenance procedures have to be tested carefully before going live.
● Professional services are available, e.g. review before going live.
-
23
Q&A
-
24
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.