SUSE High Availability for SAP HANA
Tales from the real world, tips, tricks & troubleshooting

BP-1351

Thomas Korber Lars [email protected] [email protected]

Abstract

SAP HANA is the in-memory database for transactional and analytical workloads. The SUSE HA cluster is an industry-leading open source high availability clustering system designed to virtually eliminate unplanned downtime. Combined, these two technologies are core IT for the world's most relevant enterprises. Learn best practices from current projects for setting up and troubleshooting SUSE HA and SAP HANA in scale-up and scale-out scenarios, including related Linux components.

Agenda

1. Best Practices SUSE HA for SAP HANA

2. Where to look for what

3. SAPHanaSR Scale-Up Demo

4. Conclusion

Best Practices SUSE HA for SAP HANA

SAPHanaSR Scenarios

[Figure: four pacemaker cluster layouts, each with fencing. Panel titles: Performance Optimized - Scale Up; Cost Optimized - Scale Up; Multi Target - Scale Up; Performance Optimized - Scale Out. The scale-up panels show SAP HANA primary and secondary (PRD) linked by System Replication, a vIP, and the resources SAPHana (Promoted / Demoted) and SAPHanaTopology; the cost-optimized panel adds an SAP HANA QAS instance under SAPInstance. The scale-out panel shows nodes NodeA1 ... and NodeB1 ... with SR sync, a majority maker, a vIP, and the resources SAPHanaController (Promoted / Demoted) and SAPHanaTopology.]

Fencing Phenomenology

● Node Fencing * (reboot, shutdown)
- External, Remote Mgmt.: iLO, DRAC, IPMI, ...; vCenter, libvirt, HMC, EC2, ...
- In Node, SBD + Watchdog (disk-based, diskless): hpwdt, iTCO_wdt, ipmi_wdt, softdog, ...

● IO Fencing (locking, reservation)
- External, Pure Locking: SFEX, SCSI2 Reservation, SCSI3 Reservation
- In Cluster, Built-in Locking: cLVM+DLM, LVM+lvmlockd+DLM, Cluster MD TODO
- Cluster-based: MD-RAID, cluster-handled

* mandatory

Cluster and Fencing Topology

Two Sites Scale-Up

2 nodes, 2 FC SBDs

2 nodes, 2 iSCSI SBDs

Three Sites Scale-Up

2 nodes, 3 FC SBDs

2 nodes, 2 FC SBDs, 1 iSCSI SBD

2 nodes, 3 iSCSI SBDs

( 2 nodes, 1 iSCSI SBD )

2+1 nodes, diskless SBD

Three Sites Scale-Out

2xN+1 nodes, 3 FC SBDs

2xN+1 nodes, 2 FC SBDs, 1 iSCSI SBD

2xN+1 nodes, 3 iSCSI SBDs

2xN+1 nodes, diskless SBD

It's good NOT to do:

- directly re-use concepts from other cluster solutions
- set cluster resource, STONITH, and SBD timings shorter than SAN timings
- use OCFS2 if no concurrent access is needed
- run without STONITH at all
- manually change the status of cluster-controlled resources
- let other software use the watchdog in parallel to SBD
- issue commands to the cluster while it is in transition
- go live without tests planned and done

It's good to do:

+ two independent LAN links for cluster communication
+ two or three SBD devices, or diskless SBD
+ adapt resource time-outs to the infrastructure, e.g. SAN MPIO or vMotion
+ make the CIB simple, e.g. few groups instead of many constraints
+ resource naming schema, e.g. prefixes rsc_, msl_, grp_, ord_, loc_, col_
+ set up the cluster step-by-step
+ use crm
+ always issue crm unmigrate after migration has completed
+ check cluster for clean idle state before triggering actions
+ be patient, respect cluster timings
+ define and perform tests for all failure scenarios
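The naming schema recommended in the list above can be checked mechanically. The following is a minimal sketch, not an official tool: check_id_prefix is a hypothetical helper, and the example IDs are made up for illustration. Only the prefixes themselves come from the slide.

```shell
#!/bin/sh
# Hypothetical helper: report whether a CIB object ID starts with one of the
# prefixes named on the slide (rsc_, msl_, grp_, ord_, loc_, col_).
check_id_prefix() {
    case "$1" in
        rsc_*|msl_*|grp_*|ord_*|loc_*|col_*) echo "ok" ;;
        *) echo "unprefixed" ;;
    esac
}

# The IDs below are invented examples, not taken from a real cluster.
for id in rsc_SAPHana_PRD_HDB00 msl_SAPHana_PRD_HDB00 vip_PRD; do
    printf '%s %s\n' "$id" "$(check_id_prefix "$id")"
done
```

A consistent prefix makes it easier to tell resources, groups, and constraints apart when reading crm configure show output.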

Where to look for what

Config and Log Files

● Config files
/etc/hosts
/etc/ntp.conf
/etc/multipath.conf
/etc/modules-load.d/
/etc/sysconfig/sbd
/etc/corosync/corosync.conf
/etc/sudoers
/usr/sap/sapservices
/usr/sap/$SID/SYS/profile/$SID_HDB$nr_$host
/usr/sap/$SID/SYS/global/hdb/custom/config/global.ini
/hana/shared/myHooks/SAPHanaSR.py

● Log files
/var/log/messages
/var/lib/pacemaker/pengine/pe-input-*.bz2
/usr/sap/$SID/HDB$nr/$host/trace/

See man sbd, stonith_sbd, crm_no_quorum_policy, sudoers, multipath.conf, corosync.conf, SAPHanaSR-ScaleOut_basic_cluster

Useful Commands online - plain

# sg_persist --read-reservation --device=/dev/...
# cs_show_hana_autofailover --all
# cs_show_error_patterns -c | grep -v "=.0"
# cs_show_hana_info --info $SID $nr
# cs_show_memory
# cs_sum_base_config
# rear --help

~> sapcontrol -nr $nr -function StartSystem
~> sapcontrol -nr $nr -function StopSystem ALL
~> sapcontrol -nr $nr -function GetSystemInstanceList
~> hdbnsutil -sr_state
~> HDBSettings.sh systemOverview.py
~> HDBSettings.sh systemReplicationStatus.py
~> HDBSettings.sh landscapeHostConfiguration.py

Example master+worker+standby

~> HDBSettings.sh landscapeHostConfiguration.py; echo RC:$?
| Host  | Host | Host   | Failov | Remove | Stor   | Stor   | Failov | Failov | NameSrv  | NameSrv | IndexSrv | IndexSrv | Host    | Host    | ...
|       | Actv | Status | Status | Status | Config | Actual | Config | Actual | Config   | Actual  | Config   | Actual   | Config  | Actual  |
|       |      |        |        |        | Part   | Part   | Group  | Group  | Role     | Role    | Role     | Role     | Roles   | Roles   |
| ----- | ---- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | -------- | ------- | -------- | -------- | ------- | ------- |
| db101 | yes  | ok     |        |        | 1      | 1      | deflt  | deflt  | master 1 | master  | worker   | master   | worker  | worker  |
| db102 | yes  | ok     |        |        | 2      | 2      | deflt  | deflt  | master 2 | slave   | worker   | slave    | worker  | worker  |
| db103 | yes  | ignore |        |        | 0      | 0      | deflt  | deflt  | master 3 | slave   | standby  | standby  | standby | standby | ...

overall host status: ok
RC:4

~> HDBSettings.sh systemReplicationStatus.py; echo RC:$?
| Database | Host  | Port  | ServiceName | VolumeID | SiteID | SiteName | Secondary | Sec   | Sec    | Sec      | Sec    | Repl | Repl   | Repl    |
|          |       |       |             |          |        |          | Host      | Port  | SiteID | SiteName | Active | Mode | Status | Details |
| -------- | ----- | ----- | ----------- | -------- | ------ | -------- | --------- | ----- | ------ | -------- | ------ | ---- | ------ | ------- |
| SYSTEMDB | db101 | 31001 | nameserver  | 1        | 1      | DC1      | db401     | 31001 | 2      | DC2      | YES    | SYNC | ACTIVE |         |
| P04      | db101 | 31007 | xsengine    | 2        | 1      | DC1      | db401     | 31007 | 2      | DC2      | YES    | SYNC | ACTIVE |         |
| P04      | db101 | 31003 | indexserver | 3        | 1      | DC1      | db401     | 31003 | 2      | DC2      | YES    | SYNC | ACTIVE |         |
| P04      | db102 | 31003 | indexserver | 4        | 1      | DC1      | db402     | 31003 | 2      | DC2      | YES    | SYNC | ACTIVE |         |

status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: DC1
RC:15

Note: output beautified.
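Both scripts report their result in the exit code, which is why the examples append echo RC:$?. The sketch below maps those codes to labels. The slide only confirms RC:4 with "overall host status: ok" and RC:15 with "overall system replication status: ACTIVE"; the other values are given as commonly documented for these scripts and should be verified against the HANA revision in use. The function names lhc_rc_label and srs_rc_label are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: label the exit code of landscapeHostConfiguration.py.
# Only 4 = ok is confirmed by the example output; the rest is an assumption.
lhc_rc_label() {
    case "$1" in
        4) echo "ok" ;;
        3) echo "info" ;;
        2) echo "warning" ;;
        1) echo "error" ;;
        0) echo "fatal" ;;
        *) echo "unexpected" ;;
    esac
}

# Hypothetical helper: label the exit code of systemReplicationStatus.py.
# Only 15 = active is confirmed by the example output; the rest is an assumption.
srs_rc_label() {
    case "$1" in
        15) echo "active" ;;
        14) echo "syncing" ;;
        13) echo "initializing" ;;
        12) echo "unknown" ;;
        11) echo "error" ;;
        10) echo "no system replication" ;;
        *)  echo "unexpected" ;;
    esac
}

# The two codes shown in the example output.
lhc_rc_label 4
srs_rc_label 15
```

Note that a non-zero exit code is the good case for both scripts, so a plain "if command; then" test would read them the wrong way round.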

Useful Commands online - cluster

# crm_mon -1Ar
# crm configure show | grep cli-
# cs_clusterstate -i
# SAPHanaSR-showAttr
# SAPHanaSR-monitor
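The grep for cli- looks for location constraints left behind by a migration, the ones crm unmigrate is meant to remove. A minimal sketch of that check follows, assuming the configuration text arrives on stdin and that such constraints appear as lines starting with "location cli-". The function name count_cli_constraints and the sample configuration are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count leftover cli- location constraints in
# `crm configure show` output read from stdin.
count_cli_constraints() {
    # grep -c prints the number of matching lines; it exits 1 when there
    # are none, so "|| true" keeps the function usable under "set -e".
    grep -c '^location cli-' || true
}

# Invented sample; on a cluster node one would pipe `crm configure show` instead.
sample_config='primitive rsc_ip_PRD_HDB00 ocf:heartbeat:IPaddr2
location cli-prefer-msl_SAPHana_PRD_HDB00 msl_SAPHana_PRD_HDB00 role=Started inf: node1'

printf '%s\n' "$sample_config" | count_cli_constraints
```

A result other than 0 suggests that a migration constraint is still in place and the resource is pinned to, or banned from, a node.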

Example master+worker+standby

# SAPHanaSR-showAttr
Global cib-time                 prim sec srHook sync_state
-----------------------------------------------------------
global Thu Aug 1 09:39:01 2019  DC1  -   SOK    SOK

Sit lpt        lss mns   srr
-------------------------------------
DC1 1564645080 4   db102 P
DC2 30         4   db403 S

Hosts clone_state node_state roles                         score
--------------------------------------------------------------------------
db101             online     master1:master:worker:master  -33333
db102 PROMOTED    online     master3:slave:worker:slave    110
db103             online     master2:slave:standby:standby
db401             online     master2:master:worker:master
db402             online     master3:slave:worker:slave
db403             online     master1:slave:standby:standby
mm888             online     :shtdown:shtdown:shtdown

Note: output beautified, does not match other examples.
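The lpt value for DC1 in the output above looks like a Unix epoch timestamp. A minimal sketch for making it readable, assuming GNU date (the -d @N syntax); lpt_to_utc is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print an epoch-style lpt value as a UTC time.
# Assumes GNU date, which accepts "-d @SECONDS".
lpt_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# lpt of site DC1 from the example output.
lpt_to_utc 1564645080   # prints 2019-08-01 07:38:00
```

That is about a minute before the cib-time shown in the same output, if the cib-time is read as local time in a UTC+2 zone. The DC2 value of 30 is clearly not a timestamp; small values on the secondary site appear to be status markers, which is an assumption to check against the SAPHanaSR man pages.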

Useful Commands offline

# cs_show_supportconfig -g $supportconfig_directory
# cs_show_supportconfig -f $supportconfig_directory chk_saphana chk_sleha
# cs_show_hana_autofailover_patterns --all $date message-\*
# crm_simulate -S --xml-file $pengine-input
# crm_mon --xml-file $pengine-input
# SAPHanaSR-showAttr --sid=$SID:$nr --cib=$path/cib.xml
# SAPHanaSR-replay-archive --format script $crm_report | \
    SAPHana-filter --host='Hosts/$host/role' --filterDouble

SAPHanaSR Scale-Up Demo

Conclusion

Related Sessions

TUT-1092 Bootstrapping SLES for SAP HANA & NetWeaver clusters with Terraform & Salt on public clouds
TUT-1212 Running SAP Data Hub on Kubernetes with SUSE CaaS Platform
BP-1209 Planning, deployment, maintenance, & operations of SAP S/4HANA
FUT-1439 SUSE Linux Enterprise Server for SAP applications: The road ahead
TUT-1226 SAP HA on SUSE – All you need to know
HOL-1225 High Availability for SAP application servers using ENSA2 enqueue replication
BP-1351 SUSE High Availability for SAP HANA: Tales from the real world, tips & tricks, & troubleshooting

More Information

https://www.suse.com/products/sles-for-sap
https://documentation.suse.com/sbp/all
https://www.suse.com/c/tag/towardszerodowntime/
https://www.suse.com/service/training
https://www.suse.com/c/saphanasr-scaleout-automating-sap-hana-system-replication-scale-installations-sles-sap-applications/
https://www.suse.com/c/tag/supportconfig-analysis-sca-tools/
https://software.opensuse.org/package/python-cluster-preflight-check
https://github.com/Thr3d/supportutils-plugin-suse-sap
https://github.com/SUSE/node_exporter
https://github.com/SUSE/hanadb_exporter
https://github.com/ClusterLabs/ha_cluster_exporter
https://blogs.sap.com/2017/11/19/be-prepared-for-using-pacemaker-cluster-for-sap-hana-part-1-basics/
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.03/en-US/a165e192ba374c2a8b17566f89fe8419.html
https://www.sap.com/documents/2016/06/84ea994f-767c-0010-82c7-eda71af511fa.html
https://www.b1-systems.de/

Conclusion – what to take away

● HANA System Replication is complex, particularly scale-out.

● Topology of cluster and fencing defines how well failures are handled.

● Understanding of resources is needed for planning, building, running, and troubleshooting these clusters.

● Tools and documented procedures can help, e.g. SAPHanaSR-showAttr and SAPHanaSR-replay-archive.

● Cluster and maintenance procedures have to be tested carefully before going live.

● Professional services are available, e.g. review before going live.

Q&A

General Disclaimer

This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
