IDUG Db2 Tech Conference Rotterdam, Netherlands | October 20-24, 2019 Session code: Theory to Practice: HADR in the Real World Ember Crooks Db2 LUW

  • IDUG Db2 Tech Conference Rotterdam, Netherlands | October 20-24, 2019

    Session code:

    Theory to Practice: HADR in the Real World

    Ember Crooks

    Db2 LUW

    Agenda

    • Understand the four SYNCMODES and when they are appropriate
    • See the required and suggested parameters related to HADR
    • Learn about the HADR tools and what they can do for you
    • Understand the role of TSAMP, MSCS, or another tool in automating HADR failover
    • Learn from real world mistakes and problems

    HADR Basics

    An HADR Solution

    [Diagram labels: HA and DR sites; Clients; VIP; Primary database server; HA Standby database server; DR Primary database server; DR Standby database server; HADR over a private or public network; "Time Delay?"]

    HADR Ecosystem

    • ACR
      • Provides retry functionality for some applications
    • TSAMP
      • Provides a heartbeat monitor
      • Detects failures
      • Fails database over
      • Can manage a Virtual IP address (VIP)
    • HADR
      • Keeps data synchronized between 2-4 servers
      • Nothing is shared between the servers

    Availability (Planned and Unplanned)

    Level of Availability  Percent of Uptime  Downtime per Year  Downtime per Month  Cost   Db2 Solutions
    One Nine               90%                36.5 days          73 hours            $      Standard Db2
    Two Nines              99%                3.65 days          7.3 hours           $$     HADR
    Three Nines            99.9%              8.76 hours         43.8 minutes        $$$    HADR
    Four Nines             99.99%             52.6 minutes       4.38 minutes        $$$$   pureScale + HADR
    Five Nines             99.999%            5.25 minutes       26.25 seconds       $$$$$  pureScale + Replication
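
    The downtime figures above follow directly from the uptime percentage. A minimal Python sketch of that arithmetic, assuming a 365-day year and a month taken as one twelfth of a year (the function name is illustrative, not part of any Db2 tooling):

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours(uptime_percent: float) -> tuple:
    """Return (downtime hours per year, downtime hours per month)."""
    per_year = HOURS_PER_YEAR * (1 - uptime_percent / 100)
    return per_year, per_year / 12


for uptime in (90, 99, 99.9, 99.99, 99.999):
    year, month = downtime_hours(uptime)
    print(f"{uptime}%: {year:.2f} hours/year, {month * 60:.2f} minutes/month")
```

    Small differences from the slide's last row come from rounding the yearly value before dividing by twelve.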

    Reads On Standby (ROS)

    • Allows read-only access to the database on the standby
    • Often not suitable for full reporting
    • Many restrictions

    Reads on Standby (ROS) - Restrictions

    • Standby only has the same indexes as the primary
    • Temporary tables cannot be created
    • Query execution on the standby is not possible during any of the following:
      • All DDL operations (add index, alter table, etc.)
      • Runstats/Reorg
      • Table move
    • Explain is not available on the standby
    • Non-inlined LOBs (the default) cannot be queried

    Modpack 4 of 11.1 (11.1.4.4) or 11.5 significantly decreases the impact of some of these restrictions: they are limited to the affected table only instead of the entire database.

    HADR Licensing

    • Included in every Db2 edition
    • Primary server licensed normally
    • Standby
      • If ROS not used: 100 PVU
      • If ROS used: Fully licensed
    • Licensing may change with new versions, but usually seems to do so for the better

    Please verify licensing with IBM before relying on this information!

    HADR Synchronization Modes

    Non-HADR Commit Processing

    [Diagram labels: DB Server; Commit; Log Buffer; steps numbered 1 to 3]

    HADR MODES - SYNC

    [Diagram labels: Primary DB Server; Standby DB Server; Network; Commit; Log Buffer (on each server); HADR Send Buffer; HADR Receive Buffer; steps numbered 1 to 8]

    Work is done on standby and primary in parallel.

    HADR MODES - NEARSYNC

    [Diagram labels: Primary DB Server; Standby DB Server; Network; Commit; Log Buffer (on each server); HADR Send Buffer; HADR Receive Buffer; steps numbered 1 to 6]

    Work is done on standby and primary in parallel.

    HADR MODES - ASYNC

    [Diagram labels: Primary DB Server; Standby DB Server; Network; Commit; Log Buffer (on each server); HADR Send Buffer; HADR Receive Buffer; steps numbered 1 to 4]

    HADR MODES - SUPERASYNC

    [Diagram labels: Primary DB Server; Standby DB Server; Network; Commit; Log Buffer (on each server); HADR Send Buffer; HADR Receive Buffer; steps numbered 1 to 3]

    No acknowledgement whatsoever.

    HADR MODES

    [Slide axis labels: High Availability; Disaster Recovery]

    SYNC
    • Very low chance of data loss
    • High chance of performance impact
    • Log writes are considered successful only when they have been written to the log files on disk for both database servers

    NEARSYNC
    • Low chance of data loss
    • Some chance of performance impact
    • Log writes are considered successful only when they have been written to disk on the primary and memory on the standby

    ASYNC
    • Some chance of data loss
    • Low chance of performance impact
    • Log writes are considered successful only when they have been written to disk on the primary and have been delivered to the TCP layer on the primary.

    SUPERASYNC
    • Higher risk of data loss
    • No chance of performance impact
    • Log writes are considered successful as soon as they are written to disk on the primary. There is no verification with the standby.
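
    The four success conditions above can be put side by side in a small model. This is only a sketch for comparing the modes: the event names are made-up labels for the stages the slide describes, not Db2 identifiers, and nothing here talks to a database.

```python
# Stages a log write must have reached before the primary treats it as
# successful, per the mode descriptions on the slide.
# The event names are illustrative labels, not Db2 terms.
REQUIRED_EVENTS = {
    "SYNC": {"primary_disk", "standby_disk"},
    "NEARSYNC": {"primary_disk", "standby_memory"},
    "ASYNC": {"primary_disk", "primary_tcp_send"},
    "SUPERASYNC": {"primary_disk"},
}


def log_write_successful(mode: str, completed: set) -> bool:
    """True once every stage required by `mode` has completed."""
    return REQUIRED_EVENTS[mode] <= completed


# Log data is on the primary's disk, handed to TCP, and in standby memory,
# but the standby has not yet written it to disk.
done = {"primary_disk", "primary_tcp_send", "standby_memory"}
for mode in REQUIRED_EVENTS:
    print(mode, log_write_successful(mode, done))
```

    In that state only SYNC is still waiting, which matches the trade-off between data-loss risk and performance impact listed for each mode.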

    HADR Setup

    HADR Database Servers

    • Identical servers when possible
    • Operating systems may be at different levels for short periods
    • File systems must be identical in most aspects
      • Instance path must be the same
      • If possible, make the database path the same
      • Table space paths should be the same, or relative to the database path
      • Log paths can be different, if they must be
    • Version of Db2 must be identical
      • Fix pack on standby may be later than the primary for short periods

    HADR DB CFG Parameters

    $ db2 get db cfg for dbname | grep HADR
    HADR database role = PRIMARY
    HADR local host name (HADR_LOCAL_HOST) = server1
    HADR local service name (HADR_LOCAL_SVC) = db2_hadrp
    HADR remote host name (HADR_REMOTE_HOST) = server2
    HADR remote service name (HADR_REMOTE_SVC) = db2_hadrs
    HADR instance name of remote server (HADR_REMOTE_INST) = db2inst1
    HADR timeout value (HADR_TIMEOUT) = 120
    HADR target list (HADR_TARGET_LIST) = server2:db2_hadrp|server3:db2_hadra
    HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
    HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = AUTOMATIC
    HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
    HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 300

    Also:

    Block non logged operations (BLOCKNONLOGGED) = YES
    Index re-creation time and redo index build (INDEXREC) = RESTART
    Log pages during index build (LOGINDEXBUILD) = ON
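
    One way to check these parameters on many databases is to parse the `db2 get db cfg` output. The sketch below is a hypothetical helper, not an IBM tool: it assumes the output keeps the `(PARAMETER) = value` shape shown on this slide, and the sample text is copied from the slide rather than captured from a live system.

```python
import re


def parse_db_cfg(text: str) -> dict:
    """Map each parenthesised parameter name to its value.

    Lines without a "(NAME) = value" part, such as the HADR database
    role line, are skipped.
    """
    params = {}
    for line in text.splitlines():
        match = re.search(r"\((\w+)\)\s*=\s*(.*)$", line)
        if match:
            params[match.group(1)] = match.group(2).strip()
    return params


SAMPLE = """\
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 300
Block non logged operations (BLOCKNONLOGGED) = YES
Log pages during index build (LOGINDEXBUILD) = ON
"""

cfg = parse_db_cfg(SAMPLE)
# Values the slide shows for the two non-HADR parameters.
expected = {"BLOCKNONLOGGED": "YES", "LOGINDEXBUILD": "ON"}
mismatches = {k: cfg.get(k) for k, v in expected.items() if cfg.get(k) != v}
print(mismatches or "settings match the slide")
```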

    HADR Time Parameters

    • HADR_TIMEOUT
      • Seconds before Db2 considers the two database servers disconnected
      • Should be long enough to allow for short network glitches
      • Default is 120, and that works in most environments
    • HADR_PEER_WINDOW
      • Seconds after HADR becomes disconnected during which Db2 still waits for an appropriate response from the standby before committing transactions
      • Should be long enough to allow failure to be detected and failover to occur
      • Only applies to SYNC or NEARSYNC
      • 300 is a good starting point; lower works in some environments
    • Max failover time: HADR_TIMEOUT plus HADR_PEER_WINDOW
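
    As a worked example of the last bullet, using the values quoted on this slide (the function name is illustrative):

```python
def max_failover_seconds(hadr_timeout: int = 120, hadr_peer_window: int = 300) -> int:
    """Upper bound on failover time: HADR_TIMEOUT plus HADR_PEER_WINDOW."""
    return hadr_timeout + hadr_peer_window


print(max_failover_seconds())         # 420 seconds with the slide's values
print(max_failover_seconds(120, 120))  # 240 seconds with a shorter peer window
```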

    HADR in Non-Production Environments

    • Load Test environments should always include HADR as close to production configuration as possible

    • At least one non-production environment should use HADR for testing of upgrades and any other changes, particularly when DBAs are learning HADR.

    • Not every non-production environment needs to include HADR.

    HADR Problems

    Set-Up
    • Typo
    • Network
    • Host name

    Ongoing
    • Always check HADR status 5 minutes after starting
    • Monitor HADR, as it can go down silently
    • Unsupported loads
    • Non-logged actions

    HADR Complications

    • LOAD operations
    • Replication (CDC or Q-Rep)
    • Values stored in the database specific to the database server, that change for a different database server
    • Changes to server names and server IP addresses
    • Situations that affect availability or connectivity to both the active primary and the principal standby at the same time
    • Problematic networks

    HADR TOOLS

    HADR Tools

    • Used together, these three tools can help you determine optimal network settings and optimal HADR mode for a specific environment

    HADR Simulator

    HADR Log Scanner

    HADR Calculator

    HADR Simulator

    • Simulates HADR on one or two servers
    • Does not require Db2 to be installed
    • Measures disk speed
    • Measures network speed
    • Performs name resolution the same way HADR would
    • Binary executable
    • Ideally run on the same server Db2 runs on or that you plan to run Db2 on

    Binaries: simhadr_aix, simhadr_aix53, simhadr_hpia, simhadr_linux, simhadr_linux32, simhadr_linux_z, simhadr_linuxppc, simhadr_sun, simhadr_sunx86, simhadr_win.exe, simhadr_win64.exe

    Db2 Log Scanner

    • Measures and describes the transactional log volume in your log files
    • Only works for uncompressed log files
    • Only works for unencrypted databases
    • Only really good for providing input to the HADR Calculator
    • Can be run on any computer you copy transaction log files to
    • Binary executable

    Binaries: db2logscan_aix, db2logscan_linux, db2logscan_win64.exe

    HADR Calculator

    • Takes input from the HADR Simulator and the Db2 Log Scanner to determine if HADR would negatively impact performance
    • Perl script
    • No need to run it on database servers

    Script: hadrCalculator.pl

    HADR Calculator Output

    hadrCalculator.pl: Network speed 207 MB/s, 0.1 second round trip time
    hadrCalculator.pl: Disk speed 3.6 MB/s, 0.003651 second overhead per write

    This machine is small endian.
    Environment variable TZ is not set.
    Timezone is PST/PDT, 8.0 hours west of GMT, has_daylight_rule=1
    Current local time is 2014-12-07 13:29:14

    Reading file list from "logfiles.list"

    File "/db2fs/db2instd/NODE0000/SQL00001/SQLOGDIR/S0000000.LOG"

    2014-12-07 02:09:58 0.074 MB/s, 61 sec, 1.7 pg/f, 0.092564 sec/f, 163.8 pg/tr, 8.714286 sec/tr, 20.333333 sec/cmt, nOpenTrans 3.7
    actual 0.074 MB/s@ 1 pg/f 0.092564 s/f
    SYNC ? 0.080 MB/s@ 2 pg/f 0.112315 s/f, min 0.060 MB/s@ 1 pg/f 0.111023 s/f, max 1.651 MB/s@ 606 pg/f 1.433973 s/f
    NEARSYNC ? 0.089 MB/s@ 2 pg/f 0.100043 s/f, min 0.066 MB/s@ 1 pg/f 0.100032 s/f, max 3.580 MB/s@ 606 pg/f 0.661268 s/f
    ASYNC 1.208 MB/s@ 1 pg/f 0.005496 s/f, min 1.208 MB/s@ 1 pg/f 0.005496 s/f, max 3.580 MB/s@ 606 pg/f 0.661268 s/f

    2014-12-07 02:11:35 0.005 MB/s, 97 sec, 1.5 pg/f, 1.227848 sec/f, 57.5 pg/tr, 21.000000 sec/tr, 48.500000 sec/cmt, nOpenTrans 5.5
    actual 0.005 MB/s@ 1 pg/f 1.227848 s/f
    SYNC 0.053 MB/s@ 1 pg/f 0.110586 s/f, min 0.053 MB/s@ 1 pg/f 0.110586 s/f, max 1.545 MB/s@ 316 pg/f 0.799576 s/f
    NEARSYNC 0.059 MB/s@ 1 pg/f 0.100028 s/f, min 0.059 MB/s@ 1 pg/f 0.100028 s/f, max 3.562 MB/s@ 316 pg/f 0.346804 s/f
    ASYNC 1.110 MB/s@ 1 pg/f 0.005279 s/f, min 1.110 MB/s@ 1 pg/f 0.005279 s/f, max 3.562 MB/s@ 316 pg/f 0.346804 s/f

    2014-12-07 02:13:15 0.006 MB/s, 100 sec, 1.5 pg/f, 0.990099 sec/f, 150.1 pg/tr, 155.000000 sec/tr, 100.000000 sec/cmt, nOpenTrans 5.5
    actual 0.006 MB/s@ 1 pg/f 0.990099 s/f
    SYNC 0.053 MB/s@ 1 pg/f 0.110586 s/f, min 0.053 MB/s@ 1 pg/f 0.110586 s/f, max 1.684 MB/s@ 825 pg/f 1.914439 s/f
    NEARSYNC 0.059 MB/s@ 1 pg/f 0.100028 s/f, min 0.059 MB/s@ 1 pg/f 0.100028 s/f, max 3.585 MB/s@ 825 pg/f 0.899430 s/f
    ASYNC 1.110 MB/s@ 1 pg/f 0.005279 s/f, min 1.110 MB/s@ 1 pg/f 0.005279 s/f, max 3.585 MB/s@ 825 pg/f 0.899430 s/f

    Automating Takeover

    Methods to Automate Takeover

    • TSAMP
      • Included with most Db2 licenses
    • Other options (require custom scripting and setup)
      • HACMP/Power-HA
      • Linux-HA
      • RHCS
      • MSCS

    Automation Concepts

    • Heartbeat
      • Detect if Db2 is down and try to restart it locally
      • Detect if Db2 is down on the other server and take over HADR if so
    • Takeover
      • Requires HADR_PEER_WINDOW to be properly set
      • Must be detected and acted on
      • Should use the PEER WINDOW ONLY syntax on any forced takeover
      • Only automated between two HA servers, not for DR
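
    To make the PEER WINDOW ONLY point concrete, here is a small sketch that builds the takeover command text a script might issue on the standby. The helper name and the SAMPLE database name are illustrative; the command strings follow the Db2 TAKEOVER HADR syntax, and the sketch only builds text without executing anything.

```python
def takeover_command(db_name: str, forced: bool = False) -> str:
    """Build a TAKEOVER HADR command line for the standby.

    A forced takeover always carries PEER WINDOW ONLY, as the slide
    recommends, so it is refused once the peer window has expired.
    """
    command = f"db2 takeover hadr on db {db_name}"
    if forced:
        command += " by force peer window only"
    return command


print(takeover_command("SAMPLE"))               # graceful role switch
print(takeover_command("SAMPLE", forced=True))  # failover after a primary failure
```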

    TSAMP: Overview of Setup Process

    1. Make design decisions
    2. Set up HADR
    3. TSAMP preparation
    4. Set up TSAMP with db2haicu
    5. Test failover scenarios

    TSAMP: Decisions

    • Will you have a Virtual IP for the database connection?
      • IP Address
      • Subnet mask
    • What will you use for a network Quorum device?
      • Pingable IP address used as a tiebreaker
      • Server that is very highly available such as a domain controller.
    • Is there a private network between the two database servers that should be used by HADR and as a part of automated failover?
      • Which network cards are associated with which networks, by name

    TSA Inputs

    Required:

    • Public IP address of both database servers
    • Fully qualified host names of both database servers
    • IP address of the quorum device
    • Names of network cards associated with the public network on the servers IF there is more than one network card on each server (for example, ‘eth0’)

    Optional:

    • Virtual IP address if using one, along with the subnet mask
    • Private IP addresses, if private network being used in addition to the public network
    • Name of network cards for the private network, if one is being used (for example, ‘eth0’)
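
    Collecting these inputs before running db2haicu is easier with a checklist. The sketch below is a hypothetical pre-flight check, not part of TSAMP or db2haicu: the class, field names, and example addresses are all illustrative, and it only validates the shape of the values.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TsaInputs:
    """Inputs from the slide; all field names are illustrative."""
    public_ips: List[str]
    hostnames: List[str]
    quorum_ip: str
    virtual_ip: Optional[str] = None
    subnet_mask: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of missing or malformed inputs (empty if none)."""
        issues = []
        if len(self.public_ips) != 2:
            issues.append("need the public IP address of both database servers")
        if len(self.hostnames) != 2 or any("." not in h for h in self.hostnames):
            issues.append("need fully qualified host names of both database servers")
        for ip in [*self.public_ips, self.quorum_ip]:
            try:
                ipaddress.ip_address(ip)
            except ValueError:
                issues.append(f"not a valid IP address: {ip}")
        if self.virtual_ip and not self.subnet_mask:
            issues.append("a virtual IP needs its subnet mask")
        return issues


inputs = TsaInputs(
    public_ips=["192.0.2.10", "192.0.2.11"],  # documentation-range addresses
    hostnames=["db1.example.com", "db2.example.com"],
    quorum_ip="192.0.2.1",
)
print(inputs.problems())
```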

    Reference: Detailed Blog Series on How to Set Up TSAMP

    • Preparation:
      • https://datageek.blog/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/
    • Normal Setup:
      • https://datageek.blog/2012/04/17/using-tsadb2haicu-to-automate-failover-part-2-how-it-looks-if-it-goes-smoothly/
    • Additional Configuration Steps:
      • https://datageek.blog/2018/08/16/using-tsa-db2haicu-to-automate-failover-part-5-additional-configuration-best-practices/
    • Problem Solving Techniques:
      • https://datageek.blog/2012/09/04/using-tsadb2haicu-to-automate-failover-part-3-testing-ways-setup-can-go-wrong-and-what-to-do/
    • Problems After Setup:
      • https://datageek.blog/2013/01/30/using-tsadb2haicu-to-automate-failover-part-4-dealing-with-problems-after-setup/

    Real-World Problems and Mistakes

    Real-World Problems

    1. Were you TRYING to cause split brain?
    2. BLOCKNONLOGGED and invalid tables on the standby
    3. Improper configuration for multi-standby cross-datacenter configurations
    4. Constant congestion

    Split Brain

    Two disconnected databases, each taking client traffic

    [Diagram labels: Clients; Primary database server; Standby database server; Private or public network]

    Real World Problem #1 – Were you TRYING to cause split brain?

    • Environment:
      • Four-server HADR
      • Two HA database servers in one data center
      • Two DR database servers in another data center
      • Db2 10.5
      • Environment established for at least 6 months
    • Situation:
      • OS patching requires reboot of both servers in the DR data center
      • Under-experienced DBAs performing the work
      • Miscommunication of what is required or complete lack of understanding of what is required

    Real World Problem #1 – Diagram of environment

    [Diagram labels: HA Data Center with SERVER #1 (Primary db, marked PRIMARY) and SERVER #2 (HA Principal Standby db); DR Data Center with SERVER #3 and SERVER #4 (Auxiliary Standby dbs); Clients; VIP; HADR over a shared network]

    Real World Problem #1 – What Should Have Happened

    [Diagram labels: HA Data Center with SERVER #1 (Primary db, marked PRIMARY) and SERVER #2 (HA Principal Standby db); DR Data Center with SERVER #3 and SERVER #4 (Auxiliary Standby dbs); Clients; VIP; HADR over a shared network]

    Real World Problem #1 – What Actually Happened

    [Diagram labels: HA Data Center with SERVER #1 (Primary db) and SERVER #2 (HA Principal Standby db); DR Data Center with SERVER #3 and SERVER #4 (Auxiliary Standby dbs); Clients; VIP; HADR over a shared network; two PRIMARY markers and "??"]

    Outage #1 – 45 Minutes, Middle of Night
    Outage #2 – 2 hours, Business Day
    Outage #3 (DR only) – DR Down for Multiple Days due to Fear

    Real World Problem #1 – Text Description of What Should Have Happened

    • Deactivate databases on #3 and #4
    • Stop Db2 on #3 and #4
    • Patch the OS and reboot #3 and #4
    • Ensure Db2 started on #3 and #4
    • Activate Db2 (to start HADR) on #3 and #4
    • Verify HADR catches up
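
    The procedure above can be written down as an ordered runbook, which is one way to "define procedures very well". The sketch below only builds the command text for one auxiliary standby; the helper name and SAMPLE database name are illustrative, and mapping "Ensure Db2 started" to `db2start` and "Verify HADR catches up" to `db2pd -hadr` is an assumption about how a site might implement those steps.

```python
def standby_patch_runbook(db_name: str) -> list:
    """Ordered steps for patching an auxiliary standby server.

    Note that no TAKEOVER command appears anywhere in the list.
    """
    return [
        f"db2 deactivate db {db_name}",   # deactivate the standby database
        "db2stop",                        # stop the Db2 instance
        "# patch the OS and reboot",      # manual step outside Db2
        "db2start",                       # ensure Db2 is started
        f"db2 activate db {db_name}",     # activation starts HADR on the standby
        f"db2pd -db {db_name} -hadr",     # verify that HADR catches up
    ]


for step in standby_patch_runbook("SAMPLE"):
    print(step)
```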

    Real World Problem #1 – Text Description of What Actually Happened

    • TAKEOVER command issued from server #3 in the DR data center. #3 becomes primary
    • Db2 was stopped on server #3 and #4 in DR data center
    • Db2 was started on server #1 in the primary data center by force
    • DR servers were patched and rebooted
    • Databases were activated in the DR data center
    • #4 reintegrated into the cluster just fine
    • #3 knew it was primary when it went down, so HADR did not start
    • On server #3, the command was issued:
      • db2 start hadr on db dbname as primary
    • Server #1 detected that another server was trying to be primary, that at one time was primary, and Db2 immediately forced all connections and stopped work to avoid split brain. A fun message that included the term “poison pill” was written to the diagnostic log
    • All DR servers were powered down, the database on #1 was force-started.
    • Fear kept them from being restarted for several days
    • #3 had to be restored to bring it back in the cluster (safe, as no transactions occurred when it was primary)

    Real World Problem #1 – The Moral of the Story

    • Define procedures very well for less experienced DBAs
    • When a mistake occurs, stop for a minute to determine what happened and the best way to undo it

    Real World Problem #2 – Non-Logged LOAD

    • Environment:
      • Two-server HADR on a PureApp appliance
      • Db2 10.5
      • BLOCKNONLOGGED set to YES
      • NFS-mounted filesystem called /db2copy available on both primary and standby
      • Scripts direct COPY YES location to /db2copy
      • Reg Var DB2_LOAD_COPY_NO_OVERRIDE set to ‘COPY YES to /db2copy’

    Real World Problem #2 – Diagram of Environment

    [Diagram labels: Clients; VIP; SERVER #1 (Primary db, marked PRIMARY); SERVER #2 (HA Principal Standby db); HADR over a shared network; /db2copy]

    Real World Problem #2 – The Symptoms

    • Diagnostic log growing unreasonably on standby with these errors:

    2016-02-01-00.58.49.673747-480 I5930E469 LEVEL: Warning
    PID : 30951 TID : 139869221283584 PROC : db2sysc
    INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
    APPHDL : 0-55736 APPID: *LOCAL.DB2.151120023013
    HOSTNAME: server1
    EDUID : 56 EDUNAME: db2redom (SAMPLE)
    FUNCTION: DB2 UDB, database utilities, DIAG_NOTE, probe:0
    DATA #1 : String, 35 bytes
    Access not allowed 14, -2147352523

    Real World Problem #2 – Resolution

    • What was happening: User was issuing a load with ‘COPY YES to /tmp’
    • Solution(s)
      • Restore tablespaces in recovery pending on standby
      • Educate users and developers again on appropriate locations for the copy file
      • Implement monitor to parse the Db2 diagnostic log for these errors
    • RFE: https://ibmanalytics.ideas.aha.io/ideas/DB24LUW-I-384
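
    A monitor for these errors could be as simple as the sketch below. It is not the author's monitor: it assumes each diagnostic log entry starts with a timestamp at the beginning of a line, and that the entries of interest mention both `db2redom` and "Access not allowed", as in the symptom shown earlier. The first sample entry is abridged from that slide; the second is made up for contrast.

```python
import re

# Each db2diag entry is assumed to start with a timestamp such as
# 2016-02-01-00.58.49.673747 at the beginning of a line.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d+", re.MULTILINE)


def count_redo_access_errors(diag_text: str) -> int:
    """Count entries from the redo master EDU that report 'Access not allowed'."""
    starts = [m.start() for m in ENTRY_START.finditer(diag_text)]
    ends = starts[1:] + [len(diag_text)]
    entries = [diag_text[s:e] for s, e in zip(starts, ends)]
    return sum(
        1 for entry in entries
        if "db2redom" in entry and "Access not allowed" in entry
    )


SAMPLE = """\
2016-02-01-00.58.49.673747-480 I5930E469 LEVEL: Warning
EDUID : 56 EDUNAME: db2redom (SAMPLE)
FUNCTION: DB2 UDB, database utilities, DIAG_NOTE, probe:0
DATA #1 : String, 35 bytes
Access not allowed 14, -2147352523

2016-02-01-00.59.10.000000-480 I6400E300 LEVEL: Info
EDUID : 12 EDUNAME: db2agent (SAMPLE)
DATA #1 : String, 18 bytes
Illustrative entry
"""

print(count_redo_access_errors(SAMPLE))  # 1
```

    A scheduled job could run this against new log content on the standby and alert when the count is non-zero.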

    Real World Problem #2 – Proactive Identification

    • To be run on the standby only
    • Only possible if ROS (Reads On Standby) is enabled

    select TABSCHEMA
         , TABNAME
         , TABTYPE
         , AVAILABLE
      from TABLE(ADMIN_GET_TAB_INFO(null, null))
     where AVAILABLE='N'
      with ur

    Real World Problem #2 – The Moral of the Story

    • Educate developers and users
    • Monitor thoroughly to verify
    • Don’t ignore messages in the diagnostic log that you don’t understand

    Real World Problem #3 – Cross-Data Center Multi-Standby

    • Environment:
      • Four-server HADR
      • Two HA database servers in one data center
      • Two DR database servers in another data center
      • Db2 10.5
    • Situation:
      • Load test targeting database failover to DR
      • New implementation, nothing production yet
      • Under-experienced DBAs performed the configuration

    Real World Problem #3 – Diagram of environment

    [Diagram labels: HA Data Center with SERVER #1 (Primary db) and SERVER #2 (HA Principal Standby db); DR Data Center with SERVER #3 and SERVER #4 (Auxiliary Standby dbs); Clients; VIP; HADR over a shared network; PRIMARY and Principal Standby markers]

    Real World Problem #3 – What Should Have Happened

    [Diagram labels: HA Data Center with SERVER #1 and SERVER #2; DR Data Center with SERVER #3 and SERVER #4; Clients; VIP; HADR over a shared network; PRIMARY and Principal Standby markers]

    Real World Problem #3 – What Actually Happened

    [Diagram labels: HA Data Center with SERVER #1 and SERVER #2; DR Data Center with SERVER #3 and SERVER #4; Clients; VIP; HADR over a shared network; PRIMARY and Principal Standby markers]

    Real World Problem #3 – Text Description of What Should Have Happened

    • Planned load test against database running in the DR data center
      1. #3 takes over the primary role
      2. #4 becomes principal standby (NEARSYNC)
      3. Load test run from application servers in the DR data center

    Real World Problem #3 – Text Description of What Actually Happened

    • Planned load test against database running in the DR data center
      1. #3 takes over the primary role
      2. #2 stays principal standby (NEARSYNC)
      3. Load test run from application servers in the DR data center
      4. NEARSYNC slows down database processing due to the slower link between servers in different data centers
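
    A configuration check could catch this before a load test. The sketch below relies on one fact from the Db2 documentation that the slides do not spell out: the first entry in a database's HADR_TARGET_LIST is its principal standby when that database is primary. The server names, service names, and target lists are hypothetical and do not come from the environment described here.

```python
# Hypothetical placement of the four servers from the diagram.
DATA_CENTER = {"server1": "HA", "server2": "HA", "server3": "DR", "server4": "DR"}


def principal_standby(target_list: str) -> str:
    """Host of the first HADR_TARGET_LIST entry (host:service|host:service...)."""
    return target_list.split("|")[0].split(":")[0]


def principal_is_remote(primary: str, target_list: str) -> bool:
    """True if the principal standby sits in a different data center."""
    return DATA_CENTER[primary] != DATA_CENTER[principal_standby(target_list)]


# Two possible target lists for server3, to be used once it becomes primary.
misconfigured = "server2:db2_hadr|server4:db2_hadr|server1:db2_hadr"
intended = "server4:db2_hadr|server1:db2_hadr|server2:db2_hadr"

print(principal_is_remote("server3", misconfigured))  # True
print(principal_is_remote("server3", intended))       # False
```

    Running such a check against every server's own target list, not only the current primary's, covers each takeover direction.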

    Real World Problem #3 – The Moral of the Story

    • Always test takeovers
    • Load test takeovers
    • Verify configuration

    Real World Problem #4 – Congestion

    • Environment:
      • Three-server HADR
      • Two HA database servers in one data center
      • One DR database server in another data center
      • Db2 11.1
      • All SYNCMODES are SUPERASYNC
    • Situation:
      • Frequent monitoring of HADR
      • A lot of congestion

    Real World Problem #4 – Diagram of environment

    [Diagram labels: HA Data Center with the Primary db (marked PRIMARY) and STANDBY #1 (HA Principal Standby db); DR Data Center with STANDBY #2 (Auxiliary Standby db); Clients; VIP; HADR over a shared network; SUPERASYNC for both standbys]

    Real World Problem #4 – Congestion and Log Gap on Standby #1

    [Chart]

    Real World Problem #4 – Congestion and Log Gap on Standby #2

    [Chart]

    Real World Problem #4 – Congestion and Log Gap on Standby #1 Before and After Meeting with SA/Network

    [Two charts]

    Real World Problem #4 – Text Description of What Happened

    • Health Check of HADR to try to determine issues
    • Started to investigate
    • Had a meeting with SA and Network people on September 13
    • Less than a week later, the problem disappears
    • No one admits to changing anything
    • 6 months later, the problem still has not returned

    Real World Problem #4 – The Moral of the Story

    • Don’t be afraid to engage experts in other areas

    Questions?

    • Contact me:
      • Twitter: @ember_crooks
      • [email protected]
      • https://datageek.blog
      • LinkedIn: https://www.linkedin.com/in/ember-crooks-25aa9b8/