-
IDUG Db2 Tech Conference, Rotterdam, Netherlands | October 20-24, 2019
Session code:
Theory to Practice: HADR in the Real World
Ember Crooks
Db2 LUW
-
Agenda
• Understand the four SYNCMODES and when they are appropriate
• See the required and suggested parameters related to HADR
• Learn about the HADR tools and what they can do for you
• Understand the role of TSAMP, MSCS, or another tool in automating HADR failover
• Learn from real-world mistakes and problems
-
HADR Basics
-
An HADR Solution
[Diagram with an HA side and a DR side. Labels: Clients, VIP, Primary database server, HA Standby database server, DR Primary database server, DR Standby database server, "Time Delay?". The HADR links run over a private or public network.]
-
HADR Ecosystem
ACR
• Provides retry functionality for some applications
TSAMP
• Provides a heartbeat monitor
• Detects failures
• Fails database over
• Can manage a Virtual IP address (VIP)
HADR
• Keeps data synchronized between 2-4 servers
• Nothing is shared between the servers
-
Availability (Planned and Unplanned)
Level of Availability | Percent of Uptime | Downtime per Year | Downtime per Month | Cost  | Db2 Solutions
One Nine              | 90%               | 36.5 days         | 73 hours           | $     | Standard Db2
Two Nines             | 99%               | 3.65 days         | 7.3 hours          | $$    | HADR
Three Nines           | 99.9%             | 8.76 hours        | 43.8 minutes       | $$$   | HADR
Four Nines            | 99.99%            | 52.6 minutes      | 4.38 minutes       | $$$$  | pureScale + HADR
Five Nines            | 99.999%           | 5.25 minutes      | 26.25 seconds      | $$$$$ | pureScale + Replication
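The downtime figures in the availability table follow directly from the uptime percentage. A minimal sketch of that arithmetic, assuming a 365-day year and a month counted as one twelfth of a year (the function name is illustrative, not from the slides):

```python
# Downtime implied by an availability percentage.
# Assumptions: 365-day year, one month = 1/12 of a year.
HOURS_PER_YEAR = 365 * 24


def downtime_hours(percent_uptime: float) -> tuple[float, float]:
    """Return (hours of downtime per year, hours of downtime per month)."""
    per_year = HOURS_PER_YEAR * (100.0 - percent_uptime) / 100.0
    return per_year, per_year / 12


for label, pct in [("Two Nines", 99.0), ("Three Nines", 99.9), ("Four Nines", 99.99)]:
    year, month = downtime_hours(pct)
    print(f"{label}: {year:.2f} h/year, {month * 60:.1f} min/month")
```

The same function can be used to check whether a planned maintenance window fits inside the downtime budget for a target level.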
-
Reads On Standby (ROS)
• Allows read-only access to the database on the standby
• Often not suitable for full reporting
• Many restrictions
-
Reads on Standby (ROS) - Restrictions
• Standby only has the same indexes as the primary
• Temporary tables cannot be created
• Query execution on the standby is not possible during any of the following:
  • All DDL operations (add index, alter table, etc.)
  • Runstats/Reorg
  • Table move
• Explain is not available on the standby
• Non-inlined LOBs (the default) cannot be queried
Mod Pack 4 of 11.1 (11.1.4.4) or 11.5 significantly decreases the impact of some of these restrictions: they are limited to the affected table only instead of the entire database.
-
HADR Licensing
• Included in every Db2 edition
• Primary server licensed normally
• Standby
  • If ROS not used: 100 PVU
  • If ROS used: Fully licensed
• Licensing may change with new versions, but usually seems to do so for the better
Please verify licensing with IBM before relying on this information!
-
HADR Synchronization Modes
-
Non-HADR Commit Processing
[Diagram: DB Server with Commit and Log Buffer, numbered steps 1-3.]
-
HADR MODES - SYNC
[Diagram: Primary DB Server (Commit, Log Buffer, HADR Send Buffer) connected over the Network to the Standby DB Server (HADR Receive Buffer, Log Buffer), numbered steps 1-8.]
Work is done on standby and primary in parallel.
-
HADR MODES - NEARSYNC
[Diagram: Primary DB Server (Commit, Log Buffer, HADR Send Buffer) connected over the Network to the Standby DB Server (HADR Receive Buffer, Log Buffer), numbered steps 1-6.]
Work is done on standby and primary in parallel.
-
HADR MODES - ASYNC
[Diagram: Primary DB Server (Commit, Log Buffer, HADR Send Buffer) connected over the Network to the Standby DB Server (HADR Receive Buffer, Log Buffer), numbered steps 1-4.]
-
HADR MODES - SUPERASYNC
[Diagram: Primary DB Server (Commit, Log Buffer, HADR Send Buffer) connected over the Network to the Standby DB Server (HADR Receive Buffer, Log Buffer), numbered steps 1-3.]
No acknowledgement whatsoever.
-
HADR MODES
[Diagram labels: High Availability, Disaster Recovery]
SYNC
• Very low chance of data loss
• High chance of performance impact
• Log writes are considered successful only when they have been written to the log files on disk for both database servers
NEARSYNC
• Low chance of data loss
• Some chance of performance impact
• Log writes are considered successful only when they have been written to disk on the primary and memory on the standby
ASYNC
• Some chance of data loss
• Low chance of performance impact
• Log writes are considered successful only when they have been written to disk on the primary and have been delivered to the TCP layer on the primary
SUPERASYNC
• Higher risk of data loss
• No chance of performance impact
• Log writes are considered successful as soon as they are written to disk on the primary. There is no verification with the standby.
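The four modes differ only in what must happen, beyond the log write to disk on the primary, before a log write counts as successful. A minimal sketch that encodes the slide's wording as a lookup table; the names `SyncMode`, `EXTRA_CONDITION`, and `waits_on_standby` are illustrative and not part of Db2:

```python
from enum import Enum


class SyncMode(Enum):
    SYNC = "SYNC"
    NEARSYNC = "NEARSYNC"
    ASYNC = "ASYNC"
    SUPERASYNC = "SUPERASYNC"


# Extra condition, beyond the write to disk on the primary, before a log
# write is considered successful. Wording follows the slide.
EXTRA_CONDITION = {
    SyncMode.SYNC: "written to the log files on disk on the standby",
    SyncMode.NEARSYNC: "written to memory on the standby",
    SyncMode.ASYNC: "delivered to the TCP layer on the primary",
    SyncMode.SUPERASYNC: None,  # no verification with the standby
}


def waits_on_standby(mode: SyncMode) -> bool:
    """True when a commit has to wait for a response from the standby."""
    return mode in (SyncMode.SYNC, SyncMode.NEARSYNC)


for mode in SyncMode:
    print(mode.value, "->", EXTRA_CONDITION[mode], "| waits:", waits_on_standby(mode))
```

Only the modes for which `waits_on_standby` is true put the standby and the network in the commit path, which is why they carry the higher chance of performance impact.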
-
HADR Setup
-
HADR Database Servers
• Identical servers when possible
• Operating systems may be at different levels for short periods
• File systems must be identical in most aspects
  • Instance path must be the same
  • If possible, make the database path the same
  • Table space paths should be the same, or relative to the database path
  • Log paths can be different, if they must be
• Version of Db2 must be identical
  • Fix pack on standby may be later than the primary for short periods
-
HADR DB CFG Parameters
$ db2 get db cfg for dbname | grep HADR
HADR database role                                      = PRIMARY
HADR local host name                  (HADR_LOCAL_HOST) = server1
HADR local service name                (HADR_LOCAL_SVC) = db2_hadrp
HADR remote host name                (HADR_REMOTE_HOST) = server2
HADR remote service name              (HADR_REMOTE_SVC) = db2_hadrs
HADR instance name of remote server  (HADR_REMOTE_INST) = db2inst1
HADR timeout value                       (HADR_TIMEOUT) = 120
HADR target list                     (HADR_TARGET_LIST) = server2:db2_hadrp|server3:db2_hadra
HADR log write synchronization mode     (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB)      (HADR_SPOOL_LIMIT) = AUTOMATIC
HADR log replay delay (seconds)     (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds)  (HADR_PEER_WINDOW) = 300
Also:
Block non logged operations            (BLOCKNONLOGGED) = YES
Index re-creation time and redo index build  (INDEXREC) = RESTART
Log pages during index build            (LOGINDEXBUILD) = ON
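Because typos in these parameters are a common setup problem, it can help to compare the configuration of each server by script. A minimal sketch that parses text in the shape of the `db2 get db cfg` output above; the sample lines are copied from the slide, and the function name and regular expression are illustrative assumptions:

```python
import re

# Sample lines in the shape of the `db2 get db cfg` output on the slide.
SAMPLE_CFG = """\
HADR local host name                  (HADR_LOCAL_HOST) = server1
HADR timeout value                       (HADR_TIMEOUT) = 120
HADR log write synchronization mode     (HADR_SYNCMODE) = NEARSYNC
HADR peer window duration (seconds)  (HADR_PEER_WINDOW) = 300
Block non logged operations            (BLOCKNONLOGGED) = YES
"""

# Matches "(PARAMETER_NAME) = value" up to the end of the line.
CFG_LINE = re.compile(r"\((?P<name>[A-Z_0-9]+)\)\s*=\s*(?P<value>.*)$")


def parse_db_cfg(text: str) -> dict[str, str]:
    """Map each parameter name found in the text to its value."""
    params = {}
    for line in text.splitlines():
        match = CFG_LINE.search(line)
        if match:
            params[match.group("name")] = match.group("value").strip()
    return params


cfg = parse_db_cfg(SAMPLE_CFG)
print(cfg["HADR_SYNCMODE"], cfg["HADR_TIMEOUT"], cfg["HADR_PEER_WINDOW"])
```

Running the parser against the output captured on the primary and on each standby makes it easy to diff values that are expected to match, such as HADR_SYNCMODE.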
-
HADR Time Parameters
• HADR_TIMEOUT
  • Seconds before Db2 considers the two database servers disconnected
  • Should be long enough to allow for short network glitches
  • Default is 120, and that works in most environments
• HADR_PEER_WINDOW
  • Seconds after HADR becomes disconnected where Db2 still waits for an appropriate response from the standby before committing transactions
  • Should be long enough to allow failure to be detected and failover to occur
  • Only applies to SYNC or NEARSYNC
  • 300 is a good starting point, lower works in some environments
• Max failover time: HADR_TIMEOUT plus HADR_PEER_WINDOW
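As a worked instance of the max failover time rule, using the default HADR_TIMEOUT of 120 and the suggested HADR_PEER_WINDOW starting point of 300 (the function name is illustrative):

```python
def max_failover_seconds(hadr_timeout: int, hadr_peer_window: int) -> int:
    """Upper bound on failover time, in seconds, per the rule on the slide."""
    return hadr_timeout + hadr_peer_window


worst_case = max_failover_seconds(hadr_timeout=120, hadr_peer_window=300)
print(f"Max failover time: {worst_case} s ({worst_case / 60:.0f} min)")
```

With these values the bound is 420 seconds, so an availability target that allows less than seven minutes per incident would call for lower settings.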
-
HADR in Non-Production Environments
• Load test environments should always include HADR, configured as close to production as possible
• At least one non-production environment should use HADR for testing upgrades and any other changes, particularly when DBAs are learning HADR
• Not every non-production environment needs to include HADR
-
HADR Problems
Set-Up
• Typo
• Network
• Host name
Ongoing
• Always check HADR status 5 minutes after starting
• Monitor HADR, as it can go down silently
• Unsupported loads
• Non-logged actions
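Since HADR can go down silently, a scheduled check is worth having. A minimal sketch of the alerting logic only: HADR_CONNECT_STATUS and HADR_LOG_GAP are field names reported by `db2pd -hadr` and the MON_GET_HADR table function, but how the values are collected, the function name, and the threshold are assumptions for illustration:

```python
def hadr_alerts(status: dict[str, str], max_log_gap_bytes: int) -> list[str]:
    """Return human-readable alerts for one standby's HADR status fields."""
    alerts = []
    connect = status.get("HADR_CONNECT_STATUS", "UNKNOWN")
    if connect != "CONNECTED":
        alerts.append(f"HADR_CONNECT_STATUS is {connect}")
    log_gap = int(status.get("HADR_LOG_GAP", "0"))
    if log_gap > max_log_gap_bytes:
        alerts.append(f"HADR_LOG_GAP is {log_gap} bytes")
    return alerts


# Hypothetical values, not taken from a real system.
example = {"HADR_CONNECT_STATUS": "CONGESTED", "HADR_LOG_GAP": "52428800"}
for alert in hadr_alerts(example, max_log_gap_bytes=10 * 1024 * 1024):
    print(alert)
```

The check deliberately ignores HADR_STATE, because the expected state depends on the SYNCMODE in use.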
-
HADR Complications
• LOAD operations
• Replication (CDC or Q-Rep)
• Values stored in the database specific to the database server, that change for a different database server
• Changes to server names and server IP addresses
• Situations that affect availability or connectivity to both the active primary and the principal standby at the same time
• Problematic networks
-
HADR TOOLS
-
HADR Tools
• Used together, these three tools can help you determine optimal network settings and optimal HADR mode for a specific environment
  • HADR Simulator
  • HADR Log Scanner
  • HADR Calculator
-
HADR Simulator
• Simulates HADR on one or two servers
• Does not require Db2 to be installed
• Measures disk speed
• Measures network speed
• Performs name resolution the same way HADR would
• Binary executable
• Ideally run on the same server Db2 runs on or that you plan to run Db2 on
Binaries: simhadr_aix, simhadr_aix53, simhadr_hpia, simhadr_linux, simhadr_linux32, simhadr_linux_z, simhadr_linuxppc, simhadr_sun, simhadr_sunx86, simhadr_win.exe, simhadr_win64.exe
-
Db2 Log Scanner
• Measures and describes the transactional log volume in your log files
• Only works for uncompressed log files
• Only works for unencrypted databases
• Only really good for providing input to the HADR Calculator
• Can be run on any computer you copy transaction log files to
• Binary executable
Binaries: db2logscan_aix, db2logscan_linux, db2logscan_win64.exe
-
HADR Calculator
• Takes input from the HADR Simulator and the Db2 Log Scanner to determine if HADR would negatively impact performance
• Perl script
• No need to run it on database servers
Script: hadrCalculator.pl
-
HADR Calculator Output
hadrCalculator.pl: Network speed 207 MB/s, 0.1 second round trip time
hadrCalculator.pl: Disk speed 3.6 MB/s, 0.003651 second overhead per write

This machine is small endian.
Environment variable TZ is not set.
Timezone is PST/PDT, 8.0 hours west of GMT, has_daylight_rule=1
Current local time is 2014-12-07 13:29:14

Reading file list from "logfiles.list"
File "/db2fs/db2instd/NODE0000/SQL00001/SQLOGDIR/S0000000.LOG"

2014-12-07 02:09:58 0.074 MB/s, 61 sec, 1.7 pg/f, 0.092564 sec/f, 163.8 pg/tr, 8.714286 sec/tr, 20.333333 sec/cmt, nOpenTrans 3.7
actual 0.074 MB/s@ 1 pg/f 0.092564 s/f
SYNC ? 0.080 MB/s@ 2 pg/f 0.112315 s/f, min 0.060 MB/s@ 1 pg/f 0.111023 s/f, max 1.651 MB/s@ 606 pg/f 1.433973 s/f
NEARSYNC ? 0.089 MB/s@ 2 pg/f 0.100043 s/f, min 0.066 MB/s@ 1 pg/f 0.100032 s/f, max 3.580 MB/s@ 606 pg/f 0.661268 s/f
ASYNC 1.208 MB/s@ 1 pg/f 0.005496 s/f, min 1.208 MB/s@ 1 pg/f 0.005496 s/f, max 3.580 MB/s@ 606 pg/f 0.661268 s/f

2014-12-07 02:11:35 0.005 MB/s, 97 sec, 1.5 pg/f, 1.227848 sec/f, 57.5 pg/tr, 21.000000 sec/tr, 48.500000 sec/cmt, nOpenTrans 5.5
actual 0.005 MB/s@ 1 pg/f 1.227848 s/f
SYNC 0.053 MB/s@ 1 pg/f 0.110586 s/f, min 0.053 MB/s@ 1 pg/f 0.110586 s/f, max 1.545 MB/s@ 316 pg/f 0.799576 s/f
NEARSYNC 0.059 MB/s@ 1 pg/f 0.100028 s/f, min 0.059 MB/s@ 1 pg/f 0.100028 s/f, max 3.562 MB/s@ 316 pg/f 0.346804 s/f
ASYNC 1.110 MB/s@ 1 pg/f 0.005279 s/f, min 1.110 MB/s@ 1 pg/f 0.005279 s/f, max 3.562 MB/s@ 316 pg/f 0.346804 s/f

2014-12-07 02:13:15 0.006 MB/s, 100 sec, 1.5 pg/f, 0.990099 sec/f, 150.1 pg/tr, 155.000000 sec/tr, 100.000000 sec/cmt, nOpenTrans 5.5
actual 0.006 MB/s@ 1 pg/f 0.990099 s/f
SYNC 0.053 MB/s@ 1 pg/f 0.110586 s/f, min 0.053 MB/s@ 1 pg/f 0.110586 s/f, max 1.684 MB/s@ 825 pg/f 1.914439 s/f
NEARSYNC 0.059 MB/s@ 1 pg/f 0.100028 s/f, min 0.059 MB/s@ 1 pg/f 0.100028 s/f, max 3.585 MB/s@ 825 pg/f 0.899430 s/f
ASYNC 1.110 MB/s@ 1 pg/f 0.005279 s/f, min 1.110 MB/s@ 1 pg/f 0.005279 s/f, max 3.585 MB/s@ 825 pg/f 0.899430 s/f
-
Automating Takeover
-
Methods to Automate Takeover
• TSAMP
  • Included with most Db2 licenses
• Other options (require custom scripting and setup)
  • HACMP/Power-HA
  • Linux-HA
  • RHCS
  • MSCS
-
Automation Concepts
• Heartbeat
  • Detect if Db2 is down and try to restart it locally
  • Detect if Db2 is down on the other server and take over HADR if so
• Takeover
  • Requires HADR_PEER_WINDOW to be properly set
  • Must be detected and acted on
  • Should use the PEER WINDOW ONLY syntax on any forced takeover
  • Only automated between two HA servers, not for DR
-
TSAMP: Overview of Setup Process
1. Make design decisions
2. Set up HADR
3. TSAMP preparation
4. Set up TSAMP with db2haicu
5. Test failover scenarios
-
TSAMP: Decisions
Will you have a Virtual IP for the database connection?
• IP address
• Subnet mask
What will you use for a network quorum device?
• Pingable IP address used as a tiebreaker
• Server that is very highly available, such as a domain controller
Is there a private network between the two database servers that should be used by HADR and as a part of automated failover?
• Which network cards are associated with which networks, by name
-
TSA Inputs
Required:
• Public IP address of both database servers
• Fully qualified host names of both database servers
• IP address of the quorum device
• Names of network cards associated with the public network on the servers, if there is more than one network card on each server (for example, ‘eth0’)
Optional:
• Virtual IP address if using one, along with the subnet mask
• Private IP addresses, if a private network is being used in addition to the public network
• Names of network cards for the private network, if one is being used (for example, ‘eth0’)
-
Reference: Detailed Blog Series on How to Set Up TSAMP
• Preparation:
  • https://datageek.blog/2012/04/09/using-tsadb2haicu-to-automate-failover-part-1-the-preparation/
• Normal Setup:
  • https://datageek.blog/2012/04/17/using-tsadb2haicu-to-automate-failover-part-2-how-it-looks-if-it-goes-smoothly/
• Additional Configuration Steps:
  • https://datageek.blog/2018/08/16/using-tsa-db2haicu-to-automate-failover-part-5-additional-configuration-best-practices/
• Problem Solving Techniques:
  • https://datageek.blog/2012/09/04/using-tsadb2haicu-to-automate-failover-part-3-testing-ways-setup-can-go-wrong-and-what-to-do/
• Problems After Setup:
  • https://datageek.blog/2013/01/30/using-tsadb2haicu-to-automate-failover-part-4-dealing-with-problems-after-setup/
-
Real-World Problems and Mistakes
-
Real-World Problems
1. Were you TRYING to cause split brain?
2. BLOCKNONLOGGED and invalid tables on the standby
3. Improper configuration for multi-standby cross-datacenter configurations
4. Constant congestion
-
Split Brain
Two disconnected databases, each taking client traffic
[Diagram: Clients, Primary database server, and Standby database server, linked by a private or public network.]
-
Real World Problem #1 – Were you TRYING to cause split brain?
• Environment:
  • Four-server HADR
  • Two HA database servers in one data center
  • Two DR database servers in another data center
  • Db2 10.5
  • Environment established for at least 6 months
• Situation:
  • OS patching requires reboot of both servers in the DR data center
  • Under-experienced DBAs performing the work
  • Miscommunication of what is required or complete lack of understanding of what is required
-
Real World Problem #1 – Diagram of environment
[Diagram: HA Data Center: Clients connect through the VIP to SERVER #1 (primary db, marked PRIMARY), with SERVER #2 as the HA principal standby db. DR Data Center: SERVER #3 and SERVER #4, both auxiliary standby dbs. HADR runs over a shared network.]
-
Real World Problem #1 – What Should Have Happened
[Diagram: same four-server environment across the HA and DR data centers, with a single server marked PRIMARY.]
-
Real World Problem #1 – What Actually Happened
[Diagram: same four-server environment, but with two servers marked PRIMARY and question marks.]
Outage #1 – 45 Minutes, Middle of Night
Outage #2 – 2 hours, Business Day
Outage #3 (DR only) – DR Down for Multiple Days due to Fear
-
Real World Problem #1 – Text Description of What Should Have Happened
• Deactivate databases on #3 and #4
• Stop Db2 on #3 and #4
• Patch the OS and reboot #3 and #4
• Ensure Db2 started on #3 and #4
• Activate Db2 (to start HADR) on #3 and #4
• Verify HADR catches up
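Writing the procedure down as an explicit runbook is one way to make it hard to get wrong. A minimal sketch that pairs each step above with a Db2 command for one standby server; the mapping of steps to commands is an assumption for illustration (the slides do not list the exact commands used), and nothing is executed:

```python
def dr_patching_runbook(dbname: str) -> list[tuple[str, str]]:
    """Ordered (step, command) pairs for patching one standby server."""
    return [
        ("Deactivate database", f"db2 deactivate db {dbname}"),
        ("Stop Db2", "db2stop"),
        ("Patch the OS and reboot", "(manual step, outside Db2)"),
        ("Ensure Db2 is started", "db2start"),
        ("Activate database (starts HADR)", f"db2 activate db {dbname}"),
        ("Verify HADR catches up", f"db2pd -db {dbname} -hadr"),
    ]


# "SAMPLE" is a placeholder database name.
for number, (step, command) in enumerate(dr_patching_runbook("SAMPLE"), start=1):
    print(f"{number}. {step}: {command}")
```

Note that no TAKEOVER command appears anywhere in the list: patching standbys does not require a role change.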
-
Real World Problem #1 – Text Description of What Actually Happened
• TAKEOVER command issued from server #3 in the DR data center. #3 becomes primary
• Db2 was stopped on server #3 and #4 in DR data center
• Db2 was started on server #1 in the primary data center by force
• DR servers were patched and rebooted
• Databases were activated in the DR data center
• #4 reintegrated into the cluster just fine
• #3 knew it was primary when it went down, so HADR did not start
• On server #3, the command was issued:
  • db2 start hadr on db dbname as primary
• Server #1 detected that another server was trying to be primary, that at one time was primary, and Db2 immediately forced all connections and stopped work to avoid split brain. A fun message that included the term “poison pill” was written to the diagnostic log
• All DR servers were powered down, the database on #1 was force-started
• Fear kept them from being restarted for several days
• #3 had to be restored to bring it back in the cluster (safe, as no transactions occurred when it was primary)
-
Real World Problem #1 – The Moral of the Story
• Define procedures very well for less experienced DBAs
• When a mistake occurs, stop for a minute to determine what happened and the best way to undo it
-
Real World Problem #2 – Non-Logged LOAD
• Environment:
  • Two-server HADR on a PureApp appliance
  • Db2 10.5
  • BLOCKNONLOGGED set to YES
  • NFS-mounted filesystem called /db2copy available on both primary and standby
  • Scripts direct COPY YES location to /db2copy
  • Registry variable DB2_LOAD_COPY_NO_OVERRIDE set to ‘COPY YES to /db2copy’
-
Real World Problem #2 – Diagram of Environment
[Diagram: Clients connect through the VIP to SERVER #1 (primary db, marked PRIMARY); SERVER #2 is the HA principal standby db. HADR runs over a shared network. The /db2copy filesystem is shown alongside the servers.]
-
Real World Problem #2 – The Symptoms
• Diagnostic log growing unreasonably on standby with these errors:
2016-02-01-00.58.49.673747-480 I5930E469            LEVEL: Warning
PID     : 30951                TID : 139869221283584 PROC : db2sysc
INSTANCE: db2inst1             NODE : 000            DB   : SAMPLE
APPHDL  : 0-55736              APPID: *LOCAL.DB2.151120023013
HOSTNAME: server1
EDUID   : 56                   EDUNAME: db2redom (SAMPLE)
FUNCTION: DB2 UDB, database utilities, DIAG_NOTE, probe:0
DATA #1 : String, 35 bytes
Access not allowed 14, -2147352523
-
Real World Problem #2 – Resolution
• What was happening: User was issuing a load with ‘COPY YES to /tmp’
• Solution(s)
  • Restore tablespaces in recovery pending on standby
  • Educate users and developers again on appropriate locations for the copy file
  • Implement monitor to parse the Db2 diagnostic log for these errors
• RFE: https://ibmanalytics.ideas.aha.io/ideas/DB24LUW-I-384
-
Real World Problem #2 – Proactive Identification
• To be run on standby only
• If ROS (Reads On Standby) is enabled
select TABSCHEMA
     , TABNAME
     , TABTYPE
     , AVAILABLE
from TABLE(ADMIN_GET_TAB_INFO(null, null))
where AVAILABLE = 'N'
with ur
-
Real World Problem #2 – The Moral of the Story
• Educate developers and users
• Monitor thoroughly to verify
• Don’t ignore messages in the diagnostic log that you don’t understand
-
Real World Problem #3 – Cross-Data Center Multi-Standby
• Environment:
  • Four-server HADR
  • Two HA database servers in one data center
  • Two DR database servers in another data center
  • Db2 10.5
• Situation:
  • Load test targeting database failover to DR
  • New implementation, nothing production yet
  • Under-experienced DBAs performed the configuration
-
Real World Problem #3 – Diagram of environment
[Diagram: HA Data Center: Clients connect through the VIP to SERVER #1 (primary db), with SERVER #2 as the HA principal standby db. DR Data Center: SERVER #3 and SERVER #4, both auxiliary standby dbs. HADR runs over a shared network. Role labels: PRIMARY, Principal Standby.]
-
Real World Problem #3 – What Should Have Happened
[Diagram: same four-server environment with the PRIMARY and Principal Standby role labels and the HADR shared network links.]
-
Real World Problem #3 – What Actually Happened
[Diagram: same four-server environment with the PRIMARY and Principal Standby role labels and the HADR shared network links.]
-
Real World Problem #3 – Text Description of What Should Have Happened
• Planned load test against database running in the DR data center
1. #3 takes over the primary role
2. #4 becomes principal standby (NEARSYNC)
3. Load test run from application servers in the DR data center
-
Real World Problem #3 – Text Description of What Actually Happened
• Planned load test against database running in the DR data center
1. #3 takes over the primary role
2. #2 stays principal standby (NEARSYNC)
3. Load test run from application servers in the DR data center
4. NEARSYNC slows down database processing due to the slower link between servers in different data centers
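In multiple-standby HADR, the first entry in the primary's HADR_TARGET_LIST is its principal standby and the remaining entries are auxiliary standbys. The slides do not state exactly which setting was wrong in this case, so it is an assumption that the target list on #3 named #2 first. A minimal sketch for checking a target list value before a takeover test (the function name is illustrative; the sample value is the one shown on the DB CFG slide):

```python
def split_target_list(target_list: str) -> tuple[str, list[str]]:
    """Split an HADR_TARGET_LIST value into (principal, auxiliaries)."""
    entries = [entry.strip() for entry in target_list.split("|") if entry.strip()]
    if not entries:
        raise ValueError("HADR_TARGET_LIST is empty")
    return entries[0], entries[1:]


principal, auxiliaries = split_target_list("server2:db2_hadrp|server3:db2_hadra")
print("principal standby:", principal)
print("auxiliary standbys:", auxiliaries)
```

Running the same check against the value configured on every server, not only the current primary, shows which server would become principal standby after each possible takeover.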
-
Real World Problem #3 – The Moral of the Story
• Always test takeovers
• Load test takeovers
• Verify configuration
-
Real World Problem #4 – Congestion
• Environment:
  • Three-server HADR
  • Two HA database servers in one data center
  • One DR database server in another data center
  • Db2 11.1
  • All SYNCMODES are SUPERASYNC
• Situation:
  • Frequent monitoring of HADR
  • A lot of congestion
-
Real World Problem #4 – Diagram of environment
[Diagram: HA Data Center: Clients connect through the VIP to the primary db (marked PRIMARY), with STANDBY #1 as the HA principal standby db. DR Data Center: STANDBY #2, an auxiliary standby db. HADR runs over a shared network.]
SUPERASYNC for both standbys
-
Real World Problem #4 – Congestion and Log Gap on Standby #1
[Chart]
-
Real World Problem #4 – Congestion and Log Gap on Standby #2
[Chart]
-
Real World Problem #4 – Congestion and Log Gap on Standby #1 Before and After Meeting with SA/Network
[Chart]
-
Real World Problem #4 – Congestion and Log Gap on Standby #1 Before and After Meeting with SA/Network
[Chart]
-
Real World Problem #4 – Text Description of What Happened
• Health check of HADR to try to determine issues
• Started to investigate
• Had a meeting with SA and Network people on September 13
• Less than a week later, the problem disappears
• No one admits to changing anything
• 6 months later, the problem still has not returned
-
Real World Problem #4 – The Moral of the Story
• Don’t be afraid to engage experts in other areas
-
Questions?
• Contact me:
  • Twitter: @ember_crooks
  • [email protected]
  • https://datageek.blog
  • LinkedIn: https://www.linkedin.com/in/ember-crooks-25aa9b8/