Posted on 04-Jul-2020
DB2 pureScale Feature: Workload balancing, automatic client reroute, and client affinities concepts and administration
Frankie Sun, Senior Staff Developer
Jason Woods, Software Developer
Jo-Ann Woods, Information Developer
Table of Contents
1 Introduction
2 IBM Data Server drivers and clients
3 Server lists
4 Member priority
5 Workload balancing (WLB)
5.1 Reuse permission
5.2 Transaction-level WLB for single-connection application processes
5.3 Expectations of workload balancing
6 Automatic client reroute (ACR)
6.1 Alternate group support
7 First connection redundancy
8 Client affinities
8.1 Configuration
8.1.1 User-defined ordering
8.1.2 Round-robin ordering
8.1.3 Failback
9 Client affinities configuration reload
9.1 Configuration
9.1.1 DB2 pureScale server connectivity configuration
9.1.2 Starting connection managers on DB2 pureScale members
9.2 Advanced timeout configuration
9.2.1 Configuring hard time limits for establishing a connection or executing a command or query
9.2.2 Improving failure-detection times
9.2.3 Configuring ACR behavior
10 Application considerations
10.1 Dynamic SQL statements in embedded SQL applications
10.2 Sequences and the PREVIOUS VALUE expression
11 Monitoring
11.1 Monitoring transaction throughput on a member
11.1.1 Sample MON_GET_UNIT_OF_WORK function output
11.2 Monitoring the server list for a specific database
11.2.1 Sample MON_GET_SERVERLIST function output
11.2.2 Sample db2pd -serverlist command output
11.3 Monitoring CPU and memory load
11.3.1 Sample ENV_GET_SYSTEM_RESOURCES function output
12 Troubleshooting
12.1 Transaction-level WLB does not appear to be enabled even though it is configured at the driver or client
12.2 With transaction-level WLB enabled, DB2 workload appears stalled on one or more members with no discernible balancing
12.3 With transaction-level WLB enabled, DB2 workload is executing on multiple members, but it does not appear balanced
12.4 Server list reports unusual member priority values with transaction-level WLB enabled
12.4.1 Priority value is zero
12.4.2 Server lists as maintained by different members report different priority values for the same member
12.4.3 Member reports a server list as empty or nonexistent
12.5 ACR does not appear to be working as configured
12.6 ACR failover takes a long time to finish or appears to hang, affecting DB2 workload throughput
12.6.1 Detection of socket termination
12.6.2 Attempt to open a socket to a down member
12.6.3 ACR retries
13 Conclusion
Appendix
A. Creating supplemental view using table functions
B. DB2PoolMonitor class
Contributors
1 Introduction
In today's global business, your distributed database system must not only be
available 24x7 but also grow as required. Whether that means short-term
scalability for peak-demand periods or uninterrupted service during both
planned outages (maintenance) and unplanned outages, the stakes are high, and
every second counts.
The IBM DB2 pureScale Feature is designed for organizations that run online
transaction processing (OLTP) applications on distributed systems. The DB2
pureScale Feature offers clustering technology that helps reduce the risk and cost of
business growth, with application cluster transparency, scalability, and continuous
availability. DB2 pureScale technology is based on the proven DB2 for z/OS Parallel
Sysplex architecture, widely recognized as the industry gold standard for high
availability and scalability.
With the DB2 pureScale Feature, scaling your database solution is simple. Multiple
members process incoming database requests; they operate in a clustered system
and share data. To meet even the most demanding business needs, you scale out by
adding a new member to the pureScale cluster and issuing two commands. Few or no
application changes are required, and you do not have to redistribute data or
retune performance.
The DB2 pureScale Feature provides resiliency through workload balancing (WLB)
and automatic client reroute (ACR), as driven by an IBM Data Server driver or client.
As of DB2 Version 9.7 Fix Pack 1, all IBM Data Server drivers and clients support
WLB and ACR against a DB2 pureScale server.
2 IBM Data Server drivers and clients
Even though a DB2 pureScale server can handle both local and remote applications,
for high availability, applications should run remotely on client hosts that are
separate from the server instance. Each such client host requires an independent
(stand-alone) IBM Data Server driver or client installation.
Before DB2 Version 10.1, a DB2 pureScale instance had to be at Version 9.8 or a
Version 9.8 fix pack. Because Version 9.8 was a server-only release, the latest
stand-alone IBM Data Server driver and client packages available at that time were
at Version 9.7. Therefore, a Version 9.8 DB2 pureScale instance supported only IBM
Data Server drivers and clients at Version 9.7 Fix Pack 1 or a later fix pack. A DB2
pureScale instance at either Version 9.8 or Version 10.1 can support IBM Data Server
drivers and clients running at:
• Version 9.7 Fix Pack 1 or a later fix pack
• Version 10.1 or a later fix pack
Although they have similar functionality, different configuration mechanisms are
used depending on the type of driver or client:
• The IBM Data Server Driver for JDBC and SQLJ normally uses configuration
properties.
• All other drivers and clients use the db2dsdriver.cfg client configuration
file.
3 Server lists
The server list forms the basis for WLB and ACR operations by IBM Data Server
drivers and clients. For a particular database, the server list provides the following
information:
• The available members, as identified by each member's TCP/IP host name or
IP address and port
• The capacity of each member to handle work relative to the other members, also
known as the priority
Each member stores its copy of the server list for a particular database in a cache.
Periodically, the remote driver or client requests a member to return a server list for
the database being accessed. When a member is requested to return a server list,
the member determines whether the cached copy is outdated based on a number of
considerations. For example, the cached server list is immediately marked outdated
when a member starts and joins the cluster or when a member is stopped and leaves
the cluster. If the cached copy is considered outdated, the cached copy is refreshed
before being returned to the driver or client.
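The caching behavior described above can be modeled in a few lines. This is a minimal sketch, not DB2 internals: the class and method names are hypothetical, and it models only the membership-change trigger, because the other considerations that a member uses to decide whether its cached copy is outdated are not described here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class ServerListEntry:
    """One server list entry: where a member listens and its relative priority."""
    host: str
    port: int
    priority: int  # relative capacity; 0 means the member is unavailable for work


@dataclass
class CachedServerList:
    """Hypothetical model of one member's cached server list for one database."""
    build: Callable[[], List[ServerListEntry]]  # computes a fresh server list
    entries: List[ServerListEntry] = field(default_factory=list)
    outdated: bool = True  # nothing has been cached yet

    def on_membership_change(self) -> None:
        # A member starting and joining, or stopping and leaving, the cluster
        # immediately marks the cached copy as outdated.
        self.outdated = True

    def get(self) -> List[ServerListEntry]:
        # Called when a driver or client requests the server list: refresh the
        # cached copy only if it is outdated, then return a copy of it.
        if self.outdated:
            self.entries = self.build()
            self.outdated = False
        return list(self.entries)
```

Repeated requests are served from the cache until a membership change invalidates it.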
Each application that a remote driver or client runs is represented by an application
process, which, in the case of the IBM Data Server Driver for JDBC and SQLJ, is a Java
virtual machine (JVM). Multiple application processes running on the same host can
share the same configuration through a common db2dsdriver.cfg configuration file
or JDBC properties file. However, the processes are all independent, and each has
its own copy of the server list. These copies can differ because they were generated
at different times. WLB and ACR decisions are made within an application process
based only on its copy of the server list, independent of any other application
process.
4 Member priority
For a particular database, the server list contains a priority value for each member.
The member priority is a relative value that is meaningful only when compared with
that of another member. The higher the priority, the more work that the member
can handle relative to members with lower priorities. Therefore, the driver or client
should give the database member with the higher priority more work. The typical
maximum value is 100. The minimum value, 0, indicates that a member is unavailable
for work. However, the driver or client can still attempt to connect to a member with
a priority of 0 if all other members, regardless of their priorities, are inaccessible.
Consider the case where there are three members (0, 1, and 2):
Member   Priority
0        10
1        100
2        55
Of the three members, member 1 has the highest priority. Therefore, it should
receive the most work. The member with the lowest priority is member 0. Because
its priority is 10, member 0 should be given only 10% of the work that is given to
member 1. Because member 2 has a priority of 55, member 2 should be given only
55% of the work that is given to member 1.
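The arithmetic behind this example is shown in the following Python sketch. The first function reproduces the percentages above (each member's work relative to member 1); the second expresses the same priorities as fractions of the total workload, which is a derived view rather than something the server list reports. The function names are illustrative.

```python
def relative_to_top(priorities: dict[int, int]) -> dict[int, float]:
    """Each member's work as a fraction of the work given to the top-priority member."""
    top = max(priorities.values())
    if top == 0:
        raise ValueError("every member has priority 0 (unavailable for work)")
    return {member: p / top for member, p in priorities.items()}


def share_of_total(priorities: dict[int, int]) -> dict[int, float]:
    """Each member's fraction of all the work, derived from the same priorities."""
    total = sum(priorities.values())
    if total == 0:
        raise ValueError("every member has priority 0 (unavailable for work)")
    return {member: p / total for member, p in priorities.items()}


priorities = {0: 10, 1: 100, 2: 55}  # the three-member example above
print(relative_to_top(priorities))  # {0: 0.1, 1: 1.0, 2: 0.55}
print({m: round(s, 3) for m, s in share_of_total(priorities).items()})
# {0: 0.061, 1: 0.606, 2: 0.333}
```

In other words, member 1 should handle roughly 61% of the total work, member 2 roughly 33%, and member 0 roughly 6%.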
For information about how the priority value is derived, see the “Monitoring CPU and
memory load” section.
5 Workload balancing (WLB)
In a DB2 pureScale environment, scalability is achieved with the help of WLB. An
IBM Data Server driver or client drives WLB based on priority information in the
server list that is returned by a member. With WLB, as workload increases, you can
add more members to the DB2 pureScale cluster to handle the extra workload. After
being added, the new member becomes visible in the server list to the IBM Data
Server drivers and clients. New members begin processing incoming database
requests as soon as they join the DB2 pureScale cluster. As the number of members
doubles, overall throughput almost doubles. The DB2 pureScale cluster can therefore
grow with business needs, even when the growth is only short term for peak periods.
There are two levels of WLB:
• Connection-level
With connection-level WLB, the driver or client balances the distribution of
connections among DB2 pureScale members according to the member
priorities in the server list. Connection-level WLB is not supported by the IBM
Data Server Driver for JDBC and SQLJ but is enabled by default for non-Java-
based IBM Data Server drivers and clients.
• Transaction-level
Transaction-level WLB is more granular because it balances the distribution of
both connections and transactions among DB2 pureScale members according
to the member priorities in the server list. For all drivers and clients,
transaction-level WLB is disabled by default. To use it, you must explicitly
enable it.
In general, WLB operation is invisible to an application. Some conditions can cause
WLB to be disabled, and you might have to modify applications to allow WLB to be
enabled. For details, see the “Application considerations” section.
5.1 Reuse permission
Under transaction-level WLB, a single logical connection from an application can be
directed to one member for one transaction and to a different member for
another transaction. At the end of a transaction, the server determines whether the
logical connection can execute the next transaction on a different member. This
decision, called reuse permission, is conveyed to the driver or client in the commit or
rollback reply.
If reuse permission has been granted, the driver or client workload can move to a
different member for the next transaction, or it can stay on the same member.
However, if reuse permission is denied, the driver or client must not move to a
different member for the next transaction. Denial of reuse permission also forces
ACR to become non-seamless, as described in the “Automatic client reroute (ACR)”
section.
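The rule that reuse permission imposes on routing can be sketched as follows. This is a hypothetical helper, not driver code: choose_by_wlb stands in for whatever WLB decision the driver or client would otherwise make, and the sketch captures only the constraint described above.

```python
from typing import Callable, Optional


def member_for_next_transaction(current: Optional[int],
                                reuse_granted: bool,
                                choose_by_wlb: Callable[[], int]) -> int:
    """Choose the member for the next transaction on one logical connection.

    current: member that ran the previous transaction (None if there was none)
    reuse_granted: reuse permission from the last commit or rollback reply
    choose_by_wlb: hypothetical stand-in for the driver's WLB choice
    """
    if current is not None and not reuse_granted:
        # Reuse permission denied: the next transaction must stay on this member.
        return current
    # Reuse permission granted: WLB may move the connection, or keep it on the
    # current member if that is what the WLB choice returns.
    return choose_by_wlb()
```

For example, member_for_next_transaction(1, False, lambda: 2) returns 1, whereas the same call with reuse permission granted returns 2.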
Reuse permission is granted if all resources that a logical connection uses for a
database persist across all members of a DB2 pureScale instance. If the SQL
statements that are used in a workload result in the use of a session-specific
resource that is not available on all members, reuse permission is denied. For a list
of session-specific resources that can cause reuse permission to be denied, see the
"DB2 client considerations for the DB2 pureScale Feature" topic in the DB2
Information Center:
• For Version 9.8:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r8/topic/com.ibm.db2.luw.sd.doc/doc/r0056430.html
• For Version 10.1:
http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.qb.server.doc/doc/r0056430.html
To avoid certain types of reuse permission denial, see the “Application
considerations” section.
5.2 Transaction-level WLB for single-connection application processes
Transaction-level WLB is designed for processing transactions concurrently in a
multithreaded application process with many connections, such as an application
server environment. However, an application process can also be single-threaded
with only a single connection, so that only one transaction is processed at any time.
Complications occur if there are multiple such application processes executing
concurrently. There is no coordinated transaction-level WLB among multiple
application processes. Consider what would happen if all such application processes
were allowed to favor the same member (for example, the one with the highest
priority) at the same time. That member would become overloaded, and its priority
would decrease. Another member, now with the highest priority, would then be
favored by all application processes, and the cycle would repeat.
To allow transaction-level WLB to perform properly in this environment, special logic
was added to the drivers and clients to introduce an element of randomness in terms
of which member is picked. This logic prevents all single-threaded, single-connection
application processes from favoring the same member at the same time.
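The text does not specify how the drivers introduce this randomness. As one plausible technique, the sketch below swaps in priority-weighted random selection: each process still prefers high-priority members on average, but independent processes do not all land on the same member at once. The function name and the simulation are illustrative assumptions, not driver code.

```python
import random


def pick_member(priorities: dict[int, int], rng: random.Random) -> int:
    """Pick a member at random, with probability proportional to its priority.

    Members with priority 0 are skipped; the text treats them only as a last
    resort when every other member is inaccessible.
    """
    members = [m for m, p in priorities.items() if p > 0]
    if not members:
        raise RuntimeError("no member has a non-zero priority")
    weights = [priorities[m] for m in members]
    return rng.choices(members, weights=weights, k=1)[0]


# 1,000 independent single-connection processes, all seeing the same server list.
rng = random.Random(42)
server_list = {0: 10, 1: 100, 2: 55}
picks = {member: 0 for member in server_list}
for _ in range(1000):
    picks[pick_member(server_list, rng)] += 1
print(picks)  # member 1 receives the most picks, but not all of them
```

Unlike "always pick the highest priority", this choice spreads simultaneous decisions roughly in proportion to the priorities.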
5.3 Expectations of workload balancing
With respect to WLB, the job of the IBM Data Server driver or client is to distribute
workload based on member priorities in the server list, regardless of whether they
make sense. Ideally, the result is perfect balancing or something close to it. If the
driver or client is distributing work based on the server list, and yet the perfect
balancing that you expect is not occurring, you should focus on the member
priorities. Consider the following areas.
Hardware and configuration
The first consideration is whether all member hosts have identical hardware and
configurations. If they do not, workload distribution might be skewed toward the
most capable member host.
Workload other than DB2 workload
Another consideration is the amount of DB2 workload that the IBM Data Server
driver or client drives compared to other workload executing on member hosts.
Transaction-level WLB is based on the premise that the driver or client can influence
member priorities through distribution of its workload. Some other workload, DB2 or
otherwise, might exert an overwhelming influence on the load of one or more
member hosts. In that case, member priorities, skewed or not, might remain the same
no matter how the IBM Data Server driver or client distributes its DB2 workload.
Transaction characteristics
For an application process, the IBM Data Server driver or client performs transaction-
level WLB based on how many transactions are executing concurrently on each
member. All concurrently executing transactions are considered equal for
transaction-level WLB purposes even when they have vastly different resource
requirements and durations. Member priorities might be skewed toward certain
members, and the driver or client might not be able to correct this imbalance even
after driving more transactions to members with higher priorities as part of
transaction-level WLB.
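To see why equal treatment of transactions matters, consider the following sketch. It assumes, purely for illustration, a selection rule that divides each member's in-flight transaction count by its priority; the text says only that the number of concurrently executing transactions is the input, not which formula the drivers apply.

```python
def least_loaded(priorities: dict[int, int], in_flight: dict[int, int]) -> int:
    """Pick the member with the fewest in-flight transactions per unit of priority.

    Every in-flight transaction counts as 1, whatever its cost or duration.
    """
    candidates = [m for m, p in priorities.items() if p > 0]
    if not candidates:
        raise RuntimeError("no member has a non-zero priority")
    return min(candidates, key=lambda m: in_flight.get(m, 0) / priorities[m])


# Member 1 runs five short transactions; member 2 runs one long, heavy batch job.
print(least_loaded({1: 100, 2: 55}, {1: 5, 2: 1}))  # 2
```

Member 2 is chosen because one transaction counts the same as any other, even though that single batch transaction might be consuming most of member 2's resources.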
Multiple application processes
Finally, if multiple applications access the same DB2 pureScale instance by using
transaction-level WLB, each application that a remote driver or client runs is
associated with an independent application process. Each such process uses its own
copy of the server list, and the different copies can contain different member
priorities because they were generated at different times. WLB decisions are made
independently within each application process; there is no WLB coordination among
multiple applications, even when they run on the same client host. In addition, such
applications can be vastly different, with different transaction characteristics.
Therefore, even when perfect balancing is achieved from each application's
perspective, the aggregate workload distribution at the server might not be perfectly
balanced.
6 Automatic client reroute (ACR)
Designed originally for use in an HADR environment, ACR has been enhanced to
provide high availability in a DB2 pureScale environment. Through ACR, a database
connection that suffered an outage can be recovered by being re-associated with the
database. In a DB2 pureScale environment, ACR allows work on a failed member to
be redirected to any active (surviving) member with minimal or no interruption. Like
WLB, ACR is performed by the IBM Data Server driver or client by using information
in the server list.
ACR can be seamless or non-seamless:
• With seamless ACR, the application is unaware of the communication failure.
The remote driver or client can recover from the failure by reconnecting to the
same or another member and re-executing the failed SQL statements.
Seamless ACR has the following main restrictions:
o It is supported only with ODBC, CLI, .NET, and Java APIs.
o The failed SQL statement must be the first SQL statement in a
transaction.
o Reuse permission must have been granted at the end of the previous
transaction. For more details, see the “Reuse permission” section.
Additional restrictions are listed in the “Application programming
requirements for high availability connections to the DB2 for Linux, UNIX, and
Windows servers” topic in the DB2 Information Center
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.apdv.cli.doc/doc/c0056507.html).
• With non-seamless ACR, the application is made aware of the communication
failure: SQL error code -30108 (non-Java-based drivers and clients) or -4498 (IBM
Data Server Driver for JDBC and SQLJ) is returned. The driver or client reconnects
to the database through any available member (including the current member), but
the application is responsible for replaying the SQL statements from the failed
transaction. An example of non-seamless ACR is shown in Figure 1.
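On the application side, handling non-seamless ACR usually means wrapping each transaction in a replay loop. The following Python sketch shows the idea in driver-neutral form: SQLError, run_with_replay, and the replay limit are hypothetical; a real application would catch its own driver's exception type (for example, a JDBC SQLException) and read the error code from it.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Error codes that signal non-seamless ACR: -30108 (non-Java-based drivers and
# clients) and -4498 (IBM Data Server Driver for JDBC and SQLJ).
ACR_REROUTE_CODES = {-30108, -4498}


class SQLError(Exception):
    """Hypothetical stand-in for a driver exception that carries an SQL code."""

    def __init__(self, sqlcode: int):
        super().__init__(f"SQL error code {sqlcode}")
        self.sqlcode = sqlcode


def run_with_replay(transaction: Callable[[], T], max_replays: int = 3) -> T:
    """Run a whole transaction, replaying it from the start after a reroute.

    `transaction` must issue every SQL statement of the unit of work and commit.
    """
    replays = 0
    while True:
        try:
            return transaction()
        except SQLError as err:
            if err.sqlcode not in ACR_REROUTE_CODES or replays >= max_replays:
                raise  # not an ACR reroute, or the replay budget is exhausted
            # The driver has already reconnected to some member; re-run
            # every statement of the failed transaction from the beginning.
            replays += 1
```

Replaying the whole transaction function, rather than only the failed statement, is what makes this pattern correct for non-seamless ACR.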
Figure 1: Non-seamless ACR
With ACR, if at least one member remains active, planned outages can be made
seamless so that the application is not interrupted. Depending on the application
state, unplanned outages might not be seamless and might interrupt application
requests at the time of failure.
6.1 Alternate group support
As of Version 9.7 Fix Pack 5, when client affinities are not in use, you can configure
one or more alternate DB2 pureScale instances (groups) for use by the remote driver
or client. If none of the members of the default DB2 pureScale instance is accessible,
an alternate instance is used. However, after a failover to an alternate DB2
pureScale instance, there is no failback to the default instance.
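The failover order described above, including the absence of failback, can be sketched as follows. This is a hypothetical model, not driver code: the class name and member names are invented, and group membership is reduced to plain lists of member names.

```python
from typing import Callable, List, Sequence


class GroupRouter:
    """Hypothetical sketch: default group first, then alternates, with no failback."""

    def __init__(self, groups: Sequence[List[str]]):
        self.groups = [list(g) for g in groups]  # groups[0] is the default instance
        self.active = 0  # index of the group in use

    def connect(self, is_up: Callable[[str], bool]) -> str:
        # Search only from the active group onward, so after a failover to an
        # alternate instance the default instance is never tried again.
        for index in range(self.active, len(self.groups)):
            for member in self.groups[index]:
                if is_up(member):
                    self.active = index
                    return member
        raise ConnectionError("no member of any group is accessible")


router = GroupRouter([["default-m0", "default-m1"], ["alt-m0", "alt-m1"]])
print(router.connect(lambda m: True))                 # default-m0
print(router.connect(lambda m: m.startswith("alt")))  # alt-m0
print(router.connect(lambda m: True))                 # alt-m0 (no failback)
```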
The configuration varies depending on the driver or client that you use. For details,
see the DB2 Information Center
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp?topic=%2Fcom.ibm.db2.luw.apdv.cli.doc%2Fdoc%2Fc0059473.html).
In Figure 2, the client has been configured to use an alternate instance. When the
connection between the client and the default DB2 pureScale instance is lost, the
client establishes a new connection to the alternate instance.
Figure 2: The client that uses alternate group support
You can use Q Replication to keep the database in sync among multiple DB2
pureScale instances. For more information, see the “Combining IBM DB2 pureScale
with Q Replication for scalability and business continuity” paper in developerWorks
(http://www.ibm.com/developerworks/data/bestpractices/purescaleqreplication/index.html).
Alternate group support and client affinities are mutually exclusive. For details
about client affinities, see the “Client affinities” section.
7 First connection redundancy
As mentioned earlier, WLB and ACR rely on the server-supplied server list. For an
application, the server list is returned on the first successful connection to a
database. This connection is directed to one of the members. However, if that
member is inaccessible initially, no server list is received, so the application is not
aware of other members that might be accessible. To avoid this situation, you can
use an alternate server list, which can be static or dynamic. If the designated
member is inaccessible, the alternate server list allows the driver or client to try
other members for the first connection.
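The first-connection fallback amounts to an ordered loop over candidate addresses. The following sketch is hypothetical: the function name and the try_connect callback are invented, it assumes that alternates are tried in list order, and the host names are borrowed from the client affinities example in this paper.

```python
from typing import Callable, Sequence, Tuple

Address = Tuple[str, int]  # (TCP/IP host name, port)


def first_connection(designated: Address,
                     alternates: Sequence[Address],
                     try_connect: Callable[[Address], bool]) -> Address:
    """Try the designated member, then each alternate, for the first connection."""
    for address in (designated, *alternates):
        if try_connect(address):
            # The first successful connection returns the server list, which
            # replaces the alternate server list from this point on.
            return address
    raise ConnectionError("no member in the alternate server list is accessible")


reachable = {("coralxib24", 50000)}
print(first_connection(("coralxib23", 50000),
                       [("coralxib24", 50000), ("coralxib25", 50000)],
                       lambda addr: addr in reachable))  # ('coralxib24', 50000)
```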
The static alternate server list is a list of members that you specify in the
db2dsdriver.cfg client configuration file or through JDBC properties. After the initial
database connection is established, the server returns a fresh server list, and the
static alternate server list is no longer used.
As of Version 9.7 Fix Pack 4, the non-Java-based IBM Data Server drivers and clients
also support dynamic alternate server list caching in a file named srvrlst.xml. This
file, as maintained by the non-Java-based IBM Data Server driver or client, contains
the most recent copy of the server list as received by application processes on a
particular host. The dynamic alternate server list is enabled by default except when
the static alternate server list is enabled. After the initial database connection is
established, the server returns a fresh server list, and the dynamic alternate server
list is no longer used. However, the dynamic alternate server list continues to be
refreshed for the benefit of any application that must establish its initial connection
to the database.
As of Version 9.7 Fix Pack 3 (JCC Version 3.61.65), the IBM Data Server Driver for
JDBC and SQLJ supports a dynamic alternate server list through the use of the
jccServerListCache.bin file with the db2.jcc.outputDirectory property.
For ease of maintenance, the dynamic alternate server list is preferred over the
static alternate server list.
8 Client affinities
The client affinities scheme is used for workload distribution and ACR. The client
affinities scheme ignores any server-supplied server list. Instead, the remote driver
or client uses a hard-coded server list that is constructed based on information that
you supply. With client affinities, you define one or more client hosts. For each client
host, you define a preferred member and the order in which members are used for
ACR when a member becomes inaccessible. Therefore, with multiple client host
definitions, you have full control over workload distribution on a per client-host basis.
With the client affinities scheme, DB2 workload stays on a particular member as long
as that member remains available. This can be advantageous to certain
performance-sensitive applications. Different ordering schemes are supported. In
addition, like regular ACR, client-affinitized ACR can be seamless or non-seamless.
Failback to the preferred member is also an option.
Each client has one db2dsdriver.cfg configuration file, which can be used to
configure multiple databases. However, for any one database, all applications that
access it use either client affinities or WLB, which are mutually exclusive.
8.1 Configuration
You configure client affinities in the db2dsdriver.cfg client configuration file. There
are two member ordering schemes:
• User-defined ordering
• Round-robin ordering
First, in both ordering schemes, you define a list of members in an alternate server
list in the db2dsdriver.cfg configuration file. You identify each member in the list
by its TCP/IP host name and port number. For example, you can define alternate
server list L as follows:
<alternateServerList>
<server name="member0" hostname="coralxib23" port="50000" />
<server name="member1" hostname="coralxib24" port="50000" />
<server name="member2" hostname="coralxib25" port="50000" />
</alternateServerList>
8.1.1 User-defined ordering
In user-defined ordering, you define one or more ordered lists containing entries
from the alternate server list. These ordered lists are called affinity lists. For
example, for the alternate server list L in the previous section, you can define
affinity list A1 as member 2, member 1, member 0 and affinity list A2 as member 0,
member 1, member 2:
<affinityList>
<list name="A1" serverorder="member2,member1,member0" />
<list name="A2" serverorder="member0,member1,member2" />
</affinityList>
Next, define one or more client hosts, as identified by their TCP/IP host names, and
specify the affinity list to use for each host, as shown in the following example:
<clientAffinityDefined>
<client name="clientHost1" hostname="hotel67" listname="A1" />
<client name="clientHost2" hostname="hotel68" listname="A2" />
</clientAffinityDefined>
Figure 3: Clients that use user-defined affinity lists
User-defined ordering allows applications to use different member ordering based on
the client host that they are on. For example, with affinity list A1 in effect, all work
must take place on member 2. If member 2 becomes unavailable, ACR reroutes all
work to member 1. And if member 1 then becomes unavailable, all work is rerouted to
member 0. The first member in each affinity list is referred to as the preferred
member.
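The user-defined ordering rules can be traced with a short Python sketch that parses the XML fragments from this section and resolves the member to use for a client host. It is illustrative only: the sketch wrapper element is invented so that the fragments parse as one document, the helper names are hypothetical, and this is not how the driver reads db2dsdriver.cfg.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Fragments from this section, wrapped in an invented <sketch> root element.
CONFIG = """
<sketch>
  <alternateServerList>
    <server name="member0" hostname="coralxib23" port="50000" />
    <server name="member1" hostname="coralxib24" port="50000" />
    <server name="member2" hostname="coralxib25" port="50000" />
  </alternateServerList>
  <affinityList>
    <list name="A1" serverorder="member2,member1,member0" />
    <list name="A2" serverorder="member0,member1,member2" />
  </affinityList>
  <clientAffinityDefined>
    <client name="clientHost1" hostname="hotel67" listname="A1" />
    <client name="clientHost2" hostname="hotel68" listname="A2" />
  </clientAffinityDefined>
</sketch>
"""


def member_order(root: ET.Element, client_hostname: str) -> List[str]:
    """Affinity list for a client host; the first entry is the preferred member."""
    lists = {e.get("name"): e.get("serverorder").split(",") for e in root.iter("list")}
    for client in root.iter("client"):
        if client.get("hostname") == client_hostname:
            return lists[client.get("listname")]
    raise KeyError(f"no client affinity defined for {client_hostname}")


def acr_target(order: List[str], is_up: Callable[[str], bool]) -> str:
    """Member to use: the first one in the affinity list that is still available."""
    for name in order:
        if is_up(name):
            return name
    raise ConnectionError("no member in the affinity list is available")


root = ET.fromstring(CONFIG.strip())
order = member_order(root, "hotel67")
print(order)                                        # ['member2', 'member1', 'member0']
print(acr_target(order, lambda m: m != "member2"))  # member1
```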
If you use the IBM Data Server Driver for JDBC and SQLJ with IBM WebSphere
Application Server, you can configure client affinities in the WebSphere Application
Server administration console or through JDBC data source properties. However,
only the user-defined ordering scheme is supported. In this version, no client host
definitions are supported. All JDBC driver instances use the same members in the
order in which you define them in the ordered member list.
8.1.2 Round-robin ordering
With round-robin ordering, you do not directly define members for a client or driver
to use or an order of members for that client. However, you still define one or more
client or driver hosts, as identified by their TCP/IP host names. Each driver or client
can use all members in the alternate server list, according to the order in that list.
Round-robin ordering is not supported by the IBM Data Server Driver for JDBC and
SQLJ.
The preferred member for each defined client or driver host is based on its offset in
the client host list. For example, the preferred member for the first client host that
you define is the first entry in the alternate server list in the db2dsdriver.cfg file,
the preferred member for the second client or driver host is the second entry in the
alternate server list, and so on. If there are more client or driver hosts than there
are entries in the alternate server list, entries in the list are recycled as the preferred
member. The following is an example of a client host list:
<clientAffinityRoundrobin>
<client name="clientHost3" hostname="hotel69" />
<client name="clientHost4" hostname="hotel70" />
<client name="clientHost5" hostname="hotel71" />
<client name="clientHost6" hostname="hotel72" />
<client name="clientHost7" hostname="hotel73" />
</clientAffinityRoundrobin>
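The offset rule can be illustrated with the alternate server list L and the client host list above. This sketch assumes that "recycled" means wrapping around to the start of the list (modulo arithmetic); the function name is hypothetical.

```python
def preferred_members(servers: list[str], client_hosts: list[str]) -> dict[str, str]:
    """Map each client host to its preferred member by offset, recycling entries."""
    if not servers:
        raise ValueError("the alternate server list is empty")
    return {host: servers[i % len(servers)] for i, host in enumerate(client_hosts)}


servers = ["member0", "member1", "member2"]                      # alternate server list L
hosts = ["hotel69", "hotel70", "hotel71", "hotel72", "hotel73"]  # client host list above
print(preferred_members(servers, hosts))
# {'hotel69': 'member0', 'hotel70': 'member1', 'hotel71': 'member2',
#  'hotel72': 'member0', 'hotel73': 'member1'}
```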
8.1.3 Failback
With either ordering scheme, you can configure client affinities so that connections
periodically attempt to fail back to the preferred member. For non-Java-based IBM
Data Server drivers and clients, set the affinityFailbackInterval parameter in the
db2dsdriver.cfg configuration file. For the IBM Data Server Driver for JDBC and
SQLJ, use the affinityFailbackInterval JDBC property.
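The following sketch illustrates periodic failback. It is a hypothetical model, not the driver's implementation: it assumes that affinityFailbackInterval is expressed in seconds and that the failback check happens between transactions, and the class and method names are invented for illustration.

```python
class FailbackTimer:
    """Hypothetical sketch of periodic failback to the preferred member.

    interval_s plays the role of affinityFailbackInterval (assumed to be seconds).
    """

    def __init__(self, preferred: str, interval_s: float):
        self.preferred = preferred
        self.interval_s = interval_s
        self.current = preferred
        self.last_check = 0.0

    def failed_over(self, member: str, now: float) -> None:
        # ACR moved the connection away from the preferred member.
        self.current = member
        self.last_check = now

    def at_transaction_boundary(self, now: float, preferred_is_up: bool) -> str:
        # Once the interval has elapsed since the last attempt, try the
        # preferred member again; otherwise stay on the current member.
        if self.current != self.preferred and now - self.last_check >= self.interval_s:
            self.last_check = now
            if preferred_is_up:
                self.current = self.preferred
        return self.current
```

For example, after failed_over("member1", now=0) with a 300-second interval, the connection stays on member1 until a check at or after 300 seconds finds member2 available again.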
Examples of various configurations are in the DB2 pureScale enablement paper in
developerWorks (http://www.ibm.com/developerworks/data/library/long/dm-1206purescaleenablement/index.html).
9 Client affinities configuration reload
A CLI application that uses the non-Java-based IBM Data Server driver or client can
invoke the SQLReloadConfig API to reload the client affinities configuration in the
db2dsdriver.cfg file at the next transaction boundary, thus applying configuration
changes. For details, see the “SQLReloadConfig function (CLI)” topic in the DB2
Information Center
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.apdv.cli.doc/doc/r0056916.html).
9.1 Configuration
Basic IBM Data Server driver and client configuration is covered in the DB2
pureScale enablement paper in developerWorks
(http://www.ibm.com/developerworks/data/library/long/dm-1206purescaleenablement/index.html).
This section instead focuses on the following configuration aspects:
• DB2 pureScale server connectivity configuration
• Advanced driver and client configuration using connection and command
timeouts
9.1.1 DB2 pureScale server connectivity configuration
The WLB and ACR features that the IBM Data Server drivers and clients provide
require a TCP/IP or SSL-based connection to a DB2 pureScale member. Before a
connection from an IBM Data Server driver or client to a DB2 pureScale member can
be established, these requirements must be met:
• The connection manager for the TCP/IP or SSL communication protocol must
be started for all DB2 pureScale members that are to participate in the
workload.
• At a minimum, the IBM Data Server driver or client must have the following
connectivity information for one member of the DB2 pureScale cluster where
you start the connection manager for the TCP/IP or SSL communication
protocol:
o A TCP/IP host name or IP address
o A connection service name or port
o A database name or alias
9.1.2 Starting connection managers on DB2 pureScale members
Before starting connection managers on DB2 pureScale members, you must:
• Determine which communication protocol to start: TCP/IP, SSL, or both. The
examples in this section focus on TCP/IP.
• Choose a connection service name or connection port for each communication
protocol that will be started.
You can use the DB2COMM registry variable to specify which protocol connection
managers are started when you start a member by using the db2start command.
The syntax of the registry variable is as follows:
DB2COMM=[protocol[,protocol]]
where protocol is TCPIP or SSL.
The steps in this section focus on TCP/IP.
Step 1: To start the TCP/IP protocol connection manager when a DB2 member
starts, enter the following command:
db2set DB2COMM=TCPIP
Step 2: For each communication protocol for which a connection manager will be
started, you must configure a connection port to allow remote driver or client
connections. The port must be available on each member host of the DB2 pureScale
cluster to which you want to connect.
You identify each connection port by a port number or a connection service name
that maps to a port number. The connection service name or connection port must
be available on each member of the DB2 pureScale cluster and cannot be shared
between communication protocols.
o If you plan to use a port number, set the SVCENAME parameter to the
connection port that you want to use. For example, to use connection port
50001, issue the following command:
db2 update dbm cfg using SVCENAME 50001
o If you plan to use a connection service name, you must define it in the TCP/IP
services file. The default location of this file varies depending on the operating
system. For AIX and Linux operating systems, the file is /etc/services. To
define a new service name, open the TCP/IP services file in a text editor and
add a connection entry for the service name. For example, if you want to use
connection service name db2c_db2inst1 that is tied to connection port 50001,
you must update the TCP/IP services file with the following entry:
db2c_db2inst1 50001/tcp # DB2 Connection Service Port
After you update the TCP/IP services file, you can update the database
manager configuration file with the connection service name using the
following command:
db2 update dbm cfg using SVCENAME db2c_db2inst1
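To make the name-to-port mapping concrete, the following sketch reads services-file content in the format shown above (name, port/protocol, optional comment) and returns the port for a service name. The ServicesLookup class is hypothetical; it mimics the mapping for illustration only, matches primary names but not aliases, and is not how DB2 itself resolves SVCENAME.

```java
import java.util.Optional;

// Hypothetical helper that mimics the services-file mapping from a connection
// service name to a port; it is not how DB2 itself resolves SVCENAME.
class ServicesLookup {
    // Each entry has the form: name port/protocol [aliases...] [# comment]
    static Optional<Integer> resolve(String servicesContent, String serviceName, String protocol) {
        for (String rawLine : servicesContent.split("\\R")) {
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue; // blank line or comment-only line
            }
            String[] fields = line.split("\\s+");
            // Only the primary name (first field) is compared; aliases are ignored.
            if (fields.length < 2 || !fields[0].equals(serviceName)) {
                continue;
            }
            String[] portProto = fields[1].split("/");
            if (portProto.length == 2 && portProto[0].matches("\\d+")
                    && portProto[1].equalsIgnoreCase(protocol)) {
                return Optional.of(Integer.parseInt(portProto[0]));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String services = "db2c_db2inst1 50001/tcp # DB2 Connection Service Port\n";
        System.out.println(resolve(services, "db2c_db2inst1", "tcp").orElse(-1));
    }
}
```

A name that is missing or bound to another protocol yields an empty result, which corresponds to the case where SVCENAME names a service that the services file does not define.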
9.2 Advanced timeout configuration
Using ACR, the IBM Data Server driver or client redirects a logical connection from a
failed member to an alternate member so that the application can continue its work
with minimal or no interruption. While the driver or client attempts to reestablish a
connection to the database at an alternate member, the application can appear
unresponsive. Statements that typically run in milliseconds can end up taking tens of
seconds.
There are a number of configuration parameters for tuning the responsiveness of the
driver or client when it is reestablishing a database connection. Tuning these values
can bound how long the driver or client remains unresponsive and can decrease the
amount of time that the driver or client requires to reestablish a connection to an
alternate server.
9.2.1 Configuring hard time limits for establishing a connection or executing a command or query
Before you tune the driver or client for ACR, first configure it for normal
operations. You can then impose limits on the driver or client to control how long it
attempts to establish a connection or execute a statement before returning an error
to the application.
The coarsest-grained timeout parameters are the logical connection timeout
parameters and the command or query timeout parameters. The logical connection
timeout parameters control how long the driver or client waits for the server to
accept a database connection request. The command or query timeout parameters
control how long the driver or client waits for the server to handle a database
request.
Unlike their socket-level equivalents, which are described in later sections, logical
connection and command timeouts are tracked across multiple send/receive flows to and
from the database server. You can use these timeouts to enforce the maximum amount of
time that an application waits for an IBM Data Server driver or client function to
return control.
The logical connection timeout is relevant only when an application issues a
connection request. For control over connections that are attempted under the
covers, as in the case of ACR, you must specify a socket-level connection timeout
instead, as described in the “Improving failure-detection times” section.
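The difference between a logical timeout and a socket-level timeout can be shown with a small conceptual model: a logical timeout is one deadline that keeps counting across every send/receive flow, whereas a socket-level timeout restarts for each flow. The TimeoutModel class and the flow durations are hypothetical; this is an illustration of the idea, not driver code.

```java
// Conceptual model only: compares one shared deadline (logical timeout)
// with a limit that is applied to each flow separately (socket-level timeout).
class TimeoutModel {
    // True if the whole request finishes before a single shared deadline.
    static boolean withinLogicalTimeout(long[] flowMillis, long logicalTimeoutMillis) {
        long elapsed = 0;
        for (long flow : flowMillis) {
            elapsed += flow; // the deadline keeps counting across flows
            if (elapsed > logicalTimeoutMillis) {
                return false;
            }
        }
        return true;
    }

    // True if no single flow exceeds the per-flow limit.
    static boolean withinPerFlowTimeout(long[] flowMillis, long socketTimeoutMillis) {
        for (long flow : flowMillis) {
            if (flow > socketTimeoutMillis) {
                return false; // each flow is timed on its own
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] flows = {4_000, 4_000, 4_000}; // three 4-second round trips
        System.out.println(withinPerFlowTimeout(flows, 5_000));
        System.out.println(withinLogicalTimeout(flows, 10_000));
    }
}
```

With three 4-second flows, a 5-second per-flow limit is never violated, yet the request still takes 12 seconds in total. Only the 10-second logical deadline catches that, which is why logical timeouts are the tool for capping how long an application waits.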
9.2.1.1 Configuring application connection and command timeouts for non-Java-based drivers or clients
There are several methods for configuring application connection and command
timeouts for non-Java-based drivers and clients. The recommended method for setting
the logical connection timeout is to use the connectionTimeout parameter in the
db2dsdriver.cfg configuration file. The unit of time for this configuration parameter
is seconds.
The recommended method for setting the command timeout is to use the queryTimeout
and queryTimeoutInterval parameters in the db2dsdriver.cfg file. The unit of time for
these configuration parameters is also seconds. The queryTimeout configuration
parameter specifies the maximum number of seconds that the driver or client waits for
a command to complete before attempting to cancel the execution of the command and
return control to the application. The queryTimeoutInterval configuration parameter
controls how often the driver or client checks whether commands violate the setting of
the queryTimeout configuration parameter. Because a command that exceeds
queryTimeout is detected only at the next check, the actual command timeout falls in
the following range:
queryTimeout to (queryTimeout + queryTimeoutInterval)
If you configure the database-level locktimeout parameter, it should have a higher
value than the total time that you configure for the application command timeout.
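The sample db2dsdriver.cfg file that follows pairs queryTimeout=10 with queryTimeoutInterval=10 and describes a 10 - 20 second command timeout range, so the window runs from queryTimeout up to queryTimeout + queryTimeoutInterval. The sketch below computes that window and checks that a locktimeout value exceeds its upper end. The class name and the 30-second locktimeout are hypothetical; treating locktimeout = -1 (no lock timeout) as always acceptable follows from its "wait indefinitely" meaning.

```java
// Sketch of the effective command-timeout window for the non-Java-based driver,
// assuming the window runs from queryTimeout to queryTimeout + queryTimeoutInterval.
class QueryTimeoutWindow {
    static int minSeconds(int queryTimeout) {
        return queryTimeout;
    }

    static int maxSeconds(int queryTimeout, int queryTimeoutInterval) {
        // A command that overruns queryTimeout is noticed at the next check.
        return queryTimeout + queryTimeoutInterval;
    }

    // locktimeout should exceed the worst-case command timeout;
    // -1 means lock waits never time out, so it always satisfies the rule.
    static boolean lockTimeoutOk(int lockTimeout, int queryTimeout, int queryTimeoutInterval) {
        return lockTimeout == -1 || lockTimeout > maxSeconds(queryTimeout, queryTimeoutInterval);
    }

    public static void main(String[] args) {
        System.out.println(minSeconds(10) + " - " + maxSeconds(10, 10) + " seconds");
        System.out.println(lockTimeoutOk(30, 10, 10)); // 30 is an assumed locktimeout value
    }
}
```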
In the following sample db2dsdriver.cfg file, the application connection timeout is
30 seconds, and the command timeout range is 10 - 20 seconds for all databases:
<configuration>
<dsncollection>
<dsn alias="xibtest" name="testdb" host="coralxib14" port="50002"/>
</dsncollection>
<databases>
<database name="testdb" host="coralxib14" port="50002">
<parameter name="connectionTimeout" value="30"/>
<wlb>
<parameter name="enableWLB" value="true"/>
</wlb>
<acr>
<parameter name="enableACR" value="true"/>
<parameter name="enableSeamlessACR" value="false"/>
</acr>
</database>
</databases>
<parameters>
<parameter name="queryTimeout" value="10"/>
<parameter name="queryTimeoutInterval" value="10"/>
</parameters>
</configuration>
9.2.1.2 Configuring logical connection and command timeouts for the Java-based driver
To set the application connection timeout, set the connectionTimeout property of
the DB2BaseDataSource class. The unit of time for this configuration parameter is
seconds.
To configure the application command timeout, set the data source commandTimeout
property. The unit of time for this configuration parameter is also seconds. If you
configure the database-level locktimeout parameter, it must have a higher value than
that of the commandTimeout property.
In the following sample, which passes DB2BaseDataSource properties to
DriverManager.getConnection, the application connection timeout is 30 seconds and
the command timeout is 20 seconds:
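import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
// ... (inside a method that declares throws java.sql.SQLException)
String url = "jdbc:db2://coralxib14:50002/TESTDB";
Properties properties = new Properties();
properties.setProperty("user", "yourID");
properties.setProperty("password", "yourPassword");
properties.setProperty("enableSysplexWLB", "true");
// Logical connection timeout, in seconds
properties.setProperty("connectionTimeout", "30");
// Command timeout, in seconds
properties.setProperty("commandTimeout", "20");
Connection con = DriverManager.getConnection(url, properties);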
9.2.1.3 Socket-level connection timeouts
IBM Data Server drivers and clients also offer a socket-level connection timeout
configuration parameter. You can set this parameter to customize the responsiveness
of the driver or client when connection attempts are made under the covers, as in
the case of ACR.
9.2.1.3.1 Configuring the socket-level connection timeout for non-Java-based drivers or clients
You can configure the socket-level connection timeout value for a non-Java-based
IBM Data Server driver or client through the tcpipConnectTimeout parameter in the
db2dsdriver.cfg file. The unit of time for this configuration parameter is seconds. If
you configure the connectionTimeout parameter, it must have a higher value than
that of the tcpipConnectTimeout parameter.
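One way to see why connectionTimeout must exceed tcpipConnectTimeout is to count how many unresponsive addresses the driver could try before the logical connection timeout expires. The following is a simplified model, assuming that every attempt against an unreachable address blocks for the full tcpipConnectTimeout and ignoring name resolution and other overhead; the ConnectAttemptBudget class is hypothetical.

```java
// Simplified model: how many unresponsive addresses can be tried, each costing
// up to tcpipConnectTimeout seconds, before connectionTimeout expires.
class ConnectAttemptBudget {
    static int maxUnreachableAttempts(int connectionTimeout, int tcpipConnectTimeout) {
        // A non-positive socket timeout would not bound each attempt at all,
        // and a logical timeout that is not larger leaves no room to move on.
        if (tcpipConnectTimeout <= 0 || connectionTimeout <= tcpipConnectTimeout) {
            throw new IllegalArgumentException(
                    "connectionTimeout must be greater than a positive tcpipConnectTimeout");
        }
        return connectionTimeout / tcpipConnectTimeout;
    }

    public static void main(String[] args) {
        // Values from the sample db2dsdriver.cfg file: 30 and 5 seconds.
        System.out.println(maxUnreachableAttempts(30, 5));
    }
}
```

With the sample values, up to six unresponsive addresses can be tried before the 30-second logical timeout ends the connection request.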
In the following sample db2dsdriver.cfg file, the tcpipConnectTimeout parameter
is set to 5 seconds:
<configuration>
<dsncollection>
<dsn alias="xibtest" name="testdb" host="coralxib14" port="50002"/>
</dsncollection>
<databases>
<database name="testdb" host="coralxib14" port="50002">
<parameter name="connectionTimeout" value="30"/>
<parameter name="tcpipConnectTimeout" value="5"/>
<wlb>
<parameter name="enableWLB" value="true"/>
</wlb>
<acr>
<parameter name="enableACR" value="true"/>
<parameter name="enableSeamlessACR" value="false"/>
</acr>
</database>
</databases>
<parameters>
<parameter name="queryTimeout" value="10"/>
<parameter name="queryTimeoutInterval" value="10"/>
</parameters>
</configuration>
9.2.1.3.2 Configuring the socket-level connection timeout for the Java-based driver
You can configure the socket-level connection timeout value for the Java-based IBM
Data Server driver through the data source loginTimeout property. The unit of time
for this property is seconds. If you configure the connectionTimeout property, it
must have a higher value than that of the loginTimeout property.
In the following sample, which passes DB2BaseDataSource properties to
DriverManager.getConnection, the loginTimeout property is set to 5 seconds:
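import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
// ... (inside a method that declares throws java.sql.SQLException)
String url = "jdbc:db2://coralxib14:50002/TESTDB";
Properties properties = new Properties();
properties.setProperty("user", "yourID");
properties.setProperty("password", "yourPassword");
properties.setProperty("enableSysplexWLB", "true");
properties.setProperty("connectionTimeout", "30");
properties.setProperty("commandTimeout", "20");
// Socket-level connection timeout, in seconds; lower than connectionTimeout
properties.setProperty("loginTimeout", "5");
Connection con = DriverManager.getConnection(url, properties);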
9.2.2 Improving failure-detection times
In certain failure conditions, before it is terminated, a member might not close the
TCP/IP socket that is used by a connection. In these scenarios, the keepalive
mechanism detects the socket failure and termination of the member. However, the
default keepalive timeout value at the operating system level is 2 hours, which is an
extremely long time for an application to wait to act upon a member failure.
As of Version 10.1, you can configure the keepalive timeout, the default being 15
seconds.
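The 2-hour operating system default is only the idle time before the first keepalive probe; unanswered probes add more time before the socket is declared dead. The sketch below estimates that detection time, assuming the Linux kernel defaults of tcp_keepalive_time = 7200 seconds, tcp_keepalive_intvl = 75 seconds, and tcp_keepalive_probes = 9 (other operating systems use different values). How the driver maps its keepalive timeout onto these operating system settings is not described here, so the sketch does not model it.

```java
// Rough worst-case time for TCP keepalive to declare a silent peer dead:
// idle time before the first probe plus the time spent on unanswered probes.
class KeepaliveDetection {
    static long detectionSeconds(long idleSeconds, long probeIntervalSeconds, int probes) {
        return idleSeconds + probeIntervalSeconds * probes;
    }

    public static void main(String[] args) {
        // Assumed Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
        System.out.println(detectionSeconds(7200, 75, 9) + " seconds");
    }
}
```

With those defaults, detection takes 7200 + 9 × 75 = 7875 seconds, a little over 2 hours and 11 minutes.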
Alternatively, you can use socket-level receive timeouts to limit the amount of time
that is spent on the receive operation. This is a more granular alternative to the
application connection timeout that was described in an earlier section. Socket-level
receive timeouts are not as reliable as the keepalive mechanism in determining when
a socket has failed.
9.2.2.1 Configuring the KEEPALIVE timeout for non-Java-based drivers or clients
To set the operating system keepalive parameters for a non-Java IBM Data Server
driver or client, use the DB2 keepAliveTimeout parameter in the db2dsdriver.cfg
file. The unit of time for this configuration parameter is seconds.
In the following sample db2dsdriver.cfg file, the keepAliveTimeout parameter is
set to 10 seconds:
<configuration>
<dsncollection>
<dsn alias="xibtest" name="testdb" host="coralxib14" port="50002"/>
</dsncollection>
<databases>
<database name="testdb" host="coralxib14" port="50002">
<parameter name="connectionTimeout" value="30"/>
<parameter name="tcpipConnectTimeout" value="5"/>
<parameter name="keepAliveTimeout" value="10"/>
<wlb>
<parameter name="enableWLB" value="true"/>
</wlb>
<acr>
<parameter name="enableACR" value="true"/>
<parameter name="enableSeamlessACR" value="false"/>
</acr>
</database>
</databases>
<parameters>
<parameter name="queryTimeout" value="10"/>
<parameter name="queryTimeoutInterval" value="10"/>
</parameters>
</configuration>
9.2.2.2 Configuring the KEEPALIVE timeout for the Java-based driver
The recommended way to configure the operating system keepalive parameters for
the Java-based IBM Data Server driver is to set the data source DB2
keepAliveTimeOut property. The unit of time for this configuration parameter is
seconds. This property is supported only with Version 3.63, 4.13, or later of the IBM
Data Server Driver for JDBC and SQLJ, with IBM Java 6 SR10 or later.
In the following sample, which passes DB2BaseDataSource properties to
DriverManager.getConnection, the keepAliveTimeOut property is set to 10 seconds:
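import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
// ... (inside a method that declares throws java.sql.SQLException)
String url = "jdbc:db2://coralxib14:50002/TESTDB";
Properties properties = new Properties();
properties.setProperty("user", "yourID");
properties.setProperty("password", "yourPassword");
properties.setProperty("enableSysplexWLB", "true");
properties.setProperty("connectionTimeout", "30");
properties.setProperty("commandTimeout", "20");
properties.setProperty("loginTimeout", "5");
// Keepalive timeout, in seconds (Version 3.63, 4.13, or later of the driver)
properties.setProperty("keepAliveTimeOut", "10");
Connection con = DriverManager.getConnection(url, properties);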
9.2.2.3 Configuring the socket-level receive timeout for non-Java-based drivers or clients
You can configure the socket-level receive timeout for a non-Java-based IBM Data
Server driver or client through the receiveTimeout parameter in the
db2dsdriver.cfg file. The unit of time for this configuration parameter is seconds.
If you configure the database-level locktimeout parameter and the total application
command timeout, each must have a higher value than that of the receiveTimeout
parameter.
In the following sample db2dsdriver.cfg file, the receiveTimeout parameter is set
to 5 seconds:
<configuration>
<dsncollection>
<dsn alias="xibtest" name="testdb" host="coralxib14" port="50002"/>
</dsncollection>
<databases>
<database name="testdb" host="coralxib14" port="50002">
<parameter name="connectionTimeout" value="30"/>
<parameter name="tcpipConnectTimeout" value="5"/>
<parameter name="receiveTimeout" value="5"/>
<wlb>
<parameter name="enableWLB" value="true"/>
</wlb>
<acr>
<parameter name="enableACR" value="true"/>
<parameter name="enableSeamlessACR" value="false"/>
</acr>
</database>
</databases>
<parameters>
<parameter name="queryTimeout" value="10"/>
<parameter name="queryTimeoutInterval" value="10"/>
</parameters>
</configuration>
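Because the ordering rules for these timeouts are spread across several subsections, a small checker can catch a misordered file before deployment. The following sketch parses db2dsdriver.cfg content with the JDK's DOM parser and tests two of the stated rules: connectionTimeout above tcpipConnectTimeout, and receiveTimeout below queryTimeout (the lower end of the command-timeout window). The DsdriverTimeoutCheck class is hypothetical. It flattens database-level and global parameters into one map and does not check locktimeout, which is a server-side database parameter.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker for timeout orderings in db2dsdriver.cfg content.
// Database-level and global parameters are flattened into one map, so a name
// that appears in both places keeps whichever occurrence is read last.
class DsdriverTimeoutCheck {
    static Map<String, Integer> readTimeouts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Integer> values = new HashMap<>();
            NodeList params = doc.getElementsByTagName("parameter");
            for (int i = 0; i < params.getLength(); i++) {
                Element e = (Element) params.item(i);
                String name = e.getAttribute("name");
                // Keep only timeout-related parameters (values are in seconds).
                if (name.toLowerCase(Locale.ROOT).contains("timeout")) {
                    values.put(name, Integer.parseInt(e.getAttribute("value").trim()));
                }
            }
            return values;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot read db2dsdriver.cfg content", ex);
        }
    }

    static List<String> check(Map<String, Integer> t) {
        List<String> problems = new ArrayList<>();
        Integer conn = t.get("connectionTimeout");
        Integer tcp = t.get("tcpipConnectTimeout");
        Integer recv = t.get("receiveTimeout");
        Integer query = t.get("queryTimeout");
        if (conn != null && tcp != null && conn <= tcp) {
            problems.add("connectionTimeout must exceed tcpipConnectTimeout");
        }
        if (recv != null && query != null && query <= recv) {
            problems.add("queryTimeout must exceed receiveTimeout");
        }
        return problems;
    }

    public static void main(String[] args) {
        String xml = "<configuration><databases><database name=\"testdb\">"
                + "<parameter name=\"connectionTimeout\" value=\"30\"/>"
                + "<parameter name=\"tcpipConnectTimeout\" value=\"5\"/>"
                + "<parameter name=\"receiveTimeout\" value=\"5\"/>"
                + "</database></databases><parameters>"
                + "<parameter name=\"queryTimeout\" value=\"10\"/>"
                + "<parameter name=\"queryTimeoutInterval\" value=\"10\"/>"
                + "</parameters></configuration>";
        System.out.println(check(readTimeouts(xml)));
    }
}
```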
9.2.2.4 Configuring the socket-level receive timeout for the Java-based driver
You can set the socket-level receive timeout for the Java-based IBM Data Server
driver through the data source blockingReadConnectionTimeout property. The unit
of time for this configuration parameter is seconds.
If you configure the database-level locktimeout parameter and the commandTimeout
property, each must have a higher value than that of the
blockingReadConnectionTimeout property.
In the following sample, which passes DB2BaseDataSource properties to
DriverManager.getConnection, the blockingReadConnectionTimeout property is set to
5 seconds:
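import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
// ... (inside a method that declares throws java.sql.SQLException)
String url = "jdbc:db2://coralxib14:50002/TESTDB";
Properties properties = new Properties();
properties.setProperty("user", "yourID");
properties.setProperty("password", "yourPassword");
properties.setProperty("enableSysplexWLB", "true");
properties.setProperty("connectionTimeout", "30");
properties.setProperty("commandTimeout", "20");
properties.setProperty("loginTimeout", "5");
// Socket-level receive timeout, in seconds; lower than commandTimeout
properties.setProperty("blockingReadConnectionTimeout", "5");
Connection con = DriverManager.getConnection(url, properties);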
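The Java-based driver has its own set of orderings: connectionTimeout above loginTimeout, and both commandTimeout and the database-level locktimeout above blockingReadConnectionTimeout, with locktimeout also above commandTimeout. The following sketch checks a Properties object like the one in the sample against those rules. The JdbcTimeoutCheck class is hypothetical, and locktimeout is passed in separately because it is a server-side parameter; a value of -1 (no lock timeout) or null skips the locktimeout checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical checker for the timeout orderings stated for the Java-based driver.
class JdbcTimeoutCheck {
    private static Integer seconds(Properties p, String key) {
        String v = p.getProperty(key);
        return v == null ? null : Integer.valueOf(v.trim());
    }

    static List<String> check(Properties p, Integer lockTimeout) {
        List<String> problems = new ArrayList<>();
        Integer conn = seconds(p, "connectionTimeout");
        Integer login = seconds(p, "loginTimeout");
        Integer cmd = seconds(p, "commandTimeout");
        Integer read = seconds(p, "blockingReadConnectionTimeout");
        if (conn != null && login != null && conn <= login) {
            problems.add("connectionTimeout must exceed loginTimeout");
        }
        if (cmd != null && read != null && cmd <= read) {
            problems.add("commandTimeout must exceed blockingReadConnectionTimeout");
        }
        // locktimeout is server-side; -1 means lock waits never time out.
        if (lockTimeout != null && lockTimeout != -1) {
            if (read != null && lockTimeout <= read) {
                problems.add("locktimeout must exceed blockingReadConnectionTimeout");
            }
            if (cmd != null && lockTimeout <= cmd) {
                problems.add("locktimeout must exceed commandTimeout");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("connectionTimeout", "30");
        p.setProperty("commandTimeout", "20");
        p.setProperty("loginTimeout", "5");
        p.setProperty("blockingReadConnectionTimeout", "5");
        System.out.println(check(p, 60)); // 60 is an assumed locktimeout value
    }
}
```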
9.2.3 Configuring ACR behavior
You can configure the following aspects of ACR behavior:
• The number of times that the driver or client traverses the server list to try
each member address
• The amount of time that the driver or client waits between server list passes
• The connection timeouts and command or query timeouts, as mentioned
earlier
By default, with non-affinitized ACR, the driver or client attempts to recover from a
failure for up to 10 minutes (or up to 2 minutes if you defined one or more alternate
groups) by re-establishing a connection to the database at any member. With
affinitized ACR, the driver or client attempts to recover from a failure by trying each
member up to three times. For a non-Java-based IBM Data Server driver or client,
you can use the maxAcrRetries and acrRetryInterval parameters in the ACR
section of the db2dsdriver.cfg file to override the default ACR behavior. The IBM
Data Server Driver for JDBC and SQLJ uses the equivalent
maxRetriesForClientReroute and retryIntervalForClientReroute properties.
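To reason about how these settings combine, the following sketch gives a rough upper bound on how long ACR spends reconnecting when no member responds. It assumes that the retry count is the number of full passes over the server list, that each pass tries every member once, that each attempt waits the full socket-level connection timeout, and that the retry interval is a wait between passes but not after the last one. These semantics, the AcrRecoveryEstimate class, and the sample values are assumptions for illustration, not documented driver behavior.

```java
// Rough upper bound on ACR reconnection time when every member is unreachable,
// under the simplified assumptions described in the text.
class AcrRecoveryEstimate {
    static long worstCaseSeconds(int passes, int members,
                                 int connectTimeoutSeconds, int retryIntervalSeconds) {
        if (passes < 1) {
            return 0;
        }
        long tryingTime = (long) passes * members * connectTimeoutSeconds;
        long waitingTime = (long) (passes - 1) * retryIntervalSeconds;
        return tryingTime + waitingTime;
    }

    public static void main(String[] args) {
        // 3 members, tcpipConnectTimeout=5, assumed maxAcrRetries=2, acrRetryInterval=10.
        System.out.println(worstCaseSeconds(2, 3, 5, 10) + " seconds");
    }
}
```

With these assumed values, the estimate is 2 × 3 × 5 + 1 × 10 = 40 seconds, which shows how much a shorter socket-level connection timeout shrinks the window in which the application appears unresponsive.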
10 Application considerations
To take advantage of transaction-level WLB and seamless ACR features, applications
must ensure they do not use resources that bind them to a specific member for the
lifetime of their connections. The full list of resources that can bind an application to
a member is in the “Reuse permission” section. If you write applications in a
particular manner, some of the reuse permission restrictions do not apply. The
following sections describe when and how to lift these restrictions.
10.1 Dynamic SQL statements in embedded SQL applications
By default, applications that you write in embedded SQL and that prepare and
execute dynamic SQL statements cannot use transaction-level WLB and seamless
ACR.
When a dynamic SQL statement is prepared for the first time on a DB2 for Linux,
UNIX, and Windows server, the statement text and an executable version of the
statement (a section) are stored in the database’s package cache. This information
stays in the package cache until it is invalidated, the cache space is required for
another statement, or the database is shut down. Applications using the same
compilation environment and same statement text as the dynamic statement can use
the dynamic SQL statements in the package cache. A reference to the section for the
prepared statement is kept in the application's SQL context until you terminate the
application. The application can execute the SQL statement in multiple transactions
without re-preparing it before each execution.
In a DB2 pureScale environment, the package cache on each member contains
sections for SQL statements that were prepared only on that member. An application
can execute a dynamic SQL statement only on the member where the statement was
prepared. By default, this behavior is incompatible with transaction-level WLB
because transactions from a single application executing remotely can be routed by a
driver or client to different members for different transactions. For the same reason,
the package cache behavior is incompatible with seamless ACR.
An application that exploits the existing package cache behavior in a DB2 pureScale
environment falls into one of the following categories:
• If the driver or client, depending on the API that is used, drives re-prepares in
a new transaction under the covers without the application's knowledge, the
existing remote application can work unchanged with transaction-level WLB and
seamless ACR.
If your ODBC, CLI, .NET, or Java application uses an IBM Data Server driver
or client to prepare dynamic SQL statements, your application falls into this
category.
• If the driver, client, or API does not drive re-prepares in a new transaction
under the covers, neither transaction-level WLB nor seamless ACR is allowed.
If your application is not an ODBC, CLI, .NET, or Java application or it does
not use an IBM Data Server driver or client to prepare dynamic SQL
statements by default, your application falls into this category. Applications
that use embedded SQL also fall into this category by default. The command
line processor (CLP) utility is an example of such an application. However,
you can code your application so that it does not use the package cache
across transactions, that is, the application always re-prepares the statement
in a new transaction. In that case, binding, rebinding, or altering your
package with the KEEPDYNAMIC(NO) bind option lifts the transaction-level WLB
and seamless ACR restrictions.
The KEEPDYNAMIC bind option allows applications to specify whether the package
cache will be used across transactions, which affects whether transaction-level WLB
and seamless ACR will be allowed:
• KEEPDYNAMIC(YES) corresponds to the current package cache behavior, where
an application has a reference to the executable section for a prepared
statement in its SQL context for the life of the application. This is the default
behavior. With this option, transaction-level WLB is disabled for an embedded
SQL application after you use dynamic SQL statements, and seamless ACR is
not allowed.
• KEEPDYNAMIC(NO) dictates that references to the section for a prepared
statement are removed from the SQL context at transaction boundaries. The
application must re-prepare any dynamic SQL statement that it wants to
reuse in a new transaction. This option does not disable transaction-level WLB
upon the use of dynamic SQL statements for an embedded SQL application,
and seamless ACR is allowed. Ensure proper application behavior before using
the NO value.
10.2 Sequences and the PREVIOUS VALUE expression
The PREVIOUS VALUE of a sequence for an application is primed by executing a
statement that includes the NEXT VALUE expression. The value that the NEXT VALUE
expression generates persists beyond a transaction but is cached only on the
member where the NEXT VALUE expression was executed.
By default, the server prevents applications that execute statements that reference
the PREVIOUS VALUE of a sequence from taking advantage of transaction-level WLB
and seamless ACR. If the server allowed the application to perform transaction-level
WLB or seamless ACR, the application could be moved to a different member, where the
PREVIOUS VALUE expression might return a wrong result or the SQL0845N error.
However, the PREVIOUS VALUE expression can be safely used with transaction-level
WLB and seamless ACR if the application guarantees that the NEXT VALUE
expression is executed in a transaction before a PREVIOUS VALUE expression. This
ensures that the PREVIOUS VALUE expression returns the last value that the NEXT
VALUE expression generated. If all applications use this safe model of sequence
execution, you can instruct the server to allow transaction-level WLB and seamless
ACR by setting the DB2_ALLOW_WLB_WITH_SEQUENCES registry variable as follows:
db2set DB2_ALLOW_WLB_WITH_SEQUENCES=YES
This registry variable takes effect when a database is activated on a member. If any
application that references sequences does not use the safe model of execution while
the YES setting for this registry variable is in effect, the SQL0845N error is returned
if that application references the PREVIOUS VALUE expression in a transaction before
referencing the NEXT VALUE expression.
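Before setting DB2_ALLOW_WLB_WITH_SEQUENCES=YES, you might want to audit the statements that each transaction issues against the safe model. The following is a minimal sketch of such an audit over the SQL text of one transaction. The SequenceUsageCheck class is hypothetical. It recognizes only the spelled-out NEXT VALUE FOR and PREVIOUS VALUE FOR forms (not synonyms or quoted identifiers), and it conservatively treats a PREVIOUS VALUE that appears in the same statement as its first NEXT VALUE as unsafe.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical audit: within one transaction, every sequence read with
// PREVIOUS VALUE must first have been advanced with NEXT VALUE.
class SequenceUsageCheck {
    private static final Pattern NEXT = Pattern.compile("NEXT\\s+VALUE\\s+FOR\\s+([\\w.]+)");
    private static final Pattern PREV = Pattern.compile("PREVIOUS\\s+VALUE\\s+FOR\\s+([\\w.]+)");

    static boolean isSafe(List<String> transactionStatements) {
        Set<String> primed = new HashSet<>();
        for (String sql : transactionStatements) {
            String upper = sql.toUpperCase(Locale.ROOT);
            Matcher prev = PREV.matcher(upper);
            while (prev.find()) {
                if (!primed.contains(prev.group(1))) {
                    return false; // PREVIOUS VALUE before any NEXT VALUE in this transaction
                }
            }
            Matcher next = NEXT.matcher(upper);
            while (next.find()) {
                primed.add(next.group(1));
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSafe(Arrays.asList(
                "VALUES NEXT VALUE FOR myseq",
                "INSERT INTO t VALUES (PREVIOUS VALUE FOR myseq)")));
    }
}
```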
11 Monitoring
A common task is determining whether transaction-level WLB is working. First,
determine the transaction throughput on each member for the workload. Next,
examine the server list to determine whether the workload distribution matches the
member priorities. A member's priority is calculated based on several factors,
including CPU load and memory load. Finally, examine system metrics to determine
whether they are consistent with the member priorities in the server list.
11.1 Monitoring transaction throughput on a member
To determine whether workload is distributed properly among members, you must
examine the transaction throughput on each member.
The MON_GET_UNIT_OF_WORK table function can provide a rough estimate of the
number of active transactions on each member. Each row of the
MON_GET_UNIT_OF_WORK function output represents one transaction on one member.
The UOW_START_TIME column is non-null for any active transaction. You can use this
metric to roughly estimate the number of active transactions on a member at a
single time.
When using the data that the MON_GET_UNIT_OF_WORK table function provides, keep
in mind that applications that are not using WLB do not respect server list priorities.
Therefore, if such workload is being executed concurrently with applications that do
use WLB, the number of active transactions might not correlate to the relative
member priorities.
For additional ways of determining the transaction throughput, see the Appendix.
11.1.1 Sample MON_GET_UNIT_OF_WORK function output
The following sample query is issued on and collects data on all members of a DB2
pureScale cluster. The query returns the number of active transactions on each
member at the time of collection. Each row contains the name of the member where
the data was collected and the number of active transactions on that member.
db2 "select uow.member, count(uow.member) from table(
mon_get_unit_of_work( NULL, -2 ) ) as uow where
uow.uow_start_time is not null group by uow.member"
MEMBER 2
------ -----------
0 43
1 26
2 28
3 record(s) selected.
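To compare this output with member priorities, it helps to convert raw counts into each member's share of the active transactions. The following sketch does that for counts keyed by member number, such as the three rows shown above. The TransactionShare class is hypothetical and works on counts that you have already fetched; it does not query the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts active-transaction counts per member
// into each member's fraction of the total.
class TransactionShare {
    static Map<Integer, Double> shares(Map<Integer, Integer> activeByMember) {
        int total = 0;
        for (int count : activeByMember.values()) {
            total += count;
        }
        Map<Integer, Double> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : activeByMember.entrySet()) {
            result.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> sample = new LinkedHashMap<>();
        sample.put(0, 43); // counts from the sample output
        sample.put(1, 26);
        sample.put(2, 28);
        shares(sample).forEach((m, s) -> System.out.printf("member %d: %.1f%%%n", m, 100 * s));
    }
}
```

For the sample counts, member 0 carries about 44.3% of the active transactions, member 1 about 26.8%, and member 2 about 28.9%.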
11.2 Monitoring the server list for a specific database
A DB2 member maintains a server list for each active database for which a TCP/IP
connection was established.
The MON_GET_SERVERLIST table function, which was introduced in DB2 Version 9.8
Fix Pack 4, returns the server list for the current database as maintained by a
specific member or all server lists as maintained by all members in a DB2 pureScale
cluster. Each row that the MON_GET_SERVERLIST table function returns provides the
priority and connectivity information for a DB2 member in a server list as maintained
by the DB2 member that is identified by the MEMBER column. The MEMBER column does
not identify the member whose priority and connectivity information is being
provided. The row also includes the last time that the server list was refreshed.
The db2pd -serverlist command returns the same information as the
MON_GET_SERVERLIST table function. The db2pd command can target a single active
database or all active databases. By default, the command returns data for the local
member. However, you can direct it to return data from a specific member by using
the -member parameter.
11.2.1 Sample MON_GET_SERVERLIST function output
The following sample query is issued on member 0 in a DB2 pureScale cluster with
three members. The query requests the server list as maintained on the current
member only. Three rows are returned. Each row contains the priority, host name,
and port for a DB2 member, the member where the server list is maintained, and the
last time that the server list was refreshed on that member.
db2 "select member, cached_timestamp, substr(hostname, 1, 30) as
hostname, port_number, priority from table( mon_get_serverlist( -1 ) ) as
server_list"
MEMBER CACHED_TIMESTAMP HOSTNAME PORT_NUMBER PRIORITY
------ -------------------------- ------------------------------ ----------- --------
0 2012-07-05-20.52.56.000000 coralxib14.torolab.ibm.com 50002 100
0 2012-07-05-20.52.56.000000 coralxib16.torolab.ibm.com 50002 81
0 2012-07-05-20.52.56.000000 coralxib15.torolab.ibm.com 50002 80
3 record(s) selected.
The following sample query is issued on member 0 in a DB2 pureScale cluster with
three members. The query requests server lists as maintained on all members. Nine
rows are returned. Each row contains the priority, host name, and port for a DB2
member. Each row also identifies the member where the server list is maintained
and the last time that the server list was refreshed on that member.
db2 "select member, cached_timestamp, substr(hostname, 1, 30) as hostname,
port_number, priority from table( mon_get_serverlist( -2 ) ) as server_list"
MEMBER CACHED_TIMESTAMP HOSTNAME PORT_NUMBER PRIORITY
------ -------------------------- ------------------------------ ----------- --------
2 2012-07-05-20.52.53.000000 coralxib16.torolab.ibm.com 50002 80
2 2012-07-05-20.52.53.000000 coralxib14.torolab.ibm.com 50002 100
2 2012-07-05-20.52.53.000000 coralxib15.torolab.ibm.com 50002 79
1 2012-07-05-20.52.42.000000 coralxib15.torolab.ibm.com 50002 79
1 2012-07-05-20.52.42.000000 coralxib14.torolab.ibm.com 50002 100
1 2012-07-05-20.52.42.000000 coralxib16.torolab.ibm.com 50002 83
0 2012-07-05-20.52.56.000000 coralxib14.torolab.ibm.com 50002 100
0 2012-07-05-20.52.56.000000 coralxib16.torolab.ibm.com 50002 81
0 2012-07-05-20.52.56.000000 coralxib15.torolab.ibm.com 50002 80
9 record(s) selected.
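Because MEMBER identifies the reporting member rather than the listed host, a useful check on this output is whether the members agree about each host's priority. The following sketch groups rows by host name and reports the spread (maximum minus minimum) of each host's priority across the reporting members. The ServerListViews class and its Row type are hypothetical; the rows would come from a MON_GET_SERVERLIST query such as the one above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: for each listed host, how much the priority differs
// between the server lists maintained by different reporting members.
class ServerListViews {
    // One MON_GET_SERVERLIST row: the reporting member (MEMBER column),
    // a listed host, and that host's priority in the reporting member's list.
    static final class Row {
        final int reportingMember;
        final String hostname;
        final int priority;

        Row(int reportingMember, String hostname, int priority) {
            this.reportingMember = reportingMember;
            this.hostname = hostname;
            this.priority = priority;
        }
    }

    static Map<String, Integer> prioritySpread(Iterable<Row> rows) {
        Map<String, int[]> minMax = new HashMap<>();
        for (Row r : rows) {
            int[] mm = minMax.computeIfAbsent(r.hostname,
                    h -> new int[] {Integer.MAX_VALUE, Integer.MIN_VALUE});
            mm[0] = Math.min(mm[0], r.priority);
            mm[1] = Math.max(mm[1], r.priority);
        }
        Map<String, Integer> spread = new TreeMap<>();
        minMax.forEach((host, mm) -> spread.put(host, mm[1] - mm[0]));
        return spread;
    }
}
```

Applied to the nine sample rows, the spread is 0 for coralxib14, 1 for coralxib15, and 3 for coralxib16. Small differences like these are consistent with server lists that are refreshed at slightly different times on each member.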
11.2.2 Sample db2pd -serverlist command output
The following command is issued on member 0. The command targets the database
TESTDB. The output includes the following information:
• A header indicating the server local time that the server list was last
refreshed on member 0
• The name of the database that the server list data was collected from
• The number of server list entries
Each server list entry is displayed in a row. The row consists of the host name,
non-SSL and SSL ports, and the priority for the member.
db2pd -serverlist -db testdb
Database Member 0 -- Active -- Up 0 days 02:20:16 -- Date 2012-07-05-21.11.49.631511
Server List:
Time: Thu Jul 5 21:11:16
Database Name: TESTDB
Count: 3
Hostname Non-SSL Port SSL Port Priority
coralxib14.torolab.ibm.com 50002 0 100
coralxib16.torolab.ibm.com 50002 0 78
coralxib15.torolab.ibm.com 50002 0 70
The following sample command is issued on member 0. The command targets the
database TESTDB on member 1. The format of the output is the same as in the last
example.
db2pd -serverlist -db testdb -member 1
Database Member 1 -- Active -- Up 0 days 02:26:07 -- Date 2012-07-05-21.17.40.064578
Server List:
Time: Thu Jul 5 21:17:32
Database Name: TESTDB
Count: 3
Hostname Non-SSL Port SSL Port Priority
coralxib15.torolab.ibm.com 50002 0 77
coralxib14.torolab.ibm.com 50002 0 100
coralxib16.torolab.ibm.com 50002 0 85
11.3 Monitoring CPU and memory load
Member priorities as reported by the MON_GET_SERVERLIST table function are based
primarily on the short-time (1-minute) CPU load average and the memory swap rate.
Examining the CPU load average and memory swap rate of a specific member can
give insight into how the DB2 pureScale server assigns member priorities. These
statistics also provide an avenue of investigation if the priority for a member is not
as expected.
The ENV_GET_SYSTEM_RESOURCES table function provides an SQL interface for
identifying the short-time CPU load average and number of pages that were swapped
in and out of memory. These values are also available through operating-system-
specific commands such as uptime and vmstat. Each row that the
ENV_GET_SYSTEM_RESOURCES function returns contains data that it collected on a
specific DB2 member.
11.3.1 Sample ENV_GET_SYSTEM_RESOURCES function output
The following sample query is issued on member 0. Each row shows the member that
the data was collected on, together with the short-time CPU load average and the
number of pages that were swapped in and swapped out for that member:
db2 "select member, cpu_load_short, swap_pages_in, swap_pages_out from table(
env_get_system_resources() ) as system_resources"
MEMBER CPU_LOAD_SHORT SWAP_PAGES_IN SWAP_PAGES_OUT
------ ------------------------ -------------------- --------------------
1 +2.80000000000000E+000 0 86
2 +2.89000000000000E+000 0 0
0 +1.98000000000000E+000 0 20
12 Troubleshooting
For the DB2 pureScale Feature, the most commonly reported issues regarding WLB
and ACR are as follows. The steps to troubleshoot them are provided.
12.1 Transaction-level WLB does not appear to be enabled even though it is configured at the driver or client
To diagnose this issue for the non-Java-based IBM Data Server driver and client:
1. Verify the configuration in the db2dsdriver.cfg file on the host where the
driver or client is located:
• Ensure that you properly configured the transaction-level WLB settings
in the WLB section of a database entry in the db2dsdriver.cfg
configuration file.
• Ensure that the remote driver or client can access the database entry
in the db2dsdriver.cfg configuration file. The database name and the
connectivity information (host name and port values) in the database
entry must match what is provided to the application for connecting to
the database. The connectivity information that you provide to the
application can be in the form of a connection string, a DSN, or system
database and node directory entries. If the application uses a DSN,
ensure that you configured a matching DSN entry in the
db2dsdriver.cfg configuration file and that it maps correctly to the
relevant database entry in that file.
2. Capture diagnostics:
a. Specify the appropriate mechanism for collecting data.
o For a DB2 Version 9.7 driver or client, set the database manager
configuration diaglevel parameter to 4.
o For a DB2 Version 10.1 driver or client, set the DB2_SRVLSTLOG_LEVEL
DB2 registry variable to 3.
b. Establish a connection to the database at a DB2 pureScale instance.
c. Search for the sqljrParseSrvlst entry in one of the following
locations:
o For Version 9.7, in the db2diag.log file as located in your
DIAGPATH
o For Version 10.1, in the db2srvlst.0.log file as located in your
DIAGPATH
If the sqljrParseSrvlst entry is missing from the log file, the server
did not return the server list. Verify that you properly configured the
DB2 server as a DB2 pureScale instance.
If the sqljrParseSrvlst entry contains the line Enable Transport
Pooling: with a value of true, transaction-level WLB is enabled.
If the sqljrParseSrvlst entry contains the line Enable Transport
Pooling: but the value is false, transaction-level WLB is not enabled.
Either the db2dsdriver.cfg configuration file is not being accessed, or transaction-level WLB is not properly configured in the
db2dsdriver.cfg file.
o For a DB2 Version 9.7 driver or client, all required diagnostics
were collected from the previous connection operation.
o For a DB2 Version 10.1 driver or client, additional information is
required. Set the diaglevel database manager configuration
parameter to 4 and connect again.
3. Look for an entry in the db2diag.log file in your DIAGPATH from the
rccConfig::getInstance function, which dumps the contents of the
db2dsdriver.cfg file as read by the driver or client. If there is no such entry,
the db2dsdriver.cfg file was not accessed. Verify the correct placement of
this file.
If you can locate the entry from the rccConfig::getInstance function in the
db2diag.log file, but transaction-level WLB is not enabled, the problem is in
its configuration in the db2dsdriver.cfg file. Ensure that you properly
configured the relevant database entry and its matching DSN entry, if
applicable.
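The log checks in step 2 can be scripted when you have many client hosts to examine. The following is a minimal sketch, not an IBM tool: the class SrvlstLogCheck is hypothetical, and it assumes that the value of the "Enable Transport Pooling:" line appears after the label on the same line. The actual layout of db2diag.log and db2srvlst.0.log entries might differ, so treat an UNKNOWN result as a prompt to read the entry yourself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not an IBM tool): classifies the text of db2diag.log
// (Version 9.7) or db2srvlst.0.log (Version 10.1) by using the sqljrParseSrvlst
// entry and its "Enable Transport Pooling:" line. Assumes the value follows
// the label on the same line; the real log layout might differ.
final class SrvlstLogCheck {

    enum Result { NO_SERVER_LIST, WLB_ENABLED, WLB_NOT_ENABLED, UNKNOWN }

    private static final String LABEL = "Enable Transport Pooling:";

    static Result classify(List<String> lines) {
        boolean sawServerList = false;
        Result result = Result.UNKNOWN;
        for (String line : lines) {
            if (line.contains("sqljrParseSrvlst")) {
                sawServerList = true;
            }
            int idx = line.indexOf(LABEL);
            if (sawServerList && idx >= 0) {
                // Later entries overwrite earlier ones, so the most recent connection wins.
                String value = line.substring(idx + LABEL.length()).trim().toLowerCase(Locale.ROOT);
                if (value.startsWith("true")) {
                    result = Result.WLB_ENABLED;
                } else if (value.startsWith("false")) {
                    result = Result.WLB_NOT_ENABLED;
                }
            }
        }
        return sawServerList ? result : Result.NO_SERVER_LIST;
    }

    // ISO-8859-1 decodes any byte sequence, so unexpected characters cannot make the read fail.
    static Result classify(Path log) throws IOException {
        return classify(Files.readAllLines(log, StandardCharsets.ISO_8859_1));
    }
}
```

NO_SERVER_LIST corresponds to verifying that the server is a DB2 pureScale instance; WLB_NOT_ENABLED corresponds to checking, as in step 3, whether the db2dsdriver.cfg file is accessed and correctly configured.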
To diagnose this issue for the Java-based IBM Data Server driver, monitor
transaction-level WLB activity by using the techniques in the following topics in the
DB2 Information Center:
• “Techniques for monitoring IBM Data Server Driver for JDBC and SQLJ
Sysplex support”
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.apd
v.java.doc/src/tpc/imjcc_c0020930.html)
• “DB2PoolMonitor class”
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.apd
v.java.doc/src/tpc/imjcc_r0052956.html)
For an example of how you can use DB2PoolMonitor class methods, see
Appendix B.
If IBM Service requests a driver trace, follow the instructions in “Problem diagnosis with the IBM Data Server Driver for JDBC and SQLJ” (http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_cjvjcdig.html).
12.2 With transaction-level WLB enabled, DB2 workload appears stalled on one or more members with no discernible balancing
If you verified that transaction-level WLB is properly enabled, check whether your
DB2 workload is affected by one or more server restrictions that prevent
transaction-level work from being transferred from one member to another. See the
“Application considerations” section.
12.3 With transaction-level WLB enabled, DB2 workload is executing on multiple members, but it does not appear balanced
As described in the “Expectations of workload balancing” section, perfect balancing
might not always be achievable. To determine whether the balancing is working as
designed:
1. Determine the transaction throughput on each member.
2. Determine whether the driver or client is distributing work according to the
member priorities in the server list, whatever their values. For details on
collecting this information, see the “Monitoring” section.
Each member caches its own copy of the server list for a particular database.
The member priorities that one member maintains might not be identical to those
on another member, because the server lists might have been refreshed at
different times. Therefore, compare the DB2 workload distribution against the
member priorities in the most recently refreshed server list across all
members.
As shown in the example in the “Member priority” section, DB2 workload
distribution is expected to be consistent with member priorities. In that example,
member 0 was assigned priority 10, member 1 was assigned priority 100, and
member 2 was assigned priority 55. Therefore, member 1 was given the most work,
member 0 was given 10% of the work of member 1, and member 2 was given 55% of
the work of member 1. If you find a discrepancy between the DB2 workload
distribution as driven by the remote driver or client and the member priorities, it
might point to a problem with the remote driver or client. For example, in that
scenario, member 0 should not be given the most work, and member 1 should not
be given the least work.
3. If DB2 workload distribution is consistent with member priorities but the validity
of the member priorities is suspect, determine whether there is a problem with
the DB2 server. For details on what metrics constitute a member priority and how
to collect those metrics, see the “Monitoring CPU and memory load” section.
Memory paging rate is usually not an issue, so the focus is typically on the 1-
minute CPU load average. If member priorities are out of sync with these two
metrics, it might point to a problem with the DB2 pureScale server.
4. If member priorities accurately reflect the CPU and memory load on each member
but that load itself is unexpected, ask your system administrator about any work
unrelated to DB2 that is executing on each member host.
Before restarting your DB2 workload, check the load on each member host. If the
load is already quite uneven before you start your DB2 workload, there is no
guarantee that the workload will become evenly distributed. If the DB2 workload
is heavy enough to overcome the original unevenness in load among members, you
should see an even distribution of the DB2 workload among members. Otherwise,
because member priorities reflect the total CPU load on each host, more DB2 work
is routed to the less-loaded members, and the DB2 workload is distributed
unevenly among members.
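The comparison in step 2 is proportional: in the “Member priority” example, member 0 receives 10% and member 2 receives 55% of the work of member 1, which is the same as saying that each member's expected share is its priority divided by the sum of all priorities. For priorities 10, 100, and 55, the sum is 165, so the expected shares are about 6.1%, 60.6%, and 33.3%. The following minimal sketch computes those shares and the largest gap between expected and observed shares; the class PriorityShares is hypothetical, and the observed transaction counts are assumed to come from your own monitoring.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assumes that each member should receive work in proportion
// to its priority, so expected share = priority / (sum of all priorities).
final class PriorityShares {

    static Map<Integer, Double> expectedShares(Map<Integer, Integer> priorities) {
        double total = 0;
        for (int p : priorities.values()) {
            total += p;
        }
        Map<Integer, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : priorities.entrySet()) {
            // All-zero priorities (every member inaccessible) yield zero shares.
            shares.put(e.getKey(), total == 0 ? 0.0 : e.getValue() / total);
        }
        return shares;
    }

    // Largest absolute gap between a member's observed share of transactions
    // and its expected share; 0.0 means the distribution matches exactly.
    static double maxDeviation(Map<Integer, Integer> priorities, Map<Integer, Long> transactions) {
        double totalTx = 0;
        for (long t : transactions.values()) {
            totalTx += t;
        }
        double worst = 0;
        for (Map.Entry<Integer, Double> e : expectedShares(priorities).entrySet()) {
            long tx = transactions.getOrDefault(e.getKey(), 0L);
            double observed = totalTx == 0 ? 0.0 : tx / totalTx;
            worst = Math.max(worst, Math.abs(observed - e.getValue()));
        }
        return worst;
    }
}
```

How large a deviation counts as a discrepancy is a judgment call, because perfect balancing is not always achievable. A member with a high priority that consistently receives the least work is the kind of inversion that points to the driver or client.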
12.4 Server list reports unusual member priority values with transaction-level WLB enabled
You might find unusual member priority values, such as the ones described in the
following sections.
12.4.1 Priority value is zero
A priority of zero is assigned to a member that is inaccessible to a remote driver or
client. Typically, the member is down. It is also possible that the member is up but
that its TCP/IP or SSL listener is down because the listener port is owned by some
other process. If you believe that the member that is assigned a priority of zero
should be fully accessible to a remote driver or client, try manually connecting to the
database at that member over TCP/IP or SSL. If one or more members continue to
report a zero priority when they should not, ensure that APAR IC84020 is applied by
upgrading your DB2 pureScale server to the latest fix pack.
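Before connecting with the full driver or client, you can quickly check whether anything is listening on the member's TCP/IP port. The following is a minimal sketch that uses only java.net; the class ListenerProbe is hypothetical. A successful probe shows only that some process accepts connections on the port, not that the process is the DB2 listener (which is exactly the case where another process owns the port), so a manual database connection remains the definitive test.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP connection to host:port is accepted
// within timeoutMillis. It does not verify that the listener is DB2.
final class ListenerProbe {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, connect timeout, or unresolvable host name.
            return false;
        }
    }
}
```

For example, for the first member in the sample server list, you might call ListenerProbe.isListening("coralxib15.torolab.ibm.com", 50002, 5000) from the client host.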
12.4.2 Server lists as maintained by different members report different priority values for the same member
As mentioned previously, each member maintains its own server list. A member
refreshes its server list, as required, when a remote driver or client requests that
a server list be returned; the timestamp that accompanies each server list shows
when it was last refreshed. You cannot determine or control which member will be
asked to return a server list. Therefore, even when transaction-level WLB is working
properly, the server list on a particular member might be refreshed less often than
the lists on other members, and the member priorities that it shows can be out of
date. When that member next receives a request for a server list, it refreshes the
list and returns it.
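Because the per-member lists can differ, the priorities worth comparing against are those in the list with the latest timestamp. The following minimal sketch picks that list; the class ServerListSnapshot is a hypothetical model that you would populate yourself, for example from MON_GET_SERVERLIST output collected on each member (the mapping of columns to fields is not shown and is an assumption).

```java
import java.time.Instant;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical model of a server list as reported by one member: when it was
// last refreshed and the priority it assigns to each member host name.
final class ServerListSnapshot {
    final int reportingMember;
    final Instant refreshed;
    final Map<String, Integer> priorities;

    ServerListSnapshot(int reportingMember, Instant refreshed, Map<String, Integer> priorities) {
        this.reportingMember = reportingMember;
        this.refreshed = refreshed;
        this.priorities = priorities;
    }

    // Returns the most recently refreshed snapshot. Throws NoSuchElementException
    // if the list is empty.
    static ServerListSnapshot mostRecent(List<ServerListSnapshot> snapshots) {
        return Collections.max(snapshots, Comparator.comparing((ServerListSnapshot s) -> s.refreshed));
    }
}
```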
12.4.3 Member reports a server list as empty or nonexistent
An empty server list as reported by a member indicates that, even though the
database is active, it has not been accessed by a remote driver or client.
Alternatively, the member might report the database as inactive, in which case no
server list exists. If your DB2 workload has been executing at the remote driver or
client for a while with transaction-level WLB supposedly enabled, either result
indicates that transaction-level WLB is not working as expected. Follow the steps
for the symptom “Transaction-level WLB does not appear to be enabled even though
it is configured at the driver or client.”
12.5 ACR does not appear to be working as configured
To diagnose the case where ACR does not appear to be working as configured:
Ensure that you configured ACR settings properly. For the non-Java-based IBM Data
Server driver or client, you specify the ACR settings in the ACR section of a database
entry in the db2dsdriver.cfg file. For the IBM Data Server Driver for JDBC and
SQLJ, you specify ACR settings on the data source or in the URL.
If you are using the client affinities scheme and want to use failback to the preferred
member, ensure that you configured the affinityFailbackInterval parameter in
the db2dsdriver.cfg file or configured the affinityFailbackInterval JDBC
property.
1) To ensure that the db2dsdriver.cfg file and the relevant database entry are
accessed by the non-Java-based IBM Data Server driver or client, follow the
steps for the symptom “Transaction-level WLB does not appear to be enabled
even though it is configured at the driver or client.”
2) ACR is enabled by default. If ACR fails to recover from an outage, ensure that at
least one member of the DB2 pureScale instance is up.
3) Even when seamless ACR is enabled, not every instance of ACR can be made
seamless. For reasons why it might not be possible to make ACR seamless, see
the “Application considerations” section.
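For the IBM Data Server Driver for JDBC and SQLJ, ACR settings can be specified in the URL, where properties follow the database name as property=value pairs, each terminated by a semicolon. The following minimal sketch assembles such a URL; the class Db2UrlBuilder, the database name SAMPLE, and the property values in the test are hypothetical placeholders, not recommendations. The property names are the ones discussed in this paper; client affinities also require the other properties described in the “Client affinities” section, which are not shown.

```java
import java.util.Map;

// Hypothetical helper: builds a URL of the form
// jdbc:db2://host:port/database:property=value;property=value;
// where every property=value pair ends with a semicolon.
final class Db2UrlBuilder {

    static String build(String host, int port, String database, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:db2://")
            .append(host).append(':').append(port).append('/').append(database);
        if (!props.isEmpty()) {
            url.append(':');
            for (Map.Entry<String, String> e : props.entrySet()) {
                url.append(e.getKey()).append('=').append(e.getValue()).append(';');
            }
        }
        return url.toString();
    }
}
```

Use an insertion-ordered map such as LinkedHashMap if you want the properties to appear in a predictable order, for example maxRetriesForClientReroute, retryIntervalForClientReroute, and affinityFailbackInterval, and then pass the result to DriverManager.getConnection with your user ID and password.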
12.6 ACR failover takes a long time to finish or appears to hang, affecting DB2 workload throughput
ACR delays can occur during any of the following operations:
• Detection of socket termination
• An attempt to open a socket to a down member
• ACR retries
12.6.1 Detection of socket termination
Depending on the nature of an outage, the TCP/IP connection might not be
terminated cleanly, for example, if the member host loses power. In that case, the
remote driver or client might not detect the outage immediately. To minimize the
delay in detecting the outage, take these steps:
• For the non-Java-based IBM Data Server driver or client, use one of the following
options:
o In the db2dsdriver.cfg file, configure the keepAliveTimeout parameter
under the relevant database entry, outside its WLB and ACR sections.
o Configure the receiveTimeout parameter.
• For the IBM Data Server Driver for JDBC and SQLJ, use one of the following
options:
o Configure the keepAliveTimeOut property.
o Configure the blockingReadConnectionTimeout property.
Do not set the parameters or properties so low that false positives occur, that is,
so low that healthy but slow connections are treated as failed.
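The effect of a read timeout can be illustrated in plain java.net terms. This is an analogy, not driver code: the class SilentPeerDemo is hypothetical and simply simulates a peer that accepts a connection but never sends data or closes it, which is how a member that vanished without closing its sockets can look to a client. Without a read timeout (setSoTimeout(0)), the blocking read would wait indefinitely; with one, the wait is bounded, which is the role that receiveTimeout and blockingReadConnectionTimeout play for the drivers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Plain java.net analogy (not driver code): a local peer accepts a connection
// but never writes, so a blocking read waits until the read timeout expires.
final class SilentPeerDemo {

    // Returns how many milliseconds the read blocked before timing out,
    // or -1 if data or end of stream arrived first.
    static long blockedReadMillis(int readTimeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()), 1000);
            // The peer accepts the connection but never writes to it or closes it.
            try (Socket peer = server.accept()) {
                client.setSoTimeout(readTimeoutMillis); // 0 would mean wait forever
                InputStream in = client.getInputStream();
                long start = System.nanoTime();
                try {
                    in.read();
                    return -1;
                } catch (SocketTimeoutException e) {
                    return (System.nanoTime() - start) / 1_000_000;
                }
            }
        }
    }
}
```

The same trade-off applies to the driver settings: a shorter timeout detects a silent member sooner but raises the risk of treating a slow, healthy member as failed.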
12.6.2 Attempt to open a socket to a down member
To minimize the delay in detecting a member that is down when attempting to open
a socket to it, take these steps:
• For the non-Java-based IBM Data Server driver or client, in the
db2dsdriver.cfg file, configure the tcpipConnectTimeout parameter under the
relevant database entry, outside its WLB and ACR sections.
• For the IBM Data Server Driver for JDBC and SQLJ, set the loginTimeout
property.
Do not set the parameter or property so low that false positives occur.
12.6.3 ACR retries
If all members are inaccessible, ACR retries can give the appearance of a hang. By
default, ACR retries can last up to 10 minutes (or 2 minutes if you defined one or
more alternate groups). To control how long ACR retries, take these steps:
• For the non-Java-based IBM Data Server driver or client, consider configuring
the maxAcrRetries and acrRetryInterval parameters in the ACR section of
the relevant database entry in the db2dsdriver.cfg file.
• For the IBM Data Server Driver for JDBC and SQLJ, set the
maxRetriesForClientReroute and retryIntervalForClientReroute
properties on the data source or in the URL.
For details on these and additional settings, see the “Advanced driver or client
configuration options” section.
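To see why the retry settings and the socket-open timeouts in the previous section interact, a rough upper bound on the time spent retrying can be estimated. The following sketch uses a simplified model that is an assumption, not the drivers' documented retry algorithm: each retry round attempts every member once, each attempt to a down member can take up to the connect timeout, and the client then waits for the retry interval before the next round. The class AcrRetryEstimate and the values in the usage note are hypothetical.

```java
// Hypothetical, simplified model (not the driver's documented algorithm):
// each retry round tries every member once, each attempt to a down member can
// take up to connectTimeoutSec, and the client then waits retryIntervalSec.
final class AcrRetryEstimate {

    static long worstCaseSeconds(int retries, int retryIntervalSec, int members, int connectTimeoutSec) {
        return (long) retries * ((long) members * connectTimeoutSec + retryIntervalSec);
    }
}
```

For example, with 3 retries, a 10-second retry interval, 3 members, and a 5-second connect timeout, the estimate is 3 × (3 × 5 + 10) = 75 seconds. Under this model, lowering either the retry settings or the connect timeout shortens the apparent hang.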
13 Conclusion
The DB2 pureScale Feature keeps your distributed database system available 24x7.
WLB, ACR, and client affinities provide uninterrupted service during both planned and
unplanned outages. If a member fails, applications are automatically rerouted to
other DB2 members. When the failed member comes back online, applications are
transparently routed to the restarted member. With the DB2 pureScale Feature,
scaling your database solution to meet the most demanding business needs is easy.
Appendix
Appendix A provides a starting point for developing more advanced transaction-level
WLB monitoring reports. Appendix B provides a Java-based driver example of how
you can use DB2PoolMonitor class methods for monitoring the transport pool.
A. Creating a supplemental view by using table functions
The following view definition reports the number of active transactions per member
from a specific driver or client host. Where possible, the row output is further
distinguished by the driver’s or client's process ID. The current connection and IPC-
based connections are excluded from the output.
CREATE OR REPLACE VIEW WLB_ACTIVE_TRANSACTIONS_PER_MEMBER
(
MEMBER,
TRANSACTIONS_ON_MEMBER,
CLIENT_PID,
CLIENT_APPLNAME,
CLIENT_HOSTNAME
)
AS
SELECT UOW.MEMBER,
COUNT(UOW.MEMBER),
CONN.CLIENT_PID,
CONN.CLIENT_APPLNAME,
CONN.CLIENT_HOSTNAME
FROM TABLE ( SYSPROC.MON_GET_UNIT_OF_WORK( NULL, -2 ) ) AS UOW,
TABLE ( SYSPROC.MON_GET_CONNECTION( NULL, -2 ) ) AS CONN
WHERE UOW.APPLICATION_HANDLE <>
SYSPROC.MON_GET_APPLICATION_HANDLE()
AND
CONN.CLIENT_PROTOCOL LIKE 'TCPIP%'
AND
UOW.UOW_START_TIME IS NOT NULL
AND
UOW.APPLICATION_HANDLE = CONN.APPLICATION_HANDLE
GROUP BY UOW.MEMBER, CONN.CLIENT_PID, CONN.CLIENT_APPLNAME,
CONN.CLIENT_HOSTNAME
The following sample query lists the number of active transactions per member from
a specific driver or client host:
db2 "select member, transactions_on_member, substr(client_hostname, 1, 10) as
client_hostname from wlb_active_transactions_per_member"
MEMBER TRANSACTIONS_ON_MEMBER CLIENT_HOSTNAME
------ ---------------------- -----------------
0 38 hotel44
1 28 hotel44
2 28 hotel44
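Because the view groups by member, client process ID, application name, and host name, one member can appear in several rows for the same client host. The following minimal sketch reads the view over JDBC and totals the active transactions per member for one host. It assumes that the view was created as shown and that a java.sql.Connection is already open; the class ActiveTransactionsByMember is hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: totals WLB_ACTIVE_TRANSACTIONS_PER_MEMBER rows per member
// for one client host, because a member can appear once per client process.
final class ActiveTransactionsByMember {

    static final String QUERY =
        "SELECT MEMBER, TRANSACTIONS_ON_MEMBER FROM WLB_ACTIVE_TRANSACTIONS_PER_MEMBER "
        + "WHERE CLIENT_HOSTNAME = ?";

    // Returns member number -> active transactions, sorted by member number.
    static Map<Integer, Long> fetch(Connection conn, String clientHost) throws SQLException {
        Map<Integer, Long> totals = new TreeMap<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, clientHost);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    add(totals, rs.getInt(1), rs.getLong(2));
                }
            }
        }
        return totals;
    }

    // Adds one row's transaction count to the running total for its member.
    static void add(Map<Integer, Long> totals, int member, long transactions) {
        totals.merge(member, transactions, Long::sum);
    }
}
```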
B. DB2PoolMonitor class
The following example shows how you can use DB2PoolMonitor class methods for
monitoring the transport pool that the IBM Data Server Driver for JDBC and SQLJ
uses when transaction-level WLB is enabled:
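// The IBM Data Server Driver for JDBC and SQLJ must be on the class path.
// The class and method names below are illustrative.
import com.ibm.db2.jcc.DB2PoolMonitor;

public class TransportPoolStatistics {
  // Prints output similar to what follows:
  // PoolStatistics npr:1062 nsr:1062 lwroc:1020 hwroc:2 coc:40 aooc:0 rmoc:0 nbr:0
  // sbt:0 abt:0 lbt:0 tbt:0 crr:1 tpo:40
  //
  // Abbreviations: npr = total requests to the pool, nsr = successful requests,
  // lwroc/hwroc = light-weight/heavy-weight reused objects, coc = created objects,
  // aooc = aged-out objects, rmoc = removed objects, nbr = blocked requests,
  // sbt/abt/lbt/tbt = shortest/average/longest/total blocked time,
  // crr = connection releases refused, tpo = total pool objects.
  public static void printPoolStatistics() {
    // Monitor for the transport pool that transaction-level WLB uses.
    DB2PoolMonitor pm = DB2PoolMonitor.getPoolMonitor(DB2PoolMonitor.TRANSPORT_OBJECT);
    StringBuilder s = new StringBuilder(512);
    s.append("PoolStatistics npr:");
    s.append(pm.totalRequestsToPool());
    s.append(" nsr:");
    s.append(pm.successfullRequestsFromPool());
    s.append(" lwroc:");
    s.append(pm.lightWeightReusedObjectCount());
    s.append(" hwroc:");
    s.append(pm.heavyWeightReusedObjectCount());
    s.append(" coc:");
    s.append(pm.createdObjectCount());
    s.append(" aooc:");
    s.append(pm.agedOutObjectCount());
    s.append(" rmoc:");
    s.append(pm.removedObjectCount());
    s.append(" nbr:");
    int nbr = pm.numberOfRequestsBlocked();
    s.append(nbr);
    s.append(" sbt:");
    s.append(pm.shortestBlockedRequestTime());
    s.append(" abt:");
    // Average blocked time; avoid dividing by zero when no request was blocked.
    s.append(nbr == 0 ? 0 : (pm.totalTimeBlocked() / nbr));
    s.append(" lbt:");
    s.append(pm.longestBlockedRequestTime());
    s.append(" tbt:");
    s.append(pm.totalTimeBlocked());
    s.append(" crr:");
    s.append(pm.numberOfConnectionReleaseRefused());
    s.append(" tpo:");
    s.append(pm.totalPoolObjects());
    System.out.println(s);
  }
}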
Contributors
Farzana Anwar
Information Developer
Geneva Smith
Information Developer
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in
other countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may
be used instead. However, it is the user's responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you any
license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country
where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS
MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied
warranties in certain transactions, therefore, this statement may not apply to you.
Without limiting the above disclaimers, IBM provides no representations or
warranties regarding the accuracy, reliability or serviceability of any information or
recommendations provided in this publication, or with respect to any results that
may be obtained by the use of the information or observance of any
recommendations provided herein. The information contained in this document
has not been submitted to any formal IBM test and is distributed AS IS. The use of this
information or the implementation of any recommendations or techniques herein is
a customer responsibility and depends on the customer’s ability to evaluate and
integrate them into the customer’s operational environment. While each item may
have been reviewed by IBM for accuracy in a specific situation, there is no
guarantee that the same or similar results will be obtained elsewhere. Anyone
attempting to adapt these techniques to their own environment does so at their own
risk.
This document and the information contained herein may be used solely in
connection with the IBM products discussed in this document.
This information could include technical inaccuracies or typographical errors.
Changes are periodically made to the information herein; these changes will be
incorporated in new editions of the publication. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.
Any performance data contained herein was determined in a controlled
environment. Therefore, the results obtained in other operating environments may
vary significantly. Some measurements may have been made on development-
level systems and there is no guarantee that these measurements will be the same
on generally available systems. Furthermore, some measurements may have been
estimated through extrapolation. Actual results may vary. Users of this document
should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those
products, their published announcements or other publicly available sources. IBM
has not tested those products and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those
products.
All statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE: © Copyright IBM Corporation 2011. All Rights Reserved.
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any
form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked
on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S.
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml
Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
© Copyright IBM Corporation 2013 All Rights Reserved. IBM Canada 8200 Warden Avenue Markham, ON L6G 1C7 Canada Printed in United States of America 12-12 Neither this documentation nor any part of it may be copied or reproduced in any form or by any means or translated into another language, without the prior consent of all of the above mentioned copyright owners. IBM makes no warranties or representations with respect to the content hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. IBM assumes no responsibility for any errors that may appear in this document. The information contained in this document is subject to change without any notice. IBM reserves the right to make any such changes without obligation to notify any person of such revision or changes. IBM makes no commitment to keep the information contained herein up to date. The information in this document concerning non-IBM products was obtained from the supplier(s) of those products. IBM has not tested such products and cannot confirm the accuracy of the performance, compatibility or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the supplier(s) of those products. IBM, the IBM logo, DB2, DB2 Universal Database, and Tivoli are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others. References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.