Reduce Costs and Increase Oracle Database OLTP Workload Service Levels

Consolidating OLTP Workloads on Dell™ PowerEdge™ 11G 4 Socket Servers

Authors: Balamurugan B, Ravi Ramappa


Server consolidation is an approach that maximizes IT efficiency by minimizing the power/cooling, rack footprint and licensing costs of computer server resources. It solves a fundamental problem called server sprawl, in which multiple, underutilized servers take up more space and consume more power than the workload requires. This white paper identifies the results, or consolidation factor, of consolidating an Oracle® database running online transaction processing (OLTP) workloads on a legacy 9th-generation Dell™ PowerEdge™ 2950 2U two-socket server to the newer 11th-generation (11G) PowerEdge R810 2U four- or two-socket server.


A Dell Technical White Paper



THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

© 2010 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.

Dell, the DELL logo, and the DELL badge, PowerConnect, and PowerVault are trademarks of Dell Inc. Symantec and the SYMANTEC logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the US and other countries. Microsoft, Windows, Windows Server, and Active Directory are either trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries. EMC is the registered trademark of EMC Corporation. Intel and Xeon are either trademarks or registered trademarks of Intel Corporation.

April 2010


Contents

EXECUTIVE SUMMARY
INTRODUCTION
SYSTEM ARCHITECTURE
TEST CONFIGURATION
TEST METHODOLOGY
CONSOLIDATION FACTOR
SUMMARY

Tables

Table 1: Test Configuration

Figures

Figure 1: System Architecture
Figure 2: Base Configuration - TPS Comparison Between Legacy Production and R810 Test Environment
Figure 3: Base Configuration - AQRT Comparison Between Legacy Production and R810 Test Environment
Figure 4: Base Configuration - CPU Time Comparison Between Legacy Production and R810 Test Environment
Figure 5: At Legacy Saturated User Load - TPS Comparison Between Legacy Production and R810 Test Environment
Figure 6: At Legacy Saturated User Load - AQRT Comparison Between Legacy Production and R810 Test Environment
Figure 7: At Legacy Saturated User Load - CPU Time Comparison Between Legacy Production and R810 Test Environment


EXECUTIVE SUMMARY

The Dell™ enterprise portfolio is evolving to incorporate better-performing, more energy-efficient, and more highly available products. With the introduction of Dell's latest server product line, customers have an opportunity to improve their total cost of ownership by consolidating distributed legacy environments. This is the third white paper in a series that discusses server consolidation on the Dell 11G product line. Earlier white papers that discuss DSS and OLTP workload consolidation on Dell PowerEdge™ 11G 2-socket servers include:

Consolidating DSS Workloads on Dell™ PowerEdge™ 11G Servers Using Oracle® 11g Database Replay
http://www.dell.com/downloads/global/solutions/database_11g_consolidate.pdf?c=ec&l=en&s=gen

Consolidating OLTP Workloads on Dell™ PowerEdge™ 11G Servers
http://www.dell.com/downloads/global/solutions/Consolidating_OLTP_Workloads.pdf?c=us&cs=555&l=en&s=biz

This white paper focuses on Online Transaction Processing (OLTP) workloads and their consolidation on Dell PowerEdge™ 11G 4/2 socket servers. Dell strives to simplify IT infrastructure by providing methods to consolidate legacy production environments and reduce data center complexity. The tools and procedures described in this white paper can help administrators test, compare, validate, and implement the latest hardware and database solution bundles. Dell established these procedures and guidelines based on lab experiments and database workload simulations performed by the Dell Database Solutions Engineering team.

Using the tools and procedures described in this document, customers can not only select the appropriate database solution hardware and software stack, but also optimize the solution's total cost of ownership according to the database workloads they choose to run. The intended audience of this white paper includes database administrators, IT managers, and system consultants.


INTRODUCTION

Server consolidation can be defined as maximizing the efficiency of computer server resources, thereby minimizing the associated power/cooling, rack footprint and licensing costs. It essentially solves a fundamental problem called server sprawl in which multiple, under‐utilized servers take up more space and consume more power resources than the workload requirement indicates.

OLTP database systems typically service hundreds or thousands of concurrent users. An example of this type of system is a travel reservation system in which a large number of customers and agents make online travel reservations or check available flights and schedules. The transactions performed by these thousands of concurrent users translate into tens of thousands of I/O requests to the backend storage subsystem, depending on the nature of the transactions. The database host CPUs can be used efficiently only if the backend storage subsystem is configured with enough disks to handle this volume of I/O requests. When it is not, the Oracle database host CPUs exhibit large IOWAIT times instead of doing useful work. In this scenario, consolidating, upgrading, or migrating to a faster database server, or scaling the number of CPUs or the amount of memory, does not help. The correct approach is to scale the backend disk subsystem appropriately to handle the I/O requests, and then move to the next stage of CPU and memory sizing as discussed later in this white paper.

The objective of this white paper is to identify the consolidation factor for an Oracle database running OLTP workloads when moving from legacy 9th-generation PowerEdge 2950 2U 2-socket servers to the new 11G PowerEdge R810 2U 4/2 socket servers.

An enterprise database system may be running DSS, OLTP, or a mixed workload. OLTP workloads typically send thousands of small I/O requests from the database servers to the backend storage subsystem, so the storage subsystem must have a sufficient number of disks to handle the I/O requests coming from the hosts.
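The disk-sizing argument above can be illustrated with a rough estimate. The following is a minimal sketch, not the sizing method used in this study: the function name, the per-disk IOPS figure, and the read/write mix are all hypothetical inputs that an administrator would replace with measured values. The only fixed assumption is that RAID 10 turns each host write into two disk writes.

```python
import math


def spindles_needed(host_iops, read_fraction, iops_per_disk, write_penalty=2):
    """Estimate how many disks are needed to absorb a host I/O load.

    write_penalty=2 models RAID 10, where every host write lands on two disks.
    All inputs are assumptions supplied by the caller, not values from the paper.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if iops_per_disk <= 0:
        raise ValueError("iops_per_disk must be positive")
    reads = host_iops * read_fraction
    writes = host_iops * (1.0 - read_fraction)
    # Back-end load seen by the disks after the RAID write penalty is applied.
    backend_iops = reads + writes * write_penalty
    return math.ceil(backend_iops / iops_per_disk)


# Hypothetical workload: 5000 host IOPS, 70% reads, 180 IOPS per disk.
print(spindles_needed(host_iops=5000, read_fraction=0.7, iops_per_disk=180))
```

If the estimate exceeds the number of disks actually configured, the host is likely to spend its time in IOWAIT regardless of how fast its CPUs are.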

Consider a two‐node Oracle RAC database hosted on two ninth‐generation (9G) PowerEdge 2950 dual‐socket, dual‐core, or quad‐core servers running Oracle 10g Release 2. Dell recently announced the availability of its eleventh-generation (11G) server product line equipped with the chipset that is designed to support Intel® Xeon® 7500 series 4/6/8 core processors, QuickPath Interconnect, DDR3 memory technology and PCI Express generation 2. A potential replacement for 9G 2U Dell servers is the 11G 2U Dell PowerEdge R810 server. The R810 supports four‐socket, eight‐core processors, two different types of energy efficient CPUs, and has a highly efficient overall architecture.

A multi‐node Oracle RAC cluster on legacy systems with 2 socket dual core processors can be replaced by an Oracle RAC cluster consisting of fewer nodes of PowerEdge 11G with 4 socket eight core processors, and still process the OLTP workload faster with less power consumption and lower Oracle RAC licensing cost. The savings in RAC licensing may be used to efficiently configure and scale the backend storage system with enough I/O modules and disks to remove the I/O bottlenecks that are almost always an issue in an OLTP environment. Also, based on the results of this study, one may determine how many distributed standalone legacy environments running OLTP workloads can be consolidated on a single Oracle RAC solution running on Dell R810 servers.


SYSTEM ARCHITECTURE

Figure 1: System Architecture

As shown in Figure 1, the legacy production environment consists of a two-node Oracle 10g R2 RAC running on 9G 2U 2-socket PowerEdge 2950 III servers, and the test environment is a single-node Oracle 11g R2 RAC running on an 11G 2U 4/2 socket PowerEdge R810 server.

Note: The intent of this paper is not to recommend converting a RAC cluster to a single-node setup. The test setup was designed to compare host CPU behavior when consolidating the database workload, while ensuring that the number of cores is the same in both setups. To simulate Oracle RAC overhead in the single-node configuration (with the R810 server), the Oracle 11g R2 database was configured with Oracle 11g R2 grid infrastructure.


TEST CONFIGURATION

Table 1 describes the complete software and hardware configuration that was used throughout testing on both the simulated legacy production environment and the 11G test environment.

Table 1: Test Configuration

| Component | Legacy Production Environment | R810 Test Environment |
|---|---|---|
| Systems | Two PowerEdge 2950 III 2U servers | One PowerEdge R810 2U 4/2 socket server |
| Processors | Two Intel Xeon X5460, 3.16 GHz quad core per node; cache: L2 = 2 x 4M per CPU | One Intel Xeon X7560, 2.26 GHz eight core; cache: L2 = 8 x 256K, L3 = 24M |
| Memory | 32 GB DDR2 per node (64 GB total) | 64 GB DDR3 |
| Internal disks | Two 73 GB 2.5" SAS per node | Two 73 GB 2.5" SAS |
| Network | Two Broadcom® NetXtreme II BCM5708 Gigabit Ethernet | Four Broadcom® NetXtreme II BCM5709 Gigabit Ethernet |
| External storage | Dell/EMC CX4-480 with 146 GB Fibre Channel disks | Dell/EMC CX4-480 with 146 GB Fibre Channel disks |
| HBA | One QLE2462 per node | One QLE2462 |
| OS | Enterprise Linux® 4.6 | Enterprise Linux 5.4 |
| Oracle software | Oracle 10g R2 10.2.0.4; file system: ASM; disk groups: DATABASE, DATA; sga_target = 1600M; pga_target = 800M | Oracle 11g R2 11.2.0.1.0; file system: ASM; disk groups: DATABASE, DATA; memory_target = 2400M |
| Workload | Quest Benchmark Factory TPCC workload; scale factor: 3000; user connections: 200-5000 | Quest Benchmark Factory TPCC workload; scale factor: 3000; user connections: 200-5000 |

TEST METHODOLOGY

Dell's Database Solutions engineers used Quest Software Benchmark Factory TPCC, a load-generating utility that simulates OLTP users and transactions on a database for a given number of users. The TPCC workload provided by the Benchmark Factory schema simulates an order entry system consisting of multiple warehouses, with data populated in tables with rows according to the scale factor defined during table creation.

The most commonly used metrics for an OLTP environment are transactions per second (TPS) and average query response time (AQRT). The AQRT of an OLTP database environment may be described as the average time it takes for an OLTP transaction to complete and deliver its results to the end user. The AQRT is the most important factor when it comes to fulfilling end-user requirements, and it establishes the performance criteria for an OLTP database. A response time of 2 seconds was chosen as the basis for our Service Level Agreement (SLA), which was maintained throughout the testing.

Our initial goal was to stress the legacy PowerEdge 2950 III system to determine its optimal performance in terms of userload and TPS, while ensuring that there was no bottleneck in storage or host memory. The legacy database was configured with a scale factor of 3000 that created a couple of tables with millions of rows. The total database size that resulted with this scale factor was around 290 GB. Initially the backend storage subsystem, consisting of a Dell/EMC® CX4-480 storage array, was configured with ten 15K RPM 146 GB disks in a RAID 10 configuration. Once the database was populated, we started with 200 concurrent users and increased the userload toward 5000 in increments of 200, randomly running transactions against the legacy database while making sure that the AQRT always stayed below 2 seconds.

The test methodology used is as follows:

1. To simulate the legacy production environment, we configured a two-node Oracle 10g R2 RAC cluster of PowerEdge 2950 III servers with quad-core, dual-socket 3.16 GHz CPUs, connected to a Dell/EMC CX4-480 storage system configured with a 100 GB LUN for the database SYSTEM, a 400 GB LUN for DATA ASM disk groups, and a 2 GB LUN for the voting and Oracle Cluster Registry (OCR) partitions.

2. Using the Quest Software Benchmark Factory TPCC workload, we populated the test data with a scale factor of 3000 into the simulated legacy production environment.

3. After data population, we used Oracle Data Pump to export the data at the schema level and avoid a data reload for each test iteration:

   expdp system/oracle@racdb1 SCHEMAS=quest CONTENT=all directory=export;

4. We started the first test iteration with a base configuration of 10 disks for the DATA ASM disk group and a userload of 200 to establish the saturation point of the legacy production environment. We then increased the userload in increments of 200 users while constantly monitoring the AQRT. Once the AQRT exceeded 2 seconds, the test was stopped.

5. After each iteration we conducted a host CPU time analysis to determine the limiting factor for host performance.

6. Once the backend spindles were saturated, they started exhibiting large I/O latency. This resulted in large IOWAIT at the host CPU and a large AQRT. To reduce the IOWAIT at the host CPU, the number of spindles for the DATA ASM disk group was increased by 10 disks for the next iteration. This was repeated until the host CPU was optimally utilized with a smaller IOWAIT time. At the same time, we monitored whether we were able to keep the average query response time below 2 seconds at a higher userload than in the earlier iteration.

7. To simulate our test environment, we configured an Oracle 11g R2 single-node RAC on a PowerEdge R810 server populated with two sockets of eight cores each, to match the total number of CPU cores in the legacy production environment. Using Quest Software Benchmark Factory, we populated the test data with the same TPCC scale factor that was used for the legacy production environment.

8. The test iterations, similar to those on the legacy production environment, were carried out on the R810 test environment until we matched the maximum userload supported on the legacy production environment, with an SLA of 2 seconds AQRT.

Figures 2 and 3 below compare transactions per second and AQRT between the legacy production and the R810 test environment using the base configuration of a 10-disk RAID 10 ASM disk group.
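The userload ramp from the methodology (start at 200 users, add 200 per iteration, stop once AQRT exceeds the 2-second SLA) can be sketched as follows. This is a minimal illustration, not the Benchmark Factory procedure itself: `measure_aqrt` is a hypothetical callback standing in for a real benchmark run, and `toy_aqrt` is an invented response-time curve used only to make the example executable.

```python
def find_saturation_userload(measure_aqrt, start=200, step=200,
                             max_users=5000, sla_seconds=2.0):
    """Return the highest userload whose AQRT stays within the SLA.

    measure_aqrt is a hypothetical callable: userload -> AQRT in seconds.
    Returns None if even the first userload violates the SLA.
    """
    best = None
    for users in range(start, max_users + 1, step):
        aqrt = measure_aqrt(users)
        if aqrt > sla_seconds:
            break  # the test stops once the SLA is exceeded
        best = users
    return best


def toy_aqrt(users):
    # Invented curve for illustration only; not data from this study.
    return 0.2 + (users / 1000.0) ** 2


print(find_saturation_userload(toy_aqrt))
```

In the study this ramp was repeated after each addition of 10 disks to the DATA disk group, so in practice the function would be called once per storage configuration.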


Figure 2: Base Configuration - TPS Comparison Between Legacy Production and R810 Test Environment
[Chart: TPS (y-axis, 0 to 90) versus user load (x-axis, 200 to 1600); series: TPS legacy 10 disks, TPS R810 10 disks]

Figure 3: Base Configuration - AQRT Comparison Between Legacy Production and R810 Test Environment
[Chart: AQRT in seconds (y-axis, 0 to 4.5) versus user load (x-axis, 200 to 1800); series: Avg Response Time legacy 10 disks, Avg Response Time R810 10 disks]


In Figures 2 and 3, it is observed that the legacy production environment exhibits similar performance in terms of transactions per second and the AQRT when compared to the R810 test environment. Do not be misled by these results. Further analysis of the host CPU time in terms of USER time and IOWAIT times revealed that the legacy production environment exhibited a higher USER time to IOWAIT time ratio as compared to the R810 test environment as shown in Figure 4.

Figure 4: Base Configuration - CPU Time Comparison Between Legacy Production and R810 Test Environment

Figure 4 reveals a very interesting fact: compared to the legacy production environment, the R810 test environment, with its faster CPU and more efficient overall design, handled the OLTP workload much faster and exhibited a lower USER to IOWAIT time ratio (0.45 for legacy vs. 0.11 for R810 at 1400 userload). Since both environments had an identical storage configuration, the higher IOWAIT and lower USER CPU time on the R810 test environment were due to the faster processing power available on that environment. Overall, Figure 4 shows that in order to take advantage of the faster processing power of the R810 test environment, we need to remove the I/O bottleneck and so reduce the IOWAIT time.

This result led to further tests and analysis. We decided to verify our conclusions by alleviating some of the I/O bottlenecks in both the legacy production and the 11G test environment, increasing the spindle count of the DATA disk group in increments of 10 disks. For the legacy production environment, we continued the iterations, increasing the number of disks until we reached minimal IOWAIT on the host CPU. At this CPU saturation point, we captured the maximum userload supported (with an AQRT of 2 seconds). We termed this userload the 'legacy saturation userload'. For the new R810 test environment, we performed similar iterations, increasing the number of disks until we reached the 'legacy saturation userload'. Figures 5 and 6 compare the test results for the R810 test environment and the legacy production environment at the 'legacy saturation userload'.
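The USER to IOWAIT ratio used in this comparison can be computed from periodic CPU samples. The sketch below assumes the samples are already available as percentages of total CPU time (for example, collected from a tool such as sar or iostat); the function name and the sample values are hypothetical and are not the measurements behind Figure 4.

```python
from statistics import mean


def user_to_iowait_ratio(samples):
    """Average USER time divided by average IOWAIT time.

    samples: iterable of (user_pct, system_pct, iowait_pct) tuples.
    A low ratio means the CPUs spend more time waiting on I/O than
    running user code, which points at the storage subsystem.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    avg_user = mean(s[0] for s in samples)
    avg_iowait = mean(s[2] for s in samples)
    if avg_iowait == 0:
        return float("inf")  # no I/O wait at all
    return avg_user / avg_iowait


# Hypothetical samples for illustration only.
cpu_samples = [(10, 5, 40), (20, 5, 60)]
print(user_to_iowait_ratio(cpu_samples))
```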

[Figure 4 chart: CPU Utilization (y-axis, 0 to 80) for average iowait, average user time, and average system time; series: Legacy: 10 Disks, R810: 10 Disks]


Figure 5: At Legacy Saturated User Load - TPS Comparison Between Legacy Production and R810 Test Environment

Figure 6: At Legacy Saturated User Load - AQRT Comparison Between Legacy Production and R810 Test Environment

As seen in Figures 5 and 6, at the legacy production environment's saturated userload, the TPS and the AQRT in both environments are similar.

[Figure 5 chart: TPS (y-axis, 0 to 250) versus user load (x-axis, 200 to 4200) at legacy saturated user load; series: legacy, R810]

[Figure 6 chart: AQRT in seconds (y-axis, 0 to 10) versus user load (x-axis, 200 to 4200) at legacy saturated user load; series: legacy, R810]


Figure 7: At Legacy Saturated User Load - CPU Time Comparison Between Legacy Production and R810 Test Environment

Analysis of the CPU time in Figure 7 revealed that the CPU user time for the legacy production environment increased drastically and that the CPU was optimally utilized for productive work, as shown by the CPU IOWAIT time nearing zero. We therefore concluded that the legacy production environment will not scale further with additional disks or host memory. The average CPU user time for the R810 test environment is about 40% of the total CPU time, whereas on the legacy production environment it is about 70%. Because CPU IOWAIT time on the R810 test environment is about 38% of the total CPU time, this environment can be scaled further by reducing the IOWAIT time and using the reclaimed CPU time for productive work. This can be achieved by further increasing the number of disks at the backend.

CONSOLIDATION FACTOR

Based on the above test results, one can conclude that a single-node Oracle 11g R2 RAC running on an 11G PowerEdge R810 4/2 socket server populated with two sockets (eight-core processors) was able to handle the OLTP workload of a two-node Oracle RAC cluster running on 9G PowerEdge 2950 III servers with both sockets populated (quad-core processors). Thus we can achieve a consolidation factor of 4 when we fully populate the R810 server. Using this consolidation factor, we can consolidate an Oracle RAC cluster with many nodes into fewer nodes. For example, if we populate all four sockets of each R810 server in a two-node Oracle RAC setup, it can handle the OLTP workload of an eight-node Oracle RAC running on PowerEdge 2950 III servers, provided both environments are configured with sufficient host memory and I/O disk subsystems.
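The arithmetic behind these figures can be written out as a short sketch. It assumes, as the paragraph above does implicitly, that capacity scales linearly with the number of populated R810 sockets; the function name and the per-socket parameter are illustrative and do not come from the study.

```python
def legacy_nodes_replaced(r810_nodes, sockets_per_r810,
                          legacy_nodes_per_socket=1.0):
    """Number of PowerEdge 2950 III nodes whose OLTP load can be absorbed.

    The test matched two populated R810 sockets against two legacy nodes,
    which gives one legacy node per R810 socket. Linear scaling with
    socket count is an assumption, not a measured result.
    """
    return r810_nodes * sockets_per_r810 * legacy_nodes_per_socket


print(legacy_nodes_replaced(1, 2))  # tested configuration
print(legacy_nodes_replaced(1, 4))  # one fully populated R810
print(legacy_nodes_replaced(2, 4))  # two-node R810 RAC
```

The second call corresponds to the consolidation factor of 4 per fully populated server, and the third to the eight-node example.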

[Figure 7 chart: CPU Utilization (y-axis, 0 to 70) for average iowait, average user time, and average system time at legacy saturated user load; series: Legacy, R810]


Also, as seen from the CPU time analysis graph (Figure 7), the CPU utilization for productive work on the R810 server was only about 40%. There was therefore enough CPU headroom to scale up further in terms of userload supported at the assumed AQRT of 2 seconds by reducing the CPU IOWAIT time. We may then conclude that if the CPU IOWAIT time on the R810 system is brought down to almost nil, we can achieve a higher consolidation factor, possibly as high as 7.
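The paper does not spell out how the upper figure of 7 is obtained. One plausible reading, offered here only as an assumption, is that the consolidation factor scales with the share of CPU time spent on productive (user) work: if the R810's user time rose from about 40% to the roughly 70% at which the legacy servers saturated, the factor of 4 would scale accordingly. The function below is a hypothetical illustration of that reading.

```python
def projected_factor(base_factor, current_user_pct, target_user_pct):
    """Scale a consolidation factor by the gain in productive CPU share.

    Assumes throughput grows linearly with CPU user time, which is an
    illustrative simplification and not a claim made by the study.
    """
    if current_user_pct <= 0:
        raise ValueError("current_user_pct must be positive")
    return base_factor * target_user_pct / current_user_pct


# 4 is the measured factor; 40% and 70% are the user-time shares reported
# for the R810 and legacy environments at the legacy saturation userload.
print(projected_factor(4, 40, 70))
```

Under the same assumption, converting the entire 38% IOWAIT share into user time (40% to 78%) would give 4 x 78 / 40 = 7.8, so a figure of about 7 is consistent with either reading.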

SUMMARY

Database systems running Online Transaction Processing workloads require an optimal backend storage disk layout and enough disks to efficiently service a large concurrent user population. Legacy servers running these types of workloads have suffered from inefficient CPU resource usage due to architectural limitations, so only a limited number of disks or amount of memory could be serviced by a CPU core in a system.

In this white paper we demonstrated that PowerEdge 11G servers equipped with Xeon 7500 series chipsets for I/O and processor interfacing remove these bottlenecks and provide an ideal platform on which to consolidate legacy database environments. The R810 chipset is designed to support Intel's Xeon 7500 series processor family, QuickPath Interconnect, DDR3 memory technology, and PCI Express Generation 2. This study also demonstrated that 11G servers offer large performance gains when compared to older-generation servers. Database systems running on PowerEdge 11G servers exhibit better scalability when additional resources, such as disks and memory, are added.

Customers running Oracle 9i or 10g RAC environments on legacy servers and storage can use the findings and test methodologies outlined in this white paper to consolidate power-hungry RAC nodes into fewer, faster, more energy-efficient nodes. As discussed in the earlier section, customers can expect a consolidation factor of at least 4 (which can go up to 7), depending on the database usage pattern. The resulting legacy RAC node consolidation can also drive down Oracle licensing costs, producing savings that can be used to increase backend storage resources to improve AQRT, or to implement disaster recovery sites and additional RAC test-bed sites for application development and testing. The reduced number of nodes does not compromise performance when paired with PowerEdge 11G servers. The result is less cluster overhead, simplified management, and positive movement toward the objective of simplifying IT and reducing complexity in data centers.