Scaling PostgreSQL on SMP Architectures

18
PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 1 Scaling PostgreSQL on SMP Architectures Doug Tolbert, David Strong, Johney Tsai {doug.tolbert, david.strong, johney.tsai}@unisys.com

description

 

Transcript of Scaling PostgreSQL on SMP Architectures

Page 1: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 1

Scaling PostgreSQL on SMP ArchitecturesDoug Tolbert, David Strong, Johney Tsai{doug.tolbert, david.strong, johney.tsai}@unisys.com

Page 2: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 2

Unisys Cellular Multi-Processor Architecture• Up to 32 CPU sockets

– In 4- or 8-CPU pods– Dual-core supported

Multi-core eventually– Dynamically partitionable

• 32-bit & 64-bit technologies• Multilayered caches• Large cross-bar memories

– IA32: 64Gb (with PAE)– EM64T, Opteron, Athlon: 256Tb– Itanium: 1Pb– Theoretical: Multi-Pb

Processor family dependent

• Up to 96 PCI Slots– Lower on newer models

• Linux: SLES, RedHat• Windows: Data Center

CPUCPUCPUCPU CPUCPU

CPUCPU

PCIPCIPCIPCIPCIPCI

PCIPCIPCIPCIPCIPCI

CPUCPUCPUCPU CPUCPU

CPUCPUTLCTLC

CPUCPUCPUCPU CPUCPU

CPUCPUTLCTLC

PCIPCIPCIPCIPCIPCIDIBDIB

PCIPCIPCIPCIPCIPCIDIBDIB

CPUCPUCPUCPU CPUCPU

CPUCPUTLCTLC

8 GB8 GBMSUMSU

8 GB8 GBMSUMSU

8 GB8 GBMSUMSU

8 GB8 GBMSUMSUTLCTLC CPUCPU

CPUCPU CPUCPUCPUCPU

TLCTLC CPUCPUCPUCPU CPUCPU

CPUCPU

DIBDIB PCIPCIPCIPCIPCIPCI

DIBDIB PCIPCIPCIPCIPCIPCI

TLCTLC CPUCPUCPUCPU CPUCPU

CPUCPU

TLCTLC CPUCPUCPUCPU CPUCPU

CPUCPU

DIBDIB PCIPCIPCIPCIPCIPCI

DIBDIB PCIPCIPCIPCIPCIPCI

Page 3: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 3

Unisys Open And Secure Integrated Stack

Page 4: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 4

PostgreSQL ASC Lab Engagement

• February 2005, Unisys Mission Viejo Facility– Simon Riggs, Mark Wong (OSDL) participated– PostgreSQL 8.0.1 measured– Summary at

http://developer.osdl.org/markw/papers/PostgreSQL%20Visit%20Report%20External%20050513.pdf

• Interesting findings– Locking protocol and granularity– Context switch “storms” (~80K/sec)

Page 5: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 5

PostgreSQL 8.1 Enhancements

• A few stand out from a performance perspective– Improve concurrent access to the shared buffer cache (Tom)– Allow index scans to use an intermediate in-memory bitmap (Tom)– Improve performance for partitioned tables (Simon)

• Context switches reduced – From ~80K/sec to ~30K/sec

• Other effects only seen in overall throughput improvements

• Undertook more complete characterization of 8.1 performance improvements

Page 6: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 6

TPC-C Benchmark• Transaction Processing Performance Council (TPC)

– Defines suite of benchmarks– Sole certifier of vendor-submitted benchmark results– Specs, results, more info: http://www.tpc.org

• TPC-C benchmark is a complete order-entry environment– Industry standard measure of DBMS OLTP performance across vendors– Reports transactions per minute (tpmC) and cost metrics– Mix of 5 transactions

• New Order (selects, updates, inserts)• Payment (selects, updates, inserts)• Order Status (selects)• Stock Level (selects)• Delivery (selects, updates, deletes), interactive & deferred variants

• Data volumes, number of clients scaled by vendor’s tpmC goal– See spec for details

• Current (June 2006) result ranges– Best performance: 3,210,540 tpmC @ $5.07/tpmC– Lowest performance: 9,347 tpmC– Best price/performance: 38,622 tpmC @ $0.99/tpmC

Page 7: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 7

Test Application• DBMS stressor application with well-known characteristics

• TPCC4J = “TPC-C For Java”– Unisys-developed from TPC-C version 5.3 (April 2004)– Runs on 1.4.2 JVM (5.0 JVM works)– Small application footprint (~72K) – Very few garbage collection cycles– Client overhead low (7-9% for ~1000 users)

• Deviates from TPC-C in important ways– Non-audited components and results– Data entry screens not populated, only SQL statements run– User think time eliminated– Reports TPS, not tpmC

• No, you can’t multiply by 60 to get tpmC!– No checkpoints during benchmark runs– Not run for a minimum of 8 hours– Data volumes not scaled

Ah, these are NOT TPC-C

numbers!

Page 8: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 8

Test Configuration

Client Driver: Unisys ES3040

4 x 3.0Ghz Intel Xeon MP CPUs (IA32)Hyper threading enabled4Gb RAM1 x 36Gb boot driver (10K rpm)1 Gbit NIC (copper)

OS: SLES 9 SP3

PostgreSQL: Unisys ES7000/540

32 x 3.0Ghz Intel Xeon MP CPUs (IA32)Hyper threading disabled16Gb RAM1 x 36Gb boot driver (10K rpm)1 Gbit NIC (copper)

OS: SLES 9 SP3

Storage: EMC CX-200 External Disk

8 x 36Gb drives (15K rpm)RAID 10 configured (striped mirrors)

Small data volume: 50 TPC-C warehouses, ~5.4Gb (4.7Gb data + 700Mb indexes)

No I/O issues in benchmarking

Page 9: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 9

0

50

100

150

200

1 2 4 8 12 16 20 24 28 32

CPUs

Tran

sact

ions

per

sec

ond

8.0.78.1.2

Scaling -- Results

TPS 1 2 4 8 12 16 20 24 28 328.0.7 29.13 38.12 39.99 36.14 28.28 27.32 26.35 26.02 23.72 20.828.1.2 31.42 55.61 86.4 103.78 117.83 153.42 144.39 129.71 118.7 111.33

Users2 x CPUs6 x CPUs

Page 10: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 10

0

50

100

150

200

1 2 4 8 12 16 20 24 28 32

CPUs

Tran

sact

ions

per

sec

ond

8.0.78.1.2

Scaling – Three Issues

Performance vs.

Scalability

Hardware Architecture

Software Architecture

TPS 1 2 4 8 12 16 20 24 28 328.0.7 29.13 38.12 39.99 36.14 28.28 27.32 26.35 26.02 23.72 20.828.1.2 31.42 55.61 86.4 103.78 117.83 153.42 144.39 129.71 118.7 111.33

Users2 x CPUs6 x CPUs

Page 11: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 11

Performance vs. Scalability

• Performance and Scalability are not the same thing– But are closely related– Must be REALLY careful comparing anecdotal reports

• Performance comparable on equal configurations– In this data set, the only equal configuration is 1 CPU– 8.1.2 exhibits a small (~8%) improvement over 8.0.7

• Scalability measures throughput as resources are added– 8.1.2 uses incremental resources more effectively than 8.0.7– At 16 CPUs, 8.1.2 is 5.6 times better than 8.0.7

• Fair to conclude that 8.1.2 scales effectively to 16 CPUs– At least on this application and platform combination

Page 12: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 12

Hardware Architecture• Cache invalidations

– Handled quickly when on-cell• No visible degradation up to 8 CPUs

– Slower across cells• Cross-cell interactions required, but slower• Responsible for dip at 12 CPUs

– Processing power unbalanced across cells (8 + 4)• Recovers by 16 CPUs

– Processing power balance across cells (8 + 8)– Reduces chance of cache misses

• Cross-cell cache invalidations increase quickly when data structures are poorly aligned with cache line size

• Every large-scale computing platform, regardless of vendor, will have some sort of hardware related behavior that needs to be considered in performance and scalability work

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

SharedCache

MemoryController

RAM

ES7000/540 Cell

Page 13: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 13

Software Architecture

• Degradation above 16 CPUs– Continued decline implies systemic origin– Characteristic of software algorithms or data structures

reaching inherent scalability limits

• Causes unclear, but something important is wrong

• Possibilities include– The Usual Suspects: lock mgmt, buffer mgmt, etc.– Lack of Repeatable Read isolation level

• A TPC-C requirement• Elevation to Serializable hurts overall throughput

– See To Do List (a later slide)

Page 14: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 14

It’s not just TPCC4J!

• Another industry standard, transactional Java benchmark – ES7000 8x – 8.0: 34 TPS – 8.1: 544 TPS

• Different benchmark structure

• Different hardware platform– No cache issues, CPUs on one cell

• Even more dramatic improvement (16x!)

Page 15: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 15

Other Useful Tooling

• gprof (standard Linux utility)

pfreeDLMoveToFrontenlargeStringInfoappendBinaryStringInfo_bt_compareLWLockReleaseLWLockAcquirecheck_stack_depthMemoryContextAllocZeroAlignedhash_search

FunctionCall2int4eqBufferGetBlockNumber_bt_checkkeys_bt_stepAllocSetAllocint4leint4geMemoryContextAllocbtint4cmp_bt_nextAllocSetFree

A Billion calls!

Page 16: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 16

Other Useful Tooling

• VTune– Intel low-level performance monitor

• http://www.intel.com/cd/software/products/asmo-na/eng/index.htm– High frequency functions, by clock ticks

• HeapTupleSatisfiesSnapshotData• LWLockAcquire• XLogInsert• PinBuffer• hash_search

• oprofile – Results pending

HeapTupleSatisfiesSnapshotData??

Page 17: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 17

To Do List• In-line some high frequency functions

• Speed up CRC algorithm by rearranging loop structure

• Improve data structure alignment to memory pages and cache lines– Possibly dynamically based on processor id and settings

• Re-type byte structures to words for newer Intel processors

• Investigate efficiency of bitmaps on newer Intel processors

• Residual spin-lock implementation issues

• Reduce query plan discards (very adventurous!)

• Improve statistics infrastructure and real-time monitoring

Page 18: Scaling PostgreSQL on SMP Architectures

PostgreSQL 10th Anniversary Summit, Toronto, July 8-9, 2006 Page 18

Conclusions• Scalability dramatically improved in 8.1 release

– Scales to 16 CPUs (on test environment)– Context switches down– Supported by multiple benchmarks

• Minor single CPU performance improvement

• Hardware architecture can affect scalability– Important consideration for high-capacity, high-throughput

applications– Systems of all sizes moving to multi-core processor chips

• Substantial scalability and performance challenges remain– TPC-C required feature content missing (repeatable read)– Algorithm scalability limits exist (especially > 16 CPUs)