High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf ·...

27
High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson 1 , Spyros Blanas 2 , Cristian Diaconu 1 , Craig Freedman 1 , Jignesh M. Patel 2 , Mike Zwilling 1 1 Microsoft Corporation 2 Univ. of Wisconsin-Madison 1

Transcript of High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf ·...

Page 1: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

High-Performance Concurrency Control

Mechanisms for Main-Memory Databases

Per-Åke Larson1, Spyros Blanas2, Cristian Diaconu1, Craig Freedman1, Jignesh M. Patel2, Mike Zwilling1

1Microsoft Corporation 2Univ. of Wisconsin-Madison

1

Page 2: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

The problem

Most DBMSs designed for:

• Disk-resident data

• Few CPUs

$50K server in 2012:

• 1TB of RAM

• 40 CPUs

2

What concurrency control scheme should be used for a high-performance

main-memory OLTP system?

What concurrency control scheme should be used for a high-performance

main-memory OLTP system?

Page 3: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Contributions

1. Multi-version optimistic concurrency control – Multi-version: readers don’t block writers

– Optimistic: no waiting on database locks

– Supports all SQL isolation levels

2. Efficient mechanisms for implementing multi-version and single-version locking

3. Experimental evaluation: High performance (millions of TX/sec) and full serializability without workload-specific knowledge

3

Page 4: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Recent related work

• Concurrency control for shared-nothing DBMSs

– Open question: Performance in a shared-everything main memory environment

• Make existing DBMS storage engine scale:

– Locking, page latching, B-tree index, logging, …

• Exploit specific workload property:

– Partitionable workload

– Deterministic stored procedures

4

Our approach:

Redesign DBMS storage engine, make no assumption about workload

Our approach:

Redesign DBMS storage engine, make no assumption about workload

Page 5: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Designing a main memory storage engine

Traditional disk-oriented engine

• Disk-friendly data structures – Pages, B-tree index

• Absorbs high disk latency by frequent context switching

• Thread spins for latches

• TX may yield for locks

• Critical sections are thousands of instructions long, and limit scalability

Our main memory prototype

• Latch-free hash table stores individual records

• Minimizes context switching – Usually 1, at most 2 per TX

• Eliminates latches

• TX never waits for locks

• No critical sections – Many TXs finish in thousands

of instructions

5

Page 6: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Multi-version optimistic scheme

• TXs have two unique timestamps: BEGIN, END

• Read as of BEGIN timestamp

• Write as of END timestamp

6

1 2 3 4 5

Logical time

R W But not for Serializable But not for Serializable

Sufficient for Read Committed

Sufficient for Read Committed

BEGIN END

Snapshot Isolation (SI)

Page 7: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

R

Making SI serializable

• Read as of BEGIN timestamp

• Repeat Read as of END timestamp, verify no change

• Write as of END timestamp

7

R W BEGIN END

1 2 3 4 5

Logical time

[Bornea et al, ICDE’11]

Page 8: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

What needs to be repeated?

• Depends on the isolation level

• Read Committed or SI: No validation needed

– Versions were committed at BEGIN, will still be committed at END

• Repeatable Read: Read versions again

– Ensure no versions have disappeared from the view

• Serializable: Repeat scans with same predicate

– Ensure no phantoms have appeared in the view

8

Page 9: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Transaction states

9

Committed Committed

Active Active Validating Validating

Aborted Aborted

Get Begin Timestamp

Get End Timestamp

User abort or

WW conflict

Serializability violation

Log updates, wait for I/O

Committed Committed

Active Active Validating Validating

Read only transaction

Terminated Terminated Terminated Terminated

Postprocessing

Page 10: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Transaction map

• Stores transaction state, timestamps

• Globally visible

10

TXID STATE BEGIN END

5 ACTIV 2 N/A

TRANSACTION MAP

Page 11: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Determining version visibility

11

Visibility as of time T is determined by: version timestamps and TX state

Visibility as of time T is determined by: version timestamps and TX state

John John $100 $100 1 1 ∞ ∞

timestamp

8 bytes

, or transaction ID

TXID STATE BEGIN END

5 ACTIV 2 N/A TX5 TX5 TXID STATE BEGIN END

5 ACTIV 2 N/A 1 1 TX5 TX5

TRANSACTION MAP

Page 12: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Example: Update to $150

12

John John $100 $100 1 1 ∞ ∞ TXID STATE BEGIN END

TRANSACTION MAP

John John $150 $150 TX5 TX5 ∞ ∞

Active Active Validating Validating

Get Begin Timestamp

Get End Timestamp

Active Active Validating Validating

Committed Committed Committed Committed

5 N/A N/A N/A 5 N/A 2 N/A 5 ACTIV 2 N/A 5 ACTIV 2 N/A 5 ACTIV 2 4 5 VALID 2 4 5 COM 2 4 5 COM 2 4

TX5 TX5 TX5 TX5

4 4

TX5 TX5

4 4

4 4 4 4

Log updates, wait for I/O

Terminated Terminated Terminated Terminated

Postprocessing

Page 13: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

John John $150 $150 TX5 TX5 ∞ ∞

WW conflicts

13

John John $100 $100 1 1 ∞ ∞

TX5 updates $100 to $150

TX2 updates $100 to $75

TX5 TX5 TX2 TX2

CAS CAS

∞ ∞ TX5 TX5 TX5 TX5

TX2 chooses to abort

TX2 chooses to abort

8 bytes

TX5 TX5 TX5 TX5

First writer wins First writer wins

Page 14: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

WR conflicts

TX5 State Visible?

ACTIVE

VALIDATING

COMMITTED

ABORTED

14

John John $150 $150 TX5 TX5 ∞ ∞ Q: When is version visible?

No, version is uncommitted

Maybe, check TX5 END timestamp

No, version is garbage

? Speculate YES now, confirm at end

A: Depends on TX state

Page 15: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Commit dependencies

• Impose constraint on serialization order: Commit B only if A has committed.

• Implementation: register-and-signal

– Transform multiple waits on every record access to a single wait at end of TX

– Dependency wait time “added” to log latency

• Most common: no wait needed, dependency has cleared

• But: Cascading aborts now possible

15

Page 16: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Commit dependencies

16

Committed Committed

Active Active Validating Validating

Aborted Aborted

Get Begin Timestamp

Get End Timestamp

User abort or

WW conflict

Serializability violation

Log updates, wait for I/O

Read only transaction

Terminated Terminated

Postprocessing Release dependents

Wait for dependencies to clear, then

Page 17: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Multi-version optimistic summary

• TXs never wait during the ACTIVE phase

• No deadlock detection is needed

• Lower isolation level = less work

– Read Committed and SI: No validation at all

17

Page 18: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Multi-version locking

• Provides lock-like semantics: Once a version is read by T, it will remain visible to T until commit.

• No centralized lock table – Record lock embedded in version’s END timestamp

• Same context switching overhead: At most 2 per TX

But:

• Deadlock detection necessary

• More write traffic, even readers write to memory

18

Page 19: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Implementation details

• Independent transaction kernel in C++

• Base data structure: latch-free hash table

– Perfect sizing, perfect hashing

– Load factor when idle: 1

19

Page 20: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Single-version two-phase locking

• Traditional 2PL, optimized for main memory

• No central lock manager

• Lock is pre-allocated in hash table bucket

– Protects hash bucket, prevents phantoms

– Multiple-reader, single-writer lock

– For our experiments, also serves as a record lock

20

Page 21: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Experimental setup

• 2-socket × 6-core Xeon X5650 with 48GB RAM

• TXs don’t wait for the log (lazy commit)

– Log records are populated and written to disk

• All transactions run under Serializable

21

MV/O Multi-version optimistic

MV/L Multi-version locking

1V Single-version two-phase locking

Page 22: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Scalability No contention (10M row table)

22

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

0 6 12 18 24

Thro

ugh

pu

t (t

x/se

c)

Mill

ion

s

Threads

80% R=10 20% R=10, W=2 80% R=10 20% R=10, W=2

Page 23: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

0 6 12 18 24

Thro

ugh

pu

t (t

x/se

c)

Mill

ion

s

Threads

1V MV/L MV/O

Scalability No contention (10M row table)

23

All methods scale All methods scale

80% R=10 20% R=10, W=2 80% R=10 20% R=10, W=2

Similar results for TATP Similar results for TATP

Page 24: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

0 6 12 18 24

Thro

ugh

pu

t (t

x/se

c)

Mill

ion

s

Threads

1V MV/L MV/O

Scalability Extreme contention (1000 row table)

24

80% R=10 20% R=10, W=2 80% R=10 20% R=10, W=2

1V throughput limited by lock thrashing

1V throughput limited by lock thrashing

Page 25: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Effect of long readers (10M row table)

25

0.0

0.5

1.0

1.5

2.0

2.5

0 2 4 6 8 10 12 14 16 18 20 22 24

Up

dat

e t

hro

ugh

pu

t (t

x/se

c)

Mill

ion

s

Active long read TXs All active TXs short updaters All active TXs

short updaters All active TXs long readers All active TXs long readers

6 TXs long readers 18 TXs short updaters

6 TXs long readers 18 TXs short updaters

R=1,000,000 R=10, W=2 R=1,000,000 R=10, W=2

Page 26: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

26

0.0

0.5

1.0

1.5

2.0

2.5

0 2 4 6 8 10 12 14 16 18 20 22 24

Up

dat

e t

hro

ugh

pu

t (t

x/se

c)

Mill

ion

s

Active long read TXs

1V MV/L MV/OR=1,000,000 R=10, W=2 R=1,000,000 R=10, W=2

Even if 1 long reader, MV/O 2.3× faster Even if 1 long reader, MV/O 2.3× faster

2.3× 2.3×

If all TXs do updates, 1V 1.9× faster If all TXs do updates, 1V 1.9× faster

25×

Effect of long readers (10M row table)

26

All active TXs short updaters All active TXs

short updaters All active TXs long readers All active TXs long readers

Page 27: High-Performance Concurrency Control Mechanisms for Main …blanas.2/files/MainMemCC.pdf · High-Performance Concurrency Control Mechanisms for Main-Memory Databases Per-Åke Larson1,

Conclusions

• Single-version 2PL is fragile

– Great for update-heavy workloads, little contention

– But: problematic for hotspots, long read TXs

• Multi-version optimistic scheme is robust

– Readers don’t block writers, no waiting on locks

• Locking semantics can be offered efficiently

• High performance and full serializability without workload-specific knowledge

27