Data-oriented Transaction Execution
VLDB 2010
Ippokratis Pandis†‡  Ryan Johnson†‡  Nikos Hardavellas*  Anastasia Ailamaki†‡
†Carnegie Mellon University  ‡École Polytechnique Fédérale de Lausanne  *Northwestern University
© 2010 Ippokratis Pandis 2
Scaling OLTP on multicores/sockets
• Shared Everything
  – Hard to scale
• Shared Nothing
  – Multiple processes, physically separated data
  – Explicit contention control
  – Perfectly partition-able workload
  – Memory pressure: redundant data/structures
[Figure: a multicore, multisocket machine (two CPUs, eight cores each)]
The two approaches are complementary
Focus: scalability of a single (shared-everything) instance
Shore-MT
• Multithreaded version of SHORE
• State-of-the-art DBMS features
  – Two-phase row-level locking
  – ARIES-style logging/recovery
• Similar at the instruction level to commercial DBMSs
[Figure: execution time (seconds) vs. # HW contexts for MySQL, Postgres, DBMS "X", BDB, and Shore-MT (Sun Niagara T1, insert-only microbenchmark)]
A high-performing, scalable conventional engine
Available at: http://www.cs.wisc.edu/shore-mt/
Problems at Even Higher Parallelism
[Figure: time breakdown (cpu secs) vs. # HW contexts, split into BPOOL, LOCK-MAN, LOG-MAN, DATA MOVEMENT, and OTHER (Sun Niagara T2, TPC-C Payment)]
[Figure: throughput / # HW contexts vs. # HW contexts; falls away from linear as contexts increase]
Lock manager overhead is dominant
Contention for acquiring locks in compatible mode
Data-oriented Transaction Execution
• It is not the transaction that dictates which data the transaction-executing thread will access
• Break each transaction into smaller actions
  – Depending on the data they touch
• Execute actions by "data-owning" threads
• Distribute and privatize locking and data accesses across threads
A new transaction execution model
Reduces the overhead of locking and data accesses
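The action/executor model above can be sketched as follows. This is a hypothetical, minimal illustration (names like `Executor` and `run_payment` are mine, not Shore-MT's): each data-owning thread serially drains its own queue of actions, so no centralized lock manager is consulted.

```python
import queue
import threading

class Executor:
    """A 'data-owning' thread: serially executes actions on its partition."""
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.log = []  # records executed actions, for illustration
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            action = self.inbox.get()
            if action is None:          # shutdown sentinel
                break
            action(self.log)            # serial per partition: no central lock

def run_payment(executors, done):
    """Break one Payment-like xct into per-table actions, one per owner."""
    remaining = threading.Semaphore(0)
    for table in ("WH", "CUST", "ORD"):
        executors[table].inbox.put(
            lambda log, t=table: (log.append(f"u({t.lower()})"),
                                  remaining.release()))
    for _ in range(3):                  # wait until all three actions report
        remaining.acquire()
    done.append("committed")

executors = {t: Executor(t) for t in ("WH", "CUST", "ORD")}
done = []
run_payment(executors, done)
for ex in executors.values():
    ex.inbox.put(None)
    ex.thread.join()
print(done)   # -> ['committed']
```

Each update lands on the thread that owns the table's data, which is the "distribute and privatize" point of the slide.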
DORA vs. Conventional – At 100% CPU
[Figure: time breakdown (%) for Baseline vs. DORA on TM1 and TPC-C OrderStatus; components: Work, Other Cont, Lock Manager, Lock Manager Cont, Dora (Sun Niagara T2, 64 HW contexts)]
Eliminates contention on the centralized lock manager
Significantly reduced work (lightweight locks)
Roadmap
• Introduction
• Conventional execution
• Data-oriented execution
• Evaluation
• Conclusions
Example
[Figure: three cores (CPU-0, CPU-1, CPU-2), each with instruction (I) and data (D) L1 caches, a shared L2, memory, and I/O; tables WH, CUST, ORD]
Transaction: u(wh), u(cust), u(ord)
I = Instruction, D = Data
Conventional - Example
[Figure: the same machine; a single thread on one core executes u(wh), u(cust), u(ord) in turn]
Transaction: u(wh), u(cust), u(ord)
Typical Lock Manager
[Figure: lock hash table of lock heads; each lock head has a queue of lock requests (e.g. L1 held EX, L2 held EX with a waiting EX request), and each xct keeps its own list of lock requests]
The higher the HW parallelism → longer queues of requests → longer critical sections → higher contention
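A toy model of the structure in the figure (not Shore-MT's actual code; class and method names are mine): a hash table of per-lock request queues, with every acquire and release walking the shared structure under a latch. That latch-protected walk is the critical section that grows with parallelism.

```python
from collections import defaultdict, deque
import threading

class LockManager:
    """Toy centralized lock manager: a hash table of per-lock request
    queues. Every acquire/release runs under a latch; the critical
    section grows with the queue lengths."""
    def __init__(self):
        self.latch = threading.Lock()
        self.queues = defaultdict(deque)   # lock id -> queue of (xct, mode)

    def acquire(self, lock_id, xct, mode="EX"):
        with self.latch:                   # longer queues -> longer critical section
            q = self.queues[lock_id]
            granted = not q or (mode == "SH" and all(m == "SH" for _, m in q))
            q.append((xct, mode))
            return granted                 # False: the request waits in the queue

    def release(self, lock_id, xct):
        with self.latch:
            self.queues[lock_id] = deque(
                (x, m) for x, m in self.queues[lock_id] if x != xct)

lm = LockManager()
print(lm.acquire("L1", "T1"))   # True: queue empty, EX granted
print(lm.acquire("L1", "T2"))   # False: EX already held, T2 waits
lm.release("L1", "T1")          # T2 is now first in the queue
```

Even requests in compatible (shared) mode must pass through the same latch, which is the contention the next slide measures.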
Time Inside the Lock Manager
[Figure: time breakdown (%) vs. CPU utilization (%), split into LM Acquire, LM Acquire Cont, LM Release, LM Release Cont (Sun Niagara T2, TPC-B)]
Latch contention on the lock manager
Thread-to-transaction – Access Pattern
[Figure: DISTRICT records accessed over time under TPC-C Payment; each thread's accesses are scattered across the whole table]
Unpredictable access pattern
Source of contention
Roadmap
• Introduction
• Conventional execution
• Data-oriented execution
• Evaluation
• Conclusions
Thread-to-data – Access Pattern
[Figure: DISTRICT records accessed over time under TPC-C Payment; each thread touches only its own contiguous range of records]
Predictable access patterns
Optimizations possible (e.g. no centralized locks)
Transaction Flow Graph
• Each transaction input is a graph of Actions & RVPs
• Actions
  – Table/index they access
  – Subset of the routing fields
• Rendezvous Points (RVPs)
  – Decision points (commit/abort)
  – Separate different phases
  – Counter of the # of actions to report
  – Last to report initiates the next phase
  – Enqueues the actions of the next phase
Example (TPC-C Payment):
  Phase 1: Upd(WH), Upd(DI), Upd(CU)
  Phase 2: Ins(HI)
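The RVP counter described above can be sketched as a simple countdown (hypothetical code, not the paper's implementation): each phase-1 action reports in, and the last one to report fires phase 2.

```python
import threading

class RVP:
    """Rendezvous point: a counter of the actions that must report.
    The last action to report initiates the next phase."""
    def __init__(self, num_actions, next_phase):
        self.remaining = num_actions
        self.next_phase = next_phase       # callable that enqueues the next phase
        self.latch = threading.Lock()

    def report(self):
        with self.latch:
            self.remaining -= 1
            last = self.remaining == 0
        if last:
            self.next_phase()              # last to report fires the next phase

trace = []
rvp = RVP(3, lambda: trace.append("ins(HI)"))      # phase 2 of Payment
for action in ("upd(WH)", "upd(DI)", "upd(CU)"):   # phase 1 actions report in
    trace.append(action)
    rvp.report()
print(trace)   # phase 2 runs only after all of phase 1 has reported
```

In the real system the commit/abort decision would also be taken at this point; here only the ordering guarantee is shown.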
Partitions & Executors
• Routing table at each table
  – {Routing fields → executor}
• Executor thread
  – Local lock table
    • {RoutingFields + part of PK, LockMode}
    • List of blocked actions
  – Input queue: new actions
  – Completed queue: on xct commit/abort, remove entries from the local lock table
  – Loops over the completed/input queues
  – Executes requests in serial order
[Figure: an executor with Input and Completed queues and a local lock table, e.g. {1,0} held EX by A; {1,3} held EX by B, with A waiting]
Routing fields: {WH_ID, D_ID}
  Range {0,0}-{5,2} → Executor 1
  Range {5,2}-{6,1} → Executor 2
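The routing table above can be sketched as a range lookup over sorted lower bounds (hypothetical code; `RoutingTable` is my name, and I assign the shared boundary {5,2} to executor 2, which the slide leaves ambiguous). The point is that every action on a given {WH_ID, D_ID} deterministically lands on the same executor.

```python
import bisect

class RoutingTable:
    """Maps a routing-field vector to an executor by range lookup."""
    def __init__(self, lower_bounds, executors):
        # lower_bounds[i] is the inclusive lower bound of executors[i]'s range
        self.lower_bounds = lower_bounds
        self.executors = executors

    def route(self, key):
        i = bisect.bisect_right(self.lower_bounds, key) - 1
        return self.executors[i]

# The slide's table for routing fields {WH_ID, D_ID}:
#   {0,0}-{5,2} -> executor 1,  {5,2}-{6,1} -> executor 2
rt = RoutingTable(lower_bounds=[(0, 0), (5, 2)], executors=[1, 2])
print(rt.route((3, 1)))   # -> 1
print(rt.route((5, 2)))   # -> 2
```

Tuple comparison gives the lexicographic order over (WH_ID, D_ID) that the ranges imply.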
DORA - Example
[Figure: the same machine; u(wh), u(cust), and u(ord) each run on the core that owns the corresponding data]
Transaction: u(wh), u(cust), u(ord)
Centralized-lock free
Improved data reuse
Non-partition-aligned accesses
• Secondary index probes without part of the primary key in their definition
• Each entry stores the part of the primary key needed to decide routing, along with the RID
• Do the probe with the RVP-executing thread
• Enqueue a new action to the corresponding thread
• Do the record retrieval with the RID
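The steps above can be sketched as follows (hypothetical data and names, not the paper's schema): because the secondary-index entry carries the routing part of the primary key, the probe and the record fetch can run on different threads.

```python
# Secondary index on customer last name: each entry stores the routing part
# of the primary key ({WH_ID, D_ID}) alongside the RID.
sec_index = {"SMITH": ((1, 3), "rid-42")}
heap_file = {"rid-42": {"last": "SMITH", "balance": 10.0}}

executor_queues = {(1, 3): []}   # one input queue per executor

def probe_and_route(last_name):
    """Probe with the RVP-executing thread, then enqueue a new action
    to the executor that owns the routing-field range."""
    routing_fields, rid = sec_index[last_name]
    executor_queues[routing_fields].append(lambda: heap_file[rid])

probe_and_route("SMITH")
record = executor_queues[(1, 3)].pop(0)()   # owning thread retrieves by RID
print(record["balance"])   # -> 10.0
```

The probe itself touches only the index; the heap record is still accessed exclusively by its owning thread.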
Roadmap
• Introduction
• Conventional execution
• Data-oriented execution
• Evaluation
• Conclusions
Eliminating contention on the lock manager
[Figure: time breakdown (cpu secs) vs. # HW contexts for Baseline and DORA, split into Useful, LockMgr Cont, LogMgr Cont, Other Cont (Sun Niagara T2, TM1)]
[Figure: throughput vs. real CPU load (%) for BASELINE, DORA, and BASE-DLOG]
DORA eliminates contention on the lock manager
Linear scalability to 64 HW contexts
Immune to oversaturation
Response time for a single client
[Figure: normalized response time, Baseline vs. DORA, for TM1-GetNewDest, TM1-UpdSubData, TM1-CallFwdMix, TPCC-NewOrder, TPCC-Payment, and TPC-B; lower is better]
Exploits intra-xct parallelism
Lower response times at low loads
Peak Performance
[Figure: normalized peak throughput, Baseline vs. DORA, for TM1-MIX, TM1-GetSubData, TM1-GetNewDest, TM1-GetAccData, TM1-UpdSubData, TM1-UpdLocation, TM1-CallFwdMix, TPCC-Payment, TPCC-NewOrder, TPCC-OrderStatus, and TPC-B; bars annotated with CPU utilization at peak; higher is better]
Peak Performance
[Figure: normalized peak throughput for Baseline, Baseline-Ideal, DORA, and DORA-Ideal on the same workloads; higher is better]
Higher peak performance
Always close to 100% CPU utilization
Roadmap
• Introduction
• Conventional execution
• Data-oriented execution
• Evaluation
• Conclusions
DORA vs. Conventional
[Figure: throughput (Ktps) vs. # clients for BASELINE and DORA on TPC-C Payment: ~20% higher throughput by avoiding expensive (centralized) lock manager operations; 40+% higher at saturation, immune to the centralized lock manager; up to 2x at light loads from intra-xaction parallelism]
Higher performance across the entire load spectrum
Reducing the overheads of locking
• Rdb/VMS
  – Distributed DBMS
  – Lock "carry-over" reduces network traffic
• DB2 "keep table lock" setting
  – A connection holds all table locks until close
  – Leads to "poor concurrency"
• H-Store
  – Single-threaded, non-interleaved execution model
  – No locking or latching at all
Summary
• Large numbers of active threads stress the scalability of the database system
• Data-oriented transaction execution (DORA)
  – Decentralizes locking & data accesses
  – Benefits of shared-nothing without physical data partitioning
  – Small modifications to a conventional engine
  – Higher performance across the entire load spectrum