Data-oriented Transaction Execution
VLDB 2010
Ippokratis Pandis†‡  Ryan Johnson†‡  Nikos Hardavellas*  Anastasia Ailamaki†‡
†Carnegie Mellon University  ‡École Polytechnique Fédérale de Lausanne  *Northwestern University
© 2010 Ippokratis Pandis 2
Scaling OLTP on multicores/sockets
• Shared Everything
  – Hard to scale
• Shared Nothing
  – Multiple processes, physically separated data
  – Explicit contention control
  – Perfectly partition-able workload
  – Memory pressure: redundant data/structures
[Figure: a multicore, multisocket machine (two CPUs, eight cores each)]
The two approaches are complementary
Focus: scalability of a single (shared-everything) instance
Shore-MT
• Multithreaded version of SHORE
• State-of-the-art DBMS features
  – Two-phase row-level locking
  – ARIES-style logging/recovery
• Similar at the instruction level to commercial DBMSs
[Figure: execution time (seconds) vs. # HW contexts for MySQL, Postgres, DBMS "X", BDB, and Shore-MT (Sun Niagara T1, insert-only microbenchmark)]
A high-performing, scalable conventional engine
Available at: http://www.cs.wisc.edu/shore-mt/
Problems at Even Higher Parallelism
[Figure: time breakdown (cpu secs) vs. # HW contexts, split into BPOOL, LOCK-MAN, LOG-MAN, DATA MOVEMENT, and OTHER (Sun Niagara T2, TPC-C Payment)]
[Figure: throughput / # HW contexts vs. # HW contexts; falls away from linear as contexts increase]
Lock manager overhead is dominant
Contention for acquiring locks in compatible mode
Data-oriented Transaction Execution
• It is not the transaction that dictates which data the transaction-executing thread will access
• Break each transaction into smaller actions
  – Depending on the data they touch
• Execute actions by "data-owning" threads
• Distribute and privatize locking and data accesses across threads
A new transaction execution model
Reduces the overhead of locking and data accesses
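The action/executor model above can be sketched as follows. This is a hypothetical, minimal illustration (names like `Executor` and `run_payment` are mine, not Shore-MT's): each data-owning thread serially drains its own queue of actions, so no centralized lock manager is consulted.

```python
import queue
import threading

class Executor:
    """A 'data-owning' thread: serially executes actions on its partition."""
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.log = []  # records executed actions, for illustration
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            action = self.inbox.get()
            if action is None:          # shutdown sentinel
                break
            action(self.log)            # serial per partition: no central lock

def run_payment(executors, done):
    """Break one Payment-like xct into per-table actions, one per owner."""
    remaining = threading.Semaphore(0)
    for table in ("WH", "CUST", "ORD"):
        executors[table].inbox.put(
            lambda log, t=table: (log.append(f"u({t.lower()})"),
                                  remaining.release()))
    for _ in range(3):                  # wait until all three actions report
        remaining.acquire()
    done.append("committed")

executors = {t: Executor(t) for t in ("WH", "CUST", "ORD")}
done = []
run_payment(executors, done)
for ex in executors.values():
    ex.inbox.put(None)
    ex.thread.join()
print(done)   # -> ['committed']
```

Each update lands on the thread that owns the table's data, which is the "distribute and privatize" point of the slide.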
DORA vs. Conventional – At 100% CPU
[Figure: time breakdown (%) for Baseline vs. DORA on TM1 and TPC-C OrderStatus; components: Work, Other Cont, Lock Manager, Lock Manager Cont, Dora (Sun Niagara T2, 64 HW contexts)]
Eliminates contention on the centralized lock manager
Significantly reduced work (lightweight locks)
Roadmap
• Introduction
• Conventional execution
• Data-oriented execution
• Evaluation
• Conclusions
Example
[Figure: three cores (CPU-0, CPU-1, CPU-2), each with instruction (I) and data (D) L1 caches, a shared L2, memory, and I/O; tables WH, CUST, ORD]
Transaction: u(wh), u(cust), u(ord)
I = Instruction, D = Data
Conventional - Example
[Figure: the same machine; a single thread on one core executes u(wh), u(cust), u(ord) in turn]
Transaction: u(wh), u(cust), u(ord)
Typical Lock Manager
[Figure: lock hash table of lock heads; each lock head has a queue of lock requests (e.g. L1 held EX, L2 held EX with a waiting EX request), and each xct keeps its own list of lock requests]
The higher the HW parallelism → longer queues of requests → longer critical sections → higher contention
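A toy model of the structure in the figure (not Shore-MT's actual code; class and method names are mine): a hash table of per-lock request queues, with every acquire and release walking the shared structure under a latch. That latch-protected walk is the critical section that grows with parallelism.

```python
from collections import defaultdict, deque
import threading

class LockManager:
    """Toy centralized lock manager: a hash table of per-lock request
    queues. Every acquire/release runs under a latch; the critical
    section grows with the queue lengths."""
    def __init__(self):
        self.latch = threading.Lock()
        self.queues = defaultdict(deque)   # lock id -> queue of (xct, mode)

    def acquire(self, lock_id, xct, mode="EX"):
        with self.latch:                   # longer queues -> longer critical section
            q = self.queues[lock_id]
            granted = not q or (mode == "SH" and all(m == "SH" for _, m in q))
            q.append((xct, mode))
            return granted                 # False: the request waits in the queue

    def release(self, lock_id, xct):
        with self.latch:
            self.queues[lock_id] = deque(
                (x, m) for x, m in self.queues[lock_id] if x != xct)

lm = LockManager()
print(lm.acquire("L1", "T1"))   # True: queue empty, EX granted
print(lm.acquire("L1", "T2"))   # False: EX already held, T2 waits
lm.release("L1", "T1")          # T2 is now first in the queue
```

Even requests in compatible (shared) mode must pass through the same latch, which is the contention the next slide measures.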
Time Inside the Lock Manager
[Figure: time breakdown (%) vs. CPU utilization (%), split into LM Acquire, LM Acquire Cont, LM Release, LM Release Cont (Sun Niagara T2, TPC-B)]
Latch contention on the lock manager
Thread-to-transaction – Access Pattern
[Figure: DISTRICT records accessed over time under TPC-C Payment; each thread's accesses are scattered across the whole table]
Unpredictable access pattern
Source of contention
Roadmap
• Introduction
• Conventional execution
• Data-oriented execution
• Evaluation
• Conclusions
Thread-to-data – Access Pattern
[Figure: DISTRICT records accessed over time under TPC-C Payment; each thread touches only its own contiguous range of records]
Predictable access patterns
Optimizations possible (e.g. no centralized locks)
Transaction Flow Graph
• Each transaction input is a graph of Actions & RVPs
• Actions
  – Table/index they access
  – Subset of the routing fields
• Rendezvous Points (RVPs)
  – Decision points (commit/abort)
  – Separate different phases
  – Counter of the # of actions to report
  – Last to report initiates the next phase
  – Enqueues the actions of the next phase
Example (TPC-C Payment):
  Phase 1: Upd(WH), Upd(DI), Upd(CU)
  Phase 2: Ins(HI)
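The RVP counter described above can be sketched as a simple countdown (hypothetical code, not the paper's implementation): each phase-1 action reports in, and the last one to report fires phase 2.

```python
import threading

class RVP:
    """Rendezvous point: a counter of the actions that must report.
    The last action to report initiates the next phase."""
    def __init__(self, num_actions, next_phase):
        self.remaining = num_actions
        self.next_phase = next_phase       # callable that enqueues the next phase
        self.latch = threading.Lock()

    def report(self):
        with self.latch:
            self.remaining -= 1
            last = self.remaining == 0
        if last:
            self.next_phase()              # last to report fires the next phase

trace = []
rvp = RVP(3, lambda: trace.append("ins(HI)"))      # phase 2 of Payment
for action in ("upd(WH)", "upd(DI)", "upd(CU)"):   # phase 1 actions report in
    trace.append(action)
    rvp.report()
print(trace)   # phase 2 runs only after all of phase 1 has reported
```

In the real system the commit/abort decision would also be taken at this point; here only the ordering guarantee is shown.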
Partitions & Executors
• Routing table at each table
  – {Routing fields → executor}
• Executor thread
  – Local lock table
    • {RoutingFields + part of PK, LockMode}
    • List of blocked actions
  – Input queue: new actions
  – Completed queue: on xct commit/abort, remove entries from the local lock table
  – Loops over the completed/input queues
  – Executes requests in serial order
[Figure: an executor with Input and Completed queues and a local lock table, e.g. {1,0} held EX by A; {1,3} held EX by B, with A waiting]
Routing fields: {WH_ID, D_ID}
  Range {0,0}-{5,2} → Executor 1
  Range {5,2}-{6,1} → Executor 2
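The routing table above can be sketched as a range lookup over sorted lower bounds (hypothetical code; `RoutingTable` is my name, and I assign the shared boundary {5,2} to executor 2, which the slide leaves ambiguous). The point is that every action on a given {WH_ID, D_ID} deterministically lands on the same executor.

```python
import bisect

class RoutingTable:
    """Maps a routing-field vector to an executor by range lookup."""
    def __init__(self, lower_bounds, executors):
        # lower_bounds[i] is the inclusive lower bound of executors[i]'s range
        self.lower_bounds = lower_bounds
        self.executors = executors

    def route(self, key):
        i = bisect.bisect_right(self.lower_bounds, key) - 1
        return self.executors[i]

# The slide's table for routing fields {WH_ID, D_ID}:
#   {0,0}-{5,2} -> executor 1,  {5,2}-{6,1} -> executor 2
rt = RoutingTable(lower_bounds=[(0, 0), (5, 2)], executors=[1, 2])
print(rt.route((3, 1)))   # -> 1
print(rt.route((5, 2)))   # -> 2
```

Tuple comparison gives the lexicographic order over (WH_ID, D_ID) that the ranges imply.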
DORA - Example
[Figure: the same machine; u(wh), u(cust), and u(ord) each run on the core that owns the corresponding data]
Transaction: u(wh), u(cust), u(ord)
Centralized-lock free
Improved data reuse
Non-partition-aligned accesses
• Secondary index probes without part of the primary key in their definition
• Each entry stores the part of the primary key needed to decide routing, along with the RID
• Do the probe with the RVP-executing thread
• Enqueue a new action to the corresponding thread
• Do the record retrieval with the RID
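The steps above can be sketched as follows (hypothetical data and names, not the paper's schema): because the secondary-index entry carries the routing part of the primary key, the probe and the record fetch can run on different threads.

```python
# Secondary index on customer last name: each entry stores the routing part
# of the primary key ({WH_ID, D_ID}) alongside the RID.
sec_index = {"SMITH": ((1, 3), "rid-42")}
heap_file = {"rid-42": {"last": "SMITH", "balance": 10.0}}

executor_queues = {(1, 3): []}   # one input queue per executor

def probe_and_route(last_name):
    """Probe with the RVP-executing thread, then enqueue a new action
    to the executor that owns the routing-field range."""
    routing_fields, rid = sec_index[last_name]
    executor_queues[routing_fields].append(lambda: heap_file[rid])

probe_and_route("SMITH")
record = executor_queues[(1, 3)].pop(0)()   # owning thread retrieves by RID
print(record["balance"])   # -> 10.0
```

The probe itself touches only the index; the heap record is still accessed exclusively by its owning thread.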
Roadmap
• Introduction
• Conventional execution
• Data-oriented execution
• Evaluation
• Conclusions
Eliminating contention on the lock manager
[Figure: time breakdown (cpu secs) vs. # HW contexts for Baseline and DORA, split into Useful, LockMgr Cont, LogMgr Cont, Other Cont (Sun Niagara T2, TM1)]
[Figure: throughput vs. real CPU load (%) for BASELINE, DORA, and BASE-DLOG]
DORA eliminates contention on the lock manager
Linear scalability to 64 HW contexts
Immune to oversaturation
Response time for a single client
[Figure: normalized response time, Baseline vs. DORA, for TM1-GetNewDest, TM1-UpdSubData, TM1-CallFwdMix, TPCC-NewOrder, TPCC-Payment, and TPC-B; lower is better]
Exploits intra-xct parallelism
Lower response times at low loads
Peak Performance
[Figure: normalized peak throughput, Baseline vs. DORA, for TM1-MIX, TM1-GetSubData, TM1-GetNewDest, TM1-GetAccData, TM1-UpdSubData, TM1-UpdLocation, TM1-CallFwdMix, TPCC-Payment, TPCC-NewOrder, TPCC-OrderStatus, and TPC-B; bars annotated with CPU utilization at peak; higher is better]
Peak Performance
[Figure: normalized peak throughput for Baseline, Baseline-Ideal, DORA, and DORA-Ideal on the same workloads; higher is better]
Higher peak performance
Always close to 100% CPU utilization
Roadmap
• Introduction
• Conventional execution
• Data-oriented execution
• Evaluation
• Conclusions
DORA vs. Conventional
[Figure: throughput (Ktps) vs. # clients for BASELINE and DORA on TPC-C Payment: ~20% higher throughput by avoiding expensive (centralized) lock manager operations; 40+% higher at saturation, immune to the centralized lock manager; up to 2x at light loads from intra-xaction parallelism]
Higher performance across the entire load spectrum
Reducing the overheads of locking
• Rdb/VMS
  – Distributed DBMS
  – Lock "carry-over" reduces network traffic
• DB2 "keep table lock" setting
  – A connection holds all table locks until close
  – Leads to "poor concurrency"
• H-Store
  – Single-threaded, non-interleaved execution model
  – No locking or latching at all
Summary
• Large numbers of active threads stress the scalability of the database system
• Data-oriented transaction execution (DORA)
  – Decentralizes locking & data accesses
  – Benefits of shared-nothing without physical data partitioning
  – Small modifications to a conventional engine
  – Higher performance across the entire load spectrum