Predicting Replicated Database Scalability
Sameh Elnikety, Microsoft Research
Steven Dropsho, Google Inc.
Emmanuel Cecchet, Univ. of Mass.
Willy Zwaenepoel, EPFL
Motivation
• Environment
  – E-commerce website
  – DB throughput is 500 tps
• Is 5000 tps achievable?
  – Yes: use 10 replicas
  – Yes: use 16 replicas
  – No: faster machines needed
• How does a tx workload scale on a replicated db?
[Figure: single DBMS]
Multi-Master vs. Single-Master
[Figure: multi-master system with Replica 1, Replica 2, Replica 3; single-master system with Master, Slave 1, Slave 2]
Background: Multi-Master
[Figure: Load Balancer in front of Replica 1, Replica 2, Replica 3; each replica is a standalone DBMS]
Read Tx
[Figure: Load Balancer, transaction T, and Replica 1, Replica 2, Replica 3]
Read tx does not change DB state
Update Tx
[Figure: Load Balancer, certifier (Cert), transaction T, writesets (ws), and Replica 1, Replica 2, Replica 3]
Update tx changes DB state
Additional Replica
[Figure: Load Balancer, certifier (Cert), transaction T, writesets (ws), and Replica 1, Replica 2, Replica 3, Replica 4]
Coming Up …
• Standalone DBMS
  – Service demands
• Multi-master system
  – Service demands
  – Queuing model
• Experimental validation
Standalone DBMS
• Required
  – readonly tx: R
  – update tx: W
• Transaction load
  – readonly tx: R
  – update tx: W / (1 - A1)
[Figure: single DBMS]
Abort probability is A1
Submit W / (1 - A1) update tx
Committed tx: W
Aborted tx: W ∙ A1 / (1 - A1)
Standalone DBMS
[Figure: single DBMS]
$\mathrm{Load}(1) = R \cdot rc + \frac{W}{1 - A_1} \cdot wc$
Service Demand
• Required
  – readonly tx: R
  – update tx: W
• Transaction load
  – readonly tx: R
  – update tx: W / (1 - A1)
$\mathrm{Load}(1) = R \cdot rc + \frac{W}{1 - A_1} \cdot wc$
$D(1) = Pr \cdot rc + \frac{Pw}{1 - A_1} \cdot wc$
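A minimal sketch of the standalone service demand, reading the slide's formula as D(1) = Pr ∙ rc + Pw ∙ wc / (1 - A1). The function name, argument names, and the sample numbers are assumptions made for illustration; they do not come from the slides.

```python
def standalone_demand(pr, pw, rc, wc, a1):
    """Service demand on a standalone DBMS.

    pr, pw: fractions of readonly and update transactions (pr + pw = 1)
    rc, wc: cost of one readonly / one update transaction
    a1:     standalone abort probability for update transactions
    """
    if not 0.0 <= a1 < 1.0:
        raise ValueError("a1 must be in [0, 1)")
    # Each committed update needs 1 / (1 - a1) attempts on average,
    # so the update cost is inflated by that factor.
    return pr * rc + pw * wc / (1.0 - a1)


# Hypothetical profile: 80% readonly, 20% updates, costs in seconds.
d1 = standalone_demand(pr=0.8, pw=0.2, rc=0.010, wc=0.020, a1=0.01)
print(d1)
```

With A1 = 0 the expression reduces to the plain weighted average Pr ∙ rc + Pw ∙ wc.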
Multi-Master with N Replicas
• Required (whole system of N replicas)
  – Readonly tx: N ∙ R
  – Update tx: N ∙ W
• Transaction load per replica
  – Readonly tx: R
  – Update tx: W / (1 - AN)
  – Writeset: W ∙ (N - 1)
$\mathrm{Load}_{MM}(N) = R \cdot rc + \frac{W}{1 - A_N} \cdot wc + W \cdot (N - 1) \cdot ws$
MM Service Demand
$\mathrm{Load}_{MM}(N) = R \cdot rc + \frac{W}{1 - A_N} \cdot wc + W \cdot (N - 1) \cdot ws$
$D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$
Explosive cost!
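As a rough illustration, the multi-master demand can be evaluated for a growing number of replicas, reading the slide's formula as D_MM(N) = Pr ∙ rc + Pw ∙ wc / (1 - AN) + Pw ∙ (N - 1) ∙ ws. The function name and the profile values below are hypothetical, and the abort probability is held fixed only to keep the sketch short.

```python
def mm_demand(n, pr, pw, rc, wc, ws, a_n):
    """Service demand in a multi-master system with n replicas.

    ws is the cost of applying one remote writeset; a_n is the abort
    probability of update transactions with n replicas.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= a_n < 1.0:
        raise ValueError("a_n must be in [0, 1)")
    local_work = pr * rc + pw * wc / (1.0 - a_n)
    # Writesets produced by the other (n - 1) replicas must be applied here.
    remote_writesets = pw * (n - 1) * ws
    return local_work + remote_writesets


# Hypothetical profile: 20% updates, costs in seconds, fixed abort rate.
for n in (1, 2, 4, 8, 16):
    d = mm_demand(n, pr=0.8, pw=0.2, rc=0.010, wc=0.020, ws=0.004, a_n=0.01)
    print(n, round(d, 5))
```

The local part stays constant in this sketch, while the writeset part grows linearly with the number of replicas.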
Compare: Standalone vs MM
• Standalone: $D(1) = Pr \cdot rc + \frac{Pw}{1 - A_1} \cdot wc$
• Multi-Master: $D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$
Explosive cost!
Readonly Workload
• Standalone: $D(1) = Pr \cdot rc + \frac{Pw}{1 - A_1} \cdot wc$
• Multi-Master: $D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$
Explosive cost!
Update Workload
• Standalone: $D(1) = Pr \cdot rc + \frac{Pw}{1 - A_1} \cdot wc$
• Multi-Master: $D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$
Explosive cost!
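The two workload slides can be worked through as limiting cases of the demand formulas, read here as D(1) = Pr ∙ rc + Pw ∙ wc / (1 - A1) and D_MM(N) = Pr ∙ rc + Pw ∙ wc / (1 - AN) + Pw ∙ (N - 1) ∙ ws. Treating "readonly" as Pw = 0 and "update" as Pr = 0 is an assumption made for this sketch.

```latex
% Readonly workload: Pr = 1, Pw = 0
D(1) = rc, \qquad D_{MM}(N) = rc

% Update workload: Pr = 0, Pw = 1
D(1) = \frac{wc}{1 - A_1}, \qquad
D_{MM}(N) = \frac{wc}{1 - A_N} + (N - 1) \cdot ws
```

In the readonly case the demand does not depend on N. In the update case the writeset term grows linearly with N.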
Closed-Loop Queuing Model
[Figure: closed-loop queuing network. Labels: Replica i (CPU, Disk), N replicas, TT = think time, LB = load balancer & network delay, Cert = certifier delay, Pw]
Mean Value Analysis (MVA)
• Standard algorithm
• Iterates over the number of clients
• Inputs:
  – Number of clients
  – Service demand at service centers
  – Delay time at delay centers
• Outputs:
  – Response time
  – Throughput
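The inputs and outputs listed above map onto the textbook single-class exact MVA recursion, sketched below. This is a generic version and not necessarily the exact variant the authors used. Separating think time from the other delay centers, and reporting response time without think time, are assumptions of this sketch.

```python
def mva(num_clients, service_demands, delay_times, think_time=0.0):
    """Exact single-class MVA for a closed queuing network.

    service_demands: demand at each queuing service center (e.g. CPU, disk)
    delay_times:     time at each delay center (no queuing)
    think_time:      client think time, kept out of the response time
    Returns (response_time, throughput) for num_clients clients.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    queue = [0.0] * len(service_demands)  # mean queue lengths with 0 clients
    fixed_delay = sum(delay_times)
    response = throughput = 0.0
    for n in range(1, num_clients + 1):
        # An arriving client sees the queue left by n - 1 clients.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        response = sum(residence) + fixed_delay
        throughput = n / (think_time + response)
        # Little's law gives the queue length at each service center.
        queue = [throughput * r for r in residence]
    return response, throughput


# Hypothetical inputs: CPU and disk demands plus one 1 ms delay center.
r, x = mva(50, service_demands=[0.012, 0.008], delay_times=[0.001], think_time=1.0)
print(r, x)
```

Because the loop visits every population from 1 up to the target, intermediate results could also be recorded to plot throughput against the number of clients.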
Using the Model
[Figure: closed-loop queuing network. Labels: Replica i (CPU, Disk), N replicas, TT = think time, LB = load balancer & network delay, Cert = certifier delay, Pw]
Standalone Profiling (Offline)
• Copy of database
• Log all txs, (Pr : Pw)
• Python script replays txs
  – Readonly (rc)
  – Updates (wc)
• Writesets
  – Instrument db with triggers
  – Play txs to log writesets
  – Play writesets (ws)
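Once the replay has produced per-transaction timings, the model parameters can be summarized along these lines. This is only a sketch: the input format (three plain lists of measured times) and the function name are hypothetical, and the replay and trigger instrumentation themselves are not shown.

```python
from statistics import fmean


def summarize_profile(read_times, update_times, writeset_times):
    """Turn replay measurements into the parameters Pr, Pw, rc, wc, ws.

    Each argument is a list of measured times (same unit) for readonly
    transactions, update transactions, and writeset applications.
    """
    total = len(read_times) + len(update_times)
    if total == 0:
        raise ValueError("need at least one replayed transaction")

    def avg(values):
        # An empty category contributes a zero cost.
        return fmean(values) if values else 0.0

    return {
        "Pr": len(read_times) / total,
        "Pw": len(update_times) / total,
        "rc": avg(read_times),
        "wc": avg(update_times),
        "ws": avg(writeset_times),
    }


# Hypothetical measurements in seconds.
profile = summarize_profile([0.010, 0.030], [0.020], [0.004])
print(profile)
```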
MM Service Demand
$D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$
Explosive cost!
Abort Probability
• Predicting abort probability is hard
• Single-master
  – No prediction needed
  – Measure offline on master
• Multi-master
  – Approximate using $(1 - A_N) = (1 - A_1)^{N \cdot CW(N) / L(1)}$
  – Sensitivity analysis in the paper
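A small sketch of the multi-master approximation, reading the slide's formula as (1 - AN) = (1 - A1)^(N ∙ CW(N) / L(1)). The slide does not define CW(N) or L(1). Treating CW(N) as the conflict window with N replicas and L(1) as the standalone update transaction execution time is an assumption here, as are the function name and the sample values.

```python
def mm_abort_probability(a1, n, cw_n, l1):
    """Approximate abort probability A_N from the standalone value A_1.

    a1:   abort probability measured on the standalone system
    n:    number of replicas
    cw_n: CW(N), assumed to be the conflict window with n replicas
    l1:   L(1), assumed to be the standalone update execution time
    """
    if not 0.0 <= a1 < 1.0:
        raise ValueError("a1 must be in [0, 1)")
    if n < 1 or cw_n < 0.0 or l1 <= 0.0:
        raise ValueError("need n >= 1, cw_n >= 0 and l1 > 0")
    exponent = n * cw_n / l1
    return 1.0 - (1.0 - a1) ** exponent


# Hypothetical values: 1% standalone abort rate, conflict window equal to L(1).
for n in (1, 4, 16):
    print(n, mm_abort_probability(a1=0.01, n=n, cw_n=0.05, l1=0.05))
```

With one replica and CW(1) equal to L(1), the expression returns A1 unchanged.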
Using the Model
[Figure: closed-loop queuing network. Labels: Replica i (CPU, Disk), N replicas, TT = think time, LB = load balancer & network delay, Cert = certifier delay, Pw. Annotated parameter values: # clients, think time; 1.5 ∙ fsync(); 1 ms]
Experimental Validation
• Compare
  – Measured performance vs model predictions
• Environment
  – Linux cluster running PostgreSQL
• TPC-W workload
  – Browsing (5% update txs)
  – Shopping (20% update txs)
  – Ordering (50% update txs)
• RUBiS workload
  – Browsing (0% update txs)
  – Bidding (20% update txs)
Multi-Master TPC-W Performance
[Figures: throughput and response time]
• Browsing, 5% update txs: 15.7 X
• Ordering, 50% update txs: 6.7 X
[Plot annotation: 15%]
Multi-Master RUBiS Performance
[Figures: throughput and response time]
• Browsing, 0% update txs: 16 X
• Bidding, 20% update txs: 3.4 X
Model Assumptions
• Database system
  – Snapshot isolation
  – No hotspots
  – Low abort rates
• Server system
  – Scalable server (no thrashing)
• Queuing model & MVA
  – Exponential distribution for service demands
Check Out the Paper
• Models
  – Single-Master
  – Multi-Master
• Experimental results
  – TPC-W
  – RUBiS
• Sensitivity analysis
  – Abort rates
  – Certifier delay
Related Work
Urgaonkar, Pacifici, Shenoy, Spreitzer, Tantawi. “An analytical model for multi-tier internet services and its applications.” Sigmetrics 2005.
Conclusions
• Derived an analytical model
  – Predicts workload scalability
• Implemented replicated systems
  – Multi-master
  – Single-master
• Experimental validation
  – TPC-W
  – RUBiS
  – Throughput predictions match within 15%
Thank You!
• Questions?