Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd,...

35
Gryff: Unifying Consensus and Shared Registers Matthew Burke Audrey Cheng Wyatt Lloyd Cornell University Princeton University 1

Transcript of Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd,...

Page 1: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Gryff: Unifying Consensus and Shared Registers

Matthew Burke Audrey Cheng Wyatt Lloyd

Cornell University Princeton University

1

Page 2: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Applications Rely on Geo-Replicated Storage• Fault tolerant: data is safe despite failures

2

Client

Page 3: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Applications Rely on Geo-Replicated Storage• Fault tolerant: data is safe despite failures

• Linearizable: intuitive for application developers

3

Page 4: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Linearizable Replicated Storage Systems

4

Cloud Spanner

Page 5: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Status Quo: Consensus or Shared Registers

Consensus Shared Registers

Strong Synchronization ✔ ✘Low ReadTail Latency ✘ ✔

5

• Given the desire for fault tolerance and linearizability

Unify consensus and shared registers?

Page 6: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Consensus & State Machine Replication (SMR)• Generic interface: Command(c(.))

• Stable ordering: all preceding log positions are assigned commands

6

c1 c2 c3 c4

Page 7: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Consensus & State Machine Replication (SMR)• Generic interface: Command(c(.))

• Stable ordering: all preceding log positions are assigned commands

• Used in etcd, CockroachDB, Spanner, Azure Storage, Chubby

6

c1 c2 c3 c4

Page 8: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

SMR Requires Stable Order• Allow for strong synchronization primitives like read-modify-writes

• High tail latency in practice (e.g., by serializing through a leader)

7

Consensus

Strong Synchronization ✔Low ReadTail Latency ✘

Page 9: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Shared Registers• Simple interface: Read()/Write(v)

• Unstable ordering: total order without pre-defined positions

8

w1 < w2 < w3 < w4

Page 10: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Shared Registers• Simple interface: Read()/Write(v)

• Unstable ordering: total order without pre-defined positions

8

w1 < w2 < w3 < w4

w5

Page 11: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Shared Registers• Simple interface: Read()/Write(v)

• Unstable ordering: total order without pre-defined positions

• Similar to Cassandra, Dynamo, Riak

8

w1 < w2 < w3 < w4w5 <

Page 12: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Shared Registers Use Unstable Order• Cannot implement strong synchronization primitives [Herlihy91]

• Flexibility of unstable order provides favorable tail latency

9

Consensus Shared Registers

Strong Synchronization ✔ ✘Low ReadTail Latency ✘ ✔

Page 13: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

RMWs with low read tail latency?

Shared Objects: Interface for Unification• Interface: Read()/Write(v)/RMW(f(.))

• RMW(f(.))→ read base v, compute new value f(v), write f(v)

• Examples: etcd, Redis, BigTable

10

Page 14: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Consensus-after-Register Timestamps (Carstamps)

11

rmw1 rmw3 rmw4

rmw2

Unstable Order

StableOrder w1 w2 w3 w4< < <

Page 15: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Consensus-after-Register Timestamps (Carstamps)

11

rmw1 rmw3 rmw4

rmw2

Unstable Order

StableOrder w1 w2 w3 w4w5 << < < w6<

Page 16: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Carstamps• Tuple with three fields: (ts, id, rmwc)

• ts and id basis for unstable ordering of writes

• rmwc is set to 1 greater than rmwc of base to ensure stable ordering

12

w1 < w2

rmw1 rmw3

(3,1,0)

(3,1,1)

(4,1,0)

(4,1,1)

rmw2(3,1,2)

Page 17: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Consensus Shared Registers

Gryff

Strong Synchronization ✔ ✘ ✔Low ReadTail Latency ✘ ✔ ✔

Gryff Unifies Consensus and Shared Registers• Only uses consensus when necessary, for strong synchronization

13

Page 18: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Gryff Design• Combine multi-writer [LS97] ABD [ABD95] & EPaxos [MAK13]

• Modifications needed for safety:• Carstamps for proper ordering

• Synchronous Commit phase for rmws

• Modifications for better read tail latency:• Early termination for reads (fast path)

• Proxy optimization for reads (fast path more often)

14

See the paper for details!

Page 19: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Gryff in Action

15

Page 20: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Gryff in Action

15

(2,3,0) (1,0,0) (2,3,0)

Page 21: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Gryff in Action

15

c1

(2,3,0) (1,0,0) (2,3,0)

w1→ (3,1,0)

Writes always terminate in 2 phases

Page 22: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Executed (3,1,1)

(3,1,1)(3,1,1)

Gryff in Action

18

c1

(2,3,0)

c2

rmw1→ (3,1,1)

Writes always terminate in 2 phases

RMW carstamps directly after base

Page 23: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Read1Reply (3,1,1)

(3,1,1)(3,1,1)

Gryff in Action

19

c1

(2,3,0)

c2

r → (3,1,1)

Writes always terminate in 2 phases

RMW carstamps directly after base

Reads often terminate in 1 phase

Page 24: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Evaluation

Relative to state-of-the-art-consensus protocols:

1. How do Gryff’s read/write protocols affect read tail latency?

2. What is the latency distribution of Gryff’s reads, writes, and rmws?

3. What maximum throughput does Gryff achieve?

4. How does Gryff perform in tail-at-scale workloads?

20

Page 25: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Evaluation

Relative to state-of-the-art-consensus protocols:

1. How do Gryff’s read/write protocols affect read tail latency?

2. What is the latency distribution of Gryff’s reads, writes, and rmws?

3. What maximum throughput does Gryff achieve?

4. How does Gryff perform in tail-at-scale workloads?

21

Page 26: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Evaluation Setup

22

• Geo-replication with 3 regions

• Baselines: MultiPaxos (industry standard), EPaxos (leaderless)

not-to-scale ocean

72ms 88ms

Page 27: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Read Tail Latency (94.5% R, 4.5% W, 1% RMW, 25% Conflicts)

23

0

0.2

0.4

0.6

0.8

1

60 120 180 240

Fra

ctio

n o

f R

ead

s

Latency (ms)

MultiPaxos

Page 28: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

0

0.2

0.4

0.6

0.8

1

60 120 180 240

Fra

ctio

n o

f R

ead

s

Latency (ms)

MultiPaxos

Read Tail Latency (94.5% R, 4.5% W, 1% RMW, 25% Conflicts)

24

Page 29: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

0

0.2

0.4

0.6

0.8

1

60 120 180 240

Fra

ctio

n o

f R

ead

s

Latency (ms)

MultiPaxos

Read Tail Latency (94.5% R, 4.5% W, 1% RMW, 25% Conflicts)

24

serializing through far-away leader

Page 30: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

0

0.2

0.4

0.6

0.8

1

60 120 180 240

Fra

ctio

n o

f R

ead

s

Latency (ms)

MultiPaxos

EPaxos

Read Tail Latency (94.5% R, 4.5% W, 1% RMW, 25% Conflicts)

25

Page 31: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

0

0.2

0.4

0.6

0.8

1

60 120 180 240

Fra

ctio

n o

f R

ead

s

Latency (ms)

MultiPaxos

EPaxos

Read Tail Latency (94.5% R, 4.5% W, 1% RMW, 25% Conflicts)

25

delaying reads that conflict with concurrent writes

Page 32: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

0

0.2

0.4

0.6

0.8

1

60 120 180 240

Fra

ctio

n o

f R

ead

s

Latency (ms)

MultiPaxos

EPaxos

Gryff

Read Tail Latency (94.5% R, 4.5% W, 1% RMW, 25% Conflicts)

26

Page 33: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

0

0.2

0.4

0.6

0.8

1

60 120 180 240

Fra

ctio

n o

f R

ead

s

Latency (ms)

MultiPaxos

EPaxos

Gryff

Read Tail Latency (94.5% R, 4.5% W, 1% RMW, 25% Conflicts)

26

1 round to nearest majority in tail

Page 34: Gryff: Unifying Consensus and Shared Registersmatthelb/talks/gryff-nsdi20-talk.pdf•Examples: etcd, Redis, BigTable 10. Consensus-after-Register Timestamps (Carstamps) 11 rmw 1 rmw

Summary• Consensus: strong synchronization w/ high tail latency

Shared registers: low tail latency w/o strong synchronization

• Carstamps stably order read-modify-writes within a more efficient unstable order for reads and writes

• Gryff unifies an optimized shared register protocol with a state-of-the-art consensus protocol using carstamps

• Gryff provides strong synchronization w/ low read tail latency

27