PLATO: Predictive Latency-Aware Total Ordering


Page 1: PLATO: Predictive Latency-Aware Total Ordering

PLATO: Predictive Latency-Aware Total Ordering

Mahesh Balakrishnan

Ken Birman

Amar Phanishayee

Page 2: PLATO: Predictive Latency-Aware Total Ordering

Total Ordering

a.k.a. Atomic Broadcast

Delivering messages to a set of nodes in the same order

Messages arrive at nodes in different orders…

Nodes agree on a single delivery order

Messages are delivered at nodes in the agreed order

Page 3: PLATO: Predictive Latency-Aware Total Ordering

Modern Datacenters

Applications: E-tailers, Finance, Aerospace

Service-Oriented Architectures, Publish-Subscribe, Distributed Objects, Event Notification…

… Totally Ordered Multicast!

Hardware: fast high-capacity networks, failure-prone commodity nodes

Page 4: PLATO: Predictive Latency-Aware Total Ordering

Total Ordering in a Datacenter

[Figure: a Replicated Service made of Inventory Service Replica 1 and Inventory Service Replica 2. Both replicas receive Queries, Update 1 and Update 2. Updates are Totally Ordered.]

Totally Ordered Multicast is used to consistently update Replicated Services

Latency of Multicast → System Consistency

Requirement: order multicasts consistently, rapidly, robustly

Page 5: PLATO: Predictive Latency-Aware Total Ordering

Multicast Wishlist

Low Latency!

High (stable) throughput: minimal, proactive overheads

Leverage hardware properties: HW Multicast/Broadcast is fast, unreliable

Handle varying data rates: datacenter workloads have sharp spikes… and extended troughs!

Page 6: PLATO: Predictive Latency-Aware Total Ordering

State-of-the-Art

Traditional Protocols: conservative, Latency-Overhead tradeoff

Example: Fixed Sequencer. Simple, works well

Optimistic Total Ordering: deliver optimistically, rollback if incorrect

Why this works: no out-of-order arrival in LANs

Optimistic total ordering for datacenters?
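The fixed-sequencer baseline named above can be sketched in its textbook form: one node hands out consecutive sequence numbers, and every receiver delivers strictly in sequence-number order. This is a minimal illustration, not the implementation measured in the talk; the class and method names (`Sequencer`, `Receiver`, `on_message`, `on_order`) are hypothetical.

```python
from typing import Dict, List, Tuple


class Sequencer:
    """Hypothetical fixed sequencer: assigns consecutive sequence numbers."""

    def __init__(self) -> None:
        self.next_seq = 0

    def assign(self, msg_id: str) -> Tuple[int, str]:
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg_id


class Receiver:
    """Delivers a message only once its sequence number is the next one due."""

    def __init__(self) -> None:
        self.payloads: Dict[str, str] = {}  # arrived, not yet delivered
        self.order: Dict[int, str] = {}     # seq -> msg_id from the sequencer
        self.next_seq = 0
        self.delivered: List[str] = []

    def on_message(self, msg_id: str, payload: str) -> None:
        self.payloads[msg_id] = payload
        self._drain()

    def on_order(self, seq: int, msg_id: str) -> None:
        self.order[seq] = msg_id
        self._drain()

    def _drain(self) -> None:
        # Deliver while both the ordering info and the payload are present.
        while (self.next_seq in self.order
               and self.order[self.next_seq] in self.payloads):
            msg_id = self.order.pop(self.next_seq)
            self.delivered.append(self.payloads.pop(msg_id))
            self.next_seq += 1


sequencer = Sequencer()
ordering = [sequencer.assign("m1"), sequencer.assign("m2")]

r1, r2 = Receiver(), Receiver()
# The two receivers see the data messages in opposite arrival orders.
r1.on_message("m1", "update 1")
r1.on_message("m2", "update 2")
r2.on_message("m2", "update 2")
r2.on_message("m1", "update 1")
for seq, msg_id in ordering:
    r1.on_order(seq, msg_id)
    r2.on_order(seq, msg_id)
```

Both receivers end up with the same delivery order, but no message is delivered before its ordering information arrives, which is the latency cost an optimistic protocol tries to avoid.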

Page 7: PLATO: Predictive Latency-Aware Total Ordering

PLATO: Predictive Ordering

In a datacenter, broadcast / multicast occurs almost instantaneously

Most of the time, messages arrive in the same order at all nodes.

Some of the time, messages arrive in different orders at different nodes.

Can we predict out-of-order arrival?

Page 8: PLATO: Predictive Latency-Aware Total Ordering

Reasons for Disorder: Swaps

[Figure, four animation frames: Sender 1 and Sender 2 multicast through two Switches to Receiver 1 and Receiver 2. One receiver receives Sender 1's message after Sender 2's message; the other receives Sender 2's message after Sender 1's message.]

Out-of-order arrival can occur when the inter-send interval between two messages is smaller than the diameter of the network

Typical Datacenter Diameter: 50-500 microseconds
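The swap condition above can be checked with a small worked example. The sketch below assumes two senders and two receivers with made-up one-way latencies (50 and 200 microseconds); these numbers are illustrative only and are not measurements from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical one-way latencies in microseconds: each sender is close to
# one receiver and far from the other.
LATENCY_US: Dict[Tuple[str, str], float] = {
    ("sender1", "receiver1"): 50.0,
    ("sender1", "receiver2"): 200.0,
    ("sender2", "receiver1"): 200.0,
    ("sender2", "receiver2"): 50.0,
}


def arrival_order(send_times_us: Dict[str, float], receiver: str) -> List[str]:
    """Return sender names in the order their messages reach `receiver`."""
    arrivals = {
        sender: sent + LATENCY_US[(sender, receiver)]
        for sender, sent in send_times_us.items()
    }
    return sorted(arrivals, key=lambda sender: arrivals[sender])


def receivers_disagree(inter_send_gap_us: float) -> bool:
    """True if the two receivers see the two messages in different orders."""
    send_times = {"sender1": 0.0, "sender2": inter_send_gap_us}
    return (arrival_order(send_times, "receiver1")
            != arrival_order(send_times, "receiver2"))


# Gap smaller than the latency spread: the receivers disagree.
swapped_at_100us = receivers_disagree(100.0)
# Gap larger than the latency spread: both receivers see sender1 first.
swapped_at_600us = receivers_disagree(600.0)
```

With a 100 microsecond gap, receiver2 sees sender2's message at 150 and sender1's at 200, while receiver1 sees them at 300 and 50, so the orders differ. With a 600 microsecond gap both receivers see sender1 first.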

Page 9: PLATO: Predictive Latency-Aware Total Ordering

Reasons for Disorder: Loss

[Figure, four animation frames: order of arrivals into user-space over time t, for packets A through H]

Datacenter networks are over-provisioned: loss never occurs in the network

Datacenter nodes are cheap: loss occurs due to end-host buffer overflows caused by CPU contention

Page 10: PLATO: Predictive Latency-Aware Total Ordering

Emulab Testbed (Utah)

[Figure: The Utah Emulab Testbed. Cisco 6509 and Cisco 6513 switches. Link speeds shown: 100 Mb, 1 Gb, 4 Gb, 8 Gb. Node speeds shown: 600 MHz, 850 MHz, 2 GHz, 3 GHz.]

Emulab3 test scenario: 3 switches of separation. One-way ping latency: ~110 microseconds

Emulab2 test scenario: 2 switches of separation. One-way ping latency: ~100 microseconds

Page 11: PLATO: Predictive Latency-Aware Total Ordering

Cornell Testbed

[Figure: The Cornell Testbed. HP Procurve 4000M and HP Procurve 6108 switches. Link speeds shown: 100 Mb, 1 Gb. Node speed shown: 1.3 GHz.]

Cornell3 test scenario: 3 switches of separation. One-way ping latency: ~70 microseconds

Cornell5 test scenario: 5 switches of separation. One-way ping latency: ~110 microseconds

Page 12: PLATO: Predictive Latency-Aware Total Ordering

Disorder: Emulab3

At 2800 packets per sec, 2% of all packet pairs are swapped and 0.5% of packets are lost.

Percentage of swaps and losses goes up with data rate

Page 13: PLATO: Predictive Latency-Aware Total Ordering

Disorder

Page 14: PLATO: Predictive Latency-Aware Total Ordering

Predicting Disorder

Predictor: Inter-arrival time of consecutive packets into user-space

Why?

Swaps: simultaneous multicasts → low inter-arrival time

Loss: kernel buffer overflow → sequence of low inter-arrival times
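The predictor described above can be sketched as a function over user-space arrival timestamps. This is an illustration under assumptions: the threshold name `delta_us`, the run length `burst_len` used to flag possible loss, and the sample timestamps are all hypothetical and do not come from the slides.

```python
from typing import List, Tuple


def inter_arrival_times(arrivals_us: List[float]) -> List[float]:
    """Gaps between consecutive arrivals, in microseconds."""
    return [later - earlier
            for earlier, later in zip(arrivals_us, arrivals_us[1:])]


def predict_disorder(arrivals_us: List[float],
                     delta_us: float,
                     burst_len: int) -> Tuple[List[int], bool]:
    """Flag possible swaps and possible loss from inter-arrival times.

    Returns the indices of packets that arrived less than `delta_us` after
    their predecessor (swap suspects), and whether at least `burst_len`
    consecutive low gaps were seen (a hint of buffer overflow and loss).
    """
    gaps = inter_arrival_times(arrivals_us)
    swap_suspects = [i + 1 for i, gap in enumerate(gaps) if gap < delta_us]

    loss_suspected = False
    run = 0
    for gap in gaps:
        run = run + 1 if gap < delta_us else 0
        if run >= burst_len:
            loss_suspected = True
    return swap_suspects, loss_suspected


# Made-up arrival timestamps: one isolated close pair, then a tight burst.
sample_us = [0.0, 1000.0, 1050.0, 3000.0, 3010.0, 3020.0, 3030.0, 5000.0]
suspects, loss = predict_disorder(sample_us, delta_us=128.0, burst_len=3)
```

In the sample, packet 2 follows packet 1 by 50 microseconds and is a swap suspect on its own, while packets 4 to 6 form a run of three low gaps, which also raises the loss flag.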

Page 15: PLATO: Predictive Latency-Aware Total Ordering

Predicting Disorder

95% of swaps and 14% of all pairs have inter-arrival times within 128 µsecs

[Figure: inter-arrival time of swaps vs. inter-arrival time of all pairs. Cornell Datacenter, 400 multicasts/sec]

Page 16: PLATO: Predictive Latency-Aware Total Ordering

Predicting Disorder

Page 17: PLATO: Predictive Latency-Aware Total Ordering

PLATO Design

Heuristic: If two packets arrive within Δ µsecs, possibility of disorder

PLATO: Heuristic + Lazy Fixed Sequencer

Heuristic works → ~zero (Δ) latency

Heuristic fails → fixed sequencer latency

Page 18: PLATO: Predictive Latency-Aware Total Ordering

PLATO Design

API: optdeliver, confirm, revoke

Ordering Layer:

Pending Queue: Packets suspected to be out-of-order, or queued behind suspected packets

Suspicious Queue: Packets optdelivered to the application, not yet confirmed

Page 19: PLATO: Predictive Latency-Aware Total Ordering

PLATO Design

[Figure: timeline (t) of the suspicious and pending queues as packets A, E, B, D and C arrive. Underlined packets in pending are suspected.]

Arrival checks: TE-TA > DELTA, TC-TD < DELTA

optdeliver(A), optdeliver(E), optdeliver(B), optdeliver(D)

revoke(D), setsuspect(D), setsuspect(C)

Seq MsgOrder: ABCD

revoke(E), setsuspect(E)

confirm(A, B, C, D)
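One possible reading of the queue walkthrough above is sketched below. It is an interpretation, not the PLATO implementation: the class `OrderingLayer`, its methods, the arrival times and the exact revocation rules are assumptions, and only the upcall names `optdeliver`, `revoke` and `confirm` come from the slides. The sketch also assumes every packet named in a sequencer order has already arrived.

```python
from typing import List, Optional, Tuple


class OrderingLayer:
    """Hypothetical ordering layer with a suspicious and a pending queue."""

    def __init__(self, delta_us: float) -> None:
        self.delta_us = delta_us
        self.suspicious: List[str] = []  # optdelivered, not yet confirmed
        self.pending: List[str] = []     # suspected, or queued behind suspects
        self.last_pkt: Optional[str] = None
        self.last_arrival_us: Optional[float] = None
        self.events: List[Tuple[str, object]] = []  # upcalls to the application

    def on_packet(self, pkt: str, arrival_us: float) -> None:
        prev_us = self.last_arrival_us
        if prev_us is not None and arrival_us - prev_us < self.delta_us:
            # Close arrival: suspect this packet and its predecessor.
            if self.suspicious and self.suspicious[-1] == self.last_pkt:
                revoked = self.suspicious.pop()
                self.events.append(("revoke", revoked))
                self.pending.append(revoked)
            self.pending.append(pkt)
        elif self.pending:
            # Not suspected itself, but queued behind suspected packets.
            self.pending.append(pkt)
        else:
            self.suspicious.append(pkt)
            self.events.append(("optdeliver", pkt))
        self.last_pkt = pkt
        self.last_arrival_us = arrival_us

    def on_sequencer_order(self, order: List[str]) -> None:
        # Revoke optimistic deliveries that conflict with the agreed order.
        matched = 0
        for pkt in list(self.suspicious):
            if matched < len(order) and pkt == order[matched]:
                matched += 1
            else:
                self.suspicious.remove(pkt)
                self.pending.append(pkt)
                self.events.append(("revoke", pkt))
        # Deliver the rest of the agreed order out of the pending queue.
        for pkt in order[matched:]:
            self.pending.remove(pkt)
            self.events.append(("optdeliver", pkt))
        self.suspicious = [p for p in self.suspicious if p not in order]
        self.events.append(("confirm", tuple(order)))


layer = OrderingLayer(delta_us=128.0)
# Made-up arrival times: only C arrives within delta of its predecessor D.
for pkt, t_us in [("A", 0.0), ("E", 500.0), ("B", 1000.0),
                  ("D", 1500.0), ("C", 1550.0)]:
    layer.on_packet(pkt, t_us)
layer.on_sequencer_order(["A", "B", "C", "D"])
```

In this run D is revoked when C arrives 50 microseconds after it, E is revoked when the sequencer order shows it was delivered too early, and E stays in the pending queue until a later sequencer order covers it.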

Page 20: PLATO: Predictive Latency-Aware Total Ordering

Performance

[Figure: Fixed Sequencer vs. PLATO]

At small values of Δ, very low latency of delivery but more rollbacks

Page 21: PLATO: Predictive Latency-Aware Total Ordering

Performance

Latency of both Fixed-Sequencer and PLATO decreases as throughput increases

Page 22: PLATO: Predictive Latency-Aware Total Ordering

Performance

Traffic Spike: PLATO is insensitive to data rate, while Fixed Sequencer depends on data rate

Page 23: PLATO: Predictive Latency-Aware Total Ordering

Performance

Δ is varied adaptively in reaction to rollbacks

Latency is as good as static Δ parameterization
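The slides do not say which rule is used to adapt Δ. As one hypothetical possibility, the sketch below widens Δ multiplicatively after a rollback and narrows it slowly after each confirmed delivery; the class name, bounds, growth factor and step size are all assumptions chosen for illustration.

```python
class AdaptiveDelta:
    """Hypothetical controller for the threshold, in microseconds."""

    def __init__(self,
                 initial_us: float = 128.0,
                 min_us: float = 16.0,
                 max_us: float = 1024.0,
                 grow: float = 2.0,
                 shrink_us: float = 1.0) -> None:
        self.delta_us = initial_us
        self.min_us = min_us
        self.max_us = max_us
        self.grow = grow
        self.shrink_us = shrink_us

    def on_rollback(self) -> None:
        # A rollback means the threshold was too small: be more cautious.
        self.delta_us = min(self.max_us, self.delta_us * self.grow)

    def on_confirm(self) -> None:
        # A confirmed optimistic delivery lets the threshold drift down.
        self.delta_us = max(self.min_us, self.delta_us - self.shrink_us)


controller = AdaptiveDelta()
controller.on_rollback()   # 128 -> 256
after_rollback = controller.delta_us
controller.on_confirm()    # 256 -> 255
after_confirm = controller.delta_us
```

The asymmetry is deliberate in this sketch: a smaller Δ lowers delivery latency but causes more rollbacks, so the threshold backs off quickly and recovers slowly.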

Page 24: PLATO: Predictive Latency-Aware Total Ordering

Conclusion

First optimistic total order protocol that predicts out-of-order delivery

Slashes ordering latency in datacenter settings

Stable at varying loads

Ordering layer of a time-critical protocol stack for Datacenters