EyeQ: (An engineer's approach to)
Taming network performance unpredictability in the Cloud
Vimal, Mohammad Alizadeh,
Balaji Prabhakar, David Mazières,
Changhoon Kim, Albert Greenberg
What are we depending on?
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
5 Lessons We’ve Learned Using AWS
… in the Netflix data centers, we have a high capacity, super fast, highly reliable network. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency.
Overhaul apps to deal with variability.
Many customers don't even realise network issues:
Just "spin up more VMs!" makes the app more network dependent.
Cloud: Warehouse Scale Computer
Multi-tenancy: to increase cluster utilisation
6/11/12
http://research.google.com/people/jeff/latency.html
Provisioning the Warehouse
CPU, memory, disk
Network
Sharing the Network
• Policy
  – Sharing model
• Mechanism
  – Computing rates
  – Enforcing rates on entities…
    • Per-VM (multi-tenant)
    • Per-service (search, map-reduce, etc.)
Can we achieve this?
[Figure: per-VM guarantee of 2 GHz VCPU, 15 GB memory, 1 Gbps network.
Tenant X's and Tenant Y's VMs (VM1, VM2, VM3, … VMn) each attach to their tenant's virtual switch.]
Customer X specifies the thickness of each pipe. No traffic matrix. (Hose Model)
Why is it hard? (1)
• Bandwidth demands can be…
  – Random, bursty
  – Short: few-millisecond requests
• Timescales matter!
  – Need guarantees on the order of a few RTTs (ms)
• Default policy insufficient: 1 vs. many TCP flows, UDP, etc.
• Poor scalability of traditional QoS mechanisms
(Flow sizes range from 10–100 KB requests to 10–100 MB transfers.)
Seconds: Eternity
[Figure: a single long-lived TCP flow and a bursty UDP session (ON: 5 ms, OFF: 15 ms) share a 10G pipe through one switch.]
Under the hood
Why is it hard? (2)
• Switch sees contention, but lacks VM state
• Receiver-host has VM state, but does not see contention
(1) Drops in network: servers don’t see true demand
(2) Elusive TCP (back-off) makes true demand detection harder
Key Idea: Bandwidth Headroom
• Bandwidth guarantees: managing congestion
• Congestion: link utilisation reaches 100%
  – At millisecond timescales
• Don't allow 100% utilisation
  – 10% headroom: early detection at the receiver
[Figure: single switch with N x 10G links; TCP and UDP share a pipe rate-limited to 9G. Headroom at a single switch.]
What about a network?
Network design: the old
http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/
Over-subscription
Network design: the new
(1) Uniform capacity across racks
(2) Over-subscription only at the Top-of-Rack
Mitigating Congestion in a Network
Load balancing + Admissibility = Hotspot-free network core
[VL2, FatTree, Hedera, MicroTE]
[Figure: each server has a 10 Gbps pipe into the fabric. When the aggregate rate into a server exceeds 10 Gbps, the fabric gets congested; when the aggregate rate stays under 10 Gbps, the fabric is congestion free.]
Load balancing: ECMP, etc.
Admissibility: end-to-end congestion control (EyeQ)
EyeQ Platform
[Figure: EyeQ datapath between two servers across the data centre fabric.]
• TX side: a software vswitch with adaptive rate limiters shapes packets from untrusted VMs before they enter the fabric.
• RX side: a software vswitch with congestion detectors watches arriving traffic (e.g. 3 Gbps and 6 Gbps streams) on behalf of untrusted VMs.
• The RX component detects congestion; the TX component reacts.
• End-to-end flow control (vswitch to vswitch) via congestion feedback.
Does it work?
[Figure: without EyeQ vs. with EyeQ. EyeQ improves utilisation and provides protection: TCP receives its 6 Gbps share and UDP its 3 Gbps share.]
State: only at the edge
EyeQ makes the network look like one big switch.
EyeQ: Load balancing
+ Bandwidth headroom
+ Admissibility at millisecond timescales
= Network as one big switch
= Bandwidth sharing at the edge
Linux and Windows implementation for 10 Gbps
~1700 lines of C code
http://github.com/jvimal/perfiso_10g (Linux kmod)
No documentation, yet.
Top Related