15-744: Computer Networking L-14 Data Center Networking II.


Transcript of 15-744: Computer Networking L-14 Data Center Networking II.

Page 1: 15-744: Computer Networking L-14 Data Center Networking II.

15-744: Computer Networking

L-14 Data Center Networking II

Page 2: 15-744: Computer Networking L-14 Data Center Networking II.

Overview

• Data Center Topologies

• Data Center Scheduling


Page 3: 15-744: Computer Networking L-14 Data Center Networking II.

Current solutions for increasing data center network bandwidth (FatTree, BCube):

1. Hard to construct
2. Hard to expand

Page 4: 15-744: Computer Networking L-14 Data Center Networking II.

Background: Fat-Tree
• Interconnect racks (of servers) using a fat-tree topology
• Fat-tree: a special type of Clos network (after C. Clos)

K-ary fat tree: three-layer topology (edge, aggregation, and core)
• each pod consists of (k/2)^2 servers & 2 layers of k/2 k-port switches
• each edge switch connects to k/2 servers & k/2 aggregation switches
• each aggregation switch connects to k/2 edge & k/2 core switches
• (k/2)^2 core switches: each connects to k pods

Fat-tree with k = 4
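To make the sizing arithmetic above concrete, here is a minimal Python sketch (illustrative, not from the slides) that computes the component counts of a k-ary fat-tree from the bullets above.

# Sketch: component counts of a k-ary fat-tree, following the bullets above.
# Assumes k is even and every switch has exactly k ports (uniform capacity).

def fat_tree_sizes(k: int) -> dict:
    assert k % 2 == 0, "k must be even for a k-ary fat-tree"
    half = k // 2
    return {
        "pods": k,                        # k pods
        "edge_switches": k * half,        # k/2 edge switches per pod
        "aggregation_switches": k * half, # k/2 aggregation switches per pod
        "core_switches": half ** 2,       # (k/2)^2 core switches
        "servers_per_pod": half ** 2,     # (k/2)^2 servers per pod
        "servers_total": (k ** 3) // 4,   # k^3 / 4 servers overall
    }

if __name__ == "__main__":
    print(fat_tree_sizes(4))  # the k = 4 example in the figure: 16 servers
    print(fat_tree_sizes(6))  # k = 6: 54 servers, matching the next slide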

Page 5: 15-744: Computer Networking L-14 Data Center Networking II.

Why Fat-Tree?
• A fat tree has identical bandwidth at any bisection
• Each layer has the same aggregate bandwidth
• Can be built using cheap devices with uniform capacity
  • Each port supports the same speed as an end host
  • All devices can transmit at line speed if packets are distributed uniformly along available paths
• Great scalability: a k-port switch supports k^3/4 servers

Fat-tree network with k = 6 supporting 54 hosts

Page 6: 15-744: Computer Networking L-14 Data Center Networking II.

An alternative: hybrid packet/circuit switched data center network

Goal of this work:
– Feasibility: software design that enables efficient use of optical circuits
– Applicability: application performance over a hybrid network

Page 7: 15-744: Computer Networking L-14 Data Center Networking II.

Optical circuit switching vs. electrical packet switching

• Switching technology: electrical = store and forward; optical = circuit switching (e.g., MEMS optical switch)
• Switching capacity: electrical = 16x40 Gbps at the high end (e.g., Cisco CRS-1); optical = 320x100 Gbps on the market (e.g., Calient FiberConnect)
• Switching time: electrical = packet granularity; optical = less than 10 ms

Page 8: 15-744: Computer Networking L-14 Data Center Networking II.

Optical Circuit Switch

2010-09-02 SIGCOMM, Nathan Farrington

(Figure: inside a MEMS optical circuit switch: glass fiber bundle, lenses, a fixed mirror, and mirrors on motors that rotate to steer light from an input port to an output port.)

• Does not decode packets
• Takes time to reconfigure

Page 9: 15-744: Computer Networking L-14 Data Center Networking II.

Optical circuit switching vs. electrical packet switching (continued)

• Switching technology: electrical = store and forward; optical = circuit switching
• Switching capacity: electrical = 16x40 Gbps at the high end (e.g., Cisco CRS-1); optical = 320x100 Gbps on the market (e.g., Calient FiberConnect)
• Switching time: electrical = packet granularity; optical = less than 10 ms
• Switching traffic: electrical = for bursty, uniform traffic; optical = for stable, pair-wise traffic

Page 10: 15-744: Computer Networking L-14 Data Center Networking II.

Optical circuit switching is promising despite slow switching time

• [IMC09][HotNets09]: “Only a few ToRs are hot and most of their traffic goes to a few other ToRs. …”

• [WREN09]: “…we find that traffic at the five edge switches exhibit an ON/OFF pattern… ”


Full bisection bandwidth at packet granularity may not be necessary

Page 11: 15-744: Computer Networking L-14 Data Center Networking II.

Hybrid packet/circuit switched network architecture

Optical circuit-switched network for high capacity transfer

Electrical packet-switched network for low latency delivery

Optical paths are provisioned rack-to-rack
– A simple and cost-effective choice
– Aggregate traffic on a per-rack basis to better utilize optical circuits

Page 12: 15-744: Computer Networking L-14 Data Center Networking II.

Design requirements

Control plane:
– Traffic demand estimation
– Optical circuit configuration

Data plane:
– Dynamic traffic de-multiplexing
– Optimizing circuit utilization (optional)

Page 13: 15-744: Computer Networking L-14 Data Center Networking II.

c-Through (a specific design)

• No modification to applications and switches
• Leverage end hosts for traffic management
• Centralized control for circuit configuration

Page 14: 15-744: Computer Networking L-14 Data Center Networking II.

c-Through - traffic demand estimation and traffic batching

(Figure: applications write into per-destination socket buffers on each host; buffer occupancy is read off to form a per-rack traffic demand vector.)

1. Transparent to applications.
2. Packets are buffered per-flow to avoid head-of-line (HOL) blocking.

Accomplishes two requirements:
– Traffic demand estimation
– Pre-batch data to improve optical circuit utilization
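A minimal sketch of the estimation idea on this slide, under the assumption that per-destination socket-buffer occupancy is what gets summed into the per-rack demand vector (the helper names and numbers below are hypothetical):

# Sketch: build a per-rack traffic demand vector from per-flow socket-buffer
# occupancy, as described on this slide. The inputs are hypothetical; the real
# system reads occupancy from (enlarged) kernel socket buffers transparently.

from collections import defaultdict

def rack_of(host: str) -> str:
    """Hypothetical mapping from a host name to its rack."""
    return host.split("-")[0]          # e.g. "rack3-server12" -> "rack3"

def per_rack_demand(buffered_bytes: dict[str, int]) -> dict[str, int]:
    """buffered_bytes: destination host -> bytes waiting in its socket buffer."""
    demand: dict[str, int] = defaultdict(int)
    for dst_host, nbytes in buffered_bytes.items():
        demand[rack_of(dst_host)] += nbytes   # aggregate per destination rack
    return dict(demand)

# Example: this host has lots of pre-batched data for rack3, a little for rack7.
print(per_rack_demand({"rack3-server12": 8_000_000,
                       "rack3-server14": 4_000_000,
                       "rack7-server02": 200_000}))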

Page 15: 15-744: Computer Networking L-14 Data Center Networking II.

c-Through - optical circuit configuration

Use Edmonds’ algorithm to compute the optimal configuration (a maximum-weight matching of racks)

Many ways to reduce the control traffic overhead

(Figure: hosts report traffic demands to a central controller, which computes and pushes the circuit configuration.)
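Circuit configuration amounts to a maximum-weight matching over racks, with edge weights given by the estimated rack-to-rack demand, which Edmonds' algorithm solves optimally. A minimal sketch using networkx's matching routine (the demand numbers are made up):

# Sketch: pick which rack pairs get optical circuits by solving a
# maximum-weight matching over the rack-to-rack demand.
# networkx's max_weight_matching is a blossom-algorithm implementation.

import networkx as nx

# Hypothetical demand: (rack_a, rack_b) -> bytes of pending traffic.
demand = {
    ("rack1", "rack2"): 9_000_000,
    ("rack1", "rack3"): 1_000_000,
    ("rack2", "rack4"): 2_000_000,
    ("rack3", "rack4"): 7_000_000,
}

g = nx.Graph()
for (a, b), w in demand.items():
    g.add_edge(a, b, weight=w)

# Each rack has one optical port, so the circuit assignment is a matching.
circuits = nx.max_weight_matching(g, maxcardinality=True)
print(circuits)   # e.g. {('rack1', 'rack2'), ('rack3', 'rack4')}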

Page 16: 15-744: Computer Networking L-14 Data Center Networking II.

c-Through - traffic de-multiplexing

(Figure: a traffic de-multiplexer on each host tags outgoing traffic onto one of two VLANs, one mapped to the packet-switched network and one to the circuit-switched network, based on the current circuit configuration.)

VLAN-based network isolation:
– No need to modify switches
– Avoid the instability caused by circuit reconfiguration

Traffic control on hosts:
– Controller informs hosts about the circuit configuration
– End hosts tag packets accordingly
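A minimal sketch of the host-side de-multiplexing logic described above; the VLAN numbers, class, and method names are illustrative, not from the paper:

# Sketch: host-side traffic de-multiplexing between the packet-switched and
# circuit-switched VLANs, driven by the circuit configuration pushed by the
# controller. VLAN numbers and names are illustrative.

PACKET_VLAN = 1    # always-available electrical network
CIRCUIT_VLAN = 2   # optical network, usable only for circuit-connected racks

class Demux:
    def __init__(self) -> None:
        self.circuit_racks: set[str] = set()   # racks reachable via our circuit

    def update_configuration(self, racks: set[str]) -> None:
        """Called when the controller announces a new circuit configuration."""
        self.circuit_racks = racks

    def vlan_for(self, dst_rack: str) -> int:
        """Tag traffic for circuit-reachable racks onto the optical VLAN."""
        return CIRCUIT_VLAN if dst_rack in self.circuit_racks else PACKET_VLAN

demux = Demux()
demux.update_configuration({"rack2"})
print(demux.vlan_for("rack2"), demux.vlan_for("rack5"))   # -> 2 1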

Pages 17-21: (no transcript text)
Page 22: 15-744: Computer Networking L-14 Data Center Networking II.

Overview

• Data Center Topologies

• Data Center Scheduling


Page 23: 15-744: Computer Networking L-14 Data Center Networking II.

Datacenters and OLDIs

OLDI = OnLine Data Intensive applications

e.g., Web search, retail, advertisements

An important class of datacenter applications

Vital to many Internet companies

OLDIs are critical datacenter applications

Page 24: 15-744: Computer Networking L-14 Data Center Networking II.

OLDIs

• Partition-aggregate
  • Tree-like structure
  • Root node sends query
  • Leaf nodes respond with data

• Deadline budget split among nodes and network
  • E.g., total = 300 ms, parent-leaf RPC = 50 ms (see the sketch below)

• Missed deadlines → incomplete responses → affect user experience & revenue
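A hypothetical illustration of the budget split mentioned in the list above; only the 300 ms total and the ~50 ms parent-leaf RPC figure come from the slide, the rest of the breakdown is assumed:

# Sketch: splitting an end-to-end OLDI deadline across tiers. Only the 300 ms
# total and the ~50 ms parent-leaf RPC budget come from the slide; the rest of
# the breakdown is an illustrative assumption.

total_budget_ms = 300
stages = {
    "root processing + response assembly": 100,
    "root <-> aggregator RPC": 50,
    "aggregator processing": 50,
    "aggregator <-> leaf RPC": 50,   # the parent-leaf RPC budget on the slide
    "leaf processing": 50,
}

assert sum(stages.values()) == total_budget_ms
for stage, ms in stages.items():
    print(f"{stage:40s} {ms:4d} ms")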

Page 25: 15-744: Computer Networking L-14 Data Center Networking II.

Challenges Posed by OLDIs

Two important properties:

1) Deadline bound (e.g., 300 ms)
   • Missed deadlines affect revenue

2) Fan-in bursts
   • Large data, 1000s of servers
   • Tree-like structure (high fan-in)
   • Fan-in bursts → long “tail latency”

Network shared with many apps (OLDI and non-OLDI)

Network must meet deadlines & handle fan-in bursts

Page 26: 15-744: Computer Networking L-14 Data Center Networking II.

Current Approaches

TCP: deadline-agnostic, long tail latency
– Congestion timeouts (slow), ECN (coarse)

Datacenter TCP (DCTCP) [SIGCOMM '10]
– First to comprehensively address tail latency
– Finely varies sending rate based on the extent of congestion
– Shortens tail latency, but is not deadline-aware
– ~25% missed deadlines at high fan-in & tight deadlines

DCTCP handles fan-in bursts, but is not deadline-aware
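For reference, a minimal sketch of the DCTCP response the slide describes: the sender keeps a running estimate alpha of the fraction of ECN-marked packets and cuts its window in proportion to alpha instead of halving it. The gain g = 1/16 follows the paper's default; the surrounding scaffolding is illustrative.

# Sketch of DCTCP's congestion response: cwnd is reduced in proportion to the
# estimated fraction of ECN-marked packets (alpha), so mild congestion causes
# only a mild rate reduction. g = 1/16 is the paper's default gain.

G = 1.0 / 16.0

class DctcpSender:
    def __init__(self, cwnd: float = 10.0) -> None:
        self.cwnd = cwnd
        self.alpha = 0.0            # running estimate of marked fraction

    def on_window_of_acks(self, acked: int, ecn_marked: int) -> None:
        frac = ecn_marked / max(acked, 1)
        # EWMA over roughly one RTT's worth of ACKs.
        self.alpha = (1 - G) * self.alpha + G * frac
        if ecn_marked > 0:
            # Scale the cut by alpha: heavy congestion behaves like TCP (halve),
            # light congestion barely reduces the window.
            self.cwnd *= (1 - self.alpha / 2)
        else:
            self.cwnd += 1          # normal additive increase

s = DctcpSender()
s.on_window_of_acks(acked=10, ecn_marked=3)
print(round(s.alpha, 3), round(s.cwnd, 2))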

Page 27: 15-744: Computer Networking L-14 Data Center Networking II.

D2TCP

Deadline-aware and handles fan-in bursts

Key idea: vary sending rate based on both deadline and extent of congestion

• Built on top of DCTCP
• Distributed: uses per-flow state at end hosts
• Reactive: senders react to congestion; no knowledge of other flows

Page 28: 15-744: Computer Networking L-14 Data Center Networking II.

D2TCP’s Contributions

1) Deadline-aware and handles fan-in bursts
   • Elegant gamma-correction for congestion avoidance: far-deadline → back off more; near-deadline → back off less
   • Reactive, decentralized, per-flow state at end hosts

• Does not hinder long-lived (non-deadline) flows
• Coexists with TCP → incrementally deployable
• No change to switch hardware → deployable today

D2TCP achieves 75% and 50% fewer missed deadlines than DCTCP and D3, respectively
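A minimal sketch of the gamma-correction idea: D2TCP computes a penalty p = alpha^d, where alpha is DCTCP's congestion estimate and d is a deadline-imminence factor (d > 1 for near-deadline flows, d < 1 for far-deadline flows), and backs off by p/2. The clamping range for d below is illustrative.

# Sketch of D2TCP's gamma-correction: the backoff penalty is p = alpha**d,
# where alpha is DCTCP's congestion estimate and d is the deadline-imminence
# factor. Far-deadline flows (d < 1) back off more; near-deadline flows
# (d > 1) back off less. Flows without deadlines use d = 1, i.e. plain DCTCP.
# The clamping range for d is illustrative.

def deadline_factor(time_needed_s: float, time_to_deadline_s: float,
                    d_min: float = 0.5, d_max: float = 2.0) -> float:
    """d ~ Tc / D: how tight the deadline is at the current sending rate."""
    if time_to_deadline_s <= 0:
        return d_max                  # already late: be as aggressive as allowed
    return min(d_max, max(d_min, time_needed_s / time_to_deadline_s))

def congestion_response(cwnd: float, alpha: float, d: float = 1.0) -> float:
    """Apply the gamma-corrected window reduction."""
    p = alpha ** d
    return cwnd * (1 - p / 2) if p > 0 else cwnd + 1

cwnd, alpha = 20.0, 0.4
print(congestion_response(cwnd, alpha, deadline_factor(0.050, 0.200)))  # far deadline
print(congestion_response(cwnd, alpha, deadline_factor(0.150, 0.060)))  # near deadline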

Page 29: 15-744: Computer Networking L-14 Data Center Networking II.

Coflow Definition

Pages 30-37: (no transcript text)

Page 38: 15-744: Computer Networking L-14 Data Center Networking II.

Data Center Summary

• Topology
  • Easy deployment/costs
  • High bisection bandwidth makes placement less critical
  • Augment on-demand to deal with hot-spots

• Scheduling
  • Delays are critical in the data center
  • Can try to handle this in congestion control
  • Can try to prioritize traffic in switches
  • May need to consider dependencies across flows to improve scheduling

Page 39: 15-744: Computer Networking L-14 Data Center Networking II.

Review

• Networking background
  • OSPF, RIP, TCP, etc.

• Design principles and architecture
  • E2E and Clark

• Routing/Topology
  • BGP, power laws, HOT topology

Page 40: 15-744: Computer Networking L-14 Data Center Networking II.

Review

• Resource allocation
  • Congestion control and TCP performance
  • FQ/CSFQ/XCP

• Network evolution
  • Overlays and architectures
  • OpenFlow and Click
  • SDN concepts
  • NFV and middleboxes

• Data centers
  • Routing
  • Topology
  • TCP
  • Scheduling