Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data...

14
Control Planes for Optical Switching George Papen UC San Diego Opportunities and Challenges for Optical Switching in the Data Center OFC Workshop 2019

Transcript of Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data...

Page 1: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Control Planes for

Optical SwitchingGeorge Papen

UC San Diego

Opportunities and Challenges for Optical Switching in the Data Center

OFC Workshop 2019

Page 2: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Practical Optical Switching in Data Centers• Hardware Issues

• Link-level reconfiguration time dominates physical switch time

• Synchronization is hard• Links w/burst-mode RXs are more complex

TX data

Switch latch

Switch output

BM RX start

BM RX output

Demux output

A B C D E

InvalidPreamblePRBS data

AError detector gate

DSW DPRBS

DGATE

TCYCLE

TGATE

Switch state

Time (us)

Host 7Host 6Host 5Host 4Host 2Host 1

Tx to Host 0PFC

ReconfigDemand Updates

● ● ● ● ● ● ● ● ● ● ● ● ● ●

0 1500 3000 4500 6000 7500

M7

…M1 M2 M3 M4 M5

Optical rotorswitches

Electronic top-of-rack switches

Server racks

M6 M8 M9 M7

…M1 M2 M3 M4 M5

Optical rotorswitches

Servers /Compute nodes

M6 M8 M9

Single mode fiberSingle mode OR

multi mode fiber

• Software Issues• Centralized scheduling does not scale• Software has no firm concept of ``Go now!”• Unacceptable delay for µs (or less) switching

• A New Solution RotorNet• Decouple scheduling and routing• Decentralized control plane• Hide delay w/parallelism

Page 3: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Hardware Issues: Reconfiguration Time

• Research at IBM w/UCSD intern using nanosecond Si-P switch• Goal: minimize link-level reconfiguration time of the switch

(Time during which data cannot reliably transit the network)

• Includes:– Switch reconfiguration time– Clock Data Recovery (CDR) locking time– Synchronization guard delays

Transfer data Transfer dataSwitch reconfig CDR lock

Guard Guard Guard

System-level reconfiguration time

OFC 2018: A. Forencich et. al, “System-Level Demonstration of a Dynamically Reconfigured Burst-Mode Link Using a Nanosecond Si-Photonic Switch”

Page 4: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

System Reconfiguration Time Testbed

• Data Plane– 100G PSM-4 QSFP28 TX– 2x2 Silicon Photonic switch– Amp/filter/attenuator– Burst-mode receiverPattern

Generator

100GPSM-4

2x2 Si-PhSwitch

Burst ModeReceiver

Powermeter

ErrorDetector

SwitchControl

TriggerGenerator

PDFA VOA

BPF99%

1%Transmitter(QSFP28)

Xilinx Virtex Ultrascale FPGA (VCU108 board)

PC 1:16

Demuxboard

1:16

PatternGenerator

100GPSM-4

2x2 Si-PhSwitch

Burst ModeReceiver

Powermeter

ErrorDetector

SwitchControl

TriggerGenerator

PDFA VOA

BPF99%

1%Transmitter(QSFP28)

Xilinx Virtex Ultrascale FPGA (VCU108 board)

PC 1:16

Demuxboard

1:16

• Control plane– Xilinx XVCU095 FPGA– 25 Gbps pattern generator– Trigger generator, switch interface– 25 Gbps gated error detector

~ 1 ns Si-P switching

(physical response)

Page 5: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Measured Waveforms of Photonic Switch and BM-RX

Payload size (B) 2048 1024 2048 1024Data rate (Gbps) 12.5 12.5 20 20Cycle time (ns) 1366 730 858 460

BM-Switch time (ns) 90 90 60 60Duty cycle (%) 93 87 93 87

Reconfiguration 60-90x slower than physical switch time !

Physical Switch Time

Preamble Data

Page 6: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Software Issues

Page 7: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

ASIC

Queue occupancy

Scheduling

Queue occupancy

Scheduling

Data plane doesn’t scale to entire datacenter!

The Centralized Control Issue

Queue occupancy

Inputs Outputs

Scheduling

Crossbar

Sendingracks

Receivingracks

Crossbar

Optical CircuitSwitch

Electronic PacketSwitch

Information required for scheduling not locally available

Page 8: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Hybrid Switch

Sender

Sender

Sender

N Input Ports

Receiver

Receiver

Receiver

N Output Ports

Circuit Switch

Packet Switch

. . . 1 . . . . . 1 1 . . . . . 1 . . . . . 1 . .

. . 1 . . . . . 1 . . . . . 1 1 . . . . . 1 . . .

. 3 . 1 4 . . 6 2 . 2 . . 2 4 5 1 2 . . 1 4 . 3 .

69 10ScheduleS =

Duration Circuit Configuration Duration Circuit Configuration Leftover Demand

Circuit Switch Packet Switch

. 13 10 70 4 . . 14 12 6965 . . 12 1415 33 2 . 1112 7 3 1 .

Senders

R e c e i v e r s

SchedulingAlgorithm

Steps in Centralized SchedulingTrafficMatrix

• Collect demand information from endhosts• Send demand information over a network to central location • Form the traffic matrix • Factor traffic matrix into a sequence of switch states• Finally! Set the switch

Switch States

Scheduling Techniques for Hybrid Circuit/PacketNetworks, CoNEXT 2015

Receivers

Send

ers

Page 9: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

A Centralized Control Plane -ReacToR

ControlPlane

Hosts

Top of Rackswitch

OR

Station

“A multiport microsecond optical circuit switch for data center networking,” PTL 2013

0 5 10 15

0.0

0.2

0.4

0.6

0.8

1.0

End−to−End Flow Switch Time (microsecond)

CDF

of L

aten

cy D

istrib

utio

n

14.84us

Circuit Switch(Mordia)

Control Path

EPS(10G Switch)

H H H H H H H HSchedule

REACToR(FPGA)

REACToR(FPGA)Control

Computer

DemandInformation

Schedule

10 G

1 G

Software latencies dominate switch time

Page 10: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Time-Varying Demand

Time (us)

Host 7Host 6Host 5Host 4Host 2Host 1

Tx to Host 0PFC

ReconfigDemand Updates

● ● ● ● ● ● ● ● ● ● ● ● ● ●

0 1500 3000 4500 6000 7500

§ Control plane tries to allocate circuits based on calculated schedule§ Control plane prototype was slow, did not always schedule

correctly, and does not scale – hard lesson learned!

IncorrectlyScheduled

“Circuit Switching Under the Radar with REACToR,” NSDI 2014

Page 11: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

A New Solution - RotorNet

• No centralized control – inherently more scalable!

• Co-design of optical switch and network

Why build a large, fast crossbar that you cannot control?

• Parallelism decouples minimum latency from switching time

TOMORROW: Max Mellette, invited talk M2C.3, Monday 11 AM, Room 3.

“A Practical Approach to Optical Switching in Datacenters”

Page 12: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Rotor switch model:

N inputports

N outputports

N – 1 matchings

RotorNet has no Central Control

Real-time scheduleMatching 1Matching 2Queue occupancy

Scheduling

Crossbar

1 ® 2 1 ® 3 1 ® 4Fixed schedule

,

Rotor switch

® No (central) control

® Bounded reduction in throughput

RotorNet: A Scalable, Low-complexity, Optical Datacenter Network, Sigcomm ‘17

Page 13: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Summary

• Mimicking the electronic packet-switched network control plane leads to an optical circuit switch that does not scale

• Even w/o centralized scheduling, the control plane is hard• Must reduce/hide system reconfiguration time• Must synchronize “asynchronous” end hosts

• RotorNet addresses control plane issues by:• No centralized scheduler • Using parallelism to bypass system reconfiguration delay• Still must address synchronization with end hosts

Page 14: Control Planes for Optical Switching - OFC Conference · Payload size (B) 2048 1024 2048 1024 Data rate (Gbps) 12.5 12.5 20 20 Cycletime (ns) 1366 730 858 460 BM-Switch time (ns)

Contributing ResearchersSystems

• UC San Diego: Max Mellette, George Papen, George Porter, Alex C. Snoeren, Geoffrey M. Voelker, Amin Vahdat, with students Nathan Farrington, Rishi Kapoor, He Liu, Feng Lu, Rob McGuinness, Arjun Roy, Malveeka Tewari

• CMU: David G. Andersen, Srinivasan Seshan, with students Matthew K. Mukerjee , Conglong Li, Nicolas Feltman

• Intel: Michael Kaminsky

Hardware• UC San Diego: Joe Ford, Max Mellette, George Papen, P.-C. Sun, with

students Alex Forencich, Max Mellette, Glenn M. Schuster

• IBM: Nicolas Dupuis, Christian Baks, Benjamin G Lee, Laurent Schares

• Technical University of Denmark: Valerija Kamchevska (student)

14