High Performance Optical DCN based on WDM Optical Cross ...€¦ · • WDM TRXs at the TOR...
Transcript of High Performance Optical DCN based on WDM Optical Cross ...€¦ · • WDM TRXs at the TOR...
HighPerformanceOpticalDCNbasedonWDMOpticalCross-ConnectSwitches
NicolaCalabretta
COSIGN
Outline
• ScalableDCNarchitecture:bandwidth,latency,powerconsumptionissues
• OPSquareDCNarchitecturebasedondistributedflow-controlledWDMcross-connectswitches
• Photonicintegratedcross-connectswitch
• Conclusions
3
More than 50 billion connections by 2020
Data Center Network (DCN)
75%withinDCN
10zettabytes
What are the issues?
• Network architectureBandwidthbottleneck
Largelatency
• Electrical switch
Static
Limitedbandwidth
Largelatency
Highcostandpower
ScalingtoPetabit/sinterconnectnetworks
4
X86 Motherboard 30 cm x25 cm 40 Gbps Ethernet
IBM Microserver 14 cm x 5.5 cm 40 Gbps Ethernet
AcQ Microserver 10 cm x 15 cm 40 Gbps Ethernet24 Gbps PCIe3
5
TOR: 20 x 640 Gbps = 12.8 Tbps
Blade: 40 Gbps x 16 = 640 Gbps
…
…
… …
Electronic switch
ScalingtoPetabit/sinterconnectnetworks
6
…TOR #1: 12.8 Tbps TOR #1000
Scalability and capacity limited by the switch radix and port bandwidth
ScalingtoPetabit/sinterconnectnetworks
7
OneflatnetworkOvercomethebandwidthbottleneck
LowlatencyFlatany-to-anyconnectivity
IntroducingopticalswitchingTransparencytodatarate/formatEliminationofO/E/Oconversions
Fastcontrol?Bufferingissue?Connectivity?
HighcapacityDCNbasedonfastcontrolledopticalswitches
Optically switched network
8
• Flat connectivity- High bandwidth- Low latency
• Scalability- Square of switch radix- Large interconnectivity
• Resilience- Fault tolerance- Load balancing
OPSquareDCNarchitecture
Optical label processing à Nanoseconds switch control
Optical flow control à Buffer-less operation
Nanoseconds reconfiguration à statistical multiplexing
Single stage architecture à Scalable
Fastcontrolledopticalswitch
9
…
ToR2
ToRF
Group1 Group2 GroupK
ToRM
Cluster1
LP 1xFswitchλ1λ2
λKLP
LP
1xFswitch
1xFswitch
1
2
K
……
…
…
……
…LP 1xFswitchλ1λ2
λKLP
LP
1xFswitch
1xFswitch
1
2
K
……
…
…
……
…
Control
LP 1xFswitchλ1λ2
λK
1
2
K1xFswitch
1xFswitchLP
LP
…
… …
Moduel 1Fast flow-controlled Optical switch
………
…
……
…
LP 1xFswitchλ1λ2
λKLP
LP
1xFswitch
1xFswitch
1
2
K
……
…
…
……
…LP 1xFswitchλ1λ2
λKLP
LP
1xFswitch
1xFswitch
1
2
K
……
…
…
……
…
LP 1xFswitchλ1λ2
λKLP
LP
1xFswitch
1xFswitch
1
2
K
……
…
…
……
…
Moduel K
Module2
……
…
……
…
…
…ToR2
ToRF
Group1
Group2
GroupK
ToRM
2
2
N
2
• Fast parallel processing of the label
• On-the-fly distributed control
• KxF connectivity achieved by 1xF switches
• Fast flow control and retransmission
DynamicvirtualDCNreconfiguration
10
…
Cluster1ToR 2ToR 1 ToR M ToR 2ToR 1 ToR M
ToR 2ToR 1 ToR M
…
Cluster2
…
…
…
ClusterN
…
Intra-OS1 Intra-OS2 Intra-OSN
ControlPlane
… …
Inter-OS1 Inter-OS2 Inter-OSMToToRs ToToRs
LUT LUT LUTLUT LUT LUT LUTLUT LUT
LUT
LUT
LUT
LUT
LUT
LUT
Status
Status
Status Status
Status Status
t
1. Networkprovisioning- Millisecondsoperation
2.Fastswitching- nanosecondsoperation
LUT– look-uptable
Decouplingofcontrolplane(ms)anddataplane(ns)operation
Numericalperformance(I)
11
Different intra-/inter-cluster traffic ratio- Buffer size is 20KB per transceiver
0,0 0,2 0,4 0,6 0,8 1,01E-8
1E-7
1E-6
1E-5
1E-4
1E-3
0,01
0,1
1
Pack
et lo
ss
Load
INTRA:INTER 3:1 4:1
2
4
6
8
10
12
End-
to-e
nd la
tenc
y (μ
s)
• Packet loss <1E-6• End-to-end latency <2µs
• Buffer dimensioning
0,0 0,2 0,4 0,6 0,8 1,01E-81E-71E-61E-51E-41E-30,01
0,11
Pack
et lo
ssLoad
Buffer size 10KB 15KB 20KB 30KB
2
4
6
8
10
12
End-
to-e
nd la
tenc
y (μ
s)
Larger buffer improve packet loss but for load > 0.5 latency performance increases
12
Scaling to larger number of servers:- Network size varies from 2560 to 40960 servers- Optical switches have radix of 8x8 to 32x32- 20KB buffer per transceiver
0,0 0,2 0,4 0,6 0,8 1,01E-81E-71E-61E-51E-41E-30,01
0,11 Server NO.
2560 10240 23040 40960
Pack
et lo
ss
Load
Server NO. 2560 10240 23040 40960
2
4
6
8
10
12
End-
to-e
nd la
tenc
y (μ
s)
0,0 0,2 0,4 0,6 0,8 1,00,0
0,2
0,4
0,6
0,8
1,0 Server NO. 2560 10240 23040 40960
Throughput
Load
• Larger scales perform similarly with limited degradation• Network saturates for load higher than 0.6
Numericalperformance(II)
Experimentalinvestigation
13W. Miao et al., paper Tu.3.6.3, ECOC 2015
IS3
Rack1 Cluster1
IS1
ES1
IS2
ES2 ES3
Cluster2 Cluster3
ToR1 ToR9
12
3 45
6 78
9
-12 -11 -10 -9 -8121110
987
6
5
4
-Log
(BER
)
Received optical power (dBm)
Back-to-back Direct connection After bypass link Indirect connection
OSNR: 49dB
OSNR: 36dB
OSNR: 34dB
OSNR: 31dB
Optical switch prototype
Path 1 Path 2
4x4OPSprototype
TowardsPhotonicIntegration
14N. Calabretta et al, ECOC 2016 and OFC 2017
1234
WDM Inputs1
2
3
4
Outputs
.
.
.
16
6mm
WSS Optical Module 1
Multi-level modulated traffic Large input power dynamic range
Small footprint Low power consumption High bandwidth density
Conclusions
15
• OPSquare scalable parallel flat intra-/inter-cluster data center networks architecture based on distributed buffer-less optical
• Fast flow control and label processing enable nanoseconds control of the optical switches à statistical multiplexing
• Optical switch transparency à data rate and format independent
• WDM TRXs at the TOR à improve DCN capacity and the feasibility of the optical switch (lower B&S splitting losses)
• Assessments with realistic traffic in OMNeT++− Packet loss <1E-6 and latency <2µs at 0.5 load− up to 40960 servers with limited degradation
• Photonic integration enables large port count WDM cross-connect switches with reduced power and costs
Acknowledgements
16
COSIGN
Numericalperformanceinvestigation
17
• 40-server (10Gb/s) rack• Data center-like traffic pattern*
• Kintra=4 and Kinter=1
• Transceivers operating at 50Gb/s• 560ns round trip time
*T. Benson et al., “Network Traffic Characteristics of Data centers in the Wild,” Proc. ACM, 2010.