Controlling Congestion in New Storage Architectures
September 15, 2015
Today's Presenters

Chad Hintz, SNIA Ethernet Storage Forum Board Member; Solutions Architect, Cisco
David L. Fair, SNIA Ethernet Storage Forum Chair; Intel
SNIA Legal Notice

- The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.
- Member companies and individual members may use this material in presentations and literature under the following conditions:
  - Any slide or slides used must be reproduced in their entirety without modification.
  - The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
- This presentation is a project of the SNIA Education Committee.
- Neither the author nor the presenter is an attorney, and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.
- The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
Terms

- STP (Spanning Tree Protocol): an older network protocol that ensures a loop-free topology for any bridged Ethernet local area network.
- Spine/Leaf, Clos, Fat-Tree, Multi-Rooted Tree: networks based on routing with all links active.
- VXLAN: a standards-based Layer 2 overlay scheme over a Layer 3 network.
Agenda

- Data Center "Fabrics": Current State
- Current Congestion Control Mechanisms
- CONGA's Design
- Why is CONGA Important for IP-Based Storage and Congestion Control?
- Q&A
Storage over Ethernet Needs and Concerns

Reliability and performance concerns:
- Minimize chance of dropped traffic
- In-order frame delivery
- Minimal oversubscription
- Lossless fabric for FCoE (no drop)
Data Center Fabric: Current State
Data Center "Fabric" Journey

1. STP (Spanning Tree): blocking links to stay loop-free
2. Multi-Chassis Etherchannel
3. Spine-Leaf: Layer 3
4. VXLAN/EVPN: Layer 3 with a Layer 2 overlay, extending to the MAN/WAN
Multi-Rooted Tree (Spine/Leaf) = Ideal DC Network

The ideal DC network is a giant switch with 1000s of server ports:
- No internal bottlenecks, so performance is predictable
- Simplifies bandwidth management

We can't build that switch, but a multi-rooted tree with 1000s of server ports approximates it. Two caveats: possible bottlenecks inside the fabric, and the need for precise load balancing.
Leaf-Spine DC Fabric

(Diagram: hosts H1-H9 attach to leaf switches; leaves connect to spine switches.) The leaf-spine fabric approximates the ideal giant switch.

- How close is leaf-spine to the ideal giant switch?
- What impacts its performance? Link speeds, oversubscription, buffering.
Today: Equal Cost Multipath (ECMP) Routing

Pick among equal-cost paths by an algorithm (hash):
- Randomized load balancing
- Preserves packet order (optimal for TCP)

Problems:
- Hash collisions
- No awareness of congestion
- Each flow is mapped to one path (loss of a link disrupts the whole flow)
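A sketch of ECMP's hash-based path selection; the hash function and field encoding below are illustrative (real switches use hardware hashes), but the behavior is the point: every packet of a flow lands on the same uplink, regardless of congestion.

```python
import hashlib

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    """Pick one of num_paths uplinks by hashing the flow's 5-tuple.

    Every packet of a flow hashes to the same path, which preserves
    packet order but ignores how congested that path actually is.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# Two large flows can collide on the same uplink purely by chance:
p1 = ecmp_path("10.0.0.1", "10.0.1.9", 40001, 3260, "tcp", 4)
p2 = ecmp_path("10.0.0.2", "10.0.1.9", 40002, 3260, "tcp", 4)
```

Because the choice depends only on the 5-tuple, two elephant flows that hash alike share one link for their entire lifetime.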
Impact of Link Speed

Three non-oversubscribed topologies, each with 20×10Gbps downlinks:
- 20×10Gbps uplinks
- 5×40Gbps uplinks
- 2×100Gbps uplinks
How Does Link Speed Affect ECMP?

Higher-speed links improve ECMP efficiency. With 11×10Gbps flows (55% load):
- 20×10Gbps uplinks: probability of 100% throughput = 3.27%
- 2×100Gbps uplinks: probability of 100% throughput = 99.95%

Source: http://simula.stanford.edu/~alizade/papers/conga-sigcomm14.pdf
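The 3.27% figure follows from a birthday-style calculation: with 10Gbps flows on 10Gbps uplinks, full throughput requires every flow to hash to its own link. A sketch of that calculation:

```python
from math import prod

def prob_no_collision(num_flows, num_links):
    """Probability a random ECMP hash maps each flow to its own link.

    With 10 Gb/s flows on 10 Gb/s uplinks, any two flows sharing a link
    halve their throughput, so 100% throughput requires zero collisions.
    """
    return prod((num_links - k) / num_links for k in range(num_flows))

# 11 flows over 20 x 10 Gb/s uplinks: collisions are almost certain.
p = prob_no_collision(11, 20)   # ~0.0327, matching the slide's 3.27%
```

With 2×100Gbps uplinks the same load succeeds almost always, because each link can absorb ten 10Gbps flows before any throughput is lost.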
Impact of Link Speed

(Chart: average FCT, normalized to optimal, for large (10MB, ∞) background flows vs. load (30-80%); curves for OQ-Switch, 20×10Gbps, 5×40Gbps, and 2×100Gbps; lower is better.)

- 40/100Gbps fabric: approximately the same flow completion time as the giant switch (OQ)
- 10Gbps fabric: FCT up to 40% worse than OQ

Source: http://simula.stanford.edu/~alizade/papers/conga-sigcomm14.pdf
Storage over Spine-Leaf

New scale-out storage looks to spread initiators and targets over multiple leaf switches.

Concerns:
- Multiple hops
- Potential for increased latency
- Oversubscription
- TCP incast
- Potential buffering issues
Incast Issue with IP-Based Storage

Initiators (senders) spread traffic over many paths, but it all converges at a single point: the target (receiver). Incast events are most severe at the receiver (iSCSI and other IP-based storage).
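A toy model of why that convergence hurts; the sender count, burst size, and buffer size below are illustrative, not from the presentation:

```python
def incast_overflow(num_senders, burst_bytes, buffer_bytes):
    """Bytes dropped when simultaneous responses converge on one port.

    All senders' bursts arrive in the same window; whatever exceeds the
    egress port buffer is dropped, triggering TCP timeouts and retransmits,
    which is the cascading performance hit incast causes.
    """
    return max(0, num_senders * burst_bytes - buffer_bytes)

# 32 iSCSI targets each answering a 64 KiB read into a 1 MiB port buffer:
dropped = incast_overflow(32, 64 * 1024, 1024 * 1024)
```

Halving the burst size or doubling the buffer only delays the cliff; once the aggregate burst exceeds the buffer, loss grows linearly with sender count.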
Summary

- A 40/100Gbps fabric with ECMP approximates a giant switch; expect some performance loss with a 10Gbps fabric.
- Oversubscription (incast) in IP storage networks is very common and has a cascading effect on performance and throughput.
Current Congestion Control Mechanisms: Hop-by-Hop
IEEE 802.1Qaz: Enhanced Transmission Selection (ETS)

- Required when consolidating I/O: it's a QoS problem.
- Prevents a single traffic class from hogging all the bandwidth and starving other classes.
- When a given class doesn't fully utilize its allocated bandwidth, the remainder is available to other classes.
- Helps accommodate traffic classes of a "bursty" nature.

(Diagram: FCoE and LAN traffic sharing an Ethernet wire, 50% / 50%.)
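A rough sketch of the sharing behavior ETS describes; the class names, weights, and the redistribution loop are illustrative, not the hardware scheduler:

```python
def ets_share(link_gbps, weights, demand):
    """Split link bandwidth by ETS class weights, lending unused capacity.

    Each class is guaranteed its weighted share; if a class demands less,
    the leftover is redistributed to the still-hungry classes by weight.
    """
    alloc = {c: 0.0 for c in weights}
    active = set(weights)
    spare = link_gbps
    while active and spare > 1e-9:
        total_w = sum(weights[c] for c in active)
        grants = {c: spare * weights[c] / total_w for c in active}
        spare = 0.0
        for c, g in grants.items():
            take = min(g, demand[c] - alloc[c])  # never exceed demand
            alloc[c] += take
            spare += g - take                    # return unused grant
        active = {c for c in active if demand[c] - alloc[c] > 1e-9}
    return alloc

# FCoE and LAN each weighted 50% of a 10 Gb/s link; LAN only wants
# 3 Gb/s, so FCoE may borrow the idle 2 Gb/s on top of its guarantee:
alloc = ets_share(10.0, {"fcoe": 50, "lan": 50}, {"fcoe": 9.0, "lan": 3.0})
```

Note the guarantee is a floor, not a cap: a class always gets its weighted share when it wants it, and more when others are idle.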
IEEE 802.1Qbb: Priority Flow Control (PFC)

- PFC enables flow control on a per-priority basis.
- Therefore, we can have lossless and lossy priorities at the same time on the same wire.
- Allows FCoE to operate over a lossless priority, independent of other priorities.
- Traffic assigned to other CoS values continues to transmit, relying on upper-layer protocols for retransmission.
- Not only for FCoE traffic.

(Diagram: the FCoE priority paused on a shared Ethernet wire while other priorities keep flowing.)
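A minimal sketch of the per-priority pause logic PFC implies; the XOFF/XON thresholds and return values are illustrative, not from the standard:

```python
XOFF_CELLS = 900  # illustrative per-priority queue threshold (buffer cells)
XON_CELLS = 600   # resume threshold, with hysteresis below XOFF

def pfc_action(queue_cells, paused):
    """Decide whether to emit a per-priority PAUSE (XOFF) or RESUME (XON).

    Unlike legacy 802.3x PAUSE, IEEE 802.1Qbb pauses only one priority
    (e.g. the lossless FCoE class); queues for other priorities on the
    same wire keep transmitting. Returns (action, new_paused_state).
    """
    if not paused and queue_cells >= XOFF_CELLS:
        return "send XOFF", True
    if paused and queue_cells <= XON_CELLS:
        return "send XON", False
    return None, paused
```

The hysteresis gap between XOFF and XON prevents the link from flapping between paused and unpaused on every packet.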
Adding in Spine-Leaf

- Use IEEE ETS to guarantee bandwidth for traffic types.
- Use IEEE PFC to create lossless classes for FCoE.
- Use the Ethernet infrastructure for all kinds of storage.
- Improve scalability for all application needs and maintain high, consistent performance for all traffic types, not just storage.
Problems ETS/PFC Do Not Solve

- They take no account of Layer 3 links and ECMP in a spine-leaf topology; they are limited to hop-by-hop links.
- PFC is designed for lossless traffic, not typical IP-based storage.
- ETS guarantees bandwidth; it does not alleviate congestion.
The network paradigm as we know it…
Control and Data Plane

Two models:
- Distributed control and data plane (traditional)
- Centralized (controller-based / SDN)
Traditional: the control plane and data plane reside within the physical device.
What is SDN? (per the ONF definition)

Software-defined networking (SDN): the physical separation of the network control plane from the forwarding plane, where a control plane controls several devices.

https://www.opennetworking.org/sdn-resources/sdn-definition
In other words…
In the SDN paradigm, not all processing happens inside the same device
CONGA's Design
CONGA in 1 Slide

1. Leaf switches (top-of-rack) track congestion to other leaves, per path, in near real time.
2. They send traffic on the least congested path(s).

Fast feedback loops run between leaf switches, directly in the data plane.
Could This Work with a Centralized Control Plane?

- If the control plane is separate, feedback can still be sent in the data plane, but it must then be computed at a central control point (the controller).
- The latency of that round trip, combined with constant change in the network, makes this impractical.
CONGA in Depth

CONGA operates over a standard DC overlay (VXLAN), already broadly supported to virtualize the physical network.

(Diagram: a packet from H1 to H9 is VXLAN-encapsulated at leaf L0 with outer addresses L0 → L2.)
CONGA-In Depth (VXLAN)

(Diagram: the VXLAN frame format, i.e. outer Ethernet/IP/UDP headers, the VXLAN header, and the inner Ethernet frame.)
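The VXLAN header itself is small. A sketch of building the 8-byte header per RFC 7348 (a flags byte whose I bit marks the VNI valid, reserved bytes, and a 24-bit VNI):

```python
import struct

def vxlan_header(vni):
    """Build the 8-byte VXLAN header (RFC 7348).

    Byte 0 carries the flags (0x08 = I bit, VNI valid), bytes 1-3 are
    reserved, bytes 4-6 hold the 24-bit VXLAN Network Identifier, and
    byte 7 is reserved. The inner Ethernet frame follows this header
    inside the outer UDP/IP packet.
    """
    flags = 0x08
    return struct.pack("!B3xI", flags, vni << 8)  # VNI in the top 24 bits

hdr = vxlan_header(5001)  # 8 bytes identifying overlay segment 5001
```

It is this outer encapsulation that gives the fabric a convenient place to carry CONGA's congestion fields between leaves.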
CONGA In Depth: Leaf-to-Leaf Congestion

CONGA tracks path-wise congestion metrics (3 bits) between each pair of leaf switches. As a packet travels from L0 to L2 on path 2, each hop updates the congestion field it carries:

pkt.CE ← max(pkt.CE, link.util)

so the packet leaves L0 with CE=0 and arrives at L2 with CE=5, the utilization of the busiest link on its path. L2 records this in its Congestion-From-Leaf table, indexed by source leaf and path.
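The update rule above can be sketched as follows; the 3-bit quantization matches the slide, while the per-link utilization values are illustrative:

```python
def forward_ce(path_links_util, ce_bits=3):
    """Accumulate the congestion metric a packet carries along its path.

    Each hop overwrites the packet's CE field with
    max(pkt.CE, local link utilization), quantized to a few bits, so the
    destination leaf sees the utilization of the busiest link on the path.
    """
    max_val = (1 << ce_bits) - 1
    ce = 0
    for util in path_links_util:  # util in [0.0, 1.0] per link
        ce = max(ce, min(max_val, round(util * max_val)))
    return ce

# A path whose busiest link is ~70% utilized reports CE = 5 of 7,
# echoing the CE=5 arrival on the slide:
ce = forward_ce([0.2, 0.7, 0.1])
```

Because the metric is only a few bits, it fits in the overlay header without meaningfully growing the packet.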
CONGA In Depth: Leaf-to-Leaf Feedback

The destination leaf feeds the metric back: L2 piggybacks the measurement (FB-Path=2, FB-Metric=5) on reverse traffic to L0, which stores it in its Congestion-To-Leaf table, indexed by destination leaf and path. The sending leaf thus keeps a near-real-time view of congestion on every path to every other leaf.
CONGA-In Depth: LB Decisions

Using its Congestion-To-Leaf table, the sending leaf sends each packet on the least congested path; to avoid reordering, path decisions are made per flowlet (Kandula et al., 2004). In the example tables, L0 → L1 picks path p* = 3 and L0 → L2 picks p* = 0 or 1.

Source: http://groups.csail.mit.edu/netmit/wordpress/wp-content/themes/netmit/papers/texcp-hotnets04.pdf
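A sketch of the resulting path choice, assuming a hypothetical Congestion-To-Leaf table keyed by destination leaf; the metric values are illustrative, chosen so the picks echo the slide's example:

```python
def best_path(cong_to_leaf, dest_leaf):
    """Pick the least congested uplink toward a destination leaf.

    cong_to_leaf maps destination leaf -> per-path congestion metrics,
    built from the CE feedback remote leaves piggyback on reverse traffic.
    Ties resolve to the lowest path index here; a real implementation
    would typically break ties randomly to spread load.
    """
    metrics = cong_to_leaf[dest_leaf]
    return min(range(len(metrics)), key=lambda p: metrics[p])

# Table at L0 with four paths to each remote leaf (illustrative values):
table = {"L1": [2, 4, 1, 0], "L2": [1, 1, 6, 3]}
p_l1 = best_path(table, "L1")  # path 3, the least congested toward L1
```

Each new flowlet re-runs this lookup, so traffic drains away from a hotspot within one feedback round trip.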
CONGA-In Depth: Flowlet Switching

- State-of-the-art ECMP hashes flows (5-tuples) to a single path to prevent reordering TCP packets.
- Flowlet switching routes bursts of packets from the same flow independently, with no packet reordering: a flow may switch paths only after an idle gap of Gap ≥ |d1 - d2|, the delay difference between the old path (d1) and the new one (d2).

Flowlet Switching (Kandula et al. '04): http://groups.csail.mit.edu/netmit/wordpress/wp-content/themes/netmit/papers/texcp-hotnets04.pdf
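A sketch of flowlet detection, assuming a hypothetical 500 µs gap threshold and an externally supplied path picker (e.g. CONGA's least-congested-path choice):

```python
FLOWLET_GAP = 0.0005  # illustrative 500 us; must exceed |d1 - d2|

class FlowletRouter:
    """Route bursts (flowlets) of a flow independently without reordering.

    A flow may move to a new path only after an idle gap longer than the
    worst-case delay difference between paths: by then the previous burst
    has fully drained, so no packet can be overtaken.
    """

    def __init__(self):
        self.state = {}  # flow id -> (last packet time, current path)

    def route(self, flow, now, pick_path):
        last, path = self.state.get(flow, (None, None))
        if last is None or now - last > FLOWLET_GAP:
            path = pick_path()  # fresh decision for the new flowlet
        self.state[flow] = (now, path)
        return path
```

Packets inside a burst stick to their path; only the natural pauses within a TCP flow open windows for rebalancing.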
Of Elephants and Mice

- Two types of flows in the data center:
  - Long-lived "elephant" flows: data or (block) storage migrations, VM migrations, MapReduce. These are the flows that fill buffers; there are not many in a data center, but even a few can be impactful.
  - Short-lived "mice" flows: web requests, emails, small data requests. These can be bursty.
- How they interact is key: in traditional ECMP, multiple long-lived flows can be mapped to a few links, and if mice are mapped to those same links, application performance suffers.

We need a new metric to capture this impact: application flow completion time (FCT).
CONGA: Fabric Load Balancing, Dynamic Flow Prioritization

Real traffic is a mix of large (elephant) and small (mice) flows.

- Standard (single priority): large flows severely impact performance (latency and loss) for small flows.
- Dynamic flow prioritization: the fabric automatically gives a higher priority to small flows.

Key idea: the fabric detects the initial few flowlets of each flow and assigns them to a high-priority class.
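The key idea above can be sketched with a hypothetical per-flow flowlet counter and threshold (both illustrative; the fabric does this in hardware without per-flow state at this granularity):

```python
class FlowPrioritizer:
    """Dynamic flow prioritization: first few flowlets go high priority.

    Mice flows finish within their first flowlets, so they complete
    entirely at high priority; elephants exceed the threshold quickly and
    drop to standard priority, where they can no longer starve the mice.
    """

    def __init__(self, limit=3):
        self.limit = limit  # illustrative threshold, not a standard value
        self.seen = {}      # flow id -> flowlets observed so far

    def classify(self, flow):
        n = self.seen.get(flow, 0)
        self.seen[flow] = n + 1
        return "high" if n < self.limit else "standard"
```

No application tagging is needed: the fabric infers flow size from behavior, which is why the scheme works for unmodified storage traffic.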
Why Is CONGA Important for IP-Based Storage?
Storage Flows Are Elephant Flows

- Using flowlet switching, we can break long-lived (block) storage flows into flowlets (bursts) and route them across multiple paths with no packet reordering: in-order delivery.
- Traffic is sent on the least congested path using CONGA's feedback loop.
- Loss of a link in a path is the loss of a flowlet, so disruption is minimal: no TCP reset (iSCSI, NFS, applications); just resend the small burst that was lost.
- Object- and file-based short flows (mice) get higher priority and complete faster.
CONGA for "Elephant" Flows (Block)

(Chart: average FCT, normalized to ECMP, vs. load (10-90%), for mice flows (<100KB) and elephant flows (>10MB); curves for ECMP and CONGA.)

CONGA is up to 35% better than ECMP for elephants.
CONGA for "Mice" Flows (File or Object)

(Chart: average FCT, normalized to ECMP, vs. load (10-90%), for mice flows (<100KB) and elephant flows (>10MB); curves for ECMP and CONGA.)

CONGA is up to 40% better than ECMP for mice.
Single Fabric for Storage (Block, File, or Object) and Data

(Chart: combined results, FCT normalized to ECMP vs. load (10-90%), for mice (<100KB) and elephant (>10MB) flows.)

On one shared fabric, CONGA is up to 35% better than ECMP for elephants and up to 40% better for mice.
Link Failures with Minimal Loss

(Chart: overall average FCT, normalized to optimal, vs. load (10-70%), with a failed link; curves for ECMP and CONGA; lower is better.)
Summary
CONGA with DCB Meets the Needs of Storage over Ethernet

- Minimize chance of dropped traffic: routing flowlets over the least congested path means a link loss has minimal impact.
- In-order frame delivery: flowlet switching with CONGA guarantees in-order delivery.
- Minimal oversubscription / incast issues: a 40G fabric with enhanced ECMP and mice/elephant flow detection and separation.
- Lossless fabric for FCoE (no drop): CONGA and DCB (PFC, ETS) can be implemented together.
After This Webcast

- This webcast will be posted to the SNIA Ethernet Storage Forum (ESF) website and available on demand: http://www.snia.org/forums/esf/knowledge/webcasts
- A full Q&A from this webcast, including answers to questions we couldn't get to today, will be posted to the SNIA-ESF blog: http://sniaesfblog.org/
- Follow us on Twitter @SNIAESF
Q&A

Thank You