FIF (FUTURE INTERNET FORUM)fif.kr/AsiaNetFPGAws/slide/1-4_slide.pdf · 2017. 2. 13. · FIF (FUTURE...
Transcript of FIF (FUTURE INTERNET FORUM)fif.kr/AsiaNetFPGAws/slide/1-4_slide.pdf · 2017. 2. 13. · FIF (FUTURE...
Data Center Quantized Congestion Notification (QCN): Implementation and Evaluation on NetFPGA
2010 June 14thMasato Yasuda (NEC Corporation)
Abdul Kader Kabanni (Stanford University)
© NEC Corporation 2009Page 2
Background
▐ In data centers, there is a movement of Network Convergence ▐ The spec of Converged Enhanced Ethernet (CEE) has been discussed in
IEEE 802.1 standard committees
Converged Enhanced Ethernet
Internet
servers
terminals
Internet
terminals
Converged DC transport
LAN(Fibre Channel: FC)(TCP)
SAN
storage storage
servers
Separate Network Converged Network
© NEC Corporation 2009Page 3
Congestion Control Mechanism in CEE
▐ In CEE, it has Congestion Control MechanismNon-TCP traffic (storage, media) can avoid network
congestion
▐ Congestion Notification (CN) The spec of Congestion Control Mechanism in CEE It is specified in IEEE802.1Qau in Data Center Bridging Task Group We proposed QCN (Quantized Congestion Notification) and finally
accepted as a standard in March 2010.
Converged Enhanced Ethernet (CEE)Converged Enhanced Ethernet (CEE)
FCoEFCoEIPIP IPIP
TCPTCP UDP(for media)
UDP(for media)
FC(for storage)
FC(for storage)
SupportSupportCongestion Congestion
ControlControlat Layer 2at Layer 2
© NEC Corporation 2009Page 4
Significant restrictions/requirements for CN
We must avoid queue oscillation which causes overflow (causes link pause) or underflow (lose utilization)
StableStable
Unlike TCP slow start, sources can come on at the full line rate of 10Gbps
Sources can start Sources can start at line lateat line late
Links can be paused by pause mechanism but CN is needed to avoid congestion spreading (pause spreading to innocent paths).
Links can be Links can be pausedpaused
The algorithm should be simple enough to be implemented in hardware (for 40G or 100G processing)
SimpleSimple
No way to know round trip time. Not automatically self-clocked like TCP.(The algorithm need to have counters for self-clocking)
No perNo per--packet packet ACKsACKs
© NEC Corporation 2009Page 5
QCN Overview
© NEC Corporation 2009Page 6
QCN (Quantized Congestion Notification)
▐ QCN: Congestion Control Mechanism at Layer 2 Proposed by our group and accepted in IEEE 802.1Qau
▐ Terminology: Congestion Point: Where congestion occurs, mainly switches Reaction Point: Source of traffic, mainly rate limiters in Ethernet
NICs
Reaction PointReaction Point Congestion PointCongestion PointFeedbackMessage
FbFbTrafficfrom othersources
SwitchNIC
© NEC Corporation 2009Page 7
▐ Feedback Processing Calculate Feedback value (Fb) from Switch Queue length Send feedback to Reaction Point (RP)
▐ Fb sampling probability No Congestion: Pmin(1%) Under Congestion: Pmax(10%) at maximum
▐ Fb Calculation
QCN Algorithm: Congestion Point (CP)
) Q wQ (F deltaoffb
Qeq : Qlen in equibrium
Qold : Queue length at previous processingQ: Current Queue LengthFb : Feedback (quantized to 6bits) w: fixed parameter (= 2)
QeqQ
QoldQdelta
Qoff
Qoff: Offset
eqoff QQQ Qdelta: Variance
olddelta QQQ
|Fb|
Refle
ction
Prob
ability
Pmin
Pmax
© NEC Corporation 2009Page 8
QCN Algorithm: Reaction Point (RP)
▐ Rate Control Rate Decrease: Feedback from CP Rate Increase: Increase by itself
▐ Rate Increase Algorithm: Averaging Principle Based on BIC-TCP: Works without ACK
Target Rate (TR)
Current Rate (CR)
Fast RecoveryFast Recovery
Active Active IncreaseIncrease
HyperHyper--Active Active IncreaseIncrease
Receive FeedbackReceive Feedback
)FGCR(1CR bdCRTR
1/2)F(G bmaxd
TR)/2(CRCR
TRTR
AIRTRTR
TR)/2(CRCR TR)/2(CRCR HAIiRTRTR
Time
Rate
TriggerTimeout or Byte Counter
© NEC Corporation 2009Page 9
Implementation
© NEC Corporation 2009Page 10
Current QCN Implementation
▐▐ WorldWorld’’s first Hardware QCN Full implementations first Hardware QCN Full implementation It is fully functional We confirmed that the result matches OMNet++ simulation result!
▐ QCN-NIC Multiple Reaction Points Improved Token Bucket Rate Limiter
▐ QCN-Switch Multiple Congestion Points
▐ Compliant with Pseudo Code v2.3 [1] 1Gpbs Platform (NetFPGA)
[1] QCN Pseudo Code v2.3: http://www.ieee802.org/1/files/public/docs2009/au-rong-qcn-serial-hai-v23.pdf
© NEC Corporation 2009Page 11
Packet Format
▐ Data Frame (Normal Ethernet UDP/IP Frame)
▐ QCN Fb Frame(64Bytes)
SRC MAC[48:32]DST MAC[47:0]
63 0
SRC MAC[31:0] Type =16’h0800
IP and UDP Header
Extract as flowid[7:0] at QCN Switch
SRC MAC[48:32]DST MAC[47:0]
63 0
SRC MAC[31:0] Type = 16’hXXXX QCN Field[15:0]
flowid[7:0] qcn_fb[5:0]
Ethernet Padding
reserved(2bits)
QCN Field[15:0]
© NEC Corporation 2009Page 12
QCN NIC Block Diagram
User Data Path
InputArbiter
Demux DemuxMux
Delay Line
SRAM
ReceivePacketparser
QCNTerminator
Fb Demuxqcn_fb, flowid
RateCalculator
Timer
Byte Count
Token BucketRate Limiter
UDP PktGen
Reaction Point 0
Demux Mux
Reaction Point 1Reaction Point 2
Reaction Point 3
© NEC Corporation 2009Page 13
Token Bucket Rate Limiter
▐ Token bucket algorithm avoiding burstiness
Packet
Increase token_fill size tokensIf refresh timer has expired
Decrease packet size tokensin sending packet
Bucket Size B
Wait for being allowed to send
tokens
Send Condition:tokens in the bucket > 0
Send Condition:tokens in the bucket > 0
Send Condition:tokens in the bucket >= token size
Send Condition:tokens in the bucket >= token size
Causing burstiness!
Previous implementation Current implementation
Working Fine
© NEC Corporation 2009Page 14
QCN Switch Module Structure
User Data Path
Fb Calculator
QlenMonitor
Fb Generator
PrioritizedMux
Output Queues
InputArbiter
PacketParser
Port ByteCount
Receive Block
FwdPort
Congestion Point
src/dst macsrc/dst port
Demux4port
PrioritizedMux
PrioritizedMux
PrioritizedMuxRate
Limiter
SRAM
© NEC Corporation 2009Page 15
Evaluation
© NEC Corporation 2009Page 16
QCN Hardware Evaluation System
QCN-NIC0
QCN-Switch
Port1MAC
Port2MAC
QCN-NIC1 CPRate
Limiter
Fb
Mux
Demux
drop
RP0RP1RP2RP3
Receive
Mux
MACDelayLine
RP0RP1RP2RP3
Receive
Mux
MACDelayLine
Port0MAC
Output Queue(Port0)
FwdPort
▐ Number of Reaction Points (RPs): 1-8 Each RP generates 1Gbps UDP Traffic
▐ Limit Rate at Switch: 950Mbps(3.7sec) 200Mbps(3.7sec) 950Gbps(3.7sec)
Limit Rate (950M200M950M)to check flow control
behavior
4RPs
4RPs
© NEC Corporation 2009Page 17
Parameters (QCN)
▐ NIC FAST_RECOVERY_THRESHOLD = 5 AI_INC = 0.5 Mbps HAI_INC = 5 Mbps BC_LIMIT = 150 KB (30% randomness) TIMER_PERIOD = 25 ms (30% randomness) MIN_RATE = 0.5 Mbps GD = 1/128
▐ Switch Quantized_Fb: 6 bits Q_EQ = 33 KB W = 2 Base marking = 150 KB, and varies according to the lookup table in
the pseudo code (30% randomness)
Based on QCN Pseudo Code Parameters
© NEC Corporation 2009Page 18
Evaluation Result
© NEC Corporation 2009Page 19
Hardware Evaluation Result (1RP) , RTT=100usec
Switch: Queue LengthNIC: Rate
020406080
100120140
1 3 5 7 9 11
Evaluation Time [sec]Q
ueue
Siz
e [K
Byt
es]
0
200
400
600
800
1000
1 3 5 7 9 11
Evaluation Time [sec]
Rat
e [M
bps]
Queue length rises temporarilyin limiting Switch rate
but go back to Qeq in shorter time.
Queue length rises temporarilyin limiting Switch rate
but go back to Qeq in shorter time.
The NIC rate behavior well followsthe limited Switch rate
by QCN Algorithm
The NIC rate behavior well followsthe limited Switch rate
by QCN Algorithm
© NEC Corporation 2009Page 20
OMNet++ Simulation Result (1RP) , RTT=100usec
Switch: Queue LengthNIC: Rate
020406080
100120140
1 3 5 7 9 11
Simulation Time [sec]
Que
ue S
ize
[KB
ytes
]
0
200
400
600
800
1000
1 3 5 7 9 11
Simulation Time [sec]
Rat
e [M
bps]
Simulation result well matches the Hardware Evaluation Result !Simulation result well matches the Hardware Evaluation Result !
© NEC Corporation 2009Page 21
Hardware Evaluation Result (8RPs) , RTT=1msec
Switch: Queue LengthNIC: Rate
020406080
100120140
1 3 5 7 9 11
Evaluation Time [sec]
Que
ue S
ize
[KB
ytes
]
0
200
400
600
800
1000
1 3 5 7 9 11
Evaluation Time [sec]
Rat
e [M
bps]
It also matches OMNet++ Simulation Result with the same parameterIt also matches OMNet++ Simulation Result with the same parameter
Wiggling because of longer delaybut still keeps the same queue length.
Wiggling because of longer delaybut still keeps the same queue length.
The aggregate NIC rate behavior still well follows the Switch limited rate
The aggregate NIC rate behavior still well follows the Switch limited rate
© NEC Corporation 2009Page 22
Demonstration
© NEC Corporation 2009Page 23
Conclusion
▐ We presented/demonstrated QCN theory and implementation
▐ QCN Implementation on NetFPGA Board All QCN Functions are implemented (Pseudo Code v2.3) QCN-NIC
• Multiple RPs are ready• Improved Token Bucket Rate Limiter
QCN-Switch• Multiple CPs are ready• Binary Search Logic for Fb Calculation
▐ We have proved that our QCN implementation achieves expected performance. The result matches OMNet++ Simulation