Stress Resistant Scheduling Algorithms for CIOQ Switches
![Page 1: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/1.jpg)
Stress Resistant Scheduling Algorithms for CIOQ Switches
Prashanth Pappu
Applied Research Laboratory
Washington University in St. Louis
“Stress Resistant Scheduling Algorithms for CIOQ switches”, Prashanth Pappu, Jon Turner. To appear in ICNP 2003.
![Page 2: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/2.jpg)
Prashanth Pappu
Anatomy of a Router
[Figure: router block diagram with a switch fabric, six line cards with port processors labeled IPP and OPP, and a control processor]
Port processors queue packets and make routing decisions.
Line cards encode data for transmission on the target physical layer.
Control processor: runs routing protocols and monitoring functions.
![Page 3: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/3.jpg)
Output Queuing
Queuing is done only at output ports.
Maximizes throughput.
Contention between packets occurs only at output ports.
Speedup = N: impractical, but the ideal model.
[Figure: input ports, switching fabric, and output ports]
![Page 4: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/4.jpg)
Combined Input Output Queuing (CIOQ)
Use of virtual output queues (VOQs).
Crossbar configured by a centralized scheduler.
Scheduling is a bipartite graph matching problem.
[Figure: input ports, switching fabric, and output ports, with a centralized scheduler]
![Page 5: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/5.jpg)
Stability results
Maximum size matching (MSM): stable for i.i.d., uniform, admissible traffic.
Maximum weight matching (MWM): stable for independent, admissible traffic.
Too complex: $O(N^{5/2})$ and $O(N^3 \log N)$.
Switch with 10 Gb/s links has < 40 ns to make a scheduling decision.
Maximal size matching algorithms: Parallel Iterative Matching (PIM) and iterative SLIP (iSLIP).
![Page 6: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/6.jpg)
Parallel Iterative Matching (PIM)
Iterative matching algorithm:
each unmatched input sends a request to every output for which it has a queued cell.
unmatched outputs randomly pick a request and send a grant.
if an input receives multiple grants, it picks one randomly.
O(log N) convergence.
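The request, grant and accept steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `pim_match`, the `voq` matrix layout (`voq[i][j]` is the number of cells queued at input i for output j) and the fixed iteration count are assumptions made for the example.

```python
import random

def pim_match(voq, iterations=4, rng=random):
    """Sketch of PIM for one cell time. Returns a dict {input: output}."""
    n = len(voq)              # assumes a square N x N switch
    match = {}                # input -> matched output
    matched_out = set()
    for _ in range(iterations):
        # Request + grant: every unmatched output picks one request at random
        # among the unmatched inputs that hold a cell for it.
        grants = {}           # input -> list of outputs that granted to it
        for j in range(n):
            if j in matched_out:
                continue
            reqs = [i for i in range(n) if i not in match and voq[i][j] > 0]
            if reqs:
                grants.setdefault(rng.choice(reqs), []).append(j)
        if not grants:
            break             # no further pair can be added
        # Accept: an input holding several grants picks one at random.
        for i, outs in grants.items():
            j = rng.choice(outs)
            match[i] = j
            matched_out.add(j)
    return match
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing schedulers on the same traffic.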
![Page 7: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/7.jpg)
iterative SLIP (iSLIP)
Iterative matching algorithm:
unmatched inputs send requests to unmatched outputs (for which they have cells).
unmatched outputs pick the request that appears next in a fixed round-robin order from an input pointer (the input pointer is updated only in the first iteration).
if an input gets multiple grants, it picks the one that appears next in a fixed round-robin order from an output pointer (the output pointer is then updated).
Desynchronization effect.
Simple to implement, but these algorithms do not perform well under extreme traffic conditions.
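A minimal sketch of the round-robin grant and accept steps, under stated assumptions: `grant_ptr[j]` and `accept_ptr[i]` are hypothetical names for what the slide calls the input pointer (kept at each output) and the output pointer (kept at each input), and each pointer is advanced to one position past the matched port, as in the usual description of iSLIP. Both pointers are updated only for matches made in the first iteration.

```python
def islip_match(voq, grant_ptr, accept_ptr, iterations=4):
    """Sketch of iSLIP for one cell time. Returns in_match, where
    in_match[i] is the output matched to input i (or None).
    grant_ptr and accept_ptr are updated in place."""
    n = len(voq)                      # assumes a square N x N switch
    in_match = [None] * n
    out_match = [None] * n
    for it in range(iterations):
        # Grant: each unmatched output scans inputs round-robin from its pointer.
        grants = [[] for _ in range(n)]   # grants[i] = outputs granting to input i
        for j in range(n):
            if out_match[j] is not None:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if in_match[i] is None and voq[i][j] > 0:
                    grants[i].append(j)
                    break
        # Accept: each input takes the grant closest to its pointer in round-robin order.
        for i in range(n):
            if not grants[i]:
                continue
            j = min(grants[i], key=lambda o: (o - accept_ptr[i]) % n)
            in_match[i], out_match[j] = j, i
            if it == 0:                   # pointers move only in the first iteration
                grant_ptr[j] = (i + 1) % n
                accept_ptr[i] = (j + 1) % n
    return in_match
```

Calling the function repeatedly with the same pointer lists shows the desynchronization effect: under persistent load the grant pointers drift apart, so outputs stop granting to the same input.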
![Page 8: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/8.jpg)
Worst case results
Critical Cells First (CCF) can emulate output queuing with speedup = 2.
Lowest Occupancy Output First Algorithm (LOOFA) is work conserving with speedup = 2.
LOOFA can be augmented with timestamps to emulate output queuing (speedup = 3).
![Page 9: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/9.jpg)
LOOFA
Iterative matching algorithm:
unmatched inputs send requests to outputs with lowest occupancy (for which they have queued cells).
outputs pick a request randomly and send a grant to the input.
O(N) iterations to perform correctly.
Work conserving with speedup of 2.
Significant result but not practical.
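The iteration above can be sketched as below. The names are hypothetical: `occupancy[j]` is assumed to be the number of cells waiting in output j's queue, requests are assumed to go only to still-unmatched outputs, and ties in occupancy are broken by lowest output index, which the slide does not specify.

```python
import random

def loofa_match(voq, occupancy, rng=random):
    """Sketch of LOOFA matching for one cell time. Returns {input: output}."""
    n = len(voq)              # assumes a square N x N switch
    match = {}
    matched_out = set()
    while True:
        # Each unmatched input requests the lowest-occupancy unmatched
        # output for which it has a queued cell.
        requests = {}         # output -> inputs requesting it
        for i in range(n):
            if i in match:
                continue
            candidates = [j for j in range(n)
                          if voq[i][j] > 0 and j not in matched_out]
            if candidates:
                j = min(candidates, key=lambda o: occupancy[o])
                requests.setdefault(j, []).append(i)
        if not requests:
            return match      # maximal: no unmatched pair with a cell remains
        # Each requested output grants one request at random. An input sends
        # a single request per iteration, so every grant is accepted.
        for j, inputs in requests.items():
            match[rng.choice(inputs)] = j
            matched_out.add(j)
```

Every pass through the loop adds at least one pair, so the loop ends after at most N passes, in line with the O(N) iteration count on the slide.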
![Page 10: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/10.jpg)
Traffic in IP networks
Unregulated nature of IP networks can cause sustained overloads:
use of slow congestion control mechanisms
limited route diversity makes congested links common
use of route selection mechanisms not guided by session bandwidth needs
sudden route changes causing rapid traffic shifts
malicious users
How do practical scheduling algorithms perform in these conditions?
![Page 11: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/11.jpg)
Solution
We use targeted stress tests to:
study performance of practical scheduling algorithms under extreme conditions
study performance of work conserving scheduling algorithms under speedups < 2
design stress resistant scheduling algorithms which maintain throughput under uniform traffic and stress tests and can still be implemented at high speeds.
![Page 12: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/12.jpg)
Miss fraction
Previous work uses average queuing delay as a metric.
Not useful under inadmissible traffic conditions.
Miss fraction:
$\text{miss fraction} = 1 - N_A / N_I$
Determines relative loss in throughput.
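A small worked example of the formula. The slide does not define its symbols, so the reading used here is an assumption: N_A is taken as the number of cells forwarded by the switch under test and N_I as the number forwarded by an ideal (output queued) switch on the same traffic. The function name and the zero-traffic convention are also hypothetical.

```python
def miss_fraction(n_actual, n_ideal):
    """1 - N_A / N_I, with N_A and N_I interpreted as described above (an assumption)."""
    if n_ideal == 0:
        return 0.0            # convention chosen for this sketch: no traffic, no misses
    return 1.0 - n_actual / n_ideal

# Example: 9,000 cells forwarded where the ideal switch forwards 10,000
# corresponds to a relative throughput loss of about 10%.
loss = miss_fraction(9_000, 10_000)
```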
![Page 13: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/13.jpg)
Stress Test
[Figure: four panels labeled phase 1, phase 2, phase 3 and phase 4]
Adversarial approach to overloading (stressing) various outputs.
Outputs with empty queues have cells queued at various inputs.
Inputs with cells for an empty output also have cells queued for other outputs.
Test can be varied by changing the number of participating inputs or phases.
![Page 14: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/14.jpg)
Stress Test (Example)
PIM (speedup =1.5). Stress test with 3 participating inputs, 4 phases.
![Page 15: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/15.jpg)
PIM (under uniform traffic)
[Figure: two panels, average queuing delays and miss fraction]
![Page 16: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/16.jpg)
iSLIP (under uniform traffic)
[Figure: two panels, average queuing delays and miss fraction]
![Page 17: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/17.jpg)
Stress Tests
Test A (worst case for PIM(4), speedup = 2)
Test B (worst case for LOOFA, speedup = 2)
![Page 18: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/18.jpg)
Stress resistant algorithms
Better performance of LOOFA suggests that ordering outputs is the key.
Complete ordering can make algorithms too complex to implement.
But traffic conditions are persistent and change slowly, so use approximate ordering schemes:
Lowest Layer Selection (LLS) heuristic, which achieves a coarser ordering of outputs.
Odd-even sorting, which achieves approximate ordering but converges to ideal ordering under persistent traffic conditions.
![Page 19: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/19.jpg)
Lowest Layer Selection
achieves coarser ordering.
bigger layers for larger queue lengths.
beyond a queue limit all outputs are treated equal.
number of layers independent of N.
algorithms give priority to outputs in lowest layer in accept phase.
priority encoder or N-way minimum finding circuit can be used on a grant vector.
![Page 20: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/20.jpg)
Lowest Layer Selection - Random (LLS-R)
Iterative matching algorithm:
each unmatched input sends a request to every output for which it has a queued cell.
unmatched outputs randomly pick a request and send a grant.
if an input receives multiple grants, it picks one randomly from the lowest layer.
O(log N) convergence still holds.
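The only change from PIM is the accept step, which can be sketched as below. The layer mapping is hypothetical: the slides say only that layers grow with queue length and that all outputs beyond a queue limit are treated alike, so this sketch assumes layers that double in width (layer k covers queue lengths 2^k - 1 through 2^(k+1) - 2) and caps the index at `num_layers - 1`. The names `layer`, `lls_accept` and `occupancy` are made up for the example.

```python
import random

def layer(queue_len, num_layers=16):
    """Hypothetical layer mapping with doubling layer widths, capped at the top layer."""
    return min((queue_len + 1).bit_length() - 1, num_layers - 1)

def lls_accept(granting_outputs, occupancy, rng=random, num_layers=16):
    """Accept step of LLS-R: choose at random among the granting outputs
    that fall in the lowest occupied layer."""
    layers = {j: layer(occupancy[j], num_layers) for j in granting_outputs}
    lowest = min(layers.values())
    return rng.choice([j for j in granting_outputs if layers[j] == lowest])
```

With occupancies `[9, 0, 40]` and grants from all three outputs, the outputs fall in layers 3, 0 and 5 under this mapping, so output 1 is accepted. Outputs whose queue lengths share a layer are still chosen at random, as in PIM.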
![Page 21: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/21.jpg)
Lowest Layer Selection - SLIP (LLS-S)
Iterative matching algorithm:
unmatched inputs send requests to unmatched outputs (for which they have cells).
unmatched outputs pick the request that appears next in a fixed round-robin order from an input pointer (the input pointer is updated only in the first iteration).
if an input gets multiple grants, it picks the one that appears next in the lowest layer in a fixed round-robin order from an output pointer (the output pointer is then updated).
Both LLS-R and LLS-S have the same performance as PIM and iSLIP under uniform traffic.
![Page 22: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/22.jpg)
Stress Test
Miss fractions for LLS-R, LLS-S (using 16 layers) and LOOFA.
[Figure: two panels, Test A and Test B]
![Page 23: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/23.jpg)
Stress Test
Miss fractions for LLS-S and LLS-R (single iteration) with varying layers.
[Figure: two panels, LLS-S (Test A) and LLS-R (Test A)]
![Page 24: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/24.jpg)
Approximate LOOFA (A-LOOFA)
LOOFA is complex but can be used as the basis for a practical algorithm (with similar performance)
![Page 25: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/25.jpg)
Approximate LOOFA (A-LOOFA)
Matching in A-LOOFA is accomplished using a simple combinational circuit.
[Equations: Boolean recurrences relating the signals $r$, $v$ and $c$ of block $(i,j)$ to those of its neighboring blocks]
O(N), but the constant factor is determined by gate delays (2N times the delay in each block).
With a 0.13 µm ASIC process, gate delays are 25-50 ps. A match can be completed in 3.2-6.4 ns.
![Page 26: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/26.jpg)
A-LOOFA
Columns are ordered using odd-even sort.
for all even j < N, swap $B_j$ and $q_j$ with $B_{j+1}$ and $q_{j+1}$, if $q_j > q_{j+1}$.
Similarly, for all odd j < N-1.
Rows are ordered using a permutation based on perfect shuffle (to ensure fairness).
for all even i < N, generate a pseudo random bit $x_i$.
if $x_i = 0$, values in row i are moved to row i/2 and those in row i+1 are moved to row (N+i)/2.
else, values in row i are moved to row (N+i)/2 and values in row i+1 are moved to row i/2.
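The two reordering steps can be sketched as follows, assuming 0-based indices and even N. The names are hypothetical: `q[j]` stands for the queue length of the output in column j, `cols[j]` for the column contents the slide calls B_j, and `rng` for any object with a `getrandbits` method.

```python
import random

def odd_even_step(q, cols, phase):
    """One odd-even transposition step on columns.
    phase=0 compares pairs (0,1), (2,3), ...; phase=1 compares (1,2), (3,4), ..."""
    for j in range(phase, len(q) - 1, 2):
        if q[j] > q[j + 1]:
            q[j], q[j + 1] = q[j + 1], q[j]
            cols[j], cols[j + 1] = cols[j + 1], cols[j]

def shuffle_rows(rows, rng=random):
    """Row permutation based on the perfect shuffle, with one pseudo random
    bit per row pair deciding which row of the pair goes to which half."""
    n = len(rows)                          # assumed even
    out = [None] * n
    for i in range(0, n, 2):
        if rng.getrandbits(1) == 0:
            out[i // 2], out[(n + i) // 2] = rows[i], rows[i + 1]
        else:
            out[(n + i) // 2], out[i // 2] = rows[i], rows[i + 1]
    return out
```

A single step per cell time only orders the columns approximately. If the queue lengths stay fixed, alternating the two phases N times yields a full sort, which matches the earlier remark that odd-even sorting converges to the ideal ordering under persistent traffic.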
![Page 27: Stress Resistant Scheduling Algorithms for CIOQ Switches](https://reader035.fdocuments.net/reader035/viewer/2022062322/56814ac0550346895db7d5bc/html5/thumbnails/27.jpg)
A-LOOFA performance
[Figure: two panels, Test A and Test B]