The Simulation of the Dynamic Link Allocation Router (DyLAR)
Transcript of The Simulation of the Dynamic Link Allocation Router (DyLAR)
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
The Simulation of the Dynamic
Link Allocation Router (DyLAR)
Wei Song
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Overview
• A brief review of the Dynamic Link
Allocation flow control method
• The new simulation platform
• Some simple performance analyses
• An alternative method of the task request
procedure
• Future schedule
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Serial is better than Parallel
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C C
C
C C
C
DI00
DI01
DI10
DI11
DI20
DI21
DI30
DI31
DO00
DO01
DO10
DO11
DO20
DO21
DO30
DO31
ACKI ACKO
Dual-rail 0
Dual-rail 1
Dual-rail 2
Dual-rail 3
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Bandwidth efficiency is
less than 50% Master Slave
Tim
e
Request to reserve a path
OK ACK
Data transmissionsData transmissionsData transmissionsData transmissionsData transmissionsData transmissions
False ACK
Data transmissions (end)
Request to reserve a path
False Ack
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
The high Loss Rate
Simulation results of a 6x6 NoC.
0 100 200 300 400
0
2000
4000
6000
8000
10000
12000
14000
16000
Avera
ge F
ram
e L
ate
ncy (
ns)
Frame Injection Rate (kfps)0 100 200 300 400
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Loss R
ate
Frame Injection Rate (kfps)
Flit Level Loss Rate
Frame Level Retry rate
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Some hypotheses of DyLAR
• Asynchronous circuits prefer serial rather than parallel channels
• Connection oriented communications only have a bandwidth efficiency less than 50%
• The high retry rate of connection oriented communication is reducible by add virtual channels
• The input buffer could be smaller than flit size when using serial channels
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
The DyLAR Router
Tran
Control
Tran
Control
Tran
Control
Arbiter
Output Buffer
Output Buffer
Output Buffer
Data
Switch
Request
Switch
Input Buffer
Input Buffer
Input Buffer
DyLAR
Router
Credit(0,0)
Credit(0,1)
Credit(0,2)
Sub-link(0,0)
Sub-link(0,1)
Sub-link(0,2)
Credit(1,0)
Credit(1,1)
Credit(1,2)
Sub-link(1,0)
Sub-link(1,1)
Sub-link(1,2)
Credit(2,0)
Credit(2,1)
Credit(2,2)
Sub-link(2,0)
Sub-link(2,1)
Sub-link(2,2)
Sub-link(0,0)
Sub-link(0,1)
Sub-link(0,2)
Sub-link(1,0)
Sub-link(1,1)
Sub-link(1,2)
Sub-link(2,0)
Sub-link(2,1)
Sub-link(2,2)
Credit(2,0)
Credit(2,1)
Credit(2,2)
Credit(1,0)
Credit(1,1)
Credit(1,2)
Credit(0,0)
Credit(0,1)
Credit(0,2)
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Flit Formats
flit type flit
headerXY
8 bit8 bit
data flit type flit
header
4 bit128 bit_
*24
REQ NUM
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
The Flow Control Procedures
Tran
Control
Tran
Control
Tran
Control
Arbiter
1
2
3
4
Tran
Control
Tran
Control
Tran
Control
Arbiter
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Overview
• A brief review of the Dynamic Link
Allocation flow control method
• The new simulation platform
• Some simple performance analyses
• An alternative method of the task request
procedure
• Future schedule
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Basic information
– Mesh topology
– Only send XY frames
– Parameter reconfigurable
– Latency is set according to 1-of-4 CHAIN link
– SystemC 2.2.0
– GNU g++
– Makefile
– Batch simulation and automatic result analysis (accepted traffic, latency, loss rate)
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Configurable parameters
– Dimension (>1)
– Injected traffic (kfps) (>0)
– Channel number (>0)
– Request number (>0)
– Random seed (0 random seed, others seeds)
– Random delay
– Simulation time
– VCD file (generate waveform and debug logs)
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Current Problems
• The router design
– Multiple request lines sharing one channel will
generate deadlocks
• (still under debugging and modificating)
• The simulation model
– Slow (possible > 20 min under 4x4 cases)
– Memory consuming (possible > 2G under
some 4x4 cases) Simulation environment: ADM 2.4GHz 64-bit 4G memory
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Deadlock Avoidance 1
!
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Deadlock Avoidance 2
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Deadlock Recovery 1
Tran
Control
Tran
Control
Tran
Control
Arbiter
Output Buffer
Output Buffer
Output Buffer
Data
Switch
Request
Switch
Input Buffer
Input Buffer
Input Buffer
DyLAR
Router
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Deadlock Recovery 2
Tran
Control
Tran
Control
Tran
Control
Arbiter
Output Buffer
Output Buffer
Output Buffer
Data
Switch
Request
Switch
Input Buffer
Input Buffer
Input Buffer
DyLAR
Router
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Overview
• A brief review of the Dynamic Link
Allocation flow control method
• The new simulation platform
• Some simple performance analyses
• An alternative method of the task request
procedure
• Future schedule
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Simulation parameters
• Dimension 4x4
• Channel 1~3
• Request line 1~8
• Frame injection rate 20~500 kfps
• Random delay and random uniform traffic
pattern
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
1 channel with multiple requests
0 20 40 60 80 100 120 140 160 180 200 220 240
0
20
40
60
80
100
120
140
160
180
200
220
240
260
280
300
320
Accepte
d T
raffic
(M
Byte
/s)
Injection Rate (kfps)
1 req
2 req
4 req
6 req
0 20 40 60 80 100 120 140
0
10000
20000
30000
40000
50000
60000
70000
avera
ge L
ate
ncy (
ns)
Injection Rate (kfps)
1 req
2 req
4 req
6 req
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
1 channel with multiple requests
0 20 40 60 80 100 120 140
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9R
etr
y r
ate
Injection Rate (kfps)
1 req
2 req
4 req
6 req
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
1 request with multiple channels
0 20 40 60 80 100 120 140 160 180 200 2200
50
100
150
200
250
300
Acce
pte
d T
raffic
(M
Byte
/s)
Injection Rate (kfps)
1C1R
2C1R
3C1R
0 20 40 60 80 100 120 1400
10000
20000
30000
40000
50000
avera
ge L
ate
ncy (
ns)
Injection Rate (kfps)
1C1R
2C1R
3C1R
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
1 request with multiple channels
0 50 100
2000
4000
6000
ave
rag
e L
ate
ncy (
ns)
Injection Rate (kfps)
1C1R
2C1R
3C1R
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
2 channels with multi-requests
0 50 100 150 200 250 300 3500
100
200
300
400
500
600
700
800
Acce
pte
d T
raff
ic (
MB
yte
/s)
Injection Rate (kfps)
C1R1
C2R1
C2R2
C2R4
C2R6
C2R8
0 50 100 150 200 250 300 3500
10000
20000
30000
40000
50000
Ave
rag
e L
ate
ncy (
ns)
Injection Rate (kfps)
C1R1
C2R1
C2R2
C2R4
C2R6
C2R8
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
3 channels with multi-requests
0 100 200 300 400 5000
200
400
600
800
1000
1200
Accep
ted
Tra
ffic
(M
Byte
/s)
Injection Rate (kfps)
3C1R
3C2R
3C4R
3C6R
3C8R
0 100 200 300 400 5000
10000
20000
30000
40000
50000
60000
70000
80000
Avera
ge
La
ten
cy (
ns)
Injection Rate (kfps)
3C1R
3C2R
3C4R
3C6R
3C8R
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Throughput
Unit: MByte/s
186 266 300 300 300
265 512 710 >710 >710
300 650 >1000 >1000 >1000
1 channel
2 channel
3 channel
1
request
2
request
4
request
6
request
8
request
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Overview
• A brief review of the Dynamic Link
Allocation flow control method
• The new simulation platform
• Some simple performance analyses
• An alternative method of the task request
procedure
• Future schedule
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
The Original Task Request
Procedure
M S S STR
F(3)
TRF(2) TRF(1) TRF(0)
VRF/TAFVRF/TAFTAF
TAF
TRF task request flit
VRF volunteer request flit
TAF task acknowledge flit
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
The alternative method
M S S STR
F(3)
TRF(3) TRF(2) TRF(1)
M S S S
VRF/TAFVRF/TAFTAF
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Comparison of the two methods
• The original TRF
– Need counters to calcuate
life_time
– Remember state for every
TRF
– Special communication
with NA
– Wait for the whole flit
– One request line per TRF
• The alternative
– Move counters to NA
– States will be recorded by
NA and only 1 state
machine is enough
– Directly send flit to NA
– Send directly after the
flit_type field
– Two request lines per TRF
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Overview
• A brief review of the Dynamic Link
Allocation flow control method
• The new simulation platform
• Some simple performance analyses
• An alternative method of the task request
procedure
• Future schedule
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Schedule
• The simulation model is still under
debugging
• Build the hardware model according to the
SystemC model
• Try to speed up the simulation model and
reduce the memory required
2014/5/13 Advanced Processor Technology Group
The School of Computer Science
Thank you!
Questions?