Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips
Hiroki Matsutani (Keio Univ.)
Michihiro Koibuchi (National Institute of Informatics)
Yutaka Yamada (Toshiba RDC)
Jouraku Akiya (Keio Univ.)
Hideharu Amano (Keio Univ.)
Network-on-Chip (NoC)
• Tile-based Multi-Core
  – Core: Execution
  – Router: Packet delivery
• RAW – 2D Mesh
• ACM – Tree
• aSoC – 2D Mesh
[Taylor, Micro2002] [Liang, TVLSI2004] [Furtek, FPL2004]
[Figure: a 3×3 array of tiles (0–8) connected by the Network-on-Chip (NoC); each tile (RISC, RAM, I/O) contains a MIPS core, memory, and a router]
SoCs are growing! The NoC is one of the scalable on-chip interconnects.
○ Advantages
• Better wiring delay
  – Global wiring
  – Limited-length links
• Improved modularity
  – Standard network I/F
× Drawback
• Overhead
Stream Processing ~ Simulation ~
• MPEG, JPEG, Viterbi
  – System-level design
• UnTimed Functional (UTF) model: Module (a) → Data → Module (b); no clock for execution
• Bus Cycle Accurate (BCA) model: Module (a) → Data → Module (b); communication is cycle accurate (clocked)
• Design levels: UTF model (high abstraction) → BCA model → RTL model (detailed design)
The application is divided into tasks based on simulation, giving a task flow graph.
Stream Processing ~Map, Route~
• Shared links
  – Link congestion → throughput is degraded
• Optimization (in general)
  – Mapping: minimum communication length
  – Routing: minimal paths
[Figure: task flow graph (tasks (1)–(4)) mapped onto the physical tiles of the NoC]
Strong access locality!!
Minimal paths are too short to distribute path congestion.
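The general-case mapping objective "minimum communication length" can be made concrete as traffic volume times minimal hop distance, summed over the edges of the task flow graph. The sketch below is a minimal illustration under stated assumptions, not the paper's mapping algorithm: it assumes a 2D mesh with row-major tile ids, hypothetical helper names `hop_distance` and `communication_cost`, and a made-up 4-task graph whose volumes are borrowed from the Analysis Record values.

```python
# Sketch (assumption): traffic-weighted communication length of a
# task-to-tile mapping on a 2D mesh. Tile ids are row-major: id = y * width + x.

def hop_distance(a: int, b: int, width: int) -> int:
    """Minimal hop count (Manhattan distance) between two mesh tiles."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

def communication_cost(flows: dict, mapping: dict, width: int) -> int:
    """Sum of volume * hop distance over all task-graph edges (src, dst)."""
    return sum(vol * hop_distance(mapping[s], mapping[d], width)
               for (s, d), vol in flows.items())

# Hypothetical 4-task flow graph: (src task, dst task) -> data volume.
FLOWS = {(0, 1): 8192, (1, 2): 8192, (2, 3): 8192, (0, 2): 1024, (0, 3): 1024}

# Two candidate mappings of tasks 0..3 onto the tiles of a 2x2 mesh.
ring = {0: 0, 1: 1, 2: 3, 3: 2}      # every heavy edge lands on adjacent tiles
identity = {0: 0, 1: 1, 2: 2, 3: 3}  # heavy edge (1, 2) spans two hops

print(communication_cost(FLOWS, ring, 2), communication_cost(FLOWS, identity, 2))
```

Here the ring mapping costs 27648 against 35840 for the identity mapping. Minimizing this cost pulls heavy flows onto adjacent tiles, which is the strong access locality that leaves minimal paths too short to spread congestion.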
Existing Routing ~ Is a non-minimal path useful?~
Common features of SAN & NoC
• Packet delivery – WH (wormhole) switching
• Deadlock freedom – Turn-Model, …
Feature of SAN
• Various applications, various traffic patterns
  – Non-minimal paths make the network state unstable
[Ho, HPCA2003]
Feature of NoC
• Fixed application, fixed traffic patterns
  – System-level simulation
→ Predictable communication: load balancing with non-minimal paths
Flee ~ Non-minimal routing strategy~
• Stream processing in NoCs
  – Strong access locality!!
  – Minimal paths are too short to distribute path congestion
• Partially non-minimal paths
  – Non-minimal paths are basically inefficient…
  – Increase the # of alternative paths by introducing non-minimal paths
• Path establishment based on traffic amount
  – Heavy-traffic comm. → minimal paths
  – Light-traffic comm. → avoiding congestion
Flee ~ Traffic pattern Analysis~
# time, src, dst, size
10000 (0) (1) 32
10000 (0) (2) 4
10000 (0) (3) 4
10010 (1) (2) 32
10010 (0) (1) 32
10010 (0) (2) 4
10010 (0) (3) 4
10020 (2) (3) 32
10020 (1) (2) 32
10030 (2) (3) 4
Traffic Pattern
Traffic Analysis
1. For each src-dst pair, totalize the packet size
   – E.g., src-dst pair (0,1): 32 + 32 = 64
2. Sort in descending order of TotalSize
# srcdst, TotalSize
(0) (1) 8192
(1) (2) 8192
(2) (3) 8192
(0) (2) 1024
(0) (3) 1024
…
Analysis Record
The src-dst pair with the largest TotalSize (the heaviest traffic) is on the first line.
Each src-dst pair is assigned a path in the order of the Analysis Record.
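The two analysis steps above (totalize per src-dst pair, then sort by TotalSize) can be sketched as follows. This is a minimal sketch, assuming the whitespace-separated trace format shown on the slide; the function name `analyze` and the tie-break on (src, dst) are assumptions. It uses only the ten trace lines shown, so its totals differ from the 8192/1024 values in the slide's record, which comes from the full trace.

```python
from collections import defaultdict

def analyze(trace_lines):
    """Totalize size per (src, dst) pair, then sort by TotalSize, largest first."""
    totals = defaultdict(int)
    for line in trace_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# time, src, dst, size" header
        _time, src, dst, size = line.split()
        pair = (int(src.strip("()")), int(dst.strip("()")))
        totals[pair] += int(size)
    # Descending TotalSize; equal totals are ordered by (src, dst) (assumed tie-break).
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

TRACE = """\
# time, src, dst, size
10000 (0) (1) 32
10000 (0) (2) 4
10000 (0) (3) 4
10010 (1) (2) 32
10010 (0) (1) 32
10010 (0) (2) 4
10010 (0) (3) 4
10020 (2) (3) 32
10020 (1) (2) 32
10030 (2) (3) 4
"""

print("# srcdst, TotalSize")
for (src, dst), total in analyze(TRACE.splitlines()):
    print(f"({src}) ({dst}) {total}")
```

For this excerpt the record starts with (0) (1) 64 and (1) (2) 64, followed by (2) (3) 36, then (0) (2) 8 and (0) (3) 8.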
Flee ~ Establishing Paths ~
• Each link has a “Cost”
• In order of traffic amount:
  – Search for the lowest-cost path
  – Increase the cost of the selected links
Paths are assigned so as not to disturb previously established paths.
There will be several alternative paths…
A link with a high cost is a hotspot…
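The path-establishment loop (visit pairs in Analysis Record order, take the lowest-cost path, raise the cost of its links) can be sketched with Dijkstra's algorithm over per-link costs. This is a hedged sketch, not Flee's actual implementation: the initial link cost of 1, the cost update "add the pair's TotalSize", and the helper names `mesh_links`, `lowest_cost_path` and `establish_paths` are all assumptions. It also ignores deadlock freedom (e.g. turn-model restrictions), which a real wormhole router would need.

```python
import heapq

def mesh_links(width, height):
    """Directed links of a width x height 2D mesh; node id = y * width + x."""
    links = []
    for y in range(height):
        for x in range(width):
            n = y * width + x
            if x + 1 < width:
                links += [(n, n + 1), (n + 1, n)]
            if y + 1 < height:
                links += [(n, n + width), (n + width, n)]
    return links

def lowest_cost_path(cost, src, dst):
    """Dijkstra over per-link costs; returns the node sequence src -> dst."""
    adj = {}
    for u, v in cost:
        adj.setdefault(u, []).append(v)
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj.get(u, []):
            nd = d + cost[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def establish_paths(record, links, base_cost=1):
    """Assign paths in Analysis Record order (heaviest pair first)."""
    cost = {link: base_cost for link in links}
    routes = {}
    for (src, dst), total_size in record:
        path = lowest_cost_path(cost, src, dst)
        routes[(src, dst)] = path
        for link in zip(path, path[1:]):
            cost[link] += total_size  # assumed update: charge the pair's traffic
    return routes

# Hypothetical record on a 3x2 mesh (nodes 0 1 2 / 3 4 5).
routes = establish_paths([((0, 1), 8192), ((0, 2), 1024)], mesh_links(3, 2))
print(routes)
```

The heavy pair (0, 1) gets the minimal path [0, 1]; the light pair (0, 2) then avoids the now-expensive link (0, 1) and takes the 4-hop non-minimal path [0, 3, 4, 1, 2], matching the "heavy traffic minimal, light traffic avoids congestion" idea.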
Simulation Environments
• Router model
  – 4 ports for adjacent routers
  – 1 port for the core
• Network topology
  – 4×4 mesh
  – 4×4 torus
[Figure: 16-node 2D mesh; each router has a core attached; nodes numbered row by row: 0–3, 4–7, 8–11, 12–15]
Packet size: 259 flits (2-flit header)
Switching method: Wormhole switching
# of virtual channels: Mesh: 1, Torus: 2
Simulation time: 1,000,000 cycles
Applications for Evaluation
• App. traces
  – Viterbi Decoder
  – JPEG Codec
  – IPsec
  – Uniform
[Figure: Tile mapping example of JPEG Codec (decoder and encoder tasks distinguished in the original legend): (0) Header Analysis, (1) Huffman Decode, (2) Inverse Quant., (3) I-DCT for Row, (5) Yuv-rgb Convert, (6) MCU Mapping, (7) I-DCT for Col, (8) Rgb-yuv Convert, (9) MCU Sampling, (10) I-DCT for Col, (11) I-DCT for Row, (13) Stream Gen., (14) Huffman Code, (15) Quant.; tiles (4) and (12) unlabeled]
Results ~ Viterbi @ 2D Mesh~
• Flee – Avg. hop count: 2.52
• DOR (Dimension-Order Routing) – Avg. hop count: 1.84
[Graph: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
14.2% improved
Communication in the Viterbi trace includes fork and join.
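For comparison, the DOR baseline is deterministic and always minimal: every packet between a given pair follows one fixed path, so there is no alternative when that path is congested. A minimal sketch, assuming X-first ordering and the row-major node numbering of the 4×4 mesh figure; `xy_route` is a hypothetical name, and the torus (wraparound links, 2 virtual channels) is not modeled.

```python
def xy_route(src, dst, width):
    """Dimension-order routing on a 2D mesh: correct X first, then Y.
    Node ids are row-major (id = y * width + x)."""
    x, y = src % width, src // width
    tx, ty = dst % width, dst // width
    path = [src]
    while x != tx:  # move along the X dimension until the column matches
        x += 1 if tx > x else -1
        path.append(y * width + x)
    while y != ty:  # then move along the Y dimension
        y += 1 if ty > y else -1
        path.append(y * width + x)
    return path

print(xy_route(0, 15, 4))  # [0, 1, 2, 3, 7, 11, 15]
print(xy_route(13, 2, 4))  # [13, 14, 10, 6, 2]
```

The hop count of an XY path always equals the Manhattan distance, which is why DOR's average hop count stays lower than Flee's while offering no way to route around a hotspot.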
Results ~ Viterbi @ 2D Torus~
• Flee – Avg. hop count: 1.87
• DOR (Dimension-Order Routing) – Avg. hop count: 1.48
[Graph: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
22.2% improved: Flee improves throughput by 22.2% with non-minimal paths.
Communication in the Viterbi trace includes fork and join.
Results ~ JPEG @ 2D Mesh~
• Flee – Avg. hop count: 1.01
• DOR (Dimension-Order Routing) – Avg. hop count: 1.00
[Graph: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
No difference
In the JPEG trace, data is processed sequentially; there is no fork-and-join pattern.
Communication is between neighbors, so non-minimal paths are not needed.
Results ~ Effect of Traffic Analysis~
• Flee – Known data amount
• Flee (Incomplete) – Unknown data amount (all data transfer sizes are “1”)
Viterbi @ 2D Mesh
[Graph: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
Incomplete Flee: not improved
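One way to see why the incomplete variant loses its advantage is to recompute TotalSize for the ten-line Traffic Pattern excerpt with and without data sizes. This is a toy illustration, not the Viterbi trace used in the evaluation; the helper name `total_sizes` is hypothetical.

```python
from collections import Counter

# (src, dst, size) of the ten transfers in the Traffic Pattern excerpt.
TRANSFERS = [(0, 1, 32), (0, 2, 4), (0, 3, 4), (1, 2, 32), (0, 1, 32),
             (0, 2, 4), (0, 3, 4), (2, 3, 32), (1, 2, 32), (2, 3, 4)]

def total_sizes(transfers, sizes_known=True):
    """TotalSize per (src, dst); with sizes unknown, every transfer counts as 1."""
    totals = Counter()
    for src, dst, size in transfers:
        totals[(src, dst)] += size if sizes_known else 1
    return totals

print(dict(total_sizes(TRANSFERS)))
print(dict(total_sizes(TRANSFERS, sizes_known=False)))
```

With sizes known, (0, 1) and (1, 2) lead at 64 while (0, 2) and (0, 3) trail at 8; with every size set to 1, all five pairs tie at 2, so the Analysis Record can no longer separate heavy from light communication.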
Results ~ Effect of Traffic Analysis~
• Flee – Known data amount
• Flee (Incomplete) – Unknown data amount (all data transfer sizes are “1”)
Viterbi @ 2D Torus
[Graph: X-axis: Accepted Traffic [flit/cycle/node]; Y-axis: Latency [cycle]]
Incomplete Flee: partially improved
Communication size is a key factor in improving performance.
Summary ~ Non-minimal routing strategy~
• Stream processing in NoCs
  – Strong access locality!!
  – Minimal paths are too short to distribute path congestion
• Flee: non-minimal routing strategy
  – Increases the # of alternative paths by introducing non-minimal paths
  – Heavy-traffic comm. → minimal paths
  – Light-traffic comm. → avoiding congestion
• Throughput improved by up to 22.2%
Thank you for listening.