A Cost Effective Centralized Adaptive Routing for Networks on Chip

Ran Manevich, Israel Cidon, Avinoam Kolodny, Isask’har (Zigi) Walter and Shmuel Wimer. Technion – Israel Institute of Technology. QNoC Research Group.


Page 1: A Cost Effective Centralized Adaptive Routing for Networks on Chip

A Cost Effective Centralized Adaptive Routing for Networks on Chip

Ran Manevich, Israel Cidon, Avinoam Kolodny, Isask’har (Zigi) Walter and Shmuel Wimer

Technion – Israel Institute of Technology

QNoC Research Group

Page 2: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Global traffic information is essential to make the right decision!

Page 3: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Adaptive Routing in NoCs – Local vs. Global Information

2D Mesh NoC

Legend: Low Congestion, Medium Congestion, High Congestion

A packet routed from the upper left to the bottom right corner utilizing local congestion information.

The same packet routed using global information.

Figure labels: Source, Destination, "I CAN MAKE IT!!!"

Page 4: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Route Selection - ATDOR

ATDOR - Adaptive Toggle Dimension Ordered Routing

Keep it simple! Centralized selection: XY or YX.

Routing tables in sources. One bit per destination.

The option with the less congested bottleneck link is preferred.
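The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: routers are addressed by hypothetical `(x, y)` coordinates, `load` is a hypothetical dictionary of per-link load values, and a tie keeps XY. None of these names come from the slides.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]      # (x, y) router coordinates (assumed addressing)
Link = Tuple[Node, Node]    # directed link between two adjacent routers


def dor_path(src: Node, dst: Node, x_first: bool) -> List[Link]:
    """Links of the dimension-ordered route: XY if x_first, otherwise YX."""
    links: List[Link] = []
    cur = src
    for dim in ((0, 1) if x_first else (1, 0)):   # 0 = X dimension, 1 = Y dimension
        while cur[dim] != dst[dim]:
            step = 1 if dst[dim] > cur[dim] else -1
            nxt = (cur[0] + step, cur[1]) if dim == 0 else (cur[0], cur[1] + step)
            links.append((cur, nxt))
            cur = nxt
    return links


def bottleneck(path: List[Link], load: Dict[Link, float]) -> float:
    """Load of the most loaded link on the path (0.0 for an empty path)."""
    return max((load.get(link, 0.0) for link in path), default=0.0)


def preferred_route(src: Node, dst: Node, load: Dict[Link, float]) -> str:
    """Pick the option whose bottleneck link is less congested (tie keeps XY)."""
    xy = bottleneck(dor_path(src, dst, True), load)
    yx = bottleneck(dor_path(src, dst, False), load)
    return "YX" if yx < xy else "XY"


# One bit per destination in the source's routing table: 0 = XY, 1 = YX.
load = {((0, 0), (1, 0)): 0.9}                    # a congested first X hop
choice = preferred_route((0, 0), (1, 1), load)
routing_table = {(1, 1): 1 if choice == "YX" else 0}
```

With the congested X hop in this example, the YX option avoids the loaded link, so the table bit for destination `(1, 1)` is set to 1.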

Page 5: A Cost Effective Centralized Adaptive Routing for Networks on Chip

ATDOR Illustration 1

Five identical flows, 100 MB/s each.

Links modeled as M/M/1 queues. Delay of a single link:

$$D_{\text{LINK}} = \frac{\text{Traffic}}{\text{Capacity} - \text{Traffic}}$$

Link capacity is 210 MB/s.

Initial routing: XY.
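As a rough numeric check of the link model, the sketch below evaluates the delay term for one and two flows sharing a link. It assumes the M/M/1-style form Traffic / (Capacity - Traffic); the function name `link_delay` is hypothetical. The result is a relative figure only, since the transcript does not give the scaling behind the nanosecond values on later slides.

```python
def link_delay(traffic: float, capacity: float) -> float:
    """M/M/1-style delay term of one link: traffic / (capacity - traffic)."""
    if not 0.0 <= traffic < capacity:
        raise ValueError("traffic must be non-negative and below capacity")
    return traffic / (capacity - traffic)


CAPACITY = 210.0   # MB/s, from the slide
FLOW = 100.0       # MB/s per flow, from the slide

one_flow = link_delay(FLOW, CAPACITY)        # 100 / 110, about 0.909
two_flows = link_delay(2 * FLOW, CAPACITY)   # 200 / 10 = 20.0
```

Two flows on one link leave only 10 MB/s of headroom, so the delay term grows by a factor of 22. A third flow would exceed the capacity, which the function rejects.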

Page 6: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Centralized Routing – How?

Option 1 – Continuous calculation of optimal routing for the active sessions:

Achievable load balancing

Speed and computation complexity

System complexity

Page 7: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Centralized Routing – How?

Option 2 – Iterative serial selection based on traffic load measurements between XY and YX for all source-destination pairs:

Achievable load balancing

Speed and computation complexity

System complexity

Page 8: A Cost Effective Centralized Adaptive Routing for Networks on Chip

ATDOR illustration 1

Step #   Re-Routed Flow   Average Delay
1        1->15            (not shown)
2        2->8             37 ns
3        2->15            22 ns

Page 9: A Cost Effective Centralized Adaptive Routing for Networks on Chip

What did we just see?

For each flow we:

1. Calculated the better route.
2. Updated the routing table of the source.
3. Waited for the update to take effect and measured the global traffic load.

Performing steps 1-3 for each flow is slow and not scalable.

Steps 2 and 3 are unified for all destinations of a single source:

Achievable load balancing

Speed and computation complexity

Scalability
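The per-source unification of steps 2 and 3 can be sketched as a control loop. This is a minimal sketch, not the authors' controller: `choose` and `apply_and_measure` are hypothetical callbacks standing in for the route calculation and for the table update plus load measurement.

```python
from typing import Callable, Dict, Tuple

Flow = Tuple[int, int]   # (source, destination)


def control_iteration(
    routes: Dict[Flow, str],
    choose: Callable[[Flow, str], str],
    apply_and_measure: Callable[[int, Dict[int, str]], None],
) -> int:
    """One pass over all sources; returns the number of re-routed flows."""
    changed = 0
    for src in sorted({s for s, _ in routes}):
        update: Dict[int, str] = {}
        # Step 1: calculate the better route for every destination of this source.
        for (s, d), current in routes.items():
            if s != src:
                continue
            nxt = choose((s, d), current)
            if nxt != current:
                update[d] = nxt
        if update:
            for d, route in update.items():
                routes[(src, d)] = route
            # Steps 2 and 3: a single table update and a single wait-and-measure
            # for all destinations of this source.
            apply_and_measure(src, update)
            changed += len(update)
    return changed


routes = {(1, 15): "XY", (2, 8): "XY", (2, 15): "XY", (4, 15): "XY"}
calls = []
n = control_iteration(
    routes,
    choose=lambda flow, cur: "YX" if flow[0] == 2 else cur,   # toy decision rule
    apply_and_measure=lambda src, upd: calls.append((src, dict(upd))),
)
```

In this toy run both flows of source 2 are re-routed with a single table update, so the cost of waiting and measuring is paid once per source rather than once per flow.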

Page 10: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Back to illustration 1

Step #   Re-Routed Flows   Average Delay
1        1->15             (not shown)
2        2->8, 2->15       22 ns
3        4->15             22 ns
4        1->15             22 ns
5        2->8, 2->15       (not shown)

Page 11: A Cost Effective Centralized Adaptive Routing for Networks on Chip

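Problem #1

Changing routing may increase congestion and cause fluctuations.

Solution: Change routing only if the alternative is better by the margin α, 0 < α < 1:

If Current Route = XY:

$$\text{NextRoute} = \begin{cases} \text{YX} & \text{if } \mathrm{MAX}[\text{Load}_{YX}] \le \alpha \cdot \mathrm{MAX}[\text{Load}_{XY}] \\ \text{XY} & \text{if } \mathrm{MAX}[\text{Load}_{YX}] > \alpha \cdot \mathrm{MAX}[\text{Load}_{XY}] \end{cases}$$

Else if Current Route = YX:

$$\text{NextRoute} = \begin{cases} \text{XY} & \text{if } \mathrm{MAX}[\text{Load}_{XY}] \le \alpha \cdot \mathrm{MAX}[\text{Load}_{YX}] \\ \text{YX} & \text{if } \mathrm{MAX}[\text{Load}_{XY}] > \alpha \cdot \mathrm{MAX}[\text{Load}_{YX}] \end{cases}$$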
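The margin rule can be written as a small function. This sketch assumes the switch condition is "maximum load of the alternative ≤ α × maximum load of the current route", because the comparison operator in the first branch did not survive in the transcript. The name `next_route` and its parameters are hypothetical.

```python
def next_route(current: str, max_load_xy: float, max_load_yx: float, alpha: float) -> str:
    """Toggle between XY and YX only if the alternative wins by the margin alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if current == "XY":
        return "YX" if max_load_yx <= alpha * max_load_xy else "XY"
    if current == "YX":
        return "XY" if max_load_xy <= alpha * max_load_yx else "YX"
    raise ValueError("current route must be 'XY' or 'YX'")


# YX is slightly better (0.7 vs 0.8) but not by the margin, so XY is kept.
kept = next_route("XY", max_load_xy=0.8, max_load_yx=0.7, alpha=0.75)
# YX is better by more than the margin (0.5 <= 0.75 * 0.8), so the flow toggles.
toggled = next_route("XY", max_load_xy=0.8, max_load_yx=0.5, alpha=0.75)
```

A smaller α demands a larger improvement before a flow toggles, which damps fluctuations at the cost of leaving some imbalance in place.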

Page 12: A Cost Effective Centralized Adaptive Routing for Networks on Chip

ATDOR illustration 2

Step #   Re-Routed Flows          Average Delay
1        1->14, 1->15, 1->16      (not shown)
2        1->14, 1->15, 1->16      (not shown)
3        1->14, 1->15, 1->16      (not shown)

Page 13: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Problem #2

Coupling among flows sharing the same source.

Solution: Re-routing counters.

$C_{I,J}$ counts the routing changes of the flow from source I to destination J ($F_{I,J}$). When $C_{I,J}$ reaches a limit $L_{I,J}$, the routing of $F_{I,J}$ is locked. A possible definition of the limits $L_{I,J}$:

$$L_{I,J} = (I + J) \bmod 3$$
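A minimal sketch of the re-routing counters follows. It assumes the limit reads L(I, J) = (I + J) mod 3, which agrees with the "changes left" values listed for flows 1->14, 1->15 and 1->16 on the next slide. The class and method names are hypothetical.

```python
from typing import Dict, Tuple


def change_limit(i: int, j: int) -> int:
    """Limit on routing changes for flow i -> j, assumed to be (i + j) mod 3."""
    return (i + j) % 3


class RerouteCounters:
    """Counts routing changes per flow and locks a flow once its limit is reached."""

    def __init__(self) -> None:
        self.count: Dict[Tuple[int, int], int] = {}

    def changes_left(self, i: int, j: int) -> int:
        return change_limit(i, j) - self.count.get((i, j), 0)

    def is_locked(self, i: int, j: int) -> bool:
        return self.changes_left(i, j) <= 0

    def record_change(self, i: int, j: int) -> bool:
        """Register one routing change; returns False if the flow is locked."""
        if self.is_locked(i, j):
            return False
        self.count[(i, j)] = self.count.get((i, j), 0) + 1
        return True


counters = RerouteCounters()
left_before = [counters.changes_left(1, d) for d in (16, 15, 14)]   # [2, 1, 0]
for d in (16, 15, 14):
    counters.record_change(1, d)     # flow 1->14 is locked from the start
left_after = [counters.changes_left(1, d) for d in (16, 15, 14)]    # [1, 0, 0]
```

Because the three flows from source 1 receive different limits, they stop toggling at different times, which breaks the coupling between them.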

Page 14: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Back to illustration 2

R. Changes Left per flow, with the Average Delay shown for each state:

Flow     State 1       State 2   State 3
1->16    2             1         0
1->15    1             0         0
1->14    0             0         0
Delay    (not shown)   73 ns     22 ns

$$L_{I,J} = (I + J) \bmod 3$$

Page 15: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Bring it all together

R. Changes Left per flow in each successive state:

Flow     State 1   State 2   State 3   State 4   State 5
1->15    1         0         0         0         0
2->8     1         1         0         0         0
2->15    2         2         1         1         0
4->15    1         1         1         0         0

Average Delay values shown on the slide, in order: 22 ns, 22 ns, 14 ns.

$$L_{I,J} = (I + J) \bmod 3$$

Page 16: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Centralized Adaptive Routing for NoCs - Architecture

Traffic load measurements aggregation into Traffic Load Maps.

Routing control.

Local traffic load measurements inside the routers.

Page 17: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Load Measurements Aggregation

An illustration of aggregation of load values in a 4X4 2D mesh.

A congestion value is written to each traffic load map every clock cycle.

Page 18: A Cost Effective Centralized Adaptive Routing for Networks on Chip

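ATDOR – Route Selection Circuit

Combinatorial pipelined implementation.

Result every ATDOR clock cycle.

Maximally loaded links of the two alternatives are compared. Next route, with 0 < α < 1:

If Current Route = XY:

$$\text{NextRoute} = \begin{cases} \text{YX} & \text{if } \mathrm{MAX}[\text{Load}_{YX}] \le \alpha \cdot \mathrm{MAX}[\text{Load}_{XY}] \\ \text{XY} & \text{if } \mathrm{MAX}[\text{Load}_{YX}] > \alpha \cdot \mathrm{MAX}[\text{Load}_{XY}] \end{cases}$$

Else if Current Route = YX:

$$\text{NextRoute} = \begin{cases} \text{XY} & \text{if } \mathrm{MAX}[\text{Load}_{XY}] \le \alpha \cdot \mathrm{MAX}[\text{Load}_{YX}] \\ \text{YX} & \text{if } \mathrm{MAX}[\text{Load}_{XY}] > \alpha \cdot \mathrm{MAX}[\text{Load}_{YX}] \end{cases}$$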

Page 19: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Hardware Requirements

The whole mechanism was implemented on an xc5vlx50t Virtex-5 FPGA.

Estimated area for a 45 nm technology node.

Per-router hardware overhead in % for a NoC with typical-size (50 KGates) virtual channel routers.

Page 20: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Average Packet Delay – Uniform Traffic

Average delay vs. average load in links normalized to links capacity. 8X8 2D Mesh. Uniform traffic pattern.

Page 21: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Average Packet Delay – Transpose Traffic

Average delay vs. average load in links normalized to links capacity. 8X8 2D Mesh. Transpose traffic pattern.

Page 22: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Average Packet Delay – Hotspot Traffic

Average delay vs. average load in links normalized to links capacity. 8X8 2D Mesh. 4 Hotspots traffic pattern.

Page 23: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Control Iteration Duration

Number of re-routed flows vs. time. 8X8 2D Mesh, ATDOR clock of 100 MHz.

Plot panels: α = 15/16 and α = 3/4

Page 24: A Cost Effective Centralized Adaptive Routing for Networks on Chip

CMP DNUCA - Architecture

8X8 CMP DNUCA (Dynamic Non Uniform Cache Array) with 8 CPUs and 56 cache banks:

Page 25: A Cost Effective Centralized Adaptive Routing for Networks on Chip

CMP DNUCA – Saturation Throughput

Saturation throughput - Splash 2 and Parsec benchmarks on 8X8 CMP DNUCA with 8 CPUs and 56 cache banks:

Page 26: A Cost Effective Centralized Adaptive Routing for Networks on Chip

Conclusions

Centralized adaptive routing is feasible for NoCs.

ATDOR: Centralized selection between XY and YX for each source-destination pair.

Hardware overhead: <4% of an 8X8 typical NoC.

Average saturation throughput improvement:

                                   Vs. RCA   Vs. O1TURN
Synthetic Patterns                 12.1%     19.3%
Splash 2 and Parsec Benchmarks     12.8%     22.8%