1 Asynchronous vs. Synchronous Network-on-Chip Prepared by Sergey Rudko Advanced Topics in VLSI 1...

  • date post

    19-Dec-2015

Page 1: Asynchronous vs. Synchronous Network-on-Chip

Prepared by Sergey Rudko

Advanced Topics in VLSI 1 (NoC) 049036

Page 2: Introduction

• Problem Definition
  – NoC Implementation Alternatives
    • Fully asynchronous
    • Multi-synchronous (GALS)
    • Synchronous
• Proposed Solution
  – Systematic Comparison between Different Strategies
    • Silicon Area
    • Network Saturation Threshold
    • Communication Throughput
    • Packet Latency
    • Power Consumption
    • Implementation Flexibility and Tools
• Related Approaches
  – I. Miro-Panades, F. Clermidy, P. Vivet, A. Greiner, “Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture”, NoCs 2008

Page 3: Synchronous Router

• Router pipeline may include many stages
  – Increases communication latency
• Router pipeline may be optimized to a single-cycle router
  – Possible by use of speculation
  – Clock period same as for the pipelined router
• Presence of a clock simplifies design
  – Standard libraries and tools

[Figure: router between two links, showing the VCA and SA stages, the data path, and the speculative control signals]

A. Kumar, P. Kundu, A. Singh, L. Peh and N. Jha, "A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator", International Conference on Computer Design (ICCD), October 2007.
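The latency argument on this slide can be sketched as a small calculation. The stage count and clock period below are illustrative assumptions, not figures from the slides; the only property taken from the slide is that the single-cycle router keeps the same clock period as the pipelined one.

```python
def router_latency_ns(pipeline_stages: int, clock_period_ns: float) -> float:
    """Per-hop latency of a synchronous router: one clock period per pipeline stage."""
    if pipeline_stages < 1:
        raise ValueError("a router needs at least one pipeline stage")
    return pipeline_stages * clock_period_ns


# Hypothetical values, chosen only to illustrate the comparison.
CLOCK_PERIOD_NS = 1.0   # assumed 1 GHz router clock
PIPELINE_STAGES = 3     # assumed multi-stage router pipeline

pipelined = router_latency_ns(PIPELINE_STAGES, CLOCK_PERIOD_NS)
single_cycle = router_latency_ns(1, CLOCK_PERIOD_NS)

print(f"pipelined router:    {pipelined:.1f} ns per hop")
print(f"single-cycle router: {single_cycle:.1f} ns per hop")
```

With an unchanged clock period, the per-hop latency scales directly with the number of stages, which is why collapsing the pipeline through speculation pays off.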

Page 4: Limitations of Fully-Synchronous Networks

• Difficult to distribute clock
  – Network spread over die & may have irregular layout
  – Minimising skew costs complexity and power
  – Solution: Alternatives/extensions to PLL and H-tree
• Single Network Clock Frequency
  – Communicating synchronous IP blocks with different frequencies
  – What is most appropriate network clock frequency?

Problem: Clock Distribution and Frequency Selection
Solution: Beyond a Single Global Clock

Page 5: Synchronous Routers with Asynchronous Links (GALS)

• Synchronization is simple
  – Traditional 2 FF synchronizers
• Can support asynchronous interconnects
  – No longer exploiting periodic nature of router clocks
  – Correct operation is independent of the delay of the link
• GALS interfaces with pausible clocks
  – If necessary the clock is stretched, so data is always transferred reliably
  – Need to construct local delay line

[Figure: two routers connected through an asynchronous FIFO]

Connect Frequency Independent Routers
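The slides do not say how the asynchronous FIFO is built. One common technique, assumed here purely for illustration, is to pass the read and write pointers across the clock boundary in Gray code, so that only one bit changes per increment and a 2 FF synchronizer can never sample a pointer that is wrong by more than one position. A minimal sketch of the encoding (the FIFO depth is a hypothetical value):

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary pointer value to its Gray-code representation."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a Gray-coded value back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


FIFO_DEPTH = 8  # hypothetical depth; must be a power of two for the wrap-around property

codes = [bin_to_gray(i) for i in range(FIFO_DEPTH)]
for i, code in enumerate(codes):
    nxt = codes[(i + 1) % FIFO_DEPTH]  # includes the wrap from the last entry to the first
    print(f"{i}: {code:03b} -> {nxt:03b} ({bits_changed(code, nxt)} bit changes)")
```

Because every increment, including the wrap-around, flips exactly one bit, a pointer sampled mid-transition resolves to either the old or the new value, both of which are safe for full/empty detection.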

Page 6: Asynchronous NoCs

• Simple/elegant solution when networked IP blocks run at different clock frequencies
  – Data driven, no superfluous switching activity
  – No synchronization/clock alignment issues at interfaces
  – Solves synchronization, clock domain crossings, timing, long connects
• No clock distribution issues
• Security and EMI advantages
  – Clock focuses EM emissions
  – The presence of a clock can also aid fault-induction and side-channel analysis attacks
• Reduced design time
  – Easy to use interfaces, modularity
  – Robust and simple implementation
• Reduced power
• But network latency significantly increased

Page 7: Asynchronous NoC Approaches

• “An Asynchronous Router for Multiple Service Levels Networks on Chip”, R. Dobkin et al., ASYNC’05. (QNoC Group)
• MANGO Clockless Network-on-Chip
  – “A Scheduling Discipline for Latency and Bandwidth Guarantees in Asynchronous Network-on-Chip”, T. Bjerregaard and J. Sparsø, ASYNC’05.
  – “A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip”, T. Bjerregaard and J. Sparsø, DATE’05.

R. Dobkin provides a synchronous versus asynchronous router study

Page 8: Synchronous or Asynchronous NoCs?

“Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture”
I. Miro-Panades, F. Clermidy, P. Vivet and A. Greiner
NoCs 2008

Page 9: Motivation

• Physically implement the DSPIN NoC into the FAUST application platform
• Compare the performance of ANOC and DSPIN on a real application and traffic
  – Silicon Area
  – Throughput
  – Packet Latency
  – Power Consumption

Page 10: FAUST Architecture with ANOC

• Asynchronous NoC (ANOC)
  – QDI 4-phase/4-rail asynchronous logic
• 20 Routers
  – 5 port router
  – Source routing
  – Wormhole packet switch
  – 32 bit payload
• GALS Conception
  – 24 independent clocks
  – FIFO based Interface

Hard-macro approach for ANOC reuse
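The "4-rail" encoding named on this slide can be illustrated with a small model. The sketch below reads 4-rail as 1-of-4 encoding: each pair of data bits is carried on four wires with exactly one wire high, and the all-zero spacer is the return-to-zero state of the 4-phase protocol. The rail ordering, the function names and the example payload are hypothetical; the slides give only the encoding name and the 32 bit payload width.

```python
SPACER = (0, 0, 0, 0)  # all rails low: the return-to-zero phase between two data tokens


def encode_1_of_4(word, width_bits=32):
    """Encode a word as a list of 1-of-4 groups, least significant 2-bit digit first."""
    if width_bits % 2:
        raise ValueError("width must be a multiple of 2 bits")
    groups = []
    for shift in range(0, width_bits, 2):
        digit = (word >> shift) & 0b11
        # Exactly one of the four rails is high; its index is the 2-bit digit value.
        groups.append(tuple(1 if rail == digit else 0 for rail in range(4)))
    return groups


def decode_1_of_4(groups):
    """Decode 1-of-4 groups back to an integer, rejecting invalid codewords."""
    word = 0
    for i, rails in enumerate(groups):
        if sum(rails) != 1:
            raise ValueError(f"group {i} is not a valid 1-of-4 codeword: {rails}")
        word |= rails.index(1) << (2 * i)
    return word


payload = 0xDEADBEEF  # arbitrary example value
groups = encode_1_of_4(payload)
print(len(groups), "groups,", 4 * len(groups), "data wires")
print(hex(decode_1_of_4(groups)))
```

Under this reading a 32 bit payload needs 16 groups, that is 64 data wires. A receiver can detect data validity from the wires themselves (one rail high in every group), which is what lets the logic work without a clock.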

Page 11: DSPIN Architecture

• Packet Based
• Distributed Router Architecture
• Suited for GALS Approach
• Mesochronous links between routers
• Metastability resolved by “bi-synchronous” FIFO

Synthesizable with Standard Cells

Page 12: DSPIN Clock Tree

Mesochronous Link between Neighbor Routers

Page 13: NoC Architecture Comparison

Both implementations use GALS principles

Page 14: Network Comparison

DSPIN clock-tree consumes as much power as the router itself

Parameter                            ANOC           DSPIN
Implementation                       Hard-Macro     Soft-Macro
Area                                 0.281 mm²      0.187 mm²
Throughput (worst case conditions)   ~160 Mflit/s   ≤ 289 Mflit/s
Throughput (nominal conditions)      ~220 Mflit/s   ≤ 408 Mflit/s
Power Consumption (F=150MHz)         3.69 mW        5.89 mW
Power Consumption (F=250MHz)         3.69 mW        10.39 mW

• DSPIN throughput is deterministic with respect to the clock frequency
• DSPIN Power Issues
  – Power consumption mainly dominated by FIFO data registers
  – The DSPIN clock-gating reduced the power consumption by 67%
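To make the area and power trade-off easier to read, the ratios can be computed directly from the table values as transcribed on this slide. The dictionary keys and variable names are illustrative; no data beyond the table is used.

```python
# Values as listed in the Network Comparison table.
anoc = {"area_mm2": 0.281, "power_mw_150": 3.69, "power_mw_250": 3.69}
dspin = {"area_mm2": 0.187, "power_mw_150": 5.89, "power_mw_250": 10.39}

# Fraction of silicon area saved by DSPIN relative to ANOC.
area_saving = 1 - dspin["area_mm2"] / anoc["area_mm2"]

# DSPIN power expressed as a multiple of ANOC power at each clock frequency.
power_ratio_150 = dspin["power_mw_150"] / anoc["power_mw_150"]
power_ratio_250 = dspin["power_mw_250"] / anoc["power_mw_250"]

print(f"DSPIN area saving vs. ANOC:  {area_saving:.1%}")
print(f"DSPIN/ANOC power at 150 MHz: {power_ratio_150:.2f}x")
print(f"DSPIN/ANOC power at 250 MHz: {power_ratio_250:.2f}x")
```

On these numbers DSPIN is roughly a third smaller, while its power grows with the clock frequency and ANOC's listed power does not change between the two rows.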

Page 15: Network Comparison - Latency

DSPIN Router is IP Data Locality Aware

• DSPIN routers resynchronize the data packets
• DSPIN should be clocked at 367 MHz

                              F=150MHz                F=250MHz
Flit Path                     ANOC        DSPIN       ANOC       DSPIN
Intermediate Router Latency   6.80 ns     16.66 ns    6.80 ns    10.00 ns
First + Last Router Latency   60.00 ns    56.66 ns    47.00 ns   34.00 ns
Latency for 5 hops path       80.00 ns    106.66 ns   68.00 ns   64.00 ns
Latency for 9 hops path       106.66 ns   173.30 ns   96.00 ns   104.00 ns
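The path rows of the latency table appear to follow a simple model: the first and last router contribute a fixed cost and every other router on the path adds one intermediate-router latency. This model is inferred from the numbers, not stated on the slide; it reproduces the DSPIN rows to within rounding and the ANOC rows only approximately. The same arithmetic suggests where the 367 MHz figure comes from: the DSPIN intermediate latency corresponds to 2.5 clock cycles, and 2.5 cycles take 6.80 ns at about 367 MHz.

```python
def path_latency_ns(first_last_ns, intermediate_ns, hops):
    """Inferred model: fixed first+last router cost plus (hops - 2) intermediate routers."""
    if hops < 2:
        raise ValueError("a path needs at least a first and a last router")
    return first_last_ns + (hops - 2) * intermediate_ns


# DSPIN at 250 MHz, using the table values.
dspin_250_5_hops = path_latency_ns(34.00, 10.00, 5)
dspin_250_9_hops = path_latency_ns(34.00, 10.00, 9)
print(dspin_250_5_hops, dspin_250_9_hops)

# DSPIN intermediate-router latency in clock cycles (10.00 ns at 250 MHz).
dspin_cycles = 10.00 * 250 / 1000

# Clock frequency at which those cycles take as long as ANOC's 6.80 ns.
break_even_mhz = dspin_cycles / 6.80 * 1000
print(f"{dspin_cycles} cycles, break-even at {break_even_mhz:.1f} MHz")
```

Read this way, the slide's note means that DSPIN would need a clock of roughly 367 MHz for its intermediate routers to match ANOC's per-router latency.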

Page 16: Conclusion

• Little published work on asynchronous routers and networks
• Comparing synchronous and asynchronous designs is difficult
  – System timing style
  – Technology
  – Circuit style and architecture
• Difficult to reproduce and simulate asynchronous designs from published work
  – No notion of cycle-accurate model
  – Hide detailed control and datapath delays
• Asynchronous Performance Guarantees
  – Performance guarantees are required
  – Less predictable, non-deterministic
  – Predicting performance is more complex
• Asynchronous EDA Tool Requirements
• Synchronous Routers
  – Predictability and determinism can be exploited
  – Fast single cycle routers possible

ANoC for Low Power & SNoC for Small Area
