1
Asynchronous vs. Synchronous Network-on-Chip
Prepared by Sergey Rudko
Advanced Topics in VLSI 1 (NoC) 049036
2
Introduction
• Problem Definition
  – NoC Implementation Alternatives
    • Fully asynchronous
    • Multi-synchronous (GALS)
    • Synchronous
• Proposed Solution
  – Systematic Comparison between Different Strategies
    • Silicon Area
    • Network Saturation Threshold
    • Communication Throughput
    • Packet Latency
    • Power Consumption
    • Implementation Flexibility and Tools
• Related Approaches
  – I. Miro-Panades, F. Clermidy, P. Vivet, A. Greiner, “Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture”, NoCs 2008
3
Synchronous Router
• Router pipeline may include many stages
  – Increases communication latency
• Router pipeline may be optimized to a single-cycle router
  – Possible by use of speculation
  – Clock period same as the pipelined router
• Presence of a clock simplifies design
  – Standard libraries and tools
[Figure: single-cycle router with VCA and SA stages and a data path between input and output links, driven by speculative control signals]
A. Kumar, P. Kundu, A. Singh, L. Peh and N. Jha, "A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator", International Conference on Computer Design (ICCD), October 2007.
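The latency argument on this slide can be sketched numerically. The example below is illustrative only: the four-stage pipeline depth is an assumption, the 3.6 GHz clock is borrowed from the title of the cited paper, and `hop_latency_ns` is a hypothetical helper that charges one clock period per pipeline stage.

```python
def hop_latency_ns(pipeline_stages: int, clock_mhz: float) -> float:
    """Per-hop router latency, assuming one clock period per pipeline stage."""
    if pipeline_stages < 1 or clock_mhz <= 0:
        raise ValueError("need at least one stage and a positive clock frequency")
    period_ns = 1e3 / clock_mhz
    return pipeline_stages * period_ns


CLOCK_MHZ = 3600.0   # 3.6 GHz, taken from the title of the cited paper
ASSUMED_STAGES = 4   # assumption: pipeline depth is not given on the slide

pipelined_ns = hop_latency_ns(ASSUMED_STAGES, CLOCK_MHZ)
single_cycle_ns = hop_latency_ns(1, CLOCK_MHZ)

print(f"pipelined router:    {pipelined_ns:.3f} ns per hop")
print(f"single-cycle router: {single_cycle_ns:.3f} ns per hop")
```

Because the slide states that the clock period is unchanged, the per-hop saving in this model is simply the number of pipeline stages removed.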
4
Limitations of Fully-Synchronous Networks
• Difficult to distribute clock
  – Network spread over die & may have irregular layout
  – Minimising skew costs complexity and power
  – Solution: Alternatives/extensions to PLL and H-tree
• Single Network Clock Frequency
  – Communicating synchronous IP blocks with different frequencies
  – What is the most appropriate network clock frequency?
Problem: Clock Distribution and Frequency Selection
Solution: Beyond a Single Global Clock
5
Synchronous Routers with Asynchronous Links (GALS)
• Synchronization is simple
  – Traditional 2 FF synchronizers
• Can support asynchronous interconnects
  – No longer exploiting periodic nature of router clocks
  – Correct operation is independent of the delay of the link
• GALS interfaces with pausible clocks
  – If necessary the clock is stretched, so data is always transferred reliably
  – Need to construct local delay line
[Figure: two routers connected through an asynchronous FIFO]
Connect Frequency Independent Routers
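The reliability of a two-flip-flop synchronizer is commonly estimated with the standard metastability model, MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). The sketch below evaluates that model. Every numeric parameter (tau, the metastability window, the data rate, and the 250 MHz clock) is a hypothetical placeholder chosen for illustration, not a value from the slides, and `synchronizer_mtbf_s` is a made-up helper name.

```python
import math


def synchronizer_mtbf_s(resolve_time_s: float, tau_s: float, window_s: float,
                        f_clk_hz: float, f_data_hz: float) -> float:
    """Mean time between synchronization failures, in seconds.

    resolve_time_s: time the first flip-flop has to settle before it is sampled
    tau_s:          metastability resolution time constant of the flip-flop
    window_s:       metastability (vulnerability) window of the flip-flop
    """
    return math.exp(resolve_time_s / tau_s) / (window_s * f_clk_hz * f_data_hz)


# Hypothetical technology and traffic parameters (assumptions, not slide data).
ASSUMED_TAU_S = 50e-12
ASSUMED_WINDOW_S = 100e-12
ASSUMED_CLK_HZ = 250e6
ASSUMED_DATA_HZ = 50e6

# In a 2-FF synchronizer the second flop samples one clock period later,
# so the first flop has roughly one full period to resolve.
one_period_s = 1.0 / ASSUMED_CLK_HZ
mtbf_two_ff = synchronizer_mtbf_s(one_period_s, ASSUMED_TAU_S, ASSUMED_WINDOW_S,
                                  ASSUMED_CLK_HZ, ASSUMED_DATA_HZ)
print(f"estimated MTBF with one period of resolution time: {mtbf_two_ff:.3e} s")
```

The exponential dependence on resolution time is why one extra flip-flop stage is usually enough, and also why each synchronizer adds at least a clock cycle of latency per crossing.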
6
Asynchronous NoCs
• Simple/elegant solution when networked IP blocks run at different clock frequencies
  – Data driven, no superfluous switching activity
  – No synchronization/clock alignment issues at interfaces
  – Solves synchronization, clock domain crossings, timing, long connects
• No clock distribution issues
• Security and EMI advantages
  – Clock focuses EM emissions
  – The presence of a clock can also aid fault-induction and side-channel analysis attacks
• Reduced design time
  – Easy to use interfaces, modularity
  – Robust and simple implementation
• Reduced power
• But network latency significantly increased
7
Asynchronous NoC Approaches
• “An Asynchronous Router for Multiple Service Levels Networks on Chip”, R. Dobkin et al., ASYNC’05 (QNoC Group)
• MANGO Clockless Network-on-Chip
  – “A Scheduling Discipline for Latency and Bandwidth Guarantees in Asynchronous Network-on-Chip”, T. Bjerregaard and J. Sparsø, ASYNC’05
  – “A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip”, T. Bjerregaard and J. Sparsø, DATE’05
R. Dobkin et al. provide a synchronous versus asynchronous router study
8
Synchronous or Asynchronous NoCs?
“Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture”
I. Miro-Panades, F. Clermidy, P. Vivet and A. Greiner
NoCs 2008
9
Motivation
• Physically implement the DSPIN NoC in the FAUST application platform
• Compare the performance of ANOC and DSPIN on a real application and traffic
  – Silicon Area
  – Throughput
  – Packet Latency
  – Power Consumption
10
FAUST Architecture with ANOC
• Asynchronous NoC (ANOC)
  – QDI 4-phase/4-rail asynchronous logic
• 20 Routers
  – 5-port router
  – Source routing
  – Wormhole packet switching
  – 32-bit payload
• GALS Design
  – 24 independent clocks
  – FIFO-based interface
Hard-macro approach for ANOC reuse
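The "4-rail" data encoding listed above can be illustrated in a few lines. This is a minimal sketch, assuming that 4-rail means a 1-of-4 (one-hot) code in which each group of four wires carries two bits, and that an all-zero spacer separates successive values as in a 4-phase return-to-zero handshake. The function names and the least-significant-group-first ordering are arbitrary choices made for the example, not details from the slides.

```python
from typing import List, Sequence, Tuple

Rail4 = Tuple[int, ...]          # four wires, exactly one high when data is valid
SPACER: Rail4 = (0, 0, 0, 0)     # return-to-zero state between two data values


def encode_1of4(two_bits: int) -> Rail4:
    """Encode a 2-bit value as a one-hot pattern on four wires."""
    if not 0 <= two_bits <= 3:
        raise ValueError("a 1-of-4 group carries exactly two bits")
    return tuple(1 if wire == two_bits else 0 for wire in range(4))


def encode_word(word: int, width_bits: int = 32) -> List[Rail4]:
    """Split a word into 2-bit digits (least significant first) and encode each."""
    return [encode_1of4((word >> shift) & 0b11) for shift in range(0, width_bits, 2)]


def is_complete(groups: Sequence[Rail4]) -> bool:
    """Completion detection: data is valid once every group has one wire high."""
    return all(sum(rails) == 1 for rails in groups)


def decode_word(groups: Sequence[Rail4]) -> int:
    """Recover the integer value from a complete set of 1-of-4 groups."""
    if not is_complete(groups):
        raise ValueError("incomplete or invalid 1-of-4 code word")
    word = 0
    for index, rails in enumerate(groups):
        word |= rails.index(1) << (2 * index)
    return word


flit = 0xDEADBEEF
wires = encode_word(flit)
print(len(wires), "groups,", 4 * len(wires), "data wires")
print(hex(decode_word(wires)))
```

Under this assumption a 32-bit payload occupies 16 groups (64 data wires), and the receiver needs no clock because validity is detected from the data wires themselves.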
11
DSPIN Architecture
• Packet Based
• Distributed Router Architecture
• Suited for GALS Approach
• Mesochronous links between routers
• Metastability resolved by “bi-synchronous” FIFO
Synthesizable with Standard Cells
12
DSPIN Clock Tree
Mesochronous Link between Neighbor Routers
13
NoC Architecture Comparison
Both implementations use GALS principles
14
Network Comparison
DSPIN clock tree consumes as much power as the router itself

Parameter                            ANOC            DSPIN
Implementation                       Hard-Macro      Soft-Macro
Area                                 0.281 mm²       0.187 mm²
Throughput (worst-case conditions)   ~160 Mflit/s    ≤ 289 Mflit/s
Throughput (nominal conditions)      ~220 Mflit/s    ≤ 408 Mflit/s
Power Consumption (F = 150 MHz)      3.69 mW         5.89 mW
Power Consumption (F = 250 MHz)      3.69 mW         10.39 mW

• DSPIN throughput is deterministic with respect to the clock frequency
• DSPIN Power Issues
  – Power consumption mainly dominated by FIFO data registers
  – The DSPIN clock-gating reduced the power consumption by 67%
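The figures in the comparison table are easier to weigh as DSPIN-to-ANOC ratios. The sketch below only restates the table values and divides them; the dictionary layout and the `dspin_over_anoc` helper are hypothetical names introduced for this example.

```python
# Values copied from the comparison table (ANOC vs. DSPIN).
ANOC = {
    "area_mm2": 0.281,
    "throughput_worst_mflits": 160.0,
    "throughput_nominal_mflits": 220.0,
    "power_mw_150mhz": 3.69,
    "power_mw_250mhz": 3.69,
}
DSPIN = {
    "area_mm2": 0.187,
    "throughput_worst_mflits": 289.0,
    "throughput_nominal_mflits": 408.0,
    "power_mw_150mhz": 5.89,
    "power_mw_250mhz": 10.39,
}


def dspin_over_anoc(metric: str) -> float:
    """Ratio of the DSPIN value to the ANOC value for one table row."""
    return DSPIN[metric] / ANOC[metric]


for metric in ANOC:
    print(f"{metric:28s} DSPIN/ANOC = {dspin_over_anoc(metric):.2f}")
```

Read this way, DSPIN occupies about two thirds of the ANOC area and offers roughly 1.8 times the throughput bound, while drawing about 1.6 times the power at 150 MHz and about 2.8 times at 250 MHz.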
15
Network Comparison - Latency
DSPIN Router is IP Data Locality Aware
• DSPIN routers resynchronize the data packets
• DSPIN should be clocked to 367 MHz

Flit Path                     F = 150 MHz               F = 250 MHz
                              ANOC        DSPIN         ANOC        DSPIN
Intermediate Router Latency   6.80 ns     16.66 ns      6.80 ns     10.00 ns
First + Last Router Latency   60.00 ns    56.66 ns      47.00 ns    34.00 ns
Latency for 5 hops path       80.00 ns    106.66 ns     68.00 ns    64.00 ns
Latency for 9 hops path       106.66 ns   173.30 ns     96.00 ns    104.00 ns
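The path latencies in the table appear to follow a simple additive model: the first-plus-last router latency plus one intermediate-router latency for each of the remaining hops. The sketch below makes that assumption explicit (a path of N hops is taken to contain N - 2 intermediate routers). It reproduces the DSPIN rows to within rounding and the ANOC rows only approximately. It also shows one plausible reading of the 367 MHz figure, namely the clock at which a DSPIN router, at about 2.5 cycles per hop, would match the 6.80 ns ANOC intermediate latency. That reading is an inference from the table, not a statement from the slides, and all function names are hypothetical.

```python
# Latencies in nanoseconds, copied from the table, keyed by (network, clock in MHz).
LATENCY_NS = {
    ("ANOC", 150): {"intermediate": 6.80, "first_last": 60.00},
    ("DSPIN", 150): {"intermediate": 16.66, "first_last": 56.66},
    ("ANOC", 250): {"intermediate": 6.80, "first_last": 47.00},
    ("DSPIN", 250): {"intermediate": 10.00, "first_last": 34.00},
}


def path_latency_ns(network: str, clock_mhz: int, hops: int) -> float:
    """Assumed model: first+last latency plus (hops - 2) intermediate routers."""
    if hops < 2:
        raise ValueError("the model needs at least a first and a last router")
    entry = LATENCY_NS[(network, clock_mhz)]
    return entry["first_last"] + (hops - 2) * entry["intermediate"]


def dspin_cycles_per_router(clock_mhz: int) -> float:
    """DSPIN intermediate-router latency expressed in clock cycles."""
    return LATENCY_NS[("DSPIN", clock_mhz)]["intermediate"] * clock_mhz / 1e3


def break_even_clock_mhz() -> float:
    """Clock at which a DSPIN hop would take as long as an ANOC hop."""
    cycles = dspin_cycles_per_router(250)
    anoc_hop_ns = LATENCY_NS[("ANOC", 250)]["intermediate"]
    return cycles / anoc_hop_ns * 1e3


for hops in (5, 9):
    for key in LATENCY_NS:
        print(key, hops, "hops:", round(path_latency_ns(*key, hops), 2), "ns")
print("DSPIN cycles per router:", round(dspin_cycles_per_router(250), 2))
print("break-even clock:", round(break_even_clock_mhz(), 1), "MHz")
```

The crossover between the 5-hop and 9-hop rows at 250 MHz follows from the same model: DSPIN starts with a lower first-plus-last latency but pays more per intermediate router, so ANOC wins on longer paths.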
16
Conclusion
• Little published work on asynchronous routers and networks
• Comparing synchronous and asynchronous designs is difficult
  – System timing style
  – Technology
  – Circuit style and architecture
• Difficult to reproduce and simulate asynchronous designs from published work
  – No notion of cycle-accurate model
  – Hide detailed control and datapath delays
• Asynchronous Performance Guarantees
  – Performance guarantees are required
  – Less predictable, non-deterministic
  – Predicting performance is more complex
• Asynchronous EDA Tool Requirements
• Synchronous Routers
  – Predictability and determinism can be exploited
  – Fast single-cycle routers possible
ANoC for Low Power & SNoC for Small Area
17