Bounded Dataflow Networks and Latency Insensitive Circuits

Bounded Dataflow Networks and Latency Insensitive Circuits. Arvind, Computer Science and Artificial Intelligence Laboratory, MIT. Based on the work of Murali Vijayaraghavan and Arvind [MEMOCODE 2009].

  • Bounded Dataflow Networks and Latency Insensitive Circuits

    Arvind
    Computer Science and Artificial Intelligence Laboratory, MIT

    Based on the work of Murali Vijayaraghavan and Arvind [MEMOCODE 2009]

    November 17, 2009, L22, http://csg.csail.mit.edu/korea

  • Modeling of a processor on an FPGA

    [Figure: processor pipeline with blocks Fetch, Decode, Execute/AddrCalc, Mem, RegWrite, Commit, Exception, Branch Resolution, RegFile]

    Divide and multiply are resource hogs. Can we pipeline them or implement them as multicycle operations?
    A multiported register file maps poorly onto FPGAs. Can we map it as a multicycle operation into BRAMs?
    CAMs for TLBs map poorly onto FPGAs. Can we implement CAMs as sequential search using BRAMs?

    How can we make these refinements to Synchronous Sequential Machines (SSMs) without affecting overall correctness?
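The last question on this slide, replacing a CAM by a sequential search over BRAM, can be pictured with a small software model. This is only a sketch: the function name `cam_lookup_sequential`, the `(tag, payload)` entry layout, and the one-entry-per-cycle cost are assumptions made for illustration, not details from the lecture.

```python
def cam_lookup_sequential(entries, key):
    """Search a BRAM-like table one entry per modeled clock cycle.

    `entries` is a list of (tag, payload) pairs (hypothetical layout).
    Returns (payload or None, cycles_used).
    """
    cycles = 0
    for tag, payload in entries:
        cycles += 1                 # one BRAM read per cycle (assumed cost)
        if tag == key:
            return payload, cycles  # hit: latency depends on the position
    return None, cycles             # miss: the whole table was scanned


# Hypothetical TLB contents, used only to exercise the function.
tlb = [(0x10, "frame3"), (0x20, "frame7"), (0x30, "frame1")]
hit = cam_lookup_sequential(tlb, 0x20)
miss = cam_lookup_sequential(tlb, 0x99)
```

The lookup returns the same answer a single-cycle CAM would, but its latency now varies with the data, which is the kind of timing change the rest of the design has to tolerate.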

  • Conventional modular refining methodology

    [Figure: Complete Design = Rest of the design + Module to be refined; after refinement: Rest of the design, with ad-hoc modification + Refined module, with changed timing contract]

    Requires re-verification.
    Besides, in our processor example, after the ad-hoc changes, what are we modeling?

  • True modular refinement

    [Figure: Complete Design = Rest of the design + Module to be refined; after refinement: Rest of the design, wrapped automatically + Refined module, with changed timing contract]

    Ability to replace any module by an equivalent module without affecting the overall correctness.

  • Theory of Latency Insensitive Designs by Carloni et al. [ICCAD99, IEEE-TCAD01]

    A method to reduce critical wire delays by adding buffers.
    Each module is treated as a black box and wrapped to make it latency-insensitive to input/output wire latencies.
    Our goal is to also permit refinements that may change the timing of a module.

  • Carloni's method

    Make a cut to include the wires of interest (some restrictions on cuts).
    Create wrappers and insert buffers or FIFOs.

  • Bounded Dataflow Networks (BDNs)

    Primitive BDNs that directly implement SSMs.
    Bounded FIFOs, initially empty.
    Different from a Kahn network because a send can block.

  • Primitive BDN
    Example 1: Combinational Gate

    c(t) = f( a(t), b(t) )

    rule OutC when (¬a.empty ∧ ¬b.empty ∧ ¬c.full)
      ⇒ c.enq( f( a.first, b.first ) ); a.deq; b.deq

    The figure does not represent all the control logic necessary to implement a BDN.
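A minimal software sketch of the gate BDN may help make rule OutC concrete. The `Fifo` class, the `run_gate` driver, and the randomly timed environment are assumptions introduced here; only the guard and the actions of `gate_rule` follow the slide.

```python
from collections import deque
import random


class Fifo:
    """Bounded FIFO, initially empty (illustrative helper)."""
    def __init__(self, size=2):
        self.size, self.q = size, deque()
    def empty(self): return not self.q
    def full(self): return len(self.q) >= self.size
    def first(self): return self.q[0]
    def enq(self, x):
        assert not self.full()
        self.q.append(x)
    def deq(self): return self.q.popleft()


def gate_rule(a, b, c, f):
    """rule OutC: fire only if both inputs are present and c has room."""
    if not a.empty() and not b.empty() and not c.full():
        c.enq(f(a.first(), b.first()))
        a.deq()
        b.deq()


def run_gate(a_vals, b_vals, f, seed=0):
    """Drive the gate with an environment that acts at random times.

    Assumes a_vals and b_vals have the same length.
    """
    rng = random.Random(seed)
    a, b, c = Fifo(), Fifo(), Fifo()
    a_in, b_in, out = deque(a_vals), deque(b_vals), []
    while len(out) < len(a_vals):
        if a_in and not a.full() and rng.random() < 0.5:
            a.enq(a_in.popleft())
        if b_in and not b.full() and rng.random() < 0.5:
            b.enq(b_in.popleft())
        gate_rule(a, b, c, f)
        if not c.empty() and rng.random() < 0.5:
            out.append(c.deq())
    return out
```

Whatever seed is used, the n-th value dequeued from `c` is `f` applied to the n-th values enqueued into `a` and `b`; only the number of loop iterations changes.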

  • Circuit generated for the Gate BDN

    [Figure: generated circuit]

  • Primitive BDNs
    Example 2: Register

    Initial: r ← r0

    rule OutB when (¬a.empty ∧ ¬b.full)
      ⇒ b.enq(r); r ← a.first; a.deq

  • BDN Input/Output notation

    I_i(n) represents the nth value enqueued in input buffer I_i.
    I(n) represents the nth values enqueued in all input buffers.
    O_j(n) represents the nth value dequeued from output buffer O_j.
    O(n) represents the nth values dequeued from all output buffers.

  • Implementing an SSM as a BDN

    Time is converted into enqueues into input FIFOs and dequeues from output FIFOs.
    I(t), the input into an SSM, corresponds to the tth enqueues in the input FIFOs of the BDN.
    O(t), the output of an SSM, corresponds to the tth dequeues from the output FIFOs of the BDN.
    This separates timing from functionality in a BDN and makes BDNs an asynchronous framework.

  • BDN Implementing an SSM

    A BDN is said to implement an SSM iff:
    There is a bijective mapping between inputs (outputs) of the SSM and BDN.
    The output histories of the SSM and BDN match whenever the input histories match.
    The BDN is deadlock-free.
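The history-matching condition can be checked on a small case by running a reference SSM next to a BDN and comparing outputs. The sketch below does this for the register of Example 2 in an open environment; `ssm_register`, `bdn_register`, the `Fifo` helper, and the random environment timing are assumptions made for illustration.

```python
from collections import deque
import random


class Fifo:
    """Bounded FIFO, initially empty (illustrative helper)."""
    def __init__(self, size=1):
        self.size, self.q = size, deque()
    def empty(self): return not self.q
    def full(self): return len(self.q) >= self.size
    def first(self): return self.q[0]
    def enq(self, x):
        assert not self.full()
        self.q.append(x)
    def deq(self): return self.q.popleft()


def ssm_register(inputs, r0):
    """Reference SSM: b(t) = r(t), r(t+1) = a(t), r(0) = r0."""
    r, outs = r0, []
    for a_t in inputs:
        outs.append(r)
        r = a_t
    return outs


def bdn_register(inputs, r0, seed):
    """Register BDN using rule OutB, driven at random times."""
    rng = random.Random(seed)
    a, b = Fifo(), Fifo()
    pending, outs, r = deque(inputs), [], r0
    while len(outs) < len(inputs):
        if pending and not a.full() and rng.random() < 0.5:
            a.enq(pending.popleft())        # n-th enqueue stands for a(n)
        if not a.empty() and not b.full():  # rule OutB
            b.enq(r)
            r = a.first()
            a.deq()
        if not b.empty() and rng.random() < 0.5:
            outs.append(b.deq())            # n-th dequeue stands for b(n)
    return outs
```

A finite test like this only samples the property; it does not establish the deadlock-freedom condition, which depends on the surrounding network.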

  • Implementing a network of SSMs

    Is this transformation correct?

  • Implementing an SSM as a BDN
    Example 3: A combinational circuit

    rule OutCD when (¬a.empty ∧ ¬b.empty ∧ ¬c.full ∧ ¬d.full)
      ⇒ c.enq( f( a.first, b.first ) ); d.enq(b.first); a.deq; b.deq

  • Network with the combinational circuit node

    rule OutCD when (¬a.empty ∧ ¬b.empty ∧ ¬c.full ∧ ¬d.full)
      ⇒ c.enq( f( a.first, b.first ) ); d.enq(b.first); a.deq; b.deq

    Deadlock!
    Culprit: Extraneous dependency
    Doesn't form a combinational cycle

  • Another BDN implementation

    rule OutC when (¬cDone ∧ ¬a.empty ∧ ¬b.empty ∧ ¬c.full)
      ⇒ c.enq( f( a.first, b.first ) ); cDone ← True
    rule OutD when (¬dDone ∧ ¬b.empty ∧ ¬d.full)
      ⇒ d.enq( b.first ); dDone ← True
    rule Finish when (cDone ∧ dDone)
      ⇒ a.deq; b.deq; cDone ← False; dDone ← False

    No deadlock!
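The difference between the single-rule and split-rule versions can be reproduced in a small simulation. The network figure is not recoverable from this transcript, so the sketch assumes one plausible topology: output d is wired straight back to input a, while the environment supplies b and drains c. The `Fifo` helper and the `run_network` driver are likewise assumptions.

```python
from collections import deque


class Fifo:
    """Bounded FIFO, initially empty (illustrative helper)."""
    def __init__(self, size=2):
        self.size, self.q = size, deque()
    def empty(self): return not self.q
    def full(self): return len(self.q) >= self.size
    def first(self): return self.q[0]
    def enq(self, x):
        assert not self.full()
        self.q.append(x)
    def deq(self): return self.q.popleft()


def run_network(split, b_vals, f, max_steps=100):
    da = Fifo()              # assumed wiring: output d is input a
    b, c = Fifo(), Fifo()
    pending, outs = deque(b_vals), []
    c_done = d_done = False
    for _ in range(max_steps):
        if pending and not b.full():
            b.enq(pending.popleft())
        if not split:
            # rule OutCD: waits for a even though d only needs b
            if not da.empty() and not b.empty() and not c.full() and not da.full():
                c.enq(f(da.first(), b.first()))
                da.enq(b.first())
                da.deq()
                b.deq()
        else:
            if not c_done and not da.empty() and not b.empty() and not c.full():
                c.enq(f(da.first(), b.first()))   # rule OutC
                c_done = True
            if not d_done and not b.empty() and not da.full():
                da.enq(b.first())                 # rule OutD
                d_done = True
            if c_done and d_done:                 # rule Finish
                da.deq()
                b.deq()
                c_done = d_done = False
        if not c.empty():
            outs.append(c.deq())
    return outs
```

Under this wiring the single rule never fires, because a can only be filled by d and d is only produced by the same rule. The split version produces d as soon as b arrives, so c(t) = f(b(t), b(t)) comes out for every input.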

  • Example 2 Revisited: Register

    Initial: r ← r0

    rule OutB when (¬a.empty ∧ ¬b.full)
      ⇒ b.enq(r); r ← a.first; a.deq

    Network with a single register node

    Deadlock!
    Culprit: Extraneous dependency

  • BDN for a register avoiding deadlocks

    Initial: r ← r0; bDone ← False

    rule OutB when (¬bDone ∧ ¬b.full)
      ⇒ b.enq(r); bDone ← True
    rule Finish when (bDone ∧ ¬a.empty)
      ⇒ r ← a.first; a.deq; bDone ← False
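The two register BDNs can be compared in a closed loop. The sketch assumes the "network with a single register node" wires output b directly back to input a, since the figure itself is not in the transcript; `Fifo` and `run_register_loop` are illustrative names, and the function counts how many SSM clock cycles the BDN manages to simulate.

```python
from collections import deque


class Fifo:
    """Bounded FIFO, initially empty (illustrative helper)."""
    def __init__(self, size=2):
        self.size, self.q = size, deque()
    def empty(self): return not self.q
    def full(self): return len(self.q) >= self.size
    def first(self): return self.q[0]
    def enq(self, x):
        assert not self.full()
        self.q.append(x)
    def deq(self): return self.q.popleft()


def run_register_loop(fixed, r0=7, steps=10):
    """Return the number of simulated SSM cycles completed in `steps` steps."""
    q = Fifo()               # assumed wiring: output b is input a
    r, b_done, cycles = r0, False, 0
    for _ in range(steps):
        if not fixed:
            # Original rule OutB: the output waits for the input.
            if not q.empty() and not q.full():
                v = q.first()
                q.deq()
                q.enq(r)
                r = v
                cycles += 1
        else:
            if not b_done and not q.full():      # rule OutB
                q.enq(r)
                b_done = True
            elif b_done and not q.empty():       # rule Finish
                r = q.first()
                q.deq()
                b_done = False
                cycles += 1
    return cycles
```

With the original rule the loop never starts, because the FIFO is initially empty and the only producer is the blocked rule. The two-rule version emits the register value first and then consumes it, completing one cycle every two steps.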

  • No-Extraneous Dependency (NED) property

    [Figure: output out, the inputs combinationally connected to out, and its output FIFO outQ]

    Production of outQ waits only for these input FIFOs.
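One way to read the NED property is as a set comparison per output: the inputs a rule waits on before producing an output should not exceed the inputs that output depends on combinationally. The sketch below applies this reading to the combinational circuit with outputs c and d; the dictionary encoding and the name `ned_violations` are assumptions, and the check is a simplification of the formal property.

```python
def ned_violations(comb_deps, waits_on):
    """Map each output to the inputs it waits on unnecessarily.

    comb_deps: output -> set of inputs combinationally connected to it
    waits_on:  output -> set of input FIFOs its rule's guard tests
    """
    violations = {}
    for out, deps in comb_deps.items():
        extra = waits_on[out] - deps
        if extra:
            violations[out] = extra
    return violations


# c(t) = f(a(t), b(t)) and d(t) = b(t)
comb_deps = {"c": {"a", "b"}, "d": {"b"}}

# Single rule OutCD tests a and b before producing either output.
single_rule = {"c": {"a", "b"}, "d": {"a", "b"}}

# Split rules: OutC tests a and b, OutD tests only b.
split_rules = {"c": {"a", "b"}, "d": {"b"}}
```

Here `ned_violations(comb_deps, single_rule)` reports that d waits on a, the extraneous dependency, while the split rules yield no violations.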

  • Example 3: A shift register

    r1(t+1) = a(t); r2(t+1) = r1(t); b(t) = r2(t)
    initially r1(0) = r1_0; r2(0) = r2_0

    Two BDN implementations

  • A shift register: Two Implementations

    Implementation 1

    Initial: r1 ← r1_0; r2 ← r2_0; bDone ← False

    rule OutB when (¬bDone ∧ ¬b.full)
      ⇒ b.enq(r2); bDone ← True
    rule Finish when (bDone ∧ ¬a.empty)
      ⇒ r1 ← a.first; r2 ← r1; a.deq; bDone ← False

    Implementation 2

    Initial: r1 ← r1_0; r2 ← r2_0; aCnt ← 0; bCnt ← 0

    rule Out1 when (bCnt=0 ∧ ¬b.full)
      ⇒ b.enq(r2); bCnt ← 1
    rule Out2 when (bCnt=1 ∧ ¬b.full)
      ⇒ b.enq(r1); bCnt ← 2
    rule In1 when (bCnt=2 ∧ aCnt=0 ∧ ¬a.empty)
      ⇒ r2 ← a.first; a.deq; aCnt ← 1
    rule In2 when (bCnt=2 ∧ aCnt=1 ∧ ¬a.empty)
      ⇒ r1 ← a.first; a.deq; aCnt ← 0; bCnt ← 0

  • A network with a shift register

    Implementation 2 will deadlock if the FIFO size is 1!
    Implementation 2 does not dequeue its inputs every time it produces an output.
    Not self-cleaning
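The FIFO-size sensitivity can be seen in a small simulation of both implementations. The network figure is missing from the transcript, so the sketch assumes the shift register's output b is wired back to its input a through one FIFO; the initial register values 10 and 20 and the names `impl1`, `impl2`, and `Fifo` are also assumptions. Each function returns the number of rule firings before no rule is enabled or the step budget runs out.

```python
from collections import deque


class Fifo:
    """Bounded FIFO, initially empty (illustrative helper)."""
    def __init__(self, size):
        self.size, self.q = size, deque()
    def empty(self): return not self.q
    def full(self): return len(self.q) >= self.size
    def first(self): return self.q[0]
    def enq(self, x):
        assert not self.full()
        self.q.append(x)
    def deq(self): return self.q.popleft()


def impl1(fifo_size, steps=20):
    q = Fifo(fifo_size)              # assumed wiring: b feeds a
    r1, r2, b_done, fired = 10, 20, False, 0
    for _ in range(steps):
        if not b_done and not q.full():          # rule OutB
            q.enq(r2)
            b_done = True
        elif b_done and not q.empty():           # rule Finish
            r1, r2 = q.first(), r1
            q.deq()
            b_done = False
        else:
            break                                # no rule enabled: deadlock
        fired += 1
    return fired


def impl2(fifo_size, steps=20):
    q = Fifo(fifo_size)              # assumed wiring: b feeds a
    r1, r2, a_cnt, b_cnt, fired = 10, 20, 0, 0, 0
    for _ in range(steps):
        if b_cnt == 0 and not q.full():                        # rule Out1
            q.enq(r2)
            b_cnt = 1
        elif b_cnt == 1 and not q.full():                      # rule Out2
            q.enq(r1)
            b_cnt = 2
        elif b_cnt == 2 and a_cnt == 0 and not q.empty():      # rule In1
            r2 = q.first()
            q.deq()
            a_cnt = 1
        elif b_cnt == 2 and a_cnt == 1 and not q.empty():      # rule In2
            r1 = q.first()
            q.deq()
            a_cnt, b_cnt = 0, 0
        else:
            break                                # no rule enabled: deadlock
        fired += 1
    return fired
```

With a FIFO of size 1, Implementation 2 stops after its first output because it tries to enqueue a second output before dequeuing any input. With size 2 it runs indefinitely, and Implementation 1 runs with either size.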

  • Self-Cleaning (SC) property

    If the BDN has enqueued all its outputs, it will dequeue all its inputs.

  • Latency-Insensitive BDN (LI-BDN)

    A BDN implementing an SSM is an LI-BDN iff it has:
    the No-Extraneous Dependency (NED) property
    the Self-Cleaning (SC) property

    Theorem: A BDN where all the nodes are LI-BDNs will not deadlock.

  • Implementation of a network of SSMs - revisited

    This transformation is correct if each BDN_i implements SSM_i and is latency insensitive.

    Next lecture: how to do modular refinement using BDNs