VLSI Programming 2016: Lecture 5
03/05/16 1
Course: 2IMN35
Teachers: Kees van Berkel, Rudolf Mak
Lab: Kees van Berkel, Rudolf Mak, Alok Lele
www: http://www.win.tue.nl/~wsinmak/Education/2IMN35/
Lecture 5: folding
VLSI Programming (2IMN35): time table 2016
Column headers: 2016 | in | Tue: h5-h8; MF.07 | out | 2016 | in | Thu: h1-h4; Gemini-Z3A-08/10/13 | out
19-Apr: introduction, DSP graphs, bounds, …
21-Apr: pipelining, retiming, transposition, J-slow, unfolding; T1+T2
26-Apr: tools installed; Introductions to FPGA and Verilog; L1: audio filter simulation; L1 L2
28-Apr: T1+T2; unfolding, look-ahead, strength reduction; L1 cntd; T3+T4
3-May: folding; L2: audio filter on XUP board
5-May
10-May: T3+T4; DSP processors; L2 cntd; L3
12-May: L3: sequential FIR + strength-reduced FIR
17-May: L3 cntd
19-May: L3 cntd; L4
24-May: systolic computation; T5
26-May: L3; L4
31-May: T5; L4: audio sample rate convertor
2-Jun: L4 cntd; L5
7-Jun: L5: 1024x audio sample rate convertor
9-Jun: L4; L5 cntd
14-Jun
16-Jun: L5; deadline report L5
Outline Lecture 5
• Folding Transformation
Mandatory reading (reminder):
• Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow. Proc. of the IEEE, Vol. 75, No. 9, Sept 1987, pp 1235-1245.
FOLDING (TRANSFORMATION)
Folding versus unfolding
• Unfolding (a.k.a. block processing, MIMO processing): increase throughput (#samples per time unit), at the expense of extra hardware resources, by processing blocks of L samples.
• Folding: reduce hardware resources (#multipliers, #adders), at the expense of reduced throughput, by mapping multiple operations onto the same HW resource.
• Folding is not: from MIMO back to SISO.
• Typically: given the required throughput (latency) of a function (application), apply folding or unfolding (along with other transformations) to achieve the required throughput at minimal hardware cost.
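The trade-off between the two transformations can be sketched with a small, idealized model. The helper names and the starting design (8 functional units, 1 sample per clock cycle) are assumptions made for illustration only; real designs rarely reach the best-case resource reduction exactly.

```python
from fractions import Fraction
from math import ceil

def unfold(units, throughput, L):
    """Idealized unfolding by L: L times the hardware, L times the throughput."""
    return units * L, throughput * L

def fold(units, throughput, N):
    """Idealized best-case folding by N: about 1/N of the hardware, 1/N of the throughput."""
    return ceil(units / N), throughput / N

# Hypothetical starting point: 8 functional units, 1 sample per clock cycle.
base_units, base_throughput = 8, Fraction(1)

unfolded = unfold(base_units, base_throughput, L=2)  # 16 units, 2 samples per cycle
folded = fold(base_units, base_throughput, N=4)      # 2 units, 1/4 sample per cycle

print("unfolded:", unfolded[0], "units,", unfolded[1], "samples/cycle")
print("folded:  ", folded[0], "units,", folded[1], "samples/cycle")
```

Choosing L or N then amounts to picking the cheapest point of this kind that still meets the required throughput.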
Folding transformation [Parhi]
Goal: N × lower throughput, N × fewer (best case) resources
Given a data-flow graph, folding amounts to:
1. choose a folding factor N
2. choose folding sets S (resource binding + order): ordered sets of ≤ N data-flow operations that share the same functional unit (hardware)
3. calculate the folded delay D_F for each data-flow edge
4. folding is successful if all D_F ≥ 0, i.e. production precedes consumption
When D_F < 0 for some edge: try step 2 again. Step 2 is a creative step; it may involve some trial and error.
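The trial-and-error in step 2 can be sketched as a brute-force search. This is a minimal illustration, not Parhi's procedure: it uses the folding equation D_F(U → V) = N·w(e) − P_U + v − u, and the graph, node names, and latencies below are hypothetical.

```python
from itertools import permutations, product

def folded_delays(edges, N, latency, unit_of, order):
    """Step 3: D_F = N*w - P_U + v - u for every edge (U, V, w)."""
    return {(U, V): N * w - latency[unit_of[U]] + order[V] - order[U]
            for U, V, w in edges}

def find_folding_orders(edges, N, latency, unit_of):
    """Step 2 by exhaustive search: return the first assignment with all D_F >= 0."""
    units = sorted(set(unit_of.values()))
    ops = {unit: sorted(n for n in unit_of if unit_of[n] == unit) for unit in units}
    # Every way to place each unit's operations in distinct phases 0..N-1.
    choices = [permutations(range(N), len(ops[unit])) for unit in units]
    for slots in product(*choices):
        order = {node: slot
                 for unit, unit_slots in zip(units, slots)
                 for node, slot in zip(ops[unit], unit_slots)}
        delays = folded_delays(edges, N, latency, unit_of, order)
        if all(d >= 0 for d in delays.values()):  # step 4
            return order, delays
    return None  # no valid folding for this N and binding

# Hypothetical 4-node loop: two additions and two multiplications, N = 2.
edges = [("A1", "M1", 0), ("M1", "A2", 1), ("A2", "M2", 1), ("M2", "A1", 1)]
latency = {"add": 1, "mul": 2}
unit_of = {"A1": "add", "A2": "add", "M1": "mul", "M2": "mul"}

result = find_folding_orders(edges, 2, latency, unit_of)
print(result)
```

Exhaustive search only scales to small graphs; for larger ones the choice of folding sets is usually guided by the designer, as the slide notes.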
Folding transformation [Parhi]
• S1 = adder, latency=1; S2 = multiplier, latency=2
• smart order: add4 before add2; add3 before add1
• add2 before add3: saves register
• One iteration now takes 4 cycles (= max(|S1|, |S2|)), initiating one addition and one multiplication in each cycle.
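The resulting cycle-by-cycle schedule can be sketched as follows. The adder order is one order consistent with the three constraints above (add4, add2, add3, add1); the multiplier names and their order are hypothetical, since the slide text does not list them.

```python
def folding_factor(folding_sets):
    """N = size of the largest folding set = number of cycles per iteration."""
    return max(len(ops) for ops in folding_sets.values())

def schedule(folding_sets, cycles):
    """For each clock cycle, list which operation each unit initiates."""
    N = folding_factor(folding_sets)
    table = []
    for t in range(cycles):
        phase = t % N
        row = {unit: (ops[phase] if phase < len(ops) else None)  # None = idle slot
               for unit, ops in folding_sets.items()}
        table.append((t, phase, row))
    return table

folding_sets = {
    "S1 (adder)": ["add4", "add2", "add3", "add1"],
    # Hypothetical multiplier names and order, for illustration only.
    "S2 (multiplier)": ["mul_a", "mul_b", "mul_c", "mul_d"],
}

N = folding_factor(folding_sets)
for t, phase, row in schedule(folding_sets, 2 * N):
    print(t, phase, row)
```

After N cycles the pattern repeats, so cycle N initiates the same operations as cycle 0, one iteration later.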
Folding transformation
Folding equation, one per edge of the original graph
$D_F(U \to V) = N\,w(e) - P_U + v - u$
where
• N = folding factor
• U and V are two nodes in the graph connected by edge e
• w(e) = #delays on edge e
• P_U = latency (pipeline level) of the unit that executes U (e.g. adder)
• u and v are the folding orders of nodes U and V: the phase in [0..N-1] in which each operation is scheduled
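As a worked illustration of the folding equation, the sketch below evaluates it for two edges. The numeric values are assumed for the example (N = 4, adder latency 1, multiplier latency 2) and are not read off a specific figure.

```python
def folded_delay(N, w, P_U, u, v):
    """Folded delay of edge U -> V: D_F = N*w - P_U + v - u."""
    return N * w - P_U + v - u

# Edge from an addition in phase 3 to an operation in phase 1, with 1 delay:
# 4*1 - 1 + 1 - 3 = 1, so one register is needed on the folded edge.
d_ok = folded_delay(N=4, w=1, P_U=1, u=3, v=1)

# Edge from a multiplication in phase 0 to an operation in phase 1, with 0 delays:
# 4*0 - 2 + 1 - 0 = -1, so the result is consumed before it is produced.
d_bad = folded_delay(N=4, w=0, P_U=2, u=0, v=1)

print(d_ok, d_bad)
```

A negative value such as the second one means the chosen folding order is not valid for that edge and the folding sets have to be revised.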
2IMN35: reporting guidelines 2016 (1)
1. Submit one report per team (2 students)
2. Respect deadlines:
   • Assignment L3: Thursday May 26, 2016
   • Assignment L4: Thursday June 9, 2016
   • Assignment L5: Thursday June 16, 2016
3. Make sure that assignments L3, L4, and L5 are demonstrated to and signed off by Alok, Rudolf, or Kees.
4. Report on lab assignments L3, L4, and L5.
5. Submit the reports using Peach (paper copies will not be accepted).
2IMN35: reporting guidelines 2016 (2)
General guidelines (each assignment), to be followed strictly:
6. Analyze the specifications and requirements.
7. Present/motivate key ideas/decisions, design options, alternatives, trade-offs.
8. Draw architecture block diagram (= picture!).
9. Explain functional correctness of your Verilog programs (include your complete Verilog programs in an appendix).
10. Explain #clock cycles per sample time Ts. Include waveforms.
11. Report, analyze & explain FPGA-resource usage and utilization {#multipliers, #BRAMS, #LUTs} in relation to your design.
12. Report, analyze & explain (min) sample time Ts and (max) sample frequency fs, both after synthesis and after placement & routing.
2IMN35: reporting guidelines 2016 (3)
13. Include simulation results: waveforms in both the time domain and the frequency domain (apply FFT) (assignments 3 and 4 only).
14. Include answers to the inline questions.
15. Annotate all graphs to include, for both axes:
   - quantity (weight, distance, duration, …)
   - unit (ounce, light year, century, …)
   - linear/log/... (ok to assume linear)
THANK YOU