ECE699 Lecture 9 - George Mason University
High-Level Synthesis Part 1
ECE 699: Lecture 9
Required Reading
The Zynq Book
• Chapter 14: Spotlight on High-Level Synthesis
• Chapter 15: Vivado HLS: A Closer Look
S. Neuendorffer and F. Martinez-Vallina, Building Zynq Accelerators with Vivado High Level Synthesis, FPGA 2013 Tutorial (selected slides on Piazza)
Recommended Reading
• G. Martin and G. Smith, “High-Level Synthesis: Past, Present, and Future,” IEEE Design & Test of Computers, vol. 26, no. 4, pp. 18–25, July 2009.
• Vivado Design Suite Tutorial, High-Level Synthesis, UG871, Nov. 2014.
• Vivado Design Suite User Guide, High-Level Synthesis, UG902, Oct. 2014.
• Introduction to FPGA Design with Vivado High-Level Synthesis, UG998, Jul. 2013.
4 ECE 448 – FPGA and ASIC Design with VHDL
Behavioral Synthesis
[Figure: Algorithm, I/O Behavior, and Target Library → Behavioral Synthesis → RTL Design → Logic Synthesis → Gate-level Netlist. The RTL Design → Logic Synthesis → Gate-level Netlist portion is the classic RTL design flow.]
Need for High-Level Design
• Higher level of abstraction
• Modeling complex designs
• Reduce design efforts
• Fast turnaround time
• Technology independence
• Ease of HW/SW partitioning
Platform Mapping: SW/HW Partitioning
[Figure: a Program is partitioned into Software (executed in the microprocessor system) and Hardware (executed in the reconfigurable processor system)]
SW/HW Partitioning & Coding: Traditional Approach
Specification
SW/HW Partitioning
SW Coding HW Coding
SW Compilation HW Compilation
SW Profiling HW Profiling
SW/HW Partitioning & Coding: New Approach
Specification
SW/HW Coding
SW Compilation HW Compilation
SW Profiling HW Profiling
SW/HW Partitioning
Advantages of Behavioral Synthesis
• Easy to model higher levels of complexity
• Source code is smaller than the equivalent RTL code
• Generates RTL much faster than the manual method
• Multi-cycle functionality
• Loops
• Memory access
Short History of High-Level Synthesis
Generation 1 (1980s-early 1990s): research period
Generation 2 (mid 1990s-early 2000s):
• Commercial tools from Synopsys, Cadence, Mentor Graphics, etc.
• Input languages: behavioral HDLs
• Target: ASIC
Outcome: Commercial failure
Generation 3 (from early 2000s):
• Domain-oriented commercial tools, in particular for DSP
• Input languages: C, C++, C-like languages (Impulse C, Handel C, etc.), Matlab + Simulink, Bluespec
• Target: FPGA, ASIC, or both
Outcome: First success stories
Hardware-Oriented High-Level Languages
• C-based system-level languages
  • Commercial
    • Handel C -- Celoxica Ltd.
    • Impulse C -- Impulse Accelerated Technologies
    • Carte C -- SRC Computers
    • SystemC -- The Open SystemC Initiative
  • Research
    • Streams-C -- Los Alamos National Laboratory
    • SA-C -- Colorado State University, University of California, Riverside, Khoral Research, Inc.
    • SpecC -- University of California, Irvine and SpecC Technology Open Consortium
Other High-Level Design Flows
• Matlab-based
  • AccelChip DSP Synthesis -- AccelChip
  • System Generator for DSP -- Xilinx
• GUI data-flow based
  • Corefire -- Annapolis Microsystems
• Java-based
  • Commercial
    • Forge -- Xilinx
  • Research
    • JHDL -- Brigham Young University
Handel-C Overview
• High-level language based on ISO/ANSI-C for the implementation of algorithms in hardware
• Allows software engineers to design hardware without retraining
• Clean extensions for hardware design including flexible data widths, parallelism and communications
• Well-defined timing model
  • Each statement takes a single clock cycle
• Includes extended operators for bit manipulation, and high-level mathematical macros (including floating point)
Handel-C/ANSI-C Comparisons
[Venn diagram: ANSI-C vs. Handel-C]
Common to ANSI-C and Handel-C:
• Preprocessors, i.e. #define
• Structures
• ANSI-C constructs: for, while, if, switch
• Functions
• Arrays
• Pointers
• Arithmetic operators
• Bitwise logical operators
• Logical operators
ANSI-C only:
• ANSI-C Standard Library
• Recursion
• Floating point
Handel-C only:
• Handel-C Standard Library
• Parallelism
• Arbitrary width variables
• RAM, ROM
• Signals
• Interfaces
• Enhanced bit manipulation
Handel-C Design Flow
Executable Specification
Handel-C
Synthesis
Place & Route
VHDL
EDIF EDIF
Different Levels of C/C++ Synthesis Abstraction
[Figure: abstraction spectrum running from less abstract, more implementation-specific to more abstract, less implementation-specific. Languages, from least to most abstract: Verilog and VHDL, SystemC, Augmented C/C++, Pure C/C++. Domains: RTL domain (implementation-specific), timed C domain (implementation-specific), untimed C domain (non-implementation-specific).]
Source: The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Pure Untimed C/C++ Design Flow
[Figure: Pure C/C++ (non-implementation-specific, easy to create, fast to simulate, easy to modify) → pure C/C++ synthesis, with user interaction and guidance → auto-generated, implementation-specific Verilog/VHDL RTL → RTL synthesis → gate-level netlist (ASIC target) or LUT/CLB-level netlist (FPGA target)]
Source: The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Mentor Graphics – Catapult C
• Catapult C automatically converts untimed C/C++ descriptions into synthesizable RTL.
SystemC-Based Design-Flow Alternatives
[Figure: alternative SystemC flows. Flow 1: SystemC → auto-RTL translation → Verilog/VHDL RTL → RTL synthesis → gate-level netlist. Flow 2: SystemC → SystemC synthesis → gate-level netlist. Figure annotation: implementation-specific, relatively slow to simulate, relatively difficult to modify.]
SystemC Evolution
[Figure: SystemC 1.0 and SystemC 2.0 plotted against abstraction levels (RTL, behavioral/transaction-level, algorithmic, system) and timed vs. untimed modeling]
Source: The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Reconfigurable Supercomputers
What is a Reconfigurable Computer?
[Figure: a microprocessor system (several µPs, each with its own µP memory, plus I/O and an interface) connected through interfaces to a reconfigurable system (several FPGAs, each with its own FPGA memory, plus I/O)]
Reconfigurable Supercomputers
Machine                          Released
SRC 6 from SRC Computers         2002
Cray XD1 from Cray               2005
SGI Altix from SGI               2005
SRC 7 from SRC Computers, Inc.   2006
Pros and cons of reconfigurable computers
+ can be programmed using high-level programming languages, such as C, by mathematicians & scientists themselves
+ facilitates hardware/software co-design
+ shortens development time, encourages experimentation and complex optimizations
+ allows sharing costs among users of various applications
- high entry cost (~$100,000)
- hardware-aware programming
- limited portability
- limited availability of libraries
- limited maturity of tools
SRC Programming Model
[Figure: main.c, written in ANSI C, runs on the microprocessor and calls function_1() and function_2(). function_1 and function_2 are written in MAP C (a subset of ANSI C) and are mapped to the FPGA. They call macros, e.g. macro_1(a, b, c), macro_2(b, d), macro_2(c, e) in one function and macro_3(s, t), macro_1(n, b), macro_4(t, k) in the other. The macros (macro_1, macro_2, macro_3, macro_4, ...) come from libraries of macros written in VHDL and are instantiated in the FPGA (Macro_1 feeding two instances of Macro_2, with signals a, b, c, d, e) between I/O blocks.]
SRC Compilation Process
[Figure: two compilation paths joined by the linker.
µP path: application sources (.c or .f files) → µP compiler → object files (.o files).
FPGA path: application sources (.mc or .mf files) and macro sources (HDL sources, .vhd or .v files) → MAP compiler → HDL sources (.v files) and object files (.o files); the HDL sources → logic synthesis → netlists (.ngo files) → place & route → configuration bitstreams (.bin files).
Linker: .o files and .bin files → application executable.]
Library Development - SRC
[Figure: languages used by the Application Programmer and the Library Developer on the µP system and on the FPGA system. Languages shown: HLL (C, Fortran), LLL (ASM), HDL (VHDL, Verilog).]
SRC Programming Environment
+ very easy to learn and use
+ standard ANSI C
+ hides implementation details
+ very well integrated environment
+ mature: in production use for over 4 years with constant improvements
- subset of C
- legacy C code requires rewriting
- C limitations in describing HW (parallelism, data types)
- closed environment, limited portability of code to HW platforms other than SRC
Application Development for Reconfigurable Computers
Program Entry
Compilation
Execution
Platform mapping
Debugging & Verification
Ideal Program Entry
Program Entry
Function
Actual Program Entry
SW/HW Partitioning
Data Transfers & Synchronization
Use of Internal and External Memories
Sequence of Run-time Reconfigurations
Use of FPGA Resources (multipliers, µP cores)
Preferred Architectures
Program Entry
Function
FPGA Mapping
SW/HW Interface
Cinderella Story
AutoESL Design Technologies, Inc. (25 employees)
Flagship product: AutoPilot, translating C/C++/SystemC to VHDL or Verilog
• Acquired by the biggest FPGA company, Xilinx Inc., in 2011
• AutoPilot integrated into the primary Xilinx toolset, Vivado, as Vivado HLS, released in 2012
“High-Level Synthesis for the Masses”
Vivado HLS
[Figure: High-Level Language (C, C++, SystemC) → Vivado HLS → Hardware Description Language (VHDL or Verilog)]
HLS-Based Development and Benchmarking Flow
[Figure: Reference Implementation in C → Manual Modifications (pragmas, tweaks) → HLS-ready C code → High-Level Synthesis → HDL Code → Physical Implementation (FPGA Tools) → Netlist and Post Place & Route Results. Test Vectors are used for Functional Verification and Timing Verification.]
LegUp – Academic Tool for HLS
– Open-source HLS tool
  • Developed at the University of Toronto
  • Faculty supervisors: Jason H. Anderson and Stephen Brown
  • FPL Community Award 2014
– High-level synthesis from C to Verilog
– Targets Altera FPGAs (extension to Xilinx relatively simple)
– Two flows
  • Pure hardware
  • Hardware/software hybrid = Tiger MIPS + hardware accelerator(s) + Avalon bus + shared on-chip and off-chip memory
Cryptol – New Language for Cryptology
– Domain-specific language for cryptology: Cryptol
  • High-level programming language similar to Haskell
  • Developed by Galois Inc., based in Portland, USA
– High-level synthesis from Cryptol to efficient software and hardware
[Figure: two benchmarking flows, one starting from Cryptol and one from Reference C. Blocks shown: Modified C, Optimized C, SW HLS, HW HLS, HLS, HDL, SW benchmarking, HW benchmarking.]
Source: The Zynq Book
Levels of Abstraction in FPGA Design
Source: The Zynq Book
High-Level Synthesis vs. Logic Synthesis
Source: The Zynq Book
Algorithm and Interface Synthesis
Source: The Zynq Book
Vivado HLS Design Flow
Source: The Zynq Book
Design Trade-offs Explored Using HLS
Source: The Zynq Book
C Functional Verification and C/RTL Cosimulation in Vivado HLS
Vivado HLS
Source: The Zynq Book
Vivado HLS Scheduling and Binding
Source: The Zynq Book
Vivado HLS Scheduling and Binding
Scheduling – translation of the RTL statements interpreted from the C code into a set of operations, each with an associated duration in terms of clock cycles. Affected by the clock frequency, uncertainty, target technology, and user directives.
Binding – associating the scheduled operations with the physical resources of the target device.
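The two definitions above can be illustrated with a toy model in plain C++. This is only a sketch under stated assumptions, not how Vivado HLS works internally: the `Op` record, the as-soon-as-possible (ASAP) policy, and the greedy "first free unit" binding are all hypothetical choices made for illustration, and every operation is assumed to use the same kind of functional unit (for example, an adder).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operation record (not a Vivado HLS data structure).
struct Op {
    int latency;            // duration in clock cycles
    std::vector<int> deps;  // indices of operations that must finish first
};

// Scheduling sketch: ASAP. Each operation starts in the first cycle in which
// all of its inputs are available. Assumes ops are in topological order.
std::vector<int> schedule_asap(const std::vector<Op>& ops) {
    std::vector<int> start(ops.size(), 0);
    for (std::size_t i = 0; i < ops.size(); ++i) {
        for (int d : ops[i].deps) {
            assert(d >= 0 && static_cast<std::size_t>(d) < i);
            start[i] = std::max(start[i], start[d] + ops[d].latency);
        }
    }
    return start;
}

// Binding sketch: visit operations by start cycle and give each one the
// lowest-numbered functional unit that is free at that cycle.
std::vector<int> bind_units(const std::vector<Op>& ops,
                            const std::vector<int>& start) {
    std::vector<std::size_t> order(ops.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return start[a] < start[b]; });

    std::vector<int> unit(ops.size(), -1);
    std::vector<int> busy_until;  // busy_until[u]: first cycle in which unit u is free
    for (std::size_t i : order) {
        std::size_t u = 0;
        while (u < busy_until.size() && busy_until[u] > start[i]) ++u;
        if (u == busy_until.size()) busy_until.push_back(0);  // allocate a new unit
        busy_until[u] = start[i] + ops[i].latency;
        unit[i] = static_cast<int>(u);
    }
    return unit;
}
```

For the expression (a + b) + (c + d) with single-cycle additions, the two inner additions are scheduled in cycle 0 on two different units, and the final addition is scheduled in cycle 1 and can reuse the first unit.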
Source: The Zynq Book
Three Possible Outcomes from HLS: Average of 10 Numbers
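The figure from The Zynq Book is not reproduced in this transcript. As a rough illustration of why a single C description can lead to more than one hardware outcome, here is a sketch of two functionally identical ways to average 10 numbers in plain C++. The function names and the constant `kCount` are assumptions made for this example. In Vivado HLS the loop in the first version could be left rolled, or steered with directives such as `#pragma HLS UNROLL` or `#pragma HLS PIPELINE`; the directives appear only in comments here so that the code builds with any standard compiler.

```cpp
#include <cassert>

const int kCount = 10;  // hypothetical name for the number of inputs

// Version 1: sequential accumulation. Left rolled, a loop like this tends to
// map to one adder reused over many cycles (small area, longer latency).
int average10_loop(const int in[kCount]) {
    int sum = 0;
    for (int i = 0; i < kCount; ++i) {
        // An HLS directive (e.g. UNROLL or PIPELINE) would be placed here.
        sum += in[i];
    }
    return sum / kCount;
}

// Version 2: the same sum written as a balanced tree. Independent additions
// can run in parallel, trading more adders for fewer cycles.
int average10_tree(const int in[kCount]) {
    int s0 = in[0] + in[1];
    int s1 = in[2] + in[3];
    int s2 = in[4] + in[5];
    int s3 = in[6] + in[7];
    int s4 = in[8] + in[9];
    int t0 = s0 + s1;
    int t1 = s2 + s3;
    return (t0 + t1 + s4) / kCount;
}
```

Both versions compute the same value; which structure the tool actually produces depends on the directives, the clock constraint, and the target device.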
Source: The Zynq Book
Vivado HLS Synthesis Process
Native Integer Data Types of C
Source: The Zynq Book
Arbitrary Precision Integer Data Types of C and C++ Accepted by Vivado HLS
Source: The Zynq Book
Arbitrary Precision Integer Types of C and C++
Source: The Zynq Book
Native Floating-Point Data Types of C
Source: The Zynq Book
Fixed-point Word Format
Source: The Zynq Book
Arbitrary Precision Fixed-Point Data Types used in Vivado HLS
Source: The Zynq Book
W – total width, I – number of integer bits, Q – quantization mode, O – overflow mode, N – number of saturation bits in overflow wrap modes
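To make the roles of W, I, Q, and O concrete, the following is a small plain C++ model of a signed fixed-point word. It is an illustrative sketch, not the `ap_fixed` implementation: `FixedModel` and its members are hypothetical names, only two quantization behaviours (truncation toward minus infinity, rounding half up) and two overflow behaviours (wraparound, saturation) are modelled, and the N parameter is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical model of a signed fixed-point word: W total bits, I integer
// bits (including the sign bit), so W - I fractional bits.
struct FixedModel {
    int W;
    int I;
    bool round;     // false: truncate toward minus infinity; true: round half up
    bool saturate;  // false: wrap around; true: clamp to the representable range

    // Returns the raw two's-complement integer that would be stored.
    // Assumes x is small enough that the scaled value fits in 64 bits.
    std::int64_t quantize(double x) const {
        assert(W >= 2 && W <= 32 && I >= 1 && I <= W);
        const double scaled = std::ldexp(x, W - I);  // x * 2^(W - I)
        const double q = round ? std::floor(scaled + 0.5) : std::floor(scaled);
        std::int64_t raw = static_cast<std::int64_t>(q);

        const std::int64_t min_raw = -(std::int64_t{1} << (W - 1));
        const std::int64_t max_raw = (std::int64_t{1} << (W - 1)) - 1;
        if (saturate) {
            raw = std::min(std::max(raw, min_raw), max_raw);
        } else {
            const std::int64_t span = std::int64_t{1} << W;
            raw = ((raw - min_raw) % span + span) % span + min_raw;  // keep low W bits
        }
        return raw;
    }

    // Converts a raw stored integer back to the real value it represents.
    double to_double(std::int64_t raw) const {
        return std::ldexp(static_cast<double>(raw), -(W - I));
    }
};
```

With W = 8 and I = 4 the step size is 1/16 and the range is -8 to 7.9375. In this model 1.3 truncates to 1.25 and rounds to 1.3125, while 9.0 wraps to -7.0 or saturates to 7.9375.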
Quantization modes for the C++ ap_fixed and ap_ufixed types
Source: The Zynq Book
Truncation to zero
Source: UG902 Vivado Design Suite User Guide, High-Level Synthesis
Overflow modes for the C++ ap_fixed and ap_ufixed types
Source: The Zynq Book
Wraparound
Source: UG902 Vivado Design Suite User Guide, High-Level Synthesis
C++ code with the declaration of fixed point variables
Source: The Zynq Book
SystemC Data Types
Source: The Zynq Book
An Example Top-Level Function for HLS
Source: The Zynq Book
Simplified Interface Diagram for the Example Top-Level Function
Source: The Zynq Book
Synthesis of Port Directions
Source: The Zynq Book
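The slide's figure is missing from this transcript, so here is a hedged sketch of how port directions are commonly inferred from the way a top-level function uses its arguments. The function `top_func` and its argument names are hypothetical and are not the example from The Zynq Book. The usual rule of thumb is that an argument that is only read becomes an input port, an argument that is only written becomes an output port, an argument that is both read and written becomes an inout port, and the return value becomes an output port (named `ap_return` in Vivado HLS).

```cpp
#include <cassert>

// Hypothetical top-level function used only to illustrate port directions.
int top_func(int a,          // only read            -> input port
             const int *b,   // only read            -> input port
             int *c,         // only written         -> output port
             int *acc) {     // read and written     -> inout port
    *c = a + *b;     // c is written but never read
    *acc += a;       // acc is read, updated, and written back
    return a - *b;   // return value -> output port
}
```

Called with a = 5, *b = 3, and *acc = 10, the function writes 8 to *c, updates *acc to 15, and returns 2.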
Default Port Level Types and Protocols
Source: The Zynq Book
![Page 65: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/65.jpg)
Data flow between Vivado HLS blocks
Source: The Zynq Book
![Page 66: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/66.jpg)
RTL Interface Diagram Showing Default Block Level Ports and Protocols
Source: The Zynq Book
![Page 67: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/67.jpg)
Can High-Level Synthesis Compete Against a Hand-Written Code in the Cryptographic Domain? A Case Study
Ekawat Homsirikamol & Kris Gaj, George Mason University, USA
Project supported by NSF Grant #1314540
![Page 68: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/68.jpg)
Primary Author
Ekawat Homsirikamol, a.k.a. “Ice”
Working on the PhD thesis entitled “A New Approach to the Development of Cryptographic Standards Based on the Use of High-Level Synthesis Tools”
![Page 69: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/69.jpg)
Traditional Development and Benchmarking Flow
Flow diagram blocks: Informal Specification, Test Vectors, Manual Design, HDL Code, Manual Optimization, FPGA Tools, Netlist, Post Place & Route, Results, Functional Verification, Timing Verification
![Page 70: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/70.jpg)
Extended Traditional Development and Benchmarking Flow
Flow diagram blocks: Informal Specification, Test Vectors, Manual Design, HDL Code, Option Optimization (GMU ATHENa), FPGA Tools, Netlist, Post Place & Route, Results, Functional Verification, Timing Verification
![Page 71: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/71.jpg)
ATHENa – Automated Tool for Hardware EvaluatioN
Open-source benchmarking tool, written in Perl, aimed at AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms
Currently under development at George Mason University
http://cryptography.gmu.edu/athena
![Page 72: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/72.jpg)
Generation of Results Facilitated by ATHENa
• batch mode of FPGA tools
• ease of extraction and tabulation of results
  • Text Reports, Excel, CSV (Comma-Separated Values)
• optimized choice of tool options
  • GMU_optimization_1 strategy
![Page 73: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/73.jpg)
HLS-Based Development and Benchmarking Flow
Flow diagram blocks: Reference Implementation in C, Test Vectors, Manual Modifications (pragmas, tweaks), HLS-ready C code, High-Level Synthesis, HDL Code, Option Optimization (GMU ATHENa), FPGA Tools, Netlist, Post Place & Route, Results, Functional Verification, Timing Verification
![Page 74: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/74.jpg)
Case Study
• Algorithm: AES-128
• Mode of operation: Counter (CTR)
• Protocol and interface: GMU proposal
• Two vendors: Xilinx & Altera
• Four different FPGA families
  • Xilinx Spartan-6 (X-S6)
  • Xilinx Virtex-7 (X-V7)
  • Altera Cyclone IV (A-CIV)
  • Altera Stratix V (A-SV)
![Page 75: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/75.jpg)
Tools & Tool Versions
• Vivado HLS 2014.1
• Xilinx ISE v14.7
• Altera Quartus II v13.0sp1
• ATHENa v0.6.4 (with GMU_optimization_1)
![Page 76: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/76.jpg)
Interface & Protocol
![Page 77: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/77.jpg)
Top-Level
![Page 78: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/78.jpg)
Reference Hardware Design in RTL VHDL
![Page 79: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/79.jpg)
RTL Result
Latency = 11 cycles
Time between two consecutive outputs = 10 cycles
![Page 80: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/80.jpg)
Software Design
Reference Code
• Source: P. Barreto and V. Rijmen, “Reference code in ANSI C v2.2,” Mar. 2002.
HLSv0
• Removed support for decryption
• Removed support for different AES variants
![Page 81: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/81.jpg)
HLSv0: Xilinx Results
Latency = 7367 cycles
![Page 82: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/82.jpg)
HLSv1: Code Refactoring
Refactor the code to match the target AES architecture
• KeyScheduling is performed once per round
• Improved Galois field multiplication operation
• Included last round as part of the core loop
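The slide does not show what the improved Galois field multiplication looks like. One common hardware-friendly formulation, given here only as an assumption about the kind of rewrite involved, replaces table lookups with shift-and-XOR doubling. The names `xtime` and `gmul` are illustrative and are not taken from the reference code.

```cpp
#include <cstdint>

typedef std::uint8_t word8;

// Multiply by x (i.e. by 2) in GF(2^8) modulo the AES polynomial
// x^8 + x^4 + x^3 + x + 1 (0x11b).
word8 xtime(word8 a) {
    return static_cast<word8>((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

// Shift-and-add multiplication in GF(2^8).
// The loop bound is a constant, so an HLS tool could fully unroll it.
word8 gmul(word8 a, word8 b) {
    word8 p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) {
            p ^= a;          // add the current multiple of a
        }
        a = xtime(a);        // next multiple: a * x^(i+1)
        b >>= 1;
    }
    return p;
}
```

With the worked example from the AES specification, `gmul(0x57, 0x83)` evaluates to `0xc1`.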
![Page 83: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/83.jpg)
HLSv1: Xilinx Results
Latency = 3224 cycles
![Page 84: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/84.jpg)
HLSv2: Optimization directives: ARRAY_RESHAPE
• Change an array shape in the output hardware

    void AES_encrypt (word8 a[4][4], word8 k[4][4], word8 b[4][4])
    {
    #pragma HLS ARRAY_RESHAPE variable=a[0] complete dim=1 reshape
    #pragma HLS ARRAY_RESHAPE variable=a[1] complete dim=1 reshape
    #pragma HLS ARRAY_RESHAPE variable=a[2] complete dim=1 reshape
    #pragma HLS ARRAY_RESHAPE variable=a[3] complete dim=1 reshape
    #pragma HLS ARRAY_RESHAPE variable=a complete dim=1 reshape
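As a rough software model of what a complete reshape achieves, the sketch below packs four 8-bit elements into one 32-bit word so that all four can be read in a single access. The helper names `pack4` and `unpack4` are hypothetical, and placing element 0 in the least significant byte is an assumption of this sketch, not something stated on the slide.

```cpp
#include <cstdint>

typedef std::uint8_t word8;

// Pack four bytes into one wide word.
// Assumption: element 0 occupies the least significant byte.
std::uint32_t pack4(const word8 v[4]) {
    std::uint32_t w = 0;
    for (int i = 0; i < 4; i++) {
        w |= static_cast<std::uint32_t>(v[i]) << (8 * i);
    }
    return w;
}

// Extract element idx (0..3) from a packed word.
word8 unpack4(std::uint32_t w, int idx) {
    return static_cast<word8>((w >> (8 * idx)) & 0xffu);
}
```

In hardware the packing is only wiring, which is why reshaping can widen the access without adding logic for the concatenation itself.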
![Page 85: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/85.jpg)
HLSv2: Optimization directives: UNROLL & INLINE
• Unroll a loop

    OutputLoop: for (i = 0; i < 4; i ++)
    #pragma HLS UNROLL
        for (j = 0; j < 4; j ++)
    #pragma HLS UNROLL
            b[i][j] = s[i][j];

• Flatten a function's hierarchy for improved performance

    void KeyUpdate (word8 k[4][4], word8 round)
    {
    #pragma HLS INLINE
        ...
    }
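To make the effect of UNROLL concrete, the sketch below pairs a loop annotated like the one on the slide with a version whose inner loop is written out by hand. A standard C++ compiler ignores the `#pragma HLS` lines (possibly with a warning), so both functions can be checked in software. The function names `copy_state` and `copy_state_unrolled` are hypothetical.

```cpp
#include <cstdint>

typedef std::uint8_t word8;

// Loop form: the directives ask the HLS tool to replicate the loop body.
void copy_state(const word8 s[4][4], word8 b[4][4]) {
    for (int i = 0; i < 4; i++) {
#pragma HLS UNROLL
        for (int j = 0; j < 4; j++) {
#pragma HLS UNROLL
            b[i][j] = s[i][j];
        }
    }
}

// Same computation with the inner loop unrolled by hand:
// four independent assignments per row, with no loop-carried dependency.
void copy_state_unrolled(const word8 s[4][4], word8 b[4][4]) {
    for (int i = 0; i < 4; i++) {
        b[i][0] = s[i][0];
        b[i][1] = s[i][1];
        b[i][2] = s[i][2];
        b[i][3] = s[i][3];
    }
}
```

Both functions produce the same output array; the difference is only in how much of the work a synthesis tool may schedule in the same clock cycle.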
![Page 86: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/86.jpg)
HLSv2: Optimization directives: RESOURCE & INTERFACE
• Specify the type of FPGA resource to be used by the target variable

    word32 rcon[10] = { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36 };
    #pragma HLS RESOURCE variable=Rcon0 core=ROM_1P_1S

• Direct how an input/output port should behave, i.e., registered or handshake mode

    void AES_encrypt (word8 a[4][4], word8 k[4][4], word8 b[4][4])
    {
    #pragma HLS INTERFACE register port=b
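The ten round constants stored in the ROM above are successive doublings of 0x01 in GF(2^8), which is why the sequence jumps from 0x80 to 0x1b. The sketch below checks this at compile time; `xtime` and `rcon_computed` are illustrative names, not part of the slide's code.

```cpp
#include <cstdint>

typedef std::uint32_t word32;

// Doubling in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
constexpr word32 xtime(word32 x) {
    return ((x << 1) ^ ((x & 0x80u) ? 0x1bu : 0x00u)) & 0xffu;
}

// Round constants as listed on the slide.
constexpr word32 rcon[10] = { 0x01, 0x02, 0x04, 0x08, 0x10,
                              0x20, 0x40, 0x80, 0x1b, 0x36 };

// Hypothetical helper: recompute entry i by doubling 0x01 i times.
constexpr word32 rcon_computed(int i) {
    return i == 0 ? 0x01u : xtime(rcon_computed(i - 1));
}

// The wrap-around entries match the table.
static_assert(rcon_computed(8) == rcon[8], "rcon[8] should be 0x1b");
static_assert(rcon_computed(9) == rcon[9], "rcon[9] should be 0x36");
```

A table this small can either be stored as a ROM, as the RESOURCE directive requests, or recomputed on the fly; which one is cheaper depends on the target device.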
![Page 87: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/87.jpg)
HLSv2: Xilinx Results
Latency = 11 cycles
![Page 88: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/88.jpg)
HLSv2: HLS vs. RTL, Frequency - Area
![Page 89: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/89.jpg)
HLSv2: HLS vs. RTL, Throughput - Area
![Page 90: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/90.jpg)
Source of Inefficiencies: Datapath vs. Control Unit
Block diagram: the Datapath (Data Inputs, Data Outputs) and the Control Unit (Control Inputs, Control Outputs) exchange Control Signals and Status Signals.
• Datapath determines: Area, Clock Frequency
• Control Unit determines: Number of clock cycles
![Page 91: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/91.jpg)
Source of Inefficiencies
Datapath inferred correctly
• Frequency and area within 10% of manual designs
Control Unit suboptimal
• Difficulty in inferring an overlap between completing the last round and reading the next input block
• One additional clock cycle used for initialization of the state at the beginning of each round
• The formulas for throughput:
  RTL: Throughput = Block_size / (#Rounds * T_CLK)
  HLS: Throughput = Block_size / ((#Rounds+2) * T_CLK)
![Page 92: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/92.jpg)
AES-ECB-ENC x2: HLS vs. RTL, Frequency - Area
![Page 93: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/93.jpg)
AES-ECB-ENC x2: HLS vs. RTL, Throughput - Area
![Page 94: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/94.jpg)
AES-CTR
![Page 95: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/95.jpg)
AES-CTR Results
![Page 96: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/96.jpg)
Full AES-CTR with I/O processors
![Page 97: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/97.jpg)
AES-CTR with IO Results
![Page 98: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/98.jpg)
Results for AES
![Page 99: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/99.jpg)
Conclusions
• Area and frequency of designs produced by High-Level Synthesis are comparable to handwritten RTL code
• A small increase in the number of clock cycles reduces the throughput of the HLS-based approach
• Complex I/O units can be created by the HLS-based approach
• An HLS-based design can compete against handwritten RTL code when we have a specific architecture and latency in mind while preparing an HLS-ready HLL code
![Page 100: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/100.jpg)
Hardware Benchmarking of SHA-3 Finalists using High-Level Synthesis
Ekawat Homsirikamol & Kris Gaj, George Mason University
![Page 101: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/101.jpg)
Our Test Case
• 5 final SHA-3 candidates + old standard SHA-2
• Most efficient sequential architectures (/2h for BLAKE, x4 for Skein, x1 for others)
• GMU VHDL codes developed during SHA-3 contest
• Reference software implementations in C included in the submission packages
Hypotheses:
• Ranking of candidates will remain the same
• Performance ratios HDL/HLS similar across candidates
![Page 102: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/102.jpg)
Manual RTL vs. HLS-based Results: Altera Stratix III
RTL HLS
![Page 103: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/103.jpg)
Manual RTL vs. HLS-based Results: Altera Stratix IV
RTL HLS
![Page 104: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/104.jpg)
Lack of Correlation for Xilinx Virtex 6
RTL HLS
![Page 105: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/105.jpg)
Lack of Correlation for Xilinx Virtex 6
RTL HLS
![Page 106: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/106.jpg)
Lack of Correlation for Xilinx Virtex 7
RTL HLS
![Page 107: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/107.jpg)
Ratios of Major Results RTL/HLS for Altera Stratix IV
![Page 108: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/108.jpg)
Ratios of Major Results RTL/HLS for Xilinx Virtex 6
![Page 109: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/109.jpg)
Datapath vs. Control Unit
Block diagram: the Datapath (Data Inputs, Data Outputs) and the Control Unit (Control Inputs, Control Outputs) exchange Control Signals and Status Signals.
• Datapath determines: Area, Clock Frequency
• Control Unit determines: Number of clock cycles
![Page 110: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/110.jpg)
Encountered Problems
Datapath inferred correctly
• Frequency and area within 30% of manual designs
Control Unit suboptimal
• Difficulty in inferring an overlap between completing the last round and reading the next input block
• One additional clock cycle used for initialization of the state at the beginning of each round
• The formulas for throughput:
  RTL: Throughput = Block_size / (#Rounds * T_CLK)
  HLS: Throughput = Block_size / ((#Rounds+2) * T_CLK)
![Page 111: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/111.jpg)
Hypothesis Check
Hypothesis I:
• Ranking of candidates in terms of throughput, area, and throughput/area ratio will remain the same
  TRUE for Altera Stratix III, Stratix IV
  FALSE for Xilinx Virtex 5, Virtex 6, and Virtex 7
Hypothesis II:
• Performance ratios HDL/HLS similar across candidates

| Metric | Stratix III | Stratix IV |
| --- | --- | --- |
| Frequency | 0.99-1.30 | 0.98-1.19 |
| Area | 0.71-1.01 | 0.68-1.02 |
| Throughput | 1.10-1.33 | 1.09-1.27 |
| Throughput/Area | 1.14-1.55 | 1.17-1.59 |
![Page 112: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/112.jpg)
Correlation Between Altera FPGA Results and ASICs
Stratix III FPGA ASIC
![Page 113: ECE699 lecture 9 - George Mason Universityece.gmu.edu › ... › S15 › viewgraphs › ECE699_lecture_9.pdf10 Generation 1 (1980s-early 1990s): research period Generation 2 (mid](https://reader035.fdocuments.net/reader035/viewer/2022062917/5ed483c44945a32c3c4cff7d/html5/thumbnails/113.jpg)
Most Promising Methodology & Toolset
Flow diagram blocks: Reference Implementation in C, Manual Modifications, HLS-ready C code, High-Level Synthesis (Xilinx Vivado HLS), HDL Code, FPGA Tools (Altera Quartus II), Option Optimization (GMU ATHENa), Results
Frequency & Throughput decrease and Area increases by no more than 30% compared to manual RTL