Vivado HLS Implementation of Round-2 SHA-3 Candidates
Transcript of Vivado HLS Implementation of Round-2 SHA-3 Candidates
![Page 1: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/1.jpg)
1
Vivado HLS Implementation of Round-2 SHA-3 Candidates
Farnoud FarahmandECE 646
CERG: http://cryptography.gmu.edu
![Page 2: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/2.jpg)
2
Cryptographic Standard Contests
time97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17
AES
NESSIE
CRYPTREC
eSTREAM
SHA-3
34 stream 4 HW winnersciphers → + 4 SW winners
51 hash functions → 1 winner
15 block ciphers → 1 winnerIX.1997 X.2000
I.2000 XII.2002
IV.2008
X.2007 X.2012
XI.2004
CAESARI.2013
57 authenticated ciphers → multiple winnersXII.2017
![Page 3: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/3.jpg)
3
Evaluation Criteria
Security
Software Efficiency Hardware Efficiency
Simplicity
FPGAs ASICs
Flexibility Licensing
µProcessors µControllers
![Page 4: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/4.jpg)
4
Traditional Development & Benchmarking Flow
ManualDesign
HDLCode
Manual OptimizationFPGATools
Netlist
PostPlace&Route
Results
Functional Verification
Timing Verification
InformalSpecification TestVectors
![Page 5: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/5.jpg)
5
• Large number of candidates• Long time necessary to develop and verify
RTL (Register-Transfer Level)Hardware Description Language (HDL) codes
• Multiple variants of algorithms (e.g., multiple hash value size)
• Multiple hardware architectures • Dependence on skills of designers
Remaining Difficulties of Hardware Benchmarking
![Page 6: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/6.jpg)
6
High-Level Synthesis (HLS)
High Level Language(e.g. C, C++, SystemC)
Hardware Description Language(e.g., VHDL or Verilog)
High-Level Synthesis
![Page 7: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/7.jpg)
7
High-Level Synthesis
HDLCode
Automated OptimizationFPGATools
Netlist
PostPlace&Route
Results
Functional Verification
Timing Verification
ReferenceImplementationinC
TestVectorsManual Modifications
(pragmas, tweaks)
HLS-readyCcode
Proposed HLS-Based Development and Benchmarking Flow
SoftwareVerification
![Page 8: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/8.jpg)
8
Examples of Source Code Modifications (1)
for (i = 0; i < 16; i++) #pragma HLS UNROLL
b[i] = a[i]+4;
Unrolling of loops:
void main() { int A[16];#pragma HLS ARRAY_RESHAPE Variable=A complete dim=1
...}
Array Reshaping:
![Page 9: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/9.jpg)
9
Examples of Source Code Modifications (2)
Function Reuse:
![Page 10: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/10.jpg)
10
• 4 Round 2 SHA-3 candidates + 5 Finalists are done previously • Basic iterative architecture• GMU Hardware Interface for SHA core• Padding done in software• RTL Implementations developed previously • Starting point: Informal specifications and reference software
implementations in C provided by the algorithm authors• Post P&R results generated for
- Xilinx Virtex 5 using Xilinx ISE • No use of BRAMs or DSP Units
Our Test Case
![Page 11: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/11.jpg)
11
Parameters of Hash Functions
Algorithm Number of Rounds
Block Size Hash size Chain Size
Luffa 8 256 256 768Cubehash 16 256 256 1024BMW 1 512 256 512ECHO 8 1536 256 512
![Page 12: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/12.jpg)
12
SHA core Interface
![Page 13: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/13.jpg)
13
HLS and RTL Implementations Parameters
Algorithm #Rounds Cycles/BlockRTL
Cycles/BlockHLS
IntervalHLS
Luffa 8 9 10 11Cubehash 16 16 17 18BMW 1 2 1 1ECHO 8 26 25 26
Interval = Number of Cycles require to processtwo consecutives blocks
![Page 14: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/14.jpg)
14
Datapath vs. Control Unit
Datapath ControlUnit
Data Inputs
Data Outputs
Control Inputs
Control Outputs
Control Signals
StatusSignals
Determines• Area• Clock Frequency
Determines• Number of clock cycles
![Page 15: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/15.jpg)
15
Control Unit suboptimal• Difficulty in inferring an overlap between completing the last
round and reading the next input block• One additional clock cycle used for initialization of the state at
the beginning of each round
Encountered Problems
![Page 16: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/16.jpg)
16
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
HLS RTL
Area
Cubehash Luffa BMW ECHO
RTL vs. HLS (Area):
![Page 17: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/17.jpg)
17
0
2,000
4,000
6,000
8,000
10,000
12,000
14,000
HLS RTL
Throughput
Cubehash BMW Luffa ECHO
RTL vs. HLS (Throughput):
![Page 18: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/18.jpg)
18
Throughput vs. Area:
0
2,000
4,000
6,000
8,000
10,000
12,000
14,000
0 1,000 2,000 3,000 4,000 5,000 6,000 7,000
Thro
ughp
ut
Area
HLSLuffaCubeHashBMWECHO
0
2,000
4,000
6,000
8,000
10,000
12,000
14,000
0 1,000 2,000 3,000 4,000 5,000 6,000 7,000
Thro
ughp
utArea
RTLLuffaCubeHashBMWECHO
![Page 19: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/19.jpg)
19
Overall Results:
HLS RTLFreq. TP A TP/A Freq TP A TP/A
Luffa 217.39 5,565 1,390 4.00 334.4 9,513 1,023 9.29Cubehash 168.71 2,399 1,076 2.23 248.3 3,972 663 5.99BMW 9.45 4,838 4,859 0.99 34 8,704 4,350 2.00ECHO 145.77 8,612 6,389 1.34 203.8 12,094 4,888 2.47
RTL/HLSFreq. TP A TP/A
Luffa 1.53 1.70 0.73 2.32Cube Hash 1.47 1.65 0.61 2.68BMW 3.59 1.79 0.89 2.00ECHO 1.39 1.40 0.76 1.83
![Page 20: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/20.jpg)
20
Conclusions
• High-level synthesis offers a potential to facilitate hardware benchmarking during the design of cryptographic algorithms and at the early stages of cryptographic contests
• Case study based on 4 Round 2 SHA-3 candidates demonstrated correct ranking for all of candidates using all major performance metrics
• More research & development needed to overcome remaining difficulties
• Suboptimal control unit of HLS implementations• Wide range of RTL to HLS performance metric ratios• A few potentially suboptimal HLS or RTL implementations• Efficient and reliable generation of HLS-ready C codes
![Page 21: Vivado HLS Implementation of Round-2 SHA-3 Candidates](https://reader034.fdocuments.net/reader034/viewer/2022052313/589d749a1a28abea498b937f/html5/thumbnails/21.jpg)
21
Comments? Questions?
Suggestions?
CERG: http://cryptography.gmu.edu
Thank you!