Cryptographic applications Kris Gaj Tarek El-Ghazawi & GMU, … · 2006-02-21 · Tarek El-Ghazawi...
Cryptographic applications
Kris Gaj, Tarek El-Ghazawi
& GMU, GWU students
Cipher
[Figure: message → cipher, controlled by a cryptographic key of K bits → ciphertext]
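Definition of a cipher
• $\mathcal{M}$: message space, $\mathcal{C}$: ciphertext space, $\mathcal{K}$: key space
• $E_K(M)$: enciphering transformation, a member of the family of enciphering transformations selected by the key $K$
• $D_K(C)$: deciphering transformation, a member of the family of deciphering transformations selected by the key $K$
• For every $K \in \mathcal{K}$ and every $M \in \mathcal{M}$: $D_K(E_K(M)) = M$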
Substitution Cipher
Key =
a b c d e f g h i j k l m n o p q r s t u v w x y z
f q i s h n c v j t y a u w d r e x l b m z o g k p
Enciphering: TO BE OR NOT TO BE → BD QH DX WDB BD QH
Deciphering: BD QH DX WDB BD QH → TO BE OR NOT TO BE
Number of keys = $26! \approx 4 \cdot 10^{26}$
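The substitution cipher above can be sketched in a few lines of Python. This is only an illustration: the function names `encipher` and `decipher` are hypothetical, and the key string is simply the second row of the key table from the slide.

```python
import math
import string

PLAIN = string.ascii_lowercase
KEY = "fqishncvjtyauwdrexlbmzogkp"  # second row of the key table on the slide

# Translation tables for both directions; upper case is handled the same way.
ENC = str.maketrans(PLAIN + PLAIN.upper(), KEY + KEY.upper())
DEC = str.maketrans(KEY + KEY.upper(), PLAIN + PLAIN.upper())

def encipher(message):
    """Replace every letter by its image under the key (hypothetical helper)."""
    return message.translate(ENC)

def decipher(ciphertext):
    """Invert the substitution (hypothetical helper)."""
    return ciphertext.translate(DEC)

# Every permutation of the 26 letters is a valid key.
NUMBER_OF_KEYS = math.factorial(26)

print(encipher("TO BE OR NOT TO BE"))
print(decipher(encipher("TO BE OR NOT TO BE")))
```

Characters that are not letters, such as spaces, pass through unchanged, which matches the word boundaries kept in the slide's example.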
Secret-key (Symmetric) Cryptosystems
[Figure: Alice (encryption) and Bob (decryption) communicate over the network; both sides use the same key of Alice and Bob, $K_{AB}$]
Key Distribution Problem
N users need $N \cdot (N-1) / 2$ keys.
Users    Keys
100      about 5,000
1000     about 500,000
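As a quick check of the key-count formula, the sketch below evaluates N · (N-1) / 2 for the two user counts on the slide. The function name `pairwise_keys` is a hypothetical helper. The exact values are 4,950 and 499,500, which the slide rounds to 5,000 and 500,000.

```python
def pairwise_keys(n_users):
    """Number of distinct secret keys if every pair of users shares one key."""
    return n_users * (n_users - 1) // 2

for n in (100, 1000):
    print(n, pairwise_keys(n))
```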
Public Key (Asymmetric) Cryptosystems
[Figure: Alice encrypts with the public key of Bob, $K_B$; the ciphertext crosses the network; Bob decrypts with the private key of Bob, $k_B$]
Key exchange for secret-key ciphers
• Alice generates a session key (a random secret key).
• Alice sends over the network the session key encrypted using Bob's public key; Bob recovers it using Bob's private key.
• The message is sent encrypted using the session key.
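Digital Signature
[Figure: Alice applies the hash function to the message and processes the hash value with the public key cipher under Alice's private key to obtain the signature; the message and the signature are sent to Bob. Bob computes one hash value from the received message with the hash function, recovers a second hash value from the signature with the public key cipher under Alice's public key, and compares hash value 1 with hash value 2: yes / no]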
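The hash-then-sign flow can be sketched as follows. This is a toy illustration under loud assumptions: the public key cipher is textbook RSA with tiny, insecure parameters (P = 61, Q = 53, e = 17), the hash value is SHA-256 reduced modulo N so that it fits the toy modulus, and the function names are hypothetical. The slides do not prescribe any of these choices.

```python
import hashlib

# Toy RSA parameters (assumption, far too small for real use).
P, Q = 61, 53
N = P * Q
E = 17                               # Alice's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # Alice's private exponent (Python 3.8+)

def hash_value(message: bytes) -> int:
    """SHA-256 reduced mod N so the digest fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Alice: hash the message, then apply the private-key operation."""
    return pow(hash_value(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Bob: compare the hash of the message with the hash recovered from the signature."""
    hash_1 = hash_value(message)
    hash_2 = pow(signature, E, N)
    return hash_1 == hash_2

signature = sign(b"pay Bob 10 dollars")
print(verify(b"pay Bob 10 dollars", signature))
```

With a modulus this small, different messages can collide on the same reduced hash value, so the sketch shows only the data flow, not the security of the scheme.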
High-throughput secret-key encryption
High-Throughput Encryption
[Figure: a stream of message blocks $M_i, M_{i+1}, M_{i+2}, \ldots$ enters the encryption unit, keyed with $K_0$, and leaves as ciphertext blocks $C_i, C_{i+1}, C_{i+2}, \ldots$]
Encryption algorithms: DES, 3DES, AES, RC5, IDEA, etc.
Fully Pipelined Architecture
• Loop unrolling
• Pipeline stages inside of cipher rounds
• New input & new output every clock cycle
[Figure: Round 1 → Round 2 → … → Round k]
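To see why a fully pipelined architecture gives high throughput, the sketch below counts clock cycles for two idealized designs. It assumes one clock cycle per round in an iterative design that reuses a single round circuit, and one clock cycle per pipeline stage in the unrolled design. The function names are hypothetical, and differences in clock period between the two designs are ignored.

```python
def iterative_cycles(n_blocks, k_rounds):
    """One round circuit reused k times per block: blocks are processed one at a time."""
    return n_blocks * k_rounds

def pipelined_cycles(n_blocks, k_rounds, stages_per_round=1):
    """All k rounds unrolled and pipelined: after the pipeline fills,
    one block enters and one block leaves every clock cycle."""
    depth = k_rounds * stages_per_round
    return depth + n_blocks - 1

blocks, rounds = 1000, 16
print(iterative_cycles(blocks, rounds), pipelined_cycles(blocks, rounds))
```

For long streams the pipelined cycle count approaches one cycle per block, while the latency of a single block stays at the pipeline depth.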
Secret-key cipher breaking
Given: ciphertext, guessed fragment of the plaintext
Looked for: remaining plaintext or key
Method: exhaustive key search (brute-force) attack
[Figure: successive keys are fed to the cipher]
Secret-key breaking
[Figure: the cipher breaker receives a message-ciphertext pair $(M_0, C_0)$, tries keys $K_1, K_2, K_3, \ldots, K_N$ generated by the cipher breaker itself, and outputs the correct key]
• Negligibly small input/output
• Huge amount of computations
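The exhaustive key search can be illustrated with a deliberately tiny cipher. Everything here is an assumption made for the sketch: `toy_encrypt` is an invented 16-bit function, not DES, IDEA, or RC5, and the 16-bit key space is chosen only so that the loop finishes quickly.

```python
MASK = 0xFFFF  # 16-bit blocks and keys (toy sizes, an assumption)

def toy_encrypt(block, key):
    """Invented toy cipher used only to demonstrate the search loop."""
    x = block & MASK
    for r in range(4):
        x = (x + key + r) & MASK              # mix in the key
        x = ((x << 5) | (x >> 11)) & MASK     # rotate left by 5 bits
        x ^= (key >> r) & MASK                # mix in a shifted key
    return x

def exhaustive_key_search(m0, c0, key_bits=16):
    """Try successive keys until one maps the known message to the known ciphertext."""
    for key in range(1 << key_bits):
        if toy_encrypt(m0, key) == c0:
            return key
    return None

m0 = 0x1234
c0 = toy_encrypt(m0, 0xBEEF)   # pair produced under a key the breaker does not know
found = exhaustive_key_search(m0, c0)
print(hex(found), toy_encrypt(m0, found) == c0)
```

The only input is one message-ciphertext pair and the only output is one key, while all the work is in the loop. With a single pair the first matching key may be a false positive, so a real breaker would confirm candidates against further pairs.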
Public-key cryptography for key exchange and digital signatures
RSA public-key encryption
• Encryption with the PUBLIC KEY (message $M$ to ciphertext $C$): $C = f(M) = M^e \bmod N$
• Decryption with the PRIVATE KEY: $M = f^{-1}(C) = C^d \bmod N$
• $P$, $Q$ - large prime numbers, $N = P \cdot Q$, $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
RSA keys
PUBLIC KEY: $\{e, N\}$    PRIVATE KEY: $\{d, P, Q\}$
• $P$, $Q$: large prime numbers
• $N$: $N = P \cdot Q$
• $e$: $\gcd(e, P-1) = 1$ and $\gcd(e, Q-1) = 1$
• $d$: $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
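A minimal sketch of the RSA relations above, using the small textbook primes P = 61 and Q = 53 (an assumption for readability; real keys use large primes). The helper names are hypothetical, and no padding is applied.

```python
from math import gcd

def make_keys(p, q, e):
    """Build {e, N} and {d, P, Q} from two primes and a public exponent."""
    if gcd(e, p - 1) != 1 or gcd(e, q - 1) != 1:
        raise ValueError("e must be coprime to P-1 and Q-1")
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # e * d = 1 mod (P-1)(Q-1), Python 3.8+
    return (e, n), (d, p, q)

def encrypt(m, public_key):
    e, n = public_key
    return pow(m, e, n)                  # C = M^e mod N

def decrypt(c, private_key):
    d, p, q = private_key
    return pow(c, d, p * q)              # M = C^d mod N

public_key, private_key = make_keys(61, 53, 17)
c = encrypt(65, public_key)
print(c, decrypt(c, private_key))
```

Both operations are modular exponentiations on operands as long as N, which is the kind of long-operand arithmetic the later slides point to as a fit for reconfigurable hardware.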
Why is public-key cryptography a good application for reconfigurable computers?
• computationally intensive arithmetic operations
• unconventionally long operand sizes (160-2048 bits)
• multiple algorithms, parameters, key sizes, and architectures = need for reconfiguration
Elliptic Curve Cryptosystems (ECC)
• a family of cryptosystems, rather than a single cryptosystem = added security but need for reconfiguration
• public key (asymmetric) cryptosystems
• used for key agreement and digital signatures
• implementations must be optimized for minimum latency rather than maximum throughput = limited speed-up from parallel processing
Classes of applications
1. input/output intensive applications
• bulk data encryption (DES, IDEA, and RC5 encryption)
2. computationally intensive applications
• secret-key cipher breaking based on the exhaustive key search (DES, IDEA, RC5 breakers)
• public-key cipher breaking based on factoring
3. latency-critical applications
• cipher key agreement and signature (ECC schemes, RSA)
Secret key cipher libraries
1. Secret key cipher encryption and decryption
• DES
• IDEA
• RC5
2. Secret key cipher breaking
• DES
• IDEA
• RC5
Public key cipher libraries
1. Operations in the binary Galois Fields $GF(2^m)$
a. polynomial basis
b. normal basis
2. Multiprecision integer arithmetic
3. Elliptic Curve Operations
- addition
- doubling
- scalar multiplication
Basic operations of ECC
Basic operations in Galois Field $GF(2^m)$
• addition and subtraction (xor): $x+y$, $x-y$ (XOR)
• multiplication, squaring: $x \cdot y$, $x^2$
• inversion: $x^{-1}$
Basic operations on points of an Elliptic Curve
• addition of points: $P + Q$
• doubling a point: $2P$
• projective to affine coordinate: P2A
Complex operations on points of an Elliptic Curve
• scalar multiplication: $k \cdot P = P + P + \ldots + P$ ($k$ times)
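The basic field operations can be sketched in software for a polynomial basis, with field elements stored as integers whose bits are the polynomial coefficients. The example uses the small field $GF(2^8)$ with the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (the one used by AES) purely as an assumption for illustration; ECC uses much larger m. Inversion is done here by exponentiation to $2^m - 2$, which is one possible method and not necessarily the one used in the libraries described in this talk.

```python
def gf_add(x, y):
    """Addition and subtraction in GF(2^m) are both a bitwise XOR."""
    return x ^ y

def gf_mul(x, y, m, poly):
    """Shift-and-add multiplication; poly is the reduction polynomial
    as a bit mask that includes the x^m term."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if (x >> m) & 1:      # degree reached m: reduce
            x ^= poly
    return result

def gf_square(x, m, poly):
    return gf_mul(x, x, m, poly)

def gf_inv(x, m, poly):
    """Inverse as x^(2^m - 2), computed by square-and-multiply."""
    if x == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    result, base, exponent = 1, x, (1 << m) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base, m, poly)
        base = gf_square(base, m, poly)
        exponent >>= 1
    return result

M, POLY = 8, 0x11B   # toy field size and polynomial (assumption)
print(hex(gf_mul(0x57, 0x83, M, POLY)))
```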
Hierarchy of functions
High level: kP
Medium level: P+Q, 2P, projective_to_affine (P2A)
Low level 2: MUL, INV
Low level 1: ROT, XOR
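One common way in which the high-level operation kP is built from the medium-level operations P+Q and 2P is the double-and-add method. The sketch below is generic: the group operations are passed in as functions, and the usage example substitutes integers modulo 1009 under addition for elliptic curve points, so that the result can be checked by ordinary multiplication. The slides do not state which scalar multiplication algorithm the implementation uses, so this is an assumption.

```python
def scalar_multiply(k, point, add, double, identity):
    """Right-to-left double-and-add: computes k * point using only
    point addition (P+Q) and point doubling (2P)."""
    result = identity
    addend = point
    while k > 0:
        if k & 1:
            result = add(result, addend)   # medium level: P + Q
        addend = double(addend)            # medium level: 2P
        k >>= 1
    return result

# Stand-in group (assumption): integers mod 1009 under addition.
MODULUS = 1009
add = lambda a, b: (a + b) % MODULUS
double = lambda a: (2 * a) % MODULUS

print(scalar_multiply(77, 5, add, double, 0))
```

The loop performs about log2(k) doublings and at most as many additions instead of k - 1 additions. Each step depends on the previous one, which limits how much of kP can run in parallel.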
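SRC Program Partitioning
[Figure: the HLL part consists of a C function for the µP, which runs on the µP system, and a C function for the MAP, which runs on the FPGA system; the HDL part is a VHDL macro, also on the FPGA system]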
Investigated Partitioning Schemes
µP Software Only
[Figure: layers C function for µP, C function for FPGA, VHDL macro; kP is placed in the C function for µP]
Based on public-domain code by Rosing M., Implementing Elliptic Curve Cryptography, Manning, 1999
0HL1 Partitioning
[Figure: layers C function for µP (0), C function for FPGA (H), VHDL macros (L1); blocks shown: kP, P+Q, 2P, P2A, INV, MUL, MUL2, MUL4, ROT, XOR, V_ROT]
0HL2 Partitioning
[Figure: layers C function for µP (0), C function for FPGA (H), VHDL macros (L2); blocks shown: kP, P+Q, 2P, P2A, INV, MUL2, MUL4, ROT, XOR, V_ROT]
0HM Partitioning
[Figure: layers C function for µP (0), C function for FPGA (H), VHDL macros (M); blocks shown: kP, P+Q, 2P, P2A]
00H Partitioning (VHDL only)
C function for µP: 0
C function for FPGA: 0
VHDL macro: kP (H)
Results
Timing Measurements
[Figure: timeline of a MAP function call from the .c file into the .mc file. Events labelled: MAP Alloc., FPGA Configure, DMA Data In, FPGA Computation, DMA Data Out, MAP Free. Intervals labelled: End-to-End time (SW), End-to-End time (HW), Configuration time, MAP Allocation time, MAP Release time]
Results (Latency)
All times in µs.
System Level   End-to-End  Data Transfer  FPGA Computation  Data Transfer  Total     Speedup vs.  Slow-down vs.
Architecture   Time        In Time        Time              Out Time       Overhead  Software     VHDL macro
Software       772,519     -              -                 -              -         -            -
0HL1           866         37             472               14             394       893          1.46
0HL2           863         37             469               14             394       895          1.45
0HM            592         37             201               12             391       1305         1
VHDL macro     592         39             201               17             391       1305         1
[Figure: bar chart of End to End Time and FPGA Computation Time in µs (axis 0 to 900) for the different architectures 0HL1, 0HL2, 0HM, 00H (VHDL)]
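The derived figures in the latency results can be recomputed from the measured times. The sketch assumes that the first number in each result row is the end-to-end time and the third is the FPGA computation time (both in µs), and that total overhead means end-to-end time minus FPGA computation time; that reading reproduces the reported overheads of 394 and 391 µs. Because the inputs are the rounded values from the table, the recomputed ratios can differ from the reported ones in the last digit.

```python
SOFTWARE_US = 772_519   # software-only end-to-end time from the results

# (end-to-end time, FPGA computation time) in µs, under the column reading assumed above.
RESULTS_US = {
    "0HL1": (866, 472),
    "0HL2": (863, 469),
    "0HM": (592, 201),
    "VHDL macro": (592, 201),
}

def derived(name):
    """Overhead, speedup vs. software, and slow-down vs. the pure VHDL macro."""
    end_to_end, fpga = RESULTS_US[name]
    baseline = RESULTS_US["VHDL macro"][0]
    return {
        "overhead_us": end_to_end - fpga,
        "speedup": SOFTWARE_US / end_to_end,
        "slowdown": end_to_end / baseline,
    }

for name in RESULTS_US:
    print(name, derived(name))
```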
Results (Area)
System Level   % of CLB slices   CLB increase    % of LUTs         LUT increase    % of FFs          FF count increase
Architecture   (out of 33,792)   vs. pure VHDL   (out of 67,584)   vs. pure VHDL   (out of 67,584)   vs. pure VHDL
Software       N/A
0HL1           99                1.68            57                1.3             68                2.61
0HL2           92                1.56            52                1.18            62                2.38
0HM            75                1.27            48                1.09            39                1.5
00H            59                1               44                1               26                1
[Figure: bar chart of the percentage of CLB slices, LUTs, and FFs used (axis 0 to 100 %) for the different architectures 0HL1, 0HL2, 0HM, 00H]
Number of lines of code
Algorithm Partitioning Scheme   VHDL   Macro Wrapper   MAP C   Main C
0HL1                            1007   260             371     153
0HL2                            1291   230             349     153
0HM                             1744   160             185     153
VHDL macro                      1960   36              78      153
Conclusions
Assuming focus on:
• Timing
• Resources
• Ease of programming
Conclusions – cont.
The best implementation approach: 0HL1 partitioning scheme
[Figure: 0HL1 partitioning with layers C function for µP (0), C function for MAP (H), VHDL macros (L1); blocks shown: kP, P+Q, 2P, P2A, INV, MUL, MUL2, MUL4, ROT, XOR, V_ROT]
893x speed-up vs. software and only a 1.46x slow-down (46% slower) versus pure VHDL, with ease of implementation
Conclusions
• Elliptic Curve Cryptosystem implementation is challenging for reconfigurable computers because of
– optimization for latency rather than throughput
– limited amount of parallelism
• First publication showing a 1000x speed-up for a reconfigurable computer application optimized for data latency
General hierarchy of library files suggested
by SRC Computers Inc.
Structure of the macro repository
< top of repository >
< macros >
<lib # 1 > <lib # 2 > <lib # 3 >
common rev_d rev_e rev_f
macro1 macro2 macro3
DataSheet hdlfile DebugCodeFile InfoFile BlkBoxFile
Files describing the macro
Platform independent
– HDL file: macro.v or macro.vh
• Verilog or VHDL code defining the macro
– Debug Code File: macro.c
• provides the equivalent C functionality for the macro
– Data sheet file: datasheet
• contains the documentation for the macro
Platform dependent
– Blk Box File: blackbox.v
• Interface (black box) definition for the macro in Verilog
– Info File: info
• Info file entry for this macro
Software libraries and their role in the development
of SRC libraries
Roles of software libraries
1. source of test vectors for VHDL macros
2. emulation of hardware in the debug mode
3. performance comparison
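The first role, a software library as a source of test vectors, might look like the sketch below. It assumes a hypothetical low-level macro that rotates a word left by one bit, uses a software reference model of that operation, and writes each vector as an "input expected_output" pair in hexadecimal. The file format, the function names, and the choice of macro are all illustrative and not taken from the SRC tools.

```python
import random

def rot_left(x, width, shift=1):
    """Software reference model of a rotate-left macro on a width-bit word."""
    shift %= width
    mask = (1 << width) - 1
    return ((x << shift) | (x >> (width - shift))) & mask

def make_test_vectors(count, width, seed=0):
    """Random inputs paired with the outputs expected from the hardware macro."""
    rng = random.Random(seed)          # fixed seed keeps the vectors reproducible
    digits = (width + 3) // 4          # hex digits needed for one word
    lines = []
    for _ in range(count):
        x = rng.getrandbits(width)
        lines.append(f"{x:0{digits}x} {rot_left(x, width):0{digits}x}")
    return lines

for line in make_test_vectors(4, 16):
    print(line)
```

The same reference function could also stand in for the hardware in debug mode and serve as the software baseline in a performance comparison, which are the other two roles listed.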
How to approach porting your application to reconfigurable computers?
1. Identify class of applications
2. Identify basic operations required by your applications
3. Determine the existence of the RC library of such operations
4. Determine the existence of the microprocessor library of such operations
5. Determine the right granularity for the required library operations