MRABET Amine

Montgomery Algorithm for Modular Multiplication

with Systolic Architecture

LIASD Paris 8

ENIT-TUNIS EL MANAR University

SAS - CMP - Gardanne

SPACE 2016

Plan

1. Introduction to pairings

2. Montgomery Multiplication (CIOS)

3. Architecture

4. Results

5. Conclusion and Perspectives

General Context

This work is part of the hardware implementation of asymmetric cryptography primitives, such as the Optimal Ate pairing on elliptic curves, elliptic-curve cryptosystems, and RSA, which are among the best-known methods in asymmetric encryption.

Definition

Let G1 and G2 be two additive groups and let G3 be a multiplicative group.

A pairing is a map e : G1 × G2 → G3 with the following properties:

Non-degeneracy: for every P ∈ G1 with P ≠ 0 there exists Q ∈ G2 such that e(P, Q) ≠ 1, and for every Q ∈ G2 with Q ≠ 0 there exists P ∈ G1 such that e(P, Q) ≠ 1.

Bilinearity: e(xP, yQ) = e(P, Q)^(xy), and therefore e(xP, yQ)^z = e(yP, zQ)^x = e(zP, xQ)^y = e(P, Q)^(xyz).
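The two properties can be checked numerically on a toy bilinear map. The sketch below is only an illustration under stated assumptions: it takes G1 = G2 = (Z_r, +) and G3 = the order-r subgroup of Z_q^*, with the hypothetical small parameters r = 11, q = 23, g = 2. This map is not an elliptic-curve pairing and offers no security.

```python
# Toy bilinear map, for illustration only (NOT a cryptographic pairing).
# Assumed setting: G1 = G2 = (Z_r, +), G3 = order-r subgroup of Z_q^*.
r = 11   # order of the groups
q = 23   # prime with r dividing q - 1
g = 2    # element of order r in Z_q^* (2^11 = 2048 = 1 mod 23)


def e(P: int, Q: int) -> int:
    """Toy pairing e(P, Q) = g^(P*Q) in G3."""
    return pow(g, (P * Q) % r, q)


P, Q = 3, 7
x, y, z = 4, 5, 6

# Bilinearity: e(xP, yQ) = e(P, Q)^(xy)
assert e(x * P % r, y * Q % r) == pow(e(P, Q), x * y, q)

# Consequence used on the slide: all three expressions equal e(P, Q)^(xyz)
target = pow(e(P, Q), x * y * z, q)
assert pow(e(x * P % r, y * Q % r), z, q) == target
assert pow(e(y * P % r, z * Q % r), x, q) == target
assert pow(e(z * P % r, x * Q % r), y, q) == target

# Non-degeneracy: every non-zero P has some Q with e(P, Q) != 1
assert all(any(e(P_, Q_) != 1 for Q_ in range(1, r)) for P_ in range(1, r))
```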

Pairing protocols

The bilinearity of pairings has enabled the construction of new protocols:

Tripartite Diffie–Hellman key exchange (Joux 2001)

Identity-Based Cryptography (Boneh and Franklin)

Short signature schemes (Boneh, Lynn, Shacham)

Pairing protocols: Example of Identity-Based Cryptography

[Figure: a trusted authority, holding the secret s, issues private keys to Alice (identity IA) and Bob (identity IB).]

s: the secret of the trusted authority.

The public keys are the identities of the users.

The private keys are constructed by the trusted authority and transmitted to the users: PA = s·IA and PB = s·IB.

By bilinearity, both users can compute the same value: e(PA, IB) = e(IA, IB)^s and e(PB, IA) = e(IA, IB)^s.

Pairing protocols: Example of Identity-Based Cryptography

Encryption step of the clear message M

Alice wants to send a message to Bob:

She chooses an integer a at random,

She retrieves Bob's public key IB,

She calculates the pairing value e(IB, Q0)^a,

She sends to Bob: [U, V] = [aP, M ⊕ H2(e(IB, Q0)^a)]

Pairing protocols: Example of Identity-Based Cryptography

Decryption step of the encrypted message

Bob follows these steps:

He contacts the trusted authority to retrieve his private key PB = s·IB,

He recovers the message M by calculating V ⊕ H2(e(PB, U)).

This works because of the bilinearity of pairings:

e(PB, U) = e(s·IB, aP) = e(IB, P)^(as) = e(IB, sP)^a
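The encryption and decryption steps can be traced end to end on a toy example. This is a minimal sketch, not the scheme as deployed: it reuses an insecure toy symmetric map e(X, Y) = g^(XY) on (Z_r, +), it assumes the authority's public key is Q0 = sP (consistent with the last equality above), and the hash functions `H1` and `H2`, the identity string and all numeric values are hypothetical choices.

```python
import hashlib

# Toy symmetric "pairing" on G1 = G2 = (Z_r, +); insecure, illustration only.
r, q, g = 11, 23, 2          # g has order r in Z_q^*


def e(X: int, Y: int) -> int:
    return pow(g, (X * Y) % r, q)


def H1(identity: str) -> int:
    """Hash an identity to a non-zero element of G1 (hypothetical choice)."""
    digest = hashlib.sha256(identity.encode()).digest()
    return 1 + int.from_bytes(digest, "big") % (r - 1)


def H2(value: int, length: int) -> bytes:
    """Hash a G3 element to a mask of `length` bytes (length <= 32)."""
    return hashlib.sha256(str(value).encode()).digest()[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# Trusted authority
P = 1                 # public generator of G1
s = 7                 # master secret
Q0 = s * P % r        # authority public key (assumed Q0 = sP)

# Alice encrypts for Bob
IB = H1("bob@example.com")   # Bob's public key is his identity
a = 5                        # would be chosen at random in practice
M = b"hello"
U = a * P % r
V = xor(M, H2(pow(e(IB, Q0), a, q), len(M)))

# Bob decrypts with the private key issued by the authority
PB = s * IB % r
recovered = xor(V, H2(e(PB, U), len(M)))
assert recovered == M
```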

Different pairings

Weil pairing

e_W : E(F_p)[r] × E(F_{p^k})/rE(F_{p^k}) → F*_{p^k}

(P, Q) → (-1)^r f_{r,P}(Q) / f_{r,Q}(P)

Its computation requires a Miller Lite evaluation f_{r,P}(Q), a Miller Full evaluation f_{r,Q}(P), an inversion and a multiplication.

Tate pairing

e_T : E(F_p)[r] × E(F_{p^k})/rE(F_{p^k}) → F*_{p^k}

(P, Q) → [f_{r,P}(Q)]^((p^k - 1)/r)

The Tate pairing is defined with the same parameters E, F_p, r, k as the Weil pairing.

To calculate the Tate pairing, the Miller algorithm performs log2(r) iterations, where r is the order of the subgroups used.

Different pairings

Ate pairing

G1 = E[r] ∩ Ker(π_p - [1]) = E(F_p)[r], G2 = E[r] ∩ Ker(π_p - [p]), where π_p is the Frobenius endomorphism.

e_A : G1 × G2 → F*_{p^k}

(P, Q) → [f_{T,Q}(P)]^((p^k - 1)/r)

The main advantage compared to the Tate pairing is the reduced number of iterations of the Miller algorithm: log2(T), where T = t − 1 and t is the trace of the Frobenius on E(F_p).

The disadvantage of the Ate pairing is that it requires a Miller Full evaluation.

Different pairings

Twisted Ate pairing

G1 = E[r] ∩ Ker(π_p - [1]) = E(F_p)[r], G2 = E[r] ∩ Ker(π_p - [p])

e_TA : G1 × G2 → F*_{p^k}

(P, Q) → [f_{T,P}(Q)]^((p^k - 1)/r)

The calculation is made by a Miller Lite evaluation, which reduces the complexity of the calculations.

Optimal Ate (OATE) pairing

The Optimal Ate pairing improves on the Ate pairing by reducing the number of iterations in the Miller algorithm used to calculate the Miller function evaluated at (Q, P).

In the case of BN curves, the OATE pairing is defined by:

[Formula not preserved in the transcript.]

where the Miller loop parameter equals 6t + 2 (t being the parameter of BN curves).
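To give an order of magnitude for these iteration counts, the sketch below compares the bit lengths of the three loop parameters on the BN family. It relies on the standard BN parameterisation (group order r(u) = 36u^4 + 36u^3 + 18u^2 + 6u + 1, trace 6u^2 + 1), which is not stated on the slides, and writes the BN parameter as `u` to avoid a clash with the Frobenius trace t. The value u = 2^62 is a hypothetical parameter chosen for size only; no primality is checked.

```python
def bn_order(u: int) -> int:
    """Order r of the pairing groups on a BN curve with parameter u."""
    return 36 * u**4 + 36 * u**3 + 18 * u**2 + 6 * u + 1


def bn_trace(u: int) -> int:
    """Trace of the Frobenius on a BN curve with parameter u."""
    return 6 * u**2 + 1


u = 2**62  # hypothetical parameter, chosen only to get a ~254-bit order

tate_bits = bn_order(u).bit_length()          # Tate: loop over r
ate_bits = (bn_trace(u) - 1).bit_length()     # Ate: loop over T = t - 1
optimal_ate_bits = (6 * u + 2).bit_length()   # Optimal Ate: loop over 6u + 2

print(tate_bits, ate_bits, optimal_ate_bits)  # 254 127 65
assert optimal_ate_bits < ate_bits < tate_bits
```

Under these assumptions, the Miller loop shrinks from about 254 iterations to about 127 and then to about 65.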

Basic operations

The basic operations in the finite field:

Addition

Subtraction

Multiplication

Inversion

account for most of the computation time of a pairing. That is why optimizing these operations is the most important task.


Reminder: Montgomery algorithm

Conversion between the ordinary domain and the Montgomery domain:

Ordinary domain    Montgomery domain
a                  M(a) = a·R mod p
b                  M(b) = b·R mod p
a·b                M(a·b) = a·b·R mod p
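The correspondence in this table can be verified on small numbers. The following sketch assumes a hypothetical modulus p = 97 and R = 2^8, and uses a reference Montgomery product x·y·R^-1 mod p computed with a plain modular inverse (Python 3.8+), not a hardware-style reduction.

```python
p = 97                   # small odd modulus, illustration only
R = 1 << 8               # R = 2^8 > p and gcd(R, p) = 1
R_inv = pow(R, -1, p)    # modular inverse of R (Python 3.8+)


def to_mont(a: int) -> int:
    """Ordinary domain -> Montgomery domain: M(a) = a*R mod p."""
    return a * R % p


def from_mont(x: int) -> int:
    """Montgomery domain -> ordinary domain."""
    return x * R_inv % p


def mont_mul(x: int, y: int) -> int:
    """Reference Montgomery product x*y*R^-1 mod p."""
    return x * y * R_inv % p


a, b = 42, 77
Ma, Mb = to_mont(a), to_mont(b)

# The Montgomery product of M(a) and M(b) is M(a*b)
assert mont_mul(Ma, Mb) == to_mont(a * b % p)
# Converting back gives the ordinary product
assert from_mont(mont_mul(Ma, Mb)) == a * b % p
```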

The Coarsely Integrated Operand Scanning (CIOS) method [1]

The CIOS method improves the Montgomery algorithm by integrating multiplication and reduction.

How?

Instead of computing the full product a × b and then performing the reduction, it alternates between iterations of multiplication and iterations of reduction.

[1] Çetin Kaya Koç, Tolga Acar and Burton S. Kaliski Jr., "Analyzing and Comparing Montgomery Multiplication Algorithms", IEEE Micro, June 1996.

What is a systolic architecture?

It is a network composed of a large number of cells. Each cell receives data from its neighboring cells, performs a simple calculation, and then transmits the results, again to neighboring cells.

A systolic architecture uses very simple elementary cells, which reduces resource requirements in hardware implementations.

Our contribution in this work is to combine a systolic architecture, which is regarded as the best solution for FPGA implementations, with the CIOS method of Montgomery modular multiplication.

Coarsely Integrated Operand Scanning

[Figure: listing of the CIOS algorithm with numbered lines; not preserved in the transcript.]

Cutting the CIOS algorithm

The algorithm is divided into five elementary operations:

alpha: lines 5 and 6

alpha_2: lines 7, 8 and 9

beta: lines 11 and 12

gamma: lines 14 and 15

gamma_2: lines 16, 17 and 18
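Since the algorithm listing is missing from the transcript, the sketch below gives a word-serial software model of CIOS following the description in Koç, Acar and Kaliski [1]. The mapping of the comments alpha, alpha_2, beta, gamma and gamma_2 onto the steps is an assumption based on the line groups listed above, and the function name `cios` and its signature are hypothetical. It assumes p odd, p < 2^(w·s) and a, b < p.

```python
def cios(a: int, b: int, p: int, w: int, s: int) -> int:
    """Return a*b*R^-1 mod p with R = 2^(w*s), using s words of w bits."""
    mask = (1 << w) - 1

    def words(x: int) -> list:
        return [(x >> (w * k)) & mask for k in range(s)]

    A, B, P = words(a), words(b), words(p)
    p_prime = (-pow(p, -1, 1 << w)) & mask   # p' = -p^-1 mod 2^w
    t = [0] * (s + 2)

    for i in range(s):
        C = 0
        for j in range(s):                   # alpha: multiply-accumulate
            cs = t[j] + A[i] * B[j] + C
            t[j], C = cs & mask, cs >> w
        cs = t[s] + C                        # alpha_2: close the row
        t[s], t[s + 1] = cs & mask, cs >> w

        m = (t[0] * p_prime) & mask          # beta: reduction factor m
        C = (t[0] + m * P[0]) >> w           # low word becomes zero
        for j in range(1, s):                # gamma: add m*p, shift one word
            cs = t[j] + m * P[j] + C
            t[j - 1], C = cs & mask, cs >> w
        cs = t[s] + C                        # gamma_2: close the row
        t[s - 1] = cs & mask
        t[s] = t[s + 1] + (cs >> w)

    result = sum(word << (w * k) for k, word in enumerate(t[: s + 1]))
    return result - p if result >= p else result   # final subtraction
```

For example, with w = 32 and s = 8 (the K = 256 configuration of the later slides), `cios(a, b, p, 32, 8)` can be compared against `a * b * pow(1 << 256, -1, p) % p` for any odd 256-bit modulus p.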


CIOS in Systolic for s=8

[Figure: systolic array for s = 8. Each row i = 0, ..., 7 consists of a multiplication step, which processes the products a_i b_j for j = 0, ..., 7, followed by a reduction step. The eight positions j are shared among three cells labelled 1, 2 and 3, with additional alpha_2 and gamma_2 cells closing each step. The last row outputs a × b × R^-1 mod p.]

In this architecture, the iterations of the outer loop on i are also overlapped.

In our case, 3 iterations of i can be executed at the same time.

CIOS in Systolic for s=8

[Figure: compact view of the systolic array for s = 8. Rows i = 0, ..., 7 each contain a multiplication step and a reduction step. Cells exchange sum (S) and carry (C) words; multiplication cells take a_i and b_j as inputs, reduction cells take m and p_j. The array outputs a × b × R^-1 mod p.]

Data Flow

[Figure: animated data flow through the systolic array for s = 8. The words a_0, ..., a_7 of A enter one per row (i = 0, 1, 2, ...). The words b_0, ..., b_7 of B are grouped into blocks B1, B2, B3 and the words p_0, ..., p_7 of P into blocks (P2 and P3 are labelled); within each block the words rotate from one step to the next so that every cell receives its operand in turn. Sum (S) and carry (C) words propagate between neighboring cells.]

CIOS in Systolic for s=8

[Figure: rows i = 0, ..., 7 of the systolic array, each with a multiplication step and a reduction step, annotated with the FSM states S0, S1, S2 repeating along the schedule; output a × b × R^-1 mod p.]

During execution of this algorithm, three iterations of the loop on i are in progress at the same time, so at most three alpha cells and three gamma cells run in parallel.

Based on the blocks that repeat, we modeled our FSM with 3 states (S0, S1, S2), which allows us to perform the whole multiplication in just 33 cycles:

(8 + 3) × 3 = 33

CIOS in Systolic for s=16

[Figure: systolic array for s = 16. Each row i = 0, ..., 15 processes the products a_i b_j for j = 0, ..., 15; the positions j are shared among six cells labelled 1 to 6, with additional alpha_2 and gamma_2 cells closing each step. The last row outputs a × b × R^-1 mod p.]

CIOS in Systolic for s=16

[Figure: partition of the operands among the six cells. B = (b0, ..., b15) is split into blocks B1 = (b0, b1, b2), B2 = (b3, b4, b5), B3 = (b6, b7, b8), B4 = (b9, b10, b11), B5 = (b12, b13, b14), B6 = (b15). P = (p0, ..., p15) is split into blocks P1 to P6, with p0 and p1 at the head, then P2 = (p2, p3, p4), P3 = (p5, p6, p7), P4 = (p8, p9, p10), P5 = (p11, p12, p13), P6 = (p14, p15).]

CIOS in Systolic for s=8

[Figure: state machine for s = 8, with states alpha(1), alpha(2), alpha(3), alpha_2, beta, gamma(1), gamma(2), gamma(3), gamma_2 and the increment i++.]

K=256, w=32, s=8

K=512, w=64, s=8

33 clock cycles

CIOS in Systolic for s=16

[Figure: state machine for s = 16, with states alpha(1) to alpha(6), alpha_2, beta, gamma(1) to gamma(6), gamma_2 and the increment i++.]

K=256, w=16, s=16

K=512, w=32, s=16

66 clock cycles

Comparison

s      Cells      Clock cycles
8      6 + 3      33
16     12 + 3     66
32     24 + 3     132
64     48 + 3     264
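Both columns of this comparison grow linearly with s. The closed forms in the sketch below are fitted to the four rows reported on the slide; they are an observation about the table, not formulas stated by the author, and the function names are hypothetical.

```python
# Rows reported on the slide: s -> (number of cells, clock cycles)
reported = {
    8: (6 + 3, 33),
    16: (12 + 3, 66),
    32: (24 + 3, 132),
    64: (48 + 3, 264),
}


def cells(s: int) -> int:
    """Fitted cell count: 3s/4 repeated cells plus 3 single cells."""
    return 3 * s // 4 + 3


def cycles(s: int) -> int:
    """Fitted cycle count: 33 cycles per group of 8 words."""
    return 33 * s // 8


for s, (n_cells, n_cycles) in reported.items():
    assert cells(s) == n_cells
    assert cycles(s) == n_cycles
```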

The interest of each architecture

Word size w (in bits) for each operand size K and number of words s, with the resulting number of cycles:

                    s=8     s=16    s=32
K=256               32      16      8
K=512               64      32      16
K=1024              128     64      32
Number of cycles    33      66      132

The interest of each architecture depends on our needs:

Security level

Resources

Speed

The method used

Architectures: Digital signal processing (DSP)

Modern FPGAs are equipped with hardware extensions for arithmetic calculation.

These DSP blocks perform basic arithmetic calculations: multiplication, addition and subtraction of unsigned integers.

Internal architectures - cells

The arithmetic operations of each cell are designed to make maximum use of the DSPs.

[Figure: alpha cell. Inputs a[i], b[j], C_In and S_In; one multiplier (x) and two adders (+); the result is split into its w least significant bits and its w most significant bits, each held in a register (REG), giving the outputs S_Out and C_Out.]
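As a behavioural illustration of the alpha cell, here is a minimal Python sketch. It assumes the cell computes the usual CIOS inner step (C, S) = a[i]*b[j] + S_In + C_In, with S the low w bits and C the high w bits; that mapping is my reading of the figure labels, and the function name and the choice w = 32 are only examples.

```python
W = 32  # word size in bits ("w" on the slides); 32 is just an example value


def alpha_cell(a_i: int, b_j: int, s_in: int, c_in: int, w: int = W) -> tuple[int, int]:
    """Return (C_Out, S_Out) for one multiply-accumulate step on w-bit words."""
    mask = (1 << w) - 1
    total = a_i * b_j + s_in + c_in   # one multiplication, two additions
    return total >> w, total & mask   # high word (carry), low word (sum)


# Worst case: every input is the largest w-bit value.
# (2^w - 1)^2 + 2*(2^w - 1) = 2^(2w) - 1, so the result still fits in 2w bits.
MAX_WORD = (1 << W) - 1
c_out, s_out = alpha_cell(MAX_WORD, MAX_WORD, MAX_WORD, MAX_WORD)
```

The worst-case check shows why two w-bit registers are enough to hold the output of the cell.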

Internal architectures - cells

[Figure: beta cell. Inputs p', P[0] and S_In; two multipliers (x) and one adder (+); registers (REG) hold the outputs m and C_Out. The alpha cell of the previous slide is shown again next to it.]

Internal architectures - cells

[Figure: gamma cell. Inputs m, p[j], C_In and S_In; one multiplier (x) and two adders (+); the result is split into its w least significant bits and its w most significant bits, each held in a register (REG), giving the outputs S_Out and C_Out.]

Internal architectures - cells

[Figure: gamma_2 cell. Inputs S1_2_In, S2_2_In and C_2; two adders (+); the w least significant bits and w most significant bits are held in registers (REG), giving the outputs S1_2_Out and S2_2_Out. The gamma cell of the previous slide is shown again next to it.]

Internal architectures - cells

[Figure: alpha_2 cell. Inputs C_2 and S_2_In; one adder (+); the w least significant bits and w most significant bits are held in registers (REG), giving the outputs S1_2_Out and S2_2_Out. The gamma_2 and gamma cells of the previous slides are shown again next to it.]
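To show how the five cell types fit together, the following Python sketch is a software reference model of CIOS Montgomery multiplication, with comments marking which step each cell would plausibly perform. The assignment of steps to alpha, alpha_2, beta, gamma and gamma_2 is my interpretation of the cell figures, not something the slides spell out; the function names are hypothetical, and the model is sequential, so it says nothing about the systolic timing.

```python
def to_words(x: int, s: int, w: int) -> list[int]:
    """Split x into s words of w bits, least significant word first."""
    mask = (1 << w) - 1
    return [(x >> (w * k)) & mask for k in range(s)]


def cios_montgomery(a: int, b: int, p: int, s: int, w: int) -> int:
    """Return a*b*R^-1 mod p with R = 2^(s*w), for odd p and a, b < p < R."""
    mask = (1 << w) - 1
    A, B, P = to_words(a, s, w), to_words(b, s, w), to_words(p, s, w)
    p_prime = (-pow(p, -1, 1 << w)) & mask     # p' = -p^-1 mod 2^w (Python 3.8+)
    t = [0] * (s + 2)
    for i in range(s):
        c = 0
        for j in range(s):                     # alpha: t[j] + a[j]*b[i] + C
            total = t[j] + A[j] * B[i] + c
            t[j], c = total & mask, total >> w
        total = t[s] + c                       # alpha_2: absorb the last carry
        t[s], t[s + 1] = total & mask, total >> w
        m = (t[0] * p_prime) & mask            # beta: m = t[0]*p' mod 2^w
        c = (t[0] + m * P[0]) >> w             # beta: low word becomes zero
        for j in range(1, s):                  # gamma: t[j] + m*p[j] + C
            total = t[j] + m * P[j] + c
            t[j - 1], c = total & mask, total >> w
        total = t[s] + c                       # gamma_2: two final additions
        t[s - 1] = total & mask
        t[s] = t[s + 1] + (total >> w)
    result = sum(word << (w * k) for k, word in enumerate(t[:s + 1]))
    return result - p if result >= p else result   # final conditional subtraction
```

With K = 256 this model can be run as `cios_montgomery(a, b, p, 8, 32)` or `cios_montgomery(a, b, p, 16, 16)`, matching the two word sizes on the slides, and compared against `a * b * pow(2**256, -1, p) % p`.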

Internal architectures - Rotation

[Figure: a ROTATION block with a multiplexer (Mux) holding operand A (K bits).]

Internal architectures - Rotation

[Figure: the ROTATION block for A (K bits), together with three ROTATION blocks for B of 3w, 3w and 2w bits, each with its own multiplexer (Mux).]

Internal architectures - Rotation

[Figure: ROTATION blocks, each with a multiplexer (Mux): one for A (K bits), three for B (3w, 3w and 2w bits) and two for P (3w bits each).]
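The slides only label these blocks ROTATION and Mux, so the sketch below is an assumption about their behaviour: a register that presents its low w-bit word to a cell and then rotates by one word, so that the operand is back in its initial position after a full turn. The function name `rotate_words` is hypothetical.

```python
from typing import Iterator


def rotate_words(reg: int, total_bits: int, w: int) -> Iterator[int]:
    """Yield the low w-bit word, then rotate the register right by w bits.

    Assumed behaviour, not taken from the slides: after total_bits // w
    steps the register holds its original value again.
    """
    mask = (1 << w) - 1
    for _ in range(total_bits // w):
        word = reg & mask
        yield word
        reg = (reg >> w) | (word << (total_bits - w))


# Example: a 3w-bit register (w = 8) delivers its three words in order.
WORDS = list(rotate_words(0x030201, total_bits=24, w=8))
```

A rotation, as opposed to a plain shift, keeps the operand available for the next outer iteration without reloading it.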

Architectures

[Figure A, alpha1: processing element (PE) alpha (1) with two multiplexers (MUX) controlled by sig_state. Signals shown: C_1_In, C_1_Out, zero, S_1_In, S_1_Out, S_2_Out.]

Architectures

[Figure B, alpha2: PE alpha (2) with two multiplexers (MUX) controlled by sig_state, shown next to PE alpha (1). Signals shown: C_2_In, C_2_Out, C_1_Out, S_2_In, S_2_Out, S_3_Out.]

Architectures

[Figure C, alpha3: PE alpha (3) with two multiplexers (MUX) controlled by sig_state, shown next to PE alpha (1) and PE alpha (2). Signals shown: C_3_In, C_3_Out, C_2_Out, S_3_In, S_3_Out, S1_2_Out.]

Architectures

[Figure D, gamma1: PE gamma (1) with inputs m and p[0]. Signals shown: C_1_In, C_1_Out, S_1_In, S_1_Out.]

Architectures

[Figure E, gamma2: PE gamma (2) with inputs m and p[j] and two multiplexers (MUX) controlled by sig_state, shown next to PE gamma (1). Signals shown: C_2_In, C_2_Out, C_1_Out, S_2_In, S_2_Out, S_1_Out.]

Architectures

[Figure F, gamma3: PE gamma (3) with inputs m and p[j] and two multiplexers (MUX) controlled by sig_state, shown next to PE gamma (1) and PE gamma (2). Signals shown: C_3_In, C_3_Out, C_2_Out, S_3_In, S_3_Out, S_2_Out.]

Architectures

[Figure G, alpha_2: PE alpha_2 with inputs C_2 and S_2_In and outputs S1_2_Out and S2_2_Out.]

[Figure H, gamma_2: PE gamma_2 with inputs S1_2_In, S2_2_In and C_2 and outputs S1_2_Out and S2_2_Out.]

[Figure I, beta: PE beta with inputs p', P[0] and S_In and outputs m and C_Out.]

Plan

1. Introduction

2. Montgomery Multiplication (CIOS)

3. Architecture

4. Results

5. Conclusion and Perspectives

Results

Nexys 4              DSP    Frequency (MHz)    Cycles
MMM (s=8/K=256)      31     105.275            33
Alpha                4      291.023            1
Gamma                4      291.023            1
Beta                 4      388.350            1
Alpha_2              1      459.918            1
Gamma_2              2      442.811            1

Results

Nexys 4            DSP    LUTs    Reg     Occupied slices    Frequency    Cycles
MMM s=8/K=256      31     809     870     352                105.275      33
MMM s=16/K=256     33     846     1123    402                145.892      66
MMM s=8/K=512      87     2650    1614    878                64.825       33
MMM s=16/K=512     57     1789    2164    798                105.594      66
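The results can be turned into a time per multiplication by dividing the cycle count by the clock frequency. The Python sketch below does this for the four MMM rows; it assumes the frequencies are in MHz (as in the per-cell table) and that each design runs at its reported frequency, so the values are derived estimates rather than measurements from the slides. The names `RESULTS` and `latency_us` are hypothetical.

```python
# (frequency in MHz, clock cycles) copied from the MMM rows of the results table
RESULTS = {
    "s=8/K=256": (105.275, 33),
    "s=16/K=256": (145.892, 66),
    "s=8/K=512": (64.825, 33),
    "s=16/K=512": (105.594, 66),
}


def latency_us(freq_mhz: float, cycles: int) -> float:
    """Time for one multiplication in microseconds: cycles / (cycles per us)."""
    return cycles / freq_mhz


LATENCY = {name: latency_us(f, c) for name, (f, c) in RESULTS.items()}

for name, t in LATENCY.items():
    print(f"{name}: {t:.3f} us per multiplication")
```

On these figures the s=16 designs clock faster but need twice as many cycles, so for a given K the s=8 design finishes one multiplication sooner, at the cost of wider words.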

Plan

1. Introduction

2. Montgomery Multiplication (CIOS)

3. Architecture

4. Results

5. Conclusion and Perspectives

Conclusion

We have implemented the Montgomery multiplication with a systolic architecture that runs in a fixed number of clock cycles.

We designed it to make maximum use of the DSPs on the FPGA board.

We implemented two architectures (s=8 and s=16).

We used these two designs to implement scalar multiplication at the 128-bit security level.

Perspectives

Perform a mixed software/hardware implementation (co-design) of the Optimal-Ate pairing on BN curves in Jacobian coordinates using this multiplication algorithm.

Finalize the hardware implementation of the designs for:

s = 32

s = 64
