Development of Side Channel Attacking Tools on Embedded Components

Limoges University

Faculty of Sciences

2012/2013

Master CRYPTIS

Development of Side Channel Attacking Tools on Embedded Components

Fadi OBEID

The internship took place between 04 March and 30 August 2013 at

Supervised by

Vincent Verneuil

Public Document

Acknowledgments

I want to thank my supervisor Vincent Verneuil, with whom work was interesting and fun, and from whom I learned a lot about the domain and about what it is like to work in a company.

I also want to thank Mylene Roussellet, who was always available when I had questions and who took Vincent’s place for the last 6 weeks of the internship.

In addition, I thank all my colleagues at Inside Secure, who were always cheerful and helpful, my colleagues at Limoges University, and my tutors at Limoges University, especially Professor Christophe Clavier.


Introduction

Years ago in France, an unauthorised party spied on an automotive company by stealing copies of its electricity bills. When the expenditure (and thus the power consumption) was high, the spying party concluded that the company was preparing a new car release, information that could be valuable to the company’s competitors and to stock market traders.

In an unrelated story, agents spied on the Pentagon by monitoring its pizza delivery orders. A high number of pizzas delivered at night meant that most of the staff were working through the night, which suggested a possible crisis. The Pentagon dealt with this by ordering the same number of pizzas every night and throwing them away when they were not needed.

The two stories above seem unrelated, but they share a single tactic, which is our subject: both are side channel attacks.

This type of attack exploits information leaked by a system. In cryptography, it is the information that can be gathered from physical leakage such as power consumption, execution time or sound.

One of the first papers on the threat this revolutionary attack poses to cryptography was the paper on timing attacks presented by P. Kocher in 1995 [Koc96].

More recently, highly secure algorithms such as the Advanced Encryption Standard have become targets of various attacks based on side-channel information.

According to [CWWZ10], these attacks also threaten web applications because of the rise of Web 2.0 and software-as-a-service. This threat forces developers to look beyond encrypted transmissions, because the attack is based on the size of the packets sent and not on the encryption.

More attacks can be placed under the same banner; in fact, any attack that uses leaked information to threaten information security can be seen as a side channel attack.

This document is the result of a 6-month internship during which research on, and implementations of, some side channel attacks on embedded components were carried out.

The objective was to try some of these attacks and to evaluate them by the number of samples they need, so that they can be compared on the basis of large statistics.

The document starts with a presentation of the company Inside Secure, followed by 3 parts. The first part gives some important definitions, reviews some of the attacks, and ends with the best-known and most widely used countermeasures. The second part details the AES algorithm and the tested attacks, together with some special options and improvements, and gives statistical results along the way. The third part explains the working process for carrying out a full attack and how an attack can be evaluated. We end with a short conclusion based on the work and the obtained statistics.


Inside Secure

The company

In 1995, five former executives of ST Microelectronics and Gemplus decided to start Inside Secure in Aix en Provence, France. The company was to focus exclusively on contactless technology, including smart cards. Inside Secure is now the only provider in the world dedicated exclusively to contactless technology.

In 2010, the company Inside Contactless acquired the Atmel SMS (Secure Microcontroller Solutions) division and became the Inside Secure group. In 2012, ESS was also acquired to reinforce the company’s position in product security. The company currently employs more than 400 people. Its headquarters are in France, and it has other offices in Europe, Asia and North America.

Inside Secure is controlled by a board composed of various investors and is directed by Remy Tonnac. The company is at the heart of the smart card market.

Inside focuses its activities on the design of high-performance products for secure, reliable and fast bank payment transactions, access control, identification, etc. The hardware design work includes printed circuit boards and development boards.

Inside designs are found in all kinds of products compatible with contactless technology, such as smart cards, mobile phones, computer devices and smart card readers.

Numbers and references

• More than 200 million contactless chips sold

• About 660 patents since its inception in 1995

• 14 different sites in the world

• 463 employees

Inside was the first company in the world to:

• Focus exclusively on contactless technologies

• Provide a comprehensive and secure solution for contactless products


Development Site

Products are developed and tested at the Aix en Provence site, in the Bouches du Rhone. In August 2013, the staff of this site joined their colleagues from Rousset, France, in a new site located in Meyreuil, France.

The site of Aix en Provence has the following sections:

• Management

• R & D (hardware and software)

• Security

• Operations

• Human resources

• Quality

• Customer technical support

• Business lines

The Security Service

Product security testers and product security developers work together to ensure that the products are certified and secure. Their most important goal is to achieve secure products and to keep them up to date with current security needs and challenges. My internship took place in this service.


Table of Contents

Acknowledgments
Introduction
Inside Secure
Table of Contents
Acronyms
Algorithms
Attacks
Other Acronyms
List of Algorithms
List of Figures

I General View

1 Definitions and Technical Information
1.1 Side-Channel Information
1.2 Secret Internal State
1.3 Side-Channel Attacks
1.4 Hamming Weight
1.5 Hamming Distance

2 Classic Attacks and Preventions
2.1 Timing Attacks
2.2 Power Analysis Attacks
2.2.1 Simple Power Analysis
2.2.2 Differential Power Analysis
2.2.3 High-Order Differential Power Analysis
2.2.4 Correlation Power Analysis
2.2.5 Templates Attacks
2.3 Fault Attacks
2.3.1 Differential Fault Analysis
2.3.2 Non-Differential Fault Analysis
2.3.3 Register Fault Attacks

3 Classic Countermeasures
3.1 Novice
3.1.1 Adding Delays
3.1.2 Power Consumption Balancing
3.1.3 Shielding
3.2 Advanced
3.2.1 Licensing Modified Algorithms
3.2.2 Data Independent
3.2.3 Operation Independent
3.2.4 Blinding
3.2.5 Running the Encryption Twice
3.2.6 Modifying the Algorithms Design

II Attacks on AES

4 Advanced Encryption Standard
4.1 General Information
4.1.1 History
4.1.2 Notations
4.1.3 Possible Encryptions
4.1.4 Input Representation
4.2 Encryption
4.2.1 SubBytes
4.2.2 ShiftRows
4.2.3 MixColumns
4.2.4 AddRoundKey
4.3 Decryption
4.3.1 InvSubBytes
4.3.2 InvShiftRows
4.3.3 InvMixColumns
4.4 AES key schedule
4.4.1 Rotation
4.4.2 Rcon
4.4.3 Substitution
4.4.4 The Algorithm
4.5 Security
4.5.1 Why
4.5.2 First Substitution
4.5.3 Last Substitution

5 Differential Power Analysis
5.1 General Idea
5.2 Evaluation
5.3 Notations
5.4 Different Ways With Different Results
5.4.1 Normal
5.4.2 Normal With Middle
5.4.3 Normal Multiplier
5.4.4 PowerE Multiplier
5.4.5 PowerL Multiplier
5.4.6 HDRefinery Multiplier
5.5 Totally Different DPA
5.6 Timing

6 Correlation Power Analysis
6.1 General Idea
6.2 From DPA to CPA
6.3 Different Correlation Algorithms
6.3.1 Using Pearson
6.3.2 Using Spearman
6.4 Methods Comparison
6.4.1 Correct Parts Number
6.4.2 Timing

7 Template Attacks
7.1 General Idea
7.2 Profiling
7.3 Attacking
7.3.1 Probability of Correspondence
7.3.2 Key-Part Probability
7.3.2.1 Single Trace
7.3.2.2 Multiple Traces
7.3.3 Key Finding
7.4 Results

8 Improvements
8.1 Correct Traces Shifting
8.2 Traces Filtering
8.3 Filter Highest Points
8.4 Filter Points of Interest

III How to Process

9 Tracing
9.1 SASEBO-GII
9.1.1 About SASEBO-GII
9.1.2 Needed Materials
9.1.3 Software
9.1.3.1 SASEBO-GII AES Checker Version 0
9.1.3.4 SASEBO-GII AES Checker Version 3
9.1.4 SASEBO-GII Configuration
9.1.4.1 Hardware
9.1.4.2 Software
9.2 Obtaining Traces
9.2.1 Oscilloscope Configuration
9.2.1.1 Hardware
9.2.1.2 Software
9.2.2 Electricity Issues
9.2.3 Bad Tracing
9.2.4 Perfect Tracing
9.3 Graph Viewer

10 Attacking
10.1 Transforming Obtained Traces
10.2 Choosing an Attacking Algorithm
10.3 Exploiting The Result
10.4 KATA Software

11 Evaluations
11.1 Basic Evaluations
11.1.1 Number of Correct
11.1.2 Partial Positioning
11.1.3 Not Random
11.2 Creating Probabilities
11.3 Advanced Evaluations
11.3.1 Confidence
11.4 Real Evaluations
11.4.1 Land Owner
11.4.2 Estimating Brute Force
11.4.3 Computing Brute Force

Conclusion
Bibliography


Acronyms

Algorithms

AES Advanced Encryption Standard [FIP01]

CRT Chinese Remainder Theorem [DPS96]

DES Data Encryption Standard [FIP99]

DH Diffie-Hellman [DH76]

DSS Digital Signature Standard [oST00]

FSIS Fiat-Shamir Identification Scheme [Kno88]

RSA Algorithm of Rivest-Shamir-Adleman [RSA78]

SHA Secure Hash Algorithms [Sta94]

TDES Triple-DES [FIP99]

Attacks

CPA Correlation Power Analysis 2.2.4

DFA Differential Fault Analysis 2.3.1

DPA Differential Power Analysis 2.2.2

FA Fault Attacks 2.3

HO-DPA High-Order-DPA 2.2.3

NDFA Non-Differential Fault Analysis 2.3.2

PAA Power Analysis Attacks 2.2

RFA Register Fault Attacks 2.3.3

SCA Side-Channel Attacks 1.3

SPA Simple Power Analysis 2.2.1

TA Timing Attacks 2.1

TMPA Template Attacks 2.2.5


Other Acronyms

HD Hamming Distance

HW Hamming Weight

PPMCC Pearson Product-Moment Correlation Coefficient

SCI Side-Channel Information

SIS Secret Internal State(s)

SR Spearman Rank


List of Algorithms

4.4.1 AES Key Expansion
5.4.1 DPA on AES (Normal Method)
5.4.2 DPA on AES (Multiplication Method)
6.3.1 CPA using Pearson’s
6.3.2 CPA using Spearman’s
7.2.1 Templates Profiling
7.3.1 Templates Probability of Correspondence
7.3.2 Templates Probability of Trace ’t’
7.3.3 Templates Probability of Group of Traces ’T’


List of Figures

4.2.1 AES Encryption
4.2.2 AES Substitution Function
4.2.3 AES S-box
4.2.4 AES Shifting Function
4.2.5 AES Column Mixing Function
4.2.6 AES Add Round Key
4.3.1 AES Decryption
4.3.2 AES Inverse Substitution Box
4.4.1 AES Rcon Table

5.4.1 DPA Efficiency (correct key-bytes), original function
5.4.2 DPA Efficiency (correct key-bytes), comparing ’Middle’
5.4.3 DPA Efficiency (correct key-bytes), comparing ’Multiply’
5.4.4 DPA Efficiency (correct key-bytes), comparing ’PowerE’
5.4.5 DPA Efficiency (correct key-bytes), comparing ’PowerL’
5.6.1 DPA Timing of one key-byte Attack

6.4.1 CPA Efficiency (correct key-bytes)
6.4.2 CPA Timing of one key-byte Attack

8.1.1 DPA Normal Shifting Comparison
8.1.2 DPA PowerL Multiplier Shifting Comparison
8.1.3 CPA Shifting Comparison
8.2.1 DPA Normal Traces Filter Comparison
8.2.2 DPA PowerL Multiplier Traces Filter Comparison
8.2.3 CPA Traces Filter Comparison
8.4.1 DPA Normal Points Filter Comparison
8.4.2 DPA PowerL Multiplier Points Filter Comparison
8.4.3 CPA Points Filter Comparison
8.4.4 Points Filter (PF) Timing Comparison

9.1.1 SASEBO-GII Computer Connection
9.1.2 SASEBO-GII AES Checker V0
9.1.3 SASEBO-GII AES Checker V1
9.1.4 SASEBO-GII AES Checker V2
9.1.5 SASEBO-GII AES Checker V3
9.1.6 SASEBO-GII Configuration Plan
9.1.7 SASEBO-GII Configuration Steps
9.2.1 SASEBO-GII and Oscilloscope connection plan
9.2.2 Bad Trace
9.2.3 Good Trace
9.2.4 Perfect Trace
9.3.1 Full AES-BAD-Trace using Graph Viewer

11.1.1 Rating using Number of Correct on different attacking methods
11.1.2 Rating using Partial Positioning on different attacking methods
11.1.3 Rating using Not Random on different attacking methods
11.3.1 Rating using Confidence on different attacking methods
11.4.1 Rating using land owner evaluation on different attacking methods

Part I

General View


Chapter 1

Definitions and Technical Information

We present some important definitions and technical information.

Note that in this chapter, as well as in Parts II and III, each time a new algorithm is introduced, a reference is given to a publication that explains it.

1.1 Side-Channel Information

Side-Channel Information (SCI) is information that can be retrieved from a leaking device because of imperfections in its implementation. The retrieved information, such as power consumption, execution time or temperature, is directly related to the executed algorithm, to the processed data, or to both. This means that monitoring SCI reveals information that is supposed to be secret.

1.2 Secret Internal State

The Secret Internal State (SIS) of a cryptographic algorithm is any state between the beginning and the end of the algorithm’s execution. It is called secret because it is supposed to remain unknown to outsiders.

Example 1.2.1. SIS in DES
In the Data Encryption Standard (DES [FIP99]), the state after each of the rounds 1 to 15 is a SIS, but the state after the 16th round is not, since the ciphertext is obtained from it by a fixed, public final permutation.

1.3 Side-Channel Attacks

Side-Channel Attacks (SCA) are attacks based on SCI.

These attacks can use any available information, such as operation timing, radiation, power consumption or voltage.

These attacks obtain information about the SIS and about the operations used in the transitions between SIS.


SCA can be mounted quickly and can sometimes be implemented with readily available hardware; the time required for the attack and the analysis depends on the type of attack.

1.4 Hamming Weight

The Hamming Weight (HW) of a value is the number of digits in its binary representation that are different from 0, i.e. the number of bits set to 1.

Example 1.4.1. HW

HW (01000110) = 3

HW (01001111) = 5

HW (11111111) = 8

HW (00000000) = 0
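The Hamming Weight is straightforward to compute. The minimal Python sketch below reproduces Example 1.4.1; the function name `hamming_weight` is ours and is not taken from the tools developed during the internship.

```python
def hamming_weight(value: int) -> int:
    """Number of bits set to 1 in the binary representation of a non-negative integer."""
    count = 0
    while value:
        value &= value - 1  # clears the lowest set bit
        count += 1
    return count

# Values from Example 1.4.1, written as 8-bit binary literals
for v in (0b01000110, 0b01001111, 0b11111111, 0b00000000):
    print(f"HW({v:08b}) = {hamming_weight(v)}")
```

The loop runs once per set bit, which is convenient for bytes. For an attack tool, a 256-entry lookup table of precomputed weights is a common alternative.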

1.5 Hamming Distance

The Hamming Distance (HD) between two data values is the number of positions in their binary representations at which the corresponding digits differ.

In other words, it is the minimum number of bit flips required to change one value into the other.
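The XOR of two values has a 1 exactly where they differ, so HD(a, b) = HW(a ⊕ b). The one-line Python sketch below shows this; the name `hamming_distance` is ours. The same quantity is commonly used as a power model when a register is overwritten with a new value.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions at which the non-negative integers a and b differ."""
    return bin(a ^ b).count("1")  # a ^ b has a 1 exactly where the bits differ

print(hamming_distance(0b01000110, 0b01001111))  # 2
print(hamming_distance(0b00000000, 0b11111111))  # 8
```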

Chapter 2

Classic Attacks and Preventions

It is important to mention that not all physical attacks (PA) are considered SCA. Since SCA are passive and non-invasive, any active attack that alters the normal execution path (e.g. by injecting a fault) is not considered an SCA. Still, we present some of these attacks alongside the SCA, because understanding them can help in understanding SCA better.

During this research, several theses were very useful for building a general picture of SCA and related topics. [Ver12] deals with elliptic curve implementations on embedded devices and devotes a large part to SCA, [VEN] explains physical attacks, and [Mes00] goes into the details of power analysis attacks and countermeasures.

2.1 Timing Attacks [Koc96]

Knowing how much time an operation takes can reveal the algorithm and/or the key. A good way to think about Timing Attacks (TA) is to think about a person reading a book: we can tell how many pages he has read by timing his reading (if he reads 10 pages per hour and spends 3 hours reading, he has read around 30 pages). This may seem an inexact measurement, but with machines it can be much more exact, since machines work far more regularly than humans.

TA work on both symmetric and asymmetric algorithms, and TA implementations that exploit vulnerable systems to recover the entire secret key do exist.

Their most common targets are algorithms that use modular reduction, because the reduction steps cause most of the timing variation. Many algorithms are vulnerable to TA; among them are Diffie-Hellman (DH [DH76]), the Rivest-Shamir-Adleman algorithm (RSA [RSA78]), the application of the Chinese Remainder Theorem (CRT [DPS96]) to RSA (RSA/CRT), and the Digital Signature Standard (DSS [oST00]).

Attack 2.1.1. TA on RSA/CRT
When the CRT is used to speed up RSA private-key operations, the exponentiation modulo n is split into two steps: an exponentiation modulo p and an exponentiation modulo q. How can an attacker mount a TA?

The method is very simple: the attacker chooses a value and submits it for decryption. If the handling time is only the time of one comparison, the chosen value is smaller than p. If, however, the handling time corresponds to one comparison and at least one subtraction, the attacker knows that the chosen value is at least p.

The attacker moves the chosen value closer and closer to p until finding it. Knowing p is all an attacker needs to break an RSA system: he computes q = n/p, and then knows all the values needed to derive the private exponent d = e⁻¹ mod (p − 1)(q − 1), the public key (n, e) being already known.
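Under the simplified model above (the decryption takes measurably longer exactly when the chosen value is at least p), "moving closer to p" can be done by binary search. The Python sketch below is hypothetical: `extra_time` stands in for a real timing measurement, the function name is ours, and the toy primes are far too small for real RSA.

```python
def recover_rsa_key(n: int, e: int, extra_time):
    """Binary search for the smallest chosen value that triggers the extra
    subtraction; under the simplified model this value is p. Then derive q and d."""
    lo, hi = 2, n
    while lo < hi:
        mid = (lo + hi) // 2
        if extra_time(mid):   # mid >= p: comparison plus subtraction
            hi = mid
        else:                 # mid < p: only the comparison
            lo = mid + 1
    p = lo
    q = n // p
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)       # modular inverse (Python 3.8+)
    return p, q, d

# Toy example with p = 61, q = 53 (insecure sizes, illustration only)
SECRET_P = 61
n, e = 61 * 53, 17
print(recover_rsa_key(n, e, lambda x: x >= SECRET_P))  # (61, 53, 2753)
```

Binary search needs only about log2(n) timed decryptions. A real attack must also average many measurements per chosen value, since one timing sample is noisy.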

By avoiding the costly division-based reduction steps, Montgomery multiplication [Mon85] reduces the size of timing characteristics, which makes it a good way to prevent TA (or at least to make them harder).
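A minimal Python sketch of Montgomery reduction (REDC) makes the idea concrete. Reduction uses only masking and shifting by R = 2^k instead of division by n. The function names and the toy modulus are ours. Note that the final conditional subtraction (the `extra` flag) is still data dependent, which is why hardened implementations also make that step unconditional.

```python
def montgomery_setup(n: int, k: int):
    """Constants for an odd modulus n with R = 2**k > n."""
    R = 1 << k
    n_prime = (-pow(n, -1, R)) % R  # n * n_prime ≡ -1 (mod R)
    return R, n_prime

def redc(T: int, n: int, R: int, n_prime: int, k: int):
    """Montgomery reduction: returns (T * R^-1 mod n, extra) for 0 <= T < n*R."""
    m = ((T & (R - 1)) * n_prime) & (R - 1)  # T + m*n is divisible by R
    t = (T + m * n) >> k                     # t < 2n
    extra = t >= n                           # final conditional subtraction
    if extra:
        t -= n
    return t, extra

def mont_mul(a_bar, b_bar, n, R, n_prime, k):
    """Product of two values in Montgomery form (x_bar = x*R mod n)."""
    return redc(a_bar * b_bar, n, R, n_prime, k)

n, k = 3233, 12                        # toy modulus, R = 4096 > n
R, n_prime = montgomery_setup(n, k)
a, b = 123, 456
a_bar, b_bar = a * R % n, b * R % n    # convert to Montgomery form
c_bar, _ = mont_mul(a_bar, b_bar, n, R, n_prime, k)
c, _ = redc(c_bar, n, R, n_prime, k)   # convert back
print(c == a * b % n)                  # True
```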

2.2 Power Analysis Attacks [MOP07]

Power Analysis Attacks (PAA) are considered a major threat to smart cards, since many low-cost implementations requiring little work have recently been demonstrated. There are many types of PAA; we explain the best-known ones.

2.2.1 Simple Power Analysis [KJJ,KJJ99]

The idea of Simple Power Analysis (SPA) is to visually inspect the power consumption in order to obtain information about a device’s operation(s) as well as the key.

For SPA to work, the target system must have an execution path that depends on the data being processed and/or on the operations being executed. Algorithms such as RSA (because multiplications and squarings differ) and DES (because the shifts of the key schedule and the permutations differ) are the first that come to mind. Other algorithms that use comparisons, multiplications or exponentiations are also targets for SPA.

SPA can reveal the sequence of executed instructions, which makes it useful for breaking cryptographic implementations whose execution path depends on the data being processed.

Attack 2.2.1. SPA on DES
Kocher, Jaffe, and Jun show in [KJJ99] the resulting graphs of SPA on DES and explain the results; we briefly summarise their work. After analysing the traces, we can easily tell whether the bit shifted during the 28-bit rotations of the sub-key generation is 1 or 0, since each case produces different SPA features. This is because each path taken has a different power consumption. The same can be done during the permutations if conditional branching is used.

Large power (and sometimes timing) characteristics are easily noticed during comparisons. Multiplications also leak a lot of information, since their consumption depends on the operand values and their HW. Exponentiations use many squarings and multiplications, so any difference between squarings and multiplications may compromise the algorithm.
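The exponentiation case can be illustrated with the classic left-to-right square-and-multiply algorithm, where a multiplication happens only for exponent bits equal to 1. In the Python sketch below, the recorded string of operations stands in for what a real power trace would show. The function names and toy RSA values are ours.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation that also records the sequence of
    operations: 'S' for a squaring, 'M' for a multiplication."""
    result, ops = 1, []
    for bit in bin(exponent)[2:]:
        result = result * result % modulus
        ops.append("S")
        if bit == "1":            # key-dependent branch: this is what SPA sees
            result = result * base % modulus
            ops.append("M")
    return result, "".join(ops)

def read_exponent(ops: str) -> int:
    """What an SPA attacker does by eye: every 'S' starts a new exponent bit,
    and the bit is 1 exactly when that 'S' is followed by an 'M'."""
    bits, i = [], 0
    while i < len(ops):
        if ops[i + 1:i + 2] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)

# Toy RSA private exponent d = 2753 with n = 3233 (insecure sizes)
_, ops = square_and_multiply(42, 2753, 3233)
print(ops)                 # SMSSMSSMSMSSSSSSM
print(read_exponent(ops))  # 2753
```

On a real device, an attacker distinguishes the two operations by their shapes or durations in the trace rather than by labels.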


By avoiding procedures that use secret intermediates or keys in conditional branching operations, we can prevent SPA. This is not a perfect solution, however, since it incurs a serious performance penalty and requires creative coding.
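For exponentiation, one well-known way to follow this advice is the Montgomery powering ladder, which we use here only as an illustration: every exponent bit costs exactly one multiplication and one squaring, so the sequence of operations no longer reveals the bits. The sketch below is an assumption-laden simplification. The `if` still selects which register is updated, and hardened implementations replace it with a constant-time conditional swap. Python integers are not constant-time in any case.

```python
def montgomery_ladder(base: int, exponent: int, modulus: int):
    """Montgomery powering ladder, recording the operation sequence.
    Invariant: r1 == r0 * base (mod modulus) after every step."""
    r0, r1, ops = 1, base % modulus, []
    for bit in bin(exponent)[2:]:
        if bit == "1":
            r0 = r0 * r1 % modulus   # one multiplication
            r1 = r1 * r1 % modulus   # one squaring
        else:
            r1 = r0 * r1 % modulus   # one multiplication
            r0 = r0 * r0 % modulus   # one squaring
        ops.append("MS")             # same pattern whatever the bit value
    return r0, "".join(ops)

result, ops = montgomery_ladder(42, 2753, 3233)
print(result == pow(42, 2753, 3233))  # True
print(ops)                            # "MS" repeated once per exponent bit
```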

2.2.2 Differential Power Analysis [KJJ99,MDS99a]

Differential Power Analysis (DPA) is a more sophisticated attack than SPA. With DPA, the attacker does not just visualize the power consumption but also applies statistical analysis and error-correction methods to multiple traces, unlike SPA, which processes single traces only. For DPA to work, noise filtering methods are usually needed. A DPA takes much more time and work than an SPA, but it is harder to prevent, and it can be automated.

Because of the relatively high computational complexity of multiplications, asymmetric algorithms leak more signal than symmetric ones, which makes them easier targets for DPA. In theory, however, DPA can be automated and can break almost any symmetric or asymmetric algorithm; it can also be used to reverse-engineer unknown algorithms and protocols (this reverse-engineering process can itself be partially automated).

To explain how DPA works, we give an example of how DPA collects and analyzes the power consumption while the Advanced Encryption Standard (AES [FIP01]) algorithm is running.

Attack 2.2.2. DPA on AES
We have a device that uses AES to encrypt the message m with the key k; the result is c. Each time an encryption takes place, ϕi denotes the power consumption of the device.

ϕi has the form ϕi = a·L(k) + b, where i is the encryption number, b is the noise, L(k) is an unknown function of the key k (and of the processed data), and a is a coefficient.

We start by encrypting N messages with the actual, unknown key, and we record the power consumption traces during these encryptions. Once we have the traces ϕi for i ∈ {1, ..., N}, we start the attack, attacking each byte of the key separately with the following algorithm:

Declare S0, S1                        // two groups of traces
Declare G[256]                        // 256 DPA graphs, one per guess
Declare result[16]                    // the 16 recovered key bytes
For k in (1...16):                    // every byte of the key
    For j in (0...255):               // every possible value of this key byte
        For i in (1...N):             // every encrypted message
            d = HW(S(m_{i,k} ⊕ j))    // HW of the guessed S-box output
            if d ≤ 3:                 // less than 4
                Add ϕi to S0          // first group
            elseif d ≥ 5:             // greater than 4
                Add ϕi to S1          // second group
        G[j] = average(S1) − average(S0)   // DPA graph for guess j
        Empty S0 and S1                    // reset the groups for the next guess
    result[k] = good(j/G)                  // keep the best key-byte guess

Here m_{i,k} is the k-th byte of message m_i, S is the AES S-box, and traces with d = 4 belong to neither group.

Note that good(j/G) denotes the value of j for which G[j] contains the point with the highest absolute value. For example, with G = [[1, −3, 5], [2, 1, −6], [1, 4, −2]], the recovered key byte is 01, the representation of j = 1, because |−6| is the highest absolute value.

When this is done, we have 16 key bytes that together form the sub-key of the initial round, which is the original key.
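The selection rule good(j/G) can be sketched in a few lines of Python. This is an illustration only: the function name `best_key_byte` is ours, and G is assumed to be a list of DPA curves indexed by the hypothesis j (here, the 3-curve example from the text).

```python
def best_key_byte(G):
    """Return the index j whose DPA curve G[j] contains the largest absolute point."""
    return max(range(len(G)), key=lambda j: max(abs(point) for point in G[j]))


# Example from the text: |-6| in G[1] is the largest absolute value.
G = [[1, -3, 5], [2, 1, -6], [1, 4, -2]]
print(best_key_byte(G))  # 1
```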

Attack 2.2.3. DPA on DES, AES and exponentiation systems

Paul Kocher, Joshua Jaffe, and Benjamin Jun give a detailed example of DPA on DES, explaining their techniques and showing detailed graphs [KJJ99]. In addition to general information about different PAA such as SPA and DPA, the paper [FB13b] explains DPA on AES in detail, with graphs and algorithms based on 3 interesting references [OGOP04,KJJ99,BCO04].

DPA can also be used on exponentiation systems [Sma00,MDS99b].

DPA can be prevented (or made harder) by various methods, and the best way is to combine them.

We give 3 different categories of countermeasures:

Reducing signal sizes. This can be done in different ways, such as choosing operations that leak less information in their power consumption, balancing HW and state transitions, and physically shielding the device.

The difficulty here is that the signal cannot be reduced to zero: an attacker with an infinite number of samples would still be able to perform DPA on the heavily degraded signal.

In practice, aggressive shielding can make attacks infeasible, but it adds significantly to the device's cost and size.

Introducing noise into power consumption measurements. This can be done by adding noise to the power consumption (e.g. by adding random calculations) and/or by randomizing the timing and the order of execution.

It is hard to do so without adding too much time and power overhead, which makes this an imperfect countermeasure.

In practice, this does not prevent DPA, but it forces the attacker to collect more samples; if collecting enough samples is infeasible, the attack is considered impossible.

Designing cryptographic systems with realistic assumptions about the underlying hardware. This is not part of our study, but it is covered by many works such as [CCD00].

More about the DPA in chapter 5.

2.2.3 High-Order Differential Power Analysis [KJJ]

While DPA works only on operations, High-Order Differential Power Analysis (HO-DPA) works on operations and sub-operations. In addition, instead of analyzing a single event between samples, HO-DPA can be used to correlate information between multiple events.

2.2.4 Correlation Power Analysis [BCO04]

Correlation Power Analysis (CPA) is based on the same leaked information as DPA, but CPA requires fewer samples to obtain the same result; also, while DPA uses either the HW or the HD, CPA only works on the HD. Still, any defensive approach designed against DPA is equally effective against CPA.

CPA exploits the linear correlation between the traces and the HD of the corresponding word transition (e.g. the HD between part of the message before it enters an S-box and afterwards). Two correlation methods can be (and have been) used to compute this correlation: the Pearson Product-Moment Correlation Coefficient (PPMCC) and the Spearman Rank (SR) correlation (SR is the Pearson correlation coefficient between the ranked variables).

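PPMCC:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

SR:

$$\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$$

where, for SR, $x_i$ and $y_i$ are the ranks of the observations and $\bar{x}$, $\bar{y}$ their means.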
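The two coefficients above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in our tools: the function names are ours, and ties (frequent with HD values between 0 and 8) are assumed to receive the average of their rank positions, which is the usual convention for Spearman correlation.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient (PPMCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

In a CPA, x would be the HD hypotheses for one key-byte guess and y the samples of the traces at one time point. Spearman only depends on ordering, so it is insensitive to any monotonic (not necessarily linear) leakage.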

More about the CPA in chapter 6.

2.2.5 Templates Attacks [HTM09,APSQ06]

As the name suggests, Template Attacks (TMPA) use templates. This means that TMPA requires a preparation step before the actual attack takes place. In this step, traces are obtained from encryptions using different known keys. This step is done only once per device; afterwards, the templates can be used to find any key used in an identical device.

In the attack phase of TMPA, the templates are applied to one (or more) real traces. Here, a real trace is a trace obtained from the device using an unknown key, the objective being to find this key. Note that TMPA can be carried out without knowing either the original message or the ciphertext, which means that blinding countermeasures are useless against it.

More about the TMPA in chapter 7.

2.3 Fault Attacks [BS97,BDL97]

It is important to remind readers that Fault Attacks (FA) are not considered SCA: they are active rather than passive attacks, they are invasive, and they can often be detected.

Before getting into the details of the different FA, we should know more about the different types of faults. Faults can be classified by their effect or by their producer:

Effect:

Permanent Faults are usually detectable since, once they happen, the device never goes back to normal.

Transient Faults are undetectable since the hardware has no clue that the fault occurred. Such faults may cause a Certification Authority (CA) system to generate faulty certificates, which might allow the client to generate fake certificates.

Producer:


Latent Faults are hardware or software bugs that are difficult to catch, like the Intel floating-point division bug [WIK13]. Such a fault may cause a CA to generate faulty certificates.

Introduced Faults are faults caused by an attacker who has physical access to the hosting device. Attacks on a tamper-resistant device deliberately cause it to malfunction, giving the attacker the chance to gain some advantage, such as extracting secrets (keys, etc.). More about FA on tamper-resistant devices can be found in Anderson and Kuhn's paper [AK96].

Now that we have seen some fault types, the following sections detail the different attacks.

2.3.1 Differential Fault Analysis

Differential Fault Analysis (DFA) can be used with both natural and induced faults (induced, for example, by changing the voltage, tampering with the clock, or applying various types of radiation).

Attack 2.3.1. DFA on DES and TDES

An illustrated implementation of DFA on DES, studying the attack, its effectiveness, and its dependency on where the fault occurs, can be found in [FB13a]. The same attack can be successfully launched on Triple-DES (TDES [FIP99]).

In addition, DFA can be combined with Differential Key Attacks and Differential Related-Key Cryptanalysis.

2.3.2 Non-Differential Fault Analysis

While ordinary FA are active, Non-Differential Fault Analysis (NDFA) is not only active but also permanent, which means that after the attack the device can never recover its normal state.

It is important to mention that NDFA does not need a correct ciphertext and that its purpose is to extract symmetric keys.

2.3.3 Register Fault Attacks

Register Fault Attacks (RFA) are the general type of attack where the transient fault does not happen during the calculation itself, but in the memory where temporary values are stored.

With low probability, one (or a few) of the bits of the value stored in some register may flip. We need the fault to occur with low probability so that it occurs exactly once throughout the computation.

Attack 2.3.2. RFA on FSIS

Consider the Fiat-Shamir Identification Scheme (FSIS [Kno88]) with:

N: an n-bit modulus.
t: the predetermined security parameter of the protocol.

The device holds the secrets $s_1, \dots, s_t$, and the corresponding public values are $v_i = s_i^2 \pmod{N}$. Given t erroneous executions of the protocol, one can recover the secrets $s_1, \dots, s_t$ in the time it takes to perform $O(nt + t^2)$ modular multiplications. This is how it is done.

Suppose that, due to a miraculous fault, while the device is waiting for Bob to send it the set S, one of the bits in the register holding the value r is flipped. In this case, Bob receives the correct value $r^2 \pmod{N}$, but y is computed incorrectly by the device:

$$y = (r + E) \prod_{i \in S} s_i \pmod{N}$$

where E is the value added to the register as a result of the fault. Since the fault is a single bit flip:

$$E = \pm 2^j, \qquad j \in \{0, \dots, n-1\}$$

Bob knows the value $\prod_{i \in S} v_i$, so he can compute

$$(r + E)^2 = \frac{y^2}{\left(\prod_{i \in S} s_i\right)^2} = \frac{y^2}{\prod_{i \in S} v_i} \pmod{N}$$

Since there are only 2n possible values for E, Bob can try them all. When E is guessed correctly, Bob can recover r, since

$$(r + E)^2 - r^2 = 2Er + E^2 \pmod{N} \;\Rightarrow\; r = \frac{(r + E)^2 - r^2 - E^2}{2E} \pmod{N}$$

Using the guessed E and the recovered r, Bob can compute $\prod_{i \in S} s_i$ as follows:

$$\prod_{i \in S} s_i = \frac{y}{r + E} \pmod{N}$$

To summarize, Bob guesses E and tries

$$\prod_{i \in S} s_i = \frac{y}{\frac{(r+E)^2 - r^2 - E^2}{2E} + E} = \frac{2E\,y}{(r+E)^2 - r^2 - E^2 + 2E^2} = \frac{2E\,y}{\frac{y^2}{\prod_{i \in S} v_i} - r^2 + E^2} \pmod{N}$$

The problem now is: how can Bob tell whether the guessed value of E is correct? Let T be the hypothesized value of $\prod_{i \in S} s_i$ for a guess of E. Usually, the relation $T^2 = \prod_{i \in S} v_i \pmod{N}$ is satisfied for only one value of E.

In the unlikely event that two values E and E′ both satisfy this relation, giving T and T′ with T ≠ T′ but $T^2 = (T')^2 \pmod{N}$, we are in one of the two following cases:


If $T \not\equiv -T' \pmod{N}$, Bob can already factor N, since

$$T^2 = (T')^2 + kN \;\Rightarrow\; N = \frac{(T + T')(T - T')}{k}$$

so N divides $(T + T')(T - T')$ without dividing either factor, and $\gcd(T - T', N)$ is a nontrivial factor of N.

Otherwise, since one of T or T′ must be equal to $\prod_{i \in S} s_i$, it follows that Bob now knows $\prod_{i \in S} s_i \pmod{N}$ up to sign; for our purposes, this is good enough.

Time: testing all values of E for one set S requires $O(n + t)$ modular multiplications, so for t sets we need $O(nt + t^2)$ modular multiplications. Details on how Bob then finds each $s_i$, for $i \in \{1, \dots, t\}$, can be found in [BDL97].

What about multiple bit flips? The algorithm also works when several bits flip, but it takes more time: for c flipped bits, it takes $O(n^c t)$. Actually, the algorithm still works even if we modify the scheme, for example by replacing the squaring with a power e; more information about this can also be found in [BDL97].
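The single-bit-flip search described above can be simulated with toy numbers. The sketch below is illustrative only: the modulus 10007 × 10009, the subset S, the flipped bit, and all names are our own assumptions, chosen so the run is fast; real FSIS moduli are far larger. The function tries every E = ±2^j, recovers r and T = y/(r+E) with the formulas above, and keeps every T satisfying T² = ∏ v_i (mod N).

```python
import math
import random


def candidate_products(N, n_bits, v_prod, r_sq, y):
    """Try every single-bit fault E = +/-2^j; keep each T = y/(r+E) with T^2 = prod(v_i) mod N."""
    r_plus_e_sq = y * y * pow(v_prod, -1, N) % N        # (r+E)^2 = y^2 / prod(v_i)
    found = set()
    for j in range(n_bits):
        for E in (1 << j, -(1 << j)):
            # r = ((r+E)^2 - r^2 - E^2) / (2E); 2E is invertible because N is odd
            r = (r_plus_e_sq - r_sq - E * E) * pow(2 * E % N, -1, N) % N
            if math.gcd((r + E) % N, N) != 1:
                continue
            T = y * pow((r + E) % N, -1, N) % N           # T = y / (r+E)
            if T * T % N == v_prod:
                found.add(T)
    return found


# Toy simulation (hypothetical parameters, far too small for real use).
rng = random.Random(2013)
N = 10007 * 10009
n_bits = N.bit_length()
s = []
while len(s) < 5:                                   # secrets s_1..s_t with t = 5
    x = rng.randrange(2, N)
    if math.gcd(x, N) == 1:
        s.append(x)
v = [x * x % N for x in s]                          # public values v_i = s_i^2 mod N
S = [0, 2, 3]                                       # subset chosen by Bob
bit = 7                                             # register bit hit by the fault
while True:                                         # pick r so that r and the faulty r are units
    r = rng.randrange(2, N)
    if math.gcd(r, N) == 1 and math.gcd(r ^ (1 << bit), N) == 1:
        break
r_sq = r * r % N                                    # commitment, sent before the fault
true_prod = math.prod(s[i] for i in S) % N
y = (r ^ (1 << bit)) * true_prod % N                # response computed with the faulty r
v_prod = math.prod(v[i] for i in S) % N
print(true_prod in candidate_products(N, n_bits, v_prod, r_sq, y))  # True
```

`pow(x, -1, N)` (modular inverse) and `math.prod` require Python 3.8 or later.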

We can also factor N during RFA on FSIS when $T \equiv -T' \pmod{N}$, using brute force. By the CRT, $T^2 = (T')^2 \pmod{p}$ and $T^2 = (T')^2 \pmod{q}$, so $T^2 = (T')^2 + r_1 p$ and $T^2 = (T')^2 + r_2 q$. Therefore

$$p = \frac{T^2 - (T')^2}{r_1}, \qquad q = \frac{T^2 - (T')^2}{r_2}, \qquad r_1 r_2 = \frac{\left(T^2 - (T')^2\right)^2}{N}$$

This means that we can brute-force $r_1$, computing at each guess

$$r_2 = \frac{\left(T^2 - (T')^2\right)^2}{N\, r_1}$$

and then computing and verifying the values of p and q.

The best way to prevent such an attack is to verify the output of the computation before releasing it, although this reduces the system's performance. For some systems, such as RSA signatures with public exponent e = 3, verification can be very efficient, while for others, such as DSS, it can be costly.

For RSA/CRT, using verification is crucial, especially for a CA, where a single transient fault could leak the private key. Shamir found a good way to do so for all exponents [Sha97]. However, the standard verification is still the best way to go when e is small (e.g. e = 3).
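The standard verification can be sketched as an RSA-CRT signature that is checked with the public exponent before release. This is a minimal illustration, not a hardened implementation: the toy key (p = 61, q = 53, e = 17, d = 2753), the `fault` flag that simulates a transient fault in the mod-p half, and the function name are our own assumptions.

```python
def rsa_crt_sign_checked(m, p, q, d, e, fault=False):
    """RSA-CRT signature that is verified with the public exponent before release."""
    n = p * q
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    sp = pow(m, dp, p)                          # half-exponentiation mod p
    sq = pow(m, dq, q)                          # half-exponentiation mod q
    if fault:
        sp ^= 1                                 # simulated transient fault in the mod-p half
    s = sq + q * ((q_inv * (sp - sq)) % p)      # Garner's CRT recombination
    if pow(s, e, n) != m % n:                   # verification: s^e must give back m
        raise RuntimeError("fault detected: signature withheld")
    return s


s = rsa_crt_sign_checked(65, 61, 53, 2753, 17)
print(pow(s, 17, 61 * 53))  # 65
```

With `fault=True` the check fails and nothing is released, so a faulty signature never leaves the device.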

Moreover, when the fault affects the input, the output is correct from the device's point of view (it is consistent with the faulty input). To protect multi-round authentication schemes, one must therefore ensure that the internal state of the device cannot be affected, which can be achieved by protecting the internal memory with detection bits (e.g. a Cyclic Redundancy Check).
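As a rough sketch of detection bits, a stored value can be kept alongside its CRC-32 and checked on every read. The `CheckedRegister` class is hypothetical and only illustrates the principle with Python's `zlib.crc32`; a real device would implement this in hardware or firmware. CRC-32 detects every single-bit error, which matches the single-bit-flip fault model above.

```python
import zlib


class CheckedRegister:
    """Hypothetical memory cell storing a value together with its CRC-32."""

    def __init__(self, value: bytes):
        self.value = bytearray(value)
        self.crc = zlib.crc32(value)

    def read(self) -> bytes:
        # Refuse to use the value if it no longer matches its detection bits.
        if zlib.crc32(self.value) != self.crc:
            raise RuntimeError("memory corruption detected")
        return bytes(self.value)


reg = CheckedRegister(b"\x12\x34\x56\x78")
reg.value[2] ^= 0x04        # simulated single bit flip in the stored value
```

After the simulated flip, `reg.read()` raises instead of returning the corrupted value.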

Other means of protection have been proposed, such as Random Padding [BR96] and Program Checking [BW94,FGY96].

More examples of RFA can be found: attacking Schnorr's identification scheme [BDL97], breaking other implementations of RSA [BDL97], and the Guillou-Quisquater identification scheme [GQ88].

Chapter 3

Classic Countermeasures

In this chapter we present the best-known countermeasures; each is either specialized in countering one attack or is a general countermeasure. However, some countermeasures are better avoided in some cases because of drawbacks such as high cost or the loss of time and/or power.

3.1 Novice

We call these novice since they are basic and have too many drawbacks, which makes using them a non-preferred approach.

3.1.1 Adding Delays [Koc96]

Clearly, this is used to prevent TA, and it can be done in different ways:

Adding a timer to delay returning results is not a good idea, since factors such as system responsiveness or power consumption may still change when the operations finish, in a way that can be detected.

Fixed time implementations are likely to be too slow, since all operations must take as long as the slowest one.

Random delays make the attack difficult but still possible. The number of samples required for the attack to succeed increases roughly as the square of the timing noise (see the example below).

Example 3.1.1. Random Normally Distributed Delay on Modular Exponentiation

Consider a modular exponentiation whose timing characteristics have a standard deviation of 10 ms, and suppose it can be successfully broken with 1000 timing measurements.

If we add a random, normally distributed delay with a standard deviation of 1 second, the attack will now require $(1000\,\text{ms} / 10\,\text{ms})^2 \times 1000 = 10^7$ samples.

Note that the mean delay would have to be several seconds to get a standard deviation of 1 second. And while $10^7$ samples is probably more than most attackers can gather, a security factor of $10^7$ is not usually considered adequate.
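The rule of thumb used in Example 3.1.1 can be written as a one-line helper. This only restates the approximation from the text (samples grow with the square of the noise ratio); the function name is ours and the model ignores the original timing noise adding to the delay noise.

```python
def required_samples(base_samples, timing_sd_ms, delay_sd_ms):
    """Approximate samples needed once a random delay with std delay_sd_ms is added."""
    return base_samples * (delay_sd_ms / timing_sd_ms) ** 2


print(required_samples(1000, 10, 1000))  # 10000000.0
```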

3.1.2 Power Consumption Balancing [MB06]

It is used to prevent all sorts of PAA: if the attacker cannot know how much power is needed to execute each operation, he cannot use PAA to learn information about the operations being executed.

One way to do so is to use a special architecture with a dummy element: whenever an operation is performed, a complementary operation is performed on the dummy element. This way, the power consumption is constant and independent of the input and key bits.

It is a strong countermeasure, but it needs more execution time and power, as well as a special architecture that requires more space and materials.
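Under a simple Hamming-weight leakage model (an assumption, not a property of any specific chip), writing a byte together with its complement on the dummy element always sets exactly 8 bits in total. The snippet below checks this for every byte value; the helper name is ours.

```python
HW = [bin(x).count("1") for x in range(256)]


def balanced_write_weight(x):
    """Hamming weight of the real byte plus its complement on the dummy element."""
    return HW[x] + HW[x ^ 0xFF]


print({balanced_write_weight(x) for x in range(256)})  # {8}
```

Under this model the total weight, and hence the modeled power, no longer depends on the data.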

3.1.3 Shielding [KJJ99]

Shielding provides a high level of security and is especially efficient against all PAA. It is a very simple physical method that only requires an aggressive physical shield around the device. The cost and the size increase, but, most importantly, so does the security.

3.2 Advanced

These techniques are more advanced and have fewer (or no) drawbacks.

3.2.1 Licensing Modified Algorithms [KJJ99]

This is a good technique to prevent all kinds of SCA. It consists of making systems secure even though the underlying circuits may leak information, which requires special cryptographic system design and implementation.

This works because, if during the design we assume that information will leak during execution and we make the system secure independently of this leakage, then the system is secure against all SCA.

We can add further security to the design, such as key usage counters, and aggressive exponent and modulus modifications (for public-key schemes).

3.2.2 Data Independent [Spa06]

By data independent, we mean that the time operations take should be independent of the input data and/or the key used.

For this to be possible, sub-operations should take the same number of clock cycles regardless of the input data and the key bits.

This is a good way to prevent TA (and SPA), since these attacks analyze variations in computation time according to different inputs and/or keys.

If the device achieves this, the variation in computation time is always zero, making a successful TA impossible.
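The difference between a data-dependent and a data-independent comparison can be sketched as follows. This is only an illustration of the control flow: Python itself gives no timing guarantees, so for real code the standard library provides `hmac.compare_digest` for this purpose. The function names are ours, and the length of the inputs is treated as public, as discussed below.

```python
def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early exit: the running time reveals where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def data_independent_equal(a: bytes, b: bytes) -> bool:
    """Always processes every byte; the loop does not depend on the data."""
    if len(a) != len(b):      # the length is considered public
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y         # accumulate differences without branching
    return diff == 0
```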

Although the idea seems simple, it cannot always be implemented, and sometimes it is just not efficient or does not give the needed level of security.

Take, for example, the length of the exponent in exponentiation operations: it always affects the computation time. Is a solution needed? Not really: the length of the exponent is not very useful information for the attacker, and in most cases it is public.

3.2.3 Operation Independent [Koc96]

This is the case of 'Time Equalization of Multiplication and Squaring', which is specialized in preventing TA and SPA against exponentiations performed as part of asymmetric encryption operations. The attacker should not know whether, when, and how many multiplications/squarings occurred. To achieve this, whenever one of the two operations is performed, the other is performed anyway (and the result of the unnecessary one is discarded).
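A minimal sketch of this idea is the left-to-right "square-and-multiply-always" exponentiation: every exponent bit triggers one squaring and one multiplication, and the multiplication result is discarded when the bit is 0. The function name is ours, and the final selection is still a Python-level branch, so this shows the operation sequence only, not a timing-safe implementation.

```python
def modexp_always_multiply(m, d, n):
    """Left-to-right exponentiation performing one squaring and one multiplication per bit."""
    result = 1
    for bit in bin(d)[2:]:
        result = result * result % n          # squaring, always performed
        product = result * m % n              # multiplication, always performed
        result = product if bit == "1" else result   # dummy product discarded when bit is 0
    return result


print(modexp_always_multiply(65, 17, 3233) == pow(65, 17, 3233))  # True
```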

3.2.4 Blinding [Koc96]

Blinding is a good technique to prevent the attacker from knowing the input to the function, which makes it a good countermeasure against all SCA on symmetric and asymmetric algorithms.

Example 3.2.1. Blinding

Before computing the modular exponentiation, choose a random pair $(v_i, v_f)$ such that $v_f^{-1} = v_i^x \bmod n$. Then multiply the input message by $v_i \bmod n$, and correct the result by multiplying it by $v_f \bmod n$. Note that the system should reject messages equal to 0 mod n.

For DH, it is simpler to choose a random $v_i$ and then compute $v_f = (v_i^{-1})^x \bmod n$, while for RSA the faster way is to choose $v_f$ relatively prime to n and then compute $v_i = (v_f^{-1})^e \bmod n$, where e is the public exponent.

What we just gave is a good example of blinding, but it raises some problems that need solutions. For instance, computing inverses is slow, so generating a new random pair for each exponentiation is not practical, and that computation might itself be subject to TA. Yet pairs should not be reused, because reused pairs are also subject to TA.

The solution to this problem is to update $v_i$ and $v_f$ before each exponentiation using $v_i' = v_i^2$ and $v_f' = v_f^2$. This way the total performance cost is small (2 modular squarings, which can be precomputed, and 2 modular multiplications).
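Example 3.2.1 and the squaring update can be sketched together for RSA. This is a toy illustration under stated assumptions: the key (n = 3233 = 61 × 53, e = 17, d = 2753) and the function names are ours. The pair is built as $v_i = (v_f^{-1})^e$, so $v_i^d = v_f^{-1}$ and the unblinding step cancels exactly; squaring both values preserves this relation.

```python
import math
import random

n, e, d = 3233, 17, 2753          # toy RSA key (p = 61, q = 53), illustrative only


def new_blinding_pair(rng):
    """Choose v_f coprime to n and derive v_i = (v_f^-1)^e, so that v_i^d = v_f^-1."""
    while True:
        v_f = rng.randrange(2, n)
        if math.gcd(v_f, n) == 1:
            break
    v_i = pow(pow(v_f, -1, n), e, n)
    return v_i, v_f


def blinded_exp(m, v_i, v_f):
    """Exponentiate the blinded input, then unblind: (m * v_i)^d * v_f = m^d."""
    return pow(m * v_i % n, d, n) * v_f % n


rng = random.Random(7)
v_i, v_f = new_blinding_pair(rng)
messages = (65, 123, 2790)
results = []
for m in messages:
    results.append(blinded_exp(m, v_i, v_f))
    v_i, v_f = v_i * v_i % n, v_f * v_f % n   # update rule v_i' = v_i^2, v_f' = v_f^2
print(results == [pow(m, d, n) for m in messages])  # True
```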

Further solutions were proposed, such as using exponents other than 2 or multiplying with another pair $(u_i, u_f)$. However, these solutions offer no advantage over the basic solution and take more time.

3.2.5 Running the Encryption Twice [BS97]

If the device runs the algorithm twice and outputs the result only if both results are equal, it becomes resistant to DFA. Since the same fault can still occur twice at the same time and place, especially if it is artificially induced, the protection is not perfect, but it makes the attack much harder to succeed, forcing the attacker to collect many more samples and to inject faults with perfect precision.
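The principle can be sketched as a generic wrapper. The `FaultOnFirstCall` class is a hypothetical stand-in for a device hit by a single transient fault on its first run; as noted above, a fault that repeats identically in both runs would not be caught.

```python
def run_twice(f, x):
    """Compute f(x) twice and release the result only if both runs agree."""
    first = f(x)
    second = f(x)
    if first != second:
        raise RuntimeError("fault detected: output withheld")
    return first


class FaultOnFirstCall:
    """Hypothetical faulty device: flips one output bit on its first run only."""

    def __init__(self, f):
        self.f, self.calls = f, 0

    def __call__(self, x):
        self.calls += 1
        out = self.f(x)
        return out ^ 1 if self.calls == 1 else out


print(run_twice(lambda x: x * 2, 21))  # 42
```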

3.2.6 Modifying the Algorithms Design [CMCJ04]

This is another good way to prevent all kinds of PAA. It aims to modify the algorithm in such a way that the attacker loses control of his own attack. The best way to do so is to use hashing: for example, hashing a 160-bit key with a Secure Hash Algorithm (SHA [Sta94]) before using it destroys partial information an attacker might have gathered about the key. This requires algorithms and protocols to change their design, which may make the resulting product non-compliant with standards and specifications.

Part II

Attacks on AES


Chapter 4

Advanced Encryption Standard

In this chapter, we present AES in detail; more specifically, we study AES-128, since it is the variant used in our attack implementations.

4.1 General Information

4.1.1 History

In January 1997, a new algorithm was needed to replace DES. The National Institute of Standards and Technology (NIST) called for candidates, and 15 algorithms were proposed. The algorithm was expected to be a symmetric-key block cipher working with keys of 128, 192, and 256 bits to encrypt blocks of 128 bits. Out of the 15 candidates, 5 finalists were chosen, from which the Rijndael algorithm was selected. In 2001 this algorithm (with some modifications) became the new standard for block encryption under the name AES.

4.1.2 Notations

word 4 bytes.

Nb Number of words in input message.

Nk Number of words in input key.

Nr Number of rounds.

Confusion Making the relation between the ciphertext and the key complex.

Diffusion Making the relation between a plaintext and its ciphertext complex.


4.1.3 Possible Encryptions

Version    Nb   Nk   Nr
AES-128     4    4   10
AES-192     4    6   12
AES-256     4    8   14

4.1.4 Input Representation

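We represent the input as a 4 × 4 matrix in which each column is a word:

$$\begin{pmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} \\
a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} \\
a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} \\
a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3}
\end{pmatrix}$$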
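The mapping from a 16-byte input block to this matrix follows FIPS-197, where byte k of the input goes to row k mod 4 and column k div 4, so that each column holds one 4-byte word. A minimal sketch (the function name is ours):

```python
def bytes_to_state(block):
    """Map a 16-byte block to a 4x4 state (list of rows): state[r][c] = block[r + 4c]."""
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


print(bytes_to_state(bytes(range(16)))[1])  # [1, 5, 9, 13]
```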

4.2 Encryption

AES has 10, 12, or 14 rounds depending on the key length. Each round except the last one is composed of 4 functions: SubBytes, ShiftRows, MixColumns and AddRoundKey (the last round omits MixColumns). The process can be seen in Figure 4.2.1.

Figure 4.2.1: AES Encryption

4.2.1 SubBytes (scramble each byte)

Each byte is replaced (see Figure 4.2.2) using a substitution table (see Figure 4.2.3), which makes SubBytes a non-linear operation that provides confusion.

The substitution function 'S' is simple. Say we have the following byte:


Figure 4.2.2: AES Substitution Function

Figure 4.2.3: AES S-box

a2,2 = 10010110, which is 96 in hexadecimal. This means that S(a2,2) = b2,2, where b2,2 is in row x = 9 and column y = 6 of the AES substitution table, so b2,2 = 90 in hexadecimal.
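Instead of storing the table, the S-box can be regenerated from its FIPS-197 definition: the multiplicative inverse in GF(2^8) (modulo x^8 + x^4 + x^3 + x + 1, with 0 mapped to 0) followed by an affine transformation. The sketch below builds S and its inverse ISub this way and reproduces the example above. It is written for clarity, not speed, and the helper names are ours.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def ginv(a):
    """Multiplicative inverse in GF(2^8) as a^254; 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gmul(result, a)
    return result


def rotl8(x, shift):
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def sub_byte(a):
    """S(a): inversion in GF(2^8) followed by the affine transformation."""
    inv = ginv(a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sub_byte(x) for x in range(256)]
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

print(hex(SBOX[0x96]), hex(INV_SBOX[0x90]))  # 0x90 0x96
```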

4.2.2 ShiftRows (scramble each row)

In the ShiftRows function, rows 0, 1, 2, and 3 are circularly shifted left by 0, 1, 2, and 3 bytes respectively (see Figure 4.2.4).

Figure 4.2.4: AES Shifting Function
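On a state stored as a list of 4 rows, ShiftRows is a per-row rotation. The sketch below also includes the inverse (right rotation, used in decryption) to show that the two cancel; the function names are ours.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state (list of rows) left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r right by r bytes, undoing shift_rows."""
    return [row[len(row) - r:] + row[:len(row) - r] for r, row in enumerate(state)]


state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(shift_rows(state))  # [[0, 1, 2, 3], [5, 6, 7, 4], [10, 11, 8, 9], [15, 12, 13, 14]]
```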

4.2.3 MixColumns (scramble each column)

The MixColumns step is an invertible linear transformation: the function has 4 bytes of input and 4 bytes of output, and each input byte affects all 4 output bytes. Together with ShiftRows, MixColumns provides diffusion.

Here, each column is considered as a polynomial of degree less than 4 with coefficients in GF(2^8). We multiply each of these polynomials by a fixed polynomial P(x) modulo x^4 + 1, which is equivalent to a matrix multiplication: bi = P(x) ⊗ ai (see Figure 4.2.5).

Figure 4.2.5: AES Column Mixing Function

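The fixed polynomial is $P(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$, which gives the following matrix multiplication:

$$\begin{pmatrix} b_{0,1} \\ b_{1,1} \\ b_{2,1} \\ b_{3,1} \end{pmatrix} =
\begin{pmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{pmatrix}
\begin{pmatrix} a_{0,1} \\ a_{1,1} \\ a_{2,1} \\ a_{3,1} \end{pmatrix}$$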
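The matrix product above can be sketched for one column, with XOR as addition and multiplication in GF(2^8). The helper names are ours; the input column db 13 53 45 is a commonly published MixColumns test vector.

```python
def xtime(a):
    """Multiply a byte by x (i.e. by 02) in GF(2^8) modulo 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


MIX = [[0x02, 0x03, 0x01, 0x01],
       [0x01, 0x02, 0x03, 0x01],
       [0x01, 0x01, 0x02, 0x03],
       [0x03, 0x01, 0x01, 0x02]]


def mix_column(column):
    """b = MIX x a over GF(2^8): addition is XOR, multiplication is gmul."""
    out = []
    for row in MIX:
        acc = 0
        for coef, byte in zip(row, column):
            acc ^= gmul(coef, byte)
        out.append(acc)
    return out


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```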

4.2.4 AddRoundKey (encrypt)

The AddRoundKey function XORs each byte with the corresponding byte of the round sub-key (see Figure 4.2.6). The sub-key of each round is obtained using the AES key schedule, which we will see in Section 4.4.

Figure 4.2.6: AES Add Round Key

4.3 Decryption

For decryption, the operations (except AddRoundKey, which is its own inverse) are inverted and applied in reverse order (see Figure 4.3.1).

4.3.1 InvSubBytes

Each byte is replaced using the inverse substitution table (see Figure 4.3.2).


Figure 4.3.1: AES Decryption

The inverse substitution function 'ISub' works like the substitution itself. Say we have the following byte:

b2,2 = 10010000, which is 90 in hexadecimal. This means that ISub(b2,2) = a2,2, where a2,2 is in row x = 9 and column y = 0 of the inverse substitution table, so a2,2 = 96 in hexadecimal.

Figure 4.3.2: AES Inverse Substitution Box

4.3.2 InvShiftRows

In the InvShiftRows function, rows 0, 1, 2, and 3 are circularly shifted right by 0, 1, 2, and 3 bytes respectively.


4.3.3 InvMixColumns

Here, each column is still considered as a polynomial of degree less than 4. We multiply each of these polynomials by $P^{-1}(x)$, the inverse of P(x) modulo x^4 + 1, which is equivalent to a matrix multiplication: $a_i = P^{-1}(x) \otimes b_i$.

The inverse polynomial is $P^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$, which gives the following matrix multiplication:

$$\begin{pmatrix} a_{0,1} \\ a_{1,1} \\ a_{2,1} \\ a_{3,1} \end{pmatrix} =
\begin{pmatrix}
0e & 0b & 0d & 09 \\
09 & 0e & 0b & 0d \\
0d & 09 & 0e & 0b \\
0b & 0d & 09 & 0e
\end{pmatrix}
\begin{pmatrix} b_{0,1} \\ b_{1,1} \\ b_{2,1} \\ b_{3,1} \end{pmatrix}$$
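The inverse transformation uses the same column-by-column product with the coefficients 0e, 0b, 0d, 09. The sketch below (helper names ours) maps the MixColumns output 8e 4d a1 bc of the standard test column back to db 13 53 45.

```python
def xtime(a):
    """Multiply a byte by x (i.e. by 02) in GF(2^8) modulo 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09],
           [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B],
           [0x0B, 0x0D, 0x09, 0x0E]]


def inv_mix_column(column):
    """a = INV_MIX x b over GF(2^8), undoing MixColumns on one column."""
    out = []
    for row in INV_MIX:
        acc = 0
        for coef, byte in zip(row, column):
            acc ^= gmul(coef, byte)
        out.append(acc)
    return out


print([hex(b) for b in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
# ['0xdb', '0x13', '0x53', '0x45']
```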

4.4 AES key schedule

When describing AES, we mentioned that the sub-keys used in each round are computed from the original key using the AES key schedule.

More specifically, in AES the algorithm produces 128-bit sub-keys from an original key of 128, 192, or 256 bits. Rijndael's original algorithm, however, would compute sub-keys of the same size as the original key, which is useless for AES-192 and AES-256; the algorithm is therefore slightly different from Rijndael's, and we simply call it AES Key Expansion.

4.4.1 Rotation

The rotation 'Rt' is an 8-bit circular left rotation of a 32-bit word. So for the hexadecimal input 4e6a1c3b, the output is 6a1c3b4e.

4.4.2 Rcon

The Rcon function 'Rc' is what Rijndael calls the exponentiation of 2 to a user-specified value:

$$Rc(i) = x^{i-1} \bmod (x^8 + x^4 + x^3 + x + 1)$$

Its values can be read from the table in Figure 4.4.1. In the key expansion, Rc(i) is used as the 32-bit word whose first byte is this value and whose other three bytes are zero.
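Both helpers can be sketched in a few lines; the names `rc` and `rt` mirror the notation of the text, and `rc` returns only the non-zero first byte of the Rcon word. Repeated multiplication by x in GF(2^8) gives the values 01, 02, 04, ..., 80, 1b, 36 found in the Rcon table.

```python
def xtime(a):
    """Multiply a byte by x (i.e. by 02) in GF(2^8) modulo 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def rc(i):
    """Rc(i) = x^(i-1) mod (x^8 + x^4 + x^3 + x + 1), for i >= 1."""
    value = 1
    for _ in range(i - 1):
        value = xtime(value)
    return value


def rt(word):
    """Rt: 8-bit circular left rotation of a 32-bit word."""
    return ((word << 8) | (word >> 24)) & 0xFFFFFFFF


print([hex(rc(i)) for i in range(1, 11)])
print(hex(rt(0x4E6A1C3B)))  # 0x6a1c3b4e
```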

4.4.3 Substitution

The Substitution ’S’ uses the same S-box seen before in Figure 4.2.3

4.4.4 The Algorithm

Now that we have seen the main functions, we present the algorithm. The functions work on 32-bit words, and the expansion produces Nb × (Nr + 1) words: 44 words for a 128-bit key (10 rounds), 52 for a 192-bit key, and 60 for a 256-bit key.

What we need to do here is initialize the first Nk words and then start the Key Expansion. Let W be the table of words; the key of the initial round is W[0:4], i.e. the first 4 words of the original key.


Figure 4.4.1: AES Rcon Table

Algorithm 4.4.1. AES Key Expansion

INPUT: K: original key (4, 6, or 8 words for 128, 192, or 256 bits)
OUTPUT: W: table of words containing all the sub-keys

    Nb = 4
    Nk = len(K)                              // 4, 6, or 8 for 128, 192, or 256 bits
    Nr = Nk + 6                              // 10, 12, or 14 for 128, 192, or 256 bits
    i = 0
    While i < Nk:
        W[i] = K[i]
        i += 1
    While i < Nb ∗ (Nr + 1):
        temp = W[i − 1]
        If i mod Nk == 0:
            temp = S(Rt(temp)) ⊕ Rc(i / Nk)
        Else-If Nk == 8 and i mod Nk == 4:   // case of 256 bits
            temp = S(temp)
        W[i] = W[i − Nk] ⊕ temp
        i += 1
    Return W
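Algorithm 4.4.1 translates almost line by line into Python. In this sketch (helper names ours) the S-box is regenerated from its GF(2^8) definition so the block is self-contained, words are 32-bit integers with the first key byte as the most significant byte, and the result is checked against the AES-128 key expansion example of FIPS-197 (Appendix A.1).

```python
def xtime(a):
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


def sbox_entry(x):
    """S(x): inverse in GF(2^8) (x^254, 0 -> 0) followed by the affine transformation."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):
            inv = gmul(inv, x)
    rot = [((inv << s) | (inv >> (8 - s))) & 0xFF for s in range(1, 5)]
    return inv ^ rot[0] ^ rot[1] ^ rot[2] ^ rot[3] ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def sub_word(w):
    """S applied to each byte of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def rot_word(w):
    """Rt: 8-bit circular left rotation of a 32-bit word."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def rcon_word(i):
    """Rc(i) placed in the first (most significant) byte of a word."""
    r = 1
    for _ in range(i - 1):
        r = xtime(r)
    return r << 24


def key_expansion(key):
    nb, nk = 4, len(key) // 4
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    for i in range(nk, nb * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ rcon_word(i // nk)
        elif nk == 8 and i % nk == 4:          # case of 256 bits
            temp = sub_word(temp)
        w.append(w[i - nk] ^ temp)
    return w


W = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(W), hex(W[4]), hex(W[43]))  # 44 0xa0fafe17 0xb6630ca6
```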

4.5 Security

AES is an encryption algorithm, so naturally many security issues are related to it. For the purposes of our work, we only discuss two threats to the algorithm.


4.5.1 Why

When a substitution happens, bits are either changed entirely or flipped to save power. In both cases, the power used to do so can leak information about what is going on inside the algorithm and/or about the key itself.

4.5.2 First Substitution

Looking closely at the AES algorithm, we can see that the first substitution is the first operation of round 1, performed just after the round key is added in the initial round.

It is important to keep in mind that the key used in the initial round is the original key, and that the input of the first substitution is the original message XORed with this key.

Now consider N messages $m_i$ encrypted using a key K into ciphertexts $c_i$.

For each message, the substitution box works on each byte of the XORed message as follows: $s_{i,j} = S(m_{i,j} \oplus K^*_j)$, where $K^* = (K^*_1, K^*_2, \dots, K^*_{16})$ is a hypothesis on K.

An attacker can compute the HW of si,j for each message and each key-byte possibility (256 possibilities), which makes DPA a big threat to the AES first substitution.

The attacker can also use the HD between mi,j and si,j, which gives DPA another way in and puts CPA on the list of threats to the AES first substitution.

It is important to mention that, in practice, a DPA on the first substitution using the HW was tested without success. This does not mean that the attack does not work, because other factors come into play, such as the machine itself (see 9.1) and the fact that, at the time, the traces used were unreliable since they did not cover the whole encryption (see 9.2.3).

4.5.3 Last Substitution

The idea is the same, but this time we attack the last substitution to get the last sub-key. We then invert the key schedule from this sub-key to get the original key (this is not enough for AES-192 and AES-256).
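For AES-128, the last round key W[40:44] is enough to run Algorithm 4.4.1 backwards, since each step W[i] = W[i − 4] ⊕ temp can be solved for W[i − 4]. The sketch below (helper names ours; S-box regenerated from its GF(2^8) definition) recovers the FIPS-197 example key from its round-10 key.

```python
def xtime(a):
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p


def sbox_entry(x):
    """S(x): inverse in GF(2^8) (x^254, 0 -> 0) followed by the affine transformation."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):
            inv = gmul(inv, x)
    rot = [((inv << s) | (inv >> (8 - s))) & 0xFF for s in range(1, 5)]
    return inv ^ rot[0] ^ rot[1] ^ rot[2] ^ rot[3] ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def sub_word(w):
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def rot_word(w):
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def rcon_word(i):
    r = 1
    for _ in range(i - 1):
        r = xtime(r)
    return r << 24


def invert_last_round_key(last_round_key):
    """Recover the AES-128 key W[0:4] from the round-10 key W[40:44]."""
    w = [0] * 44
    for k in range(4):
        w[40 + k] = int.from_bytes(last_round_key[4 * k:4 * k + 4], "big")
    for i in range(43, 3, -1):                 # W[i-1] is always known before it is needed
        temp = w[i - 1]
        if i % 4 == 0:
            temp = sub_word(rot_word(temp)) ^ rcon_word(i // 4)
        w[i - 4] = w[i] ^ temp                 # from W[i] = W[i-4] xor temp
    return b"".join(x.to_bytes(4, "big") for x in w[:4])


print(invert_last_round_key(bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")).hex())
# 2b7e151628aed2a6abf7158809cf4f3c
```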

Having $c_{i,j}$, the j-th byte of the ciphertext $c_i$, we can compute:

$$s_{i,j} = \mathrm{InvSub}(\mathrm{InvShiftRows}(c_{i,j} \oplus K^*_j))$$

where $K^*$ is now a hypothesis on the last round key. As in the first substitution attack, in the last substitution attack we can use either the HW of $s_{i,j}$ or the HD between $s_{i,j}$ and $c_{i,j}$. It is important to mention that, in practice, we only tried attacks based on the HD.

Chapter 5

Differential Power Analysis

5.1 General Idea

As we saw in the previous chapter, AES can be attacked byte by byte at different states: before and after the first substitution, or at the last substitution.

Other algorithms can be attacked using the same idea, but we stay with AES; more specifically, we work on attacks exploiting the HD leakage between the state just before the last substitution and the ciphertext of AES-128.

In general, the attack partitions the traces according to their corresponding HD. We choose a key-byte hypothesis and, according to it, divide the traces into 2 groups: the first group contains the traces whose HD is lower than 4, and the second group contains the traces whose HD is greater than 4. We then calculate the average of each group, then the difference between these averages, and finally choose the best point in the resulting vector (the point farthest from zero). We compare the points collected for each guess: if the number of traces is sufficient, the point obtained for the right guess is higher than most of the other points, and in a good and efficient attack (with enough traces) it is the highest point. This yields one key byte; we repeat the process for the other bytes to find the whole key.

5.2 Evaluation

Since we have many attacks, and even different ways of executing them that give different results, we need a way to measure the effectiveness of each algorithm and compare it with the others.

The first evaluation that comes to mind is very basic: we are attacking key bytes, 16 of them in AES-128, so we can compare attack executions (algorithm + number of messages) by the number of key bytes that were correctly found. In other words, we count the number of times the most probable key-byte candidate was actually the correct key byte and divide this number by 16, giving an answer between 0 and 1. More about evaluations in Chapter 11.
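This first metric can be sketched as follows, assuming (our convention) that each attacked key byte yields a ranking of candidates with the most probable one first. In the usage example, 12 of the 16 rankings have the correct byte on top, so the efficiency is 12/16.

```python
def success_rate(rankings, true_key):
    """Fraction of key bytes whose top-ranked candidate is the correct byte."""
    hits = sum(1 for ranking, k in zip(rankings, true_key) if ranking[0] == k)
    return hits / len(true_key)


true_key = list(range(16))
rankings = [[k, 0xFF] for k in true_key]      # correct byte ranked first
for idx in range(4):
    rankings[idx] = [0xFF, true_key[idx]]     # correct byte only ranked second
print(success_rate(rankings, true_key))  # 0.75
```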


5.3 Notations

Many notations are used in the following sections, so we present them here:

N The number of messages (eventually ciphers and also traces).

P The precision (number of points per trace).

B The number of byte possibilities (which is 256).

PN The number of the key byte we are working on (between 0 and 15).

mid The middle of 0 and 8, which is 4.

C Table of ciphertexts; each element of this table contains a 16-byte word.

T Table of traces; each element of this table contains a trace of P points.

ISub The inverse SubBytes we saw in sub-section 4.3.1.

IShift The inverse ShiftRows we saw in sub-section 4.3.2.

LinAv Takes a table of vectors as its input and returns the average vector.

SpecialLinAv Returns the average vector of a table of vectors, but instead of dividing the sum by the length, it divides it by the second input.

DiffA Returns a vector containing the element-wise absolute difference of 2 vectors.

MultiplyEachValue Takes a matrix as its input and returns a copy of the matrix with all its values multiplied by the same float input.
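The vector helpers above can be sketched in plain Python. The snake_case names are our own renderings of LinAv, SpecialLinAv, DiffA and MultiplyEachValue; traces are assumed to be lists of floats of equal length P.

```python
def lin_av(vectors):
    """LinAv: element-wise average of a table of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def special_lin_av(vectors, divisor):
    """SpecialLinAv: element-wise sum divided by the given divisor instead of the count."""
    return [sum(col) / divisor for col in zip(*vectors)]


def diff_a(u, v):
    """DiffA: element-wise absolute difference of two vectors."""
    return [abs(a - b) for a, b in zip(u, v)]


def multiply_each_value(matrix, factor):
    """MultiplyEachValue: copy of the matrix with every value multiplied by factor."""
    return [[x * factor for x in row] for row in matrix]


print(diff_a(lin_av([[1, 2], [3, 4]]), [0, 5]))  # [2.0, 2.0]
```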

5.4 Different Ways With Different Results

In this section we study the different variations of the DPA algorithm; each gives a different (better or worse) result and takes a different (longer or shorter) time to finish. As before, the DPA algorithms exploit the HD leakage between the state just before the last substitution and the ciphertext of AES-128.

In all the following algorithms, we attack only one key-byte out of the 16 key-bytes inAES-128 since all parts are attacked in the same way.

Since we use the HD a lot, and since the HD between a and b is the HW of a ⊕ b, it is better to precompute a table giving the HW of every value between 0 and 255; we call this table HWS. Each element of this 256-entry table holds the HW of its index, for example HWS[5] = 2 (5 = 00000101 in binary).
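A possible way to build HWS in Python and to use it for the HD of two bytes (the names `HWS` and `hd` are illustrative):

```python
HWS = [bin(x).count("1") for x in range(256)]  # HWS[x] = HW of x


def hd(a, b):
    """Hamming distance between two bytes: the HW of a XOR b, read from HWS."""
    return HWS[a ^ b]


print(HWS[5])          # 2, since 5 = 0b00000101
print(hd(0x0F, 0xF0))  # 8, every bit differs
```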

The following methods were all tested successfully. Depending on the method used, the number of traces and the precision, the results differ a lot: a good choice can make the difference between finding the whole key and failing to find even one part of it.


5.4.1 Normal

The following is the original DPA algorithm. It is not as sophisticated as other DPA algorithms, but it keeps the running time to a minimum.

Algorithm 5.4.1. DPA on AES (Normal Method)

Initialize G, a table of vectors
j = 0
While j < B:                                    ▷ all possibilities
    Initialize G1 and G2 as empty tables of vectors
    Initialize avG1 and avG2 as empty vectors
    i = 0
    While i < N:                                ▷ all messages
        d = HWS[ISub(C[i][PN] ⊕ j) ⊕ IShift(C[i], PN)]
        If d < mid:
            Add T[i] to G1
        Else-If d > mid:
            Add T[i] to G2
        Add 1 to i
    If size(G1) is zero:
        Fill avG1 with P zeros
    Else:
        avG1 = LinAv(G1)
    If size(G2) is zero:
        Fill avG2 with P zeros
    Else:
        avG2 = LinAv(G2)
    temp = DiffA(avG1, avG2)
    Add temp to G
    Add 1 to j

When this algorithm finishes, we have the table of vectors G. We represent each vector by its best point (the one farthest from zero) and sort these points in decreasing order; the first point corresponds to the most probable candidate. Depending on the number of messages and points used, and on how the algorithm is executed, the true candidate is not always first, but most of the time it is among the first ones.
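A vectorized NumPy sketch of Algorithm 5.4.1 and of the ranking step follows. To stay self-contained, it assumes the predicted HDs are already arranged in a B × N matrix (for AES they would come from HWS and the ISub/IShift hypothesis); the toy data uses a random byte permutation in place of the AES S-box and leaks the value of one chosen key at a single point. The `include_middle` flag reproduces the NormalWithMiddle change of section 5.4.2. Names and data are hypothetical.

```python
import numpy as np

MID = 4  # mid: the middle of the HD range 0..8


def dpa_normal(traces, hd_matrix, include_middle=False):
    """Normal DPA (Algorithm 5.4.1) for one key byte.

    traces:    N x P array of traces (T).
    hd_matrix: B x N array; hd_matrix[j, i] is the predicted HD of message i for guess j.
    include_middle: also put HD == 4 traces into G1 (NormalWithMiddle, section 5.4.2).
    Returns G as a B x P array of absolute differences of the group averages.
    """
    traces = np.asarray(traces, dtype=float)
    n_points = traces.shape[1]
    G = np.zeros((len(hd_matrix), n_points))
    for j, d in enumerate(hd_matrix):
        low = d <= MID if include_middle else d < MID
        high = d > MID
        av_g1 = traces[low].mean(axis=0) if low.any() else np.zeros(n_points)
        av_g2 = traces[high].mean(axis=0) if high.any() else np.zeros(n_points)
        G[j] = np.abs(av_g1 - av_g2)
    return G


def rank_candidates(G):
    """Represent each guess by its point farthest from zero; sort in decreasing order."""
    scores = np.max(np.abs(G), axis=1)
    return [int(j) for j in np.argsort(-scores, kind="stable")]


# Toy data: a random byte permutation stands in for the AES S-box.
rng = np.random.default_rng(1)
perm = rng.permutation(256)
hw = np.array([bin(x).count("1") for x in range(256)])
msgs = rng.integers(0, 256, size=3000)
hd_matrix = hw[perm[msgs[None, :] ^ np.arange(256)[:, None]]]  # B x N
true_key = 0x2B
traces = rng.normal(0.0, 1.0, size=(3000, 10))
traces[:, 4] += 0.5 * hd_matrix[true_key]  # leak the true value at point 4
print(rank_candidates(dpa_normal(traces, hd_matrix))[0] == true_key)  # True
```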

Figure 5.4.1 shows how the effectiveness of the DPA changes with the number of traces used. In general, the more traces we have, the better the attack.


[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curve: Normal.]

Figure 5.4.1: DPA Efficiency (correct key-bytes), original function

5.4.2 Normal With Middle

The normal method loses some information: the traces with HD = 4 are discarded. To keep them, all we have to do is replace "If d < mid:" with "If d <= mid:", or replace "If d > mid:" with "If d >= mid:".

Figure 5.4.2 compares the NormalWithMiddle method with the Normal method. The results differ slightly, but we gain little with this method, especially since it takes much more time (see Figure 5.6.1).

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curves: Normal, NormalWithMiddle.]

Figure 5.4.2: DPA Efficiency (correct key-bytes), comparing 'Middle'


5.4.3 Normal Multiplier

While debugging and running the code, one notices that the cases where the HW is 0 or 8 are rare: the closer the HW is to 4, the more probable it is. This is because the HW is computed on values between 0 and 255, and the number of bytes with HW k is the binomial coefficient C(8, k), which gives the probabilities in Table 5.4.1.

HW      0      1      2       3       4       5       6       7      8
Proba   1/256  8/256  28/256  56/256  70/256  56/256  28/256  8/256  1/256

Table 5.4.1: HW occurrence probability
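The probabilities of Table 5.4.1 can be checked by counting: the number of bytes with HW k is the binomial coefficient C(8, k). A short sketch:

```python
from math import comb

counts = [0] * 9
for x in range(256):
    counts[bin(x).count("1")] += 1

for hw, count in enumerate(counts):
    assert count == comb(8, hw)  # number of bytes with HW = hw
    print(f"HW {hw}: {count}/256")
```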

However, we should not forget that the HW here represents an HD, and that an HD of 4 is the typical case in which half of the bits change. Our hypothesis (to be verified with results) is that the farther the HD is from 4, the more leakage the trace carries.

All of the above suggests increasing the weight of the traces whose HD is farther from 4. We can do so by multiplying these traces by f(HD), a function of the HD.

To work faster, it is better to precompute copies of the traces multiplied by each possible multiplier, which are 1, 2, 3 and 4 (the distance |d − mid| of a trace that is not discarded). So, before starting the attack, we add the following to the end of the preparation phase:

If method = "NormalMultiplier":
    temp = [t]                                  ▷ traces multiplied by one
    For i in [2, 4]:                            ▷ i = 2, 3, then 4
        Add [MultiplyEachValue(t, i)] to temp
    t = temp

Now the attacking algorithm becomes the following:

Algorithm 5.4.2. DPA on AES (Multiplication Method)

Initialize G, a table of vectors
j = 0
While j < B:                                    ▷ all possibilities
    Initialize G1 and G2 as empty tables of vectors
    Initialize avG1 and avG2 as empty vectors
    length1 = length2 = 0
    i = 0
    While i < N:                                ▷ all messages
        d = HWS[ISub(C[i][PN] ⊕ j) ⊕ IShift(C[i], PN)]
        If d < mid:
            temp = mid − d
            Add temp to length1 and t[temp − 1][i] to G1
        Else-If d > mid:
            temp = d − mid
            Add temp to length2 and t[temp − 1][i] to G2
        Add 1 to i
    If size(G1) is zero:
        Fill avG1 with P zeros
    Else:
        avG1 = SpecialLinAv(G1, length1)
    If size(G2) is zero:
        Fill avG2 with P zeros
    Else:
        avG2 = SpecialLinAv(G2, length2)
    temp = DiffA(avG1, avG2)
    Add temp to G
    Add 1 to j
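The multiplier methods differ only in the weight given to a trace whose HD lies at distance x = |d − mid| from the middle. The sketch below applies that weight with a dot product instead of the precomputed multiplied copies t[x − 1]; we expect this to give the same G, since SpecialLinAv divides the weighted sum by the sum of the weights (length1 or length2). The dictionary keys reuse the method names of the text; the function itself is hypothetical.

```python
import numpy as np

MID = 4

# Weight f(x) for a trace whose HD d lies at distance x = |d - mid| (x = 1..4)
WEIGHTS = {
    "NormalMultiplier": lambda x: x,
    "PowerEMultiplier": lambda x: 2 ** x,
    "PowerLMultiplier": lambda x: x ** 2,
}


def dpa_weighted(traces, hd_matrix, method="NormalMultiplier"):
    """Weighted DPA (Algorithm 5.4.2 and its variants) for one key byte; returns B x P."""
    traces = np.asarray(traces, dtype=float)
    f = WEIGHTS[method]
    n_points = traces.shape[1]
    G = np.zeros((len(hd_matrix), n_points))
    for j, d in enumerate(np.asarray(hd_matrix)):
        dist = np.abs(d - MID)
        w = np.where(dist > 0, f(dist), 0).astype(float)  # HD == 4 gets weight 0
        low, high = d < MID, d > MID
        length1, length2 = w[low].sum(), w[high].sum()
        # SpecialLinAv: weighted sum divided by the sum of the weights
        av_g1 = w[low] @ traces[low] / length1 if length1 else np.zeros(n_points)
        av_g2 = w[high] @ traces[high] / length2 if length2 else np.zeros(n_points)
        G[j] = np.abs(av_g1 - av_g2)
    return G
```

For instance, with three one-point traces [1], [3], [10] whose HDs are 3, 0 and 8, NormalMultiplier gives |(1·1 + 4·3)/5 − 10| = 7.4.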

Figure 5.4.3 shows that the NormalMultiplier method is always better than the Normal one. At around 5000 traces the efficiency reaches 1, which means that the whole key was successfully found.

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curves: Normal, NormalMultiplier.]

Figure 5.4.3: DPA Efficiency (correct key-bytes), comparing 'Multiply'


5.4.4 PowerE Multiplier

Expecting a big difference in results from testing the same algorithm with small changes may sound like a bad idea, but in DPA it is not. Here we do almost the same as in "NormalMultiplier", but this time we use a stronger multiplication, 2^x.

First we need to add the following to the end of the preparation phase:

If method = "PowerEMultiplier":
    temp = []                                   ▷ empty list
    For i in [1, 4]:                            ▷ i = 1, 2, 3, then 4
        Add [MultiplyEachValue(t, 2 ** i)] to temp
    t = temp

Now, from ”NormalMultiplier”, we replace:

- ”Add temp to length1” with ”Add 2 ** temp to length1”

- ”Add temp to length2” with ”Add 2 ** temp to length2”

Figure 5.4.4 shows that this method is almost perfect for any attacker who has more than 2000 traces.

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curves: Normal, PowerEMultiplier.]

Figure 5.4.4: DPA Efficiency (correct key-bytes), comparing 'PowerE'

5.4.5 PowerL Multiplier

Since the code is already written, the changes are very small, and the computations run on a computer built for non-stop work, why not try other small changes?

Same as "PowerEMultiplier", but here we multiply by x^2:

So for the preparation phase we add:

If method = "PowerLMultiplier":
    temp = [t]                                  ▷ traces multiplied by 1^2
    For i in [2, 4]:                            ▷ i = 2, 3, then 4
        Add [MultiplyEachValue(t, i ** 2)] to temp
    t = temp

And in the computation phase we replace:

- ”Add temp to length1” with ”Add temp ** 2 to length1”

- ”Add temp to length2” with ”Add temp ** 2 to length2”

Looking at Figure 5.4.5 and Figure 5.4.4, we see that the 2 algorithms give different results but with almost the same efficiency.

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curves: Normal, PowerLMultiplier.]

Figure 5.4.5: DPA Efficiency (correct key-bytes), comparing 'PowerL'

5.4.6 HDRefinery Multiplier

We tried different multipliers, but when the multiplier is too high the DPA becomes worse. We suppose that one multiplier would be the ideal one, and since we are relating the multiplier to the HD probability, why not use this probability directly? We expected this method to be the best so far.

For example, the probability of HD = 0 is 1/256, so we keep the corresponding traces as they are, while for HD = 1 the probability is 8/256, so we divide the values in those traces by 8, and so on.

All we have to do is to add the following to the preparation phase:

If method = "NormalWithHD":
    temp = []                                   ▷ empty list
    multipliers = [1.0/256, 1.0/28, 1.0/8, 1.0]
    For i in [0, 3]:                            ▷ i = 0, 1, 2, then 3
        Add [MultiplyEachValue(t, multipliers[i])] to temp
    t = temp

And make the following changes in the computation phase:

- Replace ”Add temp to length1” with ”Add multipliers[temp− 1] to length1”

- Replace ”Add temp to length2” with ”Add multipliers[temp− 1] to length2”
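Following the rule stated in the text (divide a trace by the number of byte values that have its HD, so that HD = 0 keeps weight 1 and HD = 1 is divided by 8), the weight for each distance x = |HD − 4| can be derived from the binomial counts. This rule gives 1/56 for HD = 3 or 5, which differs from the first entry (1.0/256) of the multipliers list above. The function name is hypothetical.

```python
from math import comb


def hd_refinery_weight(x):
    """Weight for a trace whose HD is at distance x = |HD - 4| from the middle (1 <= x <= 4)."""
    # HD = 4 - x and HD = 4 + x each occur for comb(8, 4 - x) of the 256 byte values;
    # HD = 0 (x = 4) occurs once and keeps weight 1.
    return 1.0 / comb(8, 4 - x)


weights = [hd_refinery_weight(x) for x in range(1, 5)]
print(weights == [1 / 56, 1 / 28, 1 / 8, 1.0])  # True
```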


Surprisingly, this method was not better than the other multiplication methods; sometimes it was even worse. Even if the theory is correct, practice is not a perfect world. The method may still be the best, but we cannot see it because we test our attacks on the same key, and the results always depend on every little detail. For a fair comparison, we could test a good number of keys and compute the average success rate: the more keys, the better.

5.5 Totally Different DPA

Building on the idea of multiplying the traces depending on the HD, the comparison can be made more local and focused. Here we deal with each HD separately and combine it with the opposite HD. Instead of computing the average of all traces whose HD is less than 4, the same for HD greater than 4, and then their difference, we compute the average of the traces whose HD equals 0, and likewise for HD equal to 1, 2, 3, 5, 6, 7 and 8. Afterwards, we compute the difference between the average for HD equal to 0 and that of its opposite, HD equal to 8. We do the same for the pairs 1-7, 2-6 and 3-5.

In addition, as before, we can use a multiplier depending on the HD: if we call this method collect, we end up with collect, collect and multiply, collect and PowerE, and collect and PowerL.

A simple multiply would look like the following:

$$G = \frac{1}{10}\left[4(avs_8 - avs_0) + 3(avs_7 - avs_1) + 2(avs_6 - avs_2) + 1(avs_5 - avs_3)\right]$$

where avs_i is the average of the traces whose HD equals i. We divide by 10 because it is the sum of the multipliers used: 1 + 2 + 3 + 4 = 10.

Another condition in the algorithm: if, for example, avs_8 is undefined because S_8 is empty (no traces with HD equal to 8), then we do not use its opposite avs_0 either, and we divide by the sum of the remaining multipliers, without the one that multiplied avs_8 and avs_0: 1 + 2 + 3 = 6.
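The collect-and-multiply combination, including the rule for empty groups, could look like this in NumPy. This is a sketch for one key guess; `dpa_collect` and its arguments are hypothetical, and the per-pair weights default to the 1, 2, 3, 4 of the equation for G.

```python
import numpy as np


def dpa_collect(traces, hd, weights=(1, 2, 3, 4)):
    """'Collect and multiply' DPA for one key guess (section 5.5).

    traces: N x P array; hd: length-N array of predicted HDs for this guess.
    weights[k - 1] multiplies the pair (avs_{4+k} - avs_{4-k}), k = 1..4.
    A pair with an empty group is skipped and its weight left out of the divisor.
    """
    traces = np.asarray(traces, dtype=float)
    hd = np.asarray(hd)
    total = np.zeros(traces.shape[1])
    divisor = 0.0
    for k, w in enumerate(weights, start=1):
        upper, lower = hd == 4 + k, hd == 4 - k
        if not upper.any() or not lower.any():
            continue  # e.g. S_8 empty: drop avs_8 and avs_0
        total += w * (traces[upper].mean(axis=0) - traces[lower].mean(axis=0))
        divisor += w
    return total / divisor if divisor else total
```

Passing `weights=(1, 1, 1, 1)` gives the plain collect method.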

After testing the attack with the different multiplication methods on this collective idea, no improvement was found. In many random cases the attack even gave weaker results than the normal method. Our expectation that this idea would work better than normal DPA was wrong, at least in our particular case (device, encryption algorithm, leakage model...). This does not mean that the idea cannot outperform a normal attack in other settings. We did not compute statistics over these attacks, since they were not as valuable as expected.

5.6 Timing

It is important that the attacks work, but also that they do so in acceptable time. Consider, for example, an attack that takes much more time than the others and is only slightly more effective. This attack is valuable when we have a limited number of traces, because then we are forced to accept its longer running time. But if we can obtain enough traces for the other attacks, this attack becomes needlessly long.


Put more simply: suppose Attack A needs 5000 traces and takes 1 day to finish, while Attack B needs 7000 traces and takes 1 hour. Then A is better than B if we are limited to 5000 traces, but if we can obtain 7000 traces, B is the better attack.

When we talk about execution time, it is important to know the power of the machine and the software used, since they have a great effect on it. In our case we wrote the code in Python (a scripting language), which offers great flexibility and extensibility but is much slower than compiled languages such as C. The computer running the programs is a GenuineIntel machine supporting 32-bit and 64-bit architectures, with 8 CPUs, each running at up to 3.6 GHz (divided into 2 threads of 1.8 GHz). In addition, the computer has 6 TB of storage (2 of which are reserved for the system) and a single 12 GB RAM module.

The traces used for the timing measurements had 1022 points each; these points are floats representing the power consumption.

Figure 5.6.1 compares the time needed to attack one key-byte. We should not forget that in some cases, even to attack a single key-byte, the preparation is done for all 16 key-bytes. We group the methods based on multiplications (Multiply, PowerE, PowerL and HDRefinery) because they take almost the same time: they have a similar preparation phase and a similar attacking algorithm. The Normal method takes less time since it has fewer preparations and computations, while NormalWithMiddle takes more time than any other algorithm, since many more traces are added during the computations (traces with HD = 4 are more probable than any other).

[Plot: time in seconds (0 to 200) versus number of traces (0 to 5,000); curves: Normal, Normal With Middle, Multiplication Methods.]

Figure 5.6.1: DPA Timing of one key-byte Attack

Chapter 6

Correlation Power Analysis

6.1 General Idea

Compared to DPA, CPA is more sophisticated, and a system that can be attacked using DPA can normally also be attacked using CPA with even fewer traces.

While DPA compares traces, CPA goes further and compares points inside these traces. And while DPA sorts traces according to the corresponding HD, CPA computes the correlation with the HD.

Also, a problem that sometimes occurs with DPA, known as the ghost problem, never happens with CPA. A ghost is a key-byte candidate that seems to be the right answer but is not. More about this problem and about CPA in [BCO04].

6.2 From DPA to CPA

Evaluation methods are not tied to one attack or algorithm; they are general methods. Therefore, we use the same evaluation method as for DPA, described in section 5.2. More about evaluation in Chapter 11.

Notations used in DPA (section 5.3) are the same for CPA.

Table HWS_4 proved useful in CPA, so what is it? Take the HWS table used in DPA and replace each value x in it with x − 4. This gives a table of values between −4 and 4, in which every element that had HW equal to 4 now has the value zero; zero thus occurs with probability 70/256 instead of 1/256 (see Table 5.4.1 in section 5.4.3). This is very useful, since a zero lets us skip many computations. Is this allowed, and doesn't it change the values we compute? The answer to both questions is yes: the values do change, but since we compute correlations and the same constant is subtracted from every HW, the final result is unchanged. Looking at the new table HWS_4, we also see that its values are antisymmetric: the last value can be obtained by multiplying the first by −1, and so on, i.e. HWS_4[255 − x] = −HWS_4[x], because HW(255 − x) = 8 − HW(x). This means that we can reduce the 256-value table to 128 values and obtain the value of any element greater than or equal to 128 by negating the value stored at 255 − x. We will see how this is done, halving the size of the table and speeding up access to it.
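The following sketch builds HWS_4, checks the antisymmetry that allows storing only 128 values, and illustrates numerically that subtracting 4 from every hypothesis leaves Pearson's correlation unchanged. The names and the synthetic leakage are illustrative assumptions.

```python
import numpy as np

HWS = [bin(x).count("1") for x in range(256)]
HWS_4 = [hw - 4 for hw in HWS]  # values in -4..4; HW = 4 becomes 0
HALF = HWS_4[:128]              # only the first half needs to be stored


def hws4(x):
    """HWS_4 lookup using the half table: HW(255 - x) = 8 - HW(x)."""
    return HALF[x] if x < 128 else -HALF[255 - x]


assert all(hws4(x) == HWS_4[x] for x in range(256))

# Subtracting the constant 4 does not change Pearson's correlation.
rng = np.random.default_rng(0)
values = rng.integers(0, 256, 500)
h = np.array([HWS[v] for v in values], dtype=float)
leak = h + rng.normal(0.0, 1.0, 500)
r_hws = np.corrcoef(h, leak)[0, 1]
r_hws4 = np.corrcoef(h - 4, leak)[0, 1]
print(np.isclose(r_hws, r_hws4))  # True
```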

6.3 Different Correlation Algorithms

The two best-known ways to compute a correlation are Pearson's coefficient (PPMCC) and Spearman's rank correlation (SR). In the following we present both algorithms for an attack on one key-byte of AES; the other 15 bytes are attacked in the same way to obtain the full key.

Even though the following algorithms attack the HD of the last substitution in AES, they can easily be adapted to attack the HD of the first substitution, or even to other encryption algorithms.

6.3.1 Using Pearson

Using Pearson's formula directly would lead to many unnecessary computations and loops. In the following we show Pearson's function after optimization and explain the parts that are not obvious.

As in DPA, we first initialize an empty list G in which we put all the resulting vectors, so that afterwards a point can be selected in each vector to represent it in what we can call the "Organized Byte Candidates".

In addition, before running Pearson's, we create a table called 'grouped' that pairs each cipher-text with its corresponding trace. This lets us iterate over both together, which is faster than iterating over two tables (ciphers and traces) by index. Each element of grouped is a list of two values: the first (index zero) is the cipher-text (a list of 16 integers between 0 and 255), and the second (index one) is the trace (a vector of floats).

Algorithm 6.3.1. CPA using Pearson's

j = 0
While j < B:                                    ▷ all possibilities
    GTemp = EmptyList()                         ▷ temporary 'G' vector
    SH = 0.0                                    ▷ will contain the sum of HD
    SH2 = 0.0                                   ▷ will contain the sum of squared HD
    SC = EmptyList()                            ▷ sums of equivalent points
    SC2 = EmptyList()                           ▷ sums of squared equivalent points
    SCH = EmptyList()                           ▷ sums of HD × point products
    p = 0
    While p < P:                                ▷ all points
        Add [0.0] to SC                         ▷ fill SC with zeros
        Add [0.0] to SC2                        ▷ fill SC2 with zeros
        Add [0.0] to SCH                        ▷ fill SCH with zeros
        Add 1 to p
    For g in grouped:                           ▷ iterate by value
        temp = ISub(g[0][PN] ⊕ j) ⊕ IShift(g[0], PN)
        If temp < 128:
            h = HWS_4[temp]
        Else:
            h = −HWS_4[255 − temp]
        If h not zero:
            Add h to SH
            Add power(h, 2) to SH2
            p = 0
            While p < P:
                c = g[1][p]
                Add c to SC[p]
                Add power(c, 2) to SC2[p]
                Add c ∗ h to SCH[p]
                Add 1 to p
        Else:                                   ▷ h is zero, so fewer computations
            p = 0
            While p < P:
                c = g[1][p]
                Add c to SC[p]
                Add power(c, 2) to SC2[p]
                Add 1 to p
    p = 0
    While p < P:
        temp = absolute(((N ∗ SCH[p]) − (SC[p] ∗ SH)) /
               (sqrt((N ∗ SC2[p]) − power(SC[p], 2)) ∗ sqrt((N ∗ SH2) − power(SH, 2))))
        Add [temp] to GTemp
        Add 1 to p
    Add [GTemp] to G
    Add 1 to j
    Advance                                     ▷ tell the main thread that one possibility was done
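With NumPy, the one-pass sums of Algorithm 6.3.1 can be computed for all guesses and all points at once. This sketch assumes the HWS_4 hypotheses are already arranged in a B × N matrix, and it uses a random byte permutation as a toy stand-in for the AES S-box so that it runs without an AES implementation; names and data are hypothetical.

```python
import numpy as np


def cpa_pearson(traces, hyp):
    """|Pearson correlation| between every guess and every trace point.

    traces: N x P array; hyp: B x N array of HWS_4 hypotheses (one row per guess).
    Uses the one-pass sums of Algorithm 6.3.1 (SH, SH2, SC, SC2, SCH).
    Returns G as a B x P array.
    """
    T = np.asarray(traces, dtype=float)
    H = np.asarray(hyp, dtype=float)
    n = T.shape[0]
    SC, SC2 = T.sum(axis=0), (T ** 2).sum(axis=0)  # per point
    SH, SH2 = H.sum(axis=1), (H ** 2).sum(axis=1)  # per guess
    SCH = H @ T                                    # B x P sums of h * c
    num = n * SCH - np.outer(SH, SC)
    den = np.outer(np.sqrt(n * SH2 - SH ** 2), np.sqrt(n * SC2 - SC ** 2))
    return np.abs(num / den)


# Toy data: a random byte permutation stands in for the AES S-box.
rng = np.random.default_rng(2)
perm = rng.permutation(256)
hws_4 = np.array([bin(x).count("1") - 4 for x in range(256)])
msgs = rng.integers(0, 256, 2000)
hyp = hws_4[perm[msgs[None, :] ^ np.arange(256)[:, None]]]  # B x N
traces = rng.normal(0.0, 1.0, (2000, 20))
traces[:, 3] += hyp[0x3C]                                   # leak at point 3
G = cpa_pearson(traces, hyp)
print(int(np.argmax(G.max(axis=1))))  # 60, i.e. 0x3C
```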

6.3.2 Using Spearman

As with Pearson's, some changes were made to Spearman's function to make it faster, as shown in the following algorithm.


For Spearman's we also need to initialize G, for the same reasons as for Pearson's. Unlike Pearson's, however, 'grouped' is not used, since we never compute the HD and look up the points of the corresponding trace at the same time.

Algorithm 6.3.2. CPA using Spearman's

j = 0
While j < B:                                    ▷ all possibilities
    GTemp = EmptyList()                         ▷ temporary 'G' vector
    H = EmptyList()                             ▷ will contain a vector made of the HD
    MH = 0.0                                    ▷ will contain the average of the HD in H
    HMH = EmptyList()                           ▷ differences between the HD in H and MH
    SH2 = 0.0                                   ▷ sum of squared differences in HMH
    For c in ciphers:                           ▷ iterate ciphers one after the other
        temp = ISub(c[PN] ⊕ j) ⊕ IShift(c, PN)
        If temp < 128:
            h = HWS_4[temp]
        Else:
            h = −HWS_4[255 − temp]
        Add [h] to H and h to MH
    MH = MH / N
    If MH is zero:
        HMH = H
        For h in H:
            Add power(h, 2) to SH2
    Else:
        For h in H:
            temp = h − MH
            Add [temp] to HMH and power(temp, 2) to SH2
    p = 0
    While p < P:                                ▷ all points
        C = EmptyList()
        MC = 0.0                                ▷ will contain the average of the points in C
        For t in traces:                        ▷ all traces
            temp = t[p]
            Add [temp] to C and temp to MC
        MC = MC / N
        SC2 = 0.0
        SCH = 0.0
        l = 0
        While l < N:                            ▷ all traces
            temp = C[l] − MC
            Add power(temp, 2) to SC2
            Add temp ∗ HMH[l] to SCH
            Add 1 to l
        temp = absolute(SCH / sqrt(SC2 ∗ SH2))
        Add [temp] to GTemp
        Add 1 to p
    Add [GTemp] to G
    Add 1 to j
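As listed, Algorithm 6.3.2 subtracts the means (MH, MC) and then computes SCH / sqrt(SC2 × SH2), which is the mean-centred (two-pass) form of Pearson's coefficient rather than a rank correlation; this is consistent with the overlapping curves of Figure 6.4.1. The sketch below checks numerically that the two forms agree. A rank-based Spearman coefficient would instead apply Pearson's formula to the ranks of the values (for example `scipy.stats.spearmanr`). Function names are hypothetical.

```python
import numpy as np


def pearson_one_pass(h, c):
    """Sums-based form, as in Algorithm 6.3.1."""
    n = len(h)
    num = n * np.dot(h, c) - h.sum() * c.sum()
    den = np.sqrt(n * np.dot(c, c) - c.sum() ** 2) * np.sqrt(n * np.dot(h, h) - h.sum() ** 2)
    return abs(num / den)


def pearson_two_pass(h, c):
    """Mean-centred form, as in Algorithm 6.3.2 (HMH, SH2, SC2, SCH)."""
    hmh = h - h.mean()
    cmc = c - c.mean()
    return abs(np.dot(cmc, hmh) / np.sqrt(np.dot(cmc, cmc) * np.dot(hmh, hmh)))


rng = np.random.default_rng(3)
h = rng.integers(-4, 5, 1000).astype(float)  # HWS_4-like hypotheses
c = 0.3 * h + rng.normal(0.0, 1.0, 1000)     # one trace point per message
print(np.isclose(pearson_one_pass(h, c), pearson_two_pass(h, c)))  # True
```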

6.4 Methods Comparison

6.4.1 Correct Parts Number

As for the DPA methods, we compute the number of key-bytes that were successfully attacked (the correct key-byte candidate is at the top of the list 'G').

Note that in Figure 6.4.1 the curves of the two methods overlap. This is expected: even though the 2 methods work differently and take different amounts of time, their results are exactly the same.

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curves: Spearman, Pearson.]

Figure 6.4.1: CPA Efficiency (correct key-bytes)


6.4.2 Timing

As for DPA, we compare for CPA the time taken to prepare the attack and launch it on one key-byte.

Figure 6.4.2 shows that Pearson's beats Spearman's when it comes to speed. The difference is small but clear, and it grows as the number of traces rises, because Spearman's has more loops and computations than Pearson's.

[Plot: time in seconds (0 to 800) versus number of traces (0 to 5,000); curves: Spearman, Pearson.]

Figure 6.4.2: CPA Timing of one key-byte Attack

Chapter 7

Template Attacks

In this chapter we study TMPA; more specifically, TMPA that use neither the plain-text nor the cipher-text.

7.1 General Idea

The idea of the attack can be guessed from its name: as for any kind of template, in TMPA we need to build a template database and use it in the attack.

For example, in data mining, creating templates would mean using a huge amount of data to derive a piece of information such as "90% of people who drink dairy products live long". This information is derived from a controlled database in which each piece of data is precise and certain. Afterwards, we use this information to characterize unknown subjects: for example, for a person who drinks dairy products, we can say that he has a 9/10 chance to live long.

The same is done in TMPA: we create templates from a large number of traces for the different possibilities. Then we use these templates to see where a trace fits best and to give the probability that the trace was produced with one key rather than another.

In this phase we work on a controlled device for which we know the input and the output at each moment. We try the different key possibilities for each key-byte with the same list of messages, then create our templates from the traces and the messages/cipher-texts.

Algorithm 7.2.1. Templates Profiling

• First we need to get the controlled traces, i.e. traces obtained with known keys. We did so as follows (this part has a problem in the way the keys are generated, explained in section 7.4):

    Declare null = 0 or any value in (0...255)
    Declare N = 1000                              ▷ number of messages (around 1000)
    Declare Traces[256 ∗ N]                       ▷ empty table
    Possibilities = [0...255]                     ▷ all the possibilities for a byte
    Messages = N ∗ randMes(16)                    ▷ N random messages of 16 bytes
    For i in (0...255):                           ▷ all needed keys
        Keys[i] = Possibilities[i] + [null] ∗ 15  ▷ only the attacked byte changes
    For each key in Keys:
        For each message in Messages:
            Encrypt the 'message' using the 'key'
            Save the resulting power trace in Traces

• Now we need to prepare the templates, which is done as follows, where t_{i,j} is the j-th trace recorded with the i-th possibility:

    For i in (0...255):                           ▷ all the possibilities for a byte
        $m_i = \frac{1}{N}\sum_{j=1}^{N} t_{i,j}$                              ▷ mean of the i-th possibility
        $C_i = \frac{1}{N-1}\sum_{j=1}^{N} (t_{i,j} - m_i)(t_{i,j} - m_i)^T$   ▷ covariance matrix of the i-th possibility
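A minimal NumPy sketch of the template-building step, assuming the profiling traces are already grouped by key-byte value in a dictionary (an assumed input format, with a hypothetical function name); `np.cov` with `rowvar=False` uses the same 1/(N − 1) normalization as C_i.

```python
import numpy as np


def build_templates(traces_by_value):
    """Mean vector m_i and covariance matrix C_i for each key-byte value i.

    traces_by_value: dict {i: N_i x n array of profiling traces recorded with value i}.
    np.cov(..., rowvar=False) treats rows as observations and divides by N_i - 1.
    """
    templates = {}
    for i, t in traces_by_value.items():
        t = np.asarray(t, dtype=float)
        templates[i] = (t.mean(axis=0), np.cov(t, rowvar=False))
    return templates
```

With 1000 points per trace, each C_i is a 1000 × 1000 matrix, which is why reducing the number of points (Chapter 8) matters here.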

7.3 Attacking

Now that we have the templates, we need to use them on a trace 't' (or a group of traces 'T') obtained from a similar device with an unknown key.

7.3.1 Probability of Correspondence

We start by computing the probability of correspondence:

Algorithm 7.3.1. Templates Probability of Correspondence


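$$\Pr(t \mid m_i, C_i) = \frac{1}{\sqrt{(2\pi)^n \lvert C_i \rvert}} \exp\left(-\frac{1}{2}(m_i - t)^T C_i^{-1} (m_i - t)\right)$$

where n is the number of points per trace and |C_i| is the determinant of C_i.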
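In practice this density is usually evaluated in the log domain, since the exponential underflows quickly when n is large. This sketch (function name hypothetical) uses `np.linalg.slogdet` for log|C_i| and `np.linalg.solve` instead of explicitly inverting C_i.

```python
import numpy as np


def log_template_pdf(t, mean, cov):
    """log Pr(t | m_i, C_i) of the multivariate normal template."""
    diff = np.asarray(t, dtype=float) - mean
    n = diff.size
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    maha = diff @ np.linalg.solve(cov, diff)  # (m_i - t)^T C_i^{-1} (m_i - t)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + maha)
```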

7.3.2 Key-Part Probability

Using Bayes' theorem we compute the probability of each possibility. The computation differs slightly between the case in which we have only one trace and the case in which we have many traces.

7.3.2.1 Single Trace

If we have only one trace 't', we proceed as follows:


Algorithm 7.3.2. Templates Probability of Trace ’t’


$$\Pr(k_i \mid t) = \frac{p(t \mid k_i)\, p(k_i)}{\sum_{l=1}^{256} p(t \mid k_l)\, p(k_l)}$$

7.3.2.2 Multiple Traces

In case a group of traces ’T’ was available, we can do the following:

Algorithm 7.3.3. Templates Probability of Group of Traces ’T’


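$$\Pr(k_i \mid T) = \frac{\left(\prod_{x=1}^{D} p(t_x \mid k_i)\right) p(k_i)}{\sum_{l=1}^{256} \left(\prod_{x=1}^{D} p(t_x \mid k_l)\right) p(k_l)}$$

where D is the number of traces in T.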
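Both formulas can be evaluated from per-trace log-likelihoods (for instance computed from the templates): the product over traces becomes a sum of logs, and the normalization is done with the log-sum-exp trick. This is a sketch with a hypothetical function name; with a uniform prior p(k_i) = 1/256 the prior cancels and can be omitted.

```python
import numpy as np


def key_posterior(log_likelihoods, log_prior=None):
    """Pr(k_i | T) for the 256 candidates from per-trace log-likelihoods.

    log_likelihoods: D x 256 array with entry [x, i] = log p(t_x | k_i);
    a single trace (D = 1) may be given as a length-256 vector.
    log_prior: optional length-256 vector of log p(k_i); None means uniform.
    """
    scores = np.atleast_2d(np.asarray(log_likelihoods, dtype=float)).sum(axis=0)
    if log_prior is not None:
        scores = scores + np.asarray(log_prior, dtype=float)
    scores -= scores.max()  # log-sum-exp shift: avoids underflow in exp
    posterior = np.exp(scores)
    return posterior / posterior.sum()
```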

7.3.3 Key Finding

As in the other attacks, this attack ranks the key candidates in a table of 256 rows and 16 columns. Each column holds the 256 possibilities for one key-byte, ordered from the highest to the lowest probability as computed with the algorithm we just saw.

We can then either use brute force to get the key or, in some cases, the key is simply the combination of the 16 highest candidates (the highest candidate in each column).

7.4 Results

Unfortunately, the attack was not successful. We made many changes to the algorithm and finally identified what might be the problem. Since we found no problem in the code, we turned to the tracing. There we saw that the choice of keys was probably wrong (a misunderstanding of the articles): instead of leaving the non-attacked key-bytes fixed, we were supposed to always randomize them.

Once this was identified as the problem, there was not enough time to redo all the traces. Still, since the code is an exact implementation of what has already been tested and shown to work, we are almost sure that fixing this problem would make the templates work.

Chapter 8

Improvements

After working with the algorithms, and as expected, TMPA take more preparation time than the other attacks, and CPA takes more running time than DPA since it has more loops and is more sophisticated. Even after all the algorithmic improvements, the attacks still take a significant amount of time. Since the loops are already kept to a minimum and the algorithms are almost optimized (reduced code and work), the improvement that comes to mind is to reduce the work by reducing the number of points (precision) to be processed. For TMPA this is not just an improvement but a necessity, since computing with large matrices such as 1000 × 1000 covariance matrices is very slow and often impractical.

Another way to reduce the work is to reduce the number of messages to process. Of course, reducing the number of messages and the number of points can also be combined to optimize the attack.

We should not forget that some of these improvements, while gaining a lot of speed, lose some information, since points and/or messages are removed, replaced or combined. This may make the attack a little less effective, but a lot faster. And sometimes, if the number of messages is unlimited, we can use more messages and fewer points to obtain the same result in less time.

8.1 Correct Traces Shifting

During trace capture, it is normal for the tracing machine, the SASEBO, to malfunction a little. This can be caused by overheating, changes in the power supply, or any other kind of physical malfunction. If this happens, or more accurately, when this happens, the traces are slightly shifted to the left or to the right. So if we want our traces to be more accurate, we need to correct this shifting.

The best way to do so is to fix one of the traces in place and correct the shifting of all the other traces by comparing each of them with the fixed one. The comparison consists in computing the average variance (mean squared difference) between the fixed trace and the treated one for each candidate shift: where this average is minimal, the traces are most alike, which means that they are synchronized.

Figure 8.1.1 shows that the shifting correction has a small effect on the normal DPA method: sometimes positive, sometimes negative.
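A minimal sketch of this correction, assuming a maximum shift window and comparing only the central samples that every candidate shift covers. The names and the window size are assumptions; `np.roll` wraps samples around the ends, which a real implementation might prefer to replace with padding.

```python
import numpy as np


def best_shift(ref, trace, max_shift):
    """Shift s minimising the mean squared difference between ref[k] and trace[k + s]."""
    ref = np.asarray(ref, dtype=float)
    trace = np.asarray(trace, dtype=float)
    n = ref.size
    core = ref[max_shift:n - max_shift]  # samples covered by every candidate shift
    errors = {
        s: np.mean((core - trace[max_shift + s:n - max_shift + s]) ** 2)
        for s in range(-max_shift, max_shift + 1)
    }
    return min(errors, key=errors.get)


def align(ref, trace, max_shift=10):
    """Shift `trace` onto the fixed trace `ref`; returns (aligned trace, shift)."""
    s = best_shift(ref, trace, max_shift)
    return np.roll(np.asarray(trace, dtype=float), -s), s
```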


[Plot: efficiency (between 0 and 1) versus number of traces (0 to 4,000); curves: Not Corrected, Corrected.]

Figure 8.1.1: DPA Normal Shifting Comparison

Figure 8.1.2 shows that with the DPA PowerL Multiplier method the difference is smaller than for the normal DPA method. The reason might be that the PowerL method is more sophisticated, which makes it less sensitive to the shifting problem.

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 3,500).]

Figure 8.1.2: DPA PowerL Multiplier Shifting Comparison

The result in Figure 8.1.3 was not expected: most of the time, CPA using corrected traces was slightly less effective than CPA using the traces as they are. The results were expected to improve after the shifting correction, which is not the case. Still, this might be due to chance rather than a general behavior; verifying it would require a large amount of statistical results, which was beyond our reach for lack of time and resources.


[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000).]

Figure 8.1.3: CPA Shifting Comparison

8.2 Traces Filtering

Traces filtering speeds up the work but loses information, which decreases the effectiveness.

When we say Filter-x, we mean that after computing the average of each trace and the average over all traces, we ignore all traces whose average lies between avs − x × sd and avs + x × sd, where avs is the average over all traces and sd is the standard deviation of the traces.

Another filtering idea, based on the same average and standard deviation but filtering more, is double filtering: Filter-x-y keeps only the traces whose average lies between avs − y × sd and avs − x × sd, or between avs + x × sd and avs + y × sd.
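The two filters could be sketched as below. The text does not say exactly how sd is computed; this sketch assumes avs and sd are the mean and standard deviation of the per-trace averages, and the function name is hypothetical.

```python
import numpy as np


def filter_traces(traces, x, y=None):
    """Indices of the traces kept by Filter-x (y=None) or Filter-x-y.

    Assumption: avs and sd are the mean and standard deviation of the per-trace averages.
    Filter-x drops traces whose average is within x*sd of avs;
    Filter-x-y additionally drops those at y*sd or more from avs.
    """
    means = np.asarray(traces, dtype=float).mean(axis=1)
    avs, sd = means.mean(), means.std()
    dev = np.abs(means - avs)
    keep = dev > x * sd
    if y is not None:
        keep &= dev < y * sd
    return np.flatnonzero(keep)
```

The attack would then run on `traces[filter_traces(traces, 1)]` and the matching cipher-texts.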

Figure 8.2.1 shows how the Filter affects the normal DPA.

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curves: No Filter, Filter-1, Filter-1.5.]

Figure 8.2.1: DPA Normal Traces Filter Comparison


Notice in Figure 8.2.2 that the PowerL Multiplier method is less affected by the filter than the normal method. This is because it is more sophisticated, so traces filtering is more worthwhile with it than with the normal DPA method, but still not good enough.

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curves: No Filter, Filter-1, Filter-1.5.]

Figure 8.2.2: DPA PowerL Multiplier Traces Filter Comparison

Figure 8.2.3 shows how CPA is affected by traces filtering. We should not forget that the time gain here is remarkable, since CPA takes much more time than DPA in the first place.

[Plot: efficiency (between 0 and 1) versus number of traces (0 to 5,000); curves: No Filter, Filter-1, Filter-1.5.]

Figure 8.2.3: CPA Traces Filter Comparison

After viewing Figures 8.2.1, 8.2.2 and 8.2.3, we conclude that this type of traces filtering is not really useful.


8.3 Filter Highest Points

Now let us leave aside reducing the number of traces: what about reducing the number of points in the traces? This can also greatly reduce the running time of the algorithm, although it might make the attack less effective since we lose information.

The idea is to compare the points within a trace and choose the most extreme ones (the farthest from zero). This way we use the points that are most likely to be useful during the algorithm. Also, since the highest points are computed one trace at a time, we are effectively shifting the traces. This means that we are either doing the devil's work by turning good traces into bad ones, or re-aligning the traces with a less sophisticated method than the one in section 8.1.
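A short sketch of the Highest Points Filter (names hypothetical). Because every trace keeps its own positions, point p of one reduced trace no longer corresponds to point p of another, which is the shifting effect described above.

```python
import numpy as np


def highest_points(trace, k):
    """Indices of the k points farthest from zero, kept in time order."""
    trace = np.asarray(trace, dtype=float)
    idx = np.argsort(-np.abs(trace), kind="stable")[:k]
    return np.sort(idx)


def filter_highest(traces, k):
    """Reduce every trace to its own k most extreme values (positions differ per trace)."""
    return np.array([np.asarray(t, dtype=float)[highest_points(t, k)] for t in traces])
```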

8.4 Filter Points of Interest

Still with the aim of reducing the number of points processed by the attacking algorithm, we now introduce a more sophisticated and harmless method that selects points which can represent a trace.

When using this filtering method, it is best to have the trace shifting already corrected; the two methods work really well together.

Here the points of interest are chosen using variances computed over all the traces together. Also, so that the points represent the whole trace and not just part of it, a limit is chosen so that the points are not too close to one another.

We call Filter(x y z) a filter in which x is the number of points chosen using the Highest Points Filter, y is the number of points chosen using the Points of Interest Filter, and z is the minimal accepted distance between the y points.

Since x points are chosen first, and then y points are chosen out of these x points, we end up with y points. The time gain depends on the filter ratio $P / y$, where $P$ is the number of floating-point samples in the original traces. In all tested methods, we found that the time gain is always close to this filter ratio: if the original traces had 1000 points each and y = 100, then using the points filter makes the algorithm 10 times faster.
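The points of interest filter can be sketched as below, assuming equal-length traces stored as lists of floats. The variance of every sample index is computed over all traces, then the y highest-variance indices are picked greedily while keeping any two of them at least z samples apart. The greedy selection and the function names are our assumptions; the report does not specify how the distance constraint was enforced.

```python
from statistics import pvariance
from typing import List, Sequence


def points_of_interest(traces: Sequence[Sequence[float]], y: int, z: int) -> List[int]:
    """Return up to y sample indices with the highest variance across traces,
    any two of them at least z samples apart (greedy selection)."""
    n_points = len(traces[0])
    variances = [pvariance([t[i] for t in traces]) for i in range(n_points)]
    order = sorted(range(n_points), key=lambda i: variances[i], reverse=True)
    chosen: List[int] = []
    for i in order:
        if len(chosen) == y:
            break
        # Skip indices that are too close to an already selected point.
        if all(abs(i - c) >= z for c in chosen):
            chosen.append(i)
    return sorted(chosen)


def apply_poi(traces: Sequence[Sequence[float]], indices: Sequence[int]) -> List[List[float]]:
    """Keep the same selected indices in every trace."""
    return [[t[i] for i in indices] for t in traces]
```

Unlike the highest-points filter, the same indices are kept in every trace, so the alignment is preserved. With P samples per trace, the attack then processes y samples instead of P, which is where the speed-up of roughly P / y comes from.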

In the following experiments we use Filter(None y z), which means that the Highest Points Filter was not used at all. Also, in the following tests, the traces used have already gone through the shifting correction phase.

We can see from Figures 8.4.1 and 8.4.2 that the results did not change much when using the points of interest filter. Sometimes the results were a little better and sometimes a little worse; this still means that the filter is very effective, since it uses less than 1/10 of the points and gives the same results. Using fewer points means needing less time than before (see Figure 8.4.4). In Figure 8.4.2, the attack was better most of the time when the filtering was used, which means that we did not just gain execution speed but also attack effectiveness.

When looking at Figure 8.4.4, we should not forget that only the first key-byte is being attacked, while the preparation is done for all the key-bytes. This matters when using the points of interest filtering, since its preparation phase takes a lot more time than the other preparation phases; still, the attack itself is much faster!

After all these tests, we conclude that this type of filtering is great both for speeding up the execution and for improving the effectiveness of the attack.


Figure 8.4.1: DPA Normal Points Filter Comparison (efficiency, between 0 and 1, against the number of traces from 0 to 5,000; curves: No Filter, Filter(None 100 5), Filter(None 100 10))

Figure 8.4.2: DPA PowerL Multiplier Points Filter Comparison (efficiency, between 0 and 1, against the number of traces from 0 to 5,000; curves: No Filter, Filter(None 100 5), Filter(None 100 10))


Figure 8.4.3: CPA Points Filter Comparison (efficiency, between 0 and 1, against the number of traces from 0 to 5,000; curves: No Filter, Filter(None 100 5), Filter(None 100 10))

Figure 8.4.4: Points Filter (PF) Timing Comparison (time in seconds, from 0 to 500, against the number of traces from 0 to 5,000; curves: Normal DPA, Normal DPA (PF), Mult DPA Methods, Mult DPA Methods (PF), CPA Pearson, CPA Pearson (PF))

Part III

How to Process


Chapter 9

Tracing

In this chapter we explain in detail how we obtain the traces needed for the attacks. We also discuss some of the difficulties encountered during the tracing phase.

Important Notice: All the information about the SASEBO-GII (configuration and photos) is taken from the SASEBO-GII quick start guide, which belongs to the National Institute of Advanced Industrial Science and Technology (AIST) and can be found at http://www.risec.aist.go.jp/project/SASEBO-GII/download_prev/SASEBO-GII_QuickStartGuide_Ver1.0_English.pdf

9.1 SASEBO-GII

In this section we present the SASEBO-GII which is the device we attacked.

9.1.1 About SASEBO-GII

The SASEBO-GII is an FPGA board designed to be easily reprogrammed so that encryption algorithms (AES, DES, ...) can be tested on it. In general, all we have to do is connect the board to a computer (see Figure 9.1.1), use the software that comes with it to launch encryptions, and compare the results with encryptions computed on the computer itself using the same key and message.

9.1.2 Needed Materials

To configure the SASEBO-GII we need the following materials:

• SASEBO-GII: the device itself.

• USB Cable, which provides the SASEBO-GII with power and with a connection to the computer.

• A computer with at least one available USB port. The computer must run Windows (XP, Vista, 7, 8) for some of the frameworks to work.


Figure 9.1.1: SASEBO-GII Computer Connection

• FPGA Configuration Cable, which is used to program the flash ROMs connected to the FPGAs. Available cables are: Xilinx Platform Cable USB, Platform Cable USB II, and Parallel Cable IV.

To capture the power consumption of the SASEBO-GII additional materials are needed:

• Oscilloscope that works with 50 Ω, is easy to use, and has a good capture bandwidth, sufficient memory, and enough options for our needs. We used a LeCroy WavePro 7100A, which has a bandwidth of 1.0 GHz.

• Passive Probe to be used as a trigger.

• SMA-BNC Cable to monitor the power consumption.

9.1.3 Software

Materials alone are useless, so we also need some software.

• Microsoft .Net Framework 3.5, whose English version can be found at the following link: http://www.microsoft.com/en-us/download/details.aspx?id=21

• Xilinx ISE WebPACK, which is needed to configure the SASEBO-GII. The English version is at the following link: http://www.xilinx.com/products/design-tools/ise-design-suite/ise-webpack.htm

• FTDI D2XX driver and FTD2XX NET DLL for the USB connection to work properly; the files can be found at: http://www.ftdichip.com/Drivers/D2XX.htm

Additional software is needed if we want to communicate with the oscilloscope via TCP/IP:

• LCXSDSO IVI driver, which is used for VICP, GPIB, and LXI connections. It can be downloaded directly from the following links:
32-bit: http://teledynelecroy.com/support/softwaredownload/download.aspx?did=7514
64-bit: http://teledynelecroy.com/support/softwaredownload/download.aspx?did=7515

• IVI Shared Components in case they didn’t come with the IVI driver installer.

• Latest Teledyne LeCroy VICP Passport, which is required by the IVI driver for VICP connections only and can be found at the following link: http://teledynelecroy.com/support/softwaredownload/vicppassport.aspx

• NI-VISA 3.0 or higher, to manage the software's connection with the oscilloscope. More about VISA, and download links: http://www.ni.com/visa/

• LabWindows/CVI is required and is usually installed automatically by the NI-VISA installer (if it is not already installed). It is essential for any VISA DLL to work, and it provides an interface to develop, modify, and compile programs (we eventually used it to create the DLL responsible for the VICP connection). More about this software, and download links, are available at: http://www.ni.com/lwcvi/

With all the software prepared, we need one more program to tie everything together: the SASEBO-GII algorithm checker.

This software was created during the RCIS (http://www.rcis.aist.go.jp/) SASEBO-GII research project, which has since ended. The software is divided into two main programs: SASEBO-GII AES Checker and SASEBO-GII DES Checker.

We are working on AES, so we only need the SASEBO-GII AES Checker. The software was provided to us with the needed DLLs and the original source code.

9.1.3.1 SASEBO-GII AES Checker Version 0

Version 0 is the original software that was provided to us. It interacts with the SASEBO-GII by sending it a key and different messages and receiving the ciphertexts. At the same time, the ciphertexts are computed on the computer so that the SASEBO-GII outputs can be compared with the computer's results.

Figure 9.1.2: SASEBO-GII AES Checker V0








The first issue we had with the original software is that it launches the encryptions too fast: the software calls the SASEBO-GII once to open the connection, then sends messages to be encrypted and receives ciphertexts automatically, without any wait.

Because of this speed, our oscilloscope could not capture and record more than one third of the traces, and there was no way of knowing which traces were captured and which were skipped. All we needed was a short wait between the messages sent, so that all the traces could be captured.

Another problem is that, even though we had the code, some parts of the program were already compiled into DLLs and could not be changed. The program was therefore modified so that it opens a connection with the SASEBO-GII for each message to encrypt, then waits for the encryption to finish before sending the next message.

With this version of the program we were able to run it on around 1000 messages at a time.

Another issue was that the random messages being used were not recorded, and neither were the resulting ciphertexts. Therefore, another change was made so that the program saves all the messages and the corresponding ciphertexts to separate files.



The program we had at this point encrypted random messages and recorded them with their ciphertexts. But before thinking about TMPA, we needed a way to control the input messages so that they would be the same for all keys. Therefore the program was changed again: this time we used a file containing a number of messages, and the program now reads this file and encrypts all the messages in it using one key.

For TMPA we also needed the ability to test numerous keys. So we stored the keys in a file as well: the program reads one key, encrypts all the messages using it, then moves on to the next key.
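The control flow of the modified checker can be sketched in Python as follows. The real checker is a Windows application built on vendor DLLs, so this is only an illustration: `encrypt_on_device` and `capture_trace` are hypothetical callbacks standing in for the SASEBO-GII connection and the oscilloscope, and the file format (one hexadecimal value per line) is an assumption.

```python
import time
from typing import Callable, Iterator, List, Tuple


def read_hex_lines(path: str) -> List[bytes]:
    """Read one hexadecimal value per non-empty line (assumed file format)."""
    with open(path) as f:
        return [bytes.fromhex(line.strip()) for line in f if line.strip()]


def acquire(keys: List[bytes], messages: List[bytes],
            encrypt_on_device: Callable[[bytes, bytes], bytes],
            capture_trace: Callable[[], list],
            wait_s: float = 0.05) -> Iterator[Tuple[int, int, bytes, list]]:
    """For each key, encrypt every message once and yield
    (key index, message index, ciphertext, trace).

    encrypt_on_device is expected to block until the encryption is done;
    the extra pause gives the oscilloscope time to store the trace.
    """
    for k_idx, key in enumerate(keys):
        for m_idx, msg in enumerate(messages):
            cipher = encrypt_on_device(key, msg)  # one connection per message
            trace = capture_trace()
            time.sleep(wait_s)
            yield k_idx, m_idx, cipher, trace
```

Passing the device and the oscilloscope in as callbacks keeps the loop independent of the vendor libraries and easy to test with fake implementations.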




1000 messages at a time was enough when working with DPA and CPA: all we had to do was launch the program a few times to get thousands of traces. With TMPA we could not do the same, since we had many keys to test, with 1000 traces per key.

The solution was to connect the program and the oscilloscope using TCP/IP. To do so, additional programs were needed, as mentioned in Section 9.1.3. After installing all the required software, we also had to write a small piece of C code and build it as a library acting as an intermediate layer between the oscilloscope and the program.

In addition, since our TMPA works with unknown messages and ciphertexts, we did not need to record the output of the SASEBO-GII (the ciphertexts). And even if we needed the ciphertexts, we could easily compute them on the computer independently.

Now everything was set and we were able to work on 20 000 messages at a time, which means that each time we launched the program, traces for 20 different keys were ready in less than 12 hours.



9.1.4 SASEBO-GII Configuration

Before using the SASEBO-GII, and after installing the needed software, we need to configure the device so that it can encrypt messages using AES.

9.1.4.1 Hardware

We start with the hardware configuration, which means connecting everything and turning on what needs to be on. Looking at Figure 9.1.6, we can see 3 marked parts of the device that we need to work on, as follows:

1. JP1: should be left open (see Figure 9.1.7a).

2. JP2: Should have a jumper on it (see Figure 9.1.7b).

3. JP1, JP2, JP6: Should have the second and the third buttons on (see Figure 9.1.7c).

Figure 9.1.6: SASEBO-GII Configuration Plan

(a) JP1 (b) JP2

(c) JP1, JP2, JP6

Figure 9.1.7: SASEBO-GII Configuration Steps


9.1.4.2 Software

After we finish the hardware configuration, we need to install the AES on the FPGA:

• Reprogram the flash ROM (ST45DB16D, U11) for the control FPGA (Spartan-3A) by:

∗ Attaching the configuration cable to CN7.

∗ Using the provided mcs file SASEBO-GII gii ctrl.mcs.

• Reprogram the flash ROM (ST45DB16D, U4) for the cryptographic FPGA (Virtex-5 LX30) by:

∗ Attaching the configuration cable to CN4.

∗ Using the provided mcs file SASEBO-GII aes comp lx30.mcs.

• Cycle the power so that the FPGA configuration takes effect immediately after the flash ROM reprogramming.

9.2 Obtaining Traces

In the section above, we described the different program versions used to trace the encryptions, and how to configure the SASEBO-GII so that it can perform these encryptions. Now we will see how to actually make use of all this: how to start the tracing and how to view the results.

9.2.1 Oscilloscope Configuration

Unlike the SASEBO-GII configuration, here it does not matter whether we start with the software or the hardware configuration.

9.2.1.1 Hardware

This time we look at other parts of the SASEBO-GII (see Figure 9.2.1) and proceed as follows:

1. The trigger signal should be put on pin 1 of J6.

2. The probe of the trigger should be connected to channel 2 of the oscilloscope.

3. The ground wire of the probe should be connected to TP3.

4. The power consumption monitoring cable (SMA-BNC 50 Ω) should be connected to J2.

5. In the oscilloscope side, the monitoring cable should be connected to channel 1.


Figure 9.2.1: SASEBO-GII and Oscilloscope connection plan

9.2.1.2 Software

The following may depend on the case we are working on, but in general these are the settings needed on the oscilloscope's side:

• Channel 1:

∗ Vertical Scale: 10 mV/div.

∗ Offset: 1.0 V.

• Channel 2:

∗ Vertical Scale: 1.0 V/div.

∗ Offset: 0 V.

• Triggering:

∗ Trigger Source: Channel 2.

∗ Triggering Mode: Negative Edge.

9.2.2 Electricity Issues

Since the attacks are based on power consumption, this is not an exact science. The reason is the way electricity works and how it depends on everything around it, such as the room temperature and the other electronic machines connected to the same network.

The consequence is that we can run the same attack on the same key, using the same messages, and end up with different results. On one occasion, even turning the lights on made the oscilloscope capture a power trace; such a trace is easily distinguished since it looks nothing like the other traces. Still, power noise, whether it comes from inside the setup (oscilloscope, SASEBO, computer used, ...) or from outside (lamps, other computers, network hub, ...), is hard to control.


In our case the oscilloscope uses power controls so it is not affected by the outside world, but it is still connected directly to the SASEBO and to the network hub, and both of these are connected to the computer used. This is probably the cause of the traces' imperfections.

One way to bypass this problem is to work in a special environment where the external noise is as low as possible. This might seem extreme, but it can be achieved at a somewhat higher cost.

On the other hand, for theoretical tests, software that simulates a working device and outputs theoretical power traces does exist. Such software can be, and is, used to test the resistance of devices to SCA. Even though its output is not exactly what happens in practice, a device needs to pass these theoretical tests to be certified.

9.2.3 Bad Tracing

How can tracing go wrong? We did it. The DPA we performed attacked the first substitution of the AES, so why capture the whole trace (100000 floating-point samples)? We could simply take part of the first 1/10 of the full trace, that is, part of the first round, and more precisely its beginning. With this idea in mind, we were using around 1000 points out of the 100000 original points, so that the attack would be much faster.

The problem was that the trace in Figure 9.2.2 was not actually the full trace. After widening the time window on the oscilloscope, we found that we actually have what looks like 11 rounds (see Figure 9.2.3), whereas the first capture showed only 10 (which had seemed to make sense at first).

Figure 9.2.2: Bad Trace

Figure 9.2.3: Good Trace

The DPA then worked on the 11th round in Figure 9.2.3 (round number 10 in AES), which means that the attack we tried at the beginning targeted either the initial round (round number 0) or a test round. The idea of the whole work is to carry out the attack without knowing how AES is actually implemented on the SASEBO-GII, so this question remained open. We never retried the DPA on the first AES substitution using the right portion of the trace, so we are not sure whether the SASEBO-GII leaks information in this area.

9.2.4 Perfect Tracing

Knowing how bad tracing occurs, we now know how good tracing is supposed to be done. Still, something is missing: we are saving a whole trace of 100000 points and only using 1000. Thanks to the oscilloscope options, we can zoom the trace to the part we want and then save only this part (see Figure 9.2.4).

Figure 9.2.4: Perfect Trace

The trace in Figure 9.2.4 shows the power consumption just before, during, and just after the last substitution, so it contains exactly the critical information we need to carry out our attacks.

9.3 Graph Viewer

When working with waveforms, it is a good idea to have a way of viewing them as graphs. Even though the oscilloscope shows these graphs, we need a viewer that can be used anywhere. In addition, when working with DPA and CPA we keep logs of the computed values (which are themselves sequences of points), and it is useful to view these logs graphically as well.

There was already Python code for this job, but it did not meet our exact requirements. The good thing about this program is that it can display many traces at the same time, which lets us check whether the traces are shifted or not. Therefore, with a few code changes to meet our requirements, we ended up with what we wanted (see Figure 9.3.1).

Figure 9.3.1: Full AES-BAD-Trace using Graph Viewer

Chapter 10

Attacking

In this chapter we briefly explain how to carry out the attack, from the moment we have the traces to the moment we recover the correct key.

10.1 Transforming Obtained Traces

We start by transforming the traces to suit our working process. For example, if the traces are in wav format and our application uses binary traces, we have to convert the traces from one format to the other. In our case, the traces were in txt format, while our application, along with the Curve Viewer application, uses traces in binary format.

While changing the format, we should consider the precision of the numbers: are they single-precision floats or double-precision floats?
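A minimal conversion sketch, assuming one sample per line in the text file and a raw, headerless little-endian binary output; the exact layout expected by our application is not described here, and the function names are hypothetical. The `precision` parameter selects single ('f', 4 bytes) or double ('d', 8 bytes) precision through Python's `array` module:

```python
import sys
from array import array


def txt_to_bin(txt_path: str, bin_path: str, precision: str = "f") -> int:
    """Convert a text trace (one sample per line) to raw binary.

    precision: 'f' for 32-bit floats, 'd' for 64-bit doubles.
    Returns the number of samples written.
    """
    if precision not in ("f", "d"):
        raise ValueError("precision must be 'f' or 'd'")
    with open(txt_path) as f:
        samples = array(precision, (float(line) for line in f if line.strip()))
    if sys.byteorder != "little":
        samples.byteswap()  # always store little-endian
    with open(bin_path, "wb") as f:
        samples.tofile(f)
    return len(samples)


def read_bin(bin_path: str, precision: str = "f") -> array:
    """Read back a raw binary trace written by txt_to_bin."""
    samples = array(precision)
    with open(bin_path, "rb") as f:
        samples.frombytes(f.read())
    if sys.byteorder != "little":
        samples.byteswap()
    return samples
```

Choosing 'f' halves the file size compared with 'd', at the cost of rounding samples that need more than single precision.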

In addition, if an improvement such as shifting correction or trace filtering is to be applied to the traces, it is better to apply it during the transformation. For the shifting correction, the number of traces does not matter, since all of them are corrected with respect to a single trace (the first one). For trace filtering, however, the number of traces is important: if we have 10000 traces and want to test an attack on only 2000 of them, we must apply the filter using only those 2000 traces. Otherwise we would be using the remaining traces to refine our computed traces, which is unacceptable because an attack using 2000 traces assumes that we have only, and exactly, 2000 traces.

In special cases such as TMPA, there may be additional work to do. First, the shifting correction uses more than one base trace, because it is better to correct the shifting with respect to traces dedicated to attacking the same key-byte: for each key-byte we attack, we correct the shifting according to the first trace for that key-byte. It is also better to have a dedicated target format for the traces we are working on. For example, we take traces in txt format and convert them into traces labelled as follows:

Trace i j k


where i is the key-byte to which the trace is dedicated (i is between 1 and 16), j is the byte value the trace was created with (j is between 0 and 255), and k is the number of the trace (k is between 1 and N, where N is the number chosen for the profiling; in our case N = 1000).
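The naming scheme can be sketched with a few helpers, assuming traces are identified by strings of the exact form `Trace i j k`; the helper names are hypothetical. Grouping the names by i is what lets the shifting correction use one reference trace per key-byte; here we assume the "first" trace of a key-byte is the one with the lowest j, then the lowest k.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

N = 1000  # number of profiling traces per (key-byte, value) pair in our case


def trace_name(i: int, j: int, k: int, n: int = N) -> str:
    """Name of the k-th trace recorded with key-byte i set to value j."""
    if not (1 <= i <= 16 and 0 <= j <= 255 and 1 <= k <= n):
        raise ValueError("index out of range")
    return f"Trace {i} {j} {k}"


def parse_trace_name(name: str) -> Tuple[int, int, int]:
    """Inverse of trace_name: return (i, j, k)."""
    _, i, j, k = name.split()
    return int(i), int(j), int(k)


def reference_traces(names: Iterable[str]) -> Dict[int, str]:
    """For each key-byte i, pick the trace with the lowest (j, k) as the
    base trace for shifting correction."""
    by_byte: Dict[int, List[Tuple[int, int, str]]] = defaultdict(list)
    for name in names:
        i, j, k = parse_trace_name(name)
        by_byte[i].append((j, k, name))
    return {i: min(entries)[2] for i, entries in by_byte.items()}
```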

10.2 Choosing an Attacking Algorithm

Depending on the case we are working on, the best choice may differ. For example, suppose we are working with a device that does not allow us to collect many traces (a limited number of encryptions), and we have a configured copy of this device on which we can record an unlimited number of traces. In this case TMPA would be perfect, since we can take our time during the profiling phase and then use only one real trace to recover the correct key.

Another reason to use TMPA is that it can work without knowing either the message or the ciphertext. This means that if the device we are attacking somehow hides this information, we are forced to use TMPA. But we should not forget that this requires a copy of the device in a controlled environment so that we can carry out the profiling phase.

It is rare for DPA to give better results than CPA. But suppose CPA can break a device with only 5000 traces, while DPA needs 6000 traces. Even though CPA is better here, DPA can be useful if we want the attack to take less time and we already have enough traces for both attacks to work. We assume here, as is usually the case, that DPA on 6000 traces takes less time than CPA on 5000 traces.

All this means that the choice of the attacking algorithm is important and is made according to the available resources, such as the number of traces, the time, and the type of access we have to the device (known message, key blinding, ...).

10.3 Exploiting The Result

As we have seen before, the attack does not always give us the full key, which means that not every correct candidate is at the top of its ’G’ list. Still, this does not mean that the attack is unsuccessful.

The attack gives us the ordered ’G’ lists, which can be used to brute force the correct key. The better the attack, the faster we are likely to find the correct key. Less effective attacks place the correct candidates low in the ordered ’G’ lists, so either the brute force takes more time to find them, or it never finds them because it would have to test too many keys, which is impossible because of the time complexity or because the device enforces a maximum number of encryptions.
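A naive key search over the ordered ’G’ lists might look as follows, assuming each list is given as candidate bytes from best to worst and that `check_key` is a hypothetical callback that confirms a full key (for instance by encrypting a known plaintext). Only the top `depth` candidates of each byte are explored, so the cost is at most depth^16 key tests for a 16-byte key, which illustrates why poorly ranked correct candidates make the brute force infeasible.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence


def brute_force(g_lists: Sequence[Sequence[int]], depth: int,
                check_key: Callable[[bytes], bool]) -> Optional[bytes]:
    """Try every key built from the top `depth` candidates of each key-byte.

    Returns the first key accepted by check_key, or None if the correct key
    is not reachable at this depth.
    """
    tops: List[Sequence[int]] = [g[:depth] for g in g_lists]
    for combo in product(*tops):
        key = bytes(combo)
        if check_key(key):
            return key
    return None
```

`itertools.product` visits combinations in lexicographic rank order rather than by overall likelihood; smarter key enumeration algorithms exist, and this sketch only shows the principle.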

It may seem like a lot of work to get the key, but we should not forget that the key is supposed to be strictly confidential and its disclosure could harm entire systems. In addition, in special cases such as TMPA, most of the work is done during the profiling phase. This means that once the profiling phase is finished, the results can be used to quickly attack any copy of the profiled device, using only one trace!

10.4 KATA Software

For all this work to be done, dedicated software had to be written. We therefore developed a Python project named KATA, with many packages and classes combining all the methods and functions, from creating the messages to evaluating the attack, and offering many options and features.

Since the software is for personal use and testing, there was no need for a user interface, just a class through which we call all the methods, giving them inputs and collecting their outputs. Still, for scalability reasons, the code is organized so that it can easily be driven from an interface. In that case, it is possible to watch the progress of an attack, launch more than one attack, and stop any attack at any moment.

All the attacks support multi-threading and different output channels (log file, debugging screen, interface text box, and interface alert box) for different kinds of output (important results, detailed results, error messages, ...). Also, during an attack, we can always keep track of what is happening, what each part of the attack is doing, and what it returns.

In addition, the program outputs the results of the evaluation methods in txt and tex formats. The tex output lets us easily read the evaluations of each attack, and compiling it produces a graph. This was used to generate all the charts included in this report.
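As an illustration of the tex output, the sketch below renders evaluation results as a pgfplots axis. The function name and the exact layout are assumptions; only the `\addplot coordinates {...}` and `\addlegendentry` commands are standard pgfplots. The returned text is meant to be placed inside a `tikzpicture` environment in a document that loads the pgfplots package.

```python
from typing import Dict, List, Tuple


def to_pgfplots(series: Dict[str, List[Tuple[int, float]]]) -> str:
    """Render evaluation results as a pgfplots axis environment.

    series maps a method name to (number of traces, efficiency) points.
    """
    lines = [r"\begin{axis}[xlabel={Number Of Traces},"
             r" ylabel={Efficiency (between 0 and 1)}]"]
    for name, points in series.items():
        coords = " ".join(f"({x},{y:g})" for x, y in points)
        lines.append(rf"\addplot coordinates {{{coords}}};")
        lines.append(rf"\addlegendentry{{{name}}}")
    lines.append(r"\end{axis}")
    return "\n".join(lines)
```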

Since more than one attack was created, and algorithms other than AES might need to be tested in the future, the packages, classes, and functions are organized so that we can easily add an attack or an algorithm. For example, DPA took a few days to be fully implemented and tested, whereas implementing CPA afterwards took us less than 2 hours.

The software includes all the attacking methods presented in this report, with all the improvements and evaluation methods. Some other attacking and evaluation methods are also included in the software but not in the report, since their results were not particularly interesting.

The last part that was added (not yet optimized) to the software was an attack simulatorthat would tell us how much time an attack would take.

Chapter 11

Evaluations

When studying different attacks, we need a way of comparing them. We can compare them according to the time they need to finish, or according to the number of samples needed for an attack to fully succeed. All of this gives a general but insufficient evaluation of the attacks. We need more sophisticated evaluation methods that compare attacks more efficiently and precisely.

But before we start explaining the methods, a little reminder:

’G’ is a list of the 256 possible bytes. For each key-byte, ’G’ is computed so that each byte possibility (that is, each key-byte candidate) is paired with a force value computed during the attack, and the list is sorted by force value in descending order.
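A minimal sketch of how ’G’ can be built and queried, assuming the attack returns one force value per candidate byte; representing ’G’ as (candidate, force) pairs and the helper names are our assumptions. `position_of` returns the 1-based position of the correct candidate, the quantity (PoC) that the evaluations in this chapter rely on.

```python
from typing import List, Sequence, Tuple


def build_g(forces: Sequence[float]) -> List[Tuple[int, float]]:
    """Return 'G' for one key-byte: (candidate byte, force) pairs sorted by
    decreasing force. forces[b] is the force computed for candidate b."""
    if len(forces) != 256:
        raise ValueError("expected one force value per byte candidate")
    return sorted(enumerate(forces), key=lambda p: p[1], reverse=True)


def position_of(g: Sequence[Tuple[int, float]], correct: int) -> int:
    """1-based position of the correct candidate in 'G'."""
    for pos, (candidate, _) in enumerate(g, start=1):
        if candidate == correct:
            return pos
    raise ValueError("candidate not in G")
```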

In addition, throughout this chapter, the charts were produced for the different DPA methods and CPA using the basic traces (no filtering, no shifting correction, no trace treatment whatsoever).

11.1 Basic Evaluations

Many basic evaluations were implemented; we chose 3 of them. The first one counts the correctly found key-bytes. The second one goes further and looks at the position of the correct candidate in the ordered result ’G’ of each key-byte. The third one estimates whether or not the result could have been found randomly.

11.1.1 Number of Correct

An attack that finds 75% of the key can’t be seen as an equivalent to an attack that findsonly 25% of the same key.

Here all we have to do is count the correct candidates that were found at the top position of ’G’. We start with a result equal to zero, then for each byte of the key, if the first candidate is the correct one, we add 1 to the result. Finally we divide the result by 16, so that the answer is between zero (no correct key-byte was found) and 1 (the whole key was found).
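A sketch of this rating, assuming `positions[b]` holds the 1-based position of the correct candidate in the ’G’ list of key-byte b (the function name is hypothetical); dividing by the number of key-bytes gives the division by 16 for a 16-byte key:

```python
from typing import Sequence


def number_of_correct(positions: Sequence[int]) -> float:
    """Fraction of key-bytes whose correct candidate is first in its 'G' list."""
    return sum(1 for p in positions if p == 1) / len(positions)
```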


Figure 11.1.1 compares DPA methods and CPA using the number of correct method.

Figure 11.1.1: Rating using Number of Correct on different attacking methods (efficiency, between 0 and 1, against the number of traces from 0 to 5,000; curves: Normal, NormalWithMiddle, NormalMultiplier, PowerEMultiplier, PowerLMultiplier, CPA)

11.1.2 Partial Positioning

A good attack would have the correct candidates in good positions (2, 3, ... out of 256).

For each key-byte we add $\frac{1}{PoC \times 16}$ to the result, where PoC is the position (between 1 and 256) of the correct candidate in the corresponding ’G’ list.
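With the same input, one 1-based position per key-byte, the partial positioning rating can be sketched as follows (the function name is hypothetical). A key-byte ranked first contributes 1/16 and one ranked last contributes 1/4096:

```python
from typing import Sequence


def partial_positioning(positions: Sequence[int]) -> float:
    """Sum of 1 / (PoC * n) over the n key-bytes; 1.0 means every correct
    candidate is ranked first."""
    n = len(positions)
    return sum(1.0 / (p * n) for p in positions)
```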

Figure 11.1.2 compares DPA methods and CPA using the partial positioning method.

Figure 11.1.2: Rating using Partial Positioning on different attacking methods (efficiency, between 0 and 1, against the number of traces from 0 to 5,000; curves for the DPA methods and CPA)

11.1.3 Not Random

Here we suppose that a random result is one in which the correct candidates are in very different positions. Still, if a correct candidate is in the top position, it is considered a non-random partial result. For the other key-bytes, where the correct candidate is not in the first position, we compute the average of these positions, and for each of these cases we add to the result $\frac{1}{(\text{position} - \text{average})^{\text{power}} + 1}$, where power is a chosen positive value (in our case it was 1).
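A sketch of the Not Random rating under stated assumptions: the report does not say how much a top-ranked key-byte contributes, nor how the total is scaled, so here each top-ranked key-byte contributes 1, every other key-byte contributes 1 / (|position − average|^power + 1), and the sum is divided by the number of key-bytes. We also take the absolute difference, which the formula above does not show, so that positions below the average cannot produce negative or undefined terms.

```python
from typing import Sequence


def not_random(positions: Sequence[int], power: float = 1.0) -> float:
    """Rate how far a result is from random, given the 1-based position of
    the correct candidate for each key-byte."""
    n = len(positions)
    score = float(sum(1 for p in positions if p == 1))  # assumed weight 1
    others = [p for p in positions if p != 1]
    if others:
        average = sum(others) / len(others)
        for p in others:
            # Positions clustered around their average score close to 1.
            score += 1.0 / (abs(p - average) ** power + 1.0)
    return score / n
```

Note that wrong candidates clustered at the same position still score high here, which matches the idea that a random result is one where the positions are scattered.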

Figure 11.1.3 compares different attacking methods from DPA and CPA using the evaluation based on whether the result is random or not. With more traces, more correct candidates reach the top position, and the remaining incorrect ones are fewer and get closer to the top position and to each other. We can therefore expect the result to become more precise and farther from random as the number of traces grows.

Figure 11.1.3: Rating using Not Random on different attacking methods (efficiency, between 0 and 1, against the number of traces from 0 to 5,000; curves for the DPA methods and CPA)

11.2 Creating Probabilities

The evaluations used so far do not consider the values given to each candidate; they only consider the ranking. We therefore need more sophisticated evaluations that take these values into account.

But before we do so, we have a problem. During template attacks, the values are actuallyprobabilities because of the way they were computed. But in DPA and CPA the computedvalues given to each candidate are the differential and the correlation.

For a better evaluation using the values, we need to transform them into probabilities. A sophisticated method can be found in [VCGRS13]. For our purposes, all we want is probabilities based on the actual values.

Thinking in a simple probabilistic way, each value would be divided by the sum of the values. But we want the probabilities of all $256^{16}$ key possibilities to sum to one. To simplify the problem, consider that we have to recover a key of 6 bits (3 words of 2 bits):


$$\begin{array}{ccc} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \\ x_4 & y_4 & z_4 \end{array}$$

The table above represents the values given to each partial-key candidate, so we have $4^3$ key possibilities. If we consider the values as probabilities, then the probability of a key is the product of the values of its key-words. So the sum of all the probabilities, which we need to be equal to 1, is:

$$\sum_{i, j, k \in [1, 4]} x_i \times y_j \times z_k$$

Which is equal to:

$$\left(\sum_{i=1}^{4} x_i\right) \times \left(\sum_{i=1}^{4} y_i\right) \times \left(\sum_{i=1}^{4} z_i\right) = 1 \qquad (1)$$

So we need this value to be equal to 1. But one might look at it the other way and say that we should not care about this value; what matters is that the probabilities of all the possibilities in each key-word column sum to 1, which means that:

$$\sum_{i=1}^{4} x_i = 1 \quad\text{and}\quad \sum_{i=1}^{4} y_i = 1 \quad\text{and}\quad \sum_{i=1}^{4} z_i = 1 \qquad (2)$$

Making (1) true directly is not simple, while making (2) true is very simple, and if (2) is true then (1) is necessarily true, since $1 \times 1 \times 1 = 1$.

The same works for the $256^{16}$ key possibilities: all we have to do is loop over the 16 key-bytes and, for each one, compute the sum of the values in the column and then divide these values by the computed sum. In short, we need a loop over 16 containing two loops over 256: $16 \times (256 \times 2) = 8192$ operations.
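A sketch of this normalization, assuming the values are non-negative (the report does not discuss negative differentials or correlations) and stored as one list of candidate values per key-word; the function names are hypothetical. The second function enumerates every key to check equation (1) numerically; it is exponential and only usable on toy tables such as the 4-by-3 example above.

```python
from itertools import product
from math import prod
from typing import List, Sequence


def to_probabilities(values: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize each key-word column so that its values sum to 1.

    values[b][c] is the value of candidate c for key-word b (16 key-bytes of
    256 candidates for AES). Values are assumed to be non-negative.
    """
    probabilities = []
    for column in values:
        total = sum(column)  # first loop over the candidates
        if total <= 0:
            raise ValueError("each column needs non-negative values with a positive sum")
        probabilities.append([v / total for v in column])  # second loop
    return probabilities


def total_key_probability(probabilities: Sequence[Sequence[float]]) -> float:
    """Sum of P(key) over all keys, where P(key) is the product of the
    probabilities of its key-words. Exponential cost: toy tables only."""
    return sum(prod(combo) for combo in product(*probabilities))
```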

11.3 Advanced Evaluations

Many evaluation methods were implemented that use not only the positions of the candidates in ’G’ but also the associated values. First we apply the probability generator we just saw, so that the values make more sense and are more comparable. Then we apply one of the evaluation methods. The most important of the ones we devised is an evaluation method based on our confidence in the result of the attack.

11.3.1 Confidence

Here we study the confidence we have in the effectiveness of the attacking method. For example, during debugging, when we see a large gap between the candidate in the first position of ’G’ and the second one, and smaller gaps between the second and the third, and so on, we think that the first one is probably the correct candidate.


The reasoning is then as follows. If the first candidate is correct and has a good value, far from the values of the other candidates, our confidence rises; if it is correct but has a bad value (too close to the other values), our confidence falls. Likewise, if it is incorrect and its value is close to the others, our confidence rises: the attack is not working perfectly, but at least the result tells us so. Finally, if the candidate is wrong but has a value far from the others, our confidence falls, since a bad candidate is then imitating a correct one.

Talking about a value being far from a group of values leads directly to the Probability Density Function (PDF). We compute the PDF value of the first candidate in each ’G’ with respect to the other values in ’G’. Then, depending on whether the candidate is actually correct and on its PDF value, we build a probability in a suitable way, ranging between:

- 0: bad candidates have good values and, if good candidates exist, they have bad values.

- 1: good candidates have good values and, if bad candidates exist, they have bad values.

Note that we do not necessarily have to compare the first candidate with the group of all 255 other candidates. In our case we compared it with the group formed by the 29 candidates that follow it in the ’G’ list, so that the candidate is compared with a group of candidates that could plausibly be correct. The choice of the group size is not an exact science: a small group gives smaller PDF values, while using all 255 candidates gives larger ones (since the candidate is then compared with values lying far from it).

More information about the PDF and its estimation can be found in [Par62].
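The report does not specify the kernel, the bandwidth, or the exact mapping from PDF value to confidence, so the following Python sketch is only one possible reading, under stated assumptions: a Gaussian Parzen window with Silverman's rule-of-thumb bandwidth, rescaled into a similarity `s` in [0, 1] (near 1 when the first candidate sits among the group, near 0 when it stands apart), and a confidence of `1 - s` for a correct first candidate and `s` for a wrong one. The names `kernel_similarity` and `confidence` are hypothetical:

```python
import math
from statistics import stdev


def kernel_similarity(x, group, bandwidth=None):
    """Parzen (Gaussian-kernel) density estimate at x built from `group`,
    rescaled so that each kernel peaks at 1. The result lies in [0, 1]:
    close to 1 when x sits among the group, close to 0 when x is far away."""
    n = len(group)
    if bandwidth is None:
        # Silverman's rule of thumb: an assumption, not the report's choice.
        bandwidth = 1.06 * stdev(group) * n ** (-1 / 5)
    if bandwidth == 0:
        return 1.0 if all(v == x for v in group) else 0.0
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in group) / n


def confidence(sorted_values, first_is_correct, group_size=29):
    """Hypothetical confidence score for one 'G' list sorted best-first.

    The first candidate is compared with the `group_size` candidates that
    follow it. A correct first candidate far from that group, or a wrong
    one lost inside it, scores near 1; the opposite cases score near 0.
    """
    first = sorted_values[0]
    group = sorted_values[1:1 + group_size]
    s = kernel_similarity(first, group)
    return 1.0 - s if first_is_correct else s


# Hypothetical 'G' lists: a clear winner, and a first candidate lost in the crowd.
clear = [0.5] + [0.04 - 0.001 * i for i in range(29)]
crowded = [0.05 - 0.001 * i for i in range(30)]
print(confidence(clear, True), confidence(clear, False))  # 1.0 0.0
```

Averaging this score over the 16 ’G’ lists would give one value per number of traces; that aggregation is also an assumption.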

Figure 11.3.1 compares different DPA and CPA attacking methods using the confidence evaluation method. At first the confidence is acceptable: the attack returns bad candidates, but at least they have bad values. The confidence then drops and reaches its lowest point for almost all the methods at 2000 traces, because this number of traces gives us around half of the correct key-bytes, but with low PDF values. After that, the curves for all the methods start rising again (sometimes with small dips), because in most cases the correct candidate gets a higher probability when more traces are used.

[Plot: Efficiency (between 0 and 1) versus Number Of Traces (1,000 to 5,000)]

Figure 11.3.1: Rating using Confidence on different attacking methods


11.4 Real Evaluations

Here we discuss methods that rate a result according to how useful it is for the next step. The next step is a brute force attack that uses the result, so the question is how much the result reduces the time of the brute force attack and makes it feasible.

We start from the idea that an attacker who has retrieved the 16 ’G’ lists will not only try the single key formed by the first candidate of each ’G’, but can also use these results to run a brute force attack guided by the probability of each candidate.

We tested some methods that give an exact rating of a brute force attack using a tree search algorithm with a “first best key-byte” strategy. The problem is that, even though this kind of brute force sometimes needs less work than the full brute force, in most cases it needs more tries.

We now focus on the full brute force. The attacker is assumed to try every key that is more probable than the correct key and never one that is less probable; in other words, keys are tried in decreasing order of probability.

The probability of a key is the product of the probabilities of all its key-bytes. The problem is that there are 256^16 possibilities, and we need an efficient way to count how many of them are more probable than the correct key. This count matters: an attacker who needs 100 tries can easily get the key, an attacker who needs 10,000 tries would succeed unless the device limits the number of tries, and an attacker facing 2^100 tries could never succeed.
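To make the quantity concrete, here is a minimal Python sketch that counts, by exhaustive enumeration, the keys strictly more probable than the correct key. It is only a reference for toy sizes (the 256^16 case is exactly what cannot be enumerated), and the name `exact_rank` is hypothetical:

```python
from itertools import product
from math import prod


def exact_rank(probs, correct):
    """Number of keys strictly more probable than the correct key.

    probs[b][c]: probability of candidate c for key-byte b.
    correct[b] : index of the correct candidate for key-byte b.
    Enumerates every key, so it is only usable on toy sizes: with 16
    key-bytes of 256 candidates there would be 256**16 keys to visit.
    """
    p_correct = prod(col[k] for col, k in zip(probs, correct))
    return sum(
        1
        for key in product(*(range(len(col)) for col in probs))
        if prod(col[k] for col, k in zip(probs, key)) > p_correct
    )


# Toy example: 3 key-words of 2 bits, same (hypothetical) probabilities in each column.
column = [0.5, 0.25, 0.125, 0.125]
print(exact_rank([column] * 3, (1, 0, 0)))  # 1: only (0, 0, 0) is more probable
```

Under the full brute force model, the attacker's number of tries is this count plus one.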

11.4.1 Land Owner

Think of all possible keys as a population in which the correct key lives, and imagine a landscape in which each member of the population owns its own space. The area between 2 keys then gives an image of the population density between them.

With this in mind, we compute the probability of the best key (all candidates in position 1), the probability of the worst key (all candidates in position 256), and the probability of the correct key. We call Db the difference between the best key and the correct one, and Dw the difference between the correct key and the worst one. Dividing Db by (Db + Dw) gives an approximation of the probability that a key is better than the correct one. The evaluation is 1 minus this approximation, since the fewer keys are better than the correct one, the better the result.
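Since Db + Dw = P_best − P_worst, the evaluation simplifies to (P_correct − P_worst) / (P_best − P_worst). A minimal Python sketch, assuming normalised probabilities stored as lists per key-byte; the name `land_owner` and the handling of the degenerate case are our own choices:

```python
from math import prod


def land_owner(probs, correct):
    """Land owner evaluation of one attack result, between 0 and 1.

    probs[b][c]: probability of candidate c for key-byte b.
    correct[b] : index of the correct candidate for key-byte b.
    """
    best = prod(max(col) for col in probs)      # all candidates in position 1
    worst = prod(min(col) for col in probs)     # all candidates in the last position
    corr = prod(col[k] for col, k in zip(probs, correct))
    d_b = best - corr                           # Db
    d_w = corr - worst                          # Dw
    if d_b + d_w == 0:
        # Degenerate case (every key equally probable): no key is strictly
        # better than the correct one. Returning 1.0 here is our own choice.
        return 1.0
    return 1.0 - d_b / (d_b + d_w)


column = [0.5, 0.25, 0.125, 0.125]   # hypothetical normalised 'G' values
print(land_owner([column] * 3, (0, 0, 0)))  # 1.0: the correct key is the best key
```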

Figure 11.4.1 compares different DPA and CPA attacking methods using the land owner evaluation method. For all the attacking methods, and in most cases, the more traces we have, the better the evaluation. This is expected: with more traces, even if the correct key is not ranked first, its key-bytes move closer to the top of their ’G’ lists.

11.4.2 Estimating Brute Force

According to [VCGRS13], computing the exact number of keys more probable than the correct key takes almost as long as a real brute force attack, which is impossible in many cases. Yet for statistics and comparisons we need to estimate bad results, not only good ones.

We can now think about how statistics are produced. For example, a 2007 statistic found that 92% of the French eat cheese at least once a week. But the people who produced it did not actually ask all the French (around 62 million people); they took a sample and used it to compute the statistic.

[Plot: Efficiency (between 0 and 1) versus Number Of Traces (0 to 5,000)]

Figure 11.4.1: Rating using land owner evaluation on different attacking methods

Out of the 256^16 keys, we can take a sample of a thousand or ten thousand keys; the bigger the sample, the more likely the statistic is to be close to reality, but the number of tries must remain doable in acceptable time. We chose to take 10^5 samples each time. The samples are chosen completely at random: each sample only needs its 16 partial keys. So for each sample we randomly choose 16 positions and multiply the probabilities of the candidates at these positions (one candidate, at one position, per key-byte).
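A minimal Python sketch of this random sampling, assuming all probabilities are strictly positive. Summing log-probabilities instead of multiplying 16 small numbers is our own implementation choice to avoid floating-point underflow; the name `sampled_better_fraction` is hypothetical:

```python
import math
import random


def sampled_better_fraction(probs, correct, n_samples=100_000, seed=None):
    """Monte Carlo estimate of the fraction of keys more probable than the correct key.

    probs[b][c]: probability of candidate c for key-byte b (all assumed > 0).
    correct[b] : index of the correct candidate for key-byte b.
    Each sampled key takes one uniformly random candidate per key-byte.
    Log-probabilities are summed instead of multiplying many small numbers,
    so the comparison does not suffer from floating-point underflow.
    """
    rng = random.Random(seed)
    logs = [[math.log(p) for p in col] for col in probs]
    reference = sum(col[k] for col, k in zip(logs, correct))
    better = 0
    for _ in range(n_samples):
        score = sum(rng.choice(col) for col in logs)  # one random position per key-byte
        if score > reference:
            better += 1
    return better / n_samples


column = [0.5, 0.25, 0.125, 0.125]   # hypothetical normalised values
fraction = sampled_better_fraction([column] * 3, (3, 3, 3), n_samples=20_000, seed=1)
estimated_rank = fraction * 4 ** 3    # scale by the total number of keys
```

For this toy example the exact count is 56, so `estimated_rank` should land close to it; for a 16-byte key the scaling factor would be 256**16.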

The problem is that, with a simple implementation, most of the evaluations come out at 1. This would mean that all the attacks are perfect for any number of traces, which is not true at all. The reason is that even though many keys are more probable than the correct one, most keys are not. Moreover, since we choose random key-bytes, each sampled key usually has a very low probability because some of its parts are close to zero.

To solve this problem, we can draw the random candidates a little higher in the ’G’ lists. This manipulation might seem unfair, but assuming that a brute force would go all the way down the lists is not realistic either. A brute force algorithm would have limits, since it assumes that the result it relies on has its candidates in good positions. Moreover, if the positions are too far down, a brute force (even with no limitations) would never reach them, even when they are better than the correct key, because a brute force based on such a bad result would never succeed in retrieving the key in practical time.

An even better option is to choose each random candidate depending on where the other random candidates were chosen. For example, we can require the sum of the positions of all random candidates to be less than a constant. This forces the sampled candidates to be more probable, so the population we compare the correct key against is a population of ’not bad’ keys.
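One simple way to apply the “sum of positions” rule is rejection sampling: draw positions uniformly and discard keys whose positions add up to more than the bound. This Python sketch is an assumption about how such a rule could be implemented, not the report's code; `limit` and `constrained_better_fraction` are hypothetical names, and positions are 0-based:

```python
import math
import random


def constrained_better_fraction(sorted_probs, correct_pos, limit, n_samples=10_000, seed=None):
    """Fraction of sampled 'not bad' keys that beat the correct key.

    sorted_probs[b]: probabilities of key-byte b sorted best-first (a 'G' list), all > 0.
    correct_pos[b] : 0-based position of the correct candidate in that list.
    limit          : hypothetical bound; a sampled key is kept only if the sum
                     of its 0-based positions is at most `limit`.
    Rejection sampling keeps the draw uniform over the allowed keys, but it
    becomes slow when `limit` is small compared with the list lengths.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    rng = random.Random(seed)
    logs = [[math.log(p) for p in col] for col in sorted_probs]
    reference = sum(col[k] for col, k in zip(logs, correct_pos))
    better = kept = 0
    while kept < n_samples:
        pos = [rng.randrange(len(col)) for col in logs]
        if sum(pos) > limit:
            continue                      # reject keys too far down the lists
        kept += 1
        if sum(col[k] for col, k in zip(logs, pos)) > reference:
            better += 1
    return better / kept


column = [0.5, 0.25, 0.125, 0.125]   # hypothetical sorted 'G' values
loose = constrained_better_fraction([column] * 3, (3, 3, 3), limit=9, n_samples=20_000, seed=1)
tight = constrained_better_fraction([column] * 3, (3, 3, 3), limit=3, n_samples=20_000, seed=1)
```

With `limit=9` nothing is rejected in this toy case, so `loose` behaves like plain random sampling; with `limit=3` every allowed key beats the worst-ranked correct key, so `tight` is 1.0.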


11.4.3 Computing Brute Force

As we saw, none of the proposed algorithms can compute the exact number of keys more probable than the correct key, at least not when some of the correct candidates are in bad positions. For example, the method proposed in [VCGRS13] took 221 hours and 70 Gb of memory on an Intel Core i7 920 to get as far as 2^40 possibilities.

This does not mean that finding a new method is impossible. During our implementations we found that the enumerated keys can be divided and some obvious parts taken out. For example, every key in which no candidate is ranked worse than the corresponding correct candidate is at least as probable as the correct key. So if we multiply the positions of the correct key's candidates and subtract 1, we obtain a number of keys that are certain to be at least as probable as the correct one.
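A minimal Python sketch of this lower bound, with 1-based positions as in the text, plus a brute-force check that only works on toy sizes. Both function names are hypothetical. Because ties are possible, the guarantee is “at least as probable” rather than strictly better:

```python
from itertools import product
from math import prod


def dominance_lower_bound(correct_ranks):
    """Keys certain to be at least as probable as the correct key.

    correct_ranks[b] is the 1-based position of the correct candidate in the
    sorted 'G' list of key-byte b. Any key whose candidate for every key-byte
    is ranked at or above the correct one has, byte by byte, probabilities at
    least as high, so its product is at least as high. There are
    prod(correct_ranks) such keys, the correct key included, hence the -1.
    """
    return prod(correct_ranks) - 1


def check_on_toy(sorted_probs, correct_ranks):
    """Brute-force sanity check of the bound, usable only on toy sizes."""
    p_correct = prod(col[r - 1] for col, r in zip(sorted_probs, correct_ranks))
    dominating = list(product(*(range(r) for r in correct_ranks)))
    assert all(prod(col[i] for col, i in zip(sorted_probs, key)) >= p_correct
               for key in dominating)
    return len(dominating) - 1


column = [0.5, 0.25, 0.125, 0.125]   # hypothetical sorted 'G' values
print(dominance_lower_bound((2, 3, 1)), check_on_toy([column] * 3, (2, 3, 1)))  # 5 5
```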

With this idea in mind, and applying it recursively to divided parts of the ’G’ lists, a realistic (fast and exact) method might be found. Because of time limits, we did not pursue this idea.

Conclusion

After 6 months of reading, researching, implementing, and testing, the idea of SCA became clear. This was the most important objective, and it was achieved. In addition to the attacks we tested, many other attacks were studied, such as TA, FA, ...

Review: As we saw with DPA, small differences in the algorithms caused large differences in the results, while in CPA both Pearson’s and Spearman’s correlations gave exactly the same results. We can thus think of DPA as a set of practical comparisons, and of CPA as a group of theoretical correlation methods. On the other hand, even though the TMPA were not successful, we can expect them to always give the same answers (for the same real trace(s)), even when the traces used to build the templates differ (but with the same keys and messages). This is because TMPA uses a large number of traces, which minimises the effect of noise and makes the answers more precise.

Evaluations took an important part of our work and time. A final, exact evaluation based on the rank of the correct key's probability does not exist yet, but we proposed an idea that might make it possible, using divide and conquer recursively.

It is important to be able to evaluate the attacks, especially when the device has good countermeasures that make the attacks less effective. In such cases an attack might still work by giving the correct key-byte candidates good (but not the best) probabilities. The difference between a good attack and a bad one is then that the result of a good attack can be used to brute force the key with a limited number of tries in reasonable time, while a bad attack gives results that can never lead to a successful brute force.

Finally, we reached a full understanding of how to carry out a complete SCA, from the tracing phase to the brute force phase. This is important before studying more complicated SCA such as Horizontal Correlation Analysis, in which the attacker needs only one trace to successfully retrieve secret information.


Bibliography

[AK96] Ross Anderson and Markus Kuhn. Tamper resistance: a cautionary note. In Proceedings of the Second USENIX Workshop on Electronic Commerce, volume 2, pages 1–11, 1996.

[APSQ06] Cédric Archambeau, Eric Peeters, F-X Standaert, and J-J Quisquater. Template attacks in principal subspaces. In Cryptographic Hardware and Embedded Systems: CHES 2006, pages 1–14. Springer, 2006.

[BCO04] Eric Brier, Christophe Clavier, and Francis Olivier. Correlation power analysis with a leakage model. In Cryptographic Hardware and Embedded Systems: CHES 2004, pages 16–29. Springer, 2004.

[BDL97] Dan Boneh, Richard A. DeMillo, and Richard J. Lipton. On the importance of checking cryptographic protocols for faults. In Advances in Cryptology: EUROCRYPT '97, pages 37–51. Springer, 1997.

[BR96] Mihir Bellare and Phillip Rogaway. The exact security of digital signatures: how to sign with RSA and Rabin. In Advances in Cryptology: EUROCRYPT '96, pages 399–416. Springer, 1996.

[BS97] Eli Biham and Adi Shamir. Differential fault analysis of secret key cryptosystems. In Advances in Cryptology: CRYPTO '97, pages 513–525. Springer, 1997.

[BW94] Manuel Blum and Hal Wasserman. Program result-checking: A theory of testing meets a test of theory. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 1994, pages 382–392. IEEE, 1994.

[CCD00] Christophe Clavier, Jean-Sébastien Coron, and Nora Dabbous. Differential power analysis in the presence of hardware countermeasures. In Çetin K. Koç and Christof Paar, editors, Cryptographic Hardware and Embedded Systems: CHES 2000, volume 1965 of Lecture Notes in Computer Science, pages 252–263. Springer Berlin Heidelberg, 2000. ISBN 978-3-540-41455-1. http://dx.doi.org/10.1007/3-540-44499-8_20.

[CMCJ04] Benoît Chevallier-Mames, Mathieu Ciet, and Marc Joye. Low-cost solutions for preventing simple side-channel analysis: Side-channel atomicity. IEEE Transactions on Computers, 53(6):760–768, 2004.

[CWWZ10] Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In 2010 IEEE Symposium on Security and Privacy (SP), pages 191–206. IEEE, 2010.


[DH76] Whitfield Diffie and Martin E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654, November 1976.

[DPS96] Cunsheng Ding, Dingyi Pei, and Arto Salomaa. Chinese Remainder Theorem. World Scientific, 1996.

[FB13a] Firas Bejaoui, Mohammed El-Barbori, and Fadi OBEID. Differential fault analysis on the DES. Limoges University, Faculty of Sciences, 2013.

[FB13b] Firas Bejaoui, Mohammed El-Barbori, and Fadi OBEID. Power-analysis attack on an ASIC AES implementation. Limoges University, Faculty of Sciences, 2013.

[FGY96] Yair Frankel, Peter Gemmell, and Moti Yung. Witness-based cryptographic program checking and robust function sharing. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, pages 499–508. ACM, 1996.

[FIP99] PUB FIPS. 46-3: Data Encryption Standard (DES). National Institute of Standards and Technology, 25(10), 1999.

[FIP01] PUB FIPS. 197: Advanced Encryption Standard (AES). National Institute of Standards and Technology, 2001.

[GQ88] Louis C. Guillou and Jean-Jacques Quisquater. A practical zero-knowledge protocol fitted to security microprocessor minimizing both transmission and memory. In Advances in Cryptology: EUROCRYPT '88, pages 123–128. Springer, 1988.

[HTM09] Neil Hanley, Michael Tunstall, and William P. Marnane. Unknown plaintext template attacks. In Information Security Applications, pages 148–162. Springer, 2009.

[KJJ] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Introduction to differential power analysis and related attacks, 1998.

[KJJ99] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential power analysis. In Advances in Cryptology: CRYPTO '99, pages 388–397. Springer, 1999.

[Kno88] Knobloch. A smart card implementation of the Fiat-Shamir identification scheme. In EUROCRYPT: Advances in Cryptology: Proceedings of EUROCRYPT, 1988.

[Koc96] Paul C. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Advances in Cryptology: CRYPTO '96, pages 104–113. Springer, 1996.

[MB06] Andreas Merkel and Frank Bellosa. Balancing power consumption in multiprocessor systems. In ACM SIGOPS Operating Systems Review, volume 40, pages 403–414. ACM, 2006.

[MDS99a] Thomas S. Messerges, Ezzy A. Dabbish, and Robert H. Sloan. Investigations of power analysis attacks on smartcards. In USENIX Workshop on Smartcard Technology, volume 1999, 1999.


[MDS99b] Thomas S. Messerges, Ezzy A. Dabbish, and Robert H. Sloan. Power analysis attacks of modular exponentiation in smartcards. In Cryptographic Hardware and Embedded Systems, pages 144–157. Springer, 1999.

[Mes00] Thomas S. Messerges. Power analysis attacks and countermeasures for cryptographic algorithms. PhD thesis, Chicago, IL, USA, 2000. AAI9978665.

[Mon85] Peter L. Montgomery. Modular multiplication without trial division. Mathematics of Computation, 44(170):519–521, 1985.

[MOP07] Stefan Mangard, Elisabeth Oswald, and Thomas Popp. Power analysis attacks: Revealing the secrets of smart cards, volume 31. Springer, 2007.

[OGOP04] Siddika Berna Örs, Frank Gürkaynak, Elisabeth Oswald, and Bart Preneel. Power-analysis attack on an ASIC AES implementation. In International Conference on Information Technology: Coding and Computing (ITCC 2004), Proceedings, volume 2, pages 546–552. IEEE, 2004.

[oST00] National Institute of Standards and Technology. FIPS PUB 186-2: Digital Signature Standard (DSS). National Institute of Standards and Technology, January 2000. http://www.itl.nist.gov/fipspubs/fip186-2.pdf.

[Par62] Emanuel Parzen. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.

[RSA78] Ronald L. Rivest, Adi Shamir, and Len Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126, 1978.

[Sha97] Adi Shamir. How to check modular exponentiation. Rump session of EUROCRYPT '97, 1997.

[Sma00] Nigel P. Smart. Physical side-channel attacks on cryptographic systems. Software Focus, 1(2):6–13, 2000.

[Spa06] Ljiljana Spadavecchia. A network-based asynchronous architecture for cryptographic devices. 2006.

[Sta94] William Stallings. SHA: the Secure Hash Algorithm. Dr. Dobb's Journal of Software Tools, 19(4):32, 34, April 1994.

[VCGRS13] Nicolas Veyrat-Charvillon, Benoît Gérard, Mathieu Renauld, and François-Xavier Standaert. An optimal key enumeration algorithm and its application to side-channel attacks. In Selected Areas in Cryptography, pages 390–406. Springer, 2013.

[VEN] Alexandre VENELLI. Contribution à la sécurité physique des cryptosystèmes embarqués.

[Ver12] Vincent Verneuil. Cryptographie à base de courbes elliptiques et sécurité de composants embarqués. PhD thesis, Université de Bordeaux, 2012.


[WIK13] Pentium FDIV bug, March 7, 2013. http://en.wikipedia.org/wiki/Pentium_FDIV_bug.
