
Detecting Software Theft in Embedded Systems: A Side-Channel Approach

Georg T. Becker, Daehyun Strobel, Christof Paar, Fellow, IEEE, and Wayne Burleson, Fellow, IEEE

Abstract—Source code plagiarism has become a serious problem for the industry. Although there exist many software solutions for comparing source codes, they are often not practical in the embedded environment. Today's microcontrollers frequently implement a memory read protection that prevents a verifier from reading out the necessary source code. In this paper, we present three verification methods to detect software plagiarism in embedded software without knowing the implemented source code. All three approaches make use of side-channel information that is obtained during the execution of the suspicious code. The first method is passive, i.e., no previous modification of the original code is required. It determines the Hamming weights of the executed instructions of the suspicious device and uses string matching algorithms for comparisons with a reference implementation. In contrast, the second method inserts additional code fragments as a watermark that can be identified in the power consumption of the executed source code. As a third method, we present how this watermark can be extended by using a signature that serves as a proof-of-ownership. We show that particularly the last two approaches are very robust against code-transformation attacks.

Index Terms—Side-channel analysis, software watermarking, IP protection, embedded systems.

I. Introduction

Software plagiarism and piracy is a serious problem which is estimated to cost the software industry billions of dollars per year [6]. Software piracy for desktop computers has gained most of the attention in the past. However, software plagiarism and piracy is also a huge problem for companies working with embedded systems. In this paper, we focus on the unique challenge of detecting software plagiarism and piracy in embedded systems. If a designer suspects that her code has been used in an embedded system, it is quite complicated for her to determine whether or not her suspicion is true. Usually she needs to have access to the program code of the suspected device to be able to compare it with the original code. However, program memory protection mechanisms that prevent unauthorized read access to the program memory are frequently used in today's microcontrollers. Hence, to gain access to the program code of an embedded system, these protection mechanisms would have to be defeated first. This makes testing embedded devices for software plagiarism very difficult, especially if it needs to be done in an automated way. Furthermore, the person who is illegally using unlicensed code might apply code-transformation techniques to hide the fact that he is using someone else's code. In this case, detecting the software theft is hard even if the program code is known.

Copyright (c) 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

G. T. Becker, C. Paar and W. Burleson are with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, USA. E-mail: {becker, burleson}@ecs.umass.edu.

D. Strobel and C. Paar are with the Horst Görtz Institute for IT-Security, Ruhr University Bochum, Germany. E-mail: {daehyun.strobel, christof.paar}@rub.de.

Software watermarks [23] enable a verifier to test and prove the ownership of a piece of software code. Previously proposed watermarks usually require access to the program code [11] or the data memory [22], [7] during the execution of the program to detect the watermark. But as mentioned before, access to the memory and the program code is usually restricted in the embedded environment. Most software watermarks proposed in the literature are used for higher programming languages such as Java and C++. But in embedded applications many programs are written in C or assembly. Furthermore, many previously proposed software watermarks have been shown to be not very robust to code-transformations [11].

In this paper, we introduce new and very efficient methods to detect software theft in embedded systems. To determine whether or not an embedded system is using our code, we use side-channel information that the device leaks while the suspected code is running on it. In this way, we avoid the need to gain access to the program or data memory. The basic idea is to measure the power consumption of the device under test and use these measurements to decide whether or not a piece of unlicensed code is running on the device. We present three different levels of protection. The first method is used to detect software thefts that are copies of the original software code with only minor changes. From the power measurements we are able to derive the Hamming weight of each executed assembler instruction. Using string matching algorithms we can then efficiently detect whether a suspected code is equal or similar to the reference code. The great advantage of this method is that no changes to the original code are needed to be able to detect it, i.e., no watermark needs to be added. Hence, this method does not introduce any kind of overhead and can be applied to any existing code.

By using code-transformations (see Sect. V-B) an attacker might be able to modify the code in a way that it still functions correctly but cannot be detected using the above-mentioned method. For higher protection against these types of attacks we therefore propose a side-channel software watermark. The side-channel software watermark consists of only a few instructions that are inserted at the assembly level. These instructions then leak information in the power consumption of the tested design and can be detected by a trusted verifier with methods very similar to classic side-channel attacks. These watermarks provide higher robustness against code-transformation attacks while introducing only a very small overhead in terms of performance and code size.

The previous two methods to detect software plagiarism can be used by a verifier to detect whether or not his code is present in an embedded system. However, these methods cannot be used to prove towards a third party that the code or the watermark belongs to the verifier. The original side-channel software watermark only transmits one bit of information, namely whether the watermark is present or not, but it does not contain any information about the owner of the watermark. We therefore extended the side-channel watermark so that it can transmit a digital signature. The verifier can use this digital signature to prove towards a third party, e.g., a judge, that he is the owner of the watermark. In most cases this will imply that he is also the owner of the part of the software code that contained this watermark.

The rest of the paper is organized as follows. The next section explains in detail how the detection based on Hamming weights works. In Sect. III the software side-channel watermark is introduced. How this watermark idea can be extended to provide proof-of-ownership is presented in Sect. IV. We then discuss the robustness of the proposed side-channel software watermark in Sect. V. We conclude the paper with a short summary of the achieved results in the last section.

II. Plagiarism Detection with String Matching Algorithms

Our first approach is intended to provide an indication of software plagiarism in cases where inserting copy detection methods into the code is not possible, e.g., for already existing products.

Let us begin with a definition of the Hamming weight of a number D, as this is a crucial point for this approach. Assume that D is given by its m-bit representation D = d_{m−1} . . . d₁d₀. The Hamming weight HW(D) is then defined as the number of 1's, i.e.,

HW(D) = ∑_{i=0}^{m−1} d_i.

We now propose an efficient method that makes use of the fact that the power consumption of the microcontroller is related to the Hamming weight of the prefetched opcode. This dependency allows us to map the power consumption of every clock cycle, which may consist of thousands of sampling points, to only one Hamming weight. The result is a high dimensionality reduction of the power traces. For the subsequent analyses, string matching algorithms are applied to compare original and suspicious software executions.

Our target device is the 8-bit microcontroller Atmel AVR ATmega8. The ATmega8 is an extremely widely used microcontroller based on the RISC design strategy. Like most RISC microcontrollers, the ATmega8 uses a Harvard architecture, i.e., program and data memory are physically separated and do not share one bus. The main advantage of this architecture is a pipelining concept that allows a simultaneous opcode prefetch during an instruction execution (see Fig. 1).

Fig. 1. Pipelining concept of the ATmega8 [2]. While one instruction is executed, another one is prefetched from the program memory.

The opcodes of the ATmega8 have a 16-bit format and consist of the bit representation of the instruction and the corresponding operands or literals.¹ The instruction AND, e.g., performs a logical AND between the two registers r and d, 0 ≤ r, d ≤ 31, and is given by the opcode

0010 00r₄d₄ d₃d₂d₁d₀ r₃r₂r₁r₀,

where d₄ . . . d₀ and r₄ . . . r₀ are the binary representations of d and r, respectively.

A closer look at the power consumption reveals that the prefetching mechanism leaks the Hamming weight of the opcode.² Figure 2 shows the power consumption of random instructions with different Hamming weights of the prefetched opcode, recorded at a clock rate of 1 MHz. Right before the rise to the second peak, the Hamming weights of the prefetched opcodes are clearly distinguishable from each other.

Our prediction was verified by the analysis of an AES implementation on the ATmega8. The histogram in Fig. 3 depicts the distribution of the power consumption at the point of leakage, using about 6 million instructions of multiple AES encryptions. The intersections between the peaks were taken as threshold values when mapping the voltages to Hamming weights. In this way the power traces are converted to strings of Hamming weights, which means that every clock cycle is represented by only one value. In our tests, the probability of a correct mapping to the Hamming weight was close to 100% and we were able to reduce the size of one power trace from approximately 3 750 000 measurement points to a string of only 3 000 Hamming weights.
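To make the mapping step concrete, the following Python sketch quantizes one extracted leakage value per clock cycle into a Hamming-weight symbol. It is an illustrative reconstruction, not the authors' tooling: the threshold values and the polarity of the voltage-to-weight mapping are placeholders that would have to be read off a profiling histogram such as Fig. 3.

import numpy as np

def hw_string(cycle_voltages, thresholds, hw_labels):
    # thresholds: 15 ascending decision levels taken from the valleys between
    #             the histogram peaks (placeholder values below)
    # hw_labels:  Hamming weight assigned to each of the 16 voltage bins; whether
    #             the weights rise or fall with the voltage depends on the setup
    bins = np.searchsorted(np.asarray(thresholds), np.asarray(cycle_voltages))
    return np.asarray(hw_labels)[bins]

# hypothetical usage with placeholder numbers
thresholds = np.linspace(-122.0, -88.0, 15)      # mV, would come from the histogram
hw_labels = np.arange(15, -1, -1)                # assumed: lower voltage = higher weight
peaks = np.array([-120.5, -101.2, -96.4])        # one extracted value per clock cycle
print(hw_string(peaks, thresholds, hw_labels))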

¹ The only exceptions are four (of 130) instructions that have access to the program or data memory. These instructions can be described either by 16- or 32-bit opcodes. 32-bit opcodes are fetched in two clock cycles.

² We focused on the ATmega8 here. However, we observed the same leakage behavior for other microcontrollers of the AVR family by Atmel and the PIC family by Microchip. Both families use the Harvard architecture with separated buses for data and instruction memory.


Fig. 2. Power consumption of instructions with different Hamming weights. There exists no 16-bit instruction with a Hamming weight of 16.

[Fig. 3 plot: histogram of the voltage at the point of leakage (x-axis: Voltage [mV], roughly −125 to −85; y-axis: Frequency ×10⁴); the peaks correspond to Hamming weights 15 down to 0.]

Fig. 3. Histogram of the power consumption at the moment of leakage in every clock cycle. For this purpose, 6 000 000 instructions of multiple AES executions were recorded.

Due to this reduction, the computational effort for analyzing the strings stays at a low level, i.e., the string matching algorithms discussed in the following typically take no longer than a few seconds on a modern PC.

There is one instruction that does not comply with our leakage model. The instruction LPM Rd,Z loads one byte of the program memory into the register Rd and does not leak the Hamming weight of the fetched opcode. Depending on the implemented algorithm, this should be considered during the detection process.

A. Detection

The Hamming weight strings can now be used to indicate software plagiarism. The idea can be summarized in three steps: First, the execution flow of the original software is mapped to a string s of Hamming weights. This is realized either by measuring and evaluating the power consumption of the execution as stated above or by simulating the execution flow and calculating the Hamming weights from the opcodes. Note that, e.g., branch instructions need more than one clock cycle and hence, during their execution more than one opcode is prefetched from the program memory.
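For the simulation-based variant of this first step, the reference string s can also be computed directly from the assembled opcodes. The Python sketch below is illustrative only: it encodes the LDI and ADD instructions of the ATmega8 following the instruction-set manual [1] and reproduces the string s = (6, 8, 5) used in the register-substitution example below; real tooling would of course cover the full instruction set and the pipeline behavior.

def hw(x):
    # Hamming weight of an integer (number of 1 bits)
    return bin(x).count("1")

def opc_ldi(d, k):
    # LDI Rd,K (16 <= d <= 31): 1110 KKKK dddd KKKK
    assert 16 <= d <= 31 and 0 <= k <= 255
    return 0xE000 | ((k & 0xF0) << 4) | ((d - 16) << 4) | (k & 0x0F)

def opc_add(d, r):
    # ADD Rd,Rr: 0000 11rd dddd rrrr with 5-bit register numbers
    assert 0 <= d <= 31 and 0 <= r <= 31
    return 0x0C00 | ((r & 0x10) << 5) | ((d & 0x10) << 4) | ((d & 0x0F) << 4) | (r & 0x0F)

program = [opc_ldi(16, 0x34), opc_ldi(17, 0x53), opc_add(16, 17)]
print([hw(op) for op in program])   # -> [6, 8, 5], the reference string s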

The second step is to record a power trace of the suspicious device to obtain a string of Hamming weights for it as well, denoted as t. A one-to-one copy can now be detected in a third step by comparing the two strings s and t.

In fact, a one-to-one copy is an idealized assumption of software plagiarism and is therefore easy to detect. In the following we discuss how minor modifications can also be detected by the verifier.

1) Register substitution: The easiest modification that influences the Hamming weights is to substitute the working registers. There are 32 general purpose working registers on the ATmega8, which means that one can choose between six different Hamming weights (register R0 → 00000 to register R31 → 11111). However, a substitution of a register affects the Hamming weight of every instruction that processes the data of the changed register. For instance, the short assembler code

LDI R16,0x34  ;loads the constant directly
LDI R17,0x53  ;to the given register
ADD R16,R17   ;adds the two registers and
              ;places the result in R16

results in the string s = (6, 8, 5) (cf. [1]). If the attacker substitutes, e.g., R16 with R23, this affects lines 1 and 3 and leads to t = (9, 8, 8). To account for register substitutions, the verification process can now be extended by a generalized string s′ that captures the dependencies between the registers in different instructions. For instance, the string of our short example could be changed to s′ = (6 + x, 7 + y, 2 + x + y), where 6, 7 and 2 are the minimum Hamming weights of the given instructions excluding the registers and x and y are variables for the Hamming weights of the registers. The verifier now has to check whether there exist x and y with 0 ≤ x, y ≤ 5 such that s′ and t match. Of course, a long string with many dependencies is more meaningful than a short string as given in our example.

2) Partial copy: Instead of scanning the whole source code, it might make sense to focus only on the important parts of the program to detect partial copies of the original code. In this case, approximate string matching algorithms like the generalized Boyer-Moore-Horspool algorithm [19] are well suited to find all positions where these patterns occur in the string t. The difference to an exact string matching algorithm is that the generalized Boyer-Moore-Horspool algorithm tolerates a predefined number k of mismatches. This is necessary because of infrequent erroneous mappings of power consumption values to Hamming weights or to account for the usage of LPM instructions. The generalized Boyer-Moore-Horspool algorithm solves the k mismatches problem in O(nk(k/|Σ| + 1/(m − k))) on average, with n being the length of the string, m the length of the pattern and Σ the alphabet. Additionally, a preprocessing step is executed once for every pattern in O(m + k|Σ|).
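To illustrate the k mismatches problem itself (not the generalized Boyer-Moore-Horspool algorithm, whose shift heuristics are omitted here), a naive Python scan that reports every position of t at which a reference pattern matches with at most k mismatching Hamming-weight symbols could look as follows.

def k_mismatch_positions(pattern, text, k):
    # naive O(n*m) scan; [19] solves the same problem much faster on average
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for i in range(m):
            if text[start + i] != pattern[i]:
                mismatches += 1
                if mismatches > k:
                    break
        else:
            hits.append(start)
    return hits

# hypothetical usage: tolerate k = 1 erroneous mapping (e.g., caused by an LPM)
print(k_mismatch_positions([6, 8, 5], [1, 6, 8, 4, 3, 6, 8, 5, 2], k=1))   # -> [1, 5]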

3) Insert, delete or substitute instructions: A metric that is often used for approximate string matching is the Levenshtein distance of two strings [12], also known as edit distance. It was introduced by V. Levenshtein in 1965 and is defined as the minimum number of edits necessary to transform one string into another, in our case s into t. Possible edits are:

• insertion: add one character of t to the string s,
• deletion: remove one character of the string s,
• substitution: replace a character of s by a character of t.

Fig. 4. Computation of the Levenshtein distance between the two strings s = 1, 2, 6, 6 and t = 5, 2, 6, 5, 6. The gray boxes illustrate the path that is followed to reach the highlighted overall distance of 2.

Let us denote the length of s as m and the length of t as n. Then the Levenshtein distance is determined by creating an (m + 1) × (n + 1) matrix C with the following steps:

i. Initialize the first row and column:
   C_{i,0} = i for all 0 ≤ i ≤ m,
   C_{0,j} = j for all 1 ≤ j ≤ n.

ii. For all 1 ≤ i ≤ m and 1 ≤ j ≤ n set
   C_{i,j} = C_{i−1,j−1}
   if no edit is necessary, i.e., s_i = t_j with s_i and t_j denoting the i-th character of s and the j-th character of t, respectively, and
   C_{i,j} = min { C_{i−1,j−1} + 1 (substitution), C_{i,j−1} + 1 (insertion), C_{i−1,j} + 1 (deletion) }
   otherwise.

The overall distance between s and t is the value stored in C_{m,n}. An example is given in Fig. 4. Since every cell of the matrix C has to be computed, the computational complexity is bounded by O(mn).

With this method the verifier is able to measure the degree of modification or simply use the distance as an indication of plagiarism.
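A direct Python transcription of this dynamic program, kept close to the matrix definition above (an illustrative sketch, not the authors' implementation):

def levenshtein(s, t):
    # edit distance between two Hamming-weight strings, following the
    # (m+1) x (n+1) matrix construction described above
    m, n = len(s), len(t)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):                      # step i: first column
        C[i][0] = i
    for j in range(n + 1):                      # step i: first row
        C[0][j] = j
    for i in range(1, m + 1):                   # step ii
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:            # no edit necessary
                C[i][j] = C[i - 1][j - 1]
            else:
                C[i][j] = min(C[i - 1][j - 1] + 1,   # substitution
                              C[i][j - 1] + 1,       # insertion
                              C[i - 1][j] + 1)       # deletion
    return C[m][n]

print(levenshtein([1, 2, 6, 6], [5, 2, 6, 5, 6]))   # -> 2, as in Fig. 4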

4) Handling unpredictable branches or interrupts: In some cases, it is impossible to acquire measurements of the suspicious implementation with the same input data as the reference implementation. Another scenario is that the underlying algorithm is non-deterministic regarding the order of executed subroutines due to random parameters. The result might be that a different path of subroutines is executed for every measurement. While scanning the whole string for a partial match, e.g., a subroutine, could still be successful, comparing the whole strings seems to be more complicated.

Fig. 5. Dot plot of the two strings s = 1, 5, 2, 8, 6 and t = 1, 5, 2, 6, 8, 2. Every character match is highlighted by a gray box.

One technique that can help us here is called a dot plot and comes from the field of bioinformatics, where it is mostly applied to DNA sequences to illustrate similarities [10]. In [21], Yankov et al. adopted this technique to analyze two time series after converting them into a symbolic representation. The idea is very simple: to compare two strings s with length m and t with length n, an m × n matrix is generated where position (i, j) is marked if s_i and t_j match. A small example is given in Fig. 5.

Two interesting observations can be made from this plot. First, equal substrings are represented as diagonal lines. In our example the substring 1, 5, 2 appears in both strings. The second observation is that a mirrored substring is marked as a diagonal line which is perpendicular to the main diagonal. This is the case for 2, 8, 6 and 6, 8, 2.

We applied this technique to two Hamming weight strings of different software implementations, both containing two equal subroutines. A small detail is depicted in Fig. 6(a).
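Building such a dot plot from two Hamming-weight strings is straightforward; the NumPy sketch below is purely illustrative and marks every position (i, j) at which the two strings agree.

import numpy as np

def dot_plot(s, t):
    # binary m x n matrix with a 1 wherever s[i] == t[j] (cf. Fig. 5)
    s = np.asarray(s).reshape(-1, 1)    # column vector of length m
    t = np.asarray(t).reshape(1, -1)    # row vector of length n
    return (s == t).astype(int)

print(dot_plot([1, 5, 2, 8, 6], [1, 5, 2, 6, 8, 2]))
# equal substrings appear as diagonal runs of 1s, mirrored substrings as
# anti-diagonal runs, exactly as described for Fig. 5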

[Fig. 6 plots: HW string 1 versus HW string 2 (both axes roughly 100 to 400), panels (a) and (b).]

Fig. 6. Comparing two HW strings with dot plots (a) before and (b) after removing noise. The diagonal lines illustrate the matches between the strings.

As there is much noise in this plot, we removed every match that is shorter than four characters. The result is given in Fig. 6(b), where at least two diagonal lines are suspicious: the first starts at about character 60 in HW string 2 and the second at about character 240. Another interesting point is that the second line is interrupted. This is due to the insertion of some dummy instructions during the subroutine of the second implementation, which were meant to simulate an interrupt service routine.

The methods proposed in this section provide only an indication of plagiarism. To make a more precise statement, the software code has to be modified in advance, as we will discuss in the following sections.

III. Side-Channel Software Watermark

The main idea behind the side-channel software watermark that we first introduced in [3] is to use the power consumption as a hidden communication channel to transmit a watermark. This idea is related to the side-channel-based hardware watermark proposed in [4]. The watermark is hidden in the power consumption of the tested system and can only be revealed by a verifier who possesses the watermark secret. The watermark consists of two components,

• a combination function,
• a leakage generator,

and is realized by adding instructions at the assembly level to the targeted code. The combination function uses some known internal state of the program and a watermark constant to compute a one-bit output. This output bit is then leaked out (transmitted) via the power consumption using a leakage generator. The leakage generator is realized by one or several instructions whose power dissipation depends on the value of the combination function. This results in a power consumption that depends on the known internal state, the combination function and the watermark constant. To detect the watermark, the verifier can use his knowledge of the watermark design to perform a correlation power analysis (CPA), similar to a classic side-channel attack [13]. If this power analysis is successful, i.e., the watermark signal is detected in the power traces, the verifier can be sure that his watermark is embedded in the device. In many applications this will imply that a copy of the original software is present.

In the following, we will first describe the details of our proof-of-concept implementation before we explain how this watermark can be detected by means of a side-channel analysis.

A. Implementation

To show the feasibility of our approach we implemented a side-channel watermark on the ATmega8 microcontroller. As mentioned, the watermark consists of a combination function and a leakage generator. In our case the input to the combination function is a 16-bit internal state and a 16-bit watermark constant. The combination function computes a one-bit output which is leaked out using a leakage generator. There are many ways to implement a combination function, and the combination function introduced in this paper should only be seen as an example. In our implementation we chose a very small and compact combination function that consists of only four assembler instructions. The 16-bit input to the combination function is separated into two byte values in1 and in2. In the first step, the one-byte watermark constants c1 and c2 are subtracted from in1 and in2, respectively. In the next step the two one-byte results of these subtractions are multiplied with each other. The resulting two-byte value r from this multiplication is again separated into two one-byte values r0 and r1, which are then multiplied with each other.

r = (in1 − c1) · (in2 − c2) = r0||r1

res = r0 · r1

The output of the combination function is the 8th least significant bit of the result res of this multiplication. The corresponding assembler code can be found below.

subi in1,35   ;subtracts constant from in1
subi in2,202  ;subtracts constant from in2
mul in1,in2   ;multiplies in1 and in2 and
              ;stores the result in R0 and R1
mul R0,R1     ;multiplies R0 and R1

The registers in1 and in2 are used as the internal states and the integers 35 and 202 are the two watermark constants c1 and c2. The instruction subi in1,35 subtracts the constant 35 from in1 and stores the result back in in1. In the ATmega8 instruction set the two-byte result of the multiply instruction mul is always stored in R0 and R1. The result of the combination function is the most significant bit in R0.

A leakage generator is used to leak out the output bit of the combination function. In our implementation we used a conditional jump as our leakage generator. If the output bit is 1, we compute the two's complement of the register R0; otherwise no operation is executed. We furthermore store the result of R0 in the memory. Below is the corresponding assembler code:

SBRC R0,7  ;skip next instruction if
           ;bit 7 in register R0 is 0
neg R0     ;compute 2's complement of R0
st Z,R0    ;store R0 in the RAM

Recall that the output of the combination function is the most significant bit of R0. SBRC R0,7 checks the most significant bit of R0 and skips the next instruction if this bit is 0. Otherwise the neg instruction is executed, which computes the two's complement of R0. In the last step, the value of R0 is stored in the memory.

The leakage generator helps the verifier to detect the watermark. The difference in the power consumption between the case that the neg instruction is executed and the case that the instruction is skipped is very large. This makes detecting the watermark using side-channel analysis straightforward.

We were also able to successfully detect the watermark without any leakage generator. This is due to the fact that the power consumption of the last multiply instruction is higher if the output bit is 1 than if it is 0. However, the leakage generator makes detection much easier and can also protect against reverse-engineering and code-transformation attacks (see Sect. V). In the next subsection we will give more details on how the watermark detection works.


B. Watermark Verification

To detect the watermark we use a correlation power analysis (CPA). The main idea of a correlation power analysis is to exploit the fact that the power consumption of a device depends on the executed algorithm as well as on the processed data. However, this data-dependent power consumption might be too small to be observed with a simple power analysis as it has been used in Section II. Therefore, in a CPA many traces are measured and statistical methods are then used to extract the wanted information. In a classical CPA setting the goal is to retrieve a secret key. In the watermark case, the verifier does not want to retrieve a secret key but wants to verify whether or not his watermark is present. To do this, the verifier first collects power traces with different inputs of the system under test. For each trace the verifier computes the known internal state that is used as the input to the combination function (in our implementation this is the value of the registers in1 and in2). The internal state used for the watermark should be a varying state that is predictable for the verifier, e.g., a state depending on the input values. The verifier uses this internal state and the watermark constants to compute the output of the used combination function for each input value and stores these values as the correct hypothesis. He repeats this procedure n − 1 times using different watermark constants or combination functions. At the end the verifier has n hypotheses with n different watermark constants or combination functions, where one of the hypotheses contains the correct watermark constant and combination function.

In the last step the verifier correlates the hypotheses with the measured power traces. If the watermark is embedded in the tested device, a correlation peak should be visible for the hypothesis with the correct watermark constant. This is due to the fact that the correct hypothesis is the best prediction of the power consumption during the execution of the leakage circuit. The reference [5] gives a more detailed introduction to this side-channel analysis method. The result of the CPA on our example implementation can be found in Figs. 7 and 8. In Fig. 8 we can see that detecting the watermark is possible with fewer than 100 measurements.
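The verification procedure can be sketched as follows in Python/NumPy. This is an illustrative reconstruction under several assumptions: the traces are already acquired and aligned, the model wm_bit mirrors the assembler code of Sect. III-A, and the list of candidate constants stands in for the n hypotheses described above.

import numpy as np

def wm_bit(in1, in2, c1, c2):
    # software model of the combination function of Sect. III-A
    p = ((in1 - c1) & 0xFF) * ((in2 - c2) & 0xFF)   # 16-bit product (AVR R1:R0)
    res = (p & 0xFF) * (p >> 8)                     # second mul: R0 * R1
    return (res >> 7) & 1                           # most significant bit of R0

def cpa_detect(traces, in1, in2, candidate_constants):
    # traces: (#measurements, #samples) aligned power traces
    # in1, in2: known internal state bytes for each measurement
    # returns the maximum absolute Pearson correlation for each candidate (c1, c2)
    scores = []
    for c1, c2 in candidate_constants:
        h = np.array([wm_bit(a, b, c1, c2) for a, b in zip(in1, in2)], dtype=float)
        h = h - h.mean()
        x = traces - traces.mean(axis=0)
        denom = np.sqrt((h ** 2).sum() * (x ** 2).sum(axis=0))
        corr = np.where(denom > 0, (h @ x) / denom, 0.0)   # correlation per sample point
        scores.append(float(np.max(np.abs(corr))))
    return scores   # the hypothesis with the correct constants should stand out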


Fig. 7. The result of the side-channel analysis plotted (a) against time and (b) with respect to different hypotheses, where hypothesis number 100 is the correct one.

Fig. 8. The results of the side-channel analysis with respect to the number of measurements. Even with fewer than one hundred measurements the correct hypothesis can be clearly detected.

Other microcontrollers will have a different power behavior and therefore the number of traces needed to detect the watermark might vary from CPU to CPU. It should be noted that a few hundred traces are easily obtained from most practical embedded systems, and it is reasonable to assume that a verifier can use many more measurements if needed. Hence, even if the signal-to-noise ratio might decrease for other microcontrollers, it is safe to assume that detection of this kind of watermark will in most cases be possible. It should also be noted that the length of the code that is being watermarked does not have an impact on the signal-to-noise ratio of the detection. The number of instructions that are executed before or after the watermark does not make any difference in this type of side-channel analysis.

C. Triggering

One important aspect of the software watermark detection is the triggering and alignment of the power measurements. To take a power measurement we need a trigger signal that tells the oscilloscope to start the measurement. In practice, a communication signal or the power-up signal is usually used as the trigger signal for the oscilloscope. Other possible trigger points might be an unusually low or high power consumption, e.g., when data is written to a non-volatile memory position, the microcontroller wakes up from a sleep mode or a co-processor is activated. Modern oscilloscopes have advanced triggering mechanisms where several trigger conditions can be used at once, e.g., a specific signal from the bus followed by an unusually high power consumption, etc. Because these trigger points might not be close to the inserted watermark, the verifier might need to locate and align the watermark in a large power trace. With some knowledge of the underlying design, it is usually possible to guess a time window in which the watermarked code is executed. Looking for power patterns that, e.g., are caused by a large number of memory lookups or the activation of a coprocessor can also help to identify the time window in which the watermark is executed. Once this time window is located, alignment mechanisms such as simple pattern matching algorithms [13] or more advanced methods such as [16], [20] are used to align the power traces with each other.
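A minimal sketch of such a pattern-matching alignment (Python/NumPy, illustrative; the reference pattern, search window and shift strategy are assumptions, not the authors' exact procedure): a short reference pattern is slid over a search window of each trace and the trace is shifted so that the best-matching offset coincides across all measurements.

import numpy as np

def align_trace(trace, pattern, window_start, window_len):
    # shift `trace` so that the segment best matching `pattern` (by normalized
    # correlation) inside the search window is moved back to `window_start`
    best_offset, best_score = 0, -np.inf
    p = (pattern - pattern.mean()) / (pattern.std() + 1e-12)
    for off in range(window_len - len(pattern) + 1):
        seg = trace[window_start + off : window_start + off + len(pattern)]
        s = (seg - seg.mean()) / (seg.std() + 1e-12)
        score = float(np.dot(p, s))
        if score > best_score:
            best_score, best_offset = score, off
    return np.roll(trace, -best_offset)   # crude shift; edge samples wrap around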


Basically, the problem of triggering and aligning the power traces for a side-channel watermark is similar to the problem of triggering and aligning the power traces in a real-world side-channel attack. Often, in a real-world side-channel attack the attacker actually has less knowledge of the attacked system than the verifier has for detecting the watermark. The verifier knows the code and flow of his watermarked program, while in a real-world attack the attacker can often only guess how it is implemented. We would therefore like to refer to the area of real-world side-channel attacks for more details on the feasibility of triggering a measurement in practice [9], [14], [15], [17]. The alignment of power traces will be addressed in more detail in Section V-B, where we describe how to overcome the insertion of random delays as one of the possible attacks on our watermark.

IV. Proof-of-Ownership

The watermark discussed in the previous section only transmits one bit of information: either the watermark is present or not. This is very helpful to detect whether or not your code was used in an embedded system. However, it is not possible for the verifier to prove towards a third party that he is the legitimate owner of the watermark. This is due to the fact that the watermark itself does not contain any information about the party who inserted it. Therefore, everyone could claim to be the owner of the watermark once he detects a watermark in a system. We call the goal of being able to prove towards a third party that you are the legitimate owner of the watermark proof-of-ownership. In this section, we show how we can expand the watermark idea from Sect. III to also provide proof-of-ownership.

The idea to establish proof-of-ownership is to modify the side-channel watermark in a way that the watermark transmits a digital signature that can uniquely identify the owner of the watermark. One nice property of the side-channel watermark is that it is hidden in the noise of the power consumption of the system. Without the knowledge of the watermark secret, the presence of the watermark cannot be detected. So we have already established a hidden communication channel. In Fig. 7(a) we can observe a positive correlation peak while the leakage circuit is being executed. Recall that the leakage circuit is designed in such a way that the power consumption is higher when the output of the combination function is '1'. If we change the leakage generator in a way that the power consumption is higher when the output bit is '0' instead of '1', then the correlation will be inverted. That means that we would not see a positive, but a negative correlation peak. We can use this property to transmit information. If the bit we want to transmit is '0', we invert the output bit of the combination function. If it is '1', we do not invert the output bit. By doing so, we know that '1' is being transmitted when we see a positive correlation peak and '0' when we see a negative correlation peak. We can use this method to transmit data one bit at a time.

We tested this kind of watermark by using the same combination function as discussed in Sect. III but exchanged the leakage generator. We stored an 80-bit signature that we want to transmit with our watermark in the program memory. We then load the signature, one byte at a time, from the program memory and subsequently add the output bit of the combination function to the bit we want to transmit. The resulting bit is then leaked out using a conditional jump, just as it has been done in Sect. III-A.
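On the verifier side, each transmitted signature bit can then be read off the sign of the correlation peak at the corresponding leakage position. A hedged Python/NumPy sketch (it reuses the combination-function model from the CPA sketch above and assumes the leakage sample index of every signature bit is already known after alignment):

import numpy as np

def decode_signature(traces, hypothesis_bits, leak_samples):
    # traces: (#measurements, #samples) aligned power traces
    # hypothesis_bits: combination-function output (0/1) per measurement
    # leak_samples: sample index of the leakage generator for each signature bit
    h = np.asarray(hypothesis_bits, dtype=float)
    h = h - h.mean()
    bits = []
    for s in leak_samples:
        x = traces[:, s] - traces[:, s].mean()
        corr = (h @ x) / (np.sqrt((h ** 2).sum() * (x ** 2).sum()) + 1e-12)
        bits.append(1 if corr > 0 else 0)   # positive peak -> '1', negative -> '0'
    return bits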

The same detection method as explained in Sect. III-B is used to detect the watermark and to read out the transmitted signature. Figure 9 shows the result of this correlation-based power analysis. The positive and negative correlation peaks over time represent the transmitted signature. The resulting watermark is still quite small. In our example implementation the watermark consists of only 15 assembler instructions for the leakage generator and only 4 instructions for the combination function. We also used 80 bits of the program memory to store the digital signature. If storing the signature in the program memory is too suspicious, it is also possible to implement the leakage generator without a load instruction by using constants. This might increase the code size a bit, but it is still possible to program the leakage generator with around 30 instructions for an 80-bit signature on an 8-bit microcontroller. In a 16-bit or 32-bit architecture, smaller code sizes can be achieved.

Fig. 9. Side-channel watermark that transmits an ID. Positive correlation peaks indicate that a '1' is being transmitted, negative correlation peaks indicate a '0'. In this figure we can see how the hexadecimal string "E926CFFD" is being transmitted.

V. Robustness and Security Analysis of the Software Watermark

In the previous section we have introduced our side-channel watermarks and showed that we are able to reliably detect them. However, so far we have not talked about the security of the watermark. Traditionally, the security of watermarks against different attacks is called robustness. To the best of our knowledge there does not exist a completely robust software watermark that can withstand all known attacks. For software watermarks, and especially side-channel watermarks, it is very difficult to quantify the robustness of the watermark. We do not claim that our watermark is “completely robust” or secure; given sufficient effort the side-channel software watermark can be removed. In the following, we will introduce our security model and describe some possible attacks against the system. We will provide arguments why these attacks can be non-trivial in practice. Hence, we will show that the watermark, although not impossible to remove, still represents a significant obstacle for attackers.

In the security model of the software watermark three parties are involved: the owner of the watermark who inserted the watermark, the verifier who locates the watermark in a suspected device, and an attacker who tries to remove the watermark from a software code. The attacker only has access to the assembler code of the watermarked program. The attacker knows neither the design of the combination function, nor which part of the assembler code implements this combination function, nor which internal states or constants are being used in it. This knowledge is considered the watermark secret. The verifier needs to be a trusted third party who shares the watermark secret with the owner of the watermark. A successful attack is defined as follows: a transformation of the watermarked software code that (1) makes it impossible for the verifier to locate the watermark by means of side-channel analysis and (2) does not change the functionality of the software program.

Hence, an attack is unsuccessful if either the verifier is still able to detect the software watermark or the resulting software code no longer fulfills the intended purpose of the program. We will discuss three different attack approaches to remove the watermark from the assembler code:

Reverse-engineering attack: In a reverse-engineering attack the attacker tries to locate the assembler instructions that implement the watermark using reverse-engineering techniques so that he can remove or alter these instructions.

Code-transformation attacks: In a code-transformation attack, the attacker uses automated code-transformations to change the original assembler code in a way that the resulting code still functions correctly but the watermark detection is impossible.

Side-channel attacks: In a side-channel attack the attacker tries to use side-channel techniques to locate the side-channel signal in the power consumption. This gives the attacker the knowledge of the location of some of the watermark instructions (e.g., the leakage generator).

In the following we discuss each of the three attacks in more detail.

A. Reverse-Engineering Attack

If the attacker can reverse-engineer the entire code and identify the purpose of each instruction and function, the attacker also knows which instructions are not directly needed by the program and which are therefore possible watermark instructions. However, completely reverse-engineering the assembler code can be very difficult and time consuming, especially for larger programs. Furthermore, complete reverse-engineering might be more expensive than implementing the code in the first place, making product piracy not cost effective if reverse-engineering is needed. An attacker can try to locate the watermark without reverse-engineering the entire code. For example, the attacker could use techniques such as data-flow diagrams to detect suspicious code segments which he can then investigate further. The complexity of such attacks depends on the attacker's reverse-engineering skills as well as on the way the watermark is embedded in the code.

We believe that due to the small size of the watermarks, locating them with methods of reverse-engineering can be very expensive for the attacker. Especially in larger designs, which are usually more attractive for software theft, this can be very difficult. Another attractive property of the side-channel watermarks is that they are hidden in the power consumption of the system. This means that an attacker cannot tell whether or not a watermark is present in the code. So even if she locates and removes one or several side-channel watermarks from a code, she cannot be sure that there are not still more watermarks present in the code. Considering the small size of only 5-10 assembly instructions for some watermarks, adding multiple watermarks is still very economical. This may discourage attackers from stealing the code, as reverse-engineering the entire code is necessary to ensure that all watermarks have been removed.

B. Code-Transformation Attacks

In an automated code-transformation attack, a software tool is used to change the program code without changing the semantically correct execution of the program. Examples of code-transformations are recompiling the code, reordering of instructions and obfuscation techniques such as replacing one instruction with one or more instructions that have the same result. Code-transformations can be a very powerful attack tool for disabling software watermarks, as has been shown in [11], where all tested static software watermarks for Java bytecode were successfully removed with standard obfuscation tools.

Let us first consider the impact of reordering instructions and inserting dummy instructions on our side-channel watermark. If these methods are used by the attacker, they can have the effect that the leakage generator is executed in a different clock cycle compared to the original code. For the detection this means that the correlation peak will appear at a different clock cycle. However, the correlation peak will be as visible as without the reordering, since inserting a static delay does not decrease the signal-to-noise ratio. Therefore, simple reordering and the insertion of dummy instructions cannot prevent the verifier from detecting the watermark.

However, if the attacker adds not a static but a random delay, this has a negative impact on the watermark detection. Random delays have the effect that the measurement traces are not aligned with each other, i.e., the clock cycle in which the leakage generator is executed varies from measurement to measurement. Unaligned traces hamper side-channel analysis, but the detection can still be successful if enough traces are aligned with each other [13]. It is not always easy to insert efficient random delays into the code, e.g., a source of randomness is needed, and simply measuring the execution time of a program might give an indication of the random delay introduced. Furthermore, the verifier can use alignment methods to detect such misalignment and remove the random delays. By using these alignment methods the verifier has a good chance to counteract the random delays, especially if the delays are inserted several clock cycles before the leakage generator. Due to the fact that the attacker does not know the location of the leakage generator, this is very likely. Otherwise the attacker would need to insert a lot of random delays, which would hurt the performance.

To show the power of alignment techniques to counteract random delays, we changed our experiment by inserting random delays. In our first approach we added our own random delays by using a timer interrupt that would pseudo-randomly trigger every 1-128 clock cycles. We used an S-box to generate our pseudo-random numbers and an externally generated 8-bit random number as its initialization for each measurement. These random interrupts did not provide much of an obstacle and with a simple pattern matching algorithm [13] we could detect the watermark. The result of the CPA is shown in Fig. 10(a). To make our experiment more credible we also applied the random-delay-based side-channel countermeasure presented in [8] to our watermarked AES. This countermeasure, called improved floating mean, inserts random delays at fixed positions but with a varying length. The initial state of the used PRNG is an externally generated 64-bit number.³ We again used a pattern-matching alignment algorithm and peak extraction before performing our CPA. The result of this analysis is shown in Fig. 10(b). The correlation coefficient decreased for this analysis compared to the original watermarked AES implementation from around 0.85 to 0.5, but this correlation value is still very large. These results show that it is not simple to defeat the watermark detection by inserting random delays. With more advanced alignment methods (e.g., [16], [20]) better results could probably be achieved. Furthermore, in [18] it was demonstrated that random delays can be removed without a big decrease in the correlation coefficient with methods similar to the ones described in Section II.

By replacing instructions, an attacker might change the power profile of a code. For example, instead of using the decrement instruction DEC to decrease a register value, the subtract-with-constant instruction SUBI could be used.

³ We used the parameters provided in [8] for our implementation of improved floating mean, with three dummy rounds before the encryption, and inserted the watermark in the main AES encryption function. 29 random delays, each varying between 24-536 clock cycles in steps of two, are executed before the first execution of the watermark.

[Fig. 10(a) plot: correlation coefficient (about −0.2 to 0.6) versus clock cycle (about 600 to 1000), titled "Correlation vs. Time"; panels (a) and (b).]

Fig. 10. CPA with 1 000 measurements to detect the watermark in two implementations with random-delay countermeasures. In both figures peak extraction and a pattern-based alignment method were used. In (a) random delays are introduced by pseudo-randomly triggering a timer interrupt every 1-128 clock cycles. In (b) the side-channel countermeasure improved floating mean was added to the watermarked AES. In both cases the watermark is clearly detectable.

These instructions have a different power profile. However, even if the attacker can change the power profile of the code significantly, this does not impact the CPA. The power consumption of the clock cycles before or after the watermark does not have an impact on the CPA correlation. As long as there is a difference in power consumption according to the output of the combination function, this difference can be used to detect the watermark using a CPA. For example, just the transmission of the output of the combination function over the internal bus usually leaks enough data-dependent power consumption to be used in a side-channel analysis, regardless of the actual instruction that is being executed.⁴

A code-transformation that removes the output of the combination function, on the other hand, would be successful. But every code-transformation algorithm needs to make sure that the resulting code does not change the semantically correct execution of the program. Therefore, it needs to be ensured that the compiler or code-transformation algorithm considers the watermark value to be needed. This can be done by storing the output or using it in some other way. In this case any code-transformation attack will be unsuccessful, as removing the watermark value would destroy the semantically correct execution of the program from the point of view of a compiler.

Adding additional side-channel watermarks to a program can be seen as a code-transformation as well. A side-channel watermark should not change the state of the program that is being watermarked, to ensure that the watermark does not cause software failures. Hence, additional watermarks will not change the combination function of a previously inserted watermark. They might only introduce some either static or data-dependent delays. For this reason, it is possible to add multiple watermarks into a design without interference problems.

⁴ This assumes that the power consumption of the internal bus is correlated with the Hamming weight of the transmitted bits, which is usually the case for microcontrollers [13].


C. Side-Channel Attacks

If the attacker can successfully detect the watermark using a side-channel analysis, the attacker also gains the knowledge of the exact clock cycles where watermark instructions (e.g., the leakage generators) are executed. In this case the attacker only needs to remove or alter these instructions to make the watermark detection impossible. Therefore, the watermark should only be detectable by the legitimate verifier who possesses the watermark secret.

The attacker can try to discover the watermark secret by performing a brute-force side-channel analysis in which he tries every possible watermark secret. But the attacker faces two problems with this approach: the big search space of possible watermarks and false-positives. The size of the search space of possible watermark secrets depends strongly on the application, the size of the watermark, and the architecture of the microcontroller. The application that is being watermarked determines how many internal states can be used as inputs to the combination function and the size of the watermark determines how many operations the combination function performs. Finally, the number of available instructions and functions that can be used for the watermark also influences the search space.

In the following, we give a rough estimate of a possible search space for our example application of an AES-128 encryption on the 8-bit ATmega8 microcontroller. The AES encryption program has two 16-byte inputs, the plaintext and the key, and one 16-byte output, the ciphertext. Let us assume that the designer of the watermark can use the 16-byte input for his internal state of the watermark. For simplicity we assume that the combination function consists of 10 basic instructions using the internal states and two 8-bit watermark constants as inputs. Furthermore we assume that only the six ATmega8 instructions addition, subtraction, AND, OR, exclusive-OR, and multiplication can be used. Using these parameters the lower bound on the number of possible different combination functions is roughly 2⁷⁵.

Besides the large search space, an attacker also has to face the problem of false-positives. If an attacker tries 2⁷⁵ different watermark secrets, it is likely that the attacker will see some correlation peaks that are not due to the actual watermark. One reason for a correlation peak might simply be noise, as statistically some hypotheses will generate greater correlation peaks than others. Such peaks are usually called ghost peaks in the literature. These correlation peaks should be smaller than the actual correlation peak due to the watermark if enough traces are used. However, if the attacker has not discovered the watermark yet, he does not know how high the correlation peak of the watermark is supposed to be and might therefore falsely suspect wrong parts of the design to be the watermark. The second reason why a false-positive might appear is that some part of the actual program that is being watermarked might be linearly related to a possible watermark. In a brute-force approach all possible operations on the internal states are tested. Therefore, it is more than likely that one or several of the tested combination functions are identical or linearly related to parts of the actual program. In this case, correlation peaks appear that will indicate a possible watermark at a location where there is no watermark embedded.

To summarize, detecting the watermark using side-channel analysis can be quite complicated for the attacker. Using small (and possibly multiple) watermarks will increase the problem of false-positives, while the search space becomes too big in practice if larger watermark secrets (more operations and/or more possible internal states) are used. In our opinion, using a reverse-engineering or code-transformation approach seems more promising for removing the watermark in practice.

VI. Conclusion

In this work we introduced three very efficient and cost-effective ways to detect software plagiarism and piracy in embedded systems. The biggest advantage of these methods is that they do not require access to the suspicious program code. This property is very useful for embedded systems, as access to the program code is usually restricted by program memory protection mechanisms. Hence, our methods enable a verifier to efficiently test many embedded systems for software plagiarism by simply measuring the power consumption of the devices under test. In our first approach, we achieve this by deriving the Hamming weight of the running opcode from the taken power measurements. This can be done without the need to modify the program code, i.e., no watermark needs to be inserted. The Hamming weight method can detect one-to-one copies of software code very easily. We also showed that this detection mechanism can still be successful if the attacker makes some changes to the program code. For this case we showed how string matching algorithms can be applied to the Hamming weight method to determine how similar the tested code is to the reference code. If the code is very similar to the reference code, this is a very good indicator of software plagiarism.

For increased robustness and easier detection, we have introduced side-channel software watermarks. These watermarks can be inserted at the assembly level and introduce only a very small overhead in terms of code size and runtime. Like the Hamming weight method, these watermarks can be detected by simply measuring the power consumption of the device under test. Practical experiments showed that as few as a hundred traces were needed to clearly detect the watermark. We have also discussed why these watermarks are very robust against code transformations and other attacks. Furthermore, the side-channel software watermark can be extended to transmit a digital signature. Such a signature can be used in court to prove the legitimate ownership of the watermark. Hence, this watermark can not only detect software plagiarism, but can also be used to provide proof-of-ownership.

References

[1] ATMEL. 8-bit AVR instruction set. Revision 0856I-AVR-07/10.

[2] ATMEL. ATmega8 datasheet: 8-bit AVR with 8k bytes in-system programmable flash. Revision 2486Z-AVR-02/11.

[3] G. T. Becker, W. Burleson, and C. Paar. Side-channel watermarks for embedded software. In 9th IEEE NEWCAS Conference, June 2011.

[4] G. T. Becker, M. Kasper, A. Moradi, and C. Paar. Side-channel based watermarks for integrated circuits. In IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pages 30–35, June 2010.

[5] E. Brier, C. Clavier, and F. Olivier. Correlation power analysis with a leakage model. In Cryptographic Hardware and Embedded Systems (CHES), pages 16–29, 2004.

[6] Business Software Alliance. Eighth annual BSA global software piracy study. Washington, DC, May 2011.

[7] C. S. Collberg, A. Huntwork, E. Carter, G. Townsend, and M. Stepp. More on graph theoretic software watermarks: Implementation, analysis, and attacks. Inf. Softw. Technol., 51:56–67, January 2009.

[8] J.-S. Coron and I. Kizhvatov. Analysis and improvement of the random delay countermeasure of CHES 2009. In Cryptographic Hardware and Embedded Systems (CHES), pages 95–109, 2010.

[9] T. Eisenbarth, T. Kasper, A. Moradi, C. Paar, M. Salmasizadeh, and M. T. M. Shalmani. On the power of power analysis in the real world: A complete break of the KeeLoq code hopping scheme. In CRYPTO, pages 203–220, 2008.

[10] A. Gibbs and G. McIntyre. The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences. Eur J Biochem, 16(1):1–11, 1970.

[11] J. Hamilton and S. Danicic. An evaluation of static Java bytecode watermarking. In Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2010, pages 1–8, San Francisco, USA, Oct. 2010.

[12] V. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966.

[13] S. Mangard, E. Oswald, and T. Popp. Power Analysis Attacks - Revealing the Secrets of Smart Cards. Springer, 2007.

[14] A. Moradi, A. Barenghi, T. Kasper, and C. Paar. On the vulnerability of FPGA bitstream encryption against power analysis attacks: Extracting keys from Xilinx Virtex-II FPGAs. In ACM Conference on Computer and Communications Security, pages 111–124, 2011.

[15] A. Moradi, M. Kasper, and C. Paar. On the portability of side-channel attacks - an analysis of the Xilinx Virtex 4 and Virtex 5 bitstream encryption mechanism. IACR Cryptology ePrint Archive, 2011:391, 2011.

[16] R. Muijrers, J. G. J. van Woudenberg, and L. Batina. RAM: Rapid alignment method. In CARDIS, Lecture Notes in Computer Science. Springer, 2011.

[17] D. Oswald and C. Paar. Breaking Mifare DESFire MF3ICD40: Power analysis and templates in the real world. In Cryptographic Hardware and Embedded Systems (CHES), pages 207–222, 2011.

[18] D. Strobel and C. Paar. An efficient method for eliminating random delays in power traces of embedded software. In International Conference on Information Security and Cryptology (ICISC), 2011.

[19] J. Tarhio and E. Ukkonen. Approximate Boyer-Moore string matching. SIAM J. Comput., 22:243–260, April 1993.

[20] J. G. J. van Woudenberg, M. F. Witteman, and B. Bakker. Improving differential power analysis by elastic alignment. In CT-RSA, pages 104–119, 2011.

[21] D. Yankov, E. J. Keogh, S. Lonardi, and A. W.-C. Fu. Dot plots for time series analysis. In International Conference on Tools with Artificial Intelligence (ICTAI), pages 159–168. IEEE Computer Society, 2005.

[22] W. Zhu and C. Thomborson. Algorithms to watermark software through register allocation. In Digital Rights Management. Technologies, Issues, Challenges and Systems, volume 3919 of Lecture Notes in Computer Science, pages 180–191. Springer, 2006.

[23] W. Zhu, C. Thomborson, and F.-Y. Wang. A survey of software watermarking. In Intelligence and Security Informatics, volume 3495 of Lecture Notes in Computer Science, pages 454–458. Springer, 2005.

Georg T. Becker is a PhD student in Electrical and Computer Engineering at the University of Massachusetts, Amherst. His primary research interest is hardware security with a special focus on side-channel analysis, IP protection, and hardware Trojans. He received a BS degree in Applied Computer Science and an MS degree in IT-Security from the Ruhr-University of Bochum in 2007 and 2009, respectively.

Daehyun Strobel received a BS degree in Applied Computer Science in 2006 and an MS degree in IT-Security in 2009, both from the University of Bochum, Germany. He is currently working toward the PhD degree at the Chair for Embedded Security at the University of Bochum. His research interests include practical side-channel attacks on embedded systems and side-channel reverse-engineering of embedded software.

Christof Paar holds the Chair for Embedded Security at the University of Bochum, Germany, and is affiliated professor at the University of Massachusetts Amherst. He co-founded, with Cetin Koc, the CHES workshop series. His research interests include highly efficient software and hardware realizations of cryptography, physical security, penetration of real-world systems, and cryptanalytical hardware. He has over 150 peer-reviewed publications and is co-author of the textbook Understanding Cryptography. Christof is co-founder of escrypt - Embedded Security, a leading consultancy in applied security. He is a Fellow of the IEEE.

Wayne Burleson is Professor of Electrical and Computer Engineering at UMass Amherst, where he has been on the faculty since 1990. He has degrees from MIT and the University of Colorado as well as substantial industry experience in Silicon Valley. He currently directs and conducts research in Security Engineering. Although his primary focus is Hardware Security, building on 20 years in the Microelectronics field, he also studies higher level issues of System Security and Security Economics, with applications in Payment Systems, RFID, and Medical Devices. He teaches courses in Microelectronics, Embedded Systems, and Security Engineering. He is a Fellow of the IEEE for contributions in Integrated Circuit Design and Signal Processing.