Time-bounded Authentication of FPGAs


Mehrdad Majzoobi, Student Member, IEEE, and Farinaz Koushanfar, Member, IEEE

Abstract—This paper introduces a novel technique to authenticate and identify field programmable gate arrays (FPGAs). The technique uses the reconfigurability feature of FPGAs to perform self-characterization and extract the unique timing of the FPGA building blocks over the space of possible inputs. The characterization circuit is then exploited for constructing a physically unclonable function (PUF). The PUF can accept different forms of challenges including pulse width, digital binary, and placement challenges. The responses from the PUF are only verifiable by entities with access to the unique timing signature. However, the authentic device is the only entity that can respond within a given time constraint. The constraint is set by the gap between the speed of PUF evaluation on authentic hardware and simulation of its behavior. A suite of authentication protocols is introduced based on the time-bounded mechanism. We ensure that the responses are robust to fluctuations in operational conditions such as temperature and voltage variations by employing: (i) a linear calibration mechanism that adjusts the clock frequency by feedback from on-chip temperature and voltage sensor readings, and (ii) a differential PUF structure with real-valued responses that cancels out the common impact of variations on delays. Security against various attacks is discussed, and a proof-of-concept implementation of signature extraction and authentication is demonstrated on Xilinx Virtex 5 FPGAs.

Index Terms—field-programmable gate arrays; physically unclonable function; delay characterization; time-bounded authentication

I. INTRODUCTION

Many security mechanisms are based upon the concept of a secret. Classic cryptography protocols contain a secret key for reversing trapdoor functions. While such protocols are often secure against attacks at the algorithmic level, it is well known that digitally stored secret keys can be attacked and cloned. Furthermore, secret key storage has major limitations when applied to conventional FPGAs since the reconfigurable fabric technology cannot easily integrate non-volatile memory (NVM). Thus, the keys need to be stored in off-chip memory, which demands secure channels and additional protocols to communicate the keys to and from the off-chip components. On-chip key storage requires the overhead of a constant power source. Such reliance not only incurs additional overhead, but also increases the vulnerability to attacks.

Physical unclonable functions (PUFs) are efficient mechanisms for many security applications [1], [2]. PUFs exploit secret information inherently embedded in the unclonable physical variations of silicon devices to produce secure digital keys. Moreover, a PUF provides a unique chip-dependent mapping from a set of digital inputs (challenges) to digital outputs (responses) based on the unique properties of the underlying physical device.

F. Koushanfar and M. Majzoobi are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX, 77005 USA, e-mail: [email protected] and [email protected].

PUFs can be employed to provide security at multiple levels and to address a range of problems from securing processors [3], to software protection [4], IP protection [5], and IC authentication [6]. Even though a number of methods for realizing PUFs on FPGAs have been proposed [5], [7], the scope of existing FPGA PUFs is limited. Either they have a limited number of challenge-response pairs, or they are limited by the routing constraints on FPGAs, or they introduce noise and other vulnerabilities to the system.

In this paper, we introduce and implement a novel mechanism for FPGA PUFs. First, the delay of each configurable block is characterized for different combinations of inputs. A challenge to the PUF queries the delays of a subset of configurable blocks. Reproducing the responses requires knowledge of the extracted delays as well as the structure and placement of the PUF circuit. We introduce a dynamic authentication protocol, where the placement of the PUF is randomly changed during each round of authentication, forcing the adversary to constantly reverse-engineer the configuration bitstream to discover the PUF structure and placement. The protocol is based on the fact that it is infeasible to generate the challenge-response signatures through simulation within a time frame comparable to evaluation on the actual hardware.

As an example application, the FPGA can be used as a hardware security token, where the owner of the genuine FPGA can use it to authenticate him/herself. Before distribution of the tokens (FPGAs) to end users, the delay characteristics of the FPGA components are extracted and stored in a publicly known database. A PIN or serial number is associated with each FPGA and its corresponding database entry. Next, the token is handed out to the users. At the time of authentication, the user in possession of the genuine FPGA presents him/herself by the associated serial number. The verifier then sends a configuration bitstream along with a set of challenges to the device. The responses are expected within an allotted time frame. Since the verifier knows the configuration of the PUF, he/she can simulate its behavior with the help of the public delay characteristics and compare the simulated responses to those received from the user to perform authentication.

Our contributions are as follows:

• We introduce a new mechanism for efficient self-extraction of the unclonable and unique analog timing signature of the FPGA logic blocks. The self-extraction only requires ordinary desktop computers and commodity logic analyzers.

• We introduce a new PUF structure built upon the self-extracting mechanism. The introduced PUF can accept different inputs (challenge formats) including timing, digital binary, and placement challenges; it can generate both binary and real-valued outputs (responses).

• A new time-bounded authentication is developed which exploits the delay gap between the speed of evaluation of authentic hardware and simulation of the hardware behavior. A suite of new authentication protocols is built upon the time-bounded property.

• A linear calibration method is presented that adjusts the clocks to compensate for the impact of variations in operational conditions using feedback from on-chip temperature and voltage sensors.

• To cancel out the common impact of variation, a low-overhead differential PUF structure is presented.

• Attacks and countermeasures for the new PUF and the time-bounded authentication are discussed.

• Extensive evaluation and proof-of-concept implementation on Virtex 5 FPGAs demonstrate the applicability, consistency, and efficiency of the new methods.

The remainder of the paper is organized as follows. The next section summarizes the related literature. Section III presents the flow of the proposed methodology. In Section IV, we present the method for signature extraction for each possible challenge. Section V describes how the characterization circuit can be challenged as a PUF. Next, different ways to achieve robustness of PUF responses are presented in Section VI. Section VII introduces the time-bounded authentication protocol and its variants. Attacks and countermeasures are discussed in Section VIII. Experimental evaluations are demonstrated in Section IX. Finally, we conclude the paper in Section XI.

II. RELATED WORK

The idea of using complex unclonable features of a physical system as an underlying security mechanism was initially proposed by Pappu et al. [1]. The concept was demonstrated by studying the mesoscopic physics of coherent transport through a disordered medium. Another group of researchers observed that the manufacturing process variability in modern silicon technology can be utilized for building a PUF. They proposed the arbiter-based PUF architecture based on the variations in CMOS logic delays [2]. The arbiter-based PUF implementation on ASICs was demonstrated, and a number of attacks and countermeasures were discussed [2], [8], [9], [6], [10]. In particular, it was observed that the linear arbiter-based PUF is vulnerable to modeling attacks, and the use of nonlinear feed-forward arbiters and hashing was proposed to safeguard against this attack [2]. Moreover, error correcting codes were proposed in [11] to alleviate instability of PUF responses.

Further efforts were made to address the PUF vulnerability issues by adding input/output networks, adding nonlinearities to hinder machine learning, and enforcing an upper bound on the PUF evaluation time [7], [12], [13]. The work in [7] demonstrated that even though successful ASIC implementation of the arbiter PUF was shown, FPGA implementation of this PUF is troubled by the routing constraints enforced by the regularities of the underlying FPGA fabric. In particular, asymmetries in routing when implementing arbiter-based PUFs cause skews and bias in delays, which in turn introduce bias in responses. To eliminate bias in delays, the authors in [14] introduced precision look-up-table-based programmable delay lines to tune and balance the delays.

For implementing PUFs on FPGAs, ring oscillator (RO) PUFs were also proposed in [6]. The major drawback of RO PUFs is having only a quadratic number of challenges with respect to the number of ROs [12]. Furthermore, the ROs (while in use) consume significant dynamic power due to frequent transitions during oscillations. SRAM PUFs suffer from the same limitation in terms of the number of possible challenge combinations [12].

This paper is a major extension of the work presented in [15]. It presents the first practical method and proof-of-concept FPGA implementation of a PUF with an exponential number of possible challenges of different types, including placement challenges. The newly proposed PUF uses the unique cell-by-cell characteristics of the FPGA array. In order to provide resilience against variations in environmental conditions, a linear calibration method as well as a differential signature extraction system are presented. The authors in [13] exploited the time-bounded characteristics of a generic PUF-based public key protocol, but no practical implementation options were discussed.

Besides the ongoing research on PUFs, several other relevant works on delay characterization serve as the enabling thrust for realization of our novel PUF structures. To perform delay characterization, Wong et al. in [16] proposed a built-in self-test mechanism for fast chip-level delay characterization. The system utilizes the on-chip PLL and DCM modules for clock generation at discrete frequencies. The delay fingerprint can be used to detect any malicious modification to the original design due to insertion of hardware Trojan horses [17], [18].

In addition, the use of reconfigurability to enhance system security and IP protection has previously been a subject of research. The work in [19] proposes a secure reconfiguration controller (SeReCon) which provides secure runtime management of designs downloaded to the DPR FPGA system and protects the design IP. The work in [20] introduces methods for securing the integrity of the FPGA configuration, while [21] leverages the capabilities of reconfigurable hardware to provide efficient and flexible architectural support for security standards as well as defenses against hardware attacks. In this paper, we take advantage of the degree of freedom offered by reconfigurable platforms to carry out random placements across different configurations. Instantaneous reverse engineering of the configuration bitstream is a highly complex and challenging task because of the proprietary encoding of the configuration bitstream. The paper uses this observation to establish a time bound on the authentication.

III. FLOW

The flow of the proposed methodology is presented in Figure 1. First, a one-time process is conducted to extract the delay parameters and characterize the timing behavior of the FPGA components. Next, the extracted chip-dependent, unique, and physically unclonable characteristics are registered in a publicly accessible database. Each time authentication is needed, the prover in possession of the genuine hardware responds to the challenges posed by the verifier. The verifier is able to predict and simulate the responses to the challenges based on the pre-stored timing characteristics in the database. Response simulation/emulation by the verifier would be much slower than response generation by the prover with the genuine hardware. Authentication is performed by comparing the simulated responses to the actual responses within the allotted time frame.

Fig. 1. The flow of the proposed method.

IV. DELAY SIGNATURE EXTRACTION

To measure the delays of components inside the FPGA, we exploit the device reconfigurability to implement a delay signature extraction circuit. A high-level view of the delay extraction circuitry is shown in Figure 2. The target circuit/path delay to be extracted is called the Circuit Under Test (CUT). Three flip-flops (FFs) are used in this delay extraction circuit: launch FF, sample FF, and capture FF. The clock signal is routed to all three FFs as shown in the figure. Assume for now that the binary challenge input to the CUT is held constant and thus the CUT delay is fixed.

Assuming the FFs in Figure 2 are initialized to zero, a low-to-high signal is sent through the CUT by the launch FF at the rising edge of the clock. The output is sampled T seconds later on the falling edge of the clock (T is half the clock period). Notice that the sampling register is clocked at the falling edge of the clock. If the signal arrives at the sample FF before sampling takes place, the correct signal value will be sampled; otherwise, the sampled value will be different and will generate an error. The actual signal value and the sampled value are compared by XOR logic and the result is held for one clock cycle by the capture FF.

A more careful timing analysis of the circuit reveals the relationship between the delay of the CUT (t_CUT), the clock pulse width (T), the clock-to-Q delay at the launch FF (t_clk2Q), and the clock skew between the launch and sample FFs (t_skew). The setup and hold times of the sampling register and of the capture register are denoted by t_setS, t_holdS, t_setC, and t_holdC, respectively. The propagation delay of the XOR gate is denoted by t_XOR. The time it takes for the signal to propagate through the CUT and reach the sample flip-flop from the moment the launch flip-flop is clocked is represented by t_P. Based on the circuit in Figure 2, t_P = t_CUT + t_clk2Q − t_skew.

Fig. 2. The timing signature extraction circuit.

As T approaches t_P, the sample flip-flop enters a metastable operation (because of the setup and hold time violations) and its output becomes nondeterministic. The probability that the metastable state resolves to a 0 or 1 is a function of how close T is to t_P. For instance, if T and t_CUT are equal, the signal and the clock simultaneously arrive at the sample flip-flop and the metastable state resolves to a 1 with a probability of 0.5. If there are no timing errors in the circuit, the following relationship must hold:

t_holdC < t_P < T − t_setS    (1)

The errors start to appear if t_P enters the following interval:

T − t_setS < t_P < T + t_holdS    (2)

The rate (probability) of observing timing errors increases as t_P gets closer to the upper limit of Inequality 2. If the following condition holds, then a timing error happens every clock cycle:

T + t_holdS < t_P < 2T − (t_setC + t_XOR)    (3)

Observability of timing errors follows a periodic behavior. In other words, if t_P goes beyond 2T − (t_setC + t_XOR) in Inequality 3, the rate of timing errors begins to decrease again. This time the decrease in the error rate is not due to proper operation but because the timing errors cannot be observed and captured by the capture FF.

Inequality 4 corresponds to the transition from the case where a timing error happens every clock cycle (Inequality 3) to the case where no errors can be detected (Inequality 5).

2T − (t_setC + t_XOR) < t_P < 2T + (t_holdC − t_XOR)    (4)

2T + (t_holdC − t_XOR) < t_P < 3T − t_setS    (5)

Timing errors no longer stay undetected if t_P is greater than 3T − t_setS. Timing errors begin to appear and can be captured if t_P falls into the following interval:

3T − t_setS < t_P < 3T + t_holdS    (6)

If the following condition holds, then a timing error gets detected every clock cycle:

3T + t_holdS < t_P < 4T − (t_setC + t_XOR)    (7)

This periodic behavior continues in the same way for integer multiples of T; however, it is upper bounded by the maximum clock frequency of the FPGA device. In general, if T is much larger than the XOR and flip-flop delays, the intervals can be simplified to n×T < t_P < (n+1)×T, and timing errors can only be detected for odd values of n.
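As a concrete illustration of this periodic pattern, the following sketch (ours, not part of the paper's implementation) classifies the expected circuit behavior for a given half-period T and propagation delay t_P according to Inequalities 1-7; all setup, hold, and XOR delay values are made-up placeholders in nanoseconds.

def error_behavior(t_P, T, t_setS=0.05, t_holdS=0.05,
                   t_setC=0.05, t_holdC=0.05, t_XOR=0.1):
    """Return 'no error', 'metastable', or 'error every cycle' per Ineq. 1-7."""
    # Find n such that n*T < t_P <= (n+1)*T; the pattern repeats with period 2T.
    n = 0
    while (n + 1) * T < t_P and n < 1000:
        n += 1
    if n % 2 == 0:
        # Even n: error-free plateaus, e.g. Inequalities 1 and 5.
        lower = n * T + (t_holdC - t_XOR if n else t_holdC)
        upper = (n + 1) * T - t_setS
        return "no error" if lower < t_P < upper else "metastable"
    # Odd n: errors captured every cycle, e.g. Inequalities 3 and 7.
    lower = n * T + t_holdS
    upper = (n + 1) * T - (t_setC + t_XOR)
    return "error every cycle" if lower < t_P < upper else "metastable"

# Example: with T = 2 ns and t_P = 2.5 ns, Inequality 3 holds.
print(error_behavior(t_P=2.5, T=2.0))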

Notice that in the circuit in Figure 2, high-to-low and low-to-high transitions travel through the CUT every other clock cycle. The propagation delays of these two transitions differ in practice. Suppose that the low-to-high transition propagation delay (t_P^{l→h}) is smaller than the high-to-low transition propagation delay (t_P^{h→l}). Then, for low-to-high transitions, t_P^{l→h} satisfies Inequality 1, and for high-to-low transitions, t_P^{h→l} satisfies Inequality 3. Timing errors in this case happen only for high-to-low transitions, and as a result a timing error can only be observed 50% of the time. Thus, the final measurement represents the superposition of both effects.

The top plot in Figure 3 shows the observed/measured probability of timing error as a function of the clock pulse width (T). The rightmost region (R1) corresponds to the error-free region of operation expressed by Inequality 1. Note that the difference between t_P^{h→l} and t_P^{l→h} causes the plateau at R3. The gray regions marked by R2 and R4 correspond to the condition expressed by Inequality 2. Region R5 can be explained by Inequality 3. The metastable regions R6 and R8 relate to Inequality 4. Inequality 5 corresponds to the error-free region R9. Similar to R3, regions R7 and R11 are due to the difference between high-to-low and low-to-high transition delays. The metastable regions R10 and R12 relate to Inequality 6, and lastly region R13 corresponds to Inequality 7.

Fig. 3. The probability of observing timing failure as a function of clock pulse width, T (shown separately for low-to-high transitions, high-to-low transitions, and in total).

Notice that, similar to t_P, all of the delays defined above for the XOR, flip-flops, and clock skew have two distinct values for high-to-low (rising edge) and low-to-high (falling edge) transitions. Nevertheless, all of the inequalities defined in this section hold true for both cases.

We refer to the characterization circuit that includes the CUT as a characterization cell or simply a cell. Each cell in our implementation is contained in one configurable logic block (CLB). The circuit under test consists of four cascaded look-up tables (LUTs), each implementing a variable-delay inverter. We explain in Section V how the delay of the inverters can be changed.

A. Signature extraction system

In this subsection, we describe the system that efficiently extracts the probability of observing timing failure as a function of clock pulse width for a group of components on the FPGA.

The circuit shown in Figure 2 only produces a single-bit flag of whether errors happen or not. We require a mechanism to measure the rate or probability at which errors appear at the output of the circuit in Figure 2 in order to extract the smooth transitions depicted in Figure 3.

To measure the probability of observing an error at a given clock frequency, an error histogram accumulator is implemented using two counters. The first counter is the error counter, whose value increments by one every time an error takes place. The second counter counts the clock cycles and resets (clears) the error counter every 2^N clock cycles, where N is the size of the binary counters. The value of the error counter is stored in memory exactly one clock cycle before it is cleared. Now, the stored number of errors normalized to the number of cycles per window (2^N) yields the error probability value.

Fig. 4. The architecture for chip-level delay extraction of logic components.
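A minimal software model of the error-histogram accumulator described above (our illustration; the actual design is implemented in hardware) is sketched below. The random error source stands in for the capture-FF output.

import random

N = 8                     # counter width, as in our implementation
WINDOW = 2 ** N           # the error counter is cleared every 2^N clock cycles

def accumulate_errors(error_flags):
    """error_flags: iterable of 0/1 capture-FF outputs, one per clock cycle.
    Returns one stored error count per window of 2^N cycles."""
    stored, err_cnt = [], 0
    for cycle, flag in enumerate(error_flags, start=1):
        err_cnt += flag                      # error counter
        if cycle % WINDOW == WINDOW - 1:     # one cycle before the clear
            stored.append(err_cnt)           # write to memory
        if cycle % WINDOW == 0:              # periodic clear
            err_cnt = 0
    return stored

# Example: a cell failing on roughly 30% of the cycles at this pulse width.
flags = [1 if random.random() < 0.3 else 0 for _ in range(4 * WINDOW)]
print([c / WINDOW for c in accumulate_errors(flags)])   # estimated probabilities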

The clock frequency to the system is swept linearly and continuously in T_sweep seconds from f_i = 1/(2T_i) to f_t = 1/(2T_t), where T_t < t_P < T_i. A separate counter counts the number of clock pulses in each frequency sweep. This counter acts as an accurate timer that bookmarks the frequency at which timing errors happen. The value of this counter is retrieved every time the error counter content is written into memory. This action happens every 2^N clock cycles. For further details on clock synthesis, see [22].

The system shown in Figure 4 is used for extracting the delays of an array of CUTs on the FPGA. Each square in the array represents the characterization circuit (or cell) shown in Figure 2. Any logic configuration can be utilized within the CUT in the characterization circuit. In particular, the logic inside the CUT can be made a function of binary challenges, such that its delay varies with the given inputs. The system in Figure 4 characterizes each cell by sweeping the clock frequency once. Then, it increments the cell address and moves to the next cell. The cells are characterized serially. The row and column decoders activate the given cell while the rest of the cells are deactivated. Therefore, the outputs of the deactivated cells remain zero and the output of the OR function solely reflects the timing errors captured in the activated cell.


Each time the data is written to the memory, three values are stored: the cell address, the accumulated error value, and the clock pulse number at which the error has occurred. The clock counter is then reset for each new sweep. The whole operation iterates over different binary challenges to the cells. Note that the scanning can also be performed in parallel to reduce the characterization time [22].

B. Characterization accuracy

The timing resolution, i.e., the accuracy of the measured delays, is a function of the following factors: (i) the clock jitter and noise, (ii) the number of frequency sample points, and (iii) the number of pulse samples at each frequency. Recall that the output of the characterization circuit is a binary zero/one value. By resending multiple clock pulses of the same width to the circuit and summing up the number of ones at the output, a real-valued output can be obtained. The obtained value represents the rate (or the probability, when normalized) at which the timing errors happen for the input clock pulse width. Equivalently, it represents a sample point on the curve shown in Figure 3. The more we repeat the input clock pulse, the higher the sample resolution/accuracy that can be achieved along the Y axis. Now suppose that the clock pulse of width T is sent to the PUF M times. Due to clock jitter and phase noise, the characterization circuit receives a clock pulse of width T_eff = T + T_j, where T_j is additive jitter noise. Let us assume T_j is a random variable with zero mean and a symmetric distribution. Since the output probability is a smooth and continuous function of T_eff, estimating the probability by averaging will be an asymptotically unbiased estimator as M → ∞. Finally, the minimum measurable delay is a function of the maximum speed at which the FFs can be driven (maximum clock frequency). When performing a linear frequency sweep, a longer sweep increases (ii) and (iii) and thus the accuracy of the characterization. A complete discussion of characterization time and accuracy for this method is presented in [22].
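The following numerical sketch (ours, with a made-up logistic-shaped probability model standing in for the real error curve) illustrates the averaging argument: repeating the same pulse width M times under zero-mean jitter and averaging the binary outcomes yields an estimate that stabilizes as M grows.

import math
import random

def p_error(T_eff, t_P=2.0, scale=0.02):
    # Stand-in for the smooth error probability as a function of effective width.
    return 1.0 / (1.0 + math.exp((T_eff - t_P) / scale))

def estimate(T, M, jitter_sigma=0.01):
    hits = 0
    for _ in range(M):
        T_eff = T + random.gauss(0.0, jitter_sigma)   # T_eff = T + T_j
        hits += random.random() < p_error(T_eff)
    return hits / M

for M in (10, 100, 10000):
    print(M, round(estimate(T=2.0, M=M), 3))          # approaches p_error(2.0) = 0.5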

C. Parameter extraction

So far, we have described the system that measures the probability of observing timing errors for different clock pulse widths. The error probability can be represented compactly by a set of few parameters. These parameters are directly related to the circuit component delays and the flip-flop setup and hold times. It can be shown that the probability of timing error can be expressed as the sum of shifted Gaussian CDFs [7]. The Gaussian nature of the error probabilities can be explained by the central limit theorem. Equation 8 shows the parameterized error probability function.

f_{D,Σ}(t) = 1 + 0.5 ∑_{i=1}^{|Σ|−1} (−1)^{⌈i/2⌉} [ Q( (t − d_i) / σ_i ) ]    (8)

where Q(x) = (1/√(2π)) ∫_x^∞ exp(−u²/2) du and d_{i+1} > d_i. To estimate the timing parameters, f is fit to the set of measured data points (t_i, e_i), where e_i is the error value recorded when the pulse width equals t_i.
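A possible offline fitting procedure (our sketch, not the paper's code) is shown below: the model of Equation 8 is implemented with the standard Q-function and fit to measured (t_i, e_i) samples with a nonlinear least-squares routine. The number of terms, the synthetic data, and all initial guesses are illustrative assumptions.

import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erfc

def Q(x):
    """Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / np.sqrt(2.0))

def error_prob(t, *params):
    """Equation 8 with params = (d_1, sigma_1, d_2, sigma_2, ...), d_1 < d_2 < ..."""
    d = np.asarray(params[0::2])
    s = np.asarray(params[1::2])
    signs = np.array([(-1.0) ** np.ceil((i + 1) / 2.0) for i in range(len(d))])
    terms = signs[:, None] * Q((t[None, :] - d[:, None]) / s[:, None])
    return 1.0 + 0.5 * terms.sum(axis=0)

# Synthetic (t_i, e_i) samples generated from an assumed ground truth, for demo only.
true_params = [2.0, 0.05, 2.2, 0.05, 4.1, 0.05, 4.3, 0.05, 6.2, 0.05, 6.4, 0.05]
t_i = np.linspace(1.0, 8.0, 400)
e_i = error_prob(t_i, *true_params) + np.random.normal(0.0, 0.01, t_i.size)

p0 = [2.1, 0.1, 2.3, 0.1, 4.0, 0.1, 4.4, 0.1, 6.1, 0.1, 6.5, 0.1]  # initial guesses
d_sigma_hat, _ = curve_fit(error_prob, t_i, e_i, p0=p0)
print(np.round(d_sigma_hat, 3))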

Fig. 5. Two random placements of PUF cells on the FPGA.

V. TIMING PUF

To enable authentication, a mechanism for applying challenge inputs to the device and observing the evoked responses is required. In this section, we present a PUF circuit based on the delay characterization circuit shown in Figure 2. The response is a function of the clock pulse width T, the delay of the circuit under test, t_CUT, and the flip-flop characteristics, σ_i. In the following, we discuss three different ways to challenge the PUF.

A. Pulse challenge

One way to challenge the PUF is to change the clock pulse width. The clock pulse width can be considered as an analog input challenge to the circuit in Figure 2. The response to a given clock pulse of width T is either 0 or 1, with the probability given by Equation 8 or the plot in Figure 3.

However, the use of the clock pulse width as a challenge has a number of implications. First, the response from the PUF will be predictable if T is either too high or too low compared to the nominal circuit-under-test delay t_CUT. Predictability of responses makes it easy for the attacker to impersonate the PUF without knowledge of the exact value of t_CUT. As another example, suppose that the responses to multiple clock pulses of the same width, T_1, are equal to '0'; then, the attacker can deduce with high confidence that T_1 is in either region R1 or R9 in Figure 3. If the nominal boundaries of these regions (R1,...,R13) are known, the attacker can determine which region T_1 belongs to by just comparing it to the boundaries T_Ri < T_1 < T_R(i+1). Knowing the correct region, it becomes much easier to predict the response to the given pulse width, especially for the odd regions R1, R3, ..., R13.

Within the thirteen regions shown in Figure 3, the six regions that include transitions produce the least predictable responses. Setting the challenge clock pulse width to the statistical median of the center points of the transitions in Figure 3 would maximize the entropy of the PUF output responses. In other words, there are only six independent pulse widths that can be used for challenges, and the results for other pulse widths are highly predictable. As can be seen, the space of possible independent challenges for this type of challenge is relatively small.

Another limitation of pulse challenges is that, depending on the available clocking resources, generating many clock pulses with specific widths can be costly. Under such limitations, the verifier may prefer to stick to a fixed pulse width. In the next sections, we look into other alternatives to challenge the PUF.


Fig. 6. The internal structure of LUTs. The signal propagation path inside the LUTs changes as the inputs change.

B. Binary challenge

An alternative method to challenge the PUF is to change t_CUT while the clock pulse width is fixed. So far, we assumed that the delay of the CUT is not changing. To change t_CUT, one must devise an input vector to the circuit under test that changes its effective input/output delay by altering the signal propagation path inside the CUT. In other words, the binary input challenge vector alters the CUT delay by changing its internal signal propagation path length, hence affecting the response.

In this work, we introduce a low-overhead method to alter the CUT delay by tweaking the LUT internal signal propagation path. We implement the CUT with a set of LUTs, each implementing an inverter function. Figure 6 shows the internal circuit structure of an example 3-input LUT. In general, a Q-input LUT consists of 2^Q − 1 two-input MUXs which allow selection of the 2^Q values stored in SRAM cells. The SRAM cell values are configured to implement a pre-specified functionality.

In this example, the SRAM cell values are configured to implement an inverter. The LUT output is only a function of A1, i.e., O = f(A1), disregarding the values on A2 and A3. However, changing the inputs A2 and A3 can alter the delay of the inverter due to the modifications in the signal propagation paths inside the LUT. For instance, two internal propagation paths, for the values A2A3 = 00 and A2A3 = 11, are highlighted in Figure 6. As can be seen, the path length for the latter case is longer than for the former, yielding a larger effective delay. The LUTs in Xilinx Virtex 5 FPGAs have 6 inputs. Five inputs of the LUT can be used to control and alter the inverter delay, resulting in 2^5 = 32 distinct delays for each LUT. Finally, note that the delays for each binary input must be measured prior to authentication. The response of the PUF is then predicted by the verifier based on the configured delay and the input clock pulse width.
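As a simplified illustration of how a verifier could use the pre-measured delays (our sketch, with a synthetic delay table and a purely deterministic threshold in place of the metastable transition region), each 5-bit binary challenge selects one of the 32 characterized CUT delays, and the expected response to a pulse of width T follows from comparing the two:

import random

# Pre-characterized delays (ns) for the 32 binary challenges of one cell;
# these values are synthesized here purely for illustration.
delay_table = {c: 2.0 + 0.01 * bin(c).count("1") + random.uniform(-0.005, 0.005)
               for c in range(32)}

def predict_response(challenge, T):
    """Expected response bit: 1 if the configured CUT delay exceeds the pulse
    width (a timing error is expected), 0 otherwise. Metastability ignored."""
    return 1 if delay_table[challenge] > T else 0

print(predict_response(challenge=0b10110, T=2.02))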

C. Placement challenge

Another important type of challenge, which can be implemented solely on reconfigurable platforms, is the placement challenge. This type of challenge is enabled by the degree of freedom in placing the PUF cells on the FPGA in each configuration. During characterization, a complete database of all CUT delays across the FPGA is gathered. At the time of authentication, only a subset of these possible locations within the FPGA array are selected to implement and hold the PUF cells. The placement challenge is equivalent to choosing and querying a subset of PUF cells, where the selection input is embedded in the configuration bitstream.

Figure 5 shows two random placements of 20 PUF cells across the FPGA array. Each black square in the figure contains a PUF cell which receives a pulse and binary challenge. The high degree of freedom in placement of PUF cells across the FPGA results in a huge challenge/response space. In our implementation, each PUF cell can be fit into a CLB on the FPGA. With N CLBs on the FPGA, there will be C(N, k) different ways to place k PUF cells on the FPGA. The smallest Xilinx Virtex 5 FPGA (LX30) has 2400 CLBs, which enables C(2400, 512) possibilities for placing 512 PUF cells on the FPGA.
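A quick back-of-the-envelope check of the size of this placement-challenge space (our arithmetic, using the numbers quoted above) expresses C(2400, 512) in bits:

from math import comb, log2

n_clb, n_cells = 2400, 512
placements = comb(n_clb, n_cells)
print(f"C({n_clb}, {n_cells}) corresponds to about {log2(placements):.0f} bits")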

VI. RESPONSE ROBUSTNESS

Although PUF responses are functions of chip-dependent process variations and input challenges, they can also be affected by variations in operational conditions such as temperature and supply voltage. In this section, we discuss two techniques to provide calibration and compensation to make responses resilient against variations in operational conditions.

The first method takes advantage of on-chip sensors to perform linear calibration of the input clock pulse width challenge, while the second method uses a differential structure to cancel out the fluctuations in operational conditions and extract signatures that are less sensitive to variations in operational conditions. We will discuss the advantages and disadvantages of each method. The existing body of research typically addresses this issue mainly through the use of error correction techniques [11] and fuzzy extractors [23]. The error correction techniques used for this purpose rely on a syndrome, which is a public piece of information sent to the PUF system along with the challenge. The response from the PUF and the syndrome are input to the ECC to produce the correct output response. The methods discussed in this section help reduce the amount of errors in responses, and they can be used along with many other error correction techniques.

A. Linear Calibration

The extracted delay signatures at the characterization phase are subject to changes due to aging of silicon devices, variations in the operating temperature, and the supply voltage of the FPGA. Such variations can undermine the reliability of the authentication process. The proposed method performs calibration on the clock pulse width according to the current operating conditions. Fortunately, many modern FPGAs are equipped with built-in temperature and core voltage sensors. Before authentication begins, the prover is required to send to the verifier the readings from the temperature and core voltage sensors. The prover then, based on the current operating conditions, adjusts and calibrates the clock frequency. The presented calibration method linearly adjusts the pulse width using Equations 9 and 10.

T_calib = α_tmp × (tmp_cur − tmp_ref) + T_ref    (9)

T_calib = α_vdd × (vdd_cur − vdd_ref) + T_ref    (10)

tmp_ref and vdd_ref are the reference temperature and FPGA core voltage measured during the characterization phase. tmp_cur and vdd_cur represent the current operating conditions. The responses from the PUF to the clock pulse width T_calib are then treated as if T_ref were sent to the PUF at the reference operating condition. The calibration coefficients α_tmp and α_vdd are device specific. These coefficients can be determined by testing and characterizing each single FPGA at different temperatures and supply voltages. For example, if d_i^tmp1 and d_i^tmp2 are the i-th extracted delay parameter under operating temperatures tmp1 and tmp2, then

α_tmp,i = (d_i^tmp1 − d_i^tmp2) / (tmp1 − tmp2),   α_vdd,i = (d_i^vdd1 − d_i^vdd2) / (vdd1 − vdd2)    (11)

Note that for each delay parameter on each chip, two calibration coefficients can be defined (one for the temperature effect and one for the voltage supply effect) and the clock pulse width can be calibrated accordingly. Ideally, with the help of a more sophisticated prediction model (potentially a nonlinear model) trained on a larger number of temperature and voltage supply points (instead of two points as in Equation 11), highly accurate calibration can be performed on the clock frequency. In reality, due to limitations on test time and resources, it is impractical to perform such tests for each FPGA device. Instead, calibration coefficients can be derived from a group of sample devices and a universal coefficient can be defined for all devices by averaging the coefficients. In Section IX, we demonstrate the reliability of authentication for universal calibration coefficients. Note that in Equations 9 and 10, we assume that only one type of operational condition variation is happening at a time and both temperature and voltage supply do not fluctuate simultaneously. However, if we consider these effects independently, we can superimpose the effects by applying Equation 9 to the output of Equation 10. A more general approach would be to consider a 2D nonlinear transformation given by:

T_calib = f(vdd_cur, tmp_cur, T_ref)    (12)

The main disadvantage of calibration methods is the time and effort required to characterize the delays at various operational conditions. Hence, the more effort is spent on building and training the regression model, the more accurate the calibration and the higher the robustness of the responses that can be achieved.
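A minimal sketch of the linear calibration of Equations 9-11 is given below (our illustration; the coefficient and sensor values are made-up placeholders, not measured data):

def temp_coefficient(d_tmp1, d_tmp2, tmp1, tmp2):
    """Equation 11: per-parameter temperature coefficient from two test points."""
    return (d_tmp1 - d_tmp2) / (tmp1 - tmp2)

def calibrate_pulse_width(T_ref, tmp_cur, tmp_ref, alpha_tmp):
    """Equation 9: temperature-calibrated clock pulse width."""
    return alpha_tmp * (tmp_cur - tmp_ref) + T_ref

# Example: a delay parameter measured at two temperatures gives the coefficient,
# which then maps a 45 C sensor reading back to the 25 C reference condition.
alpha = temp_coefficient(d_tmp1=2.08, d_tmp2=2.00, tmp1=45.0, tmp2=25.0)   # ns per C
T_calib = calibrate_pulse_width(T_ref=2.00, tmp_cur=45.0, tmp_ref=25.0, alpha_tmp=alpha)
print(f"alpha_tmp = {alpha:.4f} ns/C, T_calib = {T_calib:.3f} ns")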

B. Differential Structure

In this section, we introduce a differential PUF structure that compensates for the common-mode variation induced by the impact of fluctuations in operational conditions on the delays. The goal of the method is to extract a signature that is invariant to fluctuations in operational conditions.

The PUF introduced previously receives a clock pulse and a binary challenge to produce a binary response. Here, instead of looking at the output responses from a single PUF cell, we consider the difference of the responses from two adjacent PUF cells. More specifically, the outputs of the capture flip-flops from the two cells drive an XOR logic. Assuming i_1 and i_2 are the inputs and O is the output of the XOR logic, then the probability of the output being equal to '1', ρ_O, as a function of the probabilities of the inputs being equal to '1', ρ_1 and ρ_2, can be written as:

ρ_O = ρ_1 + ρ_2 − 2 × ρ_1 × ρ_2    (13)

Fig. 7. The differential signature extraction system.

ρ_1 and ρ_2 are functions of the clock pulse width (T) and the binary challenge, as explained in Section IV. The resulting output probability is shown in Figure 8 (see the red dashed line) for two sample PUF cells under (a) normal operating condition and (b) low operating temperature of −10°C. As can be seen, since both PUF cells' delay parameters are shifted together under the same operational conditions, the resulting XOR output probability retains its shape, with only a scalar shift along the x axis. To extract robust signatures, one needs to look into shift-invariant features that are less sensitive to environmental variables. Features such as the high/low region widths of the resulting XOR probability plot, or the total area under the XOR output probability plot, can be used for this purpose. In this work, we use the area under the XOR output probability curve. The area is shaded in Figure 8 for the two operating conditions. The area under the curve can be calculated by integrating the probability curve from the lowest to the highest clock pulse width. We use the Riemann sum method to approximate the total area underneath the XOR probability curve in hardware. The result of the integration is a resilient real-valued signature extracted from the PUF cell pairs.

Fig. 8. The timing error probability for two sample PUF cells and the resulting XOR output probability under (a) normal operating condition and (b) low operating temperature of −10°C.

In order to find a quick approximation to this integral in hardware, we sweep the input clock frequency linearly from f_l = 1/(2T_u) to f_u = 1/(2T_l), where T_l ≪ D_min and T_u ≫ D_max, and D_min and D_max represent the lowest and highest bounds on the delay parameters under all operational conditions. In other words, the sweep window must always completely contain all parts of the curve. The output of the XOR is connected to a counter as shown in Figure 7. The aggregate counter value after a complete sweep is a function of the area under the curve. Please note that this value is not exactly equal to the area under the curve and is only proportional to the integral. Also, a longer sweep time results in a larger number of clock pulses and thus a more accurate approximation of the signature. This is analogous to using a larger number of narrower subintervals when approximating the area under a curve with the Riemann sum to achieve a smaller approximation error.
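The following software model (ours, with a simplified single-transition error curve standing in for the full periodic behavior) illustrates the differential signature: the XOR probability of Equation 13 is integrated over the swept pulse widths with a Riemann sum, and a common shift of both cells' delays leaves the resulting area essentially unchanged.

from math import erf, sqrt

import numpy as np

def cell_error_prob(T, d, sigma=0.05):
    """Simplified single-cell error probability: one smooth 1-to-0 transition at d."""
    return 0.5 * (1.0 - erf((T - d) / (sigma * sqrt(2.0))))

def differential_signature(d1, d2, T_low=1.0, T_high=4.0, steps=2000):
    """Riemann-sum approximation of the area under the XOR output probability."""
    Ts = np.linspace(T_low, T_high, steps)
    dT = Ts[1] - Ts[0]
    p1 = np.array([cell_error_prob(T, d1) for T in Ts])
    p2 = np.array([cell_error_prob(T, d2) for T in Ts])
    p_xor = p1 + p2 - 2.0 * p1 * p2           # Equation 13
    return float(np.sum(p_xor) * dT)

# A uniform shift of both delays (e.g. a temperature change) barely changes the area.
print(differential_signature(2.00, 2.30))
print(differential_signature(2.10, 2.40))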

Although the generated responses are less sensitive to variations in operational conditions, it should be noted that the responses are a function of the difference in the timing characteristics of the two PUF cells. The area under the curve loses a lot of information about the shape of the curve, and some information on each individual probability curve is also lost through the difference operation. Therefore, the responses have a lower entropy compared to the linear calibration method. To obtain the same amount of information, more PUF cell pairs must be challenged and scanned. Another limitation of this structure is the length of the input challenge. To estimate the area under the curve with high accuracy, the whole interval from the lowest to the highest frequency must be swept in fine steps, and thus it requires more clock pulses compared to the other method. Using few clock pulses leads to a larger area estimation error, a lower probability of detection, and a higher probability of false alarm. Finally, the pairing of the PUF cells introduces another degree of freedom to the system, where a set of challenges can specify the pairing of the PUF cells.

VII. AUTHENTICATION PROTOCOL

In this section, we show how the extracted cell characteristics from Section IV can be utilized for FPGA authentication. The following terminology is used in the remainder of the paper. The verifier (V) authenticates the prover (P) who owns the genuine FPGA device. The verifier authenticates the device by verifying the unique timing properties of the device.

A. Classic Authentication

The registration and authentication processes for the classic authentication case are demonstrated in the diagrams in Figure 9(a) and (b) (disregard the darker boxes for now). The minimum required assumptions for this case are: (i) the verifier is not constrained in power, (ii) it is physically impossible to clone the FPGA, and (iii) the characteristics of the FPGA owned by the prover are a secret known only to the prover and verifier.

As shown in Figure 9(a), during the registration phase, the verifier extracts and securely stores the cell delay parameters by performing characterization as explained in Section IV. By knowing the FPGA-specific features in addition to the structure and placement of the configured PUF, the verifier is able to predict the responses to any challenges to the PUF. After registration, the FPGA along with the pertinent PUF configuration bitstream is passed to the end user.

At authentication time, the end user (prover) is queried by the verifier to make sure she is the true owner of the FPGA. Classic authentication is shown in Figure 9(b). To authenticate the ownership, the verifier utilizes a random seed and generates a set of pseudorandom challenge vectors for querying the prover. The prover responds to the challenges she receives from the verifier by applying them to the configured FPGA hardware. The verifier then compares the received responses from the prover with the predicted ones, and authenticates the chip if the responses are similar.

To ensure robustness against errors in measuring the delays and against changes in operational conditions, the registration entity may also compute the error correction information for the responses to the given challenges. To prevent information leakage via the error correction bits, secure sketch techniques can be used. A secure sketch produces public information about its input that does not reveal the input, and still permits exact recovery of the input given another value that is close to it [24].

The device is authenticated if the response after error correction maps to the verifier-computed hash of the responses. Otherwise, the authentication will fail. Alternatively, the verifier can allow for some level of errors in the collected responses and remove the error correction and hashing from the protocol. However, accepting some errors in the responses makes the verifier more susceptible to emulation/impersonation attacks [2], [25].

Fig. 9. (a) FPGA registration. (b) Classic authentication flow. (c) Time-bound authentication flow.


B. Time-bounded Authentication Using Reconfigurability

After the FPGA registration, the verifier is able to compute and predict the responses to any set of challenges by knowing (i) the cell-level features of the pertinent FPGA, (ii) the circuit structure, and (iii) the placement of the PUF circuit. The information on the PUF circuit structure and placement is embedded in the configuration bitstream. In the classic authentication method, the bitstream is never changed. A dishonest prover, off-line and given enough time and resources, can (i) extract the cell-level delays of the FPGA, and (ii) reverse engineer the bitstream to discover the PUF structure and its placement on the FPGA. During the authentication, the dishonest prover can then compute the responses to the given challenges online by simulating the behavior of the PUF on the fly and producing responses that pass the authentication.

A stronger set of security protocols can be built upon the fact that the prover is the only entity who can compute the correct response to a random challenge within a specific time bound, since he has access to the actual hardware. In this protocol, prior to the beginning of the authentication session, the FPGA is blank. The verifier then sends a bitstream to the device in which a random subset of LUTs is configured for authentication. After the device is configured, the verifier starts querying the FPGA with random challenges. The verifier accepts the responses that are returned only if ∆t ≤ ∆t_max, where ∆t is the time elapsed on the prover device to compute the responses after receiving the configuration bitstream, and ∆t_max is the estimated upper bound on the time needed by the authentic FPGA prover device to compute the responses, which is composed of the device configuration, response generation, error correction, and hashing times, all performed in hardware.

The verifier will authenticate the device only if the time the device takes to generate the response is less than ∆t_max. We denote the minimum emulation time by t_emu_min, where t_emu_min >> ∆t_max. The time-bounded authentication protocol can be added to the authentication flow, as demonstrated in Figure 9(c). Compared to the classic authentication flow, a time-bound check is added after the hash function. While performing the above authentication, we emphasize the assumption that the time gap between hardware response generation and simulation (or emulation) of the responses must be larger than the variation in the channel latency. The time-bound assumption would be enough for providing the authentication proof [7], [13], [26].

1) Estimating the time bound: Now let us look at ∆t, the time elapsed on the prover device to compute the responses. Before proceeding, note that the characterization is a one-time offline operation which happens prior to the authentication phase, and its time complexity does not affect the time bound discussed here. ∆t is the sum of the time required to configure the FPGA, T_conf, and the time spent on evaluating the PUF, T_eval, i.e., ∆t = T_conf + T_eval. During the PUF evaluation, N_p clock pulses at N_f distinct frequencies are sent to N_cell PUF cells in serial, with an average pulse width of T_avg; therefore the average evaluation time is T_eval = N_p × N_cell × N_f × T_avg. For instance, in our experiments, N_p = 8, N_f = 6, N_cell = 1024, and T_avg = 2 ns, yielding T_eval ≃ 98 µs.
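A quick arithmetic check of the evaluation-time figure quoted above, T_eval = N_p × N_cell × N_f × T_avg, using the same example values:

N_p, N_f, N_cell, T_avg = 8, 6, 1024, 2e-9    # pulses, frequencies, cells, avg width (s)
T_eval = N_p * N_cell * N_f * T_avg
print(f"T_eval = {T_eval * 1e6:.1f} microseconds")   # about 98 microseconds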

Configuration time varies for different configuration schemes and depends on the configuration file size, the configuration port width, and the frequency of the driving clock. Configuration time can roughly be estimated by T_conf = L_b / (f_c × P_w), where L_b is the configuration bitstream length in bits, f_c is the clock frequency in Hz, and P_w is the configuration port width in bits. For example, in our experiment on Xilinx Virtex 5 FPGAs (LX110), with L_b = 3.5 MB, f_c = 50 MHz, and P_w = 16 bits, the configuration time equals 350 milliseconds. Faster clocks can expedite the configuration process.

VIII. ATTACKS AND COUNTERMEASURES

Perhaps the most dangerous attack on an authentication system is the impersonation attack. An impersonation attack aims at deceiving the verifier to get through the authentication by reverse-engineering and simulating the authentic device behavior, by storing and replaying the communication, or by random guessing. Storage and replay attacks are impractical as long as the verifier uses a new random challenge every time. Random guessing and prediction attacks pose a threat if the responses have a low entropy and are predictable. As we mentioned in Section V, by setting the input clock pulse widths to the statistical median of the centers of the transition regions, the entropy of the responses can be maximized. For a fixed binary challenge, there are no more than six independent input clock pulse widths to be tried. In other words, the responses to other input clock pulse widths would lack sufficient entropy. To obtain more response bits, more binary challenges must be used instead.

Among the aforementioned threats, the reverse engineering and simulation attacks are the most critical attacks to address. The time-bounded protocol discussed in Section VII is constructed based on secrecy in the placement of the PUF and the connection of the input challenges to the CUTs. The secret expires within the given time bound. To provide the correct response to a new challenge, the adversary has to reverse engineer the bitstream to decipher the placement and connection of the input challenges to the PUF. Next, he has to simulate (or emulate) the PUF behavior using the public timing characterization. These two steps must be performed within the given time constraint. Even after many years of research in rapid simulation technologies for hardware design and validation, accurate simulation or emulation of a hardware architecture is extremely slow compared to the real device. In addition, even though bitstream reverse-engineering has partially been performed on some FPGAs [27], performing it would require a lot of simulations and pattern matching. Thus, it would take many more cycles than the authentic hardware, where the verifying time is dominated by the bitstream configuration time (on the order of 100 milliseconds).

IX. EXPERIMENTAL EVALUATIONS

In this section, the implementation details of the signa-ture extraction system are presented. We demonstrate resultsobtained by measurements performed on Xilinx FPGAs andfurther use the platform to carry out authentication on theavailable population of FPGAs. For delay signature extraction,


For delay signature extraction, the system shown in Figure 4 is implemented on Xilinx Virtex 5 FPGAs. The system contains a 32×32 array of characterization circuits as demonstrated in Figure 2. The CUT inside the characterization circuit consists of 4 inverters, each implemented using one 6-input LUT. The first LUT input (A1) is used as the input of the inverter and the remaining LUT inputs (A2,...,A6) serve as the binary challenges which alter the effective delay of the inverter. The characterization circuit fits into 2 slices (one CLB) on the FPGA. In fact, this is the lower bound on the characterization circuit hardware area: the interconnects inside the FPGA force all the flip-flops within the same slice to operate either on the rising edge or the falling edge of the clock, and since the launch and sample flip-flops must operate on different clock edges, they cannot be placed inside the same slice. In total, 8 LUTs and 4 flip-flops are used (within two slices) to implement the characterization circuit. The error counter size (N) is set to 8. To save storage space, the accumulated error values are stored only if they are between 7 and 248.

We use an ordinary desktop function generator to sweep the clock frequency from 8 MHz to 20 MHz and afterwards shift the frequency up 34 times using the PLLs inside the FPGA. The sweeping time is set to 1 millisecond (due to the limitations of the function generator, a lower sweeping time could not be reached). The measured accumulated error values are stored on an external memory and the data is transferred to a PC for further processing. Notice that the storage operation can easily be performed without the logic analyzer by using any off-chip memory.
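Assuming that "shifted up 34 times" denotes multiplying the swept base frequency by a factor of 34 inside the PLLs, and that the challenge pulse width is half the resulting clock period (both assumptions are ours), the effective sweep range can be estimated roughly as follows.

```python
PLL_FACTOR = 34  # assumed frequency multiplication factor

for f_base_mhz in (8, 20):  # endpoints of the swept base frequency
    f_eff_hz = f_base_mhz * 1e6 * PLL_FACTOR
    pulse_width_ns = 0.5 / f_eff_hz * 1e9  # assume pulse width = half the period
    print(f"base {f_base_mhz} MHz -> {f_eff_hz / 1e6:.0f} MHz, "
          f"pulse width ~ {pulse_width_ns:.2f} ns")
```

Under these assumptions the swept pulse widths fall roughly between 0.7 ns and 1.8 ns, which brackets the challenge pulse widths used later in the authentication experiments.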

The system is implemented on twelve Xilinx Virtex 5 XC5VLX110 chips and the measurements are taken under different input challenges and operating conditions. The characterization system in total uses 2048 slices for the characterization circuit array and 100 slices for the control circuit, out of 17,280 slices.

The measured samples for each cell are processed and the twelve parameters as defined in Section IV-C are extracted. Figure 10 shows the measured probability of timing error versus the clock pulse width for a single cell and a fixed challenge. The (red) circles represent the original measured sample points and the (green) dots show the reconstructed samples. As explained earlier, to reduce the stored data size, error samples with values of 0 and 1 (after normalization) are not written to the memory and are later reconstructed from the rest of the sample points. The solid line shows the Gaussian fit to the data as expressed in Equation 8.
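Equation 8 itself is not reproduced in this section; as an illustration of the fitting step, the sketch below fits a single Gaussian-CDF transition (center d, spread σ) to normalized error-probability samples around one transition region. The sample points and parameter names are placeholders for illustration, not measured values.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def transition(t, d, sigma):
    """Error probability modeled as a Gaussian CDF centered at delay d."""
    return norm.cdf(t, loc=d, scale=sigma)

# synthetic (pulse width in ns, normalized error probability) samples
t = np.array([1.20, 1.22, 1.24, 1.26, 1.28, 1.30, 1.32])
p = np.array([0.02, 0.08, 0.25, 0.55, 0.80, 0.95, 0.99])

(d_hat, sigma_hat), _ = curve_fit(transition, t, p, p0=[1.26, 0.03])
print(f"d ~ {d_hat:.3f} ns, sigma ~ {sigma_hat * 1e3:.1f} ps")
```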

The parameter extraction procedure is repeated for all cells and challenges. Figure 11 shows the extracted parameters d1 and σ1 for all cells on chips #9 and #10 while the binary challenge is fixed. The pixels in the images correspond to the cells within the 32×32 array on the FPGA. Some level of spatial correlation among the d1 parameters can be observed on the FPGA fabric.

The boxplots in Figure 12(a) show the distribution of the delay parameters di for i = 1, 2, ..., 6 over all 12 chips, 1024 cells, and 2 challenges. The central mark on each boxplot denotes the median, the edges of the boxes correspond to the 25th and 75th percentiles, the whiskers extend to the most extreme data points, and the plus signs show the outlier points.

Fig. 10. The probability of detecting timing errors versus the input clock pulse width T. The solid line shows the Gaussian fit to the measurement data.

Fig. 11. The extracted parameters d1 (panels a and b) and σ1 (panels c and d) for chips #9 and #10; each pixel corresponds to a cell at the given X/Y coordinate of the 32×32 array.

Using the measured data from the twelve chips, we investigate different authentication scenarios. The authentication parameters substantially increase the degree of freedom in challenging the PUF. These parameters include the number of clock pulses to send to the PUF (Np), the number of binary challenges to apply to the PUF (Nc), the challenge clock pulse width (T), and the number of PUF cells (Ncell) to be queried. In other words, in each round of authentication, Nc challenges are applied to Ncell PUF cells on the chip and then Np pulses of width T are sent to these PUF cells. The response to each challenge consists of Np bits. For ease of demonstration, the response can be regarded as the fraction of ones in the Np response bits (equivalently, an integer between 0 and Np).

To quantify the authentication performance, we study the effect of Ncell and T on the probability of detection (pd) and false alarm (pf). A detection error occurs in cases where the test and target chips are the same but, due to instability and noise in the responses, they fail to be authenticated as the same. On the other hand, a false alarm corresponds to the cases where the test and target chips are different but they are identified as the same chip. During this experiment, the binary challenges to the PUF cells are fixed and the number of clock pulses is set to Np = 8.


The clock pulse width (T) is set to each of the medians of the values shown in Figure 12(a); setting the clock pulse width to the median values results in the least predictability of responses. All Ncell = 1024 PUF cells are queried. The same experiment is repeated 10 times to obtain 10 response vectors (each vector is Np = 8 bits) for each chip. Therefore, each clock pulse width generates 8×1024 bits of responses from every chip. After that, the distance between the responses from the same chip (intra-chip distance) over repeated evaluations is measured using the normalized L1 distance metric. The distance between responses from different chips (inter-chip distance) is also measured. If the distance between the test chip and the target chip responses is smaller than a pre-specified detection threshold, then the chip is successfully authenticated. In the experiments, the detection threshold is set to 0.15.
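A minimal sketch of this decision rule, assuming the responses are recorded as the number of ones out of Np pulses per cell and normalized by Np (the normalization convention and the names are our assumptions):

```python
import numpy as np

def normalized_l1(r_test, r_target, n_pulses=8):
    """Normalized L1 distance between two per-cell response vectors (values 0..Np)."""
    r_test = np.asarray(r_test, dtype=float)
    r_target = np.asarray(r_target, dtype=float)
    return np.abs(r_test - r_target).sum() / (n_pulses * len(r_test))

def is_authentic(r_test, r_target, threshold=0.15, n_pulses=8):
    """Accept the chip if the response distance stays below the detection threshold."""
    return normalized_l1(r_test, r_target, n_pulses) <= threshold
```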

Table I shows the probability of detection and false alarm for different clock pulse widths and numbers of queried PUF cells. To calculate the probabilities, the distance between the responses of every distinct pair of FPGAs is calculated. The number of pairs with a response distance of less than 0.15, normalized to the total number of pairs, yields the probability of false alarm. To find the probability of detection, the distances between the responses from the same chip acquired at different times are compared to 0.15; the percentage that stays within the threshold determines the probability of detection. As can be observed, the information extracted from even the smallest set of cells is sufficient to reliably authenticate the FPGA chip if the pulse width is correctly set.
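The tallying of the two probabilities can be sketched as follows; the data layout (a reference response per chip plus repeated measurements) and the function names are ours.

```python
from itertools import combinations

def false_alarm_rate(reference_responses, dist, threshold=0.15):
    """Fraction of distinct chip pairs whose responses fall within the threshold."""
    pairs = list(combinations(reference_responses, 2))
    return sum(dist(a, b) <= threshold for a, b in pairs) / len(pairs)

def detection_rate(repeated_responses, dist, threshold=0.15):
    """Fraction of repeated measurements of one chip that stay within the threshold
    of its first (reference) measurement."""
    ref, rest = repeated_responses[0], repeated_responses[1:]
    return sum(dist(ref, r) <= threshold for r in rest) / len(rest)
```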

(a) Probability of false alarm (%)

Ncell    Challenge pulse width (ns)
         1.23    1.15    1.06    1.03    0.9     0.87
64       0.96    0       0       0       0       1.52
128      2.04    0       0       0       0       1.52
256      4.55    0       0       0       0       1.52

(b) Probability of detection (%)

Ncell    Challenge pulse width (ns)
         1.23    1.15    1.06    1.03    0.9     0.87
64       93.3    96.2    100     100     100     100
128      94.2    98.8    100     100     100     100
256      99.85   100     100     100     100     100

TABLE I. (A) PROBABILITY OF FALSE ALARM AND (B) PROBABILITY OF DETECTION, IN PERCENT, FOR DIFFERENT NUMBERS OF QUERIED PUF CELLS AND CHALLENGE PULSE WIDTHS.

In the next experiment, we study the effect of fluctuations in the operating conditions (temperature and core supply voltage) on the probabilities of detection and false alarm. Moreover, we demonstrate how linear calibration of the challenge clock pulse width can improve the reliability of detection. To calculate the calibration coefficient defined by Equation 11, we repeat the delay extraction process and find the delay parameters for all twelve chips at a temperature of -10°C and a core voltage of 0.9 V. The chip operates at a temperature of 37°C and a core voltage of 1 V in the normal (reference) condition. We use the built-in sensors and the Xilinx ChipScope Pro package to monitor the operating temperature and core voltage. To cool down the FPGAs, liquid compressed air is consistently sprayed over the FPGA surface. Figure 12(b) depicts the changes in the distribution of the first delay parameter (d1) at the three different operating conditions.

The probabilities of detection and false alarm are derived before and after performing calibration on the challenge pulse width, for different clock pulse widths and numbers of binary challenges to the cells. In this experiment, all 1024 PUF cells on the FPGA are queried for the response and Np = 8 as before. As can be seen in Table II, the detection probabilities are significantly improved after performing linear calibration based on the coefficients extracted for each chip. The labels vlow and tlow correspond to the 0.9 V supply voltage and -10°C temperature conditions, respectively. The reported probabilities in Table II are all in percent. Also note that for the challenge pulse width of T = 0.87 ns, the probability of detection reaches 100% and the probability of false alarm falls to zero after calibration. The same holds true for Nc = 2 and T = 0.87, 0.9, and 0.95 ns. Thus, an increased level of reliability can be achieved during authentication with a proper choice of pulse width and number of challenges.
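Equation 11 is not reproduced in this section, so the sketch below only illustrates the general idea of linear calibration under a simple assumption of ours: the calibration coefficient is taken as the ratio of the chip's mean delay at the current operating condition to its mean delay at the reference condition, and the challenge pulse width is scaled by that coefficient.

```python
import numpy as np

def calibration_coefficient(delays_current, delays_reference):
    """Assumed linear coefficient: ratio of mean delays (current vs. reference)."""
    return np.mean(delays_current) / np.mean(delays_reference)

def calibrate_pulse_width(t_nominal_ns, delays_current, delays_reference):
    """Scale the nominal challenge pulse width to the current operating condition."""
    return t_nominal_ns * calibration_coefficient(delays_current, delays_reference)
```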

Figure 13 shows how performing calibration decreases the intra-chip response distances in the presence of temperature changes. The histograms correspond to T = 0.95 ns and Nc = 2 in Table II, before and after calibration.

Next, we examine the differential signature extraction system presented in Section VI-B. To extract the signature, the base frequency is swept from 8 to 20 MHz in a linear fashion in 1 millisecond and shifted up 34 times using the FPGA internal PLLs. The sweep is repeated for the 512 pairs of PUF cells, producing a real-valued signature vector of size 512. A large number of pulses (∼10^7) are generated in a complete sweep. The signature, as explained in Section VI-B, is the accumulation of the timing errors over a complete sweep. To achieve an accurate approximation of the area under the curve, a large number of clock pulses must be tried. This is the main disadvantage of this method compared to the single-ended method.

Fig. 12. (a) Distribution of the delay parameters di (in nanoseconds). (b) The distribution of d1 for the normal (Temp = 37°C, VDD = 1 V), low operating temperature (Temp = -10°C, VDD = 1 V), and low core voltage (Temp = 37°C, VDD = 0.9 V) conditions.


                         No calibration                              Calibrated
                 Nc = 1              Nc = 2                Nc = 1              Nc = 2
T (ns)       vlow      tlow      vlow      tlow        vlow      tlow      vlow      tlow
           pd    pf  pd    pf  pd    pf  pd    pf    pd    pf  pd    pf  pd    pf  pd    pf
1.23      18.4   0  33.3 16.7  18.4   0  33.3 22.29  100    0  75     0  100    0  75     0
1.06      18.4   0  18.4   0   18.4   0  18.4   0    50     0  50     0  57.3   0  50     0
1.01      18.4   0  16.7   0   18.4   0  16.7   0    66.6   0  75     0  68.2   0  75     0
0.95      18.4   0  16.7   0   18.4   0  16.7   0    66.7   0  100    0  84.9   0  100    0
0.9       16.7   0  25     0   16.7   0  25     0    83.3   0  91.7   0  83.4   0  100    0
0.87      25     0  25     0   25    1.5 25     0    100    0  100    0  100    0  100    0

TABLE II. THE PROBABILITY OF DETECTION (pd) AND FALSE ALARM (pf), IN PERCENT, BEFORE AND AFTER PERFORMING CALIBRATION ON THE CHALLENGE PULSE WIDTH, IN THE PRESENCE OF VARIATIONS IN TEMPERATURE AND CORE VOLTAGE.

Fig. 13. The inter-chip and intra-chip response distances for T = 0.95 ns and Nc = 2 before (top) and after (bottom) calibration against changes in temperature.

To extract shift-invariant parameters such as the region width and/or the area under the probability curve, probing the PUF circuit at single frequency points will not yield sufficient information; therefore, a complete sweep covering the regions with high information content is needed. The L1 distance of the signatures from the same chip under different operational conditions (intra-chip distance) and the distance of the signatures from different chips (inter-chip distance) are calculated. Figure 14 shows the distribution of the intra- and inter-chip distances of signatures under variations in temperature and supply voltage for the twelve Virtex 5 chips. As shown in the figure, the distance among signatures obtained from the same chip at room temperature and at -10°C is always smaller than those from different chips, resulting in a 100% probability of detection and a 0% probability of false alarm. However, with 10% variations in the supply voltage, the intra- and inter-chip distributions overlap slightly.
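A sketch of this comparison step: each chip contributes a length-512 real-valued vector of accumulated timing errors over the sweep, and two signatures are compared with an L1 distance against a detection threshold. The names and data layout below are ours.

```python
import numpy as np

def signature_l1(sig_a, sig_b):
    """L1 distance between two real-valued differential signatures."""
    return np.abs(np.asarray(sig_a, dtype=float) - np.asarray(sig_b, dtype=float)).sum()

def same_chip(sig_test, sig_enrolled, detection_threshold):
    """Declare a match if the signature distance is below the detection threshold."""
    return signature_l1(sig_test, sig_enrolled) <= detection_threshold
```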

X. CONCLUSIONS

We presented a technique for FPGA authentication that takes advantage of the unclonable timing variability present in FPGAs, the reconfigurability feature, and its unprecedented speed. Authentication is composed of two phases, namely registration and authentication. During registration, cell-level timing features are extracted and stored in a database. Later, at the authentication phase, the verifier generates a random configuration bitstream and sends it to the prover.

Fig. 14. The distribution of the intra- and inter-chip signature L1 distances under temperature variations and under supply voltage variations; the detection threshold is marked in each panel.

A unique aspect of the new method is its high degree of freedom in placing the PUF cells and selecting the challenges. The protocol relies on the fact that online reverse-engineering of the bitstream is a non-trivial task. A new calibration method for improving robustness to temperature and voltage fluctuations was demonstrated. Evaluations on Xilinx Virtex 5 FPGAs show the effectiveness and practicality of the new timing signature extraction and authentication method.

XI. ACKNOWLEDGEMENT

This research is in part supported by the Office of Naval Research (ONR) under grant No. R16480, the National Science Foundation CAREER grant No. 0939766, and the Semiconductor Research Corporation (SRC) under grant No. 1836.039. The authors would like to thank Ahmed Elnably for assistance in the data collection process.

REFERENCES

[1] R. Pappu, B. Recht, J. Taylor, and N. Gershenfeld, "Physical one-way functions," Science, vol. 297, pp. 2026–2030, 2002.

[2] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, "Silicon physical random functions," in Conference on Computer and Communications Security (CCS), 2002, pp. 148–160.

[3] G. Suh, C. O'Donnell, and S. Devadas, "AEGIS: A single-chip secure processor," IEEE Design & Test of Computers, vol. 24, no. 6, pp. 570–580, 2007.

[4] M. Atallah, E. Bryant, J. Korb, and J. Rice, "Binding software to specific native hardware in a VM environment: the PUF challenge and opportunity," in Workshop on Virtual Machine Security, 2008, pp. 45–48.

[5] J. Guajardo, S. Kumar, G. Schrijen, and P. Tuyls, "FPGA intrinsic PUFs and their use for IP protection," in Workshop on Cryptographic Hardware and Embedded Systems (CHES), 2007, pp. 63–80.

[6] G. Suh and S. Devadas, "Physical unclonable functions for device authentication and secret key generation," in Design Automation Conference (DAC), 2007, pp. 9–14.

[7] M. Majzoobi, F. Koushanfar, and M. Potkonjak, "Techniques for design and implementation of secure reconfigurable PUFs," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 2, no. 1, pp. 1–33, 2009.

[8] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, "Controlled physical random functions," in Annual Computer Security Applications Conference (ACSAC), 2002, pp. 149–160.

[9] J. Lee, L. Daihyun, B. Gassend, G. Suh, M. van Dijk, and S. Devadas, "A technique to build a secret key in integrated circuits for identification and authentication applications," in Symp. of VLSI, 2004, pp. 176–179.

[10] D. Holcomb, W. Burleson, and K. Fu, "Power-up SRAM state as an identifying fingerprint and source of true random numbers," IEEE Trans. on Computers, 2009.

[11] B. Gassend, "Physical random functions," S.M. Thesis, MIT, 2003.

[12] U. Ruhrmair, J. Solter, and F. Sehnke, "On the foundations of physical unclonable functions," Cryptology ePrint Archive, 2009.

[13] U. Ruhrmair, "SIMPL system: on a public key variant of physical unclonable function," Cryptology ePrint Archive, 2009.

[14] M. Majzoobi, F. Koushanfar, and S. Devadas, "FPGA PUF using programmable delay lines," in IEEE Workshop on Information Forensics and Security, 2010, in press.

[15] M. Majzoobi, A. Elnably, and F. Koushanfar, "FPGA time-bounded unclonable authentication," in Information Hiding Conference (IH), vol. 6387, 2010, pp. 1–16.

[16] J. S. J. Wong, P. Sedcole, and P. Y. K. Cheung, "Self-measurement of combinatorial circuit delays in FPGAs," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 2, no. 2, pp. 1–22, 2009.

[17] Y. Jin and Y. Makris, "Hardware trojan detection using path delay fingerprint," in International Symposium on Hardware-Oriented Security and Trust (HOST), 2008, pp. 51–57.

[18] D. Rai and J. Lach, "Performance of delay-based trojan detection techniques under parameter variations," in International Symposium on Hardware-Oriented Security and Trust (HOST), 2009, pp. 58–65.

[19] K. Kepa, F. Morgan, K. Kosciuszkiewicz, and T. Surmacz, "SeReCon: A secure dynamic partial reconfiguration controller," in IEEE Computer Society Annual Symposium on VLSI, 2008, pp. 292–297.

[20] J. B. Webb, "Methods for securing the integrity of FPGA configurations," Tech. Rep., 2006.

[21] G. Gogniat, T. Wolf, W. Burleson, J.-P. Diguet, L. Bossuet, and R. Vaslin, "Reconfigurable hardware for high-security/high-performance embedded systems: The SAFES perspective," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 2, pp. 144–155, 2008.

[22] M. Majzoobi, E. Dyer, A. Elnably, and F. Koushanfar, "Rapid FPGA delay characterization using clock synthesis and sparse sampling," in International Test Conference (ITC), 2010, in press.

[23] C. Bosche, J. G. A. Sadeghi, J. Shokrollahi, and P. Tuyls, "Efficient helper data key extractor on FPGAs," in Workshop on Cryptographic Hardware and Embedded Systems (CHES), 2008, pp. 181–197.

[24] Y. Dodis, L. Reyzin, and A. Smith, "Fuzzy extractors: How to generate strong keys from biometrics and other noisy data," in EUROCRYPT, 2004, pp. 523–540.

[25] M. Majzoobi, F. Koushanfar, and M. Potkonjak, "Testing techniques for hardware security," in International Test Conference (ITC), 2008, pp. 1–10.

[26] N. Beckmann and M. Potkonjak, "Hardware-based public-key cryptography with public physically unclonable functions," in Information Hiding Conference (IH), 2009, pp. 206–220.

[27] J. Note and E. Rannaud, "From the bitstream to the netlist," in International Symposium on Field-Programmable Gate Arrays (FPGA), 2008, pp. 264–274.

Mehrdad Majzoobi received his B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from the University of Tehran, Iran in 2006 and Rice University, Texas in 2009, respectively. He is currently a Ph.D. candidate in the Electrical and Computer Engineering Department at Rice University. His research interests include statistical modeling and optimization, low power embedded systems, sensor-based and reconfigurable systems, VLSI testing, hardware security, and hardware/software intellectual property (IP) protection and watermarking.

Farinaz Koushanfar received the Ph.D. degree in electrical engineering and computer science from the University of California at Berkeley in 2005. She also received the M.A. degree in statistics from the University of California, Berkeley, the M.Sc. degree in electrical engineering from the University of California, Los Angeles, and the B.A.Sc. degree in electrical engineering from the Sharif University of Technology, Iran. She is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Rice University, where she is also the director of the Texas Instruments (TI) DSP Leadership University program. Prior to joining Rice University in July 2006, she held the Coordinated Science Lab (CSL) fellowship at the University of Illinois, Urbana-Champaign. Her research interests include data integrity, statistical modeling and optimization, embedded systems, sensor-based and reconfigurable systems, hardware security, and hardware/software intellectual property (IP) protection. Dr. Koushanfar is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE), the Office of Naval Research (ONR) Young Investigator Program Award (2009), MIT Technology Review's Top 35 Young Innovators Award (2008), the Defense Advanced Research Projects Agency (DARPA) Young Faculty Award (2007), the National Science Foundation (NSF) CAREER Award (2007), an Intel Open Collaborative Research Fellowship, and a Best Paper Award at MobiCom.