Differential Privacy (Part III)

Transcript of Differential Privacy (Part III) · 2016-03-03

Page 2

Approximate (or (ε,δ))-differential privacy

• Generalized definition of differential privacy allowing for a (supposedly small) additive factor

• Used in a variety of applications

A query mechanism M is (ε, δ)-differentially private if, for any two adjacent databases D and D′ (differing in just one entry) and any C ⊆ range(M):

Pr(M(D) ∈ C) ≤ e^ε · Pr(M(D′) ∈ C) + δ

Page 3

The Gaussian mechanism

For c² > 2 ln(1.25/δ), the Gaussian mechanism with parameter σ ≥ c·∆₂(f)/ε is (ε,δ)-differentially private.

The ℓ₂-sensitivity of f : ℕ^|X| → ℝ^k is defined as ∆₂(f) = max ‖f(x) − f(y)‖₂ over all x, y ∈ ℕ^|X| with ‖x − y‖₁ = 1.
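A minimal Python sketch of the mechanism as stated above (not from the lecture; function and parameter names are illustrative):

```python
import numpy as np

def gaussian_mechanism(f_value, l2_sensitivity, eps, delta):
    """Add Gaussian noise calibrated as on this slide (illustrative sketch)."""
    # any c with c^2 > 2 ln(1.25/delta); nudge slightly above the bound
    c = np.sqrt(2 * np.log(1.25 / delta)) * (1 + 1e-9)
    sigma = c * l2_sensitivity / eps
    noise = np.random.normal(0.0, sigma, size=np.shape(f_value))
    return f_value + noise

# Example: a counting query (l2-sensitivity 1) answered with (0.5, 1e-5)-DP
noisy = gaussian_mechanism(412.0, l2_sensitivity=1.0, eps=0.5, delta=1e-5)
```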

Page 4

Sparse Vector Technique

✦ [Hardt-Rothblum, FOCS'10] study the problem of k, adaptively chosen, low-sensitivity queries where
• only a very small number of these queries (say c) take values above a certain threshold T
• the data analyst is only interested in such queries
• useful to learn correlations, e.g., whether there is a dependency between smoking and cancer

✦ The data analyst could ask only the significant queries, but she does not know them in advance!

✦ Goal: answer only the significant queries, pay only for them, and ignore the others

Page 5

Histograms and linear queries

✦ A histogram x ∈ ℝ^N represents a database (or a distribution) over a universe U of size |U| = N
• Databases have support of size n, whereas distributions do not necessarily have a small support

✦ We assume x is normalized so that Σ_{i∈U} x_i = 1

✦ Here we focus on linear queries f ∈ [0, 1]^N
• can be seen as the inner product ⟨x, f⟩ for f : ℝ^N → [0, 1]
• counting queries (i.e., how many elements in the database fulfill a certain predicate) are a special case

✦ Example: U = {1,2,3}, D = [1,2,2,3,1]
• x = (2,2,1); after normalization (2/5, 2/5, 1/5)
• "how many entries ≤ 2" ⇒ f = (1,1,0)

✦ By normalization, linear queries have sensitivity 1/n (see the sketch below)
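The example above in Python (illustrative):

```python
import numpy as np

# Universe U = {1,2,3}; database D = [1,2,2,3,1] as a normalized histogram
x = np.array([2, 2, 1]) / 5.0          # (2/5, 2/5, 1/5)
f = np.array([1, 1, 0])                # "how many entries <= 2"

answer = np.dot(f, x)                  # inner product <x, f> = 4/5
# Changing one of the n = 5 entries moves the answer by at most 1/n,
# so the normalized linear query has sensitivity 1/5.
```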

Page 6

SVT: algorithm

✦ Intuition: answer only those queries whose sanitized result is above the sanitized threshold

We pay only for c queries

We need to sanitize the threshold, otherwise the conditional branch would leak information
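The slide's algorithm figure is not in the transcript; below is a standard AboveThreshold-style sketch in Python. All names and the exact noise scales are illustrative (chosen to match the accuracy bound on the next slides, with sensitivity-1/n queries), not the lecture's code:

```python
import numpy as np

def sparse(queries, database, T, c, eps):
    """Illustrative Sparse Vector sketch: answer at most c above-threshold queries."""
    n = len(database)
    scale = 2 * c / (eps * n)                      # threshold noise scale
    T_hat = T + np.random.laplace(0, scale)        # sanitized threshold
    answers, count = [], 0
    for q in queries:
        nu = np.random.laplace(0, 2 * scale)       # fresh noise per query
        if q(database) + nu >= T_hat:              # sanitized comparison
            answers.append(q(database) + nu)       # report noisy answer
            count += 1
            T_hat = T + np.random.laplace(0, scale)  # re-sanitize threshold
            if count >= c:
                break                              # budget exhausted: abort
        else:
            answers.append(None)                   # "below threshold" (⊥)
    return answers
```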

Page 7

SVT: accuracy

• α captures the distance between the sanitized result and the real result
• β captures the error probability

We say Sparse is (α, β)-accurate for a sequence of k queries Q_1, …, Q_k if, except with probability at most β, the algorithm does not abort before Q_k, and for all a_i ∈ ℝ:

|a_i − Q_i(D)| ≤ α

and for all a_i = ⊥:

Q_i(D) ≤ T + α

Page 8

SVT: accuracy theorem

• The larger β, the smaller α
• The accuracy loss is logarithmic in the number of queries

For any sequence of k queries Q_1, …, Q_k such that L(T) = |{i : Q_i(D) ≥ T − α}| ≤ c, Sparse(D, {Q_i}, T, c) is (α, β)-accurate for:

α = 2σ(log k + log(2/β)) = 4c(log k + log(2/β)) / (εn)
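A quick numeric check of the bound, with illustrative values (not from the lecture):

```python
import numpy as np

# Illustrative: k = 10,000 queries, c = 10 significant ones,
# eps = 1, n = 100,000 rows, failure probability beta = 0.05
k, c, eps, n, beta = 10_000, 10, 1.0, 100_000, 0.05
alpha = 4 * c * (np.log(k) + np.log(2 / beta)) / (eps * n)
print(alpha)   # ~0.005: the error stays small despite 10,000 queries
```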

Page 9

SVT: privacy theorem

The Sparse vector algorithm is ε-differentially private

• So, what did we prove in the end?
• You can estimate the actual answers and report only those in this range:

[Figure: answer axis marked at T, T + α, and ∞]

• We can fish out insignificant queries almost "for free", paying only logarithmically for them in terms of accuracy

Page 10

SVT: approximate differential privacy

✦ Setting σ = √(32c ln(1/δ)) / (εn), we get the following theorems:

The Sparse vector algorithm is (ε, δ)-differentially private

For any sequence of k queries Q_1, …, Q_k such that L(T) = |{i : Q_i(D) ≥ T − α}| ≤ c, Sparse(D, {Q_i}, T, c) is (α, β)-accurate for:

α = 2σ(log k + log(2/β)) = √(128c ln(1/δ)) · (log k + log(2/β)) / (εn)

Page 11

Limitations

✦ Differential privacy is a general-purpose privacy definition, originally thought for databases and later applied to a variety of different settings

✦ At the moment, it is considered the state of the art

✦ Still, it is not the holy grail and it is not immune from concerns, criticisms, and limitations

✦ Typically accompanied by some over-claims

Page 12

No free lunch in data privacy

✦ Privacy and utility cannot be provided without making assumptions about how data are generated (no free lunch theorem)

✦ Privacy means hiding the evidence of participation of an individual in the data generating process

✦ If database rows are not independent, this is different from removing one row
• Bob's participation in a social network may cause new edges between pairs of his friends

✦ If there is group structure, differential privacy may not work very well...

Page 13

No free lunch in data privacy (cont'd)

✦ This work disputes three popular over-claims

✦ "DP requires no assumptions on the data"
• database rows must actually be independent, otherwise removing one row does not suffice to remove the individual's participation

✦ If rows are not independent, deciding how many entries should be removed and which ones is far from being easy...

Page 14

No free lunch in data privacy (cont'd)

✦ The attacker knows all entries of the database except for one, so "the more an attacker knows, the greater the privacy risks"

✦ Thus we should protect against the strongest attacker

✦ Careful! In DP, the more the attacker knows, the less noise we actually add
• intuitively, this is due to the fact that we have less to hide

Page 15

No free lunch in data privacy (cont'd)

✦ "DP is robust to arbitrary background knowledge"

✦ Actually, DP is robust when certain subsets of the tuples are known to the attacker

✦ Other types of background knowledge may instead be harmful
• e.g., previous exact query answers

✦ DP composes well with itself, but not necessarily with other privacy definitions or release mechanisms

✦ One can get a new, more generic, DP privacy guarantee if, after releasing exact query answers, a set of tuples (not just one), called neighbours, is altered in a way that is still consistent with previously answered queries (plausible deniability)

Page 16

Geo-indistinguishability

• Goal: protect the user's exact location, while allowing approximate information (typically needed to obtain a certain desired service) to be released
• Idea: protect the user's location within a radius r with a level of privacy that depends on r
• corresponds to a generalized version of the well-known concept of differential privacy

Page 17

Pictorially…

• Achieve ℓ-privacy within r
• the provider cannot easily infer the user's location within, say, the 7th arrondissement of Paris
• the provider can infer with high probability that the user is located in Paris instead of, say, London

Page 18

More formally…

• A mechanism K satisfies ε-geo-indistinguishability iff for all locations x, x′ and all sets Z of reported locations: K(x)(Z) ≤ e^(ε·d(x,x′)) · K(x′)(Z)
• Here K(x) denotes the distribution (of locations) generated by the mechanism K applied to location x
• Achieved through a variant of the Laplace mechanism (sketched below)
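A sketch of such a mechanism in Python, assuming the polar-coordinate "planar Laplace" construction of Andrés et al. (2013); all names are illustrative, and treating coordinates as planar is a simplification:

```python
import numpy as np
from scipy.special import lambertw

def planar_laplace(x, y, eps):
    """Perturb a planar location with density decaying as exp(-eps * distance)."""
    theta = np.random.uniform(0, 2 * np.pi)          # uniform direction
    p = np.random.uniform(0, 1)                      # radius via inverse CDF
    r = -(1.0 / eps) * (np.real(lambertw((p - 1) / np.e, k=-1)) + 1)
    return x + r * np.cos(theta), y + r * np.sin(theta)

# Example: report a location with eps = 0.1 privacy per unit of distance
noisy = planar_laplace(48.8566, 2.3522, eps=0.1)
```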

Page 19

Browser extension

Page 20

Malicious aggregators

• So far we focused on malicious analysts…
• …but aggregators can be malicious (or at least curious) too!

[Figure: users send x_1, …, x_n to the aggregator, which releases f(x_1, …, x_n) to the analyst]

Page 21

Existing approaches

• Secure hardware (or trusted server)-based mechanisms

• Fully distributed mechanisms with individual noise

Page 22

Distributed Differential Privacy

How to compute differentially private queries in a distributed setting (attacker model, cryptographic protocols…)?

"What's the average age of your self-help group?"

Page 23

Smart metering

✦ Fine-grained smart metering has multiple uses:
• time-of-use billing, providing energy advice, settlement, forecasting, demand response, and fraud detection

✦ USA: Energy Independence and Security Act of 2007
• American Recovery and Reinvestment Act (2009, $4.5bn)

✦ EU: Directive 2009/72/EC

✦ UK: deployment of 47 million smart meters by 2020

✦ Remote reads: every 15-30 min

✦ Manual reads: one read every 3 months to 1 year

Page 24

Smart metering: privacy issues

✦ Meter readings are sensitive
• Were you in last night?
• You do like watching TV, don't you?
• Another ready meal in the microwave?
• Has your boyfriend moved in?

Page 25

Smart metering: privacy issues (cont'd)

Page 26

Privacy-friendly smart metering

✦ Goals:
• precise billing of consumption while revealing no consumption information to third parties
• privacy-friendly real-time aggregation

Page 27

Protocol overview

✦ r_i: answer from client i
✦ k_ij: keys shared between client i and aggregator j
✦ t: label classifying the kind of reading
✦ w_i: weight given to i's answers
(a toy sketch of how such shared keys can be used follows)
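The protocol figure itself is not in the transcript. As a purely illustrative sketch of the additive-masking pattern suggested by these ingredients (readings blinded with masks derived from the shared keys k_ij and the label t; this is an assumption, not the lecture's exact protocol):

```python
import hashlib

MOD = 2**32  # arithmetic modulo a large constant (illustrative)

def prf(key: bytes, label: str) -> int:
    """Toy PRF: hash a shared key k_ij and the reading label t to a mask."""
    digest = hashlib.sha256(key + label.encode()).digest()
    return int.from_bytes(digest, "big") % MOD

def client_message(reading: int, keys_with_aggregators: list, label: str) -> int:
    """Client i blinds its (already noised) reading r_i with one mask per key."""
    mask = sum(prf(k, label) for k in keys_with_aggregators) % MOD
    return (reading + mask) % MOD

def aggregator_unmask(total: int, keys_with_clients: list, label: str) -> int:
    """Aggregator j removes the masks for the keys it shares with the clients."""
    mask = sum(prf(k, label) for k in keys_with_clients) % MOD
    return (total - mask) % MOD
```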

Page 28

Protocol overview

✦ The geometric distribution Geom(α), with α > 1, is the discrete distribution with support ℤ and probability mass function

Pr[k] = (α − 1)/(α + 1) · α^(−|k|)

✦ Discrete counterpart of the Laplace distribution

Let f : D → ℤ be a function with sensitivity ∆f. Then g = f(X) + Geom(e^(ε/∆f)) is ε-differentially private.
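A minimal sketch of sampling this noise in Python (not from the lecture; it uses the standard fact that the difference of two one-sided geometric variables has exactly the two-sided pmf above):

```python
import numpy as np

def geom_noise(eps, sensitivity=1):
    """Two-sided geometric noise with pmf (a-1)/(a+1) * a^(-|k|), a = exp(eps/sensitivity)."""
    alpha = np.exp(eps / sensitivity)
    p = 1 - 1 / alpha                      # success probability of each one-sided draw
    return np.random.geometric(p) - np.random.geometric(p)

# Example: eps-DP integer count
noisy_count = 412 + geom_noise(eps=0.5)
```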

Page 29

Protocol overview

✦ In terms of utility, the noise added to the aggregate has mean 0 and variance

P · Σ_{k∈ℤ} (α − 1)/(α + 1) · α^(−|k|) · k² = 2Pα / (α − 1)²

✦ P is the number of aggregators

✦ The protocol guarantees ε-differential privacy even if all except one aggregator are dishonest

The noise increases with the number of aggregators (each adds noise that suffices to get ε-differential privacy). On the other hand, this seems to be necessary to protect from malicious aggregators… we will see a more elegant and precise solution based on SMPC.

Page 30

Limitations of Existing Approaches

• Privacy vs utility tradeoff

• Lack of generality (and scalability)

• Inefficiency: significant computational effort on user's side

• Answer pollution: a single entity can pollute the result by excessive noise

Page 31

PrivaDA: Idea and Design

[Figure: users' inputs flow through Secure Multi-Party Computation to the computation parties]

• Inputs are shared among computation parties

• Computation parties jointly compute differentially private statistics

• Required noise is generated in a distributed fashion

• No party learns the individual inputs (see the sketch below)
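A minimal Python sketch of the input-sharing step (illustrative; PrivaDA's actual protocols use the arithmetic SMPC suite described on the following slides):

```python
import secrets

MOD = 2**61 - 1  # prime modulus for additive sharing (illustrative choice)

def share(value: int, beta: int) -> list:
    """Split a value into beta additive shares; any beta-1 shares reveal nothing."""
    shares = [secrets.randbelow(MOD) for _ in range(beta - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares: list) -> int:
    return sum(shares) % MOD

# Each user shares its input; party k sums the k-th shares of all users, so the
# parties jointly hold shares of the aggregate without seeing any single input.
users, beta = [5, 7, 3], 3
per_party = [share(v, beta) for v in users]
agg_shares = [sum(s[k] for s in per_party) % MOD for k in range(beta)]
assert reconstruct(agg_shares) == sum(users)
```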

Page 32

Our Contributions (PrivaDA)

• We leverage recent advances on SMPC for arithmetic operations
• uses SMPC to compose user data
• uses SMPC to jointly compute the sanitization mechanism

• We support three sanitization mechanisms
• Laplace, discrete Laplace, exponential mechanism; more are possible

• We employ β computation parties

• We employ zero-knowledge proofs

• First publicly available library for efficient arithmetic SMPC operations in the malicious setting

strong privacy · optimal utility · efficiency · scalability · generality · malicious setting, no answer pollution

Page 33

PrivaDA 101: Differentially Private Year of Birth

[Figure: the year of birth 1978 is split into additive shares (1500 + 478); the computation parties perturb it, and the resulting shares (1505 + 474) reconstruct to the approximate year 1979]

Page 34

SMPC for Distributed Sanitization Mechanisms

• We employ recent SMPC for arithmetic operations
• fixed-point numbers [Catrina & Saxena, FC'10]
• floating-point numbers [Aliasgari et al., NDSS'13]
• integers [From & Jakobsen, 2006]

• Key SMPC primitives
• RandInt(k)
• IntAdd, FPAdd, FLAdd, FLMul, FLDiv
• FL2Int, Int2FL, FL2FP, FP2FL
• FLExp, FLLog, FLLT, FLRound

Page 35

In: d_1, …, d_n; λ = ∆f/ε
Out: (Σ_{i=1}^{n} d_i) + Lap(λ)
1: d = Σ_{i=1}^{n} d_i
2: r_x ← U(0,1]; r_y ← U(0,1]
3: r_z = λ(ln r_x − ln r_y)
4: w = d + r_z
5: return w

(a) LM

In: d_1, …, d_n; λ = e^(−ε/∆f)
Out: (Σ_{i=1}^{n} d_i) + DLap(λ)
1: d = Σ_{i=1}^{n} d_i
2: r_x ← U(0,1]; r_y ← U(0,1]
3: α = 1/ln λ = −∆f/ε
4: r_z = ⌊α ln r_x⌋ − ⌊α ln r_y⌋
5: w = d + r_z
6: return w

(b) DLM

In: d_1, …, d_n; a_1, …, a_m; λ = ε/2
Out: winning a_k
1: I_0 = 0
2: for j = 1 to m do
3:   z_j = Σ_{i=1}^{n} d_i(j)
4:   ω_j = e^(λ·z_j)
5:   I_j = ω_j + I_{j−1}
6: r ← U(0,1]; r′ = r·I_m
7: k = binary_search(r′, I_0, …, I_m)
8: return a_k

(c) EM

Table 1: Algorithms: Sanitization Mechanisms

It holds that DLap(λ) = Geo(1−λ) − Geo(1−λ), where Geo(λ′) = ⌊Exp(−ln(1−λ′))⌋. Thus, using the previous results, we know that

DLap(λ) = ⌊(1/ln λ) · ln U(0,1]⌋ − ⌊(1/ln λ) · ln U(0,1]⌋.   (2)

In particular, for λ = e^(−ε/∆f), ε-DP is guaranteed. The algorithm to add discrete Laplace noise to n inputs is shown in Table 1b. It takes as input (i) the n integer numbers d_1, …, d_n owned by P_1, …, P_n respectively, which correspond to locally executing the query f on each P_i's database D_i (d_i = f(D_i)), and (ii) the privacy budget parameter λ, which will be set to e^(−ε/∆f) to guarantee ε-DP. The algorithm returns the integer w = (Σ_{i=1}^{n} d_i) + DLap(λ), which is computed analogously to the Laplace mechanism, using (2).

Privacy of DLM. The DLM algorithm implements Σ_{i=1}^{n} d_i + ⌊(1/ln λ) · ln U(0,1]⌋ − ⌊(1/ln λ) · ln U(0,1]⌋ = Σ_{i=1}^{n} d_i + DLap(λ). By Theorem 1, DLM(d_1, …, d_n, λ) is ε-differentially private for λ = e^(−ε/∆f), where d_i = f(D_i).

Exponential Mechanism. Concerning the algorithm to compute the exponential mechanism [5] (EM) for n inputs, our approach is inspired by [17], which is however constrained to a 2-party setting.

Inputs and outputs. The algorithm to compute the EM on the join of n databases is presented in Table 1c. It outputs the candidate a ∈ R (where |R| = m ∈ ℕ), which is the result of locally executing the desired query f on the databases D_1, …, D_n that are under the control of the participants P_1, …, P_n respectively and sanitizing the joint result using the exponential mechanism. The algorithm takes the following inputs: (i) the data sets d_1, …, d_n belonging to the participants P_1, …, P_n respectively, (ii) the list of candidates a_1, …, a_m, and (iii) the privacy parameter λ. Note that in order to guarantee ε-DP, the parameter λ will be set to ε/(2∆q). For the sake of simplicity, we assume each data set d_i ∈ D to be a histogram that is the result of locally executing f(D_i). Each histogram is a sequence of m natural numbers z_1, …, z_m that correspond to the frequency of candidates a_1, …, a_m ∈ R.

Algorithms for Sanitization Mechanisms

• We provide algorithms for Laplace, Discrete Laplace, and Exponential
• Trick: reduce the problem to random number generation

• Lap(λ) = Exp(1/λ) − Exp(1/λ), with Exp(λ) = −ln 𝒰(0,1] / λ
• DLap(λ) = Geo(1−λ) − Geo(1−λ), with Geo(λ) = ⌊Exp(−ln(1−λ))⌋

• Exp(ε/2): draw r ∈ 𝒰(0,1] and check

r · Σ_{j=1}^{m} e^(ε·q(D,a_j)) ∈ ( Σ_{k=1}^{j−1} e^(ε·q(D,a_k)), Σ_{k=1}^{j} e^(ε·q(D,a_k)) ]
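A plaintext (non-SMPC) Python sketch of these three reductions; illustrative only, since the actual protocols run them on secret shares (next slides):

```python
import numpy as np

def lap_from_uniform(lam):
    """Lap(lam) as the difference of two exponentials drawn from uniforms."""
    rx, ry = np.random.uniform(0, 1, 2)     # stands in for U(0,1]
    return lam * (np.log(rx) - np.log(ry))

def dlap_from_uniform(lam):
    """DLap(lam) = Geo(1-lam) - Geo(1-lam), via floored exponentials."""
    rx, ry = np.random.uniform(0, 1, 2)
    alpha = 1.0 / np.log(lam)               # = -Δf/ε for lam = exp(-ε/Δf)
    return np.floor(alpha * np.log(rx)) - np.floor(alpha * np.log(ry))

def exp_mechanism(scores, eps):
    """Exponential mechanism by inverting cumulative weights, as in the check above."""
    weights = np.exp(eps / 2 * np.asarray(scores))   # λ = ε/2, sensitivity-1 scores
    cum = np.cumsum(weights)
    r = np.random.uniform(0, 1) * cum[-1]
    return int(np.searchsorted(cum, r))     # first j with cum[j] >= r (0-based)
```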

Page 36

Protocol for Distributed Laplace Noise

• For β computation parties:

In: Shared fixed-point (γ, f) inputs [d_1], …, [d_n]; λ = ∆f/ε
Out: w = (Σ_{i=1}^{n} d_i) + Lap(λ) in fixed-point form
1: [d] = [d_1]
2: for i = 2 to n do
3:   [d] = FPAdd([d], [d_i])
4: [r_x] = RandInt(γ + 1); [r_y] = RandInt(γ + 1)
5: ⟨[v_x], [p_x], 0, 0⟩ = FP2FL([r_x], γ, f = γ, ℓ, k)
6: ⟨[v_y], [p_y], 0, 0⟩ = FP2FL([r_y], γ, f = γ, ℓ, k)
7: ⟨[v_x/y], [p_x/y], 0, 0⟩ = FLDiv(⟨[v_x], [p_x], 0, 0⟩, ⟨[v_y], [p_y], 0, 0⟩)
8: ⟨[v_ln], [p_ln], [z_ln], [s_ln]⟩ = FLLog2(⟨[v_x/y], [p_x/y], 0, 0⟩)
9: ⟨[v_z], [p_z], [z_z], [s_z]⟩ = FLMul(λ / log₂ e, ⟨[v_ln], [p_ln], [z_ln], [s_ln]⟩)
10: [z] = FL2FP(⟨[v_z], [p_z], [z_z], [s_z]⟩, ℓ, k, γ)
11: [w] = FPAdd([d], [z])
12: return w = Rec([w])

Table 2: Protocol: Distributed LM

Privacy of EM. The EM algorithm implements the join of the individual n histograms, the utility function q as defined above, and the drawing of a random value according to ε^q_λ(d_1 + … + d_n), which is soundly encoded as explained above. Thus, EM(d_1, …, d_n, a_1, …, a_m, λ) computes ε^q_λ(d_1 + … + d_n), where q has sensitivity 1, and by Theorem 1 it follows that EM(d_1, …, d_n, a_1, …, a_m, λ) is ε-differentially private for λ = ε/2, where d_i = f(D_i).

4. INSTANTIATION

In this section, we instantiate the three mechanisms described in the previous section. A technical challenge we had to face was to identify, among all possible instantiations (based, e.g., on different type conversions), the most efficient one based on the currently available SMPC schemes (cf. § 2). The protocols we propose to compute the distributed Laplace, the distributed discrete Laplace, and the distributed exponential mechanism are given in Tables 2, 3, and 4 respectively, and are explained below.

Number Representation. For floating-point form, each real value u is represented as a quadruple (v, p, z, s), where v is an ℓ-bit significand, p is a k-bit exponent, z is a bit which is set to 1 when the value u = 0, s is a sign bit, and u = (1 − 2s) · (1 − z) · v · 2^p. Here, the most significant bit of v is always set to 1 and thus v ∈ [2^(ℓ−1), 2^ℓ). The k-bit signed exponent p is from the range Z⟨k⟩. We use γ to denote the bit-length of values in either integer or fixed-point representation, and f to denote the bit-length of the fractional part in fixed-point values. Every integer value x belongs to Z⟨γ⟩, while a fixed-point number x is represented as x̄ such that x̄ ∈ Z⟨γ⟩ and x = x̄ · 2^(−f). Finally, it is required that k > max(⌈log(ℓ + f)⌉, ⌈log(γ)⌉) and q > max(2^(2ℓ), 2^γ, 2^k). For ease of exposition, we assume that γ = 2ℓ for integers and fixed-point numbers, and that f = γ/2 for fixed-point numbers.

Input Distribution and Output Reconstruction. We assume that prior to the computation the users P_1, …, P_n create β shares of their respective integer or fixed-point inputs d_1, …, d_n in the (β, β)-sharing form and distribute them amongst the β computation parties C_1, …, C_β, so that each party C_k holds a share of each input value [d_i], for k ∈ {1, …, β} and i ∈ {1, …, n}.

Note that the input values are either integers or fixed-point numbers that are only subject to addition operations. Therefore, for security against β − 1 (instead of ⌊(β−1)/2⌋) compromised parties, we perform (β, β) sharing for input values. For DP noise generation, we still rely on the honest majority assumption and thus use the usual (β, ⌈(β+1)/2⌉) sharing. After the parties C_1, …, C_β jointly compute the shared result [w] of the noise mechanism, the parties collaborate to reconstruct the result w = Rec([w]).

General Overview. Intuitively, the instantiation for the most part unfolds the mathematical operations used in the algorithms presented in § 3 and replaces them by the corresponding SMPCs for arithmetic operations listed in § 2. Additions for both integers and fixed-point numbers are very fast, while for floating-point values the protocol is costly. We thus choose the n shared data inputs [d_1], …, [d_n] to the mechanisms to be fixed-point or integer numbers respectively, to lower the cost of adding them together to yield the joint unperturbed query result [d_1] + … + [d_n]. We compute the noise values in floating-point form, as the required logarithm and exponentiation operations are only available for distributed floating-point arithmetic. We use the conversion operations FP2FL, FL2Int, Int2FL whenever required.

Random Number Generation. As we have seen in the previous section, our algorithms rely heavily on the generation of a random number in the interval (0, 1] drawn according to the uniform distribution U(0,1]. Unfortunately, the SMPC suite we consider does not include such a function. Hence we devised an SMPC protocol that is based on the idea of encoding such a random number generation using the primitive RandInt for the generation of a random integer (e.g., cf. steps 4 and 5 in Table 2). We first generate a shared (γ + 1)-bit integer [r_x] using the SMPC primitive RandInt. We then consider this integer to be the fractional part of a fixed-point number whose integer part is 0 (by choosing f = γ). Afterwards, the fixed-point number is converted to floating point by using the function FP2FL and disregarding the shared sign bit. Notice that, strictly speaking, this generates a random number in [0, 1). We can achieve a transition to the expected interval (0, 1] by slightly modifying the conversion primitive FP2FL such that the shared [0] is replaced by the sharing of [1] in step 3 [9, § 5]. We could avoid the modification of FP2FL and instead transition into the desired interval by subtracting the random number from 1, but this requires an additional costly addition step.

Page 37

Protocol for Distributed Discrete Laplace Noise

• For β computation parties:

In: Shared integer (γ) inputs [d_1], …, [d_n]; λ = e^(−ε/∆f); α = 1/(ln λ · log₂ e)
Out: integer w = (Σ_{i=1}^{n} d_i) + DLap(λ)
1: [d] = [d_1]
2: for i = 2 to n do
3:   [d] = IntAdd([d], [d_i])
4: [r_x] = RandInt(γ + 1); [r_y] = RandInt(γ + 1)
5: ⟨[v_x], [p_x], 0, 0⟩ = FP2FL([r_x], γ, f = γ, ℓ, k)
6: ⟨[v_y], [p_y], 0, 0⟩ = FP2FL([r_y], γ, f = γ, ℓ, k)
7: ⟨[v_lnx], [p_lnx], [z_lnx], [s_lnx]⟩ = FLLog2(⟨[v_x], [p_x], 0, 0⟩)
8: ⟨[v_lny], [p_lny], [z_lny], [s_lny]⟩ = FLLog2(⟨[v_y], [p_y], 0, 0⟩)
9: ⟨[v_αlnx], [p_αlnx], [z_αlnx], [s_αlnx]⟩ = FLMul(α, ⟨[v_lnx], [p_lnx], [z_lnx], [s_lnx]⟩)
10: ⟨[v_αlny], [p_αlny], [z_αlny], [s_αlny]⟩ = FLMul(α, ⟨[v_lny], [p_lny], [z_lny], [s_lny]⟩)
11: ⟨[v_z1], [p_z1], [z_z1], [s_z1]⟩ = FLRound(⟨[v_αlnx], [p_αlnx], [z_αlnx], [s_αlnx]⟩, 0)
12: ⟨[v_z2], [p_z2], [z_z2], [s_z2]⟩ = FLRound(⟨[v_αlny], [p_αlny], [z_αlny], [s_αlny]⟩, 0)
13: [z_1] = FL2Int(⟨[v_z1], [p_z1], [z_z1], [s_z1]⟩, ℓ, k, γ)
14: [z_2] = FL2Int(⟨[v_z2], [p_z2], [z_z2], [s_z2]⟩, ℓ, k, γ)
15: [w] = IntAdd([d], IntAdd([z_1], −[z_2]))
16: return w = Rec([w])

Table 3: Protocol: Distributed DLM

Page 38

Protocol for Distributed Exponential Mechanism

• For β computation parties:

In: [d_1], …, [d_n]; the number m of candidates; λ = ε/2
Out: m-bit w, s.t. the smallest i for which w(i) = 1 denotes the winning candidate a_i
1: I_0 = ⟨0, 0, 1, 0⟩
2: for j = 1 to m do
3:   [z_j] = 0
4:   for i = 1 to n do
5:     [z_j] = IntAdd([z_j], [d_i(j)])
6:   ⟨[v_zj], [p_zj], [z_zj], [s_zj]⟩ = Int2FL([z_j], γ, ℓ)
7:   ⟨[v_z′j], [p_z′j], [z_z′j], [s_z′j]⟩ = FLMul(λ · log₂ e, ⟨[v_zj], [p_zj], [z_zj], [s_zj]⟩)
8:   ⟨[v_ωj], [p_ωj], [z_ωj], [s_ωj]⟩ = FLExp2(⟨[v_z′j], [p_z′j], [z_z′j], [s_z′j]⟩)
9:   ⟨[v_Ij], [p_Ij], [z_Ij], [s_Ij]⟩ = FLAdd(⟨[v_Ij−1], [p_Ij−1], [z_Ij−1], [s_Ij−1]⟩, ⟨[v_ωj], [p_ωj], [z_ωj], [s_ωj]⟩)
10: [r] = RandInt(γ + 1)
11: ⟨[v_r], [p_r], 0, 0⟩ = FP2FL([r], γ, f = γ, ℓ, k)
12: ⟨[v′_r], [p′_r], [z′_r], [s′_r]⟩ = FLMul(⟨[v_r], [p_r], 0, 0⟩, ⟨[v_Im], [p_Im], [z_Im], [s_Im]⟩)
13: j_min = 1; j_max = m
14: while j_min < j_max do
15:   j_M = ⌊(j_min + j_max)/2⌋
16:   if FLLT(⟨[v_IjM], [p_IjM], [z_IjM], [s_IjM]⟩, ⟨[v′_r], [p′_r], [z′_r], [s′_r]⟩) then
17:     j_min = j_M + 1 else j_max = j_M
18: return w_{j_min}

Table 4: Protocol: Distributed EM

Exponentiation and Logarithm. The work by Aliasgari et al. [9] provides SMPCs for computing exponentiation with base 2 (FLExp2) and logarithm to base 2 (FLLog2). Since we often require exponentiation and logarithm to a base b ≠ 2, we use the standard change-of-base identities (e.g., ln x = log₂ x / log₂ e), as reflected by the log₂ e factors in the protocols above.

Page 39

Attacker Model and Privacy Guarantees

• We consider two settings:

• honest-but-curious (HbC) computation parties:
• we assume that fewer than t < β/2 of the β parties collude

• malicious computation parties:
• we assume that fewer than t < β/2 of the β parties collude
• we modify our SMPC such that correctness of each computation step is proved by zero-knowledge proofs

Main results:
✦ The SMPC protocols for LM, DLM, and EM are differentially private in the honest-but-curious setting.
✦ The SMPC protocols for LM, DLM, and EM are differentially private in the malicious setting under the strong RSA and decisional Diffie-Hellman assumptions.

Page 40

Performance of SMPC Operations (in sec)

Libraries: GMP, Relic, Boost, and OpenSSL

Setup: 3.20 GHz (Intel i5) Linux machine with 16 GB RAM, using a 1 Gbps LAN

Type      Protocol   HbC                  Malicious
                     β=3,t=1   β=5,t=2    β=3,t=1   β=5,t=2
Float     FLAdd      0.48      0.76       14.6      29.2
          FLMul      0.22      0.28       3.35      7.54
          FLScMul    0.20      0.28       3.35      7.50
          FLDiv      0.54      0.64       4.58      10.2
          FLLT       0.16      0.23       2.82      6.22
          FLRound    0.64      0.85       11.4      23.4
Convert   FP2FL      0.83      1.21       25.7      50.9
          Int2FL     0.85      1.22       25.7      50.9
          FL2Int     1.35      1.91       26.3      54.3
          FL2FP      1.40      1.96       26.8      55.3
Log       FLLog2     12.0      17.0       274       566
Exp       FLExp2     7.12      9.66       120       265

Table 5: Performance of SMPC operations measured in sec

The distributed LM protocol has an average computation cost of 15.5 sec, while the distributed DLM protocol requires around 31.3 sec. The better efficiency of the LM mechanism is due to the fact that we halved the number of costly logarithm operations FLLog2 and necessary follow-up operations by using the property ln r_x − ln r_y = ln(r_x/r_y), which is not possible for its discrete counterpart due to the necessary floor operations FLRound. The computation cost of the distributed EM protocol linearly depends on the number m = |R| of result candidates. For instance, for m = 5, the cost of computation is 42.3 sec.

For larger numbers of computation parties β, one can extrapolate the performance from our analysis for (β = 3, t = 1) and (β = 5, t = 2). Even for β ≈ 100, we expect the distributed LM and DLM protocols to take about a few hundred seconds in the HbC setting. We also compared our experimental results with [8]. We could not reproduce their results, possibly due to the introduced memory management and correctness verifications.

Cost Analysis (Malicious Setting). As expected, the computation times for the SMPC operations secure against an active adversary are significantly higher (around 15-20 times) than those of the operations secure against an HbC adversary. The average performance costs for our distributed LM, DLM, and EM protocols for (β = 3, t = 1) computation parties and 100,000 users in the malicious setting are as follows: the distributed LM protocol has an average computation cost of 344 sec, while the distributed DLM protocol requires 477 sec. The cost of the distributed EM protocol, for m = 5 result candidates, is 652 sec.

We stress that these operations are performed by computation parties, and that there are no critical timing restrictions on DDP computations in most real-life scenarios, such as web analytics. Nevertheless, we expect 1 order of magnitude performance gain in the HbC as well as the malicious setting by employing high-performance computing servers. Furthermore, since users have to simply forward their shared values to the computation parties, which is an inexpensive operation (< 1 msec in the HbC setting and a couple of milliseconds in the malicious setting), we believe that these numbers demonstrate the practicality of PrivaDA even in a setting where clients are equipped with computationally limited devices, such as smartphones.

7. APPLICATION SCENARIOS

We showcase the flexibility of our architecture by briefly discussing how PrivaDA can be used to improve the state of the art in two different application scenarios. An additional scenario (anonymous surveys) is given in the long version [2].

Web Analytics. Web analytics consist in the measurement, collection, analysis, and reporting of Internet data about users visiting a website. For instance, data can include user demographics, browsing behavior, and information about the clients' systems. This information is important for publishers, because it enables them to optimize their site content according to the users' interests, for advertisers, because it allows them to target a selected population, and many other parties, which we will refer to as analysts.

State-of-the-Art. In order to obtain aggregated user information, websites commonly use third-party web analytics services, called aggregators, which however track individual users' browsing behavior across the web, thereby violating their privacy. Newer systems, e.g., a series of non-tracking web analytics systems [6,15] proposed by Chen et al., provide users with DP guarantees but rely on strong non-collusion assumptions. Should a collusion happen, not only the noise but also the individual user's data would be disclosed.

Protocol design in PrivaDA. The computation parties are operated by third parties, which are possibly paid by the aggregator. In order to avoid multiple responses by each user without relying on a public key infrastructure, which is unrealistic in this setting, we add an initial step to the protocol. The publisher signs and gives each visiting user a different token, along with one or more queries and an associated expiry time (within which the result has to be computed). The user sends the tokens to the computation parties, together with their answer shares, so that the computation parties are able to detect duplicates and to discard them before the aggregation. The users have just to submit their shares and can then go offline. Finally, the support for a variety of perturbation mechanisms enables the execution of different kinds of analytical queries.

Traffic Statistics for Anonymous Communication Networks (ACNs). Given their anonymous nature, it is hard to collect egress traffic statistics from ACNs, such as Tor, without violating the privacy of users. Such statistics are interesting to both designers and researchers, who might for instance want to know how much of the network traffic is made up by people trying to circumvent censorship.

State-of-the-Art. Elahi et al. recently proposed PrivEx [24], a system for collecting differentially private statistics on ACN traffic in predefined slots of time (epochs). Their work provides two ad-hoc protocols that rely on secret sharing and distributed decryption respectively. Nevertheless, to tolerate even an HbC adversary, PrivEx has to compromise on the utility or the epoch duration.

Protocol design in PrivaDA. We can easily apply PrivaDA to the problem of collecting anonymous traffic statistics: we simply let the ACN egress nodes, which relay traffic between the ACN and the destination websites, count the accesses to the different destinations that they relayed. After a fixed epoch, they then share their individual counts among mutually distrustful computation parties (e.g., privacy organizations, research centers, and service providers), which jointly compute the overall egress traffic in a privacy-preserving manner with optimal utility.

8. CONCLUSION AND FUTURE WORK

Although it is a long-held belief that SMPCs may be used to generically design differentially private data aggregation

Page 41

Performance of LM, DLM and EM

• For β = 3 and t = 1 and number of users n = 100,000

• The HbC setting
• Distributed LM protocol: 15.5 sec
• Distributed DLM protocol: 31.3 sec
• Distributed EM protocol: 42.3 sec (for number of candidates m = 5)

• The malicious setting
• Distributed LM protocol: 344 sec

Page 42

Caveats with number representations

• Careful with finite representation of real numbers! (illustrated below)
• E.g., porosity of the floating-point representation breaks Laplace
• In the above papers, solutions based on suitable rounding and truncation mechanisms
• Can be easily integrated in our framework
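A quick illustration of the porosity in Python (illustrative only, not the attack from the cited papers):

```python
import numpy as np

# Representable doubles near a value form a grid with spacing ulp(value):
# noised answers inherit this porous grid, and the grids reachable from two
# adjacent true answers need not coincide, which is what breaks a naive
# floating-point implementation of the Laplace mechanism.
print(np.spacing(1.0))      # ~2.2e-16: grid spacing near 1
print(np.spacing(1000.0))   # ~1.1e-13: much coarser grid near 1000
```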

Page 43

Implementation and Performance

Type            Protocol   HbC                  Malicious
                           β=3,t=1   β=5,t=2    β=3,t=1   β=5,t=2
Float           FLAdd      0.55      0.99       24        43
                FLMul      0.27      0.5        10        18.1
                FLScMul    0.24      0.47       9.9       18.1
                FLDiv      0.56      0.9        13        24.7
                FLLT       0.18      0.31       7         12.9
                FLRound    0.69      1.04       22.7      40.6
Conversion      FP2FL      0.88      1.40       42        74
                Int2FL     0.88      1.32       42        74
                FL2Int     1.49      2.19       53        95
                FL2FP      1.50      2.21       54        98
Logarithm       FLLog2     13.7      19.5       563       1001
Exponentiation  FLExp2     8.9       12.1       336       605

Table 5: Performance measured in seconds

The computation cost of the distributed EM protocol linearly depends on the number m = |R| of result candidates. For instance, for m = 5, the cost of computation is 55.8 sec.

Given the efficiency and non-interactiveness of the IntAdd and FPAdd protocols, we can easily cater to a larger number of users n: for n = 100,000, the cost increases by 0.1 sec for the distributed LM and DLM protocols, and by 0.1·m sec for the distributed EM protocol. For larger numbers of computation parties β, one can extrapolate the performance from our analysis for (β = 3, t = 1) and (β = 5, t = 2). Even for β ≈ 100, we expect the distributed LM and DLM protocols to take about a few hundred seconds in the HbC setting. We also compared our experimental results with [18]. We could not reproduce their results, possibly due to the introduced memory management and correctness checks.

Cost Analysis (Malicious Setting). As expected, the computation times for the SMPC operations secure against an active adversary are significantly higher (around 30-50 times) than those of the operations secure against an HbC adversary. The average performance costs for our distributed LM, DLM, and EM protocols for (β = 3, t = 1) computation parties and 1000 users in the malicious setting are as follows: the distributed LM protocol has an average computation cost of 709 sec, while the distributed DLM protocol requires 1396 sec. The cost of the distributed EM protocol, for m = 5 result candidates, is 2155 sec.

We are working on a multi-threaded implementation, and based on the experiments from [18], we expect 1 order of magnitude performance gain in the HbC as well as the malicious setting; e.g., the cost of the distributed EM protocol with m = 5 in the malicious setting should drop from 2155 sec to around 200 sec. Nevertheless, we stress that these operations are performed by computation parties, and that there are no critical timing restrictions on DDP computations in most real-life scenarios, such as web analytics. Furthermore, since users have to simply forward their shared values to the computation parties, which is an inexpensive operation (< 1 msec in the HbC setting and a couple of milliseconds in the malicious setting), we believe that these numbers demonstrate the practicality of PrivaDA.

✦ Operations performed by computation parties

✦ No critical timing restrictions on DDP computations in most real-life scenarios

✦ Users simply forward their shared values to the computation parties (< 1 sec)

Demonstrates practicality of PrivaDA (even on computationally limited devices, such as smartphones)