Foundations of Privacy Lecture 7
Transcript of Foundations of Privacy Lecture 7
Lecturer: Moni Naor
(ε, δ)-Differential Privacy
[Figure: output distributions of M on adjacent databases; outside a set Z of bad responses, the ratio of Pr[response] under the two databases is bounded]
This course: δ negligible
Sanitizer M gives (ε, δ)-differential privacy if: for all adjacent D1 and D2, and all A ⊆ range(M):
Pr[M(D1) ∈ A] ≤ e^ε · Pr[M(D2) ∈ A] + δ
Typical setting: ε a small constant, δ negligible
Example: NO Differential Privacy
U = set of (name, tag ∈ {0,1}) tuples
One counting query: # of participants with tag = 1
Sanitizer A: choose and release a few random tags
Bad event T: only my tag is 1, and my tag is released
Pr_A[A(D+I) ∈ T] ≥ 1/n
Pr_A[A(D−I) ∈ T] = 0
The ratio between these probabilities is unbounded, so:
• A is not ε-differentially private for any ε!
• It is (0, 1/n)-differentially private
Counting Queries
Counting queries: Q is a set of predicates q: U → {0,1}
Query: how many participants x satisfy q?
Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: some error is anyway inherent in statistical analysis
Database x of size n: n individuals, each contributing a single point in U
Sometimes we talk about the fraction of participants satisfying q rather than the count
Bounds on Achievable Privacy
Bounds on the:
• Accuracy α – the responses from the mechanism to all queries are assured to be within α except with small failure probability
• Number of queries t for which we can receive accurate answers
• The privacy parameter ε for which ε-differential privacy is achievable
– or (ε, δ)-differential privacy is achievable
Composition: t-Fold
Suppose we are going to apply a DP mechanism t times
– perhaps on different databases
Want: the combined outcome is differentially private
• A value b ∈ {0,1} is chosen
• In each of the t rounds:
– adversary A picks two adjacent databases D0_i and D1_i and an ε-DP mechanism Mi
– A receives the result zi of the ε-DP mechanism Mi on Db_i
• Want to argue: A's view is within ε' for both values of b
• A's view: (z1, z2, …, zt) plus the randomness used
[Diagram: in round i the adversary A picks adjacent databases D0_i, D1_i and an ε-DP mechanism Mi, and receives zi = Mi(Db_i); the adversary's view consists of z1, z2, …, zt]
A's view: randomness + (z1, z2, …, zt). Distribution of the view for bit b: V_b
Differential Privacy: Composition
Last week:
• If all mechanisms Mi are ε-DP, then for any view the probabilities that A gets the view when b=0 and when b=1 are within a factor e^(tε)
• t releases, each ε-DP, are t·ε-DP
• Today:
– t releases, each ε-DP, are (√t·ε + t·ε², δ)-DP (roughly)
Therefore results for a single query translate to results on several queries
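The gain from today's bound over basic composition can be made concrete with a small numeric sketch. The function names and parameter values below are illustrative; the advanced bound used is the standard advanced composition theorem, ε' = √(2t·ln(1/δ'))·ε + t·ε·(e^ε − 1), which is roughly √t·ε + t·ε² for small ε:

```python
import math

def basic_composition_eps(eps, t):
    # Basic composition: t releases, each eps-DP, are (t * eps)-DP.
    return t * eps

def advanced_composition_eps(eps, t, delta_prime):
    # Advanced composition: t releases, each eps-DP, are
    # (eps', t*delta + delta_prime)-DP with
    # eps' = sqrt(2 t ln(1/delta')) * eps + t * eps * (e^eps - 1),
    # roughly sqrt(t)*eps + t*eps^2 for small eps.
    return (math.sqrt(2 * t * math.log(1 / delta_prime)) * eps
            + t * eps * math.expm1(eps))

eps, t, delta_prime = 0.1, 100, 1e-6
print(basic_composition_eps(eps, t))                  # 10.0
print(advanced_composition_eps(eps, t, delta_prime))  # ~6.3, much smaller
```

For many rounds the √t term dominates, which is exactly the random-walk behavior discussed next.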
Privacy Loss as a Random Walk
[Figure: privacy loss as a random walk with ±ε steps over the potentially dangerous rounds; over t steps the loss typically grows as √t·ε]
The Exponential Mechanism [McSherry Talwar]
A general mechanism that:
• Yields differential privacy
• May yield utility/approximation
• Is defined and evaluated by considering all possible answers
The definition does not yield an efficient way of evaluating it
Application/original motivation: approximate truthfulness of auctions
• Collusion resistance
• Compatibility
Side bar: Digital Goods Auction
• Some product with 0 cost of production
• n individuals with valuations v1, v2, …, vn
• Auctioneer wants to maximize profit
Key to truthfulness: what you say should not affect what you pay
• What about approximate truthfulness?
Example of the Exponential Mechanism
• Data: xi = website visited by student i today
• Range: Y = {website names}
• For each name y, let q(y, X) = #{i : xi = y}
Goal: output the most frequently visited site
• Procedure: given X, output website y with probability proportional to e^(ε·q(y,X))
• Popular sites are exponentially more likely to be output than rare ones
• Website scores don't change too quickly when one student's data changes
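As a sketch of this example in code (the dataset and site names are made up; sampling is with probability ∝ e^(ε·q(y,X)) exactly as above):

```python
import math, random
from collections import Counter

def exp_mech_most_visited(visits, eps):
    # Exponential mechanism: output site y with probability
    # proportional to exp(eps * q(y, X)), where q(y, X) = #{i : x_i = y}.
    counts = Counter(visits)
    sites = list(counts)
    top = max(counts.values())
    # Shift scores by the max before exponentiating (numerical stability;
    # the common factor cancels in the normalization).
    weights = [math.exp(eps * (counts[y] - top)) for y in sites]
    total = sum(weights)
    probs = {y: w / total for y, w in zip(sites, weights)}
    return random.choices(sites, weights=weights)[0], probs

visits = ["news.example"] * 50 + ["mail.example"] * 45 + ["blog.example"]
site, probs = exp_mech_most_visited(visits, eps=1.0)
# news.example (count 50) is e^(1.0 * 5) ≈ 148 times likelier than mail.example (count 45)
```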
Setting
• For input D ∈ U^n want to find r ∈ R
• Base measure μ on R – usually uniform
• Score function w: U^n × R → ℝ (the reals) assigns any pair (D, r) a real value
– want to maximize it (approximately)
The exponential mechanism:
– assign output r ∈ R with probability proportional to e^(ε·w(D,r)) · μ(r)
Normalizing factor: ∑_r e^(ε·w(D,r)) · μ(r)
The exponential mechanism is private
• Let Δ = max over adjacent D, D' and all r of |w(D,r) − w(D',r)| (the sensitivity of w)
Claim: the exponential mechanism yields a 2·ε·Δ differentially private solution
For adjacent databases D and D' and for all possible outputs r ∈ R:
• Prob[output = r when input is D] = e^(ε·w(D,r)) μ(r) / ∑_r e^(ε·w(D,r)) μ(r)
• Prob[output = r when input is D'] = e^(ε·w(D',r)) μ(r) / ∑_r e^(ε·w(D',r)) μ(r)
Since D and D' are adjacent, the numerator changes by at most a factor e^(εΔ) and the denominator by at most a factor e^(εΔ), so the ratio is bounded by e^(εΔ) · e^(εΔ) = e^(2εΔ)
Laplace Noise as Exponential Mechanism
• On query q: U^n → ℝ let w(D,r) = −|q(D) − r|
• Prob[noise = y] = e^(−ε|y|) / ∫ e^(−ε|y|) dy = (ε/2)·e^(−ε|y|)
The Laplace distribution Y = Lap(b) has density function Pr[Y = y] = (1/2b)·e^(−|y|/b)
[Figure: density of the Laplace distribution, peaked at 0]
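A minimal sketch of the resulting Laplace mechanism (function names are illustrative; the noise is sampled as the difference of two exponential variates, which is distributed as Lap(b)):

```python
import math, random

def laplace_density(y, b):
    # Density of Lap(b): (1 / (2b)) * exp(-|y| / b)
    return math.exp(-abs(y) / b) / (2 * b)

def laplace_mechanism(true_answer, sensitivity, eps):
    # Release true_answer + Lap(sensitivity / eps); this is eps-DP for a
    # query whose answer changes by at most `sensitivity` between
    # adjacent databases.
    b = sensitivity / eps
    noise = random.expovariate(1 / b) - random.expovariate(1 / b)
    return true_answer + noise

# Private count: true answer 100, sensitivity 1, eps = 0.5
noisy = laplace_mechanism(100, 1, 0.5)
```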
Any Differentially Private Mechanism is an instance of the Exponential Mechanism
• Let M be a differentially private mechanism
Take w(D,r) to be log (Prob[M(D) =r])
Remaining issue: Accuracy
Private Ranking
• Each element i ∈ {1, …, n} has a real-valued score SD(i) based on a data set D
• Goal: output k elements with the highest scores
• Privacy: the data set D consists of n entries in domain D
– differential privacy protects the privacy of the entries in D
• Condition: insensitive scores
– for any element i, for any data sets D and D' that differ in one entry: |SD(i) − SD'(i)| ≤ 1
Approximate ranking
• Let Sk be the kth highest score on data set D
• An output list is α-useful if:
– Soundness: no element in the output has score ≤ Sk − α
– Completeness: every element with score ≥ Sk + α is in the output
Elements with score ≤ Sk − α must be excluded; elements with Sk + α ≤ score must be included; elements with Sk − α ≤ score ≤ Sk + α may go either way
Two Approaches
• Score perturbation
– Perturb the scores of the elements with noise
– Pick the top k elements in terms of noisy scores
– Fast and simple implementation
– Questions: what sort of noise should be added? What sort of guarantees?
• Exponential sampling
– Run the exponential mechanism k times
– More complicated and slower implementation
– What sort of guarantees?
Note: each input affects all scores
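A sketch of the exponential-sampling approach under the insensitive-scores condition above (each round samples with probability ∝ e^((ε/k)·score); the naive per-round budget split ε/k and the example scores are assumptions for illustration, not a tuned analysis):

```python
import math, random

def private_top_k(scores, k, eps):
    # Run the exponential mechanism k times without replacement:
    # each round samples one not-yet-chosen element with probability
    # proportional to exp((eps / k) * score), splitting the budget
    # naively over the k rounds.
    eps_round = eps / k
    remaining = dict(scores)
    chosen = []
    for _ in range(k):
        items = list(remaining)
        top = max(remaining[i] for i in items)   # stabilize the exponentials
        weights = [math.exp(eps_round * (remaining[i] - top)) for i in items]
        pick = random.choices(items, weights=weights)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen

scores = {"a": 90, "b": 85, "c": 40, "d": 10}   # made-up scores
out = private_top_k(scores, k=2, eps=4.0)
```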
Exponential Mechanism: Simple Example – an (almost free) private lunch
Database of n individuals, lunch options {1, …, k}; each individual likes or dislikes each option (1 or 0)
Goal: output a lunch option that many like
For each lunch option j ∈ [k], let ℓ(j) be the # of individuals who like j
Exponential Mechanism: output j with probability proportional to e^(εℓ(j))
Actual probability: e^(εℓ(j)) / ∑_i e^(εℓ(i)), where the denominator is the normalizer
The Net Mechanism
• Idea: limit the number of possible outputs
– want |R| to be small
• Why is it good?
– The good (accurate) output has to compete with only a few possible outputs
– If there is a guarantee that there is at least one good output, then the total weight of the bad outputs is limited
Nets
A collection N of databases is called an α-net of databases for a class of queries C if:
• for all possible databases x there exists a y ∈ N such that max_{q∈C} |q(x) − q(y)| ≤ α
If we use the closest member of N instead of the real database, we lose at most α in terms of the worst query
The Net Mechanism
For a class of queries C, privacy ε and accuracy α, on data base x:
• Let N be an α-net for the class of queries C
• Let w(x,y) = −max_{q∈C} |q(x) − q(y)|
• Sample and output according to the exponential mechanism with x, w, and R = N
– for y ∈ N: Prob[y] proportional to e^(ε·w(x,y))
Privacy and Utility
Claims:
• Privacy: the net mechanism is 2·ε differentially private
• Utility: the net mechanism is (2α, β) accurate for any α and β such that α ≥ (2/ε)·log(|N|/β)
Proof:
– there is at least one good solution: it gets weight at least e^(−εα)
– there are at most |N| (bad) outputs: each gets weight at most e^(−2εα)
– use the union bound
⇒ with probability at least 1 − β the error is less than 2α
Synthetic DB: Output is a DB
[Diagram: a sanitizer receives a database and queries (query 1, query 2, …) and produces answers (answer 1, answer 2, answer 3) via a synthetic database]
Synthetic DB: output is always a DB
• Of entries from the same universe U
• User reconstructs answers to queries by evaluating the query on the output DB
• Software- and people-compatible
• Consistent answers
Counting Queries
• Queries with low sensitivity
Counting queries: C is a set of predicates c: U → {0,1}
Query: how many participants of D satisfy c?
Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: error is anyway inherent in statistical analysis
Assume all queries are given in advance (non-interactive setting)
Database D of size n
α-Net For Counting Queries
If we want to answer many counting queries C with differential privacy:
– sufficient to come up with an α-net for C
– resulting accuracy: max{α, log(|N|/β)/ε}
Claim: consider the set N consisting of all databases of size m, where m = log|C|/α²,
and consider each element in such a database to have weight n/m.
Then N is an α-net for any collection C of counting queries.
• Error is Õ(n^(2/3)·log|C|)
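Why size m ≈ log|C|/α² suffices can be sketched with a Chernoff bound plus a union bound (constants suppressed; queries measured as fractions, and y taken to be m i.i.d. samples from x):

```latex
% Single counting query q, y = m i.i.d. samples from x:
\Pr\bigl[\,|q(y) - q(x)| > \alpha\,\bigr] \;\le\; 2e^{-2m\alpha^{2}}
% Union bound over all |C| queries, with m = \log|C|/\alpha^{2}:
\Pr\bigl[\,\exists\, q \in C:\ |q(y) - q(x)| > \alpha\,\bigr]
  \;\le\; 2|C|\, e^{-2\log|C|} \;=\; 2/|C| \;<\; 1
```

So for every x some size-m database approximates it within α on every query in C, i.e. the set of all size-m databases is an α-net.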
Remarkable
Hope for rich private analysis of small DBs!
• Quantitative: #queries >> DB size
• Qualitative: the output of the sanitizer – a synthetic DB – is a DB itself
The BLR Algorithm
For DBs F and D: dist(F, D) = max_{q∈C} |q(F) − q(D)|
Intuition: far-away DBs get smaller probability
Algorithm on input DB D:
Sample from a distribution on DBs of size m (m < n):
DB F gets picked w.p. ∝ e^(−ε·dist(F,D))
[Blum, Ligett, Roth 2008]
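A toy enumeration of the BLR sampler on a two-element universe (fractional counting queries; the parameter values are illustrative only, not a calibrated privacy guarantee):

```python
import math, random
from itertools import combinations_with_replacement

def blr_sample(D, universe, queries, m, eps):
    # BLR: pick a size-m synthetic DB F with probability proportional to
    # exp(-eps * dist(F, D)), where dist(F, D) = max_q |q(F) - q(D)| and
    # q(db) is the fraction of db's rows satisfying predicate q.
    frac = lambda db, q: sum(q(u) for u in db) / len(db)
    dist = lambda F: max(abs(frac(F, q) - frac(D, q)) for q in queries)
    # Enumerate every size-m database over the universe (as multisets) --
    # this is exactly the super-polynomial enumeration discussed below.
    candidates = list(combinations_with_replacement(universe, m))
    weights = [math.exp(-eps * dist(F)) for F in candidates]
    return random.choices(candidates, weights=weights)[0]

D = [1, 1, 1, 1, 1, 0, 0, 0]      # 5/8 of the rows are 1
queries = [lambda u: u]           # one counting query: fraction of 1s
F = blr_sample(D, universe=[0, 1], queries=queries, m=4, eps=8.0)
```

Synthetic DBs whose fraction of 1s is close to 5/8 get exponentially more weight.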
The BLR Algorithm
Idea:
• In general: do not use a large DB
– sample and answer accordingly
• A DB of size m guarantees hitting each query with sufficient accuracy
The BLR Algorithm: Error Õ(n2/3 log|C|)
Goodness Lemma: there exists Fgood of size m = Õ((n/α)²·log|C|) s.t. dist(Fgood, D) ≤ α
Proof: construct a member Fgood by taking m random samples from U
Algorithm on input DB D:
Sample from a distribution on DBs of size m (m < n):
DB F gets picked w.p. ∝ e^(−ε·dist(F,D))
The BLR Algorithm: Error Õ(n2/3 log|C|)
Goodness Lemma: there exists Fgood of size m = Õ((n/α)²·log|C|) s.t. dist(Fgood, D) ≤ α
Pr[Fgood] ~ e^(−εα)
For any Fbad with dist(Fbad, D) ≥ 2α: Pr[Fbad] ~ e^(−2εα)
Union bound: ∑ over bad DBs Fbad of Pr[Fbad] ~ |U|^m · e^(−2εα)
For α = Õ(n^(2/3)·log|C|): Pr[Fgood] >> ∑ Pr[Fbad]
Algorithm on input DB D:
Sample from a distribution on DBs of size m (m < n):
DB F gets picked w.p. ∝ e^(−ε·dist(F,D))
The BLR Algorithm: 2ε-Privacy
For adjacent D, D' and for every F: |dist(F, D) − dist(F, D')| ≤ 1
Probability of F on input D: e^(−ε·dist(F,D)) / ∑_{G of size m} e^(−ε·dist(G,D))
Probability of F on input D': both numerator and denominator can change by a factor of at most e^ε ⇒ 2ε-privacy
Algorithm on input DB D:
Sample from a distribution on DBs of size m (m < n):
DB F gets picked w.p. ∝ e^(−ε·dist(F,D))
The BLR Algorithm: Running Time
Generating the distribution by enumeration:
Need to enumerate every size-m database, where m = Õ((n/α)²·log|C|)
Running time ≈ |U|^Õ((n/α)²·log|C|)
Algorithm on input DB D:
Sample from a distribution on DBs of size m (m < n):
DB F gets picked w.p. ∝ e^(−ε·dist(F,D))
Conclusion
Offline algorithm, 2ε-differential privacy for any set C of counting queries
• Error α is Õ(n^(2/3)·log|C|/ε)
• Super-poly running time: |U|^Õ((n/α)²·log|C|)
Maintaining State
[Diagram: queries q arrive one by one; the mechanism maintains state = a distribution D over the universe]
The Multiplicative Weights Algorithm
• Powerful tool in algorithm design
• Learn a probability distribution iteratively
• In each round:
– either the current distribution is good
– or we get a lot of information on the distribution (compared to the true value)
– update the distribution
The PMW Algorithm
Maintain a distribution D on the universe U. This is the state, and it is completely public!
Initialize D to be uniform on U
Repeat up to k times:
• Set T̂ ← T + Lap(σ)  (σ a noise scale set by the privacy analysis)
• Repeat while no update occurs:
– Receive query q ∈ Q
– Let Â = q(x) + Lap(σ)
– Test: if |q(D) − Â| ≤ T̂, output q(D)
– Else (update):
• Output Â
• Update D[i] ∝ D[i]·e^(±(T̂/4)·q[i]) and re-normalize
(the plus or minus is chosen according to the sign of the error)
The algorithm fails if more than k updates occur
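The loop above can be sketched in code as follows (the noise scale `sigma`, threshold `T`, and update cap `k` are left as free parameters; calibrating them to (ε, δ) is the part the privacy analysis below summarizes):

```python
import math, random

def lap(b):
    # Lap(b) noise as the difference of two exponential variates
    return random.expovariate(1 / b) - random.expovariate(1 / b)

def pmw(x_hist, n, queries, T, sigma, k):
    # x_hist: true histogram over the universe; queries: 0/1 vectors.
    # Maintain a public distribution D over the universe (the state).
    U = len(x_hist)
    x = [c / n for c in x_hist]          # true fractional histogram
    D = [1.0 / U] * U                    # initialize D to uniform
    answers, updates = [], 0
    T_hat = T + lap(sigma)               # noisy threshold
    for q in queries:
        a_hat = sum(qi * xi for qi, xi in zip(q, x)) + lap(sigma)  # noisy true answer
        est = sum(qi * di for qi, di in zip(q, D))                 # answer from state
        if abs(est - a_hat) <= T_hat:
            answers.append(est)          # test passed: answer from public D
            continue
        answers.append(a_hat)            # update round: output the noisy answer
        updates += 1
        if updates > k:
            raise RuntimeError("PMW fails: more than k updates")
        sign = 1.0 if a_hat > est else -1.0   # sign of the error
        D = [di * math.exp(sign * (T_hat / 4) * qi) for di, qi in zip(D, q)]
        s = sum(D)
        D = [di / s for di in D]         # re-normalize
        T_hat = T + lap(sigma)           # fresh noisy threshold after an update
    return answers
```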
Overview: Privacy Analysis
For the query family Q = {0,1}^U, for parameters (ε, δ, β) and t, the PMW mechanism is:
• (ε, δ)-differentially private
• (α, β)-accurate for up to t queries, where α = Õ(1/(εn)^(1/2))
• State = the distribution is privacy preserving for individuals (but not for queries)
The dependency of the accuracy on |U|, δ, β and t is logarithmic