Foundations of Privacy Lecture 5
Transcript of Foundations of Privacy, Lecture 5
Lecturer: Moni Naor
Desirable Properties of a Sanitization Mechanism
• Composability: applying the sanitization several times yields a graceful degradation.
– Will see: t releases, each ε-DP, are tε-DP.
– Next class: roughly (ε√t + tε², δ)-DP.
• Robustness to side information: no need to specify exactly what the adversary knows.
– May assume the adversary knows everything except one row.
Differential Privacy: satisfies both…
Differential Privacy
Protect individual participants. [Dwork, McSherry, Nissim & Smith 2006]
(Figure: a curator/sanitizer M is applied to two adjacent databases D1 and D2.)
Adjacency: D+I and D−I (the database with and without individual I).
Differential Privacy
Protect individual participants: the probability of every bad event (indeed, of any event) increases only by a small multiplicative factor when I enter the DB. May as well participate in the DB…
ε-differentially private sanitizer M: for all DBs D, all individuals I, and all events T,
e^(−ε) ≤ Pr[M(D+I) ∈ T] / Pr[M(D−I) ∈ T] ≤ e^ε ≈ 1+ε
Handles auxiliary input.
Differential Privacy
(Figure: the two output distributions Pr[response]; on bad responses the ratio between them stays bounded.)
Sanitizer M gives ε-differential privacy if for all adjacent D1 and D2 (differing in one user) and all A ⊆ range(M):
Pr[M(D1) ∈ A] ≤ e^ε · Pr[M(D2) ∈ A]
Participation in the data set poses no additional risk.
Example of Differential Privacy
X is a set of (name, tag ∈ {0,1}) tuples. One query: # of participants with tag = 1.
Sanitizer: output the # of 1's + noise
• noise from the Laplace distribution with parameter 1/ε
• Pr[noise = k−1] ≈ e^ε · Pr[noise = k]
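As an illustration (a minimal sketch, not from the lecture; all names and parameters are mine), the counting-query sanitizer can be written in a few lines of Python. Lap(b) is sampled as the difference of two exponentials with mean b:

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Lap(scale): the difference of two exponentials with mean `scale`."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def sanitize_count(db, eps: float) -> float:
    """Release the number of tag=1 participants plus Lap(1/eps) noise."""
    return sum(tag for _, tag in db) + laplace_noise(1 / eps)

random.seed(0)
eps = 0.5
db = [("alice", 0), ("bob", 1), ("carol", 1)]  # true count = 2

one_release = sanitize_count(db, eps)          # a single private release
avg = sum(sanitize_count(db, eps) for _ in range(20000)) / 20000
print(one_release, avg)                        # avg concentrates near the true count 2
```

Since the noise has mean 0, averaging many independent releases recovers the true count; a single release is accurate to about 1/ε.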
(ε, δ)-Differential Privacy
(Figure: the two output distributions Pr[response]; the ratio is bounded except on bad responses of total probability δ. This course: δ negligible.)
Sanitizer M gives (ε, δ)-differential privacy if for all adjacent D1 and D2 and all A ⊆ range(M):
Pr[M(D1) ∈ A] ≤ e^ε · Pr[M(D2) ∈ A] + δ
Typical setting: ε small and δ negligible.
Example: NO Differential Privacy
U is a set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1.
Sanitizer A: choose and release a few random tags.
Bad event T: only my tag is 1, and my tag is released.
Pr[A(D+Me) ∈ T] ≥ 1/n, while Pr[A(D−Me) ∈ T] = 0, so no factor e^ε can bound the ratio Pr[A(D+Me) ∈ T] / Pr[A(D−Me) ∈ T].
• Not ε-differentially private for any ε!
• It is (0, 1/n)-differentially private.
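A quick simulation makes the failure concrete (a sketch; the instance and names are mine). The sanitizer that releases one random tag puts probability about 1/n on the bad event when I am in the database and probability exactly 0 when I am not, so no multiplicative bound can hold:

```python
import random

def release_random_tag(db):
    """The non-private sanitizer A: release one uniformly random tag."""
    return random.choice([tag for _, tag in db])

random.seed(0)
n = 10
db_with_me = [("user%d" % i, 0) for i in range(n - 1)] + [("me", 1)]
db_without_me = [("user%d" % i, 0) for i in range(n - 1)]

trials = 100_000
# Bad event T: a 1-tag is released (only my tag is 1).
hits_with = sum(release_random_tag(db_with_me) == 1 for _ in range(trials))
hits_without = sum(release_random_tag(db_without_me) == 1 for _ in range(trials))
print(hits_with / trials, hits_without / trials)  # about 1/n, versus exactly 0
```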
Counting Queries
Q is a set of predicates q: U → {0,1}. Query: how many of the participants satisfy q?
Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: some error is anyway inherent in statistical analysis.
The database x is of size n: n individuals, each contributing a single point in U.
Sometimes we talk about the fraction of participants instead of the count.
Bounds on Achievable Privacy
Want to get bounds on the:
• Accuracy: the responses from the mechanism to all queries are within α, except with small probability.
• Number of queries t for which we can receive accurate answers.
• Privacy parameter ε for which ε-differential privacy is achievable
– or (ε, δ)-differential privacy is achievable.
Blatant Non-Privacy
Mechanism M is blatantly non-private if there is an adversary A that, on any database D of size n, can select queries and use the responses M(D) to reconstruct D' such that ||D − D'||_1 ∈ o(n), i.e., D' agrees with D in all but o(n) of the entries.
Claim: blatant non-privacy implies that M is not (ε, δ)-DP for any constant ε (with δ = o(1)).
Sanitization Can't Be Too Accurate
Usual counting queries:
– Query: q ⊆ [n]
– Response = Σ_{i∈q} d_i + noise
Blatant non-privacy: the adversary guesses 99% of the bits.
Theorem: if all responses are within o(n) of the true answer, then the algorithm is blatantly non-private.
But: this requires an exponential # of queries.
Proof: Exponential Adversary
• Focus on the column containing the super-private bit: "the database" is a vector d ∈ {0,1}^n.
• Assume all answers are within error bound α.
Will show that α cannot be o(n).
Proof: Exponential Adversary for Blatant Non-Privacy
• Estimate the # of 1's in all possible sets:
– ∀ S ⊆ [n]: |M(S) − Σ_{i∈S} d_i| ≤ α, where M(S) is the answer on S.
• Weed out "distant" DBs. For each candidate database c ∈ {0,1}^n:
– If for any S ⊆ [n]: |Σ_{i∈S} c_i − M(S)| > α, then rule out c.
– If c is not ruled out, halt and output c.
Claim: the real database d won't be ruled out.
Proof: Exponential Adversary
• Assume ∀ S ⊆ [n]: |M(S) − Σ_{i∈S} d_i| ≤ α.
Claim: for any c that has not been ruled out, the Hamming distance between c and d is ≤ 4α.
Let S0 = {i : d_i = 0} and S1 = {i : d_i = 1}.
On S0: |M(S0) − Σ_{i∈S0} c_i| ≤ α (c not ruled out) and |M(S0) − Σ_{i∈S0} d_i| ≤ α, so |Σ_{i∈S0} (c_i − d_i)| ≤ 2α; since d_i = 0 on S0, c disagrees with d on at most 2α positions of S0.
On S1: |M(S1) − Σ_{i∈S1} c_i| ≤ α (c not ruled out), and similarly c disagrees with d on at most 2α positions of S1.
Hence Hamming distance(c, d) ≤ 2α + 2α = 4α.
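The exponential adversary can be run verbatim on a toy instance (illustrative sketch; the instance and noise model are my choices). With n = 8 and error bound α = 1, brute force over all 2^n candidate databases confirms that d survives and every survivor is within Hamming distance 4α of d:

```python
import itertools
import random

random.seed(0)
n, alpha = 8, 1
d = [random.randint(0, 1) for _ in range(n)]

# All subsets S of [n], as 0/1 indicator vectors.
subsets = list(itertools.product([0, 1], repeat=n))

# Mechanism M: the count over S, perturbed by at most alpha.
answers = {S: sum(di * si for di, si in zip(d, S)) + random.randint(-alpha, alpha)
           for S in subsets}

def ruled_out(c):
    """Rule out c if some subset-sum disagrees with M's answer by more than alpha."""
    return any(abs(sum(ci * si for ci, si in zip(c, S)) - answers[S]) > alpha
               for S in subsets)

survivors = [c for c in subsets if not ruled_out(c)]
max_dist = max(sum(ci != di for ci, di in zip(c, d)) for c in survivors)
print(len(survivors), max_dist)  # d always survives; max_dist <= 4 * alpha
```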
Impossibility with Exponential Queries
The result means that we cannot sanitize the data and publish a data structure so that for all queries the answer can be deduced correctly to within o(n).
(Figure: queries sent to a sanitized database; answers returned.)
On the other hand: we will see that we can get accuracy up to roughly log |Q|.
What Can We Do Efficiently?
We allowed the adversary "too" much power:
• Number of queries: exponential
• Computation: exponential
• On the other hand: no wild errors in the responses were assumed.
Theorem: for any sanitization algorithm, if all responses are within o(√n) of the true answer, then it is blatantly non-private, even against a polynomial-time adversary making O(n log² n) random queries.
The Model
• As before: the database d is a bit string of length n.
• Counting queries:
– A query is a subset q ⊆ {1, …, n}.
– The (exact) answer is a_q = Σ_{i∈q} d_i.
• α-perturbation: for each answer, return a value in the range a_q ± α.
What If We Had Exact Answers?
• Consider a mechanism with 0-perturbation: we receive the exact answer a_q = Σ_{i∈q} d_i.
• Then with n linearly independent queries (over the reals) we could reconstruct d precisely: obtain n linear equations a_q = Σ_{i∈q} c_i and solve uniquely.
• With α-perturbations we instead get, for each query, an inequality: a_q − α ≤ Σ_{i∈q} c_i ≤ a_q + α.
Idea: use linear programming. A solution must exist: d itself.
Privacy Requires Ω(√n) Perturbation
Consider a database with o(√n) perturbation.
• The adversary makes t = n log² n random queries q_j, getting noisy answers a_j.
• Privacy-violating algorithm: construct a database c = {c_i}, 1 ≤ i ≤ n, by solving the linear program
0 ≤ c_i ≤ 1 for 1 ≤ i ≤ n
a_j − α ≤ Σ_{i∈q_j} c_i ≤ a_j + α for 1 ≤ j ≤ t
• Round the solution: if c_i > 1/2, set it to 1, and to 0 otherwise.
A solution must exist: d itself.
For every query q_j, the answer according to c is at most 2α far from the (real) answer in d.
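Here is a dependency-free sketch of the reconstruction attack (my simplification, not the lecture's algorithm: to avoid an LP solver it fits the noisy answers by ordinary least squares via the normal equations, then applies the same rounding step). With noise of magnitude α = 2, which is o(√n) in spirit, the rounded solution recovers almost every bit:

```python
import random

random.seed(0)
n, t, alpha = 30, 300, 2
d = [random.randint(0, 1) for _ in range(n)]

# t random subset queries, answered with alpha-perturbation.
queries = [[random.randint(0, 1) for _ in range(n)] for _ in range(t)]
answers = [sum(q[i] * d[i] for i in range(n)) + random.uniform(-alpha, alpha)
           for q in queries]

# Least-squares fit: solve the normal equations (Q^T Q) c = Q^T a.
A = [[sum(queries[k][i] * queries[k][j] for k in range(t)) for j in range(n)]
     for i in range(n)]
b = [sum(queries[k][i] * answers[k] for k in range(t)) for i in range(n)]

for col in range(n):                       # Gaussian elimination with pivoting
    piv = max(range(col, n), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, n):
        f = A[r][col] / A[col][col]
        for j in range(col, n):
            A[r][j] -= f * A[col][j]
        b[r] -= f * b[col]

c = [0.0] * n                              # back substitution
for i in reversed(range(n)):
    c[i] = (b[i] - sum(A[i][j] * c[j] for j in range(i + 1, n))) / A[i][i]

recon = [1 if ci > 0.5 else 0 for ci in c] # the rounding step from the lecture
errors = sum(ri != di for ri, di in zip(recon, d))
print(errors)                              # reconstruction disagrees with d on few entries
```

The LP version gives the worst-case guarantee proved in the lecture; least squares is just a convenient stand-in that illustrates how strongly o(√n)-perturbed answers constrain the database.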
Bad Solutions to the LP Do Not Survive
A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2α far from the answer in d: |Σ_{i∈q} c_i − Σ_{i∈q} d_i| > 2α.
• Idea: show that for a database c that is far away from d, a random query disqualifies c with some constant probability η.
• Want to use the union bound: all far-away solutions are disqualified w.p. at least 1 − n^n (1 − η)^t = 1 − neg(n). But how do we limit the solution space so a union bound applies? Round each value to the closest multiple of 1/n.
Privacy Requires Ω(√n) Perturbation
Lemma: if c is far away from d, then a random query disqualifies c with some constant probability. Concretely: if Prob_{i∈[n]}[|d_i − c_i| ≥ 1/3] is at least a constant, then there is a constant η > 0 such that Prob_{q⊆[n]}[|Σ_{i∈q} (c_i − d_i)| ≥ 2α + 1] > η.
The proof uses Azuma's inequality.
Privacy Requires Ω(√n) Perturbation
We can discretize all potential databases c ∈ [0,1]^n: round each entry c_i to the closest fraction with denominator n, so |c_i − w_i/n| ≤ 1/n.
• The response on any q then changes by at most 1.
• If we disqualify all "discrete" databases, then we also effectively eliminate all c ∈ [0,1]^n.
• There are n^n "discrete" databases.
Privacy Requires Ω(√n) Perturbation
Claim: if c is far away from d (many entries of c are far from the corresponding entries of d), then a random query disqualifies c with some constant probability η.
• Therefore t = n log² n queries leave only a negligible survival probability for each far-away reconstruction.
• Union bound, applied via the discretization: all far-away suggestions are disqualified w.p. at least 1 − n^n (1 − η)^t = 1 − neg(n).
Review and Conclusion
• When the perturbation is o(√n), choosing Õ(n) random queries gives enough information to efficiently reconstruct an o(n)-close database.
• The database is reconstructed using linear programming, in polynomial time.
• Databases with o(√n) perturbation are blatantly non-private: poly(n)-time reconstructable.
Composition
Suppose we are going to apply a DP mechanism t times, perhaps on different databases. We want to argue that the result is differentially private.
• A value b ∈ {0,1} is chosen.
• In each of the t rounds, the adversary A picks two adjacent databases D_i^0 and D_i^1 and receives the result z_i of an ε-DP mechanism M_i on D_i^b.
• Want to argue that A's view has nearly the same distribution for both values of b.
• A's view: (z_1, z_2, …, z_t), plus the randomness used.
Differential Privacy: Composition
Handles auxiliary information, and composes naturally:
• A1(D) is ε1-diffP
• for all z1, A2(D, z1) is ε2-diffP
Then A2(D, A1(D)) is (ε1+ε2)-diffP.
Proof: write P[z1] = Pr_{z∼A1(D)}[z = z1], P'[z1] = Pr_{z∼A1(D')}[z = z1], P[z2] = Pr_{z∼A2(D,z1)}[z = z2], P'[z2] = Pr_{z∼A2(D',z1)}[z = z2]. For all adjacent D, D' and all (z1, z2):
e^(−ε1) ≤ P[z1] / P'[z1] ≤ e^ε1 and e^(−ε2) ≤ P[z2] / P'[z2] ≤ e^ε2,
so e^(−(ε1+ε2)) ≤ P[(z1,z2)] / P'[(z1,z2)] ≤ e^(ε1+ε2).
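The two-step composition bound can be checked numerically for a concrete pair of mechanisms (a sketch with made-up parameters): two Laplace-noised counting queries with privacy parameters ε1 and ε2, run on adjacent databases whose true counts differ by 1. Over a grid of joint outputs, the density ratio never exceeds e^(ε1+ε2):

```python
import math

def lap_density(y: float, scale: float) -> float:
    """Density of Lap(scale) at y: (1 / (2b)) * exp(-|y| / b)."""
    return math.exp(-abs(y) / scale) / (2 * scale)

eps1, eps2 = 0.3, 0.5
count_D, count_Dp = 10, 11        # adjacent DBs: each true count differs by 1

grid = [x / 10 for x in range(0, 250)]
worst = 0.0
for z1 in grid:                   # joint output (z1, z2) of the two mechanisms
    for z2 in grid:
        p = lap_density(z1 - count_D, 1 / eps1) * lap_density(z2 - count_D, 1 / eps2)
        q = lap_density(z1 - count_Dp, 1 / eps1) * lap_density(z2 - count_Dp, 1 / eps2)
        worst = max(worst, p / q, q / p)

print(worst, math.exp(eps1 + eps2))  # worst-case ratio equals e^(eps1 + eps2)
```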
Differential Privacy: Composition
• If all mechanisms M_i are ε-DP, then for any view, the probabilities that A sees that view when b = 0 and when b = 1 are within a factor e^(tε).
• Therefore, results for a single query translate to results on several queries.
Answering a Single Counting Query
U is a set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1.
Sanitizer A: output the # of 1's + noise. This is differentially private, if we choose the noise properly: choose the noise from the Laplace distribution.
Laplacian Noise
The Laplace distribution Y = Lap(b) has density function Pr[Y = y] = (1/2b) e^(−|y|/b).
Standard deviation: O(b). Take b = 1/ε, and get Pr[Y = y] ∝ e^(−ε|y|).
Laplacian Noise: ε-Privacy
Take b = 1/ε, so Pr[Y = y] ∝ e^(−ε|y|). Release: q(D) + Lap(1/ε).
For adjacent D, D': |q(D) − q(D')| ≤ 1, so for every output a:
e^(−ε) ≤ Pr_{by D}[a] / Pr_{by D'}[a] ≤ e^ε
Theorem: the Laplace mechanism with parameter b = 1/ε is ε-differentially private.
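The theorem can be verified pointwise (a sketch with parameters of my choosing): for a sensitivity-1 query, the ratio of the output densities under D and D' is exp(ε(|y − a'| − |y − a|)), which the triangle inequality keeps within [e^(−ε), e^ε]:

```python
import math

def lap_density(y: float, scale: float) -> float:
    """Density of Lap(scale) at y: (1 / (2b)) * exp(-|y| / b)."""
    return math.exp(-abs(y) / scale) / (2 * scale)

eps = 0.4
a, a_prime = 7, 8                 # adjacent databases: true answers differ by 1

ratios = [lap_density(y - a, 1 / eps) / lap_density(y - a_prime, 1 / eps)
          for y in (x / 100 for x in range(-500, 1500))]
worst = max(max(ratios), 1 / min(ratios))
print(worst, math.exp(eps))       # worst-case ratio is exactly e^eps
```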
Laplacian Noise: Õ(1/ε) Error
Take b = 1/ε, so Pr[Y = y] ∝ e^(−ε|y|).
Concentration of the Laplace distribution: Pr_{y∼Y}[|y| > k·(1/ε)] = O(e^(−k)).
Setting k = O(log n): the expected error is 1/ε, and w.h.p. the error is Õ(1/ε).
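The tail bound is easy to confirm by sampling (a sketch; for Lap(1/ε) the tail is exactly Pr[|Y| > k/ε] = e^(−k)):

```python
import math
import random

def laplace_sample(scale: float) -> float:
    """Lap(scale) as the difference of two exponentials with mean `scale`."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

random.seed(0)
eps, k, trials = 0.5, 3, 200_000
b = 1 / eps

tail = sum(abs(laplace_sample(b)) > k * b for _ in range(trials)) / trials
print(tail, math.exp(-k))  # empirical tail frequency vs the exact value e^(-k)
```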