Pufferfish: A Semantic Approach to Customizable Privacy
Ashwin Machanavajjhala
ashwin AT cs.duke.edu
Collaborators: Daniel Kifer (Penn State), Bolin Ding (UIUC, Microsoft Research)
1 iDASH Privacy Workshop 9/29/2012
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Data Privacy Problem
[Figure: individuals 1 through N each contribute a record (r1, …, rN) to a server that stores them in a database DB. The server wants to extract utility from the data; privacy requires no breach about any individual.]
Data Privacy in the real world
• Medical — Data collector: Hospital; Adversary: Epidemiologist; Private information: Disease; Utility: correlation between disease and geography
• Genome analysis — Data collector: Hospital; Adversary: Statistician/Researcher; Private information: Genome; Utility: correlation between genome and disease
• Advertising — Data collector: Google/FB/Y!; Adversary: Advertiser; Private information: Clicks/Browsing; Utility: number of clicks on an ad by age/region/gender …
• Social recommendations — Data collector: Facebook; Adversary: Another user; Private information: Friend links / profile; Utility: recommend other users or ads to users based on the social network
Many definitions … & several attacks

Definitions:
• K-Anonymity [Sweeney et al. IJUFKS ‘02]
• L-diversity [Machanavajjhala et al. TKDD ‘07]
• T-closeness [Li et al. ICDE ‘07]
• E-Privacy [Machanavajjhala et al. VLDB ‘09]
• Differential Privacy [Dwork et al. ICALP ‘06]

Attacks:
• Linkage attack
• Background knowledge attack
• Minimality / reconstruction attack
• de Finetti attack
• Composition attack
Differential Privacy
• A domain-independent privacy definition that does not depend on the attacker.
• Resists many attacks that other definitions are susceptible to.
 – Avoids composition attacks.
 – Claimed to be tolerant of adversaries with arbitrary background knowledge.
• Allows simple, efficient, and useful privacy mechanisms.
 – Used in a live US Census product [M et al ICDE ‘08].
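For reference (the slide does not restate it), the formal guarantee of ε-differential privacy from Dwork et al. ICALP ‘06 can be written as:

```latex
% epsilon-differential privacy: for every pair of neighboring
% databases D, D' (differing in one record) and every set of
% outputs S contained in Range(M):
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(D') \in S]
```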
No Free Lunch Theorem
It is not possible to guarantee any utility in addition to privacy, without making assumptions about
• the data generating distribution
• the background knowledge available to an adversary
[Kifer-Machanavajjhala SIGMOD ‘11]
[Dwork-Naor JPC ‘10]
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Contingency tables
Count( , ) over table D:

  2   2
  2   8

Each tuple takes one of k = 4 different values.
Contingency tables
Count( , ) over table D:

  ?   ?
  ?   ?

Goal: release these counts privately.
Laplace Mechanism
Add Lap(1/ε) noise to each count of D:

  2 + Lap(1/ε)   2 + Lap(1/ε)
  2 + Lap(1/ε)   8 + Lap(1/ε)

For the cell with true count 8: mean 8, variance 2/ε².
Guarantees differential privacy.
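As an illustrative sketch (not from the talk), the Laplace mechanism for this 2×2 histogram can be implemented with inverse-CDF sampling; the ε value below is an arbitrary example.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling: for U uniform on (-1/2, 1/2),
    # -scale * sign(U) * ln(1 - 2|U|) has a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(counts, epsilon):
    # Each individual contributes to exactly one cell, so the L1
    # sensitivity of the count vector is 1 and the noise scale is 1/epsilon.
    return [c + laplace_noise(1.0 / epsilon) for c in counts]

# The 2x2 contingency table from the slides, flattened.
true_counts = [2, 2, 2, 8]
print(laplace_mechanism(true_counts, epsilon=0.5))
```

Each noisy count has mean equal to the true count and variance 2/ε², matching the slide.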
Marginal counts
Noisy counts are released together with exact row and column marginals:

  2 + Lap(1/ε)   2 + Lap(1/ε)   | 4
  2 + Lap(1/ε)   8 + Lap(1/ε)   | 10
  ---------------------------------
       4              10

Does the Laplace mechanism still guarantee privacy?

Auxiliary marginals are published for the following reasons:
1. Legal: 2002 Supreme Court case Utah v. Evans
2. Contractual: advertisers must know exact demographics at coarse granularities
Marginal counts
Using the exact marginals, the cell with true count 8 can be estimated in k = 4 independent ways:

  Count( , ) = 8 + Lap(1/ε)   (the noisy cell itself)
  Count( , ) = 8 − Lap(1/ε)   (row marginal 10 minus the noisy neighbor 2 + Lap(1/ε))
  Count( , ) = 8 − Lap(1/ε)   (column marginal 10 minus the noisy neighbor 2 + Lap(1/ε))
  Count( , ) = 8 + Lap(1/ε)   (total minus both marginals plus the noisy cell 2 + Lap(1/ε))

Averaging these estimates yields mean 8 and variance 2/(kε²).
Marginal counts
  2 + Lap(1/ε)   2 + Lap(1/ε)   | 4
  2 + Lap(1/ε)   8 + Lap(1/ε)   | 10
       4              10

By averaging the k independent estimates, an adversary can reconstruct the table with high precision for large k.
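A small simulation (illustrative, not from the talk) makes the averaging attack concrete for the running 2×2 example: combining the exact marginals with the noisy release gives k = 4 independent estimates of the bottom-right cell, and their average has variance 2/(kε²) instead of 2/ε².

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of Laplace(0, scale) from one uniform draw.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def reconstruct_cell(counts, row_sums, col_sums, total, epsilon):
    """Release each cell with Lap(1/epsilon) noise, then use the exact
    marginals to form k = 4 independent estimates of the bottom-right
    cell and average them."""
    noisy = [[c + laplace_noise(1.0 / epsilon) for c in row] for row in counts]
    estimates = [
        noisy[1][1],                                      # 8 + Lap
        row_sums[1] - noisy[1][0],                        # 8 - Lap
        col_sums[1] - noisy[0][1],                        # 8 - Lap
        total - row_sums[0] - col_sums[0] + noisy[0][0],  # 8 + Lap
    ]
    return sum(estimates) / len(estimates)

counts, row_sums, col_sums, total = [[2, 2], [2, 8]], [4, 10], [4, 10], 14
print(reconstruct_cell(counts, row_sums, col_sums, total, epsilon=0.5))
```

For ε = 0.5 the averaged estimator has variance 2/(4ε²) = 2, a factor of k = 4 below the variance 2/ε² = 8 of a single noisy count.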
Customizing Privacy
• Handling correlations – social networks [Kifer-M SIGMOD ‘11]
• Utility-driven applications – “realistic” vs. “worst-case” adversaries [M et al PVLDB ‘09]
• Dealing with aggregate secrets [Kifer-M PODS ‘12]

Question: how do we design principled privacy definitions customized to such scenarios?
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Pufferfish Semantics
• What is being kept secret?
• Who are the adversaries?
• How is information disclosure bounded?
Sensitive Information
• Secrets: S is a set of potentially sensitive statements.
 – “individual j’s record is in the data, and j has Cancer”
 – “individual j’s record is not in the data”
• Discriminative Pairs: Spairs is a subset of S × S consisting of mutually exclusive pairs of secrets.
 – (“Bob is in the table”, “Bob is not in the table”)
 – (“Bob has cancer”, “Bob has diabetes”)
Adversaries
• An adversary can be completely characterized by his/her prior information about the data.
 – We do not assume computational limits.
• Data Evolution Scenarios: the set of all probability distributions that could have generated the data.
 – No assumptions: all probability distributions over data instances are possible.
 – I.I.D.: the set of all f such that P(data = {r1, r2, …, rk}) = f(r1) × f(r2) × … × f(rk)
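The I.I.D. data-evolution scenario can be sketched directly; the distribution `f` below is a hypothetical single-attribute example, not from the talk.

```python
from math import prod  # Python 3.8+

def prob_data_iid(records, f):
    # I.I.D. data-evolution scenario:
    # P(data = {r1, ..., rk}) = f(r1) * f(r2) * ... * f(rk)
    return prod(f(r) for r in records)

# Hypothetical attribute distribution, for illustration only.
f = {"cancer": 0.1, "healthy": 0.9}.get
print(prob_data_iid(["cancer", "healthy", "healthy"], f))  # 0.1 * 0.9 * 0.9
```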
Information Disclosure
• A mechanism M satisfies ε-Pufferfish(S, Spairs, D) if for every
 – w ∈ Range(M),
 – (si, sj) ∈ Spairs, and
 – θ ∈ D such that P(si | θ) ≠ 0 and P(sj | θ) ≠ 0:

   P(M(data) = w | si, θ) ≤ e^ε · P(M(data) = w | sj, θ)
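As a toy check (not from the talk), one instance of the Pufferfish inequality can be verified numerically. The mechanism here is hypothetical randomized response that reports the true sensitive value with probability 0.7 and flips it with probability 0.3.

```python
import math

def pufferfish_bound_holds(p_w_given_si, p_w_given_sj, epsilon):
    # One instance of the Pufferfish inequality for a fixed output w,
    # discriminative pair (si, sj), and data-evolution scenario theta:
    # P(M(data) = w | si, theta) <= e^eps * P(M(data) = w | sj, theta)
    return p_w_given_si <= math.exp(epsilon) * p_w_given_sj

# Discriminative pair: si = "Bob has cancer" vs. sj = "Bob is healthy".
# Randomized response reports the true value w.p. 0.7, flips it w.p. 0.3,
# so P(output "cancer" | si) = 0.7 and P(output "cancer" | sj) = 0.3.
eps = math.log(3)
assert pufferfish_bound_holds(0.7, 0.3, eps)  # output w = "cancer"
assert pufferfish_bound_holds(0.3, 0.7, eps)  # output w = "healthy"
```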
Pufferfish Semantic Guarantee

For every (si, sj) ∈ Spairs and θ ∈ D, observing the output changes the adversary’s odds of si vs. sj by at most e^ε:

  e^(−ε) ≤ [ P(si | M(data) = w, θ) / P(sj | M(data) = w, θ) ] ÷ [ P(si | θ) / P(sj | θ) ] ≤ e^ε

i.e., the ratio of the posterior odds of si vs. sj to the prior odds of si vs. sj is bounded.
Applying Pufferfish to Differential Privacy
• Spairs:
 – “record j is in the table” vs. “record j is not in the table”
 – “record j is in the table with value x” vs. “record j is not in the table”
• Data evolution:
 – Probability record j is in the table: πj
 – Probability distribution over values of record j: fj
 – For all θ = [f1, f2, …, fk, π1, π2, …, πk]:
   P[Data = D | θ] = Π_{rj not in D} (1 − πj) × Π_{rj in D} πj · fj(rj)
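The data-evolution probability above can be sketched in a few lines; the record values, probabilities, and the distribution `f` below are hypothetical illustrations.

```python
def prob_data(theta, data):
    # theta: list of (pi_j, f_j) pairs, one per potential record j.
    # data: dict mapping j -> value r_j for the records present in D.
    # P[Data = D | theta] = prod_{j not in D} (1 - pi_j)
    #                     * prod_{j in D} pi_j * f_j(r_j)
    p = 1.0
    for j, (pi_j, f_j) in enumerate(theta):
        if j in data:
            p *= pi_j * f_j(data[j])
        else:
            p *= 1.0 - pi_j
    return p

# Two potential records, each present with probability 1/2 and, if
# present, taking value "a" w.p. 0.3 or "b" w.p. 0.7 (illustrative).
f = {"a": 0.3, "b": 0.7}.get
theta = [(0.5, f), (0.5, f)]
print(prob_data(theta, {0: "a"}))  # 0.5 * 0.3 * 0.5
```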
Applying Pufferfish to Differential Privacy
• Spairs:
 – “record j is in the table” vs. “record j is not in the table”
 – “record j is in the table with value x” vs. “record j is not in the table”
• Data evolution: for all θ = [f1, f2, …, fk, π1, π2, …, πk]:
   P[Data = D | θ] = Π_{rj not in D} (1 − πj) × Π_{rj in D} πj · fj(rj)

Theorem: a mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ} (as defined above).
Differential Privacy
• Sensitive information: all pairs of secrets “individual j is in the table with value x” vs. “individual j is not in the table”
• Adversary: adversaries who believe the data is generated using any probability distribution that is independent across individuals
• Disclosure: the ratio of the adversary’s prior and posterior odds is bounded by e^ε
Characterizing “good” privacy definition
• Any privacy definition that can be phrased in the following form, where I is the set of all tables, composes with itself. [The formula on this slide was not captured in the transcript.]
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Induced Neighbor Privacy
• Differential Privacy: neighboring tables differ in one value.
• Induced neighbors: pairs of tables, each consistent with the released marginals, such that the shortest path of single-value moves between them does not pass through another consistent table (e.g., D and D′, connected only via an inconsistent intermediate D″).
Induced Neighbor Privacy
For every pair of induced neighbors D1, D2 and every output O, the adversary should not be able to distinguish between D1 and D2 based on O:

  | log ( Pr[A(D1) = O] / Pr[A(D2) = O] ) | < ε   (ε > 0)
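For the running 2×2 example (marginals 4/10 on both rows and columns), induced neighbors can be found by brute force. This is an illustrative sketch (not from the talk), feasible only for tiny tables; for this example the minimal L1 gap between distinct consistent tables equals min(2r, 2c) = 4.

```python
from itertools import product

def tables_with_marginals(row_sums, col_sums, max_val):
    # Brute-force all 2x2 non-negative integer tables consistent
    # with the given row and column sums.
    out = []
    for a, b, c, d in product(range(max_val + 1), repeat=4):
        if (a + b, c + d) == tuple(row_sums) and (a + c, b + d) == tuple(col_sums):
            out.append(((a, b), (c, d)))
    return out

def l1_distance(t1, t2):
    # Total absolute cell-wise difference between two tables.
    return sum(abs(x - y) for r1, r2 in zip(t1, t2) for x, y in zip(r1, r2))

tables = tables_with_marginals([4, 10], [4, 10], 10)
gaps = [l1_distance(s, t) for s in tables for t in tables if s != t]
# Minimal gap between distinct consistent tables: changing one record
# forces compensating changes in three other cells.
print(min(gaps))  # 4
```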
Induced Neighbors Privacy and Pufferfish
• Given a set of count constraints Q,
• Spairs: – “record j is in the table” vs “record j is not in the table”
– “record j is in the table with value x” vs “record j is not in the table”
• Data evolution: – For all θ = [f1, f2, f3, …, fk, π1, π2, …, πk ]
– P[Data = D | θ] α Πrj not in D (1-πj) x Πrj in D πj x fj(rj) , if D satisfies Q
– P[Data = D | θ] = 0, if D does not satisfy Q
A mechanism M satisfies induced neighbors privacy
if and only if
it satisfies Pufferfish instantiated using Spairs and {θ}
Computing induced sensitivity
• 2-D case: qall outputs all the counts in a 2-D contingency table, with row and column sums as marginals. The induced sensitivity of qall is min(2r, 2c), where r and c are the numbers of rows and columns.
• General case: deciding whether Sin(q) > 0 is NP-hard.
• Conjecture: computing Sin(q) is hard (and complete) for the second level of the polynomial hierarchy.
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Open Questions
• Privacy of social networks
 – Adversaries may use social network evolution models to infer sensitive information about edges in a network [Kifer-M SIGMOD ‘11]
 – Can correlations in a social network be generatively described?
• Characterize necessary and sufficient conditions for resilience against classes of attacks.
 – Sufficient conditions for composition [Kifer-M PODS’12]
• Attack-resilient relaxations of differential privacy
 – Existing privacy definitions do not provide sufficient utility for certain applications (e.g., social recommendations [M et al PVLDB ‘11])
Thank you
[M et al PVLDB’11] A. Machanavajjhala, A. Korolova, A. Das Sarma, “Personalized Social Recommendations – Accurate or Private?”, PVLDB 4(7) 2011
[Kifer-M SIGMOD’11] D. Kifer, A. Machanavajjhala, “No Free Lunch in Data Privacy”, SIGMOD 2011
[Kifer-M PODS’12] D. Kifer, A. Machanavajjhala, “A Rigorous and Customizable Framework for Privacy”, PODS 2012
[Ding-M ‘12] B. Ding, A. Machanavajjhala, “Induced Neighbors Privacy” (work in progress), 2012