Pufferfish: A Semantic Approach to Customizable Privacy
Ashwin Machanavajjhala
ashwin AT cs.duke.edu
Collaborators: Daniel Kifer (Penn State), Bolin Ding (UIUC, Microsoft Research)
1 iDASH Privacy Workshop 9/29/2012
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Data Privacy Problem
[Figure: individuals 1 through N each contribute a record (r1, …, rN) to a server that stores them in a database DB. The server wants to extract utility from the data; privacy requires no breach about any individual.]
Data Privacy in the real world
• Medical — Data collector: Hospital; Adversary: Epidemiologist; Private information: Disease; Utility: correlation between disease and geography
• Genome analysis — Data collector: Hospital; Adversary: Statistician/Researcher; Private information: Genome; Utility: correlation between genome and disease
• Advertising — Data collector: Google/FB/Y!; Adversary: Advertiser; Private information: Clicks/Browsing; Utility: number of clicks on an ad by age/region/gender …
• Social recommendations — Data collector: Facebook; Adversary: Another user; Private information: Friend links / profile; Utility: recommend other users or ads to users based on the social network
Many definitions … & several attacks

Definitions:
• K-Anonymity [Sweeney et al. IJUFKS ‘02]
• L-diversity [Machanavajjhala et al. TKDD ‘07]
• T-closeness [Li et al. ICDE ‘07]
• E-Privacy [Machanavajjhala et al. VLDB ‘09]
• Differential Privacy [Dwork et al. ICALP ‘06]

Attacks:
• Linkage attack
• Background knowledge attack
• Minimality / reconstruction attack
• de Finetti attack
• Composition attack
Differential Privacy
• A domain-independent privacy definition that does not depend on the attacker.
• Resists many attacks that other definitions are susceptible to.
 – Avoids composition attacks.
 – Claimed to be tolerant of adversaries with arbitrary background knowledge.
• Allows simple, efficient, and useful privacy mechanisms.
 – Used in a live US Census product [M et al ICDE ‘08].
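For reference (the slide does not restate it), the formal guarantee of ε-differential privacy from Dwork et al. ICALP ‘06 can be written as:

```latex
% epsilon-differential privacy: for every pair of neighboring
% databases D, D' (differing in one record) and every set of
% outputs S contained in Range(M):
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(D') \in S]
```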
No Free Lunch Theorem
It is not possible to guarantee any utility in addition to privacy, without making assumptions about
• the data generating distribution
• the background knowledge available to an adversary
[Kifer-Machanavajjhala SIGMOD ‘11]
[Dwork-Naor JPC ‘10]
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Contingency tables
Count( , ) over table D:

  2   2
  2   8

Each tuple takes one of k = 4 different values.
Contingency tables
Count( , ) over table D:

  ?   ?
  ?   ?

Goal: release these counts privately.
Laplace Mechanism
Add Lap(1/ε) noise to each count of D:

  2 + Lap(1/ε)   2 + Lap(1/ε)
  2 + Lap(1/ε)   8 + Lap(1/ε)

For the cell with true count 8: mean 8, variance 2/ε².
Guarantees differential privacy.
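As an illustrative sketch (not from the talk), the Laplace mechanism for this 2×2 histogram can be implemented with inverse-CDF sampling; the ε value below is an arbitrary example.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling: for U uniform on (-1/2, 1/2),
    # -scale * sign(U) * ln(1 - 2|U|) has a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(counts, epsilon):
    # Each individual contributes to exactly one cell, so the L1
    # sensitivity of the count vector is 1 and the noise scale is 1/epsilon.
    return [c + laplace_noise(1.0 / epsilon) for c in counts]

# The 2x2 contingency table from the slides, flattened.
true_counts = [2, 2, 2, 8]
print(laplace_mechanism(true_counts, epsilon=0.5))
```

Each noisy count has mean equal to the true count and variance 2/ε², matching the slide.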
Marginal counts
Noisy counts are released together with exact row and column marginals:

  2 + Lap(1/ε)   2 + Lap(1/ε)   | 4
  2 + Lap(1/ε)   8 + Lap(1/ε)   | 10
  ---------------------------------
       4              10

Does the Laplace mechanism still guarantee privacy?

Auxiliary marginals are published for the following reasons:
1. Legal: 2002 Supreme Court case Utah v. Evans
2. Contractual: advertisers must know exact demographics at coarse granularities
Marginal counts
Using the exact marginals, the cell with true count 8 can be estimated in k = 4 independent ways:

  Count( , ) = 8 + Lap(1/ε)   (the noisy cell itself)
  Count( , ) = 8 − Lap(1/ε)   (row marginal 10 minus the noisy neighbor 2 + Lap(1/ε))
  Count( , ) = 8 − Lap(1/ε)   (column marginal 10 minus the noisy neighbor 2 + Lap(1/ε))
  Count( , ) = 8 + Lap(1/ε)   (total minus both marginals plus the noisy cell 2 + Lap(1/ε))

Averaging these estimates yields mean 8 and variance 2/(kε²).
Marginal counts
  2 + Lap(1/ε)   2 + Lap(1/ε)   | 4
  2 + Lap(1/ε)   8 + Lap(1/ε)   | 10
       4              10

By averaging the k independent estimates, an adversary can reconstruct the table with high precision for large k.
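A small simulation (illustrative, not from the talk) makes the averaging attack concrete for the running 2×2 example: combining the exact marginals with the noisy release gives k = 4 independent estimates of the bottom-right cell, and their average has variance 2/(kε²) instead of 2/ε².

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of Laplace(0, scale) from one uniform draw.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def reconstruct_cell(counts, row_sums, col_sums, total, epsilon):
    """Release each cell with Lap(1/epsilon) noise, then use the exact
    marginals to form k = 4 independent estimates of the bottom-right
    cell and average them."""
    noisy = [[c + laplace_noise(1.0 / epsilon) for c in row] for row in counts]
    estimates = [
        noisy[1][1],                                      # 8 + Lap
        row_sums[1] - noisy[1][0],                        # 8 - Lap
        col_sums[1] - noisy[0][1],                        # 8 - Lap
        total - row_sums[0] - col_sums[0] + noisy[0][0],  # 8 + Lap
    ]
    return sum(estimates) / len(estimates)

counts, row_sums, col_sums, total = [[2, 2], [2, 8]], [4, 10], [4, 10], 14
print(reconstruct_cell(counts, row_sums, col_sums, total, epsilon=0.5))
```

For ε = 0.5 the averaged estimator has variance 2/(4ε²) = 2, a factor of k = 4 below the variance 2/ε² = 8 of a single noisy count.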
Customizing Privacy
• Handling correlations – social networks [Kifer-M SIGMOD ‘11]
• Utility-driven applications – “realistic” vs. “worst-case” adversaries [M et al PVLDB ‘09]
• Dealing with aggregate secrets [Kifer-M PODS ‘12]

Question: how do we design principled privacy definitions customized to such scenarios?
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Pufferfish Semantics
• What is being kept secret?
• Who are the adversaries?
• How is information disclosure bounded?
Sensitive Information
• Secrets: S is a set of potentially sensitive statements.
 – “individual j’s record is in the data, and j has Cancer”
 – “individual j’s record is not in the data”
• Discriminative Pairs: Spairs is a subset of S × S consisting of mutually exclusive pairs of secrets.
 – (“Bob is in the table”, “Bob is not in the table”)
 – (“Bob has cancer”, “Bob has diabetes”)
Adversaries
• An adversary can be completely characterized by his/her prior information about the data.
 – We do not assume computational limits.
• Data Evolution Scenarios: the set of all probability distributions that could have generated the data.
 – No assumptions: all probability distributions over data instances are possible.
 – I.I.D.: the set of all f such that P(data = {r1, r2, …, rk}) = f(r1) × f(r2) × … × f(rk)
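The I.I.D. data-evolution scenario can be sketched directly; the distribution `f` below is a hypothetical single-attribute example, not from the talk.

```python
from math import prod  # Python 3.8+

def prob_data_iid(records, f):
    # I.I.D. data-evolution scenario:
    # P(data = {r1, ..., rk}) = f(r1) * f(r2) * ... * f(rk)
    return prod(f(r) for r in records)

# Hypothetical attribute distribution, for illustration only.
f = {"cancer": 0.1, "healthy": 0.9}.get
print(prob_data_iid(["cancer", "healthy", "healthy"], f))  # 0.1 * 0.9 * 0.9
```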
Information Disclosure
• A mechanism M satisfies ε-Pufferfish(S, Spairs, D) if for every
 – w ∈ Range(M),
 – (si, sj) ∈ Spairs, and
 – θ ∈ D such that P(si | θ) ≠ 0 and P(sj | θ) ≠ 0:

   P(M(data) = w | si, θ) ≤ e^ε · P(M(data) = w | sj, θ)
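As a toy check (not from the talk), one instance of the Pufferfish inequality can be verified numerically. The mechanism here is hypothetical randomized response that reports the true sensitive value with probability 0.7 and flips it with probability 0.3.

```python
import math

def pufferfish_bound_holds(p_w_given_si, p_w_given_sj, epsilon):
    # One instance of the Pufferfish inequality for a fixed output w,
    # discriminative pair (si, sj), and data-evolution scenario theta:
    # P(M(data) = w | si, theta) <= e^eps * P(M(data) = w | sj, theta)
    return p_w_given_si <= math.exp(epsilon) * p_w_given_sj

# Discriminative pair: si = "Bob has cancer" vs. sj = "Bob is healthy".
# Randomized response reports the true value w.p. 0.7, flips it w.p. 0.3,
# so P(output "cancer" | si) = 0.7 and P(output "cancer" | sj) = 0.3.
eps = math.log(3)
assert pufferfish_bound_holds(0.7, 0.3, eps)  # output w = "cancer"
assert pufferfish_bound_holds(0.3, 0.7, eps)  # output w = "healthy"
```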
Pufferfish Semantic Guarantee

For every (si, sj) ∈ Spairs and θ ∈ D, observing the output changes the adversary’s odds of si vs. sj by at most e^ε:

  e^(−ε) ≤ [ P(si | M(data) = w, θ) / P(sj | M(data) = w, θ) ] ÷ [ P(si | θ) / P(sj | θ) ] ≤ e^ε

i.e., the ratio of the posterior odds of si vs. sj to the prior odds of si vs. sj is bounded.
Applying Pufferfish to Differential Privacy
• Spairs:
 – “record j is in the table” vs. “record j is not in the table”
 – “record j is in the table with value x” vs. “record j is not in the table”
• Data evolution:
 – Probability record j is in the table: πj
 – Probability distribution over values of record j: fj
 – For all θ = [f1, f2, …, fk, π1, π2, …, πk]:
   P[Data = D | θ] = Π_{rj not in D} (1 − πj) × Π_{rj in D} πj · fj(rj)
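The data-evolution probability above can be sketched in a few lines; the record values, probabilities, and the distribution `f` below are hypothetical illustrations.

```python
def prob_data(theta, data):
    # theta: list of (pi_j, f_j) pairs, one per potential record j.
    # data: dict mapping j -> value r_j for the records present in D.
    # P[Data = D | theta] = prod_{j not in D} (1 - pi_j)
    #                     * prod_{j in D} pi_j * f_j(r_j)
    p = 1.0
    for j, (pi_j, f_j) in enumerate(theta):
        if j in data:
            p *= pi_j * f_j(data[j])
        else:
            p *= 1.0 - pi_j
    return p

# Two potential records, each present with probability 1/2 and, if
# present, taking value "a" w.p. 0.3 or "b" w.p. 0.7 (illustrative).
f = {"a": 0.3, "b": 0.7}.get
theta = [(0.5, f), (0.5, f)]
print(prob_data(theta, {0: "a"}))  # 0.5 * 0.3 * 0.5
```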
Applying Pufferfish to Differential Privacy
• Spairs:
 – “record j is in the table” vs. “record j is not in the table”
 – “record j is in the table with value x” vs. “record j is not in the table”
• Data evolution: for all θ = [f1, f2, …, fk, π1, π2, …, πk]:
   P[Data = D | θ] = Π_{rj not in D} (1 − πj) × Π_{rj in D} πj · fj(rj)

Theorem: a mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ} (as defined above).
Differential Privacy
• Sensitive information: all pairs of secrets “individual j is in the table with value x” vs. “individual j is not in the table”
• Adversary: adversaries who believe the data is generated using any probability distribution that is independent across individuals
• Disclosure: the ratio of the adversary’s prior and posterior odds is bounded by e^ε
Characterizing “good” privacy definition
• Any privacy definition that can be phrased in the following form, where I is the set of all tables, composes with itself. [The formula on this slide was not captured in the transcript.]
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Induced Neighbor Privacy
• Differential Privacy: neighboring tables differ in one value.
• Induced neighbors: pairs of tables, each consistent with the released marginals, such that the shortest path of single-value moves between them does not pass through another consistent table (e.g., D and D′, connected only via an inconsistent intermediate D″).
Induced Neighbor Privacy
For every pair of induced neighbors D1, D2 and every output O, the adversary should not be able to distinguish between D1 and D2 based on O:

  | log ( Pr[A(D1) = O] / Pr[A(D2) = O] ) | < ε   (ε > 0)
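For the running 2×2 example (marginals 4/10 on both rows and columns), induced neighbors can be found by brute force. This is an illustrative sketch (not from the talk), feasible only for tiny tables; for this example the minimal L1 gap between distinct consistent tables equals min(2r, 2c) = 4.

```python
from itertools import product

def tables_with_marginals(row_sums, col_sums, max_val):
    # Brute-force all 2x2 non-negative integer tables consistent
    # with the given row and column sums.
    out = []
    for a, b, c, d in product(range(max_val + 1), repeat=4):
        if (a + b, c + d) == tuple(row_sums) and (a + c, b + d) == tuple(col_sums):
            out.append(((a, b), (c, d)))
    return out

def l1_distance(t1, t2):
    # Total absolute cell-wise difference between two tables.
    return sum(abs(x - y) for r1, r2 in zip(t1, t2) for x, y in zip(r1, r2))

tables = tables_with_marginals([4, 10], [4, 10], 10)
gaps = [l1_distance(s, t) for s in tables for t in tables if s != t]
# Minimal gap between distinct consistent tables: changing one record
# forces compensating changes in three other cells.
print(min(gaps))  # 4
```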
Induced Neighbors Privacy and Pufferfish
• Given a set of count constraints Q,
• Spairs: – “record j is in the table” vs “record j is not in the table”
– “record j is in the table with value x” vs “record j is not in the table”
• Data evolution: – For all θ = [f1, f2, f3, …, fk, π1, π2, …, πk ]
– P[Data = D | θ] α Πrj not in D (1-πj) x Πrj in D πj x fj(rj) , if D satisfies Q
– P[Data = D | θ] = 0, if D does not satisfy Q
A mechanism M satisfies induced neighbors privacy
if and only if
it satisfies Pufferfish instantiated using Spairs and {θ}
Computing induced sensitivity
• 2-D case: qall outputs all the counts in a 2-D contingency table, with row and column sums as marginals. The induced sensitivity of qall is min(2r, 2c), where r and c are the numbers of rows and columns.
• General case: deciding whether Sin(q) > 0 is NP-hard.
• Conjecture: computing Sin(q) is hard (and complete) for the second level of the polynomial hierarchy.
Outline
• Background – How to define privacy?
• Correlations: A case for customizable privacy [Kifer-M SIGMOD ‘11]
• Pufferfish Privacy Framework [Kifer-M PODS’12]
• Case Study [Kifer-M PODS’12 & Ding-M ‘12]
– Privately handling correlations due to exact release of counts
• Open Challenges
Open Questions
• Privacy of social networks
 – Adversaries may use social network evolution models to infer sensitive information about edges in a network [Kifer-M SIGMOD ‘11]
 – Can correlations in a social network be generatively described?
• Characterize necessary and sufficient conditions for resilience against classes of attacks.
 – Sufficient conditions for composition [Kifer-M PODS’12]
• Attack-resilient relaxations of differential privacy
 – Existing privacy definitions do not provide sufficient utility for certain applications (e.g., social recommendations [M et al PVLDB ‘11])
Thank you
[M et al PVLDB’11] A. Machanavajjhala, A. Korolova, A. Das Sarma, “Personalized Social Recommendations – Accurate or Private?”, PVLDB 4(7) 2011
[Kifer-M SIGMOD’11] D. Kifer, A. Machanavajjhala, “No Free Lunch in Data Privacy”, SIGMOD 2011
[Kifer-M PODS’12] D. Kifer, A. Machanavajjhala, “A Rigorous and Customizable Framework for Privacy”, PODS 2012
[Ding-M ‘12] B. Ding, A. Machanavajjhala, “Induced Neighbors Privacy” (work in progress), 2012