# 20130219 nofreelunch arai

Date posted: 07-Jul-2015


### Transcript of 20130219 nofreelunch arai

#### 1. Title slide

- PPDM3: "No Free Lunch in Data Privacy" (Daniel Kifer, Ashwin Machanavajjhala, SIGMOD 2011)

#### 2. Contributions of this paper

1. Simplify the impossibility result (briefly: answering many queries with bounded noise does not preserve privacy).
2. Present a privacy definition that relies on assumptions about the data-generating mechanism, and compare it with differential privacy (DP).
3. Propose a guideline for determining whether DP is suitable for a given application.
4. Demonstrate cases where DP does not meet the guideline:
   1. when applied naively to arbitrary SNS data;
   2. when applied to tabular data against an attacker with aggregate-level background knowledge.
5. Propose a modification of DP for tabular data with aggregate-level background knowledge.

#### 3. Outline

- Brief review of differential privacy (DP)
- Analysis of the attacker
- Definition of the discriminant and the non-privacy game; the no-free-lunch theorem
- Relation between DP and the no-free-lunch theorem
- Privacy risks of DP algorithms for various attackers
- Unsuitable cases: naive application to correlated data
- A new DP definition subject to background knowledge
- Restrictions that represent previously released exact query answers

#### 4. Differential privacy: problem setting

- A query-answering mechanism stands between a collection of private data and the analyst: queries go in, answers (statistical information such as counts) come out.
- The database owner wants to answer such queries while preserving the privacy of the individuals in the DB.

#### 5. Differential privacy: motivation

- A privacy guarantee that limits the risk incurred by *joining* the dataset encourages participation.
- The goal is to minimize the increased risk to an individual incurred by joining (or leaving) the database, not to compare an adversary's prior and posterior views: the output distributions "with Yuko" and "without Yuko" should be not so different.
- Recall Dalenius's problem: if a statistical database teaches us anything at all, then it should change our beliefs about individuals; the things that statistical databases are designed to teach can, sometimes indirectly, cause damage to an individual, even if this individual is not in the database.
#### 6. Differential privacy: definition

- A randomized query-response algorithm K satisfies ε-differential privacy [Dwork06] if, for all neighboring databases D1, D2 and every S ⊆ Range(K), Pr[K(D1) ∈ S] ≤ e^ε · Pr[K(D2) ∈ S]; i.e. neighboring databases induce almost the same output probability.
- The definition of the *neighboring* databases is very important in this paper.

#### 7. Differential privacy: mechanism

- The Laplace mechanism uses the density function of the Laplace distribution: for any z, z' such that |z − z'| ≤ 1, the density at z is at most e^ε times the density at z', satisfying the condition in [Dwork06].

#### 8. Definitions of DP

- Two flavors of DP:
  - deleting or inserting a tuple: *unbounded* DP;
  - changing a tuple's value: *bounded* DP.
- Note that unbounded DP also protects the very existence (participation) of the tuple.

#### 9. The no-free-lunch theorem

- It is not possible to guarantee both privacy and utility without making assumptions about the data-generating mechanism.
- To discuss this problem, the paper:
  - defines the *discriminant* as a lower bound on utility;
  - analyzes the Laplace mechanism;
  - defines the *non-privacy game*;
  - proves the no-free-lunch theorem;
  - gives a "free lunch" theorem for DP.

#### 10. Discriminant (as a utility measure)

- ω(k; A): a measure of query accuracy; if ω ≈ 1, A answers with reasonable accuracy.
- A: a randomized query-answering algorithm.
- Integer k: acts like an anonymity parameter (?).
- Constraint c: a lower bound on the utility of A with parameter k.
- Note: for a deterministic algorithm, e.g. a k-anonymity algorithm, the discriminant is 1.

#### 11. Discriminant (illustration)

- E.g. k = 2: there are databases D1, D2 and disjoint output sets S1, S2 ⊆ Range(A) with Pr[A(Di) ∈ Si] ≥ c for each i.

#### 12. Example of discriminant

- Cancer-patient DB; number of cancer patients: D1: 0, D2: 10,000, D3: 20,000.
- S1 = [0, 1000], S2 = [9000, 11000], S3 = [19000, ∞).
- Pr[A(Di) ∈ Si] ≥ 0.95 for all i.

#### 13. Discriminant of the Laplace mechanism

- Intuitive description: for the Laplace mechanism with sensitivity 0.5, choose n large enough; then we can choose databases {Di} and ranges {Si} so that the distances between the Di's and between the ranges Si are large enough, and the discriminant approaches 1 as n grows.

#### 14. Non-privacy game

- Privacy definition as a game:
  - assume a data-generating mechanism P that produces the database D;
  - the attacker tries to guess the true answer q(D) to a sensitive query q from the randomized answer A(D).
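The Laplace mechanism of slides 7 and 13 can be sketched in a few lines. This is a minimal illustration, not the paper's code; the counting query, ε, and the neighboring counts are hypothetical:

```python
import math
import random

def sample_laplace(scale, rng=random):
    """Inverse-CDF sampling from Laplace(0, scale)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """epsilon-DP answer: true answer plus Laplace(sensitivity/epsilon) noise."""
    return true_answer + sample_laplace(sensitivity / epsilon, rng)

def density(z, center, scale):
    """Output density of the mechanism at z when the true answer is `center`."""
    return math.exp(-abs(z - center) / scale) / (2.0 * scale)

# A counting query has sensitivity 1 under unbounded DP: inserting or
# deleting one tuple changes the count by at most 1.
epsilon = 0.5
scale = 1.0 / epsilon
count_d1, count_d2 = 10, 11  # counts on two hypothetical neighboring DBs

# For every output z, the density ratio is bounded by e^epsilon --
# the [Dwork06] condition restated for densities.
for z in range(-50, 50):
    ratio = density(z, count_d1, scale) / density(z, count_d2, scale)
    assert ratio <= math.exp(epsilon) + 1e-12
```

The noise scale grows with sensitivity and shrinks with ε, which is why the definition of "neighboring" (and hence the sensitivity) matters so much later in the talk.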
#### 15. No free lunch theorem

- Providing both privacy (in the game sense) and utility is impossible if there is no restriction on the data-generating mechanism.
- If D is uniformly distributed over the databases {Di}, the attacker's strategy is to guess q(Di) whenever A(D) ∈ Si.
- Without seeing A(D), the attacker's guess is correct with probability 1/k; with A(D), he wins with probability close to 1!

#### 16. No free lunch and differential privacy

- "ε-free-lunch privacy": a privacy definition without assumptions about the data, in which every pair of databases is neighboring.
- Note: the discriminant ω(k; A) of any algorithm A satisfying ε-free-lunch privacy is bounded by e^ε / (k − 1 + e^ε).
- (Presenter's interpretation) Let Pr[A(Di) ∈ Si] ≥ c for k databases with disjoint Si. Free-lunch privacy gives Pr[A(D1) ∈ Si] ≥ c·e^(−ε) for every i, and since the Si are disjoint, c + (k − 1)·c·e^(−ε) ≤ Σi Pr[A(D1) ∈ Si] ≤ 1, hence c ≤ e^ε / (k − 1 + e^ε).

#### 17. Privacy risks in differential privacy

- General guideline for determining a privacy definition.
- Note that DP calibrated for a more knowledgeable attacker adds *less* noise!
- Consider three kinds of DP algorithm (bounded/tuple DP, attribute DP, bit DP).

#### 18.–19. Example

- Consider a table with one tuple (Bob) and two 2-bit attributes R1 and R2, so the record is one of the 16 values 0000–1111.
- Bounded DP (tuple): any two of the 16 records are neighbors.
- Attribute DP: records that differ in exactly one attribute (R1 or R2) are neighbors.
- Bit DP: records that differ in a single bit are neighbors.
- Question: bound the probability of answering the true record. It is lower under bounded (tuple) DP and higher under bit DP, because the weaker neighbor relation constrains the mechanism less.
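The bound on the discriminant in slide 16 can be checked numerically. Below is a sketch using a randomized-response construction of my own (not from the paper): the mechanism outputs index i truthfully with probability e^ε/(k − 1 + e^ε), which satisfies ε-free-lunch privacy (all pairs of databases are neighbors) and attains the bound exactly:

```python
import math

def randomized_response_matrix(k, epsilon):
    """k x k matrix: row i is the output distribution when the true
    database is D_i. Every pair of databases is treated as neighboring
    ("free-lunch" privacy), so every pair of rows must be e^eps-close."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    p_other = 1.0 / (math.exp(epsilon) + k - 1)
    return [[p_true if i == j else p_other for j in range(k)] for i in range(k)]

k, epsilon = 10, math.log(2)  # hypothetical parameters
P = randomized_response_matrix(k, epsilon)

# Free-lunch privacy: for every output j, the probabilities under any two
# databases differ by at most a factor e^epsilon.
for j in range(k):
    col = [P[i][j] for i in range(k)]
    assert max(col) <= math.exp(epsilon) * min(col) + 1e-12

# Discriminant with disjoint sets S_i = {i}: each Pr[A(D_i) in S_i]
# equals e^eps / (k - 1 + e^eps), the no-free-lunch upper bound.
bound = math.exp(epsilon) / (k - 1 + math.exp(epsilon))
assert all(abs(P[i][i] - bound) < 1e-12 for i in range(k))
```

With k = 10 and e^ε = 2, the discriminant is only 2/11: without assumptions on the data, any mechanism this private is far from the ω ≈ 1 regime that slide 10 calls "reasonable accuracy".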
#### 20. Problem: correlated data

- If several records are known to have the same attribute value, the sensitivity must be larger.
- E.g. a disease database: Bob and his family might have the same disease.
- How should we deal with this problem? Hide the evidence of participation (any influence of a certain individual's participation).
- Discussion: growing SNSs; prior knowledge about exact statistics.

#### 21. Growing social networks

- Assume an edge-growing SNS. In the initial state of two clusters, only Bob has an external link (to Charlie).
- Let the network grow, after which the attacker asks the query "how many edges are there between the two communities?". Can we preserve the privacy of Bob's external link?
- Make assumptions about the data-generating model: 1. Forest Fire model; 2. Copying model; 3. MVS model.
- From simulation: under models 1 and 2 we cannot set a noise parameter reliably unless we know the network parameters (model parameters or the final number of edges); model 3 has a steady-state distribution, which is rather favorable.

#### 22. Privacy breach after some exact data releases

- Example: deterministic contingency tables plus an additional differentially private data release; a demonstration of additional privacy breach (Section 4.1).
- Consider a table T and an attribute R with domain {r1, …, rk}, and k − 1 released queries.
- If we additionally knew the exact answer to `select count(*) from T where R = ri`, we would be able to exactly reconstruct the table: the tuples are correlated!
- Now add differentially private answers…

#### 23. Privacy breach after some exact statistics releases (2)

- Consider the same table T and attribute R with domain {r1, …, rk}, with the same k − 1 released queries.
- Additionally release k ε-differentially private answers…
- If k is large (e.g. R is a d-bit vector with 2^d possible values), the noise variance is small (recall Section 2.2, knowledge vs. privacy risk), and T is reconstructed with very high probability, due to the correlation with the prior release of information.

#### 24. Plausible deniability (idea)

- What should we do to maintain consistency with the deterministic query answers that have been previously released?
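The reconstruction argument of slides 22–23 comes down to the counts being linearly dependent. A toy sketch, with a hypothetical domain and table (all values invented for illustration): once k − 1 exact counts and the public table size are known, the remaining count is fully determined, so no later release can plausibly deny it.

```python
# Hypothetical domain {r1, ..., rk} and a table T over it.
domain = ["r1", "r2", "r3", "r4"]
table = ["r1", "r1", "r3", "r4", "r4", "r4"]  # hypothetical tuples

n = len(table)  # total table size, assumed already public
exact_counts = {r: table.count(r) for r in domain[:-1]}  # k-1 exact releases

# The remaining count is determined by the prior exact answers:
# the histogram cells are perfectly correlated through n.
last = n - sum(exact_counts.values())
reconstructed = dict(exact_counts)
reconstructed[domain[-1]] = last

assert reconstructed == {r: table.count(r) for r in domain}
```

Any subsequent "private" noisy answer about the last cell therefore cannot hide a value the attacker can already compute exactly, which is the breach the slides demonstrate.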
- We should choose bounded DP: if the number of tuples has been answered previously, the number of tuples should stay the same.
- In general, we can maintain consistency in several ways, for example by exchanging attribute values between tuples collaboratively.

#### 25. Differential privacy subject to background knowledge

- Definitions: a table T with columns (id, gender, handedness), e.g. (taro, male, left), (hana, female, right), and its contingency table of cell counts:

  | | R | L | total |
  |---|---|---|---|
  | M | 43 | 9 | 52 |
  | F | 44 | 4 | 48 |
  | total | 87 | 13 | 100 |

- A *move* transforms one contingency table into another by shifting counts between cells (the slide shows the M/R cell changing from 43 to 42).

#### 26. Differential privacy subject to background knowledge (2)

- Define DP over neighboring tables (tables reachable from one another while staying consistent with the background knowledge).

#### 27. Neighbors induced by other prior statistics

- Example: the exact answer to `select gender, count(*) from T group by gender` has already been released.
- Unbounded DP: the number of tuples is already published.
- Bounded DP: we cannot arbitrarily modify a single tuple either.
- So define neighbors as tables that maintain consistency with the prior query answers.

#### 28. Neighbor-based algorithm for DP

- Definitions: a distance function d(Ta, Tb) between two contingency tables.
- To achieve ε-generic DP, the exponential mechanism [McSherry06] can be used (with quality function q = d(Ta, Tb)).

#### 29. Neighbor-based algorithm for DP (2)

- Laplace mechanism: compute the sensitivity with respect to this neighbor relation; the Laplace mechanism then adds noise with density proportional to exp(−ε|x| / sensitivity) to the query answer.

#### 30. NP-hardness

- Dealing with neighbors under constraints is NP-hard: "the general problem of finding an upper bound on the sensitivity of a query is at least co-NP-hard, and we suspect that the problem is Π₂ᵖ-complete."

#### 31. A case where efficient algorithms exist

- Consider a 2-dimensional table.
- Let the query q_all be: `SELECT R1, R2, COUNT(*) FROM T GROUP BY R1, R2`.
- The sensitivity of q_all can be computed using the following lemma: by removing a subset of paths that form Hamiltonian cycles, it is shown that the original set of moves was the smallest set of moves.

#### 32. Related works

- Impossibility results [Dwork06; Dinur & Nissim 03; etc.]
- Answering many queries
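Slide 28's exponential mechanism over neighboring contingency tables can be sketched generically. The candidate tables and move distances below are hypothetical placeholders; the move distance is used as a cost, i.e. as the negated quality score −d(Ta, Tb):

```python
import math
import random

def exponential_mechanism(candidates, distances, epsilon, sensitivity=1.0, rng=random):
    """Sample one candidate with probability proportional to
    exp(-epsilon * d / (2 * sensitivity)): fewer moves away from the
    true table means exponentially higher probability of release."""
    weights = [math.exp(-epsilon * d / (2.0 * sensitivity)) for d in distances]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return cand
    return candidates[-1]

# Hypothetical candidate contingency tables, all consistent with the
# released marginals, with d = number of "moves" from the true table.
candidates = ["T0 (true)", "T1", "T2", "T3"]
distances = [0, 1, 1, 2]

random.seed(42)
sampled = exponential_mechanism(candidates, distances, epsilon=1.0)
assert sampled in candidates
```

Because every candidate is consistent with the prior exact releases, sampling from this set preserves consistency by construction; the hard part, per slide 30, is enumerating the neighbors and bounding the sensitivity, which is NP-hard in general.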