Tools for Privacy Preserving Distributed Data Mining By Michael Holmes.

Why Private Data Mining

❖ The CDC may want to use data mining techniques to identify trends in disease outbreaks.

❖ Insurance companies have useful data but can’t disclose it because of privacy concerns.

❖ Is there a way to obtain this data without revealing the identity of the patients?

Private Data Mining Techniques

❖ Secure Sum

❖ Secure Set Union

❖ Secure Size of Set Intersection

❖ Scalar Product

Private Data Mining Toolkit

❖ Association Rules in Horizontally Partitioned Data

❖ Association Rules in Vertically Partitioned Data

❖ EM Clustering

Secure Sum

❖ Securely compute a sum over values held in separate databases.

❖ Site 1 randomly generates a number R.

❖ Site 1 adds R to each of its values and sends the results to site 2.

❖ Site 2 adds each of its values to the corresponding value sent from site 1 and returns a single total to site 1.

❖ Site 1 then subtracts R once for each of its N values (N × R in total) to recover the correct sum.
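
The steps above mask individual values; a closely related and widely used variant masks a running total that is passed around a ring of sites, so R only has to be removed once. A minimal simulation of that ring variant, assuming each site holds one non-negative integer and the true total is smaller than a public modulus m (the function name `secure_sum` and the choice of modulus are illustrative assumptions, not part of the slides):

```python
import secrets


def secure_sum(local_values, modulus):
    """Simulate the ring-based secure sum over len(local_values) sites.

    Assumes each site holds one non-negative integer and the true total
    is smaller than `modulus`, so the modular result equals the real sum.
    """
    # Site 1 draws a mask R uniformly from [0, modulus).
    r = secrets.randbelow(modulus)
    # Site 1 sends (R + v1) mod m to site 2; if sites do not collude, each
    # partial total a site receives is uniformly distributed.
    running = (r + local_values[0]) % modulus
    for v in local_values[1:]:
        # Each later site adds its own value and forwards the result.
        running = (running + v) % modulus
    # The last site returns the total to site 1, which removes the mask.
    return (running - r) % modulus


print(secure_sum([12, 30, 7], modulus=1000))  # 49
```

All sites run in one process here; in practice each loop iteration is a message between sites. Two sites that sit on either side of a third can still collude to recover its value from the totals they see.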

Secure Set Union
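
A minimal sketch of one common construction for a secure set union, assuming a commutative cipher: each site encrypts every item under every site's key, equal items then collide and are merged, and the pooled ciphertexts are decrypted afterwards. The cipher here is modular exponentiation (a Pohlig-Hellman style x^k mod P) with the public prime P = 2^127 - 1; the helpers `keygen`, `enc`, `dec`, `encode`, `decode` and `secure_set_union` are hypothetical names for this illustration, and items are assumed to be short strings:

```python
import math
import secrets

# Public prime modulus shared by all sites; 2**127 - 1 is a Mersenne prime.
P = 2**127 - 1


def keygen():
    """Pick a secret exponent k that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def enc(k, x):
    # Exponentiation cipher: enc(a, enc(b, x)) == enc(b, enc(a, x)).
    return pow(x, k, P)


def dec(k, y):
    # Raise to k^-1 mod (P - 1); by Fermat's little theorem this undoes enc.
    return pow(y, pow(k, -1, P - 1), P)


def encode(item):
    """Hypothetical encoding: a short UTF-8 string as an integer below P."""
    n = int.from_bytes(item.encode("utf-8"), "big")
    if not 1 < n < P:
        raise ValueError("item must be a short, non-empty string")
    return n


def decode(n):
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode("utf-8")


def secure_set_union(site_sets):
    keys = [keygen() for _ in site_sets]  # one secret key per site
    pooled = set()
    for items in site_sets:
        for item in items:
            c = encode(item)
            for k in keys:  # every site encrypts every item once
                c = enc(k, c)
            pooled.add(c)  # identical items from different sites collapse here
    union = set()
    for c in pooled:
        for k in keys:  # decryption order is irrelevant because the cipher commutes
            c = dec(k, c)
        union.add(decode(c))
    return union


print(sorted(secure_set_union([{"flu", "measles"}, {"flu", "mumps"}])))
# ['flu', 'measles', 'mumps']
```

The sketch runs every site in one process and skips the message passing; the intent of the protocol is that sites only ever handle ciphertexts, so the final union does not reveal which site contributed each item.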

Secure Size of Set Intersection

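❖ Relies on commutative encryption, where encrypting with key A and then key B gives the same result as encrypting with key B and then key A.

❖ Every party encrypts its data and then sends it to another party.

❖ The next party encrypts the already-encrypted data with its own key.

❖ After every party's data has been encrypted by all parties, identical items produce identical ciphertexts, so only the fully encrypted values need to be shared.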

❖ Count the duplicates and you know the size of the intersection.
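
The counting step can be sketched as follows, assuming an exponentiation cipher x^k mod P with the public prime P = 2^127 - 1 and items hashed into that group with SHA-256. The names `keygen`, `to_group` and `intersection_size` are hypothetical, and all parties are simulated in one process:

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # public prime modulus (an assumption for this sketch)


def keygen():
    """Secret exponent k with gcd(k, P - 1) == 1, so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def to_group(item):
    """Hypothetical encoding: hash an item into the range [1, P - 1]."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def intersection_size(site_sets):
    keys = [keygen() for _ in site_sets]  # each party keeps its key secret
    encrypted_sets = []
    for items in site_sets:
        encrypted = set()
        for item in items:
            c = to_group(item)
            for k in keys:  # every party encrypts once; order does not matter
                c = pow(c, k, P)
            encrypted.add(c)
        encrypted_sets.append(encrypted)
    # An item held by every site yields the same ciphertext in every set,
    # so counting common ciphertexts gives the intersection size.
    return len(set.intersection(*encrypted_sets))


print(intersection_size([{"a", "b", "c"}, {"b", "c", "d"}]))  # 2
```

Each party also sees how many ciphertexts the others contributed, which is exactly the size leak noted under Things to Note.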

Scalar Product

❖ Goal: compute the sum of x_i * y_i over all i, where one database holds the vector x and the other holds y.

❖ Disguise the elements with linear combinations of random numbers, then algebraically remove those random terms once the result comes back.
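
A sketch of one linear-combination scheme, assuming integer vectors, a random coefficient matrix C that both sites know, and a secret random vector R held by site A. It relies on the identity (x + C R) · y = x · y + R · (Cᵀ y). The names `rand_int` and `scalar_product` and the value ranges are illustrative assumptions; this is not a vetted protocol:

```python
import secrets


def rand_int(bound):
    """Uniform integer in [-bound, bound]."""
    return secrets.randbelow(2 * bound + 1) - bound


def scalar_product(x, y, r=None):
    """Simulate a linear-combination scalar product between site A (holds x)
    and site B (holds y). Assumes integer vectors and r < len(x) random values."""
    n = len(x)
    r = r if r is not None else max(1, n // 2)
    # A: random coefficient matrix C (n x r), shared with B, and a secret
    # random vector R of length r.
    C = [[rand_int(100) for _ in range(r)] for _ in range(n)]
    R = [rand_int(10**6) for _ in range(r)]
    # A sends the disguised vector x' = x + C R.
    x_masked = [x[i] + sum(C[i][j] * R[j] for j in range(r)) for i in range(n)]
    # B returns S = x' . y together with the r combinations C^T y.
    s = sum(a * b for a, b in zip(x_masked, y))
    ct_y = [sum(C[i][j] * y[i] for i in range(n)) for j in range(r)]
    # Since x' . y = x . y + R . (C^T y), A removes the random terms.
    return s - sum(R[j] * ct_y[j] for j in range(r))


print(scalar_product([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```

Site A learns only r linear combinations of y, which is why r is kept smaller than the vector length; with r close to n, A could solve for y.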

Association Rules in Horizontally Partitioned Data

❖ Candidate Set Generation

❖ Local Pruning

❖ Itemset Exchange (uses Secure Set Union)

❖ Support Count Exchange
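
The support count exchange can be done without revealing any site's counts. One common approach, sketched here as an assumption rather than taken from the slides, has each site compute its excess support (local count minus min_support × local database size) and combine these with a secure sum; the itemset is globally frequent exactly when the total is non-negative. The helper names and the modulus 2^64 are hypothetical, and the true total must lie within ±2^63:

```python
import secrets
from fractions import Fraction


def secure_sum(values, modulus):
    """Ring-based secure sum, simulated in one process."""
    r = secrets.randbelow(modulus)  # site 1's random mask
    running = r
    for v in values:  # each site adds its (possibly negative) value mod m
        running = (running + v) % modulus
    return (running - r) % modulus  # site 1 removes the mask


def globally_frequent(local_counts, local_sizes, min_support, modulus=2**64):
    """True if sum(counts) >= min_support * sum(sizes) across all sites.

    min_support should be exact, e.g. Fraction(1, 10) or the string "0.1".
    """
    s = Fraction(min_support)
    # Excess support per site, scaled by the denominator to stay an integer.
    excess = [count * s.denominator - s.numerator * size
              for count, size in zip(local_counts, local_sizes)]
    total = secure_sum(excess, modulus)
    # Values in the upper half of the ring represent negative totals.
    if total >= modulus // 2:
        total -= modulus
    return total >= 0


# 35 of 200 transactions overall (17.5%) contain the itemset.
print(globally_frequent([30, 5], [100, 100], "0.15"))  # True
print(globally_frequent([30, 5], [100, 100], "0.2"))   # False
```

In this simulation site 1 still sees the exact total excess; hiding it too would need an additional secure comparison, which is not shown.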

Association Rules in Vertically Partitioned Data

❖ Uses the scalar product to determine whether the count of an itemset exceeds a threshold.

❖ If the count is above the threshold, you have determined that the database is worth querying.

❖ Can also use Secure Size of Set Intersection to see how much the sites have in common.

❖ Useful within algorithms such as Apriori.
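
With vertically partitioned data each site holds different item columns for the same transactions, so the support of an itemset reduces to a scalar product of 0/1 vectors. A minimal sketch with hypothetical column names and helpers (`local_indicator`, `itemset_support`); the final dot product is computed in the clear here and is the step a secure scalar-product protocol would replace:

```python
def local_indicator(columns, items):
    """1 for each transaction that contains every item in `items` (this site's columns only)."""
    n = len(next(iter(columns.values())))
    return [int(all(columns[item][t] for item in items)) for t in range(n)]


def itemset_support(site_a_columns, items_a, site_b_columns, items_b):
    x = local_indicator(site_a_columns, items_a)  # stays at site A in the real protocol
    y = local_indicator(site_b_columns, items_b)  # stays at site B in the real protocol
    # Computed in the clear for illustration; a secure scalar product goes here.
    return sum(xi * yi for xi, yi in zip(x, y))


site_a = {"bread": [1, 1, 0, 1], "milk": [1, 0, 0, 1]}  # hypothetical data
site_b = {"eggs": [1, 1, 1, 0]}
support = itemset_support(site_a, ["bread", "milk"], site_b, ["eggs"])
print(support, support >= 2)  # 1 False
```

Comparing the returned count against a minimum support is the threshold test an Apriori-style algorithm would perform for each candidate itemset.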

EM Clustering

❖ Uses secure sum to combine per-site values into global totals across all participating sites.

❖ Once the global sums are computed, they can be used in the expectation-maximization (EM) method to build statistical models.
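
The reason secure sum is enough is that the EM update for a Gaussian mixture depends only on sums of per-point quantities. A sketch for a 1-D mixture over horizontally partitioned data, assuming each site computes local sums of responsibilities r, r·x and r·x² and only those sums are combined; the helper names and data are hypothetical, and the plain additions mark where secure sum would be used (a real deployment would also need to encode these real numbers as integers, for example in fixed point):

```python
import math


def gaussian(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def local_stats(xs, weights, means, variances):
    """E-step at one site: per-component sums of r, r*x and r*x^2."""
    k = len(means)
    n_k, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in xs:
        dens = [weights[j] * gaussian(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for x
            n_k[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return n_k, s1, s2


def em_step(sites, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture over horizontally split data."""
    k = len(means)
    N, S1, S2 = [0.0] * k, [0.0] * k, [0.0] * k
    for xs in sites:
        n_k, s1, s2 = local_stats(xs, weights, means, variances)
        for j in range(k):  # these global sums are what secure sum would compute
            N[j] += n_k[j]
            S1[j] += s1[j]
            S2[j] += s2[j]
    # M-step: new parameters depend only on the global sums.
    new_weights = [N[j] / sum(N) for j in range(k)]
    new_means = [S1[j] / N[j] for j in range(k)]
    new_vars = [max(S2[j] / N[j] - new_means[j] ** 2, 1e-6) for j in range(k)]
    return new_weights, new_means, new_vars


sites = [[1.0, 1.2, 0.8], [5.0, 5.1], [4.9, 1.1]]  # hypothetical data at 3 sites
params = ([0.5, 0.5], [0.0, 6.0], [1.0, 1.0])
for _ in range(20):
    params = em_step(sites, *params)
print([round(m, 2) for m in params[1]])
```

Splitting the same points across sites gives the same update as pooling them, up to floating-point rounding, which is what lets the sites avoid sharing raw records.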

Things to Note

❖ These algorithms are not fully private; some information is revealed in the process.

❖ For example, in the set intersection protocol, sites can potentially learn the size of each other's database.

❖ Pick the algorithms appropriate to what you need to accomplish.

❖ Watch out for intermediate information being leaked!

Thank you