DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7....

31
Omar Abdel Wahab, Concordia University Moulay Omar Hachami, Concordia University Arslan Zaffari, Concordia University Mery Vivas, Concordia University Gaby G. Dagher, Concordia University DARM: A Privacy-preserving Approach for Distributed Association Rules Mining on Horizontally-partitioned Data Presenter: Gaby Dagher 1

Transcript of DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7....

Page 1: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Omar Abdel Wahab, Concordia University

Moulay Omar Hachami, Concordia University

Arslan Zaffari, Concordia University

Mery Vivas, Concordia University

Gaby G. Dagher, Concordia University

DARM: A Privacy-preserving Approach for Distributed Association Rules Mining on Horizontally-partitioned Data

Presenter: Gaby Dagher

1

Page 2: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Outline

Introduction

Problem Definition

Literature Review

Performance Evaluation

2

6

1

3

5

2

Conclusions

Proposed Solution4

Page 3: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Outline

Introduction

Problem Definition

Literature Review

Performance Evaluation

2

6

1

3

5

3

Conclusions

Proposed Solution4

Page 4: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Introduction

Motivation:

Rapid evolution of data collection and storage technologies

Extracting knowledge and hidden patterns from stored data has become a major necessity for individuals, companies, and government agencies.

Applying data mining techniques to extract information is considered a challenge when the data is distributed over multiple owners

Each data owner is concerned about the privacy of individuals in his data.

4

Page 5: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

IntroductionMotivating Scenario

5

Page 6: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Introduction

Challenges:

Data Privacy One data provider should not learn sensitive information about the data of

other providers.

Data Utility The generated rules should satisfy the data consumer’s request and needs.

Protection against Inference Attacks Prevent the data consumer from inferring sensitive information about the

individuals involved in the database.

6

Page 7: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

IntroductionContributions

Contribution #1: Propose a comprehensive privacy-preserving approach for answering association rules queries in a distributed environment

Contribution #2: Protect all providers against inference attacks from data consumers by guaranteeing that the returned association rules satisfy ε-differential privacy.

Contribution #3: Preserve the privacy of the mined data by preventing each data provider from learning sensitive information about other data providers during the mining process.

Contribution #4: Protect the confidentiality of the data consumer’s query against the data providers.

Contribution #5: We conduct performance evaluation on real-life data, and show that that our approach is both scalable and efficient.

7

Page 8: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Outline

Introduction

Problem Definition

Literature Review

Performance Evaluation

2

6

1

3

5

8

Conclusions

Proposed Solution4

Page 9: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Literature Review

9

Association Rules Mining [1], [2], [3], [4], [5], [6], [7]:

Summary: Study the problem of mining association rules in distributed and parallelmanners, where the data is partitioned across several nodes.

Limitations: these approaches were mostly interested in increasing the efficiency ofthe mining process, while ignoring the privacy concerns that may arise from buildinga global mining model.

Page 10: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Literature Review

10

Privacy in Distributed Mining Models [8], [9], [10], [11], [12]:

Summary: Consider the privacy concerns that may arise from mining the data globally.

Limitations: rely on encryption to achieve privacy between data providers. However,a recent study shows that most encryption schemes are insufficient to guaranteedata privacy and confidentiality, as the protocol on which they are based, namelyprecise query protocol (PQP), is vulnerable to attribute values inference.

Page 11: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Literature Review

11

Privacy-preserving Data Mashup [13], [14], [15], [16]:

Summary: Preserve the privacy of the data in a data mashup scenario.

Limitations: In contrary to our model which considers privacy-preserving data mining(PPDM), these approaches are designed to support privacy-preserving datapublishing (PPDP) since they assume that the data itself will be shared among thedifferent parties.

Page 12: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Outline

Introduction

Problem Definition

Literature Review

Performance Evaluation

2

6

1

3

5

12

Conclusions

Proposed Solution4

Page 13: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Problem Definition

13

System Inputs:

(1) Association Rules Queries: To obtain the set of strong association rules R from the distributed data, the data consumer submits a query request q to the master miner in which he specifies the minimum support threshold γ, the minimum confidence threshold α, and a set of predicates P.

(2) ε-differentially Private Data: We assume that the data is horizontally partitioned into sub-tables each of which is hosted by one data provider.

Each data provider owns the same type of attribute information on different set of individuals.

Page 14: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Problem Definition

14

• Adversary ModelSemi-honest, where each party is expected to follow the protocol correctly; however, it iscurious and might try to infer sensitive information about the other parties.

• Problem StatementGiven relational data D that is horizontally partitioned into n partitions, the objective is to design aprivacy-preserving model for answering association rules queries in a distributed environment. Themodel must achieve three objectives: (1) to prevent each data provider from learning sensitiveinformation about other data providers during the mining process, (2) to protect all providersagainst inference attacks from the data consumers, and (3) to preserve the confidentiality of eachdata consumer’s query against the data providers.

Page 15: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Outline

Introduction

Problem Definition

Literature Review

Performance Evaluation

2

6

1

3

5

15

Conclusions

Proposed Solution4

Page 16: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Proposed Solution

• Step 1 - Data Anonymization

• Step 2 - Frequent Itemsets Generation

• Step 3 - Association Rules Generation

16

Page 17: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Proposed Solution

Step1: Data Anonymization:

In this step, the data providers use the ε-differential privacy algorithm, called DiffGen, to anonymize their data and provide protection against linkage and inference attacks.

Using DiffGen, the data owner makes sure that the regenerated data table provides privacy guarantee while being insensitive to any specific record.

The data anonymization process can be divided into three main parts: (1) Selecting a candidate attribute for specialization(2) Determining the split value parameter(3) Publishing the noisy counts

17

Page 18: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Proposed Solution

18

Page 19: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Proposed Solution

Step 2: Frequent Itemsets Generation:

The master miner receives the data consumer’s query

The master miner requests the support counts of all the attributes the data consumer is interested in from the different data providers

The master miner generates all the possible frequent itemsets of different lengths subject to the minimum support threshold γ specified in the query.

19

Page 20: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Proposed Solution

20

Page 21: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Proposed Solution

Step 3 - Association Rules Generation:

Now that the frequent itemsets are known, the master miner generates all the possiblecombinations of the k-length (k > 1) frequent itemsets that may constitute association rules.

The master miner then sends these combinations to the data providers which separately calculateand send back the support counts of these combinations

The master miner computes the confidence of each association rule based on the feedback from the data providers.

For each association rule, if its confidence exceeds the minimum confidence threshold αspecified by the data consumer, then the rule is considered a useful rule.

Finally, the master miner returns to the data consumer the set of all useful association rules.

21

Page 22: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Proposed Solution

22

Page 23: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Outline

Introduction

Problem Definition

Literature Review

Performance Evaluation

2

6

1

3

5

23

Conclusions

Proposed Solution4

Page 24: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Performance EvaluationEfficiency

24

Page 25: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Performance EvaluationScalability

25

Page 26: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Performance EvaluationEfficiency w.r.t. nSpecializations

26

Page 27: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Outline

Introduction

Problem Definition

Literature Review

Performance Evaluation

2

6

1

3

5

27

Conclusions

Proposed Solution4

Page 28: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Conclusions

28

In this paper, we propose a comprehensive privacy-preserving approach for answering association rules queries in a distributed environment, with the goal of preserving both data privacy and query confidentiality.

The proposed approach (1) protects all providers against inference attacks from data consumers by guaranteeing that the returned association rules to the data consumer satisfy ε-differential privacy, (2) preserves the privacy of the mined data by preventing each data provider from learning sensitive information about other data providers during the mining process, and (3) protects the confidentiality of the data consumer’s query against the data providers such that the master miner is able to mine the association rules without revealing the query to the data providers.

Page 29: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

References (1)

1. R. Agrawal and J. Shafer, “Parallel mining of association rules,” IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 962-969, 1996.

2. D. W.-L. Cheung, J. Han, V. T. Y. Ng, A. W.-C. Fu, and Y. Fu, “A fast distributed algorithm for mining association rules,” in Proceedings of the fourth international conference on on Parallel and distributed information systems, 1996, pp. 31-43.

3. D. W. Cheung, V. T. Ng, and a. W. Fu, “Efficient mining of association rules in distributed databases,” IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 911-922, 1996.

4. J. Park, M. Chen, and P. Yu, “Efficient parallel data mining for association rules,” in Proceedings of the fourth international conference on Information and knowledge management, 1995, pp. 31-36.

5. M. Z. Ashrafi, D. Taniar, and K. Smith, “ODAM: an optimized distributed association rule mining algorithm,” IEEE Distrib. Syst. Online, vol. 5, no. 3, pp. 1-18, Mar. 2004.

6. A. Anitha and G. R. Suhanantham, “An Efficient Association Rule Mining Model for Distributed Databases,” Int. J. Comput. Sci. Technol., vol. 3, no. 1, pp. 794-797, 2012.

7. J. A. Renjit, “Mning the Data from Distributed Database using an Improved Mning Algorithm,” Int. J. Comput. Sci. Inf. Secur., vol. 7, no. 3, pp. 116-121, 2010.

8. N. Zhang, M. Li, andW. Lou, “Distributed Data Mining with Differential Privacy,” in 2011 IEEE International Conference on Communications (ICC), 2011, pp. 1-5.

Page 30: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

References (2)

9. C. Clifton and M. Kantarcioglu, “Privacy-preserving Distributed Mining of Association Rules on Horizontally Partitioned Data,” IEEE Trans. Knowl. Data Eng., vol. 16, no. 9, pp. 1026-1037, 2004.

10. J. Vaidya and C. Clifton, “Privacy preserving association rule mining in vertically partitioned data,” in The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, pp. 639- 644.

11. W. Wong, D. Cheung, and E. Hung, “Security in outsourcing of association rule mining,” in Proceedings of the 33rd international conference on Very large data bases, 2007, pp. 111-122.

12. F. Giannotti, L. V. S. Lakshmanan, A. Monreale, D. Pedreschi, and H. Wang, “Privacy-Preserving Mining of Association Rules From Outsourced Transaction Databases,” IEEE Syst. J., vol. 7, no. 3, pp. 385-395, Sep. 2013.

13. T. Trojer, B. C. M. Fung, and P. C. K. Hung, “Service-Oriented Architecture for Privacy-Preserving Data Mashup,” in IEEE International Conference on Web Services, 2009, pp. 767-774.

14. P. Gurunathan, N. Ishwarya, V. Sridevi, C. Nandhini, and S. Deepalakshmi, “High-Dimensional Confidential Data Mash up using Service-Oriented Architecture,” Int. J. Emerg. Sci. Eng., vol. 1, no. 6, pp. 48-51, 2013.

15. S. A. Chun, J. Warner, and A. D. Keromytis, “Privacy policy-driven mashups,” Int. J. Bus. Contin. Risk Manag., vol. 4, no. 4, pp. 344-370, 2013.

16. N. Mohammed, B. C. M. Fung, K. Wang, and P. C. K. Hung, “Privacy-preserving data mashup,” in Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT 09, 2009, pp. 228-239.

Page 31: DARM: A Privacy-preserving Approach for Distributed Association Rules Mining … · 2015. 7. 28. · Introduction Contributions Contribution #1: Propose a comprehensive privacy-preserving

Thank You…