When Does Randomization Fail to Protect Privacy?

22
1 When Does Randomization Fail to Protect Privacy? Wenliang (Kevin) Du epartment of EECS, Syracuse University

description

When Does Randomization Fail to Protect Privacy?. Wenliang (Kevin) Du Department of EECS, Syracuse University. Random Perturbation. Agrawal and Srikant’s SIGMOD paper. Y = X + R. Original Data X. Random Noise R. Disguised Data Y. +. Random Perturbation. - PowerPoint PPT Presentation

Transcript of When Does Randomization Fail to Protect Privacy?

Page 1: When Does Randomization Fail to Protect Privacy?

1

When Does Randomization Fail to Protect Privacy?

Wenliang (Kevin) Du

Department of EECS, Syracuse University

Page 2: When Does Randomization Fail to Protect Privacy?

2

Random Perturbation Agrawal and Srikant’s SIGMOD

paper. Y = X + R

+

Original Data X Random Noise R Disguised Data Y

Page 3: When Does Randomization Fail to Protect Privacy?

3

Random Perturbation Most of the security analysis

methods based on randomization treat each attribute separately.

Is that enough? Does the relationship among data

affect privacy?

Page 4: When Does Randomization Fail to Protect Privacy?

4

As we all know … We can’t perturb the same number for

several times. If we do that, we can estimate the

original data: Let t be the original data, Disguised data: t + R1, t + R2, …, t + Rm

Let Z = [(t+R1)+ … + (t+Rm)] / m Mean: E(Z) = t Variance: Var(Z) = Var(R) / m

Page 5: When Does Randomization Fail to Protect Privacy?

5

This looks familiar … This is the data set (x, x, x, x, x, x, x,

x) Random Perturbation:

(x+r1, x+r2,……, x+rm)

We know this is NOT safe.

Observation: the data set is highly correlated.

Page 6: When Does Randomization Fail to Protect Privacy?

6

Let’s Generalize! Data set: (x1, x2, x3, ……, xm) If the correlation among data

attributes are high, can we use that to improve our estimation (from the disguised data)?

Page 7: When Does Randomization Fail to Protect Privacy?

7

Introduction

A heuristic approach toward privacy analysis

Principal Component Analysis (PCA)

PCA-based data reconstruction Experiment results Conclusion and future work

Page 8: When Does Randomization Fail to Protect Privacy?

8

Privacy Quantification: A Heuristic Approach

Our goal: to find a best-effort algorithm that

reconstructs the original data, based on the available information.

Definition

n

i

m

jjijiF DDL

mnPM

1 1,

*, ),(

1

Page 9: When Does Randomization Fail to Protect Privacy?

9

How to use the correlation? High Correlation Data

Redundancy Data Redundancy Compression Our goal: Lossy compression:

We do want to lose information, but We don’t want to lose too much data, We do want to lose the added noise.

Page 10: When Does Randomization Fail to Protect Privacy?

10

PCA Introduction

The main use of PCA: reduce the dimensionality while retaining as much information as possible.

1st PC: containing the greatest amount of variation.

2nd PC: containing the next largest amount of variation.

Page 11: When Does Randomization Fail to Protect Privacy?

11

Original Data

Page 12: When Does Randomization Fail to Protect Privacy?

12

After Dimension Reduction

Page 13: When Does Randomization Fail to Protect Privacy?

13

For the Original Data They are correlated. If we remove 50% of the

dimensions, the actual information loss might be less than 10%.

Page 14: When Does Randomization Fail to Protect Privacy?

14

For the Random Noises They are not correlated. Their variance is evenly distributed

to any direction. If we remove 50% of the

dimensions, the actual noise loss should be 50%.

Page 15: When Does Randomization Fail to Protect Privacy?

15

Data Reconstruction Applying PCA

Find Principle Components: C = Q QT

Set to be the first p columns of Q. Reconstruct the data:

Q

TTT

T

QQQQQQ)(

Q Q Y X

RXRX

Page 16: When Does Randomization Fail to Protect Privacy?

16

Random Noise R How does affect accuracy? Theorem:

,)()(m

pRVarQQRVar T

TQQR

Page 17: When Does Randomization Fail to Protect Privacy?

17

How to Conduct PCA on Disguised Data?

Estimating Covariance Matrix

jifor ),,(

jifor ,),(

),(),(

2

ji

ji

jjiiji

XXCov

XXCov

RXRXCovYYCov

Page 18: When Does Randomization Fail to Protect Privacy?

18

Experiment 1: Increasing the Number of Attributes

Normal Distribution Uniform Distribution

Page 19: When Does Randomization Fail to Protect Privacy?

19

Experiment 2: Increasing the number of Principal Components

Normal Distribution Uniform Distribution

Page 20: When Does Randomization Fail to Protect Privacy?

20

Experiment 3: Increasing Standard Deviation of Noises

Normal Distribution Uniform Distribution

Page 21: When Does Randomization Fail to Protect Privacy?

21

Conclusions

Privacy analysis based on individual attributes is not sufficient. Correlation can disclose information.

PCA can filter out some randomness from a highly correlated data set.

When does randomization fail: Answer: when the data correlation is high. Can it be cured?

Page 22: When Does Randomization Fail to Protect Privacy?

22

Future Work How to improve the randomization

to reduce the information disclosure? Making random noises correlated?

How to combine the PCA with the univariate data reconstruction?