k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure

Center for Secure Information Systems, Concordia Institute for Information Systems Engineering. Wen Ming Liu (1), Lingyu Wang (1), and Lei Zhang (2); (1) Concordia University, (2) George Mason University. ICDT 2010, CIISE / CSIS, March 23, 2010.


Page 1: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Center for Secure Information Systems

Concordia Institute for Information Systems Engineering

k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure

Wen Ming Liu (1), Lingyu Wang (1), and Lei Zhang (2)

(1) Concordia University, (2) George Mason University

ICDT 2010

CIISE / CSIS, March 23, 2010

Page 2: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Agenda


Background

K-Jump Strategy

Data Utility Comparison

Conclusion

Page 3: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Agenda

Background
  Example
  Algorithms a_naive and a_safe
K-Jump Strategy
Data Utility Comparison
Conclusion

Page 4: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Example

Data Holder’s View

Page 5: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Example – Data Holder’s View

Name: identifier. DoB: quasi-identifier. Condition: sensitive attribute.

Micro-Data Table t0

Name     DoB   Condition
Alice    1990  flu
Bob      1985  cold
Charlie  1974  cancer
David    1962  cancer
Eve      1953  headache
Fen      1941  toothache

Goal: release a generalization of t0 that satisfies 2-diversity.

Generalization algorithm: consider generalization function g1 and then g2, in order.
Generalization function g1(): DoB ranges 1980~1999, 1960~1979, 1940~1959
Generalization function g2(): DoB ranges 1970~1999, 1940~1969

Generalization g1(t0): 2-diversity? No (the 1960~1979 group contains cancer twice).

DoB        Condition
1980~1999  flu, cold
1960~1979  cancer, cancer
1940~1959  headache, toothache

Generalization g2(t0): 2-diversity? Yes. Released!

DoB        Condition
1970~1999  flu, cold, cancer
1940~1969  cancer, headache, toothache
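The data holder's side of the example can be replayed in a few lines. This is a minimal sketch, not the authors' code: it assumes 2-diversity means that no condition covers more than half of a group (the "highest ratio no greater than 1/2" reading used in the later constructions), and the names `generalize`, `is_2_diverse`, and `naive_release` are hypothetical.

```python
from collections import Counter

# Micro-data table t0 from the example: (name, year of birth, condition).
T0 = [
    ("Alice", 1990, "flu"),
    ("Bob", 1985, "cold"),
    ("Charlie", 1974, "cancer"),
    ("David", 1962, "cancer"),
    ("Eve", 1953, "headache"),
    ("Fen", 1941, "toothache"),
]

# Generalization functions, written as inclusive DoB ranges.
G1 = [(1980, 1999), (1960, 1979), (1940, 1959)]
G2 = [(1970, 1999), (1940, 1969)]


def generalize(table, ranges):
    """Map each DoB range to the list of conditions of the records inside it."""
    groups = {r: [] for r in ranges}
    for _name, dob, condition in table:
        for lo, hi in ranges:
            if lo <= dob <= hi:
                groups[(lo, hi)].append(condition)
                break
    return groups


def is_2_diverse(groups):
    """Assumed reading: no condition covers more than half of any non-empty group."""
    return all(
        max(Counter(values).values()) * 2 <= len(values)
        for values in groups.values()
        if values
    )


def naive_release(table, functions):
    """Release the first generalization that passes the check (1-based index)."""
    for i, ranges in enumerate(functions, start=1):
        groups = generalize(table, ranges)
        if is_2_diverse(groups):
            return i, groups
    return None  # nothing is released


index, released = naive_release(T0, [G1, G2])
print(index)     # 2
print(released)  # {(1970, 1999): ['flu', 'cold', 'cancer'], (1940, 1969): ['cancer', 'headache', 'toothache']}
```

The loop judges each generalized table in isolation; the rest of the example shows why that is not enough once the adversary knows the algorithm.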

Page 6: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Example (cont.)

Adversary’s View

Page 7: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Example (cont.) – Adversary’s View

Name: identifier. DoB: quasi-identifier. Condition: sensitive attribute.

Attacker knows: the generalization, the public knowledge, and the privacy property.
Goal: guess the micro-data table.

Public Knowledge

Name     DoB
Alice    1990
Bob      1985
Charlie  1974
David    1962
Eve      1953
Fen      1941

Released Generalization g2(t0)

DoB        Condition
1970~1999  flu, cold, cancer
1940~1969  cancer, headache, toothache

Unknown Micro-Data Table t0: the Condition of every person (Alice, Bob, Charlie, David, Eve, Fen) is unknown (???).

What can the adversary infer? The three persons in each group may have the three conditions in any order, so there are 3! × 3! = 36 candidate tables, the permutation set t1, …, t36 (col = cold, can = cancer, hac = headache, tac = toothache):

     t1   t2   t3   t4   …  t35  t36
A    flu  flu  col  col  …  can  can
B    col  can  flu  can  …  flu  col
C    can  col  can  flu  …  col  flu
D    can  can  can  can  …  tac  tac
E    hac  hac  hac  hac  …  hac  hac
F    tac  tac  tac  tac  …  can  can
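The permutation set can be enumerated mechanically. The sketch below (function names are hypothetical, not from the slides) builds every table consistent with g2(t0) and then computes, for each person and condition, the fraction of candidate tables that associate them; we assume this fraction is the "ratio" that the privacy property in the later constructions refers to.

```python
from fractions import Fraction
from itertools import permutations, product

# Released generalization g2(t0): the persons of each DoB group and the
# multiset of their conditions (the order inside a group is hidden).
GROUPS = [
    (("Alice", "Bob", "Charlie"), ("flu", "cold", "cancer")),
    (("David", "Eve", "Fen"), ("cancer", "headache", "toothache")),
]


def permutation_set(groups):
    """Every table consistent with the generalization: any order inside each group."""
    per_group = [
        [dict(zip(names, order)) for order in set(permutations(values))]
        for names, values in groups
    ]
    tables = []
    for combo in product(*per_group):
        table = {}
        for part in combo:
            table.update(part)
        tables.append(table)
    return tables


def highest_ratio(tables):
    """Largest fraction of candidate tables in which a person has a given condition."""
    best = Fraction(0)
    for person in tables[0]:
        for value in {t[person] for t in tables}:
            count = sum(1 for t in tables if t[person] == value)
            best = max(best, Fraction(count, len(tables)))
    return best


per = permutation_set(GROUPS)
print(len(per))            # 36
print(highest_ratio(per))  # 1/3
```

A highest ratio of 1/3 is why a check performed on the permutation set alone accepts g2(t0).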

Page 8: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Example (cont.)

If the released generalization were the adversary’s only knowledge, the permutation set t1, …, t36 would be his or her best guess of the micro-data table. However …

Page 9: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Example (cont.) – Adversary Simulating the Algorithm

However, the adversary also knows the generalization algorithm and can simulate it to exclude further invalid guesses.

Page 10: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Example (cont.) – Adversary Simulating the Algorithm

Released Generalization g2(t0)

DoB        Condition
1970~1999  flu, cold, cancer
1940~1969  cancer, headache, toothache

Is a table ti from the permutation set a valid guess of the micro-data table? The adversary checks it by simulating the algorithm (his or her mental image of it):

1. Take a possible table ti, which assigns a condition to each of Alice (1990), Bob (1985), Charlie (1974), David (1962), Eve (1953), and Fen (1941).
2. Compute g1(ti), with DoB ranges 1980~1999, 1960~1979, and 1940~1959. For the real table t0, g1 was checked but not used.
3. If g1(ti) satisfies the privacy property, the algorithm would have released g1(ti) rather than g2(ti), so ti cannot be the micro-data table and is excluded (for example, t2 puts flu and cancer in the 1980~1999 group and cold and cancer in the 1960~1979 group, so g1(t2) satisfies privacy). If g1(ti) violates the privacy property, ti remains a valid guess.

Only four tables of the permutation set survive; they form the disclosure set:

     t1         t3         t7         t9
A    flu        cold       flu        cold
B    cold       flu        cold       flu
C    cancer     cancer     cancer     cancer
D    cancer     cancer     cancer     cancer
E    headache   headache   toothache  toothache
F    toothache  toothache  headache   headache

In all four tables Charlie and David have cancer, so the adversary learns their condition, even though the released g2(t0) satisfies 2-diversity.
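The adversary's simulation can be replayed by brute force. A minimal sketch under stated assumptions: tables are tuples of conditions in the order Alice, Bob, Charlie, David, Eve, Fen; each generalization is represented only by how it groups record positions; and the privacy check is the "no condition above half of a group" reading of 2-diversity. `satisfies` and `permutation_set` are hypothetical helper names.

```python
from collections import Counter
from itertools import permutations, product

NAMES = ["Alice", "Bob", "Charlie", "David", "Eve", "Fen"]
# Record positions grouped by each generalization function.
G1_GROUPS = [(0, 1), (2, 3), (4, 5)]  # 1980~1999, 1960~1979, 1940~1959
G2_GROUPS = [(0, 1, 2), (3, 4, 5)]    # 1970~1999, 1940~1969


def satisfies(table, groups):
    """True if no condition covers more than half of any group."""
    for group in groups:
        counts = Counter(table[i] for i in group)
        if max(counts.values()) * 2 > len(group):
            return False
    return True


def permutation_set(table, groups):
    """All tables obtained by reordering the conditions inside each group."""
    choices = [set(permutations([table[i] for i in g])) for g in groups]
    result = set()
    for combo in product(*choices):
        candidate = list(table)
        for group, values in zip(groups, combo):
            for i, value in zip(group, values):
                candidate[i] = value
        result.add(tuple(candidate))
    return result


t0 = ("flu", "cold", "cancer", "cancer", "headache", "toothache")
per2 = permutation_set(t0, G2_GROUPS)

# Simulate the algorithm: a candidate whose g1 generalization passes the check
# would have been released as g1, so it cannot explain the observed g2 release.
ds2 = {t for t in per2 if not satisfies(t, G1_GROUPS)}

print(len(per2), len(ds2))  # 36 4
certain = [NAMES[i] for i in range(len(NAMES)) if all(t[i] == "cancer" for t in ds2)]
print(certain)  # ['Charlie', 'David']
```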

Page 11: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Decision Process of Safe and Unsafe Algorithms

Both algorithms consider the generalization functions g1, g2, …, gn in order on the micro-data table t0. In the ith iteration (box), each evaluates the privacy property (diamond) on a set of tables derived from gi(t0); on Y it releases gi(t0), on N it continues with the next function along the evaluation path:

a_naive: evaluates the privacy property on the permutation set per_i of gi(t0).
a_safe: evaluates the privacy property on the disclosure set ds_i of gi(t0).

per: permutation set; ds: disclosure set.

Most existing generalization algorithms (which do not consider this problem) evaluate the permutation set, i.e., the adversary’s mental image of the micro-data table without knowledge of the algorithm.

Safe generalization algorithms (Zhang’07ccs, …) evaluate the disclosure set instead, i.e., the adversary’s mental image of the micro-data table after simulating the algorithm.
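To make the difference between the two decision processes concrete, here is a brute-force sketch of both on the toy example. It is an illustration under our own assumptions, not the authors' implementation: generalizations are given only by their groupings of record positions, p is read as "no person has one condition in more than half of the candidate tables", and the disclosure set of gi(t) is taken to be the tables of per(gi(t)) on which a_safe itself would not have released an earlier function. All function names are hypothetical, and the cost grows exponentially.

```python
from collections import Counter
from functools import lru_cache
from itertools import permutations, product

# Record positions (Alice..Fen) grouped by g1 and g2 from the example.
FUNCTIONS = (
    ((0, 1), (2, 3), (4, 5)),  # g1: 1980~1999, 1960~1979, 1940~1959
    ((0, 1, 2), (3, 4, 5)),    # g2: 1970~1999, 1940~1969
)
T0 = ("flu", "cold", "cancer", "cancer", "headache", "toothache")


@lru_cache(maxsize=None)
def per(t, i):
    """Permutation set of g_i(t): every reordering of values inside each group."""
    groups = FUNCTIONS[i]
    choices = [set(permutations([t[j] for j in g])) for g in groups]
    tables = set()
    for combo in product(*choices):
        new = list(t)
        for group, values in zip(groups, combo):
            for j, value in zip(group, values):
                new[j] = value
        tables.add(tuple(new))
    return frozenset(tables)


def p(tables):
    """Privacy property: no person has one value in more than half of the tables."""
    n = len(tables)
    return all(
        max(Counter(t[j] for t in tables).values()) * 2 <= n
        for j in range(len(T0))
    )


def a_naive(t):
    """Release the first g_i whose permutation set passes p (1-based), else None."""
    for i in range(len(FUNCTIONS)):
        if p(per(t, i)):
            return i + 1
    return None


@lru_cache(maxsize=None)
def ds(t, i):
    """Disclosure set of g_i(t) under a_safe: candidates that reach iteration i."""
    return frozenset(s for s in per(t, i) if reaches(s, i))


def reaches(s, i):
    """True if a_safe, run on s, releases nothing in iterations 0..i-1."""
    return not any(p(ds(s, j)) for j in range(i))


def a_safe(t):
    """Release the first g_i whose disclosure set passes p (1-based), else None."""
    for i in range(len(FUNCTIONS)):
        if p(ds(t, i)):
            return i + 1
    return None


print(a_naive(T0))     # 2
print(a_safe(T0))      # None
print(len(ds(T0, 1)))  # 4
```

On this instance a_naive releases g2(t0), while a_safe, which evaluates the same property on the disclosure set, releases nothing.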

Page 12: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Agenda

Background
K-Jump Strategy
  The Algorithm Family a_jump(k)
  Properties of a_jump(k)
Data Utility Comparison
Conclusion

Page 13: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

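The Algorithm Family a_jump(k)

a_jump(k) considers g1, g2, …, gn in order on t0. In iteration i it first evaluates the privacy property on the permutation set per_i; on N it moves on to g_{i+1}. On Y it evaluates the property on the disclosure set ds_i: on Y it releases gi(t0), and on N it jumps to g_{i+k}. For example, a failed check of ds2 leads directly from g2 to g_{2+k}.

naive strategy: evaluate the privacy property on the permutation set only (efficient but unsafe).
safe strategy: evaluate the privacy property on the disclosure set directly (safe but costly).
k-jump strategy: penalize a failed check by jumping over the next k-1 iterations.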
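The jump rule fits into the same kind of brute-force simulation. This is a hedged sketch, not the authors' code. It reads p as "no person has one condition in more than half of the candidate tables", adds a hypothetical third generalization g3 that puts all six records in one group, and represents the jump distances as a tuple k with one entry per iteration (0-based here). The disclosure set of gi(t) is taken to be the tables of per(gi(t)) whose own evaluation path under a_jump(k) reaches iteration i.

```python
from collections import Counter
from functools import lru_cache
from itertools import permutations, product

# Record positions (Alice..Fen) grouped by each generalization function.
# g1 and g2 follow the example; g3 (one group, 1940~1999) is HYPOTHETICAL.
FUNCTIONS = (
    ((0, 1), (2, 3), (4, 5)),   # g1
    ((0, 1, 2), (3, 4, 5)),     # g2
    ((0, 1, 2, 3, 4, 5),),      # g3 (hypothetical)
)
T0 = ("flu", "cold", "cancer", "cancer", "headache", "toothache")


@lru_cache(maxsize=None)
def per(t, i):
    """Permutation set of g_i(t): every reordering of values inside each group."""
    groups = FUNCTIONS[i]
    choices = [set(permutations([t[j] for j in g])) for g in groups]
    tables = set()
    for combo in product(*choices):
        new = list(t)
        for group, values in zip(groups, combo):
            for j, value in zip(group, values):
                new[j] = value
        tables.add(tuple(new))
    return frozenset(tables)


def p(tables):
    """Privacy property: no person has one value in more than half of the tables."""
    n = len(tables)
    return all(
        max(Counter(t[j] for t in tables).values()) * 2 <= n
        for j in range(len(T0))
    )


@lru_cache(maxsize=None)
def ds(t, i, k):
    """Disclosure set of g_i(t): candidates whose own evaluation path reaches i."""
    return frozenset(s for s in per(t, i) if reaches(s, i, k))


def reaches(s, i, k):
    """Replay a_jump(k) on s and report whether it arrives at iteration i."""
    j = 0
    while j < i:
        if p(per(s, j)):
            if p(ds(s, j, k)):
                return False   # s would already have been released as g_j(s)
            j += k[j]          # penalty: skip the next k[j] - 1 iterations
        else:
            j += 1
    return j == i


def a_jump(t, k):
    """Return the 1-based index of the released generalization, or None."""
    i = 0
    while i < len(FUNCTIONS):
        if p(per(t, i)):
            if p(ds(t, i, k)):
                return i + 1
            i += k[i]
        else:
            i += 1
    return None


print(a_jump(T0, (1, 1, 1)))  # 3
print(a_jump(T0, (1, 2, 1)))  # None
```

In this hypothetical instance the disclosure set of g2(t0) has the same four tables under both vectors; with a jump distance of 1 after g2 the algorithm reaches and releases g3, while a jump distance of 2 skips g3 and releases nothing.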

Page 14: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Properties of a_jump(k)

Computation of the disclosure set
a_safe: to compute ds(gi(t0)), it must first compute ds(gj(t)) for all t in per(gi(t0)) and j = 1, 2, …, i-1.
a_jump(k): to compute ds(gi(t0)) for 2 < i < 2+k, it no longer needs to compute ds(g2(t)) for the tables t in per(gi(t0)); such a t can reach gi only if p(per(g2(t))) = false, because a table whose check of ds2 fails jumps directly to g_{2+k}.

ds(g1(t0)) and ds(g2(t0))
ds(g1(t0)) = per(g1(t0)).
ds(g2(t0)) is independent of the jump distance vector.

Size of the family
There are (n-1)! different jump distance vectors.
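One way to see where this count comes from, under our reading of the slides (a sketch, not the paper's proof). The first entry of the vector never matters, because ds(g1(t0)) = per(g1(t0)), so a table that passes the check of per1 is always released. In iteration i ≥ 2, a jump can land on any of g_{i+1}, …, gn or past gn, which ends the algorithm, and all jumps past gn behave the same:

```latex
\#\{\text{distinct jump distance vectors}\}
  = \prod_{i=2}^{n} \bigl( (n-i) + 1 \bigr)
  = (n-1)(n-2)\cdots 1
  = (n-1)!
```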

Page 15: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Agenda

Background
K-Jump Strategy
Data Utility Comparison
  Construction for Theorem 1: 1-jump and i-jump (1 < i) incomparable
  Construction for Theorem 2: i-jump and j-jump (1 < i < j) incomparable
  Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) incomparable
  Construction for Proposition 2: Reusing generalization functions
  Results on a_safe and a_jump(1)
Conclusion

Page 16: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Theorem 1: 1-jump and i-jump (1 < i) incomparable

QID  g1  g2  g3  …
A    C0  C0  C0  …
B    C1  C1  C1  …
C    C2  C2  C2  …
D    C3  C3  C3  …
E    C4  C4  C4  …
F    C5  C5  C5  …
G    C6  C6  C6  …
H    C6  C6  C6  …
I    C6  C6  C6  …
J    C7  C7  C7  …
K    C7  C7  C7  …
L    C8  C8  C8  …
M    C8  C8  C8  …
N    C9  C9  C9  …
O    C9  C9  C9  …

Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.

To compute $ds_3^{k}(t_0)$:
1. Exclude any table t for which p(per1(t)) = true. Each remaining table belongs to one of four disjoint sets:

QID  S1    S2     S3     S4
A    C0    C0     C0     C0
B    C1    C1     C1     C1
C    C2    C2     C2     C2
D    C3    C3     C3     C3
E    C4    C4     C4     C4
F    C5    C5     C5     C5
G    C6    C6     C6     C6
H    C6    C6     C6     C6
I    C6    C8/C9  C7/C9  C7/C8
J    C7    C6     C6     C6
K    C7    C8     C7     C7
L    C8    C9     C9     C8
M    C8    C8/C9  C7/C9  C7/C8
N    C9    C7     C8     C9
O    C9    C7     C8     C9
#    4320  1152   1152   1152

Page 17: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i)

To compute $ds_3^{k}(t_0)$:
1. Exclude any table t for which p(per1(t)) = true; each remaining table belongs to one of the four disjoint sets S1, S2, S3, and S4.
2. Consider generalizing these tables using g2: S2, S3, and S4 cannot be disclosed under g2.

Page 18: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i)

To compute $ds_3^{k}(t_0)$:
1. Exclude any table t for which p(per1(t)) = true.
2. Consider generalizing these tables using g2:
   a. The subsets of S1 in which N and O both have C7, both have C8, or both have C9 (S101, S102, and S103) cannot be disclosed under g2. Together they form S1', with |S1'| = 864.

QID  S1    S101  S102  S103
A    C0    C0    C0    C0
B    C1    C1    C1    C1
C    C2    C2    C2    C2
D    C3    C3    C3    C3
E    C4    C4    C4    C4
F    C5    C5    C5    C5
G    C6    C6    C6    C6
H    C6    C6    C6    C6
I    C6    C6    C6    C6
J    C7    C8    C7    C7
K    C7    C8    C7    C7
L    C8    C9    C9    C8
M    C8    C9    C9    C8
N    C9    C7    C8    C9
O    C9    C7    C8    C9
#    4320  288   288   288

Page 19: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i)

QID  S1    S111  S112  S113
A    C0    C0    C0    C0
B    C1    C1    C1    C1
C    C2    C2    C2    C2
D    C3    C3    C3    C3
E    C4    C4    C4    C4
F    C5    C5    C5    C5
G    C6    C6    C6    C6
H    C6    C6    C6    C6
I    C6    C6    C6    C6
J    C7    C7    C7    C7
K    C7    C8    C8    C7
L    C8    C9    C8    C8
M    C8    C9    C9    C9
N    C9    C7    C7    C8
O    C9    C8    C9    C9
#    4320  1152  1152  1152

|S1\S1'| = 3456

To compute $ds_3^{k}(t_0)$:
1. Exclude any table t for which p(per1(t)) = true.
2. Consider generalizing these tables using g2:
   b. For a_jump(i), all tables in S1\S1' are excluded from $ds_3^{i}(t_0)$ (under g2 their permutation sets pass the check but their disclosure sets do not, so a_jump(i) jumps over g3). Hence

$$ds_3^{i}(t_0) = S_1' \cup S_2 \cup S_3 \cup S_4$$

Satisfied!

Page 20: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Theorem 1 (cont.): 1-jump and i-jump (1 < i)

QID  S1    S111  S1111  S1112
A    C0    C0    C0     C0
B    C1    C1    C1     C1
C    C2    C2    C2     C2
D    C3    C3    C3     C3
E    C4    C4    C4     C4/C5
F    C5    C5    C5     C6
G    C6    C6    C6     C6
H    C6    C6    C6     C4/C5
I    C6    C6    C6     C6
J    C7    C7    C7     C7
K    C7    C8    C8     C8
L    C8    C9    C9     C9
M    C8    C9    C9     C9
N    C9    C7    C7     C7
O    C9    C8    C8     C8
#    4320  1152  576    576

To compute $ds_3^{k}(t_0)$:
1. Exclude any table t for which p(per1(t)) = true.
2. Consider generalizing these tables using g2:
   c. For a_jump(1), the disclosure sets under g2 of all tables in S1\S1' do not satisfy the privacy property, so these tables move on to g3 and

$$ds_3^{1}(t_0) = S_1 \cup S_2 \cup S_3 \cup S_4$$

The ratio of I being associated with C6 is 5/9 (I has C6 in all 4320 tables of S1 and in none of S2, S3, or S4, and 4320 / (4320 + 3 × 1152) = 5/9 > 1/2). Violated!
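The two ratios for person I can be checked with exact fractions. This minimal sketch uses only the set sizes given in this construction and the fact, read off the tables, that I has C6 in every table of S1 (hence of its subset S1') and in no table of S2, S3, or S4. It checks this single person-value pair only, not the full privacy property; the helper name `ratio_i_c6` is hypothetical.

```python
from fractions import Fraction

# Set sizes stated in the construction for Theorem 1.
SIZES = {"S1": 4320, "S1'": 864, "S2": 1152, "S3": 1152, "S4": 1152}
# Sets in which person I has C6 (read off the tables: every table of S1, and so
# of its subset S1'; no table of S2, S3 or S4).
I_HAS_C6 = {"S1", "S1'"}


def ratio_i_c6(disjoint_sets):
    """Fraction of tables in a union of disjoint sets that associate I with C6."""
    total = sum(SIZES[name] for name in disjoint_sets)
    hits = sum(SIZES[name] for name in disjoint_sets if name in I_HAS_C6)
    return Fraction(hits, total)


ds3_one_jump = ["S1", "S2", "S3", "S4"]   # ds_3^1(t0)
ds3_i_jump = ["S1'", "S2", "S3", "S4"]    # ds_3^i(t0)
print(ratio_i_c6(ds3_one_jump))  # 5/9, above 1/2
print(ratio_i_c6(ds3_i_jump))    # 1/5
```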

Page 21: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Theorem 2: i-jump and j-jump (1 < i < j) incomparable

The evaluation paths are shown in figures.

Page 22: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Theorem 2 (cont.): i-jump and j-jump (1 < i < j)

g1  g2  g3  …  g_j  g_{j+1}  g_{j+2}  …
C0  C0  C0  …  C0   C0       C0       …
C1  C1  C1  …  C1   C1       C1       …
C2  C2  C2  …  C2   C2       C2       …
C3  C3  C3  …  C3   C3       C3       …
C4  C4  C4  …  C4   C4       C4       …
S   S   S   …  S    S        S        …
S   S   S   …  S    S        S        …
C5  C5  C5  …  C5   C5       C5       …
C6  C6  C6  …  C6   C6       C6       …
C7  C7  C7  …  C7   C7       C7       …
C8  C8  C8  …  C8   C8       C8       …
C9  C9  C9  …  C9   C9       C9       …
…   …   …   …  …    …        …        …

The case where i-jump has better utility than j-jump is relatively easy to construct, so we only show the construction for the other case. In this construction, g_{j+2} will be released by j-jump, while g_{j+i+1} or a later generalization will be released by i-jump.

Page 23: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Theorem 3: K1-jump and K2-jump (K1, K2: vectors) incomparable

Page 24: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Proposition 2: Reusing generalization functions

QID  g1  g2  g3  g2'
A    C1  C1  C1  C1
B    C2  C2  C2  C2
C    C3  C3  C3  C3
D    C4  C4  C4  C4
E    C5  C5  C5  C5
F    C3  C3  C3  C3
G    C3  C3  C3  C3

The jump distance is 1; the privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.

Without reusing g2:
1. The table cannot be disclosed under g1(.) or g3(.).
2. To compute ds2: each table belongs to one of the three disjoint sets.

QID  g2  S1  S2     S3
A    C1  C1  C1/C2  C1/C2
B    C2  C2  C3     C3
C    C3  C3  C1/C2  C1/C2
D    C4  C4  C3     C4
E    C5  C5  C3     C5
F    C3  C3  C4     C3
G    C3  C3  C5     C3
#    72  24  8      8

$$ds_2 = S_1 \cup S_2 \cup S_3, \qquad |ds_2| = 40$$

Violated! The table will lead to disclosing nothing.

Page 25: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Proposition 2 (cont.): Reusing generalization functions

g2 is reused as g2'. To calculate ds2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per2':
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.

QID  g3  S1  S2     S3
A    C1  C1  C1/C2  C1/C2
B    C2  C2  C3     C3
C    C3  C3  C1/C2  C1/C2
D    C4  C4  C3     C4
E    C5  C5  C3     C5
F    C3  C3  C4     C3
G    C3  C3  C5     C3
#        24  8      8     (total 40)

Page 26: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Proposition 2 (cont.): Reusing generalization functions

g2 is reused as g2'. To calculate ds2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per2':
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
3. S1 can be further divided into three disjoint subsets:
   a. S12 and S13 cannot be disclosed under g3.

QID  S1  S11    S12  S13
A    C1  C1     C1   C1
B    C2  C2     C2   C2
C    C3  C3     C3   C3
D    C4  C3     C3   C4
E    C5  C4/C5  C3   C5
F    C3  C3     C4   C3
G    C3  C4/C5  C5   C3
#    24  16     4    4

Page 27: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Proposition 2 (cont.): Reusing generalization functions

g2 is reused as g2'. To calculate ds2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per2':
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
3. S1 can be further divided into three disjoint subsets:
   b. The tables in subset S11 can be disclosed under g3.

To compute ds3 for a table in S11 (tA is one instance):
A. Exclude any table t for which p(per1(t)) = true; each remaining table belongs to one of the two disjoint sets SA1 and SA2.
B. These subsets cannot be disclosed under g2.

QID  S1  S11    tA   SA1  SA2
A    C1  C1     C1   C3   C1/C2/C4
B    C2  C2     C2   C3   C1/C2/C4
C    C3  C3     C3   C1   C3
D    C4  C3     C3   C2   C3
E    C5  C4/C5  C4   C4   C1/C2/C4
F    C3  C3     C3   C3   C3
G    C3  C4/C5  C5   C5   C5
#    24  16     120  12   36

Page 28: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Construction for Proposition 2 (cont.): Reusing generalization functions

QID  S12  S13  S2     S3
A    C1   C1   C1/C2  C1/C2
B    C2   C2   C3     C3
C    C3   C3   C1/C2  C1/C2
D    C3   C4   C3     C4
E    C3   C5   C3     C5
F    C4   C3   C4     C3
G    C5   C3   C5     C3
#    4    4    8      8

g2 is reused as g2':

$$ds_{2'} = S_{12} \cup S_{13} \cup S_2 \cup S_3$$

The ratios of D and E being associated with C3 are both 0.5, which is the highest ratio. Satisfied!

Page 29: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Results on a_safe and a_jump(1)

1. When the privacy property is either set-monotonic or based on the highest ratio of sensitive values:
Lemma 3: If p(per(t0)) = false, then p is also false on any of its subsets.
Corollary 1: The algorithm a_safe has the same data utility as a_jump(1).

2. When the privacy property is of another kind:
Lemma 4: The ds3 under a_safe is a subset of that under a_jump(1).
Theorem 5: The data utility of a_safe and a_jump(1) is generally incomparable.
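Lemma 3 can be sanity-checked by brute force for the highest-ratio property on a tiny single-group permutation set. This sketch only covers two small instances we picked; it is not a proof, and `highest_ratio_ok` and `check_lemma` are hypothetical names. The intuition: if c of the m values in a group equal v and c > m/2, then every table gives v to c of the m persons, so in any subset of tables some person holds v in at least a c/m > 1/2 fraction of them.

```python
from collections import Counter
from itertools import combinations, permutations


def highest_ratio_ok(tables):
    """No person has one value in more than half of the given tables."""
    n = len(tables)
    return all(
        max(Counter(t[j] for t in tables).values()) * 2 <= n
        for j in range(len(tables[0]))
    )


def check_lemma(values):
    """If the permutation set of one group violates the property, every
    non-empty subset of it must violate the property as well."""
    per = sorted(set(permutations(values)))
    if highest_ratio_ok(per):
        return True  # premise of the lemma does not hold; nothing to check
    return all(
        not highest_ratio_ok(list(subset))
        for size in range(1, len(per) + 1)
        for subset in combinations(per, size)
    )


print(check_lemma(("cancer", "cancer", "flu")))            # True
print(check_lemma(("cancer", "cancer", "cancer", "flu")))  # True
```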

Page 30: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Agenda


Background

K-Jump Strategy

Data Utility Comparison

Conclusion

Page 31: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Conclusion

We have proposed a novel k-jump strategy for micro-data disclosure.

The strategy transforms a given generalization algorithm into a large number of safe algorithms.

We show that the data utility of these algorithms is generally incomparable by constructing counter-examples.

Practical impact: the data holder can make a secret choice among these algorithms.

Page 32: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Further Results and Future Work

Future studies:

Study more efficient safe algorithms.

Employ statistical methods to compare different k-jump algorithms.

Further investigate the opportunity of reusing generalization functions.

Further results in the extended version of this paper:

Computational complexity: an O(·) bound in terms of max(|per|), n, and k.

Making a secret choice among unsafe algorithms does not yield a safe solution.

Page 33: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Thank you!



Page 35: k-Jump Strategy for Preserving Privacy in  Micro-Data Disclosure

Toy Example

Name: identifier. DoB: quasi-identifier. Condition: sensitive attribute.

Data Holder: Micro-Data Table t0

Name     DoB   Condition
Alice    1990  flu
Bob      1985  cold
Charlie  1974  cancer
David    1962  cancer
Eve      1953  headache
Fen      1941  toothache

t0 is generalized into g2(t0), which satisfies 2-diversity:

DoB        Condition
1970~1999  flu, cold, cancer
1940~1969  cancer, headache, toothache

Attacker knows: the generalization, external data, and the privacy property.

External Data

Name     DoB   Condition
Alice    1990  ???
Bob      1985  ???
Charlie  1974  ???
David    1962  ???
Eve      1953  ???
Fen      1941  ???

What can the attacker infer? The permutation set (col = cold, can = cancer, hac = headache, tac = toothache):

     t1   t2   t3   t4   …  t35  t36
A    flu  flu  col  col  …  can  can
B    col  can  flu  can  …  flu  col
C    can  col  can  flu  …  col  flu
D    can  can  can  can  …  tac  tac
E    hac  hac  hac  hac  …  hac  hac
F    tac  tac  tac  tac  …  can  can