Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1...

11
Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity Vitaly Feldman Ulfar Erlingsson Ilya Mironov Ananth Raghunathan Kunal Talwar Abhradeep Thakurta

Transcript of Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1...

Page 1: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Amplification by Shuffling:From Local to Central Differential Privacy via Anonymity

Vitaly Feldman

Ulfar Erlingsson Ilya Mironov Ananth Raghunathan Kunal Talwar Abhradeep Thakurta

Page 2: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Local Differential Privacy (LDP)

For all 𝑖, 𝐴𝑖 is a local 𝜖-DP randomizer:

for all 𝑣, 𝑣′ ∈ 𝑋

[Warner ‘65; EGS ‘03; KLNRS ‘08]

𝐴𝑖(𝑥𝑖 = 𝑣) 𝐴𝑖(𝑥𝑖 = 𝑣′)

Server

𝑥2

𝑥1

𝑥3

𝑥𝑛

𝐴1

𝐴2

𝐴3

𝐴𝑛

Compute (approximately)𝑓(𝑥1, 𝑥2, … , 𝑥𝑛)

Page 3: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Outline

Online monitoring with LDP

3

Benefits of anonymity:

privacy amplification by shuffling

Page 4: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Online monitoring

4

𝑥1,1

𝑥2,1

𝑥3,1

𝑥𝑛,1

𝑥1,3

𝑥2,3

𝑥3,3

𝑥𝑛,3

𝑥1,2

𝑥2,2

𝑥3,2

𝑥𝑛,2

𝑥1,𝑑

𝑥2,𝑑

𝑥3,𝑑

𝑥𝑛,𝑑

Estimate the daily counts 𝑆𝑗 = σ𝑖=1𝑛 𝑥𝑖,𝑗 for all 𝑗 ∈ [𝑑]

𝑥𝑖,𝑗 ∈ {0,1}

Status of user 𝑖 on day 𝑗

Assume that each user’s status

changes at most 𝑘 times

• only for utility

𝑆1 𝑆2 𝑆3 𝑆𝑛

time

Page 5: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Monitoring with LDP

• Report the status changes (only first 𝑘)

• Maintains a tree of counters each over an interval of time

• Based on [DNPR ‘10; CSS ‘11]

5

There exists an 𝜖-LDP algorithm that constructs estimates መ𝑆1, መ𝑆2, … , መ𝑆𝑑 such that with high prob. for all 𝑗 ∈ [𝑑],

𝑆𝑗 − መ𝑆𝑗 = 𝑂𝑛𝑘 (log 𝑑)2

𝜖

Page 6: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Encode-Shuffle-Analyze (ESA) [Bittau et al. ‘17]

6

Server

𝑥2

𝑥1

𝑥3

𝑥𝑛

𝐴1

𝐴2

𝐴3

𝐴𝑛

Shuffle and anonymize

Page 7: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Privacy amplification by shuffling

7

For any 𝜖 = 𝑂(1) and any sequence of 𝜖-LDP algorithms (𝐴1, … , 𝐴𝑛), let

𝐴shuffle 𝑥1, … , 𝑥𝑛 = 𝐴1 𝑥𝜋 1 , 𝐴2 𝑥𝜋 2 , … , 𝐴𝑛 𝑥𝜋 𝑛

for a random and uniform permutation 𝜋: 𝑛 → 𝑛

Then 𝐴shuffle is 𝜖′, 𝛿 -DP in the central model for 𝜖′ = 𝑂𝜖 log 1/𝛿

𝑛

Holds for adaptive case: 𝐴𝑖 may depend on outputs of 𝐴1, … , 𝐴𝑖−1

Page 8: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Comparison with subsampling

Advantages of shuffling:

• does not affect the statistics of the dataset

• does not increase LDP cost

8

Running 𝜖-DP algorithm on random 𝑞-fraction of elements is ≈ 𝑞𝜖-DP (𝜖 ≤ 1) [KLNRS ‘08]

Output 𝐴1 𝑥𝑖1 , 𝐴2 𝑥𝑖2 , … , 𝐴𝑛 𝑥𝑖𝑛

where 𝑖1, 𝑖2, … , 𝑖𝑛 ∼ [𝑛] (independently) is 𝜖′, 𝛿 -DP for 𝜖′ = 𝑂𝜖 log 1/𝛿

𝑛

e.g. [BST ‘14]

Shuffling includes all elements so 𝑞 = 1

Page 9: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Server

𝑥2

𝑥1

𝑥3

𝑥𝑛

𝐴1

𝐴2

𝐴3

𝐴𝑛

Shuffle and anonymize

Implications for ESA

9

Set 𝑆 ⊆ [𝑛] with the

same randomizer

For every 𝑖 ∈ 𝑆, the output is 𝑂𝜖 log 1/𝛿

𝑆, 𝛿 -DP for element at position 𝑖

Page 10: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Output distribution is determined by 𝑚 = #1(RR(𝑥1), … , RR(𝑥𝑛))

𝑚 ∼ Bin 𝑘,2

3+ Bin 𝑛 − 𝑘,

1

3, where 𝑘 = #1(𝑥1, … , 𝑥𝑛)

For a neighboring dataset: 𝑘′ = 𝑘 ± 1

Bin 𝑘,2

3+ Bin 𝑛 − 𝑘,

1

3≈

log 1/𝛿𝑛 ,𝛿

Bin 𝑘 + 1,2

3+ Bin 𝑛 − 𝑘 − 1,

1

3

[DKMMN ‘06]

Special case: binary randomized response

10

RR: For 𝑥 ∈ 0,1 , return 𝑥 flipped with probability 1/3. Satisfies (log 2)-LDP

Also given in [Cheu,Smith,Ullman,Zeber,Zhilyaev ‘18] (independently)

Page 11: Vitaly Feldman - GitHub Pages · 2020. 8. 30. · Online monitoring 4 𝑥1,1 𝑥2,1 𝑥3,1 𝑥𝑛,1 𝑥1,3 𝑥2,3 𝑥3,3 𝑥𝑛,3 𝑥1,2 𝑥2,2 𝑥3,2 𝑥𝑛,2 𝑥1,𝑑

Conclusions

• Monitoring with LDP and log dependence on time

• General privacy amplification technique

o Match state of the art in the central model

o Can be used to derive lower bounds for LDP

• Provable benefits of anonymity for ESA-like architectures

• To appear in SODA 2019

• arxiv.org/abs/1811.12469

11