Near-Optimal Private Approximation Protocols via a Black Box Transformation
Near-Optimal Private Approximation Protocols
via a Black Box Transformation
David Woodruff, IBM Almaden
Outline
1. Communication Protocols and Goals
2. Private Approximation Protocols
3. Previous Work
4. Our Results
5. Proof of our Main Transformation
t-Party Communication Model
[Figure: t parties holding inputs x_1, x_2, x_3, …, x_{t-1}, x_t]
What is f(x_1, x_2, …, x_t)?
Application – IP session data
Source      Destination   Bytes   Duration   Protocol
18.6.7.1    19.7.3.2      40K     28         http
10.6.2.3    12.3.4.8      20K     18         ftp
11.1.0.6    11.6.8.2      58K     22         http
12.3.1.5    14.7.0.1      30K     32         http
…           …             …       …          …
AT&T collects 100+ GB of NetFlow data every day
Application – IP Session Data
AT&T needs to process a massive stream of network data
Traffic estimation: What fraction of network IP addresses are active? (Distinct elements computation)
Traffic analysis: What are the 100 IP addresses with the most traffic? (Frequent items computation)
Security/Denial of Service: Are there any IP addresses witnessing a spike in traffic? (Skewness computation)
Application – Secure Datamining
For medical research, hospitals wish to mine their joint data
Patient confidentiality imposes strict laws on what information can be shared. Mining cannot leak anything sensitive
Protocol Goals
Communication Complexity: Minimize total number of bits exchanged between the parties
Round Complexity: Minimize total number of messages exchanged between the parties
Computational Complexity: Minimize workload of the parties
Privacy: No party should learn unnecessary information about another party’s input
Outline
1. Communication Protocols and Goals
2. Private Approximation Protocols
3. Previous Work
4. Our Results
5. Proof of our Main Transformation
Initial Observations
Computing many functions requires a huge amount of communication when the parties are deterministic
How do we cope?
Allow randomness and a small chance of error
Even if the parties are randomized, the communication is large unless they output approximate answers
How do we cope?
Settle for an approximation
This helps with communication, round, and computational complexity, but what is a private randomized approximation?
Privacy Definition
First, what does privacy mean for computing a function f?
Minimal requirement: ∀ i: Party i does not learn anything about x_j, j ≠ i, other than what follows from x_i and f(x_1, …, x_t)
What does privacy mean for approximating a function f?
Does this work? ∀ i: Party i does not learn anything about x_j, j ≠ i, other than what follows from x_i and the approximation to f(x_1, …, x_t)
Not sufficient!
Privacy Definition
What is the Hamming distance f(x_1, x_2) between x_1 and x_2?
Party 1 holds x_1 ∈ {0,1}^n; Party 2 holds x_2 ∈ {0,1}^n
Set the LSB of the approximation f'(x_1, x_2) to be the LSB of x_2, and the remaining bits of f'(x_1, x_2) to agree with those of f(x_1, x_2)
f'(x_1, x_2) is a ±1 approximation to f(x_1, x_2), but Party 1 learns the LSB of x_2, which doesn't follow from x_1 and f(x_1, x_2)
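The counterexample above can be checked mechanically. The following is a minimal sketch, assuming the "LSB of x_2" means the last bit of Party 2's bit vector; the function names `hamming` and `leaky_approx` are hypothetical and only illustrate the slide's construction.

```python
from itertools import product

def hamming(x1, x2):
    """Exact Hamming distance f(x1, x2) between two equal-length bit vectors."""
    return sum(a != b for a, b in zip(x1, x2))

def leaky_approx(x1, x2):
    """The 'bad' approximation f': copy f but overwrite its lowest bit
    with the last bit of x2 (assumed here to be the LSB of x2)."""
    f = hamming(x1, x2)
    return (f & ~1) | (x2[-1] & 1)

# Exhaustively check n = 4: f' is always within 1 of f, yet it reveals a bit of x2.
for x1 in product([0, 1], repeat=4):
    for x2 in product([0, 1], repeat=4):
        assert abs(leaky_approx(x1, x2) - hamming(x1, x2)) <= 1
        assert leaky_approx(x1, x2) & 1 == x2[-1]
```

The second assertion is the privacy breach: the output always exposes a bit of x_2, even in cases where x_1 and the exact distance would not determine it.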
New Privacy Definition [FIMNSW]
What does privacy mean for approximating a function f?
New requirement: ∀ i: Party i does not learn anything about x_j, j ≠ i, other than what follows from x_i and f(x_1, …, x_t)
f'(x_1, …, x_t) is determined by f(x_1, …, x_t) and the randomness
Implications
So, we allow for approximation to reduce communication, but we define privacy with respect to exact computation
Simplifications for This Talk
- We only consider two parties in the rest of the talk
- Their names are Alice and Bob
- Their inputs are x and y
What Can Alice and Bob do to Breach Privacy?
Alice holds x; Bob holds y
Semi-honest: parties follow their instructions but try to learn more than what is prescribed
Malicious: parties deviate from the protocol arbitrarily
- Use a different input
- Force other party to output wrong answer
- Abort before other party learns answer
Difficult to achieve security in the malicious model…
Reductions – Yao, GMW, NN
Protocol secure in the semi-honest model → Protocol secure in the malicious model
Efficiency of the new protocol = Efficiency of the old protocol
It suffices to design protocols in the semi-honest model
The parties follow the instructions of the protocol. Don’t need to worry about “weird” behavior.
Just make sure neither party learns anything about the other party’s input, other than what follows from the exact function value
More Simplifications
[Figure: Alice (input x, random string r_A) and Bob (input y, random string r_B) run a complicated protocol that outputs f'(x,y)]
Using known techniques, we just need efficient simulators S_A and S_B that depend only on x, y, r_A, r_B and f(x,y)
Simulators
S_A(x, f(x,y)) =negl(n) (r_A, x, f'(x,y))
S_B(y, f(x,y)) =negl(n) (r_B, y, f'(x,y))
(the two sides are equal as distributions except with negl(n) probability)
Outline
1. Communication Protocols and Goals
2. Private Approximation Protocols
3. Previous Work
4. Our Results
5. Proof of our Main Transformation
Known Private Approximations
Problem                          Communication   Rounds   Computation   Papers
Lp-norm, 0 < p ≤ 2               O*(1)           O*(1)    O*(n)         [IW][KMSZ][MM]
L2-heavy hitters (reveals L2)    O*(1)           O*(1)    O*(n)         [KMSZ]
“Even functions that are efficiently computable for moderately sized data sets are often not efficiently computable for massive data sets.” [FIMNSW]
What about all of these problems?
- Lp-norm for p > 2 and p = 0
- Lp-heavy hitters for every p
- Lp-sampling
- Max Dominance Norm
- Distinct Summation
- Empirical Entropy
- Cascaded Moments
- Subspace Approximation
- L2-distance to independence
- Etc.
Other Related Work
- Can privately approximate the permanent of a matrix [FIMNSW]
- Some NP-hard problems can be privately approximated if a few bits are leaked [HKKN]
- Many NP-hard problems cannot be privately approximated even when leaking a large number of bits [BHN]
- If the answer is not unique, e.g., for a search problem, private approximations are even harder to come by [BCNW]
Outline
1. Communication Protocols and Goals
2. Private Approximation Protocols
3. Previous Work
4. Our Results
5. Proof of our Main Transformation
Our Main Transformation
• Suppose f = Σ_{i=1}^n g(x_i, y_i)
• Suppose g is non-negative and efficiently computable
• Let Π be an arbitrary non-private protocol for approximating f up to a (1 ± 1/log n)-factor with probability ≥ 2/3
• Then there is a private approximation protocol Π' for approximating f up to a (1 ± ε)-factor with probability ≥ 2/3
• The communication, round, and computational complexity of Π' agree with those of Π up to a poly(log n / ε) factor
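To make the hypothesis concrete, here is a small sketch of two functions of the required form f = Σ g(x_i, y_i) with non-negative g. The helper name `decomposable` and the two example choices of g (Hamming distance, and the p-th power of the Lp-distance) are illustrative assumptions, not notation from the talk.

```python
def decomposable(g):
    """Build f(x, y) = sum_i g(x_i, y_i) from a per-coordinate function g."""
    return lambda x, y: sum(g(a, b) for a, b in zip(x, y))

# g(a, b) = 1 if a != b else 0 gives the Hamming distance.
hamming = decomposable(lambda a, b: int(a != b))

def lp_distance_pow(p):
    """g(a, b) = |a - b|^p gives the p-th power of the Lp-distance."""
    return decomposable(lambda a, b: abs(a - b) ** p)

print(hamming([0, 1, 1, 0], [1, 1, 0, 0]))       # 2
print(lp_distance_pow(3)([1, 2, 3], [3, 2, 1]))  # 16
```

Both choices of g are non-negative and cheap to evaluate on a single coordinate pair, which is what the transformation asks for.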
Near-Optimal Private Approximation Protocols
Problem                                               Communication    Work
Lp-distance (p > 2), Lp-heavy hitters, Lp-sampling    O*(n^{1-2/p})    O*(1)
Max-Dominance Norm                                    O*(1)            O*(n)
Distinct Summation                                    O*(1)            O*(n)
Empirical Entropy                                     O*(1)            O*(n)
Subspace Approximation                                O*(d)            O*(nd)
Other Private Approximations
Also obtain near-optimal bounds for:
- Cascaded frequency moments
- L2-distance to independence
Using [BO], we get O*(1) communication for any g(x_i, y_i) = h(x_i - y_i) where h has “at most quadratic growth”
Weaker Assumptions
If the non-private protocol Π is a “simultaneous protocol”, then it is enough to assume symmetrically private information retrieval with polylog(n) communication [CMS, NP]
Outline
1. Communication Protocols and Goals
2. Private Approximation Protocols
3. Previous Work
4. Our Results
5. Proof of our Main Transformation
Main Transformation
Given a non-private approximation protocol Π for approximating f(x,y) = Σ_{i=1}^n g(x_i, y_i), we design a private approximation protocol Π'
Main Theorem: There is a low-communication importance sampling procedure which:
If B is an upper bound on f(x,y),
then Alice and Bob sample from a distribution μ on [n] ∪ {⊥}:
∀ i ∈ [n], μ(i) = g(x_i, y_i)/B
μ(⊥) = 1 - f(x,y)/B
How do we design Π' given such a procedure?
Private Approximation Protocol
The importance sampling procedure obtains samples from [n] ∪ {⊥}, with 1 - Pr[obtain ⊥] = f(x,y)/B
Thus, this probability depends only on f(x,y)!
1. Let B be an upper bound on f(x,y)
2. The protocol outputs a bit c with Pr[c = 1] = 1 - Pr[obtain ⊥] = f(x,y)/B ≤ 1
3. Since c is a bit, its distribution is determined by its expectation
Repeat a few times to get concentration
If most repetitions return c = 0, replace B with B/2, and repeat
The process of halving B depends only on f(x,y), which helps for simulation
Once B < 2f(x,y), with very high probability, enough coin tosses are 1
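The halving loop can be sketched as follows. This is only an illustration under stated assumptions: the coins are drawn directly as Bernoulli(f/B) instead of through the importance sampling procedure, the repetition count `reps` and the "at least half are 1" stopping rule are hypothetical choices, and returning B times the fraction of ones as the estimate is an assumption (the slides do not spell out the final output).

```python
import random

def private_estimate(f_value, upper_bound, reps=4000, seed=0):
    """Halve B until at least half of the coins are 1, then estimate f.

    Every coin has Pr[c = 1] = f/B, so the whole transcript of coins and
    halvings depends on the inputs only through the exact value f.
    """
    if f_value <= 0 or upper_bound < f_value:
        raise ValueError("need 0 < f_value <= upper_bound")
    rng = random.Random(seed)
    B = float(upper_bound)
    while True:
        p = min(1.0, f_value / B)            # Pr[c = 1] = f/B (clamped for safety)
        ones = sum(rng.random() < p for _ in range(reps))
        if 2 * ones >= reps:                 # enough coin tosses are 1: stop
            return B * ones / reps, B        # assumed output: B * (fraction of ones)
        B /= 2.0                             # most coins were 0: halve B and repeat

estimate, final_B = private_estimate(100.0, 1e6, seed=1)
print(estimate, final_B)
```

Because the function never looks at x or y, only at f(x,y), the same code could be run by a simulator that is given just the exact function value.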
What’s left?
Need an importance sampling procedure, and need to show our overall approximation protocol is simulatable
We can’t sample exactly from μ on [n] ∪ {⊥}: ∀ i ∈ [n], μ(i) = g(x_i, y_i)/B, μ(⊥) = 1 - f(x,y)/B
We can sample from a distribution with negl(n) distance from μ
Notation
For input vectors x and y,
let f[a,b] = Σ_{i=a}^b g(x_i, y_i)
Importance Sampling
Alice holds x, r_A; Bob holds y, r_B
Π is a non-private protocol for (1/log n, negl(n))-approximating f = Σ_{i=1}^n g(x_i, y_i)
Use Π to estimate f[1, n/2], obtaining f*[1, n/2]
Use Π to estimate f[n/2+1, n], obtaining f*[n/2+1, n]
Recurse on [1, n/2] with probability f*[1, n/2]/(f*[1, n/2] + f*[n/2+1, n])
Else recurse on [n/2+1, n]
f*[1, n/2] is a (1 ± 1/log n)-approximation to f[1, n/2]
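The recursion above can be sketched in a few lines. This is a single-machine illustration, not the two-party protocol: `noisy_sum` is a hypothetical stand-in for Π that perturbs the exact partial sum by a random factor in (1 ± 1/log n), the list `vals` plays the role of the values g(x_i, y_i) (assumed strictly positive), and indices are 0-based.

```python
import math
import random

def noisy_sum(vals, lo, hi, eps, rng):
    """Hypothetical stand-in for Pi: a (1 +/- eps)-approximation of sum(vals[lo:hi]).
    A single term g(x_i, y_i) is returned exactly, as on the last tree level."""
    exact = sum(vals[lo:hi])
    if hi - lo == 1:
        return exact
    return exact * (1.0 + rng.uniform(-eps, eps))

def importance_sample(vals, rng):
    """Walk down the binary tree over indices; return (i, rho_i), where rho_i
    is the product of the branch probabilities actually used on the path."""
    n = len(vals)
    if n < 2:
        return 0, 1.0
    eps = 1.0 / math.log2(n)
    lo, hi, rho = 0, n, 1.0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left = noisy_sum(vals, lo, mid, eps, rng)
        right = noisy_sum(vals, mid, hi, eps, rng)
        p_left = left / (left + right)
        if rng.random() < p_left:
            hi, rho = mid, rho * p_left          # go left
        else:
            lo, rho = mid, rho * (1.0 - p_left)  # go right
    return lo, rho

rng = random.Random(0)
print(importance_sample([1, 2, 3, 4, 5, 6, 7, 8], rng))
```

Returning rho alongside the index matters for the later rejection step, which needs to know the probability with which the sampled index was chosen.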
Importance Sampling
[Figure: binary tree for n = 8. Root f[1,8] has children f[1,4] and f[5,8]; f[1,4] has children f[1,2] and f[3,4]; f[3,4] has leaves g(x_3, y_3) and g(x_4, y_4)]
With probability f*[1,4]/(f*[1,4] + f*[5,8]) go left, else go right
With probability f*[1,2]/(f*[1,2] + f*[3,4]) go left, else go right
With probability g(x_3, y_3)/(g(x_3, y_3) + g(x_4, y_4)) go left, else go right
Pr[g(x_3, y_3) chosen] = f*[1,4]/(f*[1,4] + f*[5,8]) × f*[3,4]/(f*[1,2] + f*[3,4]) × g(x_3, y_3)/(g(x_3, y_3) + g(x_4, y_4)) = C · g(x_3, y_3)/f(x,y)
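The last equality can be unpacked as a short derivation. This is a sketch of the reasoning, not taken from the slides: ε stands for the approximation error 1/log n, and L and R denote the left and right halves of the current interval.

```latex
\begin{align*}
% With exact partial sums, the product along the root-to-leaf path telescopes:
\frac{f[1,4]}{f[1,4]+f[5,8]} \cdot \frac{f[3,4]}{f[1,2]+f[3,4]} \cdot
\frac{g(x_3,y_3)}{g(x_3,y_3)+g(x_4,y_4)}
 &= \frac{f[1,4]}{f[1,8]} \cdot \frac{f[3,4]}{f[1,4]} \cdot \frac{g(x_3,y_3)}{f[3,4]}
  = \frac{g(x_3,y_3)}{f(x,y)} \\
% With estimates, each branch probability is off by a bounded factor:
f^*[a,b] \in (1 \pm \varepsilon)\, f[a,b]
 &\;\Longrightarrow\;
 \frac{f^*[L]}{f^*[L]+f^*[R]} \in
 \left[\tfrac{1-\varepsilon}{1+\varepsilon},\ \tfrac{1+\varepsilon}{1-\varepsilon}\right]
 \cdot \frac{f[L]}{f[L]+f[R]}
\end{align*}
```

There are at most log n levels, so the accumulated distortion C is at most ((1+ε)/(1-ε))^{log n}, which stays bounded by a constant when ε is on the order of 1/log n.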
The importance sampling procedure gives a way to sample from a distribution ρ:
ρ(i) = C_i · g(x_i, y_i)/f(x,y), where C_i ∈ [1/2, 2]
If i is sampled, then we know the probability ρ(i) that we chose it
We can also obtain g(x_i, y_i) efficiently
With probability g(x_i, y_i)/(ρ(i)·B), output i, else output ⊥!
Pr[don’t output ⊥] = Σ_i ρ(i)·g(x_i, y_i)/(ρ(i)·B) = f(x,y)/B
Hence, we sample from μ:
∀ i ∈ [n], μ(i) = g(x_i, y_i)/B, μ(⊥) = 1 - f(x,y)/B
(up to negl(n), since there is a small probability that Π fails)
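The correction step is ordinary rejection sampling, and a small experiment shows the distortion C_i cancelling. In this sketch, the function name `sample_mu`, the use of `None` for ⊥, and the concrete numbers are illustrative assumptions; ρ is given explicitly as a list rather than produced by a protocol, and B is assumed large enough that every acceptance probability is at most 1.

```python
import random

def sample_mu(g, rho, B, rng):
    """Draw i ~ rho, then keep it with probability g[i] / (rho[i] * B).
    Returns the index i, or None in the role of the symbol ⊥."""
    i = rng.choices(range(len(g)), weights=rho, k=1)[0]
    accept = g[i] / (rho[i] * B)
    if accept > 1.0:
        raise ValueError("B is too small for this rho")
    return i if rng.random() < accept else None

g = [1.0, 2.0, 3.0, 4.0]      # g(x_i, y_i); f = 10
rho = [0.2, 0.2, 0.2, 0.4]    # distorted proposal: C = (2, 1, 2/3, 1)
B = 20.0                      # upper bound on f

rng = random.Random(0)
draws = [sample_mu(g, rho, B, rng) for _ in range(20000)]
kept = sum(d is not None for d in draws) / len(draws)
print(kept)   # should be close to f/B = 0.5
```

Each index i comes out with probability ρ(i) · g_i/(ρ(i)·B) = g_i/B regardless of the distortion in ρ, which is why the fraction of non-⊥ outputs depends only on f and B.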
Simulators
For f'(x,y), S_A generates random coins with expectation f(x,y)/B, and keeps halving B until there are enough coin tosses equal to 1
For r_A, S_A outputs a random r_A
S_A outputs (r_A, x, f'(x,y)), which is equal to the distribution in Π' except with negl(n) probability
S_A(x, f(x,y)) =negl(n) (r_A, x, f'(x,y))
Conclusions
Any non-private approximation protocol for a function f = Σ_{i=1}^n g(x_i, y_i) can be transformed into a private one with an O*(1) blowup in complexity
Many problems can be expressed this way (e.g., Lp-norms), even non-obvious ones (e.g., entropy), for which we previously had no technique for achieving a private approximation
What about other functions?