Near-Optimal Private Approximation Protocols via a Black Box Transformation

David Woodruff, IBM Almaden

Outline

1. Communication Protocols and Goals
2. Private Approximation Protocols
3. Previous Work
4. Our Results
5. Proof of our Main Transformation

t-Party Communication Model

The t parties hold inputs x1, x2, x3, …, xt-1, xt

What is f(x1, x2, …, xt)?

Application – IP Session Data

Source      Destination   Bytes   Duration   Protocol
18.6.7.1    19.7.3.2      40K     28         http
10.6.2.3    12.3.4.8      20K     18         ftp
11.1.0.6    11.6.8.2      58K     22         http
12.3.1.5    14.7.0.1      30K     32         http
…           …             …       …          …

AT&T collects 100+ GB of NetFlow data every day

Application – IP Session Data

AT&T needs to process a massive stream of network data

- Traffic estimation: What fraction of network IP addresses are active? (Distinct elements computation)
- Traffic analysis: What are the 100 IP addresses with the most traffic? (Frequent items computation)
- Security/Denial of Service: Are there any IP addresses witnessing a spike in traffic? (Skewness computation)

Application – Secure Datamining

For medical research, hospitals wish to mine their joint data

Patient confidentiality imposes strict laws on what information can be shared. Mining cannot leak anything sensitive

Protocol Goals

- Communication Complexity: Minimize total number of bits exchanged between the parties
- Round Complexity: Minimize total number of messages exchanged between the parties
- Computational Complexity: Minimize workload of the parties
- Privacy: No party should learn unnecessary information about another party’s input





Initial Observations

Computing many functions requires a huge amount of communication if the parties are deterministic

How do we cope? Allow randomness and a small chance of error

Even if the parties are randomized, unless they output approximate answers, the communication is large

How do we cope? Settle for an approximation

This helps with communication, round, and computational complexity, but what is a private randomized approximation?

Privacy Definition

First, what does privacy mean for computing a function f?

Minimal requirement: ∀ i: Party i does not learn anything about xj, j ≠ i, other than what follows from xi and f(x1, …, xt)

What does privacy mean for approximating a function f?

Does this work? ∀ i: Party i does not learn anything about xj, j ≠ i, other than what follows from xi and the approximation to f(x1, …, xt)

Not sufficient!

Privacy Definition

Party 1 holds x1 ∈ {0,1}^n and Party 2 holds x2 ∈ {0,1}^n

What is the Hamming distance f(x1, x2) between x1 and x2?

Set the LSB of the approximation f’(x1, x2) to be the LSB of x2, and the remaining bits of f’(x1, x2) to agree with those of f(x1, x2)

f’(x1, x2) is a ±1 approximation to f(x1, x2), but Party 1 learns the LSB of x2, which doesn’t follow from x1 and f(x1, x2)
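The counterexample can be checked on small inputs. The following is a minimal sketch, assuming inputs are Python lists of bits and taking the "LSB of x2" to be its last bit; the function names are hypothetical and chosen only for illustration.

```python
def hamming(x1, x2):
    """Exact Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x1, x2))


def leaky_approx(x1, x2):
    """A +/-1 approximation whose lowest bit is copied from x2.

    Assumption of this sketch: the LSB of x2 is its last entry.
    """
    f = hamming(x1, x2)
    lsb_x2 = x2[-1]
    # Keep every bit of f except the lowest, then overwrite the lowest with x2's LSB.
    return (f & ~1) | lsb_x2


x1 = [1, 0, 1, 1]
x2_a = [0, 0, 1, 1]  # differs from x1 in one position, LSB = 1
x2_b = [1, 0, 1, 0]  # differs from x1 in one position, LSB = 0

# Same exact answer for both inputs, but the approximation tells them apart.
for x2 in (x2_a, x2_b):
    print(hamming(x1, x2), leaky_approx(x1, x2))
```

Both choices of x2 give the same exact distance, so anything that distinguishes them does not follow from x1 and the exact function value.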

New Privacy Definition [FIMNSW]

What does privacy mean for approximating a function f?

New requirement: ∀ i: Party i does not learn anything about xj, j ≠ i, other than what follows from xi and f(x1, …, xt)

That is, f’(x1, …, xt) is determined by f(x1, …, xt) and the randomness

Implications

So, we allow for approximation to reduce communication, but we define privacy with respect to exact computation

Simplifications for This Talk

- We only consider two parties in the rest of the talk

- Their names are Alice and Bob

- Their inputs are x and y

What Can Alice and Bob Do to Breach Privacy?

Alice holds x; Bob holds y

Semi-honest: parties follow their instructions but try to learn more than what is prescribed

Malicious: parties deviate from the protocol arbitrarily
- Use a different input
- Force other party to output wrong answer
- Abort before other party learns answer

It is difficult to achieve security in the malicious model…

Reductions – Yao, GMW, NN

Protocol secure in the semi-honest model → Protocol secure in the malicious model

Efficiency of the new protocol = Efficiency of the old protocol

It suffices to design protocols in the semi-honest model

The parties follow the instructions of the protocol. Don’t need to worry about “weird” behavior.

Just make sure neither party learns anything about the other party’s input, other than what follows from the exact function value

More Simplifications

Complicated Protocol

Alice: input x, random string rA

Bob: input y, random string rB

Output f’(x,y)

Using known techniques, we just need efficient simulators SA and SB that depend only on x, y, rA, rB and f(x,y)

Simulators

SA(x, f(x,y)) =negl(n) (rA, x, f’(x,y))

SB(y, f(x,y)) =negl(n) (rB, y, f’(x,y))





Known Private Approximations

Problem                         Communication   Rounds   Computation   Papers
Lp-norm, 0 < p ≤ 2              O*(1)           O*(1)    O*(n)         [IW][KMSZ][MM]
L2-heavy hitters (reveals L2)   O*(1)           O*(1)    O*(n)         [KMSZ]

“Even functions that are efficiently computable for moderately sized data sets are often not efficiently computable for massive data sets.” [FIMNSW]

What about all of these problems?
- Lp-norm for p > 2 and p = 0
- Lp-heavy hitters for every p
- Lp-sampling
- Max Dominance Norm
- Distinct Summation
- Empirical Entropy
- Cascaded Moments
- Subspace Approximation
- L2-distance to independence
- Etc.

Other Related Work

- Can privately approximate the permanent of a matrix [FIMNSW]
- Some NP-hard problems can be privately approximated if a few bits are leaked [HKKN]
- Many NP-hard problems cannot be privately approximated even when leaking a large number of bits [BHN]
- If the answer is not unique, e.g., a search problem, private approximations are even harder to come by [BCNW]





Our Main Transformation

• Suppose f = Σ_{i=1}^n g(xi, yi)
• Suppose g is non-negative and efficiently computable
• Let Π be an arbitrary non-private protocol for approximating f up to a (1 ± 1/log n)-factor with probability ≥ 2/3
• Then there is a private approximation protocol Π’ for approximating f up to a (1 ± ε)-factor with probability ≥ 2/3
• The communication, round, and computational complexity of Π’ agree with those of Π up to a poly(log n / ε) factor
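To make the required form concrete, here is a small sketch of two functions written as a sum of a non-negative per-coordinate term g. The helper name `decomposable` and the two example functions are illustrative assumptions, not part of the talk.

```python
def decomposable(g):
    """Build f(x, y) = sum over i of g(x_i, y_i) from a per-coordinate function g."""
    def f(x, y):
        return sum(g(a, b) for a, b in zip(x, y))
    return f


# Hamming distance: g is 1 when the coordinates differ, else 0.
hamming = decomposable(lambda a, b: int(a != b))

# p-th power of the Lp-distance for p = 3: g is |x_i - y_i|^3.
l3_cubed = decomposable(lambda a, b: abs(a - b) ** 3)

print(hamming([1, 0, 1], [0, 0, 1]))
print(l3_cubed([1, 4], [3, 1]))
```

Note that the sum form captures the p-th power of the Lp-distance rather than the distance itself; the distance is recovered by taking a p-th root of the result.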

Near-Optimal Private Approximation Protocols

Problem                                              Communication    Work
Lp-distance (p > 2), Lp-heavy hitters, Lp-sampling   O*(n^(1-2/p))    O*(1)
Max-Dominance Norm                                   O*(1)            O*(n)
Distinct Summation                                   O*(1)            O*(n)
Empirical Entropy                                    O*(1)            O*(n)
Subspace Approximation                               O*(d)            O*(nd)

Other Private Approximations

Also obtain near-optimal bounds for:
- Cascaded frequency moments
- L2-distance to independence

Using [BO], we get O*(1) communication for any g(xi, yi) = h(xi - yi) where h has “at most quadratic growth”

Weaker Assumptions

If the non-private protocol Π is a “simultaneous protocol”, then it is enough to assume symmetrically private information retrieval with polylog(n) communication [CMS, NP]





Main Transformation

Given a non-private approximation protocol Π for approximating f(x,y) = Σ_{i=1}^n g(xi, yi), we design a private approximation protocol Π’

Main Theorem: There is a low-communication importance sampling procedure such that, if B is an upper bound on f(x,y), then Alice and Bob sample from a distribution μ on [n] ∪ {⊥}:

∀ i ∈ [n], μ(i) = g(xi, yi)/B

μ(⊥) = 1 - f(x,y)/B

How do we design Π’ given such a procedure?

Private Approximation Protocol

The importance sampling procedure obtains samples from [n] ∪ {⊥}

1 - Pr[obtain ⊥] = f(x,y)/B

Thus, this probability depends only on f(x,y)!

1. Let B be an upper bound on f(x,y)
2. The protocol outputs a bit c
3. Since c is a bit, its distribution is determined by its expectation

Pr[c = 1] = 1 - Pr[obtain ⊥] = f(x,y)/B ≤ 1

Repeat a few times to get concentration

If most repetitions return c = 0, replace B with B/2, and repeat

The process of halving B depends only on f(x,y), which helps for simulation

Once B < 2f(x,y), with very high probability, enough coin tosses are 1
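The halving loop can be sketched as follows. This is an illustration under stated assumptions, not the protocol itself: the bit c is drawn directly from f and B instead of coming from the importance sampling procedure, and the repetition count (2000) and the "enough ones" threshold (one quarter of the tosses) are hypothetical parameters picked for the example.

```python
import random


def coin(f, B, rng):
    """One output bit c with Pr[c = 1] = f / B (simulated directly from f here)."""
    return 1 if rng.random() < f / B else 0


def estimate(f, B, reps=2000, rng=None):
    """Halve B until enough coins come up 1, then rescale the empirical mean.

    Assumes 0 < f <= B. The threshold reps // 4 is an assumption of this sketch.
    """
    rng = rng or random.Random(0)
    while True:
        ones = sum(coin(f, B, rng) for _ in range(reps))
        if ones >= reps // 4:
            # The empirical mean of c estimates f / B, so B * mean estimates f.
            return B * ones / reps
        B /= 2  # too few ones: B is far above f, so halve it and retry


print(estimate(37.0, 1024.0, rng=random.Random(1)))
```

Every random choice in this loop has a distribution fixed by f and the starting B, which is the property a simulator that knows only f(x,y) relies on.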

What’s left?

Need an importance sampling procedure, and need to show our overall approximation protocol is simulatable

We can’t sample exactly from μ on [n] ∪ {⊥}:

∀ i ∈ [n], μ(i) = g(xi, yi)/B

μ(⊥) = 1 - f(x,y)/B

We can sample from a distribution with negl(n) distance from μ

Notation

For input vectors x and y, let f[a,b] = Σ_{i=a}^b g(xi, yi)

Importance Sampling

Alice holds x, rA; Bob holds y, rB

Π is a non-private protocol for (1/log n, negl(n))-approximating f = Σ_{i=1}^n g(xi, yi)

1. Use Π to estimate f[1, n/2], obtaining f*[1, n/2]
2. Use Π to estimate f[n/2+1, n], obtaining f*[n/2+1, n]
3. Recurse on [1, n/2] with probability f*[1,n/2]/(f*[1,n/2] + f*[n/2+1, n])
4. Else recurse on [n/2+1, n]

f*[1, n/2] is a (1 ± 1/log n)-approximation to f[1, n/2]

Importance Sampling

Example with n = 8, following the path f[1,8] → f[1,4] → f[3,4] → g(x3, y3)

- At f[1,8]: with probability f*[1,4]/(f*[1,4] + f*[5,8]) go left, else go right
- At f[1,4]: with probability f*[1,2]/(f*[1,2] + f*[3,4]) go left, else go right
- At f[3,4]: with probability g(x3, y3)/(g(x3, y3) + g(x4, y4)) go left, else go right

Pr[g(x3, y3) chosen] = f*[1,4]/(f*[1,4] + f*[5,8]) × f*[3,4]/(f*[1,2] + f*[3,4]) × g(x3, y3)/(g(x3, y3) + g(x4, y4)) = C · g(x3, y3)/f(x,y)
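The tree walk can be sketched in a few lines. This is a single-machine illustration under stated assumptions: `noisy_sum` is a hypothetical stand-in for the non-private protocol (it perturbs the exact range sum by a random factor within 1 ± eps), single coordinates are evaluated exactly, and indices are 0-based.

```python
import random


def noisy_sum(g_vals, lo, hi, eps, rng):
    """Stand-in for the non-private protocol: f[lo, hi] up to a (1 +/- eps) factor.

    A single coordinate is returned exactly, since g itself is efficiently computable.
    """
    exact = sum(g_vals[lo:hi + 1])
    if lo == hi:
        return exact
    return exact * (1 + rng.uniform(-eps, eps))


def importance_sample(g_vals, eps, rng):
    """Walk down the binary tree; return (i, rho_i).

    rho_i is the product of the branch probabilities actually used on the path to i.
    """
    lo, hi = 0, len(g_vals) - 1
    rho = 1.0
    while lo < hi:
        mid = (lo + hi) // 2
        left = noisy_sum(g_vals, lo, mid, eps, rng)
        right = noisy_sum(g_vals, mid + 1, hi, eps, rng)
        total = left + right
        p_left = left / total if total > 0 else 0.5  # guard for an all-zero range
        if rng.random() < p_left:
            rho *= p_left
            hi = mid
        else:
            rho *= 1 - p_left
            lo = mid + 1
    return lo, rho


g_vals = [3.0, 0.0, 5.0, 2.0, 1.0, 4.0, 0.0, 6.0]
i, rho = importance_sample(g_vals, 0.1, random.Random(0))
print(i, rho, g_vals[i] / sum(g_vals))
```

With eps = 0 the returned rho equals g_i / f up to floating-point rounding; with eps > 0 it is off by a bounded multiplicative factor, which plays the role of the constant C on the slide.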

Importance Sampling

The procedure gives a way to sample from a distribution ρ:

ρ(i) = Ci · g(xi, yi)/f(x,y), where Ci ∈ [1/2, 2]

If i is sampled, then we know the probability ρ(i) that we chose it

We can also obtain g(xi, yi) efficiently

With probability g(xi, yi)/(ρ(i) · B), output i, else output ⊥!

Pr[don’t output ⊥] = Σi ρ(i) · g(xi, yi)/(ρ(i) · B) = f(x,y)/B

Hence, we sample from μ:

∀ i ∈ [n], μ(i) = g(xi, yi)/B

μ(⊥) = 1 - f(x,y)/B

(up to negl(n), since with small probability Π fails)
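The accept/reject correction can be verified with exact arithmetic. The sketch below assumes a small hand-picked instance (three coordinates, a proposal distribution within a factor of 2 of g_i / f, and B = 20); the function name and the numbers are illustrative only, and `None` stands for the failure symbol.

```python
from fractions import Fraction


def output_distribution(g, rho, B):
    """Exact output distribution of: draw i from rho, keep i with probability
    g[i] / (rho[i] * B), otherwise output the failure symbol (None)."""
    mu = {}
    for i, (gi, ri) in enumerate(zip(g, rho)):
        accept = gi / (ri * B)
        # B must be large enough for this to be a valid probability.
        assert 0 <= accept <= 1
        mu[i] = ri * accept  # Pr[propose i] * Pr[accept i] = g[i] / B
    mu[None] = 1 - sum(mu.values())
    return mu


g = [Fraction(3), Fraction(5), Fraction(2)]             # f = 10
rho = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]  # distortions 4/3, 4/5, 1
B = Fraction(20)

mu = output_distribution(g, rho, B)
print(mu)
```

The distortion in the proposal cancels, so each index i comes out with probability exactly g_i / B and the failure symbol with probability 1 - f / B.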

Simulators

For f’(x,y), SA generates random coins with expectation f(x,y)/B, and keeps halving B until there are enough coin tosses equal to 1

For rA, SA outputs a random rA

SA outputs (rA, x, f’(x,y)), which is equal to the distribution in Π’ except with negl(n) probability

SA(x, f(x,y)) =negl(n) (rA, x, f’(x,y))

Conclusions

Any non-private approximation protocol for a function f = Σ_{i=1}^n g(xi, yi) can be transformed into a private one with an O*(1) blowup in complexity

Many problems can be expressed this way (e.g., Lp-norms), even non-obvious ones (e.g., entropy), for which we previously had no technique for achieving a private approximation

What about other functions?
