Sampling Lower Bounds via Information Theory
Ziv Bar-Yossef
IBM Almaden
Standard Approach to Hardness of Approximation

“Promise problem”: disjoint sets A, B ⊆ X^n such that:
• ∀ a ∈ A, b ∈ B, f(a) is “far” from f(b).
• Given x ∈ A ∪ B, decide whether x ∈ A.

Hardness of a decision “promise problem” ⟹ hardness of approximation for f: X^n → Y.
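A minimal sketch of this reduction (the names decide_promise, approx_f, and threshold are illustrative, not from the talk): an algorithm that approximates f to within the gap decides the promise problem by thresholding its output.

    def decide_promise(x, approx_f, threshold):
        """Decide A = {x : f(x) >= threshold + gap} vs. B = {x : f(x) <= threshold - gap},
        given approx_f that errs by strictly less than gap."""
        return approx_f(x) >= threshold

    # Example: f = mean, gap = 0.1, threshold = 0.5; any 0.1-approximation of the
    # mean decides "mean >= 0.6" vs. "mean <= 0.4".
    print(decide_promise([0.7] * 100, lambda v: sum(v) / len(v), threshold=0.5))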
The “Election Problem”

• Input: a sequence x of n votes to k parties.

[Figure: a vote distribution μ_x with shares 7/18, 4/18, 3/18, 2/18, 1/18, 1/18 (n = 18, k = 6).]

• Want to get an estimate μ̂ s.t. ||μ̂ − μ_x|| < ε.
• How big a poll should we conduct?
• ∀ S ⊆ [k], it is easy to decide between A = { x | μ_x(S) ≥ ½ + ε } and B = { x | μ_x(S) ≤ ½ − ε }.
• Hardness is due to the abundance of such decision problems ⟹ the poll has to be of size Ω(k).
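A quick simulation of the poll question (a sketch assuming NumPy; the constant in the poll size is illustrative, and the norm is taken to be L1 for concreteness): sample votes uniformly and compare the empirical distribution to μ_x.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, eps = 18_000, 6, 0.05
    votes = rng.integers(0, k, size=n)           # vote i is for a party in [k]
    mu_x = np.bincount(votes, minlength=k) / n   # true vote distribution

    poll = rng.choice(votes, size=int(4 * k / eps**2), replace=True)
    mu_hat = np.bincount(poll, minlength=k) / len(poll)
    print(np.abs(mu_hat - mu_x).sum())           # L1 error, typically < eps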
Similarity Hardness vs. Abundance Hardness

• Similarity hardness: hardness of a single decision “promise problem”.
• Abundance hardness: abundance of decision “promise problems”.
Either one yields hardness of approximation for f: X^n → Y.

In this talk: a lower bound technique that captures both types of hardness in the context of sampling algorithms.
Why Sampling?

[Figure: the algorithm accesses the input data set through a small number of queries.]

• Queries can be chosen randomly.
• Output is typically approximate.
• Sub-linear time & space.
Some Examples

Statistics:
• Statistical decision and estimation
• Statistical learning
• …

CS:
• PAC and machine learning
• Property testing
• Sub-linear time approximation algorithms
• Extractors and dispersers
• …
Query Complexity

Query complexity of a function f: the number of queries required to approximate f.

Examples:
• High query complexity:
  – Parity
  – # of distinct elements
• Low query complexity:
  – Mean in [0,1]
  – Median
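A sketch of the contrast (assuming NumPy; parameters illustrative): a Chernoff-style argument makes the mean estimable from O(1/ε²) uniform queries, while parity flips whenever any single unqueried bit flips, so it cannot be approximated with o(n) queries.

    import numpy as np

    rng = np.random.default_rng(1)
    n, eps = 1_000_000, 0.01
    x = rng.random(n)                            # input in [0,1]^n

    q = int(4 / eps**2)                          # O(1/eps^2) queries suffice
    sample = x[rng.integers(0, n, size=q)]
    print(abs(sample.mean() - x.mean()) < eps)   # True w.h.p.

    # Parity of n bits: flipping any one unqueried bit flips the answer,
    # so a correct algorithm must query essentially all n positions.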
Our Main Result

• A technique for obtaining lower bounds on the query complexity of approximating functions:
  – A template for obtaining specific lower bounds:
    • Arbitrary domain and range
    • All types of approximation
    • Usable for wide classes of functions with symmetry properties
  – Outperforms previous techniques for functions with “abundance hardness”
  – Matches previous techniques for functions with “similarity hardness”
Previous Work

• Statistics:
  – Cramér-Rao inequality
  – VC dimension
  – Optimality of the sequential probability ratio test
• CS:
  – Lower bounds via the Hellinger distance [B., Kumar, Sivakumar 01]
  – Specific lower bounds [Canetti, Even, Goldreich 95], [Radhakrishnan, Ta-Shma 96], [Dagum, Karp, Luby, Ross 95], [Schulman, Vazirani 99], [Charikar, Chaudhuri, Motwani, Narasayya 00]

None of these addresses abundance hardness!
Reduction from a Binary / Multi-Way Promise Problem

Binary promise problem: given x ∈ { a, b }, decide whether x = a or x = b.
Multi-way promise problem: given x ∈ { a, b, c, … }, decide whether x = a, x = b, x = c, …

Here a, b, c, … ∈ X^n are “pairwise disjoint” inputs: f(a), f(b), f(c) are pairwise “far” apart in Y.

Either problem can be solved by any sampling algorithm approximating f: X^n → Y.
Main Result

The lower bound “recipe”, for f: X^n → Y, a function with an appropriate symmetry property:

1. Identify a set S = { x_1, …, x_m } of “pairwise disjoint” inputs.
2. Calculate the “dissimilarity” D(x_1, …, x_m) among x_1, …, x_m
   (D(·, …, ·) is a distance measure taking values in [0, log m]).

Theorem: Any algorithm approximating f requires q queries, where q = Ω(log m / D(x_1, …, x_m)).

Tradeoff between “similarity hardness” (the dissimilarity D) and “abundance hardness” (the number m of disjoint inputs).
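To make the recipe concrete, a toy calculator (a hypothetical helper; the explicit constants follow the Fano-based form of the classification bound reconstructed later in the talk, and the default δ is arbitrary):

    import math

    def query_lower_bound(m, D, delta=1/3):
        """Evaluate ((1 - delta) * log2(m) - 1) / D: the number of queries any
        delta-error algorithm needs, given m pairwise disjoint inputs with
        dissimilarity D."""
        return ((1 - delta) * math.log2(m) - 1) / D

    # Election-problem shape: m = 2^Theta(k) disjoint inputs, D = O(eps^2)
    k, eps = 120, 0.05
    print(query_lower_bound(m=2.0 ** (k / 10), D=eps ** 2))  # ~ Omega(k / eps^2)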
Measure of Dissimilarity

μ_i: the distribution of the value of a uniformly chosen entry of x_i.

Then: D(x_1, …, x_m) = JS(μ_1, …, μ_m), the Jensen-Shannon divergence among μ_1, …, μ_m.
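A sketch of the dissimilarity computation (assuming NumPy; the function names are mine): build each entry distribution μ_i by counting symbol frequencies in x_i, then take their Jensen-Shannon divergence.

    import numpy as np

    def js_divergence(dists):
        """JS divergence (in bits) among distributions given as rows."""
        dists = np.asarray(dists, dtype=float)
        avg = dists.mean(axis=0)
        def kl(p, q):
            mask = p > 0
            return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
        return float(np.mean([kl(p, avg) for p in dists]))

    def dissimilarity(inputs, alphabet_size):
        """D(x_1, ..., x_m): JS divergence of the entry distributions mu_i."""
        mus = [np.bincount(x, minlength=alphabet_size) / len(x) for x in inputs]
        return js_divergence(mus)

    # Near-identical inputs have tiny D; disjoint-support inputs have D = log 2.
    print(dissimilarity([np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1])], 2))
    print(dissimilarity([np.zeros(4, dtype=int), np.ones(4, dtype=int)], 2))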
Application I: The Election Problem

Previous bounds on the query complexity:
• Ω(1/ε²) [BKS01]
• Ω(k) [Batu et al. 00]
• O(k/ε²) [BKS01]

Theorem [This paper]: Ω(k/ε²).
Combinatorial Designs

t-design: a family of sets B_1, …, B_m ⊆ [k], each of size k/2, in which every pair differs substantially: ∀ i ≠ j, |B_i \ B_j| ≥ |B_i|/t.

Proposition: For all k and for all t ≥ 12, there exists a t-design of size m = 2^Ω(k).

[Figure: sets B_1, B_2, B_3 inside the universe [k].]
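A randomized construction sketch (assuming NumPy, and assuming the design property |B_i \ B_j| ≥ |B_i|/t in the form used later in the talk): random half-size subsets satisfy it with overwhelming probability, which is one way to see the 2^Ω(k) size bound.

    import itertools
    import numpy as np

    def random_design(k, t, m, rng, tries=100):
        """Try to build m subsets of [k], each of size k//2, with
        |B_i \\ B_j| >= (k//2)/t for every pair (the property assumed here)."""
        for _ in range(tries):
            sets = [frozenset(rng.choice(k, size=k // 2, replace=False).tolist())
                    for _ in range(m)]
            if all(len(a - b) >= (k // 2) / t
                   for a, b in itertools.combinations(sets, 2)):
                return sets
        raise RuntimeError("no design found; increase k or tries")

    rng = np.random.default_rng(2)
    design = random_design(k=40, t=12, m=16, rng=rng)
    print(len(design))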
Proof of the Lower Bound

Step 1: Identification of a set S of pairwise disjoint inputs:
• B_1, …, B_m ⊆ [k]: a t-design of size m = 2^Ω(k).
• S = { x_1, …, x_m }, where x_i gives the parties in B_i a total vote share of ½ + ε and the parties in [k] \ B_i a total share of ½ − ε.

Step 2: Dissimilarity calculation: D(x_1, …, x_m) = O(ε²).

By the main theorem, the number of queries is at least Ω(k/ε²).
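Putting the two steps together (a sketch; the exact way the slide spreads vote shares inside and outside B_i is my assumption, js_divergence is the slide-12 sketch, and random half-size subsets stand in for a proper t-design):

    import numpy as np
    # reuses js_divergence from the slide-12 sketch

    def election_input_dist(B, k, eps):
        """Vote distribution mu_i: total share 1/2 + eps on B, 1/2 - eps off B."""
        mu = np.full(k, (0.5 - eps) / (k - len(B)))
        mu[np.asarray(list(B))] = (0.5 + eps) / len(B)
        return mu

    rng = np.random.default_rng(7)
    k, eps = 40, 0.05
    Bs = [frozenset(rng.choice(k, size=k // 2, replace=False).tolist())
          for _ in range(16)]                        # stand-in for a t-design
    mus = [election_input_dist(B, k, eps) for B in Bs]
    print(js_divergence(mus), "vs eps^2 =", eps ** 2)  # comes out O(eps^2)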
Application II: Low Rank Matrix Approximation

Exact low rank approximation:
• Given an m × n real matrix M and k ≤ m, n, find the m × n matrix M_k of rank k for which ||M − M_k||_F is minimized.
• Solution: the SVD. Requires querying all of M.

Approximate low rank approximation (LRM_k):
• Get a rank-k matrix A s.t. ||M − A||_F ≤ ||M − M_k||_F + ε||M||_F.

Theorem [This paper]: Computing LRM_k requires Ω(m + n) queries.
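For reference, a sketch of the exact solution M_k and of checking the relaxed LRM_k guarantee (assuming NumPy; the perturbed candidate A is just for illustration):

    import numpy as np

    def best_rank_k(M, k):
        """M_k: the best rank-k approximation in Frobenius norm, via the SVD."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]

    rng = np.random.default_rng(3)
    M = rng.standard_normal((50, 30))
    Mk = best_rank_k(M, k=5)

    eps = 0.1
    A = best_rank_k(M + 0.01 * rng.standard_normal(M.shape), k=5)  # some candidate
    print(np.linalg.norm(M - A, "fro")
          <= np.linalg.norm(M - Mk, "fro") + eps * np.linalg.norm(M, "fro"))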
Proof of the Lower Bound

Step 1: Identification of a set S of pairwise disjoint inputs:
• B_1, …, B_t ⊆ [2k]: a combinatorial design of size t = 2^Ω(k), with |B_i| = k.
• S = { M_1, …, M_t }, where M_i is all-zero except for the diagonal, which is the characteristic vector of B_i.

[Figure: the m × n matrix M_i, zero outside its first 2k diagonal entries, which carry the characteristic vector of B_i.]

• M_i is of rank k ⟹ (M_i)_k = M_i.
• ||M_i||_F = √k.
• ||M_i − M_j||_F ≥ √(|B_i \ B_j|) ≥ √(k/12) ≥ ε(||M_i||_F + ||M_j||_F), for a small enough constant ε.

Step 2: Dissimilarity calculation: D(M_1, …, M_t) = 2k/m.

By the main theorem, the number of queries is at least Ω(m).
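A numeric check of the disjointness claims (a sketch, assuming NumPy; here B_1, B_2 are just random k-subsets of [2k] rather than a full design):

    import numpy as np

    def hard_matrix(B, m, n):
        """M_i: all-zero except diagonal entries j in B, which are 1."""
        M = np.zeros((m, n))
        for j in B:
            M[j, j] = 1.0
        return M

    rng = np.random.default_rng(4)
    k, m, n = 10, 60, 60
    B1 = set(rng.choice(2 * k, size=k, replace=False).tolist())
    B2 = set(rng.choice(2 * k, size=k, replace=False).tolist())
    M1, M2 = hard_matrix(B1, m, n), hard_matrix(B2, m, n)

    print(np.linalg.matrix_rank(M1))                    # k, so (M1)_k = M1
    print(np.isclose(np.linalg.norm(M1, "fro"), k ** 0.5))
    print(np.isclose(np.linalg.norm(M1 - M2, "fro") ** 2, len(B1 ^ B2)))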
Low Rank Matrix Approximation (cont.)

Theorem [Frieze, Kannan, Vempala 98]: By querying an s × s submatrix of M chosen using any distributions that “approximate” the row and column weight distributions of M, one can solve LRM_k with s = O(k⁴/ε³).

Theorem [This paper]: Solving LRM_k by querying an s × s submatrix of M chosen even according to the exact row and column weight distributions of M requires s = Ω(k/ε²).
Oblivious Sampling

Phase 1: choose query positions i_1, …, i_q.
Phase 2: query x_{i_1}, …, x_{i_q}.

• Query positions are independent of the given input.
• The algorithm has a fixed query distribution π on [n]^q.
• i.i.d. queries: the queries are independent and identically distributed, i.e., π = μ^q, where μ is a distribution on [n].
Main Theorem: Outline of the Proof

Adaptive sampling
  ⇓ (for functions with symmetry properties)
Oblivious sampling with i.i.d. queries
  ⇓
Statistical classification
  ⇓
Lower bounds via information theory
Statistical Classification

[Figure: a black box holds one of the distributions μ_1, …, μ_m; the classifier receives q i.i.d. samples from μ_i and outputs a guess i ∈ [m].]

• μ_1, …, μ_m are distributions on Z.
• The classifier is required to be correct with probability ≥ 1 − δ.
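A sketch of such a classifier (assuming NumPy; maximum likelihood is one natural choice, not necessarily the one analyzed in the talk):

    import numpy as np

    def ml_classify(sample, dists):
        """Guess the source: the i maximizing the log-likelihood of the sample."""
        return int(np.argmax([np.sum(np.log(mu[sample])) for mu in dists]))

    rng = np.random.default_rng(5)
    mus = np.array([[0.5, 0.3, 0.2],
                    [0.2, 0.3, 0.5]])          # m = 2 distributions on Z = {0, 1, 2}
    i = 1
    sample = rng.choice(3, size=20, p=mus[i])  # q = 20 i.i.d. samples from mu_i
    print(ml_classify(sample, mus) == i)       # correct w.h.p. once q is large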
From Sampling to Classification

• T: an oblivious algorithm with query distribution π = μ^q that approximates f: X^n → Y.
• μ_x: the joint distribution of a query and its answer when T runs on input x (a distribution on [n] × X).
• S = { x_1, …, x_m }: a set of pairwise disjoint inputs.

[Figure: the black box holds one of μ_{x_1}, …, μ_{x_m}; T is run on the q i.i.d. samples.]

Decide i iff T’s output ∈ A(x_i), the set of acceptable approximate answers for x_i.
Jensen-Shannon Divergence [Lin 91]

• KL divergence between distributions μ, ν on Z:
  KL(μ || ν) = Σ_{z ∈ Z} μ(z) log( μ(z) / ν(z) ).
• Jensen-Shannon divergence among distributions μ_1, …, μ_m on Z (with μ̄ = (1/m) Σ_i μ_i):
  JS(μ_1, …, μ_m) = (1/m) Σ_i KL(μ_i || μ̄).
Main Result

Theorem [Classification lower bound]: Any δ-error classifier for μ_1, …, μ_m requires q queries, where
  q ≥ ((1 − δ) log m − 1) / JS(μ_1, …, μ_m).

Corollary [Query complexity lower bound]: For any oblivious algorithm with query distribution π = μ^q that (ε,δ)-approximates f, and for any set S = { x_1, …, x_m } of “pairwise disjoint” inputs, the number of queries q is at least
  ((1 − δ) log m − 1) / JS(μ_{x_1}, …, μ_{x_m}).
Outline of the Proof

(μ_i^q denotes the distribution of q i.i.d. samples from μ_i.)

Lemma 1 [Classification error lower bound]: For any δ-error classifier drawing q samples,
  JS(μ_1^q, …, μ_m^q) ≥ (1 − δ) log m − 1.
Proof: by Fano’s inequality.

Lemma 2 [Decomposition of Jensen-Shannon]:
  JS(μ_1^q, …, μ_m^q) ≤ q · JS(μ_1, …, μ_m).
Proof: by subadditivity of entropy and conditional independence.
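Chaining the two lemmas yields the classification lower bound; as a math sketch (with the constants as reconstructed above):

\[
(1-\delta)\log m - 1 \;\le\; \mathrm{JS}(\mu_1^q,\ldots,\mu_m^q) \;\le\; q\cdot\mathrm{JS}(\mu_1,\ldots,\mu_m)
\quad\Longrightarrow\quad
q \;\ge\; \frac{(1-\delta)\log m - 1}{\mathrm{JS}(\mu_1,\ldots,\mu_m)} .
\]

The left inequality is Lemma 1 (via Fano), the right one is Lemma 2 (the q i.i.d. samples decompose).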
Conclusions

• A general lower bound technique for the query complexity:
  – A template for obtaining specific bounds
  – Works for wide classes of functions
  – Captures both “similarity hardness” and “abundance hardness”
• Applications:
  – The “Election Problem”
  – Low rank matrix approximation
  – Matrix reconstruction
• Also proved:
  – A lower bound technique for the expected query complexity
  – Tightly captures similarity hardness, but not abundance hardness
• Open problems:
  – Tight bounds for low rank matrix approximation
  – Better lower bounds on the expected query complexity
  – Lower bounds for non-symmetric functions
Simulation of Adaptive Sampling by Oblivious Sampling

Definition:
• f: X^n → Y is symmetric if ∀x and ∀σ ∈ S_n, f(σ(x)) = f(x).
• f is A-symmetric if ∀x, ∀σ, A(σ(x)) = A(x).

Lemma [BKS01]: Any q-query algorithm approximating an A-symmetric f can be simulated by a q-query oblivious algorithm whose queries are uniform without replacement.

Corollary: If q < n/2, it can be simulated by a 2q-query oblivious algorithm whose queries are uniform with replacement.
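A sketch of the simulation behind the lemma (assuming NumPy; T is modeled as a function on the permuted input, and the wrapper name is mine):

    import numpy as np

    def simulate_obliviously(T, x, rng):
        """Run adaptive T on a uniformly permuted copy of x. For an A-symmetric f,
        A(sigma(x)) = A(x), so the answer is still acceptable, while T's (distinct)
        queries now land on uniform-without-replacement positions of x."""
        sigma = rng.permutation(len(x))
        return T(np.asarray(x)[sigma])

    rng = np.random.default_rng(6)
    x = np.arange(10)
    print(simulate_obliviously(lambda y: y[:3].sum(), x, rng))  # T reads 3 entries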
Simulation Lemma: Outline of the Proof

• T: a q-query sampling algorithm approximating f.
• WLOG, T never queries the same location twice.

Simulation:
• Pick a random permutation σ.
• Run T on σ(x).
• By A-symmetry, the output is likely to be in A(σ(x)) = A(x).
• The queries to x are uniform without replacement.
Extensions

Definitions:
• f is (g, A)-symmetric if ∀x, ∀σ, ∀y ∈ A(σ(x)), g(σ, y) ∈ A(x).
• A function f on m × n matrices is A-row-symmetric if, for all matrices M and for all row-permutation matrices Π, A(Π · M) = A(M).

Similarly: A-column-symmetry, and (g, A)-row- and column-symmetry.

We prove that similar simulations hold for all of the above.