Sampling Lower Bounds via Information Theory
Ziv Bar-Yossef
IBM Almaden
Standard Approach to Hardness of Approximation

“Promise problem”: disjoint sets A, B ⊆ X^n such that:
• ∀ a ∈ A, b ∈ B, f(a) is “far” from f(b).
• Given x ∈ A ∪ B, decide whether x ∈ A.

Hardness of a decision “promise problem” ⟹ hardness of approximation for f: X^n → Y.
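A minimal sketch of this reduction (the names decide_promise, approx_f, and threshold are illustrative, not from the talk): an algorithm that approximates f to within the gap decides the promise problem by thresholding its output.

    def decide_promise(x, approx_f, threshold):
        """Decide A = {x : f(x) >= threshold + gap} vs. B = {x : f(x) <= threshold - gap},
        given approx_f that errs by strictly less than gap."""
        return approx_f(x) >= threshold

    # Example: f = mean, gap = 0.1, threshold = 0.5; any 0.1-approximation of the
    # mean decides "mean >= 0.6" vs. "mean <= 0.4".
    print(decide_promise([0.7] * 100, lambda v: sum(v) / len(v), threshold=0.5))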
The “Election Problem”

• Input: a sequence x of n votes to k parties.

[Figure: a vote distribution μ_x with shares 7/18, 4/18, 3/18, 2/18, 1/18, 1/18 (n = 18, k = 6).]

• Want to get an estimate μ̂ s.t. ||μ̂ − μ_x|| < ε.
• How big a poll should we conduct?
• ∀ S ⊆ [k], it is easy to decide between A = { x | μ_x(S) ≥ ½ + ε } and B = { x | μ_x(S) ≤ ½ − ε }.
• Hardness is due to the abundance of such decision problems ⟹ the poll has to be of size Ω(k).
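A quick simulation of the poll question (a sketch assuming NumPy; the constant in the poll size is illustrative, and the norm is taken to be L1 for concreteness): sample votes uniformly and compare the empirical distribution to μ_x.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, eps = 18_000, 6, 0.05
    votes = rng.integers(0, k, size=n)           # vote i is for a party in [k]
    mu_x = np.bincount(votes, minlength=k) / n   # true vote distribution

    poll = rng.choice(votes, size=int(4 * k / eps**2), replace=True)
    mu_hat = np.bincount(poll, minlength=k) / len(poll)
    print(np.abs(mu_hat - mu_x).sum())           # L1 error, typically < eps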
Similarity Hardness vs. Abundance Hardness

• Similarity hardness: hardness of a single decision “promise problem”.
• Abundance hardness: abundance of decision “promise problems”.
Either one yields hardness of approximation for f: X^n → Y.

In this talk: a lower bound technique that captures both types of hardness in the context of sampling algorithms.
Why Sampling?

[Figure: the algorithm accesses the input data set through a small number of queries.]

• Queries can be chosen randomly.
• Output is typically approximate.
• Sub-linear time & space.
Some Examples

Statistics:
• Statistical decision and estimation
• Statistical learning
• …

CS:
• PAC and machine learning
• Property testing
• Sub-linear time approximation algorithms
• Extractors and dispersers
• …
Query Complexity

Query complexity of a function f: the number of queries required to approximate f.

Examples:
• High query complexity:
  – Parity
  – # of distinct elements
• Low query complexity:
  – Mean in [0,1]
  – Median
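A sketch of the contrast (assuming NumPy; parameters illustrative): a Chernoff-style argument makes the mean estimable from O(1/ε²) uniform queries, while parity flips whenever any single unqueried bit flips, so it cannot be approximated with o(n) queries.

    import numpy as np

    rng = np.random.default_rng(1)
    n, eps = 1_000_000, 0.01
    x = rng.random(n)                            # input in [0,1]^n

    q = int(4 / eps**2)                          # O(1/eps^2) queries suffice
    sample = x[rng.integers(0, n, size=q)]
    print(abs(sample.mean() - x.mean()) < eps)   # True w.h.p.

    # Parity of n bits: flipping any one unqueried bit flips the answer,
    # so a correct algorithm must query essentially all n positions.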
Our Main Result

• A technique for obtaining lower bounds on the query complexity of approximating functions:
  – A template for obtaining specific lower bounds:
    • Arbitrary domain and range
    • All types of approximation
    • Usable for wide classes of functions with symmetry properties
  – Outperforms previous techniques for functions with “abundance hardness”
  – Matches previous techniques for functions with “similarity hardness”
Previous Work

• Statistics:
  – Cramér-Rao inequality
  – VC dimension
  – Optimality of the sequential probability ratio test
• CS:
  – Lower bounds via the Hellinger distance [B., Kumar, Sivakumar 01]
  – Specific lower bounds [Canetti, Even, Goldreich 95], [Radhakrishnan, Ta-Shma 96], [Dagum, Karp, Luby, Ross 95], [Schulman, Vazirani 99], [Charikar, Chaudhuri, Motwani, Narasayya 00]

None of these addresses abundance hardness!
Reduction from a Binary / Multi-Way Promise Problem

Binary promise problem: given x ∈ { a, b }, decide whether x = a or x = b.
Multi-way promise problem: given x ∈ { a, b, c, … }, decide whether x = a, x = b, x = c, …

Here a, b, c, … ∈ X^n are “pairwise disjoint” inputs: f(a), f(b), f(c) are pairwise “far” apart in Y.

Either problem can be solved by any sampling algorithm approximating f: X^n → Y.
Main Result

The lower bound “recipe”, for f: X^n → Y, a function with an appropriate symmetry property:

1. Identify a set S = { x_1, …, x_m } of “pairwise disjoint” inputs.
2. Calculate the “dissimilarity” D(x_1, …, x_m) among x_1, …, x_m
   (D(·, …, ·) is a distance measure taking values in [0, log m]).

Theorem: Any algorithm approximating f requires q queries, where q = Ω(log m / D(x_1, …, x_m)).

Tradeoff between “similarity hardness” (the dissimilarity D) and “abundance hardness” (the number m of disjoint inputs).
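To make the recipe concrete, a toy calculator (a hypothetical helper; the explicit constants follow the Fano-based form of the classification bound reconstructed later in the talk, and the default δ is arbitrary):

    import math

    def query_lower_bound(m, D, delta=1/3):
        """Evaluate ((1 - delta) * log2(m) - 1) / D: the number of queries any
        delta-error algorithm needs, given m pairwise disjoint inputs with
        dissimilarity D."""
        return ((1 - delta) * math.log2(m) - 1) / D

    # Election-problem shape: m = 2^Theta(k) disjoint inputs, D = O(eps^2)
    k, eps = 120, 0.05
    print(query_lower_bound(m=2.0 ** (k / 10), D=eps ** 2))  # ~ Omega(k / eps^2)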
Measure of Dissimilarity

μ_i: the distribution of the value of a uniformly chosen entry of x_i.

Then: D(x_1, …, x_m) = JS(μ_1, …, μ_m), the Jensen-Shannon divergence among μ_1, …, μ_m.
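A sketch of the dissimilarity computation (assuming NumPy; the function names are mine): build each entry distribution μ_i by counting symbol frequencies in x_i, then take their Jensen-Shannon divergence.

    import numpy as np

    def js_divergence(dists):
        """JS divergence (in bits) among distributions given as rows."""
        dists = np.asarray(dists, dtype=float)
        avg = dists.mean(axis=0)
        def kl(p, q):
            mask = p > 0
            return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
        return float(np.mean([kl(p, avg) for p in dists]))

    def dissimilarity(inputs, alphabet_size):
        """D(x_1, ..., x_m): JS divergence of the entry distributions mu_i."""
        mus = [np.bincount(x, minlength=alphabet_size) / len(x) for x in inputs]
        return js_divergence(mus)

    # Near-identical inputs have tiny D; disjoint-support inputs have D = log 2.
    print(dissimilarity([np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1])], 2))
    print(dissimilarity([np.zeros(4, dtype=int), np.ones(4, dtype=int)], 2))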
Application I: The Election Problem

Previous bounds on the query complexity:
• Ω(1/ε²) [BKS01]
• Ω(k) [Batu et al. 00]
• O(k/ε²) [BKS01]

Theorem [This paper]: Ω(k/ε²).
Combinatorial Designs

t-design: a family of sets B_1, …, B_m ⊆ [k], each of size k/2, in which every pair differs substantially: ∀ i ≠ j, |B_i \ B_j| ≥ |B_i|/t.

Proposition: For all k and for all t ≥ 12, there exists a t-design of size m = 2^Ω(k).

[Figure: sets B_1, B_2, B_3 inside the universe [k].]
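A randomized construction sketch (assuming NumPy, and assuming the design property |B_i \ B_j| ≥ |B_i|/t in the form used later in the talk): random half-size subsets satisfy it with overwhelming probability, which is one way to see the 2^Ω(k) size bound.

    import itertools
    import numpy as np

    def random_design(k, t, m, rng, tries=100):
        """Try to build m subsets of [k], each of size k//2, with
        |B_i \\ B_j| >= (k//2)/t for every pair (the property assumed here)."""
        for _ in range(tries):
            sets = [frozenset(rng.choice(k, size=k // 2, replace=False).tolist())
                    for _ in range(m)]
            if all(len(a - b) >= (k // 2) / t
                   for a, b in itertools.combinations(sets, 2)):
                return sets
        raise RuntimeError("no design found; increase k or tries")

    rng = np.random.default_rng(2)
    design = random_design(k=40, t=12, m=16, rng=rng)
    print(len(design))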
Proof of the Lower Bound

Step 1: Identification of a set S of pairwise disjoint inputs:
• B_1, …, B_m ⊆ [k]: a t-design of size m = 2^Ω(k).
• S = { x_1, …, x_m }, where x_i gives the parties in B_i a total vote share of ½ + ε and the parties in [k] \ B_i a total share of ½ − ε.

Step 2: Dissimilarity calculation: D(x_1, …, x_m) = O(ε²).

By the main theorem, the number of queries is at least Ω(k/ε²).
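Putting the two steps together (a sketch; the exact way the slide spreads vote shares inside and outside B_i is my assumption, js_divergence is the slide-12 sketch, and random half-size subsets stand in for a proper t-design):

    import numpy as np
    # reuses js_divergence from the slide-12 sketch

    def election_input_dist(B, k, eps):
        """Vote distribution mu_i: total share 1/2 + eps on B, 1/2 - eps off B."""
        mu = np.full(k, (0.5 - eps) / (k - len(B)))
        mu[np.asarray(list(B))] = (0.5 + eps) / len(B)
        return mu

    rng = np.random.default_rng(7)
    k, eps = 40, 0.05
    Bs = [frozenset(rng.choice(k, size=k // 2, replace=False).tolist())
          for _ in range(16)]                        # stand-in for a t-design
    mus = [election_input_dist(B, k, eps) for B in Bs]
    print(js_divergence(mus), "vs eps^2 =", eps ** 2)  # comes out O(eps^2)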
Application II: Low Rank Matrix Approximation

Exact low rank approximation:
• Given an m × n real matrix M and k ≤ m, n, find the m × n matrix M_k of rank k for which ||M − M_k||_F is minimized.
• Solution: the SVD. Requires querying all of M.

Approximate low rank approximation (LRM_k):
• Get a rank-k matrix A s.t. ||M − A||_F ≤ ||M − M_k||_F + ε||M||_F.

Theorem [This paper]: Computing LRM_k requires Ω(m + n) queries.
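For reference, a sketch of the exact solution M_k and of checking the relaxed LRM_k guarantee (assuming NumPy; the perturbed candidate A is just for illustration):

    import numpy as np

    def best_rank_k(M, k):
        """M_k: the best rank-k approximation in Frobenius norm, via the SVD."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]

    rng = np.random.default_rng(3)
    M = rng.standard_normal((50, 30))
    Mk = best_rank_k(M, k=5)

    eps = 0.1
    A = best_rank_k(M + 0.01 * rng.standard_normal(M.shape), k=5)  # some candidate
    print(np.linalg.norm(M - A, "fro")
          <= np.linalg.norm(M - Mk, "fro") + eps * np.linalg.norm(M, "fro"))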
Proof of the Lower Bound

Step 1: Identification of a set S of pairwise disjoint inputs:
• B_1, …, B_t ⊆ [2k]: a combinatorial design of size t = 2^Ω(k), with |B_i| = k.
• S = { M_1, …, M_t }, where M_i is all-zero except for the diagonal, which is the characteristic vector of B_i.

[Figure: the m × n matrix M_i, zero outside its first 2k diagonal entries, which carry the characteristic vector of B_i.]

• M_i is of rank k ⟹ (M_i)_k = M_i.
• ||M_i||_F = √k.
• ||M_i − M_j||_F ≥ √(|B_i \ B_j|) ≥ √(k/12) ≥ ε(||M_i||_F + ||M_j||_F), for a small enough constant ε.

Step 2: Dissimilarity calculation: D(M_1, …, M_t) = 2k/m.

By the main theorem, the number of queries is at least Ω(m).
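A numeric check of the disjointness claims (a sketch, assuming NumPy; here B_1, B_2 are just random k-subsets of [2k] rather than a full design):

    import numpy as np

    def hard_matrix(B, m, n):
        """M_i: all-zero except diagonal entries j in B, which are 1."""
        M = np.zeros((m, n))
        for j in B:
            M[j, j] = 1.0
        return M

    rng = np.random.default_rng(4)
    k, m, n = 10, 60, 60
    B1 = set(rng.choice(2 * k, size=k, replace=False).tolist())
    B2 = set(rng.choice(2 * k, size=k, replace=False).tolist())
    M1, M2 = hard_matrix(B1, m, n), hard_matrix(B2, m, n)

    print(np.linalg.matrix_rank(M1))                    # k, so (M1)_k = M1
    print(np.isclose(np.linalg.norm(M1, "fro"), k ** 0.5))
    print(np.isclose(np.linalg.norm(M1 - M2, "fro") ** 2, len(B1 ^ B2)))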
Low Rank Matrix Approximation (cont.)

Theorem [Frieze, Kannan, Vempala 98]: By querying an s × s submatrix of M chosen using any distributions that “approximate” the row and column weight distributions of M, one can solve LRM_k with s = O(k⁴/ε³).

Theorem [This paper]: Solving LRM_k by querying an s × s submatrix of M chosen even according to the exact row and column weight distributions of M requires s = Ω(k/ε²).
Oblivious Sampling

Phase 1: choose query positions i_1, …, i_q.
Phase 2: query x_{i_1}, …, x_{i_q}.

• Query positions are independent of the given input.
• The algorithm has a fixed query distribution π on [n]^q.
• i.i.d. queries: the queries are independent and identically distributed, i.e., π = μ^q, where μ is a distribution on [n].
Main Theorem: Outline of the Proof

Adaptive sampling
  ⇓ (for functions with symmetry properties)
Oblivious sampling with i.i.d. queries
  ⇓
Statistical classification
  ⇓
Lower bounds via information theory
Statistical Classification

[Figure: a black box holds one of the distributions μ_1, …, μ_m; the classifier receives q i.i.d. samples from μ_i and outputs a guess i ∈ [m].]

• μ_1, …, μ_m are distributions on Z.
• The classifier is required to be correct with probability ≥ 1 − δ.
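A sketch of such a classifier (assuming NumPy; maximum likelihood is one natural choice, not necessarily the one analyzed in the talk):

    import numpy as np

    def ml_classify(sample, dists):
        """Guess the source: the i maximizing the log-likelihood of the sample."""
        return int(np.argmax([np.sum(np.log(mu[sample])) for mu in dists]))

    rng = np.random.default_rng(5)
    mus = np.array([[0.5, 0.3, 0.2],
                    [0.2, 0.3, 0.5]])          # m = 2 distributions on Z = {0, 1, 2}
    i = 1
    sample = rng.choice(3, size=20, p=mus[i])  # q = 20 i.i.d. samples from mu_i
    print(ml_classify(sample, mus) == i)       # correct w.h.p. once q is large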
From Sampling to Classification

• T: an oblivious algorithm with query distribution π = μ^q that approximates f: X^n → Y.
• μ_x: the joint distribution of a query and its answer when T runs on input x (a distribution on [n] × X).
• S = { x_1, …, x_m }: a set of pairwise disjoint inputs.

[Figure: the black box holds one of μ_{x_1}, …, μ_{x_m}; T is run on the q i.i.d. samples.]

Decide i iff T’s output ∈ A(x_i), the set of acceptable approximate answers for x_i.
Jensen-Shannon Divergence [Lin 91]

• KL divergence between distributions μ, ν on Z:
  KL(μ || ν) = Σ_{z ∈ Z} μ(z) log( μ(z) / ν(z) ).
• Jensen-Shannon divergence among distributions μ_1, …, μ_m on Z (with μ̄ = (1/m) Σ_i μ_i):
  JS(μ_1, …, μ_m) = (1/m) Σ_i KL(μ_i || μ̄).
Main Result

Theorem [Classification lower bound]: Any δ-error classifier for μ_1, …, μ_m requires q queries, where
  q ≥ ((1 − δ) log m − 1) / JS(μ_1, …, μ_m).

Corollary [Query complexity lower bound]: For any oblivious algorithm with query distribution π = μ^q that (ε,δ)-approximates f, and for any set S = { x_1, …, x_m } of “pairwise disjoint” inputs, the number of queries q is at least
  ((1 − δ) log m − 1) / JS(μ_{x_1}, …, μ_{x_m}).
Outline of the Proof

(μ_i^q denotes the distribution of q i.i.d. samples from μ_i.)

Lemma 1 [Classification error lower bound]: For any δ-error classifier drawing q samples,
  JS(μ_1^q, …, μ_m^q) ≥ (1 − δ) log m − 1.
Proof: by Fano’s inequality.

Lemma 2 [Decomposition of Jensen-Shannon]:
  JS(μ_1^q, …, μ_m^q) ≤ q · JS(μ_1, …, μ_m).
Proof: by subadditivity of entropy and conditional independence.
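Chaining the two lemmas yields the classification lower bound; as a math sketch (with the constants as reconstructed above):

\[
(1-\delta)\log m - 1 \;\le\; \mathrm{JS}(\mu_1^q,\ldots,\mu_m^q) \;\le\; q\cdot\mathrm{JS}(\mu_1,\ldots,\mu_m)
\quad\Longrightarrow\quad
q \;\ge\; \frac{(1-\delta)\log m - 1}{\mathrm{JS}(\mu_1,\ldots,\mu_m)} .
\]

The left inequality is Lemma 1 (via Fano), the right one is Lemma 2 (the q i.i.d. samples decompose).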
Conclusions

• A general lower bound technique for the query complexity:
  – A template for obtaining specific bounds
  – Works for wide classes of functions
  – Captures both “similarity hardness” and “abundance hardness”
• Applications:
  – The “Election Problem”
  – Low rank matrix approximation
  – Matrix reconstruction
• Also proved:
  – A lower bound technique for the expected query complexity
  – Tightly captures similarity hardness, but not abundance hardness
• Open problems:
  – Tight bounds for low rank matrix approximation
  – Better lower bounds on the expected query complexity
  – Lower bounds for non-symmetric functions
Simulation of Adaptive Sampling by Oblivious Sampling

Definition:
• f: X^n → Y is symmetric if ∀x and ∀σ ∈ S_n, f(σ(x)) = f(x).
• f is A-symmetric if ∀x, ∀σ, A(σ(x)) = A(x).

Lemma [BKS01]: Any q-query algorithm approximating an A-symmetric f can be simulated by a q-query oblivious algorithm whose queries are uniform without replacement.

Corollary: If q < n/2, it can be simulated by a 2q-query oblivious algorithm whose queries are uniform with replacement.
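A sketch of the simulation behind the lemma (assuming NumPy; T is modeled as a function on the permuted input, and the wrapper name is mine):

    import numpy as np

    def simulate_obliviously(T, x, rng):
        """Run adaptive T on a uniformly permuted copy of x. For an A-symmetric f,
        A(sigma(x)) = A(x), so the answer is still acceptable, while T's (distinct)
        queries now land on uniform-without-replacement positions of x."""
        sigma = rng.permutation(len(x))
        return T(np.asarray(x)[sigma])

    rng = np.random.default_rng(6)
    x = np.arange(10)
    print(simulate_obliviously(lambda y: y[:3].sum(), x, rng))  # T reads 3 entries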
Simulation Lemma: Outline of the Proof

• T: a q-query sampling algorithm approximating f.
• WLOG, T never queries the same location twice.

Simulation:
• Pick a random permutation σ.
• Run T on σ(x).
• By A-symmetry, the output is likely to be in A(σ(x)) = A(x).
• The queries to x are uniform without replacement.
Extensions

Definitions:
• f is (g, A)-symmetric if ∀x, ∀σ, ∀y ∈ A(σ(x)), g(σ, y) ∈ A(x).
• A function f on m × n matrices is A-row-symmetric if, for all matrices M and for all row-permutation matrices Π, A(Π · M) = A(M).

Similarly: A-column-symmetry, and (g, A)-row- and column-symmetry.

We prove that similar simulations hold for all of the above.