Algorithmic Frontiers of Doubling Metric Spaces
Robert Krauthgamer
Weizmann Institute of Science
Based on joint works with Yair Bartal, Lee-Ad Gottlieb, and Aryeh Kontorovich
The Traveling Salesman Problem: Low-dimensionality implies PTAS
Robert Krauthgamer Weizmann Institute of Science
Joint work with Yair Bartal and Lee-Ad Gottlieb
Traveling Salesman Problem (TSP)
Definition: Given a set of cities (points), find a minimum-length tour that visits all points
Classic, well-studied NP-hard problem [Karp'72; Papadimitriou-Vempala'06]
Mentioned in a handbook from 1832!
Common benchmark for optimization methods; many books devoted to TSP…
Numerous variants: closed/open tour, multiple tours, average visit time (repairman), etc.
Optimal tour
Metric TSP
Basic assumptions on distances:
Symmetric: d(x,y) = d(y,x)
Metric: triangle inequality d(x,z) ≤ d(x,y) + d(y,z)
Easy 2-approximation via MST, since OPT ≥ MST (a minimal sketch follows below)
Can do better: MST + matching yields a 3/2-approximation, i.e. MST+Matching ≤ 1.5·OPT [Christofides'76]
MST
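To make the MST bound concrete, here is a minimal Python sketch (mine, not from the talk) of the classical MST-based 2-approximation, assuming only a symmetric distance function obeying the triangle inequality; all names are illustrative.

    import math
    from collections import defaultdict

    def mst_tsp_2approx(points, dist):
        """2-approximate metric TSP: build an MST (Prim's algorithm), then
        shortcut a preorder traversal of the tree into a tour. Correct
        because OPT >= MST, and shortcutting the doubled MST costs at most
        2*MST by the triangle inequality."""
        n = len(points)
        in_tree = [False] * n
        best = [math.inf] * n          # cheapest known edge into the tree
        parent = [-1] * n
        best[0] = 0.0
        children = defaultdict(list)
        for _ in range(n):             # Prim's algorithm, O(n^2) time
            u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
            in_tree[u] = True
            if parent[u] >= 0:
                children[parent[u]].append(u)
            for v in range(n):
                d = dist(points[u], points[v])
                if not in_tree[v] and d < best[v]:
                    best[v], parent[v] = d, u
        tour, stack = [], [0]          # preorder DFS = shortcut Euler tour
        while stack:
            u = stack.pop()
            tour.append(u)
            stack.extend(reversed(children[u]))
        return tour                    # visiting order; close the cycle at the end

    pts = [(0, 0), (0, 3), (4, 0), (4, 3), (2, 1)]
    print(mst_tsp_2approx(pts, math.dist))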
Euclidean TSP
Sanjeev Arora [JACM'98] and Joe Mitchell [SICOMP'99]:
Euclidean TSP with fixed dimension admits a PTAS: find a (1+ε)-approximate tour in time n·(log n)^{ε^{-Õ(dim)}}, where n = #points (extends to other norms)
They were awarded the
2010 Gödel Prize
for this discovery
PTAS Beyond Euclidean? To achieve a PTAS, two properties were assumed
Euclidean space (at least approximately); fixed dimension
Are both these assumptions required?
Fixed dimension is necessary: no PTAS in (log n) dimensions unless P=NP [Trevisan'00]
Is Euclidean necessary? Consider metric spaces with low intrinsic dimension…
Doubling Dimension
Definition: Ball B(x,r) = all points within distance r from x.
The doubling constant λ(M) of a metric M is the minimum λ > 0 such that every ball in M can be covered by λ balls of half the radius
First used by [Assouad'83], algorithmically by [Clarkson'97]
The doubling dimension is ddim(M) = log₂ λ(M) [Gupta-K.-Lee'03]; M is called doubling if its doubling dimension is constant
Packing property of doubling spaces: a set with diameter D > 0 and inter-point distance ≥ a contains at most (D/a)^{O(ddim)} points (an empirical sketch of the definition follows below)
Here λ ≤ 7.
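As an illustration of the definition (not from the talk), this Python sketch empirically upper-bounds the doubling constant λ of a finite point set by greedily covering every ball with half-radius balls; greedy covering only approximates the optimal cover, so the bound holds up to a constant in the exponent.

    import math

    def greedy_half_cover(points, center, r, dist):
        """Greedily cover B(center, r) with radius-r/2 balls centered at
        points of the set; returns the number of balls used."""
        ball = [p for p in points if dist(p, center) <= r]
        count = 0
        while ball:
            c = ball[0]                  # any still-uncovered point
            count += 1
            ball = [p for p in ball if dist(p, c) > r / 2]
        return count

    def doubling_constant_bound(points, dist):
        """Max half-cover size over all balls B(p, d(p,q))."""
        lam = 1
        for p in points:
            for q in points:
                if dist(p, q) > 0:
                    lam = max(lam, greedy_half_cover(points, p, dist(p, q), dist))
        return lam

    grid = [(i, j) for i in range(5) for j in range(5)]   # a flat 2D point set
    lam = doubling_constant_bound(grid, math.dist)
    print(lam, math.log2(lam))   # ddim is roughly log2(lambda): small for the plane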
Applications of Doubling Dimension
Nearest neighbor search [K.-Lee'04; Har-Peled-Mendel'06; Beygelzimer-Kakade-Langford'06; Cole-Gottlieb'06]
Spanners, routing [Talwar'04; Kleinberg-Slivkins-Wexler'04; Abraham-Gavoille-Goldberg-Malkhi'05; Konjevod-Richa-Xia-Yu'07; Gottlieb-Roditty'08; Elkin-Solomon'12]
Distance oracles [Har-Peled-Mendel'06; Bartal-Gottlieb-Roditty-Kopelowitz-Lewenstein'11]
Dimension reduction [Bartal-Recht-Schulman'11; Gottlieb-K.'11]
Machine learning and statistics [Bshouty-Li-Long'09; Gottlieb-Kontorovich-K.'10,'12]
PTAS for Metric TSP? Does TSP on doubling metrics admit a PTAS?
Arora and Mitchell made strong use of Euclidean properties
"Most fascinating problem left open in this area" [James Lee, tcsmath blog, June '10]
Some attempts:
Quasi-PTAS [Talwar'04] (first description of the problem)
Quasi-PTAS for TSP with neighborhoods [Mitchell'07; Chan-Elbassioni'11]
Subexponential-time approximation scheme, under a weaker assumption [Chan-Gupta'08]
Our result: TSP on doubling metrics admits a PTAS
Find a (1+ε)-approximate tour in time n · 2^{O(ddim)} · 2^{ε^{-Õ(ddim)}} · 2^{O(ddim²)} · log^{1/2} n
Euclidean (to compare): n·(log n)^{ε^{-Õ(dim)}}
Throughout, think of ddim and ε as constants
Metric Partition
A quadtree-like hierarchy [Bartal'96, Gupta-K.-Lee'03, Talwar'04]
At level i:
Centers are 2^i apart, chosen in arbitrary order
Random radii R_i ∈ [2^i, 2·2^i]
Metric Partition (2)
Random radii R_{i-1} ∈ [2^{i-1}, 2·2^{i-1}]
A quadtree-like hierarchy [Bartal'96, Gupta-K.-Lee'03, Talwar'04]
Recursively to level i-1:
Caveat: log(n) hierarchical levels suffice; ignore tiny distances < 1/n² (a one-level sketch follows below)
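Here is a hedged Python sketch (illustrative, not the paper's code) of a single level of this partition: a greedy 2^i-net for the centers, then random radii R_i ∈ [2^i, 2·2^i]; recursing with i-1 inside each cluster yields the full hierarchy.

    import math, random

    def partition_level(points, i, dist, rng=random):
        """One hierarchy level: greedily pick centers pairwise >= 2^i apart
        (in arbitrary order), draw a random radius R_i in [2^i, 2*2^i] per
        center, and assign each point to the first center whose ball covers
        it. Coverage holds: every point lies within 2^i of some center,
        and every R_i >= 2^i."""
        scale = 2.0 ** i
        centers = []                         # indices of net centers
        for j in range(len(points)):         # arbitrary order, as on the slide
            if all(dist(points[j], points[c]) >= scale for c in centers):
                centers.append(j)
        radii = {c: rng.uniform(scale, 2 * scale) for c in centers}
        clusters = {c: [] for c in centers}
        for j in range(len(points)):
            for c in centers:                # first covering ball wins
                if dist(points[j], points[c]) <= radii[c]:
                    clusters[c].append(j)
                    break
        return clusters

    pts = [(random.uniform(0, 8), random.uniform(0, 8)) for _ in range(200)]
    print({c: len(m) for c, m in partition_level(pts, 1, math.dist).items()})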
Dense Areas
Key observation: the points (metric space) can be decomposed into sparse areas
Call a level-i ball "dense" if the local tour weight (i.e. inside the R_i-ball) is ≥ R_i/ε
Such a ball can be removed, solving each sub-problem separately
The cost to join the tours is relatively small: only R_i
Sparsification
Sparse decomposition: search the hierarchy bottom-up for dense balls. Remove each dense ball:
The ball is composed of 2^{O(ddim)} sparse sub-balls, so it is barely dense, i.e. local tour weight ≤ 2^{O(ddim)}·R_{i-1}/ε
Recurse on the remaining point set
But how do we know the local weight of the tour in a ball? It can be estimated using the local MST (modulo caveats like "long" edges); for S = the points in the ball (a sketch follows below):
OPT ∩ B(u,R) ≤ O(MST(S))
OPT ∩ B(u,3R) ≥ Ω(MST(S)) − ε^{-O(ddim)}·R
Henceforth, we assume the input is sparse
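A small Python sketch of the density test, using the local MST as the proxy suggested above; the bare threshold R/ε and the dropped hidden constants are illustrative simplifications.

    import math

    def mst_weight(pts, dist):
        """Total MST weight via Prim's algorithm, O(|pts|^2)."""
        n = len(pts)
        if n < 2:
            return 0.0
        best, done, total = [math.inf] * n, [False] * n, 0.0
        best[0] = 0.0
        for _ in range(n):
            u = min((v for v in range(n) if not done[v]), key=lambda v: best[v])
            done[u], total = True, total + best[u]
            for v in range(n):
                if not done[v]:
                    best[v] = min(best[v], dist(pts[u], pts[v]))
        return total

    def looks_dense(points, center, R, eps, dist):
        """Density test for B(center, R): estimate the local tour weight by
        the local MST, justified by OPT ∩ B(u,R) <= O(MST(S)) and
        OPT ∩ B(u,3R) >= Ω(MST(S)) - eps^{-O(ddim)}*R."""
        S = [p for p in points if dist(p, center) <= R]
        return mst_weight(S, dist) >= R / eps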
Light Tours
Definition: A tour is (m,r)-light on a hierarchy if it enters each cell (cluster) at most r times, and only via m designated portals
Choose portals as (2^i/M)-net points; then m = M^{O(ddim)} (a net-construction sketch follows below)
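A minimal Python sketch of the greedy net construction behind the portals (names and constants are illustrative); by the packing property, a radius-O(2^i) cluster contains at most M^{O(ddim)} points of a (2^i/M)-net.

    import math

    def net_of(points, delta, dist):
        """Greedy delta-net: net points are pairwise >= delta apart, and
        every input point is within delta of some net point."""
        net = []
        for p in points:
            if all(dist(p, q) >= delta for q in net):
                net.append(p)
        return net

    # Portals of a level-i cluster = a (2^i/M)-net of it; here take 2^i = 1.
    cluster = [(x / 10, y / 10) for x in range(21) for y in range(21)
               if math.dist((x / 10, y / 10), (1.0, 1.0)) <= 1.0]
    M = 8
    portals = net_of(cluster, 1.0 / M, math.dist)
    print(len(portals))   # m = M^{O(ddim)}; ddim ~ 2 here, so about M^2 portals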
Optimizing over Light Tours
Theorem [Arora'98, Talwar'04]: Given a hierarchical partition, a minimum-length (m,r)-light tour for it can be computed exactly in time m^{r·O(ddim)} · n·log n, via dynamic programming
Join tours for small clusters
into tour for larger cluster
Typically both m,r ≈ polylog(n/ε), thus m^r ≈ n^{polylog n}
Better Partitions and Lighter Tours
Our Theorem: For every (optimal) tour T, there is a partition with an (m,r)-light tour T' such that
M = ddim·log n/ε
m = M^{O(ddim)} = (log n/ε)^{Õ(ddim)}
r = ε^{-O(ddim)}·loglog n
and length(T') ≤ (1+ε)·length(T)
If the partition were known, then a tour like T' could be found in time m^{r·O(ddim)}·n·log n = n·2^{ε^{-Õ(ddim)}·loglog²n} (now m^r ≈ poly(n))
It remains to prove the Theorem (a bit later), and show how to find the partition (after that)
Constructing Light Tours
Modify a tour T to be (m,r)-light [Arora'98, Talwar'04]
Part I: Focus on m (i.e. net points). Move cut edges to be incident on net points
Expected cost at one level (for an edge of unit length): radius R_{i-1} ≈ 2^{i-1}
Pr[edge is cut] ≤ O(ddim/R_{i-1})
Expected cost ≤ (R_{i-1}/M)·(ddim/R_{i-1}) = ddim/M = ε/log n
Expected cost to an edge over all levels: ≤ log n · ε/log n = ε
We thus constructed a (1+ε)-approximate tour (a quick numeric check of this accounting follows below)
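A few lines of arithmetic (with illustrative constants) checking this accounting:

    import math

    ddim, eps, n = 2, 0.1, 10**6      # illustrative constants
    M = ddim * math.log2(n) / eps     # portal-spacing parameter
    per_level = ddim / M              # = (R/M)*(ddim/R) for a unit-length edge
    print(per_level, per_level * math.log2(n))   # ~ eps/log n, and eps = 0.1 total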
Constructing Light Tours (2)
Modify a tour to be (m,r)-light [Arora'98, Talwar'04]
Part II: Focus on r (i.e. the number of crossing edges). Reduce the number of crossings
Patching step: reroute (almost all) crossings back into the cluster
Cost ≈ length of a tour on the patched endpoints ≈ MST of these points
MST Theorem [Talwar'04]: For a set S of points, MST(S) ≤ diam(S)·|S|^{1−1/ddim}, i.e. cost per point ≤ diam(S)/|S|^{1/ddim}
Constructing Light Tours (3)
Modify a tour to be (m,r)-light [Arora'98, Talwar'04]
Part II: Focus on r (i.e. the number of crossing edges). Reduce the number of crossings
Expected cost to an edge at level i−1: radius R_{i-1} ≈ 2^{i-1}
Pr[edge is patched] ≤ Pr[edge is cut]
Expected cost ≤ (R_{i-1}/r^{1/ddim})·(ddim/R_{i-1}) = ddim/r^{1/ddim}
As before, we want this to be ≤ ε/log n (because we sum over log n levels), so we could take r = (ddim·log n/ε)^{ddim}
But the dynamic program runs in time m^r, giving only a QPTAS! [Talwar'04]
Challenge: smaller value for r
Patching in Sparse Areas
Suppose the tour is q-sparse with respect to the hierarchy: every R-ball contains weight ≤ qR (for all R = 2^i)
In expectation, a random R-ball cuts weight qR/R = q
A cluster is formed by cuts from many levels; in expectation, weight q is cut per level
If r = 2q·loglog n, then in expectation, level-(i−1) patching includes edges that were cut at much higher levels
Charge only the "top" half of the patched edges; each is charged about 2R_{i-1}
Pr[edge is charged for patching] ≤ Pr[edge is cut at level i + loglog n] ≤ ddim/(R_{i-1}·log n)
Wrapping Up (Patching Sparse Areas)
Modify a tour to be (m,r)-light [Arora'98, Talwar'04]
Part II: Focus on r (i.e. the number of crossing edges). Reduce the number of crossings
Expected cost to an edge at level i−1:
≤ (R_{i-1}/r^{1/ddim})·(ddim/(R_{i-1}·log n)) = ddim/(log n · r^{1/ddim})
As before, we want this term to equal ε/log n, so take r = (ddim/ε)^{ddim}
Obtain PTAS!
Technical Subtleties
Outstanding problem: the previous analysis assumed that a ball cuts only q edges; this holds in expectation, which is not good enough
Solution: try many hierarchies. Choose log n random radii for each ball and try all their combinations! WHP, some hierarchy cuts O(q) edges in every ball
Drives up runtime of dynamic program
Algorithmic Frontiers of Doubling Metrics
Robert Krauthgamer Weizmann Institute of Science
Joint work with Lee-Ad Gottlieb and Aryeh Kontorovich
Machine Learning in Doubling Metrics
Large-margin classification in metric spaces [vonLuxburg-Bousquet'04]
Unknown distribution D of labeled points (x,y) ∈ M × {−1,1}
M is a metric space (generalizes ℝ^dim)
Labels are L-Lipschitz: |y_i − y_j| ≤ L·d(x_i, x_j) (generalizes margin)
Resource: a sample of labeled points
Goal: build a hypothesis f : M → {−1,1} that has (1−ε)-agreement with D
Statistical complexity: how many samples are needed?
Computational complexity: running time?
Extensions: a small fraction of labels are wrong (adversarial noise); real-valued labels y ∈ [−1,1] (metric regression)
Generalization Bounds
Our approach: assume M is doubling and use generalized VC-theory [Alon-BenDavid-CesaBianchi-Haussler'97; Bartlett-ShaweTaylor'99]
Example: the earthmover distance (EMD) in the plane between sets of size k has ddim ≤ O(k log k)
Standard algorithm: pick a hypothesis that fits all/most observed samples
Theorem: The class of L-Lipschitz functions has fat-shattering dimension fsdim ≤ (c·L·diam(M))^{ddim}
Corollary: If f is L-Lipschitz and classifies n samples correctly, then WHP Pr_D[sgn(f(x)) ≠ y] ≤ O(fsdim·(log n)²/n)
Similarly, if f correctly classifies all but an η-fraction, then WHP Pr_D[sgn(f(x)) ≠ y] ≤ η + O(fsdim·(log n)²/n)^{1/2}
These bounds are incomparable to those of [vonLuxburg-Bousquet'04]
Algorithmic Aspects (noise-free)
Computing a hypothesis f from the samples (x_i, y_i):
f : x ↦ min_i [ y_i + 2·d(x, x_i) / d(S+, S−) ]
where S+ and S− are the positively and negatively labeled samples
Lemma (Lipschitz extension): if the labels are L-Lipschitz, then so is f
Evaluating f(x) requires solving nearest neighbor search; this explains a common classification heuristic, e.g. [Cover-Hart'67], but might require Ω(n) time…
We show how to use (1+ε)-nearest neighbor search instead: it can be solved quickly in doubling metrics, and we prove a similar generalization bound by sandwiching sgn(f(x)) (a sketch of f follows below)
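A Python sketch of this hypothesis, following the formula above (names are illustrative); the evaluation below performs the exact linear scan, which is precisely where a (1+ε)-nearest-neighbor query would be substituted.

    import math

    def build_hypothesis(samples, dist):
        """Lipschitz-extension hypothesis from samples (x_i, y_i) with
        y_i in {-1,+1}:  f(x) = min_i [ y_i + 2*d(x, x_i) / d(S+, S-) ].
        f agrees with the labels on the samples and stays Lipschitz."""
        S_pos = [x for x, y in samples if y > 0]
        S_neg = [x for x, y in samples if y < 0]
        margin = min(dist(p, q) for p in S_pos for q in S_neg)   # d(S+, S-)

        def f(x):   # exact evaluation: Omega(n) scan over all samples
            return min(y + 2 * dist(x, xi) / margin for xi, y in samples)

        return f

    train = [((0.0, 0.0), -1), ((0.1, 0.2), -1), ((3.0, 3.0), +1), ((2.8, 3.1), +1)]
    f = build_hypothesis(train, math.dist)
    print(math.copysign(1, f((0.3, 0.1))), math.copysign(1, f((2.9, 2.9))))   # -1, +1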
Extensions (noisy case)
1. A small fraction of labels are wrong (adversarial noise). How to compute a hypothesis?
Build a bipartite graph (on S+ ∪ S−) of all violations of the Lipschitz condition (an edge between two oppositely labeled points at distance < 2/L)
Compute a minimum vertex cover (or faster: a 2-approximation)
2. Real-valued labels y ∈ [−1,1] (metric regression): minimize the risk (expected loss) E_{x,y}|f(x)−y|
Extend the statistical framework by similar ideas. But how to compute a hypothesis?
Write an LP: minimize Σ_i |f(x_i)−y_i| subject to |f(x_i)−f(x_j)| ≤ L·d(x_i,x_j) for all i,j
Reduce the number of constraints from O(n²) to O(ε^{-ddim}·n) using a (1+ε)-spanner on the x_i's, then apply a fast approximate LP solver
(Sketches of both computations follow below)
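Two illustrative Python sketches of these computations (parameter choices and helper names are mine, not the paper's; numpy and scipy are assumed available). First, the noisy case: the endpoints of a greedy maximal matching in the violation graph form a 2-approximate vertex cover.

    import itertools, math
    import numpy as np
    from scipy.optimize import linprog

    def violation_cover(samples, dist, L):
        """Edges join oppositely labeled samples at distance < 2/L (a
        Lipschitz violation); endpoints of a greedy maximal matching give
        a 2-approximate vertex cover of suspect samples."""
        covered = set()
        for i, j in itertools.combinations(range(len(samples)), 2):
            (xi, yi), (xj, yj) = samples[i], samples[j]
            if yi != yj and dist(xi, xj) < 2.0 / L:
                if i not in covered and j not in covered:
                    covered |= {i, j}
        return covered

Second, the metric-regression LP with slack variables t_i ≥ |f(x_i) − y_i|; for brevity it keeps all O(n²) Lipschitz constraints, where the slide's speedup would keep only the edges of a (1+ε)-spanner.

    def metric_regression(samples, dist, L):
        """Minimize sum_i |f(x_i)-y_i| s.t. |f(x_i)-f(x_j)| <= L*d(x_i,x_j).
        Variables: f_1..f_n, then slacks t_1..t_n with t_i >= |f_i - y_i|."""
        xs, ys = zip(*samples)
        n = len(xs)
        c = np.concatenate([np.zeros(n), np.ones(n)])      # minimize sum t_i
        A, b = [], []
        for i in range(n):                                 # t_i >= |f_i - y_i|
            for sign in (1, -1):
                row = np.zeros(2 * n)
                row[i], row[n + i] = sign, -1
                A.append(row); b.append(sign * ys[i])
        for i, j in itertools.combinations(range(n), 2):   # Lipschitz pairs
            for sign in (1, -1):
                row = np.zeros(2 * n)
                row[i], row[j] = sign, -sign
                A.append(row); b.append(L * dist(xs[i], xs[j]))
        bounds = [(-1, 1)] * n + [(0, None)] * n
        res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
        return res.x[:n]                                   # fitted values f(x_i)

    data = [((0.0, 0.0), -0.9), ((1.0, 0.0), -0.2), ((2.0, 0.0), 0.4), ((3.0, 0.0), 1.0)]
    print(np.round(metric_regression(data, math.dist, 0.5), 2))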
Conclusion
General paradigm: low-dimensional Euclidean spaces ↔ doubling metric spaces
Mathematically, the latter is a different (strictly bigger) family; there are not even low-distortion embeddings [Laakso'00,'01]
For algorithmic efficiency there is a strong analogy/similarity, e.g. nearest neighbor search, distributed computing and networking, combinatorial optimization, machine learning
Research directions:
Other computational tasks or application areas? Particularly in machine learning, data structures
Scenarios where the analogy fails? E.g. [Indyk-Naor'05], which uses random projections
Other metric models? E.g. hyperbolic…