
Algorithmic Frontiers of Doubling Metric Spaces

Robert Krauthgamer

Weizmann Institute of Science

Based on joint works with Yair Bartal, Lee-Ad Gottlieb, Aryeh Kontorovich


The Traveling Salesman Problem: Low-dimensionality implies PTAS

Robert Krauthgamer Weizmann Institute of Science

Joint work with Yair Bartal and Lee-Ad Gottlieb

Traveling Salesman Problem (TSP)

Definition: Given a set of cities (points), find a minimum-length tour that visits all points

Classic, well-studied NP-hard problem [Karp’72; Papadimitriou-Vempala’06]
Mentioned in a handbook from 1832!
Common benchmark for optimization methods
Many books devoted to TSP…
Numerous variants
  Closed/open tour
  Multiple tours
  Average visit time (repairman)
  Etc…

[Figure: Optimal tour]

Metric TSP

Basic assumptions on distances
  Symmetric: d(x,y) = d(y,x)
  Metric (triangle inequality): d(x,z) ≤ d(x,y) + d(y,z)

Easy 2-approximation via MST
  Since OPT ≥ MST
Can do better…
  MST + Matching ≤ (3/2)·OPT [Christofides’76]

[Figure: MST]
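The MST-based 2-approximation can be sketched as follows. This is a minimal illustration, not code from the talk: the function names `mst_tour` and `tour_length` are hypothetical, points are assumed to be coordinate tuples, and the Euclidean `math.dist` stands in for any metric. It builds an MST with Prim's algorithm and visits the points in preorder, which shortcuts the doubled tree; by the triangle inequality the tour costs at most 2·MST ≤ 2·OPT.

```python
import math

def mst_tour(points, dist=math.dist):
    """Return a closed tour (list of point indices) from an MST preorder walk."""
    n = len(points)
    if n == 0:
        return []
    # Prim's algorithm in O(n^2): grow the tree from vertex 0.
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known connection to the tree
    parent = [-1] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Preorder walk = Euler tour of the doubled tree with repeats shortcut.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

def tour_length(points, tour, dist=math.dist):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))
```

For the four corners of a unit square the optimal tour has length 4, so the returned tour is guaranteed to have length at most 8.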

Euclidean TSP

Sanjeev Arora [JACM’98] and Joe Mitchell [SICOMP’99]:
Euclidean TSP with fixed dimension admits a PTAS
  Find (1+ε)-approximate tour
  In time n · (log n)^{ε^{-Õ(dimension)}}, where n = #points
  (Extends to other norms)

They were awarded the 2010 Gödel Prize for this discovery

PTAS Beyond Euclidean?

To achieve a PTAS, two properties were assumed
  Euclidean space (at least approximately)
  Fixed dimension

Are both these assumptions required?
  Fixed dimension is necessary
    No PTAS for (log n)-dimensions unless P=NP [Trevisan’00]
  Is Euclidean necessary?
    Consider metric spaces with low Euclidean intrinsic dimension…

Doubling Dimension

Definition: Ball B(x,r) = all points within distance r from x.

The doubling constant λ(M) (of a metric M) is the minimum value λ > 0 such that every ball can be covered by λ balls of half the radius
  First used by [Assouad’83], algorithmically by [Clarkson’97].
  The doubling dimension is ddim(M) = log_2 λ(M) [Gupta-K.-Lee’03]
  M is called doubling if its doubling dimension is constant

Packing property of doubling spaces
  A set with diameter D > 0 and inter-point distance ≥ a contains at most (D/a)^{O(ddim)} points

[Figure: a ball covered by balls of half the radius. Here λ ≤ 7.]
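To make the definition concrete, here is a brute-force sketch that bounds the doubling constant of a small finite point set. It is an illustration under stated assumptions, not an algorithm from the talk: the helper names are hypothetical, covering balls are centered at input points only, and a greedy pass replaces the minimum cover, so the result is an upper bound for this restricted notion rather than the exact value of λ.

```python
import math
from itertools import combinations

def greedy_cover_size(points, center, r, dist=math.dist):
    """Number of radius-r/2 balls a greedy pass uses to cover B(center, r)."""
    ball = [p for p in points if dist(center, p) <= r]
    centers = []
    for p in ball:
        # p opens a new half-radius ball only if no chosen ball contains it.
        if all(dist(p, c) > r / 2 for c in centers):
            centers.append(p)
    return len(centers)

def doubling_constant_upper_bound(points, dist=math.dist):
    """Max greedy cover size over all centers and all pairwise-distance radii."""
    radii = {dist(p, q) for p, q in combinations(points, 2)}
    return max((greedy_cover_size(points, c, r, dist)
                for c in points for r in radii), default=1)

def ddim_upper_bound(points, dist=math.dist):
    """Base-2 logarithm of the bound above."""
    return math.log2(doubling_constant_upper_bound(points, dist))
```

The running time is roughly quartic in the number of points, so this is only meant for small examples.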

Applications of Doubling Dimension

Nearest neighbor search [K.-Lee’04; HarPeled-Mendel’06; Beygelzimer-Kakade-Langford’06; Cole-Gottlieb’06]
Spanners, routing [Talwar’04; Kleinberg-Slivkins-Wexler’04; Abraham-Gavoille-Goldberg-Malkhi’05; Konjevod-Richa-Xia-Yu’07; Gottlieb-Roditty’08; Elkin-Solomon’12]
Distance oracles [HarPeled-Mendel’06; Bartal-Gottlieb-Roditty-Kopelowitz-Lewenstein’11]
Dimension reduction [Bartal-Recht-Schulman’11; Gottlieb-K.’11]
Machine learning and statistics [Bshouty-Li-Long’09; Gottlieb-Kontorovich-K.’10,’12]

PTAS for Metric TSP?

Does TSP on doubling metrics admit a PTAS?
  Arora and Mitchell made strong use of Euclidean properties
  “Most fascinating problem left open in this area” [James Lee, tcsmath blog, June ’10]

Some attempts
  Quasi-PTAS [Talwar’04] (First description of problem)
  Quasi-PTAS for TSP w/neighborhoods [Mitchell’07; Chan-Elbassioni’11]
  Subexponential-TAS, under weaker assumption [Chan-Gupta’08]

Our result: TSP on doubling metrics admits a PTAS
  Find (1+ε)-approximate tour
  In time: n^{2^{O(ddim)}} · 2^{ε^{-Õ(ddim)} · 2^{O(ddim^2)} · log^{1/2} n}
  Euclidean (to compare): n · (log n)^{ε^{-Õ(dimension)}}

Throughout, think of ddim and ε as constants

Metric Partition

A quadtree-like hierarchy [Bartal’96, Gupta-K.-Lee’03, Talwar’04]

At level i:
  Centers are 2^i-apart, in arbitrary order
  Random radii R_i ∈ [2^i, 2·2^i]

Metric Partition (2)

A quadtree-like hierarchy [Bartal’96, Gupta-K.-Lee’03, Talwar’04]

Recursively to level i-1:
  Random radii R_{i-1} ∈ [2^{i-1}, 2·2^{i-1}]

Caveat: log(n) hierarchical levels suffice
  Ignore tiny distances < 1/n^2
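One level of such a partition might look like the sketch below. It rests on several assumptions that the slides do not spell out: the names `greedy_net` and `partition_level` are hypothetical, a single random radius is drawn per level (rather than per ball), points are coordinate tuples, and `math.dist` stands in for the metric. Centers are picked greedily in input order so that they are at least 2^i apart, and each point joins the first center whose ball of radius R_i contains it.

```python
import math
import random

def greedy_net(points, spacing, dist=math.dist):
    """Indices of greedily chosen centers, pairwise at least `spacing` apart."""
    centers = []
    for idx, p in enumerate(points):
        if all(dist(p, points[c]) >= spacing for c in centers):
            centers.append(idx)
    return centers

def partition_level(points, i, rng=random, dist=math.dist):
    """Partition the points at level i; returns (R_i, {center: member indices})."""
    scale = 2.0 ** i
    centers = greedy_net(points, scale, dist)
    radius = rng.uniform(scale, 2 * scale)      # R_i in [2^i, 2*2^i]
    clusters = {c: [] for c in centers}
    for idx, p in enumerate(points):
        # Every point lies within 2^i <= R_i of some center, so the loop
        # always finds a cluster; earlier centers take precedence.
        for c in centers:
            if dist(points[c], p) <= radius:
                clusters[c].append(idx)
                break
    return radius, clusters
```

Applying the same routine inside each cluster with level i-1 would give the recursive hierarchy; that recursion is omitted here.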

Dense Areas

Key observation: The points (metric space) can be decomposed into sparse areas

Call a level-i ball “dense” if its local tour weight (i.e. inside the R_i-ball) is ≥ R_i/ε
Such a ball can be removed, solving each sub-problem separately
Cost to join tours is relatively small: only R_i

Sparsification

Sparse decomposition:
  Search hierarchy bottom-up for dense balls.
  Remove dense ball:
    Ball is composed of 2^{O(ddim)} sparse sub-balls
    So it’s barely dense, i.e. local tour weight ≤ 2^{O(ddim)} R_{i-1}/ε
  Recurse on remaining point set

But how do we know the local weight of the tour in a ball?
  Can be estimated using the local MST
  Modulo caveats like “long” edges…
    OPT ∩ B(u,R) ≤ O(MST(S))
    OPT ∩ B(u,3R) ≥ Ω(MST(S)) − ε^{-O(ddim)} R

Henceforth, we assume the input is sparse

Light Tours

Definition: A tour is (m,r)-light on a hierarchy if it enters all cells (clusters)
  At most r times, and
  Only via m designated portals

Choose portals as (2^i/M)-net points
  Then m = M^{O(ddim)}

[Figure label: 2^{i-1}/M]

Optimizing over Light Tours

Theorem [Arora’98, Talwar’04]: Given a hierarchical partition, a minimum-length (m,r)-light tour for it can be computed exactly
  In time m^{r·O(ddim)} · n · log n
  Via dynamic programming
    Join tours for small clusters into tour for larger cluster

Typically both m, r ≈ polylog(n/ε), thus m^r ≈ n^{polylog n}

Better Partitions and Lighter Tours

Our Theorem: For every (optimal) tour T, there is a partition with an (m,r)-light tour T’ such that
  M = ddim · log n/ε
  m = M^{O(ddim)} = (log n/ε)^{Õ(ddim)}
  r = ε^{-O(ddim)} loglog n
  And length(T’) ≤ (1+ε) · length(T)

If the partition were known, then a tour like T’ could be found in time
  m^{r·O(ddim)} · n · log n = n · 2^{ε^{-Õ(ddim)} (loglog n)^2}
  Now m^r ≈ poly(n)

It remains to prove the Theorem (a bit later), and show how to find the partition (after that)

Constructing Light Tours

Modify a tour T to be (m,r)-light [Arora’98, Talwar’04]

Part I: Focus on m (i.e. net points)
  Move cut edges to be incident on net points

Expected cost at one level (for edge of unit length)
  Radius R_{i-1} ≈ 2^{i-1}
  Pr[cut edge] ≤ O(ddim/R_{i-1})
  Expected cost ≤ (R_{i-1}/M) · (ddim/R_{i-1}) = ddim/M = ε/log n

Expected cost to edge over all levels: ≤ log n · ε/log n = ε
We thus constructed a (1+ε)-approximate tour

[Figure label: 2^{i-1}/M]

Constructing Light Tours (2)

Modify a tour to be (m,r)-light [Arora’98, Talwar’04]

Part II: Focus on r (i.e. number of crossing edges)
  Reduce number of crossings

Patching step: Reroute (almost all) crossings back into cluster
  Cost ≈ length of tour on the patched endpoints ≈ MST of these points

MST Theorem [Talwar’04]: For a set S of points
  MST(S) ≤ diam(S) · |S|^{1-1/ddim}
  Cost per point ≤ diam(S) / |S|^{1/ddim}

[Figure label: diam(S)]

Constructing Light Tours (3)

Modify a tour to be (m,r)-light [Arora’98, Talwar’04]

Part II: Focus on r (i.e. number of crossing edges)
  Reduce number of crossings

Expected cost to edge at level i-1
  Radius R_{i-1} ≈ 2^{i-1}
  Pr[edge is patched] ≤ Pr[edge is cut]
  Expected cost ≤ (R_{i-1}/r^{1/ddim}) · (ddim/R_{i-1}) = ddim/r^{1/ddim}

As before, want this to be ≤ ε/log n (because we sum over log n levels)
  Could take r = (ddim · log n/ε)^{ddim}
  But dynamic program runs in time m^r: QPTAS! [Talwar’04]

Challenge: smaller value for r

[Figure label: 2R_{i-1}]

Patching in Sparse Areas

Suppose a tour is q-sparse with respect to the hierarchy
  Every R-ball contains weight ≤ qR (for all R = 2^i)
  Expectation: Random R-ball cuts weight Rq/R = q

Cluster formed by cuts from many levels
  Expectation: weight q is cut per level

If r = q · 2 loglog n
  Expectation: level i-1 patching includes edges cut at much higher levels
  Charge only “top” half of patched edges
  Each charged about 2R_{i-1}

Pr[edge is charged for patching]
  ≤ Pr[edge is cut at level i + loglog n]
  ≤ ddim/(R_{i-1} log n)

[Figure label: R_{i-1}/M]

Wrapping Up (Patching Sparse Areas)

Modify a tour to be (m,r)-light [Arora’98, Talwar’04]

Part II: Focus on r (i.e. number of crossing edges)
  Reduce number of crossings

Expected cost at level i-1
  Expected cost ≤ (R_{i-1}/r^{1/ddim}) · (ddim/(R_{i-1} log n)) = ddim/(r^{1/ddim} · log n)

As before, want this term to be equal to ε/log n
  Take r = (ddim/ε)^{ddim}
  Obtain PTAS!

[Figure label: 2R_{i-1}]
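The two choices of r can be compared numerically. The sketch below simply evaluates the two closed forms stated on the slides, r = (ddim · log n/ε)^{ddim} without sparsity and r = (ddim/ε)^{ddim} with it. The function names are hypothetical, the logarithm is assumed to be base 2, and hidden constants and the loglog n factor are ignored, so the numbers are indicative only.

```python
import math

def crossings_without_sparsity(ddim, eps, n):
    """Solve ddim / r^(1/ddim) = eps / log n for r (QPTAS regime)."""
    return (ddim * math.log2(n) / eps) ** ddim

def crossings_with_sparsity(ddim, eps):
    """Solve ddim / (r^(1/ddim) * log n) = eps / log n for r (PTAS regime)."""
    return (ddim / eps) ** ddim
```

For ddim = 2, ε = 1/2 and n = 2^16 the first choice gives r = 4096 while the second gives r = 16, independent of n. Since the dynamic program runs in time exponential in r, removing the dependence on n is what turns the quasi-polynomial bound into a polynomial one.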

Technical Subtleties

Outstanding problem: Previous analysis assumed ball cuts only q edges
  True in expectation… Not good enough

Solution: try many hierarchies
  Choose at random log n radii for each ball and try all their combinations!
  WHP, some hierarchy cuts q edges in every ball
  Drives up runtime of dynamic program

[Figure label: R_{i-1}/M]


Algorithmic Frontiers of Doubling Metrics

Robert Krauthgamer Weizmann Institute of Science

Joint work with Lee-Ad Gottlieb and Aryeh Kontorovich

Machine Learning in Doubling Metrics

Large-margin classification in metric spaces [vonLuxburg-Bousquet’04]

Unknown distribution D of labeled points (x,y) ∈ M × {-1,1}
  M is a metric space (generalizes R^{dim})
  Labels are L-Lipschitz: |y_i − y_j| ≤ L · d(x_i,x_j) (generalizes margin)

Resource: Sample of labeled points

Goal: Build hypothesis f: M → {-1,1} that has (1−ε)-agreement with D
  Statistical complexity: How many samples needed?
  Computational complexity: Running time?

Extensions:
  Small fraction of labels are wrong (adversarial noise)
  Real-valued labels y ∈ [-1,1] (metric regression)

[Figure: points labeled −1 and +1, margins 2/L, hypothesis f]

Generalization Bounds

Our approach: Assume M is doubling and use generalized VC-theory [Alon-BenDavid-CesaBianchi-Haussler’97, Bartlett-ShaweTaylor’99]
  Example: Earthmover distance (EMD) in the plane between sets of size k has ddim ≤ O(k log k)
  Standard algorithm: pick hypothesis that fits all/most observed samples

Theorem: Class of L-Lipschitz functions has fat-shattering dimension
  fsdim ≤ (c · L · diam(M))^{ddim}.

Corollary: If f is L-Lipschitz and classifies n samples correctly, WHP
  Pr_D[sgn(f(x)) ≠ y] ≤ O(fsdim · (log n)^2 / n).
Similarly, if f correctly classifies all but η-fraction, then WHP
  Pr_D[sgn(f(x)) ≠ y] ≤ η + O(fsdim · (log n)^2 / n)^{1/2}.

Bounds incomparable to [vonLuxburg-Bousquet’04]

Algorithmic Aspects (noise-free)

Computing a hypothesis f from the samples (x_i,y_i):
  f: x ↦ min_i ( y_i + 2·d(x,x_i) / d(S+,S−) )
  Where S+ and S− are the positively and negatively labeled samples

Lemma (Lipschitz extension): If labels are L-Lipschitz, so is f.

Evaluating f(x) requires solving Nearest Neighbor Search
  Explains a common classification heuristic, e.g. [Cover-Hart’67]
  But might require Ω(n) time…

We show how to use (1+ε)-Nearest Neighbor Search
  This can be solved quickly in doubling metrics
  We prove similar generalization bound by sandwiching sgn(f(x))

[Figure: points labeled −1 and +1, hypothesis f, and a query point “?”]
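The hypothesis f can be evaluated directly from its formula by a linear scan over the samples, as in the sketch below. This is the naive Ω(n)-per-query evaluation, not the (1+ε)-nearest-neighbor method of the talk. The names `lipschitz_extension` and `classify` are hypothetical, samples are assumed to be (point, label) pairs with at least one sample of each label, `math.dist` stands in for the metric, and the tie at f(x) = 0 is broken toward −1 by assumption.

```python
import math

def set_distance(pos, neg, dist=math.dist):
    """d(S+, S-): smallest distance between oppositely labeled samples."""
    return min(dist(p, q) for p in pos for q in neg)

def lipschitz_extension(samples, dist=math.dist):
    """Build f(x) = min_i (y_i + 2 d(x, x_i) / d(S+, S-)) from (x_i, y_i) pairs."""
    pos = [x for x, y in samples if y > 0]
    neg = [x for x, y in samples if y < 0]
    gap = set_distance(pos, neg, dist)   # requires both classes to be present

    def f(x):
        return min(y + 2 * dist(x, xi) / gap for xi, y in samples)

    return f

def classify(f, x):
    """sgn(f(x)), with the tie f(x) = 0 assigned to -1 (an assumption)."""
    return 1 if f(x) > 0 else -1
```

On the samples themselves f reproduces the labels: a positive sample gets the value 1 because every negative sample is at distance at least d(S+,S−), and a negative sample gets −1 from its own term.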

Extensions (noisy case)

1. A small fraction of labels are wrong (adversarial noise)
  How to compute a hypothesis?
  Build a bipartite graph (on S+ ∪ S−) of all violations to Lipschitz condition (edge between two points at distance < 2/L).
  Compute a minimum vertex cover (or faster: 2-approximation)

2. Real-valued labels y ∈ [-1,1] (metric regression)
  Minimize risk (expected loss) E_{x,y} |f(x) − y|
  Extend the statistical framework by similar ideas
  But how to compute a hypothesis?
  Write LP: minimize Σ_i |f(x_i) − y_i|
    subject to |f(x_i) − f(x_j)| ≤ L · d(x_i,x_j) ∀ i,j
  Reduce #constraints from O(n^2) to O(ε^{-ddim} n) using (1+ε)-spanner on x_i’s
  Apply fast approximate LP solver
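The noisy-label step (item 1) can be sketched as follows, using the faster 2-approximation rather than an exact minimum vertex cover. The function names are hypothetical, samples are assumed to be (point, label) pairs with labels ±1, and `math.dist` stands in for the metric. The cover is taken as the endpoints of a greedy maximal matching, which is a standard 2-approximation for vertex cover; discarding the covered samples leaves a sample with no Lipschitz violations.

```python
import math

def lipschitz_violations(samples, L, dist=math.dist):
    """Edges (i, j): positive sample i and negative sample j closer than 2/L."""
    edges = []
    for i, (xi, yi) in enumerate(samples):
        for j, (xj, yj) in enumerate(samples):
            if yi > 0 > yj and dist(xi, xj) < 2 / L:
                edges.append((i, j))
    return edges

def vertex_cover_2approx(edges):
    """Endpoints of a greedy maximal matching; size is at most twice the optimum."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

def remove_suspected_noise(samples, L, dist=math.dist):
    """Drop the covered samples; the remainder has no Lipschitz violations."""
    cover = vertex_cover_2approx(lipschitz_violations(samples, L, dist))
    return [s for k, s in enumerate(samples) if k not in cover]
```

Because the violation graph is bipartite, an exact minimum vertex cover could also be obtained from a maximum matching; that variant is not shown here.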

Conclusion

General paradigm: low-dim. Euclidean spaces ↔ doubling metric spaces
  Mathematically: the latter is a different (strictly bigger) family
    Not even low-distortion embeddings [Laakso’00,’01]
  For algorithmic efficiency: strong analogy/similarity
    E.g., nearest neighbor search, distributed computing and networking, combinatorial optimization, machine learning

Research directions:
  Other computational tasks or application areas?
    Particularly in machine learning, data structures
  Scenarios where analogy fails?
    E.g. [Indyk-Naor’05] which uses random projections
  Other metric models?
    E.g. hyperbolic …