Algorithmic Frontiers of Doubling Metric Spaces Robert Krauthgamer Weizmann Institute of Science...

Algorithmic Frontiers of Doubling Metric Spaces

Robert Krauthgamer

Weizmann Institute of Science

Based on joint works with Yair Bartal, Lee-Ad Gottlieb, Aryeh Kontorovich

The Traveling Salesman Problem: Low-dimensionality implies PTAS

Robert Krauthgamer Weizmann Institute of Science

Joint work with Yair Bartal and Lee-Ad Gottlieb

Traveling Salesman Problem (TSP)

Definition: Given a set of cities (points), find a minimum-length tour that visits all points

Classic, well-studied NP-hard problem [Karp’72; Papadimitriou-Vempala’06]
Mentioned in a handbook from 1832!
Common benchmark for optimization methods
Many books devoted to TSP…

Numerous variants: closed/open tour, multiple tours, average visit time (repairman), etc.

[Figure: an optimal tour]

Metric TSP

Basic assumptions on distances:
Symmetric: d(x,y) = d(y,x)
Metric (triangle inequality): d(x,z) ≤ d(x,y) + d(y,z)

Easy 2-approximation via MST, since OPT ≥ MST

Can do better: MST + Matching gives a 3/2-approximation [Christofides’76]
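The MST-based 2-approximation can be sketched in a few lines. This is a minimal illustration, not code from the talk: it assumes points in the Euclidean plane (any metric would do), and the function name `mst_preorder_tour` is made up for this example. It builds an MST with Prim's algorithm and returns the preorder walk, which is the doubled tree with repeated vertices shortcut.

```python
import math

def mst_preorder_tour(points):
    """Preorder walk of an MST: a tour of length at most 2 * MST <= 2 * OPT."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known connection to the tree
    parent = [-1] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    mst_weight = 0.0
    for _ in range(n):             # Prim's algorithm, O(n^2)
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        mst_weight += best[u]
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v], parent[v] = dist(u, v), u
    # Preorder traversal of the tree = shortcut version of the doubled MST.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    length = sum(dist(tour[k], tour[(k + 1) % n]) for k in range(n))
    return tour, length, mst_weight
```

By the triangle inequality the returned length is at most twice the MST weight, and the MST weight is itself a lower bound on any tour.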


[Figure: an MST of the points]

Euclidean TSP

Sanjeev Arora [JACM’98] and Joe Mitchell [SICOMP’99]: Euclidean TSP with fixed dimension admits a PTAS
Find (1+ε)-approximate tour
In time $n \cdot (\log n)^{\varepsilon^{-\tilde{O}(\mathrm{dimension})}}$, where n = #points
(Extends to other norms)

They were awarded the 2010 Gödel Prize for this discovery

PTAS Beyond Euclidean?

To achieve a PTAS, two properties were assumed:
Euclidean space (at least approximately)
Fixed dimension

Are both these assumptions required?

Fixed dimension is necessary: no PTAS for (log n)-dimensions unless P=NP [Trevisan’00]

Is Euclidean necessary? Consider metric spaces with low intrinsic dimension (not necessarily Euclidean)…


Doubling Dimension

Definition: Ball B(x,r) = all points within distance r from x.

The doubling constant (of a metric M) is the minimum value λ > 0 such that every ball can be covered by λ balls of half the radius
First used by [Assouad’83], algorithmically by [Clarkson’97].
The doubling dimension is $\mathrm{ddim}(M) = \log_2 \lambda(M)$ [Gupta-K.-Lee’03]
M is called doubling if its doubling dimension is constant

Packing property of doubling spaces: a set with diameter D > 0 and inter-point distance ≥ a contains at most $(D/a)^{O(\mathrm{ddim})}$ points


[Figure: here λ ≤ 7]
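To make the definitions above concrete, here is a small sketch for a finite point set. It is an illustration under stated assumptions, not part of the talk: the helper names `greedy_net` and `doubling_constant_upper_bound` are hypothetical, the default metric is Euclidean, and the second function only gives a greedy upper bound on λ (covering balls are centered at points of the ball itself), not the exact minimum.

```python
import math

def greedy_net(points, r, dist=math.dist):
    """Greedy r-net: centers are pairwise more than r apart,
    and every input point lies within r of some center."""
    centers = []
    for p in points:
        if all(dist(p, c) > r for c in centers):
            centers.append(p)
    return centers

def doubling_constant_upper_bound(points, dist=math.dist):
    """Greedy upper bound on the doubling constant of a finite metric.

    For every ball B(x, r), with r ranging over the interpoint distances
    from x, cover it by balls of radius r/2 centered at a greedy (r/2)-net
    of the ball, and report the largest cover used."""
    worst = 1
    for x in points:
        radii = {dist(x, y) for y in points if y != x}
        for r in radii:
            ball = [y for y in points if dist(x, y) <= r]
            worst = max(worst, len(greedy_net(ball, r / 2, dist)))
    return worst
```

The net points are pairwise far apart and sit inside one ball, so their number is exactly the kind of quantity the packing property bounds.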

Applications of Doubling Dimension

Nearest neighbor search [K.-Lee’04; HarPeled-Mendel’06; Beygelzimer-Kakade-Langford’06; Cole-Gottlieb’06]
Spanners, routing [Talwar’04; Kleinberg-Slivkins-Wexler’04; Abraham-Gavoille-Goldberg-Malkhi’05; Konjevod-Richa-Xia-Yu’07; Gottlieb-Roditty’08; Elkin-Solomon’12]
Distance oracles [HarPeled-Mendel’06; Bartal-Gottlieb-Roditty-Kopelowitz-Lewenstein’11]
Dimension reduction [Bartal-Recht-Schulman’11; Gottlieb-K.’11]
Machine learning and statistics [Bshouty-Li-Long’09; Gottlieb-Kontorovich-K.’10,’12]


[Figure: panels labeled G and H]

PTAS for Metric TSP?

Does TSP on doubling metrics admit a PTAS?
Arora and Mitchell made strong use of Euclidean properties
“Most fascinating problem left open in this area” [James Lee, tcsmath blog, June ’10]

Some attempts:
Quasi-PTAS [Talwar’04] (first description of problem)
Quasi-PTAS for TSP w/neighborhoods [Mitchell’07; Chan-Elbassioni’11]
Subexponential-TAS, under weaker assumption [Chan-Gupta’08]

Our result: TSP on doubling metrics admits a PTAS
Find (1+ε)-approximate tour
In time: $n^{2^{O(\mathrm{ddim})}} \cdot 2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})} \cdot 2^{O(\mathrm{ddim}^2)} \cdot \log^{1/2} n}$

Euclidean (to compare): $n \cdot (\log n)^{\varepsilon^{-\tilde{O}(\mathrm{dimension})}}$


Throughout, think of ddim and ε as constants


Metric Partition

A quadtree-like hierarchy [Bartal’96, Gupta-K.-Lee’03, Talwar’04]

At level i:
Centers are $2^i$-apart, taken in arbitrary order
Random radii $R_i \in [2^i, 2 \cdot 2^i]$

Metric Partition (2)

A quadtree-like hierarchy [Bartal’96, Gupta-K.-Lee’03, Talwar’04]

Recursively to level i-1:
Random radii $R_{i-1} \in [2^{i-1}, 2 \cdot 2^{i-1}]$

Caveat: log(n) hierarchical levels suffice
Ignore tiny distances < $1/n^2$
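The two partition slides can be sketched as follows. This is a simplified illustration, not the construction from the cited papers: the function names `partition_level` and `hierarchy` are hypothetical, the metric defaults to Euclidean, one independent uniform radius is drawn per center (an assumption about how the random radii are chosen), and each point simply joins the first center, in the arbitrary input order, whose ball contains it.

```python
import math
import random

def partition_level(points, i, dist=math.dist, rng=random):
    """One level of a randomized ball partition at scale 2^i."""
    scale = 2.0 ** i
    centers = []                       # greedy net: centers pairwise > 2^i apart
    for p in points:
        if all(dist(p, c) > scale for c in centers):
            centers.append(p)
    # Assumption: an independent random radius in [2^i, 2*2^i] per center.
    radii = [rng.uniform(scale, 2 * scale) for _ in centers]
    clusters = {}
    for p in points:
        for k, c in enumerate(centers):
            if dist(p, c) <= radii[k]:  # first center (in order) that captures p
                clusters.setdefault(k, []).append(p)
                break
    return centers, radii, clusters

def hierarchy(points, top, bottom, dist=math.dist, rng=random):
    """Recursively refine each cluster from level `top` down to `bottom`."""
    if top < bottom or len(points) <= 1:
        return {"points": points, "children": []}
    _, _, clusters = partition_level(points, top, dist, rng)
    return {"points": points, "level": top,
            "children": [hierarchy(ms, top - 1, bottom, dist, rng)
                         for ms in clusters.values()]}
```

Every point is within 2^i of some net center and every radius is at least 2^i, so each level is a genuine partition; clusters at level i have diameter at most 4·2^i.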

Dense Areas

Key observation: The points (metric space) can be decomposed into sparse areas

Call a level-i ball “dense” if the local tour weight (i.e. inside the $R_i$-ball) is ≥ $R_i/\varepsilon$
Such a ball can be removed, solving each sub-problem separately
Cost to join tours is relatively small: only $R_i$


Sparsification

Sparse decomposition:
Search hierarchy bottom-up for dense balls.
Remove dense ball:
Ball is composed of $2^{O(\mathrm{ddim})}$ sparse sub-balls
So it’s barely dense, i.e. local tour weight ≤ $2^{O(\mathrm{ddim})} R_{i-1}/\varepsilon$
Recurse on remaining point set

But how do we know the local weight of the tour in a ball?
Can be estimated using the local MST
Modulo caveats like “long” edges…

$\mathrm{OPT} \cap B(u,R) \le O(\mathrm{MST}(S))$
$\mathrm{OPT} \cap B(u,3R) \ge \Omega(\mathrm{MST}(S)) - \varepsilon^{-O(\mathrm{ddim})} R$

Henceforth, we assume the input is sparse
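The density test can be approximated along the lines suggested above, using the local MST as a stand-in for the unknown local tour weight. The sketch below is only a proxy under simplifying assumptions: it ignores the “long edges” caveat and the constant factors in the MST bounds, and the names `mst_weight` and `is_dense` are hypothetical.

```python
import math

def mst_weight(points, dist=math.dist):
    """Weight of a minimum spanning tree (Prim's algorithm, O(n^2))."""
    if len(points) < 2:
        return 0.0
    # best[k] = cheapest edge from point k to the tree grown so far
    best = {k: dist(points[0], points[k]) for k in range(1, len(points))}
    total = 0.0
    while best:
        u = min(best, key=best.get)
        total += best.pop(u)
        for v in best:
            best[v] = min(best[v], dist(points[u], points[v]))
    return total

def is_dense(points, center, R, eps, dist=math.dist):
    """Proxy test: the MST of the points inside B(center, R) weighs >= R/eps."""
    local = [p for p in points if dist(p, center) <= R]
    return mst_weight(local, dist) >= R / eps
```

A bottom-up search would apply `is_dense` to the balls of each level, remove the first dense ball found, and recurse on the remaining points.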

Light Tours

Definition: A tour is (m,r)-light on a hierarchy if it enters all cells (clusters)
At most r times, and
Only via m designated portals

Choose portals as $(2^i/M)$-net points
Then $m = M^{O(\mathrm{ddim})}$

[Figure; label: $2^{i-1}/M$]

Optimizing over Light Tours

Theorem [Arora’98, Talwar’04]: Given a hierarchical partition, a minimum-length (m,r)-light tour for it can be computed exactly
In time $m^{r \cdot O(\mathrm{ddim})} \cdot n \log n$
Via dynamic programming: join tours for small clusters into tour for larger cluster

Typically both m, r ≈ polylog(n/ε), thus $m^r \approx n^{\mathrm{polylog}\, n}$

Better Partitions and Lighter Tours

Our Theorem: For every (optimal) tour T, there is a partition with an (m,r)-light tour T’ such that
$M = \mathrm{ddim} \cdot \log n / \varepsilon$
$m = M^{O(\mathrm{ddim})} = (\log n/\varepsilon)^{\tilde{O}(\mathrm{ddim})}$
$r = \varepsilon^{-O(\mathrm{ddim})} \log\log n$
And length(T’) ≤ (1+ε)·length(T)

If the partition were known, then a tour like T’ could be found in time $m^{r \cdot O(\mathrm{ddim})} \cdot n \log n = n \cdot 2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})} \log\log^2 n}$
Now $m^r \approx \mathrm{poly}(n)$

It remains to prove the Theorem (a bit later), and show how to find the partition (after that)
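The running-time claim can be checked by taking logarithms. The following is a rough worked derivation, treating ddim as a constant, assuming ε ≤ 1/2, and absorbing polynomial factors in ddim and log(1/ε) into the exponent $\varepsilon^{-\tilde{O}(\mathrm{ddim})}$; it is a sanity check, not the proof from the paper.

```latex
\begin{align*}
\log m &= \tilde{O}(\mathrm{ddim}) \cdot \log\frac{\log n}{\varepsilon}
        = \tilde{O}(\mathrm{ddim}) \cdot \Bigl(\log\log n + \log\tfrac{1}{\varepsilon}\Bigr), \\
r \cdot O(\mathrm{ddim}) \cdot \log m
       &= \varepsilon^{-O(\mathrm{ddim})} \log\log n \cdot \tilde{O}(\mathrm{ddim}^2)
          \cdot \Bigl(\log\log n + \log\tfrac{1}{\varepsilon}\Bigr)
        \le \varepsilon^{-\tilde{O}(\mathrm{ddim})} (\log\log n)^2, \\
m^{\,r \cdot O(\mathrm{ddim})} &= 2^{\,r \cdot O(\mathrm{ddim}) \cdot \log m}
        \le 2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})} (\log\log n)^2}.
\end{align*}
```

Since $(\log\log n)^2 = o(\log n)$, the last bound is $n^{o(1)}$ for constant ε and ddim, so the dynamic program runs in polynomial time once the partition is known.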

Constructing Light Tours

Modify a tour T to be (m,r)-light [Arora’98, Talwar’04]

Part I: Focus on m (i.e. net points)
Move cut edges to be incident on net points

Expected cost at one level (for edge of unit length):
Radius $R_{i-1} \approx 2^{i-1}$
$\Pr[\text{cut edge}] \le O(\mathrm{ddim}/R_{i-1})$
Expected cost ≤ $(R_{i-1}/M)(\mathrm{ddim}/R_{i-1}) = \mathrm{ddim}/M = \varepsilon/\log n$

Expected cost to edge over all levels: ≤ $\log n \cdot \varepsilon/\log n = \varepsilon$

We thus constructed a (1+ε)-approximate tour

[Figure; label: $2^{i-1}/M$]

Constructing Light Tours (2)

Modify a tour to be (m,r)-light [Arora’98, Talwar’04]

Part II: Focus on r (i.e. number of crossing edges)
Reduce number of crossings

Patching step: Reroute (almost all) crossings back into cluster
Cost ≈ length of tour on the patched endpoints ≈ MST of these points

MST Theorem [Talwar’04]: For a set S of points, $\mathrm{MST}(S) \le \mathrm{diam}(S) \cdot |S|^{1-1/\mathrm{ddim}}$
Cost per point ≤ $\mathrm{diam}(S) / |S|^{1/\mathrm{ddim}}$

[Figure; label: diam(S)]

Constructing Light Tours (3)

Modify a tour to be (m,r)-light [Arora’98, Talwar’04]

Expected cost to edge at level i-1:
Radius $R_{i-1} \approx 2^{i-1}$
Pr[edge is patched] ≤ Pr[edge is cut]
Expected cost ≤ $(R_{i-1}/r^{1/\mathrm{ddim}})(\mathrm{ddim}/R_{i-1}) = \mathrm{ddim}/r^{1/\mathrm{ddim}}$

As before, want this to be ≤ ε/log n (because we sum over log n levels)
Could take $r = (\mathrm{ddim} \cdot \log n/\varepsilon)^{\mathrm{ddim}}$

But dynamic program runs in time $m^r$: QPTAS! [Talwar’04]

Challenge: smaller value for r

[Figure; label: $2R_{i-1}$]

Patching in Sparse Areas

Suppose a tour is q-sparse with respect to hierarchy
Every R-ball contains weight qR (for all $R = 2^i$)
Expectation: Random R-ball cuts weight Rq/R = q

Cluster formed by cuts from many levels
Expectation: weight q is cut per level

If $r = q \cdot 2\log\log n$
Expectation: level i-1 patching includes edges cut at much higher levels
Charge only “top” half of patched edges
Each charged about $2R_{i-1}$

Pr[edge is charged for patching] ≤ Pr[edge is cut at level i + loglog n] ≤ $\mathrm{ddim}/(R_{i-1} \log n)$

[Figure; label: $R_{i-1}/M$]

Wrapping Up (Patching Sparse Areas)

Modify a tour to be (m,r)-light [Arora’98, Talwar’04]

Expected cost at level i-1:
Expected cost ≤ $(R_{i-1}/r^{1/\mathrm{ddim}})(\mathrm{ddim}/(R_{i-1} \log n)) = \mathrm{ddim}/(\log n \cdot r^{1/\mathrm{ddim}})$

As before, want this term to be equal to ε/log n
Take $r = (\mathrm{ddim}/\varepsilon)^{\mathrm{ddim}}$

Obtain PTAS!

[Figure; label: $2R_{i-1}$]

Technical Subtleties

Outstanding problem: Previous analysis assumed ball cuts only q edges
True in expectation… Not good enough

Solution: try many hierarchies
Choose at random log n radii for each ball and try all their combinations!
WHP, some hierarchy cuts q edges in every ball
Drives up runtime of dynamic program

[Figure; label: $R_{i-1}/M$]

Algorithmic Frontiers of Doubling Metrics

Robert Krauthgamer Weizmann Institute of Science

Joint work with Lee-Ad Gottlieb and Aryeh Kontorovich

Machine Learning in Doubling Metrics

Large-margin classification in metric spaces [vonLuxburg-Bousquet’04]

Unknown distribution D of labeled points $(x,y) \in M \times \{-1,1\}$
M is a metric space (generalizes $\mathbb{R}^{\dim}$)
Labels are L-Lipschitz: $|y_i - y_j| \le L \cdot d(x_i,x_j)$ (generalizes margin)

Resource: Sample of labeled points

Goal: Build hypothesis $f: M \to \{-1,1\}$ that has (1-ε)-agreement with D
Statistical complexity: How many samples needed?
Computational complexity: Running time?

Extensions:
Small fraction of labels are wrong (adversarial noise)
Real-valued labels $y \in [-1,1]$ (metric regression)

[Figure: points labeled −1 and +1 separated by distance 2/L, with hypothesis f]

Generalization Bounds

Our approach: Assume M is doubling and use generalized VC-theory [Alon-BenDavid-CesaBianchi-Haussler’97, Bartlett-ShaweTaylor’99]
Example: Earthmover distance (EMD) in the plane between sets of size k has ddim ≤ O(k log k)
Standard algorithm: pick hypothesis that fits all/most observed samples

Theorem: Class of L-Lipschitz functions has fat-shattering dimension
$\mathrm{fsdim} \le (c \cdot L \cdot \mathrm{diam}(M))^{\mathrm{ddim}}$.

Corollary: If f is L-Lipschitz and classifies n samples correctly, WHP
$\Pr_D[\mathrm{sgn}(f(x)) \ne y] \le O(\mathrm{fsdim} \cdot (\log n)^2/n)$.

Similarly, if f correctly classifies all but η-fraction, then WHP
$\Pr_D[\mathrm{sgn}(f(x)) \ne y] \le \eta + O(\mathrm{fsdim} \cdot (\log n)^2/n)^{1/2}$.

Bounds incomparable to [vonLuxburg-Bousquet’04]


Algorithmic Aspects (noise-free)

Computing a hypothesis f from the samples $(x_i,y_i)$:
$f: x \mapsto \min_i \left( y_i + \frac{2\,d(x,x_i)}{d(S^+,S^-)} \right)$
Where $S^+$ and $S^-$ are the positively and negatively labeled samples

Lemma (Lipschitz extension): If labels are L-Lipschitz, so is f.

Evaluating f(x) requires solving Nearest Neighbor Search
Explains a common classification heuristic, e.g. [Cover-Hart’67]
But might require Ω(n) time…

We show how to use (1+ε)-Nearest Neighbor Search
This can be solved quickly in doubling metrics
We prove similar generalization bound by sandwiching sgn(f(x))

[Figure: points labeled −1 and +1, hypothesis f, and a query point “?”]
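The hypothesis f above is easy to evaluate by brute force. The sketch below is a direct transcription of the formula under simplifying assumptions: it scans all n samples per query (the Ω(n) cost the slide warns about) rather than using the (1+ε)-approximate nearest neighbor search of the talk, the metric defaults to Euclidean, and the name `lipschitz_classifier` is made up for this example.

```python
import math

def lipschitz_classifier(samples, dist=math.dist):
    """Build f(x) = min_i ( y_i + 2*d(x, x_i) / d(S+, S-) ).

    samples: list of (x, y) with y in {-1, +1}; both labels must occur."""
    pos = [x for x, y in samples if y > 0]
    neg = [x for x, y in samples if y < 0]
    gap = min(dist(p, q) for p in pos for q in neg)   # d(S+, S-)

    def f(x):
        # Brute-force minimum over all samples: O(n) per query.
        return min(y + 2.0 * dist(x, xi) / gap for xi, y in samples)

    return f
```

A query point x is then classified as sgn(f(x)); on the samples themselves f reproduces the labels exactly.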

Extensions (noisy case)

1. A small fraction of labels are wrong (adversarial noise)
How to compute a hypothesis?
Build a bipartite graph (on $S^+ \cup S^-$) of all violations to Lipschitz condition (edge between two points at distance < 2/L).
Compute a minimum vertex cover (or faster: 2-approximation)

2. Real-valued labels $y \in [-1,1]$ (metric regression)
Minimize risk (expected loss) $E_{x,y}\,|f(x)-y|$
Extend the statistical framework by similar ideas
But how to compute a hypothesis?
Write LP: minimize $\sum_i |f(x_i)-y_i|$ subject to $|f(x_i)-f(x_j)| \le L \cdot d(x_i,x_j)$ $\forall i,j$
Reduce #constraints from $O(n^2)$ to $O(\varepsilon^{-\mathrm{ddim}} n)$ using (1+ε)-spanner on $x_i$’s
Apply fast approximate LP solver
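For extension 1, the “faster: 2-approximation” route can be sketched with the textbook maximal-matching argument. This is an illustrative sketch, not the authors' implementation: the name `prune_noisy_labels` is hypothetical, the metric defaults to Euclidean, and the violation graph is built by brute force over all pairs.

```python
import math

def prune_noisy_labels(samples, L, dist=math.dist):
    """Remove a 2-approximate vertex cover of the Lipschitz-violation graph.

    samples: list of (x, y) with y in {-1, +1}. Two oppositely labeled
    points at distance < 2/L form a violation (an edge). Taking both
    endpoints of a greedily built maximal matching yields a vertex cover
    of size at most twice the minimum."""
    n = len(samples)
    covered = set()
    for a in range(n):
        if a in covered:
            continue
        for b in range(a + 1, n):
            if b in covered:
                continue
            (xa, ya), (xb, yb) = samples[a], samples[b]
            if ya != yb and dist(xa, xb) < 2.0 / L:
                covered.update((a, b))   # match edge (a, b), cover both ends
                break
    kept = [s for k, s in enumerate(samples) if k not in covered]
    return kept, covered
```

The surviving samples satisfy the Lipschitz condition, so the noise-free construction of f can be applied to them.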


Conclusion

General paradigm: low-dim. Euclidean spaces ↔ doubling metric spaces
Mathematically: latter is different (strictly bigger) family
Not even low-distortion embeddings [Laakso’00,’01]
For algorithmic efficiency: strong analogy/similarity
E.g., nearest neighbor search, distributed computing and networking, combinatorial optimization, machine learning

Research directions:
Other computational tasks or application areas?
Particularly in machine learning, data structures
Scenarios where analogy fails?
E.g. [Indyk-Naor’05] which uses random projections
Other metric models?
E.g. hyperbolic …

