A Dependent LP-Rounding Approach for the k-Median Problem
Moses Charikar¹, Shi Li¹
¹Department of Computer Science
Princeton University
ICALP 2012, Warwick, UK
Outline
• Introduction
• Linear Programming Relaxation
• Simple Pseudo-Approx. for k-median
• Our Algorithm for k-median
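k-Median as a Clustering Problem
• Given: a metric $(X, d)$ and an integer $k$
• Partition $X$ into $k$ clusters $X_1, \dots, X_k$
• Select a center $c_t \in X$ for each cluster $X_t$
• Minimize the sum of distances to the centers: $\sum_{t=1}^{k} \sum_{v \in X_t} d(v, c_t)$
• Quantifies how well a set can be divided into $k$ partitions
[Figure: example clustering with k = 4]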
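To make the objective concrete, here is a brute-force sketch in Python (the slides contain no code, so Python is an assumption for all examples). The helper names are hypothetical. It tries every choice of $k$ centers from $X$; assigning each point to its nearest center then gives the best partition for those centers. It is exponential in $k$ and only meant for tiny inputs.

```python
from itertools import combinations


def clustering_cost(points, centers, dist):
    """k-median objective: every point pays the distance to its nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def brute_force_kmedian(points, k, dist):
    """Try every k-subset of the points as centers (exponential: tiny inputs only)."""
    best = None
    for centers in combinations(points, k):
        cost = clustering_cost(points, centers, dist)
        if best is None or cost < best[0]:
            best = (cost, centers)
    return best


# 1-D example with two obvious groups.
print(brute_force_kmedian([0, 1, 2, 10, 11, 12], 2, lambda a, b: abs(a - b)))  # (4, (1, 11))
```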
k-Median in Operations Research
• Given: a metric $(F \cup C, d)$ and an integer $k$
  o $F$: set of facilities
  o $C$: set of clients
• Open $k$ facilities
• Connect each client to its nearest open facility
• Minimize the total connection cost
[Figure: example instance with k = 4 open facilities]
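The facility/client formulation differs only in that centers come from $F$ and costs are paid by clients in $C$. A minimal exhaustive sketch under that reading follows; the names `connection_cost` and `kmedian_exact` and the toy positions are hypothetical.

```python
from itertools import combinations


def connection_cost(clients, open_facs, d):
    """Total cost when every client connects to its nearest open facility."""
    return sum(min(d(i, j) for i in open_facs) for j in clients)


def kmedian_exact(facilities, clients, k, d):
    """Exhaustive search over all k-subsets of F (exponential; tiny instances only)."""
    return min((connection_cost(clients, S, d), S) for S in combinations(facilities, k))


# Facilities a, b, c and clients x, y, z placed on a line (toy data).
pos = {"a": 0, "b": 5, "c": 10, "x": 0, "y": 1, "z": 10}


def d(i, j):
    return abs(pos[i] - pos[j])


print(kmedian_exact(["a", "b", "c"], ["x", "y", "z"], 2, d))  # (1, ('a', 'c'))
```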
Related Problem: Facility Location Problem
• Given: a metric $(F \cup C, d)$ and facility costs $\{f_i \ge 0 : i \in F\}$
  o $F$: set of facilities
  o $C$: set of clients
  o $f_i$: cost of opening facility $i$
• Open a set $F' \subseteq F$ of facilities (no bound on $|F'|$)
• Connect each client to its nearest open facility
• Minimize the sum of facility cost and connection cost: $\sum_{i \in F'} f_i + \sum_{j \in C} \min_{i \in F'} d(i, j)$
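For comparison, a brute-force sketch of the facility location objective (hypothetical helper names and toy data). There is no bound on the number of open facilities, so the search runs over all nonempty subsets of $F$.

```python
from itertools import combinations


def fl_cost(open_facs, clients, f, d):
    """Facility cost of the open set plus nearest-open-facility connection cost."""
    opening = sum(f[i] for i in open_facs)
    connecting = sum(min(d(i, j) for i in open_facs) for j in clients)
    return opening + connecting


def fl_exact(facilities, clients, f, d):
    """Enumerate every nonempty subset F' of F (unlike k-median, any size is allowed)."""
    candidates = (
        S for r in range(1, len(facilities) + 1) for S in combinations(facilities, r)
    )
    return min((fl_cost(S, clients, f, d), S) for S in candidates)


pos = {"a": 0, "b": 5, "c": 10, "x": 0, "y": 1, "z": 10}


def d(i, j):
    return abs(pos[i] - pos[j])


f = {"a": 1, "b": 1, "c": 20}  # toy opening costs; c is expensive
print(fl_exact(["a", "b", "c"], ["x", "y", "z"], f, d))  # (8, ('a', 'b'))
```

In the toy data the expensive facility c is never worth opening, so client z pays the longer trip to b.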
Known Results
• *Local search: if swapping at most $p$ facilities cannot improve a solution, then the solution is a $(3 + 2/p)$-approximation
• The integrality gap of the natural LP relaxation is between 2 and 3
  o the proof of the upper bound 3 is non-constructive

Problem             Approx.            Hardness of approx.
facility location   1.488 [Li11]       1.463 [GK98, Sri02]
k-median            3+ε* [AGK+01]      1+2/e [JMS02]
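The local search rule behind the 3+ε entry can be sketched for the simplest case $p = 1$ (single swaps), where the bound $3 + 2/p$ gives 5; taking $p$ around $2/\varepsilon$ gives 3+ε. This hypothetical sketch accepts any strict improvement, which keeps it short but does not by itself guarantee polynomial running time (polynomial-time variants typically only accept sufficiently large improvements).

```python
def local_search_kmedian(facilities, clients, k, d, start=None):
    """Single-swap (p = 1) local search for k-median.

    Repeatedly replaces one open facility by a closed one while that strictly
    lowers the connection cost; stops at a local optimum.
    """

    def cost(S):
        return sum(min(d(i, j) for i in S) for j in clients)

    S = set(start) if start is not None else set(list(facilities)[:k])
    current = cost(S)
    improved = True
    while improved:
        improved = False
        for out in sorted(S):
            for inn in facilities:
                if inn in S:
                    continue
                T = (S - {out}) | {inn}
                c = cost(T)
                if c < current:
                    S, current, improved = T, c, True
                    break
            if improved:
                break
    return S, current


pos = {"a": 0, "b": 5, "c": 10, "x": 0, "y": 1, "z": 10}


def d(i, j):
    return abs(pos[i] - pos[j])


S, c = local_search_kmedian(["a", "b", "c"], ["x", "y", "z"], 2, d)
print(sorted(S), c)  # ['a', 'c'] 1
```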
Our Results
• An LP-rounding approach for k-median
  o proves a 3.25 approximation ratio
  o thus gives a constructive proof that the integrality gap is at most 3.25
  o faster running time than the local search algorithm
  o potential to improve the 3+ε approximation
• The upper bound 3.25 is not tight
  o our algorithm may already give an approximation ratio smaller than 3
Our Results
Problem               Prev. best approx. ratio   Our approx. ratio
k-facility location   [Zha06]                    3.25
matroid median        16 [KKN+11]                9
knapsack median       ≥ 1000 [Kum12]             34
• k-facility location: the facility location problem with the additional constraint that at most $k$ facilities can be open
• matroid median: the set of open facilities must be an independent set of a given matroid
• knapsack median: each facility has a cost, and the total cost of open facilities cannot exceed a budget $B$
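A hypothetical sketch of the feasibility constraint in each variant. A general matroid is normally given by an independence oracle; a partition matroid (at most a given number of open facilities per group) is used here purely as a concrete example. k-median itself is the special case of a uniform matroid of rank $k$, or of unit facility costs with $B = k$.

```python
def feasible_k_facility_location(open_facs, k):
    """k-facility location: at most k open facilities."""
    return len(open_facs) <= k


def feasible_knapsack(open_facs, weight, budget):
    """Knapsack median: total cost of the open facilities is within the budget B."""
    return sum(weight[i] for i in open_facs) <= budget


def feasible_partition_matroid(open_facs, part_of, capacity):
    """Example matroid (partition matroid): at most capacity[p] open facilities in part p."""
    counts = {}
    for i in open_facs:
        counts[part_of[i]] = counts.get(part_of[i], 0) + 1
    return all(n <= capacity[p] for p, n in counts.items())
```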
Outline
• Introduction
• Linear Programming Relaxation
• Simple Pseudo-Approx. for k-median
• Our Algorithm for k-median
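Natural LP Relaxation
• $y_i \in \{0,1\}$, $i \in F$: whether facility $i$ is open
• $x_{i,j} \in \{0,1\}$, $i \in F$, $j \in C$: whether client $j$ is connected to facility $i$
Minimize $\sum_{i \in F, j \in C} d(i, j)\, x_{i,j}$ subject to:
$\sum_{i \in F} x_{i,j} = 1 \quad \forall j \in C$ (every client $j$ must be connected to 1 facility)
$x_{i,j} \le y_i \quad \forall i \in F, j \in C$ (client $j$ can only be connected to an open facility)
$\sum_{i \in F} y_i \le k$ (we can open at most $k$ facilities)
Relaxation: replace $x_{i,j}, y_i \in \{0,1\}$ by $x_{i,j}, y_i \ge 0$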
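A small checker for the LP above, sketched with hypothetical names. It verifies the three constraints plus nonnegativity for a given fractional solution and returns its objective value; it does not solve the LP.

```python
def check_lp_solution(x, y, facilities, clients, k, d, tol=1e-9):
    """Return (feasible, objective) for a fractional solution of the natural LP.

    x maps (i, j) -> x_{i,j} (missing pairs count as 0); y maps i -> y_i.
    """

    def xv(i, j):
        return x.get((i, j), 0.0)

    # sum_i x_{i,j} = 1 for every client j
    served = all(abs(sum(xv(i, j) for i in facilities) - 1.0) <= tol for j in clients)
    # x_{i,j} <= y_i: clients are only served by (fractionally) open facilities
    only_open = all(xv(i, j) <= y[i] + tol for i in facilities for j in clients)
    # sum_i y_i <= k
    budget = sum(y[i] for i in facilities) <= k + tol
    nonneg = all(v >= -tol for v in x.values()) and all(v >= -tol for v in y.values())
    objective = sum(d(i, j) * v for (i, j), v in x.items())
    return served and only_open and budget and nonneg, objective


pos = {"a": 0, "b": 4, "x": 1}


def d(i, j):
    return abs(pos[i] - pos[j])


y = {"a": 0.5, "b": 0.5}
x = {("a", "x"): 0.5, ("b", "x"): 0.5}
print(check_lp_solution(x, y, ["a", "b"], ["x"], 1, d))  # (True, 2.0)
```

The example solution is feasible with value 2, while opening facility a alone would cost 1, so it is feasible but not optimal.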
Canonical Instance
• $km$ facilities
• every client $j$ is connected to its nearest $m$ facilities
• in the LP solution, $y_i = 1/m$ and $x_{i,j} \in \{0, 1/m\}$
[Figure: facilities and clients; client j connected to its m nearest facilities]
Canonical Instance
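• $F_j$: the set of $m$ facilities that $j$ is connected to
• average distance from $j$ to $F_j$: $d_{av}(j) = \frac{1}{m} \sum_{i \in F_j} d(i, j)$
• maximum distance from $j$ to $F_j$: $d_{max}(j) = \max_{i \in F_j} d(i, j)$
• LP value $= \sum_{i \in F, j \in C} d(i, j)\, x_{i,j} = \sum_{j \in C} d_{av}(j)$
[Figure: facilities and clients; client j and its set F_j]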
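Under the canonical-instance assumption (each client is fractionally connected to its $m$ nearest facilities), $F_j$, $d_{av}(j)$, $d_{max}(j)$ and the LP value can be computed directly. The sketch below is illustrative; `neighborhood_stats` and the toy data are hypothetical.

```python
def neighborhood_stats(facilities, clients, m, d):
    """For each client j: (F_j, d_av(j), d_max(j)), plus the LP value.

    Assumes the canonical instance: x_{i,j} = 1/m exactly for the m nearest
    facilities i of j, so the LP value is the sum of d_av(j) over clients.
    """
    stats = {}
    for j in clients:
        F_j = sorted(facilities, key=lambda i: d(i, j))[:m]
        dists = [d(i, j) for i in F_j]
        stats[j] = (set(F_j), sum(dists) / m, max(dists))
    lp_value = sum(d_av for _, d_av, _ in stats.values())
    return stats, lp_value


# k = 2, m = 2, so km = 4 facilities at positions 0..3; clients at 0 and 3.
pos = {"f0": 0, "f1": 1, "f2": 2, "f3": 3, "u": 0, "w": 3}
stats, lp = neighborhood_stats(
    ["f0", "f1", "f2", "f3"], ["u", "w"], 2, lambda i, j: abs(pos[i] - pos[j])
)
print(lp)  # 1.0
```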
Outline
• Introduction
• Linear Programming Relaxation
• Simple Pseudo-Approx. for k-median
• Our Algorithm for k-median
Pseudo-Approximation
• An $(\alpha, c)$-pseudo-approximation is a solution that opens at most $\alpha k$ facilities and whose connection cost is at most $c$ times the optimal cost
• A warm-up: a $(1 + \varepsilon, O(1/\varepsilon))$-pseudo-approximation for k-median
Pseudo-Approximation
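• Let $m' = m/(1+\varepsilon)$ and $y'_i = (1+\varepsilon) y_i = 1/m'$
• Every client only needs to connect to $m'$ facilities
• We fractionally open $km \cdot (1/m') = (1+\varepsilon)k$ facilities
• Define $F'_j$ (the $m'$ facilities of $F_j$ nearest to $j$), $d'_{av}(j)$ and $d'_{max}(j)$ similarly
[Figure: facilities and clients; client j]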
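A worked instance of the ε-shrinking step, as a hypothetical sketch that assumes $m/(1+\varepsilon)$ is an integer. With $m = 10$ and $\varepsilon = 0.25$, each client keeps its $m' = 8$ nearest facilities, and with $k = 3$ the total fractional opening is $km/m' = 3.75 = (1+\varepsilon)k$.

```python
def truncated_neighborhood(dists_to_F_j, eps):
    """Keep the m' = m / (1 + eps) nearest of the m distances from j to F_j.

    Returns (m', d'_av(j), d'_max(j)). Assumes m / (1 + eps) is an integer.
    """
    m = len(dists_to_F_j)
    m_prime = m / (1 + eps)
    if not m_prime.is_integer():
        raise ValueError("this sketch assumes m / (1 + eps) is an integer")
    near = sorted(dists_to_F_j)[: int(m_prime)]
    return int(m_prime), sum(near) / len(near), max(near)


# d_av(j) = (8 * 1 + 2 * 6) / 10 = 2.0, while d'_av(j) = 1.0 after truncation.
print(truncated_neighborhood([1] * 8 + [6, 6], 0.25))  # (8, 1.0, 1)
```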
Pseudo-Approximation
• Two clients $j$ and $j'$ conflict if $F'_j \cap F'_{j'} \ne \emptyset$
• Select a set $C'$ of clients such that no two clients in $C'$ conflict with each other
[Figure: conflicting clients j and j']
Pseudo-Approximation
• Greedily construct a conflict-free set $C' \subseteq C$:
  o while $C \ne \emptyset$:
    • select $j \in C$ with the minimum $d_{av}(j)$
    • add $j$ to $C'$
    • remove $j$ and all clients that conflict with $j$ from $C$
[Figure: facilities and clients]
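The greedy construction of $C'$ can be sketched as follows. The conflict test is the set intersection $F'_j \cap F'_{j'} \ne \emptyset$; breaking ties in $d_{av}$ by client name is an added assumption for determinism, and the names are hypothetical.

```python
def greedy_conflict_free(clients, F_prime, d_av):
    """Repeatedly take the remaining client with the smallest d_av, then drop
    every remaining client whose F' set intersects the chosen client's F'."""
    remaining = set(clients)
    C_prime = []
    while remaining:
        j = min(remaining, key=lambda c: (d_av[c], c))  # ties broken by name
        C_prime.append(j)
        remaining = {c for c in remaining if c != j and not (F_prime[c] & F_prime[j])}
    return C_prime


F_prime = {"a": {1, 2}, "b": {2, 3}, "c": {4, 5}}
d_av = {"a": 1.0, "b": 0.5, "c": 2.0}
print(greedy_conflict_free(["a", "b", "c"], F_prime, d_av))  # ['b', 'c']
```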
Pseudo-Approximation
• Open facilities:
  o for every $j \in C'$, open 1 of the $m'$ facilities in $F'_j$, chosen uniformly at random
  o for every facility $i \notin \bigcup_{j \in C'} F'_j$, open $i$ with probability $1/m'$
• Connect each client to its nearest open facility
[Figure: facilities and clients]
Fact: every facility is open with probability $1/m'$, so $km/m' = (1+\varepsilon)k$ facilities are open in expectation.
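A sketch of the random opening step under the slide's assumptions: each $F'_j$ for $j \in C'$ has exactly $m'$ facilities, and these sets are disjoint because clients in $C'$ do not conflict. The function name and the use of Python's `random.Random` are illustrative choices.

```python
import random


def open_pseudo(C_prime, F_prime, facilities, m_prime, rng):
    """Random opening step of the warm-up pseudo-approximation.

    Assumes each F_prime[j] (j in C') has exactly m_prime facilities and that
    these sets are pairwise disjoint.
    """
    opened, covered = set(), set()
    for j in C_prime:
        opened.add(rng.choice(sorted(F_prime[j])))  # exactly one per j in C'
        covered |= F_prime[j]
    for i in facilities:
        if i not in covered and rng.random() < 1.0 / m_prime:
            opened.add(i)  # independent coin for facilities outside every F'_j
    return opened
```

Running it many times with a fixed seed and counting how often each facility opens is an easy way to see the $1/m'$ marginal from the Fact.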
Pseudo-Approximation
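Lemma: $E[C_j] \le O(1/\varepsilon)\, d_{av}(j)$, where $C_j$ is the connection cost of $j$
Proof
• If $j \in C'$, a uniformly random facility of $F'_j$ is open, so $E[C_j] \le d'_{av}(j) \le d_{av}(j)$; hence it is enough to assume $j \notin C'$
• Then $\exists j' \in C'$ s.t. $F'_j \cap F'_{j'} \ne \emptyset$ and $d_{av}(j') \le d_{av}(j)$
• Each of the $m - m' = \varepsilon m/(1+\varepsilon)$ facilities in $F_j \setminus F'_j$ is at distance at least $d'_{max}(j)$ from $j$, so $d'_{max}(j) \le (1 + 1/\varepsilon)\, d_{av}(j)$; the same holds for $j'$
• $E[C_j] \le E[C_{j'}] + d(j, j')$
  $\le d'_{av}(j') + d'_{max}(j) + d'_{max}(j')$
  $\le d_{av}(j') + (1 + 1/\varepsilon)\, d_{av}(j) + (1 + 1/\varepsilon)\, d_{av}(j')$
  $\le (3 + 2/\varepsilon)\, d_{av}(j)$
[Figure: client j ∉ C' and client j' ∈ C' with intersecting F'_j and F'_{j'}]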
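The key inequality in the proof, $d'_{max}(j) \le (1 + 1/\varepsilon)\, d_{av}(j)$, comes from counting the $m - m'$ facilities of $F_j$ outside $F'_j$. The sketch below only sanity-checks it on random, skewed distance lists (it is not a proof) and assumes $m/(1+\varepsilon)$ is an integer; the helper name is hypothetical.

```python
import random


def markov_bound_holds(m, eps, trials, rng):
    """Check d'_max(j) <= m / (m - m') * d_av(j) = (1 + 1/eps) * d_av(j) on random data."""
    m_prime = m / (1 + eps)
    if not m_prime.is_integer():
        raise ValueError("this sketch assumes m / (1 + eps) is an integer")
    m_prime = int(m_prime)
    for _ in range(trials):
        dists = sorted(rng.expovariate(1.0) ** 2 for _ in range(m))  # skewed distances
        d_av = sum(dists) / m
        d_max_prime = dists[m_prime - 1]  # farthest of the m' nearest facilities
        if d_max_prime > (1 + 1 / eps) * d_av + 1e-12:
            return False
    return True


print(markov_bound_holds(10, 0.25, 2000, random.Random(0)))  # True
```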
Outline
• Introduction
• Linear Programming Relaxation
• Simple Pseudo-Approx. for k-median
• Our Algorithm for k-median
Barrier to Obtain True Approximation
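• If $\varepsilon = 0$, then $F'_j = F_j$
• $d_{max}(j)$ can be much larger than $d_{av}(j)$
• With non-zero probability, $j$ will be connected to a facility in $F_{j'}$
• The expected connection cost of $j$ cannot be bounded in terms of $d_{av}(j)$
[Figure: clients j and j' with overlapping F_j and F_{j'}]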
Remove the Barrier
• Solution: $j$ only “claims” the facilities in $F_j$ that are close to $j$
• Let $U_j$ be the set of facilities claimed by $j$
• Use $U_j$ in place of $F_j$ in the algorithm
• New barrier: $|U_j| < m$ might happen
  o then we cannot guarantee that a facility in $U_j$ is always open
[Figure: client j with F_j and its claimed subset U_j]
Remove the New Barrier
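• We can guarantee $|U_j| \ge m/2$
• $|U_j \cup U_{j'}| \ge m$ if $U_j$ and $U_{j'}$ are disjoint
• Pair up the clients in $C'$
• For a matched pair $(j, j')$, always open 1 facility (possibly 2 facilities) in $U_j \cup U_{j'}$
[Figure: matched clients j and j' with claimed sets U_j and U_{j'}]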
Remove the New Barrier
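• How to open facilities for a matched pair?
• $m$ boxes in a line
• Permute the facilities in $U_j$ and put them in the leftmost $|U_j|$ boxes
• Permute the facilities in $U_{j'}$ and put them in the rightmost $|U_{j'}|$ boxes
• Open the facilities in a uniformly random box
  o since $|U_j|, |U_{j'}| \le m$ and $|U_j| + |U_{j'}| \ge m$, every box holds 1 or 2 facilities, and each facility is open with probability $1/m$
[Figure: m boxes; U_j fills them from the left, U_{j'} from the right]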
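The box procedure for a matched pair can be sketched as below. It assumes $|U_j|, |U_{j'}| \le m$ and $|U_j| + |U_{j'}| \ge m$. The slide only says "permute"; using a uniformly random shuffle is an assumption of this sketch (the $1/m$ opening probability per facility holds for any fixed order).

```python
import random


def open_for_matched_pair(U_j, U_jp, m, rng):
    """Box procedure for a matched pair (j, j'): returns the set of opened facilities.

    U_j fills boxes from the left and U_j' fills the last |U_j'| boxes, so every
    box holds 1 or 2 facilities; one uniformly random box is opened.
    """
    if not (len(U_j) <= m and len(U_jp) <= m and len(U_j) + len(U_jp) >= m):
        raise ValueError("need |U_j|, |U_j'| <= m and |U_j| + |U_j'| >= m")
    left, right = list(U_j), list(U_jp)
    rng.shuffle(left)   # random order inside the left block (assumption)
    rng.shuffle(right)  # random order inside the right block (assumption)
    boxes = [[] for _ in range(m)]
    for pos, i in enumerate(left):
        boxes[pos].append(i)
    offset = m - len(right)
    for pos, i in enumerate(right):
        boxes[offset + pos].append(i)
    return set(boxes[rng.randrange(m)])
```

Each returned set contains at most one facility from each side, so a matched pair opens 1 or 2 facilities, never 0.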
The Algorithm
• Filtering
  o two clients $j$ and $j'$ conflict if $d(j, j') \le 4 \max\{d_{av}(j), d_{av}(j')\}$
  o while $C \ne \emptyset$:
    • select $j \in C$ that minimizes $d_{av}(j)$
    • add $j$ to $C'$
    • remove $j$ and all clients that conflict with $j$ from $C$
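A sketch of the filtering phase: compared with the warm-up, only the conflict rule changes, from a set intersection to a distance test. Tie-breaking by client name is an added assumption, and the names are hypothetical.

```python
def filter_clients(clients, d, d_av):
    """Filtering: j and j' conflict if d(j, j') <= 4 * max(d_av(j), d_av(j'))."""
    remaining = set(clients)
    C_prime = []
    while remaining:
        j = min(remaining, key=lambda c: (d_av[c], c))  # smallest d_av first
        C_prime.append(j)
        remaining = {
            c for c in remaining
            if c != j and d(c, j) > 4 * max(d_av[c], d_av[j])
        }
    return C_prime


pos = {"a": 0, "b": 1, "c": 20}
d_av = {"a": 1.0, "b": 2.0, "c": 1.0}
print(filter_clients(["a", "b", "c"], lambda p, q: abs(pos[p] - pos[q]), d_av))  # ['a', 'c']
```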
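The Algorithm
• Filtering
• Claiming
  o for any $j \in C'$, let $2R_j$ be the distance between $j$ and its nearest neighbor in $C'$
  o a facility $i$ is claimed by $j$ if
    • $i \in F_j$ and
    • $d(i, j) \le R_j$
    i.e., $U_j = F_j \cap \mathrm{Ball}(j, R_j)$
Fact: any client $j \in C'$ claims at least $m/2$ and at most $m$ facilities. (Filtering gives $R_j > 2 d_{av}(j)$, and by Markov's inequality fewer than $m/2$ facilities of $F_j$ are farther than $2 d_{av}(j)$ from $j$.)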
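A sketch of the claiming phase (hypothetical names): $R_j$ is half the distance to $j$'s nearest neighbor in $C'$, and $U_j$ keeps the facilities of $F_j$ within $R_j$. If $C'$ has a single client, this sketch sets $R_j = \infty$, an assumption the slides do not discuss.

```python
import math


def claim(C_prime, F, d):
    """U_j = F_j ∩ Ball(j, R_j), where 2R_j is the distance from j to its nearest
    neighbor in C' (R_j = inf if j is alone in C', an assumption here)."""
    U = {}
    for j in C_prime:
        others = [d(j, jj) for jj in C_prime if jj != j]
        R_j = min(others) / 2 if others else math.inf
        U[j] = {i for i in F[j] if d(i, j) <= R_j}
    return U


pos = {"a": 0, "c": 20, "f0": 0, "f1": 3, "f2": 12, "f3": 20, "f4": 18, "f5": 8}
F = {"a": {"f0", "f1", "f2"}, "c": {"f3", "f4", "f5"}}
U = claim(["a", "c"], F, lambda p, q: abs(pos[p] - pos[q]))
print({j: sorted(u) for j, u in U.items()})  # {'a': ['f0', 'f1'], 'c': ['f3', 'f4']}
```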
The Algorithm
• Filtering
• Claiming
• Matching
  o while there are at least 2 unmatched clients in $C'$:
    • select the 2 unmatched clients $j$ and $j'$ that minimize $d(j, j')$
    • match $j$ and $j'$
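The greedy matching can be sketched by repeatedly taking the closest unmatched pair. It scans all remaining pairs in every round, so it illustrates the rule rather than an efficient implementation; the names are hypothetical.

```python
from itertools import combinations


def greedy_match(C_prime, d):
    """Repeatedly match the two closest unmatched clients of C'.

    Returns (pairs, leftover), where leftover is the unmatched client or None.
    """
    unmatched = set(C_prime)
    pairs = []
    while len(unmatched) >= 2:
        j, jp = min(combinations(sorted(unmatched), 2), key=lambda p: (d(*p), p))
        pairs.append((j, jp))
        unmatched -= {j, jp}
    return pairs, next(iter(unmatched), None)


pos = {"a": 0, "b": 1, "c": 10, "e": 12, "g": 30}
print(greedy_match(["a", "b", "c", "e", "g"], lambda p, q: abs(pos[p] - pos[q])))
# ([('a', 'b'), ('c', 'e')], 'g')
```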
The Algorithm
• Filtering
• Claiming
• Matching
• Rounding
  o for each matched pair $(j, j')$, open 1 or 2 facilities in $U_j \cup U_{j'}$
  o if there is an unmatched client $j$, open 0 or 1 facility in $U_j$
  o for each facility $i$ that is not inside any $U_j$, open $i$ with probability $1/m$
  o connect each client to its nearest open facility
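Matched pairs use the box procedure; the remaining rounding rules can be sketched as follows. One way (an assumption, since the slides do not spell it out) to open 0 or 1 facility of $U_j$ with each facility open with probability $1/m$ is to draw a single index from 0 to $m - 1$ and open the facility at that index if it exists. The function names are hypothetical.

```python
import random


def open_unmatched_and_free(unmatched, U, unclaimed, m, rng):
    """Rounding for the unmatched client (if any) and for unclaimed facilities.

    Matched pairs are handled separately by the box procedure, not here.
    """
    opened = set()
    if unmatched is not None:
        members = sorted(U[unmatched])  # |U_j| <= m
        r = rng.randrange(m)
        if r < len(members):
            opened.add(members[r])  # each facility w.p. 1/m, never two at once
    for i in unclaimed:
        if rng.random() < 1.0 / m:
            opened.add(i)  # independent coin for every unclaimed facility
    return opened


def connect(clients, opened, d):
    """Assign every client to its nearest open facility (ties broken by id)."""
    return {j: min(opened, key=lambda i: (d(i, j), i)) for j in clients}


print(connect([0, 4], {1, 5}, lambda i, j: abs(i - j)))  # {0: 1, 4: 5}
```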
Proof of Constant Approx. Ratio
Lemma: $E[C_j] \le O(1)\, d_{av}(j)$, where $C_j$ is the connection cost of $j$
Proof
• It is enough to assume $j \in C'$
  o if $j \notin C'$, there exists a client $j' \in C'$ such that $d(j', j) \le 4 d_{av}(j)$ and $d_{av}(j') \le d_{av}(j)$
  o assume $E[C_{j'}] \le \alpha\, d_{av}(j')$
  o then $E[C_j] \le d(j, j') + E[C_{j'}] \le 4 d_{av}(j) + \alpha\, d_{av}(j') \le (4 + \alpha)\, d_{av}(j)$
• W.L.O.G., assume $d_{av}(j) = 1$
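Proof of Constant Approx. Ratio
[Figure: client j, its nearest neighbor j_1 in C' at distance 2R_j, and j_2 matched with j_1, where 2R_{j_1} ≤ 2R_j; balls of radius R_j, R_{j_1}, R_{j_2} containing U_j, U_{j_1}, U_{j_2}]
• There is always 1 facility open in $U_{j_1} \cup U_{j_2}$
• Any facility in $U_{j_1} \cup U_{j_2}$ is at most $2R_j + 2R_{j_1} + R_{j_2} \le 5R_j$ away from $j$
• $|U_j| \ge m(1 - 1/R_j)$, by Markov's inequality with $d_{av}(j) = 1$
• with probability $|U_j|/m \ge 1 - 1/R_j$, a uniformly random facility of $U_j$ is open; this contributes at most $\frac{1}{m} \sum_{i \in U_j} d(i, j) \le d_{av}(j) = 1$ to $E[C_j]$
• only with probability at most $1/R_j$ does $j$ connect to a facility up to $5R_j$ away
• $E[C_j] \le 1 + (1/R_j) \cdot 5R_j = 6$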
Proof of 3.25 Approx. Ratio
• Complicated; details omitted
• Rough idea: for a client $j \notin C'$
  o $j_1 \in C'$ is the client that conflicts with and removes $j$ in the filtering phase
  o $j_2 \in C'$ is the nearest neighbor of $j_1$ in $C'$
  o $j_3 \in C'$ is the client matched with $j_2$
  o consider the nearest open facility of $j$ in $F_j \cup F_{j_1} \cup U_{j_2} \cup U_{j_3}$
• Our algorithm opens $k$ facilities in expectation
• It can easily be transformed so that it always opens $k$ facilities
• The algorithm extends naturally to the k-facility location problem
Ongoing Work
• Joint work with Svensson: improves on the best known approximation ratio (3+ε) for k-median
Summary
• We introduced an LP-rounding algorithm for the k-median problem
  o proved a 3.25 approximation ratio for the problem
  o it has the potential to improve the decade-old 3+ε approximation
• Improved approximation algorithms for the following problems
  o k-facility location: 3.25
  o matroid median: 9
  o knapsack median: 34
Thanks