On the Birkhoff von Neumann decomposition - … · On the Birkho {von Neumann decomposition Bora...

26
On the Birkhoff–von Neumann decomposition BoraU¸car CNRS & ENS Lyon, France SIAM PP16, Paris From joint work with: Michele Benzi Emory Univ., US Fanny Dufoss´ e Inria, France Kamer Kaya Sabancı Univ., Turkey Alex Pothen Purdue Univ., US 1/26 Doubly stochastic matrices

Transcript of On the Birkhoff von Neumann decomposition - … · On the Birkho {von Neumann decomposition Bora...

On the Birkhoff–von Neumanndecomposition

Bora Ucar

CNRS & ENS Lyon, France

SIAM PP16, Paris

From joint work with:

Michele BenziEmory Univ.,

US

Fanny DufosseInria,

France

Kamer KayaSabancı Univ.,

Turkey

Alex PothenPurdue Univ.,

US

1/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Context

An n × n matrix A is doubly stochastic if aij ≥ 0 for all i , j andAe = AT e = e, where e is the vector of all ones.

A doubly stochastic matrix has total support.

Birkhoff’s Theorem: A is a doubly stochastic matrix

there exist α1, α2, . . . , αk ∈ (0, 1) with∑k

i=1 αi = 1 and permutationmatrices P1,P2, . . . ,Pk such that:

A = α1P1 + α2P2 + · · ·+ αkPk .

Also called Birkhoff-von Neumann (BvN) decomposition.

Not unique, neither k , nor Pi s in general.

2/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Combinatorial problem

Input: A doubly stochastic matrix A.

Output: A Birkhoff-von Neumann decomposition of A asA = α1P1 + α2P2 + · · ·+ αkPk .

Measure: The number k of permutation matrices in thedecomposition.

Exploit the BvN decomposition for preconditioning.

We show that the problem is NP-hard.

We also propose a heuristic and investigate some of its properties.

3/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Motivation

Consider solving αPx = b for x where P is a permutation matrix.

0BB@

0 0 0 10 0 1 01 0 0 00 1 0 0

1CCA

0BB@

x1

x2

x3

x4

1CCA =

0BB@

b1

b2

b3

b4

1CCA yields

x4 = b1/↵x3 = b2/↵x1 = b3/↵x2 = b4/↵

We just scale the input and write at unique (permuted) positions in theoutput. Should be very efficient (no dependency).

Next consider solving (α1P1 + α2P2)x = b for x .

4/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Motivation

Consider solving (α1P1 + α2P2)x = b for x .

Matrix splitting and stationary iterations

for an invertible A = M−N with invertible M

x i+1 = Hx i + c , where H = M−1N and c = M−1b

where i = 0, 1, . . . and x0 is arbitrary.

Computation: At every step, multiply with N and solve with M (orapply M−1).

Converges to the solution of Ax = b for any x0 iff ρ(H) < 1.

5/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Motivation

Theorem

Let A = α1P1 + α2P2 and α1 ≥ α2. Then, A is invertible if

(i) α1 6= α2,

(ii) α1 = α2 and all connected components of the bipartite graph of Ahave an odd number of rows (and columns). If any such block is ofeven order, A is singular.

Define the splitting A = α1P1 − (−α2P2).

The iterations are convergent with the rate α2/α1 for α1 > α2.

Next generalize to more than two permutation matrices.

6/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Motivation: Let’s generalize

Let A = α1P1 + α2P2 + · · ·+ αkPk be a BvN decomposition of A.

Assume α1 ≥ · · · ≥ αk . Pick an integer r between 1 and k − 1 and splitA as A = M−N where

M = α1P1 + · · ·+ αrPr , N = −αr+1Pr+1 − · · · − αkPk .

(M and −N are doubly substochastic matrices.)

Computation: At every step

multiply with N (k − r parallel steps).

apply M−1 (or solves with the doubly stochastic matrix 1∑ri=1 αi

M);

a recursive solver.

7/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Motivation: Let’s generalize

Splitting A = M−N where

M = α1P1 + · · ·+ αrPr , N = −αr+1Pr+1 − · · · − αkPk .

Theorem

A sufficient condition for M =∑r

i=1 αiPi to be invertible: α1 is greaterthan the sum of the remaining ones.

Theorem

Suppose that α1 is greater than the sum of all the other αi . Thenρ(M−1N) < 1 and the stationary iterative method converges for all x0 tothe unique solution of Ax = b.

This is a sufficient condition; . . .and it is rather restrictive in practice./

8/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Motivation: Let’s generalize

Open question: Identify other, less restrictive conditions on the αi (with1 ≤ i ≤ r) that will ensure convergence.

The natural idea (put the mass in M) did not always work: we haveexamples with k = 3, r = 2 and α1 + α2 > α3 where the splitting did notconverge.

Or, renounce convergence and use M as a preconditioner for a Krylovsubspace method like GMRES.

9/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Motivation: Let’s generalize

M as a preconditioner for a Krylov subspace method like GMRES.

. . . need generalize to matrices with negative and positive entries.

Fact: Any nonnegative matrix A with total support can be scaled withtwo (unique) positive diagonal matrices R and C such that RAC isdoubly stochastic [Sinkhorn & Knopp,’67 and Knight, Ruiz and U.,’14].

Let A be n × n with total support and positive and negative entries.

B = abs(A) is nonnegative and RBC is doubly stochastic.

We can write RBC =∑αiPi .

10/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Motivation: Let’s generalize

B = abs(A) and RBC =∑k

i=1 αiPi .

RAC =k∑

i=1

αiQi .

where Qi = [q(i)jk ]n×n is obtained from Pi = [p

(i)jk ]n×n as follows:

q(i)jk = sgn(ajk)p

(i)jk .

Generalizing Birkhoff–von Neumann decomposition

Any (real) matrix A with total support can be written as a convexcombination of a set of signed, scaled permutation matrices.

Define M (for splitting or for defining the preconditioner) as before.

11/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Linear algebraic problem and combinatorial problem

Reduce the complexity of the solverby reducing the cost of applying M.

Find a BvN decomposition with thesmallest number of perm. matrices.

Input: A doubly stochastic matrix A.

Output: A Birkhoff-von Neumann decomposition of A asA = α1P1 + α2P2 + · · ·+ αkPk .

Measure: The number k of permutation matrices in thedecomposition.

This is NP-hard.

12/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Heuristics: Birkhoff-like heuristic

1: A(0) = A2: for j = 1, . . . do3: find a permutation matrix Pj ⊆ A(j−1)

4: the minimum element of A(j−1) at the nonzero positions of Pj is αj

5: A(j) ← A(j−1) − αjPj

Birkhoff’s heuristic: Remove the smallest element

Let µ be the smallest nonzero of A(j−1).

A step 4, find a perfect matching M in the graph of A(j−1) containing µ.

13/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Heuristics: generalized Birkhoff heuristic

1: A(0) = A2: for j = 1, . . . do3: find a permutation matrix Pj ⊆ A(j−1)

4: the minimum element of A(j−1) at the nonzero positions of Pj is αj

5: A(j) ← A(j−1) − αjPj

Proposed greedy heuristic: Get the maximum αj at every step

At step 4, among all perfect matchings in A(j−1) find one whoseminimum element is the maximum.

Bottleneck matching: efficient implementations exist [Duff & Koster,’01].

14/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Experiments (heuristics)

τ : the number of nonzeros in a matrix.dmax: the maximum number of nonzeros in a row or a column.

Birkhoff greedy

matrix n τ dmax

∑ki=1 αi k

∑ki=1 αi k

aft01 8205 125567 21 0.16 2000 1.00 120bcspwr10 5300 21842 14 0.38 2000 1.00 63EX6 6545 295680 48 0.03 2000 1.00 226flowmeter0 9669 67391 11 0.51 2000 1.00 58fxm3 6 5026 94026 129 0.13 2000 1.00 383g3rmt3m3 5357 207695 48 0.05 2000 1.00 223mplate 5962 142190 36 0.03 2000 1.00 153n3c6-b7 6435 51480 8 1.00 8 1.00 8olm5000 5000 19996 6 0.75 283 1.00 14s2rmq4m1 5489 263351 54 0.00 2000 1.00 208

[At most 2000 permutation matrices, or accumulated a sum of at least 0.9999.]

15/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Experiments (linear systems)

Number of GMRES iterations

M with different rmatrix ilu(0) 1 2 16 32

bp 200 2 38 31 22 23gemat11 168 1916 1606 620 297gemat12 254 2574 1275 570 386lns3937 348 1702 801 48 36

mahindas 37 225 158 43 32orani678 23 172 140 71 58

Number of nonzeros in Mmatrix ilu(0) 1 2 16 32

bp 200 125 0.32 0.52 0.75 0.75gemat11 31425 0.15 0.21 0.69 0.88gemat12 31184 0.15 0.20 0.64 0.79lns3937 24002 0.15 0.25 0.59 0.65

mahindas 4744 0.12 0.16 0.29 0.36orani678 47823 0.04 0.04 0.05 0.06

num. perm (k)

matrix k

bp 200 5gemat11 48gemat12 34lns3937 57

mahindas 154orani678 1053

16/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Concluding remarks

Open problem 1: Can we find less restrictive conditions for having aconvergent solution?

Open problem 2: A better heuristic than the proposed greedy one?Approximation guarantee?

Open problem 3: Special, interesting cases that we can solve?

Open problem 4: Efficient algorithms for BvN decomposition?

17/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Thanks!

Thanks for your attention.

http://perso.ens-lyon.fr/bora.ucar/

More information

R. A. Brualdi, Notes on the Birkhoff algorithm for doubly stochasticmatrices, Canadian Mathematical Bulletin, 25 (1982), pp. 191–199.

F. Dufosse, B. Ucar, Notes on Birkhoff–von Neumanndecomposition of doubly stochastic matrices, Linear Algebra and itsApplications 497 (2016) 108–115.

18/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

References I

R. A. Brualdi, The diagonal hypergraph of a matrix (bipartite graph),Discrete Mathematics 27 (2) (1979) 127–147.

R. A. Brualdi, Notes on the Birkhoff algorithm for doubly stochasticmatrices, Canadian Mathematical Bulletin, 25 (1982), pp. 191–199.

R. A. Brualdi and P. M. Gibson, Convex polyhedra of doubly stochasticmatrices: I. Applications of the permanent function, Journal ofCombinatorial Theory, Series A, 22 (1977), pp. 194–230.

C. S. Chang , W. J. Chen, and H. Y. Huang, On service guarantees forinput buffered crossbar switches: A capacity decomposition approach byBirkhoff and von Neumann, IEEE IWQos’99, 79–86, London, UK.

I. S. Duff and J. Koster, On algorithms for permuting large entries to thediagonal of a sparse matrix, SIAM Journal on Matrix Analysis andApplications 22 (2001) 973–996.

19/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

References II

F. Dufosse and B. Ucar, Notes on Birkhoff–von Neumann decompositionof doubly stochastic matrices, Linear Algebra and its Applications 497(2016) 108–115.

P. A. Knight, The Sinkhorn–Knopp algorithm: Convergence andapplications, SIAM J. Matrix Anal. A. 30 (1) (2008) 261–275.

P. A. Knight and D. Ruiz, A fast algorithm for matrix balancing, IMAJournal of Numerical Analysis 33 (3) (2013) 1029–1047.

P. A. Knight, D. Ruiz, and B. Ucar, A symmetry preserving algorithm formatrix scaling, SIAM J. Matrix Anal. A. 35 (3) (2014) 931–955.

M. Marcus and R. Ree, Diagonals of doubly stochastic matrices, TheQuarterly Journal of Mathematics 10 (1) (1959) 296–302.

A. Pothen and C.-J. Fan, Computing the block triangular form of a sparsematrix, ACM T. Math. Software 16 (4) (1990) 303–324.

20/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

References III

D. Ruiz, A scaling algorithm to equilibrate both row and column norms inmatrices, Tech. Rep. TR-2001-034, RAL (2001).

R. Sinkhorn and P. Knopp, Concerning nonnegative matrices and doublystochastic matrices, Pacific J. Math. 21 (1967) 343–348.

D. de Werra, Variations on the Theorem of Birkhoff–von Neumann andextensions, Graphs and Combinatorics, 19 (2003), 263–278.

D. de Werra, Partitioning the edge set of a bipartite graph into chainpackings: Complexity of some variations, Linear Algebra and itsApplications, 368 (2003) 315–327.

21/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Known results for the num. of permutationmatrices: Upper bound

Marcus–Ree Theorem [’59]: k ≤ n2 − 2n + 2 for dense matrices;

[Brualdi& Gibson,’77] and [Brualdi,’82]: for a sparse matrix with τ nonzerosk ≤ τ − 2n + `+ 1 (containing ` submatrices with total total support;take ` = 2)

22/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Known results for the num. of permutationmatrices: lower bound

A set U of positions of the nonzeros of A is called strongly stable[Brualdi,’79]: if for each permutation matrix P ⊆ A, pkl = 1 for at mostone pair (k, l) ∈ U.

Lemma (Brualdi,’82)

Let A be a doubly stochastic matrix. Then, in a BvN decomposition ofA, there are at least γ(A) permutation matrices, where γ(A) is themaximum cardinality of a strongly stable set of positions of A.

γ(A) ≥ the maximum number of nonzeros in a row or a column of A

[Brualdi,’82] shows that for any integer t with 1 ≤ t ≤ dn/2ed(n + 1)/2e,there exists an n × n doubly stochastic matrix A such that γ(A) = t.

23/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Known results for the num. of permutationmatrices: lower bound

γ(A) ≥ the maximum number of nonzerosin a row or a column of A

[Brualdi,’82] shows that for any integer twith 1 ≤ t ≤ dn/2ed(n + 1)/2e, there existsan n × n doubly stochastic matrix A suchthat γ(A) = t.

24/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Combinatorial problem: Sketch of the proof ofNP-completeness of the decision version

Reduction from 3-Partition: given an array A = a1, . . . , a3m of 3m

positive integers, a positive integer B such that∑3m

i=1 ai = mB andB/4 < ai < B/2, does there exist a partition of A into m disjoint arraysS1, . . . ,Sm such that each Si has three elements whose sum is B.

T =

[1mEm O

O 1mB C

]where C =

a1 a2 . . . a3m

a3m a1 . . . a3m−1

......

. . ....

a2 a3 . . . a1

.

Em is an m ×m matrix of 1s; T is doubly stochastic.

T has a decomposition with 3m permutation matrices iff we can solvethe 3-Partition instance a1, . . . , a3m.

25/26 Doubly stochastic matrices

Introduction Problem complexity and heuristics Experimental results Concluding remarks

Heuristics (properties)

Let S be n × n doubly stochastic, m = n!.

The map f : (c1, . . . , cm)→ c1P1 + · · ·+ cmPm from the(m − 1)-dimensional simplex (c1, . . . , cm) : ci ≥ 0,

∑i ci = 1 to Ωn is a

continuous surjection, and for each S ∈ Ωn, f −1(S) is a compact convexset.

Lemma

There is an extreme point of S with the minimum number ofpermutation matrices.

Lemma

A heuristic of the generalized Birkhoff family finds an extreme point ofthe convex polytope f −1(S).

26/26 Doubly stochastic matrices