A totally unimodular view of structured sparsity


Transcript of A totally unimodular view of structured sparsity

Page 1

A totally unimodular view of structured sparsity

Volkan Cevher (volkan.cevher@epfl.ch)

Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL)

Switzerland

DISCML (NIPS), December 13, 2014

Joint work with Marwa El Halabi, Luca Baldassarre and Baran Gözcü @ LIONS

Anastasios Kyrillidis and Bubacarr Bah @ UT Austin, Nirav Bhan @ MIT

Page 2

Outline

Total unimodularity in discrete optimization

From sparsity to structured sparsity

Convex relaxations for structured sparse recovery

Enter nonconvexity

Conclusions


Pages 3-6: Integer linear programming

Discrete optimization: Search for an optimum object within a finite collection of objects.

Integer linear program: Many important discrete optimization problems can be formulated as an integer linear program

β\ ∈ arg max_{β ∈ Z^m} {θ^T β : Mβ ≤ c, β ≥ 0} (ILP)

NP-Hard (in general)
- vertex cover, set packing, maximum flow, traveling salesman, boolean satisfiability.

A general approach: Attempt the following convex relaxation

β? ∈ arg max_{β ∈ R^m} {θ^T β : Mβ ≤ c, β ≥ 0} (LP)

Obtains an upper bound: θ^T β? ≥ θ^T β\.

Polyhedra & polytopes: P = {β | Mβ ≤ c, β ≥ 0} (β ∈ R^m, c ∈ R^l)
Polytope: A bounded polyhedron.

[Figure: the feasible polyhedron P in the (β1, β2) plane with constraint Mβ ≤ c; the LP optimum β? coincides with the integer optimum β\ whenever every vertex of P is integer.]

Observation: When every vertex of P is integer, LP is a "correct" relaxation.
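To make the observation concrete, here is a minimal sketch (not from the slides; the instance and the use of SciPy's linprog are my own) that solves the LP relaxation of a tiny ILP whose constraint matrix is TU and whose c is integer; the LP optimum lands on an integer vertex, so the relaxation is exact.

    import numpy as np
    from scipy.optimize import linprog

    theta = np.array([3.0, 2.0])              # maximize theta^T beta
    M = np.array([[1, 0],                     # every square submatrix has det 0 or +-1,
                  [0, 1],                     # so M is totally unimodular
                  [1, 1]])
    c = np.array([2, 3, 4])                   # integer right-hand side

    # linprog minimizes, so negate theta; beta >= 0 is enforced via the bounds.
    res = linprog(-theta, A_ub=M, b_ub=c, bounds=[(0, None)] * 2)
    print(res.x)                              # [2. 2.]: an integer vertex, so LP = ILP here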

Pages 7-9: A sufficient condition

The polyhedron P = {Mβ ≤ c, β ≥ 0} has integer vertices when M is TU and c is integer.

Definition (Total unimodularity): A matrix M ∈ R^{l×m} is totally unimodular (TU) iff the determinant of every square submatrix of M is 0 or ±1.

Correctness of LP [23]: When M is TU and c is integer, then the LP

max_{β ∈ R^m} {θ^T β : Mβ ≤ c, β ≥ 0}

has integer optimal solutions (i.e., ILP ⊆ LP).

Verifying if a matrix is TU is in P [31].

TU matrices are not rare!
- Regular matroids have TU representations [29]
- Network flow problems & interval constraints involve TU matrices [23]
- Incidence matrices of undirected bipartite graphs are TU [23]

Computational complexity of LP
- Polynomial time in l (i.e., number of constraints) and m (i.e., ambient dimension)
- IPM performs O(√l log(l/ε)) iterations (l > m) with up to O(m²l) operations, where ε is the absolute solution accuracy
- What if l is exponentially large?
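The definition suggests a brute-force test: enumerate all square submatrices and check their determinants. This is exponential (the polynomial-time test of [31] is far more sophisticated), but a naive sketch is handy for toy examples; the two incidence matrices below, one bipartite and one an odd cycle, are my own illustrations.

    import itertools
    import numpy as np

    def is_totally_unimodular(M):
        """Naive check: every square submatrix has determinant 0, +1, or -1."""
        l, m = M.shape
        for k in range(1, min(l, m) + 1):
            for rows in itertools.combinations(range(l), k):
                for cols in itertools.combinations(range(m), k):
                    if round(np.linalg.det(M[np.ix_(rows, cols)])) not in (-1, 0, 1):
                        return False
        return True

    # Vertex-edge incidence of a bipartite graph ({u1, u2} vs {v1, v2}): TU.
    bipartite = np.array([[1, 1, 0],
                          [0, 0, 1],
                          [1, 0, 0],
                          [0, 1, 1]])
    # Vertex-edge incidence of a triangle (odd cycle, not bipartite): not TU.
    triangle = np.array([[1, 1, 0],
                         [1, 0, 1],
                         [0, 1, 1]])
    print(is_totally_unimodular(bipartite), is_totally_unimodular(triangle))  # True False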

Pages 10-11: A weaker sufficient condition

Submodularity & submodular polyhedron [15]: F : 2^V → R is submodular iff it has the following diminishing returns property:

F(S ∪ {e}) − F(S) ≥ F(T ∪ {e}) − F(T), ∀S ⊆ T ⊆ V, ∀e ∈ V \ T.

The submodular polyhedron is defined as

P(F) := {β ∈ R^m | ∀S ⊆ V, β^T 1_S ≤ F(S)},

where 1_S is the support indicator vector, i.e., (1_S)_i = 1 if i ∈ S, 0 otherwise.
- We cannot verify submodularity in polynomial time [28].
- The submodular polyhedron is TDI: LP is a "correct" relaxation of ILP.

Total dual integrality (TDI) [17]: A system Mβ ≤ c is called TDI when the primal objective is finite and the dual problem

min_{α ∈ R^l} {α^T c : α ≥ 0, α^T M = θ^T}

has integer optimum solutions for all rational M and c, and for each integer θ.
- A polynomial time (in l and m) algorithm can verify if Mβ ≤ c is TDI [12].

Structure matters! LP is efficiently solvable on the submodular polyhedron P(F).
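Checking the diminishing-returns inequality exhaustively takes exponentially many queries, which is exactly why the slide notes submodularity cannot be verified in polynomial time; for a toy ground set, though, a direct sketch works. The coverage function below is a standard submodular example of my own choosing, not from the slides.

    from itertools import chain, combinations

    V = (0, 1, 2)
    groups = [{0, 1}, {1, 2}]

    def F(S):
        """Coverage function: number of groups that S touches (submodular)."""
        return sum(1 for G in groups if G & set(S))

    def is_submodular(F, V):
        subsets = list(chain.from_iterable(combinations(V, k) for k in range(len(V) + 1)))
        for S in subsets:
            for T in subsets:
                if set(S) <= set(T):
                    for e in set(V) - set(T):
                        if F(set(S) | {e}) - F(S) < F(set(T) | {e}) - F(T):
                            return False
        return True

    print(is_submodular(F, V))  # True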

Pages 12-14: In the rest of the talk...

We can use these concepts in obtaining
- tight convex relaxations
- efficient nonconvex projections

for supervised learning and inverse problems.

[Figure: the affine solution set b = Ax in (x1, x2, x3) coordinates.]

Running example: b = A x\ + w, with A of size n × p.

Applications: Machine learning, signal processing, theoretical computer science...

A difficult estimation challenge when n < p:
Nullspace (null) of A: x\ + δ → b, ∀δ ∈ null(A)
- Needle in a haystack: We need additional information on x\!

Page 15

Outline

Total unimodularity in discrete optimization

From sparsity to structured sparsity

Convex relaxations for structured sparse recovery

Enter nonconvexity

Conclusions


Page 16: Three key insights

1. Sparse or compressible x\: not sufficient alone
2. Recovery: tractable & stable
3. Projection A: information preserving

[Figure: b = A x\ + w, with A of size n × p.]

Typical goals:
1. Find x? to minimize ‖x? − x\‖
2. Find x? to minimize L(x?(a), x\(a) + w)

Page 17: Swiss army knife of signal models

Definition (s-sparse vector): A vector x ∈ R^p is s-sparse, i.e., x ∈ Σs, if it has at most s non-zero entries:

‖x\‖0 := |{i : x\_i ≠ 0}| = s.

Sparse representations: y\ has sparse transform coefficients x\, i.e., y\ = Ψ x\.
- Basis representations Ψ ∈ R^{p×p}: wavelets, DCT, ...
- Frame representations Ψ ∈ R^{m×p}, m > p: Gabor, curvelets, shearlets, ...
- Other dictionary representations...

Pages 18-21: Sparse representations strike back!

[Figure: b = A y\, then b = A x\ once the dictionary is absorbed into A; restricted to the support, the dimensions are n × 1, n × s, and s × 1.]

- b ∈ R^n, A ∈ R^{n×p}, and n < p
- Ψ ∈ R^{p×p}, x\ ∈ Σs, and s < n < p

Impact: Support-restricted columns of A lead to an overdetermined system.

Pages 22-25: Enter sparsity

A combinatorial approach for estimating x\ from b = Ax\ + w: We may consider the estimator with the least number of non-zero entries. That is,

x? ∈ arg min_{x ∈ R^p} {‖x‖0 : ‖b − Ax‖2 ≤ κ} (P0)

with some κ ≥ 0. If κ = ‖w‖2, then x\ is a feasible solution.

P0 has the following characteristics:
- sample complexity: O(s)
- computational effort: NP-Hard
- stability: No

Tightest convex relaxation: ‖x‖∗∗0 is the biconjugate (Fenchel conjugate of the Fenchel conjugate).

Fenchel conjugate: f*(y) := sup_{x ∈ dom(f)} x^T y − f(x).

A technicality: Restrict x\ ∈ [−1, 1]^p.

[Figure: ‖x‖0 over the unit ℓ∞-ball; ‖x‖1 is the convex envelope of ‖x‖0.]
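Since ‖x‖1 is the envelope on the box, the relaxed estimator is a linear program. A hedged sketch (the random instance and the SciPy usage are mine; the slides do not prescribe a solver): split x = x+ − x− and minimize 1^T(x+ + x−) subject to the data constraint.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, p = 15, 30
    A = rng.standard_normal((n, p)) / np.sqrt(n)
    x_true = np.zeros(p)
    x_true[[3, 17]] = [1.0, -1.0]                 # 2-sparse ground truth in [-1, 1]^p
    b = A @ x_true                                # noiseless measurements (w = 0)

    # min 1^T (xp + xm)  s.t.  A (xp - xm) = b,  0 <= xp, xm <= 1
    res = linprog(np.ones(2 * p), A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, 1)] * (2 * p))
    x_hat = res.x[:p] - res.x[p:]
    print(np.linalg.norm(x_hat - x_true))         # typically ~0: exact recovery at this size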

Page 26: The role of convexity: Tractable & stable recovery

A convex candidate solution for b = Ax\ + w:

x? ∈ arg min_{x ∈ R^p} {‖x‖1 : ‖b − Ax‖2 ≤ ‖w‖2, ‖x‖∞ ≤ 1}. (SOCP)

Theorem (A model recovery guarantee [27]): Let A ∈ R^{n×p} be a matrix of i.i.d. Gaussian random variables with zero mean and variances 1/n. For any t > 0, with probability at least 1 − 6 exp(−t²/26), we have

‖x? − x\‖2 ≤ [ 2 √(2s log(p/s) + (5/4)s) / ( √n − √(2s log(p/s) + (5/4)s) − t ) ] ‖w‖2 =: ε, when ‖x\‖0 ≤ s.

Observations:
- perfect recovery (i.e., ε = 0) with n ≥ 2s log(p/s) + (5/4)s whp when w = 0.
- ε-accurate solution in k = O(√(2p + 1) log(1/ε)) iterations via IPM¹, with each iteration requiring the solution of a structured n × 2p linear system.²
- robust to noise.

¹ For a rigorous primal-dual algorithm for this class of problems, see my NIPS 2014 paper [30].
² When w = 0, the IPM complexity (# of iterations × cost per iteration) amounts to O(n²p^{1.5} log(1/ε)).

Pages 27-28: The role of the matrix A: Preserving information

Proposition (Condition for exact recovery in the noiseless case): We have successful recovery with x? ∈ arg min_{x ∈ R^p} {f(x) : b = Ax, ‖x‖∞ ≤ 1}, i.e., δ := x? − x\ = 0, if and only if null(A) ∩ Df(x\) = {0}.

Assume that the constraint ‖x‖∞ ≤ 1 is inactive.

Descent cone: Df(x\) := cone({x : f(x\ + x) ≤ f(x\)}).

[Figure: the sublevel set {x : f(x) ≤ f(x\)}, the descent cone Df(x\) at x\, and the affine set {x : b = Ax}; recovery succeeds exactly when null(A) meets Df(x\) only at the origin.]

Page 29: The role of the matrix A: Preserving information

P{x? = x\} = P{null(A) ∩ Df(x\) = {0}}

Definition (Statistical dimension [2]³): Let C ⊆ R^p be a closed convex cone. The statistical dimension of C is defined as

d(C) := E[‖proj_C(g)‖2²],

where g is a standard Gaussian vector.

Theorem (Approximate kinematic formula [2]): Let A ∈ R^{n×p}, n < p, be a matrix of i.i.d. standard Gaussian random variables, and let C ⊆ R^p be a closed convex cone. Let η ∈ (0, 1); then we have

n ≥ d(C) + c_η √p ⇒ P{null(A) ∩ C = {0}} ≥ 1 − η;
n ≤ d(C) − c_η √p ⇒ P{null(A) ∩ C = {0}} ≤ η,

where c_η := √(8 log(4/η)).

We can compute d(C) ≲ 2s log(p/s) + (5/4)s for C = D_{‖·‖1}(x\) when x\ ∈ Σs.

³ The statistical dimension is closely related to the Gaussian complexity [7], Gaussian width [10], mean width [32], and Gaussian squared complexity [9].
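A quick numeric reading of these bounds, as a sketch (the numbers are illustrative, not from the slides): for an s-sparse x\ the phase transition sits near d(C) ≈ 2s log(p/s) + (5/4)s, and the kinematic formula adds a c_η √p buffer for a success probability of 1 − η.

    import numpy as np

    p, s, eta = 1000, 20, 0.05
    d_bound = 2 * s * np.log(p / s) + 1.25 * s     # statistical dimension estimate
    c_eta = np.sqrt(8 * np.log(4 / eta))           # kinematic-formula constant
    print(round(d_bound))                          # ~181 samples at the phase transition
    print(round(d_bound + c_eta * np.sqrt(p)))     # ~369 samples for success w.p. >= 0.95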

Pages 30-31: Beyond sparsity towards model-based or structured sparsity

- The following signals can look the same from a sparsity perspective!

[Figure: a sparse image, the wavelet coefficients of a natural image, a spike train, and a background-subtracted image, all sharing the same sorted-coefficient decay |x\|_(i) over the sorted index (s out of p).]

- In reality, these signals have additional structures beyond the simple sparsity.

Page 32: Beyond sparsity towards model-based or structured sparsity

Sparsity model: Union of all s-dimensional canonical subspaces.

Structured sparsity model: A particular union of m_s s-dimensional canonical subspaces.

[Figure: R^p contains (p choose s) canonical s-dimensional subspaces; a structured model keeps only m_s of them.]

Model-based or structured sparsity: Structured sparsity models are discrete structures describing the interdependency between the non-zero coefficients of a vector.

Pages 33-35: Three upshots of structured sparsity

Key properties of the statistical dimension [2]
- The statistical dimension is invariant under unitary transformations (rotations).
- Let C1 and C2 be closed convex cones. If C1 ⊆ C2, then d(C1) ≤ d(C2).

1. The smaller the statistical dimension is, the less we need to sample.
2. The smaller the statistical dimension is, the better we can denoise.
3. The smaller the statistical dimension is, the better we can enforce structure.

- Reduced sample complexity: phase transition at the statistical dimension.

Example (If Df1(x\) ⊆ Df2(x\) ⊆ R^n, then d(Df1(x\)) ≤ d(Df2(x\)).)
f1(x) := max{|x1|, |x2|} + |x3|, f2(x) := ‖x‖1, x\ = [1, −1, 0]^T.
Observations (ni: the sample requirement under fi):
1. n1 < n2 for x\
2. n1 > n2 for z\ = [0, 0, 1]^T

- Better noise robustness: denoising capabilities depend on the statistical dimension:

max_{σ>0} E[‖prox_f(x\ + σw, σλ) − x\‖²] / σ² ≤ d(λ Df(x\))

Minimize a bound to the minimax risk via the regularization parameter λ [27].

- Better interpretability: geometry can enhance interpretability. [Figure: the ℓ1-ball vs. the TV-ball.] Influence the recovered support via customized convex geometry.

Page 36

Outline

Total unimodularity in discrete optimization

From sparsity to structured sparsity

Convex relaxations for structured sparse recovery

Enter nonconvexity

Conclusions


Page 37: A simple template for linear inverse problems

Find the "sparsest" x subject to structure and data.

- Sparsity: We can generalize this desideratum to other notions of simplicity.
- Structure: We only allow certain sparsity patterns.
- Data fidelity: We have many choices of convex constraints & losses to represent data; e.g., ‖b − Ax‖2 ≤ κ.

Pages 38-39: A convex proto-problem for structured sparsity

A combinatorial approach for estimating x\ from b = Ax\ + w: We may consider the sparsest estimator or its surrogate with a valid sparsity pattern:

x? ∈ arg min_{x ∈ R^p} {‖x‖s : ‖b − Ax‖2 ≤ κ, ‖x‖∞ ≤ 1} (Ps)

with some κ ≥ 0. If κ = ‖w‖2, then the structured sparse x\ is a feasible solution.

Sparsity and structure together [14]: Given some weights d ∈ R^d, e ∈ R^p and an integer input c ∈ Z^l, we define

‖x‖s := min_ω {d^T ω + e^T s : M [ω; s] ≤ c, 1supp(x) = s, ω ∈ {0, 1}^d}

for all feasible x, ∞ otherwise. The parameter ω is useful for latent modeling.

A convex candidate solution for b = Ax\ + w: We use the convex estimator based on the tightest convex relaxation of ‖x‖s:

x? ∈ arg min_{x ∈ dom(‖·‖s)} {‖x‖∗∗s : ‖b − Ax‖2 ≤ κ}

with some κ ≥ 0, dom(‖·‖s) := {x : ‖x‖s < ∞}.

Page 40: Tractability & tightness of biconjugation

Proposition (Hardness of conjugation): Let F : 2^P → R ∪ {+∞} be a set function defined on the support indicator s = 1supp(x). The conjugate of F over the unit infinity ball ‖x‖∞ ≤ 1 is given by

g*(y) = sup_{s ∈ {0,1}^p} |y|^T s − F(s).

Observations:
- F(s) is a general set function. Computation: NP-Hard.
- F(s) = ‖x‖s. Computation: ILP in general. However, if M is TU or (M, c) is TDI, then we get tight convex relaxations with an LP ("usually" tractable). Otherwise, relax to LP anyway!
- F(s) is submodular. Computation: Polynomial-time.

Pages 41-43: Tree sparsity [21, 13, 6, 33]

[Figure: wavelet coefficients, the wavelet tree, and valid vs. invalid selections of nodes; e.g., GH = {{1, 2, 3}, {2}, {3}} gives valid selections of nodes.]

Structure: We seek the sparsest signal with a rooted connected subtree support.

Linear description: A valid support satisfies s_parent ≥ s_child over the tree T:

T 1supp(x) := T s ≥ 0,

where T is the directed edge-node incidence matrix, which is TU.

Biconjugate: ‖x‖∗∗s = min_{s ∈ [0,1]^p} {1^T s : T s ≥ 0, |x| ≤ s} = Σ_{G ∈ GH} ‖xG‖∞ for x ∈ [−1, 1]^p, ∞ otherwise.

The groups G ∈ GH are defined as each node and all its descendants.
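A small sketch of the right-hand side of this biconjugate (the three-node tree and the test vector are my own toy example): with GH built from each node and all its descendants, the norm is a sum of group sup-norms; a root with two leaves reproduces GH = {{1, 2, 3}, {2}, {3}} from the figure (0-indexed below).

    import numpy as np

    children = {0: [1, 2], 1: [], 2: []}      # root 0 with leaves 1, 2

    def descendants(v):
        """Group for node v: the node together with all of its descendants."""
        out = [v]
        for c in children[v]:
            out += descendants(c)
        return out

    def tree_biconjugate(x):
        assert np.max(np.abs(x)) <= 1         # the norm is +inf outside [-1, 1]^p
        return sum(np.max(np.abs(x[descendants(v)])) for v in children)

    x = np.array([0.5, -1.0, 0.0])
    print(tree_biconjugate(x))                # 1.0 + 1.0 + 0.0 = 2.0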

Page 44: Tree sparsity example: 1:100-compressive sensing [30, 1]

[Figure: the 1-gigapixel "World" image (Lac Léman; 10-megapixel detail). Sparse recovery: PSNR = 31.83 dB; tree-sparse recovery: PSNR = 32.48 dB.]

Page 45: Tree sparsity example: TV & TU-relax 1:15-compression [30, 1]

[Figure: original image with reconstructions by BP, TU-relax, TV, TV with BP, and TV with TU-relax. Regularization plot: ℓ2 error vs. the weight α for tv-BP and tv-TU relax, with the minimum errors marked.]

Pages 46-48: Group knapsack sparsity [35, 18, 16]

[Figure: p = 8 coefficients with groups G1 = {1}, G2 = {1, 2, 3, 4, 5}, G3 = {5, 6, 7, 8}, G4 = {2, 5, 7}, G5 = {6, 8}; the support indicator vector 1supp(x) = [0, 1, 0, 0, 1, 0, 1, 0]^T and the knapsack constraints vector cu.]

Structure: We seek the sparsest signal with group allocation constraints.

Linear description: A valid support obeys budget constraints over G:

B^T s ≤ cu,

where B is the biadjacency matrix of G, i.e., Bij = 1 iff the i-th coefficient is in Gj. When B is an interval matrix or G has a loopless group intersection graph, it is TU.

Remark: We can also budget a lower bound: c_ℓ ≤ B^T s ≤ cu.

For interval groups of length ∆ (each row carries ∆ consecutive ones),

B^T = [ 1 1 ... 1 1 0 0 ... 0
        0 1 1 ... 1 1 0 ... 0
        ...
        0 ... 0 0 1 1 ... 1 1 ] ∈ {0, 1}^{(p−∆+1)×p}.

Biconjugate: ‖x‖∗∗s = ‖x‖1 if x ∈ [−1, 1]^p and B^T |x| ≤ cu; ∞ otherwise.

For the neuronal spike example, we have cu = 1.

[Figure: the norm balls ‖x‖∗∗s ≤ 1, ‖x‖∗∗s ≤ 1.5, and ‖x‖∗∗s ≤ 2 for G = {{1, 2}, {2, 3}}.]
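A sketch of this relaxation for interval groups (the window length and test vectors are my own): the TU-relax norm equals ‖x‖1 on the box whenever every length-∆ window carries at most cu total mass, and +∞ otherwise.

    import numpy as np

    p, width = 6, 3
    Bt = np.array([[1 if g <= i < g + width else 0 for i in range(p)]
                   for g in range(p - width + 1)])   # interval matrix, hence TU
    c_u = np.ones(p - width + 1)                     # at most one unit spike per window

    def knapsack_relax(x):
        if np.max(np.abs(x)) > 1 or np.any(Bt @ np.abs(x) > c_u + 1e-12):
            return np.inf
        return np.abs(x).sum()

    print(knapsack_relax(np.array([1.0, 0, 0, 0, 1, 0])))  # 2.0: spikes are spread out
    print(knapsack_relax(np.array([1.0, 1, 0, 0, 0, 0])))  # inf: two spikes in one window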

Page 49: Group knapsack sparsity example: A stylized spike train

- Basis pursuit (BP): ‖x‖1
- TU-relax (TU): ‖x‖∗∗s = ‖x‖1 if x ∈ [−1, 1]^p and B^T |x| ≤ cu; ∞ otherwise

[Figure: empirical recovery error ‖x − x?‖2 / ‖x?‖2 vs. the undersampling ratio n/p for BP and DBP. Caption: Recovery for n = 0.18p.]

[Figure: the true x\ and the xBP and xTU solutions on p = 200 coefficients.]

relative errors: ‖x\ − xBP‖2 / ‖x\‖2 = .200, ‖x\ − xTU‖2 / ‖x\‖2 = .067

Page 50: Group knapsack sparsity: A simple variation

[Figure: the same eight-coefficient group example with its support indicator and knapsack constraints vector cu.]

Structure: We seek the signal with the minimal overall group allocation.

Objective: 1^T s → ‖x‖ω = min_{ω ∈ Z++} ω if x ∈ [−1, 1]^p and B^T |x| ≤ ω 1; ∞ otherwise.

Linear description: A valid support obeys budget constraints over G:

B^T s ≤ ω 1,

where B is the biadjacency matrix of G, i.e., Bij = 1 iff the i-th coefficient is in Gj. When B is an interval matrix or G has a loopless group intersection graph, it is TU.

Biconjugate: ‖x‖∗∗s = max_{G ∈ G} ‖xG‖1 if x ∈ [−1, 1]^p; ∞ otherwise.

Remark: The regularizer is known as the exclusive Lasso [35, 26].
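This biconjugate is easy to evaluate directly; a sketch on the slide's running groups (shifted to 0-indexing, and the test vector is mine):

    import numpy as np

    # G1..G5 from the running example, 0-indexed.
    groups = [[0], [0, 1, 2, 3, 4], [4, 5, 6, 7], [1, 4, 6], [5, 7]]

    def exclusive_lasso(x):
        if np.max(np.abs(x)) > 1:
            return np.inf
        return max(np.abs(x)[G].sum() for G in groups)

    x = np.zeros(8)
    x[[1, 4, 6]] = [1.0, -1.0, 1.0]        # supp(x) = {2, 5, 7} in the slide's 1-indexing
    print(exclusive_lasso(x))              # 3.0, attained by G4 = {2, 5, 7}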

Pages 51-54: Group cover sparsity: Minimal group cover [5, 25, 19]

[Figure: the same eight-coefficient example; supp(x) = {2, 5, 7} is covered by the single group G4 = {2, 5, 7}, so the group "support" indicator is ω = [0, 0, 0, 1, 0]^T.]

Structure: We seek the signal covered by a minimal number of groups.

Objective: 1^T s → d^T ω

Linear description: At least one group containing a sparse coefficient is selected:

B ω ≥ s,

where B is the biadjacency matrix of G, i.e., Bij = 1 iff the i-th coefficient is in Gj. When B is an interval matrix, or G has a loopless group intersection graph, it is TU.

Biconjugate: ‖x‖∗∗ω = min_{ω ∈ [0,1]^M} {d^T ω : B ω ≥ |x|} for x ∈ [−1, 1]^p, ∞ otherwise
= min_{vi ∈ R^p} {Σ_{i=1}^M di ‖vi‖∞ : x = Σ_{i=1}^M vi, supp(vi) ⊆ Gi}.

[Figure: the norm ball for G = {{1, 2}, {2, 3}}, unit group weights d = 1.]

Remark: Weights d can depend on the sparsity within each group (not TU) [14].
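The cover itself can be found by relaxing ω ∈ {0, 1}^M to [0, 1]^M; a sketch on the running example (my own encoding, using SciPy). When B is TU the relaxation is exact; on this instance the LP indeed returns the integral cover {G4}.

    import numpy as np
    from scipy.optimize import linprog

    groups = [[0], [0, 1, 2, 3, 4], [4, 5, 6, 7], [1, 4, 6], [5, 7]]
    p, M = 8, len(groups)
    B = np.zeros((p, M))
    for j, G in enumerate(groups):
        B[G, j] = 1                        # B_ij = 1 iff coefficient i lies in group j

    s = np.zeros(p)
    s[[1, 4, 6]] = 1                       # support indicator of the running example
    d = np.ones(M)                         # unit group weights

    # min d^T w  s.t.  B w >= s,  0 <= w <= 1
    res = linprog(d, A_ub=-B, b_ub=-s, bounds=[(0, 1)] * M)
    print(np.round(res.x))                 # [0. 0. 0. 1. 0.]: G4 alone covers supp(x)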

Pages 55-56: Budgeted group cover sparsity

[Figure: the same eight-coefficient group example with its support and group "support" indicator vectors.]

Structure: We seek the sparsest signal covered by G groups.

Objective: d^T ω → 1^T s

Linear description: At least one of the G selected groups covers each sparse coefficient:

B ω ≥ s, 1^T ω ≤ G,

where B is the biadjacency matrix of G, i.e., Bij = 1 iff the i-th coefficient is in Gj. When [B; 1^T] (B stacked with the budget row) is an interval matrix, it is TU.

Biconjugate: ‖x‖∗∗ω = min_{ω ∈ [0,1]^M} {‖x‖1 : B ω ≥ |x|, 1^T ω ≤ G} for x ∈ [−1, 1]^p, ∞ otherwise.

Page 57: Budgeted group cover example: Interval overlapping groups

- Basis pursuit (BP): ‖x‖1
- Sparse group Lasso (SGLq): (1 − α) Σ_{G ∈ G} √|G| ‖xG‖q + α ‖xG‖1
- TU-relax (TU): ‖x‖∗∗ω = min_{ω ∈ [0,1]^M} {‖x‖1 : B ω ≥ |x|, 1^T ω ≤ G} for x ∈ [−1, 1]^p, ∞ otherwise.

[Figure: empirical recovery error ‖x − x?‖2 / ‖x?‖2 vs. n/p for BP, SGL, SGL∞, and BLGC. Caption: Recovery for n = 0.25p, s = 15, p = 200, G = 5 out of M = 29 groups.]

[Figure: the true x\ and the xBP, xSGL, xSGL∞, and xTU solutions on p = 200 coefficients.]

relative errors: ‖x\ − xBP‖2 / ‖x\‖2 = .128, ‖x\ − xSGL‖2 / ‖x\‖2 = .181, ‖x\ − xSGL∞‖2 / ‖x\‖2 = .085, ‖x\ − xTU‖2 / ‖x\‖2 = .058

Pages 58-61: Group intersection sparsity [20, 34, 3]

[Figure: the eight-coefficient example; supp(x) = {2, 5, 7} intersects G2, G3, and G4, so the group "support" indicator is ω = [0, 1, 1, 1, 0]^T.]

Structure: We seek the signal intersecting with a minimal number of groups.

Objective: 1^T s → d^T ω (submodular)

Linear description: All groups containing a sparse coefficient are selected:

H^k s ≤ ω, ∀k ∈ P,

where H^k(i, j) = 1 if j = k and j ∈ Gi, and 0 otherwise, which is TU.

Biconjugate: ‖x‖∗∗ω = min_{ω ∈ [0,1]^M} {d^T ω : H^k |x| ≤ ω, ∀k ∈ P} = Σ_{G ∈ G} ‖xG‖∞ for x ∈ [−1, 1]^p, ∞ otherwise.

[Figures: norm balls for G = {{1, 2}, {2, 3}} (intersection vs. cover) and for G = {{1, 2, 3}, {2}, {3}}, unit group weights d = 1.]

Remark: For hierarchical GH, group intersection and tree sparsity models coincide.

Pages 62-63: Beyond linear costs: Graph dispersiveness

[Figure: the sets ‖x‖∗∗s = 0 and ‖x‖∗∗s ≤ 1 for E = {{1, 2}, {2, 3}} (chain graph).]

Structure: We seek a signal dispersive over a given graph G(P, E).

Objective: 1^T s → Σ_{(i,j) ∈ E} si sj (non-linear, supermodular function)

Linearization:

‖x‖s = min_{z ∈ {0,1}^{|E|}} {Σ_{(i,j) ∈ E} zij : zij ≥ si + sj − 1}.

When the edge-node incidence matrix of G(P, E) is TU (e.g., bipartite graphs), it is TU.

Biconjugate: ‖x‖∗∗s = Σ_{(i,j) ∈ E} (|xi| + |xj| − 1)+ for x ∈ [−1, 1]^p, ∞ otherwise.
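The biconjugate is a simple pairwise hinge penalty; a sketch on the chain graph of the figure (0-indexed, and the test vectors are mine):

    import numpy as np

    edges = [(0, 1), (1, 2)]     # chain E = {{1, 2}, {2, 3}}, 0-indexed

    def dispersive(x):
        assert np.max(np.abs(x)) <= 1
        return sum(max(abs(x[i]) + abs(x[j]) - 1, 0) for i, j in edges)

    print(dispersive(np.array([1.0, 0.0, 1.0])))   # 0.0: the two spikes are dispersed
    print(dispersive(np.array([1.0, 1.0, 0.0])))   # 1.0: adjacent spikes get penalized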

Page 64

Outline

Total unimodularity in discrete optimization

From sparsity to structured sparsity

Convex relaxations for structured sparse recovery

Enter nonconvexity

Conclusions


Pages 65-66: An important alternative

Problem (Projection): Define Ms,G := {x : e^T s ≤ s, d^T ω ≤ G, M [ω; s] ≤ c, s = 1supp(x)}. The projection of x onto Ms,G in the ℓq-norm is defined as Pq,Ms,G(x) : R^p → R^p,

Pq,Ms,G(x) ∈ arg min_{u ∈ R^p} {‖x − u‖q^q : u ∈ Ms,G}.

- x̂ = Pq,Ms,G(x) is the best model-based approximation of x.
- The interesting cases are q = 1, 2.

Observation: Model-based approximation corresponds to an ILP
- NP-Hard in general (weighted max cover formulation [5])
- TU structures play a major role
- Pseudo-polynomial time solutions via dynamic programming

Model-based CS [6, 22]: n = O(log |Ms,G|) with i.i.d. Gaussian (dense) measurements
- n = O(s) for tree structure
- iterative projected gradient descent: x_{k+1} ∈ P2,Ms,G(x_k + A^T (b − A x_k))

Model-based sketching [4]: n = o(s log(p/s)) with expanders (sparse)
- n = O(s log(p/s) / log log(p)) for tree structure (empirical: n = O(s))
- iterative projected median descent: x_{k+1} ∈ P1,Ms,G(x_k + M(b − A x_k))
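A sketch of the projected-gradient recursion above, with plain s-sparse hard thresholding standing in for P2,Ms,G (the general model projection would be the ILP discussed above; the instance and sizes are my own):

    import numpy as np

    def project_s_sparse(x, s):
        """Euclidean projection onto s-sparse vectors: keep the s largest entries."""
        u = np.zeros_like(x)
        keep = np.argsort(np.abs(x))[-s:]
        u[keep] = x[keep]
        return u

    def model_iht(A, b, s, iters=100):
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            x = project_s_sparse(x + A.T @ (b - A @ x), s)   # x_{k+1} in P(x_k + A^T(b - A x_k))
        return x

    rng = np.random.default_rng(1)
    n, p, s = 40, 100, 3
    A = rng.standard_normal((n, p)) / np.sqrt(n)
    x_true = np.zeros(p)
    x_true[[5, 40, 77]] = [1.0, -2.0, 0.5]
    b = A @ x_true
    print(np.linalg.norm(model_iht(A, b, s) - x_true))       # typically ~0 at this size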

Pages 67-73: Tree sparsity example: Pareto frontier [8, 5]

[Figure, built up over several slides: approximation error vs. sparsity s (0 to 25) for the tree model, TU-relax, HGL ℓ1, and HGL ℓ2; the tree-model, TU-relax, and HGL ℓ1 curves lie on top of one another.]

Just kidding, they are the same.

Page 74: A totally unimodular view of structured sparsity · BP solution xSGL SGL∞ solutionxTU relativeerrors: kx \ −xBP 2 kx\ 2 =.128 k x\− SGLk 2 =.181 kx −x ∞ k2 kx\ 2 =.085 kx\

CLASH [22]: Combinatorial Selection + Least Absolute Shrinkage

▶ Retains the same theoretical guarantees
▶ ILP and matroid structured models...

[Figure: the CLASH set and the model-CLASH set ("combinatorial origami").]


Page 75: A totally unimodular view of structured sparsity

Outline

Total unimodularity in discrete optimization

From sparsity to structured sparsity

Convex relaxations for structured sparse recovery

Enter nonconvexity

Conclusions


Page 76: A totally unimodular view of structured sparsity

Conclusions

Our work: TU modeling framework & convex template & non-convex algorithms
▶ Many more convex programs (not necessarily norms)
▶ TU models: tight convexifications, non-submodular examples
▶ Easy to design and "usually" efficient via an LP
▶ London calling...

Alternatives:

1. Atomic norms [11, 10]
▶ Given a set A, use the biconjugation of g(x) = inf_{0≤t≤c} { t + ι_{tA}(x) }, for c > 0 (see the worked instance after this list)
▶ Reverse engineer the set to obtain structured sparsity
▶ "Usually" tractable since the norm is reverse engineered

2. Monotone submodular penalties and extensions [3]
▶ Tight convexification via the Lovász extension
▶ Reverse engineer the submodular set function (not always possible)

3. ℓq-regularized combinatorial functions [24]
▶ Tight convexification (also explains latent group lasso-like norms)
▶ Not always efficiently computable
▶ Reverse engineered and may lose structure, e.g., the group knapsack model
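
A worked instance of the atomic-norm recipe in item 1 (my example, not from the deck; take c → ∞ to drop the scale bound): with atoms A = {±e1, ..., ±ep}, g is finite only on nonnegative multiples of single atoms, and biconjugation convexifies it into the gauge of conv(A):

% LaTeX sketch of the computation, with \mathcal{A} = \{\pm e_1, \dots, \pm e_p\}:
% g is nonconvex (supported on rays through atoms); its biconjugate is the
% gauge of the convex hull of the atoms, here the cross-polytope.
\[
  g(x) = \inf_{t \ge 0} \bigl\{\, t + \iota_{t\mathcal{A}}(x) \,\bigr\},
  \qquad
  g^{**}(x) = \inf \bigl\{\, t \ge 0 \;:\; x \in t\,\operatorname{conv}(\mathcal{A}) \,\bigr\}
            = \|x\|_1 ,
\]

so the recipe recovers the Basis Pursuit regularizer [11] as its simplest case.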


Page 77: A totally unimodular view of structured sparsity

References I

[1] Ben Adcock, Anders C. Hansen, Clarice Poon, and Bogdan Roman. Breaking the coherence barrier: A new theory for compressed sensing. http://arxiv.org/abs/1302.0561, Feb. 2013.

[2] Dennis Amelunxen, Martin Lotz, Michael B. McCoy, and Joel A. Tropp. Living on the edge: Phase transitions in convex programs with random data. Information and Inference, 3:224–294, 2014. arXiv:1303.6672v2 [cs.IT].

[3] Francis Bach. Structured sparsity-inducing norms through submodular functions. In Adv. Neur. Inf. Proc. Sys. (NIPS), pages 118–126, 2010.

[4] B. Bah, L. Baldassarre, and V. Cevher. Model-based sketching and recovery with expanders. In Proc. ACM-SIAM Symp. Disc. Alg., number EPFL-CONF-187484, 2014.

[5] L. Baldassarre, N. Bhan, V. Cevher, and A. Kyrillidis. Group-sparse model selection: Hardness and relaxations. arXiv preprint arXiv:1303.3207, 2013.

[6] R.G. Baraniuk, V. Cevher, M.F. Duarte, and C. Hegde. Model-based compressive sensing. IEEE Trans. Inf. Theory, 56(4):1982–2001, April 2010.


Page 78: A totally unimodular view of structured sparsity

References II

[7] Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res., 3, 2002.

[8] N. Bhan, L. Baldassarre, and V. Cevher. Tractability of interpretability via selection of group-sparse models. In IEEE Int. Symp. Inf. Theory, 2013.

[9] Venkat Chandrasekaran and Michael I. Jordan. Computational and statistical tradeoffs via convex relaxation. Proc. Nat. Acad. Sci., 110(13):E1181–E1190, 2013.

[10] Venkat Chandrasekaran, Benjamin Recht, Pablo A. Parrilo, and Alan S. Willsky. The convex geometry of linear inverse problems. Found. Comp. Math., 12:805–849, 2012.

[11] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comp., 20(1):33–61, 1998.

[12] William Cook, László Lovász, and Alexander Schrijver. A polynomial-time test for total dual integrality in fixed dimension. In Mathematical Programming at Oberwolfach II, pages 64–69. Springer, 1984.


Page 79: A totally unimodular view of structured sparsity

References III

[13] Marco F. Duarte, Mark A. Davenport, Dharmpal Takhar, Jason N. Laska, Ting Sun, Kevin F. Kelly, and Richard G. Baraniuk. Single-pixel imaging via compressive sampling. IEEE Sig. Proc. Mag., 25(2):83–91, March 2008.

[14] Marwa El Halabi and Volkan Cevher. A totally unimodular view of structured sparsity. Preprint, 2014. arXiv:1411.1990v1 [cs.LG].

[15] S. Fujishige. Submodular Functions and Optimization, volume 58. Elsevier Science, 2005.

[16] W. Gerstner and W. Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.

[17] F.R. Giles and William R. Pulleyblank. Total dual integrality and integer polyhedra. Linear Algebra and its Applications, 25:191–196, 1979.


Page 80: A totally unimodular view of structured sparsity

References IV

[18] C. Hegde, M. Duarte, and V. Cevher. Compressive sensing recovery of spike trains using a structured sparsity model. In Sig. Proc. with Adaptive Sparse Struct. Rep. (SPARS), 2009.

[19] J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. J. Mach. Learn. Res., 12:3371–3412, 2011.

[20] R. Jenatton, A. Gramfort, V. Michel, G. Obozinski, F. Bach, and B. Thirion. Multi-scale mining of fMRI data with hierarchical structured sparsity. In Pattern Recognition in NeuroImaging (PRNI), 2011.

[21] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. J. Mach. Learn. Res., 12:2297–2334, 2011.

[22] A. Kyrillidis and V. Cevher. Combinatorial selection and least absolute shrinkage via the CLASH algorithm. In IEEE Int. Symp. Inf. Theory, pages 2216–2220. IEEE, 2012.

[23] George L. Nemhauser and Laurence A. Wolsey. Integer and Combinatorial Optimization, volume 18. Wiley, New York, 1999.


Page 81: A totally unimodular view of structured sparsity

References V

[24] G. Obozinski and F. Bach. Convex relaxation for combinatorial penalties. arXiv preprint arXiv:1205.1240, 2012.

[25] G. Obozinski, L. Jacob, and J.P. Vert. Group lasso with overlaps: The latent group lasso approach. arXiv preprint arXiv:1110.0413, 2011.

[26] G. Obozinski, B. Taskar, and M.I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2):231–252, 2010.

[27] Samet Oymak, Christos Thrampoulidis, and Babak Hassibi. Simple bounds for noisy linear inverse problems with exact side information. 2013. arXiv:1312.0641v2 [cs.IT].

[28] C. Seshadhri and Jan Vondrák. Is submodularity testable? Algorithmica, 69(1):1–25, 2014.


Page 82: A totally unimodular view of structured sparsity

References VI

[29] Paul D. Seymour. Decomposition of regular matroids. Journal of Combinatorial Theory, Series B, 28(3):305–359, 1980.

[30] Quoc Tran-Dinh and Volkan Cevher. Constrained convex minimization via model-based excessive gap. In Adv. Neur. Inf. Proc. Sys. (NIPS), 2014.

[31] Klaus Truemper. Alpha-balanced graphs and matrices and GF(3)-representability of matroids. J. Comb. Theory Ser. B, 32(2):112–139, 1982.

[32] Roman Vershynin. Estimation in high dimensions: a geometric perspective. http://arxiv.org/abs/1405.5103, May 2014.

[33] Peng Zhao, Guilherme Rocha, and Bin Yu. Grouped and hierarchical model selection through composite absolute penalties. Department of Statistics, UC Berkeley, Tech. Rep. 703, 2006.

[34] Peng Zhao and Bin Yu. On model selection consistency of Lasso. J. Mach. Learn. Res., 7:2541–2563, 2006.


Page 83: A totally unimodular view of structured sparsity

References VII

[35] H. Zhou, M.E. Sehl, J.S. Sinsheimer, and K. Lange. Association screening of common and rare genetic variants by penalized regression. Bioinformatics, 26(19):2375, 2010.
