Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th...

52
Frank-Wolfe Algorithms for Saddle Point problems Gauthier Gidel 1 Tony Jebara 2 Simon Lacoste-Julien 3 1 INRIA Paris, Sierra Team 2 Department of CS, Columbia University 3 Department of CS & OR (DIRO) Université de Montréal 10th December 2016 Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Transcript of Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th...

Page 1: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Frank-Wolfe Algorithms for Saddle Pointproblems

Gauthier Gidel1 Tony Jebara2 Simon Lacoste-Julien3

1INRIA Paris, Sierra Team 2Department of CS, Columbia University

3Department of CS & OR (DIRO) Université de Montréal

10th December 2016Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 2: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Overview

I Frank-Wolfe algorithm (FW) gained in popularity in thelast couple of years.

I Main advantage: FW only needs LMO.

I Extend FW properties to solve saddle point problem.

I Straightforward extension but Non trivial analysis.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 3: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Saddle point and link with variational inequalitiesLet L : X × Y → R, where X and Y are convex and compact.

Saddle point problem: solve minx∈X

maxy∈YL(x,y)

A solution (x∗,y∗) is called a Saddle Point.

I Necessary stationary conditions:

〈x− x∗, ∇xL(x∗,y∗)〉 ≥ 0〈y − y∗,−∇yL(x∗,y∗)〉 ≥ 0

I Variational inequality:

∀z ∈ X × Y 〈z − z∗, g(z∗)〉 ≥ 0

where (x∗,y∗) = z∗ and g(z) = (∇xL(z),−∇yL(z))I Sufficient condition: Global solution if L

convex-concave. ∀(x,y) ∈ X × Y

x′ 7→ L(x′,y) is convex and y′ 7→ L(x,y′) is concave.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 4: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Saddle point and link with variational inequalitiesLet L : X × Y → R, where X and Y are convex and compact.

Saddle point problem: solve minx∈X

maxy∈YL(x,y)

A solution (x∗,y∗) is called a Saddle Point.I Necessary stationary conditions:

〈x− x∗, ∇xL(x∗,y∗)〉 ≥ 0

〈y − y∗,−∇yL(x∗,y∗)〉 ≥ 0I Variational inequality:

∀z ∈ X × Y 〈z − z∗, g(z∗)〉 ≥ 0

where (x∗,y∗) = z∗ and g(z) = (∇xL(z),−∇yL(z))I Sufficient condition: Global solution if L

convex-concave. ∀(x,y) ∈ X × Y

x′ 7→ L(x′,y) is convex and y′ 7→ L(x,y′) is concave.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 5: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Saddle point and link with variational inequalitiesLet L : X × Y → R, where X and Y are convex and compact.

Saddle point problem: solve minx∈X

maxy∈YL(x,y)

A solution (x∗,y∗) is called a Saddle Point.I Necessary stationary conditions:

〈x− x∗, ∇xL(x∗,y∗)〉 ≥ 0〈y − y∗,−∇yL(x∗,y∗)〉 ≥ 0

I Variational inequality:

∀z ∈ X × Y 〈z − z∗, g(z∗)〉 ≥ 0

where (x∗,y∗) = z∗ and g(z) = (∇xL(z),−∇yL(z))I Sufficient condition: Global solution if L

convex-concave. ∀(x,y) ∈ X × Y

x′ 7→ L(x′,y) is convex and y′ 7→ L(x,y′) is concave.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 6: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Saddle point and link with variational inequalitiesLet L : X × Y → R, where X and Y are convex and compact.

Saddle point problem: solve minx∈X

maxy∈YL(x,y)

A solution (x∗,y∗) is called a Saddle Point.I Necessary stationary conditions:

〈x− x∗, ∇xL(x∗,y∗)〉 ≥ 0〈y − y∗,−∇yL(x∗,y∗)〉 ≥ 0

I Variational inequality:

∀z ∈ X × Y 〈z − z∗, g(z∗)〉 ≥ 0

where (x∗,y∗) = z∗ and g(z) = (∇xL(z),−∇yL(z))

I Sufficient condition: Global solution if Lconvex-concave. ∀(x,y) ∈ X × Y

x′ 7→ L(x′,y) is convex and y′ 7→ L(x,y′) is concave.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 7: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Saddle point and link with variational inequalitiesLet L : X × Y → R, where X and Y are convex and compact.

Saddle point problem: solve minx∈X

maxy∈YL(x,y)

A solution (x∗,y∗) is called a Saddle Point.I Necessary stationary conditions:

〈x− x∗, ∇xL(x∗,y∗)〉 ≥ 0〈y − y∗,−∇yL(x∗,y∗)〉 ≥ 0

I Variational inequality:

∀z ∈ X × Y 〈z − z∗, g(z∗)〉 ≥ 0

where (x∗,y∗) = z∗ and g(z) = (∇xL(z),−∇yL(z))I Sufficient condition: Global solution if L

convex-concave. ∀(x,y) ∈ X × Y

x′ 7→ L(x′,y) is convex and y′ 7→ L(x,y′) is concave.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 8: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Motivations: games and robust learningI Zero-sum games with two players:

minx∈∆(I)

maxy∈∆(J)

x>My

I Generative Adversarial Network (GAN)I Robust learning:1 We want to learn

minθ∈Θ

1n

n∑i=1

` (fθ(xi), yi) + λΩ(θ)

with an uncertainty regarding the data:

minθ∈Θ

maxw∈∆n

n∑i=1

ωi` (fθ(xi), yi) + λΩ(θ)

Minimize the worst case → gives robustness

1J. Wen, C. Yu, and R. Greiner. “Robust Learning under UncertainTest Distributions: Relating Covariate Shift to Model Misspecification.”In: ICML. 2014.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 9: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Motivations: games and robust learningI Zero-sum games with two players:

minx∈∆(I)

maxy∈∆(J)

x>My

I Generative Adversarial Network (GAN)

I Robust learning:1 We want to learn

minθ∈Θ

1n

n∑i=1

` (fθ(xi), yi) + λΩ(θ)

with an uncertainty regarding the data:

minθ∈Θ

maxw∈∆n

n∑i=1

ωi` (fθ(xi), yi) + λΩ(θ)

Minimize the worst case → gives robustness

1J. Wen, C. Yu, and R. Greiner. “Robust Learning under UncertainTest Distributions: Relating Covariate Shift to Model Misspecification.”In: ICML. 2014.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 10: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Motivations: games and robust learningI Zero-sum games with two players:

minx∈∆(I)

maxy∈∆(J)

x>My

I Generative Adversarial Network (GAN)I Robust learning:1 We want to learn

minθ∈Θ

1n

n∑i=1

` (fθ(xi), yi) + λΩ(θ)

with an uncertainty regarding the data:

minθ∈Θ

maxw∈∆n

n∑i=1

ωi` (fθ(xi), yi) + λΩ(θ)

Minimize the worst case → gives robustness1J. Wen, C. Yu, and R. Greiner. “Robust Learning under Uncertain

Test Distributions: Relating Covariate Shift to Model Misspecification.”In: ICML. 2014.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 11: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Problem with Hard projection

The structured SVM:

minω∈Rd

λΩ(ω) + 1n

n∑i=1

maxy∈Yi

(Li(y)− 〈ω, φi(y)〉)︸ ︷︷ ︸structured hinge loss

Regularization: penalized → constrained.

minΩ(ω)≤β

maxα∈∆(|Y|)

bTα− ωTMα

Hard to project when:I Structured sparsity norm (group lasso norm).I The output Y is structured: exponential size.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 12: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Problem with Hard projection

The structured SVM:

minω∈Rd

λΩ(ω) + 1n

n∑i=1

maxy∈Yi

(Li(y)− 〈ω, φi(y)〉)︸ ︷︷ ︸structured hinge loss

Regularization: penalized → constrained.

minΩ(ω)≤β

maxα∈∆(|Y|)

bTα− ωTMα

Hard to project when:I Structured sparsity norm (group lasso norm).I The output Y is structured: exponential size.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 13: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Problem with Hard projection

The structured SVM:

minω∈Rd

λΩ(ω) + 1n

n∑i=1

maxy∈Yi

(Li(y)− 〈ω, φi(y)〉)︸ ︷︷ ︸structured hinge loss

Regularization: penalized → constrained.

minΩ(ω)≤β

maxα∈∆(|Y|)

bTα− ωTMα

Hard to project when:I Structured sparsity norm (group lasso norm).I The output Y is structured: exponential size.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 14: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Problem with Hard projection

The structured SVM:

minω∈Rd

λΩ(ω) + 1n

n∑i=1

maxy∈Yi

(Li(y)− 〈ω, φi(y)〉)︸ ︷︷ ︸structured hinge loss

Regularization: penalized → constrained.

minΩ(ω)≤β

maxα∈∆(|Y|)

bTα− ωTMα

Hard to project when:I Structured sparsity norm (group lasso norm).I The output Y is structured: exponential size.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 15: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Problem with Hard projection

The structured SVM:

minω∈Rd

λΩ(ω) + 1n

n∑i=1

maxy∈Yi

(Li(y)− 〈ω, φi(y)〉)︸ ︷︷ ︸structured hinge loss

Regularization: penalized → constrained.

minΩ(ω)≤β

maxα∈∆(|Y|)

bTα− ωTMα

Hard to project when:

I Structured sparsity norm (group lasso norm).I The output Y is structured: exponential size.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 16: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Problem with Hard projection

The structured SVM:

minω∈Rd

λΩ(ω) + 1n

n∑i=1

maxy∈Yi

(Li(y)− 〈ω, φi(y)〉)︸ ︷︷ ︸structured hinge loss

Regularization: penalized → constrained.

minΩ(ω)≤β

maxα∈∆(|Y|)

bTα− ωTMα

Hard to project when:I Structured sparsity norm (group lasso norm).

I The output Y is structured: exponential size.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 17: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Problem with Hard projection

The structured SVM:

minω∈Rd

λΩ(ω) + 1n

n∑i=1

maxy∈Yi

(Li(y)− 〈ω, φi(y)〉)︸ ︷︷ ︸structured hinge loss

Regularization: penalized → constrained.

minΩ(ω)≤β

maxα∈∆(|Y|)

bTα− ωTMα

Hard to project when:I Structured sparsity norm (group lasso norm).I The output Y is structured: exponential size.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 18: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Standard approaches in literature

Simplest algorithm to solve Saddle point problems is theprojected gradient algorithm.

x(t+1) = PX (x(t) − η∇xL(x(t),y(t)))y(t+1) = PY(y(t) + η∇yL(x(t),y(t)))

For non-smooth optimization,

1T

T∑t=1

(x(t),y(t)

)−→T→∞

(x∗,y∗)

Faster algorithm: projected extra-gradient algorithm.

Can use LMO to compute approximate projections2.

2N. He and Z. Harchaoui. “Semi-proximal Mirror-Prox for NonsmoothComposite Minimization”. In: NIPS. 2015.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 19: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Standard approaches in literature

Simplest algorithm to solve Saddle point problems is theprojected gradient algorithm.

x(t+1) = PX (x(t) − η∇xL(x(t),y(t)))y(t+1) = PY(y(t) + η∇yL(x(t),y(t)))

For non-smooth optimization,

1T

T∑t=1

(x(t),y(t)

)−→T→∞

(x∗,y∗)

Faster algorithm: projected extra-gradient algorithm.

Can use LMO to compute approximate projections2.

2N. He and Z. Harchaoui. “Semi-proximal Mirror-Prox for NonsmoothComposite Minimization”. In: NIPS. 2015.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 20: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Standard approaches in literature

Simplest algorithm to solve Saddle point problems is theprojected gradient algorithm.

x(t+1) = PX (x(t) − η∇xL(x(t),y(t)))y(t+1) = PY(y(t) + η∇yL(x(t),y(t)))

For non-smooth optimization,

1T

T∑t=1

(x(t),y(t)

)−→T→∞

(x∗,y∗)

Faster algorithm: projected extra-gradient algorithm.

Can use LMO to compute approximate projections2.

2N. He and Z. Harchaoui. “Semi-proximal Mirror-Prox for NonsmoothComposite Minimization”. In: NIPS. 2015.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 21: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

The FW algorithm

Algorithm Frank-Wolfe algorithm1: Let x(0) ∈ X2: for t = 0 . . . T do3: Compute r(t) = ∇f(x(t))4: Compute s(t) ∈ argmin

s∈X

⟨s, r(t)⟩

5: Compute gt :=⟨x(t) − s(t), r(t)⟩

6: if gt ≤ ε then return x(t)

7: Let γ = 22+t (or do line-search)

8: Update x(t+1) := (1−γ)x(t)+γs(t)

9: end for↵

f(↵)

M

f

Figure: One step of the FWalgorithm

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 22: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

The FW algorithm

Algorithm Frank-Wolfe algorithm1: Let x(0) ∈ X2: for t = 0 . . . T do3: Compute r(t) = ∇f(x(t))4: Compute s(t) ∈ argmin

s∈X

⟨s, r(t)⟩

5: Compute gt :=⟨x(t) − s(t), r(t)⟩

6: if gt ≤ ε then return x(t)

7: Let γ = 22+t (or do line-search)

8: Update x(t+1) := (1−γ)x(t)+γs(t)

9: end for↵

f(↵)

M

f

f(↵) +

s0 ↵,rf(↵)

Figure: One step of the FWalgorithm

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 23: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

The FW algorithm

Algorithm Frank-Wolfe algorithm1: Let x(0) ∈ X2: for t = 0 . . . T do3: Compute r(t) = ∇f(x(t))4: Compute s(t) ∈ argmin

s∈X

⟨s, r(t)⟩

5: Compute gt :=⟨x(t) − s(t), r(t)⟩

6: if gt ≤ ε then return x(t)

7: Let γ = 22+t (or do line-search)

8: Update x(t+1) := (1−γ)x(t)+γs(t)

9: end for↵

f(↵)

M

f

f(↵) +

s0 ↵,rf(↵)

Figure: One step of the FWalgorithm

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 24: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

The FW algorithm

Algorithm Frank-Wolfe algorithm1: Let x(0) ∈ X2: for t = 0 . . . T do3: Compute r(t) = ∇f(x(t))4: Compute s(t) ∈ argmin

s∈X

⟨s, r(t)⟩

5: Compute gt :=⟨x(t) − s(t), r(t)⟩

6: if gt ≤ ε then return x(t)

7: Let γ = 22+t (or do line-search)

8: Update x(t+1) := (1−γ)x(t)+γs(t)

9: end for↵

f(↵)

M

f

s

f(↵) +

s0 ↵,rf(↵)

Figure: One step of the FWalgorithm

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 25: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

SP-FWAlgorithm Saddle point FW algorithm

1: Let z(0) = (x(0),y(0)) ∈ X × Y2: for t = 0 . . . T do3: Compute r(t) :=

(∇xL(x(t),y(t))−∇yL(x(t),y(t))

)4: Compute s(t) ∈ argmin

z∈X×Y

⟨z, r(t)

⟩5: Compute gt :=

⟨z(t) − s(t), r(t)

⟩6: if gt ≤ ε then return z(t)

7: Let γ = min(1, νC gt

)or γ = 2

2+t8: Update z(t+1) := (1− γ)z(t) + γs(t)

9: end for

I One can define FW extension with away step.I γt = 1

1+t ⇒ z(t) = 1t

∑ti=0 s(i).

I (γt = 11+t) + Bilinear objective ↔ fictitious play algorithm.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 26: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

SP-FWAlgorithm Saddle point FW algorithm

1: Let z(0) = (x(0),y(0)) ∈ X × Y2: for t = 0 . . . T do3: Compute r(t) :=

(∇xL(x(t),y(t))−∇yL(x(t),y(t))

)4: Compute s(t) ∈ argmin

z∈X×Y

⟨z, r(t)

⟩5: Compute gt :=

⟨z(t) − s(t), r(t)

⟩6: if gt ≤ ε then return z(t)

7: Let γ = min(1, νC gt

)or γ = 2

2+t8: Update z(t+1) := (1− γ)z(t) + γs(t)

9: end for

I One can define FW extension with away step.I γt = 1

1+t ⇒ z(t) = 1t

∑ti=0 s(i).

I (γt = 11+t) + Bilinear objective ↔ fictitious play algorithm.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 27: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

SP-FWAlgorithm Saddle point FW algorithm

1: Let z(0) = (x(0),y(0)) ∈ X × Y2: for t = 0 . . . T do3: Compute r(t) :=

(∇xL(x(t),y(t))−∇yL(x(t),y(t))

)4: Compute s(t) ∈ argmin

z∈X×Y

⟨z, r(t)

⟩5: Compute gt :=

⟨z(t) − s(t), r(t)

⟩6: if gt ≤ ε then return z(t)

7: Let γ = min(1, νC gt

)or γ = 2

2+t8: Update z(t+1) := (1− γ)z(t) + γs(t)

9: end for

I One can define FW extension with away step.I γt = 1

1+t ⇒ z(t) = 1t

∑ti=0 s(i).

I (γt = 11+t) + Bilinear objective ↔ fictitious play algorithm.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 28: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

SP-FWAlgorithm Saddle point FW algorithm

1: Let z(0) = (x(0),y(0)) ∈ X × Y2: for t = 0 . . . T do3: Compute r(t) :=

(∇xL(x(t),y(t))−∇yL(x(t),y(t))

)4: Compute s(t) ∈ argmin

z∈X×Y

⟨z, r(t)

⟩5: Compute gt :=

⟨z(t) − s(t), r(t)

⟩6: if gt ≤ ε then return z(t)

7: Let γ = min(1, νC gt

)or γ = 2

2+t8: Update z(t+1) := (1− γ)z(t) + γs(t)

9: end forI One can define FW extension with away step.

I γt = 11+t ⇒ z(t) = 1

t

∑ti=0 s(i).

I (γt = 11+t) + Bilinear objective ↔ fictitious play algorithm.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 29: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

SP-FWAlgorithm Saddle point FW algorithm

1: Let z(0) = (x(0),y(0)) ∈ X × Y2: for t = 0 . . . T do3: Compute r(t) :=

(∇xL(x(t),y(t))−∇yL(x(t),y(t))

)4: Compute s(t) ∈ argmin

z∈X×Y

⟨z, r(t)

⟩5: Compute gt :=

⟨z(t) − s(t), r(t)

⟩6: if gt ≤ ε then return z(t)

7: Let γ = min(1, νC gt

)or γ = 2

2+t8: Update z(t+1) := (1− γ)z(t) + γs(t)

9: end forI One can define FW extension with away step.I γt = 1

1+t ⇒ z(t) = 1t

∑ti=0 s(i).

I (γt = 11+t) + Bilinear objective ↔ fictitious play algorithm.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 30: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Advantages of SP-FWSame main property as FW:

Only LMO (linear minimization oracle).

Same other advantages as FW:I Convergence certificate gt for free.I Affine invariance of the algorithm.I Sparsity of the iterates.I Universal step size γt := 2

2+t , adaptive step size γt := νC gt.

Main difference with SP:I No line-search.

When constraints set is a “complicated” structured polytopeprojections can be hard whereas LMO might be tractable.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 31: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Advantages of SP-FWSame main property as FW:

Only LMO (linear minimization oracle).

Same other advantages as FW:I Convergence certificate gt for free.

I Affine invariance of the algorithm.I Sparsity of the iterates.I Universal step size γt := 2

2+t , adaptive step size γt := νC gt.

Main difference with SP:I No line-search.

When constraints set is a “complicated” structured polytopeprojections can be hard whereas LMO might be tractable.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 32: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Advantages of SP-FWSame main property as FW:

Only LMO (linear minimization oracle).

Same other advantages as FW:I Convergence certificate gt for free.I Affine invariance of the algorithm.

I Sparsity of the iterates.I Universal step size γt := 2

2+t , adaptive step size γt := νC gt.

Main difference with SP:I No line-search.

When constraints set is a “complicated” structured polytopeprojections can be hard whereas LMO might be tractable.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 33: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Advantages of SP-FWSame main property as FW:

Only LMO (linear minimization oracle).

Same other advantages as FW:I Convergence certificate gt for free.I Affine invariance of the algorithm.I Sparsity of the iterates.

I Universal step size γt := 22+t , adaptive step size γt := ν

C gt.Main difference with SP:I No line-search.

When constraints set is a “complicated” structured polytopeprojections can be hard whereas LMO might be tractable.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 34: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Advantages of SP-FWSame main property as FW:

Only LMO (linear minimization oracle).

Same other advantages as FW:I Convergence certificate gt for free.I Affine invariance of the algorithm.I Sparsity of the iterates.I Universal step size γt := 2

2+t , adaptive step size γt := νC gt.

Main difference with SP:I No line-search.

When constraints set is a “complicated” structured polytopeprojections can be hard whereas LMO might be tractable.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 35: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Advantages of SP-FWSame main property as FW:

Only LMO (linear minimization oracle).

Same other advantages as FW:I Convergence certificate gt for free.I Affine invariance of the algorithm.I Sparsity of the iterates.I Universal step size γt := 2

2+t , adaptive step size γt := νC gt.

Main difference with SP:I No line-search.

When constraints set is a “complicated” structured polytopeprojections can be hard whereas LMO might be tractable.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 36: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Advantages of SP-FWSame main property as FW:

Only LMO (linear minimization oracle).

Same other advantages as FW:I Convergence certificate gt for free.I Affine invariance of the algorithm.I Sparsity of the iterates.I Universal step size γt := 2

2+t , adaptive step size γt := νC gt.

Main difference with SP:I No line-search.

When constraints set is a “complicated” structured polytopeprojections can be hard whereas LMO might be tractable.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 37: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Theoretical contributionSP extension of FW with away step:

Convergence: Linear rate with adaptive step size.Sublinear rate with universal step size.

I Similar hypothesis as AFW for linear convergence:1. Strong convexity and smoothness of the function.2. X and Y polytopes.

I Additional assumption on the bilinearity.

L(x,y) = f(x) + x>My − g(y)

‖M‖ smaller than the strong convexity constant.I Proof use recent advances on AFW.I Partially answering a 30 years old conjecture3.

3J. Hammond. “Solving asymmetric variational inequality problems andsystems of equations with generalized nonlinear programming algorithms”.PhD thesis. MIT, 1984.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 38: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Theoretical contributionSP extension of FW with away step:

Convergence: Linear rate with adaptive step size.Sublinear rate with universal step size.

I Similar hypothesis as AFW for linear convergence:1. Strong convexity and smoothness of the function.2. X and Y polytopes.

I Additional assumption on the bilinearity.

L(x,y) = f(x) + x>My − g(y)

‖M‖ smaller than the strong convexity constant.I Proof use recent advances on AFW.I Partially answering a 30 years old conjecture3.

3J. Hammond. “Solving asymmetric variational inequality problems andsystems of equations with generalized nonlinear programming algorithms”.PhD thesis. MIT, 1984.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 39: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Theoretical contributionSP extension of FW with away step:

Convergence: Linear rate with adaptive step size.Sublinear rate with universal step size.

I Similar hypothesis as AFW for linear convergence:1. Strong convexity and smoothness of the function.2. X and Y polytopes.

I Additional assumption on the bilinearity.

L(x,y) = f(x) + x>My − g(y)

‖M‖ smaller than the strong convexity constant.

I Proof use recent advances on AFW.I Partially answering a 30 years old conjecture3.

3J. Hammond. “Solving asymmetric variational inequality problems andsystems of equations with generalized nonlinear programming algorithms”.PhD thesis. MIT, 1984.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 40: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Theoretical contributionSP extension of FW with away step:

Convergence: Linear rate with adaptive step size.Sublinear rate with universal step size.

I Similar hypothesis as AFW for linear convergence:1. Strong convexity and smoothness of the function.2. X and Y polytopes.

I Additional assumption on the bilinearity.

L(x,y) = f(x) + x>My − g(y)

‖M‖ smaller than the strong convexity constant.I Proof use recent advances on AFW.I Partially answering a 30 years old conjecture3.3J. Hammond. “Solving asymmetric variational inequality problems and

systems of equations with generalized nonlinear programming algorithms”.PhD thesis. MIT, 1984.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 41: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Difficulties for saddle point

Usual descent Lemma:

ht+1 ≤ ht − γtgt︸︷︷︸≥0

+γ2t

L‖d(t)‖2

2

With γt small enough the sequence decreases.

For saddle point problem the Lipschitz gradient property gives

Lt+1 − L∗ ≤ Lt − L∗ − γt(g

(x)t − g

(y)t

)︸ ︷︷ ︸

arbitrary sign

+γ2t

L‖d(t)‖2

2 .

I Cannot control the oscillation of the sequence.I Must introduce other quantities to establish convergence.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 42: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Difficulties for saddle point

Usual descent Lemma:

ht+1 ≤ ht − γtgt︸︷︷︸≥0

+γ2t

L‖d(t)‖2

2

With γt small enough the sequence decreases.

For saddle point problem the Lipschitz gradient property gives

Lt+1 − L∗ ≤ Lt − L∗ − γt(g

(x)t − g

(y)t

)︸ ︷︷ ︸

arbitrary sign

+γ2t

L‖d(t)‖2

2 .

I Cannot control the oscillation of the sequence.I Must introduce other quantities to establish convergence.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 43: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Toy experiments

SP-AFW on a toy exampled = 30. with theoreticalstep-size γt = ν

C gt.

Figure: SP-AFW on a toyexample d = 30 with heuristicstep-size. γt = gt

C+2 ‖M‖2D2µ

C = 2LD2

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 44: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Toy experiments

SP-AFW on a toy exampled = 30. with theoreticalstep-size γt = ν

C gt.

Figure: SP-AFW on a toyexample d = 30 with heuristicstep-size. γt = gt

C+2 ‖M‖2D2µ

C = 2LD2

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 45: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Conclusion

I SP-FW one of the first SP solver only working with LMO.

I FW resurgence lead to new structured problems.I Same hope as FW for SP-FW

Call for applications !

I Still many theoretical opened questions.I With a bilinear objective this algorithm is highly related

to the fictitious play algorithm.I Rich interplay tapping into this game theory literature.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 46: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Conclusion

I SP-FW one of the first SP solver only working with LMO.I FW resurgence lead to new structured problems.

I Same hope as FW for SP-FW

Call for applications !

I Still many theoretical opened questions.I With a bilinear objective this algorithm is highly related

to the fictitious play algorithm.I Rich interplay tapping into this game theory literature.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 47: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Conclusion

I SP-FW one of the first SP solver only working with LMO.I FW resurgence lead to new structured problems.I Same hope as FW for SP-FW

Call for applications !

I Still many theoretical opened questions.I With a bilinear objective this algorithm is highly related

to the fictitious play algorithm.I Rich interplay tapping into this game theory literature.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 48: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Conclusion

I SP-FW one of the first SP solver only working with LMO.I FW resurgence lead to new structured problems.I Same hope as FW for SP-FW

Call for applications !

I Still many theoretical opened questions.I With a bilinear objective this algorithm is highly related

to the fictitious play algorithm.I Rich interplay tapping into this game theory literature.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 49: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Conclusion

I SP-FW one of the first SP solver only working with LMO.I FW resurgence lead to new structured problems.I Same hope as FW for SP-FW

Call for applications !

I Still many theoretical opened questions.

I With a bilinear objective this algorithm is highly relatedto the fictitious play algorithm.

I Rich interplay tapping into this game theory literature.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 50: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Conclusion

I SP-FW one of the first SP solver only working with LMO.I FW resurgence lead to new structured problems.I Same hope as FW for SP-FW

Call for applications !

I Still many theoretical opened questions.I With a bilinear objective this algorithm is highly related

to the fictitious play algorithm.

I Rich interplay tapping into this game theory literature.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 51: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Conclusion

I SP-FW one of the first SP solver only working with LMO.I FW resurgence lead to new structured problems.I Same hope as FW for SP-FW

Call for applications !

I Still many theoretical opened questions.I With a bilinear objective this algorithm is highly related

to the fictitious play algorithm.I Rich interplay tapping into this game theory literature.

Gauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016

Page 52: Frank-Wolfe Algorithms for Saddle Point problemsGauthier Gidel Frank-Wolfe Algorithms for SP 10th December 2016 Motivations: gamesandrobustlearning IZero-sum games with two players:

Thank You !

Slides available on www.di.ens.fr/~gidel.