
Convergence rate analysis of several splitting schemes¹

Damek Davis²

Department of Mathematics, University of California, Los Angeles

INFORMS annual meeting 2014

¹ Joint work with Prof. Wotao Yin (UCLA), http://arxiv.org/abs/1406.4834
² http://www.math.ucla.edu/~damek


Background/outline

• Topic: Convergence rates of splitting algorithms

• Convergence rates of these algorithms were unknown for many years.

• Today: I'll present a simple procedure for convergence rate analysis that generalizes to a wide class of algorithms.

• Outline:
  • Algorithms
  • Our Question
  • Challenges/Techniques


What is a splitting?

• We want to

  minimize_{x ∈ H}  f(x) + g(x).

• H is a Hilbert space, possibly infinite dimensional.
• f and g are closed, proper, and convex (not necessarily differentiable).
• This problem is the focus of all algorithms today.


Basic operations in splitting algorithms

• The proximal operator: for all x ∈ H and γ > 0,

  prox_{γh}(x) := argmin_{y ∈ H} h(y) + (1/(2γ))‖y − x‖²
                = x − γ∇̃h(prox_{γh}(x))  ←− implicit subgradient.

• For all z ∈ H, the vector ∇̃h(z) ∈ ∂h(z) is a subgradient.
• prox is the main subproblem in splitting algorithms.
• Many functions in machine learning and signal processing have simple or closed-form proximal operators (e.g., ℓ1 and matrix norms, indicator functions, quadratic functions, ...).
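As a concrete illustration (my own sketch, not from the talk), here is the proximal operator of h = ‖·‖₁, which has the closed-form soft-thresholding solution; the snippet also checks the implicit-subgradient identity prox_{γh}(x) = x − γ∇̃h(prox_{γh}(x)):

```python
import numpy as np

def prox_l1(x, gamma):
    """prox of h = ||.||_1: argmin_y ||y||_1 + (1/(2*gamma))*||y - x||^2,
    solved in closed form by soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

x, gamma = np.array([3.0, -0.5, 1.2]), 1.0
p = prox_l1(x, gamma)
subgrad = (x - p) / gamma   # lies in the subdifferential of ||.||_1 at p
print(p, subgrad)           # p = [2, 0, 0.2]; subgrad = [1, -0.5, 1]
```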


Major first order algorithms: subgradient form

Each update is an implicit, semi-implicit, or explicit subgradient step:

• (Sub)gradient method (SM), explicit:

  z^{k+1} − z^k = −γ∇̃(f + g)(z^k).

• Proximal point algorithm (PPA), implicit:

  z^{k+1} − z^k = −γ∇̃(f + g)(z^{k+1}).

• Forward-backward splitting (FBS), semi-implicit:

  z^{k+1} − z^k = −γ∇̃f(z^{k+1}) − γ∇̃g(z^k).

• Douglas-Rachford splitting (DRS):

  z^{k+1} − z^k = −γ∇̃f(x_f^k) − γ∇̃g(x_g^k).

• =⇒ ‖z^{k+1} − z^k‖ controls the size of the subgradients! (An FBS sketch follows.)
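To make the semi-implicit FBS update concrete, here is a minimal sketch (my own, not the talk's code) for the instance f = ‖·‖₁, g = ½‖Ax − b‖²: the g-step is an explicit gradient and the f-step is the prox from the previous slide. I assume a step size γ ≤ 1/‖A‖² so the iteration converges:

```python
import numpy as np

def prox_l1(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def fbs(A, b, gamma, iters=500):
    """FBS for min ||x||_1 + 0.5*||Ax - b||^2:
    z_{k+1} = prox_{gamma*f}(z_k - gamma * grad g(z_k)), i.e.
    z_{k+1} - z_k = -gamma*subgrad f(z_{k+1}) - gamma*grad g(z_k)."""
    z = np.zeros(A.shape[1])
    for _ in range(iters):
        grad_g = A.T @ (A @ z - b)              # explicit step at z_k
        z = prox_l1(z - gamma * grad_g, gamma)  # implicit step at z_{k+1}
    return z
```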


Major first order algorithms: diagram form

Each method sends z to z⁺ along subgradient steps:

• SM:  z → z⁺ via −γ∇̃(f + g)(z).
• PPA: z → z⁺ via −γ∇̃(f + g)(z⁺).
• FBS: z → z⁺ via −γ∇̃g(z) followed by −γ∇̃f(z⁺).


Diagram of DRS

One DRS update sends z to z⁺ through the intermediate points

  x_g = prox_{γg}(z)        (step −γ∇̃g(x_g)),
  z′ = refl_{γg}(z),
  x_f = prox_{γf}(z′)       (step −γ∇̃f(x_f)),

  z⁺ = ½ z + ½ refl_{γf} ∘ refl_{γg}(z),  where refl := 2 prox − I,

so that

  x_f − x_g = z⁺ − z = −γ(∇̃f(x_f) + ∇̃g(x_g)).
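In code, one DRS step is just two prox evaluations plus the identity z⁺ = z + x_f − x_g above (a minimal sketch; the prox_f/prox_g callables are assumed given):

```python
def drs_step(z, prox_f, prox_g, gamma):
    """One Douglas-Rachford step: z+ = z/2 + (refl_f o refl_g)(z)/2,
    where refl(z) = 2*prox(z) - z; equivalently z+ = z + x_f - x_g."""
    x_g = prox_g(z, gamma)
    z_refl = 2.0 * x_g - z        # refl_{gamma*g}(z)
    x_f = prox_f(z_refl, gamma)   # prox of f at the reflected point
    return z + x_f - x_g
```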


Our main question

How fast and how slow are splitting algorithms?

• For simplicity, let's consider the objective error for the unconstrained problem

  minimize_{x ∈ H}  f(x) + g(x).

• Let x* ∈ H be a minimizer of f + g.
• Our goal is to measure

  f(x^k) + g(x^k) − f(x*) − g(x*)

  for certain natural sequences (x^j)_{j≥0}.
• Note: this talk is not comprehensive.
  • The paper analyzes other algorithms and convergence measures.


Results: spectrum of objective error convergence rates

Spectrum of rates, from slowest to fastest; averaging moves DRS/ADMM from the o(1/√k) column to the O(1/k) column:

  O(1/√k):  SM, FBS
  o(1/√k):  DRS, ADMM
  O(1/k):   DRS, ADMM (ergodic/averaged)
  o(1/k):   PPA, SM + smooth, FBS + smooth, DRS + smooth, ADMM + strong convexity

• The rates are sharp. (new result)
• Counterintuitive result: DRS is nearly as slow as the subgradient method...
• ...but averaging, (x^j)_{j≥0} ↦ ((1/(j + 1)) ∑_{i=0}^{j} x^i)_{j≥0},
  • smooths the objective value sequence, and
  • is nearly as fast as PPA. (A running-average sketch follows.)
• For DRS, the smooth results only require f OR g to be smooth, not both. (FBS needs g smooth and SM needs f + g smooth.)
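The ergodic average can be maintained incrementally, without storing past iterates (a small sketch of my own, assuming numpy-array iterates):

```python
def ergodic_average(iterates):
    """Yields x_bar_j = (1/(j+1)) * sum_{i=0}^{j} x_i for each iterate,
    via the incremental update x_bar += (x - x_bar)/(j+1)."""
    x_bar = None
    for j, x in enumerate(iterates):
        x_bar = x.copy() if x_bar is None else x_bar + (x - x_bar) / (j + 1)
        yield x_bar
```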


Should we always average?

• Convergence rates improve when we average: o(1/√(k + 1)) → O(1/(k + 1)).
• So, should we always average?
  • No. Averaging can ruin sparsity patterns in the solution and prolong convergence.
• Consider DRS applied to the basis pursuit problem (a code sketch follows the figure):

  minimize_{x ∈ R^d}  ‖x‖₁
  subject to: Ax = b

[Figure: sparsity vs. iteration k (log-log axes) for the nonergodic and ergodic DRS iterates.]
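Below is a minimal DRS sketch for this basis pursuit problem (my own illustration, not the talk's code), taking f = ‖·‖₁ and g as the indicator of {x : Ax = b}, whose prox is an affine projection (A is assumed to have full row rank). It returns both the nonergodic iterate and its ergodic average so their sparsity can be compared:

```python
import numpy as np

def drs_basis_pursuit(A, b, gamma=1.0, iters=1000):
    """DRS for min ||x||_1 subject to Ax = b."""
    AAt_inv = np.linalg.inv(A @ A.T)   # factor once (fine for small problems)

    def prox_g(z):                     # projection onto {x : Ax = b}
        return z - A.T @ (AAt_inv @ (A @ z - b))

    def prox_f(z):                     # soft thresholding for ||.||_1
        return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

    z = np.zeros(A.shape[1])
    x_bar = np.zeros(A.shape[1])
    for k in range(iters):
        x_g = prox_g(z)
        x_f = prox_f(2 * x_g - z)          # prox of f at the reflected point
        z = z + x_f - x_g                  # z+ = z + x_f - x_g
        x_bar += (x_f - x_bar) / (k + 1)   # ergodic average of the x_f sequence
    return x_f, x_bar                      # nonergodic vs. ergodic iterate
```

On random instances the nonergodic iterate x_f typically acquires many exact zeros while the averaged x_bar stays dense, matching the figure above.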


Challenges of convergence analysis

• In splitting algorithms, implicit/explicit subgradients are generated at two different points,
  • so comparing objective values at those points would seemingly require a Lipschitz continuity assumption.
  • Example: C ⊆ H, f = χ_C (0 in C, ∞ outside), g = ‖·‖2; the only natural point at which to evaluate the objective is in C.
• The objective does not decrease monotonically,
  • =⇒ the classical approaches to obtaining convergence rates fail!

[Figure: objective error vs. iteration k (log-log axes) for the nonergodic and ergodic iterates, against the reference curve 37.3516/(k + 1).]


Other forms of monotonicity

• Other quantities do decrease monotonically. Let z* be a fixed point of one of the above algorithms. Then

  ‖z^{k+1} − z*‖² ≤ ‖z^k − z*‖² − ‖z^{k+1} − z^k‖²,
  ‖z^{k+1} − z^k‖² ≤ ‖z^k − z^{k−1}‖².

• The above inequalities are key to the convergence analysis.
• They imply that ‖z^{k+1} − z^k‖² is monotonic and summable! (Important; see the telescoping step below.)
• This is true in PPA/FBS/DRS/ADMM/forward-Douglas-Rachford splitting/Chambolle and Pock's primal-dual algorithm....
• Recall: ‖z^{k+1} − z^k‖ controls the subgradient size.
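Summability is the standard telescoping step (not spelled out on the slide): summing the first inequality over k = 0, ..., K gives

  ∑_{k=0}^{K} ‖z^{k+1} − z^k‖² ≤ ∑_{k=0}^{K} (‖z^k − z*‖² − ‖z^{k+1} − z*‖²) = ‖z^0 − z*‖² − ‖z^{K+1} − z*‖² ≤ ‖z^0 − z*‖²,

so the partial sums are bounded, while monotonicity is exactly the second inequality.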


Our techniques: nonsmooth case

Our results follow from three tools.

• A lemma that estimates convergence rates of sequences.
  • Roughly: (a_j)_{j≥0} ⊆ R summable and monotonic =⇒ a_k = o(1/(k + 1)). (A proof sketch follows this slide.)
• A theorem that estimates convergence rates of subgradients in splitting algorithms.
  • Recall: ‖z^{k+1} − z^k‖² is monotonic and summable, and so

    ‖z^{k+1} − z^k‖ = o(1/√(k + 1)).

  • =⇒ In DRS:

    ‖x_f^k − x_g^k‖ = γ‖∇̃f(x_f^k) + ∇̃g(x_g^k)‖ = ‖z^{k+1} − z^k‖ = o(1/√(k + 1)).

• An inequality that bounds objective values by subgradient norms.
  • =⇒ nonergodic rate o(1/√(k + 1)).
  • Ergodic O(1/(k + 1)) rates follow from this inequality + Jensen's inequality.
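A quick proof sketch of the lemma (standard, not spelled out on the slide): if (a_j)_{j≥0} is nonincreasing and summable (hence nonnegative), then since the tail sums of a convergent series vanish,

  ((k + 1)/2) a_k ≤ (k − ⌈k/2⌉ + 1) a_k ≤ ∑_{i=⌈k/2⌉}^{k} a_i → 0  as k → ∞,

so (k + 1) a_k → 0, i.e., a_k = o(1/(k + 1)).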


Conclusions

• We also analyze ADMM and other splitting algorithms.

• All of the obtained rates are sharp! (new result)

• Applications in the paper: new convergence rates for feasibility, distributed model fitting, linear programming, semidefinite programming, and decentralized ADMM problems.

• In a follow-up paper, we study these algorithms when f and g are regular (e.g., strongly convex or differentiable).³

• The rates automatically improve without knowledge of Lipschitz constants or the strong convexity modulus.

• e.g., for differentiable f or g: o(1/√(k + 1)) → o(1/(k + 1)).

• We also generalized these techniques to prove convergence rates for a wide class of primal-dual algorithms.⁴

³ http://arxiv.org/abs/1407.5210
⁴ http://arxiv.org/abs/1408.4419


References

• Damek Davis and Wotao Yin. Convergence rate analysis of several splitting schemes. arXiv:1406.4834 (2014).

• Damek Davis and Wotao Yin. Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. arXiv:1407.5210 (2014).

• Damek Davis. Convergence rate analysis of primal-dual splitting schemes. arXiv:1408.4419 (2014).

• More: http://www.math.ucla.edu/~damek
