COM S 672: Advanced Topics in Computational
Models of Learning – Optimization for Learning
Lecture Note 9: Higher-Order Methods – II
Jia (Kevin) Liu
Assistant Professor
Department of Computer Science
Iowa State University, Ames, Iowa, USA
Fall 2017
JKL (CS@ISU) COM S 672: Lecture 9 1 / 26
Outline
In this lecture:
Quasi-Newton methods
Interior-point methods
Quasi-Newton Theory
Key idea: maintain an approximation to the Hessian that is filled in using information gained on successive steps, and generate H-conjugate directions

Suppose f(x) = c^T x + (1/2) x^T H x, where H ≻ 0

Define p_k = x_{k+1} − x_k and q_k = ∇f(x_{k+1}) − ∇f(x_k). Note that

    H p_k = H(x_{k+1} − x_k) = (c + H x_{k+1}) − (c + H x_k) = q_k

Construct an estimate B_k for H satisfying B_k p_j = q_j for all j thus far. Thus:

    H^{-1} B_k p_j = H^{-1} q_j = p_j

This implies (H^{-1} B_k) p_j = p_j, ∀ j = 1, . . . , k − 1, i.e., p_1, . . . , p_{k−1} are eigenvectors of H^{-1} B_k with unit eigenvalues

Hence, (H^{-1} B_{n+1}) p_k = p_k, ∀ k = 1, . . . , n
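As a quick numerical sanity check of the derivation above, here is a minimal numpy sketch (problem data random and purely illustrative): on a quadratic, gradient differences satisfy q_k = H p_k, so the unique B matching n secant equations on independent steps recovers H.

```python
import numpy as np

# On a quadratic f(x) = c^T x + 0.5 x^T H x, gradient differences satisfy
# q_k = H p_k; hence any B with B p_k = q_k for n independent steps equals H.
rng = np.random.default_rng(7)
n = 5
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)          # symmetric positive definite Hessian
c = rng.standard_normal(n)
grad = lambda x: c + H @ x

X = [rng.standard_normal(n) for _ in range(n + 1)]              # n+1 iterates
P = np.column_stack([X[k+1] - X[k] for k in range(n)])          # steps p_k
Qd = np.column_stack([grad(X[k+1]) - grad(X[k]) for k in range(n)])  # q_k

assert np.allclose(Qd, H @ P)        # q_k = H p_k for every k
B = Qd @ np.linalg.inv(P)            # the unique B with B p_k = q_k, all k
assert np.allclose(B, H)             # ... and it recovers the true Hessian
```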
[BSS, Ch. 8.8]
Quasi-Newton Theory
Suppose that p_1, . . . , p_n are linearly independent

Denote P = [p_1 p_2 · · · p_n] ∈ R^{n×n}. Then we have

    (H^{-1} B_{n+1}) P = P

which implies:

    H^{-1} B_{n+1} = I, i.e., B_{n+1} = H

Thus, the goal of quasi-Newton methods is to find a sequence {B_k} of approximate Hessians satisfying, for all k,

    B_k p_j = q_j, ∀ j = 1, . . . , k − 1,

which is termed the quasi-Newton equation or secant equation

Once B_k is determined, find d_k that satisfies B_k d_k = −∇f(x_k). It can be shown that the generated d_1, . . . , d_n are H-conjugate
Handwritten note: the method exactly recovers H after n steps.
Quasi-Newton Theory
From the secant equations, designing quasi-Newton methods boils down to:

- Given some B_k ≻ 0 such that B_k p_j = q_j, ∀ j = 1, . . . , k − 1
- Want to find a B_{k+1} ≻ 0 such that B_{k+1} p_j = q_j, ∀ j = 1, . . . , k

Key idea: try B_{k+1} = B_k + C_k for some correction matrix C_k

- This implies B_k p_j + C_k p_j = q_j, ∀ j = 1, . . . , k, i.e.,

    C_k p_j = 0, for j = 1, . . . , k − 1
    C_k p_k = q_k − B_k p_k

These two equations give rise to a variety of quasi-Newton methods:

- Broyden family (Broyden-Fletcher-Goldfarb-Shanno (BFGS) update)
- Davidon-Fletcher-Powell method (dual construction of the Broyden family)
- See [BSS, Ch. 8.8] for an excellent treatment of quasi-Newton theory
Broyden-Fletcher-Goldfarb-Shanno (BFGS) Update
Try the following correction matrix C_k^BFGS:

    C_k^BFGS = (q_k q_k^T)/(q_k^T p_k) − (B_k p_k p_k^T B_k)/(p_k^T B_k p_k)

Obtained independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970, hence the name BFGS

Highly successful due to its efficiency & robustness, implemented in many numerical optimizers (e.g., MATLAB, R, GNU C regression libraries, ...)
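A minimal numpy sketch of the update (my own illustrative data, on a random convex quadratic so the curvature condition q_k^T p_k > 0 holds automatically): each correction enforces the secant equation B_{k+1} p_k = q_k while preserving symmetry and positive definiteness.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)                  # true Hessian of the quadratic
c = rng.standard_normal(n)
grad = lambda x: c + H @ x

def bfgs_update(B, p, q):
    """B_{k+1} = B_k + q q^T/(q^T p) - (B p)(B p)^T/(p^T B p)."""
    Bp = B @ p
    return B + np.outer(q, q) / (q @ p) - np.outer(Bp, Bp) / (p @ Bp)

B = np.eye(n)                                # initial Hessian estimate
x = rng.standard_normal(n)
for _ in range(n):
    p = rng.standard_normal(n)               # arbitrary step, for illustration
    q = grad(x + p) - grad(x)                # here q = H p, so q^T p > 0
    B = bfgs_update(B, p, q)
    x = x + p

assert np.allclose(B @ p, q)                 # secant equation holds
assert np.allclose(B, B.T)                   # symmetry preserved
assert np.all(np.linalg.eigvalsh(B) > 0)     # positive definiteness preserved
```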
Implementing BFGS in Practice
Having found B_{k+1}, find d_{k+1} by solving B_{k+1} d_{k+1} = −∇f(x_{k+1}), i.e.,

    d_{k+1} = −B_{k+1}^{-1} ∇f(x_{k+1})

Often more convenient to update the inverse sequence {D_k} ≜ {B_k^{-1}} directly:

- Let D_1 = B_1^{-1} = I.
- In iteration k, given D_k, compute D_{k+1} as follows:

    D_{k+1} = [B_{k+1}]^{-1} = [B_k + C_k^BFGS]^{-1} = [B_k + a_1 b_1^T + a_2 b_2^T]^{-1},   (1)

  where a_1 = q_k/(q_k^T p_k), b_1 = q_k, a_2 = −(B_k p_k)/(p_k^T B_k p_k), and b_2 = B_k p_k
- Eq. (1) shows that B_{k+1} can be obtained from B_k with a rank-two update
Implementing BFGS in Practice
Therefore, D_{k+1} can be computed using two sequential applications of the Sherman-Morrison-Woodbury (SMW) matrix inverse formula:

    [A + a b^T]^{-1} = A^{-1} − (A^{-1} a b^T A^{-1})/(1 + b^T A^{-1} a)

Note: in general, the SMW inverse formula is advantageous to use when A^{-1} is known or cheap to compute (e.g., diagonal, sparse, structured, etc.)

As a result, we obtain the following BFGS update for the sequence {D_k}:

    D_{k+1} = D_k + (1 + (q_k^T D_k q_k)/(p_k^T q_k)) (p_k p_k^T)/(p_k^T q_k) − (D_k q_k p_k^T + p_k q_k^T D_k)/(p_k^T q_k) ≜ D_k + C̄_k^BFGS

Can prove superlinear local convergence for BFGS (and other quasi-Newton methods): ‖x_{k+1} − x*‖/‖x_k − x*‖ → 0. Not as fast as Newton, but fast!
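The rank-two structure can be verified numerically; a small numpy sketch (illustrative data) applies Sherman-Morrison twice, once per rank-one term, and checks the result against both a direct inverse and the closed-form D-update.

```python
import numpy as np

def smw_rank1_inverse(Ainv, a, b):
    """Sherman-Morrison: inverse of A + a b^T, given A^{-1}."""
    Aa, bA = Ainv @ a, b @ Ainv
    return Ainv - np.outer(Aa, bA) / (1.0 + b @ Aa)

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)                 # current B_k (symmetric PD)
p = rng.standard_normal(n)
q = B @ p + 0.1 * p                         # ensures curvature p^T q > 0

Bp = B @ p
B_new = B + np.outer(q, q)/(q @ p) - np.outer(Bp, Bp)/(p @ Bp)

D = np.linalg.inv(B)
D1 = smw_rank1_inverse(D, q/(q @ p), q)           # invert B + a_1 b_1^T ...
D_new = smw_rank1_inverse(D1, -Bp/(p @ Bp), Bp)   # ... then add a_2 b_2^T
assert np.allclose(D_new, np.linalg.inv(B_new))

# The closed-form inverse update gives the same matrix:
pq = p @ q
D_cf = (D + (1 + (q @ D @ q)/pq) * np.outer(p, p)/pq
          - (np.outer(D @ q, p) + np.outer(p, D @ q))/pq)
assert np.allclose(D_cf, D_new)
```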
Handwritten note, generalized SMW (Woodbury identity): (A + UΣV)^{-1} = A^{-1} − A^{-1} U (Σ^{-1} + V A^{-1} U)^{-1} V A^{-1}, where A ∈ R^{n×n}, U ∈ R^{n×m}, V ∈ R^{m×n}, Σ ∈ R^{m×m}
L-BFGS
In BFGS (and other quasi-Newton methods), we need n × n storage space to maintain the approximate Hessian B_k (or approximate inverse D_k)

Still expensive when n is large. Enter the limited-memory BFGS (L-BFGS)!

L-BFGS doesn't store B_k or D_k. Rather, it only keeps track of the pairs p_k and q_k from the last few iterations (say 5 to 10), and reconstructs the matrices as needed

- Take an initial B_0 or D_0 and assume m steps have been taken since
- Compute B_k p_k via a series of inner and outer products with the p_{k−j} and q_{k−j} from the last m iterations, j = 1, . . . , m − 1

Attractive for problems where n is large (typical in machine learning). Requires 2mn storage and O(mn) linear-algebra operations, plus the cost of function and gradient evaluations and line search

No superlinear convergence proof, but good behavior has been observed in many applications (see [Liu & Nocedal, '89], [Nocedal & Wright, Ch. 7.2])
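A sketch of the standard two-loop recursion (see [Nocedal & Wright, Ch. 7.2]); the problem data and memory size below are my own illustrative choices. It computes d = −D_k ∇f(x_k) from the stored pairs alone, never forming an n × n matrix.

```python
import numpy as np
from collections import deque

def lbfgs_direction(g, memory):
    """Two-loop recursion: return -D_k g using only stored (p_j, q_j) pairs."""
    g = g.copy()
    alphas = []
    for p, q in reversed(memory):            # newest pair first
        a = (p @ g) / (q @ p)
        alphas.append(a)
        g -= a * q
    if memory:                               # initial scaling D_0 = (p^Tq/q^Tq) I
        p, q = memory[-1]
        g *= (p @ q) / (q @ q)
    for (p, q), a in zip(memory, reversed(alphas)):   # oldest pair first
        b = (q @ g) / (q @ p)
        g += (a - b) * p
    return -g

# Toy usage: a convex quadratic, exact line search, memory of m = 5 pairs.
rng = np.random.default_rng(2)
n = 20
M = rng.standard_normal((n, n))
H = M @ M.T / n + np.eye(n)
c = rng.standard_normal(n)
grad = lambda x: c + H @ x

x = np.zeros(n)
memory = deque(maxlen=5)                     # only 2*m*n numbers stored
for _ in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break
    d = lbfgs_direction(g, memory)
    t = -(g @ d) / (d @ H @ d)               # exact line search on a quadratic
    x_new = x + t * d
    memory.append((x_new - x, grad(x_new) - g))
    x = x_new

x_star = np.linalg.solve(H, -c)
assert np.linalg.norm(x - x_star) < 1e-6 * (1 + np.linalg.norm(x_star))
```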
Interior-Point Methods
Consider the following constrained minimization problem:

    Minimize f(x)
    subject to g_i(x) ≤ 0, i = 1, . . . , m
               Ax = b

where:

- f and g_i are convex and twice continuously differentiable
- A ∈ R^{p×n} with rank(A) = p
- Assume that the optimal value p* is finite and attainable
- Assume that the problem is strictly feasible (Slater's condition), i.e., ∃ x̃ with

    x̃ ∈ dom f, g_i(x̃) < 0, i = 1, . . . , m, A x̃ = b,

  hence strong duality holds and the dual optimum is attainable
Logarithmic Barrier Function
Reformulate the problem via an indicator function:

    Minimize f(x) + Σ_{i=1}^m I_−(g_i(x))
    subject to Ax = b,

where I_−(u) = 0 if u ≤ 0 and I_−(u) = ∞ otherwise (the indicator function of R_−)

Consider the approximation through the logarithmic barrier:

    Minimize f(x) − (1/µ) Σ_{i=1}^m log(−g_i(x))
    subject to Ax = b

where µ > 0 is a parameter
The Log Barrier Approximate Problem
An equality-constrained problem

For µ > 0, −(1/µ) log(−u) is a smooth approximation of the indicator I_−(·)

The approximation gets better as µ → ∞
Properties of Log Barrier Function
φ(x) = −Σ_{i=1}^m log(−g_i(x)), dom φ = {x | g_i(x) < 0, i = 1, . . . , m}

Convex (following the composition rules of convexity)

Twice continuously differentiable, with derivatives:

    ∇φ(x) = −Σ_{i=1}^m (1/g_i(x)) ∇g_i(x)

    ∇²φ(x) = Σ_{i=1}^m (1/g_i(x)²) ∇g_i(x) ∇g_i(x)^T − Σ_{i=1}^m (1/g_i(x)) ∇²g_i(x)
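For linear constraints g_i(x) = a_i^T x − b_i (so ∇g_i = a_i and ∇²g_i = 0), the formulas specialize to ∇φ = A^T d and ∇²φ = A^T Diag(d)² A with d_i = 1/(b_i − a_i^T x). A small numpy sketch (made-up data) checks the gradient against finite differences:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = np.abs(rng.standard_normal(m)) + 1.0   # x = 0 is strictly feasible

def phi(x):
    return -np.sum(np.log(b - A @ x))

def grad_phi(x):
    d = 1.0 / (b - A @ x)                  # equals -1/g_i(x)
    return A.T @ d

def hess_phi(x):
    d = 1.0 / (b - A @ x)
    return A.T @ np.diag(d**2) @ A         # second sum vanishes: Hess g_i = 0

x = np.zeros(n)
eps = 1e-6
fd = np.array([(phi(x + eps*e) - phi(x - eps*e)) / (2*eps) for e in np.eye(n)])
assert np.allclose(fd, grad_phi(x), atol=1e-5)            # gradient formula
assert np.all(np.linalg.eigvalsh(hess_phi(x)) >= -1e-10)  # Hessian is PSD
```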
Central Path
For µ > 0, define x*(µ) as the solution of

    Minimize µ f(x) + φ(x)
    subject to Ax = b

Assume that x*(µ) exists and is unique for all µ > 0

The central path is defined as {x*(µ) | µ > 0}

Example: central path for an LP
[Figure: central path for the LP minimize c^T x subject to a_i^T x ≤ b_i, i = 1, . . . , m; the hyperplane c^T x = c^T x*(µ) is tangential to the level curve of φ through x*(µ).]
Dual Points on Central Path
For x = x*(µ), there exists a w such that

    µ ∇f(x) − Σ_{i=1}^m (1/g_i(x)) ∇g_i(x) + A^T w = 0, Ax = b

Then x*(µ) minimizes the Lagrangian

    L(x, u*(µ), v*(µ)) = f(x) + Σ_{i=1}^m u_i*(µ) g_i(x) + v*(µ)^T (Ax − b),

where u_i*(µ) ≜ 1/(−µ g_i(x*(µ))) and v*(µ) ≜ w/µ

This confirms the intuitive idea that f(x*(µ)) → p* as µ → ∞, since

    p* ≥ Θ(u*(µ), v*(µ)) = L(x*(µ), u*(µ), v*(µ)) = f(x*(µ)) − m/µ,

where Θ is the Lagrangian dual function, which implies f(x*(µ)) − p* ≤ m/µ ↓ 0 as µ → ∞
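A one-dimensional toy example (mine, purely illustrative) where everything is analytic: minimize f(x) = x subject to g(x) = −x ≤ 0, so p* = 0 and m = 1. The centering problem min_x µx − log x gives x*(µ) = 1/µ, and the gap f(x*(µ)) − p* equals m/µ exactly:

```python
import numpy as np

gaps = []
for mu in [1.0, 10.0, 100.0, 1000.0]:
    x_mu = 1.0 / mu                 # x*(mu): stationarity mu - 1/x = 0
    u_mu = 1.0 / (-mu * (-x_mu))    # dual point u*(mu) = 1/(-mu g(x*(mu)))
    assert np.isclose(u_mu, 1.0)    # a feasible dual point for every mu
    gaps.append(x_mu - 0.0)         # f(x*(mu)) - p*
assert np.allclose(gaps, [1.0, 0.1, 0.01, 0.001])   # equals m/mu = 1/mu
```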
Interpretation as Perturbed KKT System
The primal-dual solutions x = x*(µ), u = u*(µ), and v = v*(µ) satisfy:

    (ST): ∇f(x) + Σ_{i=1}^m u_i ∇g_i(x) + A^T v = 0
    (1/µ-CS): u_i g_i(x) = −1/µ, i = 1, . . . , m
    (PF): g_i(x) ≤ 0, i = 1, . . . , m, Ax = b
    (DF): u ≥ 0, v unconstrained

That is, the only difference from the KKT conditions is that (1/µ-CS) replaces (CS): u_i g_i(x) = 0
Force Field Interpretation
Consider the following "centering" problem (without equality constraints):

    Minimize µ f(x) − Σ_{i=1}^m log(−g_i(x))

It admits the following force-field interpretation:

- µ f(x) is the potential of the force field F_0(x) = −µ ∇f(x)
- −log(−g_i(x)) is the potential of the force field F_i(x) = (1/g_i(x)) ∇g_i(x)

The forces balance at x*(µ):

    F_0(x*(µ)) + Σ_{i=1}^m F_i(x*(µ)) = 0
Force Field Interpretation
Example: Minimize c^T x
         subject to a_i^T x ≤ b_i, i = 1, . . . , m

The objective force field is constant: F_0(x) = −µc

Each constraint force field decays as the inverse distance to its constraint hyperplane:

    F_i(x) = −a_i/(b_i − a_i^T x),   ‖F_i(x)‖_2 = 1/dist(x, H_i)

where H_i = {x | a_i^T x = b_i}
The Barrier Method
1 Initialization: a strictly feasible x (interior point), µ = µ_0 > 0, σ > 1, tolerance ε > 0.

2 Centering step: compute x*(µ) by minimizing µf + φ subject to Ax = b. Update x = x*(µ).

3 Stop if m/µ < ε. Otherwise, let µ = σµ and go to Step 2.

Remarks:

- Terminates with f(x) − p* ≤ ε (following from f(x*(µ)) − p* ≤ m/µ)
- Centering is usually done using Newton's method, starting at the current x
- The choice of σ involves a trade-off: a large σ means fewer outer iterations but more inner (Newton) iterations; typical values: σ ∈ [10, 20]
- As µ gets larger (nearer the optimal solution), it gets harder and harder for Newton's method to converge (due to ill-conditioning for large µ)
Handwritten note: it is not necessary to solve for x*(µ) accurately.
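A self-contained sketch of the barrier method (all data and parameter choices are my own, illustrative) on the box LP min c^T x s.t. 0 ≤ x ≤ 1, whose optimum is x_i = 1 if c_i < 0 and x_i = 0 otherwise; the barrier has m = 2n terms and a diagonal Hessian, so the centering step is a simple damped Newton iteration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
c = rng.standard_normal(n)
m = 2 * n                                  # number of inequality constraints

def F(x, mu):
    """Centering objective mu*c^T x + phi(x) for the box 0 < x < 1."""
    return mu * (c @ x) - np.sum(np.log(x)) - np.sum(np.log(1.0 - x))

def center(x, mu, iters=100):
    """Damped Newton on F(., mu); the barrier Hessian is diagonal here."""
    for _ in range(iters):
        g = mu * c - 1.0/x + 1.0/(1.0 - x)
        if np.linalg.norm(g) < 1e-9 * mu:
            break
        h = 1.0/x**2 + 1.0/(1.0 - x)**2
        dx = -g / h                        # Newton step
        t = 1.0                            # backtrack: stay in (0,1), decrease F
        while (np.any(x + t*dx <= 0) or np.any(x + t*dx >= 1)
               or F(x + t*dx, mu) > F(x, mu) + 0.25 * t * (g @ dx)):
            t *= 0.5
        x = x + t * dx
    return x

x = 0.5 * np.ones(n)                       # strictly feasible starting point
mu, sigma, eps = 1.0, 10.0, 1e-6
while True:
    x = center(x, mu)                      # centering step
    if m / mu < eps:                       # stop: f(x*(mu)) - p* <= m/mu < eps
        break
    mu *= sigma                            # increase barrier parameter

p_star = np.sum(np.minimum(c, 0.0))        # exact LP optimal value
assert c @ x - p_star <= 1e-5
assert np.all((x > 0) & (x < 1))           # iterates stay strictly feasible
```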
Convergence Analysis
Number of outer (centering) iterations:

    ⌈ log(m/(ε µ_0)) / log σ ⌉

plus the initial centering step (to compute x*(µ_0))
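A tiny check of this count (parameter values mine, for illustration) against a direct simulation of the µ-updates:

```python
import math

m, mu0, sigma, eps = 50, 1.0, 10.0, 1e-6
predicted = math.ceil(math.log(m / (eps * mu0)) / math.log(sigma))

mu, outer = mu0, 0
while m / mu >= eps:          # same stopping rule as the barrier method
    mu *= sigma
    outer += 1
assert outer == predicted == 8
```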
Convergence of the centering problem

    Minimize µ f(x) + φ(x)

follows the convergence analysis of Newton's method:

- µf + φ must have closed sublevel sets for µ > µ_0
- Classical analysis requires strong convexity and a Lipschitz condition
- Analysis via self-concordance requires self-concordance of µf + φ
Feasibility and Phase I Methods
Feasibility problem: find x such that

    g_i(x) ≤ 0, i = 1, . . . , m, Ax = b    (2)

Phase I: compute a strictly feasible starting point for the barrier method:

    Minimize_{x,s} s
    subject to g_i(x) ≤ s, i = 1, . . . , m    (3)
               Ax = b

- If (x, s) is feasible with s < 0, then x is strictly feasible for (2)
- If the optimal value p̄* of (3) is positive, then (2) is infeasible
- If p̄* = 0 and attained, then problem (2) is feasible (but not strictly)
- If p̄* = 0 and not attained, then problem (2) is infeasible
Primal-Dual Interior-Point Methods
Primal-dual interior-point methods are another class of interior-point methods, powerful for linear and convex quadratic programming

Consider the following linearly constrained quadratic programming problem:

    Minimize c^T x + (1/2) x^T Q x
    subject to Ax = b, x ≥ 0

where Q is symmetric PSD (LP is a special case with Q = 0)

The KKT conditions are that there exist u and v such that:

    Qx + c − A^T u − v = 0, Ax = b, (x, v) ≥ 0, x_i v_i = 0, i = 1, . . . , n

Defining

    X ≜ Diag(x_1, . . . , x_n), V ≜ Diag(v_1, . . . , v_n),

we can rewrite the last condition as XVe = 0, where e = [1, 1, . . . , 1]^T
Primal-Dual Interior-Point Methods
Thus, the KKT conditions can be rewritten as a square system of constrained nonlinear equations:

    [ Qx + c − A^T u − v ]
    [ Ax − b             ] = 0,   (x, v) ≥ 0
    [ XVe                ]

Primal-dual interior-point methods generate iterates (x_k, u_k, v_k) with:

- (x_k, v_k) > 0 (i.e., interior)
- Each step (Δx_k, Δu_k, Δv_k) is a Newton step on a perturbed version of the equations (the perturbation eventually goes to zero)
- A step size α_k is used to maintain (x_{k+1}, v_{k+1}) > 0. Set

    (x_{k+1}, u_{k+1}, v_{k+1}) = (x_k, u_k, v_k) + α_k (Δx_k, Δu_k, Δv_k)
Primal-Dual Interior-Point Methods
The perturbed Newton step is a linear system:

    [ Q    −A^T   −I  ] [ Δx_k ]   [ r_k^(x) ]
    [ A     0      0  ] [ Δu_k ] = [ r_k^(u) ]
    [ V_k   0     X_k ] [ Δv_k ]   [ r_k^(v) ]

where

    r_k^(x) = −(Q x_k + c − A^T u_k − v_k)
    r_k^(u) = −(A x_k − b)
    r_k^(v) = −X_k V_k e + σ_k µ_k e

Here, r_k^(x), r_k^(u), r_k^(v) are the current residuals, µ_k = (x_k^T v_k)/n is the current duality measure, and σ_k ∈ (0, 1] is a centering parameter

There is a lot of structure in this system that can be exploited for algorithm design. More efficient than the barrier method if high accuracy is needed

See [Wright, '97] for a description of primal-dual interior-point methods
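A self-contained numpy sketch of one such iteration (an infeasible-start path-following scheme; the instance, step rule, and the choice σ = 0.2 are my own illustrative assumptions, not a production algorithm):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 4, 1
Q = np.eye(n)                                 # PD quadratic term
c = rng.standard_normal(n)
A = np.ones((p, n)); b = np.array([1.0])      # equality: sum(x) = 1

x, u, v = np.ones(n)/n, np.zeros(p), np.ones(n)   # (x, v) > 0 interior start
for _ in range(100):
    r_x = Q @ x + c - A.T @ u - v             # stationarity residual
    r_u = A @ x - b                           # primal feasibility residual
    mu = (x @ v) / n                          # duality measure
    if max(np.linalg.norm(r_x), np.linalg.norm(r_u), mu) < 1e-10:
        break
    sigma = 0.2                               # centering parameter in (0, 1]
    K = np.block([
        [Q,          -A.T,              -np.eye(n)       ],
        [A,           np.zeros((p, p)),  np.zeros((p, n))],
        [np.diag(v),  np.zeros((n, p)),  np.diag(x)      ],
    ])
    rhs = np.concatenate([-r_x, -r_u, -x*v + sigma*mu])
    d = np.linalg.solve(K, rhs)               # perturbed Newton step
    dx, du, dv = d[:n], d[n:n+p], d[n+p:]
    t = 1.0                                   # keep (x, v) strictly positive
    while np.any(x + t*dx <= 0) or np.any(v + t*dv <= 0):
        t *= 0.5
    t *= 0.99
    x, u, v = x + t*dx, u + t*du, v + t*dv

# The KKT conditions hold approximately at the final iterate:
assert np.linalg.norm(Q @ x + c - A.T @ u - v) < 1e-6
assert np.linalg.norm(A @ x - b) < 1e-8
assert np.max(x * v) < 1e-6
assert np.all(x > 0) and np.all(v > 0)
```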
Interior-Point Methods for Learning Problems
Interior-point methods were used early on for compressed sensing, regularized least squares, and SVMs:

- SVM with hinge loss formulated as a QP and solved with a primal-dual interior-point method (e.g., [Gertz & Wright, '03], [Fine & Scheinberg, '01], [Ferris & Munson, '02])
- Compressed sensing & LASSO variable selection formulated as a bound-constrained QP and solved by primal-dual methods, or as an SOCP solved by a barrier method (e.g., [Candès & Romberg, '05])

However, they were mostly superseded by first-order methods due to the increasingly large size of machine learning problems:

- Stochastic gradient descent (low accuracy, simple data access)
- Gradient projection with sparsity regularization, and prox-gradient methods in compressed sensing (require only matrix-vector multiplications)

Perhaps we are just a few clever ideas away from reviving interior-point methods?
Next Class
Sparse/Regularized Optimization
Handwritten check (BFGS): verify that C_k^BFGS p_k = q_k − B_k p_k and that C_k^BFGS p_j = 0 for j = 1, . . . , k − 1 (a rank-two update). Also noted: the BFGS update can be derived by solving a "minimal change" optimization problem, choosing B_{k+1} to minimize the change from B_k subject to the secant equation B_{k+1} p_k = q_k and symmetry.
Handwritten notes (history): The barrier / path-following method goes back to [Fiacco & McCormick, '69], the sequential unconstrained minimization technique (SUMT). Karmarkar ('84, at Bell Labs): "interior-point method for LP". Khachiyan ('79): "ellipsoid method". Nesterov & Nemirovski: a special class of barriers (self-concordant) can encode any convex set, so the number of iterations is bounded by a polynomial in both the dimension of the problem and the accuracy.