Conjugate Gradients I: Setup
CS 205A: Mathematical Methods for Robotics, Vision, and Graphics
Doug James (and Justin Solomon)
CS 205A: Mathematical Methods Conjugate Gradients I: Setup 1 / 1
Time for Gaussian Elimination

$A \in \mathbb{R}^{n \times n} \implies O(n^3)$
Common Case

"Easy to apply, hard to invert."

- Sparse matrices
- Special structure
New Philosophy

Iteratively improve an approximation rather than
solving in closed form.
For Today

$A\vec{x} = \vec{b}$

- Square
- Symmetric
- Positive definite
Variational Viewpoint

$A\vec{x} = \vec{b}$
$\Updownarrow$
$\min_{\vec{x}} \left[ \frac{1}{2}\vec{x}^\top A\vec{x} - \vec{b}^\top\vec{x} + c \right]$
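This equivalence can be checked numerically: for symmetric positive definite $A$, the gradient of $f(\vec{x}) = \frac{1}{2}\vec{x}^\top A\vec{x} - \vec{b}^\top\vec{x} + c$ is $A\vec{x} - \vec{b}$, so the unique minimizer is exactly the solution of $A\vec{x} = \vec{b}$. A minimal sketch (the matrix and vector below are arbitrary illustrative choices):

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])

def f(x, c=0.0):
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    return A @ x - b                      # gradient of the quadratic

x_star = np.linalg.solve(A, b)            # closed-form solution of Ax = b
assert np.allclose(grad_f(x_star), 0.0)   # gradient vanishes at the solution

# f is strictly larger at any perturbed point
rng = np.random.default_rng(0)
for _ in range(5):
    assert f(x_star) < f(x_star + rng.normal(size=2))
```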
Gradient Descent Strategy

1. Compute the search direction
   $\vec{d}_k \equiv -\nabla f(\vec{x}_{k-1}) = \vec{b} - A\vec{x}_{k-1}$
2. Do a line search to find
   $\vec{x}_k \equiv \vec{x}_{k-1} + \alpha_k \vec{d}_k.$
Line Search Along $\vec{d}$ from $\vec{x}$

$\min_\alpha g(\alpha) \equiv f(\vec{x} + \alpha\vec{d})$

$\alpha = \dfrac{\vec{d}^\top(\vec{b} - A\vec{x})}{\vec{d}^\top A\vec{d}}$
Gradient Descent with Closed-Form Line Search

$\vec{d}_k = \vec{b} - A\vec{x}_{k-1}$
$\alpha_k = \dfrac{\vec{d}_k^\top \vec{d}_k}{\vec{d}_k^\top A\vec{d}_k}$
$\vec{x}_k = \vec{x}_{k-1} + \alpha_k\vec{d}_k$
Convergence

See book.

$\dfrac{f(\vec{x}_k) - f(\vec{x}^*)}{f(\vec{x}_{k-1}) - f(\vec{x}^*)} \le 1 - \dfrac{1}{\operatorname{cond} A}$

Conclusions:
- Conditioning affects speed and quality
- Unconditional convergence ($\operatorname{cond} A \ge 1$)
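The per-step bound can be observed numerically. A sketch on a randomly generated SPD system (all constants below are illustrative; the small slack term guards against floating-point roundoff):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)          # SPD by construction
b = rng.normal(size=5)
x_star = np.linalg.solve(A, b)

f = lambda x: 0.5 * x @ A @ x - b @ x
bound = 1.0 - 1.0 / np.linalg.cond(A)

x = np.zeros(5)
for _ in range(20):
    d = b - A @ x
    f_prev = f(x)
    x = x + (d @ d) / (d @ (A @ d)) * d
    # each step shrinks the objective gap at least by the factor 1 - 1/cond(A)
    assert f(x) - f(x_star) <= bound * (f_prev - f(x_star)) + 1e-9
```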
Can We Do Better?

- Can iterate forever: should stop after $O(n)$ iterations!
- Lots of repeated work when poorly conditioned
Observation

$f(\vec{x}) = \frac{1}{2}(\vec{x} - \vec{x}^*)^\top A(\vec{x} - \vec{x}^*) + \text{const.}$

$A = LL^\top \implies f(\vec{x}) = \frac{1}{2}\|L^\top(\vec{x} - \vec{x}^*)\|_2^2 + \text{const.}$
Substitution

$\vec{y} \equiv L^\top\vec{x}, \quad \vec{y}^* \equiv L^\top\vec{x}^*$
$\implies \bar{f}(\vec{y}) = \|\vec{y} - \vec{y}^*\|_2^2$

Proposition
Suppose $\{\vec{w}_1, \ldots, \vec{w}_n\}$ are orthogonal in $\mathbb{R}^n$.
Then, $\bar{f}$ is minimized in at most $n$ steps by line
searching in direction $\vec{w}_1$, then direction $\vec{w}_2$, and
so on.
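A numerical sketch of the proposition: exact line search along $n$ orthonormal directions minimizes $\bar{f}(\vec{y}) = \|\vec{y} - \vec{y}^*\|_2^2$ in at most $n$ steps. (The target and directions below are random illustrative choices; the directions come from a QR factorization.)

```python
import numpy as np

rng = np.random.default_rng(2)
y_star = rng.normal(size=4)                    # unknown minimizer of fbar
W, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # columns: orthonormal directions

y = np.zeros(4)
for i in range(4):
    w = W[:, i]
    alpha = w @ (y_star - y)   # minimizes ||y + alpha*w - y_star||^2 over alpha
    y = y + alpha * w          # later steps never disturb earlier components

assert np.allclose(y, y_star)  # exact minimizer after n = 4 line searches
```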
Visualization

In 2D, any two orthogonal directions suffice!
Undoing Change of Coordinates

Line search on $\bar{f}$ along $\vec{w}$ is the same as
line search on $f$ along $(L^\top)^{-1}\vec{w}$.

Take $\vec{w}_i = L^\top\vec{v}_i$:
$0 = \vec{w}_i \cdot \vec{w}_j = (L^\top\vec{v}_i)^\top(L^\top\vec{v}_j) = \vec{v}_i^\top(LL^\top)\vec{v}_j = \vec{v}_i^\top A\vec{v}_j$
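The chain of equalities above can be verified directly: with the Cholesky factorization $A = LL^\top$, the ordinary dot product after mapping by $L^\top$ equals the $A$-inner product. A sketch on a random SPD matrix (all inputs below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)      # SPD, so a Cholesky factorization exists
L = np.linalg.cholesky(A)        # lower-triangular L with A = L @ L.T

v, w = rng.normal(size=4), rng.normal(size=4)
lhs = (L.T @ v) @ (L.T @ w)      # ordinary dot product after the map L^T
rhs = v @ A @ w                  # A-inner product
assert np.isclose(lhs, rhs)      # the two inner products agree
```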
Conjugate Directions

A-Conjugate Vectors
Two vectors $\vec{v}$ and $\vec{w}$ are A-conjugate if $\vec{v}^\top A\vec{w} = 0$.

Corollary
Suppose $\{\vec{v}_1, \ldots, \vec{v}_n\}$ are A-conjugate. Then, $f$
is minimized in at most $n$ steps by line searching
in direction $\vec{v}_1$, then direction $\vec{v}_2$, and so on.
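A sketch of the corollary, using one convenient source of A-conjugate directions: eigenvectors of symmetric $A$ are A-conjugate, since $\vec{v}_i^\top A\vec{v}_j = \lambda_j\,\vec{v}_i\cdot\vec{v}_j = 0$ for $i \ne j$. Line searching along them solves the system in exactly $n$ steps (the random SPD system below is illustrative; this is not how conjugate gradients finds its directions):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)      # SPD
b = rng.normal(size=5)
_, V = np.linalg.eigh(A)         # columns of V: A-conjugate directions

x = np.zeros(5)
for i in range(5):
    v = V[:, i]
    # check A-conjugacy against all previous directions
    assert all(abs(v @ A @ V[:, j]) < 1e-8 for j in range(i))
    alpha = (v @ (b - A @ x)) / (v @ (A @ v))   # closed-form line search
    x = x + alpha * v

assert np.allclose(A @ x, b)     # exact solution after n = 5 line searches
```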
High-Level Ideas So Far

- Steepest descent may not be fastest descent
  (surprising!)
- Two inner products:
  $\vec{v} \cdot \vec{w}$ and $\langle\vec{v}, \vec{w}\rangle_A \equiv \vec{v}^\top A\vec{w}$
New Problem

Find $n$ A-conjugate directions.
Gram-Schmidt?

- Potentially unstable
- Storage increases with each iteration
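For concreteness, here is a sketch of Gram-Schmidt carried out in the $\langle\cdot,\cdot\rangle_A$ inner product (not the CG update). Note that conjugating each new direction requires storing and touching every previous direction, which is exactly the storage/work objection above. All inputs are illustrative random choices:

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)                      # SPD

dirs = []                                        # all past directions kept!
for k in range(4):
    v = rng.normal(size=4)                       # candidate direction
    for u in dirs:                               # subtract A-projection onto
        v = v - (u @ A @ v) / (u @ A @ u) * u    # every previous direction
    dirs.append(v)

for i in range(4):
    for j in range(i):
        assert abs(dirs[i] @ A @ dirs[j]) < 1e-8  # pairwise A-conjugate
```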
Another Clue

$\vec{r}_k \equiv \vec{b} - A\vec{x}_k$
$\vec{r}_{k+1} = \vec{r}_k - \alpha_{k+1} A\vec{v}_{k+1}$

Proposition
When performing gradient descent on $f$,
$\operatorname{span}\{\vec{r}_0, \ldots, \vec{r}_k\} = \operatorname{span}\{\vec{r}_0, A\vec{r}_0, \ldots, A^k\vec{r}_0\}$.

Krylov space?!
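A numerical sketch of the proposition: the residuals produced by gradient descent span the same subspace as the Krylov vectors $\vec{r}_0, A\vec{r}_0, \ldots, A^k\vec{r}_0$. Equality of spans is checked with a rank test (the random SPD system below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)
b = rng.normal(size=5)

x = np.zeros(5)
residuals = []
for _ in range(3):
    r = b - A @ x
    residuals.append(r)
    x = x + (r @ r) / (r @ (A @ r)) * r          # gradient-descent step

R = np.column_stack(residuals)                   # [r_0 | r_1 | r_2]
r0 = residuals[0]
K = np.column_stack([r0, A @ r0, A @ A @ r0])    # [r_0 | A r_0 | A^2 r_0]

# equal spans: stacking either basis onto the other adds no rank
rank = np.linalg.matrix_rank
assert rank(R) == rank(K) == rank(np.hstack([R, K])) == 3
```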
Gradient Descent: Issue

$\vec{x}_k - \vec{x}_0 \ne \operatorname*{arg\,min}_{\vec{v} \in \operatorname{span}\{\vec{r}_0, A\vec{r}_0, \ldots, A^{k-1}\vec{r}_0\}} f(\vec{x}_0 + \vec{v})$

But if this did hold... convergence in $n$ steps!