Date posted: 11-Jun-2020

### Transcript of Conjugate Gradients I: Setup

CS 205A: Mathematical Methods for Robotics, Vision, and Graphics

Doug James (and Justin Solomon)

CS 205A: Mathematical Methods Conjugate Gradients I: Setup 1 / 1

• Time for Gaussian Elimination

A ∈ R^{n×n} =⇒ O(n^3)

• Common Case

“Easy to apply, hard to invert.”

- Sparse matrices
- Special structure

• New Philosophy

Iteratively improve an approximation rather than solving in closed form.

• For Today

A~x = ~b

- Square
- Symmetric
- Positive definite

• Variational Viewpoint

A~x = ~b
⇕
min_{~x} [ (1/2) ~x^T A ~x − ~b^T ~x + c ]
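The equivalence is easy to check numerically: the gradient of f(x) = (1/2) xᵀAx − bᵀx + c is Ax − b, so the unique minimizer of f (for SPD A) solves Ax = b. A minimal sketch with numpy, using an arbitrary SPD test matrix (not from the slides):

```python
import numpy as np

# Variational viewpoint: grad f(x) = A x - b, so the minimizer of
# f(x) = 1/2 x^T A x - b^T x + c satisfies A x = b.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite by construction
b = rng.standard_normal(4)

def f(x, c=0.0):
    return 0.5 * x @ A @ x - b @ x + c

x_star = np.linalg.solve(A, b)     # closed-form solution of A x = b
assert np.allclose(A @ x_star - b, 0.0)   # gradient vanishes at x_star

# f is strictly convex, so any perturbation of x_star increases f:
for _ in range(10):
    assert f(x_star) <= f(x_star + 0.1 * rng.standard_normal(4))
```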

1. Compute search direction

~d_k ≡ −∇f(~x_{k−1}) = ~b − A~x_{k−1}

2. Do line search to find

~x_k ≡ ~x_{k−1} + α_k ~d_k.

• Line Search Along ~d from ~x

min_α g(α) ≡ f(~x + α~d)

α = ~d^T(~b − A~x) / (~d^T A ~d)
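Since g is a one-dimensional quadratic in α, this closed form is exact. A quick numerical check (numpy, arbitrary SPD test data not from the slides) that the formula beats nearby step sizes:

```python
import numpy as np

# Closed-form line search: along direction d from x, g(a) = f(x + a*d)
# is a 1-D quadratic minimized at a = d^T (b - A x) / (d^T A d).
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)        # SPD test matrix
b = rng.standard_normal(5)
x = rng.standard_normal(5)
d = rng.standard_normal(5)

f = lambda z: 0.5 * z @ A @ z - b @ z
g = lambda a: f(x + a * d)

alpha = d @ (b - A @ x) / (d @ A @ d)
# The closed-form step should beat small perturbations either way:
assert g(alpha) <= g(alpha + 1e-3) and g(alpha) <= g(alpha - 1e-3)
```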

• Gradient Descent with Closed-Form Line Search

~d_k = ~b − A~x_{k−1}
α_k = (~d_k^T ~d_k) / (~d_k^T A ~d_k)
~x_k = ~x_{k−1} + α_k ~d_k
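The three updates above fit in a few lines of numpy; this is a sketch (tolerance, iteration cap, and test matrix are my own choices, not from the slides):

```python
import numpy as np

def gradient_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Minimize f(x) = 1/2 x^T A x - b^T x by gradient descent with
    exact (closed-form) line search, per the three update rules above."""
    x = x0.astype(float)
    for _ in range(max_iter):
        d = b - A @ x                  # d_k = b - A x_{k-1}  (the residual)
        if np.linalg.norm(d) < tol:    # residual small => A x ~ b
            break
        alpha = (d @ d) / (d @ A @ d)  # exact line-search step
        x = x + alpha * d              # x_k = x_{k-1} + alpha_k d_k
    return x

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)            # SPD test matrix
b = rng.standard_normal(6)
x = gradient_descent(A, b, np.zeros(6))
assert np.allclose(A @ x, b, atol=1e-8)
```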

• Convergence

See book.

[f(~x_k) − f(~x*)] / [f(~x_{k−1}) − f(~x*)] ≤ 1 − 1/cond A

Conclusions:

- Conditioning affects speed and quality
- Unconditional convergence (cond A ≥ 1)

• Visualization

• Can We Do Better?

- Can iterate forever; should stop after O(n) iterations!
- Lots of repeated work when poorly conditioned

• Observation

f(~x) = (1/2)(~x − ~x*)^T A (~x − ~x*) + const.

A = LL^T =⇒ f(~x) = (1/2) ‖L^T(~x − ~x*)‖_2^2 + const.
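This identity is easy to verify numerically with a Cholesky factorization; a minimal sketch, again with an arbitrary SPD test matrix of my choosing:

```python
import numpy as np

# With A = L L^T (Cholesky), the quadratic form
# 1/2 (x - x*)^T A (x - x*) equals 1/2 ||L^T (x - x*)||_2^2.
rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)         # SPD test matrix
b = rng.standard_normal(5)
x_star = np.linalg.solve(A, b)      # minimizer of f
L = np.linalg.cholesky(A)           # lower-triangular, A = L @ L.T

x = rng.standard_normal(5)
e = x - x_star
quadratic = 0.5 * e @ A @ e                    # 1/2 e^T A e
norm_form = 0.5 * np.linalg.norm(L.T @ e) ** 2  # 1/2 ||L^T e||^2
assert np.isclose(quadratic, norm_form)
```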

• Substitution

~y ≡ L^T ~x,  ~y* ≡ L^T ~x*
=⇒ f̄(~y) = ‖~y − ~y*‖_2^2

Proposition
Suppose {~w_1, ..., ~w_n} are orthogonal in R^n. Then f̄ is minimized in at most n steps by line searching in direction ~w_1, then direction ~w_2, and so on.

• Visualization

Any two orthogonal directions suffice!

• Undoing Change of Coordinates

Line search on f̄ along ~w is the same as line search on f along (L^T)^{−1} ~w.

0 = ~w_i · ~w_j = (L^T ~v_i)^T (L^T ~v_j) = ~v_i^T (LL^T) ~v_j = ~v_i^T A ~v_j

• Conjugate Directions

A-Conjugate Vectors: Two vectors ~v and ~w are A-conjugate if ~v^T A ~w = 0.

Corollary
Suppose {~v_1, ..., ~v_n} are A-conjugate. Then f is minimized in at most n steps by line searching in direction ~v_1, then direction ~v_2, and so on.
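The corollary can be seen in action with one convenient family of A-conjugate directions: the eigenvectors of a symmetric A (for i ≠ j, ~v_i^T A ~v_j = λ_j ~v_i · ~v_j = 0). A sketch with numpy, using an arbitrary SPD test matrix; n exact line searches recover x* = A⁻¹b:

```python
import numpy as np

# Eigenvectors of symmetric A are mutually A-conjugate, so n exact
# line searches along them minimize f in exactly n steps.
rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)         # SPD test matrix
b = rng.standard_normal(n)

_, V = np.linalg.eigh(A)            # columns: orthonormal eigenvectors
x = np.zeros(n)
for k in range(n):
    v = V[:, k]                                  # k-th A-conjugate direction
    alpha = v @ (b - A @ x) / (v @ A @ v)        # closed-form line search
    x = x + alpha * v

assert np.allclose(x, np.linalg.solve(A, b))     # minimizer reached in n steps
```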

• High-Level Ideas So Far

- Steepest descent may not be fastest descent (surprising!)
- Two inner products: ~v · ~w and 〈~v, ~w〉_A ≡ ~v^T A ~w

• New Problem

Find n A-conjugate directions.

• Gram-Schmidt?

- Potentially unstable
- Storage increases with each iteration

• Another Clue

~r_k ≡ ~b − A~x_k
~r_{k+1} = ~r_k − α_{k+1} A ~v_{k+1}

Proposition
When performing gradient descent on f,
span {~r_0, ..., ~r_k} = span {~r_0, A~r_0, ..., A^k ~r_0}.

Krylov space?!
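The proposition can be checked numerically by running gradient descent, collecting residuals, and comparing subspaces via rank. A sketch with numpy and an arbitrary SPD test matrix (starting from x_0 = 0, so ~r_0 = ~b):

```python
import numpy as np

# Check: during gradient descent, residuals r_0, ..., r_k span the same
# subspace as the Krylov vectors r_0, A r_0, ..., A^k r_0.
rng = np.random.default_rng(5)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)         # SPD test matrix
b = rng.standard_normal(n)

x = np.zeros(n)                     # x_0 = 0, so r_0 = b
residuals, krylov = [], []
for k in range(3):
    r = b - A @ x
    residuals.append(r)
    krylov.append(np.linalg.matrix_power(A, k) @ b)   # A^k r_0
    alpha = (r @ r) / (r @ A @ r)   # exact line search along the residual
    x = x + alpha * r

R = np.column_stack(residuals)
K = np.column_stack(krylov)
# Equal spans: stacking either basis onto the other adds no rank.
rank = np.linalg.matrix_rank
assert rank(np.column_stack([R, K])) == rank(R) == rank(K)
```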

~x_k − ~x_0 ≠ arg min_{~v ∈ span {~r_0, A~r_0, ..., A^{k−1} ~r_0}} f(~x_0 + ~v)

But if this did hold... convergence in n steps!

Next