
The Conjugate Gradient Method

Jason E. Hicken

Aerospace Design Lab

Department of Aeronautics & Astronautics

Stanford University

14 July 2011


Lecture Objectives

• describe when CG can be used to solve Ax = b

• relate CG to the method of conjugate directions

• describe what CG does geometrically

• explain each line in the CG algorithm


We are interested in solving the linear system

Ax = b

where x, b ∈ R^n and A ∈ R^(n×n).

The matrix A is symmetric positive-definite (SPD):

A^T = A (symmetric)

x^T A x > 0, ∀ x ≠ 0 (positive-definite)

Such systems arise in:

• discretization of elliptic PDEs

• optimization of quadratic functionals

• nonlinear optimization problems


When A is SPD, solving the linear system is the same as minimizing the quadratic form

f(x) = (1/2) x^T A x − b^T x.

Why? If x⋆ is the minimizing point, then

∇f(x⋆) = Ax⋆ − b = 0

and, for x ≠ x⋆,

f(x) − f(x⋆) > 0. (homework)
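As a quick numerical check, a minimal NumPy sketch (using the 2×2 model problem introduced on the slides below) confirms that the solution of Ax = b is a stationary point of f:

```python
import numpy as np

# SPD matrix and right-hand side from the model problem below.
A = np.array([[5.0, -3.0],
              [-3.0, 5.0]])
b = np.array([4.0, 4.0])

x_star = np.linalg.solve(A, b)      # direct solve gives x* = [2, 2]
grad = A @ x_star - b               # gradient of f(x) = (1/2)x'Ax - b'x
print(x_star)                       # [2. 2.]
print(np.linalg.norm(grad))         # ~1e-15: x* is a stationary point of f
```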


Definitions

Let xi be the approximate solution to Ax = b at iteration i.

error: ei ≡ xi − x

residual: ri ≡ b − Axi

The following identities for the residual will be useful later:

ri = −Aei

ri = −∇f(xi)


Model problem

[  5  −3 ] [x1]   [ 4 ]
[ −3   5 ] [x2] = [ 4 ]

[Figure: contour plot of the quadratic form over 0 ≤ x1 ≤ 4, 0.5 ≤ x2 ≤ 3.5; the lines 5x1 − 3x2 = 4 and −3x1 + 5x2 = 4 intersect at the solution x = [2 2]^T.]


Review: Steepest Descent Method

Qualitatively, how will steepest descent proceed on our model problem, starting at x0 = (1/3, 1)^T?

[Figure: contour plot with the starting point x0 marked; successive steepest-descent iterates zig-zag toward the solution.]
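To make the zig-zagging concrete, here is a short steepest-descent sketch on the model problem. For a quadratic form, the exact line-search step along r is α = (r^T r)/(r^T A r); the iteration cap and tolerance are illustrative choices, not from the slides:

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])

x = np.array([1.0 / 3.0, 1.0])       # the slide's starting point, x0 = (1/3, 1)
for i in range(100):
    r = b - A @ x                    # residual = -grad f: steepest-descent direction
    if np.linalg.norm(r) < 1e-10:
        break
    alpha = (r @ r) / (r @ (A @ r))  # exact minimization along r
    x = x + alpha * r

print(i, x)                          # dozens of iterations, vs. CG's two
```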


How can we eliminate this zig-zag behaviour?

To find the answer, we begin by considering the easier problem

[ 2  0 ] [x1]   [ 4√2 ]
[ 0  8 ] [x2] = [  0  ],

f(x) = x1^2 + 4x2^2 − 4√2 x1.

Here, the equations are decoupled, so we can minimize in each direction independently. What do the contours of the corresponding quadratic form look like?


Simplified problem

[ 2  0 ] [x1]   [ 4√2 ]
[ 0  8 ] [x2] = [  0  ]

[Figure: axis-aligned elliptical contours of the quadratic form over 0 ≤ x1 ≤ 4, −1.5 ≤ x2 ≤ 1.5; the initial error e0 decomposes into components along the coordinate axes.]
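Because the matrix is diagonal, exact line minimization along each coordinate axis in turn solves the system in n = 2 steps; a minimal sketch:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 8.0]])   # decoupled (diagonal) system
b = np.array([4.0 * np.sqrt(2.0), 0.0])

x = np.zeros(2)
for i in range(2):                       # one exact minimization per axis
    d = np.eye(2)[i]                     # search direction d_i = i-th axis
    r = b - A @ x
    alpha = (d @ r) / (d @ (A @ d))      # exact step along d
    x = x + alpha * d

print(x, np.linalg.norm(b - A @ x))      # solved exactly after two steps
```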


Method of Orthogonal Directions

Idea: express the error as a sum of n orthogonal search directions

e ≡ x0 − x = Σ_{i=0}^{n−1} αi di.

At iteration i + 1, eliminate the component αi di.

• never need to search along di again

• converge in n iterations!

How would we apply the method of orthogonal directions to a non-diagonal matrix?


Review of Inner Products

The search directions in the method of orthogonal directions are orthogonal with respect to the dot product. The dot product is an example of an inner product.

Inner Product
For x, y, z ∈ R^n and α ∈ R, an inner product (·,·) : R^n × R^n → R satisfies

• symmetry: (x, y) = (y, x)

• linearity: (αx + y, z) = α(x, z) + (y, z)

• positive-definiteness: (x, x) > 0 ⇔ x ≠ 0


Fact: (x, y)_A ≡ x^T A y is an inner product.

A-orthogonality (conjugacy)
We say two vectors x, y ∈ R^n are A-orthogonal, or conjugate, if

(x, y)_A = x^T A y = 0.

What happens if we use A-orthogonality rather than standard orthogonality in the method of orthogonal directions?


Let {p0, p1, . . . , pn−1} be a set of n linearly independent vectors that are A-orthogonal. If pi is the i-th column of P, then

P^T A P = Σ

where Σ is a diagonal matrix.

Substitute x = Py into the quadratic form:

f(Py) = (1/2) y^T Σ y − (P^T b)^T y.

We can apply the method of orthogonal directions in y-space.


New Problem: how do we get the set {pi} of conjugate vectors?

Gram-Schmidt Conjugation
Let {d0, d1, . . . , dn−1} be a set of linearly independent vectors, e.g., the coordinate axes.

• set p0 = d0

• for i > 0,

  pi = di − Σ_{j=0}^{i−1} βij pj,

  where βij = (di, pj)_A / (pj, pj)_A (a code sketch follows).
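A direct transcription of Gram-Schmidt conjugation (a sketch; the function name and the choice of coordinate axes for the d_i are ours). The final print confirms that the columns of P are conjugate, i.e., P^T A P is diagonal:

```python
import numpy as np

def gram_schmidt_conjugate(D, A):
    """Return P whose columns are the A-orthogonalized columns of D."""
    n = D.shape[1]
    P = np.zeros_like(D)
    P[:, 0] = D[:, 0]                    # p0 = d0
    for i in range(1, n):
        p = D[:, i].copy()
        for j in range(i):               # subtract conjugate projections
            beta = (D[:, i] @ A @ P[:, j]) / (P[:, j] @ A @ P[:, j])
            p -= beta * P[:, j]
        P[:, i] = p
    return P

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
P = gram_schmidt_conjugate(np.eye(2), A)  # d_i = coordinate axes
print(P.T @ A @ P)                        # diagonal matrix Sigma
```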


The Method of Conjugate Directions
Force the error at iteration i + 1 to be conjugate to the search direction pi:

pi^T A ei+1 = pi^T A(ei + αi pi) = 0

⇒ αi = −(pi^T A ei)/(pi^T A pi) = (pi^T ri)/(pi^T A pi)

• never need to search along pi again

• converge in n iterations! (see the sketch below)
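Combining this step length with any set of conjugate directions gives the method; a self-contained sketch on the model problem, with the directions built by hand exactly as Gram-Schmidt conjugation would build them from the coordinate axes:

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])

# Conjugate directions: p0 = d0, p1 = d1 - beta10 * p0 (Gram-Schmidt conjugation).
p0 = np.array([1.0, 0.0])
d1 = np.array([0.0, 1.0])
p1 = d1 - ((d1 @ A @ p0) / (p0 @ A @ p0)) * p0

x = np.array([1.0 / 3.0, 1.0])      # same start as steepest descent
for p in (p0, p1):                  # n = 2 exact line minimizations
    r = b - A @ x
    alpha = (p @ r) / (p @ (A @ p)) # alpha_i = p_i' r_i / p_i' A p_i
    x = x + alpha * p

print(x)                            # [2. 2.]: converged in n iterations
```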


The Method of Conjugate Directions

[  5  −3 ] [x1]   [ 4 ]
[ −3   5 ] [x2] = [ 4 ]

[Figure: contour plot showing the conjugate directions p0 and p1 obtained from the coordinate directions d0 and d1; starting from x0, the method reaches the solution in two steps.]


The Method of Conjugate Directions is well defined and avoids the “zig-zagging” of Steepest Descent.

What about computational expense?

• If we choose the di in Gram-Schmidt conjugation to be the coordinate axes, the Method of Conjugate Directions is equivalent to Gaussian elimination.

• Keeping all the pi is the same as storing a dense matrix!

Can we find a smarter choice for di?


Error Decomposition Using pi

[Figure: contour plot decomposing the initial error e0 into conjugate components α0 p0 and α1 p1.]


The error at iteration i can be expressed as

ei = Σ_{k=i}^{n−1} αk pk,

so the error must be conjugate to pj for j < i:

pj^T A ei = 0 ⇒ pj^T ri = 0,

but from Gram-Schmidt conjugation we have

pj = dj − Σ_{k=0}^{j−1} βjk pk,

so, taking the dot product with ri (and using pk^T ri = 0 for k < i),

pj^T ri = dj^T ri − Σ_{k=0}^{j−1} βjk pk^T ri

0 = dj^T ri, j < i.


Thus, the residual at iteration i is orthogonal to the vectors dj used in the previous iterations:

dj^T ri = 0, j < i

Idea: what happens if we choose di = ri?

• residuals become mutually orthogonal

• ri is orthogonal to pj, for j < i ⋆

• ri+1 becomes conjugate to pj, for j < i

This last point is not immediately obvious, so we will prove it. This result has significant implications for Gram-Schmidt conjugation.

⋆ we showed this is true for any choice of di


The solution is updated according to

xj+1 = xj + αj pj

⇒ rj+1 = rj − αj A pj

⇒ A pj = (1/αj)(rj − rj+1).

Next, take the dot product of both sides with an arbitrary residual ri:

ri^T A pj =  (ri^T ri)/αi      if i = j,
          = −(ri^T ri)/αi−1    if i = j + 1,
          =  0                 otherwise.


We can show that the first case (i = j) contains no new information (homework). Divide the remaining cases by pj^T A pj and insert the definition of αi−1:

(ri^T A pj)/(pj^T A pj) = βij = −(ri^T ri)/(ri−1^T ri−1)    if i = j + 1,
                              = 0                           otherwise.

We recognize the L.H.S. as the coefficients in Gram-Schmidt conjugation (with di = ri):

• only one coefficient is nonzero!


The Conjugate Gradient Method

Set p0 = r0 = b − Ax0 and i = 0

αi = (pi^T ri)/(pi^T A pi)            (step length)

xi+1 = xi + αi pi                     (sol. update)

ri+1 = ri − αi A pi                   (resid. update)

βi+1,i = −(ri+1^T ri+1)/(ri^T ri)     (G.S. coeff.)

pi+1 = ri+1 − βi+1,i pi               (Gram-Schmidt)

i := i + 1
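Line by line, the algorithm translates into a short NumPy routine (a sketch; note that the slide's β carries a minus sign relative to the usual textbook convention, hence the update p = r − β p). By the residual-update derivation above, the recursively updated r equals b − Ax throughout:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    x = x0.astype(float).copy()
    r = b - A @ x                    # r0 = b - A x0
    p = r.copy()                     # p0 = r0
    for _ in range(len(b)):          # exact arithmetic: at most n iterations
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)   # step length
        x = x + alpha * p            # sol. update
        r = r - alpha * Ap           # resid. update (= b - A x)
        beta = -(r @ r) / rr         # G.S. coeff., slide's sign convention
        p = r - beta * p             # Gram-Schmidt: next search direction
    return x

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # [2. 2.] in two iterations
```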


The Conjugate Gradient Method

[  5  −3 ] [x1]   [ 4 ]
[ −3   5 ] [x2] = [ 4 ]

[Figure: contour plot of the CG iterates on the model problem; starting from x0, CG reaches the solution x = [2 2]^T in two steps.]


Lecture Objectives

• describe when CG can be used to solve Ax = b
  A must be symmetric positive-definite

• relate CG to the method of conjugate directions
  CG is a method of conjugate directions with the choice di = ri, which simplifies Gram-Schmidt conjugation

• describe what CG does geometrically
  Performs the method of orthogonal directions in a transformed space where the contours of the quadratic form are aligned with the coordinate axes

• explain each line in the CG algorithm


References

• Saad, Y., “Iterative Methods for Sparse Linear Systems”, second edition.

• Shewchuk, J. R., “An Introduction to the Conjugate Gradient Method Without the Agonizing Pain”.