The Conjugate Gradient Method

Jason E. Hicken

Aerospace Design Lab

Department of Aeronautics & Astronautics

Stanford University

14 July 2011

Lecture Objectives

• describe when CG can be used to solve Ax = b

• relate CG to the method of conjugate directions

• describe what CG does geometrically

• explain each line in the CG algorithm

We are interested in solving the linear system

Ax = b

where x, b ∈ R^n and A ∈ R^{n×n}.

Matrix is symmetric positive-definite (SPD):

A^T = A (symmetric)

x^T A x > 0, ∀ x ≠ 0 (positive-definite)

Such systems arise in, for example,

• discretization of elliptic PDEs

• optimization of quadratic functionals

• nonlinear optimization problems

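Both SPD properties are easy to check numerically. Below is a minimal sketch, assuming NumPy, applied to the 2×2 matrix of the model problem used later in this lecture; it relies on the standard fact that a Cholesky factorization of a symmetric matrix succeeds exactly when the matrix is positive-definite.

```python
import numpy as np

A = np.array([[5.0, -3.0],
              [-3.0, 5.0]])

# symmetric: A^T = A
assert np.allclose(A.T, A)

# positive-definite: Cholesky succeeds iff a symmetric matrix is SPD
np.linalg.cholesky(A)

# equivalently, all eigenvalues are positive (here 2 and 8)
print(np.linalg.eigvalsh(A))  # [2. 8.]
```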

When A is SPD, solving the linear system is the same as minimizing the quadratic form

f(x) = (1/2) x^T A x − b^T x.

Why? If x⋆ is the minimizing point, then

∇f(x⋆) = Ax⋆ − b = 0

and, for x ≠ x⋆,

f(x) − f(x⋆) > 0. (homework)


Definitions
Let x_i be the approximate solution to Ax = b at iteration i.

error: e_i ≡ x_i − x

residual: r_i ≡ b − Ax_i

The following identities for the residual will be useful later:

r_i = −Ae_i

r_i = −∇f(x_i)
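
These identities are mechanical to verify. A minimal sketch, assuming NumPy and using the 2×2 model problem introduced on the next slide (the guess x_0 = (1/3, 1)^T is the one used in the steepest-descent review below):

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])
x_star = np.linalg.solve(A, b)          # exact solution: [2, 2]

def f(x):
    # quadratic form f(x) = (1/2) x^T A x - b^T x
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

x_i = np.array([1/3, 1.0])              # an approximate solution
e_i = x_i - x_star                      # error
r_i = b - A @ x_i                       # residual

assert np.allclose(r_i, -A @ e_i)       # r_i = -A e_i
assert np.allclose(r_i, -grad_f(x_i))   # r_i = -grad f(x_i)
assert np.allclose(grad_f(x_star), 0.0) # gradient vanishes at the solution
```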

Model problem

[ 5  −3 ] [x_1]   [4]
[−3   5 ] [x_2] = [4]

[Figure: contours of the quadratic form in the (x_1, x_2) plane; the lines 5x_1 − 3x_2 = 4 and −3x_1 + 5x_2 = 4 intersect at the solution x = [2 2]^T.]


Review: Steepest Descent Method

Qualitatively, how will steepest descent proceed on our model problem, starting at x_0 = (1/3, 1)^T?

[Figure: contours of the quadratic form; the steepest descent iterates x_0, x_1, ... zig-zag toward the solution.]
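
For concreteness, here is a minimal steepest descent sketch with exact line search. The step length α_i = (r_i^T r_i)/(r_i^T A r_i) is the standard exact minimizer along the residual direction (a known formula, not derived on these slides); on the model problem the printed iterates exhibit exactly this zig-zag.

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])

x = np.array([1/3, 1.0])                # x_0 from the slide
for i in range(20):
    r = b - A @ x                       # residual = steepest descent direction
    if np.linalg.norm(r) < 1e-10:
        break
    alpha = (r @ r) / (r @ A @ r)       # exact line search along r
    x = x + alpha * r
    print(i, x)                         # iterates zig-zag toward [2, 2]
```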

How can we eliminate this zig-zag behaviour?

To find the answer, we begin by considering the easier problem

[ 2  0 ] [x_1]   [4√2]
[ 0  8 ] [x_2] = [ 0 ]

f(x) = x_1^2 + 4 x_2^2 − 4√2 x_1.

Here, the equations are decoupled, so we can minimize in each direction independently. What do the contours of the corresponding quadratic form look like?


Simplified problem

[ 2  0 ] [x_1]   [4√2]
[ 0  8 ] [x_2] = [ 0 ]

[Figure: axis-aligned elliptical contours of the quadratic form; the initial error e_0 decomposes along the coordinate directions.]

Method of Orthogonal Directions

Idea: Express the error as a sum of n orthogonal search directions

e ≡ x_0 − x = Σ_{i=0}^{n−1} α_i d_i.

At iteration i + 1, eliminate component α_i d_i.

• never need to search along d_i again

• converge in n iterations!

How would we apply the method of orthogonal directions to a non-diagonal matrix?

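On the decoupled problem above, the method of orthogonal directions is just coordinate-wise minimization. A minimal sketch, assuming NumPy (the starting guess is arbitrary and chosen here for illustration); each step removes one error component exactly, so it finishes in n = 2 iterations:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 8.0]])  # the simplified, decoupled problem
b = np.array([4.0 * np.sqrt(2.0), 0.0])

x = np.array([3.0, 1.0])                # arbitrary starting guess (illustrative)
for i in range(2):                      # n = 2 orthogonal directions
    d = np.eye(2)[i]                    # coordinate direction d_i
    r = b - A @ x
    alpha = (d @ r) / (d @ A @ d)       # removes the error component along d_i
    x = x + alpha * d

print(x)                                # [2*sqrt(2), 0] -- exact after 2 steps
```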

Review of Inner Products
The search directions in the method of orthogonal directions are orthogonal with respect to the dot product.

The dot product is an example of an inner product.

Inner Product
For x, y, z ∈ R^n and α ∈ R, an inner product (·, ·) : R^n × R^n → R satisfies

• symmetry: (x, y) = (y, x)

• linearity: (αx + y, z) = α(x, z) + (y, z)

• positive-definiteness: (x, x) > 0 ⇔ x ≠ 0


Fact: (x, y)_A ≡ x^T A y is an inner product.

A-orthogonality (conjugacy)
We say two vectors x, y ∈ R^n are A-orthogonal, or conjugate, if

(x, y)_A = x^T A y = 0.

What happens if we use A-orthogonality rather than standard orthogonality in the method of orthogonal directions?

Let {p_0, p_1, ..., p_{n−1}} be a set of n linearly independent vectors that are A-orthogonal. If p_i is the i-th column of P, then

P^T A P = Σ

where Σ is a diagonal matrix.

Substitute x = Py into the quadratic form:

f(Py) = (1/2) y^T Σ y − (P^T b)^T y.

We can apply the method of orthogonal directions in y-space.

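A quick numerical illustration on the model problem: the pair p_0 = (1, 0)^T, p_1 = (3, 5)^T (hand-picked here for illustration; such a set is not unique) is A-orthogonal, and P^T A P comes out diagonal as claimed.

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])

p0 = np.array([1.0, 0.0])
p1 = np.array([3.0, 5.0])           # chosen so that p0^T A p1 = 0

assert np.isclose(p0 @ A @ p1, 0.0) # the pair is conjugate

P = np.column_stack([p0, p1])
print(P.T @ A @ P)                  # diag(5, 80): off-diagonals vanish
```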

New Problem: how do we get the set {p_i} of conjugate vectors?

Gram-Schmidt Conjugation
Let {d_0, d_1, ..., d_{n−1}} be a set of linearly independent vectors, e.g., the coordinate axes.

• set p_0 = d_0

• for i > 0,

p_i = d_i − Σ_{j=0}^{i−1} β_ij p_j

where β_ij = (d_i, p_j)_A / (p_j, p_j)_A.

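A minimal sketch of Gram-Schmidt conjugation, assuming NumPy and starting from the coordinate axes of the model problem; the assertions check that the resulting directions are pairwise conjugate.

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
n = A.shape[0]
D = np.eye(n)                       # d_i = coordinate axes

P = []                              # conjugated directions
for i in range(n):
    p = D[:, i].copy()
    for pj in P:
        beta = (D[:, i] @ A @ pj) / (pj @ A @ pj)   # beta_ij
        p = p - beta * pj
    P.append(p)

# pairwise conjugacy: p_i^T A p_j = 0 for i != j
for i in range(n):
    for j in range(i):
        assert np.isclose(P[i] @ A @ P[j], 0.0)

print(P)                            # [array([1., 0.]), array([0.6, 1. ])]
```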

The Method of Conjugate Directions
Force the error at iteration i + 1 to be conjugate to the search direction p_i:

p_i^T A e_{i+1} = p_i^T A (e_i + α_i p_i) = 0

⇒ α_i = −(p_i^T A e_i)/(p_i^T A p_i) = (p_i^T r_i)/(p_i^T A p_i)

• never need to search along p_i again

• converge in n iterations!

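Putting the pieces together, a sketch (assuming NumPy; the function name is ours) of the Method of Conjugate Directions: conjugate the coordinate axes as above, then step with α_i = (p_i^T r_i)/(p_i^T A p_i). On the model problem, starting from the steepest-descent guess x_0 = (1/3, 1)^T, it reaches x = [2 2]^T in exactly two steps.

```python
import numpy as np

def conjugate_directions(A, b, x0):
    # Method of Conjugate Directions with Gram-Schmidt-conjugated axes
    n = len(b)
    x = x0.astype(float).copy()
    P = []
    for i in range(n):
        d = np.eye(n)[:, i]
        # Gram-Schmidt conjugation of d_i against previous directions
        p = d - sum(((d @ A @ pj) / (pj @ A @ pj)) * pj for pj in P)
        P.append(p)
        r = b - A @ x
        alpha = (p @ r) / (p @ A @ p)   # step length
        x = x + alpha * p
    return x

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])
print(conjugate_directions(A, b, np.array([1/3, 1.0])))  # [2. 2.]
```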

The Method of Conjugate Directions

[ 5  −3 ] [x_1]   [4]
[−3   5 ] [x_2] = [4]

[Figure: contours of the quadratic form; the coordinate directions d_0, d_1 are conjugated into search directions p_0, p_1, and the iterates step from x_0 to the solution in two moves.]

The Method of Conjugate Directions is well defined, and avoids the "zig-zagging" of Steepest Descent.

What about computational expense?

• If we choose the d_i in Gram-Schmidt conjugation to be the coordinate axes, the Method of Conjugate Directions is equivalent to Gaussian elimination.

• Keeping all the p_i is the same as storing a dense matrix!

Can we find a smarter choice for d_i?


Error Decomposition Using p_i

[Figure: contours of the quadratic form; the initial error e_0 decomposes into the conjugate components α_0 p_0 and α_1 p_1.]

The error at iteration i can be expressed as

e_i = Σ_{k=i}^{n−1} α_k p_k,

so the error must be conjugate to p_j for j < i:

p_j^T A e_i = 0 ⇒ p_j^T r_i = 0,

but from Gram-Schmidt conjugation we have

p_j = d_j − Σ_{k=0}^{j−1} β_jk p_k.

Taking the dot product with r_i,

p_j^T r_i = d_j^T r_i − Σ_{k=0}^{j−1} β_jk p_k^T r_i

0 = d_j^T r_i, j < i.

Thus, the residual at iteration i is orthogonal to the vectors d_j used in the previous iterations:

d_j^T r_i = 0, j < i.

Idea: what happens if we choose d_i = r_i?

• residuals become mutually orthogonal

• r_i is orthogonal to p_j, for j < i ⋆

• r_{i+1} becomes conjugate to p_j, for j < i

This last point is not immediately obvious, so we will prove it. This result has significant implications for Gram-Schmidt conjugation.

⋆ we showed this is true for any choice of d_i


The solution is updated according to

x_{j+1} = x_j + α_j p_j

⇒ r_{j+1} = r_j − α_j A p_j

⇒ A p_j = (1/α_j)(r_j − r_{j+1}).

Next, take the dot product of both sides with an arbitrary residual r_i. Since the residuals are mutually orthogonal,

r_i^T A p_j =  (r_i^T r_i)/α_i,       i = j
              −(r_i^T r_i)/α_{i−1},   i = j + 1
               0,                     otherwise.


We can show that the first case (i = j) contains no new information (homework). Divide the remaining cases by p_j^T A p_j and insert the definition of α_{i−1}:

β_ij = (r_i^T A p_j)/(p_j^T A p_j) = −(r_i^T r_i)/(r_{i−1}^T r_{i−1}),   i = j + 1
                                   =  0,                                otherwise.

We recognize the L.H.S. as the coefficients in Gram-Schmidt conjugation:

• only one coefficient is nonzero!


The Conjugate Gradient Method
Set p_0 = r_0 = b − Ax_0 and i = 0

α_i = (p_i^T r_i)/(p_i^T A p_i) (step length)

x_{i+1} = x_i + α_i p_i (sol. update)

r_{i+1} = r_i − α_i A p_i (resid. update)

β_{i+1,i} = −(r_{i+1}^T r_{i+1})/(r_i^T r_i) (G.S. coeff.)

p_{i+1} = r_{i+1} − β_{i+1,i} p_i (Gram-Schmidt)

i := i + 1

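The loop above translates line for line into code. A minimal sketch, assuming NumPy and keeping the slide's sign convention for β (so the direction update subtracts β_{i+1,i} p_i); the function name and tolerance are ours:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    # Conjugate Gradient for SPD A, following the slide line by line
    x = x0.astype(float).copy()
    r = b - A @ x                           # p_0 = r_0 = b - A x_0
    p = r.copy()
    for i in range(len(b)):                 # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)          # step length
        x = x + alpha * p                   # sol. update
        r_new = r - alpha * Ap              # resid. update
        if np.linalg.norm(r_new) < tol:
            break
        beta = -(r_new @ r_new) / (r @ r)   # G.S. coeff.
        p = r_new - beta * p                # Gram-Schmidt
        r = r_new
    return x

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])
print(conjugate_gradient(A, b, np.array([1/3, 1.0])))  # [2. 2.] in 2 steps
```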

The Conjugate Gradient Method

[ 5  −3 ] [x_1]   [4]
[−3   5 ] [x_2] = [4]

[Figure: contours of the quadratic form; starting from x_0, the CG iterates reach the solution x = [2 2]^T in two iterations.]

Lecture Objectives

✓ describe when CG can be used to solve Ax = b
A must be symmetric positive-definite

✓ relate CG to the method of conjugate directions
CG is a method of conjugate directions with the choice d_i = r_i, which simplifies Gram-Schmidt conjugation

✓ describe what CG does geometrically
CG performs the method of orthogonal directions in a transformed space where the contours of the quadratic form are aligned with the coordinate axes

✓ explain each line in the CG algorithm


References

• Saad, Y., "Iterative Methods for Sparse Linear Systems", second edition.

• Shewchuk, J. R., "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain".