The Conjugate Gradient Method

Jason E. Hicken

Aerospace Design Lab

Department of Aeronautics & Astronautics

Stanford University

14 July 2011

Lecture Objectives

• describe when CG can be used to solve Ax = b

• relate CG to the method of conjugate directions

• describe what CG does geometrically

• explain each line in the CG algorithm

We are interested in solving the linear system

Ax = b

where x, b ∈ R^n and A ∈ R^{n×n}.

Matrix is symmetric positive-definite (SPD):

A^T = A (symmetric)

x^T A x > 0, ∀ x ≠ 0 (positive-definite)

Such systems arise in, for example,

• discretization of elliptic PDEs

• optimization of quadratic functionals

• nonlinear optimization problems

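Both SPD properties are easy to check numerically. Below is a minimal sketch, assuming NumPy, applied to the 2×2 matrix of the model problem used later in this lecture; it relies on the standard fact that a Cholesky factorization of a symmetric matrix succeeds exactly when the matrix is positive-definite.

```python
import numpy as np

A = np.array([[5.0, -3.0],
              [-3.0, 5.0]])

# symmetric: A^T = A
assert np.allclose(A.T, A)

# positive-definite: Cholesky succeeds iff a symmetric matrix is SPD
np.linalg.cholesky(A)

# equivalently, all eigenvalues are positive (here 2 and 8)
print(np.linalg.eigvalsh(A))  # [2. 8.]
```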

When A is SPD, solving the linear system is the same as minimizing the quadratic form

f(x) = (1/2) x^T A x − b^T x.

Why? If x⋆ is the minimizing point, then

∇f(x⋆) = Ax⋆ − b = 0

and, for x ≠ x⋆,

f(x) − f(x⋆) > 0. (homework)


Definitions
Let x_i be the approximate solution to Ax = b at iteration i.

error: e_i ≡ x_i − x

residual: r_i ≡ b − Ax_i

The following identities for the residual will be useful later:

r_i = −Ae_i

r_i = −∇f(x_i)
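
These identities are mechanical to verify. A minimal sketch, assuming NumPy and using the 2×2 model problem introduced on the next slide (the guess x_0 = (1/3, 1)^T is the one used in the steepest-descent review below):

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])
x_star = np.linalg.solve(A, b)          # exact solution: [2, 2]

def f(x):
    # quadratic form f(x) = (1/2) x^T A x - b^T x
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

x_i = np.array([1/3, 1.0])              # an approximate solution
e_i = x_i - x_star                      # error
r_i = b - A @ x_i                       # residual

assert np.allclose(r_i, -A @ e_i)       # r_i = -A e_i
assert np.allclose(r_i, -grad_f(x_i))   # r_i = -grad f(x_i)
assert np.allclose(grad_f(x_star), 0.0) # gradient vanishes at the solution
```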

Model problem

[ 5  −3 ] [x_1]   [4]
[−3   5 ] [x_2] = [4]

[Figure: contours of the quadratic form in the (x_1, x_2) plane; the lines 5x_1 − 3x_2 = 4 and −3x_1 + 5x_2 = 4 intersect at the solution x = [2 2]^T.]


Review: Steepest Descent Method

Qualitatively, how will steepest descent proceed on our model problem, starting at x_0 = (1/3, 1)^T?

[Figure: contours of the quadratic form; the steepest descent iterates x_0, x_1, ... zig-zag toward the solution.]
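
For concreteness, here is a minimal steepest descent sketch with exact line search. The step length α_i = (r_i^T r_i)/(r_i^T A r_i) is the standard exact minimizer along the residual direction (a known formula, not derived on these slides); on the model problem the printed iterates exhibit exactly this zig-zag.

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])

x = np.array([1/3, 1.0])                # x_0 from the slide
for i in range(20):
    r = b - A @ x                       # residual = steepest descent direction
    if np.linalg.norm(r) < 1e-10:
        break
    alpha = (r @ r) / (r @ A @ r)       # exact line search along r
    x = x + alpha * r
    print(i, x)                         # iterates zig-zag toward [2, 2]
```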

How can we eliminate this zig-zag behaviour?

To find the answer, we begin by considering the easier problem

[ 2  0 ] [x_1]   [4√2]
[ 0  8 ] [x_2] = [ 0 ]

f(x) = x_1^2 + 4 x_2^2 − 4√2 x_1.

Here, the equations are decoupled, so we can minimize in each direction independently. What do the contours of the corresponding quadratic form look like?


Simplified problem

[ 2  0 ] [x_1]   [4√2]
[ 0  8 ] [x_2] = [ 0 ]

[Figure: axis-aligned elliptical contours of the quadratic form; the initial error e_0 decomposes along the coordinate directions.]

Method of Orthogonal Directions

Idea: Express the error as a sum of n orthogonal search directions

e ≡ x_0 − x = Σ_{i=0}^{n−1} α_i d_i.

At iteration i + 1, eliminate component α_i d_i.

• never need to search along d_i again

• converge in n iterations!

How would we apply the method of orthogonal directions to a non-diagonal matrix?

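On the decoupled problem above, the method of orthogonal directions is just coordinate-wise minimization. A minimal sketch, assuming NumPy (the starting guess is arbitrary and chosen here for illustration); each step removes one error component exactly, so it finishes in n = 2 iterations:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 8.0]])  # the simplified, decoupled problem
b = np.array([4.0 * np.sqrt(2.0), 0.0])

x = np.array([3.0, 1.0])                # arbitrary starting guess (illustrative)
for i in range(2):                      # n = 2 orthogonal directions
    d = np.eye(2)[i]                    # coordinate direction d_i
    r = b - A @ x
    alpha = (d @ r) / (d @ A @ d)       # removes the error component along d_i
    x = x + alpha * d

print(x)                                # [2*sqrt(2), 0] -- exact after 2 steps
```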

Review of Inner Products
The search directions in the method of orthogonal directions are orthogonal with respect to the dot product.

The dot product is an example of an inner product.

Inner Product
For x, y, z ∈ R^n and α ∈ R, an inner product (·, ·) : R^n × R^n → R satisfies

• symmetry: (x, y) = (y, x)

• linearity: (αx + y, z) = α(x, z) + (y, z)

• positive-definiteness: (x, x) > 0 ⇔ x ≠ 0


Fact: (x, y)_A ≡ x^T A y is an inner product.

A-orthogonality (conjugacy)
We say two vectors x, y ∈ R^n are A-orthogonal, or conjugate, if

(x, y)_A = x^T A y = 0.

What happens if we use A-orthogonality rather than standard orthogonality in the method of orthogonal directions?

Let {p_0, p_1, ..., p_{n−1}} be a set of n linearly independent vectors that are A-orthogonal. If p_i is the i-th column of P, then

P^T A P = Σ

where Σ is a diagonal matrix.

Substitute x = Py into the quadratic form:

f(Py) = (1/2) y^T Σ y − (P^T b)^T y.

We can apply the method of orthogonal directions in y-space.

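A quick numerical illustration on the model problem: the pair p_0 = (1, 0)^T, p_1 = (3, 5)^T (hand-picked here for illustration; such a set is not unique) is A-orthogonal, and P^T A P comes out diagonal as claimed.

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])

p0 = np.array([1.0, 0.0])
p1 = np.array([3.0, 5.0])           # chosen so that p0^T A p1 = 0

assert np.isclose(p0 @ A @ p1, 0.0) # the pair is conjugate

P = np.column_stack([p0, p1])
print(P.T @ A @ P)                  # diag(5, 80): off-diagonals vanish
```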

New Problem: how do we get the set {p_i} of conjugate vectors?

Gram-Schmidt Conjugation
Let {d_0, d_1, ..., d_{n−1}} be a set of linearly independent vectors, e.g., the coordinate axes.

• set p_0 = d_0

• for i > 0,

p_i = d_i − Σ_{j=0}^{i−1} β_ij p_j

where β_ij = (d_i, p_j)_A / (p_j, p_j)_A.

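A minimal sketch of Gram-Schmidt conjugation, assuming NumPy and starting from the coordinate axes of the model problem; the assertions check that the resulting directions are pairwise conjugate.

```python
import numpy as np

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
n = A.shape[0]
D = np.eye(n)                       # d_i = coordinate axes

P = []                              # conjugated directions
for i in range(n):
    p = D[:, i].copy()
    for pj in P:
        beta = (D[:, i] @ A @ pj) / (pj @ A @ pj)   # beta_ij
        p = p - beta * pj
    P.append(p)

# pairwise conjugacy: p_i^T A p_j = 0 for i != j
for i in range(n):
    for j in range(i):
        assert np.isclose(P[i] @ A @ P[j], 0.0)

print(P)                            # [array([1., 0.]), array([0.6, 1. ])]
```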

The Method of Conjugate Directions
Force the error at iteration i + 1 to be conjugate to the search direction p_i:

p_i^T A e_{i+1} = p_i^T A (e_i + α_i p_i) = 0

⇒ α_i = −(p_i^T A e_i)/(p_i^T A p_i) = (p_i^T r_i)/(p_i^T A p_i)

• never need to search along p_i again

• converge in n iterations!

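Putting the pieces together, a sketch (assuming NumPy; the function name is ours) of the Method of Conjugate Directions: conjugate the coordinate axes as above, then step with α_i = (p_i^T r_i)/(p_i^T A p_i). On the model problem, starting from the steepest-descent guess x_0 = (1/3, 1)^T, it reaches x = [2 2]^T in exactly two steps.

```python
import numpy as np

def conjugate_directions(A, b, x0):
    # Method of Conjugate Directions with Gram-Schmidt-conjugated axes
    n = len(b)
    x = x0.astype(float).copy()
    P = []
    for i in range(n):
        d = np.eye(n)[:, i]
        # Gram-Schmidt conjugation of d_i against previous directions
        p = d - sum(((d @ A @ pj) / (pj @ A @ pj)) * pj for pj in P)
        P.append(p)
        r = b - A @ x
        alpha = (p @ r) / (p @ A @ p)   # step length
        x = x + alpha * p
    return x

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])
print(conjugate_directions(A, b, np.array([1/3, 1.0])))  # [2. 2.]
```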

The Method of Conjugate Directions

[ 5  −3 ] [x_1]   [4]
[−3   5 ] [x_2] = [4]

[Figure: contours of the quadratic form; the coordinate directions d_0, d_1 are conjugated into search directions p_0, p_1, and the iterates step from x_0 to the solution in two moves.]

The Method of Conjugate Directions is well defined, and avoids the "zig-zagging" of Steepest Descent.

What about computational expense?

• If we choose the d_i in Gram-Schmidt conjugation to be the coordinate axes, the Method of Conjugate Directions is equivalent to Gaussian elimination.

• Keeping all the p_i is the same as storing a dense matrix!

Can we find a smarter choice for d_i?


Error Decomposition Using p_i

[Figure: contours of the quadratic form; the initial error e_0 decomposes into the conjugate components α_0 p_0 and α_1 p_1.]

The error at iteration i can be expressed as

e_i = Σ_{k=i}^{n−1} α_k p_k,

so the error must be conjugate to p_j for j < i:

p_j^T A e_i = 0 ⇒ p_j^T r_i = 0,

but from Gram-Schmidt conjugation we have

p_j = d_j − Σ_{k=0}^{j−1} β_jk p_k.

Taking the dot product with r_i,

p_j^T r_i = d_j^T r_i − Σ_{k=0}^{j−1} β_jk p_k^T r_i

0 = d_j^T r_i, j < i.

Thus, the residual at iteration i is orthogonal to the vectors d_j used in the previous iterations:

d_j^T r_i = 0, j < i.

Idea: what happens if we choose d_i = r_i?

• residuals become mutually orthogonal

• r_i is orthogonal to p_j, for j < i ⋆

• r_{i+1} becomes conjugate to p_j, for j < i

This last point is not immediately obvious, so we will prove it. This result has significant implications for Gram-Schmidt conjugation.

⋆ we showed this is true for any choice of d_i


The solution is updated according to

x_{j+1} = x_j + α_j p_j

⇒ r_{j+1} = r_j − α_j A p_j

⇒ A p_j = (1/α_j)(r_j − r_{j+1}).

Next, take the dot product of both sides with an arbitrary residual r_i. Since the residuals are mutually orthogonal,

r_i^T A p_j =  (r_i^T r_i)/α_i,       i = j
              −(r_i^T r_i)/α_{i−1},   i = j + 1
               0,                     otherwise.


We can show that the first case (i = j) contains no new information (homework). Divide the remaining cases by p_j^T A p_j and insert the definition of α_{i−1}:

β_ij = (r_i^T A p_j)/(p_j^T A p_j) = −(r_i^T r_i)/(r_{i−1}^T r_{i−1}),   i = j + 1
                                   =  0,                                otherwise.

We recognize the L.H.S. as the coefficients in Gram-Schmidt conjugation:

• only one coefficient is nonzero!


The Conjugate Gradient Method
Set p_0 = r_0 = b − Ax_0 and i = 0

α_i = (p_i^T r_i)/(p_i^T A p_i) (step length)

x_{i+1} = x_i + α_i p_i (sol. update)

r_{i+1} = r_i − α_i A p_i (resid. update)

β_{i+1,i} = −(r_{i+1}^T r_{i+1})/(r_i^T r_i) (G.S. coeff.)

p_{i+1} = r_{i+1} − β_{i+1,i} p_i (Gram-Schmidt)

i := i + 1

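The loop above translates line for line into code. A minimal sketch, assuming NumPy and keeping the slide's sign convention for β (so the direction update subtracts β_{i+1,i} p_i); the function name and tolerance are ours:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    # Conjugate Gradient for SPD A, following the slide line by line
    x = x0.astype(float).copy()
    r = b - A @ x                           # p_0 = r_0 = b - A x_0
    p = r.copy()
    for i in range(len(b)):                 # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)          # step length
        x = x + alpha * p                   # sol. update
        r_new = r - alpha * Ap              # resid. update
        if np.linalg.norm(r_new) < tol:
            break
        beta = -(r_new @ r_new) / (r @ r)   # G.S. coeff.
        p = r_new - beta * p                # Gram-Schmidt
        r = r_new
    return x

A = np.array([[5.0, -3.0], [-3.0, 5.0]])
b = np.array([4.0, 4.0])
print(conjugate_gradient(A, b, np.array([1/3, 1.0])))  # [2. 2.] in 2 steps
```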

The Conjugate Gradient Method

[ 5  −3 ] [x_1]   [4]
[−3   5 ] [x_2] = [4]

[Figure: contours of the quadratic form; starting from x_0, the CG iterates reach the solution x = [2 2]^T in two iterations.]

Lecture Objectives

✓ describe when CG can be used to solve Ax = b
A must be symmetric positive-definite

✓ relate CG to the method of conjugate directions
CG is a method of conjugate directions with the choice d_i = r_i, which simplifies Gram-Schmidt conjugation

✓ describe what CG does geometrically
CG performs the method of orthogonal directions in a transformed space where the contours of the quadratic form are aligned with the coordinate axes

✓ explain each line in the CG algorithm


References

• Saad, Y., "Iterative Methods for Sparse Linear Systems", second edition.

• Shewchuk, J. R., "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain".