Transcript of Conjugate Gradient
0. History
Why iterate? Direct algorithms require O(n^3) work.
1950: n = 20
1965: n = 200
1980: n = 2000
1995: n = 20000
Dimensional increase: 10^3; computer hardware: 10^9.
0. History
If matrix problems could be solved in O(n) time, matrices could be another 10^6 times bigger.
There are direct algorithms that run in about O(n^2.4) time, but their constant factors are too big for practical use.
For certain matrices, iterative methods have the potential to reduce computation time to O(m), where m is the number of nonzero entries of A.
1. Introduction
CG is the most popular method for solving large systems of linear equations Ax = b. CG is an iterative method, suited for use with sparse matrices with certain properties.
In practice, we generally don't find dense matrices of huge dimension, since huge matrices typically arise from the discretisation of differential or integral equations.
2. Notation
Matrix: A, with components Aij
Vector, n x 1 matrix: x, with components xi
Linear equation: Ax = b, with components Aij xj = bi (summation over j implied)
2. Notation
Transpose of a matrix: (AT)ij = Aji
Inner product of two vectors: xTy = Σi xi yi
If xTy = 0, then x and y are orthogonal
3. Properties of A
A has to be an n x n matrix.
A has to be positive definite: xTAx > 0 for every nonzero vector x.
A has to be symmetric, AT = A
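These two properties can be checked numerically. A minimal NumPy sketch, using an arbitrary example matrix (the specific values are an assumption for illustration):

```python
import numpy as np

# Arbitrary symmetric positive definite example matrix.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# Symmetry: A^T = A.
is_symmetric = bool(np.allclose(A, A.T))

# Positive definiteness: a symmetric matrix is positive definite
# iff all of its (real) eigenvalues are > 0.
eigenvalues = np.linalg.eigvalsh(A)
is_positive_definite = bool(np.all(eigenvalues > 0))
```

An alternative test is to attempt a Cholesky factorization, which succeeds exactly for symmetric positive definite matrices.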
4. Quadratic Forms
A QF is a scalar quadratic function of a vector:
f(x) = 1/2 xTAx − bTx + c
4. Quadratic Forms
Gradient: f'(x) points in the direction of greatest increase of f(x). For the quadratic form, f'(x) = 1/2 ATx + 1/2 Ax − b; if A is symmetric, this reduces to f'(x) = Ax − b, so the minimum of f is the solution of Ax = b.
4. Quadratic Forms
positive definite: xTAx > 0 for all x ≠ 0
negative definite: xTAx < 0 for all x ≠ 0
positive semidefinite: xTAx ≥ 0
indefinite: otherwise
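For symmetric positive definite A, the gradient of the quadratic form f(x) = 1/2 xTAx − bTx + c is f'(x) = Ax − b, so the gradient vanishes exactly at the solution of Ax = b. A minimal numerical check (A, b and c are arbitrary example values, an assumption for illustration):

```python
import numpy as np

# Arbitrary example values: symmetric positive definite A, any b and c.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    # Quadratic form f(x) = 1/2 x^T A x - b^T x + c
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    # For symmetric A the gradient is f'(x) = A x - b
    return A @ x - b

# The gradient vanishes at the solution of A x = b.
x_star = np.linalg.solve(A, b)
```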
5. Steepest Descent
Start at an arbitrary point and slide down to the bottom of the paraboloid. Take steps x(1), x(2), ... in the direction −f'(x(i)).
Error: e(i) = x(i) − x
Residual: r(i) = b − Ax(i)
r(i) = −Ae(i)
r(i) = −f'(x(i))
5. Steepest Descent
x(i+1) = x(i) + α r(i), but how big is α?
Choose α so that f'(x(i+1)) is orthogonal to r(i), i.e. minimize f along the search line x(i) + α r(i). This gives
α = (rT(i) r(i)) / (rT(i) A r(i))
5. Steepest Descent
The algorithm above requires two matrix-vector multiplications per iteration. One can be eliminated by multiplying the last equation by A, which yields the residual update r(i+1) = r(i) − α A r(i).
5. Steepest Descent
This sequence is generated without any feedback from x(i). Therefore, floating-point roundoff errors may accumulate and the sequence could converge to a point near, but not exactly at, x. This effect can be avoided by periodically recomputing the correct residual from x(i).
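The steepest descent iteration with the cheap residual update and periodic recomputation can be sketched as follows (a NumPy sketch; the matrix, right-hand side and the recomputation interval of 50 are arbitrary example choices):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000, recompute_every=50):
    """Steepest descent for symmetric positive definite A,
    with periodic exact-residual recomputation."""
    x = x0.astype(float).copy()
    r = b - A @ x                      # residual r = b - A x
    for i in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r                     # the single matrix-vector product per step
        alpha = (r @ r) / (r @ Ar)     # step length from the line search
        x = x + alpha * r
        if (i + 1) % recompute_every == 0:
            r = b - A @ x              # correct accumulated roundoff
        else:
            r = r - alpha * Ar         # cheap residual update
    return x

# Arbitrary example system.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x = steepest_descent(A, b, np.zeros(2))
```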
6. Eigenvectors
v is an eigenvector of A if there exists a scalar λ such that Av = λv; λ is then called an eigenvalue.
A symmetric n x n matrix always has n independent eigenvectors, which can be chosen orthogonal. A positive definite matrix has only positive eigenvalues.
7. Convergence of SD
Convergence of SD requires the error e(i) to vanish. To measure e(i), we use the A-norm:
||e||A = (eTAe)^(1/2)
Some math now yields ||e(i+1)||A = ω ||e(i)||A, where the factor ω depends on e(i) and on the eigenvalues of A.
7. Convergence of SD
Spectral condition number: κ = λmax / λmin. An upper bound for ω is found by maximizing over all possible error directions:
ω ≤ (κ − 1) / (κ + 1)
We therefore have instant convergence (ω = 0) if all the eigenvalues of A are the same (κ = 1).
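The instant-convergence case is easy to verify numerically: when A is a multiple of the identity (all eigenvalues equal, κ = 1), one steepest descent step lands exactly on the solution. A small sketch with arbitrary example values:

```python
import numpy as np

# All eigenvalues coincide: A = 2 I, so kappa = 1.
A = 2.0 * np.eye(3)
b = np.array([1.0, 2.0, 3.0])
x = np.zeros(3)

# One steepest descent step.
r = b - A @ x
alpha = (r @ r) / (r @ A @ r)   # equals 1/2 here
x = x + alpha * r               # exact solution after a single step

kappa = np.linalg.cond(A)                 # spectral condition number = 1
omega_bound = (kappa - 1) / (kappa + 1)   # = 0: instant convergence
```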
7. Convergence of SD
[Figure: convergence per iteration for the four combinations of large/small condition number κ and large/small initial slope; convergence is worst when κ is large.]
8. Conjugate Directions
Steepest Descent often takes steps in the same direction as earlier steps. The solution is to take a set of A-orthogonal search directions d(0), d(1), ..., d(n−1) and take exactly one step of the right length in each direction.
8. Conjugate Directions
Two vectors d(i), d(j) are A-orthogonal (conjugate) if dT(i) A d(j) = 0. [Figure: pairs of A-orthogonal vectors vs. pairs of orthogonal vectors.]
8. Conjugate Directions
Demanding d(i) to be A-orthogonal to the next error e(i+1), we get
α(i) = (dT(i) r(i)) / (dT(i) A d(i))
Search directions can be generated by Gram-Schmidt conjugation. Problem: O(n^3) work, and all old directions must be kept in memory.
8. Conjugate Directions
CD chooses α(i) so that the A-norm of the error is minimized. The error term is therefore A-orthogonal to all the old search directions.
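Gram-Schmidt conjugation can be sketched as follows: each new direction is corrected against all previous ones, which is where the O(n^3) cost comes from (the starting vectors and the matrix are arbitrary example values):

```python
import numpy as np

def gram_schmidt_conjugate(U, A):
    """Turn linearly independent vectors u_0..u_{n-1} into A-orthogonal
    directions d_0..d_{n-1}. The inner loop over all previous directions
    is what makes this O(n^3) overall."""
    D = []
    for u in U:
        d = u.copy()
        for dj in D:
            # Subtract the A-projection onto each earlier direction.
            d = d - (u @ A @ dj) / (dj @ A @ dj) * dj
        D.append(d)
    return D

# Arbitrary example: conjugate the coordinate axes with respect to A.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
U = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
D = gram_schmidt_conjugate(U, A)
```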
9. Conjugate Gradient
The residual is orthogonal to the previous search directions.
9. Conjugate Gradient
Gram-Schmidt conjugation becomes easy, because r(i+1) is already A-orthogonal to all the previous search directions except d(i). r(i+1) is A-orthogonal to Di.
9. Conjugate Gradient
d(0) = r(0) = b − Ax(0)
α(i) = (rT(i) r(i)) / (dT(i) A d(i))
x(i+1) = x(i) + α(i) d(i)
r(i+1) = r(i) − α(i) A d(i)
β(i+1) = (rT(i+1) r(i+1)) / (rT(i) r(i))
d(i+1) = r(i+1) + β(i+1) d(i)
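The standard CG iteration can be sketched in a few lines of NumPy (the test system is an arbitrary example; in exact arithmetic CG converges in at most n iterations):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Conjugate Gradient for symmetric positive definite A."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                      # initial residual
    d = r.copy()                       # first search direction = residual
    max_iter = n if max_iter is None else max_iter
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ad = A @ d
        alpha = rs_old / (d @ Ad)      # step length alpha(i)
        x = x + alpha * d
        r = r - alpha * Ad             # residual update
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d  # beta(i+1) = rs_new / rs_old
        rs_old = rs_new
    return x

# Arbitrary example system.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x = conjugate_gradient(A, b)
```

Note that only one matrix-vector product (A @ d) is needed per iteration.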
11. Preconditioning
Improving the condition number of the matrix before the calculation. Example: solve M^(-1)Ax = M^(-1)b, where the preconditioner M approximates A but is easy to invert (e.g. M = diag(A), Jacobi preconditioning).
Attempt to stretch the quadratic form to make it more spherical. Many more sophisticated preconditioners have been developed and are nearly always used.
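The effect on the condition number can be illustrated with a simple Jacobi (diagonal) preconditioner; the badly scaled example matrix is an arbitrary assumption:

```python
import numpy as np

# An ill-conditioned SPD matrix (badly scaled diagonal).
A = np.array([[1000.0, 1.0],
              [1.0, 0.01]])

# Jacobi preconditioner: M = diag(A). The symmetrically preconditioned
# matrix is M^{-1/2} A M^{-1/2}, which stays symmetric positive definite.
M_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(A)))
A_pre = M_inv_sqrt @ A @ M_inv_sqrt

kappa_before = np.linalg.cond(A)      # very large
kappa_after = np.linalg.cond(A_pre)   # close to 1
```

In practice the preconditioned matrix is never formed explicitly; the solver only needs to apply M^(-1) to a vector in each iteration.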
12. Outlook
CG can also be used to solve systems where A is not symmetric, not positive definite, or not even square, by applying it to the normal equations ATAx = ATb.
To solve non-linear problems with CG, one has to make changes to the algorithm. There are several possibilities, and the best choice is still under research.
12. Outlook
In non-linear problems, there may be several local minima to which CG might converge.
It is therefore hard to determine the right step size.
12. Outlook
There are other algorithms in numerical linear algebra closely related to CG. They all use Krylov subspaces.