Conjugate Gradient algorithm and the Conjugate Gradient method for Least Squares (CGLS)

E. Somersalo, Computational Methods in Inverse Problems (Mat–1.3626), Aalto SCI MS


Conjugate Gradient algorithm

• Need: A symmetric positive definite;
• Cost: 1 matrix-vector product per step;
• Storage: fixed, independent of the number of steps.

The CG method minimizes the A-norm of the error,

$$x_k = \arg\min_{x \in \mathcal{K}_k(A,b)} \|x - x_*\|_A^2,$$

where $x_*$ is the true solution and $\|z\|_A^2 = z^T A z$.


Krylov subspaces

The kth Krylov subspace associated with the matrix A and the vector b is

$$\mathcal{K}_k(A, b) = \mathrm{span}\{b, Ab, \ldots, A^{k-1}b\}.$$

Iterative methods that seek the solution in a Krylov subspace are called Krylov subspace iterative methods.
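As a concrete illustration (the function name and the small SPD test matrix are assumptions, not from the slides), the NumPy sketch below forms the vectors spanning $\mathcal{K}_k(A, b)$ explicitly; Krylov subspace methods such as CG never build this basis, they only need matrix-vector products with A.

```python
import numpy as np

def krylov_vectors(A, b, k):
    """Columns b, Ab, ..., A^(k-1) b spanning the kth Krylov subspace K_k(A, b)."""
    V = np.empty((b.size, k))
    v = b.astype(float)
    for j in range(k):
        V[:, j] = v
        v = A @ v                      # next power of A applied to b
    return V

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # small SPD example (assumption)
b = np.array([1.0, 2.0])
print(krylov_vectors(A, b, 2))           # vectors spanning K_2(A, b)
```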


At each step, minimize

$$\alpha \mapsto \|x_{k-1} + \alpha p_{k-1} - x_*\|_A^2.$$

Solution:

$$\alpha_k = \frac{\|r_{k-1}\|^2}{p_{k-1}^T A p_{k-1}}.$$

New update: $x_k = x_{k-1} + \alpha_k p_{k-1}$.

The first search direction is $p_0 = r_0 = b - Ax_0$.


Iteratively, each new search direction is chosen A-conjugate to the previous ones:

$$p_k^T A p_j = 0, \quad 0 \le j \le k-1.$$

It is found by writing

$$p_k = r_k + \beta_k p_{k-1}, \qquad r_k = b - Ax_k, \qquad \beta_k = \frac{\|r_k\|^2}{\|r_{k-1}\|^2}.$$


Algorithm (CG)

Initialize: $x_0 = 0$; $r_0 = b - Ax_0$; $p_0 = r_0$;

for $k = 1, 2, \ldots$ until stopping criterion is satisfied

    $\alpha_k = \|r_{k-1}\|^2 / (p_{k-1}^T A p_{k-1})$;
    $x_k = x_{k-1} + \alpha_k p_{k-1}$;
    $r_k = r_{k-1} - \alpha_k A p_{k-1}$;
    $\beta_k = \|r_k\|^2 / \|r_{k-1}\|^2$;
    $p_k = r_k + \beta_k p_{k-1}$;

end
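A minimal NumPy sketch of the CG algorithm above; the function name, iteration cap, and residual-based stopping test are illustrative assumptions (the slides leave the stopping criterion open).

```python
import numpy as np

def cg(A, b, max_iter=100, tol=1e-10):
    """CG for symmetric positive definite A: one matrix-vector product per step."""
    x = np.zeros_like(b, dtype=float)   # x_0 = 0
    r = b - A @ x                       # r_0 = b - A x_0
    p = r.copy()                        # p_0 = r_0
    for _ in range(max_iter):
        Ap = A @ p                      # the single product A p_{k-1}
        alpha = (r @ r) / (p @ Ap)      # alpha_k = ||r_{k-1}||^2 / (p_{k-1}^T A p_{k-1})
        x = x + alpha * p               # x_k = x_{k-1} + alpha_k p_{k-1}
        r_new = r - alpha * Ap          # r_k = r_{k-1} - alpha_k A p_{k-1}
        if np.linalg.norm(r_new) < tol: # simple stopping test (assumption)
            return x
        beta = (r_new @ r_new) / (r @ r)  # beta_k = ||r_k||^2 / ||r_{k-1}||^2
        p = r_new + beta * p              # p_k = r_k + beta_k p_{k-1}
        r = r_new
    return x
```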


CGLS Method

Conjugate Gradient method for Least Squares (CGLS)

• Need: A can be rectangular (non-square);
• Cost: 2 matrix-vector products (one with A, one with $A^T$) per step;
• Storage: fixed, independent of the number of steps.

Mathematically equivalent to applying CG to the normal equations

$$A^T A x = A^T b$$

without actually forming them.


CGLS minimization problem

The kth iterate solves the minimization problem

$$x_k = \arg\min_{x \in \mathcal{K}_k(A^TA,\, A^Tb)} \|b - Ax\|.$$

Equivalently, the kth iterate $x_k$ of the CGLS method (with $x_0 = 0$) is characterized by

$$\Phi(x_k) = \min_{x \in \mathcal{K}_k(A^TA,\, A^Tb)} \Phi(x), \qquad \text{where } \Phi(x) := \frac{1}{2}\, x^T A^T A x - x^T A^T b.$$


Determination of the minimizer

Perform sequential line searches along $A^TA$-conjugate directions

$$p_0, p_1, \ldots, p_{k-1}$$

that span $\mathcal{K}_k(A^TA, A^Tb)$.

Determine $x_k$ from $x_{k-1}$ and $p_{k-1}$ according to

$$x_k := x_{k-1} + \alpha_{k-1} p_{k-1},$$

where $\alpha_{k-1} \in \mathbb{R}$ solves

$$\min_{\alpha \in \mathbb{R}} \Phi(x_{k-1} + \alpha p_{k-1}).$$


Residual Error

Introduce the residual error associated with $x_k$:

$$r_k := A^T b - A^T A x_k.$$

Then set

$$p_k := r_k + \beta_{k-1} p_{k-1},$$

choosing $\beta_{k-1}$ so that $p_k$ is $A^TA$-conjugate to the previous search directions:

$$p_k^T A^T A p_j = 0, \quad 0 \le j \le k-1.$$


Discrepancy

The discrepancy associated with $x_k$ is

$$d_k = b - Ax_k.$$

Then $r_k = A^T d_k$.

It was shown by Hestenes and Stiefel that

$$\|d_{k+1}\| \le \|d_k\|, \qquad \|x_{k+1}\| \ge \|x_k\|.$$


Algorithm (CGLS)

Initialize: $x_0 := 0$; $d_0 = b$; $r_0 = A^T b$; $p_0 = r_0$; $t_0 = Ap_0$;

for $k = 1, 2, \ldots$ until stopping criterion is satisfied

    $\alpha_k = \|r_{k-1}\|^2 / \|t_{k-1}\|^2$;
    $x_k = x_{k-1} + \alpha_k p_{k-1}$;
    $d_k = d_{k-1} - \alpha_k t_{k-1}$;
    $r_k = A^T d_k$;
    $\beta_k = \|r_k\|^2 / \|r_{k-1}\|^2$;
    $p_k = r_k + \beta_k p_{k-1}$;
    $t_k = Ap_k$;

end
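A NumPy sketch following the CGLS recursion above, with two matrix-vector products per step (one with A, one with $A^T$) and fixed storage. The function name and iteration cap are assumptions, and returning the history of discrepancy norms is an addition made for use with a stopping rule later.

```python
import numpy as np

def cgls(A, b, max_iter=100):
    """CGLS: CG applied implicitly to the normal equations A^T A x = A^T b."""
    x = np.zeros(A.shape[1])
    d = b.astype(float)                 # d_0 = b
    r = A.T @ d                         # r_0 = A^T b
    p = r.copy()                        # p_0 = r_0
    t = A @ p                           # t_0 = A p_0
    d_norms = []                        # ||d_k||, non-increasing (Hestenes-Stiefel)
    for _ in range(max_iter):
        alpha = (r @ r) / (t @ t)       # alpha_k = ||r_{k-1}||^2 / ||t_{k-1}||^2
        x = x + alpha * p               # x_k = x_{k-1} + alpha_k p_{k-1}
        d = d - alpha * t               # d_k = d_{k-1} - alpha_k t_{k-1}
        d_norms.append(np.linalg.norm(d))
        r_new = A.T @ d                 # r_k = A^T d_k
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p            # p_k = r_k + beta_k p_{k-1}
        t = A @ p                       # t_k = A p_k
        r = r_new
    return x, d_norms
```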


Example: a Toy Problem

Take an invertible $2 \times 2$ matrix $A$ and noisy data

$$y_j = Ax_* + \varepsilon_j, \quad j = 1, 2.$$

Compute the preimages $x_j = A^{-1} y_j$, $j = 1, 2$.
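A tiny numerical sketch of the toy problem (the particular matrix, the noise size, and the random seed are assumptions): for a poorly conditioned A, the preimages of slightly perturbed data land far from $x_*$.

```python
import numpy as np

np.random.seed(0)
A = np.array([[1.0, 1.0],
              [1.0, 1.01]])            # invertible but poorly conditioned 2x2 matrix (assumption)
x_star = np.array([1.0, 1.0])
y_star = A @ x_star

for j in range(2):
    eps = 0.01 * np.random.randn(2)    # small perturbation of the data
    y_j = y_star + eps
    x_j = np.linalg.solve(A, y_j)      # preimage x_j = A^{-1} y_j
    print(x_j)                         # far from x_* = (1, 1): the noise is amplified
```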


[Figure: the points x*, x1, x2 in the model space and y*, y1, y2 in the data space, related by the maps A and A−1.]


Solution by iterative methods: semiconvergence.

Write $B = Ax_* + E$, $E \sim \mathcal{N}(0, \sigma^2 I)$, generate a sample of data vectors $b_1, b_2, \ldots, b_n$, and solve with CG; a numerical sketch follows below.
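One way to see semiconvergence numerically (a sketch; the Hilbert test matrix, noise level, and iteration count are assumptions): run CG on a noisy, ill-conditioned system and record the error $\|x_k - x_*\|$, which typically decreases for a few steps and then grows again as the iterates begin to fit the noise.

```python
import numpy as np

np.random.seed(0)
n = 20
i = np.arange(n)
A = 1.0 / (i[:, None] + i[None, :] + 1.0)     # Hilbert matrix: SPD, severely ill-conditioned
x_true = np.ones(n)
sigma = 1e-4
b = A @ x_true + sigma * np.random.randn(n)   # B = A x_* + E,  E ~ N(0, sigma^2 I)

# Plain CG iteration as on the algorithm slide, recording the error at every step
x = np.zeros(n)
r = b - A @ x
p = r.copy()
errors = []
for k in range(30):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new
    errors.append(np.linalg.norm(x - x_true))

# Semiconvergence: the error typically reaches a minimum early and grows afterwards
print(int(np.argmin(errors)) + 1, min(errors), errors[-1])
```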


When should one stop iterating?

Let

$$Ax = b_* + \varepsilon = Ax_* + \varepsilon = b,$$

with the approximate information $\|\varepsilon\| \approx \eta$, where $\eta > 0$ is known. Write

$$\|A(x - x_*)\| = \|\varepsilon\| \approx \eta.$$

Any solution satisfying

$$\|Ax - b\| \le \tau\eta$$

is reasonable (Morozov discrepancy principle).
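A minimal sketch of the stopping rule (the function name and the choice $\tau = 1.2$ are assumptions): for $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ in $\mathbb{R}^m$ one has $\mathbb{E}\|\varepsilon\|^2 = m\sigma^2$, so $\eta \approx \sqrt{m}\,\sigma$, and the iteration is stopped at the first k with $\|b - Ax_k\| \le \tau\eta$.

```python
import numpy as np

def morozov_stop(discrepancies, sigma, m, tau=1.2):
    """Return the first index k with ||b - A x_k|| <= tau * eta, where
    eta = sqrt(m) * sigma approximates ||noise|| for noise ~ N(0, sigma^2 I).
    `discrepancies` is the sequence ||d_k|| recorded during the CGLS iteration;
    by the Hestenes-Stiefel result it is non-increasing, so the first crossing
    is well defined."""
    eta = np.sqrt(m) * sigma
    for k, dk in enumerate(discrepancies):
        if dk <= tau * eta:
            return k
    return None   # the discrepancy level was never reached within the recorded iterations

# Hypothetical usage with the discrepancy history returned by the cgls sketch above:
# k_stop = morozov_stop(d_norms, sigma=0.05, m=len(b))
```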


Example: numerical differentiation

Let $f : [0,1] \to \mathbb{R}$ be a differentiable function with $f(0) = 0$.

Data = $f(t_j)$ + noise, where $t_j = j/n$, $j = 1, 2, \ldots, n$.

Problem: estimate $f'(t_j)$.

Direct numerical differentiation by, e.g., a finite difference formula does not work: the noise takes over.
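A small sketch of why direct differencing fails (the test function, grid size, and noise level are assumptions): dividing noise of size $\sigma$ by the step $1/n$ amplifies it by a factor of order n.

```python
import numpy as np

np.random.seed(1)
n = 100
t = np.arange(1, n + 1) / n
f = np.sin(2 * np.pi * t)                 # hypothetical test function with f(0) = 0
sigma = 0.05
y = f + sigma * np.random.randn(n)        # data = f(t_j) + noise

# Forward differences of the noisy data vs. the true derivative
d_noisy = np.diff(y) * n                  # (y_{j+1} - y_j) / (1/n)
d_true = 2 * np.pi * np.cos(2 * np.pi * t[:-1])
print(np.max(np.abs(d_noisy - d_true)))   # noise error of order sigma * n, comparable to f' itself
```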

    Where is the inverse problem?


Data

[Figure: the noisy data values, noise level 5%.]


Solution by finite difference

[Figure: derivative estimate computed from the noisy data by finite differences; the noise dominates the result.]


Formulation as an inverse problem

Denote $g(t) = f'(t)$. Then

$$f(t) = \int_0^t g(\tau)\,d\tau.$$

Linear model:

$$\text{Data} = y_j = f(t_j) + e_j = \int_0^{t_j} g(\tau)\,d\tau + e_j,$$

where $e_j$ is the noise.


Discretization

Write

$$\int_0^{t_j} g(\tau)\,d\tau \approx \frac{1}{n}\sum_{k=1}^{j} g(t_k).$$

By denoting $g(t_k) = x_k$,

$$y = Ax + e,$$

where

$$A = \frac{1}{n}\begin{pmatrix} 1 & & & \\ 1 & 1 & & \\ \vdots & & \ddots & \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$

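To tie the example together, a sketch of the full pipeline under illustrative assumptions (test function $f(t) = \sin 2\pi t$, noise $\sigma = 0.05$ in the spirit of the 5% noise level above, safety factor $\tau = 1.2$): build the quadrature matrix A, generate noisy data, and run the CGLS recursion until the Morozov criterion $\|b - Ax_k\| \le \tau\eta$ is met.

```python
import numpy as np

np.random.seed(2)
n = 100
t = np.arange(1, n + 1) / n
A = np.tril(np.ones((n, n))) / n              # A[j, k] = 1/n for k <= j (quadrature matrix)
f = np.sin(2 * np.pi * t)                     # hypothetical test function, f(0) = 0
g_true = 2 * np.pi * np.cos(2 * np.pi * t)    # the derivative we want to recover

sigma = 0.05
y = f + sigma * np.random.randn(n)            # noisy data y = A x + e with x_k = g(t_k)
eta = np.sqrt(n) * sigma                      # expected noise norm ||e||
tau = 1.2

# CGLS iteration, stopped by the Morozov discrepancy principle
x = np.zeros(n)
d = y.copy()
r = A.T @ d
p = r.copy()
s = A @ p
for k in range(200):
    alpha = (r @ r) / (s @ s)
    x = x + alpha * p
    d = d - alpha * s
    if np.linalg.norm(d) <= tau * eta:        # stop: discrepancy down at the noise level
        break
    r_new = A.T @ d
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    s = A @ p
    r = r_new

print(k + 1, np.linalg.norm(x - g_true) / np.linalg.norm(g_true))  # iterations used, relative error
```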