Numerical Analysis — an Introduction
Review
www.maths.lth.se/na/courses/FMN011
Carmen Arevalo
Textbook: Numerical Analysis, by Timothy Sauer.
Pearson Addison Wesley.
Numerisk Analys, Matematikcentrum, Lunds Universitet, 2013
Numerical Analysis — an Introduction Review – p. 1/48
Error analysis
◮ the absolute error is $E_p = |p - \hat{p}|$, where $\hat{p}$ approximates $p$
◮ the relative error is $R_p = |p - \hat{p}| / |p|$
◮ correct (significant) digits
◮ types of errors: truncation, round-off, noise
◮ loss of significant digits
◮ If $f(r) = 0$ and $x$ approximates $r$, the residual is $|f(x)|$ and the error is $|r - x|$. Desirable: small residual ⇒ small error
Bisection theorem (to solve f(x) = 0)
Suppose
◮ f is continuous in [a, b]
◮ f(r) = 0 for some r ∈ [a, b]
◮ f(a) and f(b) have opposite signs
If $\{c_n\}$ is the sequence produced by the bisection method, then
$$|r - c_n| \le \frac{b_n - a_n}{2} = \frac{b - a}{2^{n+1}},$$
so $\lim_{n \to \infty} c_n = r$.
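The bisection method behind this bound can be sketched as follows. This is a minimal illustration, not code from the course: the function name `bisect`, the tolerance, and the test function $x^3 + x - 1$ are assumptions chosen for the example.

```python
def bisect(f, a, b, tol=1e-10):
    """Approximate a root of f in [a, b]; f(a) and f(b) must differ in sign.

    Returns (c, n): the final midpoint and the number of halvings performed.
    """
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    n = 0
    while (b - a) / 2 > tol:          # (b - a)/2 bounds the midpoint error
        c = (a + b) / 2
        fc = f(c)
        if fc == 0:                   # landed exactly on a root
            return c, n
        if fa * fc < 0:               # sign change in [a, c]
            b, fb = c, fc
        else:                         # sign change in [c, b]
            a, fa = c, fc
        n += 1
    return (a + b) / 2, n


# Hypothetical example: the real root of x^3 + x - 1 in [0, 1].
root, steps = bisect(lambda x: x**3 + x - 1, 0.0, 1.0)
```

Each pass halves the bracketing interval, so the number of steps needed for a given tolerance is known in advance from the bound above.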
Fixed Point Iteration (to solve f(x) = 0)
Rewrite f(x) = 0 as x = g(x)
r is a fixed point of the function g if r = g(r)
Theorem
g : [a, b] → R has a unique fixed point if:
◮ g is continuous on [a, b]
◮ g : [a, b] → [a, b] (assures existence)
◮ |g′(x)| < 1 for all x ∈ [a, b] (assures uniqueness)
A fixed point iteration has the form $p_{k+1} = g(p_k)$
◮ $|g'(r)| \le K < 1 \Rightarrow \{p_n\} \to r$
◮ $|g'(r)| > 1 \Rightarrow \{p_n\} \not\to r$
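A small sketch of the iteration $p_{k+1} = g(p_k)$, assuming the example $g(x) = \cos x$ on $[0, 1]$ (there $|g'(x)| = |\sin x| < 1$). The helper name `fixed_point` and the stopping rule are illustrative choices, not part of the slides.

```python
import math


def fixed_point(g, p0, tol=1e-12, max_iter=500):
    """Iterate p <- g(p) until two successive iterates differ by less than tol."""
    p = p0
    for k in range(1, max_iter + 1):
        p_next = g(p)
        if abs(p_next - p) < tol:
            return p_next, k
        p = p_next
    raise RuntimeError("fixed point iteration did not converge")


# Hypothetical example: solve x = cos(x), starting from 0.5.
p, iterations = fixed_point(math.cos, 0.5)
```

Convergence here is linear, with the error shrinking by a factor of roughly $|g'(r)|$ per step.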
Newton-Raphson Method (to solve f(x) = 0)
To solve $f(x) = 0$ with quadratic convergence:
$$p_{k+1} = p_k - \frac{f(p_k)}{f'(p_k)}$$
Multiple roots: linear convergence. Modified Newton's method for a root of multiplicity $m$ (quadratic convergence):
$$p_{k+1} = p_k - \frac{m f(p_k)}{f'(p_k)}$$
Secant method: order of convergence ≈ 1.6
$$p_k = p_{k-1} - \frac{f(p_{k-1})(p_{k-1} - p_{k-2})}{f(p_{k-1}) - f(p_{k-2})}$$
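The Newton and secant updates can be compared on one equation. The sketch below assumes the example $f(x) = x^3 + x - 1$; the function names, tolerances and starting values are illustrative, not taken from the slides.

```python
def newton(f, df, p0, tol=1e-12, max_iter=50):
    """Newton-Raphson: p <- p - f(p)/f'(p)."""
    p = p0
    for _ in range(max_iter):
        step = f(p) / df(p)
        p -= step
        if abs(step) < tol:
            return p
    raise RuntimeError("Newton iteration did not converge")


def secant(f, p0, p1, tol=1e-12, max_iter=50):
    """Secant method: Newton's update with f' replaced by a difference quotient."""
    for _ in range(max_iter):
        f0, f1 = f(p0), f(p1)
        if f1 == f0:
            raise ZeroDivisionError("flat secant line")
        p2 = p1 - f1 * (p1 - p0) / (f1 - f0)
        if abs(p2 - p1) < tol:
            return p2
        p0, p1 = p1, p2
    raise RuntimeError("secant iteration did not converge")


# Hypothetical example function and its derivative.
f = lambda x: x**3 + x - 1
df = lambda x: 3 * x**2 + 1

newton_root = newton(f, df, 1.0)
secant_root = secant(f, 0.0, 1.0)
```

The secant method needs no derivative, at the price of a lower order of convergence.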
Newton’s Method for Systems
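If $f(x) = [f_1(x), \ldots, f_n(x)]^T$,
Newton's method has the form
$$p_{k+1} = p_k - J_f(p_k)^{-1} f(p_k)$$
where $J_f(x)$ is the Jacobian matrix of $f$,
$$J_f(x) = \left[ \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right]$$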
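For a system of two equations the update can be written out explicitly. The sketch below assumes a hypothetical system ($x^2 + y^2 = 1$, $y = x$) and solves the $2 \times 2$ linear system $J s = f$ by Cramer's rule instead of forming $J^{-1}$; the names are illustrative only.

```python
def newton_system_2d(F, J, p, tol=1e-12, max_iter=50):
    """Newton's method for two equations in two unknowns.

    F(x, y) returns (f1, f2); J(x, y) returns the Jacobian ((a, b), (c, d)).
    """
    x, y = p
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        if det == 0:
            raise ZeroDivisionError("singular Jacobian")
        # Solve J s = F(p) by Cramer's rule, then set p <- p - s.
        sx = (f1 * d - b * f2) / det
        sy = (a * f2 - c * f1) / det
        x, y = x - sx, y - sy
        if max(abs(sx), abs(sy)) < tol:
            return x, y
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical example: intersect the unit circle with the line y = x.
F = lambda x, y: (x * x + y * y - 1.0, y - x)
J = lambda x, y: ((2.0 * x, 2.0 * y), (-1.0, 1.0))

sol_x, sol_y = newton_system_2d(F, J, (1.0, 0.5))
```

For larger systems one would solve $J_f(p_k)\, s = f(p_k)$ by Gaussian elimination at each step rather than inverting the Jacobian.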
Solving a system of equations, Ax = b
Equivalent systems have the same solution
Elementary operations on rows that yield an equivalent system:
◮ Row interchanges
◮ Multiplication by a nonzero constant
◮ $\text{row}_r \leftarrow \text{row}_r - m_{rp} \times \text{row}_p$
To solve a system:
1. Perform a Gaussian elimination (to obtain an upper triangular matrix)
2. Perform a back substitution
Solving Triangular Linear Systems
Upper triangular matrix: back substitution
Lower triangular matrix: forward substitution
Computational complexity: total number of operations = $N^2$
Triangular factorization, A = LU
Ax = b
1. Solve Ly = b with forward substitution to get y
2. Use y in Ux = y and solve with back substitution to get x
Computational complexity:
Total number of operations = $\frac{2N^3}{3} - \frac{N^2}{2} - \frac{N}{6}$
Vector and matrix norms
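◮ 1-norm: $\|x\|_1 = \sum_{i=1}^{n} |x_i|$
◮ $\|A\|_1 = \max_j \sum_{i=1}^{n} |a_{ij}|$
◮ 2-norm: $\|x\|_2 = \sqrt{\sum_{i=1}^{n} |x_i|^2}$
◮ $\|A\|_2 = \sqrt{\rho(A^T A)}$
◮ ∞-norm: $\|x\|_\infty = \max_i |x_i|$
◮ $\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}|$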
Ill conditioning and pivoting
Ax = b is ill conditioned if small perturbations in the coefficients of A or b produce large changes in x
$\kappa_p(A) = \|A\|_p \cdot \|A^{-1}\|_p$
If $\kappa(A) \approx 10^k$, about k significant digits will be lost in solving Ax = b.
Partial pivoting: choose as pivot the entry of largest magnitude in the column
LU factorization with pivoting
Permutation matrix: $P^{-1} = P^T$ (the rows of P are a permutation of the rows of I).
If A is nonsingular, there is a P such that PA = LU
Ax = b ⇒ LUx = Pb
1. Compute L, U and P
2. Compute Pb
3. Solve Ly = Pb with forward substitution
4. Solve Ux = y with backward substitution
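The four steps above can be sketched in plain Python. This is an illustrative implementation under simple assumptions (dense lists of floats, no attempt at efficiency); the names `lu_pivot` and `lu_solve` and the sample system are not from the slides. The permutation is stored as an index list `perm` rather than as a matrix.

```python
def lu_pivot(A):
    """Factor PA = LU with partial pivoting. Returns (L, U, perm)."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for p in range(n):
        # Partial pivoting: pick the largest magnitude entry in column p.
        r = max(range(p, n), key=lambda i: abs(U[i][p]))
        if U[r][p] == 0:
            raise ValueError("matrix is singular")
        U[p], U[r] = U[r], U[p]
        L[p], L[r] = L[r], L[p]          # swap multipliers stored so far
        perm[p], perm[r] = perm[r], perm[p]
        for i in range(p + 1, n):
            m = U[i][p] / U[p][p]        # multiplier m_ip
            L[i][p] = m
            for j in range(p, n):
                U[i][j] -= m * U[p][j]
    for i in range(n):
        L[i][i] = 1.0
    return L, U, perm


def lu_solve(L, U, perm, b):
    """Solve Ax = b given PA = LU: form Pb, then forward and back substitution."""
    n = len(b)
    pb = [b[i] for i in perm]
    y = [0.0] * n
    for i in range(n):                   # forward substitution, Ly = Pb
        y[i] = pb[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):         # back substitution, Ux = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


# Hypothetical example system with solution x = (-1, 2, 1).
A = [[2.0, 1.0, 5.0], [4.0, 4.0, -4.0], [1.0, 3.0, 1.0]]
b = [5.0, 0.0, 6.0]
L, U, perm = lu_pivot(A)
x = lu_solve(L, U, perm, b)
```

Once the factorization is available, each additional right-hand side costs only the two triangular solves.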
Iterative Methods for Linear Systems
Given x0, we construct the method
$x_{k+1} = B x_k + c$
so that a fixed point of $g(x) = Bx + c$ is a solution of Ax = b.
A = M − N with M nonsingular
$x_{k+1} = M^{-1} N x_k + M^{-1} b$
$x_0$ can be arbitrary; however, convergence will be faster if we start with a good guess of the solution.
Jacobi, Gauss-Seidel and SOR methods
Separate A into upper, diagonal and lower parts: A = L + D + U
◮ Jacobi: M = D
◮ Gauss-Seidel: M = L+D
◮ SOR: accelerates GS with parameter 1 ≤ ω < 2
A is strictly diagonally dominant if $|a_{kk}| > \sum_{j=1,\, j \ne k}^{N} |a_{kj}|$ for every row k
If A is strictly diagonally dominant, then these methods converge for any choice of $x_0$.
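A compact sketch of the Jacobi and Gauss-Seidel sweeps, written componentwise rather than with the matrices M and N. The function names, the fixed iteration count and the small strictly diagonally dominant test system are assumptions made for illustration.

```python
def jacobi(A, b, x0, iters):
    """Jacobi: every component is updated from the previous iterate."""
    n = len(b)
    x = x0[:]
    for _ in range(iters):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


def gauss_seidel(A, b, x0, iters):
    """Gauss-Seidel: each component uses the newest values already computed."""
    n = len(b)
    x = x0[:]
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


# Hypothetical strictly diagonally dominant system with solution (1, 2).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [5.0, 5.0]
x_jacobi = jacobi(A, b, [0.0, 0.0], 50)
x_gs = gauss_seidel(A, b, [0.0, 0.0], 50)
```

In practice one would stop on a residual or step-size test instead of a fixed number of sweeps.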
Convergence Theorems
Spectral radius of A: radius of the smallest circle centered at 0 in the complex plane that contains all eigenvalues of A
$\rho(A) = \max\{|\lambda| : \det(\lambda I - A) = 0\}$
Suppose we have an iterative method
$x_{k+1} = B x_k + c$
1. The iterative method converges for any $x_0$ if $\|B\|_p < 1$ for some p.
2. The iterative method converges for any $x_0$ if and only if $\rho(B) < 1$.
Interpolation
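y = f(x) interpolates $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ if $f(x_i) = y_i$ for each i = 1, 2, ..., n
Basis functions $\Phi_1, \Phi_2, \ldots, \Phi_n$: $f(x) = \sum_{j=1}^{n} y_j \Phi_j(x)$
To determine the coefficients $y_j$, solve
$$\begin{bmatrix} \Phi_1(x_1) & \Phi_2(x_1) & \cdots & \Phi_n(x_1) \\ \Phi_1(x_2) & \Phi_2(x_2) & \cdots & \Phi_n(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ \Phi_1(x_n) & \Phi_2(x_n) & \cdots & \Phi_n(x_n) \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_n) \end{bmatrix}$$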
Polynomial interpolation
Unique polynomial of degree at most n − 1 through n points with distinct x-coordinates
◮ Monomial: $\{1, x, x^2, \ldots, x^{n-1}\}$, Vandermonde matrix
◮ Lagrange: $L_j(x) = \dfrac{\prod_{k=1,\, k \ne j}^{n} (x - x_k)}{\prod_{k=1,\, k \ne j}^{n} (x_j - x_k)}$, identity matrix
◮ Newton: $\{1,\; x - x_1,\; \ldots,\; (x - x_1)(x - x_2) \cdots (x - x_{n-1})\}$, triangular matrix (table of divided differences)
◮ Bernstein: $B_i^n(t) = \binom{n}{i} (1 - t)^{n-i} t^i$, $t \in [0, 1]$
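As an illustration of the Newton basis, the divided-difference table can be computed in place and the polynomial evaluated by nested multiplication. The function names and the three sample points are assumptions for this sketch.

```python
def divided_differences(xs, ys):
    """Return the Newton coefficients c[0], ..., c[n-1] for the given points."""
    c = list(ys)
    n = len(xs)
    for level in range(1, n):
        # Work from the bottom up so lower-order entries are still available.
        for i in range(n - 1, level - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - level])
    return c


def newton_eval(xs, c, x):
    """Evaluate c0 + c1 (x - x1) + c2 (x - x1)(x - x2) + ... by nesting."""
    result = c[-1]
    for k in range(len(c) - 2, -1, -1):
        result = result * (x - xs[k]) + c[k]
    return result


# Hypothetical data: the parabola through (0, 1), (2, 2), (3, 4).
xs = [0.0, 2.0, 3.0]
ys = [1.0, 2.0, 4.0]
coeffs = divided_differences(xs, ys)      # [1.0, 0.5, 0.5]
value_at_1 = newton_eval(xs, coeffs, 1.0)
```

An advantage of this basis is that adding a new data point only appends one coefficient; the earlier ones are unchanged.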
Interpolation error and Chebyshev nodes
$$f(x) - P(x) = \frac{f^{(n)}(\theta)}{n!} (x - x_1)(x - x_2) \cdots (x - x_n)$$
where θ ∈ [x1, xn] is unknown.
Error is reduced by choosing $\{x_1, x_2, \ldots, x_n\}$ as the zeros of the Chebyshev polynomial of degree n.
These nodes minimize the maximum over [−1, 1] of $e(x) = |(x - x_1)(x - x_2) \cdots (x - x_n)|$, and the error e (not the points) is distributed evenly in [−1, 1].
To interpolate on [a, b], take the Chebyshev nodes on [−1, 1] and use the transformation
$$x = \frac{b + a}{2} + \frac{b - a}{2}\, t, \quad t \in [-1, 1],$$
to get the nodes on [a, b].
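A short sketch of the node computation, using the standard closed form $t_i = \cos\frac{(2i-1)\pi}{2n}$ for the zeros of the degree-n Chebyshev polynomial and the affine map above. The function name and the default interval are illustrative.

```python
import math


def chebyshev_nodes(n, a=-1.0, b=1.0):
    """Zeros of the degree-n Chebyshev polynomial, mapped from [-1, 1] to [a, b]."""
    return [
        (b + a) / 2 + (b - a) / 2 * math.cos((2 * i - 1) * math.pi / (2 * n))
        for i in range(1, n + 1)
    ]


nodes = chebyshev_nodes(4)                 # four nodes on [-1, 1]
nodes_on_0_2 = chebyshev_nodes(2, 0.0, 2.0)

# On [-1, 1] the node polynomial (x - x_1)...(x - x_n) has maximum modulus 2^(1-n).
node_poly_at_1 = math.prod(1.0 - t for t in nodes)
```

The nodes cluster towards the ends of the interval, which is what keeps the node polynomial uniformly small.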
Piecewise polynomials
Large number of data points: use low-degree polynomials over subintervals.
Partition: $a = x_1 < x_2 < x_3 < \cdots < x_n = b$
A different polynomial is used in each $[x_{i-1}, x_i]$
Splines: polynomial pieces joined together with certain smoothness conditions.
Cubic splines: 2 endpoint conditions to be imposed. Matrix is strictly diagonally dominant, so system has a unique solution.
Parametric curves
If $p \in \Pi_n([a, b])$, we can write it as a linear combination of Bernstein polynomials:
$$p(t) = \sum_{i=0}^{n} b_i B_i^n(t), \quad \text{where } B_i^n(t) = B_i^n\!\left(\frac{t - a}{b - a}\right)$$
The coefficients $b_i$ are called Bézier or control points.
Bézier curves
Given a set of control points $\{P_i = (x_i, y_i)\}_{i=1}^{n}$, a parametric Bézier curve is
$$X(t) = x_1 B_0^{n-1}(t) + \cdots + x_n B_{n-1}^{n-1}(t), \quad t \in [0, 1]$$
$$Y(t) = y_1 B_0^{n-1}(t) + \cdots + y_n B_{n-1}^{n-1}(t), \quad t \in [0, 1]$$
de Casteljau's algorithm: points on the curve are evaluated by successive linear interpolation.
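The successive linear interpolation can be sketched in a few lines. The function name and the four sample control points are assumptions for illustration; the routine works for any number of control points and any dimension.

```python
def de_casteljau(points, t):
    """Evaluate the Bézier curve with the given control points at parameter t."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        # Replace each pair of neighbours by the point (1 - t) P + t Q.
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


# Hypothetical cubic: four control points in the plane.
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
start = de_casteljau(control, 0.0)     # first control point
middle = de_casteljau(control, 0.5)    # (2.0, 1.5)
end = de_casteljau(control, 1.0)       # last control point
```

Because every intermediate point is a convex combination of its neighbours, the evaluation stays inside the convex hull of the control points.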
Properties of Cubic Bézier curves
◮ $P_1 = P(0)$ and $P_4 = P(1)$ lie on the Bézier curve
◮ P(t) is continuous and has derivatives of all orders
◮ $P'(0) = 3(P_2 - P_1)$ and $P'(1) = 3(P_4 - P_3)$
◮ The Bézier curve lies in the convex hull of its set of control points
For planar objects, the convex hull is the polygon formed by "an elastic band encompassing the given object".
Composite Bézier curves: to make the curves meet smoothly, place the meeting point and the two control points next to it on a common line.
Least Squares Fitting
m data points (m equations), n unknown parameters (m > n)
1. Choose model (with unknown parameters x)
2. Substitute data into model (construct system Ax = b)
3. Solve normal equations ($A^T A x = A^T b$)
x is the least squares solution of the inconsistent system Ax = b.
The least squares solution minimizes $\|b - Ax\|_2$.
r = b − Ax is the residual vector of the least squares solution.
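The three steps can be traced on a straight-line model $y = c_0 + c_1 t$. The data, the helper names and the use of Cramer's rule for the resulting $2 \times 2$ system are assumptions made for this sketch.

```python
def normal_equations(A, b):
    """Form A^T A and A^T b for an m x n matrix A given as a list of rows."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return AtA, Atb


def solve_2x2(M, r):
    """Solve a 2 x 2 system by Cramer's rule (enough for a two-parameter model)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]


# Hypothetical data: four points, model y = c0 + c1 * t.
t = [-1.0, 0.0, 1.0, 2.0]
y = [1.0, 0.0, 0.0, -2.0]
A = [[1.0, ti] for ti in t]            # step 2: substitute data into the model
AtA, Atb = normal_equations(A, y)      # step 3: normal equations
c0, c1 = solve_2x2(AtA, Atb)           # c0 = 0.2, c1 = -0.9 up to rounding
residual = [yi - (c0 + c1 * ti) for ti, yi in zip(t, y)]
```

The residual vector is orthogonal to the columns of A, which is exactly what the normal equations express.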
Periodic data
If g has period P, take as model a trigonometric polynomial of order M
$$T_M(x) = a_0 + \sum_{j=1}^{M} \left( a_j \cos\frac{2\pi j x}{P} + b_j \sin\frac{2\pi j x}{P} \right)$$
For even functions (f(−x) = f(x)): $b_j = 0$.
For odd functions (f(−x) = −f(x)): $a_j = 0$.
Model linearization
Model linearization (e.g., $y = c e^{kt}$):
◮ Linearize ($\ln y = \ln c + kt$)
◮ Substitute ($Y = \ln y$, $C = \ln c$) to get linear equation ($Y = kt + C$)
◮ Solve normal equations to get parameters (C and k)
◮ Convert to original parameters ($c = e^C$)
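The four steps above can be sketched for the exponential model. The function name and the synthetic data (generated from $c = 2$, $k = 0.5$) are assumptions for illustration; the $2 \times 2$ normal equations of the line fit are solved in closed form.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = c * exp(k t) by least squares on the linearized model ln y = C + k t."""
    Y = [math.log(v) for v in ys]                 # linearize and substitute
    m = len(ts)
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sY = sum(Y)
    stY = sum(t * v for t, v in zip(ts, Y))
    # Normal equations: [[m, st], [st, stt]] [C, k]^T = [sY, stY]^T
    det = m * stt - st * st
    C = (sY * stt - st * stY) / det
    k = (m * stY - st * sY) / det
    return math.exp(C), k                         # convert back: c = e^C


# Hypothetical noise-free data generated from c = 2, k = 0.5.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]
c, k = fit_exponential(ts, ys)
```

Note that this minimizes the error in $\ln y$, not in $y$ itself, so with noisy data the result generally differs from a direct nonlinear least squares fit.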
Gram-Schmidt Orthogonalization
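Orthogonalize the set $\{v_1, v_2, \ldots, v_k\}$:
1. $y_1 = v_1$, $q_1 = \dfrac{v_1}{\|v_1\|_2}$.
2. $y_2 = v_2 - q_1 (q_1^T v_2)$, $q_2 = \dfrac{y_2}{\|y_2\|_2}$.
3. ⋯
4. $y_i = v_i - q_1 (q_1^T v_i) - \cdots - q_{i-1} (q_{i-1}^T v_i)$, $q_i = \dfrac{y_i}{\|y_i\|_2}$.
Note that $\mathrm{proj}_{q_j} v_i = q_j (q_j^T v_i)$ and $q_j \perp q_i$ for $j \ne i$
Complete the orthonormal basis by adding vectors $q_{k+1}, \ldots, q_n$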
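A direct transcription of the classical Gram-Schmidt steps into plain Python. The function name and the two sample vectors are assumptions for this sketch; it does not complete the basis with extra vectors.

```python
import math


def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return orthonormal q_1, ..., q_k spanning the inputs."""
    qs = []
    for v in vectors:
        y = list(v)
        for q in qs:
            coeff = sum(qi * vi for qi, vi in zip(q, v))    # q^T v
            y = [yi - coeff * qi for yi, qi in zip(y, q)]   # subtract projection
        norm = math.sqrt(sum(yi * yi for yi in y))
        if norm == 0:
            raise ValueError("input vectors are linearly dependent")
        qs.append([yi / norm for yi in y])
    return qs


# Hypothetical example vectors in R^3.
q1, q2 = gram_schmidt([[1.0, 2.0, 2.0], [-4.0, 3.0, 2.0]])
dot_q1_q2 = sum(a * b for a, b in zip(q1, q2))
```

Projecting the running vector `y` instead of the original `v` gives the modified Gram-Schmidt variant, which tends to lose less orthogonality in floating point.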
Least Squares by QR-factorization
Given the n × k overdetermined system Ax = b, find A = QR and set
◮ $\hat{R}$ = upper k × k submatrix of R
◮ $\hat{d}$ = upper k elements of $d = Q^T b$
Solve $\hat{R} x = \hat{d}$ for least squares solution x.
The least squares solution minimizes $\|b - Ax\|_2 = \|b - QRx\|_2 = \|Q^T b - Rx\|_2$
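QR-factorization with Householder Reflectors
◮ $x_1$ is the first column of A
◮ $w_1 = \pm(\|x_1\|_2, 0, \ldots, 0)^T$
◮ $v_1 = w_1 - x_1$; $P = v_1 v_1^T / v_1^T v_1$
◮ $H_1 = I - 2P$; $H_1 A = \begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{bmatrix}$
◮ $x_2$ is the second column of the submatrix starting at the second row
Repeat the process with submatrices to get
$A = H_1 H_2 H_3 R = QR$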
Gram-Schmidt vs Householder
Number of operations:
◮ Gram-Schmidt: $k^3$
◮ Householder: $\frac{2}{3} k^3$
Householder has lower memory requirements and less error amplification
With Gram-Schmidt the orthogonality property of Q might be lost because of possible cancellation in a computation like
$y_3 = v_3 - q_1 (q_1^T v_3) - q_2 (q_2^T v_3)$
Some Properties of Eigenvalues and Eigenvectors
◮ If u is an eigenvector, then ku is one too (for any k ≠ 0).
◮ The eigenvalue corresponding to u is the Rayleigh quotient, $\lambda = \dfrac{u^T A u}{u^T u}$
◮ λ eigenvalue of A ⇒ $\lambda^{-1}$ eigenvalue of $A^{-1}$ (same eigenvector)
◮ λ eigenvalue of A ⇒ λ − s eigenvalue of A − sI (same eigenvector)
◮ $(\lambda - s)^{-1}$ eigenvalue of $(A - sI)^{-1}$ (same eigenvector)
◮ If $A = S^{-1} B S$, then A and B have the same eigenvalues (but not the same eigenvectors)
The Power Method
Computing the dominant eigenvalue/eigenvector
Suppose:
• The eigenvectors of A form a basis
• A has unique $\lambda_1$ of maximum modulus
Start with $x_0$ and define
$y_{k-1} = \dfrac{x_{k-1}}{\|x_{k-1}\|_2}$
$x_k = A y_{k-1}$
$\lambda_k = y_{k-1}^T x_k$
Speed of convergence is linear, and governed by $|\lambda_2 / \lambda_1|$
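The three update formulas translate almost line by line into code. The sketch below assumes a small example matrix with eigenvalues 4 and −1; the function name, starting vector and fixed iteration count are illustrative choices.

```python
import math


def power_method(A, x0, iters=100):
    """Power iteration: returns (eigenvalue estimate, unit eigenvector estimate)."""
    n = len(A)
    x = list(x0)
    lam, y = 0.0, x
    for _ in range(iters):
        norm = math.sqrt(sum(xi * xi for xi in x))
        y = [xi / norm for xi in x]                                    # normalize
        x = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]  # x = A y
        lam = sum(yi * xi for yi, xi in zip(y, x))                     # y^T A y
    return lam, y


# Hypothetical example: eigenvalues 4 and -1, dominant eigenvector along (1, 1).
A = [[1.0, 3.0], [2.0, 2.0]]
lam, vec = power_method(A, [-5.0, 5.0])
```

With $|\lambda_2 / \lambda_1| = 1/4$ in this example, the error shrinks by about a factor of four per iteration.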
The Shifted Inverse Power Method
To find the eigenvalue nearest to s:
Start with x0
Set B = A− sI
Set $y_{k-1} = x_{k-1} / \|x_{k-1}\|_2$
Solve $B x_k = y_{k-1}$
Set $\eta_k = x_k^T y_{k-1}$
$\lambda = \dfrac{1}{\eta} + s$
QR Algorithm
$A_0 \equiv A = Q_1 R_1$
$A_1 \equiv R_1 Q_1 = Q_2 R_2$
$A_2 \equiv R_2 Q_2 = Q_3 R_3$
$A_3 \equiv R_3 Q_3 = Q_4 R_4$
⋮
If A is symmetric with $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_m|$, it converges linearly to a diagonal matrix containing the eigenvalues of A, and $Q_1 \cdots Q_j$ converges to a matrix whose columns are the corresponding eigenvectors of A.
Modified QR algorithm for nonsymmetric A: converges to an upper triangular matrix
Singular Values and Singular Vectors
Eigenvalues of $A^T A$ are $\lambda_1 = s_1^2 \ge \lambda_2 = s_2^2 \ge \cdots \ge \lambda_n = s_n^2 \ge 0$
with orthonormal eigenvectors $v_1, \ldots, v_n$.
Take $s_i \ge 0$. Define $u_i$, i = 1, ..., m:
◮ If $s_i \ne 0$, $u_i = A v_i / s_i$
◮ If $s_i = 0$, $u_i$ is any unit vector orthogonal to $u_1, \ldots, u_{i-1}$.
◮ $\{v_1, \ldots, v_n\}$ are the right singular vectors
◮ $\{u_1, \ldots, u_m\}$ are the left singular vectors
◮ $A v_i = s_i u_i$, with $s_1 \ge \cdots \ge s_n \ge 0$ (the $s_i$ are the singular values)
Singular Value Decomposition
$A = U S V^T$
◮ SVD of symmetric matrices: $s_i = |\lambda_i|$, the $v_i$ are the corresponding unit eigenvectors of A, and the $u_i$ are
• $v_i$ if $\lambda_i \ge 0$
• $-v_i$ if $\lambda_i < 0$
◮ rank(A) = rank(S) = number of nonzero elements of S
◮ $|\det(A)| = s_1 \cdots s_n$
◮ $A^{-1} = V S^{-1} U^T$
SVD and low-rank approximation, compression
Low rank approximation:
$$A = \sum_{i=1}^{\mathrm{rank}(A)} s_i u_i v_i^T$$
The best least squares approximation to A of rank p ≤ r, where r = rank(A), is provided by retaining the first p terms of the sum
If A is an n × n matrix, it contains $n^2$ entries, but each term in the sum requires only 2n + 1 numbers
If the first singular value is much larger than the rest, most of the information is captured by the first term.
Fourier matrix
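The DFT of $x = [x_0, \ldots, x_{n-1}]^T$ is
$$\frac{1}{\sqrt{n}} \begin{bmatrix} \omega^0 & \omega^0 & \omega^0 & \cdots & \omega^0 \\ \omega^0 & \omega^1 & \omega^2 & \cdots & \omega^{n-1} \\ \omega^0 & \omega^2 & \omega^4 & \cdots & \omega^{2(n-1)} \\ \omega^0 & \omega^3 & \omega^6 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ \omega^0 & \omega^{n-1} & \omega^{2(n-1)} & \cdots & \omega^{(n-1)^2} \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{n-1} \end{bmatrix}$$
where $\omega = e^{-i 2\pi / n}$.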
Discrete Fourier Transform
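$F_n x = y$, where
$$y_k = \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} x_j \omega^{jk}$$
$F_n^{-1} = \overline{F_n}$
Unitary matrix: $F^{-1} = \overline{F}^T$
Orthogonal (real) ↔ Unitary (complex)
If $x \in \mathbb{R}^n$, then $y_0 \in \mathbb{R}$ and $y_{n-k} = \overline{y_k}$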
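A direct $O(n^2)$ evaluation of these formulas, using the $1/\sqrt{n}$ normalization and $\omega = e^{-i2\pi/n}$. The function names `dft` and `idft` and the sample vector are assumptions for this sketch; the inverse simply uses the complex conjugate of ω.

```python
import cmath
import math


def dft(x):
    """Unitary DFT: y_k = (1/sqrt(n)) * sum_j x_j * omega^(j k)."""
    n = len(x)
    w = cmath.exp(-2j * math.pi / n)
    return [sum(x[j] * w ** (j * k) for j in range(n)) / math.sqrt(n)
            for k in range(n)]


def idft(y):
    """Inverse DFT: the same sum with the conjugate of omega."""
    n = len(y)
    w = cmath.exp(2j * math.pi / n)
    return [sum(y[k] * w ** (j * k) for k in range(n)) / math.sqrt(n)
            for j in range(n)]


# Hypothetical real input vector; its DFT is approximately [0, 1, 0, 1].
x = [1.0, 0.0, -1.0, 0.0]
y = dft(x)
x_back = idft(y)
```

For real input the output shows the conjugate symmetry stated above, so only about half of the coefficients carry independent information.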
Fast Fourier Transform
Algorithm for computing the DFT: at each stage it transforms the vector into 2 half-length vectors.
For $n = 2^N$, the computational complexity is $O(n \log_2 n)$.
For n prime it is $O(n^2)$.
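One common way to realize the halving is the recursive radix-2 (Cooley-Tukey) scheme sketched below, assuming the length is a power of 2 and the same $1/\sqrt{n}$ normalization and sign convention as the DFT above. The function names are illustrative, and the slides do not say which FFT variant the course uses.

```python
import cmath
import math


def fft_unscaled(x):
    """Recursive radix-2 FFT without normalization; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of 2")
    even = fft_unscaled(x[0::2])      # transform of even-indexed entries
    odd = fft_unscaled(x[1::2])       # transform of odd-indexed entries
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft(x):
    """FFT scaled by 1/sqrt(n), matching the unitary DFT convention."""
    scale = math.sqrt(len(x))
    return [v / scale for v in fft_unscaled(x)]


# Hypothetical input; the result is approximately [0, 1, 0, 1].
spectrum = fft([1.0, 0.0, -1.0, 0.0])
```

Each level of the recursion does O(n) work and there are $\log_2 n$ levels, which gives the $n \log_2 n$ count.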
DFT interpolation
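Given $x_0, x_1, \ldots, x_{n-1}$, let $t_j = c + j(d - c)/n$, j = 0, 1, ..., n − 1. Then
$$Q(t) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} y_k e^{i 2\pi k (t - c)/(d - c)},$$
where $y = F_n x$, satisfies $Q(t_j) = x_j$ for j = 0, ..., n − 1.
If $x \in \mathbb{R}^n$ and $y_k = a_k + i b_k$, then
$$Q(t) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} \left( a_k \cos\frac{2\pi k (t - c)}{d - c} - b_k \sin\frac{2\pi k (t - c)}{d - c} \right)$$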
Evaluation of trigonometric functions
To plot the interpolating trigonometric function, we can invert the expanded DFT. The steps are the following:
1. Calculate the DFT of the evenly spaced data points: $x \to F_n x$
2. Multiply by $\sqrt{p/n}$: $F_n x \to \sqrt{p/n}\, F_n x$
3. Expand the n points to p points: add zeros in positions n/2 + 1 to p − n/2
4. Invert: $\sqrt{p/n}\, F_n x \to F_p^{-1} \sqrt{p/n}\, F_n x$.
Orthogonal Function Interpolation
If
$$A = \begin{bmatrix} f_0(t_0) & f_0(t_1) & \cdots & f_0(t_{n-1}) \\ f_1(t_0) & f_1(t_1) & \cdots & f_1(t_{n-1}) \\ \vdots & \vdots & & \vdots \\ f_{n-1}(t_0) & f_{n-1}(t_1) & \cdots & f_{n-1}(t_{n-1}) \end{bmatrix}$$
is a real orthogonal matrix, then a function that interpolates the points $(t_j, x_j)$ is
$$F(t) = \sum_{k=0}^{n-1} y_k f_k(t),$$
where y = Ax.
Least squares with DFT
Let $\{t_0 = c, t_1, \ldots, t_{n-1} = c + (n - 1)(d - c)/n\}$ be the n (even) equally spaced points on [c, d] and suppose we want to have only the m < n functions $\{f_0(t), f_1(t), \ldots, f_{m-1}(t)\}$, where m is even.
The normal equations are
$c = A_m x$ (no solving, just a matrix-vector product!)
and the least squares approximation using the first m basis functions is
$$F_m(t) = \sum_{k=0}^{m-1} y_k f_k(t)$$
Applications: filtering for audio compression or noise removal
Discrete cosine transform
y = Cx
C is a real orthogonal matrix and consists only of cosines.
Like the DFT, the DCT transforms n data points into n interpolation coefficients.
As for the DFT, the choice of m < n coefficients $y_0, \ldots, y_{m-1}$ gives a least-squares approximation.
2D-DCT in image processing
$Y = C X C^T$
$X = C^T Y C$
Image compression
Crude compression: replace each k × k pixel block by its average value
DCT compression:
1. take the 2D-DCT for each k × k matrix block,
2. do a least-squares approximation,
3. apply the inverse 2D-DCT.
Quantization
Quantization (mod q): $\mathrm{round}(y/q)$
Dequantization: $\bar{y} = q \cdot \mathrm{round}(y/q)$
With quantization matrix: $Y_Q = [\mathrm{round}(y_{kl}/q_{kl})]$
The larger $q_{kl}$, the greater the loss and the greater the compression.
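Entrywise quantization with a quantization matrix can be sketched as follows. The function names and the small $2 \times 2$ block and matrix are hypothetical values for illustration, not a JPEG table. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from other rounding conventions.

```python
def quantize(Y, Q):
    """Entrywise quantization: round(y_kl / q_kl)."""
    return [[round(y / q) for y, q in zip(y_row, q_row)]
            for y_row, q_row in zip(Y, Q)]


def dequantize(Z, Q):
    """Entrywise dequantization: q_kl * z_kl."""
    return [[z * q for z, q in zip(z_row, q_row)]
            for z_row, q_row in zip(Z, Q)]


# Hypothetical block of transform coefficients and quantization matrix.
Y = [[101.0, -37.0], [13.0, 4.0]]
Q = [[8, 16], [16, 32]]
YQ = quantize(Y, Q)            # [[13, -2], [1, 0]]
Y_back = dequantize(YQ, Q)     # [[104, -32], [16, 0]]
```

The reconstruction error in each entry is at most $q_{kl}/2$, and larger $q_{kl}$ drive more entries to zero, which is where the compression comes from.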
Huffman coding
Shannon information
$$I = -\sum_{i=1}^{k} p_i \log_2 p_i$$
Huffman tree
Assign shorter codes to symbols with higher probabilities.
From bottom up, join symbols with smallest probabilities.
Assign a 0 to left branches, 1 to right branches.
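The bottom-up construction can be sketched with a priority queue. The function names, the tie-breaking rule and the four-symbol alphabet are assumptions for illustration; here the first of the two nodes removed from the queue is treated as the left branch.

```python
import heapq
import math


def shannon_information(probs):
    """I = -sum p_i log2 p_i, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_codes(probs):
    """Build a Huffman code from a dict symbol -> probability."""
    # Heap entries: (probability, tie-breaker, partial code table of the subtree).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)     # smallest probability
        p1, _, right = heapq.heappop(heap)    # second smallest
        merged = {s: "0" + c for s, c in left.items()}          # left branch: 0
        merged.update({s: "1" + c for s, c in right.items()})   # right branch: 1
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical source with four symbols.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
codes = huffman_codes(probs)
average_length = sum(probs[s] * len(codes[s]) for s in probs)
information = shannon_information(probs.values())
```

For this dyadic distribution the average code length equals the Shannon information; in general it lies within one bit above it.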
Huffman coding for JPEG
The code for $y_{00}$ (DC component) has two parts: the first is obtained from the DPCM tree, and the second from the integer identifying table. The DC coefficient is a binary string formed by the concatenation of these two parts.
AC components are coded in a run-length pair (n, L), where n is the length of a run of zeros and L is the length of the next nonzero entry. Then a Huffman AC tree is used to code these pairs. After that comes the integer identifying code.