
Transcript of ENGINEERING COMPUTATION Lecture 5, Stephen Roberts, Michaelmas Term: Approximate representation of data and functions


Engineering Computation ECL5-1

ENGINEERING COMPUTATION Lecture 5

Stephen Roberts, Michaelmas Term

Approximate representation of data and functions

Topics covered in this lecture:

1. Fitting to data by polynomial regression (approximation)
2. Function approximation by least squares approximation (including orthonormal bases: Chebyshev and Legendre)
3. Awareness of other fitting methods (rational polynomials, splines, FFT). This is not officially on the syllabus.

Engineering Computation ECL5-2

Lecture Course Contents (8 lectures):

1. Linear simultaneous equations Ax = b: rank; nullity; kernels and echelon form.
2. Conditioning of simultaneous equations: ill-conditioning; vector norms; matrix norms; condition number.
3. Iterative solution of simultaneous equations Ax = b: Jacobi and Gauss-Seidel algorithms; convergence.
4. Computation of matrix eigenvalues and eigenvectors: power method; Rayleigh quotient; deflation of matrices.
5. Approximate representations of data: regression; overfitting; functional approximation; using orthonormal polynomials as a basis; Chebyshev polynomials; stable regression.
6. Computing derivatives and integrals: difference methods; trapezium method; Simpson's rule; Romberg integration.
7. Computing solutions of ordinary differential equations: Euler method; modified Euler; higher-order equations.
8. Computing solutions of partial differential equations: elliptic (e.g. Laplace's), parabolic (e.g. diffusion) and hyperbolic (e.g. wave) equations.


Engineering Computation ECL5-3

Fitting functions to data

In science, it is often convenient to represent data by fitted curves, both for compact storage and for later use. The question is how we estimate a "best" fitting curve to a dataset, one that can then represent the data for other analysis purposes (visualisation, say, or quantification).

Consider the calibration of a noisy, non-linear pressure transducer. The squares in the plot below show the calibration measurements, which are noisy! As such, they are not much use for converting transducer readings in volts into pressure in bar.

However, a fitted curve can represent the calibration by an equation, in this case:

P = 0.5 + 10.0 V − 2.0 V^2

The useful information in the 11 calibration points is then better represented by the vector of equation coefficients, [0.5, 10.0, −2.0]^T.

[Figure: calibration measurements (squares) and the fitted curve; x-axis: Transducer output (V), 0 to 1; y-axis: Pressure (bar), 0 to 9]

Engineering Computation ECL5-4

Polynomial Regression


Engineering Computation ECL5-5

Polynomial Regression

The mathematical process of fitting approximate functions to data is known as Regression. You have probably fitted straight lines to data before: this is Linear Regression. We will now extend this concept to fitting higher-order polynomials: Polynomial Regression. We wish to fit a polynomial of degree M,

f(x) = α_0 + α_1 x + α_2 x^2 + … + α_M x^M = Σ_{m=0}^{M} α_m x^m

to a data set of N readings, (x_1, y_1), (x_2, y_2), …, (x_N, y_N), choosing a set of polynomial coefficients (α_0, α_1, α_2, …, α_M) so as to minimise the N errors e_n = y_n − f(x_n).

Engineering Computation ECL5-6

Different cases: the process is called

Interpolation when M + 1 ≥ N. The polynomial has at least as many coefficients as there are data points, so the interpolating polynomial goes through all the data points and all the errors can be reduced to zero.

Approximate curve fitting when M + 1 < N. The polynomial has fewer coefficients than there are data points, so it does not in general go through all the data points, and the errors cannot be reduced to zero (except in the unlikely case of the data points lying exactly on a polynomial of degree M!).

We consider only the curve-approximating case here (see lecture 8 for interpolation).


Engineering Computation ECL5-7

Let us use vector-matrix notation to express the approximation problem concisely:

Polynomial coefficient vector α = [α_0, α_1, α_2, …, α_M]^T.

Variables x = [x_1, x_2, …, x_N]^T, y = [y_1, y_2, …, y_N]^T.

Vandermonde matrix

V = [ 1  x_1  x_1^2  …  x_1^M ]
    [ 1  x_2  x_2^2  …  x_2^M ]
    [ …  …    …      …  …     ]
    [ 1  x_N  x_N^2  …  x_N^M ]

with V_{n,m} = (x_n)^m.

Let the fitted points y1 be the points on the polynomial evaluated at the points x:

y1 = Vα (check this by multiplying it out!).

Then the question is how we minimise the error vector e = y − Vα to find the best values of α.

Engineering Computation ECL5-8

Choice of Error Metric

We need to define a suitable scalar measure of the error vector e. We could choose to minimise:

1. Maximum error, max_i |e_i|. This is sometimes referred to as the minimax problem, but it is difficult to solve. Further, it places too much weight on a small amount of poor data (the outliers).

2. Absolute deviation error, Σ_i |e_i|. This is again not an easy functional to minimise. It averages the errors but does not give much weight to points that are considerably out of line with the rest.

3. Least squares error, Σ_i (e_i)^2. This is the easiest error measure to use. It is a compromise between the above two: it gives some weight to points that are out of line with the rest (more than (2), but it is not so biased as (1)). This is the metric used in the method described next.
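For a given candidate fit, the three measures are trivial to compute in MATLAB; a minimal sketch (e is the error vector e = y − Vα defined above):

e = y - V*alpha;       % error vector for a candidate coefficient vector alpha
m1 = max(abs(e));      % 1. maximum error (the minimax objective)
m2 = sum(abs(e));      % 2. absolute deviation error
m3 = sum(e.^2);        % 3. least squares error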


Engineering Computation ECL5-9

Pseudo-inverse method for finding α

In this method we minimise the 2-norm of the error vector, ‖e‖_2 (a least squares fit):

‖e‖_2^2 = ‖y − Vα‖_2^2 = Σ_{n=1}^{N} ( y_n − Σ_{m=0}^{M} V_{n,m} α_m )^2

To minimise this, differentiate ‖e‖_2^2 with respect to each α_k, k = 0, 1, …, M, and equate to zero:

0 = ∂‖e‖_2^2 / ∂α_k = −2 Σ_{n=1}^{N} ( y_n − Σ_{m=0}^{M} V_{n,m} α_m ) V_{n,k}

Rearrange as

Σ_{n=1}^{N} Σ_{m=0}^{M} V_{n,k} V_{n,m} α_m = Σ_{n=1}^{N} V_{n,k} y_n,  or  V^T V α = V^T y.

Solving this gives α = ((V^T V)^{-1} V^T) y. The matrix (V^T V)^{-1} V^T is known as the Pseudo-inverse of V.
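In MATLAB the normal equations can be solved directly; a minimal sketch (using the small data set from the example on the next slide):

x = [0 1 2 3]'; y = [0 2 1 3]'; M = 1;   % example data
V = zeros(length(x), M+1);               % Vandermonde matrix
for i = 0:M
    V(:,i+1) = x.^i;
end
alpha = (V'*V) \ (V'*y);                 % solve V'V alpha = V'y

In practice alpha = V\y is preferable: MATLAB then solves the least squares problem by QR factorisation, which avoids squaring the condition number of V.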

Engineering Computation ECL5-10

Example of linear regression (M = 1)

Fit a linear function to the data (0, 0), (1, 2), (2,1), (3,3) (N = 4).

x = [0 1 2 3]^T, y = [0 2 1 3]^T, so the Vandermonde matrix is

V = [ 1  0 ]
    [ 1  1 ]
    [ 1  2 ]
    [ 1  3 ]

With a little algebra,

(V^T V)^{-1} V^T = [  0.7   0.4  0.1  −0.2 ]
                   [ −0.3  −0.1  0.1   0.3 ]

and thus

α = ((V^T V)^{-1} V^T) y = [ 0.3 ]
                           [ 0.8 ]

The fitted line is then y = 0.3 + 0.8x.


Engineering Computation ECL5-11

[Figure: the four data points and the fitted line y = 0.3 + 0.8x, over 0 ≤ x ≤ 3]

The fitted line looks reasonable – or does it? In practice, check with cross-validation data.

Engineering Computation ECL5-12

A further warning

In the above line fitting, the unstated assumption is that the x-values are known exactly and the y-values are in error. If we take the other assumption, that the y-values are known exactly and the x-values are in error, we get a different fit:


Engineering Computation ECL5-13

[Figure: the same four data points with two fitted lines: regression on y assuming x is accurate, and regression on x assuming y is accurate]

Engineering Computation ECL5-14

Example of Polynomial Fit

Let us try a real polynomial fit. First let us construct some data, with a little noise added:

x = [0:0.1:1]';
y = 3*x + sin(1.5*pi*x) + 0.5*rand(length(x),1);
plot(x,y,'*')

Now write a function to carry out a polynomial fit:


Engineering Computation ECL5-15

function regress1(x,y,M,x1)
% Fits and plots an Mth-order polynomial to the data (x,y).
% Plots the fitted polynomial at the points x1.
V = zeros(length(x), M+1);   % preallocate Vandermonde matrix
for i = 0:M
    V(:,i+1) = x.^i;         % column i+1 holds x^i
end
VI = inv(V'*V)*V';           % pseudo-inverse matrix
alpha = VI*y;                % compute polynomial coefficients
V1 = zeros(length(x1), M+1); % Vandermonde matrix for x1
for i = 0:M
    V1(:,i+1) = x1.^i;
end
f1 = V1*alpha;               % evaluate polynomial at each x1
plot(x,y,'*',x1,f1);         % plot data and fitted polynomial

Engineering Computation ECL5-16

[Figure: data points and fitted polynomials for M = 1, 2, 3, over 0 ≤ x ≤ 1]

Running this on our data, and plotting at the x values x1 = [0:0.01:1]; gives a fit that gets better as M = 1, 2, 3.
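For reference, a sketch of the calls used to produce plots like these (assuming x, y and regress1 from the previous slides):

x1 = [0:0.01:1]';           % fine grid for plotting the fitted polynomial
for M = 1:3
    figure;
    regress1(x, y, M, x1);  % fit and plot for each order in turn
end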


Engineering Computation ECL5-17

[Figure, left: data and fitted polynomials for M = 5, 8 and 12, over 0 ≤ x ≤ 1. Figure, right: the 8th-order polynomial extrapolated over −0.2 ≤ x ≤ 1.2, where it diverges (the y-axis runs from −35 to 5)]

The fifth-order polynomial still looks reasonable, but the 8th is looking rather "ripply" at the ends, and the 12th-order one no longer fits! MATLAB warns that the pseudo-inverse matrix is close to singular or badly scaled, and that results may be inaccurate. Life gets even worse when we try to use the polynomials to extrapolate beyond the limits of the original data, as the 8th-order polynomial does above right. In practice, high-order polynomial fits tend to diverge wildly outside the source data and do not give a good approximation to the original function.

Engineering Computation ECL5-18

Ill-conditioning

Since the polynomial coefficients came from the solution of the simultaneous linear equation set V^T V α = V^T y, the ill-conditioning can be examined by evaluating the condition number κ(V^T V) using the MATLAB command cond(V'*V). For our example, this gives condition numbers which grow wildly:

m = 1   condition number = 16.1631
m = 2   condition number = 379.754
m = 3   condition number = 10119.7
m = 4   condition number = 298450
m = 5   condition number = 9.84817e+006
m = 6   condition number = 3.73265e+008
m = 7   condition number = 1.6942e+010
m = 8   condition number = 9.84243e+011
m = 9   condition number = 8.25222e+013
m = 10  condition number = 1.37408e+016
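A sketch of the loop that produces such a table (assuming the x data from the earlier example; cond is the built-in condition number estimate):

for M = 1:10
    V = zeros(length(x), M+1);
    for i = 0:M
        V(:,i+1) = x.^i;
    end
    fprintf('m = %d  condition number = %g\n', M, cond(V'*V));
end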


Engineering Computation ECL5-19

Recommendations for polynomial fitting:

1. Use the lowest-order polynomial that gives a reasonable approximation to the data. Do not overfit the data, or the fitted polynomial will follow every bit of the noise on the data.

2. Realise that the fit may be bad near the ends of the data.

3. Do not use the polynomial approximation outside the range of the input data, i.e. do not extrapolate!

Engineering Computation ECL5-20

MATLAB routines for polynomial fits

MATLAB has good routines, polyfit and polyval, to fit and evaluate polynomial approximations. But note that polyfit stores the coefficients in the opposite order to the vector alpha used above, i.e. with the coefficient of the highest power x^M first. You may need to use fliplr (or flipud, for a column vector) to get them in the desired order.
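For example, a short sketch of the equivalent fit using the built-in routines (assuming x, y and x1 from the earlier example):

p = polyfit(x, y, 3);    % row vector, coefficient of the highest power first
f1 = polyval(p, x1);     % evaluate the fitted cubic at the points x1
alpha = fliplr(p)';      % reorder to [a0; a1; a2; a3], as used above
plot(x, y, '*', x1, f1);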


Engineering Computation ECL5-21

Approximation of Functions

Engineering Computation ECL5-22

Approximation of Functions

We now turn to a second important topic in approximation theory: deriving a simple polynomial function f(x), fitted to an original function g(x), that can be used to approximate the original function. This is particularly useful when the original function is difficult to differentiate or integrate. To introduce this topic we need some definitions.


Engineering Computation ECL5-23

Functional Norms

We need a norm: the error in the fit, e(x) = f(x) − g(x), can be quantified as an L2 norm:

‖e‖_2 = ( ∫_0^1 ( f(x) − g(x) )^2 dx )^{1/2}

In fitting polynomials to functions, minimising this norm gives a least squares fit. [Note that we have integrated over the interval from 0 to 1; this is OK, since any interval can be scaled into 0 to 1.]

Engineering Computation ECL5-24

Inner Product

We need an equivalent of the vector scalar product x^T y (or x·y) for continuous functions. The Inner Product is

⟨f, g⟩ = ∫_0^1 f(x) g(x) dx.
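Numerically, the inner product can be approximated with MATLAB's adaptive quadrature; a minimal sketch (integral is the built-in routine):

inner = @(f, g) integral(@(x) f(x).*g(x), 0, 1);  % <f,g> on [0,1]
inner(@(x) x, @(x) x.^2)                          % <x, x^2> = 1/4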


Engineering Computation ECL5-25

Formula for Polynomial Coefficients

It is simple to see that the L2 norm can be expressed in terms of the inner product:

‖e‖_2^2 = ⟨e, e⟩

To find the least squares fit for a polynomial

f(x) = α_0 + α_1 x + α_2 x^2 + … + α_M x^M = Σ_{m=0}^{M} α_m x^m

we need to solve the equations

∂‖e‖_2^2 / ∂α_m = 0,  m = 0, …, M

for the polynomial coefficients α = [α_0, α_1, α_2, …, α_M]^T.

Engineering Computation ECL5-26

By differentiating (the product rule applied to the inner product),

∂‖e‖_2^2 / ∂α_m = ∂⟨e, e⟩/∂α_m = 2 ⟨e, ∂e/∂α_m⟩.

Now,

∂e/∂α_m = ∂(f − g)/∂α_m = ∂f/∂α_m = x^m.

So the equations become

⟨e, x^m⟩ = 0,  or  ⟨f, x^m⟩ = ⟨g, x^m⟩,  for each m = 0, 1, …, M.

Expressing the polynomial f in terms of the coefficients α_m, we get, in matrix-vector form,

Aα = b,  where A_{m,n} = ⟨x^m, x^n⟩ and b_m = ⟨g, x^m⟩.

These are the Normal Equations for the function-fitting problem.


Engineering Computation ECL5-27

But

A_{m,n} = ⟨x^m, x^n⟩ = ∫_0^1 x^{m+n} dx = 1/(m + n + 1),

so A is a special matrix called the Hilbert Matrix…

Engineering Computation ECL5-28

The Infamous Hilbert Matrix (see lecture 2 notes)

When fitting polynomials to data in a later lecture, we will come across the Hilbert Matrix, with elements

H_{ij} = 1/(i + j − 1),

e.g.

H_4 = [ 1    1/2  1/3  1/4 ]
      [ 1/2  1/3  1/4  1/5 ]
      [ 1/3  1/4  1/5  1/6 ]
      [ 1/4  1/5  1/6  1/7 ]
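MATLAB has this matrix built in, so the claim is easy to check; a one-line sketch (hilb and cond are built-ins):

H4 = hilb(4);   % the 4-by-4 Hilbert matrix shown above
cond(H4)        % approx 1.5514e+04, matching the table on the next slide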


Engineering Computation ECL5-29

Run it:

» hilbert(10)
k2(H(1))  = 1.000000
k2(H(2))  = 19.281470
k2(H(3))  = 524.056778
k2(H(4))  = 15513.738739
k2(H(5))  = 476607.250243
k2(H(6))  = 14951058.641724
k2(H(7))  = 475367356.277700
k2(H(8))  = 15257575253.665688
k2(H(9))  = 493153214118.786620
k2(H(10)) = 16025336322027.105000
»

Hilbert matrices rapidly become ill-conditioned. In the particular application you will consider, this causes problems when fitting high-order polynomials to data. (End of reminder.)

Engineering Computation ECL5-30

But

A_{m,n} = ⟨x^m, x^n⟩ = ∫_0^1 x^{m+n} dx = 1/(m + n + 1)

is exactly the ill-conditioned Hilbert Matrix, so these equations are ill-conditioned for large M. The cure is to use an orthogonal basis rather than the simple polynomial basis we have used above, i.e. to approximate g(x) by a sum of orthogonal polynomials such as Legendre or Chebyshev polynomials. We discuss this further after an example of using the method…


Engineering Computation ECL5-31

Example

Find the least squares approximating polynomial of degree one for the function f(x) = e^x over the interval [0, 2].

The approximating polynomial is p(x) = a_1 x + a_0.

The normal equations in this case are

a_0 ∫_0^2 dx + a_1 ∫_0^2 x dx = ∫_0^2 e^x dx
a_0 ∫_0^2 x dx + a_1 ∫_0^2 x^2 dx = ∫_0^2 x e^x dx

Integrating out,

2 a_0 + 2 a_1 = e^2 − 1        (1)
2 a_0 + (8/3) a_1 = e^2 + 1    (2)

Subtracting these two equations, (2) − (1), gives a_1 = 3.0. Hence from equation (1),

a_0 = 0.5 (e^2 − 1 − 6) = 0.1945.

Hence the solution is p(x) = 3.0 x + 0.1945.
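A quick numerical check of this example (a sketch; the matrix entries are the integrals computed above):

A = [2, 2; 2, 8/3];               % [int 1, int x; int x, int x^2] over [0,2]
b = [exp(2) - 1; exp(2) + 1];     % [int e^x; int x e^x] over [0,2]
a = A \ b;                        % a(1) = a0, a(2) = a1
fprintf('p(x) = %.4f x + %.4f\n', a(2), a(1));   % p(x) = 3.0000 x + 0.1945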

Engineering Computation ECL5-32

Chebyshev Polynomials

Chebyshev polynomials form an orthogonal basis (with a suitable weighting, below) and are very useful for functional approximation. They are defined by

T_n(x) = cos( n cos^{-1} x )

Hence

T_0(x) = cos(0 · cos^{-1} x) = 1,
T_1(x) = cos(1 · cos^{-1} x) = x,  and
T_2(x) = cos(2 cos^{-1} x) = cos 2θ = 2 cos^2 θ − 1 = 2x^2 − 1,  where θ = cos^{-1} x.

A useful recurrence relationship:

T_{n+1}(x) = 2x T_n(x) − T_{n-1}(x).

Use it to prove that

T_3(x) = 4x^3 − 3x
T_4(x) = 8x^4 − 8x^2 + 1


Engineering Computation ECL5-33

First Four Chebyshev Polynomials

[Figure: T_0(x) to T_3(x) plotted over −1 ≤ x ≤ 1; all values lie in [−1, 1]]

Engineering Computation ECL5-34

Chebyshev polynomials are orthogonal on the range −1 ≤ x ≤ 1 if we include a weighting factor

w(x) = 1 / √(1 − x^2).

Then, you can easily show that the weighted inner product becomes

⟨T_m, T_n⟩ = ∫_{−1}^{1} T_m(x) T_n(x) / √(1 − x^2) dx
           = 0    for m ≠ n,
           = π    for m = n = 0,
           = π/2  for m = n ≠ 0.


Engineering Computation ECL5-35

Then, if we approximate our function g(x) by a Chebyshev series

f(x) = α_0 T_0(x) + α_1 T_1(x) + α_2 T_2(x) + … + α_M T_M(x) = Σ_{m=0}^{M} α_m T_m(x),

the inner product formula

α_n = (1/⟨T_n, T_n⟩) ∫_{−1}^{1} g(x) T_n(x) / √(1 − x^2) dx

can be used to get the coefficients α. It can be shown that the Chebyshev series thus obtained is very close to the best polynomial approximation, in the sense of minimising max |f(x) − g(x)| over −1 ≤ x ≤ 1. This can be a useful property in some fitting problems where you cannot tolerate a large deviation (e.g. in surface rendering/NC machining). See Burden and Faires, Ch. 8, for further discussion.

Engineering Computation ECL5-36

Example of Chebyshev series

Find the second order Chebyshev approximation for the semicircle g(x) = √(1 − x^2).

Solving

α_n = (1/⟨T_n, T_n⟩) ∫_{−1}^{1} √(1 − x^2) T_n(x) / √(1 − x^2) dx = (1/⟨T_n, T_n⟩) ∫_{−1}^{1} T_n(x) dx

gives the series

f(x) = 2/π − (4/(3π)) (2x^2 − 1).

A better approximation than the truncated Taylor series y = 1 − x^2/2.
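A numerical check of these coefficients (a sketch; with g(x) = √(1 − x^2) and w(x) = 1/√(1 − x^2), the weighted integrand g·T_n·w reduces to T_n):

a0 = (1/pi) * integral(@(x) ones(size(x)), -1, 1);   % = 2/pi
a1 = (2/pi) * integral(@(x) x,             -1, 1);   % = 0
a2 = (2/pi) * integral(@(x) 2*x.^2 - 1,    -1, 1);   % = -4/(3*pi)
fprintf('a0 = %.4f, a1 = %.4f, a2 = %.4f\n', a0, a1, a2);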


Engineering Computation ECL5-37

[Figure: Chebyshev approximation. The semicircle g(x), the Chebyshev approximation f(x) and the truncated Taylor series plotted over −1 ≤ x ≤ 1]

Engineering Computation ECL5-38

Another useful orthogonal basis: Legendre Polynomials

Legendre polynomials are orthogonal over [−1, 1] and have the weighting function w(x) = 1.

The first four polynomials are

p_0(x) = 1,  p_1(x) = x,  p_2(x) = x^2 − 1/3,  p_3(x) = x^3 − (3/5)x.

You apply them in the same way as described for Chebyshev polynomials above.
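A minimal sketch of doing this numerically for an example function (here g(x) = e^x, an assumption for illustration; integral is the built-in quadrature):

g = @(x) exp(x);                                 % example function to approximate
p = {@(x) ones(size(x)), @(x) x, ...
     @(x) x.^2 - 1/3, @(x) x.^3 - (3/5)*x};      % p0..p3 as above
alpha = zeros(4,1);
for n = 1:4
    num = integral(@(x) g(x).*p{n}(x), -1, 1);   % <g, p_n>  (weight w(x) = 1)
    den = integral(@(x) p{n}(x).^2,    -1, 1);   % <p_n, p_n>
    alpha(n) = num/den;
end
disp(alpha')   % coefficients of p0..p3 in the least squares fit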


Engineering Computation ECL5-39

Other Points

1. As with data fitting, in function approximation the advantages of using algebraic polynomials are that you can take derivatives and integrals of them easily, and also readily evaluate them at arbitrary values.

2. The main disadvantage is that they can oscillate. This problem can be overcome by using rational functions, of the form r(x) = p(x)/q(x), where p(x) and q(x) are polynomials. These give better (in the sense of less oscillatory) fits than polynomials for the same amount of computation (see Burden & Faires, p. 507). They are not considered on this course, however.

Engineering Computation ECL5-40

Other Points (2)

Another topic not on the syllabus, but one we must mention, is cubic splines (see Kreyszig, 7th Ed., pp. 949-). The name spline derives from an old draftsman's device: a flexible strip of wood which can be bent over pins placed at the knots to follow a smooth curve. Cubic-spline methods involve local interpolation between points (known as knots) in the x-y plane, to give a smooth curve through the data points. Normally cubic polynomials are fitted between the knots, with continuous first and second derivatives across the knots. At the free ends, the second derivatives are often set to zero. Splines are less susceptible to oscillation than polynomial interpolation, as disturbances do not propagate far.


Engineering Computation ECL5-41

Other Points

Here is an example comparing the MATLAB function spline (full curve) through 6 points with the 5th-order interpolating polynomial (with 6 points, the interpolating polynomial has degree 5):

[Figure: cubic spline and 5th-order polynomial through 6 data points, plotted over 0 ≤ x ≤ 6 with y from −3 to 1.5]
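A sketch of how such a comparison can be produced (the data values here are hypothetical; spline, polyfit and polyval are the built-ins discussed above):

x  = 0:5;                        % 6 knots (hypothetical example data)
y  = [0 1 -2 -1 0.5 1];          % hypothetical values at the knots
xx = 0:0.05:5;
ys = spline(x, y, xx);           % cubic spline through all 6 points
p  = polyfit(x, y, 5);           % the degree-5 interpolating polynomial
plot(x, y, 'o', xx, ys, '-', xx, polyval(p, xx), '--');
legend('data', 'cubic spline', '5th order polynomial');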