Lecture 9 Inexact Theories


Transcript of Lecture 9 Inexact Theories

Page 1: Lecture 9  Inexact Theories

Lecture 9

Inexact Theories

Page 2: Lecture 9  Inexact Theories

Syllabus

Lecture 01 Describing Inverse Problems
Lecture 02 Probability and Measurement Error, Part 1
Lecture 03 Probability and Measurement Error, Part 2
Lecture 04 The L2 Norm and Simple Least Squares
Lecture 05 A Priori Information and Weighted Least Squares
Lecture 06 Resolution and Generalized Inverses
Lecture 07 Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
Lecture 08 The Principle of Maximum Likelihood
Lecture 09 Inexact Theories
Lecture 10 Nonuniqueness and Localized Averages
Lecture 11 Vector Spaces and Singular Value Decomposition
Lecture 12 Equality and Inequality Constraints
Lecture 13 L1, L∞ Norm Problems and Linear Programming
Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15 Nonlinear Problems: Newton’s Method
Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17 Factor Analysis
Lecture 18 Varimax Factors, Empirical Orthogonal Functions
Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon’s Problem
Lecture 20 Linear Operators and Their Adjoints
Lecture 21 Fréchet Derivatives
Lecture 22 Exemplary Inverse Problems, incl. Filter Design
Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems

Page 3: Lecture 9  Inexact Theories

Purpose of the Lecture

Discuss how an inexact theory can be represented

Solve the inexact, linear Gaussian inverse problem

Use maximization of relative entropy as a guiding principle for solving inverse problems

Introduce the F-test as a way to determine whether one solution is “better” than another

Page 4: Lecture 9  Inexact Theories

Part 1

How Inexact Theories can be Represented

Page 5: Lecture 9  Inexact Theories

How do we generalize the case of

an exact theory

to one that is inexact?

Page 6: Lecture 9  Inexact Theories

[Figure: the exact theory case. Datum d versus model m, with the theory d = g(m) drawn as a curve; the observed data dobs, the a priori model map, the estimate mest, and the prediction dpre are marked.]

Page 7: Lecture 9  Inexact Theories

[Figure: the same plot of datum d versus model m as before, with the theory d = g(m) marked along with dobs, map, mest, and dpre.]

to make the theory inexact ... must make the theory probabilistic, or fuzzy

Page 8: Lecture 9  Inexact Theories

[Figure: three panels of datum d versus model m: the a priori p.d.f. (centered on map and dobs), the probabilistic theory, and their combination, whose peak gives mest and dpre.]

Page 9: Lecture 9  Inexact Theories

how do you combine

two probability density functions ?

Page 10: Lecture 9  Inexact Theories

how do you combine

two probability density functions ?

so that the information in them is combined ...

Page 11: Lecture 9  Inexact Theories

desirable properties

order shouldn’t matter

combining something with the null distribution should leave it unchanged

combination should be invariant under change of variables

Page 12: Lecture 9  Inexact Theories

Answer
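The formula on this slide did not survive transcription; judging from the product rule on Page 27 and the remark “(with pN ∝ constant)” on Page 14, the combination satisfying all three properties is presumably the product normalized by the null distribution pN, sketched here:

$$ p_{AB}(\mathbf{m}) \;\propto\; \frac{p_A(\mathbf{m})\, p_B(\mathbf{m})}{p_N(\mathbf{m})} $$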

Page 13: Lecture 9  Inexact Theories

[Figure: panels of datum d versus model m showing (D) the a priori p.d.f. pA, (E) the theory p.d.f. pg, and (F) the total p.d.f. pT; dobs, map, mest, and dpre are marked, and the peak of pT gives the solution (mest, dpre).]

Page 14: Lecture 9  Inexact Theories

“solution to inverse problem” = the maximum likelihood point of pT(m,d)

(with pN ∝ constant)

simultaneously gives mest and dpre

Page 15: Lecture 9  Inexact Theories

pT(m,d): the probability that the estimated model parameters are near m and the predicted data are near d

p(m) = ∫ pT(m,d) dd: the probability that the estimated model parameters are near m, irrespective of the value of the predicted data

Page 16: Lecture 9  Inexact Theories

conceptual problem

pT(m,d) and the marginal p(m) do not necessarily have maximum likelihood points at the same value of m

Page 17: Lecture 9  Inexact Theories

[Figure: two examples, each pairing the total p.d.f. in the (model m, datum d) plane, with dobs, dpre, map, and mest marked, with a plot of the marginal p.d.f. p(m) below it; the peak mest′ of p(m) need not coincide with mest.]

Page 18: Lecture 9  Inexact Theories

illustrates the problem in defining a definitive solution

to an inverse problem

Page 19: Lecture 9  Inexact Theories

illustrates the problem in defining a definitive solution

to an inverse problem

fortunately, if all distributions are Gaussian

the two points are the same

Page 20: Lecture 9  Inexact Theories

Part 2

Solution of the inexact linear Gaussian inverse problem

Page 21: Lecture 9  Inexact Theories

Gaussian a priori information

Page 22: Lecture 9  Inexact Theories

Gaussian a priori information

a priori values of the model parameters

and their uncertainty

Page 23: Lecture 9  Inexact Theories

Gaussian observations

Page 24: Lecture 9  Inexact Theories

Gaussian observations

observed data

measurement error

Page 25: Lecture 9  Inexact Theories

Gaussian theory

Page 26: Lecture 9  Inexact Theories

Gaussian theory

linear theory

uncertainty in the theory
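The three p.d.f.s on Pages 21–26 appear only as images; given the covariances named later ([cov m]A, [cov d]A), they presumably take the standard multivariate Gaussian forms, sketched here with [cov g] denoting the (assumed) covariance of the theory error:

$$ p_A(\mathbf{m}) \propto \exp\!\left\{-\tfrac{1}{2}(\mathbf{m}-\langle\mathbf{m}\rangle)^T [\operatorname{cov}\mathbf{m}]_A^{-1}(\mathbf{m}-\langle\mathbf{m}\rangle)\right\} $$

$$ p_A(\mathbf{d}) \propto \exp\!\left\{-\tfrac{1}{2}(\mathbf{d}-\mathbf{d}^{\mathrm{obs}})^T [\operatorname{cov}\mathbf{d}]_A^{-1}(\mathbf{d}-\mathbf{d}^{\mathrm{obs}})\right\} $$

$$ p_g(\mathbf{m},\mathbf{d}) \propto \exp\!\left\{-\tfrac{1}{2}(\mathbf{d}-\mathbf{G}\mathbf{m})^T [\operatorname{cov}\mathbf{g}]^{-1}(\mathbf{d}-\mathbf{G}\mathbf{m})\right\} $$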

Page 27: Lecture 9  Inexact Theories

mathematical statement of problem

find (m,d) that maximizes

pT(m,d) = pA(m) pA(d) pg(m,d)

and, along the way, work out the form of pT(m,d)

Page 28: Lecture 9  Inexact Theories

notational simplification

group m and d into single vector x = [dT, mT]T

group [cov m]A and [cov d]A into single matrix

write d-Gm=0 as Fx=0 with F=[I, –G]
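Put together (a sketch consolidating the three bullets; the block-diagonal arrangement is implied by x = [dT, mT]T):

$$ \mathbf{x}=\begin{bmatrix}\mathbf{d}\\\mathbf{m}\end{bmatrix},\qquad [\operatorname{cov}\mathbf{x}]=\begin{bmatrix}[\operatorname{cov}\mathbf{d}]_A & \mathbf{0}\\ \mathbf{0} & [\operatorname{cov}\mathbf{m}]_A\end{bmatrix},\qquad \mathbf{F}\mathbf{x}=\begin{bmatrix}\mathbf{I} & -\mathbf{G}\end{bmatrix}\begin{bmatrix}\mathbf{d}\\\mathbf{m}\end{bmatrix}=\mathbf{d}-\mathbf{G}\mathbf{m}=\mathbf{0} $$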

Page 29: Lecture 9  Inexact Theories

after much algebra, we find pT(x) is a Gaussian distribution

with mean

and variance

Page 30: Lecture 9  Inexact Theories

after much algebra, we find pT(x) is a Gaussian distribution

with mean

and variance

(this is the solution to the inverse problem)
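The mean and variance themselves appear only as images; carrying out the Gaussian product under the assumptions above (with pN constant) gives the standard result, sketched here with x0 = [dobsT, ⟨m⟩T]T:

$$ \mathbf{x}^{*}=\mathbf{x}_0-[\operatorname{cov}\mathbf{x}]\,\mathbf{F}^T\!\left(\mathbf{F}\,[\operatorname{cov}\mathbf{x}]\,\mathbf{F}^T+[\operatorname{cov}\mathbf{g}]\right)^{-1}\mathbf{F}\,\mathbf{x}_0 $$

$$ [\operatorname{cov}\mathbf{x}^{*}]=\left([\operatorname{cov}\mathbf{x}]^{-1}+\mathbf{F}^T[\operatorname{cov}\mathbf{g}]^{-1}\mathbf{F}\right)^{-1} $$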

Page 31: Lecture 9  Inexact Theories

after pulling mest out of x*

Page 32: Lecture 9  Inexact Theories

after pulling mest out of x*

reminiscent of GT(GGT)-1 minimum length solution

Page 33: Lecture 9  Inexact Theories

after pulling mest out of x*

error in theory adds to error in data

Page 34: Lecture 9  Inexact Theories

after pulling mest out of x*

solution depends on the values of the prior information only to the extent that the model resolution matrix is different from an identity matrix

Page 35: Lecture 9  Inexact Theories

and after algebraic manipulation

which also equals

reminiscent of (GTG)-1 GT least squares solution
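The expressions on Pages 31–35 appear only as images; extracting the m rows of x* under the assumptions above yields the pair of equivalent forms sketched below, writing C = [cov d]A + [cov g] so that the theory error visibly adds to the data error:

$$ \mathbf{m}^{\mathrm{est}}=\langle\mathbf{m}\rangle+[\operatorname{cov}\mathbf{m}]_A\,\mathbf{G}^T\!\left(\mathbf{G}\,[\operatorname{cov}\mathbf{m}]_A\,\mathbf{G}^T+\mathbf{C}\right)^{-1}\!\left(\mathbf{d}^{\mathrm{obs}}-\mathbf{G}\langle\mathbf{m}\rangle\right) $$

$$ \mathbf{m}^{\mathrm{est}}=\langle\mathbf{m}\rangle+\left(\mathbf{G}^T\mathbf{C}^{-1}\mathbf{G}+[\operatorname{cov}\mathbf{m}]_A^{-1}\right)^{-1}\mathbf{G}^T\mathbf{C}^{-1}\!\left(\mathbf{d}^{\mathrm{obs}}-\mathbf{G}\langle\mathbf{m}\rangle\right) $$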

Page 36: Lecture 9  Inexact Theories

interesting aside

weighted least squares solution

is equal to the

weighted minimum length solution
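The equivalence rests on a standard push-through matrix identity, sketched here with We and Wm as generic (assumed) data and model weighting matrices:

$$ \left(\mathbf{G}^T\mathbf{W}_e\mathbf{G}+\mathbf{W}_m\right)^{-1}\mathbf{G}^T\mathbf{W}_e=\mathbf{W}_m^{-1}\mathbf{G}^T\!\left(\mathbf{G}\,\mathbf{W}_m^{-1}\mathbf{G}^T+\mathbf{W}_e^{-1}\right)^{-1} $$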

Page 37: Lecture 9  Inexact Theories

what did we learn?

for linear Gaussian inverse problem

inexactness of theory just adds to

inexactness of data

Page 38: Lecture 9  Inexact Theories

Part 3

Use maximization of relative entropy as a guiding principle for

solving inverse problems

Page 39: Lecture 9  Inexact Theories

from last lecture

Page 40: Lecture 9  Inexact Theories

assessing the information content in pA(m)

Do we know a little about m

or a lot about m?

Page 41: Lecture 9  Inexact Theories

Information Gain, S

-S is called the Relative Entropy
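The definition itself appears only as an image; the usual definition of the information gain of pA(m) relative to the null p.d.f. pN(m), consistent with -S being the relative entropy, is:

$$ S=\int p_A(\mathbf{m})\,\ln\frac{p_A(\mathbf{m})}{p_N(\mathbf{m})}\,d\mathbf{m} $$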

Page 42: Lecture 9  Inexact Theories

[Figure: (A) the null p.d.f. pN(m) and the a priori p.d.f. pA(m) plotted against m; (B) the information gain S(σA) as a function of the width σA of pA(m).]

Page 43: Lecture 9  Inexact Theories

Principle of Maximum Relative Entropy

or if you prefer

Principle of Minimum Information Gain

Page 44: Lecture 9  Inexact Theories

find the solution p.d.f. pT(m) that has the smallest possible new information relative to the a priori p.d.f. pA(m)

or, if you prefer

find the solution p.d.f. pT(m) that has the largest relative entropy relative to the a priori p.d.f. pA(m)

Page 45: Lecture 9  Inexact Theories
Page 46: Lecture 9  Inexact Theories

properly normalized p.d.f.

data is satisfied in the mean

or, the expected value of the error is zero
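Written out (a sketch of the two constraints, assuming the linear theory d = Gm of Part 2):

$$ \int p_T(\mathbf{m})\,d\mathbf{m}=1 \qquad\text{and}\qquad \mathbf{d}^{\mathrm{obs}}=\mathbf{G}\!\int\mathbf{m}\,p_T(\mathbf{m})\,d\mathbf{m} $$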

Page 47: Lecture 9  Inexact Theories

After minimization using the method of Lagrange multipliers

pT(m) is Gaussian with maximum likelihood point mest satisfying

Page 48: Lecture 9  Inexact Theories

After minimization using the method of Lagrange multipliers

pT(m) is Gaussian with maximum likelihood point mest satisfying

just the weighted minimum length solution
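For reference, the weighted minimum length solution being referred to (shown as an image on the slide; sketched here in the notation of Part 2):

$$ \mathbf{m}^{\mathrm{est}}=\langle\mathbf{m}\rangle+[\operatorname{cov}\mathbf{m}]_A\,\mathbf{G}^T\!\left(\mathbf{G}\,[\operatorname{cov}\mathbf{m}]_A\,\mathbf{G}^T\right)^{-1}\!\left(\mathbf{d}^{\mathrm{obs}}-\mathbf{G}\langle\mathbf{m}\rangle\right) $$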

Page 49: Lecture 9  Inexact Theories

What did we learn?

Only that the Principle of Maximum Entropy is yet another way of deriving the inverse problem solutions we are already familiar with

Page 50: Lecture 9  Inexact Theories

Part 4

The F-test as a way to determine whether one solution is

“better” than another

Page 51: Lecture 9  Inexact Theories

Common Scenario

two different theories

theory A: solution mestA, MA model parameters, prediction error EA

theory B: solution mestB, MB model parameters, prediction error EB

Page 52: Lecture 9  Inexact Theories

Suppose EB < EA. Is B really better than A?

Page 53: Lecture 9  Inexact Theories

What if B has many more model parameters than A

MB >> MA? Is it any surprise that B fits better?

Page 54: Lecture 9  Inexact Theories

Need to test against the Null Hypothesis:

the difference in error is due to random variation

Page 55: Lecture 9  Inexact Theories

suppose the error e has a Gaussian p.d.f., uncorrelated,

with uniform variance σd2

Page 56: Lecture 9  Inexact Theories

estimate variance
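The estimator itself appears only as an image; given the degrees of freedom ν = N-M quoted in the example on Page 62, it is presumably the usual estimate from the total prediction error E:

$$ \hat{\sigma}_d^{2}=\frac{E}{\nu},\qquad \nu=N-M $$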

Page 57: Lecture 9  Inexact Theories

want to know the probability density function of the ratio of the two estimated variances

Page 58: Lecture 9  Inexact Theories

actually, we’ll use the quantity

which is the same, as long as the two theories that we’re testing are applied to the same data
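The quantity itself appears only as an image; from the variance estimate above and the Fest computed on Page 63, it is presumably the ratio:

$$ F=\frac{E_A/\nu_A}{E_B/\nu_B} $$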

Page 59: Lecture 9  Inexact Theories

[Figure: the p.d.f. of the F statistic, p(F), for denominator degrees of freedom 2, 5, 25, and 50; each panel shows curves for numerator degrees of freedom N ranging from 2 to 50.]

p.d.f. of F is known

Page 60: Lecture 9  Inexact Theories

as are its mean and variance
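The expressions appear only as images; for an F distribution with (ν1, ν2) degrees of freedom, the standard moments are:

$$ E[F]=\frac{\nu_2}{\nu_2-2}\ \ (\nu_2>2),\qquad \operatorname{var}[F]=\frac{2\,\nu_2^{2}\,(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)}\ \ (\nu_2>4) $$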

Page 61: Lecture 9  Inexact Theories

example

same dataset fit with

a straight line and

a cubic polynomial

Page 62: Lecture 9  Inexact Theories

[Figure: the same data di(zi) fit two ways, plotted as d(z) versus z. (A) Linear fit, N-M = 9, E = 0.030. (B) Cubic fit, N-M = 7, E = 0.006.]

Page 63: Lecture 9  Inexact Theories

[Figure: the same linear and cubic fits as on the previous slide, now annotated with the estimated ratio Fest = 4.1 (7 and 9 degrees of freedom).]

Page 64: Lecture 9  Inexact Theories

probability that F > Fest (cubic fit seems better than linear fit)

by random chance alone

or F < 1/Fest (linear fit seems better than cubic fit)

by random chance alone

Page 65: Lecture 9  Inexact Theories

in MatLab:

P = 1 - (fcdf(Fobs,vA,vB)-fcdf(1/Fobs,vA,vB));
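A minimal sketch assembling the lecture’s numbers into that call (the degree-of-freedom ordering follows the snippet above and is an assumption; fcdf is from the MatLab Statistics Toolbox):

Fobs = 4.1;    % estimated variance ratio Fest from Page 63
vA = 9;        % degrees of freedom N-M of the linear fit (Page 62)
vB = 7;        % degrees of freedom N-M of the cubic fit (Page 62)
% probability that F > Fobs or F < 1/Fobs by random chance alone
P = 1 - (fcdf(Fobs,vA,vB) - fcdf(1/Fobs,vA,vB));   % about 0.06, i.e. 6%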

Page 66: Lecture 9  Inexact Theories

answer: 6%

The Null Hypothesis

that the difference is due to random variation

cannot be rejected with 95% confidence