Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.
-
Upload
winifred-doyle -
Category
Documents
-
view
234 -
download
0
Transcript of Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.
![Page 1: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/1.jpg)
Environmental Data Analysis with MatLab
Lecture 4:Multivariate Distributions
![Page 2: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/2.jpg)
Lecture 01 Using MatLabLecture 02 Looking At DataLecture 03 Probability and Measurement Error Lecture 04 Multivariate DistributionsLecture 05 Linear ModelsLecture 06 The Principle of Least SquaresLecture 07 Prior InformationLecture 08 Solving Generalized Least Squares Problems Lecture 09 Fourier SeriesLecture 10 Complex Fourier SeriesLecture 11 Lessons Learned from the Fourier TransformLecture 12 Power SpectraLecture 13 Filter Theory Lecture 14 Applications of Filters Lecture 15 Factor Analysis Lecture 16 Orthogonal functions Lecture 17 Covariance and AutocorrelationLecture 18 Cross-correlationLecture 19 Smoothing, Correlation and SpectraLecture 20 Coherence; Tapering and Spectral Analysis Lecture 21 InterpolationLecture 22 Hypothesis testing Lecture 23 Hypothesis Testing continued; F-TestsLecture 24 Confidence Limits of Spectra, Bootstraps
SYLLABUS
![Page 3: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/3.jpg)
purpose of the lecture
understanding propagation of error
from many data
to several inferences
![Page 4: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/4.jpg)
probability
with several variables
![Page 5: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/5.jpg)
example
100 birds live on an island
30 tan pigeons
20 white pigeons
10 tan gulls
40 white gulls
treat the species and color of the birds as random variables
![Page 6: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/6.jpg)
tan, t white, w
pigeon, p 30% 20%
gull, g 10% 40%
color, c
spec
ies,
sJoint Probability, P(s,c)
probability that a bird has a species, s, and a color, c.
![Page 7: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/7.jpg)
probabilities must add up to 100%
![Page 8: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/8.jpg)
tan, t white, w
pigeon, p 30% 20%
gull, g 10% 40%
color, c
spec
ies,
s pigeon, p 50%
gull, g 50%
P(s,c) P(s)
sum rows
tan, t white, w
40% 60%
sum columns
P(c)
spec
ies,
s
color, c
Univariate probabilities can be calculated by summing the rows and columns of P(s,c)
probability of species, irrespective of color
probability of color,
irrespective of species
![Page 9: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/9.jpg)
probability of species, irrespective of color
probability of color, irrespective of species
![Page 10: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/10.jpg)
tan, t white, w
pigeon, p 30% 20%
gull, g 10% 40%
color, c
spec
ies,
s
P(s,c)divide
by row sums
divide by column sums
tan, t white, w
pigeon, p 60% 40%
gull, g 20% 80%
color, c
spec
ies,
s
P(c|s)
tan, t white, w
pigeon, p 75% 33%
gull, g 25% 67%
color, c
spec
ies,
s
P(s|c)
Conditional probabilities: probability of one thing, given that you know another
probability of color, given
species
probability of species, given color
probability of color and species
![Page 11: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/11.jpg)
calculation of conditional probabilities
divide each species by
fraction of birds of that color
divide each color by
fraction of birds of that species
![Page 12: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/12.jpg)
Bayes Theorem
same, so solving for P(s,c)
rearrange
![Page 13: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/13.jpg)
3 ways to write P(c) and P(s)
![Page 14: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/14.jpg)
so 3 ways to write Bayes Therem
the last way seems the most complicated, but it is also the most useful
![Page 15: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/15.jpg)
Beware!
major cause of error bothamong scientists and the general public
![Page 16: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/16.jpg)
example
probability that a dead person succumbed to pancreatic cancer(as contrasted to some other cause of death)
P(cancer|death) = 1.4%
probability that a person diagnosed with pancreatic cancerwill die of it in the next five years
P(death|cancer) = 90%
vastly different numbers
![Page 17: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/17.jpg)
Bayesian Inference“updating information”
An observer on the island has sighted a bird.
We want to know whether it’s a pigeon.
when the observer says, “bird sighted”, the probability that it’s a pigeon is:P(s=p) = 50%
since pigeons comprise half of the birds on the island.
![Page 18: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/18.jpg)
Now the observer says, “the bird is tan”.
The probability that it’s a pigeon changes.
We now want to knowP(s=p|c=t) The conditional probability that it’s a pigeon, given
that we have observed its color to be tan.
![Page 19: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/19.jpg)
we use the formula
% of tan pigeons
% of tan birds
![Page 20: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/20.jpg)
observation of the bird’s color changed the probability that is was a pigeon
from 50%to 75%
thus Bayes Theorem offers a way to assess the value of an observation
![Page 21: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/21.jpg)
continuous variables
joint probability density function, p(d1, d2)
![Page 22: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/22.jpg)
d1
d2p(d1,d2) d2
L d2L
d1L
d1R
if the probability
density function , p(d1,d2),
is though of as a cloud made of
water vapor
the probability
that (d1,d2) is in the box given by the total mass of water vapor in the box
![Page 23: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/23.jpg)
normalized to unit total probability
![Page 24: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/24.jpg)
univariate p.d.f.’s“integrate away” one of the variables
the p.d.f. of d1 irrespective of d2 the p.d.f. of d2 irrespective of d1
![Page 25: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/25.jpg)
d1
d2
d1
integrate over d2
integrate over d1
d2
p(d1,d2)
p(d2)
p(d1)
![Page 26: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/26.jpg)
mean and variance calculated in usual way
![Page 27: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/27.jpg)
correlation
tendency of random variable d1to be large/small
when random variable d2 is large/small
![Page 28: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/28.jpg)
positive correlation:
tall people tend to weigh more than short people
…
negative correlation:
long-time smokers tend to die young
…
![Page 29: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/29.jpg)
d1
d2
positive correlation
d1
d2
d1
d2
negative correlation
uncorrelated
shape of p.d.f.
![Page 30: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/30.jpg)
d1
d2p(d1,d2)
d1
d2
d1
d2s(d1,d2) s(d1,d2) p(d1,d2)
+
+
-
-
quantifying correlation
now multiply and
integrate
p.d.f. 4-quadrant function
![Page 31: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/31.jpg)
covariancequantifies correlation
![Page 32: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/32.jpg)
combine variance and covariance into a matrix, C
C = σ12σ22σ1,2
σ1,2 note that C is symmetric
![Page 33: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/33.jpg)
many random variablesd1, d2, d3 … dN
write d’s a a vectord = [d1, d2, d3 … dN ]T
![Page 34: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/34.jpg)
the mean is then a vector, too
d = [d1, d2, d3 … dN ]T
![Page 35: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/35.jpg)
and the covariance is an N×N matrix, C
C = σ12 σ22σ1,2σ32 …σ1,3σ2,3σ1,2 σ2,3σ1,3
………… … …variance on the main diagonal
![Page 36: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/36.jpg)
multivariate Normal p.d.f.
square root of
determinant of covariance
matrix
inverse of covariance
matrix
data minus its mean
![Page 37: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/37.jpg)
compare with univariate Normal p.d.f.
![Page 38: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/38.jpg)
corresponding terms
![Page 39: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/39.jpg)
error propagationp(d) is Normal with mean d and covariance, Cd.
given model parameters mwhere m is a linear function of dm = MdQ1. What is p(m)?
Q2. What is its mean m and covariance Cm?
![Page 40: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/40.jpg)
AnswerQ1: What is p(m)?
A1: p(m) is Normal
Q2: What is its mean m and covariance Cm?
A2: m = Md and Cm = M Cd MT
![Page 41: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/41.jpg)
where the answer comes fromtransform p(d) to p(m)starting with a Normal p.d.f. for p(d) :and the multivariate transformation rule:
determinant
![Page 42: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/42.jpg)
this is not as hard as it looks
because the Jacobian determinant J(m) is constant:
so, starting with p(d), replaces every occurrence of d with M-1m and multiply the result by |M-1|. This yields:
![Page 43: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/43.jpg)
p(m) where
rule for error propagation
Normal p.d.f. for model parameters
![Page 44: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/44.jpg)
example
d1measurement 1:
weight of A
A A
B
d2measurement 2:
combined weight of A and B
suppose the measurements are uncorrelated and that both have the same variance, σd
2
![Page 45: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/45.jpg)
model parameters
m1weight of B
A
B
m1 = d2 – d1
d1
B=
m2weight of B
minusweight of A A
B A
A
A-
-
+
A
B= + - +
m2 = d2 – 2d1
![Page 46: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/46.jpg)
linear rule relating model parameters to data
m = M dwith
![Page 47: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/47.jpg)
so the means of the model parameters are
![Page 48: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/48.jpg)
and the covariance matrix is
![Page 49: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/49.jpg)
model parameters are correlated, even though data are uncorrelated
bad
variance of model parameters different than variance of data
bad if bigger, good if smaller
![Page 50: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions.](https://reader035.fdocuments.net/reader035/viewer/2022062221/56649d0a5503460f949dcf10/html5/thumbnails/50.jpg)
d1
p(d1,d2)
40
0400 d2
m1
p(m1,m2)
20
-2020-20 m2
The model parameters, (m1, m2), have mean (10, -5), variance (10, 25) and covariance,15.
The data, (d1, d2), have mean (15, 25), variance (5, 5) and zero covariance
example with specific values of d and σd2