MATH 38061/MATH48061/MATH68061: MULTIVARIATE STATISTICS
Solutions to Problems on Multivariate Normal Distribution
1. Let X and Y have the joint pdf
\[
f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right\}
\]
for $-\infty < x < \infty$, $-\infty < y < \infty$ and $-1 < \rho < 1$.
First we find the marginal pdfs. The marginal pdf of Y can be obtained as
\[
\begin{aligned}
f_Y(y) &= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right\} dx \\
&= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{(x-\rho y)^2 + y^2(1-\rho^2)}{2(1-\rho^2)}\right\} dx \\
&= \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right) \left[\frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{(x-\rho y)^2}{2(1-\rho^2)}\right\} dx\right] \\
&= \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right),
\end{aligned}
\]
the standard normal pdf, since the bracketed term is the integral of the $N(\rho y, 1-\rho^2)$ pdf and so equals one. By symmetry, the marginal pdf of X is also standard normal. So it follows that $E(X) = E(Y) = 0$ and $Var(X) = Var(Y) = 1$.
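As a numerical cross-check (not part of the original solution; function names and integration limits are my own choices), a simple midpoint rule confirms that integrating the joint pdf over $x$ recovers the standard normal pdf:

```python
import math

def f(x, y, rho):
    # joint pdf of the bivariate normal with standard margins and correlation rho
    norm = 2 * math.pi * math.sqrt(1 - rho ** 2)
    return math.exp(-(x * x + y * y - 2 * rho * x * y) / (2 * (1 - rho ** 2))) / norm

def marginal_y(y, rho, lo=-10.0, hi=10.0, n=20000):
    # midpoint-rule approximation of the integral of f(x, y) over x
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h, y, rho) for i in range(n))

def phi(y):
    # standard normal pdf
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

for y in (-1.3, 0.0, 0.7):
    assert abs(marginal_y(y, rho=0.6) - phi(y)) < 1e-5
```

The truncation at $\pm 10$ is harmless because the integrand is negligible there.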
Now consider deriving E(XY ). We have
\[
\begin{aligned}
E(XY) &= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy \exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right\} dx\,dy \\
&= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy \exp\left\{-\frac{(x-\rho y)^2 + y^2(1-\rho^2)}{2(1-\rho^2)}\right\} dx\,dy \\
&= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} y \exp\left(-\frac{y^2}{2}\right) \int_{-\infty}^{\infty} x \exp\left\{-\frac{(x-\rho y)^2}{2(1-\rho^2)}\right\} dx\,dy \\
&= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} y \exp\left(-\frac{y^2}{2}\right) \int_{-\infty}^{\infty} (x - \rho y + \rho y) \exp\left\{-\frac{(x-\rho y)^2}{2(1-\rho^2)}\right\} dx\,dy \\
&= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} y \exp\left(-\frac{y^2}{2}\right) \int_{-\infty}^{\infty} (z + \rho y) \exp\left\{-\frac{z^2}{2(1-\rho^2)}\right\} dz\,dy \\
&= \frac{\rho}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} y^2 \exp\left(-\frac{y^2}{2}\right) \int_{-\infty}^{\infty} \exp\left\{-\frac{z^2}{2(1-\rho^2)}\right\} dz\,dy \\
&= \frac{\rho}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 \exp\left(-\frac{y^2}{2}\right) \left[\frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{z^2}{2(1-\rho^2)}\right\} dz\right] dy \\
&= \frac{\rho}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 \exp\left(-\frac{y^2}{2}\right) dy \\
&= \frac{2\rho}{\sqrt{2\pi}} \int_{0}^{\infty} y^2 \exp\left(-\frac{y^2}{2}\right) dy \\
&= \frac{2\sqrt{2}\rho}{\sqrt{2\pi}} \int_{0}^{\infty} w^{1/2} \exp(-w)\, dw \\
&= \frac{2\sqrt{2}\rho}{\sqrt{2\pi}} \Gamma\left(\frac{3}{2}\right) = \frac{2\sqrt{2}\rho}{\sqrt{2\pi}} \cdot \frac{\sqrt{\pi}}{2} = \rho.
\end{aligned}
\]
So, Cov(X,Y ) = E(XY )− E(X)E(Y ) = ρ and Corr(X,Y ) = ρ.
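The value $Corr(X, Y) = \rho$ is easy to confirm by simulation. The construction $Y = \rho X + \sqrt{1-\rho^2}\,Z$ with independent standard normals $X, Z$ is a standard way to generate this joint distribution (it is my own choice here, not taken from the solution):

```python
import math
import random

random.seed(0)
rho = 0.8
n = 200_000
total = 0.0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    # Y = rho*X + sqrt(1 - rho^2)*Z has the joint pdf in question
    y = rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0.0, 1.0)
    total += x * y
exy = total / n  # estimates E(XY) = Cov(X, Y) = Corr(X, Y) = rho
assert abs(exy - rho) < 0.02
```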
Finally, consider proving that X and Y are independent if and only if Cov(X, Y) = 0. If X and Y are independent then by definition Cov(X, Y) = 0. Conversely, if Cov(X, Y) = 0 then $\rho = 0$, so substituting into the joint pdf we see that it factorizes into the product of two standard normal pdfs, so X and Y are independent.
2. The conditional pdf of X given Y = y is
\[
\begin{aligned}
f(x \mid y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)} + \frac{y^2}{2}\right\} \\
&= \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{-\frac{x^2 + \rho^2 y^2 - 2\rho xy}{2(1-\rho^2)}\right\} \\
&= \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{-\frac{(x-\rho y)^2}{2(1-\rho^2)}\right\},
\end{aligned}
\]
the normal pdf with mean $\rho y$ and variance $1-\rho^2$. Similarly, the conditional pdf of Y given X = x is
\[
\begin{aligned}
f(y \mid x) &= \frac{f(x, y)}{f_X(x)} \\
&= \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)} + \frac{x^2}{2}\right\} \\
&= \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{-\frac{\rho^2 x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right\} \\
&= \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{-\frac{(y-\rho x)^2}{2(1-\rho^2)}\right\},
\end{aligned}
\]
the normal pdf with mean $\rho x$ and variance $1-\rho^2$.
3. Let X and Y have the joint pdf
\[
f(x, y) = \exp\left(c + 4x + 4y - \frac{x^2}{2} - \frac{y^2}{2} - \frac{x^2 y^2}{2}\right)
\]
for $-\infty < x < \infty$ and $-\infty < y < \infty$, where $c$ is a constant.
First, we determine the marginal pdfs. The marginal pdf of Y can be obtained as
\[
\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} \exp\left(c + 4x + 4y - \frac{x^2}{2} - \frac{y^2}{2} - \frac{x^2 y^2}{2}\right) dx \\
&= \exp\left(c + 4y - \frac{y^2}{2}\right) \int_{-\infty}^{\infty} \exp\left\{4x - \frac{(1 + y^2) x^2}{2}\right\} dx \\
&= \exp\left(c + 4y - \frac{y^2}{2}\right) \int_{-\infty}^{\infty} \exp\left\{-\frac{1 + y^2}{2}\left(x^2 - \frac{8x}{1 + y^2}\right)\right\} dx \\
&= \exp\left(c + 4y - \frac{y^2}{2} + \frac{8}{1 + y^2}\right) \int_{-\infty}^{\infty} \exp\left\{-\frac{1 + y^2}{2}\left(x - \frac{4}{1 + y^2}\right)^2\right\} dx \\
&= \frac{\sqrt{2\pi}}{\sqrt{1 + y^2}} \exp\left(c + 4y - \frac{y^2}{2} + \frac{8}{1 + y^2}\right) \\
&\qquad \times \frac{1}{\sqrt{2\pi}\left(1/\sqrt{1 + y^2}\right)} \int_{-\infty}^{\infty} \exp\left\{-\frac{1 + y^2}{2}\left(x - \frac{4}{1 + y^2}\right)^2\right\} dx \\
&= \frac{\sqrt{2\pi}}{\sqrt{1 + y^2}} \exp\left(c + 4y - \frac{y^2}{2} + \frac{8}{1 + y^2}\right).
\end{aligned}
\]
By symmetry, the marginal pdf of X is
\[
f_X(x) = \frac{\sqrt{2\pi}}{\sqrt{1 + x^2}} \exp\left(c + 4x - \frac{x^2}{2} + \frac{8}{1 + x^2}\right).
\]
So, the conditional pdf of X given Y = y is
\[
\begin{aligned}
f(x \mid y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{\sqrt{1 + y^2}}{\sqrt{2\pi}} \exp\left(4x - \frac{x^2}{2} - \frac{x^2 y^2}{2} - \frac{8}{1 + y^2}\right) \\
&= \frac{\sqrt{1 + y^2}}{\sqrt{2\pi}} \exp\left(-\frac{8}{1 + y^2}\right) \exp\left\{4x - \frac{(1 + y^2) x^2}{2}\right\} \\
&= \frac{\sqrt{1 + y^2}}{\sqrt{2\pi}} \exp\left(-\frac{8}{1 + y^2}\right) \exp\left\{-\frac{1 + y^2}{2}\left(x^2 - \frac{8x}{1 + y^2}\right)\right\} \\
&= \frac{\sqrt{1 + y^2}}{\sqrt{2\pi}} \exp\left(-\frac{8}{1 + y^2}\right) \exp\left\{-\frac{1 + y^2}{2}\left(x - \frac{4}{1 + y^2}\right)^2 + \frac{8}{1 + y^2}\right\} \\
&= \frac{\sqrt{1 + y^2}}{\sqrt{2\pi}} \exp\left\{-\frac{1 + y^2}{2}\left(x - \frac{4}{1 + y^2}\right)^2\right\},
\end{aligned}
\]
the normal pdf with mean $4/(1 + y^2)$ and variance $1/(1 + y^2)$. By symmetry, the conditional pdf of Y given X = x is normal with mean $4/(1 + x^2)$ and variance $1/(1 + x^2)$. Note however that the joint pdf of X and Y is not bivariate normal.
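The conditional-mean claim can be checked numerically without knowing $c$, since the factor $\exp(c)$ cancels in conditional moments; the grid and integration limits below are my own arbitrary choices:

```python
import math

def g(x, y):
    # joint pdf up to the constant exp(c), which cancels in conditional moments
    return math.exp(4 * x + 4 * y - x ** 2 / 2 - y ** 2 / 2 - x ** 2 * y ** 2 / 2)

def cond_mean_x(y, lo=-15.0, hi=15.0, n=20000):
    # midpoint-rule estimate of E(X | Y = y) = int x g dx / int g dx
    h = (hi - lo) / n
    num = den = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        w = g(x, y)
        num += x * w
        den += w
    return num / den

for y in (0.0, 1.0, -2.0):
    assert abs(cond_mean_x(y) - 4 / (1 + y ** 2)) < 1e-5
```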
This is an example of a distribution whose joint pdf is not normal but whose conditionals are normal. For other examples of this kind, see Arnold, Castillo, and Sarabia (2001), Statistical Science, Volume 16, Issue 3, 249-274.
4. Let X and Y have the bivariate normal pdf
\[
f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right\}
\]
for $-\infty < x < \infty$, $-\infty < y < \infty$ and $-1 < \rho < 1$.
The joint moment generating function of X and Y can be written as
\[
\begin{aligned}
M(s, t) &= E\left[\exp(sX + tY)\right] \\
&= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{sx + ty - \frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right\} dy\,dx \\
&= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\frac{x^2 + y^2 - 2(1-\rho^2)sx - 2(1-\rho^2)ty - 2\rho xy}{2(1-\rho^2)}\right\} dy\,dx. \quad (1)
\end{aligned}
\]
We want to rewrite the numerator of the fraction within the exponential term in the form $(x-a)^2 + (y-b)^2 - 2\rho(x-a)(y-b) - a^2 - b^2 + 2\rho ab$ for some constants $a$ and $b$. To determine these constants, we equate the coefficients of $x$ and $y$. Equating the coefficients of $x$, we obtain the equation $-2a + 2\rho b = -2(1-\rho^2)s$. Equating the coefficients of $y$, we obtain the equation $-2b + 2\rho a = -2(1-\rho^2)t$. Solving these two equations simultaneously, we obtain $a = \rho t + s$ and $b = \rho s + t$. So, we can rewrite (1) as
\[
\begin{aligned}
M(s, t) &= \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\frac{(x-a)^2 + (y-b)^2 - 2\rho(x-a)(y-b) - a^2 - b^2 + 2\rho ab}{2(1-\rho^2)}\right\} dy\,dx \\
&= \exp\left\{\frac{a^2 + b^2 - 2\rho ab}{2(1-\rho^2)}\right\} \left[\frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\frac{(x-a)^2 + (y-b)^2 - 2\rho(x-a)(y-b)}{2(1-\rho^2)}\right\} dy\,dx\right] \\
&= \exp\left\{\frac{a^2 + b^2 - 2\rho ab}{2(1-\rho^2)}\right\},
\end{aligned}
\]
since the bracketed term is the integral of a shifted bivariate normal pdf and so equals one. Substituting $a = s + \rho t$ and $b = t + \rho s$, this simplifies to $M(s, t) = \exp\left\{(s^2 + 2\rho st + t^2)/2\right\}$.
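A Monte Carlo estimate of $E[\exp(sX + tY)]$ matches the closed form; the sampling construction and parameter values below are my own choices:

```python
import math
import random

random.seed(1)
rho, s, t = 0.5, 0.2, 0.3
n = 200_000
acc = 0.0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    # generate (X, Y) with the stated joint pdf
    y = rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0.0, 1.0)
    acc += math.exp(s * x + t * y)
mgf_mc = acc / n

a, b = s + rho * t, t + rho * s
mgf_exact = math.exp((a ** 2 + b ** 2 - 2 * rho * a * b) / (2 * (1 - rho ** 2)))
# the closed form simplifies to exp((s^2 + 2*rho*s*t + t^2)/2)
assert abs(mgf_exact - math.exp((s ** 2 + 2 * rho * s * t + t ** 2) / 2)) < 1e-12
assert abs(mgf_mc - mgf_exact) < 0.01
```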
5. Let a p× 1 random vector X = (X1, . . . , Xp)T have the p-variate normal pdf
\[
f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2} x^T \Sigma^{-1} x\right\}
\]
for $-\infty < x_i < \infty$, $i = 1, \ldots, p$.
The joint moment generating function of X can be written as
\[
\begin{aligned}
M(t) &= E\left[\exp\left(t^T X\right)\right] = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \int_{x \in \mathbb{R}^p} \exp\left\{t^T x - \frac{1}{2} x^T \Sigma^{-1} x\right\} dx \\
&= \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \int_{x \in \mathbb{R}^p} \exp\left\{-\frac{1}{2}\left[x^T \Sigma^{-1} x - 2 t^T x\right]\right\} dx. \quad (2)
\end{aligned}
\]
We want to rewrite the term within square brackets in the form $(x-\mu)^T \Sigma^{-1} (x-\mu) - \mu^T \Sigma^{-1} \mu$ for some constant vector $\mu$. To determine it, we equate the coefficients of $x$, obtaining $-2\mu^T \Sigma^{-1} = -2 t^T$. Solving this equation, we obtain $\mu = \Sigma t$. So, we can rewrite (2) as
\[
\begin{aligned}
M(t) &= \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \int_{x \in \mathbb{R}^p} \exp\left\{-\frac{1}{2}\left[(x - \Sigma t)^T \Sigma^{-1} (x - \Sigma t) - t^T \Sigma t\right]\right\} dx \\
&= \exp\left(t^T \Sigma t / 2\right) \left[\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \int_{x \in \mathbb{R}^p} \exp\left\{-\frac{1}{2}(x - \Sigma t)^T \Sigma^{-1} (x - \Sigma t)\right\} dx\right] \\
&= \exp\left(t^T \Sigma t / 2\right),
\end{aligned}
\]
since the bracketed term is the integral of the $N_p(\Sigma t, \Sigma)$ pdf and so equals one. (Note the positive sign in the exponent: $-\frac{1}{2}(-t^T \Sigma t) = +t^T \Sigma t / 2$.)
6. Let X be a standard normal random variable. Let W = 1 or −1, each with probability 1/2, and assume W is independent of X. Let Y = WX. Then
(i) we have
\[
\begin{aligned}
Cov(X, Y) &= E(XY) - E(X)E(Y) \\
&= E(XY) - E(X)E(WX) \\
&= E(XY) - E(X)E(W)E(X) \\
&= E(XY) \\
&= E\left(E(XY \mid W)\right) \\
&= E(X^2)\Pr(W = 1) + E(-X^2)\Pr(W = -1) \\
&= 1 \times \frac{1}{2} + (-1) \times \frac{1}{2} = 0,
\end{aligned}
\]
so X and Y are uncorrelated;
(ii) we have
\[
\begin{aligned}
\Pr(Y \le x) &= E\left(\Pr(Y \le x \mid W)\right) \\
&= \Pr(X \le x)\Pr(W = 1) + \Pr(-X \le x)\Pr(W = -1) \\
&= \Phi(x) \times \frac{1}{2} + \Phi(x) \times \frac{1}{2} = \Phi(x),
\end{aligned}
\]
so X and Y have the same normal distribution (where $\Phi(\cdot)$ denotes the standard normal distribution function);
(iii) we have $|Y| = |X|$, so $\Pr(Y > 1 \mid X = 1/2) = 0$ while $\Pr(Y > 1) > 0$; hence X and Y are not independent.
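A short simulation (the construction mirrors the problem statement; sample size and seed are my own choices) illustrates both facts: the sample covariance is near zero, yet $|Y| = |X|$ always:

```python
import random

random.seed(2)
n = 200_000
pairs = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    w = random.choice((-1, 1))   # W independent of X
    pairs.append((x, w * x))     # Y = W X

# E(X) = E(Y) = 0, so the mean of X*Y estimates Cov(X, Y)
cov = sum(x * y for x, y in pairs) / n
assert abs(cov) < 0.02

# dependence: |Y| = |X| always, so {|Y| > 1 and |X| <= 1} never occurs
assert all(not (abs(y) > 1 and abs(x) <= 1) for x, y in pairs)
```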
7. Let X be a standard normal random variable. Let
\[
Y = \begin{cases} -X, & |X| < c, \\ X, & \text{otherwise}, \end{cases}
\]
where $c$ is the root of the equation
\[
\int_0^c x^2 \phi(x)\, dx = 1/4,
\]
where $\phi(\cdot)$ denotes the standard normal pdf. Then
(i) we have
\[
\begin{aligned}
\Pr(Y \le x) &= \Pr\left(\{|X| < c \text{ and } -X \le x\} \text{ or } \{|X| > c \text{ and } X \le x\}\right) \\
&= \Pr(|X| < c \text{ and } -X \le x) + \Pr(|X| > c \text{ and } X \le x) \\
&= \Pr(|X| < c \text{ and } X \le x) + \Pr(|X| > c \text{ and } X \le x) \\
&= \Pr(X \le x),
\end{aligned}
\]
so X and Y have the same normal distribution;
(ii) we have
\[
\begin{aligned}
Cov(X, Y) &= E(XY) - E(X)E(Y) \\
&= E(XY) - E(X)E(X) \\
&= E(XY) \\
&= \int_{-\infty}^{-c} x^2 \phi(x)\, dx + \int_{c}^{\infty} x^2 \phi(x)\, dx - \int_{-c}^{c} x^2 \phi(x)\, dx \\
&= E(X^2) - 2\int_{-c}^{c} x^2 \phi(x)\, dx \\
&= 1 - 2\int_{-c}^{c} x^2 \phi(x)\, dx \\
&= 1 - 4\int_{0}^{c} x^2 \phi(x)\, dx \\
&= 0,
\end{aligned}
\]
so X and Y are uncorrelated;
(iii) X and Y are clearly not independent since X completely determines Y .
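Integration by parts gives $\int_0^c x^2 \phi(x)\, dx = \Phi(c) - \tfrac{1}{2} - c\,\phi(c)$, so $c$ can be found by bisection and the zero covariance confirmed; the search bracket $[0, 10]$ is an assumption about where the root lies:

```python
import math

def phi(x):
    # standard normal pdf
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    # standard normal cdf via erf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def integral(c):
    # closed form of int_0^c x^2 phi(x) dx (integration by parts)
    return Phi(c) - 0.5 - c * phi(c)

# bisection for the root of integral(c) = 1/4 (integral is increasing in c)
lo, hi = 0.0, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if integral(mid) < 0.25:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

# Cov(X, Y) = 1 - 4 * int_0^c x^2 phi(x) dx = 0 at this c
cov = 1 - 4 * integral(c)
assert abs(cov) < 1e-9
assert 1.0 < c < 2.0   # sanity range for the root
```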
8. Let the random vector $X \sim N_2(\mu, \Sigma)$ where $\mu = 0$ and
\[
\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 4 \end{bmatrix}.
\]
Note that we can write
\[
Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X_1 - X_2 \\ X_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}.
\]
So,
\[
E Y = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\qquad
Cov\, Y = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ -1 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 8 & -5 \\ -5 & 4 \end{bmatrix}
\]
and
\[
Y \sim N_2\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 8 & -5 \\ -5 & 4 \end{bmatrix}\right).
\]
Hence, Cov(Y1, Y2) = −5 implying that Y1 and Y2 are not independently distributed.
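The product $A \Sigma A^T$ is quick to verify mechanically (the helper name is my own):

```python
def matmul(A, B):
    # product of two 2x2 matrices given as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, -1], [0, 1]]            # Y = A X
Sigma = [[2, -1], [-1, 4]]
At = [[1, 0], [-1, 1]]           # transpose of A
cov_Y = matmul(matmul(A, Sigma), At)
assert cov_Y == [[8, -5], [-5, 4]]
```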
9. Let X have a N3(µ,Σ) distribution where µ = (0, 0, 0)T and
\[
\Sigma = \begin{bmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
\]
(i) For $X_1$ and $X_2$, we have
\[
\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim N_2\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & -2 \\ -2 & 5 \end{bmatrix}\right),
\]
so Cov(X1, X2) = −2 implying that X1 and X2 are not independently distributed.
(ii) For $X_2$ and $X_3$, we have
\[
\begin{bmatrix} X_2 \\ X_3 \end{bmatrix} \sim N_2\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}\right),
\]
so Cov(X2, X3) = 0 implying that X2 and X3 are independently distributed.
(iii) For $(X_1, X_2)$ and $X_3$, $Cov((X_1, X_2), X_3) = (0, 0)$, so $(X_1, X_2)$ and $X_3$ are independently distributed.
(iv) For $(X_1 + X_2)/2$ and $X_3$, note that
\[
\begin{bmatrix} (X_1 + X_2)/2 \\ X_3 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}.
\]
So,
\[
\begin{bmatrix} (X_1 + X_2)/2 \\ X_3 \end{bmatrix} \sim N_2\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1/2 & 0 \\ 0 & 2 \end{bmatrix}\right),
\]
so $Cov((X_1 + X_2)/2, X_3) = 0$ implying that $(X_1 + X_2)/2$ and $X_3$ are independently distributed.
(v) For $X_2$ and $-\frac{5}{2}X_1 + X_2 - X_3$, note that
\[
\begin{bmatrix} -\frac{5}{2}X_1 + X_2 - X_3 \\ X_2 \end{bmatrix} = \begin{bmatrix} -5/2 & 1 & -1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}.
\]
So,
\[
\begin{bmatrix} -\frac{5}{2}X_1 + X_2 - X_3 \\ X_2 \end{bmatrix} \sim N_2\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 93/4 & 10 \\ 10 & 5 \end{bmatrix}\right),
\]
so $Cov(-\frac{5}{2}X_1 + X_2 - X_3, X_2) = 10$ implying that $-\frac{5}{2}X_1 + X_2 - X_3$ and $X_2$ are not independently distributed.
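The covariance matrix in part (v) can be checked by computing $A \Sigma A^T$ directly (note $93/4 = 23.25$):

```python
Sigma = [[1, -2, 0], [-2, 5, 0], [0, 0, 2]]
A = [[-2.5, 1, -1],   # -5/2 X1 + X2 - X3
     [0, 1, 0]]       # X2

# entry (i, j) of A Sigma A^T
cov = [[sum(A[i][k] * Sigma[k][l] * A[j][l] for k in range(3) for l in range(3))
        for j in range(2)] for i in range(2)]
assert cov[0][0] == 23.25 and cov[0][1] == 10.0
assert cov[1][0] == 10.0 and cov[1][1] == 5
```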
10. Let X have a N2(µ,Σ) distribution where µ = 0 and Σ = I2. Let
\[
Y = C X + d \sim N_2\left(\begin{bmatrix} 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 & -1.5 \\ -1.5 & 4 \end{bmatrix}\right).
\]
It follows that $(3, 2)^T = E(Y) = C E(X) + d = C 0 + d = d$. Also
\[
\begin{bmatrix} 1 & -1.5 \\ -1.5 & 4 \end{bmatrix} = Cov\, Y = C\, Cov(X)\, C^T = C I_2 C^T = C C^T,
\]
so we may take $C$ to be the symmetric square root, computed from the spectral decomposition (eigenvalues 4.621 and 0.379, with unit eigenvectors $(0.383, -0.924)^T$ and $(0.924, 0.383)^T$):
\[
\begin{aligned}
C &= \begin{bmatrix} 1 & -1.5 \\ -1.5 & 4 \end{bmatrix}^{1/2} \\
&= \begin{bmatrix} 0.383 & 0.924 \\ -0.924 & 0.383 \end{bmatrix} \begin{bmatrix} \sqrt{4.621} & 0 \\ 0 & \sqrt{0.379} \end{bmatrix} \begin{bmatrix} 0.383 & -0.924 \\ 0.924 & 0.383 \end{bmatrix} \\
&= \begin{bmatrix} 0.840 & -0.542 \\ -0.542 & 1.925 \end{bmatrix}.
\end{aligned}
\]
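The symmetric square root can be reproduced from the closed-form eigendecomposition of a $2 \times 2$ symmetric matrix; this sketch assumes the off-diagonal entry is nonzero:

```python
import math

Sigma = [[1.0, -1.5], [-1.5, 4.0]]

# eigenvalues of a symmetric 2x2 matrix from trace and determinant
a, b, d = Sigma[0][0], Sigma[0][1], Sigma[1][1]
tr, det = a + d, a * d - b * b
disc = math.sqrt(tr ** 2 - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

# unit eigenvector for lam1 (valid since b != 0); the other is its rotation
v1 = (b, lam1 - a)
nrm = math.hypot(v1[0], v1[1])
v1 = (v1[0] / nrm, v1[1] / nrm)
v2 = (-v1[1], v1[0])

# C = sqrt(lam1) v1 v1^T + sqrt(lam2) v2 v2^T
s1, s2 = math.sqrt(lam1), math.sqrt(lam2)
C = [[s1 * v1[i] * v1[j] + s2 * v2[i] * v2[j] for j in range(2)] for i in range(2)]

# check C C^T = C C = Sigma (C is symmetric)
CC = [[sum(C[i][k] * C[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
for i in range(2):
    for j in range(2):
        assert abs(CC[i][j] - Sigma[i][j]) < 1e-12
assert abs(C[0][0] - 0.840) < 1e-3 and abs(C[0][1] + 0.542) < 1e-3
```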
11. Let $X_1, \ldots, X_n$ be a random sample from the $N_p(\mu, \Sigma)$ distribution where $\mu$ and $\Sigma$ are unknown but we know that, in this case, $\Sigma = \operatorname{diag}(\sigma_{11}, \ldots, \sigma_{pp})$. The log-likelihood function can be written as
\[
\begin{aligned}
\log L(\mu, \Sigma) &= -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(X_i - \mu)^T \Sigma^{-1} (X_i - \mu) \\
&= -\frac{np}{2}\log(2\pi) - \frac{n}{2}\sum_{j=1}^{p}\log\sigma_{jj} - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{p}\frac{(X_{ij} - \mu_j)^2}{\sigma_{jj}} \\
&= -\frac{np}{2}\log(2\pi) - \frac{n}{2}\sum_{j=1}^{p}\log\sigma_{jj} - \frac{1}{2}\sum_{j=1}^{p}\frac{1}{\sigma_{jj}}\sum_{i=1}^{n}(X_{ij} - \mu_j)^2.
\end{aligned}
\]
Suppose the estimate of $\mu_j$ is known to be $\bar{x}_j$. Then the unknown parameters are $\sigma_{jj}$, $j = 1, 2, \ldots, p$. The first derivative of the log-likelihood function with respect to $\sigma_{jj}$ is
\[
\frac{\partial \log L(\mu, \Sigma)}{\partial \sigma_{jj}} = -\frac{n}{2\sigma_{jj}} + \frac{1}{2\sigma_{jj}^2}\sum_{i=1}^{n}(X_{ij} - \bar{x}_j)^2 = -\frac{n}{2\sigma_{jj}} + \frac{(n-1)s_{jj}}{2\sigma_{jj}^2}.
\]
Setting this to zero and solving, we obtain $\hat{\sigma}_{jj} = (n-1)s_{jj}/n$, that is, $\hat{\Sigma} = \frac{n-1}{n} D$ where $D = \operatorname{diag}(s_{11}, \ldots, s_{pp})$.
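A small simulation confirms the relation $\hat{\sigma}_{jj} = (n-1)s_{jj}/n$ between the MLE and the usual divisor-$(n-1)$ sample variance; the data-generating values below are arbitrary choices of mine:

```python
import random
import statistics

random.seed(3)
n, p = 5000, 3
sd = [1.0, 2.0, 0.5]
# n observations of a p-variate normal with independent coordinates
data = [[random.gauss(0.0, sd[j]) for j in range(p)] for _ in range(n)]

for j in range(p):
    col = [row[j] for row in data]
    xbar = sum(col) / n
    s_jj = statistics.variance(col)                # divisor n - 1
    mle = sum((x - xbar) ** 2 for x in col) / n    # sigma-hat_jj from the derivation
    assert abs(mle - (n - 1) * s_jj / n) < 1e-6
    assert abs(mle - sd[j] ** 2) < 0.4             # loose consistency check
```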
12. Let X1,X2,X3,X4 be independent Np(µ,Σ) random vectors.
(i) Then the marginal distribution of $V_1 = (X_1 - X_2 + X_3 - X_4)/4$ is normal with mean $(\mu - \mu + \mu - \mu)/4 = 0$ and covariance matrix $\frac{1}{16}\left[1^2 + (-1)^2 + 1^2 + (-1)^2\right]\Sigma = \frac{1}{4}\Sigma$.
The marginal distribution of $V_2 = (X_1 + X_2 - X_3 - X_4)/4$ is normal with mean $(\mu + \mu - \mu - \mu)/4 = 0$ and covariance matrix $\frac{1}{16}\left[1^2 + 1^2 + (-1)^2 + (-1)^2\right]\Sigma = \frac{1}{4}\Sigma$.
(ii) Since $Cov(V_1, V_2) = \frac{1}{16}\left[(1)(1) + (-1)(1) + (1)(-1) + (-1)(-1)\right]\Sigma = 0$, the joint pdf of $V_1$ and $V_2$ is that of the normal distribution with mean vector and covariance matrix
\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} \Sigma/4 & 0 \\ 0 & \Sigma/4 \end{bmatrix},
\]
respectively.
13. Let X be N3(µ,Σ) with µT = [−3, 1, 4] and
\[
\Sigma = \begin{bmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
\]
Then $Cov(X_1, X_2) = -2$, so $X_1$ and $X_2$ are not independent.
We have $Cov(X_2, X_3) = 0$, so $X_2$ and $X_3$ are independent.
We have Cov((X1, X2), X3) = (0, 0), so (X1, X2) and X3 are independent.
We have $Cov((X_1 + X_2)/2, X_3) = (1/2) \times 0 + (1/2) \times 0 = 0$, so $(X_1 + X_2)/2$ and $X_3$ are independent.
14. Let X be N3(µ,Σ) with µT = [−3, 1, 4] and
\[
\Sigma = \begin{bmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
\]
The conditional distribution of $X_1$ given $(X_2, X_3) = (x_2, x_3)$ is normal with mean equal to
\[
\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}\begin{bmatrix} x_2 - \mu_2 \\ x_3 - \mu_3 \end{bmatrix} = -3 + [-2, 0]\begin{bmatrix} 1/5 & 0 \\ 0 & 1/2 \end{bmatrix}\begin{bmatrix} x_2 - 1 \\ x_3 - 4 \end{bmatrix} = -\frac{13}{5} - \frac{2x_2}{5}
\]
and variance equal to
\[
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = 1 - [-2, 0]\begin{bmatrix} 1/5 & 0 \\ 0 & 1/2 \end{bmatrix}\begin{bmatrix} -2 \\ 0 \end{bmatrix} = \frac{1}{5}.
\]
The conditional distribution of X2 given (X1, X3) = (x1, x3) is normal with mean equal to
\[
\mu_2 + [-2, 0]\begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}\begin{bmatrix} x_1 + 3 \\ x_3 - 4 \end{bmatrix} = 1 - 2(x_1 + 3) = -5 - 2x_1
\]
and variance equal to
\[
5 - [-2, 0]\begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}\begin{bmatrix} -2 \\ 0 \end{bmatrix} = 5 - 4 = 1,
\]
where the partition is now with respect to $X_2$ and $(X_1, X_3)$.
The conditional distribution of $X_3$ given $(X_1, X_2) = (x_1, x_2)$ is the same as that of $X_3$, normal with mean 4 and variance 2.
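The partitioned-covariance formula used above is easy to evaluate mechanically; the helper below (names are mine) reproduces the mean and variance for $X_1$ given $(X_2, X_3)$:

```python
mu = [-3.0, 1.0, 4.0]
S11 = 1.0
S12 = [-2.0, 0.0]
S22 = [[5.0, 0.0], [0.0, 2.0]]

# inverse of the diagonal 2x2 block
S22inv = [[1 / S22[0][0], 0.0], [0.0, 1 / S22[1][1]]]

def cond_mean(x2, x3):
    # mu1 + S12 S22^{-1} (x - mu_2:3)
    d = [x2 - mu[1], x3 - mu[2]]
    w = [sum(S12[k] * S22inv[k][j] for k in range(2)) for j in range(2)]
    return mu[0] + w[0] * d[0] + w[1] * d[1]

# S11 - S12 S22^{-1} S21
cond_var = S11 - sum(S12[j] * S22inv[j][j] * S12[j] for j in range(2))

assert abs(cond_var - 0.2) < 1e-12                        # 1/5
assert abs(cond_mean(2.0, 7.0) - (-13/5 - 2 * 2.0/5)) < 1e-12
```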
15. Let X1 be N(0, 1) and let
\[
X_2 = \begin{cases} -X_1 & \text{if } -1 \le X_1 \le 1, \\ X_1 & \text{otherwise.} \end{cases}
\]
Then we have the following.
(i) the distribution of X2 is
\[
\begin{aligned}
\Pr(X_2 \le x) &= \Pr(X_2 \le x \mid |X_1| \le 1)\Pr(|X_1| \le 1) + \Pr(X_2 \le x \mid |X_1| > 1)\Pr(|X_1| > 1) \\
&= \Pr(-X_1 \le x \mid |X_1| \le 1)\Pr(|X_1| \le 1) + \Pr(X_1 \le x \mid |X_1| > 1)\Pr(|X_1| > 1) \\
&= \Pr(X_1 \le x \mid |X_1| \le 1)\Pr(|X_1| \le 1) + \Pr(X_1 \le x \mid |X_1| > 1)\Pr(|X_1| > 1) \\
&= \Pr(X_1 \le x),
\end{aligned}
\]
the distribution function of the N(0, 1) distribution, where the second step uses the symmetry of the conditional distribution of $X_1$ given $|X_1| \le 1$.
(ii) the joint distribution of $X_1$ and $X_2$ is not bivariate normal. Indeed, $X_2 = -X_1$ whenever $|X_1| \le 1$, so $\Pr(X_1 + X_2 = 0) \ge \Pr(-1 \le X_1 \le 1) = \Phi(1) - \Phi(-1) > 0$, where $\Phi(\cdot)$ is the standard normal distribution function; hence $X_1 + X_2$ has an atom at zero and is not normally distributed, whereas joint normality of $(X_1, X_2)$ would make every linear combination normal.
16. If $X$ is distributed as $N_p(\mu, \Sigma)$ and $G$ is an orthogonal matrix, then $GX$ is distributed as $N_p(G\mu, G\Sigma G^T)$. If $\mu = 0$ then $G\mu = \mu = 0$. If $\Sigma = \sigma^2 I$ then $G\Sigma G^T = \sigma^2 G I G^T = \sigma^2 G G^T = \sigma^2 I = \Sigma$, using $GG^T = I$.
17. If $X$ is distributed as $N_p(\mu, \Sigma)$ and $a$ is any fixed vector with $a^T \Sigma a > 0$ then
\[
X - \mu \sim N_p(0, \Sigma)
\;\Rightarrow\; a^T(X - \mu) \sim N\left(0, a^T \Sigma a\right)
\;\Rightarrow\; \frac{a^T(X - \mu)}{\sqrt{a^T \Sigma a}} \sim N(0, 1),
\]
as required.
18. Let
\[
\Sigma = \begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & 0 \\ \rho^2 & 0 & 1 \end{bmatrix}.
\]
The conditional distribution of $(X_1, X_2)$ given $X_3 = x_3$ has mean vector
\[
\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \begin{bmatrix} \rho^2 \\ 0 \end{bmatrix}[1]^{-1}(x_3 - \mu_3) = \begin{bmatrix} \mu_1 + \rho^2(x_3 - \mu_3) \\ \mu_2 \end{bmatrix}
\]
and covariance matrix
\[
\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} - \begin{bmatrix} \rho^2 \\ 0 \end{bmatrix}[1]^{-1}\begin{bmatrix} \rho^2 & 0 \end{bmatrix} = \begin{bmatrix} 1 - \rho^4 & \rho \\ \rho & 1 \end{bmatrix}.
\]
19. Suppose that $x \sim N_p(\mu, \Sigma)$ and $a$ is a fixed vector. Let $r_i$ be the correlation between $x_i$ and $a^T x$. Write $x_i = e_i^T x$, where $e_i$ is a vector of zeros except for a one at the $i$th position. Then $Cov(x_i, a^T x) = Cov(e_i^T x, a^T x) = e_i^T \Sigma a = (\Sigma a)_i$, $Var(x_i) = \sigma_{ii}$ and $Var(a^T x) = a^T \Sigma a$. So, $r = (cD)^{-1/2}\Sigma a$, where $c = a^T \Sigma a$ and $D = \operatorname{diag}(\Sigma)$. In particular, $r = \Sigma a$ whenever $cD = I$; for example, when $\Sigma = I/\sqrt{a^T a}$.
20. Suppose $x_1, x_2, x_3$ are iid $N_p(\mu, \Sigma)$ random vectors, and $y_1 = x_1 + x_2$, $y_2 = x_2 + x_3$ and $y_3 = x_1 + x_3$.
Let $I$ denote the $p \times p$ identity matrix and $0$ the $p \times p$ zero matrix. Then we can write
\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} I & I & 0 \\ 0 & I & I \\ I & 0 & I \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.
\]
Since $x_1, x_2, x_3$ are independent, the stacked vector $(x_1^T, x_2^T, x_3^T)^T$ has covariance matrix $\operatorname{diag}(\Sigma, \Sigma, \Sigma)$, so
\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \sim N_{3p}\left(\begin{bmatrix} 2\mu \\ 2\mu \\ 2\mu \end{bmatrix}, \begin{bmatrix} I & I & 0 \\ 0 & I & I \\ I & 0 & I \end{bmatrix} \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & \Sigma & 0 \\ 0 & 0 & \Sigma \end{bmatrix} \begin{bmatrix} I & 0 & I \\ I & I & 0 \\ 0 & I & I \end{bmatrix}\right)
\equiv N_{3p}\left(\begin{bmatrix} 2\mu \\ 2\mu \\ 2\mu \end{bmatrix}, \begin{bmatrix} 2\Sigma & \Sigma & \Sigma \\ \Sigma & 2\Sigma & \Sigma \\ \Sigma & \Sigma & 2\Sigma \end{bmatrix}\right)
\]
and
\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N_{2p}\left(\begin{bmatrix} 2\mu \\ 2\mu \end{bmatrix}, \begin{bmatrix} 2\Sigma & \Sigma \\ \Sigma & 2\Sigma \end{bmatrix}\right).
\]
So, the conditional distribution of y1 given y2 is
\[
y_1 \mid y_2 \sim N_p\left(2\mu + \Sigma(2\Sigma)^{-1}(y_2 - 2\mu),\; 2\Sigma - \Sigma(2\Sigma)^{-1}\Sigma\right) \equiv N_p\left(\mu + \tfrac{1}{2} y_2,\; \tfrac{3}{2}\Sigma\right).
\]
The conditional distribution of y1 given y2 and y3 is
\[
\begin{aligned}
y_1 \mid y_2, y_3 &\sim N_p\left(2\mu + [\Sigma, \Sigma]\begin{bmatrix} 2\Sigma & \Sigma \\ \Sigma & 2\Sigma \end{bmatrix}^{-1}\begin{bmatrix} y_2 - 2\mu \\ y_3 - 2\mu \end{bmatrix},\; 2\Sigma - [\Sigma, \Sigma]\begin{bmatrix} 2\Sigma & \Sigma \\ \Sigma & 2\Sigma \end{bmatrix}^{-1}\begin{bmatrix} \Sigma \\ \Sigma \end{bmatrix}\right) \\
&\equiv N_p\left(2\mu + [\Sigma, \Sigma]\begin{bmatrix} \frac{2}{3}\Sigma^{-1} & -\frac{1}{3}\Sigma^{-1} \\ -\frac{1}{3}\Sigma^{-1} & \frac{2}{3}\Sigma^{-1} \end{bmatrix}\begin{bmatrix} y_2 - 2\mu \\ y_3 - 2\mu \end{bmatrix},\; 2\Sigma - [\Sigma, \Sigma]\begin{bmatrix} \frac{2}{3}\Sigma^{-1} & -\frac{1}{3}\Sigma^{-1} \\ -\frac{1}{3}\Sigma^{-1} & \frac{2}{3}\Sigma^{-1} \end{bmatrix}\begin{bmatrix} \Sigma \\ \Sigma \end{bmatrix}\right) \\
&\equiv N_p\left(2\mu + \left[\tfrac{1}{3}I, \tfrac{1}{3}I\right]\begin{bmatrix} y_2 - 2\mu \\ y_3 - 2\mu \end{bmatrix},\; 2\Sigma - \left[\tfrac{1}{3}I, \tfrac{1}{3}I\right]\begin{bmatrix} \Sigma \\ \Sigma \end{bmatrix}\right) \\
&\equiv N_p\left(\tfrac{2}{3}\mu + \tfrac{1}{3}(y_2 + y_3),\; \tfrac{4}{3}\Sigma\right).
\end{aligned}
\]
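As a sanity check on the block-matrix algebra, take the scalar case $p = 1$, $\Sigma = 1$ (exact rational arithmetic): the covariance of $(y_1, y_2, y_3)$ is $[[2,1,1],[1,2,1],[1,1,2]]$, the regression coefficients of $y_1$ on $(y_2, y_3)$ come out to $1/3$ each, and the conditional variance to $4/3$:

```python
from fractions import Fraction as F

# blocks of the covariance of ((y2, y3)) and its cross-covariance with y1
S12 = [F(1), F(1)]
S22 = [[F(2), F(1)], [F(1), F(2)]]
det = S22[0][0] * S22[1][1] - S22[0][1] * S22[1][0]
S22inv = [[ S22[1][1] / det, -S22[0][1] / det],
          [-S22[1][0] / det,  S22[0][0] / det]]

# regression coefficients Sigma12 Sigma22^{-1}
w = [sum(S12[k] * S22inv[k][j] for k in range(2)) for j in range(2)]
assert w == [F(1, 3), F(1, 3)]

# conditional variance 2 - Sigma12 Sigma22^{-1} Sigma21
cvar = F(2) - sum(w[j] * S12[j] for j in range(2))
assert cvar == F(4, 3)
```

The conditional mean is then $2\mu + \tfrac{1}{3}(y_2 - 2\mu) + \tfrac{1}{3}(y_3 - 2\mu) = \tfrac{2}{3}\mu + \tfrac{1}{3}(y_2 + y_3)$.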