Statistical Inference, Lecture 3: Common Families of Distributions
MING GAO
DaSE @ ECNU (for course-related communications)
mgao@dase.ecnu.edu.cn
Mar. 24, 2020
Outline
1 Discrete Distributions
2 Continuous Distributions
3 Exponential Family
4 Location and Scale Families
5 Take-aways
MING GAO (DaSE@ECNU) Statistical Inference Mar. 24, 2020 2 / 28
Discrete Distributions
Discrete distributions
A r.v. X is said to have a discrete distribution if the range of X is countable. In most situations, the r.v. has integer-valued outcomes.
Discrete uniform distribution
A r.v. X has a discrete uniform(1, N) distribution if
P(X = x | N) = 1/N, x = 1, 2, ..., N,
where N is a specified integer. This distribution puts equal mass on each of the outcomes 1, 2, ..., N.
Recall that ∑_{i=1}^k i = k(k+1)/2 and ∑_{i=1}^k i^2 = k(k+1)(2k+1)/6. Hence
E(X) = ∑_{x=1}^N x P(X = x | N) = (N+1)/2;
Var(X) = E(X^2) − E(X)^2 = (N+1)(N−1)/12.
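The two moment formulas above can be checked by direct enumeration. A small sketch (Python, standard library only; the function name is ours), using exact rational arithmetic so the identities hold with equality:

```python
from fractions import Fraction

# Exact mean and variance of the discrete uniform(1, N) distribution,
# computed by enumerating the N equally likely outcomes.
def uniform_moments(N):
    mean = sum(Fraction(x, N) for x in range(1, N + 1))
    var = sum(Fraction(x * x, N) for x in range(1, N + 1)) - mean ** 2
    return mean, var

for N in (1, 5, 10, 100):
    mean, var = uniform_moments(N)
    assert mean == Fraction(N + 1, 2)            # E(X) = (N+1)/2
    assert var == Fraction((N + 1) * (N - 1), 12)  # Var(X) = (N+1)(N-1)/12
```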
Bernoulli Trials
Definition
Each performance of an experiment with two possible outcomes is called a Bernoulli trial.
In general, a possible outcome of a Bernoulli trial is called a success or a failure.
If p is the probability of a success and q is the probability of a failure, it follows that p + q = 1.
E(X) = 0·(1−p) + 1·p = p and E(X^2) = 0^2·(1−p) + 1^2·p = p;
Var(X) = E(X^2) − E(X)^2 = p − p^2 = p(1−p).
Binomial distribution
Many problems can be solved by determining the probability of k successes when an experiment consists of n mutually independent Bernoulli trials.
Let the r.v. X_i denote the outcome of the i-th trial (i = 1, 2, ..., n), i.e., whether it succeeds or not:
X_i = 1 if the i-th trial is a success (with probability p), and X_i = 0 otherwise (with probability 1 − p).
Let the r.v. X = ∑_{i=1}^n X_i. We have
P(X = x | n, p) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, 2, ..., n.
We call this function the binomial distribution, i.e., B(k; n, p) = P(X = k) = C(n, k) p^k q^{n−k}.
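The pmf above can be written directly with binomial coefficients. A minimal sketch (Python, standard library only; the function name is ours), also confirming that the pmf sums to 1 over x = 0, ..., n:

```python
from math import comb

# Binomial pmf B(k; n, p) = C(n, k) p^k (1-p)^(n-k), as on the slide.
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
assert abs(total - 1.0) < 1e-12  # the pmf puts total mass 1 on 0..n
```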
Expected value of Binomial r.v.s
Theorem
The expected number of successes when n mutually independent Bernoulli trials are performed, where p is the probability of success on each trial, is np.
Proof.
Let X be the r.v. equal to the number of successes in n trials. We know that P(X = k) = C(n, k) p^k q^{n−k}. Hence, we have
E(X) = ∑_{k=0}^n k·P(X = k) = ∑_{k=1}^n k·C(n, k) p^k q^{n−k}
     = ∑_{k=1}^n n·C(n−1, k−1) p^k q^{n−k} = np ∑_{j=0}^{n−1} C(n−1, j) p^j q^{n−1−j}
     = np(p + q)^{n−1} = np.
Variance of Binomial r.v.s
Question: Let the r.v. X be the number of successes of n mutually independent Bernoulli trials, where p is the probability of success on each trial. What is the variance of X?
Solution:
E(X^2) = ∑_{k=0}^n k^2·P(X = k) = ∑_{k=1}^n k(k−1)·P(X = k) + ∑_{k=1}^n k·P(X = k)
       = n(n−1)p^2 ∑_{k=2}^n C(n−2, k−2) p^{k−2} q^{n−k} + np
       = n(n−1)p^2 ∑_{j=0}^{n−2} C(n−2, j) p^j q^{n−2−j} + np
       = n(n−1)p^2 (p + q)^{n−2} + np = n(n−1)p^2 + np,
V(X) = E(X^2) − (E(X))^2 = n(n−1)p^2 + np − (np)^2 = np(1−p).
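Both binomial moments derived above can be verified by summing over the pmf. A quick numerical check (Python, standard library only; the helper name is ours):

```python
from math import comb

# Compute E(X) and Var(X) of binomial(n, p) by brute-force summation
# and compare with the closed forms np and np(1-p).
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.35
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
second = sum(k * k * binom_pmf(k, n, p) for k in range(n + 1))
var = second - mean**2
assert abs(mean - n * p) < 1e-9          # E(X) = np
assert abs(var - n * p * (1 - p)) < 1e-9  # Var(X) = np(1-p)
```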
Geometric distribution
Let the r.v. Y be the number of trials until the first success is obtained in independent Bernoulli trials. Then
P(Y = k) = P(X_1 = 0 ∧ X_2 = 0 ∧ ... ∧ X_{k−1} = 0 ∧ X_k = 1)
         = ∏_{i=1}^{k−1} P(X_i = 0) · P(X_k = 1) = p q^{k−1}.
We call this function the Geometric distribution, i.e., G(k; p) = p q^{k−1}.
The geometric distribution is sometimes used to model "lifetimes" or "time until failure" of components.
For example, if the probability is 0.001 that a light bulb will fail on any given day, what is the probability that it will last at least 30 days?
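Under one common reading of the light-bulb question (the bulb "lasts at least 30 days" means no failure occurs during the first 30 days), the answer is the geometric tail probability q^30. A small sketch (Python, standard library only):

```python
# Light-bulb example from the slide: failure probability 0.001 per day.
# P(no failure in the first 30 days) = P(Y > 30) = q^30 with q = 0.999.
p = 0.001
q = 1 - p
p_survive_30 = q ** 30
print(round(p_survive_30, 4))  # 0.9704
```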
Expectation of Geometric r.v.s
Theorem
If a r.v. X follows a Geometric(p) distribution, then E(X) = 1/p and Var(X) = q/p^2, where p is the probability of success on each trial and q = 1 − p.
Proof.
We know that P(X = k) = q^{k−1} p. Hence, we have
E(X) = ∑_{k=1}^∞ k·q^{k−1} p = p ∑_{m=1}^∞ ∑_{k=m}^∞ q^{k−1}
     = p ∑_{m=1}^∞ q^{m−1}/(1 − q) = ∑_{m=1}^∞ q^{m−1}
     = 1/(1 − q) = 1/p.
Variance of Geometric r.v.s
E(X^2) = ∑_{k=1}^∞ k^2·P(X = k) = ∑_{k=1}^∞ [k(k−1) + k]·P(X = k)
       = p ∑_{k=2}^∞ (2 ∑_{j=1}^{k−1} j) q^{k−1} + 1/p
       = 2p ∑_{j=1}^∞ ∑_{k=j+1}^∞ j q^{k−1} + 1/p
       = 2q/p^2 + 1/p = (2q + p)/p^2.
V(X) = (2q + p)/p^2 − (1/p)^2 = (2q − (1 − p))/p^2 = q/p^2.
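The geometric moments E(X) = 1/p and Var(X) = q/p^2 can be checked with a truncated series; the tail q^{k−1} decays geometrically, so a few thousand terms are more than enough. A quick check (Python, standard library only):

```python
# Verify E(X) = 1/p and Var(X) = q/p^2 for the geometric distribution
# with pmf P(X = k) = q^(k-1) p, k = 1, 2, ..., by truncated summation.
p = 0.2
q = 1 - p
mean = sum(k * q**(k - 1) * p for k in range(1, 10_000))
second = sum(k * k * q**(k - 1) * p for k in range(1, 10_000))
var = second - mean**2
assert abs(mean - 1 / p) < 1e-9   # E(X) = 1/p = 5
assert abs(var - q / p**2) < 1e-6  # Var(X) = q/p^2 = 20
```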
Hypergeometric distributions
Suppose we have a large urn filled with N balls that are identical in every way except that M are red and N − M are green. Let the r.v. X be the number of red balls in a sample of size K.
The r.v. X has a hypergeometric distribution given by
P(X = x | N, M, K) = C(M, x) C(N−M, K−x) / C(N, K), x = 0, 1, ..., K.
By Vandermonde's identity, ∑_{x=0}^K C(M, x) C(N−M, K−x) = C(N, K); hence
∑_{x=0}^K P(X = x) = ∑_{x=0}^K C(M, x) C(N−M, K−x) / C(N, K) = 1.
Hypergeometric distributions Cont’d
Expectation and variance
E(X) = ∑_{x=0}^K x C(M, x) C(N−M, K−x) / C(N, K)
     = ∑_{x=1}^K M C(M−1, x−1) C(N−M, K−x) / [(N/K) C(N−1, K−1)]
     = ∑_{y=0}^{K−1} M C(M−1, y) C((N−1)−(M−1), (K−1)−y) / [(N/K) C(N−1, K−1)] = KM/N,
using x C(M, x) = M C(M−1, x−1) and C(N, K) = (N/K) C(N−1, K−1).
Var(X) = (KM/N) · (N−M)(N−K) / (N(N−1)).
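The normalization, mean, and variance of the hypergeometric distribution can all be checked numerically. A quick check for one parameter choice (Python, standard library only; N, M, K are arbitrary test values):

```python
from math import comb

# Hypergeometric(N, M, K): check the pmf sums to 1 (Vandermonde),
# that E(X) = KM/N, and that Var(X) = (KM/N)(N-M)(N-K)/(N(N-1)).
N, M, K = 20, 7, 5
pmf = [comb(M, x) * comb(N - M, K - x) / comb(N, K) for x in range(K + 1)]
assert abs(sum(pmf) - 1.0) < 1e-12

mean = sum(x * pmf[x] for x in range(K + 1))
second = sum(x * x * pmf[x] for x in range(K + 1))
assert abs(mean - K * M / N) < 1e-9
assert abs(second - mean**2
           - (K * M / N) * (N - M) * (N - K) / (N * (N - 1))) < 1e-9
```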
Poisson distributions
A r.v. X, taking values in the nonnegative integers, has a Poisson(λ) distribution if
P(X = x | λ) = (λ^x / x!) e^{−λ}, x = 0, 1, 2, ....
Note that ∑_{k=0}^∞ x^k/k! = e^x;
E(X) = ∑_{x=0}^∞ x (λ^x / x!) e^{−λ} = λ e^{−λ} ∑_{x=1}^∞ λ^{x−1}/(x−1)! = λ;
E(X^2) = ∑_{x=0}^∞ x^2 (λ^x / x!) e^{−λ} = ∑_{x=1}^∞ [x(x−1) + x] (λ^x / x!) e^{−λ} = λ^2 + λ;
Var(X) = E(X^2) − E(X)^2 = λ.
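The defining property of the Poisson distribution, mean = variance = λ, can be checked with a truncated sum; building the pmf recursively (p(k) = p(k−1)·λ/k) avoids large factorials. A quick check (Python, standard library only):

```python
from math import exp

# Poisson(lam): accumulate mass, mean, and second moment over k = 0..149
# using the recursion p(0) = e^-lam, p(k) = p(k-1) * lam / k.
lam = 3.5
pk = exp(-lam)
total = mean = second = 0.0
for k in range(150):
    total += pk
    mean += k * pk
    second += k * k * pk
    pk *= lam / (k + 1)

assert abs(total - 1.0) < 1e-12
assert abs(mean - lam) < 1e-9             # E(X) = lam
assert abs(second - mean**2 - lam) < 1e-9  # Var(X) = lam
```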
Continuous Distributions
Continuous uniform distribution
The continuous uniform distribution is defined by spreading mass uniformly over an interval [a, b]. Its pdf is given by
f(x | a, b) = 1/(b − a) if x ∈ [a, b]; 0 otherwise.
E(X) = ∫_a^b x/(b − a) dx = (a + b)/2;
Var(X) = ∫_a^b (x − (a + b)/2)^2 / (b − a) dx = (b − a)^2 / 12.
Gamma distribution
Note that, if α > 0, then ∫_0^{+∞} t^{α−1} e^{−t} dt < ∞. Let Γ(α) = ∫_0^{+∞} t^{α−1} e^{−t} dt. Then
Γ(α + 1) = αΓ(α);
Γ(n) = (n − 1)! for n ∈ Z^+;
Γ(1/2) = √π.
The Gamma distribution is defined on the interval [0, +∞). Its pdf is given by
f(x | α, β) = 1/(Γ(α) β^α) x^{α−1} e^{−x/β}, 0 ≤ x < ∞, α > 0, β > 0.
E(X) = 1/(Γ(α) β^α) ∫_0^{+∞} x^α e^{−x/β} dx = αβ;
Var(X) = 1/(Γ(α) β^α) ∫_0^{+∞} x^{α+1} e^{−x/β} dx − (αβ)^2 = αβ^2.
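The Gamma pdf and its mean αβ can be checked by crude numerical integration; a midpoint rule on [0, 200] is accurate enough for moderate parameters. A sketch (Python, standard library only; the parameter values and step size are arbitrary choices):

```python
from math import gamma, exp

# Integrate the Gamma(alpha, beta) pdf numerically (midpoint rule) to
# check total mass 1 and mean alpha * beta.
alpha, beta = 3.0, 2.0
def pdf(x):
    return x**(alpha - 1) * exp(-x / beta) / (gamma(alpha) * beta**alpha)

h = 0.001
mass = mean = 0.0
for i in range(int(200 / h)):
    x = (i + 0.5) * h
    fx = pdf(x)
    mass += h * fx
    mean += h * x * fx

assert abs(mass - 1.0) < 1e-5
assert abs(mean - alpha * beta) < 1e-3  # E(X) = alpha * beta = 6
```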
Special cases of Gamma distribution
Chi squared distribution
Let α = p/2 with p ∈ Z^+ and β = 2; then the pdf is given by
f(x | p) = 1/(Γ(p/2) 2^{p/2}) x^{p/2 − 1} e^{−x/2}, 0 ≤ x < ∞,
which is the chi squared pdf with p degrees of freedom. Note that E(X) = p, Var(X) = 2p.
Exponential distribution
If we set α = 1 in the Gamma distribution, then the pdf is given by
f(x | β) = (1/β) e^{−x/β}, 0 ≤ x < ∞.
Note that E(X) = β, Var(X) = β^2.
Normal distribution/Gaussian distribution
The pdf of the normal distribution with mean μ and variance σ^2, denoted as N(μ, σ^2), is given by
f(x | μ, σ^2) = (1/(√(2π) σ)) e^{−(x−μ)^2/(2σ^2)}, −∞ < x < ∞.
E(X) = μ, Var(X) = σ^2;
The standard normal distribution is N(0, 1), with pdf f(x | 0, 1) = (1/√(2π)) e^{−x^2/2};
If X ∼ N(μ, σ^2), then the r.v. Z = (X − μ)/σ ∼ N(0, 1);
P(|X − μ| ≤ σ) = P(|Z| ≤ 1) = 0.6826; (1)
P(|X − μ| ≤ 2σ) = P(|Z| ≤ 2) = 0.9544; (2)
P(|X − μ| ≤ 3σ) = P(|Z| ≤ 3) = 0.9974. (3)
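The three probabilities (1)-(3) follow from the standard normal cdf Φ(z) = (1 + erf(z/√2))/2, so P(|Z| ≤ z) = 2Φ(z) − 1 = erf(z/√2). A quick check (Python, standard library only):

```python
from math import erf, sqrt

# P(|Z| <= z) for a standard normal Z, via the error function.
def prob_within(z):
    return erf(z / sqrt(2))

# Compare against the rounded slide values 0.6826, 0.9544, 0.9974.
assert abs(prob_within(1) - 0.6826) < 5e-4
assert abs(prob_within(2) - 0.9544) < 5e-4
assert abs(prob_within(3) - 0.9974) < 5e-4
```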
Normal approximation
Let X ∼ binomial(25, 0.6). We can approximate X with a normal r.v. Y with mean μ = 25 × 0.6 = 15 and standard deviation σ = √(25 × 0.6 × (1 − 0.6)) ≈ 2.45. Thus
P(X ≤ 13) ≈ P(Y ≤ 13) = P(Z ≤ (13 − 15)/2.45) (4)
          = P(Z ≤ −0.82) = 0.206, (5)
while the exact value is
P(X ≤ 13) = ∑_{x=0}^{13} C(25, x) 0.6^x 0.4^{25−x} = 0.267. (6)
In general, if X ∼ binomial(n, p), then E(X) = np and Var(X) = np(1 − p), and we can approximate the distribution of X with N(np, np(1 − p)).
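The comparison in (4)-(6) can be reproduced directly. A quick check (Python, standard library only), computing the exact binomial tail and the plain normal approximation used on the slide (no continuity correction):

```python
from math import comb, erf, sqrt

# Exact P(X <= 13) for X ~ binomial(25, 0.6) versus the N(np, np(1-p))
# approximation, as on the slide.
n, p, cut = 25, 0.6, 13
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(cut + 1))
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = 0.5 * (1 + erf((cut - mu) / (sigma * sqrt(2))))

assert abs(exact - 0.267) < 5e-3   # slide value (6)
assert abs(approx - 0.206) < 5e-3  # slide value (5)
```

The gap between 0.206 and 0.267 shrinks markedly if one applies the usual continuity correction, approximating P(X ≤ 13) by P(Y ≤ 13.5) instead.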
Beta distribution
The beta family of distributions is a continuous family on (0, 1) indexed by two parameters. The Beta(α, β) pdf is
f(x | α, β) = (1/B(α, β)) x^{α−1} (1 − x)^{β−1}, 0 < x < 1, α > 0, β > 0,
where B(α, β) denotes the beta function, B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx.
B(α, β) = Γ(α)Γ(β)/Γ(α + β);
E(X) = α/(α + β), and Var(X) = αβ/((α + β)^2 (α + β + 1));
E(X^n) = Γ(α + n)Γ(α + β) / (Γ(α + β + n)Γ(α)).
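The mean and variance formulas follow from the general moment formula E(X^n); this is easy to check numerically via the Gamma function. A quick check (Python, standard library only; the parameter values are arbitrary):

```python
from math import gamma

# Beta(a, b) moments via E(X^n) = G(a+n)G(a+b) / (G(a+b+n)G(a)),
# checked against the closed-form mean and variance.
a, b = 2.0, 5.0
def moment(n):
    return gamma(a + n) * gamma(a + b) / (gamma(a + b + n) * gamma(a))

mean, second = moment(1), moment(2)
assert abs(mean - a / (a + b)) < 1e-9                       # E(X) = a/(a+b)
assert abs(second - mean**2
           - a * b / ((a + b)**2 * (a + b + 1))) < 1e-9      # Var(X)
```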
Cauchy distribution
The Cauchy distribution is a symmetric, bell-shaped distribution on (−∞, +∞) with pdf
f(x | θ) = (1/π) · 1/(1 + (x − θ)^2), −∞ < x < +∞, −∞ < θ < +∞.
E|X| = ∫_{−∞}^{+∞} (1/π) |x|/(1 + (x − θ)^2) dx = ∞, so the mean does not exist;
The parameter θ does measure the center of the distribution: it is the median;
P(X ≥ θ) = 1/2.
Lognormal distribution
If X is a r.v. whose logarithm is normally distributed, that is, log X ∼ N(μ, σ^2), then X has a lognormal distribution. Its pdf is
f(x | μ, σ^2) = (1/(√(2π) σ)) (1/x) e^{−(log x − μ)^2/(2σ^2)}, x > 0, −∞ < μ < +∞, σ > 0.
E(X) = E(e^{log X}) = e^{μ + σ^2/2};
Var(X) = e^{2(μ + σ^2)} − e^{2μ + σ^2}.
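The lognormal mean e^{μ + σ^2/2} can be checked by integrating x·f(x) numerically; the integrand decays fast, so a midpoint rule on a finite interval suffices. A sketch (Python, standard library only; the parameter values, cutoff, and step size are arbitrary choices):

```python
from math import exp, log, pi, sqrt

# Lognormal(mu, sigma^2) pdf; integrate x * pdf(x) on (0, 50] by the
# midpoint rule and compare with the closed form exp(mu + sigma^2/2).
mu, sigma = 0.5, 0.4
def pdf(x):
    return exp(-(log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * sqrt(2 * pi))

h = 0.001
mean = 0.0
for i in range(int(50 / h)):
    x = (i + 0.5) * h
    mean += h * x * pdf(x)

assert abs(mean - exp(mu + sigma**2 / 2)) < 1e-3
```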
Exponential Family
Exponential family
A family of pdfs or pmfs is called an exponential family if it can be expressed as
f(x | θ) = h(x) c(θ) exp(∑_{i=1}^k w_i(θ) t_i(x)).
Here h(x) ≥ 0 and the t_i(x) are real-valued functions of the observation x, while c(θ) ≥ 0 and the w_i(θ) are real-valued functions of the possibly vector-valued parameter θ.
To verify that a family of pdfs or pmfs is an exponential family, we must identify the functions h(x), c(θ), w_i(θ), and t_i(x) and show that the family has the above form.
The Bernoulli, Gaussian, Binomial, Poisson, Exponential, Weibull, Laplace, Gamma, Beta, Multinomial, and Wishart distributions are all exponential families.
Example: binomial distribution
Consider the binomial(n, p) family, where n ∈ Z^+:
f(x | p) = C(n, x) p^x (1 − p)^{n−x} = C(n, x) (1 − p)^n (p/(1 − p))^x
         = C(n, x) (1 − p)^n exp(x log(p/(1 − p))). (7)
Define
h(x) = C(n, x) for x ∈ {0, 1, ..., n} and 0 otherwise, c(p) = (1 − p)^n, (8)
w_1(p) = log(p/(1 − p)), t_1(x) = x. (9)
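The factorization (7)-(9) can be verified pointwise: h(x)·c(p)·exp(w_1(p)·t_1(x)) must reproduce the binomial pmf at every x. A quick check (Python, standard library only; n and p are arbitrary test values):

```python
from math import comb, exp, log

# Verify the exponential-family factorization of binomial(n, p):
# f(x|p) = h(x) * c(p) * exp(w1(p) * t1(x)).
n, p = 12, 0.3
def h(x): return comb(n, x)
c = (1 - p)**n
w1 = log(p / (1 - p))
def t1(x): return x

for x in range(n + 1):
    direct = comb(n, x) * p**x * (1 - p)**(n - x)
    factored = h(x) * c * exp(w1 * t1(x))
    assert abs(direct - factored) < 1e-12
```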
Example: normal distribution
Consider the normal family N(μ, σ^2):
f(x | μ, σ^2) = (1/(√(2π) σ)) e^{−(x−μ)^2/(2σ^2)}
              = (1/(√(2π) σ)) e^{−μ^2/(2σ^2)} e^{−x^2/(2σ^2) + μx/σ^2}. (10)
Define
h(x) = 1, c(θ) = c(μ, σ) = (1/(√(2π) σ)) e^{−μ^2/(2σ^2)},
w_1(μ, σ) = 1/σ^2, w_2(μ, σ) = μ/σ^2,
t_1(x) = −x^2/2, t_2(x) = x. (11)
Theorem
If X is a r.v. with a pdf or pmf of the exponential-family form above, then
E(∑_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)) = −(∂/∂θ_j) log c(θ);
Var(∑_{i=1}^k (∂w_i(θ)/∂θ_j) t_i(X)) = −(∂^2/∂θ_j^2) log c(θ) − E(∑_{i=1}^k (∂^2 w_i(θ)/∂θ_j^2) t_i(X)).
The advantage of these identities is that we can replace integration or summation by differentiation, which is often more straightforward.
Binomial mean and variance
For the binomial distribution, we have
(d/dp) w_1(p) = (d/dp) log(p/(1 − p)) = 1/(p(1 − p)), (15)
(d/dp) log c(p) = (d/dp) n log(1 − p) = −n/(1 − p). (16)
Thus, by the theorem, we have
E(X/(p(1 − p))) = n/(1 − p),
and hence E(X) = np.
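The identity E(X/(p(1 − p))) = n/(1 − p) produced by the theorem, and the resulting E(X) = np, can be confirmed by summing over the binomial pmf. A quick check (Python, standard library only; n and p are arbitrary test values):

```python
from math import comb

# For X ~ binomial(n, p), check the theorem's conclusion
# E(X / (p(1-p))) = n / (1-p), which is equivalent to E(X) = np.
n, p = 15, 0.4
def pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

mean = sum(k * pmf(k) for k in range(n + 1))
assert abs(mean / (p * (1 - p)) - n / (1 - p)) < 1e-9
assert abs(mean - n * p) < 1e-9
```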
Take-aways
Conclusions
Discrete distributions
Continuous distributions
Exponential family
Location and scale families
Inequalities