
Statistical Inference
Lecture 3: Common Families of Distributions

MING GAO

DaSE @ ECNU (for course-related communications)

mgao@dase.ecnu.edu.cn

Mar. 24, 2020

Outline

1 Discrete Distributions

2 Continuous Distributions

3 Exponential Family

4 Location and Scale Families

5 Take-aways

MING GAO (DaSE@ECNU) Statistical Inference Mar. 24, 2020 2 / 28

Discrete Distributions

Discrete distributions

A r.v. X is said to have a discrete distribution if the range of X is countable. In most situations, the r.v. has integer-valued outcomes.

Discrete uniform distribution

A r.v. X has a discrete uniform(1, N) distribution if

P(X = x | N) = \frac{1}{N}, \quad x = 1, 2, \ldots, N,

where N is a specified integer. This distribution puts equal mass on each of the outcomes 1, 2, ..., N.

Recall that \sum_{i=1}^{k} i = \frac{k(k+1)}{2} and \sum_{i=1}^{k} i^2 = \frac{k(k+1)(2k+1)}{6}.

E(X) = \sum_{x=1}^{N} x P(X = x | N) = \frac{N+1}{2};

Var(X) = E(X^2) - E(X)^2 = \frac{(N+1)(N-1)}{12}.

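These formulas are easy to sanity-check numerically. A minimal Python sketch (N = 10 is an arbitrary illustrative choice):

```python
# Brute-force check of the discrete uniform(1, N) mean and variance formulas.
N = 10
pmf = {x: 1 / N for x in range(1, N + 1)}

mean = sum(x * p for x, p in pmf.items())
var = sum(x**2 * p for x, p in pmf.items()) - mean**2

assert abs(mean - (N + 1) / 2) < 1e-9          # E(X) = (N+1)/2 = 5.5
assert abs(var - (N + 1) * (N - 1) / 12) < 1e-9  # Var(X) = 99/12 = 8.25
```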

Bernoulli Trials

Definition

Each performance of an experiment with two possible outcomes is called a Bernoulli trial.

In general, a possible outcome of a Bernoulli trial is called a success or a failure.

If p is the probability of a success and q is the probability of a failure, it follows that p + q = 1.

E(X) = 0 \cdot (1-p) + 1 \cdot p = p and E(X^2) = 0^2 \cdot (1-p) + 1^2 \cdot p = p;

Var(X) = E(X^2) - E(X)^2 = p - p^2 = p(1-p).


Binomial distribution

Many problems can be solved by determining the probability of k successes when an experiment consists of n mutually independent Bernoulli trials.

Let r.v. X_i be the i-th experimental outcome (i = 1, 2, ..., n), where X_i indicates whether trial i is a success. Hence, we have

X_i =
\begin{cases}
1, & \text{success, with probability } p; \\
0, & \text{otherwise, with probability } 1-p.
\end{cases}

Let r.v. X = \sum_{i=1}^{n} X_i. We have

P(X = x | n, p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n.

We call this function the binomial distribution, i.e., B(k; n, p) = P(X = k) = C(n, k) p^k q^{n-k}.

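A quick numeric check that this pmf sums to 1 and has the moments derived on the next slides (a minimal sketch; n = 10 and p = 0.3 are illustrative):

```python
from math import comb

# Binomial(n, p): the pmf sums to 1, with mean np and variance np(1-p).
n, p = 10, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(pmf))
var = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2

assert abs(sum(pmf) - 1) < 1e-12
assert abs(mean - n * p) < 1e-12          # 3.0
assert abs(var - n * p * (1 - p)) < 1e-12  # 2.1
```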

Expected value of Binomial r.v.s

Theorem

The expected number of successes when n mutually independent Bernoulli trials are performed, where p is the probability of success on each trial, is np.

Proof.

Let X be the r.v. equal to the number of successes in n trials. We know that P(X = k) = C(n, k) p^k q^{n-k}. Hence, we have

E(X) = \sum_{k=0}^{n} k \cdot P(X = k) = \sum_{k=1}^{n} k \binom{n}{k} p^k q^{n-k}

     = \sum_{k=1}^{n} n \binom{n-1}{k-1} p^k q^{n-k} = np \sum_{j=0}^{n-1} \binom{n-1}{j} p^j q^{n-1-j}

     = np (p + q)^{n-1} = np.


Variance of Binomial r.v.s

Question: Let r.v. X be the number of successes in n mutually independent Bernoulli trials, where p is the probability of success on each trial. What is the variance of X?

Solution:

E(X^2) = \sum_{k=0}^{n} k^2 \cdot P(X = k) = \sum_{k=1}^{n} k(k-1) \cdot P(X = k) + \sum_{k=1}^{n} k \cdot P(X = k)

       = n(n-1) p^2 \sum_{k=2}^{n} \binom{n-2}{k-2} p^{k-2} q^{n-k} + np

       = n(n-1) p^2 \sum_{j=0}^{n-2} \binom{n-2}{j} p^j q^{n-2-j} + np

       = n(n-1) p^2 (p + q)^{n-2} + np = n(n-1) p^2 + np,

V(X) = E(X^2) - (E(X))^2 = n(n-1) p^2 + np - (np)^2 = np(1-p).


Geometric distribution

Let r.v. Y be the number of trials until the first success in independent Bernoulli trials.

P(Y = k) = P(X_1 = 0 \wedge X_2 = 0 \wedge \cdots \wedge X_{k-1} = 0 \wedge X_k = 1)

         = \prod_{i=1}^{k-1} P(X_i = 0) \cdot P(X_k = 1) = p q^{k-1}.

We call this function the geometric distribution, i.e.,

G(k; p) = p q^{k-1}.

The geometric distribution is sometimes used to model "lifetimes" or "time until failure" of components.

For example, if the probability is 0.001 that a light bulb will fail on any given day, what is the probability that it will last at least 30 days?

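One way to answer the light-bulb question (a minimal Python sketch; it reads "lasts at least 30 days" as surviving the first 30 days, i.e. the first failure occurs on day 31 or later):

```python
# Light-bulb question: p = 0.001 is the daily failure probability; the day
# of the first failure Y is geometric, P(Y = k) = p * q**(k - 1).
p = 0.001
q = 1 - p

# P(Y >= 31): sum the pmf tail (truncated far out) and compare it with the
# closed form q**30, the probability of surviving 30 days in a row.
tail = sum(p * q**(k - 1) for k in range(31, 100_000))
closed_form = q**30

assert abs(tail - closed_form) < 1e-9
assert round(closed_form, 3) == 0.970  # about a 97% chance
```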

Expectation of Geometric r.v.s

Theorem

E(X) and Var(X) when a r.v. X follows a geometric distribution are \frac{1}{p} and \frac{q}{p^2}, where p is the probability of success on each trial.

Proof.

We know that P(X = k) = q^{k-1} p. Hence, we have

E(X) = \sum_{k=1}^{\infty} k \cdot q^{k-1} p = p \Big( \sum_{m=1}^{\infty} \sum_{k=m}^{\infty} q^{k-1} \Big)

     = p \sum_{m=1}^{\infty} \frac{q^{m-1}}{1-q} = \sum_{m=1}^{\infty} q^{m-1} = \frac{1}{1-q} = \frac{1}{p}.


Variance of Geometric r.v.s

E(X^2) = \sum_{k=1}^{\infty} k^2 \cdot P(X = k) = \sum_{k=1}^{\infty} [k(k-1) + k] \cdot P(X = k)

       = p \sum_{k=2}^{\infty} \Big( 2 \sum_{j=1}^{k-1} j \Big) q^{k-1} + \frac{1}{p}

       = 2p \sum_{j=1}^{\infty} \sum_{k=j+1}^{\infty} j q^{k-1} + \frac{1}{p}

       = \frac{2q}{p^2} + \frac{1}{p} = \frac{2q + p}{p^2},

V(X) = \frac{2q + p}{p^2} - \Big( \frac{1}{p} \Big)^2 = \frac{2q - (1-p)}{p^2} = \frac{q}{p^2}.

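Both geometric moments can be verified numerically (a minimal sketch; p = 0.2 is arbitrary, and the series is truncated where the tail is negligible):

```python
# Numeric check of E(X) = 1/p and Var(X) = q/p**2 for the geometric pmf.
p = 0.2
q = 1 - p
ks = range(1, 2000)  # the tail beyond k = 2000 is negligible for p = 0.2

mean = sum(k * q**(k - 1) * p for k in ks)
second = sum(k**2 * q**(k - 1) * p for k in ks)

assert abs(mean - 1 / p) < 1e-9               # 1/p = 5
assert abs(second - mean**2 - q / p**2) < 1e-9  # q/p^2 = 20
```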

Hypergeometric distributions

Suppose we have a large urn filled with N balls that are identical in every way except that M are red and N - M are green. Let the r.v. X be the number of red balls in a sample of size K.

The r.v. X has a hypergeometric distribution given by

P(X = x | N, M, K) = \frac{\binom{M}{x} \binom{N-M}{K-x}}{\binom{N}{K}}, \quad x = 0, 1, \ldots, K.

Since \sum_{x=0}^{K} \binom{M}{x} \binom{N-M}{K-x} = \binom{N}{K}, we have

\sum_{x=0}^{K} P(X = x) = \sum_{x=0}^{K} \frac{\binom{M}{x} \binom{N-M}{K-x}}{\binom{N}{K}} = 1.


Hypergeometric distributions Cont’d

Expectation and variance

Using x \binom{M}{x} = M \binom{M-1}{x-1} and \binom{N}{K} = \frac{N}{K} \binom{N-1}{K-1},

E(X) = \sum_{x=0}^{K} x \frac{\binom{M}{x} \binom{N-M}{K-x}}{\binom{N}{K}} = \sum_{x=1}^{K} \frac{M \binom{M-1}{x-1} \binom{N-M}{K-x}}{\frac{N}{K} \binom{N-1}{K-1}}

     = \sum_{y=0}^{K-1} \frac{M \binom{M-1}{y} \binom{(N-1)-(M-1)}{(K-1)-y}}{\frac{N}{K} \binom{N-1}{K-1}} = \frac{KM}{N}.

Var(X) = \frac{KM}{N} \cdot \frac{(N-M)(N-K)}{N(N-1)}.

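These identities can be checked directly with `math.comb` (a minimal sketch; the urn sizes are illustrative):

```python
from math import comb

# Hypergeometric(N, M, K): pmf sums to 1; E(X) = KM/N; variance as above.
N, M, K = 20, 7, 6
pmf = [comb(M, x) * comb(N - M, K - x) / comb(N, K) for x in range(K + 1)]

mean = sum(x * p for x, p in enumerate(pmf))
var = sum(x**2 * p for x, p in enumerate(pmf)) - mean**2

assert abs(sum(pmf) - 1) < 1e-12
assert abs(mean - K * M / N) < 1e-12  # 6*7/20 = 2.1
assert abs(var - (K * M / N) * ((N - M) * (N - K)) / (N * (N - 1))) < 1e-12
```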

Poisson distributions

A r.v. X, taking values in the nonnegative integers, has a Poisson(\lambda) distribution if

P(X = x | \lambda) = \frac{\lambda^x}{x!} e^{-\lambda}, \quad x = 0, 1, 2, \ldots.

Note that \sum_{k=0}^{\infty} \frac{x^k}{k!} = e^x;

E(X) = \sum_{x=0}^{\infty} x \frac{\lambda^x}{x!} e^{-\lambda} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda;

E(X^2) = \sum_{x=0}^{\infty} x^2 \frac{\lambda^x}{x!} e^{-\lambda} = \sum_{x=1}^{\infty} [x(x-1) + x] \frac{\lambda^x}{x!} e^{-\lambda} = \lambda^2 + \lambda;

Var(X) = E(X^2) - E(X)^2 = \lambda.

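A quick check of the Poisson moments by truncated summation (a minimal sketch; λ = 3.5 is arbitrary, and the sum is truncated where the tail is negligible):

```python
from math import exp, factorial

# Poisson(lam): the mass sums to 1, E(X) = lam, and Var(X) = lam.
lam = 3.5
pmf = [lam**x / factorial(x) * exp(-lam) for x in range(100)]

mean = sum(x * p for x, p in enumerate(pmf))
second = sum(x**2 * p for x, p in enumerate(pmf))

assert abs(sum(pmf) - 1) < 1e-12
assert abs(mean - lam) < 1e-9
assert abs(second - lam**2 - lam) < 1e-9  # Var = E(X^2) - E(X)^2 = lam
```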

Continuous Distributions

Continuous uniform distribution

The continuous uniform distribution is defined by spreading mass uniformly over an interval [a, b]. Its pdf is given by

f(x | a, b) =
\begin{cases}
\frac{1}{b-a}, & \text{if } x \in [a, b]; \\
0, & \text{otherwise.}
\end{cases}

E(X) = \int_a^b \frac{x}{b-a} \, dx = \frac{a+b}{2};

Var(X) = \int_a^b \frac{(x - \frac{a+b}{2})^2}{b-a} \, dx = \frac{(b-a)^2}{12}.


Gamma distribution

Note that, if \alpha > 0, then \int_0^{+\infty} t^{\alpha-1} e^{-t} \, dt < \infty.

Let \Gamma(\alpha) = \int_0^{+\infty} t^{\alpha-1} e^{-t} \, dt. Then

\Gamma(\alpha + 1) = \alpha \Gamma(\alpha);

\Gamma(n) = (n-1)! for positive integers n;

\Gamma(\tfrac{1}{2}) = \sqrt{\pi}.

The Gamma distribution is defined on the interval [0, +\infty). Its pdf is given by

f(x | \alpha, \beta) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}, \quad 0 \le x < \infty, \ \alpha > 0, \ \beta > 0.

E(X) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{+\infty} x^{\alpha} e^{-x/\beta} \, dx = \alpha\beta;

Var(X) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{+\infty} x^{\alpha+1} e^{-x/\beta} \, dx - (\alpha\beta)^2 = \alpha\beta^2.

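Python's `math.gamma` evaluates Γ, so the recursion and the moments can be checked numerically; a minimal sketch using a midpoint-rule integral (α = 2.5, β = 1.5 are illustrative):

```python
from math import gamma as Gamma, exp, sqrt, pi

# Gamma function facts used above.
a, b = 2.5, 1.5
assert abs(Gamma(a + 1) - a * Gamma(a)) < 1e-12  # Gamma(a+1) = a*Gamma(a)
assert abs(Gamma(0.5) - sqrt(pi)) < 1e-12        # Gamma(1/2) = sqrt(pi)

def pdf(x):
    return x**(a - 1) * exp(-x / b) / (Gamma(a) * b**a)

# Midpoint rule on [0, 60]; the tail beyond 60 is negligible here.
n, upper = 200_000, 60.0
h = upper / n
xs = [(i + 0.5) * h for i in range(n)]

mass = sum(pdf(x) for x in xs) * h
mean = sum(x * pdf(x) for x in xs) * h
second = sum(x**2 * pdf(x) for x in xs) * h

assert abs(mass - 1) < 1e-6
assert abs(mean - a * b) < 1e-4            # E(X) = 3.75
assert abs(second - mean**2 - a * b**2) < 1e-3  # Var(X) = 5.625
```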

Special cases of Gamma distribution

Chi squared distribution

Let \alpha = \frac{p}{2} with p \in \mathbb{Z}^+ and \beta = 2. Then the pdf is

f(x | p) = \frac{1}{\Gamma(\frac{p}{2}) 2^{p/2}} x^{p/2 - 1} e^{-x/2}, \quad 0 \le x < \infty,

which is the chi squared pdf with p degrees of freedom. Note that E(X) = p and Var(X) = 2p.

Exponential distribution

If we set \alpha = 1 in the Gamma distribution, then the pdf is

f(x | \beta) = \frac{1}{\beta} e^{-x/\beta}, \quad 0 \le x < \infty.

Note that E(X) = \beta and Var(X) = \beta^2.


Normal distribution/Gaussian distribution

The pdf of the normal distribution with mean \mu and variance \sigma^2, denoted N(\mu, \sigma^2), is given by

f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.

E(X) = \mu, Var(X) = \sigma^2;

The standard normal distribution is N(0, 1), with pdf f(x | 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2};

If X \sim N(\mu, \sigma^2), then the r.v. Z = \frac{X - \mu}{\sigma} \sim N(0, 1);

P(|X - \mu| \le \sigma) = P(|Z| \le 1) \approx 0.6827; \quad (1)

P(|X - \mu| \le 2\sigma) = P(|Z| \le 2) \approx 0.9545; \quad (2)

P(|X - \mu| \le 3\sigma) = P(|Z| \le 3) \approx 0.9973. \quad (3)

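The central probabilities (1)-(3) follow from P(|Z| ≤ k) = erf(k/√2), which Python's `math.erf` evaluates directly:

```python
from math import erf, sqrt

# P(|Z| <= k) for Z ~ N(0, 1): the 68-95-99.7 rule.
def central(k):
    return erf(k / sqrt(2))

assert round(central(1), 4) == 0.6827
assert round(central(2), 4) == 0.9545
assert round(central(3), 4) == 0.9973
```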

Normal approximation

Let X \sim \text{binomial}(25, 0.6). We can approximate X with a normal r.v. Y with mean \mu = 25 \times 0.6 = 15 and standard deviation \sigma = \sqrt{25 \times 0.6 \times (1 - 0.6)} \approx 2.45. Thus

P(X \le 13) \approx P(Y \le 13) = P\Big( Z \le \frac{13 - 15}{2.45} \Big) = P(Z \le -0.82) \approx 0.206, \quad (4, 5)

while the exact value is

P(X \le 13) = \sum_{x=0}^{13} \binom{25}{x} 0.6^x 0.4^{25-x} \approx 0.267. \quad (6)

A continuity correction, using 13.5 in place of 13, brings the approximation to about 0.270.

In general, if X \sim \text{binomial}(n, p), then E(X) = np and Var(X) = np(1-p), and we can approximate the distribution of X with N(np, np(1-p)).


Beta distribution

The beta family of distributions is a continuous family on (0, 1) indexed by two parameters. The Beta(\alpha, \beta) pdf is

f(x | \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}, \quad 0 < x < 1, \ \alpha > 0, \ \beta > 0,

where B(\alpha, \beta) denotes the beta function,

B(\alpha, \beta) = \int_0^1 x^{\alpha-1} (1-x)^{\beta-1} \, dx.

B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)};

E(X) = \frac{\alpha}{\alpha+\beta}, and Var(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)};

E(X^n) = \frac{\Gamma(\alpha+n)\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+n)\Gamma(\alpha)}.

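The gamma-function identity and the moment formulas can be cross-checked against a midpoint-rule integral of the pdf (a minimal sketch; α = 2, β = 3 are illustrative):

```python
from math import gamma as G

# Beta(a, b): B(a,b) = G(a)G(b)/G(a+b); check mass, mean, variance.
a, b = 2.0, 3.0
B = G(a) * G(b) / G(a + b)

n = 100_000
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]
pdf = [x**(a - 1) * (1 - x)**(b - 1) / B for x in xs]

mass = sum(pdf) * h
mean = sum(x * f for x, f in zip(xs, pdf)) * h
second = sum(x * x * f for x, f in zip(xs, pdf)) * h

assert abs(mass - 1) < 1e-6
assert abs(mean - a / (a + b)) < 1e-6                              # 0.4
assert abs(second - mean**2 - a * b / ((a + b)**2 * (a + b + 1))) < 1e-6  # 0.04
```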

Cauchy distribution

The Cauchy distribution is a symmetric, bell-shaped distribution on (-\infty, +\infty) with pdf

f(x | \theta) = \frac{1}{\pi} \cdot \frac{1}{1 + (x - \theta)^2}, \quad -\infty < x < +\infty, \ -\infty < \theta < +\infty.

E|X| = \int_{-\infty}^{+\infty} \frac{|x|}{\pi (1 + (x - \theta)^2)} \, dx = \infty, so the mean does not exist;

The parameter \theta does measure the center of the distribution; it is the median:

P(X \ge \theta) = \frac{1}{2}.

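Since the mean does not exist but θ is the median, a quick simulation makes the point: sample via the inverse cdf x = θ + tan(π(u − 1/2)) and check that the sample median settles near θ (the seed, sample size, and θ = 2 are arbitrary choices):

```python
import random
from math import tan, pi

# Inverse-cdf sampling from Cauchy(theta); the sample median estimates theta.
random.seed(42)
theta = 2.0
xs = sorted(theta + tan(pi * (random.random() - 0.5)) for _ in range(100_001))
median = xs[len(xs) // 2]

assert abs(median - theta) < 0.05  # median is a stable estimate of theta
```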

Lognormal distribution

If X is a r.v. whose logarithm is normally distributed, that is, \log X \sim N(\mu, \sigma^2), then X has a lognormal distribution. Its pdf is

f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} \cdot \frac{1}{x} e^{-\frac{(\log x - \mu)^2}{2\sigma^2}}, \quad x > 0, \ -\infty < \mu < +\infty, \ \sigma > 0.

E(X) = E(e^{\log X}) = e^{\mu + \sigma^2/2};

Var(X) = e^{2(\mu + \sigma^2)} - e^{2\mu + \sigma^2}.

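Both moments can be checked by integrating e^y and e^{2y} against the N(μ, σ²) density of y = log X (a minimal sketch; μ = 0.3, σ = 0.5 are illustrative):

```python
from math import exp, sqrt, pi

# Check E(X) = exp(mu + sigma^2/2) and the lognormal variance formula.
mu, sigma = 0.3, 0.5

def phi(y):  # N(mu, sigma^2) density
    return exp(-(y - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

# Midpoint rule over mu +/- 12 sigma, where essentially all mass lies.
n, lo, hi = 200_000, mu - 12 * sigma, mu + 12 * sigma
h = (hi - lo) / n
ys = [lo + (i + 0.5) * h for i in range(n)]

m1 = sum(exp(y) * phi(y) for y in ys) * h       # E(X) = E(e^Y)
m2 = sum(exp(2 * y) * phi(y) for y in ys) * h   # E(X^2) = E(e^{2Y})

assert abs(m1 - exp(mu + sigma**2 / 2)) < 1e-6
assert abs(m2 - m1**2 - (exp(2 * (mu + sigma**2)) - exp(2 * mu + sigma**2))) < 1e-6
```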

Exponential Family

Exponential family

A family of pdfs or pmfs is called an exponential family if it can be expressed as

f(x | \theta) = h(x) c(\theta) \exp\Big( \sum_{i=1}^{k} w_i(\theta) t_i(x) \Big).

Here h(x) \ge 0 and the t_i(x) are real-valued functions of the observation x; c(\theta) \ge 0 and the w_i(\theta) are real-valued functions of the possibly vector-valued parameter \theta.

To verify that a family of pdfs or pmfs is an exponential family, we must identify the functions h(x), c(\theta), w_i(\theta), and t_i(x) and show that the family has the above form.

The Bernoulli, Gaussian, binomial, Poisson, exponential, Weibull, Laplace, Gamma, beta, multinomial, and Wishart distributions are all exponential families.


Example: binomial distribution

Consider the binomial(n, p) family, where n \in \mathbb{Z}^+:

f(x | p) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} (1-p)^n \Big( \frac{p}{1-p} \Big)^x

         = \binom{n}{x} (1-p)^n \exp\Big( x \log \frac{p}{1-p} \Big).

Define

h(x) =
\begin{cases}
\binom{n}{x}, & x \in \{0, 1, \ldots, n\}; \\
0, & \text{otherwise,}
\end{cases}
\qquad c(p) = (1-p)^n,

w_1(p) = \log \frac{p}{1-p}, \qquad t_1(x) = x.

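A quick check that this factorization reproduces the binomial pmf term by term (a minimal sketch; n = 12 and p = 0.35 are arbitrary):

```python
from math import comb, exp, log

# Verify f(x|p) = h(x) * c(p) * exp(w1(p) * t1(x)) for binomial(n, p).
n, p = 12, 0.35

h = lambda x: comb(n, x)
c = lambda p: (1 - p)**n
w1 = lambda p: log(p / (1 - p))
t1 = lambda x: x

for x in range(n + 1):
    direct = comb(n, x) * p**x * (1 - p)**(n - x)
    factored = h(x) * c(p) * exp(w1(p) * t1(x))
    assert abs(direct - factored) < 1e-12
```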

Example: normal distribution

Consider the normal family N(\mu, \sigma^2):

f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

                     = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{\mu^2}{2\sigma^2}} \exp\Big( -\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} \Big).

Define

h(x) = 1, \qquad c(\theta) = c(\mu, \sigma) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{\mu^2}{2\sigma^2}},

w_1(\mu, \sigma) = \frac{1}{\sigma^2}, \qquad w_2(\mu, \sigma) = \frac{\mu}{\sigma^2},

t_1(x) = -\frac{x^2}{2}, \qquad t_2(x) = x.


Theorem

If X is a r.v. whose pdf or pmf belongs to an exponential family, then

E\Big( \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j} t_i(X) \Big) = -\frac{\partial}{\partial \theta_j} \log c(\theta);

Var\Big( \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j} t_i(X) \Big) = -\frac{\partial^2}{\partial \theta_j^2} \log c(\theta) - E\Big( \sum_{i=1}^{k} \frac{\partial^2 w_i(\theta)}{\partial \theta_j^2} t_i(X) \Big).

The advantage of these identities is that we can replace integration or summation by differentiation, which is often more straightforward.

MING GAO (DaSE@ECNU) Statistical Inference Mar. 24, 2020 26 / 28

Exponential Family

Theorem

If X is a r.v. with pdf or pmf,

E(∑k

i=1∂wi (θ)∂θj

ti (X ))

= − ∂∂θj

log c(θ);

Var(∑k

i=1∂wi (θ)∂θj

ti (X ))

=

− ∂2

∂θ2j

log c(θ)− E(∑k

i=1∂2wi (θ)∂θ2

jti (X )

).

Their advantage is that we can replace integration or summation bydifferentiation, which is often more straightforward.

MING GAO (DaSE@ECNU) Statistical Inference Mar. 24, 2020 26 / 28

Exponential Family

Binomial mean and variance

For the binomial distribution, we have

\frac{d}{dp} w_1(p) = \frac{d}{dp} \log \frac{p}{1-p} = \frac{1}{p(1-p)},

\frac{d}{dp} \log c(p) = \frac{d}{dp} \, n \log(1-p) = \frac{-n}{1-p}.

Thus, by the theorem, we have

E\Big( \frac{X}{p(1-p)} \Big) = \frac{n}{1-p},

which gives E(X) = np.


Take-aways


Conclusions

Discrete distributions

Continuous distributions

Exponential family

Location and scale families

Inequalities
