Harvard Government 2000 Lecture 2

8/14/2019 Harvard Government 2000 Lecture 2


Gov2000: Quantitative Methodology for Political Science I

Lecture 2: Basic Probability, Random Variables, and some


September 24, 2007





Outline

1 Definitions and Notation
    What is Probability?
    Notation and Definitions
    Marginal, Joint and Conditional Probability

2 Random Variables and Distributions
    What is a Random Variable?
    Discrete and Continuous Distributions
    Marginal, Joint, and Conditional Distributions

3 Expectation and Transformations
    Expectation and Variance
    Conditional Expectation and Variance

4 Elementary Asymptotics
    Convergence of a Sequence
    Convergence in Probability
    Convergence in Distribution

5 Some Important Distributions






What is Probability?


Intuitive Definition

While there are several interpretations of what probability is, most modern (post-1935 or so) researchers agree on an axiomatic definition of probability.

3 Axioms (Intuitive Version):

1 The probability of any particular event must be non-negative.

2 The probability of anything occurring among all possible events must be 1.

3 The probability of one of many mutually exclusive events happening is the sum of the individual probabilities.

The rules of probability can be derived from these axioms.









Subjective Interpretation

Probability is a subjective belief about the likelihood of an event.

Example 1: The probability of drawing 5 red cards out of 10 drawn from a deck of cards is whatever you want it to be.

Example 2: The probability of state failure among partial democracies is whatever you want it to be.

But...

1 If you don't follow the three axioms, a smart bookie can set up a Dutch book against you.

2 There is a correct way to update your beliefs once you collect evidence (data).









Frequency Interpretation

Suppose some process can produce different events (e.g. a coin flip).

The probability of an event is the relative frequency with which it would occur if the process were repeated a large number of times under similar conditions.

Example 1: The probability of drawing 5 red cards out of 10 drawn from a deck of cards is the frequency with which this event occurs in repeated samples of 10 cards.

Example 2: The probability of state failure among partial democracies is the ...









If you want to explore this debate further, check out this article

in the Stanford Encyclopedia of Philosophy.

http://plato.stanford.edu/entries/probability-interpret/












Basic Set Theoretic Notation

Let $A$ denote a set. If $a$ is a member of $A$ we write $a \in A$.
If $a_1$, $a_2$, and $a_3$ are the members of $A$, we write

$A = \{a_1, a_2, a_3\}$.

The empty set $\emptyset$ is the set with no members.

If $A$ is a subset of $B$ we write $A \subset B$.
For example, if $A$ = {red, blue} and $B$ = {red, blue, green}, then $A \subset B$.

The intersection of two sets $A$ and $B$ is the set containing all elements that belong to both sets. We write the intersection of $A$ and $B$ as $A \cap B$.

For example, if $A$ = {red, blue} and $B$ = {blue, green}, then $A \cap B$ = {blue}.

The union of two sets $A$ and $B$ is the set that contains the intersection of $A$ and $B$, the elements in $A$ that aren't in $B$, and the elements of $B$ that aren't in $A$.

For example, if $A$ = {red, blue} and $B$ = {blue, green}, then $A \cup B$ = {red, blue, green}.
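The set operations above map directly onto a programming language's built-in set type. The following is a minimal sketch in Python (not part of the original lecture); the variable names are illustrative only.

```python
# Sets taken from the examples in the text.
A = {"red", "blue"}
B = {"blue", "green"}
C = {"red", "blue", "green"}

intersection = A & B      # elements that belong to both A and B
union = A | B             # elements in A, in B, or in both
a_subset_of_c = A <= C    # True when every member of A is also in C
red_in_a = "red" in A     # membership test, "red is a member of A"

print(intersection, union, a_subset_of_c, red_in_a)
```

Python's `<=` on sets tests "subset or equal"; `<` would test for a proper subset.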












Sample Spaces

The sample space is the set of all possible outcomes, and is often written as $\Omega$.
For example, if we flip a coin twice, there are four possible outcomes,

$\Omega = \{\{\text{heads}, \text{heads}\}, \{\text{heads}, \text{tails}\}, \{\text{tails}, \text{heads}\}, \{\text{tails}, \text{tails}\}\}$









Events

Events are subsets of the sample space.

For example, if

$\Omega = \{\{\text{heads}, \text{heads}\}, \{\text{heads}, \text{tails}\}, \{\text{tails}, \text{heads}\}, \{\text{tails}, \text{tails}\}\}$,

then

{heads, heads}, {heads, tails}, {tails, tails}{heads, tails}

are all events.

If $A$ is an event, then "everything else" in the sample space is called the complement of $A$, and is written as $A^c$.









Probability Function

A probability function $P(\cdot)$ is a function defined over all subsets of a sample space $\Omega$ that satisfies the three axioms:

1 $P(A) \geq 0$ for all $A$ in the set of all events.

2 $P(\Omega) = 1$

3 If events $A_1, A_2, \ldots$ are mutually exclusive then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$
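As an illustration of the three axioms, the sketch below builds a probability function on the two-coin-flip sample space and checks each axiom by brute force. It assumes a fair coin, so that all four outcomes are equally likely; that assumption and the helper name `prob` are not from the lecture.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space for two coin flips; each outcome is an ordered pair.
omega = list(product(["heads", "tails"], repeat=2))

def prob(event):
    """P(event) when all outcomes are equally likely (fair-coin assumption)."""
    return Fraction(len(set(event)), len(omega))

# Every subset of the sample space is an event (2^4 = 16 of them).
events = [set(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

axiom1 = all(prob(e) >= 0 for e in events)   # non-negativity
axiom2 = prob(omega) == 1                    # the whole space has probability 1
# Additivity, checked for every pair of mutually exclusive events.
axiom3 = all(prob(e | f) == prob(e) + prob(f)
             for e in events for f in events if not (e & f))

print(axiom1, axiom2, axiom3)
```

On a finite sample space the countable-additivity axiom reduces to finite additivity, which is why checking pairs of disjoint events is enough here.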









Marginal and Joint Probability

So far we have only considered situations where we are interested in the probability of a single event $A$ occurring. We've denoted this $P(A)$. $P(A)$ is sometimes called a marginal probability.

Suppose we are now in a situation where we would like to express the probability that an event $A$ and an event $B$ occur.

This quantity is written as $P(A \cap B)$, $P(B \cap A)$, $P(A, B)$, or $P(B, A)$ and is the joint probability of $A$ and $B$.









Conditional Probability

If $P(B) > 0$ then the probability of $A$ conditional on $B$ can be written as

$$P(A \mid B) = \frac{P(A, B)}{P(B)}$$

This implies that

$$P(A, B) = P(B) \times P(A \mid B)$$









For example, if we randomly draw two cards from a standard 52 card deck and define the events $A$ = {King on Draw 1} and $B$ = {King on Draw 2}, then

$P(A) = 4/52$
$P(B \mid A) = 3/51$
$P(A, B) = P(A) \times P(B \mid A) = 4/52 \times 3/51$

Question: $P(B) = ?$

a) 3/51

b) 4/52

c) 4/51

d) not enough information









Law of Total Probability (LTP)

With 2 Events:

$$P(B) = P(B, A) + P(B, A^c) = P(B \mid A) P(A) + P(B \mid A^c) P(A^c)$$

In general, if $\{C_n : n = 1, 2, 3, \ldots\}$ forms a partition of the sample space, then

$$P(B) = \sum_n P(B \cap C_n) = \sum_n P(B \mid C_n) P(C_n)$$









Confirming Intuition with the LTP

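$$P(B) = P(B \cap A) + P(B \cap A^c) = P(B \mid A) P(A) + P(B \mid A^c) P(A^c)$$

$$P(B) = \frac{3}{51} \times \frac{1}{13} + \frac{4}{51} \times \frac{12}{13} = \frac{3 + 48}{51 \times 13} = \frac{1}{13}$$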
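The law-of-total-probability calculation for the two-card example can also be confirmed by brute-force enumeration. The sketch below lists every ordered two-card draw and counts; the deck encoding (rank 13 standing for King) and the helper `prob` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from itertools import permutations

# A 52-card deck as (rank, suit) pairs; rank 13 stands for King (arbitrary label).
deck = [(rank, suit) for rank in range(1, 14) for suit in "CDHS"]
draws = list(permutations(deck, 2))   # all ordered two-card draws, equally likely

def prob(pred):
    """Share of ordered draws for which the predicate holds."""
    return Fraction(sum(1 for d in draws if pred(d)), len(draws))

p_a = prob(lambda d: d[0][0] == 13)                       # King on draw 1
p_b = prob(lambda d: d[1][0] == 13)                       # King on draw 2
p_ab = prob(lambda d: d[0][0] == 13 and d[1][0] == 13)    # Kings on both draws

p_b_given_a = p_ab / p_a                      # P(B | A)
p_b_given_not_a = (p_b - p_ab) / (1 - p_a)    # P(B | A^c)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
ltp = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
print(p_b, p_b_given_a, p_b_given_not_a, ltp)
```

Using `Fraction` keeps the arithmetic exact, so the enumeration can be compared with 3/51, 4/51, and 1/13 without rounding error.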












Some other useful rules

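$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Also, if $P(A) > 0$ and $P(B) > 0$, then we can write the following.

$$P(A \cap B) = P(A) P(B \mid A) = P(B) P(A \mid B)$$

$$P(A \mid B) = \frac{P(A) P(B \mid A)}{P(B)}$$

$$P(A \mid B) = \frac{P(A) P(B \mid A)}{P(B \mid A) P(A) + P(B \mid A^c) P(A^c)}$$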









False Positive Problem

Suppose we have a test for a rare disease (1/100,000) with the following properties (shown through extensive trials):

P(+ test | disease) = .999 (Sensitivity)
P(- test | no disease) = .999 (Specificity)

Question: Suppose you receive a positive test, what is the probability that you have the disease?

a) < 1/3

b) between 1/3 and 2/3

c) > 2/3

d) not enough information
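One way to work this question is to plug the stated prevalence, sensitivity, and specificity into Bayes' rule with the law of total probability in the denominator. The sketch below does this with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

prior = Fraction(1, 100_000)         # P(disease)
sensitivity = Fraction(999, 1000)    # P(+ test | disease)
specificity = Fraction(999, 1000)    # P(- test | no disease)

false_positive_rate = 1 - specificity   # P(+ test | no disease)

# Law of total probability: P(+ test)
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' rule: P(disease | + test)
posterior = sensitivity * prior / p_positive

print(posterior, float(posterior))
```

Under these inputs the posterior is 111/11222, roughly 1%, because the few true positives are swamped by false positives from the much larger healthy population.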












Coins vs. Cards

A two coin flip thought experiment provides a good example of independence because the outcome from the first flip doesn't affect the outcome from the second flip. If $A$ = {Heads on flip 1} and $B$ = {Heads on flip 2}, then

$$P(A, B) = P(A) P(B)$$

Contrast this with our two card thought experiment. If $A$ = {King on Draw 1} and $B$ = {King on Draw 2}, then

$$P(A, B) = P(A) P(B \mid A) = \frac{1}{13} \times \frac{3}{51} \neq P(A) P(B)$$









Conditional Independence

Intuitive Definition

Events A and B are conditionally independent given C, if

knowing whether C occurred and knowing whether A occurred

provides no information about whether B occurred.

Formal Definition

With $P(C) > 0$, we can write

$$P(A, B \mid C) = \frac{P(A, B, C)}{P(C)}$$

and we say that $A$ is conditionally independent of $B$ given $C$ ($A \perp B \mid C$) if

$$P(A, B \mid C) = P(A \mid C) P(B \mid C)$$









Rain and Sprinklers

Suppose I flip a coin every morning in the Summer. If it comes up heads, I turn on my sprinkler. I never turn on my sprinkler in Fall, Winter, and Spring.

Events:

A = {the sprinkler was on today}

B = {it rained today}

C = {it is Summer}

Question 1: Are A and B independent?

Question 2: Conditional on knowledge of C, are A and B

independent?









Why is the grass wet?

Suppose I flip a coin every morning. If it comes up heads, I turn on my sprinkler. When I get home from work at night, I turn the sprinkler off if it is on.

Events:

A = {the sprinkler was on today}

B = {it rained today}

C = {the grass is wet}

Question 1: Are A and B independent?

Question 2: Conditional on knowledge of C, are A and B

independent?









What is a Random Variable?


A random variable $X$ is a function that maps the sample space to the real numbers.

Returning to our previous example with

$\Omega = \{\{\text{heads}, \text{heads}\}, \{\text{heads}, \text{tails}\}, \{\text{tails}, \text{heads}\}, \{\text{tails}, \text{tails}\}\}$

we could define a random variable $X(\cdot)$ to be the function that returns the number of heads for each element of $\Omega$.

X({heads, heads}) = 2
X({heads, tails}) = 1
X({tails, heads}) = 1
X({tails, tails}) = 0









Discrete Distributions

For discrete distributions, the random variable X takes on a

finite, or a countably infinite number of values.

Example 1: The number of Clinton supporters in a poll of

1,000 likely voters.

Example 2: The number of calls to the Clinton campaign

headquarters on a given day.

A common shorthand is to think of discrete RVs taking on

distinct values.

A probability mass function (pmf) and a cumulative distribution function (cdf) are two common ways to define the distribution for a discrete RV.









Discrete Probability Mass Functions

A probability mass function $f(x)$ of a random variable $X$ is a non-negative function that gives the probability that $X = x$ and satisfies $\sum_x f(x) = 1$.

For example, when $X$ is the number of heads in two coin flips,

$$f(x) = \begin{cases} 1/4 & x = 0 \\ 1/2 & x = 1 \\ 1/4 & x = 2 \end{cases}$$
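The pmf for the number of heads can be derived mechanically from the sample space by treating the random variable as an ordinary function on outcomes. This is a minimal sketch assuming a fair coin (equally likely outcomes); the names `omega` and `X` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two coin flips; each outcome is an ordered pair.
omega = list(product(["heads", "tails"], repeat=2))

def X(outcome):
    """Random variable: maps an outcome to its number of heads."""
    return outcome.count("heads")

# Count how many outcomes map to each value, then normalize.
counts = Counter(X(w) for w in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}

print(pmf)
```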









PMF Plot

[Figure: plot of the pmf f(x) against x]









Discrete Cumulative Distribution Function

A cumulative distribution function $F(x)$ of a random variable $X$ is a non-decreasing function that gives the probability that $X \leq x$.

For example, when $X$ is the number of heads in two coin flips,

$$F(x) = \begin{cases} 0 & x < 0 \\ 1/4 & 0 \leq x < 1 \\ 3/4 & 1 \leq x < 2 \\ 1 & 2 \leq x \end{cases}$$
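A discrete cdf is just a running sum of the pmf, which makes it a step function. The sketch below evaluates it at a few points for the two-coin-flip example; the function name `cdf` and the evaluation grid are illustrative.

```python
from fractions import Fraction

# pmf of the number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    """F(x) = P(X <= x): sum the pmf over support points not exceeding x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

grid = (-0.5, 0, 0.5, 1, 1.5, 2, 2.5)
values = [cdf(x) for x in grid]
print(values)
```

The function is flat between support points and jumps at 0, 1, and 2, matching the piecewise definition.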









Discrete CDF Plot

[Figure: step plot of the cdf F(x) against x]









Discrete CDF Question

Question: If $X$ = the number of heads in two coin flips, how can you calculate the probability of $X = 1$ with the CDF?

a) $F(1)$

b) $F(2)$

c) $F(1) - F(0)$

d) $F(2) - F(1)$









Continuous Distributions

Continuous random variables take on an uncountably infinite number of values.

Example: Segal-Cover scores for US Supreme Court

justices

A probability density function (pdf) and a cumulative

distribution function (cdf) are two common ways to define

the distribution for a continuous RV.












Continuous Probability Density Function

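The probability density function $f(x)$ of a continuous random variable $X$ is the non-negative function that satisfies

1 $f(x) \geq 0$ for all $x \in \mathbb{R}$

2 $\int_{-\infty}^{\infty} f(x)\,dx = 1$

For example

$$f(x) = \begin{cases} 1/4 & 0 < x < 4 \\ 0 & \text{otherwise} \end{cases}$$

$$f(x) = \begin{cases} 1/4 & 0 \leq x \leq 4 \\ 0 & \text{otherwise} \end{cases}$$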

Think of densities as infinite data histograms.









[Figure: plot of the density f(x) against x]









Continuous Cumulative Distribution Functions

A cumulative distribution function $F(x)$ of a random variable $X$ is a non-decreasing function that gives the probability that $X \leq x$. However, for a continuous RV, the cdf is continuous.

$$F(x) = \int_{-\infty}^{x} f(z)\,dz$$

For example,

$$F(x) = \begin{cases} 0 & x < 0 \\ x/4 & 0 \leq x < 4 \\ 1 & 4 \leq x \end{cases}$$
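The relationship between a density and its cdf can be checked numerically for the uniform example on (0, 4). The sketch below uses a simple midpoint rule; the helper `integrate`, the step count, and the interval (0.5, 2) are illustrative choices rather than anything from the lecture.

```python
def pdf(x):
    """Density of the example: 1/4 on (0, 4), zero elsewhere."""
    return 0.25 if 0 < x < 4 else 0.0

def cdf(x):
    """Closed-form cdf of the same distribution."""
    if x < 0:
        return 0.0
    if x < 4:
        return x / 4
    return 1.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = integrate(pdf, -1, 5)        # should be close to 1
p_interval = cdf(2) - cdf(0.5)            # P(0.5 < X < 2) from the cdf
p_interval_numeric = integrate(pdf, 0.5, 2)

print(total_area, p_interval, p_interval_numeric)
```

For a continuous random variable, probabilities of intervals come from differences of the cdf, or equivalently from the area under the density over the interval.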









Continuous CDF Plot

[Figure: plot of the cdf F(x) against x]









Continuous Probability Questions

For the continuous distribution, described by the following pdf

$$f(x) = \begin{cases} 1/4 & 0 < x < 4 \\ 0 & \text{otherwise} \end{cases}$$

Question 1: What is the probability that X = 3?

a) 0

b) 1/4

c) 3/4

Question 2: What is the probability that 1 < X < 3?

a) 1/4

b) 2/4

c) 3/4










Just as marginal, joint, and conditional probabilities can be

defined for two arbitrary events A and B; marginal, joint, and

conditional probability distributions can be defined for two

random variables X and Y.












Discrete Joint Distributions

The joint mass function $f_{X,Y}(x, y)$ of two discrete random variables $X$ and $Y$ is the function that gives the probability that $X = x$ and $Y = y$ for all $x$ and $y$.

Example:

                   Y
             1      2      3    Total
       1   0.22   0.04   0.09    0.35
  X    2   0.15   0.10   0.20    0.45
       3   0.01   0.07   0.12    0.20
   Total   0.38   0.21   0.41    1.00









Continuous Joint Distributions

The joint density function $f_{X,Y}(x, y)$ of two continuous random variables $X$ and $Y$ is the function that gives the density height where $X = x$ and $Y = y$ for all $x$ and $y$.

[Figure: plot of the joint density over x and y]









Continuous Joint Distributions

The joint density function $f_{X,Y}(x, y)$ of two continuous random variables $X$ and $Y$ is the function that gives the density height where $X = x$ and $Y = y$ for all $x$ and $y$.

[Figure: surface plot of f(x, y) over x and y]









Discrete Marginal Distributions

The marginal mass function $f_X(x)$ of a discrete random variable $X$ gives the probability that $X = x$ for all $x$, and can be calculated from the joint probability function $f_{X,Y}(x, y)$ of $X$ and $Y$ according to

$$f_X(x) = \sum_y f_{X,Y}(x, y).$$

                   Y
             1      2      3   f_X(x)
       1   0.22   0.04   0.09    0.35
  X    2   0.15   0.10   0.20    0.45
       3   0.01   0.07   0.12    0.20
  f_Y(y)   0.38   0.21   0.41    1.00









Continuous Marginal Distributions

The marginal density function $f_X(x)$ of a continuous random variable $X$ gives the density height at $X = x$ for all $x$, and can be calculated from the joint density function $f_{X,Y}(x, y)$ of $X$ and $Y$ according to

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy.$$

[Figure: surface plot of the joint density f(x, y), and plot of the marginal density f(x) against x]









Conditional Discrete Distributions

The conditional mass function $f_{X|Y}(x \mid y)$ of two discrete random variables gives the probability that $X = x$ given the fact that $Y = y$ for all values of $x$ and $y$ and is given by:

$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$

where it is assumed that $f_Y(y) > 0$. It follows that

$$f_{X,Y}(x, y) = f_{X|Y}(x \mid y) f_Y(y),$$

$$f_Y(y) = \frac{f_{X,Y}(x, y)}{f_{X|Y}(x \mid y)}.$$









Table: Joint and Marginal Probabilities

                   Y
             1      2      3   f_X(x)
       1   0.22   0.04   0.09    0.35
  X    2   0.15   0.10   0.20    0.45
       3   0.01   0.07   0.12    0.20
  f_Y(y)   0.38   0.21   0.41    1.00

Table: Conditional f(x|y) Probabilities

                   Y
             1      2      3
       1   0.58   0.19   0.22
  X    2   0.39   0.48   0.49
       3   0.03   0.33   0.29
   Total   1.00   1.00   1.00
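The marginal and conditional tables can be reproduced from the joint table alone: sum across a row or column for the marginals, then divide each joint cell by the marginal of the conditioning variable. The sketch below does this with exact fractions; the dictionary layout and names are illustrative.

```python
from fractions import Fraction

# Joint pmf f_{X,Y}(x, y) from the table, entered in hundredths to stay exact.
hundredths = {
    (1, 1): 22, (1, 2): 4,  (1, 3): 9,
    (2, 1): 15, (2, 2): 10, (2, 3): 20,
    (3, 1): 1,  (3, 2): 7,  (3, 3): 12,
}
joint = {xy: Fraction(p, 100) for xy, p in hundredths.items()}

# Marginals: sum the joint pmf over the other variable.
f_x = {x: sum(p for (i, _), p in joint.items() if i == x) for x in (1, 2, 3)}
f_y = {y: sum(p for (_, j), p in joint.items() if j == y) for y in (1, 2, 3)}

# Conditional pmf: f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)
cond_x_given_y = {(x, y): p / f_y[y] for (x, y), p in joint.items()}

# Rows of the conditional table, rounded to two decimals for display.
rounded_rows = [[round(float(cond_x_given_y[(x, y)]), 2) for y in (1, 2, 3)]
                for x in (1, 2, 3)]
for row in rounded_rows:
    print(row)
```

Each column of the conditional table sums to exactly 1 before rounding, since it is a full probability distribution over x for a fixed y.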









Conditional Continuous Distributions

The conditional density function $f_{Y|X}(y \mid x)$ when $Y$ is a continuous random variable gives the density height for $Y = y$ given the fact that $X = x$ for all values of $x$ and $y$ and is given by:

$$f_{Y|X}(y \mid x) = \frac{f_{Y,X}(y, x)}{f_X(x)}$$

where it is assumed that $f_X(x) > 0$.










[Figure: two panels over x and y, titled "Joint Density" and "Conditional Density"]

[Figure: two surface plots, "Joint Density" f(x, y) and "Conditional Density" f(y|x)]









Conditional Densities- Discrete X

[Figure: three panels plotted against y: "Marginal Density" f(y), "Conditional Density X=1" f(y|x), and a further conditional density f(y|x)]






Expectation and Variance


Expectation

The expected value of a random variable $X$ is denoted by $E[X]$ and is a measure of central tendency of $X$. Roughly speaking, an expected value is like a weighted average.

The expected value of a discrete random variable $X$ is defined as

$$E[X] = \sum_{\text{all } x} x f_X(x).$$

The expected value of a continuous random variable $X$ is defined as

$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx.$$








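An example will make this clearer. Suppose $X$ is a discrete random variable that can take values of 0, 1, and 2. The probability function of $X$ is given by:

$$f_X(x) = \begin{cases} 0.20 & \text{if } x = 0 \\ 0.45 & \text{if } x = 1 \\ 0.35 & \text{if } x = 2 \end{cases}$$

The expected value of $X$ is:

$$\begin{aligned} E[X] &= 0 \times f_X(0) + 1 \times f_X(1) + 2 \times f_X(2) \\ &= 0 \times 0.20 + 1 \times 0.45 + 2 \times 0.35 \\ &= 1.15 \end{aligned}$$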
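The weighted-average reading of the discrete expected value translates into a one-line sum. This minimal sketch recomputes the example with exact fractions; the dictionary name `f_X` is illustrative.

```python
from fractions import Fraction

# pmf from the example: P(X=0)=0.20, P(X=1)=0.45, P(X=2)=0.35.
f_X = {0: Fraction(20, 100), 1: Fraction(45, 100), 2: Fraction(35, 100)}

# E[X] = sum over all x of x * f_X(x)
expected_value = sum(x * p for x, p in f_X.items())

print(expected_value, float(expected_value))
```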








Interpreting Discrete Expected Value

The expected value for a discrete random variable is the

balance point of the mass function.

[Figure: plot of the mass function f(x) against x]








Interpreting Continuous Expected Value

The expected value for a continuous random variable is the

balance point of the density function.

[Figure: plot of the density function f(x) against x]








Sample Mean as an Expected Value

Let $x_1, \ldots, x_n$ be our sample. Then the sample mean is defined as the following

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

This can be re-written in the following form:

$$\bar{x} = \sum_{i=1}^{n} x_i \times \frac{1}{n}$$

Note how this resembles the definition of discrete expected value.
















Useful Properties of Expected Values

Suppose we have $k$ random variables $X_1, \ldots, X_k$. If $E[X_i]$ exists for all $i = 1, \ldots, k$, then

$$E\left[\sum_{i=1}^{k} X_i\right] = E[X_1] + \cdots + E[X_k]$$

If two random variables $X$ and $Y$ are independent and have finite expectations then

$$E[XY] = E[X]E[Y]$$








Suppose $a$ and $b$ are constants and $X$ is a random variable. Then

$$E[aX] = aE[X]$$

$$E[b] = b$$

$$E[aX + b] = aE[X] + b$$








Expectation Question

Question: If $X_1, \ldots, X_n$ are random variables with $E[X_1] = \mu, \ldots, E[X_n] = \mu$, what is the expected value of $\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$?

a) nb) n

c)








Variance

The expected value of a function of the random variable $X$, $g(X)$, is denoted by $E[g(X)]$ and is a measure of central tendency of $g(X)$.

The variance is a special case of this, and the variance of a random variable $X$ (a measure of its dispersion) is given by

$$\begin{aligned} V[X] &= E[(X - E[X])^2] \\ &= E[X^2 - 2E[X]X + E[X]^2] \\ &= E[X^2] - 2E[X]^2 + E[X]^2 \\ &= E[X^2] - E[X]^2 \end{aligned}$$








For a discrete random variable $X$

$$V[X] = \sum_{\text{all } x} (x - E[X])^2 f_X(x)$$

For a continuous random variable $X$

$$V[X] = \int_{-\infty}^{\infty} (x - E[X])^2 f_X(x)\,dx$$
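The two expressions for the variance, the definition $E[(X - E[X])^2]$ and the shortcut $E[X^2] - E[X]^2$, can be checked against each other on a small pmf. The sketch below reuses the three-point pmf from the expected-value example (0.20, 0.45, 0.35 on 0, 1, 2); the helper `expect` is an illustrative name.

```python
from fractions import Fraction

f_X = {0: Fraction(20, 100), 1: Fraction(45, 100), 2: Fraction(35, 100)}

def expect(g):
    """E[g(X)] for the discrete pmf f_X."""
    return sum(g(x) * p for x, p in f_X.items())

mean = expect(lambda x: x)

var_definition = expect(lambda x: (x - mean) ** 2)   # E[(X - E[X])^2]
var_shortcut = expect(lambda x: x ** 2) - mean ** 2  # E[X^2] - E[X]^2

print(mean, var_definition, var_shortcut)
```

Both routes give 211/400 = 0.5275 for this pmf.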








Physical Interpretation of Variance

[Figure: two density plots of f(x) against x on the same x-axis range, with different vertical scales]








Sample Variance

The sample variance is usually written in one of two ways:

1 $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2 $\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

The first option can be re-written in the following form.

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \left(\frac{1}{n}\right)$$

Notice how this relates to the discrete definition of variance.
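Both versions of the sample variance are available in Python's standard `statistics` module: `pvariance` divides by n and `variance` divides by n - 1. The sketch below compares them with a hand-rolled sum on a made-up five-point sample (the data are hypothetical, chosen only so the arithmetic is easy to follow).

```python
import statistics

sample = [2, 3, 4, 5, 6]          # hypothetical data for illustration
n = len(sample)
xbar = sum(sample) / n            # sample mean

sum_sq_dev = sum((x - xbar) ** 2 for x in sample)

var_divide_n = sum_sq_dev / n             # option 1: divide by n
var_divide_n_minus_1 = sum_sq_dev / (n - 1)   # option 2: divide by n - 1

# Library equivalents of the two options.
lib_divide_n = statistics.pvariance(sample)
lib_divide_n_minus_1 = statistics.variance(sample)

print(var_divide_n, var_divide_n_minus_1)
```

The two versions differ by the factor n / (n - 1), which matters for small samples and becomes negligible as n grows.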








Physical Interpretation of Sample Variance

[Figure: three plots on a horizontal axis running from 2 to 6]








Useful Properties of Variances

If $X_1, \ldots, X_n$ are independent random variables and $c_1, \ldots, c_{n+1}$ are arbitrary constants then

$$V[c_1 X_1 + \cdots + c_n X_n + c_{n+1}] = c_1^2 V[X_1] + \cdots + c_n^2 V[X_n]$$








Variance Question

Question: If $X_1, \ldots, X_n$ are i.i.d. random variables with $V[X_1] = \sigma^2, \ldots, V[X_n] = \sigma^2$, what is the variance of $\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$?

a) $\sigma^2 / n$

b) $n\sigma^2$

c) $\sigma^2$








Conditional Expectation

The concept of conditional expectation is fundamental to

regression analysis.

Suppose we have two RVs X and Y that have some bivariate

distribution.

The conditional expectation of $Y$ given $X = x$ (denoted $E[Y \mid x]$) is the expected value of $Y$ under the conditional distribution of $Y$ given $X = x$.








In the discrete case:

$$E[Y \mid x] = \sum_y y f_{Y|X}(y \mid x)$$

In the continuous case:

$$E[Y \mid x] = \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x)\,dy$$

Similar definitions apply to the case of multiple conditioning variables.

$E[Y \mid x]$ is a function of $x$ (realized values of $X$) and can be interpreted as the balance point for the conditional distribution.








Conditional Expectation - X discrete

[Figure: three panels plotted against y: "Marginal Density" f(y) and two conditional densities f(y|x)]








Conditional Expectation - X continuous

[Figure: two panels over x and y, titled "E[X],E[Y]" and "E[Y|X]"]








Conditional Variance

Likewise, we can define the conditional variance of $Y$ given $X = x$ (denoted $V[Y \mid x]$) to be the variance of $Y$ under the conditional distribution of $Y$ given $X = x$.

In the discrete case:

$$V[Y \mid x] = \sum_y (y - E[Y \mid x])^2 f_{Y|X}(y \mid x)$$

In the continuous case:

$$V[Y \mid x] = \int_{-\infty}^{\infty} (y - E[Y \mid x])^2 f_{Y|X}(y \mid x)\,dy$$








Conditional Variance - X discrete

[Figure: three panels plotted against y: "Marginal Density" f(y) and two conditional densities f(y|x)]






Convergence of a Sequence


Definition: Convergent Sequences of Real Numbers

A sequence of real numbers $c_n$ is said to converge to $c$ if for every $\epsilon > 0$ there exists an integer $N$ such that for $n \geq N$, $|c_n - c| < \epsilon$.

We will write this as

$$c_n \to c$$









Example

If $c_n$ is $1 + 1/n$, then $c_n \to 1$.

[Figure: plot of c_n against n for n from 0 to 100]
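For the sequence $c_n = 1 + 1/n$, the definition of convergence can be made concrete by finding, for a chosen tolerance, an index beyond which every term is within that tolerance of 1. The sketch below uses exact fractions; the function names and the tolerance of 1/10 are illustrative.

```python
import math
from fractions import Fraction

def c(n):
    """The n-th term of the sequence c_n = 1 + 1/n."""
    return 1 + Fraction(1, n)

def first_index_within(eps):
    """Smallest N such that |c_n - 1| = 1/n < eps for every n >= N."""
    return math.floor(1 / eps) + 1

eps = Fraction(1, 10)
N = first_index_within(eps)

# Spot-check the definition on a long stretch of the sequence past N.
all_within = all(abs(c(n) - 1) < eps for n in range(N, N + 1000))
print(N, all_within)
```

Because $|c_n - 1| = 1/n$ is decreasing in n, once one term is within the tolerance all later terms are too, so the spot check is only a sanity test.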









Definition: Convergence in Probability

We say that a sequence of random variables $X_n$ converges in probability to a real number $\mu$ if for every $\epsilon > 0$

$$P(|X_n - \mu| > \epsilon) \to 0 \text{ as } n \to \infty$$

$$X_n \xrightarrow{p} \mu$$









Example: The Weak Law of Large Numbers

If $X_1, X_2, \ldots, X_n, \ldots$ are i.i.d. with $-\infty < E[X_1] = \mu < \infty$, then $\bar{X}_n \xrightarrow{p} \mu$.

[Figure: three panels showing the distribution of the sample mean for n = 1, n = 10, and n = 100]
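The weak law of large numbers can be illustrated by simulation: estimate $P(|\bar{X}_n - \mu| > \epsilon)$ for increasing n and watch it shrink. The sketch below assumes Exponential(1) draws (mean 1), a tolerance of 0.25, and a fixed seed; none of these choices come from the lecture, whose plotted distribution is not recoverable.

```python
import random

def sample_means(n, reps, rng):
    """Return `reps` sample means, each from n i.i.d. Exponential(1) draws."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(2007)   # fixed seed so the run is repeatable
mu = 1.0                    # mean of the Exponential(1) distribution
eps = 0.25                  # tolerance in the definition of convergence in probability

share_far = {}
for n in (1, 10, 100):
    means = sample_means(n, 2000, rng)
    # Monte Carlo estimate of P(|sample mean - mu| > eps)
    share_far[n] = sum(abs(m - mu) > eps for m in means) / len(means)

print(share_far)
```

The estimated probability of being more than 0.25 away from the mean falls sharply as n goes from 1 to 10 to 100, which is the pattern convergence in probability describes.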









Convergence Question

Question: Does $X_n$ appear to be converging in probability to 2?

[Figure: three panels showing the distribution of X_n for n = 1, n = 10, and n = 100]







Definition: Convergence in Distribution

We say that a sequence of random variables $X_n$ converges in distribution to a random variable $X$ if the cumulative distribution functions $F_n$ and $F$ of $X_n$ and $X$ satisfy the following

$$F_n(x) \to F(x) \text{ as } n \to \infty \text{ for each continuity point } x \text{ of } F$$

$$X_n \xrightarrow{d} X$$









The Classical Central Limit Theorem

If $X_1, X_2, \ldots, X_n, \ldots$ are i.i.d. with $E[X_1] = \mu$ and $V[X_1] = \sigma^2$ and $E|X|^2 < \infty$, then $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$.

[Figure: three panels showing the distribution of the sample mean for n = 1, n = 10, and n = 100]
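A simulation gives a rough feel for the central limit theorem: standardize the sample mean as $\sqrt{n}(\bar{X}_n - \mu)$ and compare its simulated mean, variance, and one-sigma coverage with those of $N(0, \sigma^2)$. The sketch below assumes Exponential(1) data (so $\mu = 1$ and $\sigma^2 = 1$), n = 100, and a fixed seed; these are illustrative choices only.

```python
import math
import random
import statistics

rng = random.Random(2007)       # fixed seed so the run is repeatable
mu, sigma2 = 1.0, 1.0           # mean and variance of Exponential(1)
n, reps = 100, 2000

scaled = []
for _ in range(reps):
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    scaled.append(math.sqrt(n) * (xbar - mu))   # sqrt(n) * (sample mean - mu)

sim_mean = statistics.fmean(scaled)
sim_var = statistics.pvariance(scaled)
# Share of draws within one standard deviation; about 0.683 under N(0, sigma2).
within_one_sd = sum(abs(z) <= math.sqrt(sigma2) for z in scaled) / reps

print(sim_mean, sim_var, within_one_sd)
```

Even though the underlying exponential distribution is strongly skewed, the scaled sample mean already behaves much like a normal variable at n = 100.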






The Univariate Normal Distribution

The univariate normal (Gaussian) probability density function is

given by

$$f_N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)$$

[Figure: densities of N(0, 1), N(2, 1), and N(0, .25) plotted against x]






Some facts about the univariate normal distribution:

The normal distribution with mean 0 and variance 1 is called the standard normal distribution.

If a large random sample is taken from any distribution with finite variance, the sampling distribution of the sample mean will be approximately normal.

If a sample $(X_1, \ldots, X_n)$ of any size $n$ is taken from a normal distribution with known variance, then the sampling distribution of the sample mean will be normal with mean $E[X]$ and variance $V[X]/n$.

A linear function of a normal RV is itself a normal RV.

The R functions rnorm(), dnorm(), and pnorm() calculate pseudo-random normal deviates, the normal density function, and the normal distribution function respectively.
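For readers working outside R, Python's standard library offers rough counterparts through `statistics.NormalDist` (available from Python 3.8). The sketch below codes the normal density formula directly and compares it with the library; note that `NormalDist` is parameterized by the standard deviation, not the variance. The function name `normal_pdf` is illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Normal density with mean mu and variance sigma2, coded from the formula."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

std_normal = NormalDist(mu=0.0, sigma=1.0)   # sigma is the standard deviation

density_at_0 = normal_pdf(0.0, 0.0, 1.0)     # compare with std_normal.pdf(0)
cdf_at_196 = std_normal.cdf(1.96)            # distribution function, about 0.975
draws = std_normal.samples(5, seed=1)        # pseudo-random normal deviates

# A linear function of a normal RV is normal: 2X + 3 ~ N(3, 2^2) when X ~ N(0, 1).
shifted = std_normal * 2 + 3

print(density_at_0, cdf_at_196, shifted)
```

Here `pdf`, `cdf`, and `samples` play roles similar to dnorm(), pnorm(), and rnorm(), although the argument conventions differ.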






The Multivariate Normal Distribution

The d-variate normal density function is given by

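$$f_N(x \mid \mu, \Sigma) = (2\pi)^{-d/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right)$$

Here $x$ and $\mu$ are vectors of length $d$ and $\Sigma$ is a $d \times d$ positive-definite matrix. The mean of $x$ is $\mu$ and the variance-covariance matrix of $x$ is $\Sigma$.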






The Chi-Square Distribution

The chi-square probability density function is given by

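$$f_{\chi^2}(x \mid \nu) = \frac{2^{-\nu/2}}{\Gamma(\nu/2)} x^{(\nu/2 - 1)} \exp(-x/2) \quad \text{for } x > 0,$$

where $\Gamma(z) = \int_0^{\infty} t^{z-1} \exp[-t]\,dt$ (if $z$ is an integer then $\Gamma(z) = (z - 1)!$).

The mean of a chi-square random variable is $\nu$, its variance is $2\nu$, and (when $\nu \geq 2$) its modal value is $\nu - 2$.

The parameter $\nu$ is referred to as the degrees of freedom.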






[Figure: chi-square densities with 1, 4, and 15 degrees of freedom plotted against x]






Some facts about the chi-square distribution:

The chi-square distribution is important because the asymptotic sampling distribution of many test statistics will be chi-square.

If the random variables $X_1, \ldots, X_k$ are i.i.d. and if each of these variables has a standard normal distribution, then the sum of squares $X_1^2 + \cdots + X_k^2$ has a chi-square distribution with $k$ degrees of freedom.

If the random variables $X_1, \ldots, X_k$ are independent and if $X_i$ follows a chi-square distribution with $\nu_i$ degrees of freedom for $i = 1, \ldots, k$, then the sum $X_1 + \cdots + X_k$ has a chi-square distribution with $\nu_1 + \cdots + \nu_k$ degrees of freedom.






If a sample $(X_1, \ldots, X_n)$ of any size $n$ is taken from a normal distribution then the random variable

$$\frac{1}{V[X]} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

follows a chi-square distribution with $n - 1$ degrees of freedom.

The R functions rchisq(), dchisq(), and pchisq() calculate pseudo-random chi-square deviates, the chi-square density function, and the chi-square distribution function respectively.
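The sum-of-squares characterization of the chi-square distribution can be checked by simulation: sum k squared standard normal draws many times and compare the simulated mean and variance with k and 2k. The sketch below uses k = 4 and a fixed seed, both illustrative choices.

```python
import random
import statistics

rng = random.Random(2007)   # fixed seed so the run is repeatable
k, reps = 4, 20_000         # degrees of freedom and number of simulated sums

# Each entry is X_1^2 + ... + X_k^2 with the X_i i.i.d. standard normal.
sums = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(reps)]

sim_mean = statistics.fmean(sums)      # theory: k
sim_var = statistics.pvariance(sums)   # theory: 2k

print(sim_mean, sim_var)
```

The simulated moments land close to 4 and 8, in line with the stated mean and variance of a chi-square variable with 4 degrees of freedom.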









The t Distribution

The t probability density function is given by

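$$f_t(x \mid \nu) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \cdot \frac{1}{\left(1 + \frac{x^2}{\nu}\right)^{(\nu + 1)/2}}$$

The mean of a $t$ random variable is 0 and its variance is $\nu/(\nu - 2)$ as long as $\nu > 2$.

The mean of a $t_1$ RV does not exist.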






[Figure: t densities with 1, 4, and 15 degrees of freedom plotted against x]






Some facts about the t distribution:

The t distribution can be motivated as follows. If $Z \sim N(0, 1)$, $Y \sim \chi^2_{\nu}$, and $Z$ and $Y$ are independent, then

$$X \equiv \frac{Z}{\sqrt{Y/\nu}}$$

follows a t distribution.

If a sample $(X_1, \ldots, X_n)$ of any size $n$ is taken from a normal distribution with zero mean and unknown variance, then the sampling distribution of the sample mean divided by the sample standard error will have the t distribution with $\nu = n - 1$.






The sampling distribution of regression coefficients (after some standardization) can be shown to follow a t-distribution.

As $\nu \to \infty$ the t distribution approaches the $N(0, 1)$ distribution.

The R functions rt(), dt(), and pt() calculate pseudo-random t deviates, the t density function, and the t distribution function respectively.
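The claim that the t distribution approaches the standard normal as the degrees of freedom grow can be checked numerically by coding the standard t density and measuring its largest gap from the N(0, 1) density on a grid. The sketch below works on the log scale with `math.lgamma` to avoid overflow for large degrees of freedom; the grid and the chosen degrees of freedom are illustrative.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Student t density with nu degrees of freedom, computed on the log scale."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(x * x / nu))

std_normal = NormalDist()                 # N(0, 1)
grid = [i / 10 for i in range(-40, 41)]   # points from -4 to 4

# Largest absolute difference between the t and N(0, 1) densities on the grid.
gaps = {nu: max(abs(t_pdf(x, nu) - std_normal.pdf(x)) for x in grid)
        for nu in (1, 4, 15, 1000)}

print(gaps)
```

The gap shrinks steadily as the degrees of freedom rise from 1 to 1000, which is the numerical face of the limiting statement.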









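The F Distribution

The F density is given by:

$$f_F(x \mid \nu_1, \nu_2) = \frac{\Gamma((\nu_1 + \nu_2)/2)}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{(\nu_1 - 2)/2} \left(1 + \frac{\nu_1}{\nu_2} x\right)^{-(\nu_1 + \nu_2)/2}$$

$\nu_1$ is sometimes called the numerator degrees of freedom and $\nu_2$ is sometimes called the denominator degrees of freedom.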






[Figure: F densities for (1, 2), (5, 5), (30, 20), and (500, 200) degrees of freedom plotted against x]






Some facts about the F distribution:

If $X_1$ and $X_2$ are independent chi-square RVs with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then $(X_1/\nu_1)/(X_2/\nu_2)$ follows an F distribution with $\nu_1$ numerator df and $\nu_2$ denominator df.

If $X$ follows a t distribution with $\nu$ df, then $X^2$ follows an F distribution with 1 numerator df and $\nu$ denominator df.

The F distribution will be useful for testing hypotheses about multiple regression coefficients.

The R functions rf(), df(), and pf() calculate pseudo-random F deviates, the F density function, and the F distribution function respectively.

