
BS2247 Introduction to Econometrics

Lecture 2: Fundamentals of Probability

Dr. Kai Sun

Aston Business School


Why do we care about this topic?

◮ Economic variables (e.g., education, wage) are random variables, in the sense that each observation (i.e., realization) is a random draw from the entire population.

◮ Each random variable has a probability measure. Roughly speaking, a probability measure is a function that maps an event (e.g., a realization of a random variable) to the probability that the event occurs.


For example, the probability measure can tell us that the probability of, say, educ = 12 years (i.e., a realization of the random variable education) is, say, 0.2.

This is the same as saying that 20% of the population has educ = 12 years; in a large random sample, roughly 20% of the observations would have educ = 12 years.


Discrete random variables

◮ Random variables that take on only a finite (or countably infinite) number of values.

◮ For example, consider tossing a single coin. The two outcomes/events are heads and tails.

◮ A discrete random variable can then be defined as:

x = 1 if the coin turns up heads,
x = 0 if the coin turns up tails.

By tossing the coin many times, we can estimate the probabilities of x = 1 and x = 0 from the observed frequencies.
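
The coin-tossing idea can be sketched as a small simulation. This is only an illustration: the function name, the seed, and the number of tosses are assumptions, and the coin is assumed to be fair.

```python
import random

def estimate_heads_probability(n_tosses, seed=0):
    """Toss a (simulated) fair coin n_tosses times; return the share of heads."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    # Each toss is a realization of x: 1 for heads, 0 for tails.
    tosses = [rng.randint(0, 1) for _ in range(n_tosses)]
    return sum(tosses) / n_tosses

p_heads = estimate_heads_probability(10_000)
p_tails = 1 - p_heads
print(f"estimated P(x = 1) = {p_heads:.3f}, P(x = 0) = {p_tails:.3f}")
```

With more tosses the observed frequencies settle closer to the underlying probabilities.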


Probability Density Function (pdf)

◮ Generally, if a discrete random variable, X, takes on the n possible values {x_1, ..., x_n}, then the probability measure is

$p_i = P(X = x_i), \quad i = 1, 2, \dots, n,$

where $0 \le p_i \le 1$ and $\sum_i p_i = 1$.

◮ P(·) is also called the probability density function (pdf) of X.

“The probability of X = x_i is equal to p_i”.


The pdf of heads and tails from the coin-tossing example

This is essentially a histogram!


Continuous random variables

◮ Random variables that can take on any value in a continuum, so that the probability of any single particular value is zero.

◮ For example, wage is naturally treated as a continuous random variable.

◮ In practice, education is often treated as continuous as well, although strictly speaking years of education is discrete.


Continuous random variables

◮ The pdf of a continuous random variable gives the probability of events (i.e., realizations) involving a range of values (not a particular value!).

◮ P(a ≤ X ≤ b) measures the probability that X lies between a and b.

◮ For example, P(16 ≤ wage ≤ 18) = 0.1 says that the probability that wage lies between 16 and 18 is 0.1.

This is the same as saying that 10% of the population has 16 pounds/hour ≤ wage ≤ 18 pounds/hour.
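
To make "probability of a range" concrete, the sketch below approximates P(16 ≤ X ≤ 18) as the area under a density. The exponential density and its rate are assumptions chosen only because the area has a simple closed form to compare against; they are not a model of wages from the lecture.

```python
import math

RATE = 0.1  # hypothetical rate parameter, for illustration only

def pdf(x):
    """Density of an exponential random variable with the rate above."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def prob_between(a, b, steps=10_000):
    """Approximate P(a <= X <= b) as the area under the pdf (trapezoid rule)."""
    width = (b - a) / steps
    area = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, steps):
        area += pdf(a + k * width)
    return area * width

approx = prob_between(16, 18)
exact = math.exp(-RATE * 16) - math.exp(-RATE * 18)  # closed form for this density
print(f"P(16 <= X <= 18): numerical {approx:.4f}, exact {exact:.4f}")
print(f"P(X = 17) as a zero-width range: {prob_between(17, 17)}")
```

The last line shows why a continuous pdf is about ranges: a range of zero width has zero area, so a single value has probability zero.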


Use Histogram to illustrate the pdf of wage

[Figure: histogram of wage. Horizontal axis: wage (0 to 30); vertical axis: pdf of wage (0.00 to 0.10).]


Use Density plot to illustrate the pdf of wage

[Figure: density plot of wage. Horizontal axis: wage (0 to 30); vertical axis: pdf of wage (0.00 to 0.10).]


Normal Distribution

[Figure: density of wage. Horizontal axis: wage (0 to 30); vertical axis: pdf of wage (0.00 to 0.10).]


Other Distributions


Cumulative Distribution Function (cdf)

◮ Sometimes it’s easier to work with the cumulative distribution function (cdf), defined as

F(x) = P(X ≤ x)

where X is a random variable (either discrete or continuous), and x is any real number.

◮ For a continuous random variable, F(x) is the area under the pdf to the left of the point x.


Cumulative Distribution Function (cdf)

Properties of cdf:

(1) P(X > c) = 1 − F(c), for any number c

(2) P(a < X ≤ b) = F(b) − F(a), for any numbers a < b

(3) For a continuous random variable, any weak inequality above can be replaced by a strict inequality, and vice versa, because P(X = c) = 0 for any c

From the previous example, if P(16 ≤ wage ≤ 18) = 0.1, then F(18) − F(16) = 0.1, where F is the cdf of wage.
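
The cdf properties can be tried out numerically. In this sketch the wage distribution is assumed to be normal with mean 12 and standard deviation 4; those numbers are hypothetical and not taken from the lecture's data. `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Hypothetical wage distribution (mean 12, sd 4), for illustration only.
wage = NormalDist(mu=12, sigma=4)
F = wage.cdf  # F(x) = P(wage <= x)

p_above_18 = 1 - F(18)      # property (1): P(X > c) = 1 - F(c)
p_16_to_18 = F(18) - F(16)  # property (2): P(a < X <= b) = F(b) - F(a)

print(f"P(wage > 18)       = {p_above_18:.4f}")
print(f"P(16 < wage <= 18) = {p_16_to_18:.4f}")
```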


Features of Random Variables

Expected value (population mean)

◮ It is a weighted average of all possible values of X. The weights are determined by the pdf.

◮ Precisely, if X is discrete and can take values {x_1, ..., x_n} with pdf p_i = P(X = x_i), then the expected value of X is

$E(X) = x_1 p_1 + \cdots + x_n p_n = \sum_i x_i p_i$

(where the x_i are the realizations, and the p_i are the weights)


Example

Question: X takes values in {4, 12, 2, 6}, with P(X = 4) = 0.1, P(X = 12) = 0.2, P(X = 2) = 0.5, P(X = 6) = 0.2. Calculate $E(X)$ and $E(X^2)$.

Answer:

$E(X) = 4 \times 0.1 + 12 \times 0.2 + 2 \times 0.5 + 6 \times 0.2 = 0.4 + 2.4 + 1 + 1.2 = 5.$

$E(X^2) = 4^2 \times 0.1 + 12^2 \times 0.2 + 2^2 \times 0.5 + 6^2 \times 0.2 = 1.6 + 28.8 + 2 + 7.2 = 39.6.$
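
The same calculation can be checked in a few lines of Python. The helper name `expectation` is an assumption made for this sketch; the values and probabilities are the ones from the example.

```python
values = [4, 12, 2, 6]
probs = [0.1, 0.2, 0.5, 0.2]

def expectation(g, values, probs):
    """Weighted average of g(x) over the realizations, with pdf weights."""
    return sum(g(x) * p for x, p in zip(values, probs))

e_x = expectation(lambda x: x, values, probs)
e_x2 = expectation(lambda x: x ** 2, values, probs)
print(f"E(X) = {e_x:.1f}, E(X^2) = {e_x2:.1f}")
```

Passing a function g makes it clear that E(X^2) is the weighted average of the squared realizations, not the square of E(X).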


Features of Random Variables

Properties of Expected values

(1) $E(c) = c$, where c is a constant (not random!)

(2) $E(aX + b) = aE(X) + b$, where a and b are constants

(3) $E\left(\sum_i a_i x_i\right) = \sum_i a_i E(x_i)$, where the a_i are constants and the x_i are random

(the expectation of a sum is the sum of the expectations)
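
As a short worked step, property (2) can be verified for a discrete X directly from the definition of the expected value, using only the fact that the probabilities sum to one:

```latex
\begin{aligned}
E(aX + b) &= \sum_i (a x_i + b)\, p_i \\
          &= a \sum_i x_i p_i + b \sum_i p_i \\
          &= a E(X) + b .
\end{aligned}
```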


Features of Random Variables

Median

It is the value in the middle of an ordered sequence of realizations

of a random variable.

Example

Question: X = {4, 12, 2, 6}, find the median of X .

Answer: X = {2, 4, 6, 12}, taking the average of the two numbers

in the middle gives (4 + 6)/2 = 5.


Features of Random Variables

Variance: measuring spread of pdf

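$\mathrm{Var}(X) = E\left[(X - E(X))^2\right] = E(X^2) - (E(X))^2$

Properties of variance

(1) $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$

(2) $\mathrm{Var}(aX \pm bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) \pm 2ab\,\mathrm{Cov}(X, Y)$

(where a and b are constants, X and Y are random)

(3)* $\mathrm{Var}\left(\sum_i a_i x_i\right) = \sum_i a_i^2 \mathrm{Var}(x_i)$ if $\mathrm{Cov}(x_i, x_j) = 0 \ \forall i \ne j$

(where the a_i are constants, the x_i are random)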
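
The second form of the variance follows from expanding the square and applying the linearity of expectation. Writing µ = E(X) as shorthand (a notational assumption for this step only):

```latex
\begin{aligned}
\mathrm{Var}(X) &= E\left[(X - \mu)^2\right] \\
                &= E\left(X^2 - 2\mu X + \mu^2\right) \\
                &= E(X^2) - 2\mu E(X) + \mu^2 \\
                &= E(X^2) - \mu^2 = E(X^2) - (E(X))^2 .
\end{aligned}
```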


The standard deviation (sd) is the square root of the variance.

Property of the standard deviation:

sd(aX + b) = |a| sd(X)


Example

Question: X takes values in {4, 12, 2, 6}, with P(X = 4) = 0.1, P(X = 12) = 0.2, P(X = 2) = 0.5, P(X = 6) = 0.2. Find the variance and standard deviation of X.

Answer: $\mathrm{Var}(X) = E(X^2) - (E(X))^2$. We calculated that $E(X^2) = 39.6$ and $E(X) = 5$, so $\mathrm{Var}(X) = 39.6 - 5^2 = 14.6$.

$\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{14.6} \approx 3.82.$
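
A quick numerical check of this answer, computing the variance both from its definition and from the shortcut formula. The variable names are assumptions made for this sketch.

```python
import math

values = [4, 12, 2, 6]
probs = [0.1, 0.2, 0.5, 0.2]

mean = sum(x * p for x, p in zip(values, probs))
e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
# Shortcut: E(X^2) - (E(X))^2.
var_shortcut = e_x2 - mean ** 2
sd = math.sqrt(var_shortcut)

print(f"Var(X) = {var_shortcut:.1f}, sd(X) = {sd:.2f}")
```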


Features of Random Variables

Covariance: measuring association of two random variables

$\mathrm{Cov}(X, Y) = E\left[(X - E(X))(Y - E(Y))\right] = E(XY) - E(X)E(Y)$

Properties of covariance

(1) $\mathrm{Cov}(a_1 X + b_1, a_2 Y + b_2) = a_1 a_2 \mathrm{Cov}(X, Y)$

(2) If X and Y are independent, then Cov(X, Y) = 0


Features of Random Variables

Correlation coefficient:

measuring association of two random variables

It is the standardized covariance, in the sense that

$\mathrm{Corr}(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}$

Properties of the correlation coefficient

(1) −1 ≤ Corr(X, Y) ≤ 1

(2) Corr(X, Y) = 0 ⇐⇒ Cov(X, Y) = 0

(3) Corr(a_1 X + b_1, a_2 Y + b_2) = Corr(X, Y) if a_1 a_2 > 0, and = −Corr(X, Y) if a_1 a_2 < 0
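
The covariance and correlation formulas can be illustrated on a small joint distribution. The joint probabilities below are hypothetical, and the helper names (`expect`, `cov`, `corr`, `transform`) are assumptions for this sketch. It also checks how linear transformations affect both measures: the covariance is scaled by a1*a2, while the correlation keeps its magnitude and changes sign only when a1*a2 is negative.

```python
import math

# Hypothetical joint distribution of (X, Y): {(x, y): probability}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g, joint):
    """E[g(X, Y)] under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def cov(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    e_x = expect(lambda x, y: x, joint)
    e_y = expect(lambda x, y: y, joint)
    return expect(lambda x, y: x * y, joint) - e_x * e_y

def corr(joint):
    """Corr(X, Y) = Cov(X, Y) / (sd(X) sd(Y))."""
    var_x = expect(lambda x, y: x ** 2, joint) - expect(lambda x, y: x, joint) ** 2
    var_y = expect(lambda x, y: y ** 2, joint) - expect(lambda x, y: y, joint) ** 2
    return cov(joint) / math.sqrt(var_x * var_y)

def transform(joint, a1, b1, a2, b2):
    """Joint distribution of (a1*X + b1, a2*Y + b2), for nonzero a1 and a2."""
    return {(a1 * x + b1, a2 * y + b2): p for (x, y), p in joint.items()}

same_sign = transform(joint, 2, 1, 3, -4)       # a1 * a2 > 0
opposite_sign = transform(joint, -2, 1, 3, -4)  # a1 * a2 < 0

print(f"Cov = {cov(joint):.2f}, Corr = {corr(joint):.2f}")
print(f"after rescaling (same sign):     Cov = {cov(same_sign):.2f}, Corr = {corr(same_sign):.2f}")
print(f"after rescaling (opposite sign): Cov = {cov(opposite_sign):.2f}, Corr = {corr(opposite_sign):.2f}")
```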


Standardizing a Random Variable

◮ We usually write $X \sim (\mu, \sigma^2)$, where µ is the mean of X, E(X), and σ² is the variance of X. Read as “the random variable X is distributed with mean µ and variance σ²”.

◮ If we define a new random variable $Z = \dfrac{X - \mu}{\sigma}$, we can find that E(Z) = 0 and Var(Z) = 1. So Z ∼ (0, 1) is called a standardized random variable.

◮ Continuing with the previous example, E(X) = 5 and Var(X) = 14.6, so $Z = (X - 5)/\sqrt{14.6}$ is a standardized random variable.
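
As a minimal check, the sketch below standardizes the X from the running example and confirms that the resulting Z has mean 0 and variance 1. The variable names are assumptions for illustration.

```python
import math

values = [4, 12, 2, 6]
probs = [0.1, 0.2, 0.5, 0.2]

mu = sum(x * p for x, p in zip(values, probs))
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

# Each realization of Z = (X - mu) / sigma keeps the probability of its x.
z_values = [(x - mu) / sigma for x in values]

e_z = sum(z * p for z, p in zip(z_values, probs))
var_z = sum(z ** 2 * p for z, p in zip(z_values, probs)) - e_z ** 2

print(f"E(Z) = {e_z:.6f}, Var(Z) = {var_z:.6f}")
```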


Conditional Expectation E (Y |x)

◮ Read as “(conditional) expectation of Y given x”

◮ Intuitively, this is E (Y ) given a particular value of x .

◮ For example, E (wage|educ = 12) is the average wage for all

people with 12 years of education.

So E (wage|educ) is usually a function of educ , say,

E (wage|educ) = 1.05 + 0.45educ .

From this example,

E (wage|educ = 12) = 1.05 + 0.45 × 12 = 6.45 pounds/hour.


◮ P(Y|x) is the conditional probability density function (pdf) of Y given x.

◮ P(wage|educ = 12) is the distribution of wage among people in the population with 12 years of education.

So P(16 ≤ wage ≤ 18|educ = 12) = 0.1 means that, for those with 12 years of education, 10% of them have 16 pounds/hour ≤ wage ≤ 18 pounds/hour.


Conditional Expectation E (Y |x)

If Y is discrete and can take values y_1, ..., y_m with conditional pdf $p_j = P(Y = y_j \mid x)$, then the conditional expectation of Y given x is

$E(Y \mid x) = y_1 p_1 + \cdots + y_m p_m = \sum_j y_j p_j$

(where the y_j are the realizations, and the p_j are the weights)


Example

Question: Y = {4, 12, 2, 6}, P(Y = 4|x = 1) = 0.1,

P(Y = 12|x = 1) = 0.2, P(Y = 2|x = 1) = 0.5,

P(Y = 6|x = 1) = 0.2, calculate E (Y |x = 1)

Answer:

E (Y |x = 1) = 4 × 0.1 + 12 × 0.2 + 2 × 0.5 + 6 × 0.2 =

0.4 + 2.4 + 1 + 1.2 = 5.


X and Y are random variables.

Properties of Conditional Expectation:

(1) E [a(X )Y + b(X )|X ] = a(X )E (Y |X ) + b(X )

“we know X , but we don’t know Y , and hence E (Y |X )”

“we know X , and so we know functions of X , a(X ) and b(X )”

(2) E[E(Y|X)] = E(Y): law of iterated expectations

“the average of the conditional averages of Y given X (weighted by the distribution of X) is the same as the simple average of Y”

(3) If E (Y |X ) = E (Y ), then Cov(X ,Y ) = 0, and Corr(X ,Y ) = 0

“if knowing X doesn’t help to know Y , then X and Y are

uncorrelated”
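
The law of iterated expectations in property (2) can be checked on a small example. The joint distribution of (educ, wage) below is entirely hypothetical, as are the helper names; the sketch computes E(Y|X = x) for each x, averages those values using P(X = x), and compares the result with E(Y).

```python
from collections import defaultdict

# Hypothetical joint distribution of (educ, wage): {(x, y): probability}.
joint = {(12, 5): 0.3, (12, 7): 0.3, (16, 7): 0.1, (16, 11): 0.3}

# Marginal distribution of X: P(X = x) = sum over y of P(X = x, Y = y).
p_x = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p

def cond_expectation(x):
    """E(Y | X = x): weights are the conditional probabilities P(Y = y | X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x[x]

e_y = sum(y * p for (x, y), p in joint.items())
e_of_cond = sum(cond_expectation(x) * px for x, px in p_x.items())

for x in sorted(p_x):
    print(f"E(wage | educ = {x}) = {cond_expectation(x):.2f}")
print(f"E[E(wage | educ)] = {e_of_cond:.2f}, E(wage) = {e_y:.2f}")
```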


Reading

Appendix B, Introductory Econometrics: A Modern Approach, 4th Edition, J. Wooldridge
