BS2247 Introduction to Econometrics
Lecture 2: Fundamentals of Probability
Dr. Kai Sun
Aston Business School
1 / 30
Why do we care about this topic?
◮ Economic variables (e.g., education, wage, etc.) are random
variables, in the sense that each observation (i.e., realization)
is a random draw from the entire population.
◮ Each random variable has a probability measure. Roughly
speaking, a probability measure is a function that maps an
event (e.g., a particular realization of a random variable)
to the probability that the event occurs.
2 / 30
For example, the probability measure can tell us that:
The probability of, say, educ = 12 years (i.e., realization of a
random variable, education) is, say, 0.2.
Loosely, this says that 20% of the population has educ = 12 years;
in a large random sample, roughly 20% of the observations will
have educ = 12 years.
3 / 30
Discrete random variables
◮ Random variables that take on only a finite (or countably infinite) number of values
◮ For example, consider tossing a single coin.
The two outcomes/events are heads and tails.
◮ A discrete random variable then can be defined as:
x = 1 if the coin turns up heads,
x = 0 if the coin turns up tails.
By tossing the coin many times, we can estimate the
probabilities of x = 1 and x = 0 from the relative frequencies.
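A minimal simulation sketch of the coin-tossing idea, assuming a fair coin. The seed and the number of tosses are illustrative choices, not taken from the lecture. The relative frequency of x = 1 approximates P(x = 1).

```python
import random

rng = random.Random(0)      # fixed seed so the run is reproducible (illustrative choice)
n_tosses = 10_000           # assumed number of tosses

# x = 1 if the coin turns up heads, x = 0 if tails; a fair coin is assumed
tosses = [1 if rng.random() < 0.5 else 0 for _ in range(n_tosses)]

p_heads = sum(tosses) / n_tosses    # relative frequency of x = 1
p_tails = 1 - p_heads               # relative frequency of x = 0
print(p_heads, p_tails)
```

With more tosses, the relative frequencies should settle closer to the true probabilities.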
4 / 30
Probability Density Function (pdf)
◮ Generally, if a discrete random variable, X, takes on the n
possible values $\{x_1, \ldots, x_n\}$, then the probability measure is
$p_i = P(X = x_i), \quad i = 1, 2, \ldots, n,$
where $0 \le p_i \le 1$ and $\sum_i p_i = 1$.
◮ P(·) is also called the probability density function (pdf) of X.
“The probability of $X = x_i$ is equal to $p_i$”.
5 / 30
The pdf of heads and tails from the coin-tossing example
This is essentially a histogram!
6 / 30
Continuous random variables
◮ They are random variables that can take any value in an interval of real numbers.
◮ For example, wage should be a continuous random variable.
◮ In practice, education can also be considered as continuous.
However, in theory, education may be discrete.
7 / 30
Continuous random variables
◮ The pdf of a continuous random variable is used to compute the
probability of events (i.e., realizations) involving a range of
values (not a particular value!).
◮ P(a ≤ X ≤ b) measures
the probability that X ranges from a to b.
◮ For example, P(16 ≤ wage ≤ 18) = 0.1 says that the
probability that wage ranges from 16 to 18 is 0.1.
Loosely, this says that about 10% of the population has an
hourly wage between 16 and 18.
8 / 30
Use a histogram to illustrate the pdf of wage
[Figure: Histogram of wage. Horizontal axis: wage (0 to 30); vertical axis: pdf of wage (0.00 to 0.10).]
9 / 30
Use a density plot to illustrate the pdf of wage
[Figure: Density of wage. Horizontal axis: wage (0 to 30); vertical axis: pdf of wage (0.00 to 0.10).]
10 / 30
Normal Distribution
[Figure: Density of wage. Horizontal axis: wage (0 to 30); vertical axis: pdf of wage (0.00 to 0.10).]
11 / 30
Other Distributions
12 / 30
Cumulative Distribution Function (cdf)
◮ Sometimes it’s easier to work with cumulative distribution
function (cdf), defined as
F (x) = P(X ≤ x)
where X is a random variable (either discrete or continuous),
and x is any real number.
◮ For a continuous random variable,
F(x) is the area under the pdf to the left of the point x.
13 / 30
Cumulative Distribution Function (cdf)
Properties of cdf:
(1) P(X > c) = 1 − F (c), for any number c
(2) P(a < X ≤ b) = F (b) − F (a)
(3) for a continuous random variable, any weak inequality above
can be replaced by a strict inequality, and vice versa
From the previous example, if P(16 ≤ wage ≤ 18) = 0.1, then
F (18) − F (16) = 0.1, where F is the cdf of wage.
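The cdf properties can be sketched numerically. The example below assumes, purely for illustration, that wage is normally distributed with mean 12 and standard deviation 5. These numbers are hypothetical, so the resulting probability will not equal the 0.1 used on the slide.

```python
from statistics import NormalDist

# Hypothetical wage distribution: normal with mean 12 and sd 5 (assumed values)
wage = NormalDist(mu=12, sigma=5)
F = wage.cdf                    # F(x) = P(X <= x)

p_above_18 = 1 - F(18)          # property (1): P(X > c) = 1 - F(c)
p_16_to_18 = F(18) - F(16)      # property (2): P(a < X <= b) = F(b) - F(a)
print(round(p_above_18, 4), round(p_16_to_18, 4))
```

Because wage is continuous here, P(16 ≤ wage ≤ 18) and P(16 < wage ≤ 18) are the same number.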
14 / 30
Features of Random Variables
Expected value (population mean)
◮ It is a weighted average of all possible values of X .
The weights are determined by pdf.
◮ Precisely, if X is discrete and can take values $\{x_1, \ldots, x_n\}$
with pdf $p_i = P(X = x_i)$, then the expected value of X is
$E(X) = x_1 p_1 + \cdots + x_n p_n = \sum_i x_i p_i$
(where $x_i$ are the realizations, and $p_i$ are the weights)
15 / 30
Example
Question: X = {4, 12, 2, 6}, P(X = 4) = 0.1, P(X = 12) = 0.2,
P(X = 2) = 0.5, P(X = 6) = 0.2, calculate $E(X)$ and $E(X^2)$.
Answer:
$E(X) = 4 \times 0.1 + 12 \times 0.2 + 2 \times 0.5 + 6 \times 0.2 = 0.4 + 2.4 + 1 + 1.2 = 5.$
$E(X^2) = 4^2 \times 0.1 + 12^2 \times 0.2 + 2^2 \times 0.5 + 6^2 \times 0.2 = 1.6 + 28.8 + 2 + 7.2 = 39.6.$
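The same calculation can be sketched in a few lines of Python. The dictionary name `pdf` is just an illustrative choice for storing each value with its probability.

```python
# pdf of X from the example: value -> probability
pdf = {4: 0.1, 12: 0.2, 2: 0.5, 6: 0.2}

# A valid pdf must have probabilities that sum to 1
assert abs(sum(pdf.values()) - 1) < 1e-12

e_x = sum(x * p for x, p in pdf.items())         # E(X): values weighted by probabilities
e_x2 = sum(x**2 * p for x, p in pdf.items())     # E(X^2): squared values, same weights
print(round(e_x, 2), round(e_x2, 2))             # 5.0 39.6
```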
16 / 30
Features of Random Variables
Properties of Expected values
(1) E (c) = c , where c is constant (not random!)
(2) E (aX + b) = aE (X ) + b, where a and b are constants
(3) $E\left(\sum_i a_i x_i\right) = \sum_i a_i E(x_i)$
(the expectation of a sum is the sum of the expectations)
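Property (2) can be checked numerically on the discrete example used earlier. This is a small sketch: the constants a = 3 and b = 7 and the helper name `expect` are arbitrary choices for illustration.

```python
pdf = {4: 0.1, 12: 0.2, 2: 0.5, 6: 0.2}    # pdf of X from the earlier example
a, b = 3, 7                                 # arbitrary constants (assumed for illustration)

def expect(g):
    """E[g(X)] = sum_i g(x_i) * p_i for the discrete pdf above."""
    return sum(g(x) * p for x, p in pdf.items())

lhs = expect(lambda x: a * x + b)    # E(aX + b), computed directly
rhs = a * expect(lambda x: x) + b    # a E(X) + b
print(round(lhs, 2), round(rhs, 2))  # 22.0 22.0
```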
17 / 30
Features of Random Variables
Median
It is the value in the middle of an ordered sequence of realizations
of a random variable.
Example
Question: X = {4, 12, 2, 6}, find the median of X .
Answer: X = {2, 4, 6, 12}, taking the average of the two numbers
in the middle gives (4 + 6)/2 = 5.
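As a quick check, Python's standard library reproduces this result. `statistics.median` sorts the data and, for an even number of values, averages the two middle ones.

```python
from statistics import median

realizations = [4, 12, 2, 6]
print(sorted(realizations))     # [2, 4, 6, 12]

m = median(realizations)        # average of the two middle values, 4 and 6
print(m)                        # 5.0
```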
18 / 30
Features of Random Variables
Variance: measuring spread of pdf
$\mathrm{Var}(X) = E[(X - E(X))^2] = E(X^2) - (E(X))^2$
Properties of variance
(1) $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$
(2) $\mathrm{Var}(aX \pm bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) \pm 2ab\,\mathrm{Cov}(X, Y)$
(where a and b are constants, X and Y are random)
(3)* $\mathrm{Var}\left(\sum_i a_i x_i\right) = \sum_i a_i^2 \mathrm{Var}(x_i)$ if $\mathrm{Cov}(x_i, x_j) = 0 \;\; \forall i \neq j$
(where $a_i$ are constants, $x_i$ are random)
19 / 30
Standard deviation (sd) is the square root of the variance.
Property of standard deviation:
sd(aX + b) = |a|sd(X)
20 / 30
Example
Question: X = {4, 12, 2, 6}, P(X = 4) = 0.1, P(X = 12) = 0.2,
P(X = 2) = 0.5, P(X = 6) = 0.2, find the variance and standard
deviation of X .
Answer: $\mathrm{Var}(X) = E(X^2) - (E(X))^2$. We calculated that
$E(X^2) = 39.6$ and $E(X) = 5$, so $\mathrm{Var}(X) = 39.6 - 5^2 = 14.6$.
$\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{14.6} \approx 3.82$.
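A short sketch that reproduces these numbers and confirms that the two expressions for the variance agree. Variable names are illustrative.

```python
import math

pdf = {4: 0.1, 12: 0.2, 2: 0.5, 6: 0.2}    # pdf of X from the example

e_x = sum(x * p for x, p in pdf.items())         # E(X)
e_x2 = sum(x**2 * p for x, p in pdf.items())     # E(X^2)

var_short = e_x2 - e_x**2                                   # E(X^2) - (E(X))^2
var_def = sum((x - e_x) ** 2 * p for x, p in pdf.items())   # E[(X - E(X))^2]
sd = math.sqrt(var_short)
print(round(var_short, 2), round(var_def, 2), round(sd, 2))   # 14.6 14.6 3.82
```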
21 / 30
Features of Random Variables
Covariance: measuring association of two random variables
$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$
Properties of covariance
(1) $\mathrm{Cov}(a_1 X + b_1, a_2 Y + b_2) = a_1 a_2 \mathrm{Cov}(X, Y)$
(2) If X and Y are independent, then Cov(X ,Y ) = 0
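Both properties can be illustrated with a small joint distribution. The joint probabilities and the constants below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def cov(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y) for a joint pdf {(x, y): probability}."""
    e_x = sum(x * p for (x, y), p in joint.items())
    e_y = sum(y * p for (x, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y

# Hypothetical joint pdf in which X and Y tend to move together
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Independent case: joint probability = product of the marginals (0.5 each)
independent = {(x, y): 0.5 * 0.5 for x in (0, 1) for y in (0, 1)}

# Property (1): joint pdf of (a1*X + b1, a2*Y + b2) for arbitrary constants
a1, b1, a2, b2 = 2, 1, -3, 5
rescaled = {(a1 * x + b1, a2 * y + b2): p for (x, y), p in dependent.items()}

print(round(cov(dependent), 4))      # 0.15
print(round(cov(independent), 4))    # 0.0
print(round(cov(rescaled), 4))       # -0.9, i.e. a1 * a2 * 0.15
```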
22 / 30
Features of Random Variables
Correlation coefficient:
measuring association of two random variables
It is the standardized covariance, in the sense that
$\mathrm{Corr}(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}$
Properties of Correlation coefficient
(1) −1 ≤ Corr(X ,Y ) ≤ 1
(2) Corr(X ,Y ) = 0 ⇐⇒ Cov(X ,Y ) = 0
(3) $\mathrm{Corr}(a_1 X + b_1, a_2 Y + b_2) = \mathrm{Corr}(X, Y)$ if $a_1 a_2 > 0$, and $= -\mathrm{Corr}(X, Y)$ if $a_1 a_2 < 0$
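A numerical sketch of how linear rescaling affects the correlation coefficient, using a hypothetical joint distribution. The helper names `corr` and `rescale` are illustrative. Rescaling leaves the magnitude of the correlation unchanged, and the sign flips when the product of the two slope constants is negative.

```python
import math

def corr(joint):
    """Corr(X, Y) = Cov(X, Y) / (sd(X) sd(Y)) for a joint pdf {(x, y): probability}."""
    e_x = sum(x * p for (x, y), p in joint.items())
    e_y = sum(y * p for (x, y), p in joint.items())
    cov = sum((x - e_x) * (y - e_y) * p for (x, y), p in joint.items())
    var_x = sum((x - e_x) ** 2 * p for (x, y), p in joint.items())
    var_y = sum((y - e_y) ** 2 * p for (x, y), p in joint.items())
    return cov / math.sqrt(var_x * var_y)

def rescale(joint, a1, b1, a2, b2):
    """Joint pdf of (a1*X + b1, a2*Y + b2)."""
    return {(a1 * x + b1, a2 * y + b2): p for (x, y), p in joint.items()}

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # hypothetical values

r = corr(joint)
r_same = corr(rescale(joint, 2, 1, 3, 5))     # a1 * a2 > 0: correlation unchanged
r_flip = corr(rescale(joint, 2, 1, -3, 5))    # a1 * a2 < 0: sign flips
print(round(r, 4), round(r_same, 4), round(r_flip, 4))   # 0.6 0.6 -0.6
```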
23 / 30
Standardizing a Random Variable
◮ We usually write $X \sim (\mu, \sigma^2)$,
where $\mu$ is the mean of X, E(X), and $\sigma^2$ is the variance of X.
Read as “a random variable X is distributed with mean $\mu$ and
variance $\sigma^2$”.
◮ If we define a new random variable $Z = \dfrac{X - \mu}{\sigma}$,
we find that E(Z) = 0 and Var(Z) = 1.
So $Z \sim (0, 1)$ is called a standardized random variable.
◮ Continuing with the previous example, E(X) = 5 and
Var(X) = 14.6, so $Z = (X - 5)/\sqrt{14.6}$ is a standardized
random variable.
24 / 30
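A brief sketch that standardizes the discrete X from the running example and checks that the result has mean 0 and variance 1. Variable names are illustrative.

```python
import math

pdf = {4: 0.1, 12: 0.2, 2: 0.5, 6: 0.2}    # pdf of X from the running example

mu = sum(x * p for x, p in pdf.items())                         # E(X) = 5
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pdf.items()))   # sd(X) = sqrt(14.6)

# Z = (X - mu) / sigma takes a transformed value with the same probability as X
z_pdf = {(x - mu) / sigma: p for x, p in pdf.items()}

e_z = sum(z * p for z, p in z_pdf.items())
var_z = sum((z - e_z) ** 2 * p for z, p in z_pdf.items())
print(round(e_z, 6), round(var_z, 6))
```

Up to floating-point rounding, the printed mean is 0 and the printed variance is 1.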
Conditional Expectation E (Y |x)
◮ Read as “(conditional) expectation of Y given x”
◮ Intuitively, this is E (Y ) given a particular value of x .
◮ For example, E (wage|educ = 12) is the average wage for all
people with 12 years of education.
So E (wage|educ) is usually a function of educ , say,
E (wage|educ) = 1.05 + 0.45educ .
From this example,
E (wage|educ = 12) = 1.05 + 0.45 × 12 = 6.45 pounds/hour.
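The idea that E(wage|educ) is a function of educ can be written as a one-line function. The function name `expected_wage` is hypothetical, and the coefficients are the illustrative ones from the slide.

```python
def expected_wage(educ):
    """E(wage | educ) = 1.05 + 0.45 * educ, using the slide's illustrative coefficients."""
    return 1.05 + 0.45 * educ

print(round(expected_wage(12), 2))    # 6.45
print(round(expected_wage(16), 2))    # 8.25
```

Each additional year of education raises the conditional mean wage by 0.45 in this example.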
25 / 30
◮ P(Y |x) is the conditional probability density function (pdf) of
Y given x .
◮ P(wage|educ = 12) describes the distribution of wage among
people in the population with 12 years of education.
So P(16 ≤ wage ≤ 18|educ = 12) = 0.1 means that,
for those with 12 years of education, 10% of them have
an hourly wage between 16 and 18.
26 / 30
Conditional Expectation E (Y |x)
If Y is discrete and can take values $y_1, \ldots, y_m$ with conditional pdf
$p_j = P(Y = y_j | x)$, then the conditional expectation of Y given x is
$E(Y|x) = y_1 p_1 + \cdots + y_m p_m = \sum_j y_j p_j$
(where $y_j$ are the realizations, and $p_j$ are the weights)
27 / 30
Example
Question: Y = {4, 12, 2, 6}, P(Y = 4|x = 1) = 0.1,
P(Y = 12|x = 1) = 0.2, P(Y = 2|x = 1) = 0.5,
P(Y = 6|x = 1) = 0.2, calculate E (Y |x = 1)
Answer:
E (Y |x = 1) = 4 × 0.1 + 12 × 0.2 + 2 × 0.5 + 6 × 0.2 =
0.4 + 2.4 + 1 + 1.2 = 5.
28 / 30
X and Y are random variables.
Properties of Conditional Expectation:
(1) E [a(X )Y + b(X )|X ] = a(X )E (Y |X ) + b(X )
“we know X , but we don’t know Y , and hence E (Y |X )”
“we know X , and so we know functions of X , a(X ) and b(X )”
(2) E [E (Y |X )] = E (Y ): law of iterated expectation
“the average of the conditional averages of Y given X is the same
as the simple average of Y ”
(3) If E (Y |X ) = E (Y ), then Cov(X ,Y ) = 0, and Corr(X ,Y ) = 0
“if knowing X doesn’t help to know Y , then X and Y are
uncorrelated”
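The law of iterated expectation can be verified on a small joint distribution. The joint probabilities below are hypothetical and chosen only for illustration.

```python
# Hypothetical joint pdf of (X, Y): {(x, y): probability}
joint = {(0, 2): 0.3, (0, 6): 0.2, (1, 2): 0.1, (1, 6): 0.4}

xs = sorted({x for x, _ in joint})

# Marginal pdf of X: P(X = x)
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in xs}

# Conditional expectation E(Y | X = x) = sum_j y_j * P(Y = y_j | X = x)
cond_mean = {
    x: sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x[x]
    for x in xs
}

lie = sum(cond_mean[x] * p_x[x] for x in xs)       # E[E(Y | X)]
e_y = sum(y * p for (_, y), p in joint.items())    # E(Y)
print({x: round(m, 2) for x, m in cond_mean.items()})   # {0: 3.6, 1: 5.2}
print(round(lie, 2), round(e_y, 2))                     # 4.4 4.4
```

Here E(Y|X) varies with X, so knowing X does help to predict Y, yet averaging the conditional means over the distribution of X still recovers E(Y).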
29 / 30
Reading
Appendix B, Introductory Econometrics - A Modern Approach,
4th Edition, J. Wooldridge
30 / 30