
Page 1: A Brief Review of Probability, Statistics, and Regression for Forecasting

Ka-fu Wong, University of Hong Kong

Page 2: Random variable

A random variable is a mapping from the set of all possible outcomes to the real numbers. For example, today's Hang Seng Index can go up, go down, or stay the same as yesterday. Consider the movement of the Hang Seng Index over a month of 22 trading days, and define a random variable Y as the number of days on which the Hang Seng Index goes up. In this case, Y can assume 23 values: y = 0, 1, 2, …, 22.

Discrete random variables can assume only a countable number of values. A discrete probability distribution describes the probability of occurrence for all the events. For instance, pi is the probability that event i will occur.

Continuous random variables can assume a continuum of values. A probability density function, f(y), is a nonnegative continuous function such that the area under f(y) between any points a and b is the probability that Y assumes a value between a and b.
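As a concrete sketch of both cases, the snippet below tabulates a discrete distribution and integrates a continuous density. Modelling the up-days count as Binomial(22, 0.5) (independent days, equal up probability) and using the standard normal density are illustrative assumptions, not claims from the slides.

```python
import math

# Discrete case: Y = number of "up" days among 22 trading days.
# Assume, purely for illustration, independent days with P(up) = 0.5,
# so that Y ~ Binomial(22, 0.5) and pmf[k] = P(Y = k).
T, p = 22, 0.5
pmf = [math.comb(T, k) * p**k * (1 - p)**(T - k) for k in range(T + 1)]
print(sum(pmf))    # the probabilities of all events sum to 1
print(pmf[11])     # P(Y = 11), the most likely number of up days

# Continuous case: P(a <= Y <= b) is the area under the density f(y).
# Here f is the standard normal density, integrated by a Riemann sum.
def f(y):
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

a, b, n = -1.0, 1.0, 100_000
h = (b - a) / n
area = h * sum(f(a + (i + 0.5) * h) for i in range(n))  # midpoint rule
print(area)        # about 0.6827 for the standard normal
```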

Page 3: Moments

Mean (measures central tendency): μ = E(y)

Variance (measures dispersion around the mean): σ² = E[(y − μ)²]

Standard deviation: σ = √σ²

Skewness (measures the amount of asymmetry in a distribution): S = E[(y − μ)³] / σ³

Kurtosis (measures the thickness of the tails in a distribution): K = E[(y − μ)⁴] / σ⁴
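A minimal sketch computing these population moments from a pmf, reusing the illustrative Binomial(22, 0.5) distribution assumed above:

```python
import math

# Population moments of a discrete distribution, illustrated with the
# Binomial(22, 0.5) pmf from the previous sketch (an assumed example).
T, p = 22, 0.5
pmf = [math.comb(T, k) * p**k * (1 - p)**(T - k) for k in range(T + 1)]

mu = sum(k * pk for k, pk in enumerate(pmf))                # mean E(y)
var = sum((k - mu) ** 2 * pk for k, pk in enumerate(pmf))   # E[(y - mu)^2]
sd = math.sqrt(var)                                         # standard deviation
skew = sum((k - mu) ** 3 * pk for k, pk in enumerate(pmf)) / sd ** 3
kurt = sum((k - mu) ** 4 * pk for k, pk in enumerate(pmf)) / sd ** 4

print(mu, var, sd)   # 11.0, 5.5, ~2.345
print(skew, kurt)    # 0.0 (symmetric) and ~2.91 (slightly below the normal's 3)
```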

Page 4: Multivariate Random Variables

Joint distribution: f(y, x)

Covariance (measures dependence between two variables): cov(y, x) = E[(y − μy)(x − μx)]

Correlation: corr(y, x) = cov(y, x) / (σy σx)

Conditional distribution: f(y│x) = f(y, x) / f(x)

Conditional mean: E(y│x)

Conditional variance: var(y│x) = E[(y − E(y│x))² │ x]
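A short numpy sketch of sample covariance, correlation, and a windowed approximation of a conditional mean; the data-generating process y = 2x + noise is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative joint population: y depends linearly on x plus noise,
# so y and x are dependent. (The coefficients are assumptions.)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)

print(np.cov(x, y)[0, 1])       # covariance, estimates E[(x - mux)(y - muy)] ~ 2
print(np.corrcoef(x, y)[0, 1])  # correlation = cov / (sigma_x sigma_y) ~ 0.89

# Conditional mean E(y | x near 1), approximated by averaging y over
# draws whose x falls in a narrow window around 1.
window = np.abs(x - 1.0) < 0.05
print(y[window].mean())         # close to 2.0 * 1 = 2.0
```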

Page 5: Statistics

Sample mean: ȳ = (1/T) Σt yt

Sample variance: σ̂² = (1/T) Σt (yt − ȳ)²  or  s² = (1/(T−1)) Σt (yt − ȳ)²

Sample standard deviation: σ̂ = √σ̂²  or  s = √s²
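The two divisors (T versus T − 1) correspond to numpy's ddof argument; a minimal sketch with made-up numbers:

```python
import numpy as np

y = np.array([4.0, 7.0, 5.0, 9.0, 5.0])    # made-up sample
T = len(y)

ybar = y.mean()                             # sample mean
var_T = ((y - ybar) ** 2).sum() / T         # divisor T
var_T1 = ((y - ybar) ** 2).sum() / (T - 1)  # divisor T - 1 (unbiased)

print(ybar, var_T, var_T1)
print(np.var(y, ddof=0), np.var(y, ddof=1))  # the same two estimates via numpy
print(np.std(y, ddof=0), np.std(y, ddof=1))  # the two standard deviations
```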

Page 6: Statistics

Sample skewness: Ŝ = [(1/T) Σt (yt − ȳ)³] / σ̂³

Sample kurtosis: K̂ = [(1/T) Σt (yt − ȳ)⁴] / σ̂⁴

Jarque-Bera test statistic: JB = (T/6) [Ŝ² + (K̂ − 3)²/4]

Under the null of independent, normally distributed observations, JB is distributed in large samples as a chi-square with two degrees of freedom.
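A sketch that computes Ŝ, K̂, and JB on simulated data drawn under the null; the sample itself is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=1_000)   # simulated data satisfying the null
T = len(y)

dev = y - y.mean()
sigma_hat = np.sqrt((dev ** 2).mean())   # divisor-T standard deviation
S = (dev ** 3).mean() / sigma_hat ** 3   # sample skewness
K = (dev ** 4).mean() / sigma_hat ** 4   # sample kurtosis

JB = (T / 6) * (S ** 2 + (K - 3) ** 2 / 4)
print(S, K, JB)   # under the null, JB rarely exceeds the chi-square(2)
                  # 5% critical value of about 5.99
```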

Page 7: Example

What is our expectation of y given x=0?

Page 8: Forecast

Suppose we want to forecast the value of a variable y, given the value of a variable x.

Denote this forecast by yf│x.

Page 9: Conditional expectation as a forecast

Think of y and x as random variables jointly drawn from some underlying population.

It seems reasonable to construct the forecast of y given x as the expected value of y conditional on x, i.e.,

yf│x = E(y│x), the average population value of y given that value of x.

E(y│x) is also called the population regression of y (on x).

Page 10: Conditional expectation as a forecast

The expected value of y conditional on x: yf│x = E(y│x).

It turns out that in many reasonable forecasting settings, this forecast has optimal properties (e.g., minimizing expected loss), and (approximating) this forecast guides our choice of forecast method.

Page 11: Unbiasedness of the conditional expectation as a forecast

The forecast error will be y − E(y│x).

Expected forecast error = E[y − E(y│x)] = E(y) − E[E(y│x)] = E(y) − E(y) = 0, where the middle step uses the law of iterated expectations, E[E(y│x)] = E(y).

Thus the conditional expectation is an unbiased forecast.

Note that another name for E(y│x) is the population regression of y (on x).
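A quick Monte Carlo check of this unbiasedness, under an assumed linear population E(y│x) = 1 + 2x:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Illustrative population with E(y | x) = 1 + 2x and noise independent of x.
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

error = y - (1 + 2 * x)   # forecast error y - E(y | x)
print(error.mean())       # very close to 0: the forecast is unbiased
```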

Page 12: Some operational assumptions about E(y│x)

In order to proceed in this direction, we need to make some additional assumptions about the underlying population and, in particular, the form of E(y│x).

The simplest assumption is that the conditional expectation is a linear function of x, i.e.,

E(y│x) = β0 + β1x

If β0 and β1 are known, then the forecast problem is completed by setting

yf│x = β0 + β1x

Page 13: When parameters are unknown

Even if the conditional expectation is linear in x, the parameters β0 and β1 will be unknown.

The next best thing for us to do would be to estimate the values of β0 and β1 and use the estimated β’s in place of their actual values to form the forecasts.

This substitution will not provide as accurate a forecast, since we’re introducing a new source of forecast error due to “estimation error” or “sampling error.” However, under certain conditions the resulting forecast will still be unbiased and retain certain optimality properties.

Page 14: When parameters are unknown

Suppose we have access to a sample of T pairs of (x,y) drawn from the population from which the relevant value of y will be drawn: (x1,y1),(x2,y2),…,(xT,yT).

In this case, a natural estimator of β0 and β1 is the ordinary least squares (OLS) estimator, which is obtained by minimizing the sum of squared residuals

Σt (yt − β0 − β1xt)², t = 1,…,T

with respect to β0 and β1. The solutions are the OLS estimates β̂0 and β̂1.

Then, for a given value of x, we can forecast y according to

yf│x = β̂0 + β̂1x
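A minimal sketch of the OLS estimates and the resulting forecast; the population E(y│x) = 1 + 2x and the sample size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# A sample of T pairs from an illustrative population with E(y | x) = 1 + 2x.
T = 200
x = rng.normal(size=T)
y = 1 + 2 * x + rng.normal(size=T)

# OLS estimates that minimize the sum of squared residuals.
b1_hat = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0_hat = y.mean() - b1_hat * x.mean()
print(b0_hat, b1_hat)            # close to the population values 1 and 2

# Forecast of y for a given value of x.
x_new = 0.5
print(b0_hat + b1_hat * x_new)   # close to 1 + 2 * 0.5 = 2
```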

Page 15: Fitting a regression line: Estimating β0 and β1

Page 16: When parameters are unknown

This estimation procedure, also called the sample regression of y on x, will provide us with a “good” estimate of the conditional expectation of y given x (i.e., the population regression of y on x) and, therefore, a “good” forecast of y given x, provided that certain additional assumptions apply to the relationship between y and x.

Let ε denote the difference between y and E(y│x). That is,

ε = y − E(y│x)

i.e., y = E(y│x) + ε, and

y = β0 + β1x + ε, if E(y│x) = β0 + β1x.

Page 17: When parameters are unknown

The assumptions that we need pertain to these ε’s (the “other factors” that determine y) and their relationship to the x’s.

For instance, so long as E(εt │x1,…,xT) = 0 for t = 1,…,T, the OLS estimator of β0 and β1 based on the data (x1,y1),…,(xT,yT) will be unbiased and, as a result, the forecast constructed by replacing these “population parameters” with the OLS estimates will be unbiased.

A standard set of assumptions that provides us with a lot of value:

Given x1,…,xT, the errors ε1,…,εT are i.i.d. N(0, σ²) random variables.

Page 18: When parameters are unknown

These ideas and procedures extend naturally to the setting where we want to forecast the value of y based on the values of k other variables, say, x1,…,xk.

We begin by considering the conditional expectation or population regression of y on x1,…,xk to make our forecast. That is,

yf│x1,…,xk = E(y│x1,…,xk)

To operationalize this forecast, we first assume that the conditional expectation is linear, i.e.,

E(y│x1,…,xk) = β0 + β1x1 + … + βkxk

Page 19: When parameters are unknown

The unknown β's are generally replaced by the estimates from a sample OLS regression.

Suppose we have the data set (y1,x11,…,xk1), (y2,x12,…,xk2), …, (yT,x1T,…,xkT).

The OLS estimates of the unknown parameters are obtained by minimizing the sum of squared residuals,

Σt (yt − β0 − β1x1t − … − βkxkt)², t = 1,…,T.

As in the case of the simple regression model, this procedure to estimate the population regression function will have good properties provided that the regression errors

εt = yt − E(yt│x1t,…,xkt), t = 1,…,T

have appropriate properties.
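A sketch of the multiple-regression case using numpy's least-squares solver; the two-regressor population with coefficients (1, 2, −3) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)

# T observations on y and k = 2 regressors from an illustrative population
# with E(y | x1, x2) = 1 + 2 x1 - 3 x2.
T = 200
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(size=T)

# Design matrix with an intercept column; lstsq minimizes the
# sum of squared residuals over (b0, b1, b2).
X = np.column_stack([np.ones(T), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                               # close to [1, 2, -3]

# Forecast y at given regressor values x1 = 0.5, x2 = -1.
print(beta_hat @ np.array([1.0, 0.5, -1.0]))
```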

Page 20: Example: Multiple linear regression

Page 21: Residual plots

Page 22: Density Forecasts and Interval Forecasts

The procedures we described above produce point forecasts of y. They can also be used to produce density and interval forecasts of y, provided that the x’s and the regression errors, i.e., the ε’s, meet certain conditions.
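For example, if the errors are i.i.d. N(0, σ²), an approximate 95% interval forecast can be built around the point forecast. The sketch below ignores estimation error in β̂0 and β̂1, so the interval is only approximate; the data are again simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Refit the simple regression from the earlier sketch (illustrative data).
T = 200
x = rng.normal(size=T)
y = 1 + 2 * x + rng.normal(size=T)
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

# Estimate the error standard deviation from the residuals
# (T - 2 degrees of freedom for the two estimated parameters).
resid = y - (b0 + b1 * x)
sigma_hat = np.sqrt((resid ** 2).sum() / (T - 2))

# Approximate 95% interval forecast for y given x = 0.5, valid if the
# errors are i.i.d. N(0, sigma^2); estimation error in b0, b1 is ignored.
x_new = 0.5
point = b0 + b1 * x_new
print(point - 1.96 * sigma_hat, point + 1.96 * sigma_hat)
```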

Page 23: End