Introduction to Regression Lecture 4.2


Diploma in Statistics: Introduction to Regression

Lecture 4.1 1

Introduction to Regression, Lecture 4.2

• Indicator variables for estimating seasonal effects in time series

– another application, Meter Sales analysis

• Correlated explanatory variables


Housing Completions case study

Quarter  1978  1979  1980  1981  1982  1983  1984  1985
Q1       5777  7276  3538  6642  5981  4859  5129  4947
Q2       4772  4510  6001  4710  4883  5862  4671  5188
Q3       4579  4278  5879  5570  5354  4663  4947  3930
Q4       4243  4274  6383  6314  4894  4564  3195  3360

Quarter  1986  1987  1988  1989  1990  1991  1992  1993
Q1       5186  4144  3682  3554  4296  4692  4155  3684
Q2       3719  3363  3298  3985  4477  3898  5603  4487
Q3       4533  4391  3747  5277  5011  4600  5919  5121
Q4       3726  3478  3477  4484  4752  5282  5305  6009

Quarter  1994   1995   1996   1997   1998   1999   2000
Q1       4291   5770   6582   7434   8010   9930   10302
Q2       5266   6149   7203   8799   9506   10227  11590
Q3       6871   6806   7634   9140   10103  10788  11892
Q4       7160   7879   8713   10081  11474  12079  12873


Table 1.7 Completions and Quarterly Indicators

Year  Quarter  Time  Completions  Q1  Q2  Q3  Q4
1993  1        1     3684         1   0   0   0
1993  2        2     4487         0   1   0   0
1993  3        3     5121         0   0   1   0
1993  4        4     6009         0   0   0   1
1994  1        5     4291         1   0   0   0
1994  2        6     5266         0   1   0   0
1994  3        7     6871         0   0   1   0
1994  4        8     7160         0   0   0   1
1995  1        9     5770         1   0   0   0
1995  2        10    6149         0   1   0   0
1995  3        11    6806         0   0   1   0
1995  4        12    7879         0   0   0   1
1996  1        13    6582         1   0   0   0
1996  2        14    7203         0   1   0   0
1996  3        15    7634         0   0   1   0
1996  4        16    8713         0   0   0   1
1997  1        17    7434         1   0   0   0
1997  2        18    8799         0   1   0   0
1997  3        19    9140         0   0   1   0
1997  4        20    10081        0   0   0   1
1998  1        21    8010         1   0   0   0
1998  2        22    9506         0   1   0   0
1998  3        23    10103        0   0   1   0
1998  4        24    11474        0   0   0   1
1999  1        25    9930         1   0   0   0
1999  2        26    10227        0   1   0   0
1999  3        27    10788        0   0   1   0
1999  4        28    12079        0   0   0   1
2000  1        29    10302        1   0   0   0
2000  2        30    11590        0   1   0   0
2000  3        31    11892        0   0   1   0
2000  4        32    12873        0   0   0   1
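The Time and indicator columns of Table 1.7 can be generated mechanically from the completions series. The following is a minimal Python sketch, assuming NumPy is available; the array names are illustrative and do not come from the source.

```python
import numpy as np

# Housing completions, 1993 Q1 to 2000 Q4, in the order of Table 1.7
completions = np.array([
    3684, 4487, 5121, 6009, 4291, 5266, 6871, 7160,
    5770, 6149, 6806, 7879, 6582, 7203, 7634, 8713,
    7434, 8799, 9140, 10081, 8010, 9506, 10103, 11474,
    9930, 10227, 10788, 12079, 10302, 11590, 11892, 12873,
])

time = np.arange(1, len(completions) + 1)   # Time = 1, 2, ..., 32
quarter = (time - 1) % 4 + 1                # 1, 2, 3, 4, 1, 2, ...
year = 1993 + (time - 1) // 4

# One indicator column per quarter: 1 in that quarter, 0 otherwise
indicators = np.column_stack([(quarter == q).astype(int) for q in (1, 2, 3, 4)])

# Exactly one indicator equals 1 in every row
assert (indicators.sum(axis=1) == 1).all()

print("Year Quarter Time Completions Q1 Q2 Q3 Q4")
for i in range(len(time)):
    print(year[i], quarter[i], time[i], completions[i], *indicators[i])
```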


Model formulation

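$\text{Completions} = \beta_1\,\mathrm{Q1} + \beta_2\,\mathrm{Q2} + \beta_3\,\mathrm{Q3} + \beta_4\,\mathrm{Q4} + \beta\,\text{Time} + \varepsilon$

Quarter 1: $\text{Completions} = \beta_1 + \beta\,\text{Time} + \varepsilon$

Quarter 2: $\text{Completions} = \beta_2 + \beta\,\text{Time} + \varepsilon$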

Homework 4.1.1: Write down the prediction formulas for future third and fourth quarters.
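This model can be fitted by ordinary least squares with no constant term, because the four indicators already give each quarter its own intercept while all quarters share one slope on Time. A sketch with NumPy's `np.linalg.lstsq`, assuming the Table 1.7 data; its estimates need not coincide exactly with the rounded coefficients quoted in the prediction formula for this case study.

```python
import numpy as np

# Housing completions, 1993 Q1 to 2000 Q4 (Table 1.7)
completions = np.array([
    3684, 4487, 5121, 6009, 4291, 5266, 6871, 7160,
    5770, 6149, 6806, 7879, 6582, 7203, 7634, 8713,
    7434, 8799, 9140, 10081, 8010, 9506, 10103, 11474,
    9930, 10227, 10788, 12079, 10302, 11590, 11892, 12873,
])
time = np.arange(1, 33)
quarter = (time - 1) % 4 + 1

# Design matrix: Q1, Q2, Q3, Q4 and Time, with no column of ones.
# Each quarter gets its own intercept; all quarters share the Time slope.
X = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3, 4)] + [time])

coef, *_ = np.linalg.lstsq(X, completions, rcond=None)
beta_1, beta_2, beta_3, beta_4, beta_time = coef
print("quarterly intercepts:", np.round(coef[:4], 1))
print("common slope on Time:", round(beta_time, 1))
```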


Prediction formula

Predicted completions = 3248 Q1 + 3901 Q2 + 4174 Q3 + 5031 Q4 + 250 Time ± 500

Exercise: Write down separate prediction formulas for each of the four quarters.

Make predictions for each quarter of 2001 and of 2002.
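Plugging future values of Time into the prediction formula is simple arithmetic. Time is 1 at 1993 Q1 and 32 at 2000 Q4, so 2001 Q1 has Time = 33 and 2002 Q4 has Time = 40. A plain-Python sketch using the rounded coefficients as printed on the slide; no prediction margin is applied.

```python
# Rounded coefficients from the prediction formula above
quarter_level = {1: 3248, 2: 3901, 3: 4174, 4: 5031}
slope_per_quarter = 250

predictions = {}
for year in (2001, 2002):
    for q in (1, 2, 3, 4):
        t = 4 * (year - 1993) + q      # Time = 1 at 1993 Q1, so 33 at 2001 Q1
        predictions[(year, q)] = quarter_level[q] + slope_per_quarter * t
        print(f"{year} Q{q}: Time = {t}, predicted completions = {predictions[(year, q)]}")
```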


A sales forecasting problem

Southern Oil Products

• vegetable oil producer

• raw material supply is seasonal and comes from a variety of sources and countries

• problems in the second quarter of last year prompted a business review

• forecasts required for

– budgeting and staff planning

– quantifying the extent of last year's problem


Table 9.1 Quarterly production of vegetable oil, in numbers of 50 litre drums, for a six year period

Year  Q1    Q2    Q3    Q4    Annual
1     1102  1352  1333  1426  5213
2     1204  1463  1472  1528  5667
3     1352  1648  1519  1602  6121
4     1407  1778  1750  1852  6787
5     1602  1907  1833  2009  7351
6     1694  1657  2185  2139  7675


Initial data analysis

[Figure: time series plot of Production against Time]


A simple linear model for trend (Years 1-5)

Regression Analysis: Production versus Time

The regression equation is
Production = 1177 + 36.2 Time

Predictor  Coef     SE Coef  T      P
Constant   1176.61  53.61    21.95  0.000
Time       36.223   4.475    8.09   0.000

S = 115.405

Exercise: Estimate quarterly/annual growth in production

Exercise: Calculate a confidence interval for quarterly/annual growth in production

Exercise: Comment on prediction
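The fitted line can be reproduced with `np.polyfit`, and the growth questions reduce to arithmetic on the slope: one unit of Time is one quarter, so annual growth is four times the slope. A sketch assuming NumPy; the t critical value 2.101 (18 degrees of freedom, 95%) is taken from standard tables rather than computed.

```python
import numpy as np

# Table 9.1, Years 1 to 5 (Time = 1, ..., 20)
production = np.array([
    1102, 1352, 1333, 1426, 1204, 1463, 1472, 1528, 1352, 1648,
    1519, 1602, 1407, 1778, 1750, 1852, 1602, 1907, 1833, 2009,
])
time = np.arange(1, 21)

# Least-squares straight line; polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(time, production, 1)
print(f"Production = {intercept:.2f} + {slope:.3f} Time")

annual_growth = 4 * slope   # four quarters per year

# 95% confidence interval for quarterly growth: slope +/- t * SE(slope),
# using SE Coef for Time from the output (4.475) and t(0.975; 18 df) = 2.101
se_slope, t_crit = 4.475, 2.101
ci_quarter = (slope - t_crit * se_slope, slope + t_crit * se_slope)
ci_annual = tuple(4 * c for c in ci_quarter)
print(f"quarterly growth {slope:.1f}, 95% CI {ci_quarter[0]:.1f} to {ci_quarter[1]:.1f}")
print(f"annual growth {annual_growth:.1f}, 95% CI {ci_annual[0]:.1f} to {ci_annual[1]:.1f}")
```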


Quarterly indicator variables

Special variables Q1, Q2, Q3 and Q4, called indicator variables, may be added to the simple regression model to produce a multiple regression model that incorporates the seasonal effects.

Each quarterly indicator takes the value 1 in the relevant quarter and 0 otherwise.

Note that, in each row, only one of the quarterly indicators takes the value 1, while the other three take the value 0.

Thus, for each time (row), the indicator with value 1 identifies the corresponding quarter, 1, 2, 3 or 4.


Table 9.2 Quarterly production of vegetable oil, in numbers of 50 litre drums, with Time and quarterly indicators, for a five year period


Multiple regression model

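$\text{Production} = \beta_1\,\mathrm{Q1} + \beta_2\,\mathrm{Q2} + \beta_3\,\mathrm{Q3} + \beta_4\,\mathrm{Q4} + \beta\,\text{Time} + \varepsilon$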


Regression Analysis: Production versus Q1, Q2, Q3, Q4, Time

The regression equation is
Production = 1030 Q1 + 1292 Q2 + 1210 Q3 + 1279 Q4 + 33.7 Time

Predictor  Coef     SE Coef  T      P
Noconstant
Q1         1029.87  23.41    43.99  0.000
Q2         1292.35  24.45    52.85  0.000
Q3         1210.42  25.55    47.37  0.000
Q4         1278.70  26.71    47.88  0.000
Time       33.725   1.619    20.83  0.000

S = 40.9654
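The no-constant regression above can be reproduced with NumPy's least-squares solver; S is the square root of the residual sum of squares divided by n - 5 = 15 degrees of freedom (20 observations, 5 coefficients). A sketch, assuming the Years 1-5 data from Table 9.1.

```python
import numpy as np

# Table 9.1, Years 1 to 5 (Time = 1, ..., 20)
production = np.array([
    1102, 1352, 1333, 1426, 1204, 1463, 1472, 1528, 1352, 1648,
    1519, 1602, 1407, 1778, 1750, 1852, 1602, 1907, 1833, 2009,
])
time = np.arange(1, 21)
quarter = (time - 1) % 4 + 1

# Columns Q1, Q2, Q3, Q4, Time; no constant column
X = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3, 4)] + [time])
coef, *_ = np.linalg.lstsq(X, production, rcond=None)

residuals = production - X @ coef
df = len(production) - X.shape[1]            # 20 - 5 = 15
s = np.sqrt(residuals @ residuals / df)      # residual standard deviation

for name, b in zip(["Q1", "Q2", "Q3", "Q4", "Time"], coef):
    print(f"{name:5s} {b:9.3f}")
print(f"S = {s:.4f}")
```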


Exercise

Predict the first quarter production levels for Year 6 and Year 7

Comment on prediction error with respect to

(i) its previous value

(ii) recent production levels
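The first-quarter predictions follow from the output: Year 6 Q1 has Time = 21 and Year 7 Q1 has Time = 25. A plain-Python sketch using the printed coefficients; `predict` is a hypothetical helper, and dividing the prediction error by S only approximates its standardized size, because it ignores the uncertainty in the estimated coefficients.

```python
# Coefficients and S from the no-constant regression output (Years 1-5)
quarter_coef = {1: 1029.87, 2: 1292.35, 3: 1210.42, 4: 1278.70}
time_coef = 33.725
s = 40.9654

def predict(year, quarter):
    """Hypothetical helper: predicted production for a year (1, 2, ...) and quarter (1-4)."""
    t = 4 * (year - 1) + quarter   # Time = 1 at Year 1, Q1
    return quarter_coef[quarter] + time_coef * t

pred_y6_q1 = predict(6, 1)   # Time = 21
pred_y7_q1 = predict(7, 1)   # Time = 25

# Year 6 Q1 actual value from Table 9.1
error_y6_q1 = 1694 - pred_y6_q1
print(f"Year 6 Q1: predicted {pred_y6_q1:.1f}, error {error_y6_q1:.1f} ({error_y6_q1 / s:.2f} S)")
print(f"Year 7 Q1: predicted {pred_y7_q1:.1f}")
```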

Next: Diagnostic analysis


Exercise

Following calculation of a revised regression, make a table of initial and revised coefficient estimates and residual standard deviations.

Compare.

Which would you choose?

Why?


Exercise

Confirm and quantify the extent of the problem in Year 6, Q2.

Homework 4.2.2

Confirm and quantify the extent of the recovery in Year 6, Q3.
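Quantifying the Year 6 problem and recovery amounts to comparing each Year 6 quarter (Time = 21 to 24) with its prediction from the Years 1-5 fit and expressing the difference in units of S. A plain-Python sketch using the coefficients and S from the no-constant regression output; dividing by S is only a rough scale, since it ignores coefficient uncertainty.

```python
# Coefficients and S from the no-constant regression output (Years 1-5)
quarter_coef = {1: 1029.87, 2: 1292.35, 3: 1210.42, 4: 1278.70}
time_coef = 33.725
s = 40.9654

# Year 6 production from Table 9.1
actual_year6 = {1: 1694, 2: 1657, 3: 2185, 4: 2139}

difference = {}
for q, actual in actual_year6.items():
    t = 20 + q                                   # Year 6 runs from Time 21 to 24
    predicted = quarter_coef[q] + time_coef * t
    difference[q] = actual - predicted
    print(f"Q{q}: predicted {predicted:.0f}, actual {actual}, "
          f"difference {difference[q]:+.0f} ({difference[q] / s:+.1f} S)")
```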


Multiple regression model, alternative formulation

$\text{Production} = \alpha + \gamma_1\,\mathrm{Q1} + \gamma_2\,\mathrm{Q2} + \gamma_3\,\mathrm{Q3} + \beta\,\text{Time} + \varepsilon$


Alternative regression

* Q4 is highly correlated with other X variables
* Q4 has been removed from the equation.

The regression equation is
Production = 1279 - 249 Q1 + 13.7 Q2 - 68.3 Q3 + 33.7 Time

Predictor  Coef     SE Coef  T      P
Constant   1278.70  26.71    47.88  0.000
Q1         -248.82  26.36    -9.44  0.000
Q2         13.65    26.11    0.52   0.609
Q3         -68.27   25.96    -2.63  0.019
Time       33.725   1.619    20.83  0.000

S = 40.9654
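Why Q4 is removed: Q1 + Q2 + Q3 + Q4 = 1 in every row, which duplicates the constant column, so a model with a constant and all four indicators has no unique least-squares solution. The sketch below (NumPy assumed) checks the rank deficiency and fits both parameterizations to the Years 1-5 data; the fitted values are identical, only the coefficients are re-expressed.

```python
import numpy as np

# Table 9.1, Years 1 to 5 (Time = 1, ..., 20)
production = np.array([
    1102, 1352, 1333, 1426, 1204, 1463, 1472, 1528, 1352, 1648,
    1519, 1602, 1407, 1778, 1750, 1852, 1602, 1907, 1833, 2009,
])
time = np.arange(1, 21)
quarter = (time - 1) % 4 + 1
Q = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3, 4)])
ones = np.ones(len(time))

# Constant plus all four indicators: the columns are linearly dependent
rank_all = np.linalg.matrix_rank(np.column_stack([ones, Q, time]))   # 5, not 6

# Original: Q1..Q4 and Time, no constant
X_orig = np.column_stack([Q, time])
# Alternative: constant, Q1..Q3 and Time (Q4 removed)
X_alt = np.column_stack([ones, Q[:, :3], time])

b_orig, *_ = np.linalg.lstsq(X_orig, production, rcond=None)
b_alt, *_ = np.linalg.lstsq(X_alt, production, rcond=None)

print("rank with constant and Q1-Q4:", rank_all)
print("original:   ", np.round(b_orig, 3))
print("alternative:", np.round(b_alt, 3))
```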


Homework 4.2.3

List correspondences between the output from the original regression and the output from the alternative regression.

Confirm that the coefficients of Q1, Q2 and Q3 in the original are the corresponding coefficients in the alternative with the Q4 coefficient added.



Another application, meter sales analysis

Recall the analysis of Meter sales, discussed in Lab 1 Feedback.doc.


Meter Sales jumped when nominal Phone Charge increased. Model these jumps by adding "indicators" defined to be 0 for all years prior to the jump and 1 for all years during and after the jump.

Thus, the first jump occurred during 1952, so the corresponding indicator will be 0 from 1949 to 1952 and 1 from 1953 to 1983.

Multiplying this explanatory variable by its regression coefficient, β say, adds 0 to predicted Meter Sales from 1949 to 1952 and adds β from 1953 to 1983.
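A step ("jump") indicator is simply a comparison of each year with the first year at the new level. A NumPy sketch for the annual series 1949 to 1983; the dictionary keys mirror the Jump1953, Jump1956, Jump1964 and Jump1970 predictors in the regression output, and each key's year is assumed to be the first year coded 1.

```python
import numpy as np

years = np.arange(1949, 1984)                 # 1949, 1950, ..., 1983
first_years_at_new_level = [1953, 1956, 1964, 1970]

# Step indicator: 0 in every year before the jump, 1 from the jump onwards
jumps = {f"Jump{y}": (years >= y).astype(int) for y in first_years_at_new_level}

for name, column in jumps.items():
    # argmax returns the index of the first 1 in the column
    print(name, "first year coded 1:", years[column.argmax()])
```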


Regression Analysis: Meter Sales versus GNP, RLP, ...

Predictor  Coef     SE Coef  T      P
Constant   38.87    51.62    0.75   0.461
GNP        0.15905  0.04881  3.26   0.004
RLP        -73.53   14.76    -4.98  0.000
RPC        -14.36   65.76    -0.22  0.830
Jump1953   13.385   9.688    1.38   0.184
Jump1956   23.110   8.140    2.84   0.011
Jump1964   41.93    15.17    2.76   0.013
Jump1970   16.39    10.19    1.61   0.125

S = 5.42373


Note that the t-value for RPC is negligible so that RPC may be omitted. The variation explained by RPC is captured by the four indicator variables.

Also, the s value is lower than before, suggesting that the variation in Meter Sales is better explained by the indicators than by RPC alone.
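Each T in the Minitab output is the coefficient divided by its standard error, which is how the RPC t-value can be judged negligible. A plain-Python check on the printed figures; small discrepancies come from rounding in the output.

```python
# (Coef, SE Coef) pairs copied from the Meter Sales regression output
output = {
    "Constant": (38.87, 51.62),
    "GNP": (0.15905, 0.04881),
    "RLP": (-73.53, 14.76),
    "RPC": (-14.36, 65.76),
    "Jump1953": (13.385, 9.688),
    "Jump1956": (23.110, 8.140),
    "Jump1964": (41.93, 15.17),
    "Jump1970": (16.39, 10.19),
}

# t-value = coefficient / standard error
t_values = {name: coef / se for name, (coef, se) in output.items()}
for name, t in t_values.items():
    print(f"{name:9s} T = {t:6.2f}")
```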

N.B. Additional uses for indicators may be found in Extra Notes, Indicators.doc



Correlated explanatory variables

Ref: Extra Notes

Homework 4.2.4

Calculate the simple linear regressions of Jobtime on each of T_Ops and Units. Confirm the corresponding t-values.

Calculate the simple linear regression of Jobtime on Ops per Unit. Comment on the negative correlation of Jobtime with Ops per Unit in the light of the corresponding t-value.

Confirm the calculation of the R² values.
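The homework quantities come from three sums of squares: the slope is Sxy/Sxx, its t-value is the slope divided by s/√Sxx, and R² = 1 - SSE/Syy. A NumPy sketch of a hypothetical helper `simple_regression`. The Jobtime, T_Ops and Units data are in the Extra Notes and are not reproduced here, so the usage lines apply it to the Years 1-5 production data from Table 9.1 instead, where the t-value should agree with the T of 8.09 for Time in the simple trend output.

```python
import numpy as np

def simple_regression(x, y):
    """Hypothetical helper: least-squares fit of y on x.

    Returns slope, intercept, t-value of the slope and R^2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    syy = np.sum((y - y.mean()) ** 2)
    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    sse = syy - slope * sxy               # residual sum of squares
    s = np.sqrt(sse / (n - 2))            # residual standard deviation
    t_value = slope / (s / np.sqrt(sxx))  # slope over its standard error
    r_squared = 1 - sse / syy
    return slope, intercept, t_value, r_squared

# Usage on the Years 1-5 production data (Table 9.1), Time = 1, ..., 20
production = [
    1102, 1352, 1333, 1426, 1204, 1463, 1472, 1528, 1352, 1648,
    1519, 1602, 1407, 1778, 1750, 1852, 1602, 1907, 1833, 2009,
]
slope, intercept, t_value, r_squared = simple_regression(range(1, 21), production)
print(f"slope {slope:.3f}, t = {t_value:.2f}, R-sq = {r_squared:.3f}")
```

For a simple regression, R² and the slope t-value are linked by R² = t²/(t² + n - 2), which gives a quick cross-check on hand calculations.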


Reading

SA §§ 9.1-9.2

Hamilton, pp 82-84, 133-136