· Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models....

109
(Previously presented at SAS Global Forum, Dallas, 2015) Adventures in Forecasting David A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand when to difference data __Understand advantages/disadvantages of deterministic vs. stochastic inputs __Compare forecasts from deterministic versus stochastic input models. __ Incorporate Trends 1

Transcript of  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models....

Page 1:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

(Previously presented at SAS Global Forum, Dallas, 2015)

Adventures in ForecastingDavid A. Dickey

NC State University

Learning objectives: __Understand ARIMA models.__Interpret ARIMA output from PROC ARIMA__Forecast with forecast intervals__Understand when to difference data__Understand advantages/disadvantages of deterministic vs.

stochastic inputs__Compare forecasts from deterministic versus stochastic input

models. __ Incorporate Trends__ Incorporate Seasonality__ (optional) Introduce Cointegration

Note: Examples here are run in SASTM

TM SAS and its products is the registered trademark of SAS Institute, Cary NC, USA

1

Page 2:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Brief indexPg. Topic

3 Overview4 Autoregressive Models 8 Model checking10 Prediction Intervals11 Moving Average Models 14 Model Selection - AIC19 Stationarity – Unit Roots26 Determining Lag Differences for Unit Root Tests33 Models with Inputs (PROC AUTOREG)47 Detecting Outliers52 Seasonal Models63 (optional) Nonlinear Trends67 (optional) St. Petersburg Visitor Example72 (optional) Seasonal Unit Roots77 (optional) Cointegrated Series

Real data examples Silver pg. 5, 24, 36 Iron &Steel 12 Brewers 14, 23 Corn Yields 24 Harley 33 NCSU Energy 40 Deer Crash 52 Ice Classic 63 St. Pete Visits 67 T-bill rates 78

2

Page 3:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Overview of Time Series and Forecasting:

Data taken over time (usually equally spaced) Yt = data at time t = mean (constant over time)

Simplest model: Yt = + et

where et ~ N(0,2) independent. Forecast: Y±1 .96 S

Example 0: Yt = + Zt , corr(Zt,Zt-1)=0.8

3

Page 4:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Model (Yt(Yt1et , et ~ N(0,2) independent

ARIMA Models: AR(p): “Autoregressive”

et independent, constant variance: “White Noise”How to find p? Regress Y on lags.

PACF Partial Autocorrelation Function

(1) Regress Yt on Yt-1 then Yt on Yt-1 and Yt-2

then Yt on Yt-1,Yt-2, Yt-3 etc.

(2) Plot last lag coefficients versus lags.

4

Red series example: Partial Autocorrelations

Page 5:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Example 1: Supplies of Silver in NY commodities exchange:

Getting PACF (and other identifying plots). SAS code:

PROC ARIMA data=silver plots(unpack) = all; IDENTIFY var=silver; run;

5

Page 6:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

PACF

“Spikes” outside 2 standard error bands are statistically significant

Two spikes p=2

How to estimate and ’s ? PROC ARIMA’s ESTIMATE statement.

Use maximum likelihood (ml option) PROC ARIMA data=silver plots(unpack) = all; identify var=silver;

ESTIMATE p=2 ml;

6

Page 7:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Maximum Likelihood EstimationParameter

Estimate Standard Error

t Value Approx

Pr > |t|

Lag

MU 668.29592 38.07935 17.55 <.0001 0

AR1,1 1.57436 0.10186 15.46 <.0001 1

AR1,2 -0.67483 0.10422 -6.48 <.0001 2

7

Page 8:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Backshift notation: B(Yt)=Yt-1, B2(Yt)=B(B(Yt))=Yt-2

SAS output: (uses backshift)

Autoregressive FactorsFactor

1:1 - 1.57436 B**(1) + 0.67483 B**(2)

Checks:(1) Overfit (try AR(3) )

ESTIMATE p=3 ml;

Maximum Likelihood Estimation

Parameter Estimate Standard Error

t Value ApproxPr > |t|

Lag

MU 664.88129 35.21080 18.88 <.0001 0

AR1,1 1.52382 0.13980 10.90 <.0001 1

AR1,2 -0.55575 0.24687 -2.25 0.0244 2

AR1,3 -0.07883 0.14376 -0.55 0.5834 3

(2) Residual autocorrelations

Residual rt

Residual autocorrelation at lag j: Corr(rt, rt-j) = (j)

8

Page 9:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Box-Pierce Q statistic: Estimate, square, and sum k of these. Multiply by sample size n. PROC ARIMA: k in sets of 6. Limit distribution Chi-square if errors independent.

Later modification: Box-Ljung statistic for H0:residuals uncorrelated

SAS output:

Autocorrelation Check of ResidualsTo

LagChi-

SquareDF

Pr > ChiSq

Autocorrelations

6 3.49 4 0.4794 -0.070 -0.049 -0.080 0.100 -0.112 0.151

12 5.97 10 0.8178 0.026 -0.111 -0.094 -0.057

0.006 -0.110

18 10.27 16 0.8522 -0.037 -0.105 0.128 -0.051

0.032 -0.150

24 16.00 22 0.8161 -0.110 0.066 -0.039 0.057 0.200 -0.014

Residuals uncorrelatedResiduals are White Noise Residuals are unpredictable

9

Page 10:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

SAS computes Box-Ljung on original data too. Autocorrelation Check for White Noise

To Lag

Chi-Square

DF Pr > ChiSq

Autocorrelations

6 81.84 6 <.0001 0.867 0.663 0.439 0.214 -0.005 -0.184

12 142.96 12 <.0001 -0.314

-0.392 -0.417 -0.413

-0.410 -0.393

Data autocorrelated predictable!

Note: All p-values are based on an assumption called “stationarity” discussed later.

How to predict?

One step prediction

Two step prediction

Prediction error variance ( 2 = variance(et) )

10

Page 11:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

From prediction error variances, get 95% prediction intervals. Can estimate variance of et from past data. SAS PROC ARIMA does it all for you!

Moving Average, MA(q), and ARMA(p,q) models

MA(1) Yt = + et et-1 Variance (1+2)2

Yt-1 = + et-1 et-2 (1)=-/(1+2) Yt-2 = + et-2 et-3 (2)=0/(1+2)=0Autocorrelation function “ACF” ((j)) is 0 after lag q for MA(q). PACF is useless for identifying q in MA(q).

11

Page 12:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

PACF drops to 0 after lag 3 AR(3) p=3ACF drops to 0 after lag 2 MA(2) q=2Neither drops ARMA(p,q) p= ___ q=____

Example 2: Iron and Steel Exports.

PROC ARIMA plots(unpack)=all; IDENTIFY var=EXPORT;

12

Page 13:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

ACF: could be MA(1) PACF: could be AR(1) Spike at lags 0, 1 (No spike displayed at lag 0)

ESTIMATE P=1 ML; ESTIMATE Q=2 ML; ESTIMATE Q=1 ML; Maximum Likelihood Estimation

Approx Parameter Estimate t Value Pr>|t| Lag MU 4.42129 10.28 <.0001 0 AR1,1 0.46415 3.42 0.0006 1

MU 4.43237 11.41 <.0001 0 MA1,1 -0.54780 -3.53 0.0004 1 MA1,2 -0.12663 -0.82 0.4142 2

MU 4.42489 12.81 <.0001 0 MA1,1 -0.49072 -3.59 0.0003 1

How to choose? AIC - smaller is better

AIC: -2 ln(Lmax)+2(# parameters)Lmax = max of likelihood function

AIC 165.8342 (MA(1)) AIC 166.3711 (AR(1)) AIC 167.1906 (MA(2)) FORECAST lead=5 out=out1 id=date interval=year;

13

Page 14:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Example 3: Brewers’ Proportion Won

Mean of Working Series 0.478444 Standard Deviation 0.059934

AutocorrelationsLag -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1 Correlation Std Error0 1.00000 | |********************| 01 0.52076 | . |********** | 0.1490712 0.18663 | . |**** . | 0.1851363 0.11132 | . |** . | 0.1892714 0.11490 | . |** . | 0.1907205 -.00402 | . | . | 0.192252

14

Page 15:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

6 -.14938 | . ***| . | 0.1922547 -.13351 | . ***| . | 0.1948178 -.06019 | . *| . | 0.1968409 -.05246 | . *| . | 0.19724810 -.20459 | . ****| . | 0.19755811 -.22159 | . ****| . | 0.20221112 -.24398 | . *****| . | 0.207537 "." marks two standard errors

Could be MA(1) Autocorrelation Check for White Noise

To Chi- Pr >Lag Square DF ChiSq --------------Autocorrelations---------------

6 17.27 6 0.0084 0.521 0.187 0.111 0.115 -0.004 -0.149 12 28.02 12 0.0055 -0.134 -0.060 -0.052 -0.205 -0.222 -0.244 NOT White Noise!

SAS Code:

PROC ARIMA data=brewers; IDENTIFY var=Win_Pct nlag=12; run; ESTIMATE q=1 ml; Maximum Likelihood Estimation

Standard Approx Parameter Estimate Error t Value Pr > |t| Lag

MU 0.47791 0.01168 40.93 <.0001 0 MA1,1 -0.50479 0.13370 -3.78 0.0002 1

AIC -135.099

Autocorrelation Check of Residuals

To Chi- Pr >Lag Square DF ChiSq --------------Autocorrelations------------

6 3.51 5 0.6219 0.095 0.161 0.006 0.119 0.006 -0.140 12 11.14 11 0.4313 -0.061 -0.072 0.066 -0.221 -0.053 -0.242 18 13.54 17 0.6992 0.003 -0.037 -0.162 -0.010 -0.076 -0.011 24 17.31 23 0.7936 -0.045 -0.035 -0.133 -0.087 -0.114 0.015

Estimated Mean 0.477911

15

Page 16:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Moving Average Factors Factor 1: 1 + 0.50479 B**(1) Partial Autocorrelations

Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1 1 0.52076 | . |********** | 2 -0.11603 | . **| . | 3 0.08801 | . |** . | 4 0.04826 | . |* . | 5 -0.12646 | . ***| . | 6 -0.12989 | . ***| . | 7 0.01803 | . | . | 8 0.01085 | . | . | 9 -0.02252 | . | . | 10 -0.20351 | . ****| . | 11 -0.03129 | . *| . | 12 -0.18464 | . ****| . |

OR … could be AR(1)ESTIMATE p=1 ml;

Maximum Likelihood Estimation Standard Approx Parameter Estimate Error t Value Pr > |t| Lag

MU 0.47620 0.01609 29.59 <.0001 0 AR1,1 0.53275 0.12750 4.18 <.0001 1 AIC -136.286 (vs. -135.099)

Autocorrelation Check of Residuals

To Chi- Pr >Lag Square DF ChiSq -----------Autocorrelations------------ 6 3.57 5 0.6134 0.050 -0.133 -0.033 0.129 0.021 -0.173 12 8.66 11 0.6533 -0.089 0.030 0.117 -0.154 -0.065 -0.181 18 10.94 17 0.8594 0.074 0.027 -0.161 0.010 -0.019 0.007 24 13.42 23 0.9423 0.011 -0.012 -0.092 -0.081 -0.106 0.013

Model for variable Win_pct Estimated Mean 0.476204 Autoregressive Factors Factor 1: 1 - 0.53275 B**(1)

16

Page 17:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Conclusions for Brewers:

Both models have statistically significant parameters.Both models are sufficient (no lack of fit)

Predictions from MA(1): First one uses correlationsThe rest are on the mean.

Predictions for AR(1):Converge exponentially fast toward mean

Not much difference but AIC prefers AR(1)

17

Page 18:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Stationarity

(1) Mean constant (no trends)(2) Variance constant(3) Covariance (j) and correlation

(j) = (j)/(0) between Yt and Yt-j depend only on j

ARMA(p,q) model:

18

Page 19:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Stationarity guaranteed whenever solutions of equation (roots of “characteristic polynomial”)

Xp – 1Xp-12Xp-2…p =0

are all <1 in magnitude.

Examples

(1) Yt = .8(Yt-1) + et X-.8=0 X=.8 stationary

(2) Yt = 1.00(Yt-1) + et nonstationary Note: Yt= Yt-1 + et Random walk

(3) Yt = 1.6(Yt-1) 0.6(Yt-2)+ et “characteristic polynomial”

X21.6X+0.6=0 X=1 or X=0.619

Page 20:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

nonstationary (unit root X=1)

(Yt)(Yt-1) =0.6[(Yt-1) (Yt-2)]+ et (YtYt-1) =0.6(Yt-1 Yt-2) + et

First differences form stationary AR(1) process!

No mean – no mean reversion – no gravity pulling toward the mean.

(4) Yt = 1.60(Yt-1) 0.63(Yt-1)+ et X21.60X+0.63=0 X=0.9 or X=0.7

|roots|<1 stationary

(Yt)(Yt-1) = 0.03(Yt-1) + 0.63[(Yt-1) (Yt-2)]+ et

YtYt-1 = 0.03(Yt-1) + 0.63(Yt-1Yt-2)+ et * Unit Root testing (H0: Series has a unit root)

20

Page 21:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Regress YtYt-1 on Yt-1 and (Yt-1Yt-2)Look at t test for Yt-1. If it is significantly negative then stationary.

*Note: If X=1 then –(X2–1.60X+0.63) = 0.3 (always equals lag Y coefficient so 0 unit root)Problem: Distribution of “t statistic” on Yt-1 is not t distribution under unit root hypothesis.

Distribution looks like this histogram:

(1 million random walks of length n=100)

21

Page 22:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Overlays: N(sample mean & variance) N(0,1)

Correct distribution: Dickey-Fuller test in PROC ARIMA.

-2.89 is the correct (left) 5th %ile46% of t’s are less than -1.645

(the normal 5th percentile)

Example 3 revisited: BrewersPROC ARIMA data=brewers; IDENTIFY var=Wins nlag=12 stationarity=(ADF=0); run;

Dickey-Fuller Unit Root Tests

Type Lags Rho Pr < Rho Tau Pr < Tau Zero Mean 0 -0.1803 0.6376 -0.22 0.6002Single Mean 0 -21.0783 0.0039 -3.75 0.0062 Trend 0 -21.1020 0.0287 -3.68 0.0347

Why “single mean?” Series has nonzero mean and no trend.

22

Page 23:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Conclusion reject H0:unit roots so Brewers series is stationary (mean reverting).

0 lags do not need lagged differences in model (just regress Yt-Yt-1 on Yt-1)

Example 1 revisited: Stocks of silver

Needed AR(2) (2 lags) so regress Yt-Yt-1 (D_Silver) on Yt-1 (L_Silver) and Yt-1-Yt-2 (D_Silver_1)

PROC REG:23

Page 24:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

ParameterVariable DF Estimate t Value Pr>|t|

Intercept 1 75.58073 2.76 0.0082L_Silver 1 -0.11703 -2.78 0.0079 wrong distn.D_Silver_1 1 0.67115 6.21 <.0001 OK

PROC ARIMA:

Augmented Dickey-Fuller Unit Root Tests

Type Lags Rho Pr<Rho Tau Pr<Tau Zero Mean 1 -0.2461 0.6232 -0.28 0.5800Single Mean 1 -17.7945 0.0121 -2.78 0.0689 OK Trend 1 -15.1102 0.1383 -2.63 0.2697

Same t statistic, corrected p-value! Conclusion: Unit root difference the series.

1 lag need 1 lagged difference in model (regress Yt-Yt-1 on Yt-1 and Yt-1-Yt-2 )

PROC ARIMA data=silver; IDENTIFY var=silver(1) stationarity=(ADF=(0));

24

Page 25:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

ESTIMATE p=1 ml; FORECAST lead=24 out=outN ID=date Interval=month;

Unit root forecast & forecast interval

HOW MANY LAGGED DIFFERENCES?

Regression: (YtYt1) = b0 + b1Yt1 + b2(Yt1Yt2) + …+ bp(Ytp1Ytp) not. | standard distributions for these | standard

Dickey & Fuller (1979)25

Page 26:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Lagged difference coefficients b2 … bp have standard (asymptotically normal) distributions. Trust their t test p-values in PROC REG. b0 and b1 have t statistics with same nonstandard

limit distributions as in the AR(1) model. Implication: Just use PROC REG to determine

appropriate number of lagged differences. o Too few => invalid testso Too many => loss of power

Said & Dickey (1984, 1985) prove that methods work even if moving average terms are present.Chang and Dickey (1993) show that the Inverse Autocorrelation Function (IACF) can be used to check for overdifferencing.

Yt- = (Yt-1-) + et – et-1 (||<1)

Autoregressive Moving Average “Dual” model: Yt- = (Yt-1-) + et – et-1Definition: Inverse Autocorrelation Function is Autocorrelation Function of dual model.

26

Page 27:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

IACF estimation: (a) Fit long autoregression, (b) move coefficients to moving average (MA) side, (c) calculate ACF as if estimated MA is true.

Chang (1993) Moving average unit root (e.g. =1) slow decay in IACF (Inverse AutoCorrelation Function)

Differencing whenever you see a trend is NOT appropriate:

IACF from generated linear trend plus white noise:

27

1)1(

),0(~

1

11

2

ttt

tt

ttt

eeYetY

NtindependeneetY

(“Non-invertible moving average”)

Page 28:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Example 4: Corn yields in the U.S. (bushels per acre 1866-1942 and 1943-2014)

Analysis of post 1942 yields.

Levels data:

PROC ARIMA; IDENTIFY var=Yield stationarity=(adf=0);

28

Page 29:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Dickey-Fuller Unit Root Tests Type Lags Tau Pr<Tau Zero Mean 0 0.61 0.8454 Single Mean 0 -1.23 0.6559 Trend 0 -8.65 <.0001

Autocorrelation Check of Residuals (from linear trend plus white noise) To Chi- Pr >Lag Square DF ChiSq --Autocorrelations----- 6 4.27 6 0.6403 -0.05 ... 0.20 -0.06 12 10.82 12 0.5445 -0.02 ... -0.21 -0.08 18 15.54 18 0.6247 0.03 ... 0.04 -0.08 24 22.43 24 0.5535 0.19 ... 0.08 0.09

Suppose we difference anyway: IACF

29

Page 30:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Example 2 revisited again: Silver Series

DATA CHECK;

SET SILVER; Lag_silver = LAG(silver); Diff = silver-Lag_silver; OUTPUT; RETAIN; Diff5=Diff4; Diff4=Diff3; Diff3=Diff2; Diff2=Diff1; Diff1=Diff;

PROC REG; MODEL Diff = Lag_silver Diff1-Diff5; REMOVE_2andup: TEST Diff2=0, Diff3=0, Diff4=0, Diff5=0; run Parameter StandardVariable DF Estimate Error t Value Pr > |t|

Intercept 1 124.70018 45.46720 2.74 0.0092 XLag_silver 1 -0.19292 0.07144 -2.70 0.0102 XDiff1 1 0.62085 0.14427 4.30 0.0001 OK **Diff2 1 0.06292 0.17499 0.36 0.7211 OKDiff3 1 0.00293 0.17224 0.02 0.9865 OKDiff4 1 0.21501 0.17392 1.24 0.2238 OKDiff5 1 0.08100 0.16209 0.50 0.6201 OK

Test REMOVE_2andup Results for Dependent Variable Diff

MeanSource DF Square F Value Pr > FNumerator 4 803.50398 0.89 0.4796 OKDenominator 39 903.76157

What actually happened next in Silver series?30

Page 31:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

(1) Fit stationary (AR(2)) and nonstationary models (differences~ AR(1)) to the data.(2) Compute forecasts, stationary and nonstationary

PROC AUTOREG

Fits a regression model (least squares) Fits stationary autoregressive model to error termsRefits accounting for autoregressive errors.

31

Page 32:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Example 5-A: AUTOREG Harley-Davidson closing stock prices 2009-present.

PROC AUTOREG data=Harley; MODEL close=date/ nlag=15 backstep; run;

One by one, AUTOREG eliminates insignificant lags then:

32

Page 33:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Estimates of Autoregressive ParametersLag Coefficien

tStandard Error t Value

1 -0.975229 0.006566 -148.53

Final model:

Parameter Estimates

Variable DF Estimate Standard Error t Value ApproxPr > |t|

Intercept 1 -412.1128

35.2646 -11.69 <.0001

Date 1 0.0239 0.001886 12.68 <.0001

In PROC AUTOREG model is Zt+Zt-1=et rather than ZtZt-1=et so with = 0.975229,error term Zt satisfies Zt0.97Zt-1=et.

Example 5-B: ARIMA Harley-Davidson closing stock prices 01/01/2009through 05/13/ 2013. (vs. AUTOREG)

33

Page 34:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Apparent upward movement: Linear trend or nonstationary?

Regress

Yt – Yt-1 on 1, t, Yt-1 (& lagged differences)

H0: Yt= + Yt-1 + et “random walk with drift”H1: Yt=t + Zt with Zt stationary AR(p) *

New distribution for t-test on Yt-1

34

Page 35:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

With trend

* Yt=t + Zt general model so YtYt-1 =t-(t-1) + Zt –Zt-1

and if Zt=Zt-1+et (unit root) then

YtYt-1 = + et or YtYt-1 + et

“Random walk with drift ”

35

Page 36:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Without trend

With trend

1 million simulations - runs in 7 seconds!

SAS code for Harley stock closing price36

Page 37:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

PROC ARIMA data=Harley;

IDENTIFY var=close stationarity=(adf) crosscor=(date) noprint; ESTIMATE input=(date) p=1 ml; FORECASE lead=120 id=date interval=weekday out=out1; run;

Stationarity test (0,1,2 lagged differences):

Augmented Dickey-Fuller Unit Root Tests

Type Lags Rho Pr < Rho Tau Pr < Tau

Zero Mean 0 0.8437 0.8853 1.14 0.9344

1 0.8351 0.8836 1.14 0.9354

2 0.8097 0.8786 1.07 0.9268

Single Mean

0 -2.0518 0.7726 -0.87

0.7981

1 -1.7772 0.8048 -0.77

0.8278

2 -1.8832 0.7925 -0.78

0.8227

Trend 0 -27.1559 0.0150 -3.67

0.0248

1 -26.9233 0.0158 -3.64

0.0268

2 -29.4935 0.0089 -3.80

0.0171

Conclusion: stationary around a linear trend.

37

Page 38:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Estimates: trend + AR(1)

Maximum Likelihood Estimation

Parameter

Estimate Standard Error t Value ApproxPr > |t|

Lag Variable Shift

MU -412.08104 35.45718 -11.62 <.0001 0 Close 0

AR1,1 0.97528 0.0064942 150.18 <.0001 1 Close 0

NUM1 0.02391 0.0018961 12.61 <.0001 0 Date 0

38

Page 39:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Autocorrelation Check of Residuals

To Lag Chi-Square DF

Pr > ChiSq Autocorrelations

6 3.20 5 0.6694 -0.005

0.044 -0.023 0.000 0.017 0.005

12 6.49 11 0.8389 -0.001

0.019 0.003 -0.010

0.049 -0.003

18 10.55 17 0.8791 0.041 -0.026 -0.022 -0.023

0.007 -0.011

24 16.00 23 0.8553 0.014 -0.037 0.041 -0.020

-0.032 0.003

30 22.36 29 0.8050 0.013 -0.026 0.028 0.051 0.036 0.000

36 24.55 35 0.9065 0.037 0.016 -0.012 0.002 -0.007 0.001

42 29.53 41 0.9088 -0.007

-0.021 0.029 0.030 -0.033 0.030

48 49.78 47 0.3632 0.027 -0.009 -0.097 -0.026

-0.074 0.026

What actually happened?

39

Page 40:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Example 6 (with inputs): NCSU Energy Demand

Type of day Class Days Work Days (no classes) Holidays & weekends.

Temperature Season of Year

40

Page 41:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Step 1: Make some plots of energy demand vs. temperature and season. Use type of day as color.

Seasons: S = A sin(2t/365) , C=B cos(2t/365)

Temperature Season of Year

Step 2: PROC AUTOREG with all inputs:

PROC AUTOREG data=energy; MODEL demand = temp tempsq class work s c /nlag=15 backstep dwprob; output out=out3 predicted = p predictedm=pm residual=r residualm=rm; run;

41

Page 42:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Estimates of Autoregressive Parameters

Lag Coefficient Standard Error t Value

1 -0.559658 0.043993 -12.72

5 -0.117824 0.045998 -2.56

7 -0.220105 0.053999 -4.08

8 0.188009 0.059577 3.16

9 -0.108031 0.051219 -2.11

12 0.110785 0.046068 2.40

14 -0.094713 0.045942 -2.06

Autocorrelation at 1, 7, 14, and others.

After autocorrelation adjustments, trust t tests etc.

Parameter Estimates

Variable DF Estimate Standard Error t Value ApproxPr > |t|

Intercept 1 6076 296.5261 20.49 <.0001

TEMP 1 28.1581 3.6773 7.66 <.0001

TEMPSQ 1 0.6592 0.1194 5.52 <.0001

CLASS 1 1159 117.4507 9.87 <.0001

WORK 1 2769 122.5721 22.59 <.0001

S 1 -764.0316 186.0912 -4.11 <.0001

C 1 -520.8604 188.2783 -2.77 0.0060

42

Page 43:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

rm

-5000

-4000

-3000

-2000

-1000

0

1000

2000

DATE

01JUL79 01AUG79 01SEP79 01OCT79 01NOV79 01DEC79 01JAN80 01FEB80 01MAR80 01APR80 01MAY80 01JUN80 01JUL80

Need better model?Big negative residual on Jan. 2

WC non work work class

Residuals from regression part. Large residual on workday January 2. Add dummy variable. Same idea: PROC ARIMA

Step 1: Graphs

Step 2: Regress on inputs, diagnose residual autocorrelation:

43

Page 44:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Not white noise (bottom right)Activity (bars) at lag 1, 7, 14Step 3: Estimate resulting model from diagnostics plus trial and error:

e input = (temp tempsq class work s c) p=1 q=(1,7,14) ml;

44

Page 45:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Maximum Likelihood Estimation

Parameter

Estimate Standard Error t Value

ApproxPr > |t|

Lag Variable Shift

MU 6183.1 300.87297 20.55 <.0001 0 DEMAND 0

MA1,1 0.11481 0.07251 1.58 0.1133 1 DEMAND 0

MA1,2 -0.18467 0.05415 -3.41 0.0006 7 DEMAND 0

MA1,3 -0.13326 0.05358 -2.49 0.0129 14 DEMAND 0

AR1,1 0.73980 0.05090 14.53 <.0001 1 DEMAND 0

NUM1 26.89511 3.83769 7.01 <.0001 0 TEMP 0

NUM2 0.64614 0.12143 5.32 <.0001 0 TEMPSQ 0

NUM3 912.80536 122.78189 7.43 <.0001 0 CLASS 0

NUM4 2971.6 123.94067 23.98 <.0001 0 WORK 0

NUM5 -767.41131 174.59057 -4.40 <.0001 0 S 0

NUM6 -553.13620 182.66142 -3.03 0.0025 0 C 0

Note: class days get class effect 913 plus work effect 2972.Note 2: Lags are sensible.

Step 4: Check model fit (stats look OK):

45

Page 46:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Autocorrelation Check of Residuals

To Lag Chi-Square DF

Pr > ChiSq Autocorrelations

6 2.86 2 0.2398 -0.001

-0.009 -0.053 -0.000

0.050 0.047

12 10.71 8 0.2188 0.001 -0.034 0.122 0.044 -0.039 -0.037

18 13.94 14 0.4541 -0.056

0.013 -0.031 0.048 -0.006 -0.042

24 16.47 20 0.6870 -0.023

-0.028 0.039 -0.049

0.020 -0.029

30 24.29 26 0.5593 0.006 0.050 -0.098 0.077 -0.002 0.039

36 35.09 32 0.3239 -0.029

-0.075 0.057 -0.001

0.121 -0.047

42 39.99 38 0.3817 0.002 -0.007 0.088 0.019 -0.004 0.060

48 43.35 44 0.4995 -0.043

0.043 -0.027 -0.047

-0.019 -0.032

46

Page 47:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Looking for “outliers” that can be explainedPROC ARIMA, OUTLIER statement

Available types(1) Additive (single outlier)(2) Level shift (sudden change in mean)(3) Temporary change (level shift for k contiguous

time points – you specify k)

NCSU energy: tested every point – 365 tests. Adjust for multiple testing

Require p < 0.05/365 = .0001369863 (Bonferroni)

OUTLIER type=additive alpha=.0001369863 id=date; FORMAT date weekdate.; run;

/*****************************************January 2, 1980 Wednesday: Hangover Day :-)

March 3,1980 Monday:On the afternoon and evening of March 2, 1980, North Carolina experienced a major winter storm with heavy snow across the entire state and near blizzard conditions in the eastern part of the state.

47

Page 48:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Widespread snowfall totals of 12 to 18 inches were observed over Eastern North Carolina, with localized amounts ranging up to 22 inches at Morehead City and 25 inches at Elizabeth City, with unofficial reports of up to 30 inches at Emerald Isle and Cherry Point (Figure 1). This was one of the great snowstorms in Eastern North Carolina history. What made this storm so remarkable was the combination of snow, high winds, and very cold temperatures.

May 10,1980 Saturday. Graduation!****************************************/;

Outlier DetailsOb

sTime ID Type Estimate Chi-Square Approx Prob>ChiSq

186 Wednesday Additive -3250.9 87.76 <.0001315 Saturday Additive 1798.1 28.19 <.0001247 Monday Additive -1611.8 22.65 <.0001

Outlier DetailsOb

sTime ID Type Estimate Chi-Square Approx Prob>ChiSq

186 02-JAN-1980 Additive -3250.9 87.76 <.0001315 10-MAY-1980 Additive 1798.1 28.19 <.0001247 03-MAR-1980 Additive -1611.8 22.65 <.0001

48

Page 49:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Outliers: Jan 2 (hangover day!), March 3 (snowstorm), May 10 (graduation day).

AR(1) produces 3 ‘rebound’ outlying next day residuals (). Add dummy variables for explainable outliers

data next; merge outarima energy; by date; hangover = (date="02Jan1980"d); storm = (date="03Mar1980"d); graduation = (date="10May1980"d);

49

Page 50:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

PROC ARIMA data=next; IDENTIFY var=demand crosscor=(temp tempsq class work s c hangover graduation storm) noprint; ESTIMATE input = (temp tempsq class work s c hangover graduation storm) p=1 q=(7,14) ml; FORECAST lead=0 out=outARIMA2 id=date interval=day; run;

Maximum Likelihood Estimation

Parameter

Estimate Standard Error t Value

ApproxPr > |t|

Lag Variable Shift

MU 6127.4 259.43918 23.62 <.0001 0 DEMAND 0

MA1,1 -0.25704 0.05444 -4.72 <.0001 7 DEMAND 0

MA1,2 -0.10821 0.05420 -2.00 0.0459 14 DEMAND 0

AR1,1 0.76271 0.03535 21.57 <.0001 1 DEMAND 0

NUM1 27.89783 3.15904 8.83 <.0001 0 TEMP 0

NUM2 0.54698 0.10056 5.44 <.0001 0 TEMPSQ 0

NUM3 626.08113 104.48069 5.99 <.0001 0 CLASS 0

NUM4 3258.1 105.73971 30.81 <.0001 0 WORK 0

NUM5 -757.90108 181.28967 -4.18 <.0001 0 S 0

NUM6 -506.31892 184.50221 -2.74 0.0061 0 C 0

NUM7 -3473.8 334.16645 -10.40 <.0001 0 hangover 0

NUM8 2007.1 331.77424 6.05 <.0001 0 graduation 0

NUM9 -1702.8 333.79141 -5.10 <.0001 0 storm 0

50

Page 51:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Model looks fine.

Comparison:

AUTOREG - regression with AR(p) errors versusARIMA – regression with differencing, ARMA(p,q) errors.

51

Workday = Non-workday + 3258Jan 2 = Workday – 3473 = Non-workday + 3258 – 3473Jan 2 is like a non-workday

Class day = Non-workday + 3258+626

Page 52:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

SEASONALITY

Many economic and environmental series show seasonality.

(1) Very regular (“deterministic”) or(2) Slowly changing (“stochastic”)

Example 7: NC accident reports involving deer. Method 1: regression

PROC REG data=deer; MODEL deer=date X11; run; (X11: 1 in Nov, 0 otherwise)

Parameter StandardVariable DF Estimate Error t Value Pr > |t|Intercept 1 1181.09091 78.26421 15.09 <.0001X11 1 2578.50909 271.11519 9.51 <.0001

52

Page 53:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Looks like December and October need dummies too! PROC REG data=deer; MODEL deer=date X10 X11 X12; run; Parameter StandardVariable DF Estimate Error t Value Pr > |t|Intercept 1 929.40000 39.13997 23.75 <.0001X10 1 1391.20000 123.77145 11.24 <.0001X11 1 2830.20000 123.77145 22.87 <.0001X12 1 1377.40000 123.77145 11.13 <.0001

Average of Jan through Sept. is 929 crashes per month. Add 1391 in October, 2830 in November, 1377 in December.

53

Page 54:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Try dummies for all but one month (need “average of rest” so must leave out at least one)

PROC REG data=deer; MODEL deer=X1-X11; OUTPUT out=out1 predicted=p residual=r; run; Parameter StandardVariable DF Estimate Error t Value Pr > |t|Intercept 1 2306.80000 81.42548 28.33 <.0001X1 1 -885.80000 115.15301 -7.69 <.0001X2 1 -1181.40000 115.15301 -10.26 <.0001X3 1 -1220.20000 115.15301 -10.60 <.0001X4 1 -1486.80000 115.15301 -12.91 <.0001X5 1 -1526.80000 115.15301 -13.26 <.0001X6 1 -1433.00000 115.15301 -12.44 <.0001X7 1 -1559.20000 115.15301 -13.54 <.0001X8 1 -1646.20000 115.15301 -14.30 <.0001X9 1 -1457.20000 115.15301 -12.65 <.0001X10 1 13.80000 115.15301 0.12 0.9051X11 1 1452.80000 115.15301 12.62 <.0001

Average of rest is just December mean 2307. Subtract 886 in January, add 1452 in November.

54

Page 55:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

October (X10) is not significantly different than December.

Residuals for Deer Crash data:

Looks like a trend – add trend (date):

PROC REG data=deer; MODEL deer=date X1-X11; OUTPUT out=out1 predicted=p residual=r; run; Parameter StandardVariable DF Estimate Error t Value Pr > |t|Intercept 1 -1439.94000 547.36656 -2.63 0.0115X1 1 -811.13686 82.83115 -9.79 <.0001X2 1 -1113.66253 82.70543 -13.47 <.0001

55

Page 56:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

X3 1 -1158.76265 82.60154 -14.03 <.0001X4 1 -1432.28832 82.49890 -17.36 <.0001X5 1 -1478.99057 82.41114 -17.95 <.0001X6 1 -1392.11624 82.33246 -16.91 <.0001X7 1 -1525.01849 82.26796 -18.54 <.0001X8 1 -1618.94416 82.21337 -19.69 <.0001X9 1 -1436.86982 82.17106 -17.49 <.0001X10 1 27.42792 82.14183 0.33 0.7399X11 1 1459.50226 82.12374 17.77 <.0001date 1 0.22341 0.03245 6.88 <.0001

Trend is 0.22 more accidents per day (1 per 5 days) and is significantly different from 0.

What about autocorrelation?

Method 2: PROC AUTOREG

PROC AUTOREG data=deer; MODEL deer=date X1-X11/nlag=13 backstep; run;

Backward Elimination of Autoregressive Terms

Lag Estimate t Value Pr > |t| 6 -0.003105 -0.02 0.9878 11 0.023583 0.12 0.9029 4 -0.032219 -0.17 0.8641 9 -0.074854 -0.42 0.6796 5 0.064228 0.44 0.6610

56

Page 57:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

13 -0.081846 -0.54 0.5955 12 0.076075 0.56 0.5763 8 -0.117946 -0.81 0.4205 10 -0.127661 -0.95 0.3489 7 0.153680 1.18 0.2458 2 0.254137 1.57 0.1228 3 -0.178895 -1.37 0.1781

Preliminary MSE 10421.3

Estimates of Autoregressive Parameters

Standard Lag Coefficient Error t Value 1 -0.459187 0.130979 -3.51

Parameter Estimates

Standard ApproxVariable DF Estimate Error t Value Pr > |t|Intercept 1 -1631 857.3872 -1.90 0.0634date 1 0.2346 0.0512 4.58 <.0001X1 1 -789.7592 64.3967 -12.26 <.0001X2 1 -1100 74.9041 -14.68 <.0001X3 1 -1149 79.0160 -14.54 <.0001X4 1 -1424 80.6705 -17.65 <.0001X5 1 -1472 81.2707 -18.11 <.0001X6 1 -1386 81.3255 -17.04 <.0001X7 1 -1519 80.9631 -18.76 <.0001X8 1 -1614 79.9970 -20.17 <.0001

57

Page 58:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

X9 1 -1432 77.8118 -18.40 <.0001X10 1 31.3894 72.8112 0.43 0.6684X11 1 1462 60.4124 24.20 <.0001

Method 3: PROC ARIMA

PROC ARIMA plots=(forecast(forecast)); IDENTIFY var=deer crosscor= (date X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11); ESTIMATE p=1 ML input= (date X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11); FORECAST lead=12 id=date interval=month; run;

Maximum Likelihood Estimation

Standard ApproxParameter Estimate Error t Value Pr>|t| Lag Variable

MU -1640.9 877.41683 -1.87 0.0615 0 deer AR1,1 0.47212 0.13238 3.57 0.0004 1 deer NUM1 0.23514 0.05244 4.48 <.0001 0 date NUM2 -789.10728 64.25814 -12.28 <.0001 0 X1 NUM3 -1099.3 74.93984 -14.67 <.0001 0 X2 NUM4 -1148.2 79.17135 -14.50 <.0001 0 X3 NUM5 -1423.7 80.90397 -17.60 <.0001 0 X4 NUM6 -1471.6 81.54553 -18.05 <.0001 0 X5 NUM7 -1385.4 81.60464 -16.98 <.0001 0 X6 NUM8 -1518.9 81.20724 -18.70 <.0001 0 X7 NUM9 -1613.4 80.15788 -20.13 <.0001 0 X8 NUM10 -1432.0 77.82871 -18.40 <.0001 0 X9 NUM11 31.46310 72.61873 0.43 0.6648 0 X10 NUM12 1462.1 59.99732 24.37 <.0001 0 X11

58

Page 59:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Autocorrelation Check of Residuals

To Chi- Pr >Lag Square DF ChiSq ------Autocorrelations-------

6 5.57 5 0.3504 0.042 -0.175 0.146 0.178 0.001 -0.009 12 10.41 11 0.4938 -0.157 -0.017 0.102 0.115 -0.055 -0.120 18 22.18 17 0.1778 0.158 0.147 -0.183 -0.160 0.189 -0.008 24 32.55 23 0.0893 -0.133 0.013 -0.095 0.005 0.101 -0.257

Autoregressive Factors Factor 1: 1 - 0.47212 B**(1)

Method 4: Differencing

Compute and model Dt = Yt-Yt-12

Removes seasonality Removes linear trend

Use (at least) q=(12) et-et-12 (A) if near 1, you’ve overdifferenced

(B) if 0<<1 this is seasonal exponential smoothing model.

Y t−Y t−12=et−θet−12

e t=Y t−Y t−12+θ (e t−12 )=Y t−Y t−12+θ(Y t−12−Y t−24+θet−24 )

Y t=(1−θ )[Y t−12+θY t−24+θ2Y t−36+θ3 Y t−48+⋯]+et

59

Page 60:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Forecast is a weighted (exponentially smoothed) average of past values:

Y t=(1−θ )[Y t−12+θY t−24+θ2Y t−36+θ3Y t−48+⋯]

IDENTIFY var=deer(12) nlag=25; ESTIMATE P=1 Q=(12) ml; run; Maximum Likelihood Estimation

Standard ApproxParameter Estimate Error t Value Pr>|t| Lag MU 85.73868 16.78380 5.11 <.0001 0 MA1,1 0.89728 0.94619 0.95 0.3430 12 AR1,1 0.46842 0.11771 3.98 <.0001 1

Autocorrelation Check of Residuals To Chi- Pr >Lag Square DF ChiSq ------------Autocorrelations------------ 6 4.31 4 0.3660 0.053 -0.161 0.140 0.178 -0.026 -0.013 12 7.47 10 0.6801 -0.146 -0.020 0.105 0.131 -0.035 -0.029 18 18.02 16 0.3226 0.198 0.167 -0.143 -0.154 0.183 0.002 24 24.23 22 0.3355 -0.127 0.032 -0.083 0.022 0.134 -0.155

Lag 12 MA 0.897 somewhat close to 1 with large standard error, model OK but not best. Variance

60

Page 61:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

estimate 15,122 (vs. 13,431 for dummy variable model).

Forecasts are similar 2 years out.

61

Page 62:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

OPTIONAL (time permitting) Trend BreaksAccounting for changes in trend

Example 7: Nenana Ice Classic data (trend break)Exact time (day and time) of thaw of the Tanana river in Nenana Alaska:

1917 Apr 30 11:30 a.m. 1918 May 11 9:33 a.m. 1919 May 3 2:33 p.m. (more data) 2010 Apr 29 6:22 p.m.2011 May 04 4:24 p.m.2012 Apr 23 7:39 p.m.

62

Page 63:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

When the tripod moves downstream, that is the unofficial start of spring.

Get “ramp” with PROC NLIN ____/ X= 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 …

PROC NLIN data=all; 63

Page 64:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

PARMS point=1960 int=126 slope=-.2; X = (year-point)*(year>point); MODEL break = int + slope*X; OUTPUT out=out2 predicted=p residual=r;

Approx Approximate 95% ConfidenceParameter Estimate Std Error Limits

point 1965.4 11.2570 1943.0 1987.7int 126.0 0.7861 124.5 127.6 slope -0.1593 0.0592 -0.2769 -0.0418

PROC SGPLOT data=out2; SERIES Y=break X=year; SERIES Y=p X=year/ lineattrs = (color=red thickness=2); REFLINE 1965.4 / axis=X; run;quit;

What about autocorrelation?64

Page 65:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Final ramp: Xt = (year-1965.4)*(year>1965);

PROC ARIMA; IDENTIFY var=break crosscor=(ramp) noprint; ESTIMATE input=(ramp); FORECAST lead=5 id=year out=out1; run;

PROC ARIMA generates diagnostic plots:

Autocorrelation Check of Residuals

To Chi- Pr > Lag Square DF ChiSq ---------------Autocorrelations---------------

6 6.93 6 0.3275 -0.025 0.067 -0.027 -0.032 -0.152 -0.192 12 14.28 12 0.2834 0.086 -0.104 -0.074 0.184 -0.041 -0.091 18 20.30 18 0.3164 -0.086 0.111 0.041 -0.073 0.142 -0.066

65

Page 66:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

24 21.37 24 0.6166 -0.015 -0.020 -0.036 -0.059 -0.047 -0.030

Optional: more on seasonal, time permitting:

Example 9: Visitors to St. Petersburg/Clearwaterhttp://www.pinellascvb.com/statistics/2013-04-VisitorProfile.pdf

66

Page 67:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Model 1: Seasonal dummy variables + trend + AR(p) (REG - R2>97%)

PROC REG data=aaem.stpete; MODEL visitors=t m1-m11; OUTPUT out=out4 predicted = P residual=R UCL=u95 LCL=l95; run;

PROC SGPLOT data=out3; BAND lower=l95 upper=u95 X=date; SERIES Y=P X=date; SCATTER Y=visitors X=date / datalabel=month datalabelattrs=(color=red size=0.3 cm); SERIES Y=U95 X=date/ lineattrs=(color=red thickness=0.8); SERIES Y=L95 X=date/ lineattrs=(color=red thickness=0.8); REFLINE "01apr2013"d / axis=x; where 2011<year(date)<2015; run;

67

Page 68:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

PROC SGPLOT data=out4; NEEDLE Y=r X=date; run;

68

Page 69:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Definitely autocorrelatedSlowly changing mean?

Try seasonal span difference model …

PROC ARIMA data=StPete plots=forecast(forecast);IDENTIFY var=visitors(12);

69

Page 70:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Typical ACF for ARMA(1,1) has initial dropoff followed by exponential decay – try ARMA(1,1) on span 12 differences.

ESTIMATE P=1 Q=1 ml; FORECAST lead=44 id=date interval = month

out=outARIMA; run;

Standard Approx Parameter Estimate Error t Value Pr > |t| Lag

MU 0.0058962 0.0029586 1.99 0.0463 0 MA1,1 0.32635 0.11374 2.87 0.0041 1 AR1,1 0.79326 0.07210 11.00 <.0001 1

Autocorrelation Check of Residuals To Chi- Pr >Lag Square DF ChiSq ----------------Autocorrelations----------------

6 1.81 4 0.7699 0.022 -0.055 -0.000 0.011 -0.028 -0.071 12 14.87 10 0.1367 0.109 0.044 0.003 0.060 0.211 -0.064 18 16.41 16 0.4248 0.010 -0.012 0.036 0.014 -0.066 0.038 24 17.30 22 0.7468 -0.056 0.020 0.004 0.018 -0.009 -0.017 30 34.15 28 0.1961 -0.131 -0.052 -0.144 0.134 0.026 -0.133 36 45.08 34 0.0969 -0.102 0.051 0.135 -0.107 0.014 -0.074

70

Page 71:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

PROC SGPLOT data=outARIMA; ; SERIES X=date Y=residual; run;

71

Page 72:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

We have seen deterministic (dummy variables) and dynamic (seasonal differences) models for St Pete visitors. How do decide between them?

Seasonal unit root test!

Optional: Seasonal Multiplicative Model.

Accommodates slowly changing seasonal patterns.

Contrast - Indicator (dummy) variables assume same effect for every January (etc.) no matter what year.

Simplest model – seasonal random walk. YtYt-s+et YtYt-s=et

(1s)Yt=et

Seasonal AR: YtYt-s+et

YtYt-s=(-1)(Yt-set 72

Page 73:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Idea: Regress YtYt-s on (Yt-sto estimate and get a t test. Distribution of t is far from standard normal.

Modification: replace Yt with deviations yt from seasonal means. Visitors data: Estimate of : 1-0.1556=0.8444

Parameter StandardVariable DF Estimate Error t Value Pr>|t|Intercept 1 5772.30818 1099.20049 5.25 <.0001Y_12 1 -0.15560 0.03191 -4.88 <.0001 XXIs t significant? What is the (null) distribution?

73

Page 74:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Empirical (left) and N(0,1) (right) densities with means. Area to left of leftmost line (=-4.50, see next model) is 0.4061. Critical value for 5% is -5.84, median is -4.27 (Dickey, Hasza & Fuller, 1984, table 7). DHF

Conclusion: Insufficient evidence against seasonal dynamic model (seasonal unit roots).

What if it’s seasonal with more lags? Multiplicative seasonal model is popular.

(1Bs)(11B2B2…pBp)(Yt-s)=et

s=seasonal meanStep 1: Regress Dt=YtYt-s on Dt-1, Dt-2,…Dt-p to estimate ’s. Residuals are et. Using these estimates, estimate filtered lags

Ft-s= (11B2B2…pBp)(Yt-s-s)(note t-s subscript).

74

Page 75:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Step 2: Regress Dt on Ft-s, Dt-1, Dt-2, …, Dt-p to get test statistic and improvements for q estimates. For visitor data with p=2:

Parameter Estimates

Parameter StandardVariable DF Estimate Error t Value Pr > |t|

Intercept 1 -137.31542 978.60547 -0.14 0.8886Filter 1 -0.18020 0.04007 -4.50 <.0001D1 1 0.01129 0.07204 0.16 0.8756D2 1 0.00207 0.07218 0.03 0.9771

From normal curve graph above, p-value is about 0.4061, not <.0001

Optional (time permitting) Cointegration

Two unit root processes Xt and Yt (nonstationary) are said to be cointegrated if there is a linear combination of them St=aXt+bYt such that St is stationary.

75

Page 76:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Usually a and b are unknown but sometimes the situation suggests values and in that case all that is needed to show Cointegration is a unit root test on St.

Cointegration analogy:Drunk man walking a dog

Man -> unit root processDog-> unit root process

Distance from man to dog stationary!

Red path = man Blue path = dog

76

Page 77:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

Example 10: T-bill rates for two maturities

Data:10 year t-bill yields30 year t-bill yields

(in logarithms)

Unit root tests show

(1) Both series are nonstationary (can’t reject unit roots using trend tests)

77

Page 78:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

(2) Neither requires more than 1 difference

Let St = log(30 year) – log(10 year) = log(30 year rate / 10 year rate)

Graph of St

Unit root test on St Augmented Dickey-Fuller Unit Root TestsType Lags Rho Pr < Rho Tau Pr < TauZero Mean 0 0.0028 0.6832 0.01 0.6844 1 0.0243 0.6882 0.05 0.6991 2 0.0324 0.6900 0.07 0.7060 3 0.0204 0.6872 0.04 0.6970Single Mean 0 -27.6337 0.0017 -3.69 0.0048 1 -24.9002 0.0033 -3.42 0.0113 2 -22.8252 0.0054 -3.19 0.0216

78

Page 79:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

3 -26.3394 0.0024 -3.36 0.0136 Trend 0 -28.5986 0.0100 -3.71 0.0227 1 -25.8595 0.0186 -3.44 0.0486 2 -23.7128 0.0298 -3.20 0.0866 3 -27.6531 0.0124 -3.37 0.0574

Final model: log(30 yr. rate/10 yr. rate) is stationary with estimated mean 0.2805 and autoregressive order 1 structure with =0.92

Summary: Use ACF, PACF to identify p=# autoregressive lags and q= # moving average lags.

Stationarity – mean reverting models versus unit roots (random walk type models). Graphics and DF test (and others) available.

Diagnostics – errors should be white noise – Ljung Box test to check.

Regression with autoregressive or ARMA errors

Nonlinear regression – to estimate slope changes 79

Page 80:  · Web viewDavid A. Dickey NC State University Learning objectives: __Understand ARIMA models. __Interpret ARIMA output from PROC ARIMA __Forecast with forecast intervals __Understand

(least squares).

Seasonal models – dynamic or deterministic.

References for unit roots:

Chang, M. C. and D. A. Dickey , (1993) "Recognizing Overdifferenced Time Series," Journal of Time Series Analysis , 15, 1-8.

Dickey, D. A. and W. A. Fuller (1979). “Distribution of the Estimators for Autoregressive Time Series with a Unit Root”. Journal of the American Statistical Association, 74, p. 427-431.

Dickey, D. A. and W. A. Fuller (1981). “Likelihood Ratio Statistics for Autoregressive Time Series with a unit Root”. Econometrica 49, 1057-1072.

Dickey, D. A., D. P. Hasza, and W. A. Fuller (1984). “Testing for Unit Roots in Seasonal Time Series”, Journal of the American Statistical Association, 79, 355-367.

Said, S. E. and D. A. Dickey (1984). “Testing for Unit Roots in Autoregressive-Moving Average Models of Unknown Order”, Biometrika, 71, 599-607.

Said, S. E. and D. A. Dickey (1985). “Hypothesis Testing in ARIMA (p, 1, q) Models”, Journal of the American Statistical Association, 80, 369-374.

Dickey, D. A., W.R. Bell and R. B. Miller (1986). “Unit Roots in Time Series Models: Tests and Implications”, American Statistician 40, 12-26.

80