1. Characteristics of Time Series

1.1 Introduction
We are going to examine data that have been observed over time. Typically, there is correlation between the observations, which limits our ability to use “conventional” statistical analysis methods.
Remember that many statistical applications rely on having observations that are independent.
In this class, we are going to learn how to identify this correlation and use it to help construct models. The models are then used to forecast “future” observations. These types of analyses fall under the title “time series analysis”. For most of our course, we will focus on modeling ONE series of observations without any explanatory variables. A few sections in Chapter 5 are exceptions, where we will incorporate explanatory variables.
Read Shumway and Stoffer’s introduction to the methods discussed in this book.
2011 Christopher R. Bilder
1.2 The Nature of Time Series Data
Example: OSU enrollment data (osu_enroll.R, osu_enroll.xls)
Partial listing of the data:
 t  Semester  Year  Enrollment
 1  Fall      1989  20,110
 2  Spring    1990  19,128
 3  Summer    1990   7,553
 4  Fall      1990  19,591
 5  Spring    1991  18,361
 6  Summer    1991   6,702
 ...
32  Spring    2000  19,835
33  Summer    2000   7,202
34  Fall      2000  21,252
35  Spring    2001  20,004
36  Summer    2001   7,558
37  Fall      2001  21,872
38  Spring    2002  20,992
39  Summer    2002   7,868
40  Fall      2002  22,992
> library(RODBC)
> z <- odbcConnectExcel("C:\\chris\\UNL\\STAT_time_series\\chapter1\\osu_enroll.xls")
> osu.enroll <- sqlFetch(z, "Sheet1")
> close(z)
> head(osu.enroll)
  t Semester Year Enrollment       date
1 1     Fall 1989      20110 1989-08-31
2 2   Spring 1990      19128 1990-02-01
3 3   Summer 1990       7553 1990-06-01
4 4     Fall 1990      19591 1990-08-31
5 5   Spring 1991      18361 1991-02-01
6 6   Summer 1991       6702 1991-06-01
> tail(osu.enroll)
    t Semester Year Enrollment       date
35 35   Spring 2001      20004 2001-02-01
36 36   Summer 2001       7558 2001-06-01
37 37     Fall 2001      21872 2001-08-31
38 38   Spring 2002      20922 2002-02-01
39 39   Summer 2002       7868 2002-06-01
40 40     Fall 2002      22992 2002-08-31
> #One way to do plot
> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = osu.enroll$Enrollment, ylab = "OSU Enrollment", xlab = "t (time)",
    type = "l", col = "red", main = "OSU Enrollment from Fall 1989 to Fall 2002",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = osu.enroll$Enrollment, pch = 20, col = "blue")
[Plot: OSU Enrollment from Fall 1989 to Fall 2002; x-axis: t (time), y-axis: OSU Enrollment]
We will often use “t” to represent time so that we can say x1 = 20,110, x2 = 19,128, …, x40 = 22,992.
When only “x” is specified in the plot() function, R puts this on the y-axis and uses the observation number on the x-axis. Compare this to the next plot below where both “x” and “y” options are specified.

> #More complicated plot
> plot(y = osu.enroll[osu.enroll$Semester == "Fall",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Fall",]$t, ylab = "OSU Enrollment",
    xlab = "t (time)", col = "blue",
    main = "OSU Enrollment from Fall 1989 to Fall 2002",
    panel.first = grid(col = "gray", lty = "dotted"), pch = 1, type = "o",
    ylim = c(0, max(osu.enroll$Enrollment)))
> lines(y = osu.enroll[osu.enroll$Semester == "Spring",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Spring",]$t, col = "red",
    type = "o", pch = 2)
> lines(y = osu.enroll[osu.enroll$Semester == "Summer",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Summer",]$t, col = "darkgreen",
    type = "o", pch = 3)
> legend(x = locator(1), legend = c("Fall", "Spring", "Summer"), pch = c(1,2,3),
    lty = c(1,1,1), col = c("blue", "red", "darkgreen"), bty = "n")
[Plot: OSU Enrollment from Fall 1989 to Fall 2002, one line per semester; x-axis: t (time), y-axis: OSU Enrollment; legend: Fall, Spring, Summer]
> #Another way to do plot with actual dates
> plot(y = osu.enroll$Enrollment, x = as.Date(osu.enroll$date), xlab = "Time",
    type = "l", col = "red", main = "OSU Enrollment from Fall 1989 to Fall 2002",
    ylab = "OSU Enrollment")
> points(y = osu.enroll$Enrollment, x = as.Date(osu.enroll$date), pch = 20,
    col = "blue")
> #Create own gridlines
> abline(v = as.Date(c("1990/1/1", "1992/1/1", "1994/1/1", "1996/1/1",
    "1998/1/1", "2000/1/1", "2002/1/1")), lty = "dotted", col = "lightgray")
> abline(h = c(10000, 15000, 20000), lty = "dotted", col = "lightgray")
> #There may be better ways to work with actual dates.
[Plot: OSU Enrollment from Fall 1989 to Fall 2002; x-axis: Time (1990 to 2002), y-axis: OSU Enrollment]
Questions of interest:
1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future enrollment be predicted using this data?
4) Most of the time, we will only use past values in the series to predict future values. However, in this case, what explanatory variables (independent variables, covariates) may be useful to use to predict enrollment?
5) Why is modeling enrollment and predicting future enrollment important?
5-8-01 O’Collegian article: “$1.8 million loss attributed to slight enrollment decline”
Example: Russell 3000 Index (russell_3000.R, russell.xls)
Source: www.russell.com.
The index “measures the performance of the 3,000 largest United States companies based on total market capitalization, which represents approximately 98% of the investable United States equity market.”
> library(RODBC)
> z <- odbcConnectExcel("C:\\chris\\UNL\\STAT_time_series\\chapter1\\russell.xls")
> russell <- sqlFetch(z, "Sheet1")
> close(z)
> head(russell)
           Index Name       Date Value Without Dividends Value With Dividends
1 Russell 3000® Index 1995-06-01                  555.15              1034.42
2 Russell 3000® Index 1995-06-02                  555.15              1034.56
3 Russell 3000® Index 1995-06-05                  558.72              1041.21
4 Russell 3000® Index 1995-06-06                  558.50              1041.04
5 Russell 3000® Index 1995-06-07                  556.45              1037.21
6 Russell 3000® Index 1995-06-08                  555.83              1036.18
> tail(russell)
             Index Name       Date Value Without Dividends Value With Dividends
674 Russell 3000® Index 1997-12-23                  965.71              1891.83
675 Russell 3000® Index 1997-12-24                  960.20              1881.05
676 Russell 3000® Index 1997-12-26                  963.43              1887.42
677 Russell 3000® Index 1997-12-29                  979.26              1919.07
678 Russell 3000® Index 1997-12-30                  996.66              1953.32
679 Russell 3000® Index 1997-12-31                  998.26              1956.51
> #One way to do plot
> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = russell$"Value Without Dividends", ylab = "Russell 3000 Index",
    xlab = "t (time)", type = "l", col = "red",
    main = "Russell 3000 Index from 6/1/1995 to 12/31/1997",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = russell$"Value Without Dividends", pch = 20, col = "blue")
[Plot: Russell 3000 Index from 6/1/1995 to 12/31/1997; x-axis: t (time), y-axis: Russell 3000 Index]
> #Another way to do plot with actual dates
> plot(y = russell$"Value Without Dividends", x = as.Date(russell$Date),
    xlab = "Time", type = "l", col = "red",
    main = "Russell 3000 Index from 6/1/1995 to 12/31/1997",
    ylab = "Russell 3000 Index", xaxt = "n")
> axis.Date(side = 1, at = seq(from = as.Date("1995/6/1"),
    to = as.Date("1997/12/31"), by = "months"),
    labels = format(x = seq(from = as.Date("1995/6/1"),
    to = as.Date("1997/12/31"), by = "months"), format = "%b%y"),
    las = 2) #las changes orientation of labels
> points(y = russell$"Value Without Dividends", x = as.Date(russell$Date),
    pch = 20, col = "blue")
> #Create own gridlines
> abline(v = as.Date(c("1995/7/1", "1996/1/1", "1996/7/1", "1997/1/1",
    "1997/7/1")), lty = "dotted", col = "lightgray")
> abline(h = seq(from = 600, to = 1000, by = 100), lty = "dotted",
    col = "lightgray")
[Plot: Russell 3000 Index from 6/1/1995 to 12/31/1997; x-axis: Time with monthly labels Jun95 through Dec97, y-axis: Russell 3000 Index]

Questions of interest:
1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future index values be predicted using this data?
4) Why would modeling the Russell 3000 Index and predicting future values be important?
Example: Sunspots (sunspots.R, sunspots.csv)
Number of sunspots per year on the sun from 1784-1983.
> sunspots.data <- read.table(file =
    "C:\\chris\\UNL\\STAT_time_series\\chapter1\\sunspots.csv",
    header = TRUE, sep = ",")
> head(sunspots.data)
  Year Sunspots
1 1784     10.2
2 1785     24.1
3 1786     82.9
4 1787    132.0
5 1788    130.9
6 1789    118.1
> tail(sunspots.data)
    Year Sunspots
195 1978    92.50
196 1979   155.40
197 1980    32.27
198 1981    54.25
199 1982    59.65
200 1983    63.62

> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = sunspots.data$Sunspots, ylab = "Number of sunspots", xlab = "t (time)",
    type = "l", col = "red", main = "Sunspots per year from 1784 to 1983",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = sunspots.data$Sunspots, pch = 20, col = "blue")
[Plot: Sunspots per year from 1784 to 1983; x-axis: t (time), y-axis: Number of sunspots]
> plot(y = sunspots.data$Sunspots, x = sunspots.data$Year,
    ylab = "Number of sunspots", xlab = "Year", type = "l", col = "red",
    main = "Sunspots per year from 1784 to 1983",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(y = sunspots.data$Sunspots, x = sunspots.data$Year, pch = 20,
    col = "blue")
[Plot: Sunspots per year from 1784 to 1983; x-axis: Year, y-axis: Number of sunspots]
> #Convert to an object of class "ts"
> x <- ts(sunspots.data$Sunspots, start = 1784, frequency = 1)
> class(x)
[1] "ts"
> class(sunspots.data$Sunspots)
[1] "numeric"
> x
Time Series:
Start = 1784
End = 1983
Frequency = 1
  [1]  10.20  24.10  82.90 132.00 130.90 118.10  89.90  66.60  60.00  46.90  41.00  21.30  16.00   6.40   4.10
 [16]   6.80  14.50  34.00  45.00  43.10  47.50  42.20  28.10  10.10   8.10   2.50   0.00   1.40   5.00  12.20
<OUTPUT EDITED>
[196] 155.40  32.27  54.25  59.65  63.62

> plot.ts(x = x, ylab = expression(paste(x[t], " (Number of sunspots)")),
    xlab = "t (year)", type = "o", col = "red",
    main = "Sunspots per year from 1784 to 1983")
[Plot: Sunspots per year from 1784 to 1983; x-axis: t (year), y-axis: xt (Number of sunspots)]
Notes:
- The sunspot values are not necessarily integers.
- Every object in R has a class. For time series data, it is sometimes useful to use a “ts” class type with it.
Questions of interest:
1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future sunspots be predicted using this data?
4) Why would modeling the number of sunspots and predicting future values be important?
See Shumway and Stoffer for more examples!
1.3 Time Series Statistical Models
Stochastic process – a collection of random variables {Xt} indexed by t
Time series – collection of random variables indexed according to the order they are obtained in time.
Let Xt be the random variable at time t
Then
X1 = random variable at time 1
X2 = random variable at time 2
A realization of the stochastic process is the observed values
The observed values are denoted by x1, x2, … .
Notice that lowercase letters are used to denote the observed value of the random variables.
NOTE: Shumway and Stoffer say the following:
Because it will be clear from the context of our discussions, we will use the term time series whether we are referring to the process or to a particular realization and make no notational distinction between the two concepts.
What does this mean? There will be no notational differentiation made between the random variables and their observed values. Shumway and Stoffer will typically use a lowercase letter – xt.
Example: White noise (white_noise.R)
The simplest kind of time series is a collection of independent and identically distributed random variables with mean 0 and constant variance.
This can be written as wt ~ independent (0, σw²) for t = 1, …, n.
Most often, the probability distribution is assumed to be a normal probability distribution.
This can be written as wt ~ independent N(0, σw²) for t = 1, …, n.
What does this mean?
- Each wt has a normal distribution with mean of 0 and a constant variance.
- w1, w2, …, wn are independent of each other.

Given this set up, answer the following questions:
- What patterns are there over time (t)?
- How can the correlation between observations be used to help model the data?
How can we “simulate” a white noise process using R?
Since each random variable is independent, we can simulate 100 observations from a normal distribution. I am going to use σw = 1 here.
> set.seed(8128)
> w <- rnorm(n = 100, mean = 0, sd = 1)
> head(w)
[1] -0.10528941  0.25548490  0.82065388  0.04070997 -0.66722880 -1.54502793

> #Using plot.ts() which is set up for time series plots
> win.graph(width = 6, height = 6, pointsize = 10)
> plot.ts(x = w, ylab = expression(w[t]), xlab = "t", type = "o", col = "red",
    main = expression(paste("White noise where ", w[t], " ~ ind. N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"))
[Plot: White noise where wt ~ ind. N(0, 1); x-axis: t, y-axis: wt]
> #Advantage of second plot is separate control over color of points
> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l", col = "red",
    main = expression(paste("White noise where ", w[t], " ~ ind. N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w, pch = 20, col = "blue")
[Plot: White noise where wt ~ ind. N(0, 1); x-axis: t, y-axis: wt]
Given this data set, answer the following questions:
- What patterns are there over time (t)?
- How can the correlation between observations be used to help model the data?
Suppose another white noise process is simulated. To create a plot overlaying the two time series, use the code below.
> set.seed(1298)
> w.new <- rnorm(n = 100, mean = 0, sd = 1)
> head(w.new)
[1]  1.08820292 -1.46217413 -1.10887422  0.55156914  0.70582813  0.05079594

> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l", col = "red",
    main = expression(paste("White noise where ", w[t], " ~ ind. N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"),
    ylim = c(min(w.new, w), max(w.new, w)))
> points(x = w, pch = 20, col = "blue")
> lines(x = w.new, col = "green")
> points(x = w.new, pch = 20, col = "orange")
> legend(x = locator(1), legend = c("Time series 1", "Time series 2"),
    lty = c(1,1), col = c("red", "green"), bty = "n")
[Plot: White noise where wt ~ ind. N(0, 1), two series overlaid; x-axis: t, y-axis: wt; legend: Time series 1, Time series 2]
> win.graph(width = 8, height = 6, pointsize = 10)
> par(mfrow = c(2,1))
> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l", col = c("red"),
    main = expression(paste("White noise where ", w[t], "~N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w, pch = 20, col = "blue")
> plot(x = w.new, ylab = expression(w.new[t]), xlab = "t", type = "l",
    col = c("green"),
    main = expression(paste("White noise where ", w[t], " ~ ind. N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w.new, pch = 20, col = "orange")
[Plot: two panels of white noise where wt ~ ind. N(0, 1); x-axis: t, y-axes: wt and w.newt]
Example: Moving average of white noise (moving_average.R)
The previous time series had no correlation between the observations. One way to induce correlation is to create a “moving average” of the observations. This will have an effect of “smoothing” the series.
Let mt = (wt + wt-1 + wt-2)/3

Note: This is different from the example given in Shumway and Stoffer p. 13, where they find (wt+1 + wt + wt-1)/3.
This can be done in R using the following code:
> set.seed(8128)
> w <- rnorm(n = 100, mean = 0, sd = 1)
> head(w)
[1] -0.10528941  0.25548490  0.82065388  0.04070997 -0.66722880 -1.54502793

> m <- filter(x = w, filter = rep(x = 1/3, times = 3), method = "convolution",
    sides = 1)
> head(m)
[1]          NA          NA  0.32361646  0.37228292  0.06471168 -0.72384892
> tail(m)
[1]  0.3158762 -0.1803096  0.2598066 -0.6450531 -0.5879723 -0.9120182
> (w[1]+w[2]+w[3])/3
[1] 0.3236165
> (w[98]+w[99]+w[100])/3
[1] -0.9120182
> #This is what the book does
> #m<-filter(x = w, filter = rep(x = 1/3, times = 3), method = "convolution",
>   # sides = 2)
> par(mfrow = c(1,1))
> plot(x = m, ylab = expression(m[t]), xlab = "t", type = "l", col = c("brown"),
    lwd = 1, main = expression(paste("Moving average where ",
    m[t] == (w[t] + w[t-1] + w[t-2])/3)),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = m, pch = 20, col = "orange")
> #NOTE: The gridlines are not located in the correct locations
[Plot: Moving average where mt = (wt + wt-1 + wt-2)/3; x-axis: t, y-axis: mt]
Comparing mt to wt:

> par(mfrow = c(1,1))
> plot(x = m, ylab = expression(paste(m[t], " or ", w[t])), xlab = "t",
    type = "l", col = c("brown"), lwd = 4, ylim = c(max(w), min(w)),
    main = expression(paste("Moving average where ",
    m[t] == (w[t] + w[t-1] + w[t-2])/3)),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = m, pch = 20, col = "orange")
> lines(x = w, col = "red", lty = "dotted")
> points(x = w, pch = 20, col = "blue")
> legend(x = locator(1), legend = c("Moving average", "White noise"),
    lty = c("solid", "dotted"), col = c("brown", "red"), lwd = c(4,1),
    bty = "n")
[Plot: Moving average where mt = (wt + wt-1 + wt-2)/3 overlaid with the white noise series; x-axis: t, y-axis: mt or wt; legend: Moving average, White noise]
Given these observed values of mt, answer the following questions:
- What patterns are there over time (t)?
- How can the correlation between observations be used to help model the data?
The plot below shows a 7-point moving average (see program for code).
[Plot: Compare moving averages; x-axis: t, y-axis: mt or wt; legend: 3-pt Moving average, White noise, 7-pt Moving average]
Example: Autoregressions (ar1.R)
An “autoregression” model uses past observations to predict future observations in a regression model.
Suppose the autoregression model is

xt = 0.7xt-1 + wt where wt ~ independent N(0,1) for t = 1, …, n.
Notice how similar this is to a regression model from STAT 870! Since there is one past period on the right hand side, this is often denoted as an AR(1) model where AR stands for “autoregressive”.
Therefore,
x2 = 0.7x1 + w2
x3 = 0.7x2 + w3
Obviously, there will be a correlation between the random variables.
Below is one way in R to simulate observations from this model.
> set.seed(6381) #Different seed from white_noise.R
> w <- rnorm(n = 200, mean = 0, sd = 1)
> head(w)
[1]  0.06737166 -0.68095839  0.78930605  0.60049855 -1.21297680 -1.14082872

> #######################################################
> # autoregression
> #Simple way to simulate AR(1) data
> x <- numeric(length = 200)
> x.1 <- 0
> for (i in 1:length(x)) {
    x[i] <- 0.7*x.1 + w[i]
    x.1 <- x[i]
  }
> head(cbind(x, w))
               x           w
[1,]  0.06737166  0.06737166
[2,] -0.63379823 -0.68095839
[3,]  0.34564730  0.78930605
[4,]  0.84245166  0.60049855
[5,] -0.62326064 -1.21297680
[6,] -1.57711117 -1.14082872

> #Do not use first 100
> x <- x[101:200]
> win.graph(width = 8, height = 6, pointsize = 10) # Opens up wider plot window
    # than the default (good for time series plots)
> plot(x = x, ylab = expression(x[t]), xlab = "t", type = "l", col = c("red"),
    lwd = 1, main = expression(paste("AR(1): ", x[t] == 0.7*x[t-1] + w[t])),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = x, pch = 20, col = "blue")
[Plot: AR(1): xt = 0.7xt-1 + wt; x-axis: t, y-axis: xt]
Notes:
- Notice the syntax of the for loop.
- See the first 6 rows of x and w right after the for loop. Make sure you understand how the data was simulated!!!
- The 1st value of x is 0.06737166 = 0.7(0) + 0.06737166
- The 2nd value of x is -0.6337982 = 0.7(0.06737166) - 0.68095839
- The 3rd value of x is 0.3456473 = 0.7(-0.6337982) + 0.78930605
- Why are the first 100 observations discarded?
- For those of you who have taken a course where AR(1) structures of a covariance matrix are discussed, what do you think the approximate correlation between xt and xt-1 is?
Here is an easier way to simulate observations from an AR(1). Note that this uses an Autoregressive Integrated Moving Average (ARIMA) structure that we will discuss in Chapter 3. In this case, I use σw = 10.
> set.seed(7181)
> x <- arima.sim(model = list(ar = c(0.7)), n = 100, rand.gen = rnorm, sd = 10)
> plot(x = x, ylab = expression(x[t]), xlab = "t", type = "l", col = c("red"),
    lwd = 1, main = expression(paste("AR(1): ", x[t] == 0.7*x[t-1] + w[t])),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = x, pch = 20, col = "blue")
[Plot: AR(1): xt = 0.7xt-1 + wt simulated with arima.sim(); x-axis: t, y-axis: xt]
More notes:
- Both the moving average and autoregressive models will be discussed extensively in Chapter 3.
- See how Shumway and Stoffer are kind of trying to match up their simulated data plots in Section 1.3 to actual data plots in Section 1.2. They are doing this because we want to develop an equation that reasonably mimics or “models” real data.
1.4 Measures of Dependence: Autocorrelation and Cross-Correlation
We would like to understand the relationship between all random variables in a time series. In order to do that, we would need to look at the joint distribution function.
Suppose the time series consists of the random variables xt1, xt2, …, xtn. Then the cumulative joint distribution function is:

F(c1, c2, …, cn) = P(xt1 ≤ c1, xt2 ≤ c2, …, xtn ≤ cn)

This can be VERY difficult to examine over the MULTIDIMENSIONS. Note the t1, …, tn subscripts are used just to denote a “general” set of times, not necessarily 1, 2, …, n.
Instead, it is often easier to look at the one or two dimensional distribution functions. The one-dimensional cumulative distribution function is denoted by Ft(x) = P(xt ≤ x) for a random variable xt at time t. The corresponding probability density function is ft(x) = ∂Ft(x)/∂x.
The mean value function is

μxt = E(xt) = ∫ x ft(x) dx
Shumway and Stoffer will drop the x subscript from μxt where there is no confusion about what random variable is used.

Important: The interpretation of μt is that it represents the mean taken over ALL possible events that could have produced xt. Another way to think about it is to suppose that the time series xt1, xt2, … is observed an infinite number of times. Then μt1 represents the average value at time t1, μt2 represents the average value at time t2, …
Example: Moving Average
Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t = 1, …, n.

Then μt = E(mt)
        = E[(wt + wt-1 + wt-2)/3]
        = (1/3) E(wt + wt-1 + wt-2)
        = (1/3) [E(wt) + E(wt-1) + E(wt-2)]
        = (1/3) [0 + 0 + 0]
        = 0
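A quick simulation sketch (my addition, not in the notes) agrees with this calculation: the sample mean of a long simulated moving average series is close to 0.

```r
# Simulate white noise and form m_t = (w_t + w_(t-1) + w_(t-2))/3
set.seed(8128)
w <- rnorm(n = 100000, mean = 0, sd = 1)
m <- stats::filter(x = w, filter = rep(x = 1/3, times = 3),
                   method = "convolution", sides = 1)
mean(m, na.rm = TRUE)  # close to mu_t = 0
```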
Example: Autoregressions
Let xt = 0.7xt-1 + wt where wt~ ind. N(0,1) for t = 1, …, n.
Then μt = E(xt)
        = E(0.7xt-1 + wt)
        = 0.7E(xt-1) + E(wt)
        = 0.7E(0.7xt-2 + wt-1) + 0
        = … (continuing the recursion)
        = 0
Autocovariance function
To assess the dependence between two random variables, we need to examine the two-dimensional cumulative distribution function. This can be denoted as F(cs, ct) = P(xs ≤ cs, xt ≤ ct) for two different time points s and t.
In STAT 870, you learned about the covariance function which measured the linear dependence between two random variables (see Chapter 5 of Kutner, Nachtsheim, and Neter (2004)). Since we are interested in linear dependence between two random variables in the same time series, we will examine the autocovariance function:
γx(s,t) = Cov(xs, xt) = E[(xs – μs)(xt – μt)] for all s and t,
2011 Christopher R. Bilder
1.35
where μs = E(xs) and μt = E(xt), assuming continuous random variables.
Notes:
- If the autocovariance is 0, there is no linear dependence.
- If s = t, the autocovariance is the variance: γx(t,t) = E[(xt – μt)²]
- Shumway and Stoffer will drop the x subscript on γx when there is no confusion about which time series is being discussed.
Example: White noise
Suppose wt ~ ind. N(0, σw²) for t = 1, …, n. What is γ(s,t) for s = t and s ≠ t?
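Rather than give the answer away, here is a simulation hint (my addition, not in the notes). With σw = 2, the sample variance estimates γ(t,t) and the lag-1 sample covariance estimates γ(s,t) for |s–t| = 1; look at where each lands.

```r
# White noise with sigma.w = 2 (an arbitrary choice for illustration)
set.seed(8128)
sigma.w <- 2
w <- rnorm(n = 100000, mean = 0, sd = sigma.w)
var(w)                     # estimates gamma(t,t); near sigma.w^2 = 4
cov(w[-length(w)], w[-1])  # estimates gamma(s,t) for |s-t| = 1; near 0
```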
Example: Moving Average
Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t=1,…,n.
E[(ms – μs)(mt – μt)] = E[msmt] since μs = μt = 0
Then
E[msmt] = E[(ws + ws-1 + ws-2)/3 × (wt + wt-1 + wt-2)/3]
        = (1/9)E[(ws + ws-1 + ws-2)(wt + wt-1 + wt-2)]
To find this, we need to examine a few different cases:

s = t:
E[mtmt] = E[mt²] = Var(mt) + E(mt)² since Var(mt) = E[mt²] – E(mt)²
 = (1/9){Var(wt + wt-1 + wt-2)} + 0²
 = (1/9){Var(wt) + Var(wt-1) + Var(wt-2)} since the wt's are independent
 = (1/9)(1 + 1 + 1)
 = 3/9
s = t-1:
E[mt-1mt] = (1/9)E[(wt-1 + wt-2 + wt-3)(wt + wt-1 + wt-2)]
 = (1/9)E[wt-1wt + wt-1wt-1 + wt-1wt-2 + wt-2wt + wt-2wt-1 + wt-2wt-2 + wt-3wt + wt-3wt-1 + wt-3wt-2]
 = (1/9)[E(wt-1wt) + E(wt-1²) + E(wt-1wt-2) + E(wt-2wt) + E(wt-2wt-1) + E(wt-2²) + E(wt-3wt) + E(wt-3wt-1) + E(wt-3wt-2)]
 = (1/9)[E(wt-1)E(wt) + E(wt-1²) + E(wt-1)E(wt-2) + E(wt-2)E(wt) + E(wt-2)E(wt-1) + E(wt-2²) + E(wt-3)E(wt) + E(wt-3)E(wt-1) + E(wt-3)E(wt-2)]
 = (1/9)[0·0 + 1 + 0·0 + 0·0 + 0·0 + 1 + 0·0 + 0·0 + 0·0]; note that E(wt-1²) = 1 since Var(wt-1) = 1 and E(wt-1) = 0
 = 2/9
s = t-2:
E[mt-2mt] = (1/9)E[(wt-2 + wt-3 + wt-4)(wt + wt-1 + wt-2)]
 = (1/9)E[wt-2wt + wt-2wt-1 + wt-2wt-2 + wt-3wt + wt-3wt-1 + wt-3wt-2 + wt-4wt + wt-4wt-1 + wt-4wt-2]
 = (1/9)[E(wt-2wt) + E(wt-2wt-1) + E(wt-2²) + E(wt-3wt) + E(wt-3wt-1) + E(wt-3wt-2) + E(wt-4wt) + E(wt-4wt-1) + E(wt-4wt-2)]
 = (1/9)[0·0 + 0·0 + 1 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0]
 = 1/9
s = t-3:
E[mt-3mt] = (1/9)E[(wt-3 + wt-4 + wt-5)(wt + wt-1 + wt-2)]
 = (1/9)E[wt-3wt + wt-3wt-1 + wt-3wt-2 + wt-4wt + wt-4wt-1 + wt-4wt-2 + wt-5wt + wt-5wt-1 + wt-5wt-2]
 = (1/9)[0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0] since E(wt) = 0 for all t = 1, …, n
 = 0
Notice that s = t+1, t+2, … can be found in a similar manner. Also s = t-4, t-5, … can be found. In summary, the autocovariance function is

γ(s,t) = 3/9 for |s–t| = 0
γ(s,t) = 2/9 for |s–t| = 1
γ(s,t) = 1/9 for |s–t| = 2
γ(s,t) = 0   for |s–t| ≥ 3
Notes:
- The word “lag” is used to denote the time separation between two values. For example, |s–t| = 1 denotes the lag is 1 and |s–t| = 2 denotes the lag is 2. We will use this “lag” terminology throughout this semester.
- The autocovariance depends on the lag, but NOT the individual times, for the moving average example! This will be VERY, VERY important later!
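These moving average calculations can be checked numerically. This sketch (my addition, not in the notes) estimates the autocovariances at lags 0 through 3 from a long simulated series using acf() with type = "covariance"; the estimates should be near 3/9 ≈ 0.333, 2/9 ≈ 0.222, 1/9 ≈ 0.111, and 0.

```r
# Simulate the 3-point moving average of N(0,1) white noise
set.seed(8128)
w <- rnorm(n = 100000, mean = 0, sd = 1)
m <- stats::filter(x = w, filter = rep(x = 1/3, times = 3),
                   method = "convolution", sides = 1)
m <- m[!is.na(m)]  # drop the first two undefined values
# Sample autocovariances at lags 0, 1, 2, 3
gamma.hat <- acf(x = m, lag.max = 3, type = "covariance", plot = FALSE)$acf
round(gamma.hat[,,1], 3)  # near 3/9, 2/9, 1/9, 0
```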
Autocorrelation function (ACF)
In STAT 870, the Pearson correlation coefficient was defined to be:

ρ = Cov(X,Y) / √[Var(X)Var(Y)]

for two variables X and Y, where Cov(X,Y) denotes the covariance between X and Y and Var(X) denotes the variance of X. The reason the correlation coefficient is examined instead of the covariance is that it is always between –1 and 1. Note the following:
- ρ close to 1 means strong, positive linear dependence
- ρ close to –1 means strong, negative linear dependence
- ρ close to 0 means weak linear dependence
The autocorrelation is the extension of the Pearson correlation coefficient from STAT 870 to time series analysis. Below is the autocorrelation function (ACF):

ρ(s,t) = γ(s,t) / √[γ(s,s)γ(t,t)]

where s and t denote two time points. The ACF is also between –1 and 1 and has a similar interpretation as for the correlation coefficient.
Example: White noise
Suppose wt ~ independent N(0, σw²) for t = 1, …, n. What is ρ(s,t) for s = t and s ≠ t?
Example: Moving Average
Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t = 1, …, n.
Note that γ(s,s) = γ(t,t) = 3/9.

Then ρ(s,t) = γ(s,t)/(3/9).

For example, ρ(s,t) for |s–t| = 1 is (2/9)/(3/9) = 2/3.
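As a numerical check (my addition, not in the notes), acf() with its default type = "correlation" estimates the ACF from a long simulated moving average series; the estimates at lags 0, 1, and 2 should be near 1, 2/3, and 1/3.

```r
# Simulate the 3-point moving average and estimate its ACF
set.seed(8128)
w <- rnorm(n = 100000, mean = 0, sd = 1)
m <- stats::filter(x = w, filter = rep(x = 1/3, times = 3),
                   method = "convolution", sides = 1)
rho.hat <- acf(x = m[!is.na(m)], lag.max = 2, plot = FALSE)$acf
round(rho.hat[,,1], 3)  # near 1, 2/3, 1/3
```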
Example: Strong positive and negative linear dependence (dependence.R)
Questions:
- If there is strong positive linear dependence between xs and xt, will the time series appear smooth or choppy in a plot of the series versus time? Choose an answer.
- If there is strong negative linear dependence between xs and xt, will the time series appear smooth or choppy in a plot of the series versus time? Choose an answer.

Think of plots of xt vs. xs to help answer the above questions.
Below are three plots illustrating these statements. I simulated data from different time series models. The autocorrelation for |s–t| = 1 is given for each model. The “estimated” autocorrelation, denoted by ρ̂, is given for that particular data set. The ρ̂ will be discussed more in Section 1.6.
1) ρ(s,t) = 0.4972 for |s–t| = 1
[Plot: xt = wt + 0.9wt-1 where wt ~ ind. N(0,1); x-axis: t, y-axis: xt]
2) ρ(s,t) = –0.4972 for |s–t| = 1

[Plot: xt = wt – 0.9wt-1 where wt ~ ind. N(0,1); x-axis: t, y-axis: xt]
3) ρ(s,t) = 0 for |s–t| = 1

[Plot: xt = wt + 0wt-1 where wt ~ ind. N(0,1); x-axis: t, y-axis: xt]
Plot 1) is the least jagged and plot 2) is the most jagged. Remember what a correlation means. A positive correlation means that “large” values tend to occur with other “large” values and “small” values tend to occur with other “small” values. A negative correlation means that “large” values tend to occur with other “small” values and “small” values tend to occur with other “large” values.
Please see program for the code.
Cross-covariance function and cross-correlation function
1.45
Sometimes there will be two time series of interest. When this occurs, one is often used to help in the modeling of the other. Suppose there are two series denoted by xt and yt for t=1,…,n. The cross-covariance function is
γxy(s,t) = Cov(xs, yt) = E[(xs – μxs)(yt – μyt)] = E(xsyt) – μxsμyt

where μxs = E(xs) and μyt = E(yt).
The cross-correlation function is

ρxy(s,t) = γxy(s,t) / √[γx(s,s)γy(t,t)]

where s and t denote two time points and the x and y subscripts help denote the particular series.
These two functions will be discussed more in Chapter 5 for transfer function models.
1.5 Stationary Time Series
Stationarity is a VERY important concept to understand for this course. A stationarity assumption will allow us to construct time series models.
Strictly stationary time series – The probabilistic behavior of {xt1, …, xtk} is exactly the same as that of the shifted set {xt1+h, …, xtk+h} for ANY collection of time points t1, …, tk, for ANY k = 1, 2, …, and for ANY shift h = 0, 1, 2, … .
What does this mean??? Let c1, …, ck be constants. Then

P(xt1 ≤ c1, …, xtk ≤ ck) = P(xt1+h ≤ c1, …, xtk+h ≤ ck)

The cumulative joint probability distribution is INVARIANT to time shifts! For example, with t1 = 1, t2 = 2, and h = 9,

P(x1 ≤ c1, x2 ≤ c2) = P(x10 ≤ c1, x11 ≤ c2)
Requiring a time series to be strictly stationary is VERY restrictive! A less restrictive requirement is weakly stationary.
Weakly stationary time series - The first two moments (mean and covariance) of the time series are invariant to time shifts. In other words,
E(xt) = μ for ALL time t, and γ(t, t+h) = γ(0, h) for ALL time t.

Notes:
- μ and γ(0, h) are NOT functions of t.
- A “lag” refers to the time shift.
- γ(t, t+h) = γ(0, h) for ALL time t means that the autocovariance function ONLY depends on the number of lags of the time shift. Thus, γ(0, h) = γ(1, 1+h) = γ(2, 2+h) = γ(3, 3+h) = … .
- Since we will generally be dealing with a stationary time series, we can make the following notational change: γ(h) = γ(0, h) = γ(t, t+h). The variance of xt is γ(0).
- The same notational change can be made to the autocorrelation function (ACF). Thus, ρ(h) denotes the ACF at lag h. Note that γ(t,t) = γ(0) for all t, so the ACF simplifies to ρ(h) = γ(h)/γ(0).
- Frequently, we will just say “stationary” to refer to weakly stationary and say the full “strictly stationary” to refer to strictly stationary.
Example: White noise
Suppose wt ~ ind. N(0, σw²) for t = 1, …, n. Is this a weakly stationary time series?

Note that the joint distribution is the product of the one-dimensional distributions since the wt are independent.
Example: Moving Average
Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t = 1, …, n.
Previously, we found that μt = 0 for all t and
γ(s, t) = 3/9 for s = t, 2/9 for |s-t| = 1, 1/9 for |s-t| = 2, and 0 for |s-t| > 2.
Is the time series weakly stationary? Hint: let h = s - t.
Notes:
γ(h) = γ(-h) for all h if the series is weakly stationary. This means that it does not matter which way the shift occurs.
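These autocovariance values can be checked by simulation. The sketch below is my own Python cross-check (the course works in R), estimating γ(h) for the 3-point moving average:

```python
import random

# Simulate m_t = (w_t + w_{t-1} + w_{t-2})/3 with w_t ~ ind. N(0,1) and compare
# sample autocovariances with the theoretical values
# gamma(0) = 3/9, gamma(1) = 2/9, gamma(2) = 1/9, gamma(h) = 0 for |h| > 2.
random.seed(7)
n = 60_000
w = [random.gauss(0.0, 1.0) for _ in range(n + 2)]
m = [(w[t + 2] + w[t + 1] + w[t]) / 3 for t in range(n)]

def acvf(x, h):
    mu = sum(x) / len(x)
    return sum((x[t + h] - mu) * (x[t] - mu) for t in range(len(x) - h)) / len(x)

g = [acvf(m, h) for h in range(4)]   # estimates of gamma(0), ..., gamma(3)
```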
Stationarity can also be examined when two time series are of interest. To have two series, xt and yt, be jointly stationary:
Both time series must have constant mean,
both autocovariance functions must depend only on the lag difference, and
the cross-covariance must depend only on the lag difference.
If two series are jointly stationary, then the following notation can be used:
γxy(h) = E[(xt+h – μx)(yt – μy)] for μx constant over time and μy constant over time,
ρxy(h) = γxy(h)/sqrt(γx(0)γy(0)), and
γxy(h) = γyx(-h)
γxy(h) is not necessarily equal to γxy(-h) (they usually will be different).
Example: Work through Example 1.21.
Let xt = wt + wt-1 and yt = wt – wt-1 where wt ~ ind. N(0, σw^2) for t = 1, …, n
Show xt and yt are weakly stationary and show xt and yt are jointly stationary.
xt
E(xt) = E(wt + wt-1) = E(wt) + E(wt-1) = 0 + 0 = 0
Thus E(xt) = μx,t = 0 for all t.
γx(s,t) = E[(xs – μx,s)(xt – μx,t)] = E[xsxt] since μx,s = μx,t = 0
Then E[xsxt] = E[(ws + ws-1)(wt + wt-1)] = E[wswt + ws-1wt + wswt-1 + ws-1wt-1]
If s=t, then
γx(t,t) = E[wt^2 + wt-1wt + wtwt-1 + wt-1^2]
= E[wt^2] + E[wt-1wt] + E[wtwt-1] + E[wt-1^2]
= Var(wt) + E[wt]^2 + 2E[wt-1]E[wt] + Var(wt-1) + E[wt-1]^2
= σw^2 + 0^2 + 2·0·0 + σw^2 + 0^2
= 2σw^2
If s=t-1, then
γx(t-1,t) = E[wt-1wt + wt-2wt + wt-1^2 + wt-2wt-1]
= E[wt-1wt] + E[wt-2wt] + E[wt-1^2] + E[wt-2wt-1]
= E[wt-1]E[wt] + E[wt-2]E[wt] + Var(wt-1) + E[wt-1]^2 + E[wt-2]E[wt-1]
= 0·0 + 0·0 + σw^2 + 0^2 + 0·0
= σw^2
Note that γx(t+1,t) = σw^2 as well, and γx(s,t) = 0 for |s-t| > 1.
Therefore, γx(s,t) = 2σw^2 for s = t, σw^2 for |s-t| = 1, and 0 for |s-t| > 1, which depends only on the lag h = s - t.
Thus, xt is weakly stationary.
yt
E(yt) = E(wt – wt-1) = E(wt) – E(wt-1) = 0 - 0 = 0
Thus E(yt) = μy,t = 0 for all t.
γy(s,t) = E[(ys – μy,s)(yt – μy,t)] = E[ysyt] since μy,s = μy,t = 0
Then E[ysyt] = E[(ws – ws-1)(wt – wt-1)] = E[wswt – ws-1wt – wswt-1 + ws-1wt-1]
If s=t, then
γy(t,t) = E[wt^2 – wt-1wt – wtwt-1 + wt-1^2]
= E[wt^2] – E[wt-1wt] – E[wtwt-1] + E[wt-1^2]
= Var(wt) + E[wt]^2 – 2E[wt-1]E[wt] + Var(wt-1) + E[wt-1]^2
= σw^2 + 0^2 – 2·0·0 + σw^2 + 0^2
= 2σw^2
If s=t-1, then
γy(t-1,t) = E[wt-1wt – wt-2wt – wt-1^2 + wt-2wt-1]
= E[wt-1wt] – E[wt-2wt] – E[wt-1^2] + E[wt-2wt-1]
= E[wt-1]E[wt] – E[wt-2]E[wt] – Var(wt-1) – E[wt-1]^2 + E[wt-2]E[wt-1]
= 0·0 – 0·0 – σw^2 – 0^2 + 0·0
= -σw^2
Note that γy(t+1,t) = -σw^2 as well, and γy(s,t) = 0 for |s-t| > 1.
Therefore, γy(s,t) = 2σw^2 for s = t, -σw^2 for |s-t| = 1, and 0 for |s-t| > 1, which depends only on the lag h = s - t.
Thus, yt is weakly stationary.
xt and yt
γxy(s,t) = E[(xs – μx,s)(yt – μy,t)] = E[xsyt] since μx,t = μy,t = 0
Then γxy(s,t) = E[(ws + ws-1)(wt – wt-1)] = E[wswt + ws-1wt – wswt-1 – ws-1wt-1]
If s=t, then
γxy(t,t) = E[wtwt + wt-1wt – wtwt-1 – wt-1wt-1]
= E[wt^2] – E[wt-1^2]
= Var(wt) + E[wt]^2 – Var(wt-1) – E[wt-1]^2
= σw^2 + 0^2 – σw^2 – 0^2
= 0
If s=t-1, then
γxy(t-1,t) = E[wt-1wt + wt-2wt – wt-1wt-1 – wt-2wt-1] = -E[wt-1^2] = -σw^2
If s=t+1, then
γxy(t+1,t) = E[wt+1wt + wtwt – wt+1wt-1 – wtwt-1] = E[wt^2] = σw^2
If |s-t| > 1, then γxy(s,t) = 0.
Therefore, γxy(h) = 0 for h = 0, σw^2 for h = 1, -σw^2 for h = -1, and 0 for |h| > 1, which depends only on the lag h.
Thus, xt and yt are jointly stationary.
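The covariance and cross-covariance values in this example can be verified by simulation. Below is a Python sketch of my own (not from the notes) with σw = 1:

```python
import random

# x_t = w_t + w_{t-1} and y_t = w_t - w_{t-1} with w_t ~ ind. N(0,1);
# theory: gamma_x(0) = gamma_y(0) = 2, gamma_xy(0) = 0,
# gamma_xy(1) = +1, and gamma_xy(-1) = gamma_yx(1) = -1.
random.seed(3)
n = 60_000
w = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
x = [w[t + 1] + w[t] for t in range(n)]
y = [w[t + 1] - w[t] for t in range(n)]

def ccvf(a, b, h):
    # Sample cross-covariance (1/n) sum_t (a_{t+h} - abar)(b_t - bbar), h >= 0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((a[t + h] - ma) * (b[t] - mb) for t in range(len(a) - h)) / len(a)

gx0, gy0 = ccvf(x, x, 0), ccvf(y, y, 0)   # both near 2
gxy0 = ccvf(x, y, 0)                      # near 0
gxy1 = ccvf(x, y, 1)                      # near +1
gyx1 = ccvf(y, x, 1)                      # near -1 (this is gamma_xy(-1))
```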
Example: xt = φ1xt-1 + wt where wt ~ independent N(0, σw^2) for t = 1,…, n and |φ1| < 1
Read this example after we discuss “backshift notation” in Section 2.3.
Note that the sum of an infinite geometric series is Σ(i=0 to ∞) a·r^i = a/(1 - r) for |r| < 1.
The time series can be rewritten as xt = Σ(i=0 to ∞) φ1^i wt-i.
Then γxw(h) = E[(wt – μw,t)(xt+h – μx,t+h)]
Remember that μw,t = E(wt) = 0 and μx,t+h = E(xt+h) = 0, and, in general, for two random variables V and W, E[(V – μv)(W – μw)] = E(VW) – E(V)E(W).
So γxw(h) = E[wtxt+h] – E[wt]E[xt+h] = E[wtxt+h] – 0·0 = E[wtxt+h] = E[wt(Σ(i=0 to ∞) φ1^i wt+h-i)]
Since the wt’s are independent with mean 0, we only need to be concerned about when the subscripts of the wt pairs match.
γxw(0) = E[wt(wt + φ1wt-1 + φ1^2wt-2 + …)] = E[wt^2] = σw^2
γxw(1) = E[wt(wt+1 + φ1wt + φ1^2wt-1 + …)] = φ1E[wt^2] = φ1σw^2
γxw(-1) = E[wt(wt-1 + φ1wt-2 + φ1^2wt-3 + …)] = 0
In general, γxw(h) = φ1^h σw^2 for h ≥ 0 and γxw(h) = 0 for h < 0.
Therefore, xt and wt are jointly stationary (note: it can be shown that xt and wt are each stationary; I just did the cross-covariance part).
Since Var(xt) = γx(0) = σw^2/(1 - φ1^2),
ρxw(h) = γxw(h)/sqrt(γx(0)γw(0)) = φ1^h σw^2/sqrt([σw^2/(1 - φ1^2)]σw^2) = φ1^h sqrt(1 - φ1^2) for h ≥ 0 and
ρxw(h) = 0 for h < 0.
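The pattern γxw(h) = φ1^h σw^2 for h ≥ 0 and 0 for h < 0 can also be seen by simulation. This Python sketch is my own (not from the notes), using φ1 = 0.7 and σw = 1:

```python
import random

# Simulate AR(1) x_t = 0.7 x_{t-1} + w_t with w_t ~ ind. N(0,1) and check
# gamma_xw(h) = phi^h * sigma_w^2 for h >= 0 and gamma_xw(h) = 0 for h < 0.
random.seed(5)
phi, n, burn = 0.7, 60_000, 500
w, x = [], []
prev = 0.0
for _ in range(n + burn):
    wt = random.gauss(0.0, 1.0)
    prev = phi * prev + wt
    w.append(wt)
    x.append(prev)
w, x = w[burn:], x[burn:]   # drop burn-in so the series is (roughly) stationary

def ccvf(a, b, h):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((a[t + h] - ma) * (b[t] - mb) for t in range(len(a) - h)) / len(a)

g0 = ccvf(x, w, 0)    # E[w_t x_t]     -> near 1
g1 = ccvf(x, w, 1)    # E[w_t x_{t+1}] -> near 0.7
g2 = ccvf(x, w, 2)    # E[w_t x_{t+2}] -> near 0.49
gneg = ccvf(w, x, 1)  # gamma_xw(-1)   -> near 0
```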
Linear Process
Note that the above examples are special cases of a “linear process”. In general, a linear process can be defined as
xt = μ + Σ(j=-∞ to ∞) ψj wt-j with Σ(j=-∞ to ∞) |ψj| < ∞ and
wt ~ ind. N(0, σw^2).
It can be shown that γ(h) = σw^2 Σ(j=-∞ to ∞) ψj+h ψj for h ≥ 0 and
γ(h) = γ(-h), provided the series is stationary.
PF: Without loss of generality, let μ = 0.
E(xt) = E[Σ(j=-∞ to ∞) ψj wt-j] = Σ(j=-∞ to ∞) ψj E(wt-j) = 0.
Note that γ(h) = Cov(xt, xt+h) = E(xtxt+h) – E(xt)E(xt+h) = E(xtxt+h) since E(xt) = E(xt+h) = 0.
Then E(xtxt+h) = E[(Σ(i) ψi wt-i)(Σ(j) ψj wt+h-j)] = Σ(i) Σ(j) ψi ψj E(wt-i wt+h-j) = σw^2 Σ(j) ψj+h ψj
because E(wt-i wt+h-j) = 0 when -i ≠ h-j
and E(wt-i wt+h-j) = E(wt-i^2) = σw^2 when -i = h-j, i.e., j - i = h.
Therefore, γ(h) = σw^2 Σ(j=-∞ to ∞) ψj+h ψj for h ≥ 0 and γ(-h) = γ(h).
Compare the previous examples to this result!
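As a concrete comparison, the linear-process formula reproduces the moving average autocovariances from earlier. This is my own deterministic Python sketch (not from the notes), with ψ0 = ψ1 = ψ2 = 1/3 and σw^2 = 1:

```python
# gamma(h) = sigma_w^2 * sum_j psi_{j+h} * psi_j, with psi_j = 0 outside the
# listed range; psi is indexed 0, 1, ..., q-1 here (a one-sided linear process).
def linear_process_acvf(psi, sigma2, h):
    h = abs(h)  # gamma(h) = gamma(-h)
    return sigma2 * sum(psi[j + h] * psi[j] for j in range(max(len(psi) - h, 0)))

psi_ma = [1/3, 1/3, 1/3]                   # the 3-point moving average
g0 = linear_process_acvf(psi_ma, 1.0, 0)   # 3/9
g1 = linear_process_acvf(psi_ma, 1.0, 1)   # 2/9
g2 = linear_process_acvf(psi_ma, 1.0, 2)   # 1/9
g3 = linear_process_acvf(psi_ma, 1.0, 3)   # 0

# AR(1) with phi = 0.7: psi_j = phi^j (truncated), so gamma(0) ~ 1/(1 - 0.49)
psi_ar = [0.7 ** j for j in range(200)]
g0_ar = linear_process_acvf(psi_ar, 1.0, 0)
```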
Important!:
There is a very important case when weakly stationary implies strictly stationary. This occurs when the time series has a multivariate normal distribution. Remember that a univariate normal distribution is defined only by its mean and variance. The multivariate normal distribution is defined only by its mean vector and covariance matrix.
Thus, if we can assume the time series has a multivariate normal distribution, we ONLY need to check that it satisfies the weakly stationary requirements in order to say the time series is strictly stationary. In that case, the single word “stationary” is unambiguous.
Example: Visualizing stationarity
Below are a few plots of the observed values of a time series. Identify which plots correspond to a weakly stationary series. There is no program given for this example.
[Several plots of observed time series appear here; they are not reproduced in this text version.]
From examining these plots, why is it important to have stationarity?
1.5 Estimation of Correlation
μ, γ(h), and ρ(h) are usually unknown, so we need to estimate them. In order to do this, we need to assume the time series is weakly stationary.
Sample mean function
By the weakly stationary assumption, E(x1) = μ, E(x2) = μ, …, E(xn) = μ. Thus, a logical estimate of μ is the sample mean x̄ = (1/n) Σ(t=1 to n) xt.
Note that this would not make sense to do if the weak stationarity assumption did not hold!
Sample autocovariance function
Again by the weak stationarity assumption, we only need to worry about the lag difference. The estimated autocovariance function is
γ̂(h) = (1/n) Σ(t=1 to n-h) (xt+h – x̄)(xt – x̄), with γ̂(-h) = γ̂(h) for h = 0, 1, …, n-1.
Notes:
This is similar to the formula often used to estimate the covariance between two random variables x and y: σ̂xy = Σ(i=1 to n) (xi – x̄)(yi – ȳ)/(n – 1). If you do not recognize this formula, look at the numerator of the estimated correlation coefficient.
The sum goes up to n-h to avoid having negative subscripts in the x’s.
This is NOT an unbiased estimate of γ(h)!
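The estimator is easy to compute directly. Below is my own Python version with made-up illustrative data x = 1,…,6 (the course uses R's acf(); the data here are just for demonstration):

```python
# Sample autocovariance gamma-hat(h) = (1/n) sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar).
# Note the divisor n (not n - h or n - 1), matching R's acf(type = "covariance").
def gamma_hat(x, h):
    n = len(x)
    xbar = sum(x) / n
    return sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(n - h)) / n

x = [1, 2, 3, 4, 5, 6]   # illustrative data
g0 = gamma_hat(x, 0)     # 17.5/6 ~= 2.917
g1 = gamma_hat(x, 1)     # 8.75/6 ~= 1.458
rho1 = g1 / g0           # sample ACF at lag 1 = 0.5
```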
Sample autocorrelation function (ACF)
ρ̂(h) = γ̂(h)/γ̂(0)
Question: What does ρ(h) = 0 mean and why would this be important to detect?
Since this is important, we conduct hypothesis tests of ρ(h) = 0 for all h ≠ 0! In order to do the hypothesis test, we need to find the sampling distribution for ρ̂(h) under the null hypothesis of ρ(h) = 0.
Sampling distribution for ρ̂(h): See Shumway and Stoffer for the exact description (understanding the proof in the appendix requires having a course on asymptotics). In summary, if ρ(h) = 0, xt is stationary, and the sample size is “large”, then ρ̂(h) has an approximate normal distribution with mean 0 and standard deviation σρ̂(h) = 1/sqrt(n).
For a hypothesis test, we could check whether ρ̂(h) is within the bounds 0 ± Z(1-α/2)/sqrt(n) or not, where P(Z < Z(1-α/2)) = 1 – α/2 for a standard normal random variable Z. If it is not, then there is sufficient evidence to conclude that ρ(h) ≠ 0. We will be using this result a lot for the rest of this course!
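This test is mechanical to apply. The Python sketch below is my own (not course code); the ρ̂(h) values are the ones reported for the AR(1) simulation example in these notes, where n = 100:

```python
import math

# Approximate 95% test of H0: rho(h) = 0 -- flag the lags where |rho-hat(h)|
# exceeds z_{0.975}/sqrt(n). Values are from the AR(1) example (n = 100).
z_975 = 1.96
rho_hat = [1.000, 0.674, 0.401, 0.169, -0.023, -0.125]  # lags 0, 1, ..., 5

def significant_lags(rho, n, z=z_975):
    bound = z / math.sqrt(n)   # here 1.96/sqrt(100) = 0.196
    return [h for h, r in enumerate(rho) if h > 0 and abs(r) > bound]

sig = significant_lags(rho_hat, n=100)   # lags 1 and 2 exceed the bound
```

Lag 3 (ρ̂(3) = 0.169) falls inside the ±0.196 bounds, so only lags 1 and 2 are flagged.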
Example: xt=0.7xt-1+wt where wt ~ ind. N(0,1) and n=100. (ar1_0.7.R, AR1.0.7.txt)
The data are simulated using the above model and are different from the example earlier in Chapter 1. Also, the data are read in from a file instead of simulated within R.
> ar1<-read.table(file = "C:\\chris\\UNL\\STAT_time_series\\chapter1\\AR1.0.7.txt",
    header = TRUE, sep = "")

> head(ar1)
  t           x
1 1  0.04172680
2 2  0.37190682
3 3 -0.18545185
4 4 -1.38297422
5 5 -2.87593652
6 6 -2.60017605
The plot below is constructed in a similar manner as past plots.
[Plot: the simulated series, xt vs. t, titled “Data simulated from AR(1): xt = 0.7xt-1 + wt where wt~N(0,1)”]
The easiest way to find the autocorrelations in R is to use the acf() function.
> x<-ar1$x
> rho.x<-acf(x = x, type = "correlation", main = expression(paste(
    "Data simulated from AR(1): ", x[t] == 0.7*x[t-1] + w[t], " where ", w[t], "~N(0,1)")))
[Plot: estimated ACF, ρ̂(h) vs. lag h, titled “Data simulated from AR(1): xt = 0.7xt-1 + wt where wt~N(0,1)”]
The horizontal lines on the plot are drawn at 0 ± Z(1-0.05/2)/sqrt(n) = ±1.96/sqrt(100), where Z(1-0.05/2) = 1.96.
> rho.x
Autocorrelations of series 'x', by lag
     0      1      2      3      4      5      6      7
 1.000  0.674  0.401  0.169 -0.023 -0.125 -0.067 -0.064
     8      9     10     11     12     13     14
-0.058  0.005 -0.044 -0.041 -0.017  0.064  0.076
    15     16     17     18     19     20
 0.160  0.191  0.141  0.081  0.006 -0.132
> names(rho.x)
[1] "acf"    "type"   "n.used" "lag"    "series" "snames"
> rho.x$acf
, , 1

              [,1]
 [1,]  1.000000000
 [2,]  0.673671871
 [3,]  0.400891188
 [4,]  0.168552826
 [5,] -0.023391129
 [6,] -0.124632501
 [7,] -0.067392830
 [8,] -0.064248086
 [9,] -0.057717749
[10,]  0.005312358
[11,] -0.044035976
[12,] -0.041121407
[13,] -0.017197132
[14,]  0.063864970
[15,]  0.075575696
[16,]  0.159665692
[17,]  0.191349965
[18,]  0.140967540
[19,]  0.080508273
[20,]  0.005584061
[21,] -0.131559629

> rho.x$acf[1:2]
[1] 1.0000000 0.6736719
Questions:
What happens to the autocorrelations over time? Why do you think this happens?
Is there a positive or negative correlation?
At what lags is ρ(h) ≠ 0?
The autocovariances can also be found using acf().

> acf(x = x, type = "covariance", main = expression(paste(
    "Data simulated from AR(1): ", x[t] == 0.7*x[t-1] + w[t], " where ", w[t], "~N(0,1)")))
[Plot: estimated autocovariance function, γ̂(h) vs. lag h, titled “Data simulated from AR(1): xt = 0.7xt-1 + wt where wt~N(0,1)”]
To help understand autocorrelations and their relationship with the correlation coefficient better, I decided to look at the “usual” estimated Pearson correlations between xt, xt-1, xt-2, and xt-3.

> x.ts<-ts(x)
> set1<-ts.intersect(x.ts, x.ts1 = lag(x = x.ts, k = -1), x.ts2 = lag(x = x.ts, k = -2),
    x.ts3 = lag(x = x.ts, k = -3))
> set1
Time Series:
Start = 4
End = 100
Frequency = 1
          x.ts       x.ts1       x.ts2       x.ts3
4  -1.38297422 -0.18545185  0.37190682  0.04172680
5  -2.87593652 -1.38297422 -0.18545185  0.37190682
6  -2.60017605 -2.87593652 -1.38297422 -0.18545185
7  -1.10401719 -2.60017605 -2.87593652 -1.38297422
8  -0.46385116 -1.10401719 -2.60017605 -2.87593652
9   0.80339069 -0.46385116 -1.10401719 -2.60017605

Output edited

97  -1.36418639 -1.63175408 -1.56008530 -0.40824385
98  -0.37209392 -1.36418639 -1.63175408 -1.56008530
99  -0.65833401 -0.37209392 -1.36418639 -1.63175408
100  2.03705932 -0.65833401 -0.37209392 -1.36418639

> cor(set1)
           x.ts     x.ts1     x.ts2     x.ts3
x.ts  1.0000000 0.6824913 0.4065326 0.1710145
x.ts1 0.6824913 1.0000000 0.6929638 0.4108375
x.ts2 0.4065326 0.6929638 1.0000000 0.6935801
x.ts3 0.1710145 0.4108375 0.6935801 1.0000000

> library(car)  # scatterplot.matrix is in this package - may need to install first
> scatterplot.matrix(formula = ~x.ts + x.ts1 + x.ts2 + x.ts3, data = set1,
    reg.line = lm, smooth = TRUE, span = 0.5, diagonal = 'histogram')
[Scatter plot matrix of x.ts, x.ts1, x.ts2, and x.ts3 with histograms on the diagonal]
> set2<-ts.intersect(x.ts, x.ts1 = lag(x = x.ts, k = 1), x.ts2 = lag(x = x.ts, k = 2),
    x.ts3 = lag(x = x.ts, k = 3))
> set2
Time Series:
Start = 1
End = 97
Frequency = 1
         x.ts       x.ts1       x.ts2       x.ts3
1  0.04172680  0.37190682 -0.18545185 -1.38297422
2  0.37190682 -0.18545185 -1.38297422 -2.87593652
3 -0.18545185 -1.38297422 -2.87593652 -2.60017605
4 -1.38297422 -2.87593652 -2.60017605 -1.10401719
5 -2.87593652 -2.60017605 -1.10401719 -0.46385116
Output edited
94 -0.40824385 -1.56008530 -1.63175408 -1.36418639
95 -1.56008530 -1.63175408 -1.36418639 -0.37209392
96 -1.63175408 -1.36418639 -0.37209392 -0.65833401
97 -1.36418639 -0.37209392 -0.65833401  2.03705932

> cor(set2)
           x.ts     x.ts1     x.ts2     x.ts3
x.ts  1.0000000 0.6935801 0.4108375 0.1710145
x.ts1 0.6935801 1.0000000 0.6929638 0.4065326
x.ts2 0.4108375 0.6929638 1.0000000 0.6824913
x.ts3 0.1710145 0.4065326 0.6824913 1.0000000
The ts() function converts the time series data to an object that R recognizes as a time series.
The lag function is used to find xt-1, xt-2, and xt-3. The k option specifies how many time periods to go back. Run just lag(x.ts, k = -1) and lag(x.ts, k = 1) to see what happens. To get everything lined up as I wanted with ts.intersect(), I chose to use k = -1. Page 388 of Venables and Ripley (2002) has a good quote about the lag() function:
The function lag shifts the time axis of a series back by k positions … This can cause confusion, as most people think of lags as shifting time and not the series; that is, the current value of a series lagged by one year is last year’s, not next year’s.
Run the lag() function by itself and check out the “Start” index.
The ts.intersect() function finds the intersection of the four different “variables”.
The cor() function finds the Pearson correlations between all variable pairs. Notice how close these correlations are to the autocorrelations!
The scatterplot.matrix() function finds a scatter plot matrix. The function is in the car package (see Fox’s “An R and S-Plus Companion to Applied Regression” book from STAT 870).
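The close match between the lagged Pearson correlations and the ACF can be demonstrated outside R as well. The sketch below is my own Python simulation (not the course data), using the same AR(1) model with φ1 = 0.7:

```python
import random

# Simulate an AR(1) series with phi = 0.7 and compare the "usual" Pearson
# correlation of the overlapping pairs (x_t, x_{t-1}) with rho-hat(1).
random.seed(11)
n = 5_000
x = [random.gauss(0.0, 1.0)]
for _ in range(n - 1):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    da = sum((u - ma) ** 2 for u in a) ** 0.5
    db = sum((v - mb) ** 2 for v in b) ** 0.5
    return num / (da * db)

def rho_hat(x, h):
    m = sum(x) / len(x)
    g = lambda k: sum((x[t + k] - m) * (x[t] - m) for t in range(len(x) - k)) / len(x)
    return g(h) / g(0)

r_pairs = pearson(x[1:], x[:-1])  # cor(x_t, x_{t-1}) over the n-1 pairs
r_acf = rho_hat(x, 1)             # the lag-1 sample ACF
```

The two estimates are not identical (the ACF uses the full-series mean and divisor n), but for a long series they agree closely, just as cor(set1) nearly matched rho.x above.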
Example: OSU enrollment data (osu_enroll.R,osu_enroll.xls)
The code used to find the autocorrelations is:
> x<-osu.enroll$Enrollment
> rho.x<-acf(x = x, type = "correlation", main = "OSU Enrollment series")
> rho.x

Autocorrelations of series 'x', by lag

     0      1      2      3      4      5      6      7      8      9     10
 1.000 -0.470 -0.425  0.909 -0.438 -0.395  0.822 -0.403 -0.358  0.739 -0.367
    11     12     13     14     15     16
-0.327  0.655 -0.337 -0.297  0.581 -0.309

> rho.x$acf[1:9]
[1]  1.0000000 -0.4702315 -0.4253427  0.9087421 -0.4377336 -0.3946048  0.8224660
[8] -0.4025871 -0.3584216
[Plot: estimated ACF for the OSU enrollment series]
Notes:
There are some large autocorrelations. This is a characteristic of a nonstationary series. We will examine this more later.
Since the series is not stationary, the hypothesis test for ρ(h) = 0 should not be done here using the methods discussed earlier.
There is a pattern among the autocorrelations. What does this correspond to?
Sample cross-covariance function:
γ̂xy(h) = (1/n) Σ(t=1 to n-h) (xt+h – x̄)(yt – ȳ) for h = 0, 1, 2, …
Note that γ̂xy(-h) = γ̂yx(h); however, γ̂xy(h) is not necessarily equal to γ̂xy(-h).
Sample cross-correlation function:
ρ̂xy(h) = γ̂xy(h)/sqrt(γ̂x(0)γ̂y(0))
Sampling distribution for ρ̂xy(h): ρ̂xy(h) has an approximate normal distribution with mean 0 and standard deviation of σρ̂xy = 1/sqrt(n) if the sample size is large and at least one of the series is white noise.
For a hypothesis test, we could check whether ρ̂xy(h) is within the bounds 0 ± Z(1-α/2)/sqrt(n) or not. If it is not, then there is sufficient evidence to conclude that ρxy(h) ≠ 0.
To find the CCF in R, use the acf() or ccf() functions.
Example: Simple CCF example (simple_CCF_example.xls, simple_CCF_example.R)
The above Excel file shows how some of the “by-hand” calculations of the cross-covariance function can be done. Below is part of the resulting spreadsheet.
                       h=0 term            h=1 term
  t   xt   yt   (xt – x̄)(yt – ȳ)   (xt+1 – x̄)(yt – ȳ)
  1    1    2        8.75               5.25
  2    2    3        3.75               1.25
  3    3    5        0.25              -0.25
  4    4    6        0.25               0.75
  5    5    8        3.75               6.25
  6    6    9        8.75              No x7
Mean  3.5  5.5  Sum: 25.50             13.25
The estimated cross-covariance function is
γ̂xy(h) = (1/n) Σ(t=1 to n-h) (xt+h – x̄)(yt – ȳ).
Then,
γ̂xy(0) = 25.50/6 = 4.25 and γ̂xy(1) = 13.25/6 = 2.2083.
The cross-correlation function is ρ̂xy(h) = γ̂xy(h)/sqrt(γ̂x(0)γ̂y(0)).
Then,
ρ̂xy(0) = 4.25/sqrt(2.917×6.25) = 0.995 and ρ̂xy(1) = 2.2083/sqrt(2.917×6.25) = 0.517,
where γ̂x(0) = 2.917 and γ̂y(0) = 6.25 were found in R.
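The spreadsheet arithmetic can be reproduced in a few lines. This is my own Python sketch (the notes do the calculation in Excel and R):

```python
# Verify the by-hand cross-covariance/cross-correlation calculations:
# gamma-hat_xy(0) = 25.50/6 = 4.25, gamma-hat_xy(1) = 13.25/6 ~= 2.2083,
# rho-hat_xy(0) ~= 0.995, rho-hat_xy(1) ~= 0.517.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 6, 8, 9]

def ccvf(a, b, h):
    # (1/n) sum_{t=1}^{n-h} (a_{t+h} - abar)(b_t - bbar) for h >= 0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((a[t + h] - ma) * (b[t] - mb) for t in range(len(a) - h)) / len(a)

g_xy0 = ccvf(x, y, 0)                           # 4.25
g_xy1 = ccvf(x, y, 1)                           # 13.25/6
denom = (ccvf(x, x, 0) * ccvf(y, y, 0)) ** 0.5  # sqrt(2.917 * 6.25)
r_xy0, r_xy1 = g_xy0 / denom, g_xy1 / denom     # 0.995, 0.517
```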
Below is the R code and output:
> x<-c(1, 2, 3, 4, 5, 6)
> y<-c(2, 3, 5, 6, 8, 9)

> gamma.x<-acf(x = x, type = "covariance", plot = FALSE)
> gamma.x
Autocovariances of series 'x', by lag
0 1 2 3 4 5 2.917 1.458 0.167 -0.792 -1.250 -1.042
> gamma.y<-acf(x = y, type = "covariance", plot = FALSE)> gamma.y
Autocovariances of series 'y', by lag
0 1 2 3 4 5 6.250 3.292 0.167 -1.625 -2.917 -2.042
> x.y.acf<-acf(x = cbind(x,y), type = "correlation")
> x.y.acf
Autocorrelations of series 'cbind(x, y)', by lag
, , x
x y 1.000 ( 0) 0.995 ( 0)
0.500 ( 1) 0.517 (-1) 0.057 ( 2) 0.039 (-2) -0.271 ( 3) -0.263 (-3) -0.429 ( 4) -0.449 (-4)
, , y
x y 0.995 ( 0) 1.000 ( 0) 0.517 ( 1) 0.527 ( 1) 0.039 ( 2) 0.027 ( 2) -0.263 ( 3) -0.260 ( 3) -0.449 ( 4) -0.467 ( 4)
[Figure: 2×2 panel of sample ACFs for x and y (diagonal) and sample CCFs “x & y” and “y & x” (off-diagonal)]
> x.y.ccf<-ccf(x = x, y = y, type = "correlation")
> x.y.ccf
Autocorrelations of series 'X', by lag
-4 -3 -2 -1 0 1 2 3 4 -0.449 -0.263 0.039 0.517 0.995 0.517 0.039 -0.263 -0.449
[Plot: sample CCF of x and y from ccf(), labeled “x & y”]
Notes:
The c() function was used to enter data into an R vector.
R gives the CCF for “x & y” and “y & x”. The “x & y” part gives ρ̂xy(h) for h ≥ 0 and the “y & x” part gives ρ̂yx(h) for h ≥ 0. Note that the R labeling may be confusing!!! R is using the relationship ρ̂xy(-h) = ρ̂yx(h) in its notation.
Example: Reproduce results from Example 1.25 of Shumway and Stoffer (ex1.25.R)
> #How to read in data
> #Method #1 - obtain from tsa3.rda file
> load("C:\\chris\\unl\\STAT_time_series\\TSA_3rd_edition\\tsa3.rda")

> #Method #2 - I have ASCII text files from their previous book editions
> soi<-read.table(file = "C:\\chris\\UNL\\STAT_time_series\\Shumway_Stoffer_web_info\\Data\\soi.dat",
    header = FALSE, col.names = "soi.var")
> rec<-read.table(file = "C:\\chris\\UNL\\STAT_time_series\\Shumway_Stoffer_web_info\\Data\\recruit.dat",
    header = FALSE, col.names = "rec.var")
> head(soi)
  soi.var
1   0.377
2   0.246
3   0.311
4   0.104
5  -0.016
6   0.235
> head(rec)
  rec.var
1   68.63
2   68.63
3   68.63
4   68.63
5   68.63
6   68.63

> soi.rec.acf<-acf(x = cbind(soi,rec), type = "correlation", lag.max = 50)
> soi.rec.acf
Autocorrelations of series 'cbind(soi, rec)', by lag
, , soi.var
soi.var rec.var 1.000 ( 0) 0.025 ( 0) 0.604 ( 1) 0.011 ( -1) 0.374 ( 2) -0.042 ( -2) 0.214 ( 3) -0.146 ( -3) 0.050 ( 4) -0.297 ( -4)
Output edited

, , rec.var
soi.var rec.var 0.025 ( 0) 1.000 ( 0) -0.013 ( 1) 0.922 ( 1) -0.086 ( 2) 0.783 ( 2) -0.154 ( 3) 0.627 ( 3) -0.228 ( 4) 0.477 ( 4)
Output edited
[Figure: 2×2 panel of sample ACFs for soi.var and rec.var (diagonal) and their sample CCFs (off-diagonal)]
> ccf(x = soi, y = rec, type = "correlation", main = "Part of Figure 1.14", lag = 50)
[Plot: sample CCF of SOI and Recruitment, titled “Part of Figure 1.14”]
Notes:
The “soi.var & rec.var” and “rec.var & soi.var” plots are not exactly the same.
R gives the CCF for “soi.var & rec.var” and “rec.var & soi.var”. The “soi.var & rec.var” part gives ρ̂soi,rec(h) for h ≥ 0 and the “rec.var & soi.var” part gives ρ̂rec,soi(h) for h ≥ 0. Note that the R labeling may be confusing!!! R is using the relationship ρ̂soi,rec(-h) = ρ̂rec,soi(h) in its notation.
Examine the strength of association for the two series individually and together.
1.6 Vector-Valued and Multidimensional Series
Since several different time series often occur at the same time points, it can be useful to consider a vector of time series data.
Let xt = (xt1, xt2, …, xtp)′ be a p×1 vector time series. Note that this column vector could also be represented through its transpose:
xt′ = (xt1, xt2, …, xtp)
A vector is represented as a bold letter. Note that xt1 represents the first time series variable at time t, …, xtp represents the pth time series variable at time t.
For the stationary case,
μ = E(xt), where μ = (μ1, μ2, …, μp)′ is the mean vector, and
Γ(h) = E[(xt+h - μ)(xt - μ)′] is the autocovariance matrix.
The autocovariance matrix is similar to the covariance matrix discussed in STAT 870. Elements of this matrix are
where γij(h) = E[(xt+h,i - μi)(xt,j - μj)]
Notes:
For example, γ12(h) = Cov(xt+h,1, xt,2).
Remember that a covariance matrix is symmetric.
Γ(-h) = Γ(h)′ since γij(h) = γji(-h).
Sample autocovariance matrix:
Γ̂(h) = (1/n) Σ(t=1 to n-h) (xt+h - x̄)(xt - x̄)′, where x̄ = (1/n) Σ(t=1 to n) xt.
Note that Γ̂(-h) = Γ̂(h)′.
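A direct computation of the sample autocovariance matrix for a small bivariate series can make the definition concrete. This Python sketch is my own (not course code); the two component series are borrowed from the simple CCF example earlier:

```python
# Sample autocovariance matrix Gamma-hat(h) for a p-dimensional series stored
# as a list of p-tuples; entry (i, j) is
# (1/n) sum_{t=1}^{n-h} (x_{t+h,i} - xbar_i)(x_{t,j} - xbar_j).
def gamma_hat_matrix(series, h):
    n, p = len(series), len(series[0])
    xbar = [sum(row[i] for row in series) / n for i in range(p)]
    return [[sum((series[t + h][i] - xbar[i]) * (series[t][j] - xbar[j])
                 for t in range(n - h)) / n
             for j in range(p)]
            for i in range(p)]

# Bivariate series built from the simple CCF example data
series = list(zip([1, 2, 3, 4, 5, 6], [2, 3, 5, 6, 8, 9]))
G0 = gamma_hat_matrix(series, 0)  # symmetric; off-diagonal = gamma-hat_xy(0) = 4.25
G1 = gamma_hat_matrix(series, 1)  # G1[0][1] = gamma-hat_xy(1) = 13.25/6
```

The lag-0 matrix is symmetric, and the off-diagonal entries at lag h reproduce the sample cross-covariances computed earlier.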
See Shumway and Stoffer’s discussion on multidimensional processes. In this case, the time series is indexed by something in addition to time! For example, this can happen in spatial statistics.