1. Characteristics of Time Series

1.1 Introduction
We are going to examine data that have been observed over time. Typically, there is correlation between the observations, which limits our ability to use “conventional” statistical analysis methods.
Remember that many statistical applications rely on having observations that are independent.
In this class, we are going to learn how to identify this correlation and use it to help construct models. The models are then used to forecast “future” observations. These types of analyses fall under the title “time series analysis”. For most of our course, we will focus on modeling ONE series of observations without any explanatory variables. A few sections in Chapter 5 are exceptions, where we will incorporate explanatory variables.
Read Shumway and Stoffer’s introduction to the methods discussed in this book.
2011 Christopher R. Bilder
1.2 The Nature of Time Series Data
Example: OSU enrollment data (osu_enroll.R, osu_enroll.xls)
Partial listing of the data:
 t  Semester  Year  Enrollment
 1  Fall      1989  20,110
 2  Spring    1990  19,128
 3  Summer    1990   7,553
 4  Fall      1990  19,591
 5  Spring    1991  18,361
 6  Summer    1991   6,702
 ...
32  Spring    2000  19,835
33  Summer    2000   7,202
34  Fall      2000  21,252
35  Spring    2001  20,004
36  Summer    2001   7,558
37  Fall      2001  21,872
38  Spring    2002  20,992
39  Summer    2002   7,868
40  Fall      2002  22,992
> library(RODBC)
> z <- odbcConnectExcel("C:\\chris\\UNL\\STAT_time_series\\chapter1\\osu_enroll.xls")
> osu.enroll <- sqlFetch(z, "Sheet1")
> close(z)
> head(osu.enroll)
  t Semester Year Enrollment       date
1 1     Fall 1989      20110 1989-08-31
2 2   Spring 1990      19128 1990-02-01
3 3   Summer 1990       7553 1990-06-01
4 4     Fall 1990      19591 1990-08-31
5 5   Spring 1991      18361 1991-02-01
6 6   Summer 1991       6702 1991-06-01
> tail(osu.enroll)
    t Semester Year Enrollment       date
35 35   Spring 2001      20004 2001-02-01
36 36   Summer 2001       7558 2001-06-01
37 37     Fall 2001      21872 2001-08-31
38 38   Spring 2002      20922 2002-02-01
39 39   Summer 2002       7868 2002-06-01
40 40     Fall 2002      22992 2002-08-31
> #One way to do plot
> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = osu.enroll$Enrollment, ylab = "OSU Enrollment", xlab = "t (time)",
    type = "l", col = "red", main = "OSU Enrollment from Fall 1989 to Fall 2002",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = osu.enroll$Enrollment, pch = 20, col = "blue")
[Plot: OSU Enrollment from Fall 1989 to Fall 2002; x-axis: t (time), y-axis: OSU Enrollment]
We will often use “t” to represent time so that we can say x1 = 20,110, x2 = 19,128, …, x40 = 22,992.
When only “x” is specified in the plot() function, R puts this on the y-axis and uses the observation number on the x-axis. Compare this to the next plot below where both “x” and “y” options are specified.

> #More complicated plot
> plot(y = osu.enroll[osu.enroll$Semester == "Fall",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Fall",]$t, ylab = "OSU Enrollment",
    xlab = "t (time)", col = "blue",
    main = "OSU Enrollment from Fall 1989 to Fall 2002",
    panel.first = grid(col = "gray", lty = "dotted"), pch = 1, type = "o",
    ylim = c(0, max(osu.enroll$Enrollment)))
> lines(y = osu.enroll[osu.enroll$Semester == "Spring",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Spring",]$t, col = "red",
    type = "o", pch = 2)
> lines(y = osu.enroll[osu.enroll$Semester == "Summer",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Summer",]$t, col = "darkgreen",
    type = "o", pch = 3)
> legend(x = locator(1), legend = c("Fall", "Spring", "Summer"), pch = c(1,2,3),
    lty = c(1,1,1), col = c("blue", "red", "darkgreen"), bty = "n")
[Plot: OSU Enrollment from Fall 1989 to Fall 2002, one line per semester; x-axis: t (time), y-axis: OSU Enrollment; legend: Fall, Spring, Summer]
> #Another way to do plot with actual dates
> plot(y = osu.enroll$Enrollment, x = as.Date(osu.enroll$date), xlab = "Time",
    type = "l", col = "red", main = "OSU Enrollment from Fall 1989 to Fall 2002",
    ylab = "OSU Enrollment")
> points(y = osu.enroll$Enrollment, x = as.Date(osu.enroll$date), pch = 20,
    col = "blue")
> #Create own gridlines
> abline(v = as.Date(c("1990/1/1", "1992/1/1", "1994/1/1", "1996/1/1",
    "1998/1/1", "2000/1/1", "2002/1/1")), lty = "dotted", col = "lightgray")
> abline(h = c(10000, 15000, 20000), lty = "dotted", col = "lightgray")
> #There may be better ways to work with actual dates.
[Plot: OSU Enrollment from Fall 1989 to Fall 2002; x-axis: Time (1990 to 2002), y-axis: OSU Enrollment]
Questions of interest:
1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future enrollment be predicted using this data?
4) Most of the time, we will only use past values in the series to predict future values. However, in this case, what explanatory variables (independent variables, covariates) may be useful to use to predict enrollment?
5) Why is modeling enrollment and predicting future enrollment important?
5-8-01 O’Collegian article: “$1.8 million loss attributed to slight enrollment decline”
Example: Russell 3000 Index (russell_3000.R, russell.xls)
Source: www.russell.com.
The index “measures the performance of the 3,000 largest United States companies based on total market capitalization, which represents approximately 98% of the investable United States equity market.”
> library(RODBC)
> z <- odbcConnectExcel("C:\\chris\\UNL\\STAT_time_series\\chapter1\\russell.xls")
> russell <- sqlFetch(z, "Sheet1")
> close(z)
> head(russell)
           Index Name       Date Value Without Dividends Value With Dividends
1 Russell 3000® Index 1995-06-01                  555.15              1034.42
2 Russell 3000® Index 1995-06-02                  555.15              1034.56
3 Russell 3000® Index 1995-06-05                  558.72              1041.21
4 Russell 3000® Index 1995-06-06                  558.50              1041.04
5 Russell 3000® Index 1995-06-07                  556.45              1037.21
6 Russell 3000® Index 1995-06-08                  555.83              1036.18
> tail(russell)
             Index Name       Date Value Without Dividends Value With Dividends
674 Russell 3000® Index 1997-12-23                  965.71              1891.83
675 Russell 3000® Index 1997-12-24                  960.20              1881.05
676 Russell 3000® Index 1997-12-26                  963.43              1887.42
677 Russell 3000® Index 1997-12-29                  979.26              1919.07
678 Russell 3000® Index 1997-12-30                  996.66              1953.32
679 Russell 3000® Index 1997-12-31                  998.26              1956.51
> #One way to do plot
> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = russell$"Value Without Dividends", ylab = "Russell 3000 Index",
    xlab = "t (time)", type = "l", col = "red",
    main = "Russell 3000 Index from 6/1/1995 to 12/31/1997",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = russell$"Value Without Dividends", pch = 20, col = "blue")
[Plot: Russell 3000 Index from 6/1/1995 to 12/31/1997; x-axis: t (time), y-axis: Russell 3000 Index]
> #Another way to do plot with actual dates
> plot(y = russell$"Value Without Dividends", x = as.Date(russell$Date),
    xlab = "Time", type = "l", col = "red",
    main = "Russell 3000 Index from 6/1/1995 to 12/31/1997",
    ylab = "Russell 3000 Index", xaxt = "n")
> axis.Date(side = 1, at = seq(from = as.Date("1995/6/1"),
    to = as.Date("1997/12/31"), by = "months"),
    labels = format(x = seq(from = as.Date("1995/6/1"),
    to = as.Date("1997/12/31"), by = "months"), format = "%b%y"),
    las = 2) #las changes orientation of labels
> points(y = russell$"Value Without Dividends", x = as.Date(russell$Date),
    pch = 20, col = "blue")
> #Create own gridlines
> abline(v = as.Date(c("1995/7/1", "1996/1/1", "1996/7/1", "1997/1/1",
    "1997/7/1")), lty = "dotted", col = "lightgray")
> abline(h = seq(from = 600, to = 1000, by = 100), lty = "dotted",
    col = "lightgray")
[Plot: Russell 3000 Index from 6/1/1995 to 12/31/1997; x-axis: Time with monthly labels Jun95 through Dec97, y-axis: Russell 3000 Index]

Questions of interest:
1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future index values be predicted using this data?
4) Why would modeling the Russell 3000 Index and predicting future values be important?
Example: Sunspots (sunspots.R, sunspots.csv)
Number of sunspots per year on the sun from 1784-1983.
> sunspots.data <- read.table(file =
    "C:\\chris\\UNL\\STAT_time_series\\chapter1\\sunspots.csv",
    header = TRUE, sep = ",")
> head(sunspots.data)
  Year Sunspots
1 1784     10.2
2 1785     24.1
3 1786     82.9
4 1787    132.0
5 1788    130.9
6 1789    118.1
> tail(sunspots.data)
    Year Sunspots
195 1978    92.50
196 1979   155.40
197 1980    32.27
198 1981    54.25
199 1982    59.65
200 1983    63.62

> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = sunspots.data$Sunspots, ylab = "Number of sunspots", xlab = "t (time)",
    type = "l", col = "red", main = "Sunspots per year from 1784 to 1983",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = sunspots.data$Sunspots, pch = 20, col = "blue")
[Plot: Sunspots per year from 1784 to 1983; x-axis: t (time), y-axis: Number of sunspots]
> plot(y = sunspots.data$Sunspots, x = sunspots.data$Year,
    ylab = "Number of sunspots", xlab = "Year", type = "l", col = "red",
    main = "Sunspots per year from 1784 to 1983",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(y = sunspots.data$Sunspots, x = sunspots.data$Year, pch = 20,
    col = "blue")
[Plot: Sunspots per year from 1784 to 1983; x-axis: Year, y-axis: Number of sunspots]
> #Convert to an object of class "ts"
> x <- ts(sunspots.data$Sunspots, start = 1784, frequency = 1)
> class(x)
[1] "ts"
> class(sunspots.data$Sunspots)
[1] "numeric"
> x
Time Series:
Start = 1784
End = 1983
Frequency = 1
  [1]  10.20  24.10  82.90 132.00 130.90 118.10  89.90  66.60  60.00  46.90  41.00  21.30  16.00   6.40   4.10
 [16]   6.80  14.50  34.00  45.00  43.10  47.50  42.20  28.10  10.10   8.10   2.50   0.00   1.40   5.00  12.20
<OUTPUT EDITED>
[196] 155.40  32.27  54.25  59.65  63.62

> plot.ts(x = x, ylab = expression(paste(x[t], " (Number of sunspots)")),
    xlab = "t (year)", type = "o", col = "red",
    main = "Sunspots per year from 1784 to 1983")
[Plot: Sunspots per year from 1784 to 1983; x-axis: t (year), y-axis: xt (Number of sunspots)]
Notes:
- The sunspot values are not necessarily integers.
- Every object in R has a class. For time series data, it is sometimes useful to use a “ts” class type with it.
Questions of interest:
1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future sunspots be predicted using this data?
4) Why would modeling the number of sunspots and predicting future values be important?
See Shumway and Stoffer for more examples!
1.3 Time Series Statistical Models
Stochastic process – a collection of random variables {Xt} indexed by t
Time series – collection of random variables indexed according to the order they are obtained in time.
Let Xt be the random variable at time t
Then
X1 = random variable at time 1
X2 = random variable at time 2
A realization of the stochastic process is the observed values
The observed values are denoted by x1, x2, … .
Notice that lowercase letters are used to denote the observed value of the random variables.
NOTE: Shumway and Stoffer say the following:
Because it will be clear from the context of our discussions, we will use the term time series whether we are referring to the process or to a particular realization and make no notational distinction between the two concepts.
What does this mean? There will be no notational differentiation made between the random variables and their observed values. Shumway and Stoffer will typically use a lowercase letter – xt.
Example: White noise (white_noise.R)
The simplest kind of time series is a collection of independent and identically distributed random variables with mean 0 and constant variance.
This can be written as wt ~ independent (0, σw²) for t = 1, …, n.
Most often, the probability distribution is assumed to be a normal probability distribution.
This can be written as wt ~ independent N(0, σw²) for t = 1, …, n.
What does this mean?
- Each wt has a normal distribution with mean of 0 and a constant variance.
- w1, w2, …, wn are independent of each other.

Given this set up, answer the following questions:
- What patterns are there over time (t)?
- How can the correlation between observations be used to help model the data?
How can we “simulate” a white noise process using R?
Since each random variable is independent, we can simulate 100 observations from a normal distribution. I am going to use σw = 1 here.
> set.seed(8128)
> w <- rnorm(n = 100, mean = 0, sd = 1)
> head(w)
[1] -0.10528941  0.25548490  0.82065388  0.04070997 -0.66722880 -1.54502793

> #Using plot.ts() which is set up for time series plots
> win.graph(width = 6, height = 6, pointsize = 10)
> plot.ts(x = w, ylab = expression(w[t]), xlab = "t", type = "o", col = "red",
    main = expression(paste("White noise where ", w[t], " ~ ind. N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"))
[Plot: White noise where wt ~ ind. N(0, 1); x-axis: t, y-axis: wt]
> #Advantage of second plot is separate control over color of points
> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l", col = "red",
    main = expression(paste("White noise where ", w[t], " ~ ind. N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w, pch = 20, col = "blue")
[Plot: White noise where wt ~ ind. N(0, 1); x-axis: t, y-axis: wt]
Given this data set, answer the following questions:
- What patterns are there over time (t)?
- How can the correlation between observations be used to help model the data?
Suppose another white noise process is simulated. To create a plot overlaying the two time series, use the code below.
> set.seed(1298)
> w.new <- rnorm(n = 100, mean = 0, sd = 1)
> head(w.new)
[1]  1.08820292 -1.46217413 -1.10887422  0.55156914  0.70582813  0.05079594

> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l", col = "red",
    main = expression(paste("White noise where ", w[t], " ~ ind. N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"),
    ylim = c(min(w.new, w), max(w.new, w)))
> points(x = w, pch = 20, col = "blue")
> lines(x = w.new, col = "green")
> points(x = w.new, pch = 20, col = "orange")
> legend(x = locator(1), legend = c("Time series 1", "Time series 2"),
    lty = c(1,1), col = c("red", "green"), bty = "n")
[Plot: White noise where wt ~ ind. N(0, 1), two series overlaid; x-axis: t, y-axis: wt; legend: Time series 1, Time series 2]
> win.graph(width = 8, height = 6, pointsize = 10)
> par(mfrow = c(2,1))
> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l", col = c("red"),
    main = expression(paste("White noise where ", w[t], "~N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w, pch = 20, col = "blue")
> plot(x = w.new, ylab = expression(w.new[t]), xlab = "t", type = "l",
    col = c("green"),
    main = expression(paste("White noise where ", w[t], " ~ ind. N(0, 1)")),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w.new, pch = 20, col = "orange")
[Plot: two panels of white noise where wt ~ ind. N(0, 1); x-axis: t, y-axes: wt and w.newt]
Example: Moving average of white noise (moving_average.R)
The previous time series had no correlation between the observations. One way to induce correlation is to create a “moving average” of the observations. This will have an effect of “smoothing” the series.
Let mt = (wt + wt-1 + wt-2)/3

Note: This is different from the example given in Shumway and Stoffer p. 13, where they find (wt+1 + wt + wt-1)/3.
This can be done in R using the following code:
> set.seed(8128)
> w <- rnorm(n = 100, mean = 0, sd = 1)
> head(w)
[1] -0.10528941  0.25548490  0.82065388  0.04070997 -0.66722880 -1.54502793

> m <- filter(x = w, filter = rep(x = 1/3, times = 3), method = "convolution",
    sides = 1)
> head(m)
[1]          NA          NA  0.32361646  0.37228292  0.06471168 -0.72384892
> tail(m)
[1]  0.3158762 -0.1803096  0.2598066 -0.6450531 -0.5879723 -0.9120182
> (w[1]+w[2]+w[3])/3
[1] 0.3236165
> (w[98]+w[99]+w[100])/3
[1] -0.9120182
> #This is what the book does
> #m<-filter(x = w, filter = rep(x = 1/3, times = 3), method = "convolution",
>   # sides = 2)
> par(mfrow = c(1,1))
> plot(x = m, ylab = expression(m[t]), xlab = "t", type = "l", col = c("brown"),
    lwd = 1, main = expression(paste("Moving average where ",
    m[t] == (w[t] + w[t-1] + w[t-2])/3)),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = m, pch = 20, col = "orange")
> #NOTE: The gridlines are not located in the correct locations
[Plot: Moving average where mt = (wt + wt-1 + wt-2)/3; x-axis: t, y-axis: mt]
Comparing mt to wt:

> par(mfrow = c(1,1))
> plot(x = m, ylab = expression(paste(m[t], " or ", w[t])), xlab = "t",
    type = "l", col = c("brown"), lwd = 4, ylim = c(max(w), min(w)),
    main = expression(paste("Moving average where ",
    m[t] == (w[t] + w[t-1] + w[t-2])/3)),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = m, pch = 20, col = "orange")
> lines(x = w, col = "red", lty = "dotted")
> points(x = w, pch = 20, col = "blue")
> legend(x = locator(1), legend = c("Moving average", "White noise"),
    lty = c("solid", "dotted"), col = c("brown", "red"), lwd = c(4,1),
    bty = "n")
[Plot: Moving average where mt = (wt + wt-1 + wt-2)/3 overlaid with the white noise series; x-axis: t, y-axis: mt or wt; legend: Moving average, White noise]
Given these observed values of mt, answer the following questions:
- What patterns are there over time (t)?
- How can the correlation between observations be used to help model the data?
The plot below shows a 7-point moving average (see program for code).
[Plot: Compare moving averages; x-axis: t, y-axis: mt or wt; legend: 3-pt Moving average, White noise, 7-pt Moving average]
Example: Autoregressions (ar1.R)
An “autoregression” model uses past observations to predict future observations in a regression model.
Suppose the autoregression model is

xt = 0.7xt-1 + wt where wt ~ independent N(0,1) for t = 1, …, n.
Notice how similar this is to a regression model from STAT 870! Since there is one past period on the right hand side, this is often denoted as an AR(1) model where AR stands for “autoregressive”.
Therefore,
x2 = 0.7x1 + w2
x3 = 0.7x2 + w3
Obviously, there will be a correlation between the random variables.
Below is one way in R to simulate observations from this model.
> set.seed(6381) #Different seed from white_noise.R
> w <- rnorm(n = 200, mean = 0, sd = 1)
> head(w)
[1]  0.06737166 -0.68095839  0.78930605  0.60049855 -1.21297680 -1.14082872

> #######################################################
> # autoregression
> #Simple way to simulate AR(1) data
> x <- numeric(length = 200)
> x.1 <- 0
> for (i in 1:length(x)) {
    x[i] <- 0.7*x.1 + w[i]
    x.1 <- x[i]
  }
> head(cbind(x, w))
               x           w
[1,]  0.06737166  0.06737166
[2,] -0.63379823 -0.68095839
[3,]  0.34564730  0.78930605
[4,]  0.84245166  0.60049855
[5,] -0.62326064 -1.21297680
[6,] -1.57711117 -1.14082872

> #Do not use first 100
> x <- x[101:200]
> win.graph(width = 8, height = 6, pointsize = 10) # Opens up wider plot window
    # than the default (good for time series plots)
> plot(x = x, ylab = expression(x[t]), xlab = "t", type = "l", col = c("red"),
    lwd = 1, main = expression(paste("AR(1): ", x[t] == 0.7*x[t-1] + w[t])),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = x, pch = 20, col = "blue")
[Plot: AR(1): xt = 0.7xt-1 + wt; x-axis: t, y-axis: xt]
Notes:
- Notice the syntax of the for loop.
- See the first 6 rows of x and w right after the for loop. Make sure you understand how the data was simulated!!!
- The 1st value of x is 0.06737166 = 0.7(0) + 0.06737166
- The 2nd value of x is -0.6337982 = 0.7(0.06737166) - 0.68095839
- The 3rd value of x is 0.3456473 = 0.7(-0.6337982) + 0.78930605
- Why are the first 100 observations discarded?
- For those of you who have taken a course where AR(1) structures of a covariance matrix are discussed, what do you think the approximate correlation between xt and xt-1 is?
Here is an easier way to simulate observations from an AR(1). Note that this uses an Autoregressive Integrated Moving Average (ARIMA) structure that we will discuss in Chapter 3. In this case, I use σw = 10.
> set.seed(7181)
> x <- arima.sim(model = list(ar = c(0.7)), n = 100, rand.gen = rnorm, sd = 10)
> plot(x = x, ylab = expression(x[t]), xlab = "t", type = "l", col = c("red"),
    lwd = 1, main = expression(paste("AR(1): ", x[t] == 0.7*x[t-1] + w[t])),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = x, pch = 20, col = "blue")
[Plot: AR(1): xt = 0.7xt-1 + wt simulated with arima.sim(); x-axis: t, y-axis: xt]
More notes:
- Both the moving average and autoregressive models will be discussed extensively in Chapter 3.
- See how Shumway and Stoffer are kind of trying to match up their simulated data plots in Section 1.3 to actual data plots in Section 1.2. They are doing this because we want to develop an equation that reasonably mimics or “models” real data.
1.4 Measures of Dependence: Autocorrelation and Cross-Correlation
We would like to understand the relationship between all random variables in a time series. In order to do that, we would need to look at the joint distribution function.
Suppose the time series consists of the random variables xt1, xt2, …, xtn. Then the cumulative joint distribution function is:

F(c1, c2, …, cn) = P(xt1 ≤ c1, xt2 ≤ c2, …, xtn ≤ cn)

This can be VERY difficult to examine over the MULTIDIMENSIONS. Note the t1, …, tn subscripts are used just to denote a “general” set of times, not necessarily 1, 2, …, n.
Instead, it is often easier to look at the one or two dimensional distribution functions. The one-dimensional cumulative distribution function is denoted by Ft(x) = P(xt ≤ x) for a random variable xt at time t. The corresponding probability density function is ft(x) = ∂Ft(x)/∂x.
The mean value function is

μxt = E(xt) = ∫ x ft(x) dx
Shumway and Stoffer will drop the x subscript from μxt where there is no confusion about what random variable is used.

Important: The interpretation of μt is that it represents the mean taken over ALL possible events that could have produced xt. Another way to think about it is to suppose that the time series xt1, xt2, … is observed an infinite number of times. Then μt1 represents the average value at time t1, μt2 represents the average value at time t2, …
Example: Moving Average
Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t = 1, …, n.

Then μt = E(mt)
        = E[(wt + wt-1 + wt-2)/3]
        = (1/3) E(wt + wt-1 + wt-2)
        = (1/3) [E(wt) + E(wt-1) + E(wt-2)]
        = (1/3) [0 + 0 + 0]
        = 0
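A quick simulation sketch (my addition, not in the notes) agrees with this calculation: the sample mean of a long simulated moving average series is close to 0.

```r
# Simulate white noise and form m_t = (w_t + w_(t-1) + w_(t-2))/3
set.seed(8128)
w <- rnorm(n = 100000, mean = 0, sd = 1)
m <- stats::filter(x = w, filter = rep(x = 1/3, times = 3),
                   method = "convolution", sides = 1)
mean(m, na.rm = TRUE)  # close to mu_t = 0
```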
Example: Autoregressions
Let xt = 0.7xt-1 + wt where wt~ ind. N(0,1) for t = 1, …, n.
Then μt = E(xt)
        = E(0.7xt-1 + wt)
        = 0.7E(xt-1) + E(wt)
        = 0.7E(0.7xt-2 + wt-1) + 0
        = … (continuing the recursion)
        = 0
Autocovariance function
To assess the dependence between two random variables, we need to examine the two-dimensional cumulative distribution function. This can be denoted as F(cs, ct) = P(xs ≤ cs, xt ≤ ct) for two different time points s and t.
In STAT 870, you learned about the covariance function which measured the linear dependence between two random variables (see Chapter 5 of Kutner, Nachtsheim, and Neter (2004)). Since we are interested in linear dependence between two random variables in the same time series, we will examine the autocovariance function:
γx(s,t) = Cov(xs, xt) = E[(xs – μs)(xt – μt)] for all s and t,
2011 Christopher R. Bilder
1.35
where μs = E(xs) and μt = E(xt), assuming continuous random variables.
Notes:
- If the autocovariance is 0, there is no linear dependence.
- If s = t, the autocovariance is the variance: γx(t,t) = E[(xt – μt)²]
- Shumway and Stoffer will drop the x subscript on γx when there is no confusion about which time series is being discussed.
Example: White noise
Suppose wt ~ ind. N(0, σw²) for t = 1, …, n. What is γ(s,t) for s = t and s ≠ t?
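Rather than give the answer away, here is a simulation hint (my addition, not in the notes). With σw = 2, the sample variance estimates γ(t,t) and the lag-1 sample covariance estimates γ(s,t) for |s–t| = 1; look at where each lands.

```r
# White noise with sigma.w = 2 (an arbitrary choice for illustration)
set.seed(8128)
sigma.w <- 2
w <- rnorm(n = 100000, mean = 0, sd = sigma.w)
var(w)                     # estimates gamma(t,t); near sigma.w^2 = 4
cov(w[-length(w)], w[-1])  # estimates gamma(s,t) for |s-t| = 1; near 0
```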
Example: Moving Average
Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t=1,…,n.
E[(ms – μs)(mt – μt)] = E[msmt] since μs = μt = 0
Then
E[msmt] = E[(ws + ws-1 + ws-2)/3 × (wt + wt-1 + wt-2)/3]
        = (1/9)E[(ws + ws-1 + ws-2)(wt + wt-1 + wt-2)]
To find this, we need to examine a few different cases:

s = t:
E[mtmt] = E[mt²] = Var(mt) + E(mt)² since Var(mt) = E[mt²] – E(mt)²
 = (1/9){Var(wt + wt-1 + wt-2)} + 0²
 = (1/9){Var(wt) + Var(wt-1) + Var(wt-2)} since the wt's are independent
 = (1/9)(1 + 1 + 1)
 = 3/9
s = t-1:
E[mt-1mt] = (1/9)E[(wt-1 + wt-2 + wt-3)(wt + wt-1 + wt-2)]
 = (1/9)E[wt-1wt + wt-1wt-1 + wt-1wt-2 + wt-2wt + wt-2wt-1 + wt-2wt-2 + wt-3wt + wt-3wt-1 + wt-3wt-2]
 = (1/9)[E(wt-1wt) + E(wt-1²) + E(wt-1wt-2) + E(wt-2wt) + E(wt-2wt-1) + E(wt-2²) + E(wt-3wt) + E(wt-3wt-1) + E(wt-3wt-2)]
 = (1/9)[E(wt-1)E(wt) + E(wt-1²) + E(wt-1)E(wt-2) + E(wt-2)E(wt) + E(wt-2)E(wt-1) + E(wt-2²) + E(wt-3)E(wt) + E(wt-3)E(wt-1) + E(wt-3)E(wt-2)]
 = (1/9)[0·0 + 1 + 0·0 + 0·0 + 0·0 + 1 + 0·0 + 0·0 + 0·0]; note that E(wt-1²) = 1 since Var(wt-1) = 1 and E(wt-1) = 0
 = 2/9
s = t-2:
E[mt-2mt] = (1/9)E[(wt-2 + wt-3 + wt-4)(wt + wt-1 + wt-2)]
 = (1/9)E[wt-2wt + wt-2wt-1 + wt-2wt-2 + wt-3wt + wt-3wt-1 + wt-3wt-2 + wt-4wt + wt-4wt-1 + wt-4wt-2]
 = (1/9)[E(wt-2wt) + E(wt-2wt-1) + E(wt-2²) + E(wt-3wt) + E(wt-3wt-1) + E(wt-3wt-2) + E(wt-4wt) + E(wt-4wt-1) + E(wt-4wt-2)]
 = (1/9)[0·0 + 0·0 + 1 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0]
 = 1/9
s = t-3:
E[mt-3mt] = (1/9)E[(wt-3 + wt-4 + wt-5)(wt + wt-1 + wt-2)]
 = (1/9)E[wt-3wt + wt-3wt-1 + wt-3wt-2 + wt-4wt + wt-4wt-1 + wt-4wt-2 + wt-5wt + wt-5wt-1 + wt-5wt-2]
 = (1/9)[0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0 + 0·0] since E(wt) = 0 for all t = 1, …, n
 = 0
Notice that s = t+1, t+2, … can be found in a similar manner. Also s = t-4, t-5, … can be found. In summary, the autocovariance function is

γ(s,t) = 3/9 for |s–t| = 0
γ(s,t) = 2/9 for |s–t| = 1
γ(s,t) = 1/9 for |s–t| = 2
γ(s,t) = 0   for |s–t| ≥ 3
Notes:
- The word “lag” is used to denote the time separation between two values. For example, |s–t| = 1 denotes the lag is 1 and |s–t| = 2 denotes the lag is 2. We will use this “lag” terminology throughout this semester.
- The autocovariance depends on the lag, but NOT the individual times, for the moving average example! This will be VERY, VERY important later!
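These moving average calculations can be checked numerically. This sketch (my addition, not in the notes) estimates the autocovariances at lags 0 through 3 from a long simulated series using acf() with type = "covariance"; the estimates should be near 3/9 ≈ 0.333, 2/9 ≈ 0.222, 1/9 ≈ 0.111, and 0.

```r
# Simulate the 3-point moving average of N(0,1) white noise
set.seed(8128)
w <- rnorm(n = 100000, mean = 0, sd = 1)
m <- stats::filter(x = w, filter = rep(x = 1/3, times = 3),
                   method = "convolution", sides = 1)
m <- m[!is.na(m)]  # drop the first two undefined values
# Sample autocovariances at lags 0, 1, 2, 3
gamma.hat <- acf(x = m, lag.max = 3, type = "covariance", plot = FALSE)$acf
round(gamma.hat[,,1], 3)  # near 3/9, 2/9, 1/9, 0
```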
Autocorrelation function (ACF)
In STAT 870, the Pearson correlation coefficient was defined to be:

ρ = Cov(X,Y) / √[Var(X)Var(Y)]

for two variables X and Y, where Cov(X,Y) denotes the covariance between X and Y and Var(X) denotes the variance of X. The reason the correlation coefficient is examined instead of the covariance is that it is always between –1 and 1. Note the following:
- ρ close to 1 means strong, positive linear dependence
- ρ close to –1 means strong, negative linear dependence
- ρ close to 0 means weak linear dependence
The autocorrelation is the extension of the Pearson correlation coefficient from STAT 870 to time series analysis. Below is the autocorrelation function (ACF):

ρ(s,t) = γ(s,t) / √[γ(s,s)γ(t,t)]

where s and t denote two time points. The ACF is also between –1 and 1 and has a similar interpretation as for the correlation coefficient.
Example: White noise
Suppose wt ~ independent N(0, σw²) for t = 1, …, n. What is ρ(s,t) for s = t and s ≠ t?
Example: Moving Average
Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t = 1, …, n.
Note that γ(s,s) = γ(t,t) = 3/9.

Then ρ(s,t) = γ(s,t)/(3/9).

For example, ρ(s,t) for |s–t| = 1 is (2/9)/(3/9) = 2/3.
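As a numerical check (my addition, not in the notes), acf() with its default type = "correlation" estimates the ACF from a long simulated moving average series; the estimates at lags 0, 1, and 2 should be near 1, 2/3, and 1/3.

```r
# Simulate the 3-point moving average and estimate its ACF
set.seed(8128)
w <- rnorm(n = 100000, mean = 0, sd = 1)
m <- stats::filter(x = w, filter = rep(x = 1/3, times = 3),
                   method = "convolution", sides = 1)
rho.hat <- acf(x = m[!is.na(m)], lag.max = 2, plot = FALSE)$acf
round(rho.hat[,,1], 3)  # near 1, 2/3, 1/3
```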
Example: Strong positive and negative linear dependence (dependence.R)
Questions:
- If there is strong positive linear dependence between xs and xt, will the time series appear smooth or choppy in a plot of the series versus time? Choose an answer.
- If there is strong negative linear dependence between xs and xt, will the time series appear smooth or choppy in a plot of the series versus time? Choose an answer.

Think of plots of xt vs. xs to help answer the above questions.
Below are three plots illustrating these statements. I simulated data from different time series models. The autocorrelation for |s–t| = 1 is given for each model. The “estimated” autocorrelation, denoted by ρ̂, is given for that particular data set. The ρ̂ will be discussed more in Section 1.6.
1) ρ(s,t) = 0.4972 for |s–t| = 1
[Plot: xt = wt + 0.9wt-1 where wt ~ ind. N(0,1); x-axis: t, y-axis: xt]
2) ρ(s,t) = –0.4972 for |s–t| = 1

[Plot: xt = wt – 0.9wt-1 where wt ~ ind. N(0,1); x-axis: t, y-axis: xt]
3) ρ(s,t) = 0 for |s–t| = 1

[Plot: xt = wt + 0wt-1 where wt ~ ind. N(0,1); x-axis: t, y-axis: xt]
Plot 1) is the least jagged and plot 2) is the most jagged. Remember what a correlation means. A positive correlation means that “large” values tend to occur with other “large” values and “small” values tend to occur with other “small” values. A negative correlation means that “large” values tend to occur with other “small” values and “small” values tend to occur with other “large” values.
Please see program for the code.
Cross-covariance function and cross-correlation function
1.45
Sometimes there will be two time series of interest. When this occurs, one is often used to help in the modeling of the other. Suppose there are two series denoted by xt and yt for t=1,…,n. The cross-covariance function is
γxy(s,t) = Cov(xs, yt) = E[(xs – μxs)(yt – μyt)] = E(xsyt) – μxsμyt

where μxs = E(xs) and μyt = E(yt).
The cross-correlation function is

ρxy(s,t) = γxy(s,t) / √[γx(s,s)γy(t,t)]

where s and t denote two time points and the x and y subscripts help denote the particular series.
These two functions will be discussed more in Chapter 5 for transfer function models.
1.5 Stationary Time Series
Stationarity is a VERY important concept to understand for this course. A stationarity assumption will allow us to construct time series models.
Strictly stationary time series – The probabilistic behavior of {xt1, …, xtk} is exactly the same as that of the shifted set {xt1+h, …, xtk+h} for ANY collection of time points t1, …, tk, for ANY k = 1, 2, …, and for ANY shift h = 0, 1, 2, … .
What does this mean??? Let c1, …, ck be constants. Then

P(xt1 ≤ c1, …, xtk ≤ ck) = P(xt1+h ≤ c1, …, xtk+h ≤ ck)

The cumulative joint probability distribution is INVARIANT to time shifts! For example, with t1 = 1, t2 = 2, and h = 9,

P(x1 ≤ c1, x2 ≤ c2) = P(x10 ≤ c1, x11 ≤ c2)
Requiring a time series to be strictly stationary is VERY restrictive! A less restrictive requirement is weakly stationary.
Weakly stationary time series - The first two moments (mean and covariance) of the time series are invariant to time shifts. In other words,
E(xt) = μ for ALL time t, and γ(t, t+h) = γ(0, h) for ALL time t.

Notes:
- μ and γ(0, h) are NOT functions of t.
- A “lag” refers to the time shift.
- γ(t, t+h) = γ(0, h) for ALL time t means that the autocovariance function ONLY depends on the number of lags of the time shift. Thus, γ(0, h) = γ(1, 1+h) = γ(2, 2+h) = γ(3, 3+h) = … .
- Since we will generally be dealing with a stationary time series, we can make the following notational change: γ(h) = γ(0, h) = γ(t, t+h). The variance of xt is γ(0).
- The same notational change can be made to the autocorrelation function (ACF). Thus, ρ(h) denotes the ACF at lag h. Note that γ(t,t) = γ(0) for all t, so the ACF simplifies to ρ(h) = γ(h)/γ(0).
- Frequently, we will just say “stationary” to refer to weakly stationary and say the full “strictly stationary” to refer to strictly stationary.
Example: White noise
Suppose wt ~ ind. N(0, σw²) for t = 1, …, n. Is this a weakly stationary time series?

Note that the joint distribution is the product of the one-dimensional distributions since the wt are independent.
Example: Moving Average
Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t = 1, …, n.
Previously, we found that μt = 0 for all t and
γ(s, t) = 3/9 for s = t, 2/9 for |s-t| = 1, 1/9 for |s-t| = 2, and 0 for |s-t| > 2.
Is the time series weakly stationary? Hint: let h = s - t.
Notes:
γ(h) = γ(-h) for all h if the series is weakly stationary. This means that it does not matter which way the shift occurs.
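These autocovariance values can be checked by simulation. The sketch below is my own Python cross-check (the course works in R), estimating γ(h) for the 3-point moving average:

```python
import random

# Simulate m_t = (w_t + w_{t-1} + w_{t-2})/3 with w_t ~ ind. N(0,1) and compare
# sample autocovariances with the theoretical values
# gamma(0) = 3/9, gamma(1) = 2/9, gamma(2) = 1/9, gamma(h) = 0 for |h| > 2.
random.seed(7)
n = 60_000
w = [random.gauss(0.0, 1.0) for _ in range(n + 2)]
m = [(w[t + 2] + w[t + 1] + w[t]) / 3 for t in range(n)]

def acvf(x, h):
    mu = sum(x) / len(x)
    return sum((x[t + h] - mu) * (x[t] - mu) for t in range(len(x) - h)) / len(x)

g = [acvf(m, h) for h in range(4)]   # estimates of gamma(0), ..., gamma(3)
```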
Stationarity can also be examined when two time series are of interest. To have two series, xt and yt, be jointly stationary:
Both time series must have constant mean,
both autocovariance functions must depend only on the lag difference, and
the cross-covariance must depend only on the lag difference.
If two series are jointly stationary, then the following notation can be used:
γxy(h) = E[(xt+h – μx)(yt – μy)] for μx constant over time and μy constant over time,
ρxy(h) = γxy(h)/sqrt(γx(0)γy(0)), and
γxy(h) = γyx(-h)
γxy(h) is not necessarily equal to γxy(-h) (they usually will be different).
Example: Work through Example 1.21.
Let xt = wt + wt-1 and yt = wt – wt-1 where wt ~ ind. N(0, σw^2) for t = 1, …, n
Show xt and yt are weakly stationary and show xt and yt are jointly stationary.
xt
E(xt) = E(wt + wt-1) = E(wt) + E(wt-1) = 0 + 0 = 0
Thus E(xt) = μx,t = 0 for all t.
γx(s,t) = E[(xs – μx,s)(xt – μx,t)] = E[xsxt] since μx,s = μx,t = 0
Then E[xsxt] = E[(ws + ws-1)(wt + wt-1)] = E[wswt + ws-1wt + wswt-1 + ws-1wt-1]
If s=t, then
γx(t,t) = E[wt^2 + wt-1wt + wtwt-1 + wt-1^2]
= E[wt^2] + E[wt-1wt] + E[wtwt-1] + E[wt-1^2]
= Var(wt) + E[wt]^2 + 2E[wt-1]E[wt] + Var(wt-1) + E[wt-1]^2
= σw^2 + 0^2 + 2·0·0 + σw^2 + 0^2
= 2σw^2
If s=t-1, then
γx(t-1,t) = E[wt-1wt + wt-2wt + wt-1^2 + wt-2wt-1]
= E[wt-1wt] + E[wt-2wt] + E[wt-1^2] + E[wt-2wt-1]
= E[wt-1]E[wt] + E[wt-2]E[wt] + Var(wt-1) + E[wt-1]^2 + E[wt-2]E[wt-1]
= 0·0 + 0·0 + σw^2 + 0^2 + 0·0
= σw^2
Note that γx(t+1,t) = σw^2 as well, and γx(s,t) = 0 for |s-t| > 1.
Therefore, γx(s,t) = 2σw^2 for s = t, σw^2 for |s-t| = 1, and 0 for |s-t| > 1, which depends only on the lag h = s - t.
Thus, xt is weakly stationary.
yt
E(yt) = E(wt – wt-1) = E(wt) – E(wt-1) = 0 - 0 = 0
Thus E(yt) = μy,t = 0 for all t.
γy(s,t) = E[(ys – μy,s)(yt – μy,t)] = E[ysyt] since μy,s = μy,t = 0
Then E[ysyt] = E[(ws – ws-1)(wt – wt-1)] = E[wswt – ws-1wt – wswt-1 + ws-1wt-1]
If s=t, then
γy(t,t) = E[wt^2 – wt-1wt – wtwt-1 + wt-1^2]
= E[wt^2] – E[wt-1wt] – E[wtwt-1] + E[wt-1^2]
= Var(wt) + E[wt]^2 – 2E[wt-1]E[wt] + Var(wt-1) + E[wt-1]^2
= σw^2 + 0^2 – 2·0·0 + σw^2 + 0^2
= 2σw^2
If s=t-1, then
γy(t-1,t) = E[wt-1wt – wt-2wt – wt-1^2 + wt-2wt-1]
= E[wt-1wt] – E[wt-2wt] – E[wt-1^2] + E[wt-2wt-1]
= E[wt-1]E[wt] – E[wt-2]E[wt] – Var(wt-1) – E[wt-1]^2 + E[wt-2]E[wt-1]
= 0·0 – 0·0 – σw^2 – 0^2 + 0·0
= -σw^2
Note that γy(t+1,t) = -σw^2 as well, and γy(s,t) = 0 for |s-t| > 1.
Therefore, γy(s,t) = 2σw^2 for s = t, -σw^2 for |s-t| = 1, and 0 for |s-t| > 1, which depends only on the lag h = s - t.
Thus, yt is weakly stationary.
xt and yt
γxy(s,t) = E[(xs – μx,s)(yt – μy,t)] = E[xsyt] since μx,t = μy,t = 0
Then γxy(s,t) = E[(ws + ws-1)(wt – wt-1)] = E[wswt + ws-1wt – wswt-1 – ws-1wt-1]
If s=t, then
γxy(t,t) = E[wtwt + wt-1wt – wtwt-1 – wt-1wt-1]
= E[wt^2] – E[wt-1^2]
= Var(wt) + E[wt]^2 – Var(wt-1) – E[wt-1]^2
= σw^2 + 0^2 – σw^2 – 0^2
= 0
If s=t-1, then
γxy(t-1,t) = E[wt-1wt + wt-2wt – wt-1wt-1 – wt-2wt-1] = -E[wt-1^2] = -σw^2
If s=t+1, then
γxy(t+1,t) = E[wt+1wt + wtwt – wt+1wt-1 – wtwt-1] = E[wt^2] = σw^2
If |s-t| > 1, then γxy(s,t) = 0.
Therefore, γxy(h) = 0 for h = 0, σw^2 for h = 1, -σw^2 for h = -1, and 0 for |h| > 1, which depends only on the lag h.
Thus, xt and yt are jointly stationary.
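The covariance and cross-covariance values in this example can be verified by simulation. Below is a Python sketch of my own (not from the notes) with σw = 1:

```python
import random

# x_t = w_t + w_{t-1} and y_t = w_t - w_{t-1} with w_t ~ ind. N(0,1);
# theory: gamma_x(0) = gamma_y(0) = 2, gamma_xy(0) = 0,
# gamma_xy(1) = +1, and gamma_xy(-1) = gamma_yx(1) = -1.
random.seed(3)
n = 60_000
w = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
x = [w[t + 1] + w[t] for t in range(n)]
y = [w[t + 1] - w[t] for t in range(n)]

def ccvf(a, b, h):
    # Sample cross-covariance (1/n) sum_t (a_{t+h} - abar)(b_t - bbar), h >= 0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((a[t + h] - ma) * (b[t] - mb) for t in range(len(a) - h)) / len(a)

gx0, gy0 = ccvf(x, x, 0), ccvf(y, y, 0)   # both near 2
gxy0 = ccvf(x, y, 0)                      # near 0
gxy1 = ccvf(x, y, 1)                      # near +1
gyx1 = ccvf(y, x, 1)                      # near -1 (this is gamma_xy(-1))
```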
Example: xt = φ1xt-1 + wt where wt ~ independent N(0, σw^2) for t = 1,…, n and |φ1| < 1
Read this example after we discuss “backshift notation” in Section 2.3.
Note that the sum of an infinite geometric series is Σ(i=0 to ∞) a·r^i = a/(1 - r) for |r| < 1.
The time series can be rewritten as xt = Σ(i=0 to ∞) φ1^i wt-i.
Then γxw(h) = E[(wt – μw,t)(xt+h – μx,t+h)]
Remember that μw,t = E(wt) = 0 and μx,t+h = E(xt+h) = 0, and, in general, for two random variables V and W, E[(V – μv)(W – μw)] = E(VW) – E(V)E(W).
So γxw(h) = E[wtxt+h] – E[wt]E[xt+h] = E[wtxt+h] – 0·0 = E[wtxt+h] = E[wt(Σ(i=0 to ∞) φ1^i wt+h-i)]
Since the wt’s are independent with mean 0, we only need to be concerned about when the subscripts of the wt pairs match.
γxw(0) = E[wt(wt + φ1wt-1 + φ1^2wt-2 + …)] = E[wt^2] = σw^2
γxw(1) = E[wt(wt+1 + φ1wt + φ1^2wt-1 + …)] = φ1E[wt^2] = φ1σw^2
γxw(-1) = E[wt(wt-1 + φ1wt-2 + φ1^2wt-3 + …)] = 0
In general, γxw(h) = φ1^h σw^2 for h ≥ 0 and γxw(h) = 0 for h < 0.
Therefore, xt and wt are jointly stationary (note: it can be shown that xt and wt are each stationary; I just did the cross-covariance part).
Since Var(xt) = γx(0) = σw^2/(1 - φ1^2),
ρxw(h) = γxw(h)/sqrt(γx(0)γw(0)) = φ1^h σw^2/sqrt([σw^2/(1 - φ1^2)]σw^2) = φ1^h sqrt(1 - φ1^2) for h ≥ 0 and
ρxw(h) = 0 for h < 0.
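The pattern γxw(h) = φ1^h σw^2 for h ≥ 0 and 0 for h < 0 can also be seen by simulation. This Python sketch is my own (not from the notes), using φ1 = 0.7 and σw = 1:

```python
import random

# Simulate AR(1) x_t = 0.7 x_{t-1} + w_t with w_t ~ ind. N(0,1) and check
# gamma_xw(h) = phi^h * sigma_w^2 for h >= 0 and gamma_xw(h) = 0 for h < 0.
random.seed(5)
phi, n, burn = 0.7, 60_000, 500
w, x = [], []
prev = 0.0
for _ in range(n + burn):
    wt = random.gauss(0.0, 1.0)
    prev = phi * prev + wt
    w.append(wt)
    x.append(prev)
w, x = w[burn:], x[burn:]   # drop burn-in so the series is (roughly) stationary

def ccvf(a, b, h):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((a[t + h] - ma) * (b[t] - mb) for t in range(len(a) - h)) / len(a)

g0 = ccvf(x, w, 0)    # E[w_t x_t]     -> near 1
g1 = ccvf(x, w, 1)    # E[w_t x_{t+1}] -> near 0.7
g2 = ccvf(x, w, 2)    # E[w_t x_{t+2}] -> near 0.49
gneg = ccvf(w, x, 1)  # gamma_xw(-1)   -> near 0
```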
Linear Process
Note that the above examples are special cases of a “linear process”. In general, a linear process can be defined as
xt = μ + Σ(j=-∞ to ∞) ψj wt-j with Σ(j=-∞ to ∞) |ψj| < ∞ and
wt ~ ind. N(0, σw^2).
It can be shown that γ(h) = σw^2 Σ(j=-∞ to ∞) ψj+h ψj for h ≥ 0 and
γ(h) = γ(-h), provided the series is stationary.
PF: Without loss of generality, let μ = 0.
E(xt) = E[Σ(j=-∞ to ∞) ψj wt-j] = Σ(j=-∞ to ∞) ψj E(wt-j) = 0.
Note that γ(h) = Cov(xt, xt+h) = E(xtxt+h) – E(xt)E(xt+h) = E(xtxt+h) since E(xt) = E(xt+h) = 0.
Then E(xtxt+h) = E[(Σ(i) ψi wt-i)(Σ(j) ψj wt+h-j)] = Σ(i) Σ(j) ψi ψj E(wt-i wt+h-j) = σw^2 Σ(j) ψj+h ψj
because E(wt-i wt+h-j) = 0 when -i ≠ h-j
and E(wt-i wt+h-j) = E(wt-i^2) = σw^2 when -i = h-j, i.e., j - i = h.
Therefore, γ(h) = σw^2 Σ(j=-∞ to ∞) ψj+h ψj for h ≥ 0 and γ(-h) = γ(h).
Compare the previous examples to this result!
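As a concrete comparison, the linear-process formula reproduces the moving average autocovariances from earlier. This is my own deterministic Python sketch (not from the notes), with ψ0 = ψ1 = ψ2 = 1/3 and σw^2 = 1:

```python
# gamma(h) = sigma_w^2 * sum_j psi_{j+h} * psi_j, with psi_j = 0 outside the
# listed range; psi is indexed 0, 1, ..., q-1 here (a one-sided linear process).
def linear_process_acvf(psi, sigma2, h):
    h = abs(h)  # gamma(h) = gamma(-h)
    return sigma2 * sum(psi[j + h] * psi[j] for j in range(max(len(psi) - h, 0)))

psi_ma = [1/3, 1/3, 1/3]                   # the 3-point moving average
g0 = linear_process_acvf(psi_ma, 1.0, 0)   # 3/9
g1 = linear_process_acvf(psi_ma, 1.0, 1)   # 2/9
g2 = linear_process_acvf(psi_ma, 1.0, 2)   # 1/9
g3 = linear_process_acvf(psi_ma, 1.0, 3)   # 0

# AR(1) with phi = 0.7: psi_j = phi^j (truncated), so gamma(0) ~ 1/(1 - 0.49)
psi_ar = [0.7 ** j for j in range(200)]
g0_ar = linear_process_acvf(psi_ar, 1.0, 0)
```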
Important!:
There is a very important case when weakly stationary implies strictly stationary. This occurs when the time series has a multivariate normal distribution. Remember that a univariate normal distribution is defined only by its mean and variance. The multivariate normal distribution is defined only by its mean vector and covariance matrix.
Thus, if we can assume the time series has a multivariate normal distribution, we ONLY need to check that it satisfies the weakly stationary requirements in order to say the time series is strictly stationary. In that case, the single word “stationary” is unambiguous.
Example: Visualizing stationarity
Below are a few plots of the observed values of a time series. Identify which plots correspond to a weakly stationary series. There is no program given for this example.
[Several plots of observed time series appear here; they are not reproduced in this text version.]
From examining these plots, why is it important to have stationarity?
1.5 Estimation of Correlation
μ, γ(h), and ρ(h) are usually unknown, so we need to estimate them. In order to do this, we need to assume the time series is weakly stationary.
Sample mean function
By the weakly stationary assumption, E(x1) = μ, E(x2) = μ, …, E(xn) = μ. Thus, a logical estimate of μ is the sample mean x̄ = (1/n) Σ(t=1 to n) xt.
Note that this would not make sense to do if the weak stationarity assumption did not hold!
Sample autocovariance function
Again by the weak stationarity assumption, we only need to worry about the lag difference. The estimated autocovariance function is
γ̂(h) = (1/n) Σ(t=1 to n-h) (xt+h – x̄)(xt – x̄), with γ̂(-h) = γ̂(h) for h = 0, 1, …, n-1.
Notes:
This is similar to the formula often used to estimate the covariance between two random variables x and y: σ̂xy = Σ(i=1 to n) (xi – x̄)(yi – ȳ)/(n – 1). If you do not recognize this formula, look at the numerator of the estimated correlation coefficient.
The sum goes up to n-h to avoid having negative subscripts in the x’s.
This is NOT an unbiased estimate of γ(h)!
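The estimator is easy to compute directly. Below is my own Python version with made-up illustrative data x = 1,…,6 (the course uses R's acf(); the data here are just for demonstration):

```python
# Sample autocovariance gamma-hat(h) = (1/n) sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar).
# Note the divisor n (not n - h or n - 1), matching R's acf(type = "covariance").
def gamma_hat(x, h):
    n = len(x)
    xbar = sum(x) / n
    return sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(n - h)) / n

x = [1, 2, 3, 4, 5, 6]   # illustrative data
g0 = gamma_hat(x, 0)     # 17.5/6 ~= 2.917
g1 = gamma_hat(x, 1)     # 8.75/6 ~= 1.458
rho1 = g1 / g0           # sample ACF at lag 1 = 0.5
```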
Sample autocorrelation function (ACF)
ρ̂(h) = γ̂(h)/γ̂(0)
Question: What does ρ(h) = 0 mean and why would this be important to detect?
Since this is important, we conduct hypothesis tests of ρ(h) = 0 for all h ≠ 0! In order to do the hypothesis test, we need to find the sampling distribution for ρ̂(h) under the null hypothesis of ρ(h) = 0.
Sampling distribution for ρ̂(h): See Shumway and Stoffer for the exact description (understanding the proof in the appendix requires having a course on asymptotics). In summary, if ρ(h) = 0, xt is stationary, and the sample size is “large”, then ρ̂(h) has an approximate normal distribution with mean 0 and standard deviation σρ̂(h) = 1/sqrt(n).
For a hypothesis test, we could check whether ρ̂(h) is within the bounds 0 ± Z(1-α/2)/sqrt(n) or not, where P(Z < Z(1-α/2)) = 1 – α/2 for a standard normal random variable Z. If it is not, then there is sufficient evidence to conclude that ρ(h) ≠ 0. We will be using this result a lot for the rest of this course!
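This test is mechanical to apply. The Python sketch below is my own (not course code); the ρ̂(h) values are the ones reported for the AR(1) simulation example in these notes, where n = 100:

```python
import math

# Approximate 95% test of H0: rho(h) = 0 -- flag the lags where |rho-hat(h)|
# exceeds z_{0.975}/sqrt(n). Values are from the AR(1) example (n = 100).
z_975 = 1.96
rho_hat = [1.000, 0.674, 0.401, 0.169, -0.023, -0.125]  # lags 0, 1, ..., 5

def significant_lags(rho, n, z=z_975):
    bound = z / math.sqrt(n)   # here 1.96/sqrt(100) = 0.196
    return [h for h, r in enumerate(rho) if h > 0 and abs(r) > bound]

sig = significant_lags(rho_hat, n=100)   # lags 1 and 2 exceed the bound
```

Lag 3 (ρ̂(3) = 0.169) falls inside the ±0.196 bounds, so only lags 1 and 2 are flagged.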
Example: xt=0.7xt-1+wt where wt ~ ind. N(0,1) and n=100. (ar1_0.7.R, AR1.0.7.txt)
The data are simulated using the above model and are different from the example earlier in Chapter 1. Also, the data are read in from a file instead of simulated within R.
> ar1<-read.table(file = "C:\\chris\\UNL\\STAT_time_series\\chapter1\\AR1.0.7.txt",
    header = TRUE, sep = "")

> head(ar1)
  t           x
1 1  0.04172680
2 2  0.37190682
3 3 -0.18545185
4 4 -1.38297422
5 5 -2.87593652
6 6 -2.60017605
The plot below is constructed in a similar manner as past plots.
[Plot: the simulated series, xt vs. t, titled “Data simulated from AR(1): xt = 0.7xt-1 + wt where wt~N(0,1)”]
The easiest way to find the autocorrelations in R is to use the acf() function.
> x<-ar1$x
> rho.x<-acf(x = x, type = "correlation", main = expression(paste(
    "Data simulated from AR(1): ", x[t] == 0.7*x[t-1] + w[t], " where ", w[t], "~N(0,1)")))
[Plot: estimated ACF, ρ̂(h) vs. lag h, titled “Data simulated from AR(1): xt = 0.7xt-1 + wt where wt~N(0,1)”]
The horizontal lines on the plot are drawn at 0 ± Z(1-0.05/2)/sqrt(n) = ±1.96/sqrt(100), where Z(1-0.05/2) = 1.96.
> rho.x
Autocorrelations of series 'x', by lag
     0      1      2      3      4      5      6      7
 1.000  0.674  0.401  0.169 -0.023 -0.125 -0.067 -0.064
     8      9     10     11     12     13     14
-0.058  0.005 -0.044 -0.041 -0.017  0.064  0.076
    15     16     17     18     19     20
 0.160  0.191  0.141  0.081  0.006 -0.132
> names(rho.x)
[1] "acf"    "type"   "n.used" "lag"    "series" "snames"
> rho.x$acf
, , 1

              [,1]
 [1,]  1.000000000
 [2,]  0.673671871
 [3,]  0.400891188
 [4,]  0.168552826
 [5,] -0.023391129
 [6,] -0.124632501
 [7,] -0.067392830
 [8,] -0.064248086
 [9,] -0.057717749
[10,]  0.005312358
[11,] -0.044035976
[12,] -0.041121407
[13,] -0.017197132
[14,]  0.063864970
[15,]  0.075575696
[16,]  0.159665692
[17,]  0.191349965
[18,]  0.140967540
[19,]  0.080508273
[20,]  0.005584061
[21,] -0.131559629

> rho.x$acf[1:2]
[1] 1.0000000 0.6736719
Questions:
What happens to the autocorrelations over time? Why do you think this happens?
Is there a positive or negative correlation?
At what lags is ρ(h) ≠ 0?
The autocovariances can also be found using acf().

> acf(x = x, type = "covariance", main = expression(paste(
    "Data simulated from AR(1): ", x[t] == 0.7*x[t-1] + w[t], " where ", w[t], "~N(0,1)")))
[Plot: estimated autocovariance function, γ̂(h) vs. lag h, titled “Data simulated from AR(1): xt = 0.7xt-1 + wt where wt~N(0,1)”]
To help understand autocorrelations and their relationship with the correlation coefficient better, I decided to look at the “usual” estimated Pearson correlations between xt, xt-1, xt-2, and xt-3.

> x.ts<-ts(x)
> set1<-ts.intersect(x.ts, x.ts1 = lag(x = x.ts, k = -1), x.ts2 = lag(x = x.ts, k = -2),
    x.ts3 = lag(x = x.ts, k = -3))
> set1
Time Series:
Start = 4
End = 100
Frequency = 1
          x.ts       x.ts1       x.ts2       x.ts3
4  -1.38297422 -0.18545185  0.37190682  0.04172680
5  -2.87593652 -1.38297422 -0.18545185  0.37190682
6  -2.60017605 -2.87593652 -1.38297422 -0.18545185
7  -1.10401719 -2.60017605 -2.87593652 -1.38297422
8  -0.46385116 -1.10401719 -2.60017605 -2.87593652
9   0.80339069 -0.46385116 -1.10401719 -2.60017605

Output edited

97  -1.36418639 -1.63175408 -1.56008530 -0.40824385
98  -0.37209392 -1.36418639 -1.63175408 -1.56008530
99  -0.65833401 -0.37209392 -1.36418639 -1.63175408
100  2.03705932 -0.65833401 -0.37209392 -1.36418639

> cor(set1)
           x.ts     x.ts1     x.ts2     x.ts3
x.ts  1.0000000 0.6824913 0.4065326 0.1710145
x.ts1 0.6824913 1.0000000 0.6929638 0.4108375
x.ts2 0.4065326 0.6929638 1.0000000 0.6935801
x.ts3 0.1710145 0.4108375 0.6935801 1.0000000

> library(car)  # scatterplot.matrix is in this package - may need to install first
> scatterplot.matrix(formula = ~x.ts + x.ts1 + x.ts2 + x.ts3, data = set1,
    reg.line = lm, smooth = TRUE, span = 0.5, diagonal = 'histogram')
[Scatter plot matrix of x.ts, x.ts1, x.ts2, and x.ts3 with histograms on the diagonal]
> set2<-ts.intersect(x.ts, x.ts1 = lag(x = x.ts, k = 1), x.ts2 = lag(x = x.ts, k = 2),
    x.ts3 = lag(x = x.ts, k = 3))
> set2
Time Series:
Start = 1
End = 97
Frequency = 1
         x.ts       x.ts1       x.ts2       x.ts3
1  0.04172680  0.37190682 -0.18545185 -1.38297422
2  0.37190682 -0.18545185 -1.38297422 -2.87593652
3 -0.18545185 -1.38297422 -2.87593652 -2.60017605
4 -1.38297422 -2.87593652 -2.60017605 -1.10401719
5 -2.87593652 -2.60017605 -1.10401719 -0.46385116
Output edited
94 -0.40824385 -1.56008530 -1.63175408 -1.36418639
95 -1.56008530 -1.63175408 -1.36418639 -0.37209392
96 -1.63175408 -1.36418639 -0.37209392 -0.65833401
97 -1.36418639 -0.37209392 -0.65833401  2.03705932

> cor(set2)
           x.ts     x.ts1     x.ts2     x.ts3
x.ts  1.0000000 0.6935801 0.4108375 0.1710145
x.ts1 0.6935801 1.0000000 0.6929638 0.4065326
x.ts2 0.4108375 0.6929638 1.0000000 0.6824913
x.ts3 0.1710145 0.4065326 0.6824913 1.0000000
The ts() function converts the time series data to an object that R recognizes as a time series.
The lag function is used to find xt-1, xt-2, and xt-3. The k option specifies how many time periods to go back. Run just lag(x.ts, k = -1) and lag(x.ts, k = 1) to see what happens. To get everything lined up as I wanted with ts.intersect(), I chose to use k = -1. Page 388 of Venables and Ripley (2002) has a good quote about the lag() function:
The function lag shifts the time axis of a series back by k positions … This can cause confusion, as most people think of lags as shifting time and not the series; that is, the current value of a series lagged by one year is last year’s, not next year’s.
Run the lag() function by itself and check out the “Start” index.
The ts.intersect() function finds the intersection of the four different “variables”.
The cor() function finds the Pearson correlations between all variable pairs. Notice how close these correlations are to the autocorrelations!
The scatterplot.matrix() function finds a scatter plot matrix. The function is in the car package (see Fox’s “An R and S-Plus Companion to Applied Regression” book from STAT 870).
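The close match between the lagged Pearson correlations and the ACF can be demonstrated outside R as well. The sketch below is my own Python simulation (not the course data), using the same AR(1) model with φ1 = 0.7:

```python
import random

# Simulate an AR(1) series with phi = 0.7 and compare the "usual" Pearson
# correlation of the overlapping pairs (x_t, x_{t-1}) with rho-hat(1).
random.seed(11)
n = 5_000
x = [random.gauss(0.0, 1.0)]
for _ in range(n - 1):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    da = sum((u - ma) ** 2 for u in a) ** 0.5
    db = sum((v - mb) ** 2 for v in b) ** 0.5
    return num / (da * db)

def rho_hat(x, h):
    m = sum(x) / len(x)
    g = lambda k: sum((x[t + k] - m) * (x[t] - m) for t in range(len(x) - k)) / len(x)
    return g(h) / g(0)

r_pairs = pearson(x[1:], x[:-1])  # cor(x_t, x_{t-1}) over the n-1 pairs
r_acf = rho_hat(x, 1)             # the lag-1 sample ACF
```

The two estimates are not identical (the ACF uses the full-series mean and divisor n), but for a long series they agree closely, just as cor(set1) nearly matched rho.x above.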
Example: OSU enrollment data (osu_enroll.R,osu_enroll.xls)
The code used to find the autocorrelations is:
> x<-osu.enroll$Enrollment
> rho.x<-acf(x = x, type = "correlation", main = "OSU Enrollment series")
> rho.x

Autocorrelations of series 'x', by lag

     0      1      2      3      4      5      6      7      8      9     10
 1.000 -0.470 -0.425  0.909 -0.438 -0.395  0.822 -0.403 -0.358  0.739 -0.367
    11     12     13     14     15     16
-0.327  0.655 -0.337 -0.297  0.581 -0.309

> rho.x$acf[1:9]
[1]  1.0000000 -0.4702315 -0.4253427  0.9087421 -0.4377336 -0.3946048  0.8224660
[8] -0.4025871 -0.3584216
[Plot: estimated ACF for the OSU enrollment series]
Notes:
There are some large autocorrelations. This is a characteristic of a nonstationary series. We will examine this more later.
Since the series is not stationary, the hypothesis test for ρ(h) = 0 should not be done here using the methods discussed earlier.
There is a pattern among the autocorrelations. What does this correspond to?
Sample cross-covariance function:
γ̂xy(h) = (1/n) Σ(t=1 to n-h) (xt+h – x̄)(yt – ȳ) for h = 0, 1, 2, …
Note that γ̂xy(-h) = γ̂yx(h); however, γ̂xy(h) is not necessarily equal to γ̂xy(-h).
Sample cross-correlation function:
ρ̂xy(h) = γ̂xy(h)/sqrt(γ̂x(0)γ̂y(0))
Sampling distribution for ρ̂xy(h): ρ̂xy(h) has an approximate normal distribution with mean 0 and standard deviation of σρ̂xy = 1/sqrt(n) if the sample size is large and at least one of the series is white noise.
For a hypothesis test, we could check whether ρ̂xy(h) is within the bounds 0 ± Z(1-α/2)/sqrt(n) or not. If it is not, then there is sufficient evidence to conclude that ρxy(h) ≠ 0.
To find the CCF in R, use the acf() or ccf() functions.
Example: Simple CCF example (simple_CCF_example.xls, simple_CCF_example.R)
The above Excel file shows how some of the “by-hand” calculations of the cross-covariance function can be done. Below is part of the resulting spreadsheet.
                       h=0 term            h=1 term
  t   xt   yt   (xt – x̄)(yt – ȳ)   (xt+1 – x̄)(yt – ȳ)
  1    1    2        8.75               5.25
  2    2    3        3.75               1.25
  3    3    5        0.25              -0.25
  4    4    6        0.25               0.75
  5    5    8        3.75               6.25
  6    6    9        8.75              No x7
Mean  3.5  5.5  Sum: 25.50             13.25
The estimated cross-covariance function is
γ̂xy(h) = (1/n) Σ(t=1 to n-h) (xt+h – x̄)(yt – ȳ).
Then,
γ̂xy(0) = 25.50/6 = 4.25 and γ̂xy(1) = 13.25/6 = 2.2083.
The cross-correlation function is ρ̂xy(h) = γ̂xy(h)/sqrt(γ̂x(0)γ̂y(0)).
Then,
ρ̂xy(0) = 4.25/sqrt(2.917×6.25) = 0.995 and ρ̂xy(1) = 2.2083/sqrt(2.917×6.25) = 0.517,
where γ̂x(0) = 2.917 and γ̂y(0) = 6.25 were found in R.
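The spreadsheet arithmetic can be reproduced in a few lines. This is my own Python sketch (the notes do the calculation in Excel and R):

```python
# Verify the by-hand cross-covariance/cross-correlation calculations:
# gamma-hat_xy(0) = 25.50/6 = 4.25, gamma-hat_xy(1) = 13.25/6 ~= 2.2083,
# rho-hat_xy(0) ~= 0.995, rho-hat_xy(1) ~= 0.517.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 6, 8, 9]

def ccvf(a, b, h):
    # (1/n) sum_{t=1}^{n-h} (a_{t+h} - abar)(b_t - bbar) for h >= 0
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((a[t + h] - ma) * (b[t] - mb) for t in range(len(a) - h)) / len(a)

g_xy0 = ccvf(x, y, 0)                           # 4.25
g_xy1 = ccvf(x, y, 1)                           # 13.25/6
denom = (ccvf(x, x, 0) * ccvf(y, y, 0)) ** 0.5  # sqrt(2.917 * 6.25)
r_xy0, r_xy1 = g_xy0 / denom, g_xy1 / denom     # 0.995, 0.517
```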
Below is the R code and output:
> x<-c(1, 2, 3, 4, 5, 6)
> y<-c(2, 3, 5, 6, 8, 9)

> gamma.x<-acf(x = x, type = "covariance", plot = FALSE)
> gamma.x
Autocovariances of series 'x', by lag
0 1 2 3 4 5 2.917 1.458 0.167 -0.792 -1.250 -1.042
> gamma.y<-acf(x = y, type = "covariance", plot = FALSE)> gamma.y
Autocovariances of series 'y', by lag
0 1 2 3 4 5 6.250 3.292 0.167 -1.625 -2.917 -2.042
> x.y.acf<-acf(x = cbind(x,y), type = "correlation")
> x.y.acf
Autocorrelations of series 'cbind(x, y)', by lag
, , x
x y 1.000 ( 0) 0.995 ( 0)
0.500 ( 1) 0.517 (-1) 0.057 ( 2) 0.039 (-2) -0.271 ( 3) -0.263 (-3) -0.429 ( 4) -0.449 (-4)
, , y
x y 0.995 ( 0) 1.000 ( 0) 0.517 ( 1) 0.527 ( 1) 0.039 ( 2) 0.027 ( 2) -0.263 ( 3) -0.260 ( 3) -0.449 ( 4) -0.467 ( 4)
[Figure: 2×2 panel of sample ACFs for x and y (diagonal) and sample CCFs “x & y” and “y & x” (off-diagonal)]
> x.y.ccf<-ccf(x = x, y = y, type = "correlation")
> x.y.ccf
Autocorrelations of series 'X', by lag
-4 -3 -2 -1 0 1 2 3 4 -0.449 -0.263 0.039 0.517 0.995 0.517 0.039 -0.263 -0.449
[Plot: sample CCF of x and y from ccf(), labeled “x & y”]
Notes:
The c() function was used to enter data into an R vector.
R gives the CCF for “x & y” and “y & x”. The “x & y” part gives ρ̂xy(h) for h ≥ 0 and the “y & x” part gives ρ̂yx(h) for h ≥ 0. Note that the R labeling may be confusing!!! R is using the relationship ρ̂xy(-h) = ρ̂yx(h) in its notation.
Example: Reproduce results from Example 1.25 of Shumway and Stoffer (ex1.25.R)
> #How to read in data
> #Method #1 - obtain from tsa3.rda file
> load("C:\\chris\\unl\\STAT_time_series\\TSA_3rd_edition\\tsa3.rda")

> #Method #2 - I have ASCII text files from their previous book editions
> soi<-read.table(file = "C:\\chris\\UNL\\STAT_time_series\\Shumway_Stoffer_web_info\\Data\\soi.dat",
    header = FALSE, col.names = "soi.var")
> rec<-read.table(file = "C:\\chris\\UNL\\STAT_time_series\\Shumway_Stoffer_web_info\\Data\\recruit.dat",
    header = FALSE, col.names = "rec.var")
> head(soi)
  soi.var
1   0.377
2   0.246
3   0.311
4   0.104
5  -0.016
6   0.235
> head(rec)
  rec.var
1   68.63
2   68.63
3   68.63
4   68.63
5   68.63
6   68.63

> soi.rec.acf<-acf(x = cbind(soi,rec), type = "correlation", lag.max = 50)
> soi.rec.acf
Autocorrelations of series 'cbind(soi, rec)', by lag
, , soi.var
soi.var rec.var 1.000 ( 0) 0.025 ( 0) 0.604 ( 1) 0.011 ( -1) 0.374 ( 2) -0.042 ( -2) 0.214 ( 3) -0.146 ( -3) 0.050 ( 4) -0.297 ( -4)
Output edited

, , rec.var
soi.var rec.var 0.025 ( 0) 1.000 ( 0) -0.013 ( 1) 0.922 ( 1) -0.086 ( 2) 0.783 ( 2) -0.154 ( 3) 0.627 ( 3) -0.228 ( 4) 0.477 ( 4)
Output edited
[Figure: 2×2 panel of sample ACFs for soi.var and rec.var (diagonal) and their sample CCFs (off-diagonal)]
> ccf(x = soi, y = rec, type = "correlation", main = "Part of Figure 1.14", lag = 50)
[Plot: sample CCF of SOI and Recruitment, titled “Part of Figure 1.14”]
Notes:
The “soi.var & rec.var” and “rec.var & soi.var” plots are not exactly the same.
R gives the CCF for “soi.var & rec.var” and “rec.var & soi.var”. The “soi.var & rec.var” part gives ρ̂soi,rec(h) for h ≥ 0 and the “rec.var & soi.var” part gives ρ̂rec,soi(h) for h ≥ 0. Note that the R labeling may be confusing!!! R is using the relationship ρ̂soi,rec(-h) = ρ̂rec,soi(h) in its notation.
Examine the strength of association for the two series individually and together.
1.6 Vector-Valued and Multidimensional Series
Since several different time series often occur at the same time points, it can be useful to consider a vector of time series data.
Let xt = (xt1, xt2, …, xtp)′ be a p×1 vector time series. Note that this column vector could also be represented through its transpose:
xt′ = (xt1, xt2, …, xtp)
A vector is represented as a bold letter. Note that xt1 represents the first time series variable at time t, …, xtp represents the pth time series variable at time t.
For the stationary case,
μ = E(xt), where μ = (μ1, μ2, …, μp)′ is the mean vector, and
Γ(h) = E[(xt+h - μ)(xt - μ)′] is the autocovariance matrix.
The autocovariance matrix is similar to the covariance matrix discussed in STAT 870. Elements of this matrix are
where γij(h) = E[(xt+h,i - μi)(xt,j - μj)]
Notes:
For example, γ12(h) = Cov(xt+h,1, xt,2).
Remember that a covariance matrix is symmetric.
Γ(-h) = Γ(h)′ since γij(h) = γji(-h).
Sample autocovariance matrix:
Γ̂(h) = (1/n) Σ(t=1 to n-h) (xt+h - x̄)(xt - x̄)′, where x̄ = (1/n) Σ(t=1 to n) xt.
Note that Γ̂(-h) = Γ̂(h)′.
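A direct computation of the sample autocovariance matrix for a small bivariate series can make the definition concrete. This Python sketch is my own (not course code); the two component series are borrowed from the simple CCF example earlier:

```python
# Sample autocovariance matrix Gamma-hat(h) for a p-dimensional series stored
# as a list of p-tuples; entry (i, j) is
# (1/n) sum_{t=1}^{n-h} (x_{t+h,i} - xbar_i)(x_{t,j} - xbar_j).
def gamma_hat_matrix(series, h):
    n, p = len(series), len(series[0])
    xbar = [sum(row[i] for row in series) / n for i in range(p)]
    return [[sum((series[t + h][i] - xbar[i]) * (series[t][j] - xbar[j])
                 for t in range(n - h)) / n
             for j in range(p)]
            for i in range(p)]

# Bivariate series built from the simple CCF example data
series = list(zip([1, 2, 3, 4, 5, 6], [2, 3, 5, 6, 8, 9]))
G0 = gamma_hat_matrix(series, 0)  # symmetric; off-diagonal = gamma-hat_xy(0) = 4.25
G1 = gamma_hat_matrix(series, 1)  # G1[0][1] = gamma-hat_xy(1) = 13.25/6
```

The lag-0 matrix is symmetric, and the off-diagonal entries at lag h reproduce the sample cross-covariances computed earlier.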
See Shumway and Stoffer’s discussion on multidimensional processes. In this case, the time series is indexed by something in addition to time! For example, this can happen in spatial statistics.