ESSE 4020: Time Series Basics, Stochastic Models, Random Walks Sept 2019
Time series: generally one or more dependent variables and one independent variable (time), so F(t) for a single variable or a vector F(t) for several, often with a stochastic component (random behaviour or random measurement error). A series may also have a periodic or seasonal component (annual or diurnal variations) and a trend.
Library resources (online; chapters can be downloaded free):
S&S: Time series analysis and its applications [electronic resource]: with R examples. Authors: Robert H. Shumway, David S. Stoffer. Publication info: New York: Springer, c2011. Format: EBook, Book, Online.
CM2009: Introductory time series with R [electronic resource]. Authors: Paul S.P. Cowpertwait, Andrew V. Metcalfe. Publication info: New York: Springer-Verlag, c2009. Format: EBook, Book, Online.
And in the future, a text using Matlab: https://www.springer.com/gp/book/9783030207892
Notes based on material from SS2011 Chapter 1; CM2009 Chapters 1, 2, 4.2, 4.3. READ CM2009 Chapters 1, 2.
Matlab Help: https://www.mathworks.com/help/matlab/data_analysis/time-series-objects.html
Time Series Objects: Create, modify, and analyze timeseries objects containing time-dependent data. A timeseries object contains data and time information within its properties that describes a dynamic process. You can use timeseries object functions to create, modify, and analyze the behavior of a time series.
Or with R: https://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html
Using R for Time Series Analysis. This booklet tells you how to use the R statistical software to carry out some simple analyses that are common in analysing time series data. This booklet assumes that the reader has some basic knowledge of time series analysis, and the principal focus of the booklet is not to explain time series analysis, but rather to explain how to carry out these analyses using R.
R is more convenient for data that are arrayed as year, month, or day, hour and have a "seasonal" component.
from S&S 2011 and Jim Hansen’s analyses http://www.columbia.edu/~jeh1/
From CM 2009
Many different ways to look at the data. Anomalies are relative to 1980-2015 average. Recent data available at https://data.giss.nasa.gov/gistemp/graphs_v3/
The series has a trend, roughly linear. One could look for an annual cycle, but none is apparent.
Many other examples: raw data, or combinations that provide indices (SOI, S&P/TSX, etc.). Series may be continuous or discrete, and may be time averages or values at specified time intervals. Why are they different?
And from https://www.yorku.ca/pat/weatherStation/index.php
Sampled every 2 s and averaged over 5 min.
A medical example from S&S: multiple variables at the same time (blood oxygenation).
Decomposition, from CM2009, Chapter 1:
Additive decomposition is the more common form. A simple additive decomposition model is given by $x_t = m_t + s_t + z_t$, where, at time t, $x_t$ is the observed series, $m_t$ is the trend, $s_t$ is the seasonal effect, and $z_t$ is an error term that is, in general, a sequence of correlated random variables with mean zero.
Histograms, pdf, cdf for continuous (dependent) variables, or pmf (probability mass function) for discrete (dependent) variables, e.g. dice, reported cloud amount (tenths). From Wikipedia
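The additive model $x_t = m_t + s_t + z_t$ described above can be tried out on synthetic data. The following is a minimal sketch, assuming monthly data with a linear trend and a sinusoidal seasonal effect. All variable names and parameter values are illustrative assumptions, and the centred 12-month moving average is just one common way to estimate the trend.

```matlab
% Sketch: additive decomposition x_t = m_t + s_t + z_t on synthetic monthly data.
% All names and parameter values here are illustrative assumptions.
rng(1);                               % fix the seed so the run is repeatable
nYears = 20;  p = 12;                 % 20 years of monthly values
t  = (1:nYears*p)';                   % time index in months
m  = 10 + 0.01*t;                     % assumed linear trend m_t
s  = 0.5*sin(2*pi*t/p);               % assumed seasonal effect s_t (period 12)
z  = 0.2*randn(size(t));              % assumed error term z_t
x  = m + s + z;                       % observed series x_t

% Trend estimate: centred moving average over one full period
w    = [0.5; ones(p-1,1); 0.5] / p;   % 13 weights that sum to 1
mHat = conv(x, w, 'valid');           % defined for t = 7 .. n-6
tMid = t(p/2+1 : end-p/2);            % times matching mHat

% Seasonal estimate: average the detrended values month by month
d     = x(p/2+1 : end-p/2) - mHat;    % detrended series
month = mod(tMid-1, p) + 1;           % month number 1..12
sHat  = accumarray(month, d, [p 1], @mean);
sHat  = sHat - mean(sHat);            % force the seasonal effects to average zero

zHat = d - sHat(month);               % what is left is the error estimate
```

Plotting mHat, sHat and zHat against the known m, s and z shows how well each component is recovered for this particular choice of noise level.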
Probability density function (pdf): $f(x) = F'(x)$, provided the cdf F is continuous and differentiable, and then $F(x) = \int_{-\infty}^{x} f(u)\,du$.
Histograms: lake level histograms, pdf, cdf. The dependent variable is more or less continuous, but there is only a finite number of samples.
Previous years' notes….
Annual average lake level, Lake Huron, 1875-1972 (98 samples): mean 9.00, standard deviation 1.32. Not sure if these are ft or m, or one station or an average: we need the corresponding metadata. The Wikipedia entry (https://en.wikipedia.org/wiki/Metadata) is a bit general, but in general we need precise information on the data: where it is from, what it is, how it was measured, how it was processed, etc.
For more recent lake level data, see https://waterlevels.gc.ca/C&A/historical-eng.html or https://www.glerl.noaa.gov/data/wlevels/
These sites are good and provide metadata: Lake Erie monthly mean water levels in metres referred to IGLD 1985 - ???? Ask Google?
[Figure: Lake Huron annual lake level series; vertical axis from 6 to 12, horizontal axis from 0 to 100, labelled "Series".]
Water level plots at http://lre-wm.usace.army.mil/ForecastData/GLBasinConditions/LTA-GLWL-Graph.pdf
.csv files could be downloaded but I used http://www.tides.gc.ca/C&A/network_means-eng.html No download link but copy and paste their table into Excel worked this time.
Year  Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec     Yearly Average  Monthly Minimum  Monthly Maximum  Range
1918  174.59  174.74  174.74  174.84  175     175.14  175.17  175.16  175.1   175.03  174.98  174.99  174.96          174.59           175.17           0.58
1919  175.14  174.88  175     175.14  175.3   175.32  175.25  175.16  175.06  175     174.91  174.76  175.08          174.76           175.32           0.56
1920  174.3   174.4   174.58  174.82  174.95  175.01  175.07  175.08  175.01  174.92  174.84  174.81  174.82          174.3            175.08           0.78
1921  174.87  174.43  174.8   175     175.07  175.08  175.06  174.97  174.88  174.77  174.74  174.76  174.87          174.43           175.08           0.65
... to 2018
Excel will generate good plots, as above, but we should also use Matlab or R or … So, to plot annual averages, extract two columns, YY and YA, and save them as .csv files. With Matlab:
YY=csvread("E:4020/GTLAKES/YY.csv");
YA=csvread("E:4020/GTLAKES/YA.csv");
plot(YY,YA); % works without transpose
Can then clean up a little, add axis labels, etc.:
axis([1915 2020 174 176]);
xlabel({'year'}); ylabel({'water level (m)'});
title('Lake Erie Annual Water Levels')
[Figure: Yearly Average Water Level, Lake Erie (m); vertical axis from 174 to 176, horizontal axis from 1918 to 2018.]
What do we get from looking at these plots? What quantities might we want to compute? Mean, variance or standard deviation, histograms?
AV = mean(YA)    % AV = 175.0175
SD = std(YA)     % SD = 0.3449
VA = var(YA)     % VA = 0.1190
sqrt(VA)         % 0.3449, equal to SD
histogram(YA); histogram(YA,11)   % default bins, then 11 bins
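To see what mean, var and std compute, the formulas can be written out by hand. This is a minimal sketch using only the four yearly averages shown in the table rows for 1918 to 1921, not the full record, so the numbers differ from the values quoted above. The variable names are illustrative.

```matlab
% Sketch: sample mean, variance and standard deviation written out by hand.
% y holds the four yearly averages (m) from the table rows for 1918-1921.
y = [174.96; 175.08; 174.82; 174.87];
n = numel(y);

ybar = sum(y) / n;                    % sample mean
s2   = sum((y - ybar).^2) / (n - 1);  % sample variance, N-1 in the denominator
s    = sqrt(s2);                      % sample standard deviation

% The built-in functions use the same N-1 normalization by default
fprintf('mean %.4f  var %.5f  std %.4f\n', ybar, s2, s);
fprintf('difference from built-ins: %g %g %g\n', ...
        ybar - mean(y), s2 - var(y), s - std(y));
```

The differences printed on the last line should be zero to within rounding error.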
Is there a trend? Linear regression? Use fitlm with a table; we will write a linear regression code later.
tbl=table(YY,YA);
yafit=fitlm(tbl);
plot(yafit)
axis([1915 2020 174 176]);
xlabel({'year'}); ylabel({'water level (m)'});
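As a preview of a hand-written regression, the least-squares line can be computed directly from sums. This is a minimal sketch on synthetic data (the trend, noise level and seed are assumed values, not the Lake Erie record), cross-checked against the built-in polyfit.

```matlab
% Sketch: least-squares straight line y = a + b*t computed from sums.
% The data are synthetic (assumed values), not the Lake Erie record.
rng(2);
t = (1918:2018)';                                    % years
y = 175 + 0.003*(t - 1918) + 0.3*randn(size(t));     % assumed trend plus noise

tbar = mean(t);  ybar = mean(y);
b = sum((t - tbar).*(y - ybar)) / sum((t - tbar).^2);  % slope
a = ybar - b*tbar;                                     % intercept

yfit  = a + b*t;          % fitted trend line
resid = y - yfit;         % detrended series (residuals)

% Cross-check against polyfit, which returns [slope intercept] for degree 1.
% Centring t does not change the slope and keeps the fit well conditioned.
p = polyfit(t - tbar, y, 1);
fprintf('slope %.5f per year (polyfit %.5f)\n', b, p(1));
```

The residual series resid is the detrended data, which is the natural input for a later look at autocorrelation.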
Is there something going on with a 30-40 year period? Somehow the plot looks different with points rather than lines; it is worth taking different looks. Year-to-year memory? Autocorrelation:
acf = autocorr(YA);
plot(acf);
xlabel({'lag'}); ylabel({'acf'});
Or just autocorr(YA)
So what is autocorrelation? How is it computed? What does it mean? The definitions below are for a time series model, or stochastic process, with many realizations. With a single observed time series we need the sample autocovariance.
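For a single observed series, the sample autocovariance and autocorrelation can be computed in a few lines. This is a minimal sketch, assuming the estimator $c_k = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x})$ with $r_k = c_k / c_0$; the test series is synthetic, and its construction (each value is 0.7 times the previous one plus noise) is an assumption chosen only to give some memory.

```matlab
% Sketch: sample autocovariance c_k and autocorrelation r_k of one series,
%   c_k = (1/n) * sum_{t=1}^{n-k} (x_t - xbar)(x_{t+k} - xbar),   r_k = c_k / c_0
% The test series is synthetic (assumed): x_t = 0.7*x_{t-1} + e_t.
rng(3);
n = 500;
e = randn(n,1);
x = filter(1, [1 -0.7], e);        % series with step-to-step memory

maxLag = 20;
xbar = mean(x);
c = zeros(maxLag+1, 1);
for k = 0:maxLag
    % products of deviations that are k steps apart, divided by n (not n-k)
    c(k+1) = sum((x(1:n-k) - xbar) .* (x(1+k:n) - xbar)) / n;
end
r = c / c(1);                      % r(1) is lag 0 and equals 1

disp([(0:5)' r(1:6)]);             % lags 0..5 and their autocorrelations
```

For white noise in place of x, the values at lags 1 and above would scatter around zero rather than decaying slowly.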
Stationarity can be an issue. For example, there is a trend in the Lake Erie water level data. Might it be useful to remove it? Recalling the cdf and pdf, and thinking in terms of many realizations of a process at time t, we can define
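A sketch of the usual definitions, in notation close to S&S Chapter 1; the symbols $f_t$, $\mu_t$, $\gamma$ and $\rho$ are assumed here. For a process $x_t$ whose pdf at time t is $f_t(x)$, the mean function, autocovariance function and autocorrelation function are commonly written as:

```latex
\begin{align}
\mu_t &= E[x_t] = \int_{-\infty}^{\infty} x\, f_t(x)\, dx, \\
\gamma(s,t) &= \mathrm{cov}(x_s, x_t) = E\big[(x_s - \mu_s)(x_t - \mu_t)\big], \\
\rho(s,t) &= \frac{\gamma(s,t)}{\sqrt{\gamma(s,s)\,\gamma(t,t)}} .
\end{align}
```

For a stationary series, $\mu_t$ is constant and $\gamma(s,t)$ depends only on the lag $|s-t|$.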
Note
Some stationary time series examples. It is hard to establish stationarity from a single sample time series. Many series are not stationary, but one can still compute the autocorrelation, as in the lake level example.
Ergodic hypothesis: in econometrics and signal processing, a stochastic process is said to be ergodic if its statistical properties can be deduced from a single, sufficiently long, random sample of the process.
Cowpertwait and Metcalfe, Chapters 1 and 2: useful reading, download them!
---------------------------------------------------
Will get back to linear regression, correlations and autocorrelations later, but look at time series models and decompositions first.
Pseudo-Random numbers. A slightly archaic term for a computer-generated random number. The prefix pseudo- is used
to distinguish this type of number from a "truly" random number generated by a random
physical process such as radioactive decay.
http://mathworld.wolfram.com/RandomNumber.html
A random number is a number chosen as if by chance from some specified distribution such
that selection of a large set of these numbers reproduces the underlying distribution. Almost
always, such numbers are also required to be independent, so that there are no correlations
between successive numbers. Computer-generated random numbers are sometimes
called pseudorandom numbers, while the term "random" is reserved for the output of
unpredictable physical processes.
Matlab :
X = rand; returns a single uniformly distributed random number in the interval (0,1).
rand(n,m) generates an n-by-m array of pseudo-random values.
Seed? rng(seed) seeds the random number generator using the nonnegative integer seed so that rand, randi, and randn produce a predictable sequence of numbers.
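The effect of seeding can be checked directly. This is a minimal sketch; the seed value 4020 is an arbitrary choice for illustration.

```matlab
% Sketch: re-seeding the generator reproduces the same pseudo-random sequence.
% The seed value 4020 is an arbitrary choice for illustration.
rng(4020);
a = rand(5,1);          % first draw of five uniform numbers

rng(4020);              % reset to the same seed
b = rand(5,1);          % second draw repeats the first exactly

c = rand(5,1);          % continuing without re-seeding gives new numbers

disp(isequal(a, b));    % displays 1: same seed, same sequence
disp(isequal(b, c));    % displays 0: the sequence has moved on
```

Setting the seed at the top of a script makes plots and statistics repeatable from run to run.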
XX=rand(1000,1); histogram(XX)
Maybe increase to 10000 and specify the number of bins:
XX=rand(10000,1); histogram(XX,10)
And look at autocorr
autocorr(XX)
With a normal distribution,
YY=randn(1000,1); histogram(YY); autocorr(YY)
Look at some sample numbers: YY(1:10,1)
ans = -0.0039, 0.1188, 0.5800, -0.0995, -0.5751, 0.8516, 1.4024, 0.5074, -0.2197, 1.3957
help randn
X = randn(n) returns an n-by-n matrix of normally distributed random numbers. Why n-by-n?
X = randn(sz1,...,szN) returns an sz1-by-...-by-szN array of random numbers, where sz1,...,szN indicate the size of each dimension. For example, randn(3,4) returns a 3-by-4 matrix. Note: mean 0 and standard deviation 1.
In general, you can generate N random numbers in the interval (a,b) with the formula
r = a + (b-a).*rand(N,1).
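The same shift-and-scale idea works for both generators. This is a minimal sketch; the values of a, b, mu, sigma and N are assumed for illustration, and the normal case uses the standard fact that mu + sigma*Z has mean mu and standard deviation sigma when Z is standard normal.

```matlab
% Sketch: uniform numbers on (a,b) and normal numbers with chosen mean and sd.
% a, b, mu, sigma and N are assumed values for illustration.
rng(5);
N = 10000;
a = -1;  b = 1;
r = a + (b-a).*rand(N,1);        % uniform on (-1,1), as in the formula above

mu = 175;  sigma = 0.35;
g = mu + sigma.*randn(N,1);      % normal with mean mu and standard deviation sigma

fprintf('uniform: min %.3f  max %.3f  mean %.3f\n', min(r), max(r), mean(r));
fprintf('normal : mean %.3f  std %.3f\n', mean(g), std(g));
```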
Or with R: runif, rnorm
> runif(10, min = 0, max = 1)
[1] 0.5194561 0.5429784 0.5136143 0.3902505 0.0737053 0.5408465 0.8932388
[8] 0.1209699 0.1794184 0.8080221
> rts=runif(1000, -1,1); plot (rts); hist(rts)
Histogram looks a bit one sided?
> mean(rts)
[1] -0.0382606
> var(rts)
[1] 0.3518284
What do we expect?
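As a worked answer to what we expect, here are the standard results for a uniform distribution on $(a,b)$, applied with $a=-1$, $b=1$ and $n=1000$. The standard error line assumes the samples are independent.

```latex
\begin{align}
E[X] &= \frac{a+b}{2} = \frac{-1+1}{2} = 0, \\
\mathrm{Var}(X) &= \frac{(b-a)^2}{12} = \frac{2^2}{12} = \frac{1}{3} \approx 0.333, \\
\mathrm{se}(\bar{X}) &= \sqrt{\frac{\mathrm{Var}(X)}{n}} = \sqrt{\frac{1/3}{1000}} \approx 0.018 .
\end{align}
```

So the sample variance 0.3518 is close to 1/3, and the sample mean -0.038 is about two standard errors from zero, which is not unusual for a single sample of 1000.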
> rnorm(10)
[1] 0.11258788 -1.13525995 -0.04802839 -0.85215659 -0.80190492 0.53513598
[7] -0.09190464 0.67647779 0.91268526 0.10332288
> rntst2<- rnorm(10000, mean = 0, sd = 1); hist(rntst2)
> mean(rntst2); var(rntst2); sd(rntst2)
[1] 0.01692706
[1] 0.9837065
[1] 0.9918198
Some definitions: iid, white noise.
iid: independent (hence uncorrelated) and identically distributed, written IID(0, $\sigma^2$).
White noise: zero mean, uncorrelated, variance $\sigma^2$, written WN(0, $\sigma^2$); see B&D 2: p16,17. Strictly speaking, white noise need not be identically distributed, it just needs the same mean and variance.
Definition: a white noise process is a sequence of random variables that are uncorrelated, have mean zero, and have a finite variance (denoted $\sigma^2$ here).
IID noise with zero mean and finite variance is white noise, but white noise need not be independent or identically distributed. Some texts nevertheless use the term "white noise" to mean IID noise.
Problem 1.8 in B&D looked devious to me:
Gaussian white noise: each sample has a normal distribution with zero mean.
Can we set up $X_t$? Change the problem a bit to avoid t = 0, using t+1 in place of t-1.
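YY=randn(10000,1);   % Z_t, Gaussian white noise
X=zeros(10000,1);    % preallocate
for i = 1:5000; X(2*i)=YY(2*i); X(2*i-1)=(YY(2*i)^2-1)/sqrt(2); end   % even t: Z_t; odd t: (Z_{t+1}^2-1)/sqrt(2)
mean(X), var(X)
ans = -0.0063
ans = 0.9766
hist(X,20); autocorr(X);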
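A sketch of why this construction gives white noise that is not iid, assuming $Z_t$ is IID N(0,1) (so $E[Z]=0$, $E[Z^2]=1$, $E[Z^3]=0$, $E[Z^4]=3$) and $X_t$ is defined as in the loop above, with t+1 in place of t-1:

```latex
\begin{align}
X_t &= \begin{cases} Z_t, & t \text{ even}, \\ (Z_{t+1}^2 - 1)/\sqrt{2}, & t \text{ odd}, \end{cases} \\
E[X_t] &= \frac{E[Z_{t+1}^2] - 1}{\sqrt{2}} = 0 \quad (t \text{ odd}), \\
\mathrm{Var}(X_t) &= \frac{E[Z_{t+1}^4] - (E[Z_{t+1}^2])^2}{2} = \frac{3 - 1}{2} = 1 \quad (t \text{ odd}), \\
\mathrm{Cov}(X_t, X_{t+1}) &= \frac{E[(Z_{t+1}^2 - 1)\, Z_{t+1}]}{\sqrt{2}} = \frac{E[Z_{t+1}^3] - E[Z_{t+1}]}{\sqrt{2}} = 0 \quad (t \text{ odd}).
\end{align}
```

Every other pair of values involves different $Z$'s and so is independent. The series is therefore WN(0,1), consistent with the sample mean and variance above, but it is not iid: for odd t, $X_t$ is an exact function of $X_{t+1}$, and the odd and even terms have different distributions.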