
1

ESSE 4020: Time Series Basics, Stochastic Models, Random Walks Sept 2019

Time series: generally one or more dependent variables and one independent variable, time. So F(t), or F(t) = (F1(t), F2(t), ...) for several dependent variables, often with a stochastic component (random behaviour or random measurement error). There may also be a periodic or seasonal component (annual or diurnal variations) and a trend.

Texts, available online through the library (chapters can be downloaded free):
S&S 2011: Robert H. Shumway and David S. Stoffer, Time Series Analysis and Its Applications: With R Examples. New York: Springer, 2011. eBook.
CM 2009: Paul S.P. Cowpertwait and Andrew V. Metcalfe, Introductory Time Series with R. New York: Springer-Verlag, 2009. eBook.
And in the future, a text using Matlab: https://www.springer.com/gp/book/9783030207892

These notes are based on material from S&S 2011 Chapter 1 and CM 2009 Chapters 1, 2, 4.2 and 4.3. READ CM 2009 Chapters 1 and 2.

Matlab help: https://www.mathworks.com/help/matlab/data_analysis/time-series-objects.html
"Time Series Objects: Create, modify, and analyze timeseries objects containing time-dependent data. A timeseries object contains data and time information within its properties that describes a dynamic process. You can use timeseries object functions to create, modify, and analyze the behavior of a time series."

Or with R: https://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html
"Using R for Time Series Analysis: This booklet tells you how to use the R statistical software to carry out some simple analyses that are common in analysing time series data. This booklet assumes that the reader has some basic knowledge of time series analysis, and the principal focus of the booklet is not to explain time series analysis, but rather to explain how to carry out these analyses using R."

R is more convenient for data arranged by year and month, or by day and hour, that have a "seasonal" component.


2

from S&S 2011 and Jim Hansen’s analyses http://www.columbia.edu/~jeh1/

From CM 2009

Many different ways to look at the data. Anomalies are relative to 1980-2015 average. Recent data available at https://data.giss.nasa.gov/gistemp/graphs_v3/


3

The series has a trend, roughly linear. One could look for an annual cycle, but none is apparent.

There are many other examples. Some are raw data, and some are combinations of data that provide indices (SOI, S&P/TSX, etc.). Series can be continuous or discrete, and they can be time averages or values at specified time intervals. Why are these different?

4

And from https://www.yorku.ca/pat/weatherStation/index.php


5

Sampled every 2 s and averaged over 5 min. A medical example, from S&S: multiple variables at the same time (blood oxygenation).

6

Decomposition, from CM 2009, Chapter 1

Additive decomposition is the more common form. A simple additive decomposition model is given by

$$x_t = m_t + s_t + z_t$$

where, at time t, $x_t$ is the observed series, $m_t$ is the trend, $s_t$ is the seasonal effect, and $z_t$ is an error term. In general, $z_t$ is a sequence of correlated random variables with mean zero.

For continuous (dependent) variables we use histograms, the pdf and the cdf. For discrete (dependent) variables we use the pmf (probability mass function), e.g. dice throws or reported cloud amount (in tenths). From Wikipedia
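The sketch below makes the additive model $x_t = m_t + s_t + z_t$ concrete. CM 2009 do this with R's decompose(), which estimates the trend with a centred moving average. What follows is a minimal pure-Python version of that classical additive decomposition, not CM's code. It assumes the series is at least a couple of periods long. The function names and the synthetic quarterly series are hypothetical.

```python
import random

def centred_moving_average(x, period):
    """Trend estimate m_t: centred moving average over one period (None at the ends)."""
    n, half = len(x), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        if period % 2 == 1:
            trend[t] = sum(x[t - half:t + half + 1]) / period
        else:
            # even period: the two end points get weight 1/2 so the window is centred on t
            s = 0.5 * x[t - half] + sum(x[t - half + 1:t + half]) + 0.5 * x[t + half]
            trend[t] = s / period
    return trend

def additive_decompose(x, period):
    """Split x into trend m_t, seasonal effect s_t and error z_t = x_t - m_t - s_t."""
    trend = centred_moving_average(x, period)
    sums, counts = [0.0] * period, [0] * period
    for t, (xt, mt) in enumerate(zip(x, trend)):
        if mt is not None:                      # detrended value x_t - m_t
            sums[t % period] += xt - mt
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean_raw = sum(raw) / period                # centre so seasonal effects sum to zero
    pattern = [r - mean_raw for r in raw]
    seasonal = [pattern[t % period] for t in range(len(x))]
    error = [None if mt is None else xt - mt - st
             for xt, mt, st in zip(x, trend, seasonal)]
    return trend, seasonal, error

# Hypothetical quarterly series: linear trend + fixed seasonal pattern + small noise
gen = random.Random(0)
true_pattern = [1.0, -0.5, 0.5, -1.0]
x = [5.0 + 0.1 * t + true_pattern[t % 4] + gen.gauss(0.0, 0.1) for t in range(40)]
trend, seasonal, error = additive_decompose(x, period=4)
print([round(s, 2) for s in seasonal[:4]])  # should be close to true_pattern
```

The trend is undefined for half a period at each end, which is why decomposition plots have gaps there.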

7

Probability density function (pdf): $f(x) = F'(x)$, provided the cdf F is continuous and differentiable, and $F(x) = \int_{-\infty}^{x} f(u)\,du$.

Histograms, lake level histograms, pdf and cdf. The dependent variable is more or less continuous, but we have only a finite number of samples.
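With a finite sample we can only estimate the pdf and cdf. A histogram scaled to unit area estimates f(x), and the empirical cdf (the fraction of values ≤ x) estimates F(x). Here is a minimal pure-Python sketch that uses a synthetic normal sample in place of real data. The helper names are hypothetical. In Matlab, histogram(YA,'Normalization','pdf') applies the same scaling.

```python
import bisect
import random

def ecdf(sample):
    """Empirical cdf: F(x) = fraction of sample values <= x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect.bisect_right(data, x) / n

def density_histogram(sample, nbins):
    """Equal-width histogram scaled so bar heights estimate the pdf (total area 1)."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in sample:
        k = min(int((v - lo) / width), nbins - 1)   # the maximum goes in the last bin
        counts[k] += 1
    heights = [c / (len(sample) * width) for c in counts]
    edges = [lo + k * width for k in range(nbins + 1)]
    return edges, heights

gen = random.Random(1)
sample = [gen.gauss(0.0, 1.0) for _ in range(1000)]   # stand-in for a finite data set
F = ecdf(sample)
edges, heights = density_histogram(sample, 20)
area = sum(h * (edges[1] - edges[0]) for h in heights)
print(F(0.0), area)   # F(0) near 0.5; area 1 up to rounding
```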

Previous year's notes…

Annual average lake level, Lake Huron, 1875-1972 (98 samples): mean 9.00, standard deviation 1.32. It is not clear whether these are ft or m, or whether they come from one station or are an average. We need the corresponding metadata (https://en.wikipedia.org/wiki/Metadata is a bit general). In general we need precise information on the data: where it is from, what it is, how it was measured, how it was processed, etc.

For more recent lake level data, see https://waterlevels.gc.ca/C&A/historical-eng.html or https://www.glerl.noaa.gov/data/wlevels/. These sites are good and provide metadata, e.g. "Lake Erie monthly mean water levels in metres referred to IGLD 1985". What is IGLD 1985? Ask Google?

[Figure: Lake Huron annual average lake level, 1875-1972, plotted against sample number]


8

Water level plots at http://lre-wm.usace.army.mil/ForecastData/GLBasinConditions/LTA-GLWL-Graph.pdf


9

Lake Erie monthly means: .csv files could be downloaded, but I used http://www.tides.gc.ca/C&A/network_means-eng.html. It has no download link, but copying and pasting the table into Excel worked this time.

Lake Erie monthly mean water levels (m)

Year  Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec     Yearly Average  Monthly Minimum  Monthly Maximum  Range
1918  174.59  174.74  174.74  174.84  175     175.14  175.17  175.16  175.1   175.03  174.98  174.99  174.96          174.59           175.17           0.58
1919  175.14  174.88  175     175.14  175.3   175.32  175.25  175.16  175.06  175     174.91  174.76  175.08          174.76           175.32           0.56
1920  174.3   174.4   174.58  174.82  174.95  175.01  175.07  175.08  175.01  174.92  174.84  174.81  174.82          174.3            175.08           0.78
1921  174.87  174.43  174.8   175     175.07  175.08  175.06  174.97  174.88  174.77  174.74  174.76  174.87          174.43           175.08           0.65
… to 2018

Excel will generate good plots, as above, but we should also use Matlab or R or …. So, to plot the annual averages, extract two columns, YY (year) and YA (yearly average), and save them as .csv files. With Matlab:

YY = csvread("E:4020/GTLAKES/YY.csv");
YA = csvread("E:4020/GTLAKES/YA.csv");
plot(YY, YA);   % works without a transpose

We can then clean it up a little by adding axis limits, labels, etc.:

axis([1915 2020 174 176]);
xlabel({'year'});
ylabel({'water level (m)'});
title('Lake Erie Annual Water Levels')

[Figure: Yearly Average Water Level, Lake Erie (m), 1918-2018]


10

What do we get from looking at these plots? What quantities might we want to compute? The mean, the variance or standard deviation, histograms?

AV = mean(YA)     % AV = 175.0175
SD = std(YA)      % SD = 0.3449
VA = var(YA)      % VA = 0.1190
sqrt(VA)          % 0.3449, the same as SD
histogram(YA); histogram(YA, 11)
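These quantities are easy to check by hand on a small subset. The Python sketch below uses the standard-library statistics module on only the four yearly averages shown in the table (1918-1921), so the numbers differ from the full-record values above. Like Matlab's var and std, statistics.variance and statistics.stdev divide by n - 1. pvariance, which divides by n, is included for comparison.

```python
import math
import statistics

# Yearly averages (m) for 1918-1921 from the Lake Erie table (4 years only, for illustration)
ya = [174.96, 175.08, 174.82, 174.87]

av = statistics.mean(ya)        # sample mean
va = statistics.variance(ya)    # sample variance, divides by n - 1 (like Matlab var)
sd = statistics.stdev(ya)       # sample standard deviation = sqrt(va) (like Matlab std)
pva = statistics.pvariance(ya)  # divides by n instead

print(round(av, 4), round(va, 6), round(sd, 4), math.isclose(sd, math.sqrt(va)))
```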

Is there a trend? Linear regression? Use fitlm with a table; we will write our own linear regression code later.

tbl = table(YY, YA);
yafit = fitlm(tbl);   % by default the last table variable, YA, is the response
plot(yafit)
axis([1915 2020 174 176]); xlabel({'year'}); ylabel({'water level (m)'});
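Here is a preview of what fitlm computes for a straight line, without the standard errors and diagnostics it also reports. The ordinary least-squares estimates are $b = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ for the slope and $a = \bar{y} - b\bar{x}$ for the intercept. This is a minimal Python sketch, and fit_line is a hypothetical name. It is checked first on an exact line and then on the four tabulated years. Four points are far too few for a real trend estimate.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx                 # slope
    a = ybar - b * xbar           # intercept: the line passes through (xbar, ybar)
    return a, b

# Check on an exact line y = 1 + 2x
a0, b0 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Four yearly averages (m) from the Lake Erie table; slope is in m per year
years = [1918, 1919, 1920, 1921]
ya = [174.96, 175.08, 174.82, 174.87]
a1, b1 = fit_line(years, ya)
print(a0, b0, round(b1, 4))
```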

11

Is there something going on with a 30-40 year period? The plot looks different with points rather than lines, so it is worth taking different looks. Is there year-to-year memory? Autocorrelation:

acf = autocorr(YA);         % sample autocorrelations at lags 0, 1, 2, ...
plot(0:numel(acf)-1, acf);  % plot against lag rather than array index
xlabel({'lag'}); ylabel({'acf'});

12

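Or just call autocorr(YA) with no output arguments. This plots the sample autocorrelation function with approximate confidence bounds.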

So what is autocorrelation? How is it computed, and what does it mean? The definitions below are for a time series model, or stochastic process, with many realizations. With a single observed time series we need the sample autocovariance instead.
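A standard definition of the sample versions uses the 1/n convention, which R's acf also uses. The sample autocovariance at lag k is $c_k = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x})$, and the sample autocorrelation is $r_k = c_k / c_0$. Here is a minimal pure-Python sketch assuming that convention (sample_acf is a hypothetical name).

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag using the 1/n autocovariance convention."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]                       # deviations from the sample mean
    c0 = sum(v * v for v in d) / n                  # lag-0 autocovariance (the variance)
    acf = []
    for k in range(max_lag + 1):
        ck = sum(d[t] * d[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf

# An alternating series has strong negative lag-1 and positive lag-2 correlation
r = sample_acf([1, -1, 1, -1, 1, -1], max_lag=2)
print(r)   # [1.0, -0.8333333333333334, 0.6666666666666666]
```

Because only n - k products are summed but the divisor stays n, |r_k| shrinks at large lags. This is one reason to look at only the first n/4 or so lags.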

13

Stationarity can be an issue. For example, there is a trend in the Lake Erie water level data, and it might be useful to remove it. Recalling the cdf and pdf, and thinking in terms of many realizations of a process at time t, we can define:
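For reference, here are the standard ensemble definitions (as in CM 2009, Chapter 2, and S&S 2011, Chapter 1), writing $f_t$ for the pdf of $x_t$ across realizations. The notation may differ slightly from the slides.

$$\mu(t) = E(x_t) = \int_{-\infty}^{\infty} x\, f_t(x)\, dx, \qquad \sigma^2(t) = E\big[(x_t - \mu(t))^2\big]$$

$$\gamma(t_1, t_2) = E\big[(x_{t_1} - \mu(t_1))(x_{t_2} - \mu(t_2))\big], \qquad \rho(t_1, t_2) = \frac{\gamma(t_1, t_2)}{\sigma(t_1)\,\sigma(t_2)}$$

If the process is second-order (weakly) stationary, then $\mu(t) = \mu$ and $\gamma$ depends only on the lag $k = t_2 - t_1$. We then write $\gamma_k$ and $\rho_k = \gamma_k / \sigma^2$.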

14

Note

15

Some stationary time series examples. It is hard, though, to establish stationarity from a single sample time series. Many series are not stationary, but one can still compute the autocorrelation, as in the lake level example. Ergodic hypothesis: in econometrics and signal processing, a stochastic process is said to be ergodic if its statistical properties can be deduced from a single, sufficiently long, random sample of the process. Cowpertwait and Metcalfe, Chapters 1 and 2: useful reading, download them!

---------------------------------------------------

Will get back to linear regression, correlations and autocorrelations later, but look at time series models and decompositions first.

16

Pseudo-random numbers. A slightly archaic term for a computer-generated random number. The prefix pseudo- is used to distinguish this type of number from a "truly" random number generated by a random physical process such as radioactive decay.

http://mathworld.wolfram.com/RandomNumber.html

A random number is a number chosen as if by chance from some specified distribution such that selection of a large set of these numbers reproduces the underlying distribution. Almost always, such numbers are also required to be independent, so that there are no correlations between successive numbers. Computer-generated random numbers are sometimes called pseudorandom numbers, while the term "random" is reserved for the output of unpredictable physical processes.

Matlab:

X = rand returns a single uniformly distributed random number in the interval (0,1).
rand(n,m) generates an n-by-m array of pseudo-random values.
What about the seed? rng(seed) seeds the random number generator using the nonnegative integer seed so that rand, randi, and randn produce a predictable sequence of numbers.

XX = rand(1000,1); histogram(XX)
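Seeding works the same way in other languages. Here is a sketch with Python's standard library, assuming random.Random(seed) as the analogue of rng(seed). The generators differ, so the numbers will not match Matlab's. Python's random() also draws from [0, 1) rather than (0, 1). uniform_sample is a hypothetical helper.

```python
import random

def uniform_sample(n, seed):
    """n pseudo-random U(0,1) values from a generator seeded with `seed`."""
    gen = random.Random(seed)   # independent generator, like calling rng(seed) first
    return [gen.random() for _ in range(n)]

a = uniform_sample(5, seed=42)
b = uniform_sample(5, seed=42)
c = uniform_sample(5, seed=7)
print(a == b)   # True: same seed, same sequence
print(a == c)   # False: a different seed gives a different sequence
```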




http://mathworld.wolfram.com/PseudorandomNumber.html

https://www.mathworks.com/help/matlab/ref/rand.html

https://www.mathworks.com/help/matlab/ref/randi.html

https://www.mathworks.com/help/matlab/ref/randn.html

17

Maybe increase to 10000 values and specify the number of bins:

XX = rand(10000,1); histogram(XX,10)

And look at the autocorrelation:

autocorr(XX)
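What should we see? With 10000 independent U(0,1) values, each of 10 equal bins should hold about 1000 values. The lag-1 sample autocorrelation should typically lie within about ±2/√n ≈ ±0.02. Here is a pure-Python sketch of that check, assuming an arbitrary seed (2019) for reproducibility.

```python
import random

gen = random.Random(2019)                   # arbitrary seed, like rng(2019)
n = 10000
xx = [gen.random() for _ in range(n)]       # like XX = rand(10000,1)

# 10 equal-width bins on [0, 1), as in histogram(XX,10)
counts = [0] * 10
for v in xx:
    counts[min(int(v * 10), 9)] += 1

# lag-1 sample autocorrelation r_1 (1/n convention; the factors of n cancel)
mean = sum(xx) / n
d = [v - mean for v in xx]
r1 = sum(d[t] * d[t + 1] for t in range(n - 1)) / sum(v * v for v in d)

print(counts)
print(round(r1, 4), 2 / n ** 0.5)           # r_1 versus the rough 2/sqrt(n) bound
```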

With a normal distribution:

YY = randn(1000,1); histogram(YY); autocorr(YY)

18

Look at some sample numbers, YY(1:10,1):

ans = -0.0039, 0.1188, 0.5800, -0.0995, -0.5751, 0.8516, 1.4024, 0.5074, -0.2197, 1.3957

help randn

X = randn(n) returns an n-by-n matrix of normally distributed random numbers. Why n-by-n?

X = randn(sz1,...,szN) returns an sz1-by-...-by-szN array of random numbers, where sz1,...,szN indicate the size of each dimension. For example, randn(3,4) returns a 3-by-4 matrix. Note: mean 0 and standard deviation 1 (the standard normal distribution).

In general, you can generate N random numbers in the interval (a,b) with the formula r = a + (b-a).*rand(N,1).
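The same transformation can be sketched in Python with the standard library only. uniform_ab is a hypothetical helper, and random.uniform(a, b) does the same job. A U(0,1) value u is stretched by (b - a) and shifted by a.

```python
import random

def uniform_ab(n, a, b, seed=None):
    """n pseudo-random values, uniform on [a, b): r = a + (b - a) * u with u ~ U(0,1)."""
    gen = random.Random(seed)
    return [a + (b - a) * gen.random() for _ in range(n)]

r = uniform_ab(1000, -1.0, 1.0, seed=1)
print(min(r), max(r), sum(r) / len(r))
```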

Or, with R: runif and rnorm.

runif(10, min = 0, max = 1)
[1] 0.5194561 0.5429784 0.5136143 0.3902505 0.0737053 0.5408465 0.8932388
[8] 0.1209699 0.1794184 0.8080221

rts=runif(1000, -1,1); plot (rts); hist(rts)

The histogram looks a bit one-sided? mean(rts) gives [1] -0.0382606 and var(rts) gives [1] 0.3518284. What do we expect?
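For reference, the expected values for a uniform distribution on (a, b), with pdf 1/(b − a), follow from integrating:

$$E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2}, \qquad E(X^2) = \int_a^b \frac{x^2}{b-a}\,dx = \frac{a^2+ab+b^2}{3}$$

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{(b-a)^2}{12}$$

For runif(1000, -1, 1) this gives mean 0 and variance 4/12 = 1/3 ≈ 0.333. The standard error of a mean of 1000 values is about √(0.333/1000) ≈ 0.018. A sample mean of −0.038 is therefore about two standard errors from 0, which is not implausible for n = 1000.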

19

> rnorm(10)

[1] 0.11258788 -1.13525995 -0.04802839 -0.85215659 -0.80190492 0.53513598

[7] -0.09190464 0.67647779 0.91268526 0.10332288

rntst2<- rnorm(10000, mean = 0, sd = 1); hist(rntst2)

> mean(rntst2); var(rntst2)
[1] 0.01692706
[1] 0.9837065
> sd(rntst2)
[1] 0.9918198

20

Some definitions: IID and white noise

IID: independent (hence uncorrelated) and identically distributed, written IID(0, σ²).

White noise: zero mean, uncorrelated, variance σ², written WN(0, σ²); see B&D 2, pp. 16-17. IID(0, σ²) noise is a special case of WN(0, σ²). Strictly speaking, white noise need not be identically distributed; it need only have the same variance.

Definition: A white noise process is a sequence of random variables that are uncorrelated, have mean zero, and have a finite variance (denoted σ²).

IID noise (with finite variance) is white noise, but white noise need not be identically distributed, or even independent. Some texts, however, use the term "white noise" to mean IID.

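Problem 1.8 in B&D looked devious to me. Let {Z_t} be Gaussian white noise: independent values, each with a normal distribution with zero mean (here N(0,1)). Define

$$X_t = \begin{cases} Z_t, & t \text{ even},\\ (Z_{t-1}^2 - 1)/\sqrt{2}, & t \text{ odd}. \end{cases}$$

Can we set up X_t? Change it a bit to avoid t = 0 (Matlab indices start at 1) by using t+1 in place of t-1, so that for odd t, $X_t = (Z_{t+1}^2 - 1)/\sqrt{2}$.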

https://en.wikipedia.org/wiki/Normal_distribution

21

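YY = randn(10000,1);             % Z_t, IID N(0,1)
X = zeros(10000,1);              % preallocate X
for i = 1:5000
    X(2*i) = YY(2*i);                     % t even: X_t = Z_t
    X(2*i-1) = (YY(2*i)^2 - 1)/sqrt(2);   % t odd: X_t = (Z_{t+1}^2 - 1)/sqrt(2)
end
mean(X), var(X)
% ans = -0.0063
% ans = 0.9766
figure; histogram(X,20);
figure; autocorr(X);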
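The same experiment can be re-run in Python with the standard library only. The seed and the helper corr are assumptions, and the numbers will not match Matlab's. Besides the mean, variance and lag-1 correlation, the sketch shows why {X_t} is white noise but not IID. Each X_t at odd t is an exact function of X_{t+1}, so its correlation with X_{t+1}² is 1, even though the lag-1 correlation is close to 0.

```python
import math
import random

def corr(u, v):
    """Sample (Pearson) correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

gen = random.Random(1)                       # arbitrary seed
n = 10000
z = [gen.gauss(0.0, 1.0) for _ in range(n)]  # Z_t, IID N(0,1), like randn(10000,1)

# Python index j = t - 1, so even j <-> odd t and odd j <-> even t
x = [0.0] * n
for j in range(0, n, 2):
    x[j + 1] = z[j + 1]                              # t even: X_t = Z_t
    x[j] = (z[j + 1] ** 2 - 1.0) / math.sqrt(2.0)    # t odd:  X_t = (Z_{t+1}^2 - 1)/sqrt(2)

mean_x = sum(x) / n
var_x = sum((v - mean_x) ** 2 for v in x) / (n - 1)
r1 = corr(x[:-1], x[1:])                     # lag-1 correlation: close to 0
odd_t = x[0::2]                              # X_t for odd t
next_sq = [v * v for v in x[1::2]]           # X_{t+1}^2 for the following even t
dep = corr(odd_t, next_sq)                   # exactly linear relation, so 1 up to rounding

print(round(mean_x, 4), round(var_x, 4), round(r1, 4), round(dep, 6))
```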