R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Time Series Analysis and Mining with R
Yanchang Zhao
RDataMining.comhttp://www.rdatamining.com/
18 July 2011
1/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Outline
1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
2/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
R
a free software environment for statistical computing andgraphics
runs on Windows, Linux and MacOS
widely used in academia and research, as well as industrialapplications
over 3,000 packages
CRAN Task View: Time Series Analysishttp://cran.r-project.org/web/views/TimeSeries.html
3/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Time Series Data in R
class ts
represents data which has been sampled at equispacedpoints in time
frequency=7: a weekly series
frequency=12: a monthly series
frequency=4: a quarterly series
4/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Time Series Data in R
> a <- ts(1:20, frequency=12, start=c(2011,3))
> print(a)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov2011 1 2 3 4 5 6 7 8 92012 11 12 13 14 15 16 17 18 19 20
Dec2011 102012
> str(a)
Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10 ...
> attributes(a)
$tsp[1] 2011.167 2012.750 12.000
$class[1] "ts"
5/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Outline
1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
6/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
What is Time Series Decomposition
To decompose a time series into components:
Trend component: long term trend
Seasonal component: seasonal variation
Cyclical component: repeated but non-periodicfluctuations
Irregular component: the residuals
7/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Data AirPassengers
Data AirPassengers: Monthly totals of Box Jenkinsinternational airline passengers, 1949 to 1960. It has144(=12×12) values.
> plot(AirPassengers)
Time
AirP
asse
nger
s
1950 1952 1954 1956 1958 1960
100
200
300
400
500
600
8/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Decomposition
> apts <- ts(AirPassengers, frequency = 12)
> f <- decompose(apts)
> # seasonal figures
> plot(f$figure,type="b")
●
●
●
●●
●
● ●
●
●
●
●
2 4 6 8 10 12
−40
−20
020
4060
Index
f$fig
ure
9/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Decomposition
> plot(f)
100
300
500
obse
rved
150
250
350
450
tren
d
−40
040
seas
onal
−40
020
60
2 4 6 8 10 12
rand
om
Time
Decomposition of additive time series
10/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Outline
1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
11/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Time Series Forecasting
To forecast future events based on known past data
E.g., to predict the opening price of a stock based on itspast performance
Popular models
Autoregressive moving average (ARMA)Autoregressive integrated moving average (ARIMA)
12/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Forecasting
> # build an ARIMA model
> fit <- arima(AirPassengers, order=c(1,0,0),
+ list(order=c(2,1,0), period=12))
> fore <- predict(fit, n.ahead=24)
> # error bounds at 95% confidence level
> U <- fore$pred + 2*fore$se
> L <- fore$pred - 2*fore$se
> ts.plot(AirPassengers, fore$pred, U, L,
+ col=c(1,2,4,4), lty = c(1,1,2,2))
> legend("topleft", col=c(1,2,4), lty=c(1,1,2),
+ c("Actual", "Forecast",
+ "Error Bounds (95% Confidence)"))
13/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Forecasting
Time
1950 1952 1954 1956 1958 1960 1962
100
200
300
400
500
600
700
ActualForecastError Bounds (95% Confidence)
14/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Outline
1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
15/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Time Series Clustering
To partition time series data into groups based onsimilarity or distance, so that time series in the samecluster are similar
Measure of distance/dissimilarity
Euclidean distanceManhattan distanceMaximum normHamming distanceThe angle between two vectors (inner product)Dynamic Time Warping (DTW) distance...
16/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Dynamic Time Warping (DTW)
DTW finds optimal alignment between two time series.
> library(dtw)
> idx <- seq(0, 2*pi, len=100)
> a <- sin(idx) + runif(100)/10
> b <- cos(idx)
> align <- dtw(a, b, step=asymmetricP1, keep=T)
> dtwPlotTwoWay(align)
Index
Que
ry v
alue
0 20 40 60 80 100
−1.
0−
0.5
0.0
0.5
1.0
17/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Synthetic Control Chart Time Series
The dataset contains 600 examples of control chartssynthetically generated by the process in Alcock andManolopoulos (1999).
Each control chart is a time series with 60 values.
Six classes:
1-100 Normal101-200 Cyclic201-300 Increasing trend301-400 Decreasing trend401-500 Upward shift501-600 Downward shift
http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_
control.html
18/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Synthetic Control Chart Time Series
> # read data into R
> # sep="": the separator is white space, i.e., one
> # or more spaces, tabs, newlines or carriage returns
> sc <- read.table("synthetic_control.data",
+ header=F, sep="")
> # show one sample from each class
> idx <- c(1,101,201,301,401,501)
> sample1 <- t(sc[idx,])
> plot.ts(sample1, main="")
19/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Six Classes
2426
2830
3234
36
1
1525
3545
101
2530
3540
45
0 10 20 30 40 50 60
201
Time
010
2030
301
2530
3540
45
401
1015
2025
3035
0 10 20 30 40 50 60
501
Time
20/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Hierarchical Clustering with Euclidean distance
> # sample n cases from every class
> n <- 10
> s <- sample(1:100, n)
> idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
> sample2 <- sc[idx,]
> observedLabels <- c(rep(1,n), rep(2,n), rep(3,n),
+ rep(4,n), rep(5,n), rep(6,n))
> # hierarchical clustering with Euclidean distance
> hc <- hclust(dist(sample2), method="ave")
> plot(hc, labels=observedLabels, main="")
21/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Hierarchical Clustering with Euclidean distance
64 6 66 6
66
66 6
44 4 4 4 4 4
4 4 3 33 3
3 35
55 5 5 5
3 3 3 3 5 55 5
2 22
2 22
2 22 2
11 1 1 1 1 1
11 120
4060
8010
012
014
0
hclust (*, "average")dist(sample2)
Hei
ght
22/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Hierarchical Clustering with Euclidean distance
> # cut tree to get 8 clusters
> memb <- cutree(hc, k=8)
> table(observedLabels, memb)
membobservedLabels 1 2 3 4 5 6 7 8
1 10 0 0 0 0 0 0 02 0 3 1 1 3 2 0 03 0 0 0 0 0 0 10 04 0 0 0 0 0 0 0 105 0 0 0 0 0 0 10 06 0 0 0 0 0 0 0 10
23/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Hierarchical Clustering with DTW Distance
> myDist <- dist(sample2, method="DTW")
> hc <- hclust(myDist, method="average")
> plot(hc, labels=observedLabels, main="")
> # cut tree to get 8 clusters
> memb <- cutree(hc, k=8)
> table(observedLabels, memb)
membobservedLabels 1 2 3 4 5 6 7 8
1 10 0 0 0 0 0 0 02 0 4 3 2 1 0 0 03 0 0 0 0 0 6 4 04 0 0 0 0 0 0 0 105 0 0 0 0 0 0 10 06 0 0 0 0 0 0 0 10
24/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Hierarchical Clustering with DTW Distance
3 3 3 3 3 35 5
3 3 3 35 5 5 5
5 5 5 5 6 6 6 4 6 4 4 4 4 4 4 4 4 46 6
6 6 6 6 1 1 1 1 1 1 1 1 1 12 2
2 2 2 2 2 2 2 2
020
040
060
080
010
00
hclust (*, "average")myDist
Hei
ght
25/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Outline
1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
26/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Time Series Classification
Time Series Classification
To build a classification model based on labelled timeseries
and then use the model to predict the lable of unlabelledtime series
Feature Extraction
Singular Value Decomposition (SVD)
Discrete Fourier Transform (DFT)
Discrete Wavelet Transform (DWT)
Piecewise Aggregate Approximation (PAA)
Perpetually Important Points (PIP)
Piecewise Linear Representation
Symbolic Representation
27/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Decision Tree (ctree)
ctree from package party
> classId <- c(rep("1",100), rep("2",100),
+ rep("3",100), rep("4",100),
+ rep("5",100), rep("6",100))
> newSc <- data.frame(cbind(classId, sc))
> library(party)
> ct <- ctree(classId ~ ., data=newSc,
+ controls = ctree_control(minsplit=20,
+ minbucket=5, maxdepth=5))
28/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Decision Tree
> pClassId <- predict(ct)
> table(classId, pClassId)
pClassIdclassId 1 2 3 4 5 6
1 100 0 0 0 0 02 1 97 2 0 0 03 0 0 99 0 1 04 0 0 0 100 0 05 4 0 8 0 88 06 0 3 0 90 0 7
> # accuracy
> (sum(classId==pClassId)) / nrow(sc)
[1] 0.8183333
29/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
DWT (Discrete Wavelet Transform)
Wavelet transform provides a multi-resolutionrepresentation using wavelets.
Haar Wavelet Transform – the simplest DWThttp://dmr.ath.cx/gfx/haar/
DFT (Discrete Fourier Transform): another popularfeature extraction technique
30/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
DWT (Discrete Wavelet Transform)
> # extract DWT (with Haar filter) coefficients
> library(wavelets)
> wtData <- NULL
> for (i in 1:nrow(sc)) {
+ a <- t(sc[i,])
+ wt <- dwt(a, filter="haar", boundary="periodic")
+ wtData <- rbind(wtData,
+ unlist(c(wt@W, wt@V[[wt@level]])))
+ }
> wtData <- as.data.frame(wtData)
> wtSc <- data.frame(cbind(classId, wtData))
31/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Decision Tree with DWT
> ct <- ctree(classId ~ ., data=wtSc, controls =
+ ctree_control(minsplit=20,
+ minbucket=5, maxdepth=5))
> pClassId <- predict(ct)
> table(classId, pClassId)
pClassIdclassId 1 2 3 4 5 6
1 98 2 0 0 0 02 1 99 0 0 0 03 0 0 81 0 19 04 0 0 0 74 0 265 0 0 16 0 84 06 0 0 0 3 0 97
> (sum(classId==pClassId)) / nrow(wtSc)
[1] 0.888333332/42
Decision Tree with DWT
> plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0))
V57
1
≤ 117 > 117
W43
2
≤ −4 > −4
W5
3
≤ −8 > −8
Node 4 (n = 68)
123456
0
0.2
0.4
0.6
0.8
1Node 5 (n = 6)
123456
0
0.2
0.4
0.6
0.8
1
W31
6
≤ −6 > −6
Node 7 (n = 9)
123456
0
0.2
0.4
0.6
0.8
1Node 8 (n = 86)
123456
0
0.2
0.4
0.6
0.8
1
V57
9
≤ 140 > 140
Node 10 (n = 31)
123456
0
0.2
0.4
0.6
0.8
1
V57
11
≤ 178 > 178
W22
12
≤ −6 > −6
Node 13 (n = 80)
123456
0
0.2
0.4
0.6
0.8
1
W31
14
≤ −13 > −13
Node 15 (n = 9)
123456
0
0.2
0.4
0.6
0.8
1Node 16 (n = 99)
123456
0
0.2
0.4
0.6
0.8
1
W31
17
≤ −15 > −15
Node 18 (n = 12)
123456
0
0.2
0.4
0.6
0.8
1
W43
19
≤ 3 > 3
Node 20 (n = 103)
123456
0
0.2
0.4
0.6
0.8
1Node 21 (n = 97)
123456
0
0.2
0.4
0.6
0.8
1
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
k-NN Classification
find the k nearest neighbours of a new instance
label it by majority voting
needs an efficient indexing structure for large datasets
> k <- 20
> newTS <- sc[501,] + runif(100)*15
> distances <- dist(newTS, sc, method="DTW")
> s <- sort(as.vector(distances), index.return=TRUE)
> # class IDs of k nearest neighbours
> table(classId[s$ix[1:k]])
4 63 17
Results of Majority Voting
Label of newTS ← class 6
34/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Outline
1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
35/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Functions - Construction, Plot & Smoothing
Construction
ts() create time-series objects (stats)
Plot
plot.ts() plot time-series objects (stats)
Smoothing & Filtering
smoothts() time series smoothing (ast)
sfilter() remove seasonal fluctuation using movingaverage (ast)
36/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Functions - Decomposition & Forecasting
Decomposition
decomp() time series decomposition by square-root filter(timsac)decompose() classical seasonal decomposition by movingaverages (stats)stl() seasonal decomposition of time series by loess(stats)tsr() time series decomposition (ast)ardec() time series autoregressive decomposition(ArDec)
Forecasting
arima() fit an ARIMA model to a univariate time series(stats)predict.Arima() forecast from models fitted by arima(stats)
37/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Packages
Packages
timsac time series analysis and control program
ast time series analysis
ArDec time series autoregressive-based decomposition
ares a toolbox for time series analyses using generalizedadditive models
dse tools for multivariate, linear, time-invariant, timeseries models
forecast displaying and analysing univariate time seriesforecasts
dtw Dynamic Time Warping – find optimal alignmentbetween two time series
wavelets wavelet filters, wavelet transforms andmultiresolution analyses
38/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Online Resources
An R Time Series Tutorialhttp://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm
Time Series Analysis with Rhttp://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_
intro.pdf
Using R (with applications in Time Series Analysis)http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf
CRAN Task View: Time Series Analysishttp://cran.r-project.org/web/views/TimeSeries.html
R Functions for Time Series Analysishttp://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
R Reference Card for Data Mining;R and Data Mining: Examples and Case Studieshttp://www.rdatamining.com/
Time Series Analysis for Business Forecastinghttp://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm
39/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Outline
1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
40/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
Conclusions
Time series decomposition and forecasting: many Rfunctions and packages available
Time series classification and clustering: no R functions orpackages specially for this purpose; have to work it out byyourself
Time series classification: extract and build features, andthen apply existing classification techniques, such as SVM,k-NN, neural networks, regression and decision trees
Time series clustering: work out your owndistance/similarity metrics, and then use existing clusteringtechniques, such as k-means and hierarchical clustering
Techniques specially for classifying/clustering time seriesdata: a lot of research publications, but no Rimplementations (as far as I know)
41/42
R and TimeSeries Data
Time SeriesDecomposi-tion
Time SeriesForecasting
Time SeriesClustering
Time SeriesClassification
R Functions &Packages forTime Series
Conclusions
The End
Email:
RDataMining Website:
http://www.rdatamining.com/
RDataMining Group:
http://group.rdatamining.com/
42/42
Top Related