Download - Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

Transcript
Page 1: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Time Series Analysis and Mining with R

Yanchang Zhao

RDataMining.comhttp://www.rdatamining.com/

18 July 2011

1/42

Page 2: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Outline

1 R and Time Series Data

2 Time Series Decomposition

3 Time Series Forecasting

4 Time Series Clustering

5 Time Series Classification

6 R Functions & Packages for Time Series

7 Conclusions

2/42

Page 3: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

R

a free software environment for statistical computing andgraphics

runs on Windows, Linux and MacOS

widely used in academia and research, as well as industrialapplications

over 3,000 packages

CRAN Task View: Time Series Analysishttp://cran.r-project.org/web/views/TimeSeries.html

3/42

Page 4: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Time Series Data in R

class ts

represents data which has been sampled at equispacedpoints in time

frequency=7: a weekly series

frequency=12: a monthly series

frequency=4: a quarterly series

4/42

Page 5: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Time Series Data in R

> a <- ts(1:20, frequency=12, start=c(2011,3))

> print(a)

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov2011 1 2 3 4 5 6 7 8 92012 11 12 13 14 15 16 17 18 19 20

Dec2011 102012

> str(a)

Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10 ...

> attributes(a)

$tsp[1] 2011.167 2012.750 12.000

$class[1] "ts"

5/42

Page 6: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Outline

1 R and Time Series Data

2 Time Series Decomposition

3 Time Series Forecasting

4 Time Series Clustering

5 Time Series Classification

6 R Functions & Packages for Time Series

7 Conclusions

6/42

Page 7: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

What is Time Series Decomposition

To decompose a time series into components:

Trend component: long term trend

Seasonal component: seasonal variation

Cyclical component: repeated but non-periodicfluctuations

Irregular component: the residuals

7/42

Page 8: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Data AirPassengers

Data AirPassengers: Monthly totals of Box Jenkinsinternational airline passengers, 1949 to 1960. It has144(=12×12) values.

> plot(AirPassengers)

Time

AirP

asse

nger

s

1950 1952 1954 1956 1958 1960

100

200

300

400

500

600

8/42

Page 9: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Decomposition

> apts <- ts(AirPassengers, frequency = 12)

> f <- decompose(apts)

> # seasonal figures

> plot(f$figure,type="b")

●●

● ●

2 4 6 8 10 12

−40

−20

020

4060

Index

f$fig

ure

9/42

Page 10: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Decomposition

> plot(f)

100

300

500

obse

rved

150

250

350

450

tren

d

−40

040

seas

onal

−40

020

60

2 4 6 8 10 12

rand

om

Time

Decomposition of additive time series

10/42

Page 11: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Outline

1 R and Time Series Data

2 Time Series Decomposition

3 Time Series Forecasting

4 Time Series Clustering

5 Time Series Classification

6 R Functions & Packages for Time Series

7 Conclusions

11/42

Page 12: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Time Series Forecasting

To forecast future events based on known past data

E.g., to predict the opening price of a stock based on itspast performance

Popular models

Autoregressive moving average (ARMA)Autoregressive integrated moving average (ARIMA)

12/42

Page 13: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Forecasting

> # build an ARIMA model

> fit <- arima(AirPassengers, order=c(1,0,0),

+ list(order=c(2,1,0), period=12))

> fore <- predict(fit, n.ahead=24)

> # error bounds at 95% confidence level

> U <- fore$pred + 2*fore$se

> L <- fore$pred - 2*fore$se

> ts.plot(AirPassengers, fore$pred, U, L,

+ col=c(1,2,4,4), lty = c(1,1,2,2))

> legend("topleft", col=c(1,2,4), lty=c(1,1,2),

+ c("Actual", "Forecast",

+ "Error Bounds (95% Confidence)"))

13/42

Page 14: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Forecasting

Time

1950 1952 1954 1956 1958 1960 1962

100

200

300

400

500

600

700

ActualForecastError Bounds (95% Confidence)

14/42

Page 15: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Outline

1 R and Time Series Data

2 Time Series Decomposition

3 Time Series Forecasting

4 Time Series Clustering

5 Time Series Classification

6 R Functions & Packages for Time Series

7 Conclusions

15/42

Page 16: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Time Series Clustering

To partition time series data into groups based onsimilarity or distance, so that time series in the samecluster are similar

Measure of distance/dissimilarity

Euclidean distanceManhattan distanceMaximum normHamming distanceThe angle between two vectors (inner product)Dynamic Time Warping (DTW) distance...

16/42

Page 17: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Dynamic Time Warping (DTW)

DTW finds optimal alignment between two time series.

> library(dtw)

> idx <- seq(0, 2*pi, len=100)

> a <- sin(idx) + runif(100)/10

> b <- cos(idx)

> align <- dtw(a, b, step=asymmetricP1, keep=T)

> dtwPlotTwoWay(align)

Index

Que

ry v

alue

0 20 40 60 80 100

−1.

0−

0.5

0.0

0.5

1.0

17/42

Page 18: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Synthetic Control Chart Time Series

The dataset contains 600 examples of control chartssynthetically generated by the process in Alcock andManolopoulos (1999).

Each control chart is a time series with 60 values.

Six classes:

1-100 Normal101-200 Cyclic201-300 Increasing trend301-400 Decreasing trend401-500 Upward shift501-600 Downward shift

http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_

control.html

18/42

Page 19: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Synthetic Control Chart Time Series

> # read data into R

> # sep="": the separator is white space, i.e., one

> # or more spaces, tabs, newlines or carriage returns

> sc <- read.table("synthetic_control.data",

+ header=F, sep="")

> # show one sample from each class

> idx <- c(1,101,201,301,401,501)

> sample1 <- t(sc[idx,])

> plot.ts(sample1, main="")

19/42

Page 20: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Six Classes

2426

2830

3234

36

1

1525

3545

101

2530

3540

45

0 10 20 30 40 50 60

201

Time

010

2030

301

2530

3540

45

401

1015

2025

3035

0 10 20 30 40 50 60

501

Time

20/42

Page 21: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Hierarchical Clustering with Euclidean distance

> # sample n cases from every class

> n <- 10

> s <- sample(1:100, n)

> idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)

> sample2 <- sc[idx,]

> observedLabels <- c(rep(1,n), rep(2,n), rep(3,n),

+ rep(4,n), rep(5,n), rep(6,n))

> # hierarchical clustering with Euclidean distance

> hc <- hclust(dist(sample2), method="ave")

> plot(hc, labels=observedLabels, main="")

21/42

Page 22: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Hierarchical Clustering with Euclidean distance

64 6 66 6

66

66 6

44 4 4 4 4 4

4 4 3 33 3

3 35

55 5 5 5

3 3 3 3 5 55 5

2 22

2 22

2 22 2

11 1 1 1 1 1

11 120

4060

8010

012

014

0

hclust (*, "average")dist(sample2)

Hei

ght

22/42

Page 23: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Hierarchical Clustering with Euclidean distance

> # cut tree to get 8 clusters

> memb <- cutree(hc, k=8)

> table(observedLabels, memb)

membobservedLabels 1 2 3 4 5 6 7 8

1 10 0 0 0 0 0 0 02 0 3 1 1 3 2 0 03 0 0 0 0 0 0 10 04 0 0 0 0 0 0 0 105 0 0 0 0 0 0 10 06 0 0 0 0 0 0 0 10

23/42

Page 24: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Hierarchical Clustering with DTW Distance

> myDist <- dist(sample2, method="DTW")

> hc <- hclust(myDist, method="average")

> plot(hc, labels=observedLabels, main="")

> # cut tree to get 8 clusters

> memb <- cutree(hc, k=8)

> table(observedLabels, memb)

membobservedLabels 1 2 3 4 5 6 7 8

1 10 0 0 0 0 0 0 02 0 4 3 2 1 0 0 03 0 0 0 0 0 6 4 04 0 0 0 0 0 0 0 105 0 0 0 0 0 0 10 06 0 0 0 0 0 0 0 10

24/42

Page 25: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Hierarchical Clustering with DTW Distance

3 3 3 3 3 35 5

3 3 3 35 5 5 5

5 5 5 5 6 6 6 4 6 4 4 4 4 4 4 4 4 46 6

6 6 6 6 1 1 1 1 1 1 1 1 1 12 2

2 2 2 2 2 2 2 2

020

040

060

080

010

00

hclust (*, "average")myDist

Hei

ght

25/42

Page 26: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Outline

1 R and Time Series Data

2 Time Series Decomposition

3 Time Series Forecasting

4 Time Series Clustering

5 Time Series Classification

6 R Functions & Packages for Time Series

7 Conclusions

26/42

Page 27: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Time Series Classification

Time Series Classification

To build a classification model based on labelled timeseries

and then use the model to predict the lable of unlabelledtime series

Feature Extraction

Singular Value Decomposition (SVD)

Discrete Fourier Transform (DFT)

Discrete Wavelet Transform (DWT)

Piecewise Aggregate Approximation (PAA)

Perpetually Important Points (PIP)

Piecewise Linear Representation

Symbolic Representation

27/42

Page 28: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Decision Tree (ctree)

ctree from package party

> classId <- c(rep("1",100), rep("2",100),

+ rep("3",100), rep("4",100),

+ rep("5",100), rep("6",100))

> newSc <- data.frame(cbind(classId, sc))

> library(party)

> ct <- ctree(classId ~ ., data=newSc,

+ controls = ctree_control(minsplit=20,

+ minbucket=5, maxdepth=5))

28/42

Page 29: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Decision Tree

> pClassId <- predict(ct)

> table(classId, pClassId)

pClassIdclassId 1 2 3 4 5 6

1 100 0 0 0 0 02 1 97 2 0 0 03 0 0 99 0 1 04 0 0 0 100 0 05 4 0 8 0 88 06 0 3 0 90 0 7

> # accuracy

> (sum(classId==pClassId)) / nrow(sc)

[1] 0.8183333

29/42

Page 30: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

DWT (Discrete Wavelet Transform)

Wavelet transform provides a multi-resolutionrepresentation using wavelets.

Haar Wavelet Transform – the simplest DWThttp://dmr.ath.cx/gfx/haar/

DFT (Discrete Fourier Transform): another popularfeature extraction technique

30/42

Page 31: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

DWT (Discrete Wavelet Transform)

> # extract DWT (with Haar filter) coefficients

> library(wavelets)

> wtData <- NULL

> for (i in 1:nrow(sc)) {

+ a <- t(sc[i,])

+ wt <- dwt(a, filter="haar", boundary="periodic")

+ wtData <- rbind(wtData,

+ unlist(c(wt@W, wt@V[[wt@level]])))

+ }

> wtData <- as.data.frame(wtData)

> wtSc <- data.frame(cbind(classId, wtData))

31/42

Page 32: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Decision Tree with DWT

> ct <- ctree(classId ~ ., data=wtSc, controls =

+ ctree_control(minsplit=20,

+ minbucket=5, maxdepth=5))

> pClassId <- predict(ct)

> table(classId, pClassId)

pClassIdclassId 1 2 3 4 5 6

1 98 2 0 0 0 02 1 99 0 0 0 03 0 0 81 0 19 04 0 0 0 74 0 265 0 0 16 0 84 06 0 0 0 3 0 97

> (sum(classId==pClassId)) / nrow(wtSc)

[1] 0.888333332/42

Page 33: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

Decision Tree with DWT

> plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0))

V57

1

≤ 117 > 117

W43

2

≤ −4 > −4

W5

3

≤ −8 > −8

Node 4 (n = 68)

123456

0

0.2

0.4

0.6

0.8

1Node 5 (n = 6)

123456

0

0.2

0.4

0.6

0.8

1

W31

6

≤ −6 > −6

Node 7 (n = 9)

123456

0

0.2

0.4

0.6

0.8

1Node 8 (n = 86)

123456

0

0.2

0.4

0.6

0.8

1

V57

9

≤ 140 > 140

Node 10 (n = 31)

123456

0

0.2

0.4

0.6

0.8

1

V57

11

≤ 178 > 178

W22

12

≤ −6 > −6

Node 13 (n = 80)

123456

0

0.2

0.4

0.6

0.8

1

W31

14

≤ −13 > −13

Node 15 (n = 9)

123456

0

0.2

0.4

0.6

0.8

1Node 16 (n = 99)

123456

0

0.2

0.4

0.6

0.8

1

W31

17

≤ −15 > −15

Node 18 (n = 12)

123456

0

0.2

0.4

0.6

0.8

1

W43

19

≤ 3 > 3

Node 20 (n = 103)

123456

0

0.2

0.4

0.6

0.8

1Node 21 (n = 97)

123456

0

0.2

0.4

0.6

0.8

1

Page 34: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

k-NN Classification

find the k nearest neighbours of a new instance

label it by majority voting

needs an efficient indexing structure for large datasets

> k <- 20

> newTS <- sc[501,] + runif(100)*15

> distances <- dist(newTS, sc, method="DTW")

> s <- sort(as.vector(distances), index.return=TRUE)

> # class IDs of k nearest neighbours

> table(classId[s$ix[1:k]])

4 63 17

Results of Majority Voting

Label of newTS ← class 6

34/42

Page 35: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Outline

1 R and Time Series Data

2 Time Series Decomposition

3 Time Series Forecasting

4 Time Series Clustering

5 Time Series Classification

6 R Functions & Packages for Time Series

7 Conclusions

35/42

Page 36: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Functions - Construction, Plot & Smoothing

Construction

ts() create time-series objects (stats)

Plot

plot.ts() plot time-series objects (stats)

Smoothing & Filtering

smoothts() time series smoothing (ast)

sfilter() remove seasonal fluctuation using movingaverage (ast)

36/42

Page 37: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Functions - Decomposition & Forecasting

Decomposition

decomp() time series decomposition by square-root filter(timsac)decompose() classical seasonal decomposition by movingaverages (stats)stl() seasonal decomposition of time series by loess(stats)tsr() time series decomposition (ast)ardec() time series autoregressive decomposition(ArDec)

Forecasting

arima() fit an ARIMA model to a univariate time series(stats)predict.Arima() forecast from models fitted by arima(stats)

37/42

Page 38: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Packages

Packages

timsac time series analysis and control program

ast time series analysis

ArDec time series autoregressive-based decomposition

ares a toolbox for time series analyses using generalizedadditive models

dse tools for multivariate, linear, time-invariant, timeseries models

forecast displaying and analysing univariate time seriesforecasts

dtw Dynamic Time Warping – find optimal alignmentbetween two time series

wavelets wavelet filters, wavelet transforms andmultiresolution analyses

38/42

Page 39: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Online Resources

An R Time Series Tutorialhttp://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm

Time Series Analysis with Rhttp://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_

intro.pdf

Using R (with applications in Time Series Analysis)http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf

CRAN Task View: Time Series Analysishttp://cran.r-project.org/web/views/TimeSeries.html

R Functions for Time Series Analysishttp://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf

R Reference Card for Data Mining;R and Data Mining: Examples and Case Studieshttp://www.rdatamining.com/

Time Series Analysis for Business Forecastinghttp://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm

39/42

Page 40: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Outline

1 R and Time Series Data

2 Time Series Decomposition

3 Time Series Forecasting

4 Time Series Clustering

5 Time Series Classification

6 R Functions & Packages for Time Series

7 Conclusions

40/42

Page 41: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

Conclusions

Time series decomposition and forecasting: many Rfunctions and packages available

Time series classification and clustering: no R functions orpackages specially for this purpose; have to work it out byyourself

Time series classification: extract and build features, andthen apply existing classification techniques, such as SVM,k-NN, neural networks, regression and decision trees

Time series clustering: work out your owndistance/similarity metrics, and then use existing clusteringtechniques, such as k-means and hierarchical clustering

Techniques specially for classifying/clustering time seriesdata: a lot of research publications, but no Rimplementations (as far as I know)

41/42

Page 42: Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

R and TimeSeries Data

Time SeriesDecomposi-tion

Time SeriesForecasting

Time SeriesClustering

Time SeriesClassification

R Functions &Packages forTime Series

Conclusions

The End

Email:

[email protected]

RDataMining Website:

http://www.rdatamining.com/

RDataMining Group:

http://group.rdatamining.com/

42/42