Time Series Analysis and Mining with R - and Time Series Data Time Series Decomposi-tion Time Series...

Click here to load reader

  • date post

    09-Apr-2018
  • Category

    Documents

  • view

    241
  • download

    7

Embed Size (px)

Transcript of Time Series Analysis and Mining with R - and Time Series Data Time Series Decomposi-tion Time Series...

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Time Series Analysis and Mining with R

    Yanchang Zhao

    RDataMining.comhttp://www.rdatamining.com/

    18 July 2011

    1/42

    http://www.rdatamining.com/

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Outline

    1 R and Time Series Data

    2 Time Series Decomposition

    3 Time Series Forecasting

    4 Time Series Clustering

    5 Time Series Classification

    6 R Functions & Packages for Time Series

    7 Conclusions

    2/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    R

    a free software environment for statistical computing andgraphics

    runs on Windows, Linux and MacOS

    widely used in academia and research, as well as industrialapplications

    over 3,000 packages

    CRAN Task View: Time Series Analysishttp://cran.r-project.org/web/views/TimeSeries.html

    3/42

    http://cran.r-project.org/web/views/TimeSeries.html

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Time Series Data in R

    class ts

    represents data which has been sampled at equispacedpoints in time

    frequency=7: a weekly series

    frequency=12: a monthly series

    frequency=4: a quarterly series

    4/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Time Series Data in R

    > a print(a)

    Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov2011 1 2 3 4 5 6 7 8 92012 11 12 13 14 15 16 17 18 19 20

    Dec2011 102012

    > str(a)

    Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10 ...

    > attributes(a)

    $tsp[1] 2011.167 2012.750 12.000

    $class[1] "ts"

    5/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Outline

    1 R and Time Series Data

    2 Time Series Decomposition

    3 Time Series Forecasting

    4 Time Series Clustering

    5 Time Series Classification

    6 R Functions & Packages for Time Series

    7 Conclusions

    6/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    What is Time Series Decomposition

    To decompose a time series into components:

    Trend component: long term trend

    Seasonal component: seasonal variation

    Cyclical component: repeated but non-periodicfluctuations

    Irregular component: the residuals

    7/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Data AirPassengers

    Data AirPassengers: Monthly totals of Box Jenkinsinternational airline passengers, 1949 to 1960. It has144(=1212) values.> plot(AirPassengers)

    Time

    AirP

    asse

    nger

    s

    1950 1952 1954 1956 1958 1960

    100

    200

    300

    400

    500

    600

    8/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Decomposition

    > apts f # seasonal figures

    > plot(f$figure,type="b")

    2 4 6 8 10 12

    40

    20

    020

    4060

    Index

    f$fig

    ure

    9/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Decomposition

    > plot(f)

    100

    300

    500

    obse

    rved

    150

    250

    350

    450

    tren

    d

    40

    040

    seas

    onal

    40

    020

    60

    2 4 6 8 10 12

    rand

    om

    Time

    Decomposition of additive time series

    10/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Outline

    1 R and Time Series Data

    2 Time Series Decomposition

    3 Time Series Forecasting

    4 Time Series Clustering

    5 Time Series Classification

    6 R Functions & Packages for Time Series

    7 Conclusions

    11/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Time Series Forecasting

    To forecast future events based on known past data

    E.g., to predict the opening price of a stock based on itspast performance

    Popular models

    Autoregressive moving average (ARMA)Autoregressive integrated moving average (ARIMA)

    12/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Forecasting

    > # build an ARIMA model

    > fit fore # error bounds at 95% confidence level

    > U L ts.plot(AirPassengers, fore$pred, U, L,

    + col=c(1,2,4,4), lty = c(1,1,2,2))

    > legend("topleft", col=c(1,2,4), lty=c(1,1,2),

    + c("Actual", "Forecast",

    + "Error Bounds (95% Confidence)"))

    13/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Forecasting

    Time

    1950 1952 1954 1956 1958 1960 1962

    100

    200

    300

    400

    500

    600

    700

    ActualForecastError Bounds (95% Confidence)

    14/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Outline

    1 R and Time Series Data

    2 Time Series Decomposition

    3 Time Series Forecasting

    4 Time Series Clustering

    5 Time Series Classification

    6 R Functions & Packages for Time Series

    7 Conclusions

    15/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Time Series Clustering

    To partition time series data into groups based onsimilarity or distance, so that time series in the samecluster are similar

    Measure of distance/dissimilarity

    Euclidean distanceManhattan distanceMaximum normHamming distanceThe angle between two vectors (inner product)Dynamic Time Warping (DTW) distance...

    16/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Dynamic Time Warping (DTW)

    DTW finds optimal alignment between two time series.

    > library(dtw)

    > idx a b align dtwPlotTwoWay(align)

    Index

    Que

    ry v

    alue

    0 20 40 60 80 100

    1.

    0

    0.5

    0.0

    0.5

    1.0

    17/42

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Synthetic Control Chart Time Series

    The dataset contains 600 examples of control chartssynthetically generated by the process in Alcock andManolopoulos (1999).

    Each control chart is a time series with 60 values.

    Six classes:

    1-100 Normal101-200 Cyclic201-300 Increasing trend301-400 Decreasing trend401-500 Upward shift501-600 Downward shift

    http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_

    control.html

    18/42

    http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.htmlhttp://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.html

  • R and TimeSeries Data

    Time SeriesDecomposi-tion

    Time SeriesForecasting

    Time SeriesClustering

    Time SeriesClassification

    R Functions &Packages forTime Series

    Conclusions

    Synthetic Control Chart Time Series

    > # read data into R

    > # sep="": the separator is white space, i.e., one

    > # or more spaces, tabs, newlines or carriage returns

    > sc # show one sample from each class

    > idx sample1 plot