time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or...

39
Data Warehousing and Data Mining: Time Series Analytics Berlin, February 5, 2018 Patrick Schäfer Vorlesung: https://hu.berlin/vl_dwhdm17 Übung: https://hu.berlin/ue_dwhdm17

Transcript of time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or...

Page 1: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DataWarehousingandDataMining:TimeSeriesAnalytics

Berlin,February5,2018

PatrickSchäfer

Vorlesung: https://hu.berlin/vl_dwhdm17Übung: https://hu.berlin/ue_dwhdm17

Page 2: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Motivation• Temporaldataiscommoninmanydataminingapplicationsanddatawarehouses• ThereisatimedimensioninallM/ERmodels

• Applicationdomainsrangefrom:• Sensordata:environmentalsensorsmeasuretemperature,pressurehumidity

• Medicaldevices:electrocardiogram(ECG)andelectroencephalogram(EEG)

• Financialmarket:stockprices,economicindicators,productsales

• Meteorologicaldata:sedimentsfromdrillholes,earthobservationsatellitedata

2

Page 3: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Motivation

• Dataanalyticsbasedonsimilarity isthetoolforexploringthesedatasets• Examplesofsimilarityqueriesinclude• Findallstocksthatshowsimilartrends• Findthemostunusualheartbeatinapatient’sECGrecording• Findfrequentpatternsinabirdsoundrecording• FindthepatientswiththemostsimilarECGrecording• Findproductswithasimilarsalespattern

3

Page 4: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

TimeSeriesDefinition

• Definition:ATimeSeriesisasequence(orderedcollection)ofnrealvaluesattimestamps !", … , !% :

& = (", … , (%

• TimeSeriesmaybeunivariateormultivariate• Univariate:asinglevalue)* isassociatedwitheachtimestamp+*.• Multivariate:, values)* = (./, ….0) areassociatedwitheachtimestamp+*.

• Thedimensionalityofatimeseriesreferstothenumberofvaluesateachtimestamp

4

Page 5: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

TimeSeriesvsSequenceData

• Temporaldatamaybediscreteorreal-valued(continuous)• Timeseriescontainrealvaluesandsequence data(i.e.webclickstreams,DNAsequences,documents)referstodiscretedata• Differentalgorithmsapplicableforsequencedata:Hashing,Tries,NaïveBayes,Boyer-Moore,vectorspacemodel,tf-idf,wordembeddings…• Insomecasesatimesseriescanbeconvertedtoasequencebyuseofdiscretization,therebymakeuseofsequenceminingalgorithms

5

Page 6: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Pre-processing:MissingData

• Itisconvenienttohavetimeseriesthatareequallyspacedandsynchronizedacrosstimestamps• However,itiscommonfortimeseriestocontainmissingdata• Onemethodistoapplylinear,i.e.estimatethe(missing)valuesatdesiredtimestamps• Linearinterpolation:)23/ and)2 arevaluesofthetimeseriesattimes+*3/ and+*

)456 = )23/ ++456 − +*3/ 9 )2 − )23/

+* − +*3/+*3/ +*+456

)23/

)2

)456

missingdata

6

Equallyspaced

Page 7: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Pre-processing:NoiseRemoval

• Removesshort-termfluctuationandnoise• Binning:• Dividethedataintodisjoint intervalsofsize..ThencalculatethemeanvaluesT = );/ …);6 , with< =4

>,ineachinterval:);* =

∑ @ABCADB CEF GF

>• Thisreducesthenumberofpointsbyafactorof.

• Smoothing(Moving-Averages)• Dividethedataintooverlapping intervalsofsizekoverwhichtheaveragesarecalculated• Thus,theaverageiscomputedateachtimestamp+/, +> , +H, +>I/ , … ratherthanonlyattheintervalboundaries +/, +> , +>I/, +H> , … 7

Page 8: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Pre-processing:Normalization

• Timeseriesneedtobenormalized,especiallywhendifferentsensorsareusedtorecordthedata• Normalizationputsthedataonthesamescaletomakecomparisonsmeaningful(i.e.FahrenheitundCelsius)• Twomethodsarecommonlyused:• Z-normalization (isgenerallypreferred)• Range-basednormalization

8

Page 9: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Pre-processing:Z-Normalization

• Z-normalization:LetJ andK bethemeanandstandarddeviationofthetimeseries& = (",… , (% ,then

LMNO, P = )Q/, … , )Q4

with)′* =@C3S

T

9

Page 10: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Pre-processing:Range-basednormalization

• Range-basednormalization:Theminimumandmaximumvaluesoveralltimeseriesaredetermined,theneachvalueofthetimeseries& = (",… , (% ismappedtorange(0,1)by:

OUMVW_MNO, P = )Q/, … , )Q4

with)′* =@C30*4

YZ[ 30*4

10

Page 11: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

TimeSeriesSimilarity

• Atthecoreoftimeseriesdataanalyticsthereare:• atimeseriesrepresentation.• asimilaritymeasuretocomparetwotimeseries.

11

Page 12: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

SimilarityMeasures(DistanceMeasures)• Thesimilarityoftwotimeseries\ andP isexpressedintermsofarealvalueusingadistancemeasure:] \, P → ℝ`

I

• Asimilaritymeasureistheinverseofthedistancemeasure:itqualifiessimilar(/dissimilar)timeseriesbyasmall(/large)value• Timeseriessimilaritymeasuresaredesignedwithapplicationspecificgoals• MostcommonmethodsareEuclideandistance(ED)andDynamicTimeWarpingDistance(DTW)• OthercommondistancemeasuresareLongestCommonSubsequenceorEditDistance

12

Page 13: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

EuclideanDistance(ED)

• Definition:TheEuclideandistancebetweentwotimeseries\ = (a/, … , a4) andC=(b/, … , b4),bothoflengthM,isdefinedas:

]cd \, e = f a* − b* H

*

• TheEDappliesalinearalignmentofthetimeaxis• EDcannotcopewithvariablelengthtimeseries• EDruntimeisO(n)

13

Page 14: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DynamicTimeWarping(DTW)

• DynamicTimeWarpingappliesanelastictransformationofthetimeaxistodetectsimilarshapesthathaveadifferentphase• Thisisessentiallyapeak-to-peakandvalley-to-valleyalignmentoftwotimeseries• Intuition:DTWcanbethoughtofasanextensionoftheED,whichusestwoindiceshandi representingbothtimeaxis• Theseindicesareincrementedindependently:

]djk \, e = f a* − b2H

(*,2)

14

Page 15: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DynamicTimeWarping(CostMatrix)

• DTWstartsbycomputingacostmatrixMwiththedistancesbetweenallpairsofvaluesin\ = (a/ …a4) ande = (b/ … b4)

• ThismatrixhasthedimensionalityMH andisgivenby:l*,2 = a* − b2

H, h, i ∈ 1…M

• DTWthensearchesfortheoptimalwarpingpath

15

Page 16: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DynamicTimeWarping(WarpingPath)

• Definition:AwarpingpathpisdefinedasasetoftuplesthatdefinesatraversalofthecostmatrixMwhereash andirepresentindicesofthevaluesin\ = (a/ …a4) ande =(b/ … b4):

o/ = ", " , 2,2 , …, M − 1, M − 1 , %, %oH = ", " , 1,2 , 1⏟

r

, 3⏟t

, 2,3 , …, M − 1, M , %, %

• Avalidwarpingpathhastosatisfytwoconditions:• Thestarthastobe(1,1)andtheendhastobe(n,n)• Thepathmayproceedbyamaximumofoneindex:0 ≤ hwI/ − hw ≤ 1 and0 ≤ iwI/ − iw ≤ 1 forallU < M

• EDisthespecialcaseofthediagonalinthecostmatrix

16

Page 17: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DynamicTimeWarping(Distance)

• Definition:TheDTWdistanceisdefinedasthewarpingpathwiththeminimaltotalcostthroughthecostmatrixl]djk \, e = ,hM f a* − b2

H�

(*,2)∈y

|o ∈ l

• AsDTWisapeak-to-peakandvalley-to-valleyalignmentoftwotimeseries,itmayproducesuboptimalresultsifthereisavariablenumberofpeaksandvalleys• DTWruntimeis{(MH)

17

Page 18: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DynamicTimeWarping(Algorithm)int DTWDistance(

Time Series q[1..n], TimeSeries c[1..n]) { // cost matrixM:= array [0..n, 0..n] for i := 1 to n

M[i, 0] := infinity for i := 1 to n

M[0, i] := infinity M[0, 0] := 0

//find optimal pathfor i := 1 to n

for j := 1 to n cost := D(q[i], c[j]) // measure distance M[i, j] := cost + minimum(

M[i-1, j ], // insertion M[i , j-1], // deletion M[i-1, j-1]) // match

return M[n, n] }

18

Page 19: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DynamicTimeWarping(WarpingWindow)

• Awarpingwindowconstraintcanbesettoreducethesearchspace(computationalcomplexity)• Itdefinestheamountofwarpingallowedbetweeneachpairofpointsonthewarpingpatho:

h − i ≤ O 9 M∀ h, i }o

• DTWwithawarpingwindowconstraintOhasacomputationalcomplexityof{(MO)

r

19

Page 20: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Similarity:Wholevs.SubsequenceMatching

• Wholematching:thedistanceiscalculatedbetweentwowholetimeseries• Subsequencematching:Searchesforthebestsubsequencewithinalongertimeseries• ComplexityfortimeserieslengthnandsubsequencelengthwwithEuclideandistance:• Wholematching:{(M)• Subsequencematching:{ < M − <

long time series C

slide along

sliding window S

Query Q

Similarity D(Q,S)

most similar subsequence

Query Q

Query QQuery Q

Similarity D(Q,S)

dataset DStime series TWhole Matching

Subsequence Matching

20

Page 21: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

TimeSeriesDataTransformations

• Avarietyofmethodsexistforreducingthedimensionalityoftimeseries(i.e.fornoisefiltering,fasterprocessing),• Real-valuedmethodstransformintoasmallernumberofnumericvaluesandsymbolicmethodstransformintodiscretevalues(sequences)

21

Page 22: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

PiecewiseAggregateApproximation(PAA)

• Intuition:torepresentatimeseriesoflengthnthedataisdividedintowequal-sizedintervalsandthemeanvalueineachintervaliscalculated• PAA:atimeseriesP = ()/, … , )4) oflengthnisrepresentedbyaw-dimensionalsequenceofmeanvaluesC = ()/, … , )6),wherei-th elementiscalculatedas:

)*� =<

Mf )2

46*

2Ä46 *3/ I/

PAA:0,08-0,08-0,240,180,450,340,00-0,190,050,900,03-0,32-0,70-0,210,30-1,72

22

Page 23: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

PiecewiseAggregateApproximation(PAA)

23

Page 24: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DiscreteFourierTransform(DFT)

• Intuition:eachseriesoflengthncanbeexpressedbyalinearcombinationofsmoothperiodicsinusoidalseries• Eachwaveisrepresentedbyacomplexnumber(FourierCoefficient)• ThisrepresentationiscalledFrequencyDomain• TheDFTconcentratesmostofitsenergyinthefirstfewFouriercoefficients• Low-passfilter:OnecanapproximateatimeseriesbyitsfirstwFouriercoefficients

DFT:0-8.81 -20.7 -11.9 -6.28 -8.02 -0.67 15.31 -18.7-18.36 -5.67 16.84 -8.919 -23.8010.70 -21.92 25.255-1.321...

24

Page 25: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DiscreteFourierTransform(DFT)

• TheDFTdecomposesatimeseriesToflengthnintoasumofnorthogonalbasisfunctionsusingsinusoidwaves• AFouriercoefficient(sinusoidwave)isrepresentedbythecomplexnumber: ÅÇ = OWUÉÇ, h,UVÇ , ÑNOÖ = 0,1… , M − 1

• Then-pointDFTofatimeseriesP = ()/ …)4) isthengivenby:]ÜP P = Å`,… , Å43/ = (OWUÉ`, h,UV`, … , OWUÉ43/, h,UV43/)

• withÅÇ =/

4∑ )* 9 W

3AáàâCä4

*Ä/ , ÑNOÖ} 0, M , i = −1�

25

Page 26: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

DiscreteFourierTransform(DFT)

• ThefirstFouriercoefficientisequaltothemeanofatimeseriesandcanbediscardedtoobtainoffsetinvariance:Å` =

/

4∑ )* 9 W

`4*Ä/

• UsingonlythefirstfewFouriercoefficientsisequaltwolow-passfiltering(smoothening)atimeseries• ComputationalComplexity:TheFastFourierTransformhasacomputationalcomplexityof{(MlogM)tocomputetheDFTofatimeseriesoflengthn

26

Page 27: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

SymbolicAggregateApproximation(SAX)

• SAXconvertsthetimeseriesintoadiscretesequenceofsymbolsby• X-axis:AppliesdimensionalityreductionusingthePAAtransformation

• Y-axis:TransformseachPAAvalueusingdiscretizationtoasymbol

• Valuessampledfromaz-normalizedtimeseriesfollowGaussiandistribution• SAXappliesdiscretizationbasedonGaussiandistributiontoproduceequi-probablesymbols• Arealvaluedtimeseriesisthenrepresentedbyaword

116 J. Lin et al.

0

--

0 20 40 60 80 100 120

bbb

a

cc

c

a

c

a

b

0

--

Fig. 5 A time series is discretized by first obtaining a PAA approximation and then using prede-termined breakpoints to map the PAA coefficients into SAX symbols. In the example above, withn = 128, w = 8 and a = 3, the time series is mapped to the word baabccbc

the time series to a binary vector. They demonstrated that discretizing the timeseries before clustering significantly improves the accuracy in the presence ofoutliers. We note that “clipping” is actually a special case of SAX, where a = 2.

3.3 Distance measures

Having introduced the new representation of time series, we can now definea distance measure on it. By far the most common distance measure for timeseries is the Euclidean distance (Keogh and Kasetty 2002; Reinert et al. 2000).Given two time series Q and C of the same length n, Eq. 3 defines their Euclid-ean distance, and Fig. 6A illustrates a visual intuition of the measure.

D (Q, C) ≡

!""#n$

i=1

(qi − ci)2 (3)

If we transform the original subsequences into PAA representations, Q andC, using Eq. 1, we can then obtain a lower bounding approximation of theEuclidean distance between the original subsequences by

DR(Q, C) ≡%

nw

&$w

i=1(qi − ci)

2 (4)

This measure is illustrated in Fig. 6B. If we further transform the data intothe symbolic representation, we can define a MINDIST function that returnsthe minimum distance between the original time series of two words:

MINDIST(Q, C) ≡%

nw

&$w

i=1

'dist(qi , ci)

(2 (5)

The function resembles Eq. 4 except for the fact that the distance betweenthe two PAA coefficients has been replaced with the sub-function dist(). Thedist() function can be implemented using a table lookup as illustrated in Table 4.

from:Linetal.:ExperiencingSAX:anovelsymbolicrepresentationoftimeseries

27

Page 28: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

SymbolicAggregateApproximation(SAX)

BC

BC

A

28

Page 29: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

SymbolicFourierApproximation(SFA)

• SFArepresentseachrealvaluedtimeseriesbyaword• SFAiscomposedof

a) approximation usingtheFouriertransformand

b) a dataadaptivediscretization• ThediscretizationintervalsarelearnedfromtheFouriertransformeddatadistributionratherthanusingfixedintervals

DFT0-8.81-20.7-11.9-6.28-8.02-0.6715.31-18.7-18.36-5.67-16.84-8.919[...]

DiscretizationCBBCCDCBBCBCB[...]

Raw:0.26790.24800.18280.08170.0051-0.023-0.052-0.082-0.111-0.075-0.032-0.022-0.029[...]

29

Page 30: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

SymbolicFourierApproximation(SFA)

30

Page 31: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

ComparisonofSFAandSAX

• Discretizationandapproximationcauselossofinformation• Thehigherthenumberofsymbolsandthealphabetsize,themoreexactistherepresentation

31

Page 32: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Propertiesof Symbolic Representations

• Noiseremoval• SAX:UsingPAAandquantization.• SFA:UsingtheDFT(low-passfilter)andquantization.

• Stringrepresentation• Allowsforstringdomainalgorithmslikehashingorthebag-of-wordstobeapplied

• DimensionalityReduction• AllowforindexinghighdimensionaldatausingtheiSAX index(SAX)ortheSFAtrie(SFA)

• Storagereduction• Sequenceshaveamuchlowermemoryfootprintthanreal-valuedtimeseries,i.e.lessthan1byte(“Char”)vs8byte(“Double”)foreachtimestamp

32

Page 33: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

TimeSeriesDataAnalyticsTasks

Euclidean Distance

6

5

4

3

2

1

DTW

6

5

4

3

2

1

BOSS

6

5

4

3

2

1

Bell

Cylinder

Funnel

caffeinchlorogenic acid

?

Motif Discovery Classification

Clustering

Discords

abnormal

abnormal

Query

33

Page 34: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Motifs

• AMotifisfrequentlyoccurringpatternorshapeinatimeseries• Distance-based(“exact”Motifs)

• Asubsequenceofatimeseriesissaidtosupportamotif,ifthedistancebetweenthesubsequenceandthemotifislessthanathreshold

• Sequentialpatternmining(“approximate”Motifs)• Discretizationisappliedtoconvertthetimeseriesintoasequence.Motifsdiscoverylendsmethodsfromsequentialpatternmining From:JonasSpenger’s BachelorThesis

10

Results - Quality

● 85 planted motif time series● 8 planted sequences (red) per

time series● Task: Recover the planted

motifs (red regions)

Sequences (red) from: Chen, Yanping, et al. "The ucr time series classification archive."

34

Page 35: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Distance-basedMotifs

• Thesearecommonlydefinedascontiguoussequencesofatimeseries• Approximatedistancematch• Amotif é/, … , é6 issaidtoapproximatelymatchacontiguoussubsequenceoftimeseriesT = )/ …)4 atpositionh with< ≤ M ,ifthedistancebetween é/ … é6 and )* …)*I63/ isatmost}.• ThemostcommondistancemeasureusedistheEuclideandistance

• Motifcount:Thenumberofmatcheswithinthreshold} ofamotiftothetimeseriesisdefinedasthenumberofmatches• Atypicalgoalistofindthemostfrequentmotifs

35

Page 36: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

NaïveAlgorithmMotif findBestMotif(

Time Series (y1,…,yn), WindowLength w, Threshold epsilon)

Begin

for i := 1 to n-w+1 do

candidate_motif = (yi,…,yi+w-1);

for j:=1 to n-w+1 do

D = computeDistance(candidate_motif, (yj,…,yj+w-1));

If (D < epsilon) and (non-trivial match) then

increment count of candidate_motif by 1

end if

end for

if (candidate_motif has highest count so far) then

update best_candidate to candidate_motif;

end if

return best_candidate;

end

• Thenaïvealgorithmextractsallcandidatemotifsoflengthwfromatimeseriesandcomputesthedistancetoalloffsets

• Thenumberofmatchesiscounted• Trivialmatchesareoverlappingsub-sequences

(i.e.i =j)• Theapproachrequiresanested-loopandthe

numberofoperationsisequaltothesizeofthetimeseries,thus{ MH distancecomputations

• TotalcomplexityforEuclideandistance{ MH<

36

Page 37: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Ideastospeed-up Motifdiscovery

• Computeafastlowerboundforthedistance• Ifthelowerboundisgreaterthanepsilonthentherealdistancedoesnothavetobecomputed

• Approaches:• UsePAAandcomputedistanceonlyformeanvalues

]hé+ Å, è ≥ ,� 9 ]hé+(ÅQ, èQ)• UseSAXandaprecomputedlookuptablefordistancesbetweensymbols

37

Page 38: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

RelationtoSequentialPatternMining

• First,convertthetimeseriesintoasequencesusingforexampleSAX• NowapplyStringminingalgorithms• ItemsetMining:discoverfrequentitemsets usingassociationruleminingalgorithms(ARM).TheA-Priorialgorithmwillbepresentedinthelecture

38

Page 39: time series analysis · Time Series vs Sequence Data •Temporal data may be discrete or real-valued (continuous) •Time series contain real values and sequencedata (i.e. web click

Questions?

53