Abdullah Mueen Eamonn Keogh University of California, Riverside

18
Abdullah Mueen Eamonn Keogh University of California, Riverside Online Discovery and Maintenance of Time Series Motifs

description

Online Discovery and Maintenance of Time Series Motifs. Abdullah Mueen Eamonn Keogh University of California, Riverside. Time Series Motifs. Repeated pattern in a time series Utility: Activity/Event discovery Summarization Classification Application - PowerPoint PPT Presentation

Transcript of Abdullah Mueen Eamonn Keogh University of California, Riverside

Page 1: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Abdullah Mueen Eamonn KeoghUniversity of California, Riverside

Online Discovery and Maintenance

of Time Series Motifs

Page 2: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Time Series Motifs

• Repeated pattern in a time series

• Utility:– Activity/Event discovery– Summarization– Classification

• Application– Data Center Chiller Management (Patnaik, et al. KDD 09)– Designing Smart Cane for Elders (Vahdatpour, et al. IJCAI 09)

0 2000 4000 6000 80000

10

20

30

Page 3: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Online Time Series Motifs

• Streaming time series• Sliding window of the recent history

– What minute long trace repeated in the last hour?

Page 4: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Problem FormulationDiscovery

100 200 300 400 500 600 700 800 900 1000

-8000

-7500

-7000

. . .

12345678...

873

The most similar pair of non-overlappingsubsequences

Maintenance

12345678...

873874

time:100112345678...

873874875

time:100212345678...

873874875876

time:100312345678...

873874875876877

time:100412345678...

873874875876877878

time:1005

Update motif pair afterevery time tick

time

Page 5: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Challenge

• A subsequence is a high dimensional point– The dynamic closest pair of points problem

• Closest pair may change upon every update• Naïve approach: Do quadratic comparisons.

4

5

2

7

3

6

89

4

5

2

7

3

6

8

1

4

5

2

7

3

6

8

1

Deletion Insertion

Page 6: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Our Approach

• Goal: Algorithm with Linear update time• Previous method for dynamic closest pair

(Eppstein,00) – A matrix of all-pair distances is maintained

• O(w2) space required

– Quad-tree is used to update the matrix• Maintain a set of neighbors and reverse

neighbors for all points• We do it in O( ) spaceww

Page 7: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Outline

• Methods• Evaluation• Extensions• Case Studies• Conclusion

Page 8: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Maintaining Motif• Smallest nearest neighbor → Closest pair• Upon insertion

– Find the nearest neighbor; Needs O(w) comparisons.

• Upon deletion– Find the next NN of all the reverse NN

4

5

2

7

3

6

8

1

4

5

2

7

3

6

8

1

Insertion

9

4

5

2

7

3

6

8

1

Deletion

9

Page 9: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Data Structure

1 2 3 4 5 6 7 8FIFO Sliding window

N-listsNeighbors in

order of distances

8 7 21 8 1 24

5 1 17 3 3 67

7 6 75 2 4 55

1 4 83 7 5 72

4 8 42 5 2 13

6 1 38 1 8 38

3 5 66 4 6 46

R-listsPoints that should be updated

Total number of nodes in R-lists is w

5 1 3 24

8 67

4

5

2

7

3

6

8

1

4

7

Still O(w2) space

(ptr, distance)

Page 10: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

ObservationsWhile inserting

Updating NN of old points is not necessary A point can be removed from the neighbor list if it

violates the temporal order

Averagelength O( )w

Page 11: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Example4

5

2

7

3

6

8

1

1 1 21 3 1 2

52

83

2 43 5 3 6

4 7

5

6

4

7

6

1 2 3 4 5 6 7 8

2 3 32 3 2 6

3 7 9

5

54 4 6 8

5 7

6

68

4

2 3 4 5 6 7 8 91

4

5

2

7

3

6

8

1

9

4

5

2

7

3

6

89

10

3 34 3 6 6

57 9

54 4 7 8

5

6

4

6 8

5

7

8

9

3 4 5 6 7 8 9 1021

10

Page 12: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Evaluation

0

0.02

0.04

0.06

0.08

0.1

0.12

0.140.16

0.18

Fast Pair Insect

EEGEOGRW

Ave

rage

Upd

ate

Tim

e (S

ec)

0 0.5 1 1.5 2 2.5 3 3.5 4x 104

Window Size (w)0 200 400 600 800 1000

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Insect

EEG

EOGRW

Fast Pair

Ave

rage

Upd

ate

Tim

e (S

ec)

Motif Length (m)

8x

0 0.5 1 1.5 2 2.5 3 3.5 4x 104

102030405060708090100110

Insect

EEG

EOG

RW

Ave

rage

Len

gth

of N

-lis

t

Window Size (w)

•Up to 8x speedup from general dynamic closest pair• Stable space cost per point with increasing window size

0 200 400 600 800 10000

50

100

150

200

250

300

350

InsectEEG

EOG

RW

Motif Length (m)

Ave

rage

Len

gth

of N

-lis

t

Page 13: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Outline

• Methods• Evaluation• Extensions• Case Studies• Conclusion

Page 14: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Extension

1 1 21 3 1 2

52

83

2 43 5 3 6

4 5

5 7

6

4

7

6

1 2 3 4 5 6 7 8

1 1 21 3 1 2

52

83

2 43 5 3 6

4 5

5 7

6

4

7

6

1 2 3 4 5 6 7 8

1 1 21 3 1 2

52

83

2 43 5 3 6

4 5

5 7

6

4

7

6

1 2 3 4 5 6 7 8

• Multidimensional Motif• Maintain motif in Multiple Synchronous time series• Points in one series can have neighbors in others

• Arbitrary Data Rate• Load shedding: Skip points if can’t handle

0.40.50.60.70.80.91.0

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Fraction of Data Used

Fraction of M

otifs Discovered

EOGRandom Walk

EEGInsect FIFO for each streams

Page 15: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Case Study 1: Online Compression

• Replace all the occurrences of a motif by a pointer to that motif.

• For signals with regular repetitions– Higher compression rate with less error.

0 500 1000 1500 2000

Raw

PLA

MR

A

BA AB B B B B

Dictionary

Page 16: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Case Study 2: Closing The Loop

• Check if a robot closed a loop• Convert the stream of video frames to stream

of color histograms.• Most similar color histograms are good

candidates for loop detection.

0 400 800 1200 16000481216

0 400 800 1200 16000481216

Start Loop Closure Detected

Page 17: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Conclusion

• First attempt to maintain time series motif online– Maintains minutes long repetition in hours long

sliding window• Linear update time with less space cost than

quad-tree based method– O( ) Vs O(w2)

• Faster than general dynamic closest pair solution

ww

Page 18: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Thank You

Code and Data: http://www.cs.ucr.edu/~mueen/OnlineMotif