Time%SeriesIpeople.dsv.su.se/~panagiotis/DAMI2014/timeseries1.pdf · 2014-12-07 · Syllabus% Nov4...

Time Series I


Syllabus

Nov 4 Introduction to data mining
Nov 5 Association Rules
Nov 10, 14 Clustering and Data Representation
Nov 17 Exercise session 1 (Homework 1 due)
Nov 19 Classification
Nov 24, 26 Similarity Matching and Model Evaluation
Dec 1 Exercise session 2 (Homework 2 due)
Dec 3 Combining Models
Dec 8, 10 Time Series Analysis
Dec 15 Exercise session 3 (Homework 3 due)
Dec 17 Ranking
Jan 13 Review
Jan 14 EXAM
Feb 23 Re-EXAM

Why deal with sequential data?

•  Because all data is sequential :-)
•  All data items arrive in the data store in some order
•  Examples
   –  transaction data
   –  documents and words
•  In some (or many) cases the order does not matter
•  In many cases the order is of interest

Time-series data: example

[Figure: financial time series]

Questions

•  What is a time series?
•  How do we compare time series data?
•  What is the structure of time series data?
•  Can we represent this structure compactly and accurately?

Time Series

•  A sequence of observations:
   –  X = (x1, x2, x3, x4, …, xn)
•  Each xi is a real number
   –  e.g., (2.0, 2.4, 4.8, 5.6, 6.3, 5.6, 4.4, 4.5, 5.8, 7.5)

[Figure: plot of the example series; axes: time, value]

Time Series Databases

•  A time series is an ordered set of real numbers, representing the measurements of a real variable at equal time intervals
   –  Stock prices
   –  Volume of sales over time
   –  Daily temperature readings
   –  ECG data
•  A time series database is a large collection of time series

Time Series Similarity

•  Given two time series
   X = (x1, x2, …, xn)
   Y = (y1, y2, …, yn)
•  Define and compute D(X, Y)
•  Or better…

[Figure: a query X compared against a database using D(X, Y); 1-NN]

Time Series Similarity Search

•  Given a time series database and a query X
•  Find the best match of X in the database
•  Why is that useful?

Examples

•  Find companies with similar stock prices over a time interval
•  Find products with similar sell cycles
•  Cluster users with similar credit card utilization
•  Find similar subsequences in DNA sequences
•  Find scenes in video streams

Types of queries

•  whole match vs subsequence match
•  range query vs nearest neighbor query

[Figure: three plots of $price against day (1 to 365)]

distance function: by expert (e.g., Euclidean distance)

Problems

•  Define the similarity (or distance) function
•  Find an efficient algorithm to retrieve similar time series from a database
   –  (Faster than sequential scan)

The similarity function depends on the application

Metric Distances

•  What properties should a similarity distance have to allow (easy) indexing?
   –  D(A,B) = D(B,A)  Symmetry
   –  D(A,A) = 0  Constancy of Self-Similarity
   –  D(A,B) ≥ 0  Positivity
   –  D(A,B) ≤ D(A,C) + D(B,C)  Triangle Inequality
•  Sometimes the distance function that best fits an application is not a metric
•  Then indexing becomes interesting and challenging

Euclidean Distance

•  Each time series: a point in the n-dim space
•  Euclidean distance – pair-wise point distance

X = x1, x2, …, xn
Y = y1, y2, …, yn

$L_2(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Euclidean model

Euclidean distance between two time series Q = {q1, q2, …, qn} and X = {x1, x2, …, xn}:

$D(Q, X) \equiv \sqrt{\sum_{i=1}^{n} (q_i - x_i)^2}$

[Figure: a query Q of n datapoints compared against each database series of n datapoints]

Distance   Rank
0.98       4
0.07       1
0.21       2
0.43       3
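The ranking in the Euclidean model can be sketched as a linear scan. This is a minimal illustration, not code from the lecture: the function names `euclidean` and `rank_by_distance` and the toy data are assumptions.

```python
import math

def euclidean(q, x):
    """Euclidean distance between two equal-length sequences."""
    if len(q) != len(x):
        raise ValueError("Euclidean distance needs sequences of equal length")
    return math.sqrt(sum((qi - xi) ** 2 for qi, xi in zip(q, x)))

def rank_by_distance(query, database):
    """Return (index, distance) pairs sorted from best to worst match."""
    scored = [(i, euclidean(query, x)) for i, x in enumerate(database)]
    return sorted(scored, key=lambda pair: pair[1])

# Toy data (hypothetical): three database series of the same length as the query.
query = [1.0, 2.0, 3.0]
database = [[1.0, 2.0, 5.0], [1.0, 2.0, 3.5], [4.0, 6.0, 3.0]]
ranking = rank_by_distance(query, database)  # first entry is the 1-NN
```

Each distance costs O(n), so the scan is O(n) per database series.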

Euclidean model

•  Easy to compute: O(n) •  Allows scalable solu8ons to other problems, such as –  indexing – clustering – etc...

Advantages

19

Disadvantages

•  Query and target lengths should be equal!
•  Cannot tolerate noise:
   –  Time shifts
   –  Sequences out of phase
   –  Scaling in the y-axis

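Limitations of Euclidean Distance

$D(Q, X) \equiv \sqrt{\sum_{i=1}^{n} (q_i - x_i)^2}$

•  Euclidean distance: sequences are aligned "one to one".
•  "Warped" time axis: nonlinear alignments are possible.

[Figure: sequences Q and C aligned one to one, and the same pair aligned along a warped time axis]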

DTW: Dynamic time warping (1/2)

•  Each cell c = (i, j) is a pair of indices whose squared difference, $(x_i - q_j)^2$, is computed and included in the sum for the distance.
•  Euclidean path:
   –  i = j always.
   –  Ignores off-diagonal cells.

[Figure: the X-by-Q grid; the diagonal cells accumulate $(x_1 - q_1)^2$, then $(x_2 - q_2)^2 + (x_1 - q_1)^2$, and so on]

DTW: Dynamic time warping (2/2)

•  DTW allows any path.
•  Examine all paths:
   –  cell (i, j) can be reached from (i-1, j), (i-1, j-1), or (i, j-1)
   –  one off-diagonal step shrinks X / stretches Q; the other stretches X / shrinks Q
•  Standard dynamic programming to fill in the table.
•  The top-right cell contains the final result.

Computation

•  DTW is computed by dynamic programming
•  Given two sequences
   –  Q = {q1, q2, …, qN}
   –  X = {x1, x2, …, xM}

$D_{dtw}(Q, X) = f(N, M)$

$f(i, j) = \lVert q_i - x_j \rVert + \min \begin{cases} f(i, j-1) & \text{(q-stretch)} \\ f(i-1, j) & \text{(x-stretch)} \\ f(i-1, j-1) & \text{(no stretch)} \end{cases}$

•  Warping path W:
   –  set of grid cells in the time warping matrix
•  DTW finds the optimum warping path W:
   –  the path with the smallest matching score
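The recurrence can be sketched as a table fill. This is an illustrative implementation under stated assumptions: the local cost is the absolute difference |q_i - x_j|, and the base case f(0, 0) = 0 with all other border cells infinite is a common convention that the slides do not spell out. The name `dtw` is ours.

```python
def dtw(q, x):
    """DTW distance between sequences q and x using local cost |q_i - x_j|."""
    n, m = len(q), len(x)
    inf = float("inf")
    # f[i][j] holds the cost of the best warping path aligning q[:i] with x[:j].
    f = [[inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0  # assumed base case: two empty prefixes align at zero cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - x[j - 1])
            # Predecessors: (i, j-1), (i-1, j) and the diagonal (i-1, j-1).
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated 2 is absorbed by warping, even though the two sequences have different lengths.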

[Figure: the optimum warping path W (the best alignment) drawn on the X-by-Y grid]

Properties of a DTW legal path

I.   Boundary conditions
     $W_1 = (1, 1)$ and $W_K = (n, m)$
     •  Paths start at the bottom left cell and end at the top right cell

II.  Continuity
     Given $W_k = (a, b)$, then $W_{k-1} = (c, d)$, where $a - c \le 1$, $b - d \le 1$
     •  There is always a point of the path in each row and column of the matrix

III. Monotonicity
     Given $W_k = (a, b)$, then $W_{k-1} = (c, d)$, where $a - c \ge 0$, $b - d \ge 0$
     •  Paths always go from left to right and from bottom to top
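The three properties can be checked mechanically on a candidate path. The sketch below is illustrative only: the function name `is_legal_warping_path` is ours, cells are 1-based (i, j) tuples as on the slide, and rejecting a repeated cell is an extra assumption not stated in the properties.

```python
def is_legal_warping_path(path, n, m):
    """Check boundary, continuity and monotonicity for a list of 1-based cells."""
    # I. Boundary conditions: start at (1, 1), end at (n, m).
    if not path or path[0] != (1, 1) or path[-1] != (n, m):
        return False
    for (c, d), (a, b) in zip(path, path[1:]):
        # II. Continuity: each index advances by at most one per step.
        if a - c > 1 or b - d > 1:
            return False
        # III. Monotonicity: indices never decrease.
        if a - c < 0 or b - d < 0:
            return False
        # Assumption: a path does not stay on the same cell twice.
        if (a, b) == (c, d):
            return False
    return True
```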

Advantages

•  Query and target lengths need not be equal :-)
•  Can tolerate noise:
   –  time shifts
   –  sequences out of phase
   –  scaling in the y-axis

Disadvantages

•  Computational complexity: O(nm)
•  May not be able to handle some types of noise...
•  DTW is not metric (triangle inequality does not hold)

Global Constraints

•  Slightly speed up the calculations and prevent pathological warpings
•  A global constraint limits the indices of the warping path: $w_k = (i, j)_k$ such that $j - r \le i \le j + r$
•  Where r is a term defining the allowed range of warping for a given point in a sequence

[Figure: Sakoe-Chiba Band and Itakura Parallelogram, with the warping range r marked]

Complexity of DTW

•  Basic implementation = O(n^2) where n is the length of the sequences
   –  will have to solve the problem for each (i, j) pair
•  If a warping window is specified, then O(nr)
   –  only solve for the (i, j) pairs where |i – j| <= r
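A warping window of width r can be sketched by skipping every cell outside the band |i - j| <= r. This is an illustration under assumptions: the local cost is |q_i - x_j|, the function name `dtw_banded` is ours, and the full table is still allocated for readability even though only O(nr) cells are computed.

```python
def dtw_banded(q, x, r):
    """DTW restricted to cells with |i - j| <= r (a Sakoe-Chiba style band)."""
    n, m = len(q), len(x)
    inf = float("inf")
    f = [[inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0  # assumed base case
    for i in range(1, n + 1):
        # Visit only the columns inside the band around the diagonal.
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = abs(q[i - 1] - x[j - 1])
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    # Stays infinite when the band cannot reach the top-right cell.
    return f[n][m]
```

With r = 0 only the diagonal is visited, so the result reduces to a sum of pointwise differences; if the lengths differ by more than r, no legal path exists inside the band and the result is infinite.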


Longest Common Subsequence Measures

(Allowing for Gaps in Sequences)

Gap skipped


Longest Common Subsequence (LCSS)

ignore majority of noise

match

match

Advantages of LCSS:

A. Outlying values not matched

B. Distance/Similarity distorted less

Disadvantages of DTW:

A. All points are matched

B. Outliers can distort distance

C. One-to-many mapping

LCSS is more resilient to noise than DTW.


Longest Common Subsequence

Similar dynamic programming solution as DTW, but now we measure similarity, not distance.

Can also be expressed as distance
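The dynamic program can be sketched as follows. The slide's formula did not survive extraction, so several parts are assumptions: two real values count as a match when they differ by at most a threshold `eps`, and the distance form 1 - LCSS / min(n, m) is one common normalization, not necessarily the one used in the lecture. The function names are ours.

```python
def lcss(q, x, eps):
    """Length of the longest common subsequence, matching values within eps."""
    n, m = len(q), len(x)
    # table[i][j] = LCSS length of the prefixes q[:i] and x[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(q[i - 1] - x[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1  # match
            else:
                # Gap: skip one element of either sequence.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]

def lcss_distance(q, x, eps):
    """Assumed normalization: 0 when the shorter sequence is fully matched."""
    return 1.0 - lcss(q, x, eps) / min(len(q), len(x))
```

An outlier such as the 100 in `[1, 100, 2, 3, 4]` is simply skipped, which is the noise resilience the comparison with DTW refers to.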


Similarity Retrieval

•  Range Query
   –  Find all time series X where $D(Q, X) \le \varepsilon$
•  Nearest Neighbor query
   –  Find the k most similar time series to Q
•  A method to answer the above queries:
   –  Linear scan
•  A better approach
   –  GEMINI [next time]

Lower Bounding – NN search

Intuition
•  Try to use a cheap lower bounding calculation as often as possible
•  Do the expensive, full calculations only when absolutely necessary

We can speed up similarity search by using a lower bounding function
•  D: distance measure
•  LB: lower bounding function s.t.: LB(Q, X) ≤ D(Q, X)

1-NN Search Using LB
   Set best = ∞
   For each Xi:
      if LB(Xi, Q) < best
         if D(Xi, Q) < best
            best = D(Xi, Q)

We assume a database of time series: DB = {X1, X2, …, XN}
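The 1-NN procedure can be sketched in runnable form. To keep the example self-contained it uses Euclidean distance as D and a deliberately simple lower bound, |q_1 - x_1|, which can never exceed the Euclidean distance; both choices, and the names `nn_search` and `lb_first`, are assumptions for illustration rather than the bounds discussed in the lecture.

```python
import math

def euclidean(q, x):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, x)))

def lb_first(q, x):
    """Illustrative lower bound: the first-point difference alone."""
    return abs(q[0] - x[0])

def nn_search(query, database, dist=euclidean, lb=lb_first):
    """Return (best_index, best_distance, number_of_full_computations)."""
    best, best_idx, full = math.inf, None, 0
    for i, x in enumerate(database):
        if lb(query, x) < best:        # cheap test first
            full += 1
            d = dist(query, x)         # expensive computation only if needed
            if d < best:
                best, best_idx = d, i
    return best_idx, best, full

# Toy data (hypothetical): two of the four candidates are pruned by the bound.
result = nn_search([0, 0], [[1, 0], [5, 5], [0, 0.5], [3, 0]])
```

The tighter the lower bound, the more candidates are discarded without a full distance computation.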


Lower Bounding – range query

The same lower bounding function applies: LB(Q, X) ≤ D(Q, X), where D is the distance measure.

Range Query Using LB
   For each Xi:
      if LB(Xi, Q) ≤ ε
         if D(Xi, Q) ≤ ε
            report Xi

We assume a database of time series: DB = {X1, X2, …, XN}
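A range query with a lower bound can be sketched the same way. As before, Euclidean distance and the first-point bound are stand-ins chosen for illustration, and the function names are assumptions.

```python
import math

def euclidean(q, x):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, x)))

def lb_first(q, x):
    """Illustrative lower bound: |q_1 - x_1| <= Euclidean distance."""
    return abs(q[0] - x[0])

def range_query(query, database, eps, dist=euclidean, lb=lb_first):
    """Return the indices of all series within distance eps of the query."""
    hits = []
    for i, x in enumerate(database):
        # If even the lower bound exceeds eps, the true distance must too.
        if lb(query, x) <= eps and dist(query, x) <= eps:
            hits.append(i)
    return hits

# Toy data (hypothetical).
hits = range_query([0, 0], [[1, 0], [5, 5], [0, 0.5], [3, 0]], eps=1.0)
```

Because `and` short-circuits, the full distance is evaluated only for candidates that survive the cheap test.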

Problems

•  How to define lower bounds for different distance measures?
•  How to extract the features? How to define the feature space?
   –  Fourier transform
   –  Wavelets transform
   –  Averages of segments (Histograms or APCA)
   –  Chebyshev polynomials
   –  .... your favorite curve approximation...

Some Lower Bounds on DTW

LB_Kim
•  Each time series is represented by 4 features: <First, Last, Min, Max>
•  LB_Kim = maximum squared difference of the corresponding features

LB_Yi
•  LB_Yi = sum of the squared differences of the points of X that fall above max(Q) or below min(Q)

[Figure: X and Q for LB_Kim and for LB_Yi, the latter with max(Q) and min(Q) marked]

LB_Keogh [Keogh 2004]

$U_i = \max(q_{i-r} : q_{i+r})$
$L_i = \min(q_{i-r} : q_{i+r})$

[Figure: query Q with its upper envelope U and lower envelope L, shown for the Sakoe-Chiba Band and for the Itakura Parallelogram]

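$LB\_Keogh(Q, X) = \sqrt{\sum_{i=1}^{n} \begin{cases} (x_i - U_i)^2 & \text{if } x_i > U_i \\ (x_i - L_i)^2 & \text{if } x_i < L_i \\ 0 & \text{otherwise} \end{cases}}$

$LB\_Keogh(Q, X) \le DTW(Q, X)$

[Figure: X plotted against the envelope (U, L) of Q, for the Sakoe-Chiba Band and for the Itakura Parallelogram]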
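The envelope and the bound can be sketched for the Sakoe-Chiba case. The sketch assumes equal-length sequences, a band of half-width r, and a DTW variant with squared local costs and a final square root so that it is on the same scale as the bound; the function names are ours.

```python
import math

def envelope(q, r):
    """U_i = max(q[i-r..i+r]) and L_i = min(q[i-r..i+r]), clipped at the ends."""
    n = len(q)
    upper = [max(q[max(0, i - r): min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r): min(n, i + r + 1)]) for i in range(n)]
    return upper, lower

def lb_keogh(q, x, r):
    """Sum the squared excursions of x outside the envelope of q."""
    upper, lower = envelope(q, r)
    total = 0.0
    for xi, u, l in zip(x, upper, lower):
        if xi > u:
            total += (xi - u) ** 2
        elif xi < l:
            total += (xi - l) ** 2
    return math.sqrt(total)

def dtw_banded_sq(q, x, r):
    """Banded DTW with squared local costs and a final square root (assumed variant)."""
    n, m = len(q), len(x)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - x[j - 1]) ** 2
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return math.sqrt(f[n][m])

# Toy data (hypothetical).
q = [0, 1, 2, 1, 0]
x = [0, 0, 0, 5, 0]
bound, true_dist = lb_keogh(q, x, 1), dtw_banded_sq(q, x, 1)
```

The envelope costs O(nr) to build as written, but it depends only on Q, so it can be computed once per query and reused for every candidate X.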


Tightness of LB

$T = \dfrac{\text{Lower Bound Estimate of Dynamic Time Warp Distance}}{\text{True Dynamic Time Warp Distance}}$

$0 \le T \le 1$; the larger the better

[Figure: LB_Keogh (Sakoe-Chiba), LB_Keogh (Itakura), LB_Yi and LB_Kim; each bound is proportional to the length of gray lines used in the illustrations]

Lower Bounding

[Figure: a distance axis starting at Q, on which the true distances and lower bounds of S1 to S4 and the best so far (BSF) are marked step by step]

•  We want to find the 1-NN to our query data series, Q.
•  We compute the distance to the first data series in our dataset, D(S1,Q). This becomes the best so far (BSF).
•  We compute the distance LB(S2,Q) and it is greater than the BSF. We can safely prune it, since D(S2,Q) ≥ LB(S2,Q).
•  We compute the distance LB(S3,Q) and it is smaller than the BSF. We have to compute D(S3,Q) ≥ LB(S3,Q), since it may still be smaller than BSF.
•  It turns out that D(S3,Q) ≥ BSF, so we can safely prune S3.
•  We compute the distance LB(S4,Q) and it is smaller than the BSF. We have to compute D(S4,Q) ≥ LB(S4,Q), since it may still be smaller than BSF.
•  It turns out that D(S4,Q) < BSF, so S4 becomes the new BSF.
•  S1 cannot be the 1-NN, because S4 is closer to Q.

How about subsequence matching?

•  DTW is defined for full-sequence matching:
   –  All points of the query sequence are matched to all points of the target sequence
•  Subsequence matching:
   –  The query is matched to a part (subsequence) of the target sequence

[Figure: a query sequence matched against part of a data stream]

Subsequence Matching

•  X: long sequence
•  Q: short sequence
•  What subsequence of X is the best match for Q?

J-Position Subsequence Match

•  What subsequence of X is the best match for Q … such that the match ends at position j?

Naïve Solution: DTW

•  Examine all possible subsequences
•  Too costly!

Why not 'naive'?

•  Compute the time warping matrices starting from every database frame
   –  Need O(n) matrices, O(nm) time per frame

[Figure: query Q (length m) against X = (x1, …, xn); one matrix captures the optimal subsequence starting from t = tstart, spanning x_tstart to x_tend]

Key Idea

•  Star-padding
   –  Use only a single matrix (the naïve solution uses n matrices)
   –  Prefix Q with '*', that always gives zero distance
   –  Instead of Q = (q1, q2, …, qm), compute distances with Q':
      $Q' = (q_0, q_1, q_2, \ldots, q_m)$, $q_0 = (*)$
   –  O(m) time and space (the naïve requires O(nm))

SPRING: dynamic programming

•  Initialization
   –  Insert a "dummy" state '*' at the beginning of the query
   –  '*' matches every value in X with score 0
•  Computation
   –  Perform dynamic programming computation in a similar manner as standard DTW
   –  Cell (i, j) is filled from (i-1, j), (i-1, j-1) and (i, j-1)
•  For each (i, j):
   –  compute the j-position subsequence match of the first i items of Q to X[s:j], i.e. Q[1:i] is matched with X[s:j]
•  Top row: j-position subsequence match of Q for all j's
•  Final answer: best among j-position matches
   –  Look at answers stored at the top row of the table

[Figure: matrix with the database sequence X along the horizontal axis and the query Q along the vertical axis; the '*' row holds 0 in every column]

Subsequence vs. full matching

•  Assume that the database is one very long sequence
   –  Concatenate all sequences into one sequence

[Figure: matrix of p1 … pi … pN against q1 … qj … qM]

Computational complexity

•  O(|Q| * |X|)
•  But can be computed faster by looking at only two adjacent columns
STWM (Subsequence Time Warping Matrix)

•  Problem of the star-padding: we lose the information about the starting frame of the match

•  After the scan, “which is the optimal subsequence?”

•  Elements of STWM

– Distance value of each subsequence

–  Starting position !!

•  Combination of star-padding and STWM

– Efficiently identify the optimal subsequence in a stream fashion
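Star-padding combined with start-position tracking can be sketched as below. This is an illustration in the spirit of the slides, not the published SPRING algorithm: it reports only the single best match after scanning all of X, uses |q_i - x_j| as the local cost, and keeps just two adjacent columns. The function name and the 0-based inclusive positions are assumptions.

```python
def best_subsequence_match(q, x):
    """Return (distance, start, end) of the best DTW match of q inside x."""
    m = len(q)
    inf = float("inf")
    # Index 0 is the '*' state: zero distance in every column.
    prev_d = [0.0] + [inf] * m   # distances in the previous column
    prev_s = [0] * (m + 1)       # starting positions in the previous column
    best = (inf, -1, -1)
    for j, xj in enumerate(x):
        prev_s[0] = j            # leaving '*' diagonally starts the match at j
        cur_d = [0.0] + [inf] * m
        cur_s = [j] + [0] * m    # leaving '*' vertically also starts at j
        for i in range(1, m + 1):
            # Predecessors (i, j-1), (i-1, j), (i-1, j-1) with their starts.
            candidates = [
                (prev_d[i], prev_s[i]),
                (cur_d[i - 1], cur_s[i - 1]),
                (prev_d[i - 1], prev_s[i - 1]),
            ]
            d, s = min(candidates, key=lambda c: c[0])
            cur_d[i] = abs(q[i - 1] - xj) + d
            cur_s[i] = s         # the start is inherited along the chosen path
        # Top row: best match of the whole query ending at position j.
        if cur_d[m] < best[0]:
            best = (cur_d[m], cur_s[m], j)
        prev_d, prev_s = cur_d, cur_s
    return best
```

Storing the start position next to each distance is what recovers the information star-padding alone loses; memory stays O(|Q|) while time remains O(|Q| * |X|).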

Up next…

•  Time series summarizations
   –  Discrete Fourier Transform (DFT)
   –  Piecewise Aggregate Approximation (PAA)
   –  Symbolic Aggregate approXimation (SAX)
•  Streams
   –  Z-normalization
   –  A fast algorithm for subsequence matching in streams
•  Time series classification [briefly]
   –  Lazy learners and Shapelets
