Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data...

38
Richard J. Povinelli, Doctoral Oral Examination Page 1, 3/9/99 Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and Prediction of Complex Time Series Events Xin Feng, Ph.D. Department of Electrical and Computer Engineering Marquette University Milwaukee, Wisconsin 53233, USA (414)288-3504 [email protected] November 2013 11/4/2013 2 An Ancient Chinese Say … You are not “grown-ups” till you reach the age of 30. You are not “free from doubts” till the age of 40. You do not know God’s Will till the age of 50. The bottom line: Learning is a life-long process

Transcript of Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data...

Page 1: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 1, 3/9/99

Time Series Data Mining (TSDM):A New Temporal Pattern Identification

Method for Characterization and Predictionof Complex Time Series Events

Xin Feng, Ph.D.

Department of Electrical and Computer Engineering

Marquette University

Milwaukee, Wisconsin 53233, USA

(414)288-3504

[email protected]

November 2013

11/4/2013 2

An Ancient Chinese Say …

You are not “grown-ups” till you reach the age of 30.

You are not “free from doubts” till the age of 40.

You do not know God’s Will till the age of 50.

The bottom line:

Learning is a life-long process

Page 2: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 2, 3/9/99

11/4/2013 3

Overview of Presentation

Problem Statement Graphical Problem Statement

Time Series Analysis Literature

Innovative New Approach

Algorithm Phase Space

Mathematical Formulation

Algorithm Results

Applications Progression of Time Series

Engineering

Financial

Collaborators/Credits Dr. Noveen Bansal, MSCS

Dr. Richard Povinelli, EECE

Hai Huang, Microsoft, Inc.

Odilon K. Senyana, FAA

Dr. Wenjing Zhang, Discover, Inc.

Shaobo Wang, MU

Mining Multiple Temporal Patterns 4

Page 3: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 3, 3/9/99

11/4/2013 5

Graphical Problem Statement

Can we automatically characterize these sequences (temporal patterns)?

Can we use such temporal patterns for prediction?

-

0.5

1.0

1.5

2.0

2.5

0 10 20 30 40 50 60 70 80 90 100

t

X

nonstationary, non-periodic time series

11/4/2013 6

Temporal Pattern Find temporal patterns

p PQ, a vector of length Q

-

0.5

1.0

1.5

2.0

2.5

0 10 20 30 40 50 60 70 80 90 100

t

x

temporal patternQ = 5

Page 4: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 4, 3/9/99

11/4/2013 7

Events Temporal patterns that characterize and predict events

Chosen a priori Algorithm not restricted by event definition

Event definition is problem specific

Time instances with high event value

-

0.5

1.0

1.5

2.0

2.5

0 10 20 30 40 50 60 70 80 90 100t

x t

11/4/2013 8

Time Series Analysis Literature Box-Jenkins, ARMA

Pandit and Wu (1983)

Bowerman and O’Connell (1993)

Chaotic deterministic Takens (1981)

Sauer (1991)

Casdagli (1989)

Abarbanel (1990, 1994)

Ghoshray (1996)

Page 5: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 5, 3/9/99

11/4/2013 9

An Innovative Approach Time Series Data Mining

Time Series,

Find temporal patterns that are characteristic and predictive of an event

Nonstationary, non-periodic time series

Chaotic deterministic time series whose attractors are non-stationary

Local model, local prediction Not concerned with characterizing and predicting everywhere (every time)

Characterize and predict events

Applying Genetic Algorithm (GA) to find the “optimal” local model

Exciting, new results with difficult real world time series

X x t Nt , , ,1

p P Q

g xt

11/4/2013 10

Introduction to Data Mining A step in the knowledge discovery process

Application of algorithms to extract meaningfulpatterns

Page 6: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 6, 3/9/99

11/4/2013 11

Data Mining and Knowledge Discovery "The nontrivial extraction of implicit, previously

unknown, and potentially useful information from data"[1]

Uses artificial intelligence, statistical and visualization techniques to discover and present knowledge in a form which is easily comprehensible to humans.

[1] W. Frawley and G. Piatetsky-Shapiro and C. Matheus, Knowledge Discovery in Databases: An Overview. AI Magazine, Fall 1992, pgs 213-228.

11/4/2013 12

Data Mining and Knowledge DiscoveryMore recently (already obsolete):

“The huge amount of tracking data available from web sites as an "information gold mine.”

Business Week

“74% of large companies expect to mine web data to increase profits by 2002.”

Forrester Research (1998)

Page 7: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 7, 3/9/99

11/4/2013 13

Popular Data Mining Problems Associating – Identifies patterns or groups of items

“men who buy red ties also often buy cigars.”

Classifying – Identifies clusters of items with common attributes

“men who buy red ties and cigars also usually have wine at lunch and pay by credit card. ”

11/4/2013 14

Sequencing – Identifies the order of events“ men tend to buy red ties before lunch and cigars after lunch.”

Predictive Modeling – Identifies a likely outcome from item clusters. “men who buy red ties and cigars and have wine at lunch are very likely to buy a silver Benz within two years and finance their purchase by borrowing money from bank”

Popular Data Mining Problems

Page 8: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 8, 3/9/99

11/4/2013 15

Where do prospectors search for the gold? Geological formation

Quartz and ironstone Structures such as banded iron formations

Data Mining Define formations that point to nuggets of information Define patterns that identify an information strike

Definition of “gold” For Gold Mining

Size of nuggets makes a difference Mining for oil or silver is different

Data Mining Definition of knowledge Clearly define the desired nuggets of information

Gold Mining Analogy

11/4/2013 16

Knowledge Discovery in Databases

Data Warehouse

Prepareddata

Data

CleaningIntegration

SelectionTransformation

DataMining

Patterns

EvaluationVisualization

KnowledgeKnowledge

Base

Page 9: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 9, 3/9/99

11/4/2013 17

Features of Data Mining Intend to discover knowledge which are previously

unknown

A Multi-disciplinary field growing out of many areas Mathematical modeling and statistics

Pattern recognition

Computational intelligence

11/4/2013 18

Data Mining Tools Traditional pattern recognition algorithms:

Statistics, Mathematical Modeling, Algorithms. Data Visualization, Graphics

Intelligent computing: Artificial Neural networks, Fuzzy Logic, Genetic Algorithms/Evolutionary

Computing, Expert Systems, Natural Language Processing, etc.

They all use “mechanical” computing power: Database Systems; Client-Server; Internet/WWW;

Software/programming; etc.

Page 10: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 10, 3/9/99

11/4/2013 19

Chaotic Deterministic Time Series No universal definition

Characterized by the following criteria Sensitivity to initial conditions

Positive Lyapunov exponent

Broadband Fourier spectra

Finite, possibly fractional attractors

11/4/2013 20

Attractors Given a manifold M and a map f: M M, an invariant

set S is defined as follows

A positively invariant set requires that n 0.

A set A M is an attracting set if A is a closed invariant set and there exists a neighborhood U of A such that Uis a positively invariant set and

An attractor is defined as an attracting set that contains a dense orbit.N. B. Tufillaro, T. Abbott, and J. Reilly, An Experimental Approach to

Nonlinear Dynamics and Chaos. Redwood City, CA: Addison-Wesley, 1992.

S x x S f x S n Mn 0 0 0: , ,

f x A x Un

Page 11: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 11, 3/9/99

11/4/2013 21

Gold Mining Analogy Where do prospectors search for the gold?

Geological formations Quartz and ironstone

Structures such as banded iron formations

Data Mining Define formations that point to nuggets of information (events)

Define temporal patterns that identify an information strike.

Definition of gold Size of nuggets makes a difference in how the mining is approached.

Mining for oil or silver is different

Data Mining Definition of knowledge

Clearly define the desired nuggets of information (events)

Define event function g(xt)

11/4/2013 22

Two ideas arise from the analogy Patterns (Temporal Patterns)

Patterns that guide and direct to the nuggets of information need to be understood and identified

Temporal patterns and time series embedding

Gold (Events) Nuggets of information require clear definition

Time series events

Relationship between a temporal pattern and event Find “temporal patterns” that indicate a high event value

Example A sequence of charge flow values in the brain that precede a physical response

A series of voltage values that precede the release of a droplet of metal from a welder

A sequence of stock prices that precede a rapid increase in the stock price

Page 12: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 12, 3/9/99

11/4/2013 23

Presentation Status

Problem Statement Graphical Problem Statement

Time Series Analysis Literature

Innovative New Approach

Algorithm Phase Space

Mathematical Formulation

Algorithm Results

Applications Progression of Time Series

Engineering

Financial

11/4/2013 24

Algorithm Search for the optimal temporal pattern that yields the

maximal “event” values

Choose the support, Q, of the temporal pattern p

Define the event function, g

Embed the time series into a phase space Define a metric to allow comparison between the temporal pattern and

embedded time series

Find the “optimal” pattern cluster Pattern neighborhood that contains the maximum average “eventness”

Genetic Algorithm

Page 13: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 13, 3/9/99

11/4/2013 25

Graphical Mapping

-

0.5

1.0

1.5

2.0

2.5

0 10 20 30 40 50 60 70 80 90 100t

X

0.00

0.50

1.00

1.50

2.00

2.50

0.00 0.50 1.00 1.50 2.00 2.50x t

x t+ 1

Phase Space, Q = 2

11/4/2013 26

Phase Space with Event Values

0.00

0.50

1.00

1.50

2.00

2.50

0.00 0.50 1.00 1.50 2.00 2.50xt

xt+

1

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.201

23-20246

xt+1

g

xt

Q = 2

Page 14: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 14, 3/9/99

11/4/2013 27

Mathematical Formulation I Non-stationary training time series

N is the length of the series

Testing time series

Define Event Function Inputs must chosen carefully

Relationship to temporal pattern is important

Example g The percentage change in the time series between times t+Q and t+Q+1

Q is the length of the temporal pattern

g xx x

xtt Q t Q

t Q

1

1

X x t Nt , , ,1

Y x t R S R Nt , , , ,

11/4/2013 28

Mathematical Formulation II Define P Q

A Q dimensional real space

Phase space

Embed X into P, likewise for Y create yt Subsets, xt, of X of cardinality Q

Chosen by any consistent rule

Define metric d for P d(p, xt)

Compare the embedded time series and the temporal pattern

p - temporal pattern, a vector of length Q

xt - embedded time series

x tT

t t t Qx x x t NQ

, , , , , ,

1 11 1

1 2 1 Q

Page 15: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 15, 3/9/99

11/4/2013 29

Property 1 Find a temporal pattern p P and a threshold

that have the following two properties

Property 1 Given the following definitions

That

and the set

is statistically different from the set

M t d t Ntrain t Q : , , , ,p x 1 1

Mtrain

tt M

train

trainc M

g x1

X tt

N Q

N Qg x

1

1 1

1

M Xtrain

g x t Mt train:

g x t N Qt : , , 1 1

11/4/2013 30

Property 2 Property 2

Given the following definitions

That

and that the set

is statistically different from the set

M t d t R Stest t Q : , , , ,p y 1

Mtest

tt M

test

testc M

g x1

Y tt R

S Q

S Q Rg x

1

2

1

M Ytest

g x t Mt test: g x t R S Qt : , , 1

Page 16: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 16, 3/9/99

11/4/2013 31

Define threshold In terms of normalized threshold d

Use the mean and standard deviation statistics of d(p, xt)

Optimization formulation Constraint forces cluster to be greater than 1 in size

dQ

t dt

N

Nd

Q

2

1

2

1

1 1

p x,

dQ

tt

N

Nd

Q

1

1 1

1

p x,

Mathematical Formulation III

maximize

subject to

pp

,, , ,

, .

f X gc M

g x

c M N

M tt M

train

train

train

1

0 1

d d d

11/4/2013 32

Algorithm Results

Training time series

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2-0.5

0

0.5

1

1.5

2

2.5

xt

xt+

1

temporal pattern

pattern neighborhood

highest event value time instances

radius,

Page 17: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 17, 3/9/99

11/4/2013 33

Time Series View of Results

0.0

0.5

1.0

1.5

2.0

2.5

0 10 20 30 40 50 60 70 80 90 100

t

x t

x

p

event

Training time series

11/4/2013 34

Test Series Phase Space

-

0.5

1.0

1.5

2.0

2.5

101 111 121 131 141 151 161 171 181 191 201

t

X

next 100 points of nonstationary, non-periodic time series

0.00

0.50

1.00

1.50

2.00

2.50

0.00 0.50 1.00 1.50 2.00 2.50xt

xt+

1

Page 18: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 18, 3/9/99

11/4/2013 35

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2-0.5

0

0.5

1

1.5

2

2.5

xt

x t+1

Test Series Results

temporal pattern from training

pattern neighborhood from training

highest gold value time instances

11/4/2013 36

Testing Phase Space

0

0.5

1

1.5

2

2.5

-0.5

0

0.5

1

1.5

2

2.50

0.5

1

1.5

2

2.5

xtxt+1

g

Event Values and Training

Pattern Cluster

Page 19: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 19, 3/9/99

11/4/2013 37

Time Series View of Testing

0.0

0.5

1.0

1.5

2.0

2.5

100 110 120 130 140 150 160 170 180 190 200

t

x t

x

p

event

Pattern, p, is from training process

11/4/2013 38

Presentation Status

Problem Statement Graphical Problem Statement

Time Series Analysis Literature

Innovative New Approach

Algorithm Phase Space

Mathematical Formulation

Algorithm Results

Applications Progression of Time Series

Engineering

Financial

Proposed Work

Page 20: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 20, 3/9/99

11/4/2013 39

Applications Set of progressively more complex time series

Show how each is embedded into the phase space

Show results of algorithm

Constant

Sinusoidal

Chirp

Random noise

Engineering Application Welding Time Series

Financial Application Stock Open Price

11/4/2013 40

Progression of Time Series Choose the support of the pattern, Q = 2, to allow

graphical presentation

Define the gold function The t+Qth value in the time series

Q is the length of the pattern

g x B xtQ

t 1

Page 21: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 21, 3/9/99

11/4/2013 41

Constant Train Phase Space

0.00.20.40.60.81.01.21.41.61.82.0

0 10 20 30 40 50 60 70 80 90 100t

x t

Training portion of time series

0.00

0.50

1.00

1.50

2.00

2.50

3.00

0.00 0.50 1.00 1.50 2.00 2.50 3.00

x t

x t+1

Q = 2

11/4/2013 42

Constant Test Phase Space

0.00.20.40.60.81.01.21.41.61.82.0

100 110 120 130 140 150 160 170 180 190 200t

x t

Testing portion of time series

0.00

0.50

1.00

1.50

2.00

2.50

3.00

0.00 0.50 1.00 1.50 2.00 2.50 3.00

x t

x t+1

Q = 2

Page 22: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 22, 3/9/99

11/4/2013 43

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

0 10 20 30 40 50 60 70 80 90 100t

x t

Linearly Increasing Train

Training portion of time series

0.00

0.50

1.00

1.50

2.00

0.00 0.50 1.00 1.50 2.00

x t

x t+1

Q = 2

11/4/2013 44

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

100 110 120 130 140 150 160 170 180 190 200t

x t

Linearly Increasing Test

Testing portion of time series

0.00

1.00

2.00

3.00

4.00

2.00 2.50 3.00 3.50 4.00

x t

x t+1

Q = 2

Page 23: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 23, 3/9/99

11/4/2013 45

-1.0-0.8-0.6-0.4-0.20.00.20.40.60.81.0

0 10 20 30 40 50 60 70 80 90 100

t

x t

Sinusoidal Train Phase Space

Training portion of time

series

-1.00

-0.80

-0.60

-0.40

-0.20

0.00

0.20

0.40

0.60

0.80

1.00

-1.00 -0.50 0.00 0.50 1.00

x t

x t+1

Q = 2

11/4/2013 46

-1.0-0.8-0.6-0.4-0.20.00.20.40.60.81.0

100 110 120 130 140 150 160 170 180 190 200

t

x t

Sinusoidal Test Phase Space

Testing portion of time series

-1.00

-0.80

-0.60

-0.40

-0.20

0.00

0.20

0.40

0.60

0.80

1.00

-1.00 -0.50 0.00 0.50 1.00

x t

x t+1

Q = 2

Page 24: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 24, 3/9/99

11/4/2013 47

-1.0-0.8-0.6-0.4-0.20.00.20.40.60.81.0

0 10 20 30 40 50 60 70 80 90 100

t

x t

Chirp Train Phase Space Training portion of time series

-1.00

-0.80

-0.60

-0.40

-0.20

0.00

0.20

0.40

0.60

0.80

1.00

-1.00 -0.50 0.00 0.50 1.00

x t

x t+1Q = 2

11/4/2013 48

-1.0-0.8-0.6-0.4-0.20.00.20.40.60.81.0

100 110 120 130 140 150 160 170 180 190 200

t

x t

Chirp Test Phase Space Testing portion of time series

-1.00

-0.80

-0.60

-0.40

-0.20

0.00

0.20

0.40

0.60

0.80

1.00

-1.00 -0.50 0.00 0.50 1.00

x t

x t+1 Q = 2

Page 25: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 25, 3/9/99

11/4/2013 49

Noisy Sinusoidal Train Phase Space

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

0 10 20 30 40 50 60 70 80 90 100

t

x t

-1.00

-0.50

0.00

0.50

1.00

1.50

2.00

-1.00 -0.50 0.00 0.50 1.00 1.50 2.00

x t

x t+1

Q = 2

Training portion of

time series

11/4/2013 50

Noisy Sinusoidal Test Phase Space

Testing portion of

time series

Q = 2

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

100 110 120 130 140 150 160 170 180 190 200

t

x t

-1.00

-0.50

0.00

0.50

1.00

1.50

2.00

-1.00 -0.50 0.00 0.50 1.00 1.50 2.00

x t

x t+1

Page 26: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 26, 3/9/99

11/4/2013 51

Noise Train Phase Space

Q = 2

0.00.10.20.30.40.50.60.70.80.91.0

0 10 20 30 40 50 60 70 80 90 100t

x t

0.00

0.20

0.40

0.60

0.80

1.00

0.00 0.20 0.40 0.60 0.80 1.00

x t

x t+1

Uniform Density Random Variable

11/4/2013 52

Noise Test Phase Space

Testing portion of

time series

Q = 2

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

100 110 120 130 140 150 160 170 180 190 200t

x t

0.00

0.20

0.40

0.60

0.80

1.00

0.00 0.20 0.40 0.60 0.80 1.00

x t

x t+1

Page 27: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 27, 3/9/99

11/4/2013 53

Constant Pattern Cluster

1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 31

1.5

2

2.5

3

xt

x t + 1

pclusterx t

Training Pattern Cluster

1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 31

1.5

2

2.5

3

x t

x t + 1

pclusterx t

Pattern cluster applied to test time

series

11/4/2013 54

Linearly Increasing Pattern Cluster

Training portion of time series

0 0.5 1 1.5 2 2.50

0.5

1

1.5

2

2.5

xt

xt + 1

pclusterx t

xt + 1

1.5 2 2.5 3 3.5 41

1.5

2

2.5

3

3.5

4

xt

pclusterxt

Pattern cluster applied to test

time series

Page 28: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 28, 3/9/99

11/4/2013 55

Sinusoidal Pattern Cluster

Training portion of time series

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-1

-0.5

0

0.5

1

1.5

x t

x t + 1

pclusterx t

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-1

-0.5

0

0.5

1

1.5

xt

xt + 1

pclusterx t

Pattern cluster applied to test

time series

11/4/2013 56

Time Series View of Sinusoidal

Testing portion of time series

Pattern, p, is from training process

-1.0

-0.8

-0.6

-0.4

-0.2

0.0

0.2

0.4

0.6

0.8

1.0

100 110 120 130 140 150 160 170 180 190 200

t

x t

x

p

event

Page 29: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 29, 3/9/99

11/4/2013 57

Sinusoidal Statistical Tests Runs Test

H0: There is no difference between the matched time series and the whole time series.

HA: There is significant difference between the matched time series and the whole time series.

Using a 1% probability of Type I error ( = 0.01).

= 1.87e-018 which means the null hypothesis can easily be rejected.

Difference of two independent means Although the two populations are probably dependent, this can be ignored

because it makes the statistics more conservative, i.e., it will tend to overestimate the Type I error.

H0: M - g(X) = 0.

HA: M - g(X) > 0.

Using a 1% probability of Type I error ( = 0.01).

= 5.179741e-043 shows that the null hypothesis can be rejected.

11/4/2013 58

Chirp Pattern Cluster

Training portion of time series

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-1

-0.5

0

0.5

1

1.5

x t

x t + 1

pclusterx t

Pattern cluster applied to test

time series

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-1

-0.5

0

0.5

1

1.5

xt

xt + 1

pclusterxt

Page 30: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 30, 3/9/99

11/4/2013 59

Time Series View of ChirpTesting portion of time

series

Pattern, p, is from training process

x

p

event

-1.0

-0.8

-0.6

-0.4

-0.2

0.0

0.2

0.4

0.6

0.8

1.0

100 110 120 130 140 150 160 170 180 190 200

t

x t

11/4/2013 60

Noisy Sinusoidal Pattern Cluster

Training portion of

time series

Pattern cluster applied to test

time series

-1 -0.5 0 0.5 1 1.5 2-1

-0.5

0

0.5

1

1.5

2

x t

x t + 1

pclusterx t

-1 -0.5 0 0.5 1 1.5 2-1

-0.5

0

0.5

1

1.5

2

xt

xt + 1

pclusterxt

Page 31: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 31, 3/9/99

11/4/2013 61

Time Series View of Noisy Sinusoidal

Testing portion of time series

Pattern, p, is from training process

x

p

event

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

100 110 120 130 140 150 160 170 180 190 200

t

x t

11/4/2013 62

Noise Pattern Cluster

Training portion of

time series

Pattern cluster applied to test

time series

0 0.2 0.4 0.6 0.8 1 1.2 1.40

0.2

0.4

0.6

0.8

1

xt

xt + 1

pclusterxt

0 0.2 0.4 0.6 0.8 1 1.2 1.40

0.2

0.4

0.6

0.8

1

xt

xt + 1

pclusterxt

Page 32: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 32, 3/9/99

11/4/2013 63

Welding Application Welding Process

Two pieces of metal joined into one by making a joint between them

Arcing current is created between welder and metal to be joined

Wire is pushed out of welder

Tip of wire melts, forming a droplet of metal that elongates (sticks out) until it releases

Goal: Predict when droplet will release Can’t be done by traditional methods

11/4/2013 64

Welding Data Set Goal: Predict when a droplet will release

Four time series Release (event), 1kHz sampling rate (~5000 data points)

Stickout, 1kHz sampling rate (~5000 data points)

Current, 5kHz sampling rate (~35,000 data points)

Voltage, 5kHz sampling rate (~35,000 data points)

Data not originally synchronized

First Pass Used release time series as event function

Used stickout as time series “not too reliable”

Used a set of phase spaces (Q) from dimension 1 to 16

Generated 16 temporal patterns

Page 33: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 33, 3/9/99

11/4/2013 65

Welding Time Series

200 210 220 230 240 250 260 270 280 290 3000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

11050 1100 1150 1200 1250 1300 1350 1400 1450 1500

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

stickoutdetachmentcurrentvoltage

11/4/2013 66

Welding Initial Results Training Set

Runs = 0

Means = 3.02 x 10-49

true positives: 125 (5.6%), false positives: 99 (4.4%)

true negatives: 2005 (89.3%), false negatives: 17 (0.8%)

94.8% accuracy

Testing Set Runs = 0

Means = 1.34 x 10-51

true positives: 97 (3.5%), false positives: 52 (1.9%)

true negatives: 2470 (89.1%), false negatives: 55 (2.0%)

95.1% accuracy

Page 34: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 34, 3/9/99

11/4/2013 67

Statistical Tests Runs Test

H0: There is no difference between the matched time series and the whole time series.

HA: There is significant difference between the matched time series and the whole time series.

Using a 1% probability of Type I error ( = 0.01).

Difference of two independent means Although the two populations are probably dependent, this can be ignored

because it makes the statistics more conservative, i.e., it will tend to overestimate the Type I error.

H0: M - g(X) = 0.

HA: M - g(X) > 0.

Using a 1% probability of Type I error ( = 0.01).

11/4/2013 68

ICN Time Series ICN is a pharmaceutical company whose stock trades

on the NYSE

Training Time Series Daily open price for 1st 126 trading days of 1990

Testing Time Series Daily open price for the 2nd 127 trading days of 1990

Stock prices tend to grow exponentially A filter is applied to the time series

Z z

B

Bxt t

1

Page 35: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 35, 3/9/99

11/4/2013 69

ICN Train Phase Space

z

-0.25 -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 0.25-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

z t

t + 1

pclusterx t

Filter

Q = 2

0.0

2.0

4.0

6.0

8.0

10.0

12.0

0 10 20 30 40 50 60 70 80 90 100 110 120t

x t

11/4/2013 70

ICN Test Phase Space

-0.25 -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 0.25-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

x t

x t + 1

pclusterx t

Pattern, p, is from training

process

Filter

0.01.02.03.04.05.06.07.08.09.0

10.0

126 136 146 156 166 176 186 196 206 216 226 236 246t

x t

Page 36: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 36, 3/9/99

11/4/2013 71

Time Series View of ICN

Testing portion of time series, filtered

Pattern, p, is from training process

-0.2

-0.1

-0.1

0.0

0.1

0.1

0.2

0.2

0.3

126 136 146 156 166 176 186 196 206 216 226 236 246

t

z t

zp

event

11/4/2013 72

ICN Initial Results Training Set

Using Buy and Hold Strategy: -26.2% (125 days)

Using Temporal Patterns: 51.0% (8 days)

Runs = 2.21 x 10-3

Means = 2.42 x 10-2

Testing Set Using Buy and Hold Strategy: -24.1% (126 days)

Using Temporal Patterns: 17.3% (22 days)

Runs = 3.63 x 10-3

Means = 2.54 x 10-1

Page 37: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 37, 3/9/99

11/4/2013 73

Presentation Status

Problem Statement Graphical Problem Statement

Time Series Analysis Literature

Innovative New Approach

Algorithm Phase Space

Mathematical Formulation

Algorithm Results

Applications Progression of Time Series

Engineering

Financial

11/4/2013 74

Thank You References

Povinelli, R. J., and Feng, X. (1998). “Temporal Pattern Identification of Time Series Data using Pattern Wavelets and Genetic Algorithms” Proceedings, 8th

International Conference on Artificial Neural Networks in Engineering (ANNIE’99), Vol. 8, pp.691-696, November 1998.

Povinelli and Xin Feng. (1999). "Data Mining of Multiple Non-stationary Time Series" Proceedings, 9th International Conference on Artificial Neural Networks in Engineering (ANNIE’99), Vol. 9, pp. 511-516, November 1999.

Richard J. Povinelli and Xin Feng. "Improving Genetic Algorithms Performance By Hashing Fitness Values" Proceedings, 9th International Conference on Artificial Neural Networks in Engineering (ANNIE’99), Vol. 9, pp. 399-404,November 1999.

Richard J. Povinelli and Xin Feng. "Characterization And Prediction of Welding Droplet Release Using Time Series Data Mining" Proceedings, 10th International Conference on Artificial Neural Networks in Engineering (ANNIE2000), Vol. 9, pp. 857-862, November, 2000

Richard J. Povinelli and Xin Feng, “A New Temporal Pattern Identification Method for Characterization and Prediction of Complex Time Series Events,” IEEE Transactions On Knowledge And Data Engineering, Vol. 15, No. 2, March/April 2003.

Page 38: Time Series Data Mining (TSDM)mehdi/seminar/11-4-13_TSDM.pdf · 11/4/2013  · Time Series Data Mining (TSDM): A New Temporal Pattern Identification Method for Characterization and

Richard J. Povinelli, Doctoral Oral Examination

Page 38, 3/9/99

THANK YOU!

[email protected]

11/4/2013 76