Sakurai Lab.

56
Mining and Forecasting of Big Time-series Data Yasushi Sakurai Sakurai Lab. © 2016 Yasushi Sakurai 1 Sakurai Lab. @ Kumamoto University

Transcript of Sakurai Lab.

Page 1: Sakurai Lab.

MiningandForecastingofBigTime-seriesData

Yasushi Sakurai

Sakurai Lab. © 2016 Yasushi Sakurai 1

Sakurai Lab.@ Kumamoto University

Page 2: Sakurai Lab.

BigData

Big time-series data

Sakurai Lab. © 2016 Yasushi Sakurai 2

climate

EconomyWeb

Epidemic

Social/naturalphenomena

IoT,sensors

Page 3: Sakurai Lab.

BigData

Big time-series data

Sakurai Lab. © 2016 Yasushi Sakurai 3

climate

EconomyWeb

Epidemic

Social/naturalphenomena

Environmental sensors

- Temperature,- Air pressure, etc.Application

- Agriculture

IoT,sensors

Page 4: Sakurai Lab.

BigData

Big time-series data

Sakurai Lab. © 2016 Yasushi Sakurai 4

climate

EconomyWeb

Epidemic

Social/naturalphenomena

IoT,sensors

Online user activities

- Web clicks, - Online reviewsApplication

- Market analysis

Page 5: Sakurai Lab.

BigData

Big time-series data

Sakurai Lab. © 2016 Yasushi Sakurai 5

climate

EconomyWeb

Epidemic

Social/naturalphenomena

IoT,sensors

Sensor streams

- Vibration,AccelerationApplication

- Self-driving car- Structural health

monitoring

Page 6: Sakurai Lab.

Motivation

• Given: Big time-series data– Sensor data– Web, online activities– Medical data

• Goal: Real-time forecasting• At-work: – Online marketing– Sensor monitoring– Forecasting future epidemics

Sakurai Lab. © 2016 Yasushi Sakurai 6

Page 7: Sakurai Lab.

New challenges

Sakurai Lab. © 2016 Yasushi Sakurai 7

Indexing,Similaritysearch

Featureextraction

Datastream

Linear-modeling

Time-seriesdataanalysis

PCA ICAModel(M)Data(X)

ED,DTW,SAM

AR,LDS,ARIMA

Correlationmonitoring

etc.

DFT,DWT,SVD,ICA

Y

X

DTW

Page 8: Sakurai Lab.

ED,DTW,SAM

AR,LDS,ARIMA

Correlationmonitoring

etc.

DFT,DWT,SVD,ICA

New challenges

Sakurai Lab. © 2016 Yasushi Sakurai 8

Indexing,Similaritysearch

Featureextraction

Datastream

Linear-modeling

Theycannothandle“Big”data,i.e.,

Y

X

PCA ICADTWModel(M)Data(X)

Automaticparameter-tuning

Non-linearphenomena

Complexevents Real-time

forecastingX

Page 9: Sakurai Lab.

New challenges

Sakurai Lab. © 2016 Yasushi Sakurai 9

Indexing,Similaritysearch

Featureextraction

Datastream

Linear-modeling

Theycannothandle“Big”data,i.e.,

Y

X

PCA ICADTWModel(M)Data(X)

ED,DTWCorrelation

AR,ARIMA,LDS

StatStreametc…

DFT,DWT,SVD,ICA

Automaticparameter-tuning

Non-linearphenomena

Complexevents Real-time

forecastingX

Page 10: Sakurai Lab.

• Time-series data analysis– Indexing and fast searching– Sequence matching– Clustering– etc.

• New research directions1. Automatic mining2. Non-linear modeling3. Large-scale tensor analysis

Sakurai Lab. © 2016 Yasushi Sakurai 10

X ≈ + ...

Time

New research directions

NOmagic#

Page 11: Sakurai Lab.

(R1) Automatic mining

No magic numbers! ... because,

Big data mining: -> we cannot afford human intervention!!

Sakurai Lab. © 2016 Yasushi Sakurai 11

Manual- sensitive to the parameter tuning- long tuning steps (hours, days, …)

Automatic (no magic numbers)- no expert tuning required

Page 12: Sakurai Lab.

(R1) Automatic mining

Sakurai Lab. © 2016 Yasushi Sakurai 12

“Automatic” mining algorithm

Find

Given

beaks

wings

tailfeathers

clapsbeaks

wings

tailfeathers

claps

Find: compact description of data X

Chicken dance

Page 13: Sakurai Lab.

Sakurai Lab. 13

(R1) Automatic mining

Idea (1): Multi-level chain model –HMM-based probabilistic model–with “across-regime” transitions

© 2016 Yasushi Sakurai

Model

Sequences

beaks

wings

claps

Regimes

“Automatic” mining algorithm

Page 14: Sakurai Lab.

(R1) Automatic mining

Idea(2): Minimize encoding cost!

Sakurai Lab. © 2016 Yasushi Sakurai 14

Good compression

Good description

1 2 3 4 5 6 7 8 9 10

CostMCostCCostT

(# of r, m)

CostM(M) + Costc(X|M)

Model cost Coding cost

min ( )

“Automatic” mining algorithm

Page 15: Sakurai Lab.

(R1) Automatic mining

Sakurai Lab. 15© 2016 Yasushi Sakurai

AutoPlait (NO magic numbers)

“Automatic” mining algorithm

Page 16: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

• Gray-box mining– If we know the equations

• Non-linear (differential) equations– Epidemic– Biology– Physics, Economics, etc.

• Modeling non-linear social phenomena– Non-linear analysis for Web online activities(e.g., information diffusion, interaction)

Sakurai Lab. © 2016 Yasushi Sakurai 16

Gray-box

Page 17: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 17

News spread in social mediaNumber of mentions in blogs/Twitter

20 40 60 80 100 120 140 1600

100

200

Time (hours)

# of m

entio

ns

20 40 60 80 100 120 140 1600

50

100

Time (hours)

# of m

entio

ns

Breaking news Decay

(power law)

News spread

(perhour,1week)

Page 18: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 18

News spread in social media

Timen=0 Timen=nb Timen=nb+1

β

1. Un-informed bloggers

2. External shock

3. Infection (word-of-mouth)

Page 19: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 19

Timen=0 Timen=nb Timen=nb+1

β

1. Un-informed bloggers

2. External shock

3. Infection (word-of-mouth)

News spread in social media

Decayfunction: f (n) = β *n−1.5

f (n)

n

Linear f (n)

n

Log

Powerlaw

Page 20: Sakurai Lab.

Memes in social media/blogs

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 20

News spread in social media

20 40 60 80 100 1200

50

100

Time

Val

ue

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Val

ue

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Val

ue

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Val

ue

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Val

ue

OriginalSpikeM

20 40 60 80 100 1200

50

100

TimeV

alue

OriginalSpikeM

Page 21: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

KDD 2012 21

Forecast tail-part and rise-part!

Given (1) first spike,(2) release date of two sequel movies (3) access volume before the release date

(1)Firstspike (2)Releasedate (3)Twoweeksbeforerelease

Y. Matsubara et al.

Page 22: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 22

Android

Xbox

PlayStationWii

Online competitionInteractions

between keywords

Page 23: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 23

Online competition(Google Search volume)

Android

Xbox

PlayStation Wii

Page 24: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 24

The Web as a Jungle !

Ecosystem in the

Jungle

Ecosystem on the

Web

Page 25: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 25

The Web as a Jungle !

Ecosystem in the

Jungle

Ecosystem on the

Web--- Non-Linear equations ---

Pi (t +1) = Pi (t) 1+ ri 1−aijPi (t)j=1

d∑

Ki

#

$

%%%

&

'

(((

)

*

+++

,

-

.

.

.

Page 26: Sakurai Lab.

Ecosystem on the Web

Sakurai Lab. © 2016 Yasushi Sakurai 26

Online activities

Userresources

Biological species

Foodresources

PopularityPopulation

Annual events(e.g., Xmas)

Climate /season

Jungle Web

Page 27: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 27

Android

Xbox

PlayStationWii

The Web as a Jungle !Interactions

between keywords

Page 28: Sakurai Lab.

(R2) Non-linear (gray-box) modeling

Sakurai Lab. © 2016 Yasushi Sakurai 28

Android

Xbox

PlayStationWii

The Web as a Jungle !

Seasonality

Page 29: Sakurai Lab.

(R3) Large-scale tensor analysis

Sakurai Lab. © 2016 Yasushi Sakurai 29

Time URL User

08-01-12:00 CNN.com Smith

08-02-15:00 YouTube.com Brown

08-02-19:00 CNET.com Smith

08-03-11:00 CNN.com Johnson

… … …

RepresentasMth ordertensor(M=3)

URL

user

Time

x

u

v

n

Elementx:#ofevents

e.g.,‘Smith’,‘CNN.com’,‘Aug1,10pm’;21times

• Time-stamped events– e.g., web clicks

X

Page 30: Sakurai Lab.

(R3) Large-scale tensor analysis

Sakurai Lab. © 2016 Yasushi Sakurai 30

Complex time-stamped events

Timestamp URL User Device2012-08-01-12:00 CNN.com Smith iphone2012-08-02-15:00 YouTube.com Brown iphone2012-08-02-19:00 CNET.com Smith mac2012-08-03-11:00 CNN.com Johnson ipad

… … … …2012-08-05-12:00 CNN.com Smith iphone2012-08-05-19:00 CNET.com Smith iphone

Forecast

{time,URL,userID,accessdevices,httpreferrer,…}

Page 31: Sakurai Lab.

(R3) Large-scale tensor analysis

Sakurai Lab. © 2016 Yasushi Sakurai 31

Complex time-stamped events

Object

Actor

Time

(business) (news) (media)

Object/URLActor/user Time

Webclicks

Page 32: Sakurai Lab.

(R3) Large-scale tensor analysis

Sakurai Lab. © 2016 Yasushi Sakurai 32

Complex time-stamped events

Object

Actor

Time

(business) (news) (media)

Object/URLActor/user Time

Webclickse.g.,businesstopicvectorsObject/URL

Money.comCNN.com

SmithJohnson

Actor/user

Time

Mon-Fri Sat-Sun

Higher value:Highly related

topic

Page 33: Sakurai Lab.

(R3) Large-scale tensor analysis

Sakurai Lab. © 2016 Yasushi Sakurai 33URLmatrix Usermatrix Timematrix

NoisyL SparseL

Complex time-stamped events(Originaldata)

Page 34: Sakurai Lab.

New research directions1. Automatic mining2. Non-linear (gray-box) modeling3. Large-scale tensor analysis

X ≈ + ...

NOmagic!

New challenge: MANT analysisMulti-Aspect Non-linear Time-series

Putalltogether

M� A� N� T�M� A� N� T�

Putting it all together 3h-TutorialsSIGMOD’15WWW’16

Sakurai Lab. © 2016 Yasushi Sakurai 34

Page 35: Sakurai Lab.

Sakurai Lab. © 2016 Yasushi Sakurai 35

w/Sakurai,Panhuis,Faloutsos

Local patternsP3

External shocksP4

Mistakes, errorsP5

P2

SeasonalityP1

Vaccination

Non-Linear tensor analysis

New challenge: MANT analysis

Page 36: Sakurai Lab.

New challenge: MANT analysis

Sakurai Lab. © 2016 Yasushi Sakurai 36

X

=

P1 P2 P3 P4 P5

B RN ME

FUNNEL

X

Given: Tensor X (disease x location x time)

Find: Compact description of X, “automatically”

Non-Linear tensor analysis

Page 37: Sakurai Lab.

New challenge: MANT analysis

Sakurai Lab. © 2016 Yasushi Sakurai 37

X

=

P1 P2 P3 P4 P5

B RN ME

FUNNEL

X

Given: Tensor X (disease x location x time)

Find: Compact description of X, “automatically”

Non-Linear tensor analysis

DiscontinuitiesSeasonality

Page 38: Sakurai Lab.

Sakurai Lab. © 2016 Yasushi Sakurai 38

Non-Linear mining & forecasting of local competition

New challenge: MANT analysis

Page 39: Sakurai Lab.

New challenge: MANT analysis

Time-stamped events: {activity, location, time}

© 2016 Yasushi Sakurai 39Sakurai Lab.

m

d

n

X

Activity

Time(weekly)

Page 40: Sakurai Lab.

New challenge: MANT analysis

Given: Tensor X(activity x location x time)

Find: Compact description of X

© 2016 Yasushi Sakurai 40Sakurai Lab.

X

X=

CompCube

B C S D

Page 41: Sakurai Lab.

New challenge: MANT analysis

Given: Tensor X(activity x location x time)

Find: Compact description of X

© 2016 Yasushi Sakurai 41Sakurai Lab.

X

X=

CompCube

B C S D

Basics Competition Seasonality Deltas

Page 42: Sakurai Lab.

Modeling power of CompCube

Google search for

© 2016 Yasushi Sakurai 42Sakurai Lab.

Kindle, Nexus

US

CA

JPCNBRAU

ZA

IT

Weak/Average/Strong

Local Competition strength

Page 43: Sakurai Lab.

Modeling power of CompCube

Google search for

© 2016 Yasushi Sakurai 43Sakurai Lab.

Kindle, Nexus

US

CA

JPCNBRAU

ZA

IT

Local Competition strength

Weak/Average/Strong

Page 44: Sakurai Lab.

Modeling power of CompCube

Local seasonality for

© 2016 Yasushi Sakurai 44Sakurai Lab.

iPod

Component #1 Component #2

Page 45: Sakurai Lab.

Modeling power of CompCube

Local seasonality for

© 2016 Yasushi Sakurai 45Sakurai Lab.

iPod

Component #1 Component #2

XmasDec.

ChineseNew Year

Feb.

Page 46: Sakurai Lab.

Modeling power of CompCube

Fitting for

© 2016 Yasushi Sakurai 46Sakurai Lab.

News resources

2004 2006 2008 2010 2012 20140

0.2

0.4

0.6

0.8

1

Time (weekly)

# of

clic

ks @

tim

e

Fitting result − RMSE=0.056

1CNN 2Fox_News 3TIME 4Google_News 5BBC 6Yahoo_News 7AP 8Huffington_Post 9MSN_News10Al_Jazeera

Detected!

Wikipedia

Page 47: Sakurai Lab.

Modeling power of CompCube

Fitting for

© 2016 Yasushi Sakurai 47Sakurai Lab.

US electionNov. 2008

Local attention toUS election

Weak/Strong

Wikipedia

News resources

Page 48: Sakurai Lab.

Sakurai Lab. © 2016 Yasushi Sakurai 48

New challenge: MANT analysis

3.Forecastfutureactivities

?Future

Time

Non-Linear mining & forecasting of local competition

Page 49: Sakurai Lab.

X ≈ + ...

NOmagic!New research directions

1. Automatic mining2. Non-linear (gray-box) modeling3. Large-scale tensor analysis

New challenge: MANT analysisMulti-Aspect Non-linear Time-series

Putalltogether

M� A� N� T�M� A� N� T�

Putting it all together 3h-TutorialsSIGMOD’15WWW’16

Sakurai Lab. © 2016 Yasushi Sakurai 49

Page 50: Sakurai Lab.

Industrial contribution

• Bridge structural health monitoring– Structural monitoring using vibration/shock sensors– Keep track of lag correlations for sensor data

streams

BRAID@SIGMODʼ05

Sakurai Lab. © 2016 Yasushi Sakurai 50

Page 51: Sakurai Lab.

Industrial contribution

• Bridge structural health monitoring– Goal: real-time anomaly detection for disaster

prevention– Several thousands readings (per sec) from

several hundreds sensor nodes

Structural health monitoringVibration/shock sensor

• Uses BRAID• Metropolitan Expressway

(Tokyo, Japan)

BRAID@SIGMOD’05

Sakurai Lab. © 2016 Yasushi Sakurai 51

Page 52: Sakurai Lab.

Industrial contribution

• Bridge structural health monitoring with BRAID

Metropolitan Expressway(Tokyo, Japan)

Tokyo Gate Bridge(Tokyo, Japan)

Can Tho Bridge (Vietnam)

BRAID@SIGMOD’05

Sakurai Lab. © 2016 Yasushi Sakurai 52

Page 53: Sakurai Lab.

Industrial contribution

– Automobile sensor data{ trip ID, zone ID, object}

TrailMarker

Sakurai Lab. © 2016 Yasushi Sakurai 53

velocity

Object(sensors)

Longitudinal/lateral acceleration

w

d

n

Trip

Zone

Original tensor X

WebDBFʼ15Best paper

Page 54: Sakurai Lab.

Industrial contribution

– Automobile sensor data{ trip ID, zone ID, object}

TrailMarker

Sakurai Lab. © 2016 Yasushi Sakurai 54

WebDBFʼ15Best paper

Trip 1

w

d

n

Trip

Zone

Original tensor X

Trip 2

Trip 3

zone

zone

zone

Page 55: Sakurai Lab.

Industrial contribution

– Automobile sensor data{ trip ID, zone ID, object}

TrailMarker

Sakurai Lab. © 2016 Yasushi Sakurai 55

WebDBFʼ15Best paper

Page 56: Sakurai Lab.

Thank you

M� A� N� T�M� A� N� T�

X ≈ + ...

NOmagic!

Code/Data/Tutorial: http://www.cs.kumamoto-u.ac.jp/~yasushi/

Sakurai Lab.

Sakurai Lab. © 2016 Yasushi Sakurai 56