Sakurai Lab.
Transcript of Sakurai Lab.
Mining and Forecasting of Big Time-series Data
Yasushi Sakurai
Sakurai Lab. © 2016 Yasushi Sakurai 1
Sakurai Lab. @ Kumamoto University
Big Data: Big time-series data
Sources: climate, economy, Web, epidemic, social/natural phenomena, IoT and sensors
• Environmental sensors
– Data: temperature, air pressure, etc.
– Application: agriculture
• Online user activities
– Data: Web clicks, online reviews
– Application: market analysis
• Sensor streams
– Data: vibration, acceleration
– Application: self-driving cars, structural health monitoring
Motivation
• Given: Big time-series data
– Sensor data
– Web, online activities
– Medical data
• Goal: Real-time forecasting
• At work:
– Online marketing
– Sensor monitoring
– Forecasting future epidemics
New challenges
Time-series data analysis: existing approaches
• Indexing, similarity search: ED, DTW, SAM
• Feature extraction: DFT, DWT, SVD, ICA
• Data stream: correlation monitoring, etc.
• Linear modeling: AR, LDS, ARIMA
[Figure: PCA/ICA relating data (X) to a model (M); DTW alignment of sequences X and Y]
New challenges
Existing approaches (ED, DTW; DFT, DWT, SVD, ICA; correlation monitoring such as StatStream; AR, ARIMA, LDS) cannot handle “Big” data, i.e.,
• Automatic parameter tuning
• Non-linear phenomena
• Complex events
• Real-time forecasting
• Time-series data analysis
– Indexing and fast searching
– Sequence matching
– Clustering
– etc.
• New research directions
1. Automatic mining (NO magic #)
2. Non-linear modeling
3. Large-scale tensor analysis (X ≈ ... + ...)
(R1) Automatic mining
No magic numbers! ... because,
Big data mining: we cannot afford human intervention!!
• Manual
– sensitive to parameter tuning
– long tuning steps (hours, days, …)
• Automatic (no magic numbers)
– no expert tuning required
(R1) Automatic mining
“Automatic” mining algorithm
Given: data X (chicken dance)
Find: compact description of data X
[Figure: the chicken dance sequence segmented into four patterns: beaks, wings, tail feathers, claps]
(R1) Automatic mining
“Automatic” mining algorithm
Idea (1): Multi-level chain model
– HMM-based probabilistic model
– with “across-regime” transitions
[Figure: sequences grouped into regimes (beaks, wings, claps), each regime described by a model]
(R1) Automatic mining
“Automatic” mining algorithm
Idea (2): Minimize encoding cost!
Good compression → Good description
\[ \min_{M} \left( \mathrm{Cost}_M(M) + \mathrm{Cost}_C(X \mid M) \right) \]
where Cost_M(M) is the model cost and Cost_C(X|M) is the coding cost.
[Figure: Cost_M, Cost_C, and total cost Cost_T plotted against (# of r, m), from 1 to 10]
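The "minimize encoding cost" idea can be illustrated with a small, generic two-part cost. This is only a sketch under stated assumptions, not the AutoPlait algorithm: the model is a hypothetical piecewise-constant fit with k equal-length segments, each parameter is assumed to cost 32 bits, and residuals are assumed Gaussian. The function name `total_cost` and the toy data are made up for illustration.

```python
import math

FLOAT_BITS = 32  # assumption: bits needed to store one real-valued parameter


def total_cost(x, k):
    """Two-part cost (in bits) of describing x with k equal-length segments."""
    n = len(x)
    bounds = [round(i * n / k) for i in range(k + 1)]
    sq_err = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        seg = x[lo:hi]
        mean = sum(seg) / len(seg)
        sq_err += sum((v - mean) ** 2 for v in seg)
    var = max(sq_err / n, 1e-9)  # floor avoids log(0) on a perfect fit
    cost_m = k * FLOAT_BITS  # model cost: one mean per segment
    # coding cost: Gaussian code length of the residuals given the model
    cost_c = 0.5 * n * math.log2(2 * math.pi * math.e * var)
    return cost_m + cost_c


# Toy sequence with three levels plus a small deterministic wiggle.
levels = [0.0] * 40 + [50.0] * 40 + [0.0] * 40
x = [lv + ((7 * i) % 5 - 2) for i, lv in enumerate(levels)]

# No magic number: try several k and keep the cheapest description.
costs = {k: total_cost(x, k) for k in range(1, 9)}
best_k = min(costs, key=costs.get)
print(best_k, round(costs[best_k], 1))
```

Too few segments make the coding cost explode, too many inflate the model cost, so the minimum of the sum picks the segmentation without a user-supplied parameter.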
(R1) Automatic mining
“Automatic” mining algorithm: AutoPlait (NO magic numbers)
(R2) Non-linear (gray-box) modeling
• Gray-box mining
– If we know the equations
• Non-linear (differential) equations
– Epidemic
– Biology
– Physics, Economics, etc.
• Modeling non-linear social phenomena
– Non-linear analysis for Web online activities (e.g., information diffusion, interaction)
(R2) Non-linear (gray-box) modeling
News spread in social media
Number of mentions in blogs/Twitter (per hour, 1 week)
[Figure: two plots of # of mentions vs. time (hours): a spike at the breaking news, then a power-law decay]
(R2) Non-linear (gray-box) modeling
News spread in social media (memes in social media/blogs)
1. Un-informed bloggers (time n = 0)
2. External shock (time n = n_b)
3. Infection (word-of-mouth) with strength β (time n = n_b + 1)
Decay function: \( f(n) = \beta \cdot n^{-1.5} \)
[Figure: f(n) vs. n on linear and log scales; a power law]
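The three ingredients above (un-informed bloggers, an external shock at n_b, word-of-mouth infection with power-law decay) can be combined in a small simulation. This is a simplified sketch, not the authors' implementation: the slides give only the decay function, so the update rule below (new mentions = un-informed bloggers × decayed influence of earlier mentions and of the shock) is an assumption, and all parameter values are hypothetical.

```python
def decay(n, beta):
    """Power-law decay from the slide: f(n) = beta * n**-1.5."""
    return beta * n ** -1.5


def simulate_news_spread(n_bloggers, beta, n_b, shock, n_ticks):
    """Return newly informed bloggers per time tick (assumed update rule)."""
    uninformed = float(n_bloggers)
    newly = [0.0] * n_ticks
    for n in range(n_b, n_ticks - 1):
        exposure = 0.0
        for t in range(n_b, n + 1):
            s = shock if t == n_b else 0.0  # external shock only at n_b
            exposure += (newly[t] + s) * decay(n + 1 - t, beta)
        infected = min(uninformed, uninformed * exposure)
        newly[n + 1] = infected
        uninformed -= infected  # informed bloggers leave the pool
    return newly


# Hypothetical parameters, chosen only to produce a visible rise and fall.
mentions = simulate_news_spread(
    n_bloggers=1000, beta=0.002, n_b=5, shock=10.0, n_ticks=60
)
peak = max(range(len(mentions)), key=mentions.__getitem__)
print(peak, round(mentions[peak], 1))
```

Nothing happens before the shock; afterwards the count rises through word-of-mouth and falls as the pool of un-informed bloggers empties and older mentions lose influence.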
(R2) Non-linear (gray-box) modeling
News spread in social media
[Figure: six panels of value vs. time, each comparing the original data with the SpikeM fit]
(R2) Non-linear (gray-box) modeling
Forecast tail-part and rise-part!
Given: (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date
[Figure: forecast annotated with (1) first spike, (2) release date, (3) two weeks before release]
Y. Matsubara et al., KDD 2012
(R2) Non-linear (gray-box) modeling
Online competition (Google Search volume): interactions between keywords
Keywords: Android, Xbox, PlayStation, Wii
(R2) Non-linear (gray-box) modeling
The Web as a Jungle!
Ecosystem in the Jungle ↔ Ecosystem on the Web
--- Non-linear equations ---
\[ P_i(t+1) = P_i(t) \left[ 1 + r_i \left( 1 - \frac{\sum_{j=1}^{d} a_{ij} P_j(t)}{K_i} \right) \right] \]
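The non-linear equation on this slide has the form of a discrete-time Lotka-Volterra competition model: P_i(t+1) = P_i(t) [1 + r_i (1 − Σ_j a_ij P_j(t) / K_i)]. The sketch below iterates it for two keywords. Reading r_i as a growth rate, K_i as a carrying capacity, and a_ij as a competition coefficient is the standard interpretation and an assumption here, since the slide does not name the symbols; the parameter values are hypothetical.

```python
def step(P, r, K, A):
    """One update of P_i(t+1) = P_i(t) * (1 + r_i * (1 - sum_j a_ij P_j(t) / K_i))."""
    d = len(P)
    return [
        P[i] * (1.0 + r[i] * (1.0 - sum(A[i][j] * P[j] for j in range(d)) / K[i]))
        for i in range(d)
    ]


def simulate(P0, r, K, A, steps):
    """Iterate the update and return the whole trajectory."""
    history = [list(P0)]
    for _ in range(steps):
        history.append(step(history[-1], r, K, A))
    return history


# Two hypothetical keywords competing for the same user resources.
r = [0.5, 0.5]                # assumed growth rates
K = [100.0, 100.0]            # assumed carrying capacities
A = [[1.0, 0.5], [0.5, 1.0]]  # assumed competition coefficients a_ij
history = simulate([10.0, 10.0], r, K, A, steps=200)
print([round(p, 2) for p in history[-1]])
```

With these symmetric settings both popularities settle where P + 0.5 P = K, below the level either keyword would reach alone, which is the competition effect the jungle analogy describes.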
Ecosystem on the Web
Jungle               Web
Biological species   Online activities
Food resources       User resources
Population           Popularity
Climate/season       Annual events (e.g., Xmas)
(R2) Non-linear (gray-box) modeling
The Web as a Jungle!
Keywords: Android, Xbox, PlayStation, Wii
• Interactions between keywords
• Seasonality
(R3) Large-scale tensor analysis
• Time-stamped events
– e.g., web clicks

Time         URL          User
08-01-12:00  CNN.com      Smith
08-02-15:00  YouTube.com  Brown
08-02-19:00  CNET.com     Smith
08-03-11:00  CNN.com      Johnson
…            …            …

Represent as an Mth-order tensor X (M = 3) with modes URL (u), user (v), and time (n)
Element x: # of events
e.g., ‘Smith’, ‘CNN.com’, ‘Aug 1, 10pm’; 21 times
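As a minimal sketch of the tensor representation, the events can be counted into a sparse 3rd-order tensor keyed by (URL, user, time bucket). The rows are the ones listed on the slide; bucketing timestamps by day (the first five characters) and the function name `build_tensor` are assumptions made for illustration.

```python
from collections import Counter

# Time-stamped events (time, URL, user) as listed on the slide.
events = [
    ("08-01-12:00", "CNN.com", "Smith"),
    ("08-02-15:00", "YouTube.com", "Brown"),
    ("08-02-19:00", "CNET.com", "Smith"),
    ("08-03-11:00", "CNN.com", "Johnson"),
]


def build_tensor(events, bucket=lambda ts: ts[:5]):
    """Sparse tensor: element x[(url, user, time_bucket)] = # of events."""
    tensor = Counter()
    for ts, url, user in events:
        tensor[(url, user, bucket(ts))] += 1
    return tensor


tensor = build_tensor(events)

# Size of each mode (u URLs, v users, n time buckets).
urls = sorted({k[0] for k in tensor})
users = sorted({k[1] for k in tensor})
days = sorted({k[2] for k in tensor})
print(len(urls), len(users), len(days))
print(tensor[("CNN.com", "Smith", "08-01")])
```

A sparse dictionary suits click data, where most (URL, user, time) combinations never occur; missing keys read as zero.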
(R3) Large-scale tensor analysis
Complex time-stamped events
{time, URL, user ID, access devices, http referrer, …}

Timestamp         URL          User     Device
2012-08-01-12:00  CNN.com      Smith    iphone
2012-08-02-15:00  YouTube.com  Brown    iphone
2012-08-02-19:00  CNET.com     Smith    mac
2012-08-03-11:00  CNN.com      Johnson  ipad
…                 …            …        …
Forecast:
2012-08-05-12:00  CNN.com      Smith    iphone
2012-08-05-19:00  CNET.com     Smith    iphone
(R3) Large-scale tensor analysis
Complex time-stamped events
Web clicks: Object/URL, Actor/user, and Time vectors for each topic: (business), (news), (media), …
e.g., business topic vectors
– Object/URL: Money.com, CNN.com
– Actor/user: Smith, Johnson
– Time: Mon-Fri, Sat-Sun
Higher value: highly related to the topic
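One way to read the topic vectors is as factors whose product approximates the click tensor. The sketch below uses a CP/PARAFAC-style sum over topics, x(u, v, n) ≈ Σ_k O[u][k] · A[v][k] · C[n][k]. This form is an assumption, since the slides do not name the factorization, and every number below is hypothetical.

```python
# Hypothetical factor values for two topics, k = 0 (business) and k = 1 (news).
O = {"Money.com": [0.9, 0.1], "CNN.com": [0.4, 0.6]}  # Object/URL vectors
A = {"Smith": [0.8, 0.2], "Johnson": [0.3, 0.7]}      # Actor/user vectors
C = {"Mon-Fri": [0.9, 0.5], "Sat-Sun": [0.1, 0.5]}    # Time vectors


def reconstruct(url, user, time):
    """Approximate tensor element as a sum of per-topic products (assumed form)."""
    return sum(o * a * c for o, a, c in zip(O[url], A[user], C[time]))


weekday = reconstruct("Money.com", "Smith", "Mon-Fri")
weekend = reconstruct("Money.com", "Johnson", "Sat-Sun")
print(round(weekday, 3), round(weekend, 3))
```

A URL, user, and time slot that all score high on the same topic yield a large reconstructed value, which is the sense in which "higher value" means "highly related to the topic".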
(R3) Large-scale tensor analysis
Complex time-stamped events (original data): noisy and sparse
→ URL matrix, User matrix, Time matrix
New research directions
1. Automatic mining
2. Non-linear (gray-box) modeling
3. Large-scale tensor analysis
Put all together
New challenge: MANT analysis (Multi-Aspect Non-linear Time-series)
Putting it all together: 3h tutorials at SIGMOD’15 and WWW’16
New challenge: MANT analysis
Non-linear tensor analysis (w/ Sakurai, Panhuis, Faloutsos)
P1 Seasonality
P2 Vaccination
P3 Local patterns
P4 External shocks
P5 Mistakes, errors
New challenge: MANT analysis
Non-linear tensor analysis: FUNNEL
Given: Tensor X (disease x location x time)
Find: Compact description of X, “automatically”
X = {B, R, N, M, E}, covering properties P1 to P5 (e.g., seasonality, discontinuities)
New challenge: MANT analysis
Non-linear mining & forecasting of local competition
Time-stamped events: {activity, location, time}
[Figure: tensor X (dimensions labeled m, d, n) with axes activity and time (weekly)]
New challenge: MANT analysis
Given: Tensor X (activity x location x time)
Find: Compact description of X
CompCube: X = {B, C, S, D}
B: Basics, C: Competition, S: Seasonality, D: Deltas
Modeling power of CompCube
Local competition strength: Google search for Kindle, Nexus
[Figure: competition strength (weak/average/strong) by country: US, CA, JP, CN, BR, AU, ZA, IT]
Modeling power of CompCube
Local seasonality for iPod
[Figure: two seasonal components. Component #1: Xmas (Dec.); Component #2: Chinese New Year (Feb.)]
Modeling power of CompCube
Fitting for news resources (Wikipedia)
[Figure: # of clicks @ time vs. time (weekly), 2004 to 2014; fitting result RMSE = 0.056; series: 1 CNN, 2 Fox_News, 3 TIME, 4 Google_News, 5 BBC, 6 Yahoo_News, 7 AP, 8 Huffington_Post, 9 MSN_News, 10 Al_Jazeera]
Detected! US election, Nov. 2008
[Figure: local attention to the US election (weak/strong)]
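The fitting quality above is reported as RMSE (root mean square error). As a reminder of what that number measures, here is a minimal sketch. The input values are hypothetical, and whether the slide's sequences were normalized to [0, 1] before computing RMSE is an assumption suggested only by the plot's axis range.

```python
import math


def rmse(actual, fitted):
    """Root mean square error between an observed and a fitted sequence."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)
    )


# Hypothetical normalized click counts and a hypothetical model fit.
observed = [0.20, 0.40, 0.90, 0.50]
model_fit = [0.25, 0.35, 0.80, 0.55]
print(round(rmse(observed, model_fit), 3))
```

Because errors are squared before averaging, a single missed spike contributes far more than many small deviations.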
New challenge: MANT analysis
Non-linear mining & forecasting of local competition
3. Forecast future activities
[Figure: tensor extended along the time axis into the future (?)]
Industrial contribution
• Bridge structural health monitoring
– Structural monitoring using vibration/shock sensors
– Keep track of lag correlations for sensor data streams
BRAID @ SIGMOD’05
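To make "lag correlation" concrete, the sketch below finds the delay at which one sensor sequence best matches another. It is a naive batch computation for illustration only, not BRAID's streaming method; the synthetic signals, the function names, and the search range `max_lag` are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def lag_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag]."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the highest correlation."""
    return max(range(max_lag + 1), key=lambda l: lag_correlation(x, y, l))


# Synthetic vibration signal; sensor y sees the same signal 3 ticks later.
x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
y = [0.0] * 3 + x[:-3]
lag = best_lag(x, y, max_lag=10)
print(lag, round(lag_correlation(x, y, lag), 3))
```

Recomputing every lag from scratch costs time proportional to the window length times the number of lags, which is why a streaming setting with thousands of readings per second needs an incremental method instead.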
Industrial contribution
• Bridge structural health monitoring
– Goal: real-time anomaly detection for disaster prevention
– Several thousand readings (per sec) from several hundred sensor nodes
• Structural health monitoring with vibration/shock sensors
– Uses BRAID
– Metropolitan Expressway (Tokyo, Japan)
BRAID @ SIGMOD’05
Industrial contribution
• Bridge structural health monitoring with BRAID
– Metropolitan Expressway (Tokyo, Japan)
– Tokyo Gate Bridge (Tokyo, Japan)
– Can Tho Bridge (Vietnam)
BRAID @ SIGMOD’05
Industrial contribution
– Automobile sensor data {trip ID, zone ID, object}
TrailMarker (WebDBF’15 Best paper)
Original tensor X: trip x zone x object (dimensions labeled w, d, n)
Objects (sensors): velocity, longitudinal/lateral acceleration
[Figure: each trip (Trip 1, Trip 2, Trip 3) is a slice of X across zones]
Thank you
Code/Data/Tutorial: http://www.cs.kumamoto-u.ac.jp/~yasushi/