Temporal Skeletonization on Sequential Data...3 Hong Kong University of Science and Technology 1/22....
Transcript of Temporal Skeletonization on Sequential Data...3 Hong Kong University of Science and Technology 1/22....
Temporal Skeletonization on Sequential Data
Temporal Skeletonization on Sequential DataPatterns, Categorization, and Visualization
Chuanren LiuRutgers University
Chuanren Liu1, Kai Zhang2, Hui Xiong1, Geoff Jiang2, Qiang Yang3
1 Rutgers University2 NEC Laboratories America, Inc.
3 Hong Kong University of Science and Technology
1 / 22
Temporal Skeletonization on Sequential Data
Background
Sequential Pattern Analysis
Diversified applications in dynamic business environments.
Challenge: determine the right granularity for sequentialpattern mining with curse of cardinality.
Cardinality of sequential data: number of symbols.
2 / 22
Temporal Skeletonization on Sequential Data
Introduction
Business-to-Business Purchase Pattern Analysis
Challenge: symbolic items with unknown stages.
3 / 22
Temporal Skeletonization on Sequential Data
Introduction
Motivation: Curse of Cardinality
0.1 0.2 0.3 0.4 0.5 0.6 0.7
10−1
100
101
102
103
104
Support
Tim
e(S
econ
ds)
GSPSPADE
PrefixSpanSPAMOurs
Complexity
p: patternc : cardinality
`: pattern length
Pr(p) = c−`
Rareness
m,h,j,f,d,a,i,k,b
j,l,m,a,n,f,b,o,g
e,h,l,c,f,n,i,b,o
h,l,e,c,a,f,k,o,i
Granularity
m,k,j,f,d,a,i,h,b
j,l,m,a,n,f,b,o,g
e,h,l,c,f,n,i,b,o
h,l,e,c,m,f,k,o,i
Noise
4 / 22
Temporal Skeletonization on Sequential Data
Introduction
Existing Approaches
Figure : Taxonomy of symbols.
s x y z
s1 * * *s2 * * *s3 * * *s4 * * *s5 * * *
Table : Features of symbols.
Disadvantages:
� It is difficult to obtain knowledge of items.� It is difficult to define distances and cluster the items.� Performed irrespective of the temporal content.� Cannot identify relevant temporal structure.
5 / 22
Temporal Skeletonization on Sequential Data
Introduction
Our Approach: Temporal Skeletonization
Proactively reduce cardinality by temporal clusters:
Summarize the temporal correlaion in graph.
Embed the graph in a low dimensional space.
Identify temporal clusters.
Re-encode sequences with temporal clusters.
−30 −20 −10 0 10 20 30−30
−25
−20
−15
−10
−5
0
5
10
15
20
x
y
(a) Random sequences
−40 −30 −20 −10 0 10 20 30 40−25
−20
−15
−10
−5
0
5
10
15
20
x
y
(b) Seqs with temporal clusters
−25 −20 −15 −10 −5 0 5 10 15−40
−30
−20
−10
0
10
20
x
y
(c) Customer event sequences
Figure : The embedding of items in different types of sequence data.
6 / 22
Temporal Skeletonization on Sequential Data
Temporal Skeletonization
Graph Model for Temporal Skeletonizaion I
Notations:
Symbols S = {e1, e2, · · · , e|S|}.Sequence Sn = (sn1 , s
n2 , · · · , snTn
).
Sequences {Sn|n = 1, 2, · · · ,N}.Objective: a coding scheme y = f (e) ∈ {1, 2, · · · ,K} by
min1
N
N∑n=1
∑1≤p,q≤Tn
|p−q|≤r
(f (snp )− f (snq )
)2.
Relax the integer constraints to:
yi = f (ei ) ∈ R.
7 / 22
Temporal Skeletonization on Sequential Data
Temporal Skeletonization
Graph Model for Temporal Skeletonizaion II
Define the temporal graph W as
Wij =1
N
N∑n=1
∑1≤p,q≤Tn
|p−q|≤r
[snp = ei ∧ snq = ej ]
Then our objective is to minimize∑i ,j
Wij(yi − yj)2.
This is an standard graph-based optimization problem (withconstraints avoiding trivial solutions).
8 / 22
Temporal Skeletonization on Sequential Data
Temporal Skeletonization
Embedding and Visualization
Compute eigenvectors of graph Laplacian as embedding.
Cluster the items in the embedding space.
Transform raw sequences to sequences of temporal clusters.
−30 −20 −10 0 10 20 30−30
−25
−20
−15
−10
−5
0
5
10
15
20
x
y
(a) Random sequences
−40 −30 −20 −10 0 10 20 30 40−25
−20
−15
−10
−5
0
5
10
15
20
x
y
(b) Seqs with temporal clusters
−25 −20 −15 −10 −5 0 5 10 15−40
−30
−20
−10
0
10
20
x
y
(c) Customer event sequences
Figure : The embedding of symbols in different types of sequence data.
9 / 22
Temporal Skeletonization on Sequential Data
Temporal Skeletonization
Post-Temporal-Smoothing
Objective: Further reduce temporal variations in individualsequences.
Solution: Gaussian mixture models with fused lasso.
Notations: Given soft stages Y ∈ RT×K , compute a smootherversion X n:
Optimization: max∑T
t=1
∑Kk=1 XtkYtk , subject to
1
T − 1
T−1∑t=1
‖Xt − Xt+1‖1 ≤ λ,
K∑k=1
Xtk = 1,Xtk ≥ 0
10 / 22
Temporal Skeletonization on Sequential Data
Temporal Skeletonization
Applications
1 Sequence visualization
Visualize symbol with coordinates.Visualize sequence as a trajectory.
2 Sequential pattern mining
Temporal cluster as new granularity.Curse of cardinality can be relieved.
3 Sequence clustering
Noises removed.More meaningful features.
11 / 22
Temporal Skeletonization on Sequential Data
Simulated Study
Data
5000 sequences with:
Stages A,B,C ,D,E
Patterns (A→ B → C → D), (B → E → C )
Symbols 25 for each stage.
Simulation process:
Determine the stage to sample from based on the pattern.
Sample d symbols from the stage.
d ∼ (1− p)pd−1, p =14
15,E[d ] = 15.
12 / 22
Temporal Skeletonization on Sequential Data
Simulated Study
Baselines I
Using the temporal clusters (stages) identified via our approach,the mining process succeeds quickly in less than 1 second, and canrecover exactly the two ground truth stage-wise patterns (whensupport is less than or equal to 0.5).
0.1 0.2 0.3 0.4 0.5 0.6 0.7
10−1
100
101
102
103
104
Support
Tim
e(S
econ
ds)
GSPSPADE
PrefixSpanSPAMOurs
0.1 0.2 0.3 0.4 0.5 0.6 0.7
100
101
102
103
104
105
Support
Pat
tern
s
OthersOurs
Figure : FSM algorithms on the simulated data.
13 / 22
Temporal Skeletonization on Sequential Data
Simulated Study
Baselines II
HMMC iterations:
1 For all sequences in cluster Ck , estimate a transition matrixφk and a emission matrix θk ;
2 Reallocate each sequence Sn to the cluster Ck on whosetransition and emission matrices it has the highest probabilityof being produced, i.e., k = arg maxk Pr(Sn|φk , θk).
Task Sequence clustering Stage recovery
Method HMM Ours HMM Ours
Precision 0.997 1 0.488 1Recall 0.997 1 0.448 1
Table : Error of HMM and our method on two tasks.
14 / 22
Temporal Skeletonization on Sequential Data
Simulated Study
Our Results
We can perfectly recover the stages.
We can perfectly cluster the sequences.
We do not require prior knowledge on the data.
−15 −10 −5 0 5 10 15−10
−8
−6
−4
−2
0
2
4
6
DA
B C
E
(a) Five symbol clusters (b) Two sequence clusters
15 / 22
Temporal Skeletonization on Sequential Data
B2B Purchase Pattern Analysis
Data Description
Huge amount of customer event data:
88040 customers
5028 event symbols
248725 event records
16 / 22
Temporal Skeletonization on Sequential Data
B2B Purchase Pattern Analysis
Embedding Results and Buying Stages
−30 −20 −10 0 10 20 30 40
−30
−20
−10
0
10
20
30
40
50
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
C11
C12
C13
C Top keywords SizeC1 Official Website 12C2 Corporate Event, Direct Mail 20C3 Trial Product Download 45C4 Conference 27C5 Unsubscribe 38C6 Webinar 101C7 Trial Product Download 70C8 Tradeshow 37C9 Corporate Event, Direct Mail 65C10 Web Marketing Ads 13C11 Webinar 21C12 Webinar 42C13 Search Engine 12
17 / 22
Temporal Skeletonization on Sequential Data
B2B Purchase Pattern Analysis
Sequential Patterns in B2B customer event data
0 2 4 6 8
·10−2
10−1
100
101
102
103
Support
Tim
e(S
econ
ds)
GSPSPADE
PrefixSpanSPAMOurs
0 2 4 6 8
·10−2
101
102
103
104
Support
Pat
tern
s
OthersOurs
Figure : Sequential patterns in B2B customer event data.
18 / 22
Temporal Skeletonization on Sequential Data
B2B Purchase Pattern Analysis
Critical Buying Paths
−30 −20 −10 0 10 20 30 40−40
−30
−20
−10
0
10
20
30
40
50
P1
P2
P3
P4
P5
Unsubscribe
Tradeshow
Conference
Search
Ads
Website
Webinar
Class P Path/Keyword Size
Successful
P1
C10 → C1 → C7 → C12 → C8 933Ads→Website→Download→Webinar→Tradeshow
P2
C13 → C3 → C11 → C12 → C8 1110Search→Download→Webinar→Webinar→Tradeshow
P3
C6 → C11 → C12 → C8 702Webinar→Webinar→Webinar→Tradeshow
Unsuccessful
P4
C2 → C9 → C5 423Mail→Corporate Event→Unsubscribe
P5C11 → C4 → C5 333Webinar→Conference→Unsubscribe
19 / 22
Temporal Skeletonization on Sequential Data
B2B Purchase Pattern Analysis
Comparison with baselines
H1
H2
H3
H4
H5
State Top keywordsH1 Seminar, Official Website, Trial Download
H2 Seminar, Corporate Event, Tradeshow
H3 Trial Download, Seminar, Corporate Event
H4 Seminar, Conference, Corporate Event
H5 Seminar, Unsubscribe, Corporate Event
20 / 22
Temporal Skeletonization on Sequential Data
Summary
Temporal skeletonization
A new approach to address curse of cardinality:
Translate rich temporal content into topoly space.Explore, quantify, and visualize temporal structures.
Applied on B2B customer event data:
Discover ciritcal purchasing patterns.Identify dynamic buying stages of customers.Improve the marketing practice.
21 / 22
Temporal Skeletonization on Sequential Data
Summary
Thanks!
22 / 22