MOTIVATION
Event Sequences
Research and Technology Center North America | CR/RTC-HMI1 | 8/25/2017© 2017 Robert Bosch LLC and affiliates. All rights reserved.
Use Case: Human Activities Analysis
wake up → breakfast → start work
Use Case: Website Click Streams Analysis
log in → browse products → checkout
Understand customer behavior
Adjust UI design & improve customer experience
Use Case: Car Faults Analysis
A: 08-20 10:00 Car battery low
B: 08-21 12:30 GPS inoperative
C: 08-22 12:30 Short circuit
X: Repair / maintenance
[Figure: fault events A, B, C and the repair X plotted along a time axis t]
• Car modules such as ECUs (electronic control units) and sensors emit fault signals such as DTCs (diagnostic trouble codes) during operation.
• Fault data is archived for most car brands.
Use Case: Car Faults Analysis
[Figure: fault sequences of multiple cars along a time axis t, built from events A, B, C, D and E]
• What are the typical development paths of faults? (identify sequential patterns)
• Do cars matched to the same pattern come from the same country? (correlation analysis)
Insights support predictive diagnostics (i.e. identifying faults likely to happen in the future).
Better driving experience & warranty cost savings.
Visualize Event Sequences
Plotting Raw Data
259 sequences & 2500 events in total
Difficult to identify sequential patterns
Visualizing Event Sequences
Aggregation and Interaction
EventFlow, Monroe et al., 2013
Outflow, Wongsuphasawat and Gotz, 2015
Provide succinct overview of sequences
Not robust to noisy data
Visual Summary through Sequential Pattern Mining / Clustering
• Sequence Clustering
  ‒ Visual cluster exploration, Wei et al., 2012
  ‒ Unsupervised clickstream clustering, Wang et al., 2016
  ‒ Robust to noisy data
  ‒ Interpretation of clusters: how to characterize each sequence cluster?
• Sequential Pattern Mining
  ‒ Frequence, Perer and Wang, 2014
  ‒ Patterns&Sequences, Liu et al., 2016
  ‒ Peekquence, Kwon et al., 2016
  ‒ Interpretable algorithmic parameters and results
  ‒ Large number of patterns: need to be pruned based on heuristics
  ‒ Does not consider missing events
We need an interpretable, noise-tolerant, principled approach for event sequence summarization.
OUR APPROACH
• Two-part representation of event sequences as lossless compression of the data
• Optimal pattern set selection for visual summary based on the Minimum Description Length (MDL) principle
• Optimization algorithm
• Speedup with locality sensitive hashing
Our Approach – Sequence Synopsis
Research and Technology Center North America | CR/RTC1.4-NA | 8/25/2017© 2017 Robert Bosch LLC and affiliates. All rights reserved.
Overview
[Figure: event sequences built from events A, B, C, D and E, summarized by the pattern A B C]
Two-Part Representation of Event Sequences
Representative pattern summarizes multiple sequences.
[Figure: sequences over events A, B, C, D and E with their representative pattern A B C]
Corrections: event insertions (edits) recover the original sequences from the pattern.
Use sequential patterns for visual summary.
Model information loss with the required edits (corrections).
[Figure: sequences in which some pattern events are missing, matched to the pattern A B C]
Event deletion is another possible type of edit.
Different types of edits allow different variations from the pattern, which enables noise-tolerant and robust pattern matching.
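To make the two-part representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each event is a single character, allows only insertions and deletions, and uses hypothetical helper names (`edit_script`, `apply_edits`). It derives the corrections for one sequence from a pattern and then shows that pattern plus corrections recover the sequence exactly, which is what makes the representation lossless.

```python
from typing import List, Tuple

# (operation, position, event); this tuple layout is an illustrative assumption
Edit = Tuple[str, int, str]


def edit_script(pattern: str, sequence: str) -> List[Edit]:
    """Insert/delete corrections that turn `pattern` into `sequence`."""
    m, n = len(pattern), len(sequence)
    # lcs[i][j] = length of the longest common subsequence of
    # pattern[i:] and sequence[j:]
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if pattern[i] == sequence[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    edits: List[Edit] = []
    i = j = 0
    while i < m or j < n:
        if i < m and j < n and pattern[i] == sequence[j]:
            i, j = i + 1, j + 1  # event is explained by the pattern
        elif j < n and (i == m or lcs[i][j + 1] >= lcs[i + 1][j]):
            edits.append(("insert", j, sequence[j]))  # extra event
            j += 1
        else:
            edits.append(("delete", j, pattern[i]))  # pattern event missing
            i += 1
    return edits


def apply_edits(pattern: str, edits: List[Edit]) -> str:
    """Recover the original sequence from the pattern and its corrections."""
    out = list(pattern)
    for op, pos, event in edits:
        if op == "insert":
            out.insert(pos, event)
        else:
            del out[pos]
    return "".join(out)


if __name__ == "__main__":
    for seq in ["ABC", "ABEC", "ABDC", "AC"]:
        corrections = edit_script("ABC", seq)
        print(seq, corrections, apply_edits("ABC", corrections) == seq)
```

Positions refer to indices in the sequence being rebuilt, so the corrections must be applied in the order they were produced.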
What can be considered as a good set of patterns to summarize a collection of event sequences?
Event Sequences = Patterns + Edits (Corrections)
The Minimum Description Length (MDL) Principle
• Widely used information-theoretic criterion for model selection
• Introduced by Jorma Rissanen in 1978
• Formalizes “Occam’s Razor”
• The best model (or hypothesis) of a data set should minimize its total description length:
L = L(M) + L(D|M)
where L(M) is the model description length and L(D|M) is the data description length with the help of the model.
Description Length of Event Sequences
L = L(M) + L(D|M)
L(M) = sum(lengths of patterns)
L(D|M) = # min edits (corrections)
Trade-off between reducing visual complexity & minimizing information loss.
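The trade-off can be checked on a toy example. The sketch below is a hedged illustration of the formula above, not the authors' code. It assumes single-character events, insert/delete edits only, no weighting between the two terms, and hypothetical function names. The minimum number of insert/delete edits between a pattern p and a sequence s is |p| + |s| - 2·LCS(p, s).

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two event strings."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def min_edits(pattern: str, sequence: str) -> int:
    """Minimum insertions + deletions that turn `pattern` into `sequence`."""
    return len(pattern) + len(sequence) - 2 * lcs_length(pattern, sequence)


def description_length(summary: Dict[str, List[str]]) -> int:
    """L = L(M) + L(D|M) for a mapping pattern -> sequences it summarizes."""
    model_cost = sum(len(p) for p in summary)  # L(M): lengths of patterns
    data_cost = sum(                           # L(D|M): corrections
        min_edits(p, s) for p, seqs in summary.items() for s in seqs
    )
    return model_cost + data_cost


if __name__ == "__main__":
    sequences = ["ABC", "ABEC", "ABDC", "AC"]
    # One shared pattern: short model, a few corrections.
    print(description_length({"ABC": sequences}))
    # Every sequence as its own pattern: no corrections, long model.
    print(description_length({s: [s] for s in sequences}))
```

In this toy data the single pattern `ABC` gives a shorter total description than keeping every sequence as its own pattern, so MDL prefers the summary.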
Optimize Description Length for the Best Set of Patterns
• Basic idea: iteratively find & merge two groups of sequences with maximum description length reduction
• How to calculate description length reduction?
  ‒ Find the representative sequence for the merged group
  ‒ Calculate the minimum number of edits (insertion, deletion, swapping event positions) needed to transform the representative sequence into each individual sequence in the merged group
    ‒ Assuming insertion & deletion are allowed, the longest common subsequence (LCS) algorithm can be applied to calculate the minimum number of edits
  ‒ Sum up the description length
• Try to merge each pair of sequences/patterns
• Calculate the description length reduction of each candidate merge
• Merge the pair with maximum description length reduction
• Need to perform pairwise comparison at each iteration
[Figure: candidate merges annotated with description length changes of -4 and -2; the pair with -4 is merged]
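The iterative merging can be sketched as a greedy loop. This is a simplified illustration under stated assumptions rather than the published algorithm: events are single characters, only insertions and deletions count as edits, and the representative sequence of a merged group is assumed to be the LCS of the two groups' patterns (the slides do not specify how the representative is chosen). The names `lcs`, `cost` and `summarize` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Cluster = Tuple[str, List[str]]  # (pattern, member sequences)


def lcs(a: str, b: str) -> str:
    """Longest common subsequence of two event strings."""
    table = [[""] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + x
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1], key=len)
    return table[-1][-1]


def cost(pattern: str, members: List[str]) -> int:
    """Pattern length plus insert/delete edits needed to recover each member."""
    return len(pattern) + sum(
        len(pattern) + len(s) - 2 * len(lcs(pattern, s)) for s in members
    )


def summarize(sequences: List[str]) -> List[Cluster]:
    """Greedily merge the pair of groups with the largest length reduction."""
    clusters: List[Cluster] = [(s, [s]) for s in sequences]
    while len(clusters) > 1:
        best = None  # (reduction, i, j, merged cluster)
        # Pairwise comparison of all current groups (the bottleneck).
        for i, j in combinations(range(len(clusters)), 2):
            (p1, m1), (p2, m2) = clusters[i], clusters[j]
            merged = (lcs(p1, p2), m1 + m2)  # assumed representative
            reduction = cost(p1, m1) + cost(p2, m2) - cost(*merged)
            if best is None or reduction > best[0]:
                best = (reduction, i, j, merged)
        if best[0] <= 0:
            break  # no merge shortens the description any further
        _, i, j, merged = best
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    data = ["ABC", "ABEC", "ABDC", "XYZ", "XYZ", "XYWZ"]
    for pattern, members in summarize(data):
        print(pattern, members)
```

The loop stops once no candidate merge reduces the total description length, so the number of patterns does not have to be fixed in advance.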
Algorithm Speedup through Locality Sensitive Hashing (LSH)
• Bottleneck of the approach: finding the best pair of event sequence groups to merge
• Locality sensitive hashing: algorithm for fast approximate nearest neighbor search
Simplified similarity measure with set relation
20x ~ 50x speed gain
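One common way to realize this kind of speedup is MinHash with banding, sketched below. The slides only mention a simplified set-based similarity measure, so the specific choices here are assumptions for illustration: patterns are reduced to their sets of events, similarity is Jaccard similarity, and the names (`stable_hash`, `minhash_signature`, `candidate_pairs`) and parameter values are hypothetical. Only pairs that share an LSH bucket would then be evaluated for merging, instead of all pairs.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def stable_hash(seed: int, event: str) -> int:
    """Deterministic 64-bit hash of an event under a given seed."""
    digest = hashlib.sha1(f"{seed}:{event}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(events: Set[str], num_hashes: int) -> Tuple[int, ...]:
    """One min-hash value per seed; similar sets tend to share values."""
    return tuple(
        min(stable_hash(seed, e) for e in events) for seed in range(num_hashes)
    )


def candidate_pairs(
    patterns: List[str], num_hashes: int = 12, bands: int = 6
) -> Set[Tuple[int, int]]:
    """Index pairs of (non-empty) patterns that collide in at least one band."""
    rows = num_hashes // bands
    buckets: Dict[Tuple[int, Tuple[int, ...]], Set[int]] = defaultdict(set)
    for idx, pattern in enumerate(patterns):
        signature = minhash_signature(set(pattern), num_hashes)
        for band in range(bands):
            key = (band, signature[band * rows:(band + 1) * rows])
            buckets[key].add(idx)
    pairs: Set[Tuple[int, int]] = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    patterns = ["ABC", "ACB", "ABCD", "XYZ"]
    # Only these pairs would be passed to the exact merge evaluation.
    print(sorted(candidate_pairs(patterns)))
```

Because event order is ignored when hashing, the hash step only proposes candidates; the exact description length reduction would still be computed for each proposed pair.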
Advantages
• Simultaneous event sequence clustering and pattern extraction
• Soft constraints on pattern matching, therefore robust to noisy data
• Generalizability: possibility to include different sequence editing operations (e.g. event insertion, deletion, swapping positions)
SYSTEM
System
Visual Design
[Figure: visual design with three parts: original data, patterns, and corrections]
Architecture
Supportive Views, UI, Case Study – Vehicle Fault Analysis
Case Study – Application Log Analysis
• D. Fisher. Agavue event data sample
• ~2000 user sessions
• Interaction log of using a data visualization application
Binding data
EVALUATION & SUMMARY
Evaluation & Summary
Comparative Experiment
EventFlow, Monroe et al., 2013
Our method
• Vehicle fault sequences
• 259 cars & 2500 events
Contributions
• A new application domain of event sequence visualization
• A generic two-part representation of event sequences that:
  ‒ Quantifies visual complexity & information loss in visual summaries
  ‒ Combined with the MDL principle, defines an optimal set of patterns for summary
• An efficient algorithm to optimize visual summary using LSH
• A visual analytics system that supports interactive analysis of real-world event sequences from different application domains
Future Work
• Revise the model representation to discover multiple patterns in a single sequence
• Towards quantifiable visual designs by applying the MDL principle to different types of data: graphs/networks, time series, …
THANK YOU! Q&A