Similarity-based Analysis for Trajectory Datadasfaa14/slides/... · • Similarity-based trajectory...

Post on 06-Jul-2020

12 views 0 download

Transcript of Similarity-based Analysis for Trajectory Datadasfaa14/slides/... · • Similarity-based trajectory...

Similarity-based Analysis for Trajectory Data

Kevin Zheng

25/04/2014 DASFAA 2014 Tutorial 1

Outline •  Background

–  What is trajectory –  Where do they come from –  Why are they useful –  Characteristics

•  Trajectory similarity search –  Query classification –  Trajectory similarity measures –  Trajectory index

•  Similarity-based trajectory mining –  Popular route mining –  Co-traveller discovery –  Trajectory clustering

25/04/2014 DASFAA 2014 Tutorial 2

Outline •  Background

–  What is trajectory –  Where do they come from –  Why are they useful –  Characteristics

•  Trajectory similarity search –  Query classification –  Trajectory similarity measures –  Trajectory index

•  Similarity-based trajectory mining –  Popular route mining –  Co-traveller discovery –  Trajectory clustering

25/04/2014 DASFAA 2014 Tutorial 3

What is trajectory? •  Historical location records of moving objects •  In mathematics

– Continuous function: time à location – Location can be any dimension

•  In real applications – Locations are sampled periodically – A finite sequence of time-stamped locations: <p1,

t1>, <p2, t2> …, <pn, tn> – p: two or three dimensions (longitude, latitude)

25/04/2014 DASFAA 2014 Tutorial 4

Where is it from?

25/04/2014 DASFAA 2014 Tutorial 5

Where is it from? •  GPS module on moving objects

– Vehicles, mobile phone users, animals •  Online social network

– Twitter, Flickr, Facebook, Weibo

•  Sensors – Surveillance cameras, RFID, WiFi

•  More …

25/04/2014 DASFAA 2014 Tutorial 6

Who cares about it? •  Government

–  Traffic pattern analysis –  Public transportation management –  Urban planning

•  Business –  Location-based service –  Personalized advertisement & recommendation –  Taxi company, logistic company

•  Scientists & Researchers –  Zoologist, meteorologist, astronomer –  Open problems, challenging tasks

•  More …

25/04/2014 DASFAA 2014 Tutorial 7

Trajectory data are BIG •  Volume •  Velocity •  Variety

25/04/2014 DASFAA 2014 Tutorial 8

Volume •  In 2010, 1 billion vehicles

– Taxi, logistic companies keep tracking their vehicles

– Self-driving car in near future? •  In 2012, 1.08 billion smartphone users •  In 2013, 20 million surveillance cameras in

China •  They are generator!

– The data keep accumulated

25/04/2014 DASFAA 2014 Tutorial 9

Velocity •  Not just huge, they’re being generated quickly •  Vehicle tracking & navigation

– Re-position every few seconds •  Geo-tagged social media

– 2 million Flickr photos per day, 5% geo-tagged – 100 million posts on Sina Weibo per day, 1-2%

geo-tagged – 400 million tweets per day, 1% geo-tagged

•  Sensors – How many cars pass a road camera every day?

25/04/2014 DASFAA 2014 Tutorial 10

Geo-tagged tweets

25/04/2014 DASFAA 2014 Tutorial 11

Images courtesy of Twitter

Variety •  Data source •  Tracking devices

– Car GPS, smartphones, sensors •  Tracking methods

– Sampling strategy, sampling rate,

•  Spatial length & temporal duration •  Data quality

25/04/2014 DASFAA 2014 Tutorial 12

Research directions •  Scalable, real-time data processing •  Flexible database storage and index •  Effective similarity measures •  Uncertainty management •  Data compression

25/04/2014 DASFAA 2014 Tutorial 13

Key and fundamental research problem: similarity-based analysis

Outline •  Background

–  What is trajectory –  Where do they come from –  Why are they useful –  Characteristics

•  Trajectory similarity search –  Query classification –  Trajectory similarity measures –  Trajectory index

•  Similarity-based trajectory mining –  Popular route mining –  Co-traveller discovery –  Trajectory clustering

25/04/2014 DASFAA 2014 Tutorial 14

Similarity-based analysis for trajectories •  Core problem: trajectory similarity search

–  Input: a trajectory dataset D, a query Q – Output: a subset of D that are ‘similar’ to Q

•  Foundation – Trajectory similarity measures

•  Approach –  Index and search algorithm

•  Application –  Popular route mining (route recommendation) –  co-traveller discovery, clustering, classification, etc…

25/04/2014 DASFAA 2014 Tutorial 15

Similarity query classification •  P-query

– Query: point(s) •  R-query

– Query: region (spatial & temporal dimension) •  T-query

– Query: trajectory

25/04/2014 DASFAA 2014 Tutorial 16

P-query (single point)

25/04/2014 DASFAA 2014 Tutorial 17

ts te

ts

te

q

𝐷(𝑞,  𝑇)= 𝑚𝑖𝑛𝑑𝑖𝑠𝑡(𝑞,𝑝)  𝑝∈𝑇 and satisfy tc dist(q,p): -  Lp-norm -  Network distance

Query location: q Temporal constraint (optional): tc = [ts, te]

[Tao2002] Tao Y., Papadias D. and Shen Q., Continuous nearest neighbour search, VLDB, 2002

P-query (multiple points)

25/04/2014 DASFAA 2014 Tutorial 18

q1

q2

q3

q4

[Chen2010] Chen Z., Shen HT., Zhou X., Zheng Y and Xie X., Searching trajectories by locations – an efficiency study. SIGMOD 2010

Query locations Q: q1, q2, q3, q4 D(Q,T) is an aggregate function of D(q,T)

R-query •  Spatial region: R •  Temporal interval:[ts, te]

25/04/2014 DASFAA 2014 Tutorial 19

ts te

ts te

[Pfoster 2000] Dieter Pfoster, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000

R

Ask for trajectories in a given region during a time interval

T-query •  Query: Tq

25/04/2014 DASFAA 2014 Tutorial 20

Tq

How to measure their distance?

Trajectory similarity measures •  Many-to-many mapping •  Different semantic/applications •  Different lengths •  Different sampling rates •  Noises •  Temporal dimension?

25/04/2014 DASFAA 2014 Tutorial 21

Classification

25/04/2014 DASFAA 2014 Tutorial 22

Lp-norm DTW LCSS EDR

DTW, LCSS, EDR with time constrain

OWD LIP

Synchronous Euclidean Distance

Spatial-only Spatial-temporal

Discrete

Continuous

Consider location only

Consider both location and time

Based on location samples

Based on line segments or curves

Classification

25/04/2014 DASFAA 2014 Tutorial 23

DTW, LCSS, EDR with time constrain

OWD LIP

Synchronous Euclidean Distance

Spatial-only Spatial-temporal

Discrete

Continuous

Lp-norm DTW LCSS EDR

Lp-norm •  Average Lp-norm distance of all matched

locations •  1-to-1 mapping •  Trajectories are of the same length

25/04/2014 DASFAA 2014 Tutorial 24

Lp-norm •  Cannot detect similar trajectories with different

sampling rates •  Sensitive to noise

25/04/2014 DASFAA 2014 Tutorial 25

DTW •  Dynamic Time Warping distance

– Adaptation from time series distance measure – Used to handle time shift and scale in time series

•  Optimal order-aware alignment between two sequences – Goal: minimize the aggregate distance between

matched points •  1-to-many mapping

25/04/2014 DASFAA 2014 Tutorial 26

Yi, Byoung-Kee, Jagadish, HV and Faloutsos, Christos, Efficient retrieval of similar time sequences under time warping. ICDE 1998

DTW for trajectories •  Nothing to do with ‘time’ at all •  Useful when detecting similar trajectories with

different sampling rates •  Sensitive to noise

25/04/2014 DASFAA 2014 Tutorial 27

LCSS •  Longest Common Sub-Sequence •  Adaptation of string similarity

– Lcss(‘abcde’,’bd’) = 2 •  Threshold-based equality relationship

– Two locations are regarded as equal if they’re ‘close’ (compared to a threshold)

•  1-to-(1 or null) mapping

25/04/2014 DASFAA 2014 Tutorial 28

VLACHOS, M., GUNOPULOS, D., AND KOLLIOS, G. Discovering similar multidimensional trajectories. ICDE 2002

LCSS •  Insensitive to noise •  Not easy to define threshold •  May return dissimilar trajectories

25/04/2014 DASFAA 2014 Tutorial 29

p1

p2

p3

p4

p5

p’1 p’2

p’3

EDR •  Edit Distance on Real sequence •  Adaptation from Edit Distance on strings

– Number of insert, delete, replace needed to convert A into B

•  Threshold-based equality relationship – Two locations are regarded as equal if they’re

‘close’ (compared to a threshold)

25/04/2014 DASFAA 2014 Tutorial 30

Lei Chen, M. Tamer Ozsu, Vincent Oria, Robust and Fast Similarity Search for Moving Object Trajectories. SIGMOD 2005

EDR •  Value means the number of operations, not

“distance between locations” –  Insensitive to noise

25/04/2014 DASFAA 2014 Tutorial 31

p1

p2

p3

p4

p5

p’1 p’2

p’3 insert

insert replace

LCSS and EDR •  They are both count-based

– LCSS counts the number of matched pairs – EDR counts the cost of operations needed to fix

the unmatched pairs •  Higher LCSS, lower EDR •  If cost(replace) = cost(insert) + cost(delete): •  EDR(X,Y) = L(X)+L(Y) – 2LCSS(X,Y)

25/04/2014 DASFAA 2014 Tutorial 32

Classification

25/04/2014 DASFAA 2014 Tutorial 33

DTW, LCSS, EDR with time constrain

Synchronous Euclidean Distance

Spatial-only Spatial-temporal

Discrete

Continuous

Lp-norm DTW LCSS EDR

OWD LIP

OWD •  One Way Distance from T1 to T2 is:

–  Integral of the distance from points of T1 to T2

– Divided by the length of T1

•  Make it into symmetric measure

25/04/2014 DASFAA 2014 Tutorial 34

Bin Lin, Jianwen Su, One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories. In Geoinformatica (2008)

OWD example •  Consider one trajectory as piece-wise line

segment, and the other as discrete samples

25/04/2014 DASFAA 2014 Tutorial 35

LIP distance •  Locality In-between Polylines

– Polygon is the set of polygons formed between intersection points

– 

25/04/2014 DASFAA 2014 Tutorial 36

Nikos Pelekis et al, Similarity Search in Trajectory Databases. Symposium on Temporal Representation and Reasoning 2007

LIP distance •  Only work for 2-dimensional trajectories •  Polygon à polyhedron: non-trivial change

25/04/2014 DASFAA 2014 Tutorial 37

Spatial-temporal similarity measure •  All the spatial-only similarity measures can

incorporate time information in trajectories •  Lp-norm and DTW can apply a temporal

constrain for more synchronized alignment •  EDR and LCSS can apply a temporal threshold

on top of spatial threshold

25/04/2014 DASFAA 2014 Tutorial 38

Classification

25/04/2014 DASFAA 2014 Tutorial 39

OWD LIP

Synchronous Euclidean Distance

Spatial-only Spatial-temporal

Discrete

Continuous

Lp-norm DTW LCSS EDR

DTW, LCSS, EDR with time constrain

DTW with temporal constrain

25/04/2014 DASFAA 2014 Tutorial 40

10

15 13

9

10

Time tolerance = 2

7

Spatial-temporal LCSS •  A temporal threshold controls how far in time

we can go in order to match two locations

25/04/2014 DASFAA 2014 Tutorial 41

VLACHOS, M., GUNOPULOS, D., AND KOLLIOS, G. Discovering similar multidimensional trajectories. ICDE 2002

p1

p2

p3

p4

p5

p’1 p’2

p’3

t1=1

t2=2

t3=4

t4=7

t5=11

t’1=1

t’2=3

t’3=4

Time threshold=2

Classification

25/04/2014 DASFAA 2014 Tutorial 42

DTW, LCSS, EDR with time constrain

OWD LIP

Synchronous Euclidean Distance

Spatial-only Spatial-temporal

Discrete

Continuous

Lp-norm DTW LCSS EDR

SED •  Synchronous Euclidean Distance

– Euclidean distance between locations at the same time instance of two trajectories

•  Regard trajectory as continuous function of time

25/04/2014 DASFAA 2014 Tutorial 43

Mirco Nanni, Dino Pedreschi, Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems (2006) POTAMIAS, M., PATROUMPAS, K., AND SELLIS, T. K. Sampling trajectory streams with spatiotemporal criteria. SSDBM 2006

SED

25/04/2014 DASFAA 2014 Tutorial 44

t=0 t=10

t=5

t=0

t=15 t=25

t=20

t=12 t=10

t=7

t=3

Virtually create a sample point at t=3

Trajectory index •  Similarity measures give the way to calculate

the distance between two trajectories •  However …

– Huge amount of trajectories – Linear scan is inefficient

25/04/2014 DASFAA 2014 Tutorial 45

Trajectory index •  3D R-tree •  STR-tree (Spatio-temporal R-tree) •  TB-tree (Trajecoty Bundle) •  Multi-version R-tree

– Partition temporal dimension •  Grid-based index

– Partition spatial dimension

25/04/2014 DASFAA 2014 Tutorial 46

3D R-tree •  Indexing position samples only

– Cannot answer queries about movements in-between those samples

•  Indexing line segments

25/04/2014 DASFAA 2014 Tutorial 47

x  

Time  

y  

Problem of R-tree •  Large “dead space”

– MBB covers a large portion of the space with no data – Low pruning power

•  Trajectory preservation – Line segments are grouped merely by spatial proximity – Regardless which trajectory they belong to – Retrieving a trajectory requires visits of different paths

in the tree

25/04/2014 DASFAA 2014 Tutorial 48

Augmented 3D R-tree •  Augment leaf node with orientation

information •  Distance to line segments can be approximated

more accurately

25/04/2014 DASFAA 2014 Tutorial 49

[Pfoster 2000] Dieter Pfoster, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000

q

STR-tree •  Extension of augmented 3D R-tree •  Insertion

– Try to keep the line segments belonging to the same trajectory together

– Find the leaf node containing the predecessor •  Node split

– Put disconnected segments into new node – Put the most recent backward-connected segment

into new node

25/04/2014 DASFAA 2014 Tutorial 50

[Pfoster 2000] Dieter Pfoster, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000

TB-tree (Trajectory Bundle) •  A leaf node only contains segments belonging

to the same trajectory •  Leaf nodes containing same trajectory are

linked •  Strictly preserve trajectories

25/04/2014 DASFAA 2014 Tutorial 51

[Pfoster 2000] Dieter Pfoster, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000

TB-tree (Trajectory Bundle)

25/04/2014 DASFAA 2014 Tutorial 52

Reflection •  Previous index treat spatial and temporal

dimensions equally –  3D tree structure

•  In trajectory database, temporal dimension is more dynamic – New segments are appended to existing trajectories – Archived trajectories rarely update

•  Indexing spatial and temporal dimensions separately –  Partition temporal dimension first –  Partition spatial dimension first

25/04/2014 DASFAA 2014 Tutorial 53

Multi-version R-tree

25/04/2014 DASFAA 2014 Tutorial 54

HR-­‐tree  

For  each  2mestamp,  an  R-­‐tree  is  created.  So,  there  are  many  R-­‐trees.  These  R-­‐trees  are  indexed.    

Query for trajectories in a given region and in a given time interval: 1. The R-tree at the timestamp is found first 2. The trajectories in the specified region are retrieved from the R-tree.

[Nascimento1998] Nascimento, M., Silva, J. Towards Historical R-trees. ACM SAC, 1998 [Tao2001a] Tao, Y., Papadias, D.: Efficient historical r-trees. In: ssdbm, p. 0223. Published by the IEEE Computer Society (2001) [Xu2005]Xu, X., Han, J., Lu, W.: Rt-tree: An improved r-tree indexing structure for temporal spatial databases. In: Int. Symp. on Spatial Data Handling, 2005 [Tao2001b] Tao, Y., Papadias, D.: Mv3r-tree: A spatio-temporal access method for timestamp and interval queries. In: VLDB, pp. 431-440 (2001)

Grid-based index •  Partition space into non-overlapping cells •  Trajectory segments in each cell are indexed

on temporal dimension •  Query processing:

– Spatial filtering – Temporal filtering

25/04/2014 DASFAA 2014 Tutorial 55

[Prasad2003] V. Prasad Chakka Adam C. Everspaugh Jignesh M., Patel, Indexing Large Trajectory Data Sets With SETI, CIDR 2003

Grid-based index

25/04/2014 DASFAA 2014 Tutorial 56

[Prasad2003] V. Prasad Chakka Adam C. Everspaugh Jignesh M., Patel, Indexing Large Trajectory Data Sets With SETI, CIDR 2003

Outline •  Background

–  What is trajectory –  Where do they come from –  Why are they useful –  Characteristics

•  Trajectory similarity search –  Query classification –  Trajectory similarity measures –  Trajectory index

•  Similarity-based trajectory mining –  Popular route mining –  Co-traveller discovery –  Trajectory clustering

25/04/2014 DASFAA 2014 Tutorial 57

Popular route mining •  Shortest path may not be the favourable •  Find the most popular/desirable path using the

GPS trajectories of past travellers •  Classification

– No specific source/destination – hot route discovery

– Specific source/destination – popular route search

25/04/2014 DASFAA 2014 Tutorial 58

T-pattern •  A set of individual trajectories that share the

property of visiting the same sequence of places with similar travel times

25/04/2014 DASFAA 2014 Tutorial 59

1. Spatial discretization: discretize space into finite set of regions of interest (RoI) 2. Translate trajectories into sequence of RoIs 3. Adapt sequential pattern mining algorithms with time constrain

F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. In SIGKDD, pages 330–339, 2007

Periodic pattern •  Find the repeat pattern for individual’s

trajectory

25/04/2014 DASFAA 2014 Tutorial 60

1.  Pre-defined and synchronized timestamps 2.  Adaptive region: density-based cluster 3.  Trajectories are translated to sequence of regions at pre-defined time instances

N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. W. Cheung. Mining, indexing, and querying historical spatiotemporal data. In SIGKDD, pages 236–245, 2004.

Interesting travel sequence •  A sequence of interesting locations that is

travelled by experienced drivers frequently •  Interestingness of a location?

– How many experienced travellers visited it

•  Experience of a traveller? – How many interesting locations she has visited

25/04/2014 DASFAA 2014 Tutorial 61

Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining interesting locations and travel sequences from gps trajectories. In WWW, pages 791–800, 2009.

Interesting travel sequence •  Hyperlink-Induced Topic Search (HITS)

25/04/2014 DASFAA 2014 Tutorial 62

Popular route search •  Given source/destination, find/estimate the

most popular route in between •  Ideally, we can just count the number of

trajectories on different paths connecting the two locations

25/04/2014 DASFAA 2014 Tutorial 63

Z. Chen, H. T. Shen, and X. Zhou. Discovering popular routes from trajectories. In ICDE, pages 900–911, 2011

Popular route discovery •  In real scenarios, it’s not easy to find such

well-divided groups

•  Even worse, there’s no trajectory connecting two locations at all!

25/04/2014 DASFAA 2014 Tutorial 64

Popular route discovery •  Construct a transfer network from raw trajectories

as an intermediate result to capture the moving behaviors between locations – Node: cluster of turning points – Edge: trajectories passing two nodes

25/04/2014 DASFAA 2014 Tutorial 65

Popular route discovery •  Transfer probability

•  Find the route with the highest joint transfer probability w.r.t. destination

25/04/2014 DASFAA 2014 Tutorial 66

Find popular route from uncertain trajectories •  Low-sampling-rate trajectories have high

degree of uncertainty •  Uncertain trajectories are prevalent! •  Is it possible to recover the original route given

a low-sampling-rate trajectory? •  What if…

– There is a historical set of uncertain trajectories

25/04/2014 DASFAA 2014 Tutorial 67

Kai Zheng, Yu Zheng, Xing Xie and Xiaofang Zhou. Reducing Uncertainty of Low-Sampling-Rate Trajectories. ICDE 2012

Find popular route from uncertain trajectories •  Can we find popular routes for specific s/d

using low-sampling-rate trajectories

25/04/2014 DASFAA 2014 Tutorial 68

Infrequent samples on the same path can reinforce each other, and they collectively form a more ‘dense’ trajectory

Find popular route from uncertain trajectories •  Find PR for a given sequence of locations •  Two-phase approach

– Local PR construction – Global PR search

25/04/2014 DASFAA 2014 Tutorial 69

Find popular route from uncertain trajectories without road networks •  A road network is not always available or

applicable

25/04/2014 DASFAA 2014 Tutorial 70

1.  Discretize space into disjoint cells 2.  Derive the transfer graph using cells 3.  Infer frequent ‘virtual edges’ on the graph

L.-Y. Wei, Y. Zheng, and W.-C. Peng. Constructing popular routes from uncertain trajectories. In ACM SIGKDD, pages 195–203, 2012.

Time period-based most frequent path (TPMFP) •  Find the most frequent path for specific s/d and

time period •  Desired properties for a MFP

– Suffix optimal: suffix of a MFP is also a MFP – Length insensitive: MFP shouldn’t favor long/short

route – Bottleneck free: MFP shouldn’t contain infrequent

edges

25/04/2014 DASFAA 2014 Tutorial 71

Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni. Finding Time Period-Based Most Frequent Path in Big Trajectory Data. SIGMOD 2013

Time period-based most frequent path (TPMFP) •  Footmark graph

– A weighted sub-graph – Edge frequency: number of trajectories reaching vd

during T –  Path frequency: non-decreasingly sorted sequence of

edge frequencies

25/04/2014 DASFAA 2014 Tutorial 72

v1 à v2: 14, v2 à v3: 10 v3 à v12: 10, v2 à v12: 8

V1 à v2 à v12: (8, 14) V1àv2àv3àv12: (10,10,14) V1àv10àv11àv12: (1,21,21)

Time period-based most frequent path (TPMFP) •  Compare path frequency

– More-frequent-than relation (>) –  F > F’ if their first different value fj > fj’ –  (10,10,14) > (8,14) > (1,21,21)

•  Property – A total order, so MFP always exist – Guarantee the suffix optimal – Length of path doesn’t matter a lot –  Path with infrequent edge frequency be disadvantaged

25/04/2014 DASFAA 2014 Tutorial 73

PR search is more challenging •  Specific source/destination (and time

constraint) – Data are sparse – How to utilize more relevant data?

•  Online performance – Efficiency is critical

25/04/2014 DASFAA 2014 Tutorial 74

Summary •  Background

–  What is trajectory –  Where do they come from –  Why are they useful –  Characteristics

•  Trajectory similarity search –  Query classification –  Trajectory similarity measures –  Trajectory index

•  Similarity-based trajectory mining –  Popular route mining –  Co-traveller discovery –  Trajectory clustering

25/04/2014 DASFAA 2014 Tutorial 75

Thank you •  Questions

– kevinz@itee.uq.edu.au •  DKE group at UQ

– http://www.itee.uq.edu.au/dke/dke-lab

•  My research – http://staff.itee.uq.edu.au/kevinz/

25/04/2014 DASFAA 2014 Tutorial 76