Similarity-based Analysis for Trajectory Data Kevin Zheng 25/04/2014 DASFAA 2014 Tutorial 1




Outline
• Background
  – What is a trajectory
  – Where do they come from
  – Why are they useful
  – Characteristics
• Trajectory similarity search
  – Query classification
  – Trajectory similarity measures
  – Trajectory index
• Similarity-based trajectory mining
  – Popular route mining
  – Co-traveller discovery
  – Trajectory clustering


What is a trajectory?
• Historical location records of moving objects
• In mathematics
  – Continuous function: time → location
  – Location can be of any dimension
• In real applications
  – Locations are sampled periodically
  – A finite sequence of time-stamped locations: <p1, t1>, <p2, t2>, …, <pn, tn>
  – p: two or three dimensions (e.g. longitude, latitude)



Where is it from?
• GPS module on moving objects
  – Vehicles, mobile phone users, animals
• Online social network
  – Twitter, Flickr, Facebook, Weibo
• Sensors
  – Surveillance cameras, RFID, WiFi
• More …


Who cares about it?
• Government
  – Traffic pattern analysis
  – Public transportation management
  – Urban planning
• Business
  – Location-based services
  – Personalized advertisement & recommendation
  – Taxi companies, logistics companies
• Scientists & researchers
  – Zoologists, meteorologists, astronomers
  – Open problems, challenging tasks
• More …


Trajectory data are BIG
• Volume
• Velocity
• Variety


Volume
• In 2010, 1 billion vehicles
  – Taxi and logistics companies keep tracking their vehicles
  – Self-driving cars in the near future?
• In 2012, 1.08 billion smartphone users
• In 2013, 20 million surveillance cameras in China
• They are all data generators!
  – The data keep accumulating


Velocity
• Not just huge: the data are also being generated quickly
• Vehicle tracking & navigation
  – Re-position every few seconds
• Geo-tagged social media
  – 2 million Flickr photos per day, 5% geo-tagged
  – 100 million posts on Sina Weibo per day, 1-2% geo-tagged
  – 400 million tweets per day, 1% geo-tagged
• Sensors
  – How many cars pass a road camera every day?


Geo-tagged tweets


Images courtesy of Twitter


Variety
• Data source
• Tracking devices
  – Car GPS, smartphones, sensors
• Tracking methods
  – Sampling strategy, sampling rate
• Spatial length & temporal duration
• Data quality


Research directions
• Scalable, real-time data processing
• Flexible database storage and index
• Effective similarity measures
• Uncertainty management
• Data compression

Key and fundamental research problem: similarity-based analysis


Similarity-based analysis for trajectories
• Core problem: trajectory similarity search
  – Input: a trajectory dataset D, a query Q
  – Output: the subset of D whose trajectories are ‘similar’ to Q
• Foundation
  – Trajectory similarity measures
• Approach
  – Index and search algorithm
• Application
  – Popular route mining (route recommendation)
  – Co-traveller discovery, clustering, classification, etc.


Similarity query classification
• P-query
  – Query: point(s)
• R-query
  – Query: region (spatial & temporal dimension)
• T-query
  – Query: trajectory


P-query (single point)
• Query location: q
• Temporal constraint (optional): tc = [ts, te]
• Distance from q to a trajectory T: $D(q, T) = \min_{p \in T,\ p \text{ satisfies } tc} dist(q, p)$
• dist(q, p) can be:
  – Lp-norm
  – Network distance

[Figure: a query point q and trajectories restricted to the time interval [ts, te]]

[Tao2002] Tao Y., Papadias D. and Shen Q., Continuous nearest neighbour search, VLDB, 2002


P-query (multiple points)
• Query locations Q: q1, q2, q3, q4
• D(Q,T) is an aggregate function of D(q,T)

[Chen2010] Chen Z., Shen HT., Zhou X., Zheng Y. and Xie X., Searching trajectories by locations – an efficiency study. SIGMOD 2010
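The two P-query distances can be sketched in a few lines. This is a minimal illustration, not the method of the cited papers. It assumes a trajectory is a list of `((x, y), t)` samples, that dist(q, p) is the Euclidean distance, and that the aggregate for multiple points is a plain sum. The function names and the choice of aggregate are assumptions made for this example.

```python
import math

def point_to_trajectory(q, traj, tc=None):
    """D(q, T): smallest distance from q to any sample of T that satisfies tc.

    traj is a list of ((x, y), t) samples; tc is an optional (ts, te) pair.
    Returns math.inf when no sample falls inside the time window.
    """
    best = math.inf
    for p, t in traj:
        if tc is not None and not (tc[0] <= t <= tc[1]):
            continue  # sample violates the temporal constraint
        best = min(best, math.dist(q, p))
    return best

def points_to_trajectory(queries, traj, aggregate=sum):
    """D(Q, T): aggregate of the per-point distances (sum is an assumed choice)."""
    return aggregate(point_to_trajectory(q, traj) for q in queries)
```

Passing `aggregate=max` instead would rank trajectories by their worst-served query location, which is another plausible reading of "aggregate function".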


R-query
• Spatial region: R
• Temporal interval: [ts, te]
• Ask for trajectories in a given region during a time interval

[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000
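Without any index, an R-query is a linear scan. The sketch below assumes a rectangular region given as `(xmin, ymin, xmax, ymax)` and a dataset stored as a dictionary from trajectory id to `((x, y), t)` samples. Both layouts are assumptions for illustration. It only tests the sampled points, so a trajectory that crosses R between two samples is missed.

```python
def r_query(dataset, region, ts, te):
    """Return ids of trajectories with at least one sample inside
    `region` during the time interval [ts, te].

    dataset: dict mapping trajectory id -> list of ((x, y), t) samples
    region:  (xmin, ymin, xmax, ymax), an axis-aligned rectangle
    """
    xmin, ymin, xmax, ymax = region
    hits = []
    for tid, traj in dataset.items():
        if any(xmin <= x <= xmax and ymin <= y <= ymax and ts <= t <= te
               for (x, y), t in traj):
            hits.append(tid)
    return hits
```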


T-query
• Query: a trajectory Tq
• How to measure the distance between Tq and a trajectory in the dataset?


Trajectory similarity measures
• Many-to-many mapping
• Different semantics/applications
• Different lengths
• Different sampling rates
• Noise
• Temporal dimension?


Classification
• Spatial-only (consider location only) vs. spatial-temporal (consider both location and time)
• Discrete (based on location samples) vs. continuous (based on line segments or curves)
• Discrete, spatial-only: Lp-norm, DTW, LCSS, EDR
• Discrete, spatial-temporal: DTW, LCSS, EDR with time constraint
• Continuous, spatial-only: OWD, LIP
• Continuous, spatial-temporal: Synchronous Euclidean Distance


Lp-norm
• Average Lp-norm distance of all matched locations
• 1-to-1 mapping
• Requires trajectories of the same length


Lp-norm
• Cannot detect similar trajectories with different sampling rates
• Sensitive to noise
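A minimal sketch of the Lp-norm trajectory distance as described on these slides: the i-th location of one trajectory is matched with the i-th location of the other, and the per-pair Lp distances are averaged. The function name and the tuple-based point representation are assumptions for this example.

```python
def lp_norm_distance(a, b, p=2):
    """Average Lp distance between the i-th points of two trajectories.

    a and b are equal-length, non-empty lists of coordinate tuples.
    """
    if len(a) != len(b) or not a:
        raise ValueError("Lp-norm needs two non-empty trajectories of the same length")
    total = 0.0
    for u, v in zip(a, b):  # strict 1-to-1 mapping by position
        total += sum(abs(x - y) ** p for x, y in zip(u, v)) ** (1.0 / p)
    return total / len(a)
```

The length check makes the 1-to-1 restriction explicit: a trajectory sampled twice as often cannot be compared at all, and one noisy sample shifts the average directly.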


DTW
• Dynamic Time Warping distance
  – Adapted from a time series distance measure
  – Used to handle time shift and scale in time series
• Optimal order-aware alignment between two sequences
  – Goal: minimize the aggregate distance between matched points
• 1-to-many mapping

Yi, Byoung-Kee, Jagadish, HV and Faloutsos, Christos, Efficient retrieval of similar time sequences under time warping. ICDE 1998


DTW for trajectories
• Nothing to do with ‘time’ at all
• Useful for detecting similar trajectories with different sampling rates
• Sensitive to noise
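The order-aware alignment can be computed with the textbook dynamic program. The sketch below assumes 2D points, Euclidean point distance, and the sum of matched distances as the aggregate. These are common choices, not necessarily the exact variant in the cited paper.

```python
import math

def dtw(a, b):
    """Dynamic Time Warping distance between two non-empty point sequences.

    d[i][j] holds the cheapest order-preserving alignment of a[:i] and b[:j];
    a point may be matched with several points of the other sequence.
    """
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a[i-1] reuses b[j-1]
                                 d[i][j - 1],      # b[j-1] reuses a[i-1]
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]
```

Only the order of the samples is used, never their timestamps, which is why the slide says DTW has nothing to do with ‘time’ when applied to trajectories.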


LCSS
• Longest Common Sub-Sequence
• Adaptation of string similarity
  – LCSS(‘abcde’, ‘bd’) = 2
• Threshold-based equality relationship
  – Two locations are regarded as equal if they’re ‘close’ (compared to a threshold)
• 1-to-(1 or null) mapping

Vlachos, M., Gunopulos, D., and Kollios, G. Discovering similar multidimensional trajectories. ICDE 2002


LCSS
• Insensitive to noise
• Not easy to define the threshold
• May return dissimilar trajectories

[Figure: trajectory p1…p5 matched against trajectory p’1…p’3]
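A minimal sketch of threshold-based LCSS using the usual dynamic program. It assumes two locations are ‘close’ when their Euclidean distance is at most `eps`. The cited paper tests each coordinate separately, so the matching rule here is a simplifying assumption, as are the names.

```python
import math

def lcss(a, b, eps):
    """Length of the longest common subsequence of two point sequences,
    where two points count as equal if they are within eps of each other."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if math.dist(a[i - 1], b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1   # matched pair
            else:
                table[i][j] = max(table[i - 1][j],      # leave a[i-1] unmatched
                                  table[i][j - 1])      # leave b[j-1] unmatched
    return table[n][m]
```

An outlier simply stays unmatched, which is where the noise insensitivity comes from. The result is a count, so it needs normalisation (for example by the shorter length) before it can be compared across trajectory pairs.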


EDR
• Edit Distance on Real sequence
• Adaptation of Edit Distance on strings
  – Number of insert, delete, and replace operations needed to convert A into B
• Threshold-based equality relationship
  – Two locations are regarded as equal if they’re ‘close’ (compared to a threshold)

Lei Chen, M. Tamer Ozsu, Vincent Oria, Robust and Fast Similarity Search for Moving Object Trajectories. SIGMOD 2005


EDR
• The value is the number of operations, not the “distance between locations”
  – Insensitive to noise

[Figure: trajectories p1…p5 and p’1…p’3 aligned using two insert operations and one replace operation]
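EDR can be sketched as the classic edit-distance dynamic program with a threshold test in place of character equality. The version below assumes unit cost for insert, delete, and replace, and treats two locations as equal when every coordinate differs by at most `eps`. The function names are illustrative.

```python
def close(p, q, eps):
    """Two locations match if every coordinate differs by at most eps."""
    return all(abs(u - v) <= eps for u, v in zip(p, q))

def edr(a, b, eps):
    """Edit Distance on Real sequence with unit-cost operations."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            subcost = 0 if close(a[i - 1], b[j - 1], eps) else 1
            d[i][j] = min(d[i - 1][j - 1] + subcost,  # match or replace
                          d[i - 1][j] + 1,            # delete a[i-1]
                          d[i][j - 1] + 1)            # insert b[j-1]
    return d[n][m]
```

A far-away outlier costs exactly one operation no matter how far away it is, which is the sense in which the value counts operations and not distance.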


LCSS and EDR
• They are both count-based
  – LCSS counts the number of matched pairs
  – EDR counts the cost of operations needed to fix the unmatched pairs
• Higher LCSS, lower EDR
• If cost(replace) = cost(insert) + cost(delete):
  – EDR(X,Y) = L(X) + L(Y) – 2LCSS(X,Y)
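The identity can be checked numerically on plain strings, where "equal" is exact character equality. The sketch below assumes unit cost for insert and delete, so the condition on the slide means replace costs 2. The helper names are made up for this example.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of two strings."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(x)][len(y)]

def weighted_edit_distance(x, y, replace_cost=2):
    """Edit distance with insert = delete = 1 and a configurable replace cost."""
    d = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        d[i][0] = i
    for j in range(len(y) + 1):
        d[0][j] = j
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else replace_cost
            d[i][j] = min(d[i - 1][j - 1] + sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(x)][len(y)]

x, y = "abcde", "bd"
# L(X) + L(Y) - 2 * LCSS(X, Y) should equal the weighted edit distance.
assert weighted_edit_distance(x, y) == len(x) + len(y) - 2 * lcs_length(x, y)
```

With replace priced as one insert plus one delete, a replace never beats skipping both elements, so every element outside the common subsequence costs exactly one operation.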


OWD
• The One Way Distance from T1 to T2 is:
  – The integral of the distance from the points of T1 to T2
  – Divided by the length of T1
• Make it into a symmetric measure

Bin Lin, Jianwen Su, One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories. In Geoinformatica (2008)


OWD example
• Consider one trajectory as piece-wise line segments, and the other as discrete samples
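A rough sketch of that discretisation: the samples of T1 stand in for the integral, and T2 is treated as a polyline. Two assumptions are made here that the slides do not state. The mean over samples is used as the approximation of "integral divided by length", which is only reasonable when T1 is sampled evenly along its path. The symmetric version is taken to be the average of the two one-way distances.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (2D)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, a)           # degenerate segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))            # clamp the projection onto the segment
    return math.dist(p, (ax + t * dx, ay + t * dy))

def point_polyline_distance(p, polyline):
    """Distance from p to the nearest point on a polyline."""
    if len(polyline) == 1:
        return math.dist(p, polyline[0])
    return min(point_segment_distance(p, a, b)
               for a, b in zip(polyline, polyline[1:]))

def one_way_distance(samples, polyline):
    """Approximate OWD: mean distance from the samples of T1 to polyline T2."""
    return sum(point_polyline_distance(p, polyline) for p in samples) / len(samples)

def symmetric_owd(t1, t2):
    """Assumed symmetric form: average of the two one-way distances."""
    return 0.5 * (one_way_distance(t1, t2) + one_way_distance(t2, t1))
```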


LIP distance
• Locality In-between Polylines
  – Based on the set of polygons formed between the intersection points of the two polylines

Nikos Pelekis et al, Similarity Search in Trajectory Databases. Symposium on Temporal Representation and Reasoning 2007


LIP distance
• Only works for 2-dimensional trajectories
• Polygon → polyhedron: a non-trivial change


Spatial-temporal similarity measure
• All the spatial-only similarity measures can incorporate the time information in trajectories
• Lp-norm and DTW can apply a temporal constraint for more synchronized alignment
• EDR and LCSS can apply a temporal threshold on top of the spatial threshold


DTW with temporal constraint

[Figure: DTW alignment between time-stamped samples of two trajectories, with time tolerance = 2]


Spatial-temporal LCSS
• A temporal threshold controls how far in time we can go in order to match two locations

[Figure: trajectory p1…p5 with timestamps t1=1, t2=2, t3=4, t4=7, t5=11, and trajectory p’1…p’3 with timestamps t’1=1, t’2=3, t’3=4; time threshold = 2]

Vlachos, M., Gunopulos, D., and Kollios, G. Discovering similar multidimensional trajectories. ICDE 2002
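Adding the temporal threshold changes only the matching test in the LCSS dynamic program. The sketch below assumes samples of the form `((x, y), t)`, a Euclidean spatial threshold `eps`, and a threshold `delta` on the timestamp difference. The cited paper may define the temporal window differently, and the coordinates in the usage example are invented; only the timestamps follow the figure.

```python
import math

def st_lcss(a, b, eps, delta):
    """LCSS where two samples match only if they are within eps in space
    and within delta in time. a and b are lists of ((x, y), t)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (p, tp), (q, tq) = a[i - 1], b[j - 1]
            if math.dist(p, q) <= eps and abs(tp - tq) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]

# Timestamps from the figure; the locations are hypothetical.
T1 = [((0, 0), 1), ((1, 0), 2), ((2, 0), 4), ((3, 0), 7), ((4, 0), 11)]
T2 = [((0, 0.1), 1), ((2, 0.1), 3), ((4, 0.1), 4)]
matches = st_lcss(T1, T2, eps=0.5, delta=2)
```

In this made-up layout the last pair is spatially close but 7 time units apart, so the time threshold of 2 rejects it.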


SED
• Synchronous Euclidean Distance
  – Euclidean distance between the locations of two trajectories at the same time instant
• Regards a trajectory as a continuous function of time

Mirco Nanni, Dino Pedreschi, Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems (2006)
Potamias, M., Patroumpas, K., and Sellis, T. K. Sampling trajectory streams with spatiotemporal criteria. SSDBM 2006


SED

[Figure: two trajectories with time-stamped samples; the sampling times of the two trajectories do not coincide]

Virtually create a sample point at t=3
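Creating the virtual sample amounts to interpolating a trajectory at the requested time. The sketch below assumes linear movement between consecutive samples and a trajectory stored as a time-ordered list of `((x, y), t)`. The function names are illustrative.

```python
import math

def position_at(traj, t):
    """Linearly interpolated location of a trajectory at time t.

    traj is a list of ((x, y), t) samples sorted by time; t must lie
    within the time span covered by the trajectory.
    """
    if not traj[0][1] <= t <= traj[-1][1]:
        raise ValueError("t is outside the trajectory's time span")
    for (p0, t0), (p1, t1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return p0
            w = (t - t0) / (t1 - t0)
            return tuple(u + w * (v - u) for u, v in zip(p0, p1))
    return traj[0][0]  # single-sample trajectory

def sed(a, b, t):
    """Synchronous Euclidean Distance between two trajectories at time t."""
    return math.dist(position_at(a, t), position_at(b, t))
```

To get a distance for whole trajectories, these per-instant values still have to be aggregated over the common time span, for example by averaging over all sampling times.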


Trajectory index
• Similarity measures give the way to calculate the distance between two trajectories
• However …
  – Huge number of trajectories
  – Linear scan is inefficient


Trajectory index
• 3D R-tree
• STR-tree (Spatio-temporal R-tree)
• TB-tree (Trajectory Bundle)
• Multi-version R-tree
  – Partition temporal dimension
• Grid-based index
  – Partition spatial dimension


3D R-tree
• Indexing position samples only
  – Cannot answer queries about movements in-between those samples
• Indexing line segments

[Figure: trajectory segments in a 3D space with axes x, y and time]


Problem of R-tree
• Large “dead space”
  – The MBB (minimum bounding box) covers a large portion of the space with no data
  – Low pruning power
• Trajectory preservation
  – Line segments are grouped merely by spatial proximity
  – Regardless of which trajectory they belong to
  – Retrieving a trajectory requires visiting different paths in the tree


Augmented 3D R-tree
• Augment leaf nodes with orientation information
• Distance to line segments can be approximated more accurately

[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000


STR-tree
• Extension of the augmented 3D R-tree
• Insertion
  – Try to keep the line segments belonging to the same trajectory together
  – Find the leaf node containing the predecessor
• Node split
  – Put disconnected segments into the new node
  – Put the most recent backward-connected segment into the new node

[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000


TB-tree (Trajectory Bundle)
• A leaf node only contains segments belonging to the same trajectory
• Leaf nodes containing the same trajectory are linked
• Strictly preserves trajectories

[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000


Reflection
• Previous indexes treat the spatial and temporal dimensions equally
  – 3D tree structure
• In a trajectory database, the temporal dimension is more dynamic
  – New segments are appended to existing trajectories
  – Archived trajectories are rarely updated
• Indexing spatial and temporal dimensions separately
  – Partition temporal dimension first
  – Partition spatial dimension first


Multi-version R-tree
• HR-tree
  – An R-tree is created for each timestamp, so there are many R-trees
  – These R-trees are themselves indexed
• Query for trajectories in a given region and in a given time interval:
  1. The R-tree at the timestamp is found first
  2. The trajectories in the specified region are retrieved from that R-tree

[Nascimento1998] Nascimento, M., Silva, J. Towards Historical R-trees. ACM SAC, 1998
[Tao2001a] Tao, Y., Papadias, D. Efficient historical R-trees. SSDBM, p. 0223, 2001
[Xu2005] Xu, X., Han, J., Lu, W. RT-tree: An improved R-tree indexing structure for temporal spatial databases. Int. Symp. on Spatial Data Handling, 2005
[Tao2001b] Tao, Y., Papadias, D. MV3R-tree: A spatio-temporal access method for timestamp and interval queries. VLDB, pp. 431-440, 2001


Grid-based index
• Partition space into non-overlapping cells
• Trajectory segments in each cell are indexed on the temporal dimension
• Query processing:
  – Spatial filtering
  – Temporal filtering

[Prasad2003] V. Prasad Chakka, Adam C. Everspaugh, Jignesh M. Patel, Indexing Large Trajectory Data Sets With SETI, CIDR 2003
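The two filtering steps can be illustrated with a toy in-memory grid. This is a simplified sketch and not SETI itself: it indexes individual samples instead of segments, uses a fixed square cell size, and keeps each cell as a time-sorted list instead of a per-cell temporal tree. The class and method names are assumptions.

```python
import bisect
from collections import defaultdict

class GridIndex:
    """Uniform spatial grid; each cell keeps its samples sorted by time."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (cx, cy) -> sorted [(t, traj_id, x, y)]

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, traj_id, x, y, t):
        bisect.insort(self.cells[self._cell(x, y)], (t, traj_id, x, y))

    def query(self, region, ts, te):
        """Ids of trajectories with a sample inside region during [ts, te]."""
        xmin, ymin, xmax, ymax = region
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = set()
        for cx in range(cx0, cx1 + 1):            # spatial filtering: candidate cells
            for cy in range(cy0, cy1 + 1):
                entries = self.cells.get((cx, cy), [])
                start = bisect.bisect_left(entries, (ts,))  # temporal filtering
                for t, tid, x, y in entries[start:]:
                    if t > te:
                        break
                    if xmin <= x <= xmax and ymin <= y <= ymax:  # refinement
                        hits.add(tid)
        return hits
```

The final coordinate check is needed because a candidate cell may only partly overlap the query region.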


Popular route mining
• The shortest path may not be the most favourable one
• Find the most popular/desirable path using the GPS trajectories of past travellers
• Classification
  – No specific source/destination: hot route discovery
  – Specific source/destination: popular route search


T-pattern
• A set of individual trajectories that share the property of visiting the same sequence of places with similar travel times

1. Spatial discretization: discretize space into a finite set of regions of interest (RoIs)
2. Translate trajectories into sequences of RoIs
3. Adapt sequential pattern mining algorithms with a time constraint

F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. In SIGKDD, pages 330–339, 2007


Periodic pattern
• Find the repeating patterns in an individual’s trajectory

1. Pre-defined and synchronized timestamps
2. Adaptive regions: density-based clusters
3. Trajectories are translated into sequences of regions at pre-defined time instances

N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. W. Cheung. Mining, indexing, and querying historical spatiotemporal data. In SIGKDD, pages 236–245, 2004.


Interesting travel sequence
• A sequence of interesting locations that is travelled by experienced drivers frequently
• Interestingness of a location?
  – How many experienced travellers visited it
• Experience of a traveller?
  – How many interesting locations she has visited

Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining interesting locations and travel sequences from gps trajectories. In WWW, pages 791–800, 2009.


Interesting travel sequence
• Hyperlink-Induced Topic Search (HITS)
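The mutual definition of interestingness and experience maps onto HITS with travellers as hubs and locations as authorities. The sketch below is plain HITS on a traveller-to-locations visit map, with an arbitrary fixed iteration count. It is an illustration under those assumptions and omits whatever refinements the cited paper applies. All names and the sample data are hypothetical.

```python
import math

def hits_scores(visits, iterations=50):
    """Plain HITS on a bipartite visit graph.

    visits: dict mapping traveller -> set of visited locations.
    Returns (experience, interest): hub scores for travellers and
    authority scores for locations, each normalised to unit length.
    """
    travellers = list(visits)
    locations = sorted({loc for locs in visits.values() for loc in locs})
    experience = {u: 1.0 for u in travellers}
    interest = {loc: 1.0 for loc in locations}
    for _ in range(iterations):
        # A location is interesting if experienced travellers visited it.
        interest = {loc: sum(experience[u] for u in travellers if loc in visits[u])
                    for loc in locations}
        norm = math.sqrt(sum(v * v for v in interest.values())) or 1.0
        interest = {loc: v / norm for loc, v in interest.items()}
        # A traveller is experienced if she visited interesting locations.
        experience = {u: sum(interest[loc] for loc in visits[u]) for u in travellers}
        norm = math.sqrt(sum(v * v for v in experience.values())) or 1.0
        experience = {u: v / norm for u, v in experience.items()}
    return experience, interest

visits = {"ann": {"museum", "park", "tower"},
          "bob": {"museum", "park"},
          "cat": {"museum"}}
experience, interest = hits_scores(visits)
```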


Popular route search
• Given source/destination, find/estimate the most popular route in between
• Ideally, we can just count the number of trajectories on different paths connecting the two locations

Z. Chen, H. T. Shen, and X. Zhou. Discovering popular routes from trajectories. In ICDE, pages 900–911, 2011


Popular route discovery
• In real scenarios, it’s not easy to find such well-divided groups
• Even worse, there may be no trajectory connecting the two locations at all!


Popular route discovery
• Construct a transfer network from raw trajectories as an intermediate result to capture the moving behaviors between locations
  – Node: cluster of turning points
  – Edge: trajectories passing two nodes


Popular route discovery
• Transfer probability
• Find the route with the highest joint transfer probability w.r.t. destination
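Once transfer probabilities are available, the search step can be sketched as a shortest-path problem: maximising a product of probabilities is the same as minimising the sum of their negative logarithms. The sketch below assumes the probabilities are supplied as a nested dictionary already computed for the chosen destination, and it uses Dijkstra's algorithm. How the cited paper derives the probabilities and searches the network is not shown here, and all names and numbers are hypothetical.

```python
import heapq
import math

def most_probable_route(graph, source, destination):
    """Route from source to destination with the highest product of
    edge probabilities.

    graph: dict node -> dict neighbour -> probability in (0, 1].
    Returns (route, probability), or (None, 0.0) if unreachable.
    """
    best = {source: 0.0}          # node -> smallest sum of -log(p) found so far
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == destination:
            break
        for nxt, p in graph.get(node, {}).items():
            if p <= 0:
                continue          # impossible transfer
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if destination not in best:
        return None, 0.0
    route = [destination]
    while route[-1] != source:
        route.append(prev[route[-1]])
    route.reverse()
    return route, math.exp(-best[destination])

graph = {"s": {"a": 0.6, "b": 0.4}, "a": {"d": 0.5}, "b": {"d": 0.9}, "d": {}}
route, prob = most_probable_route(graph, "s", "d")
```

In this toy network the first hop to `a` is the more likely one, yet the route through `b` has the higher joint probability, which is why the search must look at whole routes and cannot greedily pick the best next hop.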


Find popular route from uncertain trajectories
• Low-sampling-rate trajectories have a high degree of uncertainty
• Uncertain trajectories are prevalent!
• Is it possible to recover the original route given a low-sampling-rate trajectory?
• What if…
  – There is a historical set of uncertain trajectories

Kai Zheng, Yu Zheng, Xing Xie and Xiaofang Zhou. Reducing Uncertainty of Low-Sampling-Rate Trajectories. ICDE 2012


Find popular route from uncertain trajectories
• Can we find popular routes for a specific source/destination (s/d) using low-sampling-rate trajectories?

Infrequent samples on the same path can reinforce each other, and they collectively form a more ‘dense’ trajectory

Page 69

Find popular route from uncertain trajectories
• Find the popular route (PR) for a given sequence of locations
• Two-phase approach
– Local PR construction
– Global PR search

Page 70

Find popular route from uncertain trajectories without road networks
• A road network is not always available or applicable

1. Discretize space into disjoint cells
2. Derive the transfer graph using cells
3. Infer frequent ‘virtual edges’ on the graph

L.-Y. Wei, Y. Zheng, and W.-C. Peng. Constructing popular routes from uncertain trajectories. In ACM SIGKDD, pages 195–203, 2012.
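Steps 1 and 2 above can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited paper: it assumes planar (x, y) points, a uniform square grid with a hypothetical `cell_size` parameter, and a transfer graph whose edge weight is the number of observed cell-to-cell moves. Step 3 (inferring virtual edges) is not covered.

```python
from collections import Counter


def to_cell(x, y, cell_size):
    """Step 1: map a point to the index of the disjoint grid cell containing it."""
    return (int(x // cell_size), int(y // cell_size))


def cell_transfer_graph(trajectories, cell_size):
    """Step 2: count moves between consecutive distinct cells.

    Assumption: each trajectory is a list of (x, y) points in time order.
    """
    edges = Counter()
    for points in trajectories:
        cells = []
        for x, y in points:
            c = to_cell(x, y, cell_size)
            if not cells or cells[-1] != c:  # skip repeated samples in one cell
                cells.append(c)
        edges.update(zip(cells, cells[1:]))
    return edges


trajs = [
    [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (2.5, 1.5)],
    [(0.1, 0.9), (1.2, 0.3)],
]
graph = cell_transfer_graph(trajs, cell_size=1.0)
```

With low-sampling-rate data, consecutive samples often fall in non-adjacent cells, so many edges of this graph skip over cells; that gap is what the ‘virtual edges’ of step 3 are meant to address.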

Page 71

Time period-based most frequent path (TPMFP)
• Find the most frequent path (MFP) for a specific s/d and time period
• Desired properties for an MFP
– Suffix optimal: a suffix of an MFP is also an MFP
– Length insensitive: an MFP shouldn’t favor long or short routes
– Bottleneck free: an MFP shouldn’t contain infrequent edges

Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni. Finding Time Period-Based Most Frequent Path in Big Trajectory Data. SIGMOD 2013

Page 72

Time period-based most frequent path (TPMFP)
• Footmark graph
– A weighted sub-graph
– Edge frequency: number of trajectories reaching vd during T
– Path frequency: non-decreasingly sorted sequence of edge frequencies

Edge frequencies: v1 → v2: 14, v2 → v3: 10, v3 → v12: 10, v2 → v12: 8

Path frequencies:
v1 → v2 → v12: (8, 14)
v1 → v2 → v3 → v12: (10, 10, 14)
v1 → v10 → v11 → v12: (1, 21, 21)
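The path frequencies above follow directly from the edge frequencies: collect the frequency of each edge on the path and sort the values in non-decreasing order. A minimal sketch, using only the four edge frequencies listed on the slide (the function name `path_frequency` is hypothetical; the third path is omitted because its individual edge frequencies are not given):

```python
# Edge frequencies taken from the slide's example.
EDGE_FREQ = {
    ("v1", "v2"): 14,
    ("v2", "v3"): 10,
    ("v3", "v12"): 10,
    ("v2", "v12"): 8,
}


def path_frequency(path, edge_freq):
    """Return the non-decreasingly sorted tuple of edge frequencies on a path."""
    return tuple(sorted(edge_freq[(u, v)] for u, v in zip(path, path[1:])))


short_path = path_frequency(["v1", "v2", "v12"], EDGE_FREQ)
long_path = path_frequency(["v1", "v2", "v3", "v12"], EDGE_FREQ)
```

Because the sequence is sorted, its first element is always the path's least frequent edge, i.e. its bottleneck.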

Page 73

Time period-based most frequent path (TPMFP)
• Compare path frequencies
– More-frequent-than relation (>)
– F > F’ if, at the first position j where they differ, fj > fj’
– (10,10,14) > (8,14) > (1,21,21)
• Properties
– A total order, so an MFP always exists
– Guarantees suffix optimality
– Path length has little influence
– A path containing an infrequent edge is disadvantaged
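The more-frequent-than relation is a lexicographic comparison of the sorted frequency sequences, which can be sketched in a few lines. The function names are hypothetical, and one detail is an assumption of this sketch rather than something stated on the slide: when one sequence is a prefix of the other, the longer one is treated as more frequent.

```python
def more_frequent(f, g):
    """True if path frequency f is more frequent than g.

    Compares at the first position where the two sequences differ.
    Assumption of this sketch: if one sequence is a prefix of the
    other, the longer sequence wins.
    """
    for a, b in zip(f, g):
        if a != b:
            return a > b
    return len(f) > len(g)


def most_frequent_path(candidates):
    """Pick the candidate whose path frequency is maximal under more_frequent."""
    best = None
    for path, freq in candidates.items():
        if best is None or more_frequent(freq, candidates[best]):
            best = path
    return best


# Path frequencies from the slide's example.
candidates = {
    "v1-v2-v12": (8, 14),
    "v1-v2-v3-v12": (10, 10, 14),
    "v1-v10-v11-v12": (1, 21, 21),
}
mfp = most_frequent_path(candidates)
by_sum = max(candidates, key=lambda p: sum(candidates[p]))
```

Comparing bottlenecks first is what makes the result bottleneck free: ranking by the sum of edge frequencies would instead pick v1-v10-v11-v12, even though one of its edges is used by a single trajectory.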

Page 74

PR search is more challenging
• Specific source/destination (and time constraint)
– Data are sparse
– How to utilize more relevant data?
• Online performance
– Efficiency is critical

Page 75

Summary
• Background
– What is trajectory
– Where do they come from
– Why are they useful
– Characteristics
• Trajectory similarity search
– Query classification
– Trajectory similarity measures
– Trajectory index
• Similarity-based trajectory mining
– Popular route mining
– Co-traveller discovery
– Trajectory clustering

Page 76

Thank you
• Questions
– [email protected]
• DKE group at UQ
– http://www.itee.uq.edu.au/dke/dke-lab
• My research
– http://staff.itee.uq.edu.au/kevinz/
