Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

Post on 23-Jan-2015


Transcript of Event Clusters Detection on Flickr Images using a Suffix-Tree Structure

1

Event Cluster Detection on Flickr Images using a Suffix-Tree Structure

Massimiliano Ruocco and Heri Ramampiaro

Dept. of Computer and Information Science, Norwegian University of Science and Technology

ruocco@idi.ntnu.no

Massimiliano Ruocco – Event Cluster Detection on Flickr Images using a Suffix-Tree Structure – IEEE ISM2010

3

Outline

1.  Introduction
    1.  Problem Statement
    2.  Related Works
    3.  Contributions
2.  Proposed approach
    1.  Problem definition
    2.  Preliminary
    3.  Algorithm Overview
3.  Evaluation
4.  Conclusions

7

Problem Statement

Event Detection

-  Event detection has its origin in the TDT (Topic Detection and Tracking) project (1)

-  Objective: aggregate stories over time into single event topics

-  Event: "Something happening in a certain place at a certain time" [Yang, Pierce, Carbonell 1999]

(1) http://projects.ldc.upenn.edu/TDT/

11

Problem Statement

Event Detection

-  Most previous works focus on time-tagged document streams and can be classified as:

    -  Retrospective detection: discover previously unidentified events in a collection of news [Yang et al. 1998]

    -  Online detection: detect events in real time from a stream of news [Brants et al. 2003]

15

Problem Statement

Web Photo-Sharing Apps – New Needs

Huge Amount of Pictures

Time          User   Location       Tags
26 Oct 2010   RMax   26:12, 23:14   Roma, Sky, Bridge
…

New Needs

-  Knowledge Extraction
-  Browse
-  Retrieve

17

Problem Statement

Challenges

-  Event detection on tagged pictures from photo-sharing apps
    -  Web-scale environment
    -  Use of contextual information
    -  Noisy annotation

22

Related Works

-  Event Clustering (Visual/Temporal information) [Loui, Savakis 2002]
    -  Albuming of user photo collections
    -  Not scalable to large datasets
    -  Limited to user photo collections
    -  No location information

-  Event/Place Semantic Identification (Temporal information) [Rattenbury et al. 2007]
    -  Extraction of event and place semantics for tags assigned to Flickr photos
    -  Scale-Structure Identification (SSI) method to analyze the tag usage distribution
    -  SSI is limited for large datasets
    -  Location information is not considered

-  Event Tag Detection (Spatial/Temporal information) [Chen, Roy 2009]
    -  Detects event tags from Flickr photos
    -  Like [Rattenbury et al. 2007], uses the SSI method to analyze the tag usage distribution
    -  SSI is applied over the temporal and spatial distributions simultaneously

27

Problem Definition

Hypothesis

Something happening in a certain place at a certain time [Yang, Pierce, Carbonell 1999]

Something happening in a certain place at a certain time with a certain tag

Event Cluster $e_k \Rightarrow \{\, t_i = t_j,\ dt_i = dt_j,\ g_i = g_j,\ \forall\, I_i, I_j \in e_k \,\}$

Not the opposite!

30

Problem Definition

Hypothesis – Landmark clusters

[Figure: time vs. location plot for the tag "colosseo" at location g, with a time interval dt marked; regions labelled "Event Cluster $e_k$" and "Landmark Clusters"]

Event Cluster $e_k \Rightarrow \{\, t_i = t_j,\ dt_i = dt_j,\ g_i = g_j,\ \forall\, I_i, I_j \in e_k \,\}$

Not the opposite!

35

Problem Definition

Hypothesis – Event clusters

[Figure: time vs. location plot for the tag "applepies" at location g, with a time interval dt marked; regions labelled "Event Cluster $e_k$", "Landmark Clusters" and "Event Clusters"]

Event Cluster $e_k \Rightarrow \{\, t_i = t_j,\ dt_i = dt_j,\ g_i = g_j,\ \forall\, I_i, I_j \in e_k \,\}$

The opposite is true!

38

Problem Definition

New Formulation

[Figure: two time vs. location plots for the tag "applepies" at location g, one of them with the time interval dt marked, joined by an "=" sign; the corresponding image sets are labelled $S_{dgt}$ and $S_{gt}$]

Event Cluster $e_k$: $\{\, (dt, g, t) : S_{dgt} = S_{gt} \,\}$ with $S_{dgt} = e_k$
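The formulation can be spelled out set by set. This is one reading of the slide rather than the authors' exact notation: it assumes $\mathcal{I}$ is the image collection, $T_i$, $g_i$ and $dt_i$ are the tag set, grid cell and time interval of image $I_i$, $S_{gt}$ is the set of images in cell $g$ carrying tag $t$, and $S_{dgt}$ is the part of that set taken during $dt$.

```latex
\begin{align*}
S_{gt}  &= \{\, I_i \in \mathcal{I} : g_i = g,\ t \in T_i \,\} \\
S_{dgt} &= \{\, I_i \in S_{gt} : dt_i = dt \,\} \\
(dt, g, t) \text{ defines an event cluster } e_k = S_{dgt}
  &\iff S_{dgt} = S_{gt}
\end{align*}
```

Under this reading, a landmark tag such as "colosseo" fails the test whenever it also appears in the same cell outside $dt$, because then $S_{dgt} \subsetneq S_{gt}$.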

47

Preliminary

Suffix-Tree Clustering [Zamir 1998]

-  Suffix-tree based
-  Mainly used in text (web) document clustering
-  Three-step process:
    1.  Document cleaning
    2.  Base cluster identification
    3.  Base cluster merging
-  Incremental clustering
-  Cluster label inferred from the tree structure
-  Phrase-based model
-  Snippet-tolerant
-  Overlapping clusters

51

Preliminary

Suffix-Tree

-  Given a string S, its suffix tree is a compact trie containing all the suffixes of S
    -  Rooted directed tree
    -  Each internal node other than the root has at least two children
    -  Each edge leaving a particular node is labelled with a non-empty substring of S

-  Suffix-tree construction runs in linear time, O(n) [Ukkonen 1995]

Example: suffixes of "papua": 'papua', 'apua', 'pua', 'ua', 'a'
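To make the definition concrete, here is a minimal Python sketch that enumerates the suffixes of "papua" and stores them in a trie. It is deliberately not Ukkonen's algorithm: it builds an uncompressed trie (one character per edge) in O(n^2) time, and the function names and the "$" terminator are illustrative choices.

```python
def suffixes(s):
    """Return every non-empty suffix of s, longest first."""
    return [s[i:] for i in range(len(s))]


def build_suffix_trie(s, terminator="$"):
    """Naive O(n^2) suffix trie as nested dicts (one character per edge)."""
    text = s + terminator  # the terminator makes every suffix end at a leaf
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """A pattern is a substring of s iff it labels a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("papua")
print(suffixes("papua"))      # ['papua', 'apua', 'pua', 'ua', 'a']
print(contains(trie, "pua"))  # True
print(contains(trie, "upa"))  # False
```

A compact suffix tree would additionally collapse every chain of single-child nodes into one edge labelled with a substring, which is what gives the "at least two children" property listed above.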

53

Algorithm Overview

Input: images $I_i = (T, g, dt)$

1.  Data cleaning
2.  Data extension
3.  Suffix-tree construction
4.  Event cluster extraction
5.  Event cluster merging

Output: labelled event clusters, e.g. "… Primary, Party, Election, Campaign" and "… Concert, Music, John …"

60

Algorithm Overview Data Cleaning and Extension

-  Cleaning: $I_i = (T, g, dt) \rightarrow I_i' = (T', g, dt)$
    -  Stopword removal (with extended vocabulary) + stemming

-  Extension: $I_i' = (T', g, dt) \rightarrow I_i'' = (T'', g, dt)$
    -  Spatial and temporal information are encoded in the annotation set T

$T'' = \{t''_1, \dots, t''_l\}$, $t''_i = [s_1(dt) + s_2(g) + t_i]$

where $s_1$ and $s_2$ are encoding functions from date and location to strings; they define the granularity in time and in space (geographical grid)

Example, with $s_1$(26/10/2010) = 26Oct2010 and $s_2$(43.777864, 11.249029) = 43.77:11.24:

T'  = {acmm2010, florence, multimedia}
T'' = {26Oct2010 43.77:11.24 acmm2010, 26Oct2010 43.77:11.24 florence, 26Oct2010 43.77:11.24 multimedia}
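The extension step can be sketched in Python as follows. The slides do not define s1 and s2 precisely, so this is only one plausible reading: it assumes s1 formats a day as day, month abbreviation, year, and s2 snaps each coordinate down to the lower edge of its grid cell, which reproduces the 43.77:11.24 example. The function names and the `precision` parameter are hypothetical.

```python
import math
from datetime import date
from decimal import Decimal

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def s1(d):
    """Encode a date at one-day granularity, e.g. 26Oct2010 (assumed format)."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"


def s2(lat, lon, precision="0.01"):
    """Snap a coordinate to the lower corner of its grid cell (assumed rule)."""
    step = Decimal(precision)

    def snap(x):
        # Decimal arithmetic avoids binary floating-point artefacts in the label
        return math.floor(Decimal(str(x)) / step) * step

    return f"{snap(lat)}:{snap(lon)}"


def extend_tags(tags, d, lat, lon, precision="0.01"):
    """Build T'' by prefixing every cleaned tag with s1(dt) and s2(g)."""
    prefix = f"{s1(d)} {s2(lat, lon, precision)}"
    return [f"{prefix} {tag}" for tag in tags]


extended = extend_tags(["acmm2010", "florence", "multimedia"],
                       date(2010, 10, 26), 43.777864, 11.249029)
for phrase in extended:
    print(phrase)
# 26Oct2010 43.77:11.24 acmm2010
# 26Oct2010 43.77:11.24 florence
# 26Oct2010 43.77:11.24 multimedia
```

Putting the date and grid cell in front of the tag means that images sharing the same (dt, g, t) produce an identical three-token phrase, which is what a phrase-based suffix-tree clustering can then group.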

68

Algorithm Overview ST Construction and Event Extraction

-  Image $I_i''$: document snippet

-  Extract candidate event clusters $\Psi_l$:
    -  $\Psi_l([s_1(dt) + s_2(g) + t_i])$
    -  Extract $\Psi'_l([s_2(g) + t_i])$
    -  Compare $\Psi_l$ and $\Psi'_l$
    -  IF ($\Psi_l = \Psi'_l$) THEN $\Psi_l([s_1(dt) + s_2(g) + t_i])$ is an event cluster
    -  Label inferred from the structure

$I_i'' = (T'', g, dt)$, $T'' = \{t''_1, \dots, t''_l\}$, $t''_i = [s_1(dt) + s_2(g) + t_i]$

Event Cluster $e_k$: $\{\, (dt, g, t) : S_{dgt} = S_{gt} \,\}$ with $S_{dgt} = e_k$

[Figure: suffix tree with the nodes $\Psi_l$ and $\Psi'_l$ highlighted]
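The comparison between the two image sets can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it swaps the suffix tree for plain dictionaries keyed by (dt, g, tag) and (g, tag), which yields the same image sets but none of the tree's incremental or labelling machinery. The function name, the tuple layout and the sample images are made up for illustration.

```python
from collections import defaultdict


def extract_event_clusters(images):
    """images: iterable of (image_id, tags, g_cell, dt_cell) tuples.

    Returns {(dt_cell, g_cell, tag): set of image ids} for every key whose
    image set S_dgt equals S_gt, the set for the same (g_cell, tag) taken
    over all time intervals.
    """
    s_dgt = defaultdict(set)  # plays the role of Psi_l
    s_gt = defaultdict(set)   # plays the role of Psi'_l
    for image_id, tags, g_cell, dt_cell in images:
        for tag in tags:
            s_dgt[(dt_cell, g_cell, tag)].add(image_id)
            s_gt[(g_cell, tag)].add(image_id)
    return {
        key: members
        for key, members in s_dgt.items()
        if members == s_gt[(key[1], key[2])]
    }


# Made-up sample: "colosseo" recurs on two days, "acmm2010" on one day only.
images = [
    (1, ["colosseo"], "41.89:12.49", "25Oct2010"),
    (2, ["colosseo"], "41.89:12.49", "26Oct2010"),
    (3, ["acmm2010", "florence"], "43.77:11.24", "26Oct2010"),
    (4, ["acmm2010", "florence"], "43.77:11.24", "26Oct2010"),
    (5, ["florence"], "43.77:11.24", "27Oct2010"),
]
clusters = extract_event_clusters(images)
for key in sorted(clusters):
    print(key, sorted(clusters[key]))
# ('26Oct2010', '43.77:11.24', 'acmm2010') [3, 4]
```

In the sample, "colosseo" and "florence" are rejected because the same tag appears in the same cell on another day, while "acmm2010" passes because all of its images fall in a single interval.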

72

Algorithm Overview Extraction and Merge

-  Extracted event clusters: $\{e_1, \dots, e_n\}$
-  Merge semantically similar clusters:

$\theta(e_i, e_j) = \dfrac{|e_i \cap e_j|}{\min(|e_i|, |e_j|)}$
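A possible way to apply the similarity θ when merging is sketched below in Python. The slide gives only the measure, so the rest is assumed: the threshold of 0.5 is a placeholder value, and merging linked clusters as connected components (via a small union-find) is borrowed from how suffix-tree clustering usually merges base clusters, not taken from the talk.

```python
def overlap(a, b):
    """theta(e_i, e_j) = |e_i & e_j| / min(|e_i|, |e_j|) for non-empty sets."""
    return len(a & b) / min(len(a), len(b))


def merge_clusters(clusters, threshold=0.5):
    """Merge clusters linked by theta >= threshold (connected components)."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if overlap(clusters[i], clusters[j]) >= threshold:
                parent[find(i)] = find(j)

    merged = {}
    for i, cluster in enumerate(clusters):
        merged.setdefault(find(i), set()).update(cluster)
    return list(merged.values())


clusters = [{1, 2, 3, 4}, {3, 4, 5}, {8, 9}]
print(round(overlap(clusters[0], clusters[1]), 3))        # 0.667
print([sorted(c) for c in merge_clusters(clusters)])      # [[1, 2, 3, 4, 5], [8, 9]]
```

Dividing by the smaller cluster means a small cluster fully contained in a large one scores 1.0, so near-subsets are always merged.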

74

Evaluation - Dataset

-  Dataset collected from Flickr
    -  Only geo-tagged pictures
    -  12 June 2008 – 11 June 2010 (729 days)
    -  San Francisco area
    -  #Images ≈ 350K, #Tags ≈ 3M

80

Evaluation - Measure

-  List of ranked clusters: $\{e_1, e_2, \dots\}$
-  Ranking according to cluster size: $|e_i|$
-  Drawback: lack of ground truth, so recall cannot be measured

Top-K Precision: $\dfrac{R_K}{K}$

$R_K$: number of relevant clusters among the first K returned

Top-20 (K = 20)
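The measure is simple enough to state as code. The Python sketch below assumes the clusters are already ranked by decreasing size and that each one has been judged relevant or not by hand; the function name and the judgment list are illustrative.

```python
def top_k_precision(ranked_relevance, k):
    """Fraction of the first k ranked clusters judged relevant (R_K / K)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


# Hypothetical judgments: 15 of the 20 largest clusters are real events.
judged = [True] * 15 + [False] * 5
print(top_k_precision(judged, 20))  # 0.75
print(top_k_precision(judged, 3))   # 1.0
```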

81

Evaluation

-  Experiments on different granularities in time and space

-  Time:

Granularity   1 day       1 week
Example       2008Oct12   2008:43

-  Space:

Latitude Precision   Longitude Precision   Square Size (Meters)
0.01                 0.01                  1000m x 1000m
0.005                0.005                 500m x 500m
0.002                0.002                 200m x 200m
0.001                0.001                 100m x 100m

82

Evaluation - Results

           100 m                   200 m                   500 m                   1000 m
           1 Day       1 Week      1 Day       1 Week      1 Day       1 Week      1 Day       1 Week
#Clusters  #Ev.  Prec. #Ev.  Prec. #Ev.  Prec. #Ev.  Prec. #Ev.  Prec. #Ev.  Prec. #Ev.  Prec. #Ev.  Prec.
1          1     100%  1     100%  1     100%  1     100%  1     100%  1     100%  1     100%  1     100%
2          2     100%  2     100%  2     100%  2     100%  2     100%  2     100%  2     100%  1     50%
3          3     100%  3     100%  3     100%  3     100%  3     100%  3     100%  3     100%  2     67%
20         15    75%   14    70%   15    75%   14    70%   14    70%   13    65%   13    65%   14    70%

83

Evaluation - Results

[Figure: Top-20 precision]

91

Conclusion

-  Novel algorithm for event cluster extraction:
    -  From a large amount of Flickr images
    -  Multi-user photo collection
    -  Incremental clustering algorithm
-  Extension of STC, previously used only to cluster text documents
-  Based on a suffix tree (construction in O(n))
-  Automatic annotation of clusters
-  Noise reduction in the tags using an extended vocabulary for stopword removal
-  Spatial and temporal information considered
-  Analysis of different granularities of time and space

93

Thanks (谢谢) for the attention!

QUESTIONS?

http://www.idi.ntnu.no/~ruocco/