SED2012 Dataset

Post on 11-May-2015

Presentation of the SED2012 dataset @ MMSys 2013, Oslo, Norway

The 2012 Social Event Detection Dataset

Symeon Papadopoulos¹, Emmanouil Schinas¹, Vasileios Mezaris¹, Raphaël Troncy², Yiannis Kompatsiaris¹

¹ CERTH-ITI, Thessaloniki, Greece
² EURECOM, Sophia Antipolis, France

Oslo, 28 Feb - 1 Mar 2013

SED2012 Overview

• Large collection (>160K) of CC-licensed Flickr photos and some of their metadata

• Event annotations for 149 target events (of specific categories and locations of interest)

• Primary use: social event detection
– Used in the context of MediaEval 2012 (SED task)

• Secondary uses: image geotagging, distractors in CBIR, city summarization

Dataset Overview

Flickr photo collection
• 167,332 photos
• 4,422 unique contributors
• Creative Commons licenses

Event Annotations
• Challenge 1: Technical events in Germany
• Challenge 2: Soccer events in Hamburg and Madrid
• Challenge 3: Indignados movement events in Madrid

Data Collection Process

• Flickr API: http://www.flickr.com/services/api/
• Used method flickr.photos.search with five geographical centres: Barcelona, Cologne, Hamburg, Hannover, Madrid
• Time period: Jan 2009 – Dec 2011
• All photos CC licensed
• 403 photos from the EventMedia collection

R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010
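
A geo-centred query of this kind could be assembled roughly as follows. This is only a sketch: the slides do not give the exact centre coordinates, search radius, page size, or requested extras, so the values below are assumptions chosen for illustration, and the helper name build_search_url is hypothetical. The block only builds the request URL and does not contact Flickr.

```python
from urllib.parse import urlencode

# Public Flickr REST endpoint.
FLICKR_REST = "https://api.flickr.com/services/rest/"

# Approximate city-centre coordinates (assumed; not taken from the slides).
CITY_CENTRES = {
    "Barcelona": (41.39, 2.17),
    "Cologne": (50.94, 6.96),
    "Hamburg": (53.55, 9.99),
    "Hannover": (52.37, 9.73),
    "Madrid": (40.42, -3.70),
}


def build_search_url(city, api_key, radius_km=20, page=1):
    """Build a flickr.photos.search URL for CC-licensed, geotagged photos
    taken between Jan 2009 and Dec 2011 around the given city centre."""
    lat, lon = CITY_CENTRES[city]
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": lat,
        "lon": lon,
        "radius": radius_km,                # assumed radius, in km
        "min_taken_date": "2009-01-01 00:00:00",
        "max_taken_date": "2011-12-31 23:59:59",
        "license": "1,2,3,4,5,6",           # Flickr ids of the Creative Commons licenses
        "has_geo": 1,                       # geotagged photos only
        "extras": "geo,tags,date_taken,owner_name,license",
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key.
    print(build_search_url("Madrid", "YOUR_API_KEY"))
```

Paging through the results (incrementing page until no photos are returned) and repeating the query for each of the five cities would then yield the raw collection.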

Photo Distribution

Place distribution

Yearly distribution

Language distribution

Dataset Collection Motivation

Selection of five cities (three German, two Spanish):
• Include a large amount of non-English text metadata (cf. language distribution table)
• Ensure existence of numerous events for the target types
• Include distractor images:
– Challenge 2: Cologne and Hannover as distractors for Hamburg, Barcelona as distractor for Madrid
– Challenge 3: Barcelona as distractor for Madrid

Selection of only geotagged photos:
• Ease of annotation

Selection of only CC-licensed photos:
• Reuse of collection for research

Tag Statistics (1/2)

51,611 unique tags

prevalence of location-specific tags

event-specific tags

number of users using the tag

Tag Statistics (2/2)

Figure labels: barcelona, spain, madrid

• >20K photos have no tags
• 83.9% of photos have 10 tags or fewer
• >40K tags appear fewer than 10 times
• >57% of tags appear once or twice
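
Statistics of this kind can be recomputed from the photo metadata along the following lines. The sketch assumes the tags have already been loaded as one list of strings per photo; the function name tag_statistics and the toy input are hypothetical, and a tag is counted once per photo it appears on.

```python
from collections import Counter


def tag_statistics(photo_tags):
    """photo_tags: list of tag lists, one entry per photo (assumed input format)."""
    n_photos = len(photo_tags)
    # Number of photos each tag appears on (duplicates within a photo ignored).
    tag_counts = Counter(tag for tags in photo_tags for tag in set(tags))
    n_tags = len(tag_counts)
    return {
        "unique_tags": n_tags,
        "photos_without_tags": sum(1 for tags in photo_tags if not tags),
        "share_photos_with_at_most_10_tags": (
            sum(1 for tags in photo_tags if len(tags) <= 10) / n_photos
            if n_photos else 0.0
        ),
        "tags_appearing_fewer_than_10_times": sum(
            1 for c in tag_counts.values() if c < 10
        ),
        "share_tags_appearing_once_or_twice": (
            sum(1 for c in tag_counts.values() if c <= 2) / n_tags
            if n_tags else 0.0
        ),
    }


if __name__ == "__main__":
    # Toy data for illustration only, not drawn from the dataset.
    toy = [["madrid", "spain"], ["madrid"], [], ["barcelona", "spain", "cebit"]]
    print(tag_statistics(toy))
```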

User Statistics

30 most active users contribute ~30% of dataset

60% of users contribute fewer than 10 photos
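
The two user-level figures can be derived from the list of photo owners, one entry per photo. The sketch below assumes that input format; user_statistics and its parameters are hypothetical names introduced for illustration.

```python
from collections import Counter


def user_statistics(photo_owners, top_k=30, small_threshold=10):
    """Return (share of photos contributed by the top_k most active users,
    share of users with fewer than small_threshold photos)."""
    counts = Counter(photo_owners)          # photos per user
    total_photos = len(photo_owners)
    if total_photos == 0:
        return 0.0, 0.0
    top_share = sum(c for _, c in counts.most_common(top_k)) / total_photos
    small_share = sum(1 for c in counts.values() if c < small_threshold) / len(counts)
    return top_share, small_share


if __name__ == "__main__":
    # Toy data for illustration only, not drawn from the dataset.
    owners = ["a"] * 6 + ["b"] * 3 + ["c"]
    print(user_statistics(owners, top_k=1, small_threshold=5))
```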

Ground Truth Creation

• Manual annotations by use of CrEve
– web-based annotation
– two-round annotation by five annotators (three in the first, two in the second)
– interactive annotation (search & annotate)
– each round terminated as soon as no new event-related photos were discovered
– approximate effort: 100 person-hours
• Annotations for Challenge 1 enriched by EventMedia (403 photos featuring technical events in Germany)

C. Zigkolis, S. Papadopoulos, G. Filippou, Y. Kompatsiaris, A. Vakali. Collaborative Event Annotation in Tagged Photo Collections. Multimedia Tools & Applications, 2012

Ground Truth Statistics (1/3)

10 events associated with >100 photos

~27% of events associated with 1 or 2 photos

Ground Truth Statistics (2/3)

106 events are captured by single users

9 events captured by more than 10 people

erroneous timestamps in photos

The majority of events last for less than a day (typical for soccer)

Ground Truth Statistics (3/3)

Madrid events
• Vicente Calderon stadium
• Puerta del Sol
• Santiago Bernabeu stadium
• Stadium of Butarque

Technical Event Examples

• PHP Unconf. 2010
• Gamescom 2009
• CeBIT 2010
• Convention Camp 2011

Soccer Event Examples

• Real Madrid – Milan (2010)
• World Cup 2010
• St. Pauli – HSV (2010)
• Spain – Colombia (2011)

Indignados Event Examples

• Inaugural march, 15 May
• Large gathering, 20 May
• Gathering, 15 Oct
• Demonstration, 17 Nov

Evaluation

• F-measure (macro), Precision, Recall
– measure the goodness of the retrieved photos, but not how well they were clustered into events
• Normalized Mutual Information (NMI)
– compares the automatically extracted clustering of photos into events with the ground truth
• Evaluation script is made available together with the dataset.
• Implementation of event detection available: http://mklab.iti.gr/project/sed2012_certh
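
To make the two kinds of measure concrete, here is a minimal sketch of a macro-averaged F-measure and of NMI. It is not the evaluation script distributed with the dataset, and it rests on assumptions: "macro" is read as averaging per-event F1 scores, detected events are assumed to be already matched to ground-truth event ids, and NMI is normalised as 2·I(X;Y) / (H(X) + H(Y)), which is one common convention among several. The function names are hypothetical.

```python
import math
from collections import Counter


def macro_f1(ground_truth, detected):
    """Average per-event F1.

    ground_truth, detected: dict mapping event id -> set of photo ids.
    Assumes detected events are already matched to ground-truth event ids.
    """
    scores = []
    for event_id, relevant in ground_truth.items():
        retrieved = detected.get(event_id, set())
        true_pos = len(relevant & retrieved)
        precision = true_pos / len(retrieved) if retrieved else 0.0
        recall = true_pos / len(relevant) if relevant else 0.0
        if precision + recall > 0:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings of the same photos,
    normalised by the arithmetic mean of the two entropies (assumed convention)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )
    denom = entropy(count_true) + entropy(count_pred)
    # If both labelings consist of a single cluster they are identical; return 1.0.
    return 2 * mutual_info / denom if denom > 0 else 1.0


if __name__ == "__main__":
    # Toy data for illustration only, not drawn from the dataset.
    gt = {"e1": {1, 2, 3}, "e2": {4, 5}}
    det = {"e1": {1, 2}, "e2": {4, 5, 6}}
    print(macro_f1(gt, det))
    print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
```

The F-measure only looks at which photos were retrieved for each event, whereas NMI compares the full partition of photos into events, which is why the two are reported together.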

Questions

@sympapadopoulos www.slideshare.net/sympapadopoulos