CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community...

Copyright©2014 NTT corp. All Rights Reserved. CVPR2014 reading “Reconstructing storyline graphs for image recommendation from web community photos” Akisato Kimura <[email protected]> [@_akisato]

description

Introducing a paper presented at CVPR 2014. The paper focuses on predicting causal relations between photos in web photo streams.

Transcript of CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community...

Page 1: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Copyright©2014 NTT corp. All Rights Reserved.

CVPR2014 reading “Reconstructing storyline graphs for image recommendation from web community photos”

Akisato Kimura <[email protected]> [@_akisato]

Page 2: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

1-page summary

• Creating a storyline graph from a set of photo sequences (and optionally friendship graphs) for a topic of interest.

• A photo sequence = list[ zip( photos, time stamps ) ], created by a single user.

• A storyline = a series of events with chronological or causal relations, represented by a directed graph.
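
The definitions above map onto plain data structures. A minimal sketch, assuming Python; the names `Photo`, `make_stream` and `StorylineGraph` are hypothetical and do not come from the paper:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Photo:
    image_id: str     # hypothetical identifier of the image
    timestamp: float  # time stamp, e.g. hours since the start of the day


def make_stream(photos, timestamps):
    """A photo sequence = list(zip(photos, time stamps)) of one user, sorted by time."""
    stream = [Photo(p, t) for p, t in zip(photos, timestamps)]
    return sorted(stream, key=lambda ph: ph.timestamp)


@dataclass
class StorylineGraph:
    """A directed graph whose nodes are events and whose edges are transitions."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # (src, dst) -> weight

    def add_edge(self, src, dst, weight=1.0):
        self.nodes.update((src, dst))
        self.edges[(src, dst)] = weight


stream = make_stream(["fireworks.jpg", "parade.jpg"], [21.0, 10.5])
graph = StorylineGraph()
graph.add_edge("parade", "fireworks")
```

Keeping the edge direction explicit in the `(src, dst)` key is what lets the graph express "parade comes before fireworks" rather than mere co-occurrence.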

Page 3: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Why not storylines? (1)

• Many topics of interest consist of a sequence of activities or events repeated across photo streams. Independence Day = marathon race (1,2) + parades (3-6) + barbeque + fireworks (8-9)

Page 4: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Why not storylines? (2)

• A storyline can characterize the various branching narrative structures associated with the topic. A single photo stream = a linear thread of story by one user. Aggregating many photo streams reveals the underlying big picture.

Page 5: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Related work by the 1st author

CVPR14 oral

CVPR14

CVPR13 oral

WSDM13

KDD12

ECCV10

+ another line of research: WSDM14, CVPR12 oral, ICCV11 oral, NIPS09, CVPR08 oral

Page 6: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

ECCV10 paper

Generating a sparse similarity network of web images & associated time stamps

• The method is simple: connecting temporally close & visually similar images.

• It reveals subtopic outbreaks and evolutions.
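
The idea of "connecting temporally close & visually similar images" can be sketched as follows. This is only an illustration, not the ECCV10 algorithm itself: the cosine similarity, the thresholds `max_gap` and `min_sim`, and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_network(features, times, max_gap, min_sim):
    """Connect image pairs that are close in time AND visually similar."""
    edges = {}
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(times[i] - times[j]) > max_gap:
                continue  # too far apart in time
            s = cosine(features[i], features[j])
            if s >= min_sim:
                edges[(i, j)] = s
    return edges


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
times = [0.0, 1.0, 1.5, 50.0]
edges = similarity_network(features, times, max_gap=2.0, min_sim=0.8)
```

Images 0 and 3 are visually identical but far apart in time, so only the pair (0, 1) is connected; this double condition is what keeps the network sparse.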

Page 7: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

KDD12 paper

Modeling an image stream with a point process

• This enables us to predict what images are likely to appear at a future time point by extrapolating the image stream.

Page 8: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

WSDM13 paper

Modeling an image stream with point processes & developing a regularized multi-task regression

• For retrieving relevant and temporally suitable images for a given word, time point and optionally user information.

Page 9: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

CVPR13 paper

Aligning and segmenting multiple web photo streams for inferring storylines

Page 10: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

CVPR14 paper

• Creating a storyline graph from photo streams

• Segmentation in CVPR13 seems redundant.

• Image clustering might be sufficient for representing subtopics, as shown in KDD12 & WSDM13 papers.

Page 11: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Another CVPR14 paper

A set of videos is useful for creating a storyline graph

• Videos convey temporal smoothness between frames, which is often missing in photo streams.

Page 12: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Problem definition

[ Input ] A set of photo streams

The set of photo streams $\mathbf{P} = \{P^1, \dots, P^L\}$

A photo stream $P^l = \{p^l_1, \dots, p^l_{L^l}\}$, taken by a single person within a period of time $[0, T]$

A photo $p^l_j = (x^l_j, t^l_j)$, a pair of an image descriptor and a time stamp.

[ Output ] A storyline graph

The storyline graph $\mathbf{G} = (\mathbf{O}, \mathbf{E})$

Each node in $\mathbf{O}$ = an image cluster.

Edges $\mathbf{E} = \{\mathbf{E}^t\}_t$ smoothly change over time.

Each edge set $\mathbf{E}^t$ is represented by an adjacency matrix $\mathbf{A}^t$.

Page 13: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Storyline graphs in detail

• Why image clusters for nodes? There are too many images, and many of them are redundant.

• Edges should be sparse and time-varying. Time-varying: popular transitions smoothly change over time.

[Figure: storyline graphs along a timeline (t = 10AM, t = 12PM, t = 2PM), with example transitions at 12PM and at 7PM]

Page 14: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Image encoding

4 different image (global) descriptors

• [SIFT] 3-level spatial pyramid histograms for HSV color SIFT.

• [HOG2x2] 3-level spatial pyramid histograms for HOG.

• [Tiny] 32x32 TinyImages.

• [Scene] SUN397 detector outputs.

Constructing image clusters by K-means (K=600) + assigning $c$-NN clusters with Gaussian weighting

• In the case of [Scene], top-$c$ detector outputs are used.

• Each descriptor $x^l_j$ has at most $4c$ non-zero components.
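
The c-NN assignment with Gaussian weighting can be sketched for one descriptor type as follows. The bandwidth `sigma`, the normalization to unit sum, and the function name `soft_assign` are assumptions; the slides do not give these details.

```python
import math


def soft_assign(feature, centroids, c, sigma):
    """Sparse soft assignment: Gaussian weights on the c nearest centroids only."""
    d2 = [sum((a - b) ** 2 for a, b in zip(feature, ctr)) for ctr in centroids]
    nearest = sorted(range(len(centroids)), key=lambda k: d2[k])[:c]
    weights = {k: math.exp(-d2[k] / (2.0 * sigma ** 2)) for k in nearest}
    total = sum(weights.values())
    desc = [0.0] * len(centroids)
    if total == 0.0:
        return desc  # every weight underflowed; leave the descriptor empty
    for k, w in weights.items():
        desc[k] = w / total
    return desc


# Toy example with 4 centroids (K=600 in the slides) and c = 2.
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
descriptor = soft_assign([0.1, 0.0], centroids, c=2, sigma=1.0)
```

Each descriptor type contributes at most `c` non-zero entries, which is where the bound of 4c non-zero components for four descriptor types comes from.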

Page 15: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Modeling photo streams

Introducing several practical assumptions

All the photo streams are taken independently of one another.

Every photo stream follows a 1st-order Markov process.

$f(\mathbf{x}^l_j, t^l_j \mid \mathbf{x}^l_{j-1}, t^l_{j-1}) = \prod_{d=1}^{D} f(x^l_{j,d}, t^l_j \mid \mathbf{x}^l_{j-1}, t^l_{j-1})$

All the elements in a descriptor are conditionally independent of one another given the previous descriptor.

Page 16: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Modeling a storyline

A simple linear model for

Encoding temporal transitions into $\mathbf{A}^e$

The log likelihood (for stationary A)

To be minimized
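
The equations of this slide did not survive in the transcript. As a hedged sketch only, a linear Gaussian transition model of this kind, with a stationary $\mathbf{A}$, would look like the following. The noise variance $\sigma^2$ and the row notation $\mathbf{A}_{d*}$ are symbols introduced here, and the way time stamps enter the model is ignored.

```latex
% Linear model for one descriptor element (assumed form)
x^l_{j,d} = \mathbf{A}_{d*}\,\mathbf{x}^l_{j-1} + \varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)

% Log likelihood for stationary A, using independence across streams,
% the 1st-order Markov assumption, and independence across elements d
\log f(\mathbf{P})
  = \sum_{l=1}^{L} \sum_{j=2}^{L^l} \sum_{d=1}^{D}
    \log f\!\left(x^l_{j,d} \mid \mathbf{x}^l_{j-1}\right)
  = -\frac{1}{2\sigma^2} \sum_{l=1}^{L} \sum_{j=2}^{L^l} \sum_{d=1}^{D}
    \left( x^l_{j,d} - \mathbf{A}_{d*}\,\mathbf{x}^l_{j-1} \right)^2
    + \text{const.}
```

Under these assumptions, maximizing the log likelihood is equivalent to minimizing the sum of squared residuals, which is the quantity "to be minimized".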

Page 17: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Optimization

A simple least squares if $\mathbf{A}^t$ is time-independent.

Introducing neighborhood selection [Meinshausen+ 2006]

• Gaussian kernel for $t_i$ centered at $t$

• Plus $\ell_1$-regularization: introducing sparsity into $\mathbf{A}^t$
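
One row of $\mathbf{A}^t$ can be estimated by kernel-weighted, $\ell_1$-regularized least squares. The sketch below uses plain coordinate descent, which is a stand-in and not necessarily the optimizer used in the paper; the objective form, `bandwidth`, `lam` and all function names are assumptions.

```python
import math


def gaussian_weights(times, t, bandwidth):
    """Normalized Gaussian kernel weights for samples at `times`, centered at t."""
    raw = [math.exp(-((ti - t) ** 2) / (2.0 * bandwidth ** 2)) for ti in times]
    total = sum(raw)
    return [r / total for r in raw]


def soft_threshold(z, gamma):
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def weighted_lasso(X, y, w, lam, n_iter=200):
    """Minimize sum_i w_i (y_i - a.x_i)^2 + lam * ||a||_1 by coordinate descent."""
    dim = len(X[0])
    a = [0.0] * dim
    for _ in range(n_iter):
        for k in range(dim):
            rho, z = 0.0, 0.0
            for xi, yi, wi in zip(X, y, w):
                # Prediction without coordinate k.
                partial = sum(a[m] * xi[m] for m in range(dim) if m != k)
                rho += wi * xi[k] * (yi - partial)
                z += wi * xi[k] ** 2
            a[k] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return a


# Toy data: previous descriptors X, one element y of the next descriptor.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
y = [2.0, 0.0, 2.0, 4.0]          # depends on the first component only
times = [10.0, 11.0, 12.0, 13.0]  # time stamps t_i of the transitions
w = gaussian_weights(times, t=12.0, bandwidth=2.0)

row_sparse = weighted_lasso(X, y, w, lam=0.1)
row_ls = weighted_lasso(X, y, w, lam=0.0)  # plain weighted least squares
```

Repeating this for every row d and every time point t gives a sparse, smoothly varying sequence of matrices; the non-zero entries of a row are the incoming edges of that node.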

Page 18: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Incorporating additional information

Strategy: introducing a product kernel

1. Original = neighborhood selection

2. To customize the graph for a particular user $u_q$

3. To introduce seasonal trends

$s_q = s(m_q)$: a function of months to seasons
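
The kernel formulas are missing from the transcript, so the following only illustrates the product-kernel idea: the weight of a training sample is the product of a time kernel, a user kernel and a season kernel. The concrete kernel forms, the `affinity` table, and the Northern-hemisphere month-to-season mapping are all assumptions.

```python
import math


def season(month):
    """s(m): map a month (1-12) to a season (assumed Northern-hemisphere mapping)."""
    return ("winter", "spring", "summer", "autumn")[(month % 12) // 3]


def time_kernel(ti, t, bandwidth):
    return math.exp(-((ti - t) ** 2) / (2.0 * bandwidth ** 2))


def user_kernel(ui, uq, affinity):
    """1.0 for the same user, otherwise a hypothetical affinity in [0, 1]."""
    if ui == uq:
        return 1.0
    return affinity.get(frozenset((ui, uq)), 0.0)


def season_kernel(mi, mq):
    return 1.0 if season(mi) == season(mq) else 0.0


def product_kernel(sample, query, bandwidth, affinity):
    """Weight of one training sample for a query (time, user, month)."""
    return (time_kernel(sample["t"], query["t"], bandwidth)
            * user_kernel(sample["user"], query["user"], affinity)
            * season_kernel(sample["month"], query["month"]))


affinity = {frozenset(("alice", "bob")): 0.5}
query = {"t": 12.0, "user": "alice", "month": 7}

w_same = product_kernel({"t": 12.0, "user": "alice", "month": 7}, query, 2.0, affinity)
w_friend = product_kernel({"t": 12.0, "user": "bob", "month": 8}, query, 2.0, affinity)
w_winter = product_kernel({"t": 12.0, "user": "alice", "month": 1}, query, 2.0, affinity)
```

Because the factors multiply, a sample must be relevant in every respect (time, user, season) to receive a large weight.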

Page 19: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Image recommendation with storylines

2 typical tasks for sequential image prediction

1. Given an image sequence, predict the K next likely images.

2. Given two parts of temporally distant image sequences, estimate the most likely path between them.

A state space model would be helpful for those tasks (remember, )

1. Applying the forward algorithm.

2. Exploiting the forward-backward algorithm with EM.
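
For the first task, the forward algorithm can be sketched on a discrete state space whose states are image clusters. This is a simplified illustration: treating the row-normalized storyline graph as the transition matrix, the per-step observation likelihoods, and all names are assumptions.

```python
def normalize(v):
    total = sum(v)
    if total == 0.0:
        raise ValueError("all-zero distribution")
    return [x / total for x in v]


def step(alpha, trans):
    """Propagate a state distribution one step through the transition matrix."""
    n = len(alpha)
    return [sum(alpha[r] * trans[r][s] for r in range(n)) for s in range(n)]


def forward_filter(prior, trans, likelihoods):
    """Forward algorithm: state distribution after the last observation."""
    alpha = normalize([p * l for p, l in zip(prior, likelihoods[0])])
    for lik in likelihoods[1:]:
        pred = step(alpha, trans)
        alpha = normalize([p * l for p, l in zip(pred, lik)])
    return alpha


def predict_next(alpha, trans, k):
    """Return the k most likely next states and the full next-state distribution."""
    nxt = step(alpha, trans)
    ranked = sorted(range(len(nxt)), key=lambda s: nxt[s], reverse=True)
    return ranked[:k], nxt


# States: 0 = parade, 1 = barbecue, 2 = fireworks.
trans = [[0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8],
         [0.1, 0.1, 0.8]]
prior = [1 / 3, 1 / 3, 1 / 3]
# Likelihood of each state for two observed photos (parade-like, barbecue-like).
likelihoods = [[0.9, 0.05, 0.05],
               [0.1, 0.8, 0.1]]

alpha = forward_filter(prior, trans, likelihoods)
top, nxt = predict_next(alpha, trans, k=2)
```

The K next likely images would then be drawn from the top-ranked clusters; how the paper selects images within a cluster is not stated in the slides.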

Page 20: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Experiments

1. Evaluating reconstructed storyline graphs via user studies with AMT.

2. Quantitatively comparing the performance for the 2 types of image prediction tasks.

a. Predicting next likely images.

b. Filling in missing parts of a photo stream.

[Baseline]

1. PageRank-based image retrieval (details missing)

2. HMM for modeling photo sequences

3. Clustering-based summarization

Page 21: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Dataset

3.3M Flickr images of 42K photo streams for 24 classes

The friendship graph was built indirectly from group information (the edge weight is the number of groups that both users have joined).
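
The edge-weight rule can be sketched as follows, assuming the group information is available as a mapping from group name to member list; the data layout and the function name `friendship_graph` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def friendship_graph(groups):
    """Edge weight between two users = number of groups both of them joined."""
    weights = Counter()
    for members in groups.values():
        # Sort so that each undirected pair has one canonical key.
        for u, v in combinations(sorted(set(members)), 2):
            weights[(u, v)] += 1
    return weights


groups = {
    "fireworks": ["alice", "bob", "carol"],
    "parades": ["bob", "alice"],
    "bbq": ["carol", "dave"],
}
friends = friendship_graph(groups)
```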

Page 22: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Scheme for evaluations

[ Basic idea ] Have each turker compare tuples of images representing the storyline graphs.

1. Each algorithm generates a storyline per topic.

2. Sample 100 standard images as test instances.

3. Each algorithm predicts the next most-likely image after the test instance.

4. [ Turker task (>3 turkers per test image) ]

[Figure: a test image with candidates A and B; labels: Our method (✔), Baseline]

A crowd of human subjects evaluates only a basic unit (i.e. an important edge of the storyline).

Page 23: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Evaluating storyline graphs

Better than baselines (HMM, PageRank & Clustering).

[Figure: three example image pairs labeled $I_q$ and $I_e$]

[66.5, 67.5, 69.4] over (HMM), (Page), (Clust)

Page 24: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Setting: Image prediction tasks (1)

• The “future prediction” task

Procedures

1. Training (80%): build the storyline graph.

2. Task (I): Given a short sequence of a test photo stream (PS), predict the next likely images.

3. Measure similarity between each method's estimates and the hidden ground truth.

Page 25: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Setting: Image prediction tasks (2)

• The “filling in gaps” task

Procedures

1. Training (80%): build the storyline graph.

2. Task (II): Given a pair of distant sequences, fill in the missing parts.

3. Measure similarity between each method's estimates and the hidden ground truth (GT).

Page 26: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Performance measured by PSNR

Future prediction - Personalized

Future prediction - Normal [9.60, 8.99, 8.86, 8.75]

[9.53, 9.01, 8.85, 8.75]

Filling in gaps - Personalized

Filling in gaps - Normal [9.70, 8.97, 8.89, 8.96]

[9.57, 9.05, 8.87, 8.93]
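
PSNR compares a predicted image with the hidden ground-truth image pixel by pixel; higher is better. A minimal sketch of the standard definition, assuming 8-bit pixel values in flat lists; how the paper resizes or pairs the images is not stated in the slides.

```python
import math


def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


truth = [0.0, 0.0, 0.0, 0.0]
close_guess = [25.5, 25.5, 25.5, 25.5]    # off by 10% of the range
far_guess = [255.0, 255.0, 255.0, 255.0]  # maximally wrong

psnr_close = psnr(truth, close_guess)
psnr_far = psnr(truth, far_guess)
```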

Page 27: CVPR2014 reading "Reconstructing storyline graphs for image recommendation from web community photos"

Qualitative evaluations