Copyright©2014 NTT corp. All Rights Reserved.
CVPR2014 reading “Reconstructing storyline graphs for image recommendation from web community photos”
Akisato Kimura <[email protected]> [@_akisato]
1
1-page summary
• Creating a storyline graph from a set of photo sequences (and optionally friendship graphs) for a topic of interest.
• A photo sequence = list[ zip( photos, time stamps ) ], created by a single user.
• A storyline = a series of events with chronological or causal relations, represented by a directed graph.
2
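The input structures on this slide (a photo sequence as list[zip(photos, time stamps)], a storyline as a directed graph) can be sketched in Python. This is a minimal data-model sketch; the class and field names are my own, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class Photo:
    descriptor: list   # image feature vector
    timestamp: float   # time within the period [0, T]

# A photo sequence by one user: list[ zip( photos, time stamps ) ].
def make_stream(photos, timestamps):
    return [Photo(x, t) for x, t in zip(photos, timestamps)]

# A storyline graph: nodes = image clusters, edges = a time-indexed
# adjacency structure (here a dict from time t to successor lists).
storyline = {"nodes": [0, 1, 2], "edges": {10.0: {0: [1], 1: [2]}}}

stream = make_stream([[0.1, 0.9], [0.8, 0.2]], [3.0, 7.5])
```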
Why not storylines? (1)
• Many topics of interest consist of a sequence of activities or events repeated across photo streams. Example: Independence Day = marathon race (1,2) + parades (3-6) + barbecue + fireworks (8-9).
3
Why not storylines? (2)
• A storyline can characterize the various branching narrative structures associated with a topic. A single photo stream is one linear thread of the story by a single user; aggregating many streams reveals the underlying big picture.
4
Related work by the 1st author
CVPR14 oral
CVPR14
CVPR13 oral
WSDM13
KDD12
ECCV10
+ another line of research: WSDM14, CVPR12 oral, ICCV11 oral, NIPS09, CVPR08 oral
5
ECCV10 paper
Generating a sparse similarity network of web images and associated time stamps.
• The method is simple: connecting temporally close and visually similar images.
• It reveals subtopic outbreaks and evolutions.
6
KDD12 paper
Modeling an image stream with a point process.
• This enables us to predict what images are likely to appear at a future time point by extrapolating the image stream.
7
WSDM13 paper
Modeling image streams with point processes and developing a regularized multi-task regression.
• Goal: retrieving relevant and temporally suitable images for a given word, time point, and (optionally) user information.
8
CVPR13 paper
Aligning and segmenting multiple web photo streams for inferring storylines
9
CVPR14 paper
• Creating a storyline graph from photo streams.
• The segmentation step in CVPR13 seems redundant.
• Image clustering may be sufficient for representing subtopics, as shown in the KDD12 & WSDM13 papers.
10
Another CVPR14 paper
A set of videos is useful for creating a storyline graph.
• Videos convey temporal smoothness between frames, which is often missing in photo streams.
11
Problem definition
[ Input ] A set of photo streams P = { P_1, ..., P_L }.
A photo stream P_l = ( p_1^l, ..., p_{L_l}^l ) is taken by a single person within a period of time [0, T].
A photo p_j^l = ( x_j^l, t_j^l ) is a pair of an image descriptor and a time stamp.
[ Output ] A storyline graph G = (O, E).
Each node in O is an image cluster. The edges E = { E_t } change smoothly over time; each E_t is represented by an adjacency matrix A_t.
12
Storyline graphs in detail
• Why image clusters for nodes? There are too many images, and many of them are redundant.
• Edges should be sparse and time-varying: popular transitions change smoothly over time.
[Figure: transition graphs change along the timeline, e.g. at t = 10AM, 12PM, 2PM, and at 7PM.]
13
Image encoding
4 different global image descriptors:
• [SIFT] 3-level spatial pyramid histograms for HSV color SIFT.
• [HOG2x2] 3-level spatial pyramid histograms for HOG.
• [Tiny] 32x32 TinyImages.
• [Scene] SUN397 detector outputs.
Image clusters are built by K-means (K = 600); each descriptor is then assigned to its c nearest clusters with Gaussian weighting.
• In the case of [Scene], the top-c detector outputs are used.
• Each descriptor x_j^l thus has at most 4c non-zero components.
14
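The cluster assignment above can be sketched as a soft c-NN code: each raw descriptor is mapped to its c nearest K-means centers with normalized Gaussian weights. A minimal sketch; the function name, sigma, and the normalization are my assumptions, not details from the paper.

```python
import numpy as np

def soft_assign(desc, centers, c=3, sigma=1.0):
    """Map a raw descriptor to a sparse K-dim code: assign it to its
    c nearest K-means centers with Gaussian weighting."""
    d2 = ((centers - desc) ** 2).sum(axis=1)   # squared distances to all K centers
    nn = np.argsort(d2)[:c]                    # indices of the c nearest clusters
    w = np.exp(-d2[nn] / (2.0 * sigma ** 2))   # Gaussian weights (sigma assumed)
    out = np.zeros(len(centers))
    out[nn] = w / w.sum()                      # normalized sparse code, <= c nonzeros
    return out

rng = np.random.default_rng(0)
centers = rng.normal(size=(600, 8))            # K = 600 as in the slides
code = soft_assign(rng.normal(size=8), centers, c=3)
```

With 4 such descriptors per image, the concatenated code has at most 4c non-zero components, matching the slide.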
Modeling photo streams
Introducing several practical assumptions
All the photo streams are taken independently of one another.
Every photo stream obeys a first-order Markov property.
All the elements of a descriptor are conditionally independent of one another given the previous descriptor and time stamp:

f( x_j^l, t_j^l | x_{j-1}^l, t_{j-1}^l ) = ∏_{d=1}^{D} f( x_{j,d}^l, t_j^l | x_{j-1}^l, t_{j-1}^l )
15
Modeling a storyline
A simple linear model for photo-stream transitions: each descriptor is approximated by a linear map of its predecessor, x_j^l ≈ A_t x_{j-1}^l, encoding temporal transitions into A_t.
The negative log-likelihood (for a stationary A) then reduces to a sum of squared residuals, to be minimized.
16
Optimization
• A simple least squares if A_t is time-independent.
• Introducing neighborhood selection [Meinshausen+ 2006] plus l1-regularization introduces sparsity into A_t.
• A Gaussian kernel for t_i centered at t makes A_t time-varying.
17
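The optimization above can be sketched as a kernel-weighted lasso: weight each (x_{j-1}, x_j) transition pair by a Gaussian kernel in its timestamp, then solve one l1-regularized least-squares problem per row of A_t. This is a minimal sketch using ISTA; the function names, bandwidth, lam, and iteration count are my assumptions, not the paper's solver.

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def estimate_At(X_prev, X_next, times, t, bandwidth=1.0, lam=0.05, iters=300):
    """Kernel-weighted neighborhood selection: estimate a sparse
    transition matrix A_t (rows predict x_j from x_{j-1}) from
    timestamped transition pairs, via per-row lasso solved with ISTA."""
    w = np.exp(-((times - t) ** 2) / (2.0 * bandwidth ** 2))  # Gaussian kernel in time
    sw = np.sqrt(w)[:, None]
    Xw, Yw = X_prev * sw, X_next * sw          # row-scaling realizes the kernel weights
    L = 2.0 * np.linalg.norm(Xw, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    K = X_prev.shape[1]
    A = np.zeros((K, K))
    for k in range(K):                          # one l1 least-squares problem per row
        a = np.zeros(K)
        for _ in range(iters):
            grad = 2.0 * Xw.T @ (Xw @ a - Yw[:, k])
            a = soft_threshold(a - grad / L, lam / L)
        A[k] = a
    return A
```

Larger lam yields a sparser A_t; a kernel centered at a different t yields a different, smoothly varying matrix.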
Incorporating additional information
Strategy: introducing a product kernel.
1. Original = neighborhood selection with a Gaussian kernel in time.
2. To customize the graph for a particular user u_q, multiply in a user-similarity kernel.
3. To introduce seasonal trends, multiply in a season kernel, where s_q = s(m_q) maps months to seasons.
18
Image recommendation with storylines
2 typical tasks for sequential image prediction:
1. Given an image sequence, predict the K most likely next images.
2. Given two temporally distant parts of an image sequence, estimate the most likely path between them.
A state-space model is helpful for both tasks (recall the linear transition model):
1. Apply the forward algorithm.
2. Apply the forward-backward algorithm with EM.
19
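Task 1 above amounts to one forward step on the storyline graph: push the current cluster distribution through the transition matrix and rank the successors. A minimal sketch, assuming A_t[i, j] is the weight of edge i -> j; the function name and normalization are mine.

```python
import numpy as np

def predict_next(A_t, p_current, K=3):
    """One forward step: propagate the current distribution over image
    clusters through A_t and return the K most likely next clusters."""
    p_next = p_current @ A_t          # p_next[j] = sum_i p[i] * A_t[i, j]
    s = p_next.sum()
    if s > 0:
        p_next = p_next / s           # renormalize to a distribution
    top = np.argsort(p_next)[::-1][:K]
    return top, p_next

A = np.array([[0.0, 1.0],
              [0.5, 0.5]])
top, p = predict_next(A, np.array([1.0, 0.0]), K=1)
```

Task 2 would instead run forward and backward passes between the two given sub-sequences and read off the most likely intermediate clusters.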
Experiments
1. Evaluating reconstructed storyline graphs via user studies with AMT.
2. Quantitatively comparing performance on the 2 types of image prediction tasks:
a. predicting the next likely images;
b. filling in missing parts of a photo stream.
[ Baselines ]
1. PageRank-based image retrieval (details missing)
2. HMM for modeling photo sequences
3. Clustering-based summarization
20
Dataset
3.3M Flickr images of 42K photo streams for 24 classes
The friendship graph was built indirectly from group information: the edge weight between two users is the number of groups that both users have joined.
21
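The edge-weight rule above is just a set intersection over group memberships. A one-function sketch; the function name is mine.

```python
def friendship_weight(groups_a, groups_b):
    """Edge weight between two users = number of groups that both
    users have joined (intersection of their group memberships)."""
    return len(set(groups_a) & set(groups_b))

# Two users sharing two groups get an edge of weight 2.
w = friendship_weight(["g1", "g2", "g3"], ["g2", "g3", "g4"])
```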
Scheme for evaluations
[ Basic idea ] Each AMT worker compares tuples of images representing the storyline graphs.
1. Each algorithm generates a storyline graph per topic.
2. Sample 100 standard images as test instances.
3. Each algorithm predicts the most likely next image after each test instance.
4. Turker task (≥3 turkers per test image): choose which prediction is better.
Human subjects evaluate only a basic unit at a time (i.e., one important edge of the storyline).
22
Evaluating storyline graphs
Better than all baselines (HMM, PageRank & Clustering): preference scores of 66.5 over HMM, 67.5 over PageRank, and 69.4 over Clustering.
23
Setting: Image prediction tasks (1)
• The “future prediction” task
Procedure: train on 80% of each photo stream and build the storyline graph from the training portion.
Task (I): given a short sequence of a test photo stream, predict the next likely images; each method's estimates are compared to the hidden ground truth by measuring similarity.
24
Setting: Image prediction tasks (2)
• The “filling in gaps” task
Procedure: train on 80% of each photo stream and build the storyline graph from the training portion.
Task (II): given a pair of temporally distant sequences, fill in the missing parts; each method's estimates are compared to the hidden ground truth by measuring similarity.
25
Performance measured by PSNR
Future prediction - Normal: [9.60, 8.99, 8.86, 8.75]
Future prediction - Personalized: [9.53, 9.01, 8.85, 8.75]
Filling in gaps - Normal: [9.70, 8.97, 8.89, 8.96]
Filling in gaps - Personalized: [9.57, 9.05, 8.87, 8.93]
26
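PSNR, the metric reported on this slide, can be computed directly from the mean squared error between a prediction and its hidden ground truth. A standard-definition sketch (peak value assumed to be 1.0 for normalized descriptors).

```python
import numpy as np

def psnr(pred, gt, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means the prediction
    is closer to the hidden ground truth."""
    mse = np.mean((np.asarray(pred, float) - np.asarray(gt, float)) ** 2)
    if mse == 0:
        return float("inf")   # identical prediction
    return 10.0 * np.log10(peak ** 2 / mse)
```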
Qualitative evaluations