Active Frame Selection for Label Propagation in Videos
Sudheendra Vijayanarasimhan and Kristen Grauman
Department of Computer Science, University of Texas at Austin
Sections: Motivation; Main Idea; Estimating Expected Label Propagation Error; Results; Video Label Propagation; Active Frame Selection.
Manually labeling objects in video is tedious and expensive, yet such annotations are valuable for object and activity recognition.
Existing methods for interactive labeling:
• Propagate labels from an arbitrarily selected frame, and/or
• Assume a human will intervene repeatedly to correct errors.
Our active approach outperforms the baselines for all values of k and saves hours of manual effort per video, assuming the cost of correcting errors is proportional to the number of mislabeled pixels.
(Results plots: error measured as the average number of mislabeled pixels, in hundreds of pixels; total annotation time; per-frame accuracy sorted from high to low; SegTrack, k = 5.) Our error predictions in C follow the actual errors closely. In this case, our method automatically selects frames that capture high-resolution information for most of the objects.
Datasets: CamSeq01 (101 frames of a moving driving scene), CamVid seq05 (3,000 frames of a driving scene), LabelMe 8126 (167 frames of a traffic signal), SegTrack (6 videos with moving objects).
Baselines:
• Uniform-f: samples frames uniformly and transfers labels forward.
• Uniform: samples frames uniformly and transfers labels in both directions.
• Keyframe: selects frames with k-way spectral clustering on Gist features.
(Diagram: dynamic-programming cases over sequence frame indices 1, …, i, …, n and selected frame index b. Case 1: one-way end, n > i. Case 2: one-way beginning, b = 1 and n = i.)
Pixel Flow + MRF Label Propagation
Enhance the flow model with a space-time Markov Random Field:
• Infer label maps that are smooth in space and time.
• Exploit object appearance models defined by the labeled frames.
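As an illustration only (not reproduced from the poster), a standard space-time MRF energy of this kind can be written as below; the weights lambda_s, lambda_t and the neighborhood sets N_s, N_t are assumed notation:

    E(y) = \sum_p \phi_p(y_p) + \lambda_s \sum_{(p,q) \in N_s} \psi(y_p, y_q) + \lambda_t \sum_{(p,q) \in N_t} \psi(y_p, y_q)

Here the unary term \phi_p scores pixel p's label using the flow-propagated label and the appearance models built from the labeled frames, while \psi penalizes label disagreement between spatial neighbors (N_s) and flow-linked temporal neighbors (N_t).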
We explicitly model the probability that pixel p in frame t will be mislabeled if we were to obtain its label from frame t+1. The model combines two flow-based distance terms, an appearance term and a motion term, which estimate errors due to boundaries, occlusions, changes in appearance, and pixels entering or leaving the frame.
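The poster's exact functional form is not reproduced here; one plausible instantiation (my assumption, not the authors' formula) combines the two distances in an exponential model:

    P(mislabel at p) = 1 - \exp\big( - d_{app}(p)/\sigma_a - d_{mot}(p)/\sigma_m \big)

where d_app compares the appearance of p with its flow-predicted correspondence in frame t+1, d_mot measures motion and occlusion evidence at p, and sigma_a, sigma_m are scale parameters (all assumed notation).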
If more than one frame separates the labeled frame r_t from the current frame t, we compute the accumulated error recursively (and analogously for the labeled frame l_t on the other side).
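One natural way to write such an accumulation (a sketch under my reading of the poster, not necessarily the authors' exact recursion): a pixel propagated from labeled frame r to frame t is mislabeled either on the first hop or, having survived it, by the remaining hops,

    e(t, r) = e(t, t+1) + (1 - e(t, t+1)) \, e(t+1, r),  for t < r,

with e(t, t+1) the one-step mislabel probability above; the backward direction toward l_t is analogous.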
Identify the k frames which, if labeled, would propagate their labels to the rest of the video with minimal expected error.
Pipeline: (1) actively select k informative frames, (2) segment and label the selected frames, (3) propagate labels to all other frames.
Highlights of our approach:
• Annotate all objects in a video with minimal manual effort.
• Jointly select the k most useful frames via predicted "trackability".
• Efficient dynamic programming solution.
Pixel Flow Label Propagation
Use dense optical flow to track each pixel in both the forward and backward directions, until it reaches the closest labeled frame on either side (see the sketch below).
(Diagram: forward and backward flow between frames, with labels propagated forward and backward from the labeled frames; occluded pixels are marked.)
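A minimal sketch of one-step backward propagation with dense optical flow, assuming OpenCV's Farneback flow; the function name and parameter values are illustrative, not the authors' implementation:

    import cv2
    import numpy as np

    def propagate_labels_back(gray_t, gray_t1, labels_t1):
        # Dense flow from frame t to frame t+1: flow[y, x] = (dx, dy).
        flow = cv2.calcOpticalFlowFarneback(gray_t, gray_t1, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = gray_t.shape
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        # Each pixel of frame t reads the label of its correspondence in frame t+1.
        map_x = (xs + flow[..., 0]).astype(np.float32)
        map_y = (ys + flow[..., 1]).astype(np.float32)
        labels_t = cv2.remap(labels_t1.astype(np.float32), map_x, map_y,
                             interpolation=cv2.INTER_NEAREST,
                             borderMode=cv2.BORDER_REPLICATE)
        return labels_t.astype(labels_t1.dtype)

Chaining such warps frame by frame toward the closest labeled frame on each side, and combining the two directions, gives the flow-based propagation described here; the MRF step then smooths the result.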
To segment an N-frame video, there are two sources of manual effort cost:
1. the cost of fully labeling a frame from scratch, denoted C_l;
2. the cost of correcting errors made by propagation, denoted C_c.
Our approach yields higher accuracy, especially for frames far from the labeled frames. It reduces annotation effort more than the baselines do, and can also predict the optimal number of frames to have labeled, k*.
Errors and time saved
Example of actively selected frames
(Notation diagram: sequence frame index 1, 2, …, i, …, n−1, n; selected frame index …, b−2, b−1, b.)
Dynamic programming solution
Define a table whose entry for (n, b, i) stores the optimal value of the objective when b frames are selected from the first n frames and i is the index of the b-th selected frame. For a given k, dynamic programming over this table yields the optimal value in time polynomial in N and k, compared to a naïve exhaustive search over all N-choose-k candidate frame sets.
Objective: we want the set of k frames that minimizes the expected total effort, i.e., the cost of labeling the selected frames from scratch plus the predicted cost of correcting propagation errors in the remaining frames.
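Written out under my own notation (S for the selected set, \hat{e}(t, S) for the predicted number of mislabeled pixels in frame t after propagation from S), this is a sketch of the objective rather than the poster's exact formula:

    S^* = \arg\min_{S \subseteq \{1, \dots, N\}, |S| = k} \; k \, C_l + C_c \sum_{t \notin S} \hat{e}(t, S)

Since |S| = k, the term k C_l is constant for a fixed k, so the search for that k reduces to minimizing the predicted propagation error; comparing the optima across different values of k is what lets the method suggest k*.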
(Diagram: Case 3, both ways, b > 1 and n = i; an intermediate frame m lies between the previous selected frame j and the current selected frame i.)
Let the N × N matrix C record the frame-to-frame predicted errors.
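A minimal Python sketch of a dynamic program in this spirit, assuming C[t, r] holds the predicted number of mislabeled pixels in frame t when its labels are propagated from frame r, and using a simple per-frame minimum over the two propagation directions in place of the poster's exact two-way combination (both assumptions are mine):

    import numpy as np

    def select_frames(C, k, C_l=1.0, C_c=1.0):
        # C: N x N array; C[t, r] = predicted mislabeled pixels in frame t when
        #    labels are propagated to it from frame r (assumed indexing).
        N = C.shape[0]
        INF = float("inf")

        def head_err(i):
            # "Case 2: one-way beginning" -- frames before the first selected frame i.
            return sum(C[m, i] for m in range(0, i))

        def tail_err(i):
            # "Case 1: one-way end" -- frames after the last selected frame i.
            return sum(C[m, i] for m in range(i + 1, N))

        def seg_err(j, i):
            # "Case 3: both ways" -- frames between selected frames j and i take
            # whichever direction predicts fewer errors (a simplification).
            return sum(min(C[m, j], C[m, i]) for m in range(j + 1, i))

        # dp[b][i]: best error for frames 0..i with b frames selected, the b-th being i.
        dp = [[INF] * N for _ in range(k + 1)]
        back = [[-1] * N for _ in range(k + 1)]
        for i in range(N):
            dp[1][i] = head_err(i)
        for b in range(2, k + 1):
            for i in range(N):
                for j in range(b - 2, i):          # previous selected frame
                    if dp[b - 1][j] == INF:
                        continue
                    cand = dp[b - 1][j] + seg_err(j, i)
                    if cand < dp[b][i]:
                        dp[b][i], back[b][i] = cand, j

        # Close with the one-way tail and pick the best last selected frame.
        best_i = min(range(N), key=lambda i: dp[k][i] + tail_err(i))
        total_err = dp[k][best_i] + tail_err(best_i)

        # Recover the selected frame indices by walking the back-pointers.
        selected, b, i = [], k, best_i
        while b >= 1:
            selected.append(i)
            i, b = back[b][i], b - 1
        selected.reverse()

        return selected, k * C_l + C_c * total_err

With prefix sums over C the segment costs can be precomputed, and the whole program stays polynomial in N and k, as opposed to enumerating every size-k subset of frames.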