Stanford Artificial Intelligence Laboratory · ai.stanford.edu/~olga/posters/cvpr10-poster.pdf · 2010....



Author: Olga Russakovsky · Created: 6/10/2010 2:17:13 PM

Sliding window detection is inherently slow because of the large number of windows to classify ...

... especially with many object classes to detect

Goal: use segmentation to propose windows likely to contain objects to avoid exhaustive search

However: (1) segmentation takes time, and (2) different segmentations work for different objects

Want to amortize cost of segmentation across object classes

Detection of 9 indoor objects (LabelMe [4] + Stanford office scenes)

Compared to sliding window, we classify two orders of magnitude fewer windows and obtain a 10x runtime speedup while maintaining the same detection accuracy.

Detection of 2 outdoor objects (StreetScenes [5])

Compared to sliding window, we classify 60.4x fewer windows and obtain a 15.2x runtime speedup.

A Steiner tree approach to efficient object detection

Olga Russakovsky and Andrew Y. Ng
Computer Science Department, Stanford University

Unsupervised segmentation into superpixels [1]

Smoothing parameter (s):

Segmentation threshold (k):

Minimum segment size (m):
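The three segmentation parameters line up with the graph-based method of [1]: s is the Gaussian smoothing sigma, k is the threshold constant (larger k favors larger segments), and m is the minimum segment size enforced afterwards. The sketch below is a minimal pure-Python illustration under simplifying assumptions (grayscale input given as a list of lists, 4-connected pixel grid); it is not the authors' implementation, and the function names are hypothetical.

```python
import math


def gaussian_smooth(img, sigma):
    """Separable Gaussian blur of a 2-D grayscale image (list of lists); borders are clamped."""
    h, w = len(img), len(img[0])
    if sigma <= 0:
        return [[float(v) for v in row] for row in img]
    radius = max(1, math.ceil(3 * sigma))
    kernel = [math.exp(-(t * t) / (2 * sigma * sigma)) for t in range(-radius, radius + 1)]
    norm = sum(kernel)
    kernel = [v / norm for v in kernel]

    def clamp(v, hi):
        return min(max(v, 0), hi)

    # Blur along rows, then along columns.
    tmp = [[sum(kv * img[y][clamp(x + j - radius, w - 1)] for j, kv in enumerate(kernel))
            for x in range(w)] for y in range(h)]
    return [[sum(kv * tmp[clamp(y + j - radius, h - 1)][x] for j, kv in enumerate(kernel))
             for x in range(w)] for y in range(h)]


def graph_based_segmentation(img, s, k, m):
    """Hypothetical sketch of graph-based segmentation in the style of [1].

    s: smoothing sigma, k: segmentation threshold constant, m: minimum segment size.
    Returns a label map (list of lists of ints); equal labels mean the same segment.
    """
    img = gaussian_smooth(img, s)
    h, w = len(img), len(img[0])
    edges = []  # (weight, pixel a, pixel b) for 4-connected neighbors
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((abs(img[y][x] - img[y][x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((abs(img[y][x] - img[y + 1][x]), i, i + w))
    edges.sort()

    parent = list(range(h * w))
    size = [1] * (h * w)
    # thresh[c] = Int(c) + k / |c|, where Int of a single pixel is 0.
    thresh = [float(k)] * (h * w)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(ra, rb):
        parent[rb] = ra
        size[ra] += size[rb]

    # Edges arrive in nondecreasing weight order, so a merging edge's weight is the new Int(c).
    for wgt, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and wgt <= thresh[ra] and wgt <= thresh[rb]:
            union(ra, rb)
            thresh[ra] = wgt + k / size[ra]

    # Post-processing: merge any segment smaller than m into a neighboring segment.
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and (size[ra] < m or size[rb] < m):
            union(ra, rb)

    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

On an image with two flat halves separated by a sharp intensity step, a small k keeps the halves apart while a very large k merges them, mirroring the "small k for small objects, large k for large objects" trade-off.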

Generating rectangular windows from superpixels

Bounding box parameter (b):

Trimming parameter (p):

How to choose pipeline parameters that work well for detecting all object classes of interest?

PIPELINE FOR PROPOSING REGIONS

Given: a directed graph G = (V, E) with costs c(e) on the edges e ∈ E, a set of Steiner nodes S ⊆ V, and a root node r ∈ V.

Find: a minimum-cost tree rooted at r that spans all vertices in S. This problem is NP-hard but can be efficiently approximated [2].

Claim: On the graph G below, a minimum-cost Steiner tree corresponds to the set of parameters that minimizes the overall computational cost while achieving the desired detection performance for each object class.
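To make the problem statement concrete, here is an exact solver for tiny graphs that simply tries every subset of edges. It is only an illustration of the definition (exponential time), not the approximation algorithm of [2]; the function names and the toy graph are hypothetical.

```python
from itertools import combinations


def is_rooted_tree(edges, root, terminals):
    """True if `edges` form a tree directed away from `root` that contains every terminal."""
    children, indegree = {}, {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        indegree[v] = indegree.get(v, 0) + 1
    # The root has no parent; every other vertex in the tree has exactly one.
    if root in indegree or any(d != 1 for d in indegree.values()):
        return False
    reached, stack = {root}, [root]
    while stack:
        for v in children.get(stack.pop(), []):
            if v not in reached:
                reached.add(v)
                stack.append(v)
    vertices = {x for e in edges for x in e}
    return vertices <= reached and set(terminals) <= reached


def min_steiner_tree_bruteforce(edge_costs, root, terminals):
    """Exact minimum-cost directed Steiner tree by exhaustive search over edge subsets."""
    edges = list(edge_costs)
    best_cost, best_tree = float("inf"), None
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            cost = sum(edge_costs[e] for e in subset)
            if cost < best_cost and is_rooted_tree(subset, root, terminals):
                best_cost, best_tree = cost, set(subset)
    return best_cost, best_tree


graph = {("r", "a"): 2.0, ("a", "t1"): 1.0, ("a", "t2"): 1.0,
         ("r", "t1"): 2.5, ("r", "t2"): 2.5}
print(min_steiner_tree_bruteforce(graph, "r", {"t1", "t2"}))
```

In this toy graph the shortest path to each terminal on its own is the direct edge (2.5 each, 5.0 in total), but the optimal tree shares the edge r→a and costs 4.0. Paying once for a shared prefix is the kind of amortization across object classes the formulation is after.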

DIRECTED STEINER TREE FORMULATION

SUMMARY

(1) Present a classifier-agnostic approach to speeding up sliding window object detection

(2) Efficiently detect multiple object classes within a scene using a novel Steiner tree formulation for parameter selection

PROPOSING WINDOWS FOR DETECTION

EXPERIMENTAL RESULTS

REFERENCES

[1] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, vol. 59, no. 2, September 2004.

[2] M. Charikar, C. Chekuri, T. Cheung, Z. Dai, A. Goel, S. Guha, and M. Li. Approximation Algorithms for Directed Steiner Problems. ACM-SIAM Symposium on Discrete Algorithms (SODA), 1998.

[3] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 5, May 2007.

[4] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, vol. 77, no. 1-3, May 2008.

[5] S. M. Bileschi. StreetScenes: towards scene understanding in still images. PhD thesis, 2006.

[Figure: original image vs. smoothed image (s); small k (for detecting small objects) vs. large k (for detecting large objects); small m (retains many small segments) vs. large m (merges small segments into bigger regions)]

OUTLINE

Learning:
(1) Train a binary classifier for each object class.
(2) On a set of training scenes:

● Propose multiple sets of candidate windows, controlled by parameters s, k, m, b, p

● Evaluate how good each set of windows is for detecting each object of interest

(3) Construct the Steiner graph and find the optimal parameters to use for each object class

Recognition:
(1) Segment the scene using the parameters chosen for each object class, reusing computation whenever possible.
(2) Run the object classifiers only on the generated windows.
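Reusing computation at recognition time can be pictured as caching each pipeline step's output under the parameter prefix that produced it, so object classes whose chosen parameters share a prefix also share that work. A minimal sketch, assuming each pipeline step is a callable that takes the previous step's output and one parameter value; `run_shared_pipeline` and the toy stages are hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Params = Tuple[Hashable, ...]


def run_shared_pipeline(image, stages: Sequence[Callable], chosen: Dict[str, Params]):
    """Run each class's parameter path through `stages`, computing every shared prefix once.

    stages[i](previous_output, param) -> output of pipeline step i.
    chosen maps an object class to its parameter tuple, e.g. (s, k, m, b, p).
    Returns (output per class, number of stage evaluations performed).
    """
    cache: Dict[Params, object] = {(): image}
    evaluations = 0
    result = {}
    for cls, params in chosen.items():
        for depth in range(1, len(params) + 1):
            prefix = params[:depth]
            if prefix not in cache:  # only compute prefixes no earlier class has produced
                cache[prefix] = stages[depth - 1](cache[params[:depth - 1]], params[depth - 1])
                evaluations += 1
        result[cls] = cache[params]
    return result, evaluations


# Toy stages that just record the parameters they were given.
toy_stages = [lambda prev, p: prev + (p,)] * 3
out, n = run_shared_pipeline((), toy_stages, {"mug": (1, 2, 3), "boot": (1, 2, 4), "screen": (1, 5, 3)})
```

With these three classes, six stage evaluations replace the nine a per-class pipeline would run; each cached prefix plays the role of one edge in the Steiner tree.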

Method            Time per object (s)   Number of windows   Avg. detection accuracy (AUC)
Sliding windows   18.85                 52398               0.443
OPT               4.62                  1685                0.489
5% OPT            2.43                  917                 0.462
10% OPT           1.72                  570                 0.446
20% OPT           1.29                  394                 0.421
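The headline ratios can be recomputed directly from the results table; this short script divides the sliding-window figures by each method's figures.

```python
RESULTS = {  # method: (time per object in s, number of windows, average AUC)
    "Sliding windows": (18.85, 52398, 0.443),
    "OPT": (4.62, 1685, 0.489),
    "5% OPT": (2.43, 917, 0.462),
    "10% OPT": (1.72, 570, 0.446),
    "20% OPT": (1.29, 394, 0.421),
}


def ratios(results, baseline="Sliding windows"):
    """Window-count reduction and runtime speedup of each method relative to the baseline."""
    base_time, base_windows, _ = results[baseline]
    return {method: (base_windows / windows, base_time / seconds)
            for method, (seconds, windows, _) in results.items()}


for method, (fewer, faster) in ratios(RESULTS).items():
    print(f"{method:<16} {fewer:6.1f}x fewer windows {faster:5.1f}x faster AUC {RESULTS[method][2]:.3f}")
```

The 10% OPT row works out to roughly 92x fewer windows and an 11x speedup at essentially unchanged AUC (0.446 vs. 0.443), which is consistent with the "two orders of magnitude" and "10x" figures quoted for the indoor experiment; OPT itself gives about 31x fewer windows and a 4.1x speedup.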

Edge exists if the set of windows generated by the path results in adequate object detection performance.

Steiner nodes: each corresponds to an object class to be detected.

Every path to a 5th-level node defines a set of parameters that the object detection pipeline can use to generate a set of windows.

Root: original image

Branching factor: number of parameter settings at each step of the pipeline

Object detection pipeline

Weight on edge: processing time of this step in the pipeline
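Because every path from the root fixes one parameter value per pipeline step, the internal part of G can be read as a prefix tree of parameter tuples. Under that reading, a Steiner tree amounts to choosing one adequate leaf per object class and paying once for each distinct prefix on the chosen paths. The sketch below makes this explicit with an exhaustive search; it assumes zero-cost edges from 5th-level nodes to Steiner nodes and a caller-supplied `step_cost` function (both hypothetical), and it is not the approximation algorithm of [2].

```python
from itertools import product


def best_parameters(adequate, step_cost):
    """Choose one adequate parameter path per class, minimizing the cost of the union of paths.

    adequate: class -> list of parameter tuples (one value per pipeline step), i.e. the
              leaves that have an edge to that class's Steiner node.
    step_cost(prefix) -> processing time of the last step in `prefix` (the edge weight).
    Exhaustive over all combinations, so only suitable for small examples.
    """
    classes = list(adequate)
    best = (float("inf"), None)
    for choice in product(*(adequate[c] for c in classes)):
        # Each distinct prefix is one edge of the tree and is paid for only once.
        prefixes = {path[:d] for path in choice for d in range(1, len(path) + 1)}
        cost = sum(step_cost(p) for p in prefixes)
        if cost < best[0]:
            best = (cost, dict(zip(classes, choice)))
    return best


# Toy costs: a segmentation step costs 10, a window-generation step costs 1.
adequate = {"mug": [("fine", "b1"), ("coarse", "b2")], "boot": [("coarse", "b1")]}
cost, choice = best_parameters(adequate, lambda prefix: 10 if len(prefix) == 1 else 1)
```

Here both mug paths cost 11 on their own, so choosing per class independently could pick ("fine", "b1") and pay 22 in total; the joint choice reuses the coarse segmentation for both classes and costs 12.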

We chose (for each object class independently) the parameter setting that gives the optimal detection performance on the training set.

We also consider any parameter setting that performs within 5%, 10%, and 20%, respectively, of this optimum.

More edges in the Steiner graph ...
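The OPT variants differ only in which parameter settings count as adequate, that is, which edges into each Steiner node exist. A minimal sketch, assuming "within 5%" means a relative drop in training AUC from the best setting of that class (the poster does not say whether the tolerance is relative or absolute); the function name and input format are hypothetical.

```python
def adequate_settings(auc, tolerance):
    """Per class, keep every setting whose training AUC is within `tolerance` of the best.

    auc: class -> {setting: training-set AUC}
    tolerance: relative fraction, e.g. 0.05 for "within 5%" (an assumed interpretation).
    Each kept setting corresponds to one edge into that class's Steiner node.
    """
    result = {}
    for cls, scores in auc.items():
        best = max(scores.values())
        result[cls] = sorted(s for s, v in scores.items() if v >= (1 - tolerance) * best)
    return result


auc = {"mug": {"A": 0.50, "B": 0.48, "C": 0.30}}
for tol in (0.0, 0.05, 0.5):
    print(tol, adequate_settings(auc, tol))
```

Raising the tolerance keeps more settings per class, hence more edges into the Steiner nodes and more chances for classes to share pipeline prefixes, which is consistent with the falling time per object from OPT to 20% OPT in the results table.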

Great for ski-boots, bad for mugs

Good for mugs, bad for ski-boots

5 strategies: boxes around (1) each segment, (2) all adjacent pairs of segments, (3) each segment and all its neighbors, and all (4) vertical and (5) horizontal triples of segments.

All nonempty combinations of the strategies are considered, giving 2^5 - 1 = 31 possible values of parameter b.
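A minimal sketch of how the 31 bounding-box settings could be enumerated and applied to a superpixel label map. Only strategies (1) and (2) are implemented; the strategy names, the inclusive (x0, y0, x1, y1) box format, and 4-connected adjacency are assumptions for illustration, and the trimming step p is not modeled.

```python
from itertools import combinations

STRATEGIES = ("each_segment", "adjacent_pairs", "segment_with_neighbors",
              "vertical_triples", "horizontal_triples")


def all_b_values():
    """Every nonempty subset of the five strategies: 2**5 - 1 = 31 settings of b."""
    return [frozenset(c) for r in range(1, len(STRATEGIES) + 1)
            for c in combinations(STRATEGIES, r)]


def segment_boxes(labels):
    """Tight inclusive box (x0, y0, x1, y1) around every segment label."""
    boxes = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            x0, y0, x1, y1 = boxes.get(lab, (x, y, x, y))
            boxes[lab] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


def adjacent_pairs(labels):
    """Unordered pairs of labels that touch horizontally or vertically."""
    h, w = len(labels), len(labels[0])
    pairs = set()
    for y in range(h):
        for x in range(w):
            for yy, xx in ((y, x + 1), (y + 1, x)):
                if yy < h and xx < w and labels[yy][xx] != labels[y][x]:
                    pairs.add(frozenset((labels[y][x], labels[yy][xx])))
    return pairs


def propose_windows(labels, b):
    """Candidate windows for bounding-box setting b (only strategies 1 and 2 are sketched)."""
    boxes = segment_boxes(labels)
    windows = set()
    if "each_segment" in b:
        windows.update(boxes.values())
    if "adjacent_pairs" in b:
        for pair in adjacent_pairs(labels):
            p, q = (boxes[lab] for lab in pair)
            windows.add((min(p[0], q[0]), min(p[1], q[1]), max(p[2], q[2]), max(p[3], q[3])))
    return windows


labels = [[0, 0, 1],
          [0, 0, 1],
          [2, 2, 2]]
print(sorted(propose_windows(labels, {"each_segment", "adjacent_pairs"})))
```

Windows are collected in a set, so a box produced by several strategies (here, the pairs 0-2 and 1-2 both span the whole image) is classified only once.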

[Figure: segments; windows around each segment and around all adjacent pairs of segments; low p (less trimming) vs. high p (aggressively trim sparse edges)]

The detector of [3] is optimized for dense scanning of regions, as in the sliding-window scenario; thus, a 100x reduction in the number of windows scanned yields only a 10x reduction in running time in our implementation.