Learning to Segment from Diverse Data
Transcript of Learning to Segment from Diverse Data
Learning to Segment from Diverse Data
M. Pawan Kumar
Daphne Koller, Haithem Turki, Dan Preston
Aim
Learn accurate parameters for a segmentation model
- Segmentation without generic foreground or background classes
- Train using both strongly and weakly supervised data
Data in Vision
“Strong” Supervision
“Car”
“Weak” Supervision
“One hand tied behind the back…”
Data for Vision
“Car”
“Strong” Supervision “Weak” Supervision
Types of Data
Specific foreground classes, generic background class
PASCAL VOC Segmentation Datasets
Types of Data
Specific background classes, generic foreground class
Stanford Background Dataset
Types of Data
Bounding boxes for objects
PASCAL VOC Detection Datasets
Thousands of freely available images
Current methods only use small, controlled datasets
Types of Data
Image-level labels
ImageNet, Caltech …
Thousands of freely available images
“Car”
Types of Data
Noisy data from web search
Google Images, Flickr, Picasa, …
Millions of freely available images
Outline
• Region-based Segmentation Model
• Problem Formulation
• Inference
• Results
Region-based Segmentation Model
Object Models
Pixels
Regions
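The model above scores a labeling of regions: each region gets a per-class score from the object models, and adjacent regions get a pairwise score (contrast/context). A minimal sketch of such a region-based energy, with entirely hypothetical region ids, labels, and scores (not the talk's actual features):

```python
# Toy region-based segmentation energy: unary scores per (region, label)
# plus pairwise scores on adjacent region pairs. All values are made up.

def energy(labels, unary, pairwise, edges):
    """Score a labeling of regions.

    labels   -- dict region -> class label
    unary    -- dict (region, label) -> score from the object model
    pairwise -- dict (label_i, label_j) -> score for adjacent regions
    edges    -- list of (region_i, region_j) adjacency pairs
    """
    e = sum(unary[(r, l)] for r, l in labels.items())
    e += sum(pairwise[(labels[r], labels[s])] for r, s in edges)
    return e

# Two regions ("car" vs. generic background), one adjacency edge.
unary = {(0, "car"): -2.0, (0, "bg"): 0.0, (1, "car"): 0.5, (1, "bg"): -1.0}
pairwise = {("car", "car"): 0.0, ("car", "bg"): 0.3,
            ("bg", "car"): 0.3, ("bg", "bg"): 0.0}
print(round(energy({0: "car", 1: "bg"}, unary, pairwise, [(0, 1)]), 2))  # -2.7
```

Segmentation then amounts to finding the labeling with minimal energy over all regions.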
Outline
• Region-based Segmentation Model
• Problem Formulation
• Inference
• Results
Problem Formulation
Treat missing information as latent variables
Image x, Annotation y, Complete Annotation (y,h)
Joint Feature Vector Ψ(x,y,h):
Region features, Detection features
Pairwise contrast, Pairwise context
Problem Formulation
Treat missing information as latent variables
Image x, Annotation y, Complete Annotation (y,h)
(y*, h*) = argmax_{y,h} w^T Ψ(x,y,h)
Latent Structural SVM
Trained by minimizing the overlap loss ∆
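Prediction in the latent structural SVM is a joint maximization of the linear score w^T Ψ(x,y,h) over the output y and the latent completion h. A hedged sketch with a toy feature map and toy candidate sets (Ψ here is a stand-in, not the talk's features):

```python
import numpy as np

# Latent SSVM prediction: (y*, h*) = argmax over (y, h) of w . Psi(x, y, h).
# Psi and the candidate list below are illustrative only.

def predict(w, psi, candidates):
    """Return the (y, h) pair with maximal linear score under w."""
    return max(candidates, key=lambda yh: float(np.dot(w, psi(*yh))))

# Toy joint feature vector over a binary output y and a binary latent h.
psi = lambda y, h: np.array([y, h, y * h], dtype=float)
w = np.array([1.0, -0.5, 2.0])
print(predict(w, psi, [(y, h) for y in (0, 1) for h in (0, 1)]))  # (1, 1)
```

In the talk's setting the candidate set is the space of segmentations and their latent completions, so this maximization is itself a structured inference problem rather than a simple enumeration.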
Self-Paced Learning
Start with an initial estimate w0
Update w_{t+1} by solving a biconvex problem:
min_w ||w||^2 + C Σ_i v_i ξ_i - K Σ_i v_i
s.t. w^T Ψ(x_i, y_i, h_i) - w^T Ψ(x_i, y, h) ≥ ∆(y_i, y, h) - ξ_i
Update h_i = argmax_{h ∈ H} w_t^T Ψ(x_i, y_i, h)
Kumar, Packer and Koller, 2010
Annotation-Consistent Inference
Loss-Augmented Inference
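The self-paced objective is biconvex: with w fixed, the optimal binary v_i selects exactly the examples whose slack is small (taking C = 1, v_i = 1 iff the loss is below K), and with v fixed, w is trained on the selected examples only; K is then annealed upward so harder examples enter in later rounds. A hedged sketch of that loop, reduced to a plain binary hinge-loss classifier so it runs standalone (the talk uses a latent structural SVM; all data and constants here are toy stand-ins):

```python
import numpy as np

# Self-paced learning sketch: alternate easy-example selection and
# subgradient training, annealing the selection threshold K.

def self_paced_train(X, y, w0, K=1.0, mu=1.3, rounds=8, lr=0.05, steps=50):
    w = w0.astype(float).copy()
    for _ in range(rounds):
        losses = np.maximum(0.0, 1.0 - y * (X @ w))  # per-example hinge loss
        v = losses < K                                # v_i = 1 for "easy" examples
        if v.any():
            Xe, ye = X[v], y[v]
            for _ in range(steps):                    # subgradient steps on easy set
                active = ye * (Xe @ w) < 1.0          # current margin violators
                grad = -(ye[active, None] * Xe[active]).sum(0) / len(ye)
                w -= lr * grad
        K *= mu                                       # admit harder examples next round
    return w

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(40, 2)) + [2.0, 0.0],
               rng.normal(size=(40, 2)) - [2.0, 0.0]])
y = np.array([1.0] * 40 + [-1.0] * 40)
w0 = X[y > 0].mean(0) - X[y < 0].mean(0)             # crude initial estimate w0
w = self_paced_train(X, y, w0)
acc = ((X @ w > 0) == (y > 0)).mean()
```

The point of the schedule is that noisy or ambiguous examples (large loss under the current model) are simply excluded early on, instead of dragging the parameters toward a bad local optimum.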
Outline
• Region-based Segmentation Model
• Problem Formulation
• Inference
• Results
Generic Classes
DICTIONARY OF REGIONS D
MERGE AND INTERSECT WITH SEGMENTS TO FORM PUTATIVE REGIONS
SELECT REGIONS
ITERATE UNTIL CONVERGENCE
Current Regions, Over-Segmentations
min_y θ^T y  s.t.  y ∈ SELECT(D)
Kumar and Koller, 2010
Generic Classes
Binary y_r(0) = 1 iff r is not selected; binary y_r(1) = 1 iff r is selected
min_y Σ θ_r(i) y_r(i) + Σ θ_rs(i,j) y_rs(i,j)   (Minimize the energy)
s.t. y_r(0) + y_r(1) = 1   (Assign one label to r from L)
y_rs(i,0) + y_rs(i,1) = y_r(i),  y_rs(0,j) + y_rs(1,j) = y_s(j)   (Ensure y_rs(i,j) = y_r(i) y_s(j))
Σ_{r "covers" u} y_r(1) = 1   (Each super-pixel is covered by exactly one selected region)
y_r(i), y_rs(i,j) ∈ {0,1}   (Binary variables)
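The core of the selection program is the coverage constraint: every super-pixel must lie in exactly one selected region. A brute-force toy version over a four-region dictionary illustrates the constraint structure (the talk solves a relaxation of a much larger program; the regions and costs below are invented):

```python
from itertools import product

# Exhaustive region selection: pick a 0/1 vector y over candidate regions
# so each super-pixel is covered exactly once, minimizing total cost.

def select_regions(regions, costs, superpixels):
    """regions: list of frozensets of super-pixel ids; returns (best y, cost)."""
    best, best_cost = None, float("inf")
    for y in product((0, 1), repeat=len(regions)):
        # coverage constraint: each super-pixel in exactly one chosen region
        if all(sum(y[i] for i, r in enumerate(regions) if u in r) == 1
               for u in superpixels):
            cost = sum(c for yi, c in zip(y, costs) if yi)
            if cost < best_cost:
                best, best_cost = y, cost
    return best, best_cost

regions = [frozenset({1, 2}), frozenset({3}), frozenset({1}), frozenset({2, 3})]
costs = [1.0, 2.0, 0.5, 1.5]
print(select_regions(regions, costs, {1, 2, 3}))  # ((0, 0, 1, 1), 2.0)
```

Enumeration is exponential in the dictionary size, which is why the real system optimizes the relaxed linear program over the y_r(i) and y_rs(i,j) variables instead.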
Generic Classes
DICTIONARY OF REGIONS D
MERGE AND INTERSECT WITH SEGMENTS TO FORM PUTATIVE REGIONS
SELECT REGIONS
ITERATE UNTIL CONVERGENCE
Current Regions, Over-Segmentations
min_y θ^T y  s.t.  y ∈ SELECT(D),  ∆_new ≤ ∆_prev
Kumar and Koller, 2010
Simultaneous region selection and labeling
Examples: Iteration 1, Iteration 3, Iteration 6
Examples: Iteration 1, Iteration 3, Iteration 6
Examples: Iteration 1, Iteration 3, Iteration 6
Bounding Boxes
min_y θ^T y + K Σ_a (1 - z_a)
s.t. y ∈ SELECT(D),  ∆_new ≤ ∆_prev
z_a ∈ {0,1},  z_a ≤ Σ_{r "covers" a} y_r(c)
Each row and each column of the bounding box is covered
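For a bounding-box annotation with class c, one slack z_a is introduced per row and per column a of the box: z_a can be 1 only if some selected region labeled c touches that row or column, and the objective pays K for every z_a forced to 0. A toy version of that penalty, with invented regions, box extent, and K:

```python
# Toy bounding-box coverage penalty: count rows and columns of the box
# not touched by any selected region of the box's class, times K.

def box_penalty(selected, box_rows, box_cols, K):
    """selected: list of (rows, cols) spans of regions labeled with the box class."""
    covered_rows = set().union(*(set(r) for r, _ in selected)) if selected else set()
    covered_cols = set().union(*(set(c) for _, c in selected)) if selected else set()
    missed = sum(1 for a in box_rows if a not in covered_rows)
    missed += sum(1 for a in box_cols if a not in covered_cols)
    return K * missed

# Box spans rows 0-2 and cols 0-2; one selected "car" region covers rows 0-1, cols 0-2.
print(box_penalty([(range(0, 2), range(0, 3))], range(0, 3), range(0, 3), K=2.0))  # 2.0
```

With K large enough, the cheapest labelings are those whose class-c regions span the full height and width of the annotated box, without dictating the exact pixel mask inside it.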
Examples: Iteration 1, Iteration 2, Iteration 4
Examples: Iteration 1, Iteration 2, Iteration 4
Examples: Iteration 1, Iteration 2, Iteration 4
Image-Level Labels
min_y θ^T y + K (1 - z)
s.t. y ∈ SELECT(D),  ∆_new ≤ ∆_prev
z ∈ {0,1},  z ≤ Σ_r y_r(c)
Image must contain the specified object
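An image-level tag uses the same slack pattern with a single z for the whole image: z can be 1 only if some selected region takes the tagged class c, and otherwise the objective pays K. A minimal sketch (labels and K are illustrative):

```python
# Toy image-level weak-label term: the extra cost K*(1 - z), where the
# binary z = 1 iff some selected region carries the tagged class.

def weak_label_penalty(selected_labels, tag_class, K):
    """Return K*(1 - z) for a labeling given the image-level tag tag_class."""
    z = int(any(l == tag_class for l in selected_labels))
    return K * (1 - z)

print(weak_label_penalty(["sky", "road"], "car", K=10.0))         # 10.0
print(weak_label_penalty(["sky", "car", "road"], "car", K=10.0))  # 0.0
```

So a sufficiently large K turns the tag into a soft constraint: any labeling of the image that omits the tagged object becomes uncompetitive.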
Outline
• Region-based Segmentation Model
• Problem Formulation
• Inference
• Results
Dataset
Stanford Background: generic foreground class, 7 background classes
PASCAL VOC 2009: generic background class, 20 foreground classes
Stanford Background + PASCAL VOC 2009
Dataset
Stanford Background: Train 572 images, Validation 53 images, Test 90 images
PASCAL VOC 2009: Train 1274 images, Validation 225 images, Test 750 images
Stanford Background + PASCAL VOC 2009
Baseline: Closed-loop learning (CLL), Gould et al., 2009
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9%
SBD: CLL 53.1%, LSVM 54.3%
[Bar charts: per-class improvement of LSVM over CLL on both datasets]
Dataset
Stanford Background: Train 572 images, Validation 53 images, Test 90 images
PASCAL VOC 2009 + 2010: Train 1274 images, Validation 225 images, Test 750 images, Bounding Boxes 1564 images
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9%, BOX 28.3%
SBD: CLL 53.1%, LSVM 54.3%, BOX 54.8%
[Bar charts: per-class improvement of LSVM and BOX over CLL on both datasets]
Dataset
Stanford Background: Train 572 images, Validation 53 images, Test 90 images
PASCAL VOC 2009 + 2010: Train 1274 images, Validation 225 images, Test 750 images, Bounding Boxes 1564 images, + 1000 image-level labels (ImageNet)
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9%, BOX 28.3%, LABEL 28.8%
SBD: CLL 53.1%, LSVM 54.3%, BOX 54.8%, LABEL 55.3%
[Bar charts: per-class improvement of LSVM, BOX and LABEL over CLL on both datasets]
Examples
Examples
Failure Modes
Examples
Types of Data
Specific foreground classes, generic background class
PASCAL VOC Segmentation Datasets
Types of Data
Specific background classes, generic foreground class
Stanford Background Dataset
Types of Data
Bounding boxes for objects
PASCAL VOC Detection Datasets
Thousands of freely available images
Types of Data
Image-level labels
ImageNet, Caltech …
Thousands of freely available images
“Car”
Types of Data
Noisy data from web search
Google Images, Flickr, Picasa, …
Millions of freely available images
Two Problems
The “Noise” Problem: Self-Paced Learning
The “Size” Problem: Self-Paced Learning
Questions?