Learning to Segment from Diverse Data
Transcript of Learning to Segment from Diverse Data
Learning to Segment from Diverse Data
M. Pawan Kumar
Daphne Koller, Haithem Turki, Dan Preston
Aim
Learn accurate parameters for a segmentation model
- Segmentation without generic foreground or background classes
- Train using both strongly and weakly supervised data
Data in Vision
“Strong” Supervision
“Car”
“Weak” Supervision
“One hand tied behind the back…”
Data for Vision
“Car”
“Strong” Supervision “Weak” Supervision
Types of Data
Specific foreground classes, generic background class
PASCAL VOC Segmentation Datasets
Types of Data
Specific background classes, generic foreground class
Stanford Background Dataset
Types of Data
Bounding boxes for objects
PASCAL VOC Detection Datasets
Thousands of freely available images
Current methods only use small, controlled datasets
Types of Data
Image-level labels
ImageNet, Caltech …
Thousands of freely available images
“Car”
Types of Data
Noisy data from web search
Google Images, Flickr, Picasa, …
Millions of freely available images
Outline
• Region-based Segmentation Model
• Problem Formulation
• Inference
• Results
Region-based Segmentation Model
Object Models
Pixels
Regions
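The model above scores a labeling of regions: each region gets a per-class score from the object models, and adjacent regions get a pairwise score (contrast/context). A minimal sketch of such a region-based energy, with entirely hypothetical region ids, labels, and scores (not the talk's actual features):

```python
# Toy region-based segmentation energy: unary scores per (region, label)
# plus pairwise scores on adjacent region pairs. All values are made up.

def energy(labels, unary, pairwise, edges):
    """Score a labeling of regions.

    labels   -- dict region -> class label
    unary    -- dict (region, label) -> score from the object model
    pairwise -- dict (label_i, label_j) -> score for adjacent regions
    edges    -- list of (region_i, region_j) adjacency pairs
    """
    e = sum(unary[(r, l)] for r, l in labels.items())
    e += sum(pairwise[(labels[r], labels[s])] for r, s in edges)
    return e

# Two regions ("car" vs. generic background), one adjacency edge.
unary = {(0, "car"): -2.0, (0, "bg"): 0.0, (1, "car"): 0.5, (1, "bg"): -1.0}
pairwise = {("car", "car"): 0.0, ("car", "bg"): 0.3,
            ("bg", "car"): 0.3, ("bg", "bg"): 0.0}
print(round(energy({0: "car", 1: "bg"}, unary, pairwise, [(0, 1)]), 2))  # -2.7
```

Segmentation then amounts to finding the labeling with minimal energy over all regions.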
Outline
• Region-based Segmentation Model
• Problem Formulation
• Inference
• Results
Problem Formulation
Treat missing information as latent variables
Image x, Annotation y, Complete Annotation (y,h)
Joint Feature Vector Ψ(x,y,h):
Region features, Detection features
Pairwise contrast, Pairwise context
Problem Formulation
Treat missing information as latent variables
Image x, Annotation y, Complete Annotation (y,h)
(y*, h*) = argmax_{y,h} w^T Ψ(x,y,h)
Latent Structural SVM
Trained by minimizing the overlap loss ∆
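Prediction in the latent structural SVM is a joint maximization of the linear score w^T Ψ(x,y,h) over the output y and the latent completion h. A hedged sketch with a toy feature map and toy candidate sets (Ψ here is a stand-in, not the talk's features):

```python
import numpy as np

# Latent SSVM prediction: (y*, h*) = argmax over (y, h) of w . Psi(x, y, h).
# Psi and the candidate list below are illustrative only.

def predict(w, psi, candidates):
    """Return the (y, h) pair with maximal linear score under w."""
    return max(candidates, key=lambda yh: float(np.dot(w, psi(*yh))))

# Toy joint feature vector over a binary output y and a binary latent h.
psi = lambda y, h: np.array([y, h, y * h], dtype=float)
w = np.array([1.0, -0.5, 2.0])
print(predict(w, psi, [(y, h) for y in (0, 1) for h in (0, 1)]))  # (1, 1)
```

In the talk's setting the candidate set is the space of segmentations and their latent completions, so this maximization is itself a structured inference problem rather than a simple enumeration.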
Self-Paced Learning
Start with an initial estimate w0
Update w_{t+1} by solving a biconvex problem:
min_w ||w||^2 + C Σ_i v_i ξ_i - K Σ_i v_i
s.t. w^T Ψ(x_i, y_i, h_i) - w^T Ψ(x_i, y, h) ≥ ∆(y_i, y, h) - ξ_i
Update h_i = argmax_{h ∈ H} w_t^T Ψ(x_i, y_i, h)
Kumar, Packer and Koller, 2010
Annotation-Consistent Inference
Loss-Augmented Inference
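The self-paced objective is biconvex: with w fixed, the optimal binary v_i selects exactly the examples whose slack is small (taking C = 1, v_i = 1 iff the loss is below K), and with v fixed, w is trained on the selected examples only; K is then annealed upward so harder examples enter in later rounds. A hedged sketch of that loop, reduced to a plain binary hinge-loss classifier so it runs standalone (the talk uses a latent structural SVM; all data and constants here are toy stand-ins):

```python
import numpy as np

# Self-paced learning sketch: alternate easy-example selection and
# subgradient training, annealing the selection threshold K.

def self_paced_train(X, y, w0, K=1.0, mu=1.3, rounds=8, lr=0.05, steps=50):
    w = w0.astype(float).copy()
    for _ in range(rounds):
        losses = np.maximum(0.0, 1.0 - y * (X @ w))  # per-example hinge loss
        v = losses < K                                # v_i = 1 for "easy" examples
        if v.any():
            Xe, ye = X[v], y[v]
            for _ in range(steps):                    # subgradient steps on easy set
                active = ye * (Xe @ w) < 1.0          # current margin violators
                grad = -(ye[active, None] * Xe[active]).sum(0) / len(ye)
                w -= lr * grad
        K *= mu                                       # admit harder examples next round
    return w

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(40, 2)) + [2.0, 0.0],
               rng.normal(size=(40, 2)) - [2.0, 0.0]])
y = np.array([1.0] * 40 + [-1.0] * 40)
w0 = X[y > 0].mean(0) - X[y < 0].mean(0)             # crude initial estimate w0
w = self_paced_train(X, y, w0)
acc = ((X @ w > 0) == (y > 0)).mean()
```

The point of the schedule is that noisy or ambiguous examples (large loss under the current model) are simply excluded early on, instead of dragging the parameters toward a bad local optimum.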
Outline
• Region-based Segmentation Model
• Problem Formulation
• Inference
• Results
Generic Classes
DICTIONARY OF REGIONS D
MERGE AND INTERSECT WITH SEGMENTS TO FORM PUTATIVE REGIONS
SELECT REGIONS
ITERATE UNTIL CONVERGENCE
Current Regions, Over-Segmentations
min_y θ^T y  s.t.  y ∈ SELECT(D)
Kumar and Koller, 2010
Generic Classes
Binary y_r(0) = 1 iff r is not selected; binary y_r(1) = 1 iff r is selected
min_y Σ θ_r(i) y_r(i) + Σ θ_rs(i,j) y_rs(i,j)   (Minimize the energy)
s.t. y_r(0) + y_r(1) = 1   (Assign one label to r from L)
y_rs(i,0) + y_rs(i,1) = y_r(i),  y_rs(0,j) + y_rs(1,j) = y_s(j)   (Ensure y_rs(i,j) = y_r(i) y_s(j))
Σ_{r "covers" u} y_r(1) = 1   (Each super-pixel is covered by exactly one selected region)
y_r(i), y_rs(i,j) ∈ {0,1}   (Binary variables)
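The core of the selection program is the coverage constraint: every super-pixel must lie in exactly one selected region. A brute-force toy version over a four-region dictionary illustrates the constraint structure (the talk solves a relaxation of a much larger program; the regions and costs below are invented):

```python
from itertools import product

# Exhaustive region selection: pick a 0/1 vector y over candidate regions
# so each super-pixel is covered exactly once, minimizing total cost.

def select_regions(regions, costs, superpixels):
    """regions: list of frozensets of super-pixel ids; returns (best y, cost)."""
    best, best_cost = None, float("inf")
    for y in product((0, 1), repeat=len(regions)):
        # coverage constraint: each super-pixel in exactly one chosen region
        if all(sum(y[i] for i, r in enumerate(regions) if u in r) == 1
               for u in superpixels):
            cost = sum(c for yi, c in zip(y, costs) if yi)
            if cost < best_cost:
                best, best_cost = y, cost
    return best, best_cost

regions = [frozenset({1, 2}), frozenset({3}), frozenset({1}), frozenset({2, 3})]
costs = [1.0, 2.0, 0.5, 1.5]
print(select_regions(regions, costs, {1, 2, 3}))  # ((0, 0, 1, 1), 2.0)
```

Enumeration is exponential in the dictionary size, which is why the real system optimizes the relaxed linear program over the y_r(i) and y_rs(i,j) variables instead.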
Generic Classes
DICTIONARY OF REGIONS D
MERGE AND INTERSECT WITH SEGMENTS TO FORM PUTATIVE REGIONS
SELECT REGIONS
ITERATE UNTIL CONVERGENCE
Current Regions, Over-Segmentations
min_y θ^T y  s.t.  y ∈ SELECT(D),  ∆_new ≤ ∆_prev
Kumar and Koller, 2010
Simultaneous region selection and labeling
Examples: Iteration 1, Iteration 3, Iteration 6
Examples: Iteration 1, Iteration 3, Iteration 6
Examples: Iteration 1, Iteration 3, Iteration 6
Bounding Boxes
min_y θ^T y + K Σ_a (1 - z_a)
s.t. y ∈ SELECT(D),  ∆_new ≤ ∆_prev
z_a ∈ {0,1},  z_a ≤ Σ_{r "covers" a} y_r(c)
Each row and each column of the bounding box is covered
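For a bounding-box annotation with class c, one slack z_a is introduced per row and per column a of the box: z_a can be 1 only if some selected region labeled c touches that row or column, and the objective pays K for every z_a forced to 0. A toy version of that penalty, with invented regions, box extent, and K:

```python
# Toy bounding-box coverage penalty: count rows and columns of the box
# not touched by any selected region of the box's class, times K.

def box_penalty(selected, box_rows, box_cols, K):
    """selected: list of (rows, cols) spans of regions labeled with the box class."""
    covered_rows = set().union(*(set(r) for r, _ in selected)) if selected else set()
    covered_cols = set().union(*(set(c) for _, c in selected)) if selected else set()
    missed = sum(1 for a in box_rows if a not in covered_rows)
    missed += sum(1 for a in box_cols if a not in covered_cols)
    return K * missed

# Box spans rows 0-2 and cols 0-2; one selected "car" region covers rows 0-1, cols 0-2.
print(box_penalty([(range(0, 2), range(0, 3))], range(0, 3), range(0, 3), K=2.0))  # 2.0
```

With K large enough, the cheapest labelings are those whose class-c regions span the full height and width of the annotated box, without dictating the exact pixel mask inside it.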
Examples: Iteration 1, Iteration 2, Iteration 4
Examples: Iteration 1, Iteration 2, Iteration 4
Examples: Iteration 1, Iteration 2, Iteration 4
Image-Level Labels
min_y θ^T y + K (1 - z)
s.t. y ∈ SELECT(D),  ∆_new ≤ ∆_prev
z ∈ {0,1},  z ≤ Σ_r y_r(c)
Image must contain the specified object
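An image-level tag uses the same slack pattern with a single z for the whole image: z can be 1 only if some selected region takes the tagged class c, and otherwise the objective pays K. A minimal sketch (labels and K are illustrative):

```python
# Toy image-level weak-label term: the extra cost K*(1 - z), where the
# binary z = 1 iff some selected region carries the tagged class.

def weak_label_penalty(selected_labels, tag_class, K):
    """Return K*(1 - z) for a labeling given the image-level tag tag_class."""
    z = int(any(l == tag_class for l in selected_labels))
    return K * (1 - z)

print(weak_label_penalty(["sky", "road"], "car", K=10.0))         # 10.0
print(weak_label_penalty(["sky", "car", "road"], "car", K=10.0))  # 0.0
```

So a sufficiently large K turns the tag into a soft constraint: any labeling of the image that omits the tagged object becomes uncompetitive.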
Outline
• Region-based Segmentation Model
• Problem Formulation
• Inference
• Results
Dataset
Stanford Background: generic foreground class, 7 background classes
PASCAL VOC 2009: generic background class, 20 foreground classes
Stanford Background + PASCAL VOC 2009
Dataset
Stanford Background: Train 572 images, Validation 53 images, Test 90 images
PASCAL VOC 2009: Train 1274 images, Validation 225 images, Test 750 images
Stanford Background + PASCAL VOC 2009
Baseline: Closed-loop learning (CLL), Gould et al., 2009
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9%
SBD: CLL 53.1%, LSVM 54.3%
[Bar charts: per-class improvement of LSVM over CLL on both datasets]
Dataset
Stanford Background: Train 572 images, Validation 53 images, Test 90 images
PASCAL VOC 2009 + 2010: Train 1274 images, Validation 225 images, Test 750 images, Bounding Boxes 1564 images
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9%, BOX 28.3%
SBD: CLL 53.1%, LSVM 54.3%, BOX 54.8%
[Bar charts: per-class improvement of LSVM and BOX over CLL on both datasets]
Dataset
Stanford Background: Train 572 images, Validation 53 images, Test 90 images
PASCAL VOC 2009 + 2010: Train 1274 images, Validation 225 images, Test 750 images, Bounding Boxes 1564 images, + 1000 image-level labels (ImageNet)
Results
PASCAL VOC 2009: CLL 24.7%, LSVM 26.9%, BOX 28.3%, LABEL 28.8%
SBD: CLL 53.1%, LSVM 54.3%, BOX 54.8%, LABEL 55.3%
[Bar charts: per-class improvement of LSVM, BOX and LABEL over CLL on both datasets]
Examples
Examples
Failure Modes
Examples
Types of Data
Specific foreground classes, generic background class
PASCAL VOC Segmentation Datasets
Types of Data
Specific background classes, generic foreground class
Stanford Background Dataset
Types of Data
Bounding boxes for objects
PASCAL VOC Detection Datasets
Thousands of freely available images
Types of Data
Image-level labels
ImageNet, Caltech …
Thousands of freely available images
“Car”
Types of Data
Noisy data from web search
Google Images, Flickr, Picasa, …
Millions of freely available images
Two Problems
The “Noise” Problem: Self-Paced Learning
The “Size” Problem: Self-Paced Learning
Questions?