Learning Object Detectors From Weakly Supervised Image Data

COMPUTER VISION: LEARNING TO DETECT OBJECTS
Kate Saenko, University of Massachusetts, Lowell

What is computer vision?

Computer Vision
Terminator 2
We’re not quite there yet, but…
Terminator 2, Enemy of the State (from UCSD “Fact or Fiction” DVD)

Machine Learning: What is it?
Program a computer to learn from experience
Learn from “big data”

Machine Learning in practice

Machine learning is not perfect

Personal photo albums
Lots of image data available!

What are applications of computer vision?

Surveillance and security
Computer Vision: Surveillance and Security

Smart cars
Mobileye
Vision systems currently in high-end BMW, GM, Volvo models
By 2010: 70% of car manufacturers
Slide content courtesy of Amnon Shashua

Scientific Images

Medical Imaging
Image guided surgery: Grimson et al., MIT
3D imaging: MRI, CT
slide by S. Seitz

Vision for Robotics
http://www.robocup.org/
NASA’s Mars Spirit Rover: http://en.wikipedia.org/wiki/Spirit_rover
slide by S. Seitz

Object Detection: Face Detection
Viola and Jones, Rapid object detection using a boosted cascade of simple features, CVPR 2001

What is object detection?

Goal of object detection
Detect: PERSON

Why is object detection difficult?
Can you detect all objects in this image?

Easy to collect data on the web!

Image annotations are difficult to label
Image-level labels (dog, apple): easy to get from a search engine
Bounding box labels (dog, apple): much more difficult and costly to produce

Goal of this research:
Learn from weakly labeled data!

How well can we do without bounding box labels?
Computer detecting pedestrians

How well can we do without bounding box labels?
Computer detecting 7,000 object categories

Joint work with Karim Ali
Confidence-Rated Multiple Instance Boosting for Detection

Motivation
Object Detection
  High accuracy requires large labeled data sets
  Scalability
Reducing annotation requirements
  Semi-supervised Learning
  Active Learning
  Multiple-Instance Learning

Overview
CR-MILBOOST

Multiple instance learning with noise
MI Learning cannot handle noisy bags

Outline
Reminder: What is MIL?
CR-MILBoost (CVPR’14)
Conclusion & Future Work
Discussion

Reminder: What is MIL?
Supervised Learning
  Each instance has an associated label
MIL: Weaker Supervision
  Examples come in bags
  Each bag has a label
  Negative bag: all instances in bag are negative
  Positive bag: at least one instance in bag is positive
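The bag-labeling rule above can be written out in a few lines. This is only an illustrative sketch: the function name and the example bags are hypothetical, and in real MIL training the instance labels are hidden, so only the bag label would be observed.

```python
from typing import List


def bag_label(instance_labels: List[int]) -> int:
    """Standard MIL assumption: a bag is positive (1) if at least one
    instance is positive, and negative (0) if every instance is negative."""
    return int(any(label == 1 for label in instance_labels))


# Hypothetical example: each bag is an image, each instance a candidate window.
negative_bag = [0, 0, 0, 0]  # no window contains the object
positive_bag = [0, 1, 0, 0]  # at least one window contains the object

print(bag_label(negative_bag), bag_label(positive_bag))
```

The asymmetry is the point: a negative bag label fixes every instance label, while a positive bag label leaves open which instances are the positive ones.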

Supervised vs MIL (binary)
Supervised Learning
MI Learning

Related Methods
How to estimate latent labels for positives
Gartner, ICML’02
Xu, ICML’04
Andrews, NIPS’03
Bunescu, ICML’07
SVM Constraints
Viola, NIPS’07
Supervised MIL

CR-MILBOOST
MILBoost
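The equations shown on the MILBoost slides are not in the transcript. As a reminder of the baseline, here is a minimal sketch of the commonly cited noisy-OR formulation of MILBoost (Viola et al.): instance probabilities come from a sigmoid of the classifier score, the bag probability is a noisy-OR over its instances, and each instance weight is the gradient of the bag log-likelihood with respect to its score. Function and variable names are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def milboost_bag(scores: List[float], bag_label: int) -> Tuple[float, List[float]]:
    """Noisy-OR bag probability and per-instance boosting weights for one bag.

    scores:    current classifier scores y_ij for the instances in the bag
    bag_label: t_i, 1 for a positive bag and 0 for a negative bag
    """
    p_inst = [sigmoid(s) for s in scores]                # p_ij
    p_bag = 1.0 - math.prod(1.0 - p for p in p_inst)     # p_i = 1 - prod_j (1 - p_ij)
    p_bag = max(p_bag, 1e-12)                            # guard against division by zero
    # Gradient of t_i*log(p_i) + (1 - t_i)*log(1 - p_i) with respect to y_ij
    weights = [(bag_label - p_bag) / p_bag * p for p in p_inst]
    return p_bag, weights


p_bag, weights = milboost_bag([0.0, 0.0], bag_label=1)
print(p_bag, weights)
```

For a negative bag the weight reduces to -p_ij, so every instance is pushed down; for a positive bag the weights are shared across instances in proportion to p_ij, which is where noisy positive bags cause trouble.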

CR-MILBOOST
Two-step procedure
  Estimate probabilities on latent labels
  Integrate estimates in new loss
Mitigates label estimation error by incorporating priors

CR-MILBOOST
Step 1

CR-MILBOOST
Step 2
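The formulas for the two steps are not in the transcript, so the following is not the CR-MILBoost loss. It is a generic soft-label logistic loss, shown only to illustrate the idea of feeding estimated latent-label probabilities into the training objective. The function name and the `confidences` input (assumed to be the output of step 1) are hypothetical.

```python
import math
from typing import List


def soft_label_logistic_loss(scores: List[float], confidences: List[float]) -> float:
    """Cross-entropy between estimated latent-label probabilities and the
    classifier's predicted probabilities.

    scores:      classifier scores, one per instance
    confidences: estimated probability that each instance is positive
                 (assumed here to be produced by a separate step 1)
    """
    eps = 1e-12  # keeps log() finite for extreme scores
    total = 0.0
    for score, q in zip(scores, confidences):
        p = 1.0 / (1.0 + math.exp(-score))
        p = min(max(p, eps), 1.0 - eps)
        total -= q * math.log(p) + (1.0 - q) * math.log(1.0 - p)
    return total


print(soft_label_logistic_loss([2.0, -1.0], [0.9, 0.2]))
```

With confidences of exactly 0 or 1 this reduces to the ordinary supervised logistic loss; intermediate values soften the penalty for instances whose latent label is uncertain.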

Experiments: Features
Weak learners
  An edge orientation
  A sub-window
  A threshold
Simple, efficient
Q=4, number of stumps
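A weak learner of this kind can be sketched as a decision stump. The details below are assumptions made for illustration, not the talk's implementation: the image is assumed to be given as one 2D edge-energy map per orientation, the feature is the sum of that map inside the sub-window, and the class name `EdgeStump` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Channel = List[List[float]]  # one 2D edge-energy map


@dataclass
class EdgeStump:
    orientation: int                   # index of the edge-orientation channel
    window: Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive
    threshold: float

    def feature(self, channels: List[Channel]) -> float:
        """Sum the chosen orientation's edge energy inside the sub-window."""
        top, left, bottom, right = self.window
        channel = channels[self.orientation]
        return sum(sum(row[left:right]) for row in channel[top:bottom])

    def predict(self, channels: List[Channel]) -> int:
        """Return +1 if the feature exceeds the threshold, otherwise -1."""
        return 1 if self.feature(channels) > self.threshold else -1


# Hypothetical 2x2 image with two orientation channels.
channels = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]
stump = EdgeStump(orientation=1, window=(0, 0, 2, 2), threshold=5.0)
print(stump.feature(channels), stump.predict(channels))
```

Each stump needs only one region sum and one comparison, which is what makes this family of weak learners cheap enough to evaluate over many candidate windows.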

Experiments: Pedestrian Detection
Training data
  200 images automatically downloaded from the web
  200 “objectness” bounding boxes

Experiments: Pedestrian Detection
Testing data
  INRIA Person
  300 images containing 600 pedestrians

Experiments: Horse Detection
Training data
  200 images automatically downloaded from the web
  200 “objectness” bounding boxes

Experiments: Horse Detection
Testing data
  200 images containing 200 side-view horses

Conclusion
New MIL method: CR-MILBOOST
  Two-step procedure
Dramatic increase in performance
  200% on two datasets
Quality of selected examples still suffers from additional ambiguity compared to fully supervised examples

Joint work with Judy Hoffman, Eric Tzeng, Sergio Guadarrama and Trevor Darrell at UC Berkeley
Adapting Deep CNNs from Classification to Detection

Recall: classification is easier than detection
Classification label (dog, apple): easy to label
Detection label (dog, apple): much more difficult and costly!

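Main idea behind the approach
Classification images I_CLASSIFY (dog, apple, cat) give classifiers W_CLASSIFY for dog, apple, and cat
Detection images I_DET (dog, apple) give detectors W_DET for dog and apple
No detection images I_DET exist for cat, so the cat detector W_DET is the unknown (?) to be derived from W_CLASSIFY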

Deep Convolutional Neural Network
Train classification CNN on classification data from categories A and B
Architecture: layers 1-5, fc6, fc7, then output layers fcA and fcB
Example inputs: cat, dog
Example output scores: cat: 0.90, dog: 0.85, airplane: 0.05, person: 0.10

(a) Classification CNN
  Train classification CNN on classification data from categories A and B
  Layers 1-5, fc6, fc7, fcA, fcB
(b) Hidden Layer Adaptation
  Train adapted detection CNN on detection data from categories B (labeled warped regions)
  det layers 1-5, detfc6, detfc7, detfcB, background
(c) Output Layer Adaptation
  Adapt fcA to detfcA
Final combined and fully adapted CNN
  det layers 1-5, detfc6, detfc7, detfcA, detfcB, background
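The transcript does not say how the output layer for categories without detection data is adapted. One plausible scheme, sketched here under that assumption, transfers the classifier-to-detector weight change from the most similar categories that do have detection data: find the k nearest categories in set B by classifier weights, average their (detector minus classifier) offsets, and add that offset to the target category's classifier weights. All names and the toy weight vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def _distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adapt_output_weights(
    w_cls_target: Vector,
    w_cls_b: Dict[str, Vector],
    w_det_b: Dict[str, Vector],
    k: int = 1,
) -> Vector:
    """Estimate detector weights for a category that has only classifier weights.

    w_cls_target: classifier weights of the target category (e.g. cat)
    w_cls_b:      classifier weights of categories with detection data
    w_det_b:      detector weights of those same categories
    k:            number of nearest categories to average over (k >= 1)
    """
    neighbors = sorted(w_cls_b, key=lambda name: _distance(w_cls_target, w_cls_b[name]))[:k]
    offset = [
        sum(w_det_b[n][d] - w_cls_b[n][d] for n in neighbors) / len(neighbors)
        for d in range(len(w_cls_target))
    ]
    return [w + o for w, o in zip(w_cls_target, offset)]


# Toy 2-D weights: dog and apple have detectors, cat does not.
w_cls_b = {"dog": [1.0, 0.0], "apple": [0.0, 1.0]}
w_det_b = {"dog": [1.5, 0.5], "apple": [0.0, 2.0]}
print(adapt_output_weights([0.9, 0.1], w_cls_b, w_det_b, k=1))
```

In this toy case the cat classifier is closest to the dog classifier, so the cat detector inherits the dog's classifier-to-detector shift.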

Results on ILSVRC 2013 Detection

Preliminary results on 7K categories

Conclusion
Presented two new methods for object detector training with minimal bounding box annotation
  MIL-based method for learning from results of image search
  Adaptation from classification to detection task

Questions?