Department of Electrical & Computer Engineering
Visual Recognition with Humans in the Loop
Authors: Steve Branson, Catherine Wah, Florian Schroff, Boris
Babenko, Peter Welinder, Pietro Perona, and Serge Belongie
Presented by: Yan Fang
Overview
• Problem Introduction
- Challenge
- Goal
- Related Work
• Approach
- Method Overview
- Incorporating Computer Vision
- User Response
• Experiments & Results
- Datasets & Configuration
- Performance Evaluation
- Results
• Conclusion & Discussion
Problem Introduction
• Multi-class object recognition
• Challenge: computer vision performs poorly on fine-grained categories
Inter-category (basic level): easy for both computer and human
Fine-grained categories: hard for both computer and human
Why do we care?
• The low performance of CV algorithms, even on basic-level categories, is not acceptable
• Most datasets contain only a small number of object categories
• Important problem to study: it helps people recognize types of objects they don't yet know how to identify
Why is it hard?
Difficulties for humans in fine-grained classification:
- Recognizing visual attributes: easy
- Recognizing the sub-class directly: hard
Why is it hard?
Comparing humans with computers:
- Memory, expertise, knowledge: human limited, computer good
- Basic visual capabilities: human good, computer limited
Combine them together
Blue belly? Finch? Bunting?
- Identifying the exact species: hard for both computer and human
- Answering attribute questions: easy for humans
- Exploiting the answers over many classes: easy for computers
Goal
• Build a human-computer framework for multi-class object recognition
• Easy to plug in any object recognition algorithm
• Use human assistance to improve performance
• Minimize the human effort required in the recognition task
• Accurate enough for real-life applications
Related Work
• Recognition of tightly-related categories
- Datasets: Oxford Flowers 102, UIUC Birds, and STONEFLY
- Shortcomings: scaling, object domain, performance
- Similar work: a botanist's field guide
  Differences: intended users (expert vs. layperson), processing of the image
• Areas combining vision and learning with human input
- Relevance feedback, active learning, expert systems
- Similar to, but different from, this work
• Scaling to large numbers of categories
- Class taxonomies, feature sharing, error-correcting output codes (ECOC), attribute-based classification methods
- Can be plugged into this work
Approach
Method Overview
Goal: given an image, classify the bird species
• Pose questions about visual properties that are easy for humans to answer
• Intelligently select each question, exploiting the visual content step by step
• Make the final decision based on the refined probability distribution
Method Overview
Example: a visual 20-questions game for humans (cf. http://20q.net)
A database of C classes needs O(log C) questions;
computer vision can make this faster
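As a back-of-the-envelope illustration of the O(log C) bound (a sketch, not from the slides):

```python
import math

# With ideal binary questions, each answer halves the candidate set,
# so identifying one of C classes needs about ceil(log2(C)) questions.
def questions_needed(num_classes: int) -> int:
    return math.ceil(math.log2(num_classes))

print(questions_needed(200))   # the 200 bird species need 8 ideal questions
```

Noisy answers and weakly discriminative questions raise this number in practice, which is one reason a computer-vision prior helps.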
Algorithm Details
Some terms:
A set of possible questions (e.g. IsRed?, HasStripes?, BellyColor?)
Each answer comes with a confidence value
// Initialize the question set
// Ask questions iteratively:
//   pick the question with maximal expected information gain
//   pose the question to the user and record the answer
// Make the final decision
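The outlined loop can be sketched end to end as below. This is a minimal re-implementation under assumed interfaces (`answer_model`, `ask_user` are hypothetical placeholders), not the authors' code:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a dict mapping class -> probability."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def expected_info_gain(q, p_class, answer_model):
    """Expected entropy reduction from asking question q.
    answer_model(q, c) returns a dict: answer -> p(answer | class c)."""
    p_answer = {}
    for c, pc in p_class.items():
        for a, pa in answer_model(q, c).items():
            p_answer[a] = p_answer.get(a, 0.0) + pa * pc
    h_after = 0.0
    for a, pa in p_answer.items():
        if pa > 0:
            post = {c: answer_model(q, c).get(a, 0.0) * pc / pa
                    for c, pc in p_class.items()}
            h_after += pa * entropy(post)
    return entropy(p_class) - h_after

def recognize(p_class, questions, answer_model, ask_user, max_q=20):
    """Visual 20-questions loop: greedily ask the most informative
    question, then apply the Bayesian update p(c|a) ~ p(a|c) p(c)."""
    remaining = list(questions)
    for _ in range(min(max_q, len(remaining))):
        q = max(remaining,
                key=lambda qq: expected_info_gain(qq, p_class, answer_model))
        remaining.remove(q)
        a = ask_user(q)
        unnorm = {c: answer_model(q, c).get(a, 1e-9) * pc
                  for c, pc in p_class.items()}
        z = sum(unnorm.values())
        p_class = {c: v / z for c, v in unnorm.items()}
    return max(p_class, key=p_class.get)
```

With a uniform prior over two toy species and one discriminative question, a "yes" answer drives the posterior toward the class whose response model favors "yes".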
More notations
Notation (symbols reconstructed from the paper):
At time step t, select question q_{j(t)}
U^{t-1} = {u_1, ..., u_{t-1}} is the history of responses
j(t) is the index of the chosen question in the question set
p(c | x, U^{t-1}) is the current class probability distribution
I(c; u_i | x, U^{t-1}) is the information gain obtained by asking another question
Select Next Question
Maximize expected information gain, as in decision-tree algorithms
- Kullback–Leibler divergence: a measure of the difference between two distributions
- Entropy of the current class distribution
- One term depends on the CV algorithm; another depends on the user response model
- (Cross-entropy?)
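The criterion can be written out as follows (notation reconstructed to match the paper, with U^{t-1} the response history):

```latex
% Expected information gain of posing question i at time t:
I\big(c;\, u_i \mid x, U^{t-1}\big)
  = \mathbb{E}_{u_i}\!\left[\,
      \mathrm{KL}\!\left( p(c \mid x,\, u_i,\, U^{t-1}) \;\middle\|\; p(c \mid x,\, U^{t-1}) \right)
    \right]
% Equivalently, as a drop in entropy:
I\big(c;\, u_i \mid x, U^{t-1}\big)
  = H\big( p(c \mid x, U^{t-1}) \big)
  - \mathbb{E}_{u_i}\!\left[ H\big( p(c \mid x,\, u_i,\, U^{t-1}) \big) \right]
```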
Incorporate Computer Vision
• Any recognition algorithm can be plugged in, e.g. an SVM-style classifier over attributes or low-level features
• The role of computer vision is to evaluate p(c | x), the class probabilities given the image
• This conditional prior helps update the current class distribution and determines which question to ask
• It is fine to use no CV algorithm at all: p(c | x) can be replaced by any probability distribution, or simply by the class prior
Incorporate Computer Vision
• A simple framework using Bayes' rule
• Assume user responses are class-dependent, not image-dependent
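Bayes' rule as used here, reconstructed in the paper's notation:

```latex
% Posterior over classes given the image x and the responses U^t:
p(c \mid x, U^{t})
  = \frac{ p(U^{t} \mid c, x)\, p(c \mid x) }
         { \sum_{c'} p(U^{t} \mid c', x)\, p(c' \mid x) }
% Class-dependence assumption (responses do not depend on the image):
p(U^{t} \mid c, x) = p(U^{t} \mid c)
```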
Modeling User Response
• Assume questions are answered independently given the category (validated experimentally)
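The independence assumption factorizes the response likelihood:

```latex
p(U^{t} \mid c) = \prod_{i=1}^{t} p(u_i \mid c)
```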
Modeling User Response
• Dependencies among the terms (diagram in the original slide)
Modeling User Response
• Still need to estimate p(u_i | c)
• Assumptions that make estimation tractable:
- a weighted Dirichlet prior
- a global attribute prior
- pooling together certainty labels
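A minimal sketch of one way to smooth sparse per-class response counts toward a pooled global prior, in the spirit of the Dirichlet-prior assumption above (the function name, counts, and weight `alpha` are illustrative, not the paper's exact estimator):

```python
from collections import Counter

def response_model(class_counts, global_counts, alpha=2.0):
    """Estimate p(answer | class): smooth sparse per-class answer
    counts toward the pooled global answer distribution, weighted
    by the pseudo-count alpha (a Dirichlet-style prior)."""
    total_global = sum(global_counts.values())
    prior = {a: n / total_global for a, n in global_counts.items()}
    total_class = sum(class_counts.values())
    return {a: (class_counts.get(a, 0) + alpha * prior[a]) / (total_class + alpha)
            for a in prior}

# 3 "yes" / 1 "no" answers for one species, 50/50 globally:
probs = response_model(Counter(yes=3, no=1), Counter(yes=50, no=50))
```

With few class-specific answers the estimate stays close to the global prior; as counts grow, the data dominates.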
Modeling User Response
Example of user response
Experiments & Results
Dataset & Configuration
Bird200 Dataset
• 6,033 images, 200 species
• Difficult for a layperson
Questions:
• 25 questions, 288 binary attributes
• Deterministic attribute values from whatbird.com
Answer Collection
Mechanical Turk interface:
• Answers from non-experts
• Each question illustrated with a prototypical image and supplementary material
• A randomly selected user answer is used for each question
Evaluation
Method configurations:
• No computer vision
• A classifier based on SIFT (VLFeat, by Andrea Vedaldi)
• A classifier based on attributes
Evaluation:
• Ask T questions, then measure classification accuracy
• Show images of the highest-probability class after each question; the user stops the process by verifying these images
Results & Performance
• Without computer vision
• Modeling the user responses contributes to accuracy
• Non-expert users are not ideal
Results & Performance
• Number of questions vs. accuracy
• CV algorithms do improve performance when fewer questions are asked
Results & Performance
• User-stopping tests
• CV algorithms reduce human labor on easy tasks
Results & Performance
• Similar performance on the Animals with Attributes dataset
• The attribute-based classifier works better than 1-vs-all
Case: Computer Vision Helps
• The computer selects the proper question, which leads to the correct recognition
Case: Human Responses Help
• User responses correct the wrong prediction of the computer vision system
Failure Cases
• A cropped image leads to wrong answers to certain questions (e.g. belly attributes)
• Two species are naturally similar, and the questions fail to capture the distinguishing attributes
Conclusion
• Pros
- A framework combining computer vision and human recognition
- Compatible with any CV algorithm
- Human input improves accuracy on hard recognition tasks
- The computer reduces human labor on easy tasks
- Practical for real applications that help non-expert users
• Cons
- Cropped images can lead to wrong answers to questions
- May not work on very similar species
- Attribute selection is complicated and depends on expert knowledge
Future Work
• Trend: reduce or remove human effort from the framework
• Improve CV performance on hard problems
• Develop better question design and selection mechanisms
Discussion & Questions
What's It Going to Cost You?: Predicting Effort vs.
Informativeness for Multi-Label Image Annotations
Sudheendra Vijayanarasimhan and Kristen Grauman
Overview
• Problem
• Method Overview
• Experiments & Results
• Conclusion & Discussion
Problem Introduction
• Annotating training data is essential to visual recognition
• Manual effort is required, and images are not equally informative
• Conventional active learning does not fit visual category learning:
- Images contain multiple objects and need multiple labels
- There are multiple annotation types: tags, regions, segments
- Each annotation costs different effort depending on its type and on the image
Proposed Method
• A new active learning framework that weighs informativeness against annotation effort
• A multiple-instance, multi-label learning (MIML) formulation helps select the most promising annotations
• Capable of choosing both the image and the type of annotation
• Learns from humans to predict the effort cost of annotating different images
Active Learning
Method Overview
Method Overview
Step 1.
Learn object categories from
multi-label images, with a
mixture of weak and strong
labels.
MIML: multiple-instance, multi-label learning
MIML Scenario
Unlabeled images are oversegmented into regions
Each image becomes a bag of regions (instances)
Different levels of annotation provide different amounts of information
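The bag/instance structure above can be sketched as a simple data holder (the names here are illustrative, not from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class Bag:
    """One multi-label image: a bag of oversegmented regions.
    Labels may exist at the image (bag) level, weak and cheap,
    or at the region (instance) level, strong and costly."""
    image_id: str
    regions: list                                      # per-region feature vectors
    image_labels: set = field(default_factory=set)     # image-level tags
    region_labels: dict = field(default_factory=dict)  # region index -> label

b = Bag("img1", regions=[[0.1, 0.2], [0.3, 0.4]], image_labels={"cow", "grass"})
b.region_labels[0] = "cow"   # one strong label added later
```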
Method Overview
Step 2. Active multi-level selection of multi-label annotations
• Survey unlabeled and partially labeled images
• Predict the tradeoff between informativeness and manual effort
• Select the most promising annotation and update the classifier
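These steps amount to a cost-sensitive value-of-information rule; a minimal sketch, where the scoring functions and the numbers in the toy example are invented for illustration:

```python
def select_annotation(candidates, info_fn, cost_fn):
    """candidates: iterable of (image_id, annotation_type) pairs.
    info_fn predicts informativeness (e.g. expected risk reduction);
    cost_fn predicts manual effort (e.g. seconds of annotation time).
    Returns the candidate maximizing value per unit cost."""
    return max(candidates, key=lambda c: info_fn(c) / cost_fn(c))

# Toy example: a full segmentation is informative but expensive;
# an image-level tag is cheap but weaker.
cands = [("img1", "tag"), ("img1", "segment"), ("img2", "tag")]
info = {("img1", "tag"): 1.0, ("img1", "segment"): 3.0, ("img2", "tag"): 2.0}
cost = {("img1", "tag"): 2.0, ("img1", "segment"): 30.0, ("img2", "tag"): 2.0}
best = select_annotation(cands, info.get, cost.get)
print(best)
```

Here the cheap but informative tag on "img2" wins over the expensive segmentation, matching the paper's intuition that raw informativeness alone over-selects costly annotations.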
Experiments
• The MSRCv2 dataset: 591 images, 21 classes
• Three aspects evaluated:
- Accuracy of learning from multi-label examples
- Accuracy of annotation-cost prediction
- Effectiveness at reducing manual effort
• RBF-kernel SVM, parameters set by cross-validation, "void" regions ignored
Results
• Segment each image and compute texton and color histograms for each blob
• Each image is a bag; each segment is an instance
• Image-level labels
• Accuracy measured on new images and new regions
Results
• Data gathered with Amazon's Mechanical Turk
• Classifiers for "easy" vs. "hard" images
• Regressors predict the actual time cost
Results
• Comparison of different selection strategies
• Accuracy: average of the diagonal of the confusion matrix
• Region-level accuracy
• 80 random images added to the unlabeled pool
Results
• Comparison with and without the cost-prediction function
• Helps on "Tree" and "Airplane", but not on "Sky"
Results
• Quantitative evaluation of active selection
• Active selection takes less effort to reach the same level of accuracy
Contribution
• An active learning framework that chooses annotation examples by balancing manual effort against informativeness
• Handles annotation types at different levels
• Active learning greatly reduces manual effort
• Effectively predicts the cost of annotation
• The multi-level, multi-label strategy outperforms traditional active learning
Discussion & Questions