A Bayesian Approach to Recognition

A Bayesian Approach to Recognition Moshe Blank Ita Lifshitz Reverend Thomas Bayes 1702-1761


Transcript of A Bayesian Approach to Recognition

Page 1: A Bayesian Approach to Recognition

A Bayesian Approach to Recognition

Moshe Blank, Ita Lifshitz

Reverend Thomas Bayes, 1702-1761

Page 2: A Bayesian Approach to Recognition

Agenda

Bayesian decision theory
  Maximum Likelihood
  Bayesian Estimation
Recognition
  Simple probabilistic model
  Mixture model
  More advanced probabilistic model
  “One-Shot” Learning

Page 3: A Bayesian Approach to Recognition

Bayesian Decision Theory

We are given a training set T of samples of class c.

Given a query image x, we want to know the probability that it belongs to the class, p(x|T).

We know that the class has some fixed distribution with unknown parameters θ; that is, the form of p(x|θ) is known.

The rules of probability tell us (assuming x depends on T only through θ):

p(x|T) = ∫p(x,θ|T)dθ = ∫p(x|θ)p(θ|T)dθ

What can we do about p(θ|T)?

Page 4: A Bayesian Approach to Recognition

Maximum Likelihood Estimation

What can we do about p(θ|T)?

Choose the parameter value θML that makes the training data most probable:

θML = arg max_θ p(T|θ)

This amounts to approximating the posterior by a delta function:

p(θ|T) ≈ δ(θ – θML)

∫p(x|θ)p(θ|T)dθ ≈ p(x|θML)

Page 5: A Bayesian Approach to Recognition

ML Illustration

Assume that the points of T are drawn from some normal distribution with known variance and unknown mean
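The illustration can be made concrete with a short sketch. For a normal distribution with known variance, the likelihood of the training set is maximized by the sample mean. The training values and function names below are hypothetical, chosen only for illustration.

```python
import math

def gaussian_log_likelihood(samples, mean, variance):
    """Log-likelihood of i.i.d. samples under a normal N(mean, variance)."""
    n = len(samples)
    squared_error = sum((x - mean) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * variance) - squared_error / (2 * variance)

def ml_mean(samples):
    """With known variance, the likelihood is maximized by the sample mean."""
    return sum(samples) / len(samples)

# Hypothetical training set T and an assumed known variance.
T = [1.8, 2.4, 2.1, 1.6, 2.6]
known_variance = 0.25
theta_ml = ml_mean(T)
```

Evaluating `gaussian_log_likelihood(T, theta, known_variance)` for values of `theta` on either side of `theta_ml` gives a lower value, which is what "most probable training data" means here.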

Page 6: A Bayesian Approach to Recognition

Bayesian Estimation

The Bayesian Estimation approach considers θ as a random variable.

Before we observe the training data, the parameters are described by a prior p(θ) which is typically very broad.

Once the data is observed, we can make use of Bayes’ formula to find posterior p(θ|T). Since some values of the parameters are more consistent with the data than others, the posterior is narrower than prior.
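As a minimal sketch of this narrowing, consider the same setting as the ML illustration (normal data, known variance, unknown mean) and assume a normal prior on the mean. In that conjugate case the posterior is also normal and has a closed form. The function name and the numbers are illustrative assumptions.

```python
def posterior_over_mean(samples, noise_var, prior_mean, prior_var):
    """Posterior N(post_mean, post_var) over the unknown mean of a normal
    with known variance noise_var, given a normal prior on that mean."""
    n = len(samples)
    post_precision = 1.0 / prior_var + n / noise_var   # precisions add up
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(samples) / noise_var)
    return post_mean, post_var

# Hypothetical training set and a deliberately broad prior.
T = [1.8, 2.4, 2.1, 1.6, 2.6]
post_mean, post_var = posterior_over_mean(T, noise_var=0.25, prior_mean=0.0, prior_var=100.0)
```

Each observation adds precision, so `post_var` is smaller than `prior_var`, and `post_mean` moves from the prior mean toward the sample mean.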

Page 7: A Bayesian Approach to Recognition

Bayesian Estimation

Unlike ML, Bayesian estimation does not choose a specific value for θ, but instead performs a weighted average over all possible values of θ.

Why is it more accurate than ML?

Page 8: A Bayesian Approach to Recognition

Maximal Likelihood vs Bayesian

ML and Bayesian estimations are asymptotically equivalent and “consistent”.

ML is typically computationally easier.

ML is often easier to interpret: it returns the single best model (parameter), whereas Bayesian estimation gives a weighted average of models.

But for finite training data (and given a reliable prior) Bayesian estimation is more accurate (it uses more of the information).

Bayesian with “flat” prior is essentially ML; with asymmetric and broad priors the methods lead to different solutions.
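The comparison can be sketched for the normal case with known variance, where both predictive densities are available in closed form. This is an illustrative example under an assumed normal prior, not taken from the slides: the ML density plugs in the sample mean, while the Bayesian density averages over the posterior, which adds the posterior variance to the noise variance.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ml_predictive(x, samples, noise_var):
    """Plug-in density p(x | theta_ML)."""
    theta_ml = sum(samples) / len(samples)
    return normal_pdf(x, theta_ml, noise_var)

def bayes_predictive(x, samples, noise_var, prior_mean, prior_var):
    """Integral of p(x | theta) p(theta | T) over theta for a normal prior."""
    n = len(samples)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(samples) / noise_var)
    # Averaging N(x; theta, noise_var) over N(theta; post_mean, post_var)
    # gives a normal whose variance is the sum of the two variances.
    return normal_pdf(x, post_mean, noise_var + post_var)
```

With only two training samples the Bayesian density is visibly wider than the plug-in one; with thousands of samples and a broad prior the two become practically identical, which matches the asymptotic equivalence stated above.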

Page 9: A Bayesian Approach to Recognition

Agenda

Bayesian decision theory
Recognition
  Simple probabilistic model
  Mixture model
  More advanced probabilistic model
  “One-Shot” Learning

Page 10: A Bayesian Approach to Recognition

Objective

Given an image, decide whether or not it contains an object of a specific class.

Page 11: A Bayesian Approach to Recognition

Main Issues

Representation

Learning

Recognition

Page 12: A Bayesian Approach to Recognition

Approaches to Recognition

Photometric properties – filter subspaces, neural networks, principal component analysis…

Geometric constraints between low level object features – alignment, geometric invariance, geometric hashing…

Object Model

Page 13: A Bayesian Approach to Recognition

Fischler & Elschlager, 1973

Yuille, ‘91
Brunelli & Poggio, ‘93
Lades, v.d. Malsburg et al., ‘93
Cootes, Lanitis, Taylor et al., ‘95
Amit & Geman, ‘95, ‘99
Perona et al., ‘95, ‘96, ‘98, ‘00, ‘02

Model: constellation of Parts

Page 14: A Bayesian Approach to Recognition

Perona’s Approach

Objects are represented as a probabilistic constellation of rigid parts (features).

The variability within a class is represented by a joint probability density function on the shape of the constellation and the appearance of the parts.

Page 15: A Bayesian Approach to Recognition

Agenda

Bayesian decision theory
Recognition
  Simple probabilistic model
    Model parameterization
    Feature Selection
    Learning
  Mixture model
  More advanced probabilistic model
  “One-Shot” Learning

Page 16: A Bayesian Approach to Recognition

Weber, Welling, Perona - 2000

Unsupervised Learning of Models for Recognition

Towards Automatic Discovery of Object Categories

Page 17: A Bayesian Approach to Recognition

Unsupervised Learning

Learn to recognize an object class given a set of class and background pictures, without preprocessing (labeling, segmentation, alignment).

Page 18: A Bayesian Approach to Recognition

Model Description

Each object is constructed of F parts, each of a certain type.

Relations between the part locations define the shape of the object.

Page 19: A Bayesian Approach to Recognition

Image Model

Image is transformed into a collection of parts

Objects are modeled as sub collections

Page 20: A Bayesian Approach to Recognition

Model Parameterization

Given an image we detect potential object parts, to obtain the following observable:

Page 21: A Bayesian Approach to Recognition

Hypothesis

When presented with an un-segmented and unlabeled image, we do not know which parts correspond to the foreground.

Assuming the image contains the object, we use a vector of indices h to indicate which of the observables correspond to a foreground point (i.e. a real part of the object).

We call h a hypothesis, since it is a guess about the structure of the object. h = (h1, …, hT) is not observable.

Page 22: A Bayesian Approach to Recognition

Additional Hidden Variables

We denote by xm the locations of the unobserved object parts.

b = sign(h) – binary vector indicating which parts were detected

n = number of background parts detected of each type

Page 23: A Bayesian Approach to Recognition

Probabilistic Model

We can now define a generative probabilistic model for the object class using the probability density function:

Page 24: A Bayesian Approach to Recognition

Model Details

Since n, b are determined by Xo, h, we have:

By Bayes' rule:

Page 25: A Bayesian Approach to Recognition

Model Details

Full table of joint probabilities (for small F) or F independent detection rate probabilities for large F

Page 26: A Bayesian Approach to Recognition

Model Details

Poisson probability density function with parameter Mt for detection of feature of type t
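A minimal sketch of this term, assuming the counts of background detections are independent across feature types. The function names and the example numbers are hypothetical.

```python
import math

def poisson_pmf(count, mean):
    """Probability of observing `count` events when the expected number is `mean`."""
    return math.exp(-mean) * mean ** count / math.factorial(count)

def background_count_probability(counts, means):
    """p(n): product over feature types t of Poisson(n_t; M_t),
    assuming the types are independent of each other."""
    probability = 1.0
    for n_t, m_t in zip(counts, means):
        probability *= poisson_pmf(n_t, m_t)
    return probability

# Hypothetical example: three feature types with mean background counts M_t.
p_n = background_count_probability(counts=[2, 0, 1], means=[1.5, 0.3, 1.0])
```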

Page 27: A Bayesian Approach to Recognition

Model Details

Uniform probability over all hypotheses consistent with n and b

Page 28: A Bayesian Approach to Recognition

Model Details

The remaining term depends on the coordinates of all foreground detections and on the coordinates of all background detections.
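The terms described on the Model Details slides can be collected into one factorization. The slide equations are not present in this transcript, so the notation below is an assumption that follows the usual presentation of the constellation model: $X^o$ are the observed part candidates, $x^m$ the missing parts, $h$ the hypothesis, $n$ the background counts, $b$ the detection indicator, and $M_t$ the mean number of background detections of type $t$.

```latex
\begin{aligned}
p(X^o, x^m, h, n, b) &= p(X^o, x^m \mid h, n)\; p(h \mid n, b)\; p(n)\; p(b) \\
p(n) &= \prod_{t} \frac{(M_t)^{n_t}}{n_t!}\, e^{-M_t}
\end{aligned}
```

Here $p(b)$ is the table of detection probabilities, $p(n)$ the Poisson term, $p(h \mid n, b)$ the uniform distribution over consistent hypotheses, and $p(X^o, x^m \mid h, n)$ the term over foreground and background coordinates.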

Page 29: A Bayesian Approach to Recognition

Sample object classes

Page 30: A Bayesian Approach to Recognition

Invariance to Translation, Rotation and Scale

There is no use in modeling the shape of the object in terms of absolute pixel positions of the features.

We apply a transformation on the features’ coordinates to make the shape invariant to translation, rotation and scale.

But the feature detector must be invariant to the transformations as well!
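One common way to obtain such invariance is to express all feature coordinates in a frame defined by two reference features. The sketch below is an illustrative assumption, not necessarily the transformation used in the papers: it maps the first feature to (0, 0) and the second to (1, 0) using complex arithmetic.

```python
def to_invariant_frame(points):
    """Map 2-D points into a frame where the first point is the origin and
    the second point is (1, 0). The result does not change when the input
    is translated, rotated, or uniformly scaled."""
    z = [complex(x, y) for x, y in points]
    origin = z[0]
    unit = z[1] - z[0]
    if unit == 0:
        raise ValueError("the two reference points must be distinct")
    # Subtracting the origin removes translation; dividing by the complex
    # baseline removes rotation and scale in one step.
    return [((p - origin) / unit).real for p in z], [((p - origin) / unit).imag for p in z]

# Hypothetical feature locations (x, y) in pixels.
xs, ys = to_invariant_frame([(10.0, 20.0), (40.0, 60.0), (0.0, 50.0)])
```

The price of this choice is that the two reference features must be detected; a missed reference part leaves the frame undefined.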

Page 31: A Bayesian Approach to Recognition

Automatic Part Selection

Find points of interest in all training images

Apply Vector Quantization and clustering to get 100 total candidate patterns.
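The clustering step can be sketched with plain k-means over flattened image patches. This is a simplified stand-in for the vector quantization mentioned above, assuming NumPy is available; the patch size and the number of clusters are parameters, and the slides use 100 clusters.

```python
import numpy as np

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means: returns (centers, labels) for an (N, D) float array."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        # Squared distance from every vector to every center.
        dists = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members) > 0:          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    dists = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)
```

Each resulting center plays the role of one candidate part pattern; patches around interest points would be flattened into the rows of `vectors` before calling `kmeans(vectors, k=100)`.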

Page 32: A Bayesian Approach to Recognition

Automatic Part Selection

Points of interest patterns

Page 33: A Bayesian Approach to Recognition

Method Scheme

Part Selection

Model

Learning Test

Page 34: A Bayesian Approach to Recognition

Automatic Part Selection

Find subset of candidate parts of (small) size F to be used in the model that gives the best performance in the learning phase.

57%

87%

51%

Page 35: A Bayesian Approach to Recognition

Learning

Goal: Find θ = {μ, Σ, p(b), M} which best explains the observed (training) data

μ, Σ – expectation and covariance parameters of the joint Gaussian modeling the shape of the foreground

b – random variable denoting whether each of the parts of the model is detected or not

M – average number of background detections for each of the parts

Page 36: A Bayesian Approach to Recognition

Learning

Goal: Find θ = {μ, Σ, p(b), M} which best explains the observed (training) data,

i.e. maximize the likelihood

arg max_θ p(Xo | θ)

Done using the EM method

Page 37: A Bayesian Approach to Recognition

Expectation Maximization (EM)

EM is an iterative optimization method to estimate some unknown parameters θ, given measurement data, but not given some “hidden” variables J.

We want to maximize the posterior probability of the parameters θ given the data U, marginalizing over J:

Page 38: A Bayesian Approach to Recognition

Expectation Maximization (EM)

Choose an initial parameter guess θ0.

E-Step: estimate the unobserved (hidden) data using the observed data and the current parameter guess θk.

M-Step: compute the maximum likelihood estimate of the parameter, θk+1, using the estimated data.

Repeat the two steps with the new guess.

Page 39: A Bayesian Approach to Recognition

Expectation Maximization (EM)

EM alternates between estimating the unknowns θ and the hidden variables J.

The EM algorithm converges to a local maximum of the likelihood.
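The alternation can be shown on a toy problem. The sketch below is not the constellation model's EM; it fits a two-component one-dimensional Gaussian mixture, where the hidden variable is the component each point came from. All names and data are illustrative.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, mean_a, mean_b, iterations=50):
    var_a = var_b = 1.0
    weight_a = 0.5
    log_likelihoods = []
    for _ in range(iterations):
        # E-step: probability that each point belongs to component a.
        resp, ll = [], 0.0
        for x in data:
            pa = weight_a * normal_pdf(x, mean_a, var_a)
            pb = (1.0 - weight_a) * normal_pdf(x, mean_b, var_b)
            resp.append(pa / (pa + pb))
            ll += math.log(pa + pb)
        log_likelihoods.append(ll)
        # M-step: weighted maximum likelihood estimates.
        na = sum(resp)
        nb = len(data) - na
        mean_a = sum(r * x for r, x in zip(resp, data)) / na
        mean_b = sum((1.0 - r) * x for r, x in zip(resp, data)) / nb
        var_a = max(sum(r * (x - mean_a) ** 2 for r, x in zip(resp, data)) / na, 1e-6)
        var_b = max(sum((1.0 - r) * (x - mean_b) ** 2 for r, x in zip(resp, data)) / nb, 1e-6)
        weight_a = na / len(data)
    return (mean_a, var_a, weight_a), (mean_b, var_b, 1.0 - weight_a), log_likelihoods

data = [0.1, -0.2, 0.3, 0.0, -0.1, 5.1, 4.8, 5.2, 4.9, 5.0]
comp_a, comp_b, history = em_two_gaussians(data, mean_a=-1.0, mean_b=6.0)
```

The recorded log-likelihood never decreases from one iteration to the next, but a different starting point can end in a different local maximum.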

Page 40: A Bayesian Approach to Recognition

Method Scheme

Part Selection

Model

Learning Test

Page 41: A Bayesian Approach to Recognition

Recognition

Using the maximum a posteriori approach we consider the ratio

R =

where h0 is the null hypothesis – which explains all parts as background noise.

We accept the image as belonging to the class if R is above a certain threshold.
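The ratio itself is not present in this transcript. One form consistent with the description, written here as an assumption in the notation of the model slides, compares the total probability of all hypotheses with the probability of the null hypothesis:

```latex
R = \frac{\sum_{h} p(X^o, h \mid \theta)}{p(X^o, h_0 \mid \theta)}
```

The numerator sums over every assignment of detections to object parts, so an image scores highly when at least some hypotheses explain the detections much better than pure background does.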

Page 42: A Bayesian Approach to Recognition

Database

Two classes: faces and cars
100 training images for each class
100 test images for each class
Images vary in scale, location of the object, and lighting conditions
Images have cluttered backgrounds
No manual preprocessing

Page 43: A Bayesian Approach to Recognition

Learning Results

Page 44: A Bayesian Approach to Recognition

Model Performance

Average training and testing errors measured as 1-Area(ROC)

This suggests a 4-part model for faces and a 5-part model for cars as optimal.
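The error measure 1-Area(ROC) can be computed directly from classifier scores. The sketch below uses the fact that the area under the ROC curve equals the fraction of (class image, background image) pairs that are ranked correctly, with ties counted as one half. The scores are hypothetical.

```python
def roc_area(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise ranking."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

def roc_error(positive_scores, negative_scores):
    return 1.0 - roc_area(positive_scores, negative_scores)

# Hypothetical likelihood-ratio scores for class and background images.
error = roc_error([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

In this example 8 of the 9 pairs are ranked correctly, so the error is 1/9.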

Page 45: A Bayesian Approach to Recognition

Multiple use of parts

Part ‘C’ has high variance along the vertical direction – can be detected in several locations – bumper, license plate or roof.

Part Labels:

Page 46: A Bayesian Approach to Recognition

Recognition Results

Average success rate (at even False Positive and False Negative ratios):
• Faces: 93.5%
• Cars: 86.5%

Page 47: A Bayesian Approach to Recognition

Agenda

Bayesian decision theory
Recognition
  Simple probabilistic model
  Mixture model
  More advanced probabilistic model
  “One-Shot” Learning

Page 48: A Bayesian Approach to Recognition

Mixture Model

The Gaussian model works well for homogeneous classes, but real-life objects can be far from homogeneous.

Can we extend our approach to multi-modal classes?

Page 49: A Bayesian Approach to Recognition

Mixture Model

An object is modeled using Ω different components, each is a probabilistic model:

Each component “sees the whole picture”. Components are trained together.
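The mixture equation is not present in this transcript. In the standard form of a mixture model, written here as an assumption with $\omega$ indexing the components and $X^o$ the observed detections, the class density is a weighted sum of the component densities:

```latex
p(X^o) = \sum_{\omega=1}^{\Omega} p(X^o \mid \omega)\, p(\omega)
```

Each $p(X^o \mid \omega)$ is a full constellation model of the kind described earlier, and $p(\omega)$ is the prior weight of component $\omega$.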

Page 50: A Bayesian Approach to Recognition

Database

Faces with different viewing angles – 0°, 15°, …, 90°

Cars – rear view and side view

Tree leaves – of several types

Page 51: A Bayesian Approach to Recognition

Tuning of the mixture components

Each training image was assigned to the component that responds to it most strongly.

Page 52: A Bayesian Approach to Recognition

Results

Misclassification error at even false positive and false negative rate for training and test sets

Zero false alarm detection rate (ZFA-DR).

Page 53: A Bayesian Approach to Recognition

Separately trained components

Two components trained independently on two subclasses of the cars class.

When merged into a mixture model with p(ω) = 0.5, they gave worse results than a two-component model trained on both subclasses simultaneously.

Page 54: A Bayesian Approach to Recognition

Agenda

Bayesian decision theory
Recognition
  Simple probabilistic model
  Mixture model
  More advanced probabilistic model
    Feature Selection
    Model parameterization
    Results
  “One-Shot” Learning

Page 55: A Bayesian Approach to Recognition

Fergus, Perona, Zisserman

Object Class Recognition By Scale Invariant Learning - Proc. of the IEEE Conf on Computer Vision and Pattern Recognition - 2003

Page 56: A Bayesian Approach to Recognition

Object Class Recognition By Scale Invariant Learning

Extended version of the previous model (by Weber et al.)

New feature detector

Probabilistic model for appearance instead of feature types

Page 57: A Bayesian Approach to Recognition

Feature Detection

Kadir-Brady feature detector

Detects salient regions over different scales and locations

Choose N most salient regions

Each feature contains scale and location information

Page 58: A Bayesian Approach to Recognition

Notation

X – Shape: locations of the features
A – Appearance: representations of the features
S – Scale: vector of feature scales
h – Hypothesis: which part is represented by which observed feature

Page 59: A Bayesian Approach to Recognition

Feature Appearance

Feature content is rescaled to an 11x11 pixel patch

Normalization

Reduce data dimension from 121 to 15 dimensions using PCA

Result is the appearance vector for the part

Figure: 11x11 patch → Normalize → Projection onto PCA basis → coefficients c1, c2, …, c15
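The pipeline from patch to appearance vector can be sketched as follows, assuming NumPy is available. The normalization used here (zero mean, unit norm) is an assumption, since the slides do not say which normalization is applied; the PCA basis is taken from the singular value decomposition of the centered training patches.

```python
import numpy as np

def normalize_patch(patch):
    """Flatten an 11x11 patch to 121 values with zero mean and unit norm
    (an assumed normalization, chosen for illustration)."""
    flat = np.asarray(patch, dtype=float).ravel()
    flat = flat - flat.mean()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat

def fit_pca(patches, n_components=15):
    """Return the mean patch and the top principal directions
    of an (N, 121) array of normalized patches."""
    mean = patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:n_components]

def appearance_vector(flat_patch, mean, basis):
    """Project a normalized 121-vector onto the PCA basis: c1, ..., c15."""
    return basis @ (flat_patch - mean)
```

The basis would be fitted once on patches collected from the training images and then reused for every detected feature.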

Page 60: A Bayesian Approach to Recognition

Recognition

Assume we have learned the model parameters θ. Given an image, we extract X, S, A and can make a Bayesian decision:

We apply a threshold to the likelihood ratio R to decide whether the input image belongs to the class.

Page 61: A Bayesian Approach to Recognition

Recognition

The term p(X, S, A | θ) can be factored into:

Each of the terms has a closed (computable) form given the model parameters θ
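The factored form is not present in this transcript. A factorization consistent with the four pdf slides that follow (appearance, shape, relative scale, detection) is given below as a reconstruction; $H$ denotes the set of all hypotheses $h$.

```latex
p(X, S, A \mid \theta) = \sum_{h \in H}
  \underbrace{p(A \mid X, S, h, \theta)}_{\text{appearance}}\;
  \underbrace{p(X \mid S, h, \theta)}_{\text{shape}}\;
  \underbrace{p(S \mid h, \theta)}_{\text{relative scale}}\;
  \underbrace{p(h \mid \theta)}_{\text{detection}}
```

Each factor compares a foreground model with a clutter model, which is why every one of the following slides lists both.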

Page 62: A Bayesian Approach to Recognition

Part appearance pdf

Foreground model: Gaussian

Clutter model: Gaussian

Page 63: A Bayesian Approach to Recognition

Shape pdf

Foreground model: Gaussian

Clutter model: Uniform

Page 64: A Bayesian Approach to Recognition

Relative Scale pdf

Foreground model: Gaussian in log(scale)

Clutter model: Uniform in log(scale)

Page 65: A Bayesian Approach to Recognition

Detection Probability pdf

Foreground model: probability of detection (values shown: 0.8, 0.75, 0.9)

Clutter model: Poisson probability density function on the number of detections

Page 66: A Bayesian Approach to Recognition

Learning

We want to estimate the model parameters:

Using the EM method, find the θ that best explains the training set images, i.e. maximizes the likelihood:

Page 67: A Bayesian Approach to Recognition

Sample Model

Page 68: A Bayesian Approach to Recognition

Sample Model

Page 69: A Bayesian Approach to Recognition

Confusion Table

How good is a model for object class A at distinguishing images of class B from background images?

Page 70: A Bayesian Approach to Recognition

Comparison of Results

Average performance of the models at ROC equal error rates:

Scale invariant learning:

Page 71: A Bayesian Approach to Recognition

Agenda

Bayesian decision theory
Recognition
  Simple probabilistic model
  Mixture model
  More advanced probabilistic model
  “One-Shot” Learning

Page 72: A Bayesian Approach to Recognition

Fei-Fei, Fergus, Perona

A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories - Proc. ICCV. 2003

Page 73: A Bayesian Approach to Recognition

Small Training Set

Humans can learn a new category using very few training examples.

A rule of thumb in machine learning says that the number of training examples should be 5-10 times the number of model parameters.

Can computers do better?

Page 74: A Bayesian Approach to Recognition

Prior knowledge about objects

Page 75: A Bayesian Approach to Recognition

Incorporating prior knowledge

Bayesian methods allow us to use “prior” information p(θ) about the nature of objects. Given new observations, we can update our knowledge into a “posterior” p(θ|x).

Page 76: A Bayesian Approach to Recognition

Bayesian Decision

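Given a test image, we want to make a Bayesian decision by comparing:

P(object | test, train) vs. P(clutter | test, train)

By Bayes' rule, P(object | test, train) ∝ P(test | object, train) p(object)

Expanding over the parameters, P(test | object, train) = ∫P(test | θ, object) p(θ | object, train) dθ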

Page 77: A Bayesian Approach to Recognition

Bayesian Decision

∫P(test | θ, object) p(θ | object, train) dθ

Until now we used the ML approach – approximating p(θ | object, train) by a delta function centered at θML = arg max p(train | θ).

This will not work for a small training set.

Page 78: A Bayesian Approach to Recognition

Maximum Likelihood vs. Bayesian Learning

Maximum Likelihood

Bayesian Learning

Page 79: A Bayesian Approach to Recognition

Experimental setup

Learn three object categories using ML approach

Estimate the prior hyper-parameters

Use VBEM (Variational Bayesian EM) to learn a new object category from few images

Page 80: A Bayesian Approach to Recognition

Prior Hyper-Parameters

Page 81: A Bayesian Approach to Recognition

Performance Results – Motorbikes

1 training image 5 training images

Page 82: A Bayesian Approach to Recognition

Performance Results – Motorbikes

Page 83: A Bayesian Approach to Recognition

Performance Results – Face Model

1 training image 5 training images

Page 84: A Bayesian Approach to Recognition

Performance Results – Face Model

Page 85: A Bayesian Approach to Recognition

Results Comparison

Algorithm                                  # training images   Learning speed   Error rate
Burl et al., Weber et al., Fergus et al.   200~400             Hours            5.6-10 %
Bayesian One-Shot                          1~5                 < 1 min          8-15 %

Page 86: A Bayesian Approach to Recognition

References

Object Class Recognition By Scale Invariant Learning – Fergus, Perona, Zisserman - 2003

A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories - Fei-Fei, Fergus, Perona - 2003

Towards Automatic Discovery of Object Categories – Weber, Welling, Perona – 2000

Unsupervised Learning of Models for Recognition – Weber, Welling, Perona – 2000

Recognition of Planar Object Classes – Burl, Perona – 1996

Pattern Classification and Scene Analysis – Duda, Hart – 1973