Transcript of "Introduction to object recognition & SVM" (2D Image Processing lecture, 73 slides)

Page 1:

Prof. Didier Stricker

Kaiserslautern University

http://ags.cs.uni-kl.de/

DFKI – Deutsches Forschungszentrum für Künstliche Intelligenz

http://av.dfki.de

2D Image Processing

Introduction to object recognition & SVM

1

Page 2:

• Basic recognition tasks

• A statistical learning approach

• Traditional recognition pipeline

• Bags of features

• Classifiers

• Support Vector Machine

Overview

2

Page 3:

Image classification

• outdoor/indoor

• city/forest/factory/etc.

3

Page 4:

Image tagging

• street

• people

• building

• mountain

• …

4

Page 5:

Object detection

• find pedestrians

5

Page 6:

• walking

• shopping

• rolling a cart

• sitting

• talking

• …

Activity recognition

6

Page 7:

[Image parsing example: regions labeled sky, mountain, building, tree, banner, street lamp, market, people]

Image parsing

7

Page 8:

This is a busy street in an Asian city. Mountains and a large palace or fortress loom in the background. In the foreground, we see colorful souvenir stalls and people walking around and shopping. One person in the lower left is pushing an empty cart, and a couple of people in the middle are sitting, possibly posing for a photograph.

Image description

8

Page 9:

Image classification

9

Page 10:

Apply a prediction function to a feature representation of the image to get the desired output:

f( [image] ) = “apple”

f( [image] ) = “tomato”

f( [image] ) = “cow”

The statistical learning framework

10

Page 11:

y = f(x), where x is the image feature, f is the prediction function, and y is the output prediction.

Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set.

Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).

The statistical learning framework

11
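A minimal sketch of this framework, assuming scikit-learn is available; extract_features is an assumed helper that turns an image into a feature vector, and the choice of classifier is only illustrative:

import numpy as np
from sklearn.svm import LinearSVC

def train(train_images, train_labels, extract_features):
    # Training: build a feature vector x for each labeled image and estimate f
    X = np.array([extract_features(img) for img in train_images])
    return LinearSVC().fit(X, train_labels)   # any trainable classifier could be plugged in here

def predict(f, test_image, extract_features):
    # Testing: apply f to a never-before-seen example and output y = f(x)
    x = extract_features(test_image).reshape(1, -1)
    return f.predict(x)[0]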

Page 12:

Steps

Training: Training Images → Image Features, which together with the Training Labels are used to learn a model (Learned Model).

Testing: Test Image → Image Features → Learned Model → Prediction.

Slide credit: D. Hoiem

12

Page 13:

Traditional recognition pipeline

Image Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

• Features are not learned

• Trainable classifier is often generic (e.g. SVM)

13

Page 14:

Bags of features

14

Page 15:

1. Extract local features

2. Learn “visual vocabulary”

3. Quantize local features using visual vocabulary

4. Represent images by frequencies of “visual words”

Traditional features: Bags-of-features

15
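A rough sketch of steps 1-4 above, assuming scikit-learn for k-means; the local descriptors (e.g. patch descriptors from step 1) are supplied by the caller:

import numpy as np
from sklearn.cluster import KMeans

def learn_vocabulary(all_descriptors, k=200):
    # Step 2: cluster local descriptors from the training set into k "visual words"
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bag_of_features(image_descriptors, vocabulary):
    # Step 3: quantize each descriptor to its nearest visual word
    words = vocabulary.predict(image_descriptors)
    # Step 4: represent the image by the (normalized) frequencies of visual words
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()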

Page 16:

• Sample patches and extract descriptors

1. Local feature extraction

16

Page 17:

• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)

• Classification to determine document categories

Bags of features: Motivation

17

Page 18:

Traditional recognition pipeline

Image Pixels → Hand-designed feature extraction → Trainable classifier → Object Class

18

Page 19:

f(x) = label of the training example nearest to x

All we need is a distance function for our inputs

No training required!

Classifiers: Nearest neighbor

[Figure: a test example surrounded by training examples from class 1 and training examples from class 2]

19
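A minimal sketch of this rule, assuming NumPy: with a Euclidean distance function, prediction is a single argmin and there is no training phase at all.

import numpy as np

def nearest_neighbor_predict(x, train_X, train_y):
    # Distance from x to every training example, then return the closest example's label
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[np.argmin(dists)]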

Page 20:

• For a new point, find the k closest points from training data

• Vote for the class label using the labels of the k points

K-nearest neighbor classifier

k = 5

20
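The k-nearest-neighbor variant, sketched the same way: sort by distance, take the k closest points, and vote.

import numpy as np
from collections import Counter

def knn_predict(x, train_X, train_y, k=5):
    dists = np.linalg.norm(train_X - x, axis=1)     # distance to all training points
    nearest = np.argsort(dists)[:k]                 # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)    # vote with their class labels
    return votes.most_common(1)[0][0]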

Page 21:

Which classifier is more robust to outliers?

K-nearest neighbor classifier

Credit: Andrej Karpathy, http://cs231n.github.io/classification/

21

Page 22:

K-nearest neighbor classifier

Credit: Andrej Karpathy, http://cs231n.github.io/classification/

22

Page 23:

Linear classifiers

Find a linear function to separate the classes:

f(x) = sgn(w · x + b)

23
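In code, such a classifier is just a weight vector and a bias; a minimal sketch assuming NumPy:

import numpy as np

def linear_classify(x, w, b):
    # Returns +1 on one side of the hyperplane w · x + b = 0 and -1 on the other
    return np.sign(np.dot(w, x) + b)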

Page 24:

Visualizing linear classifiers

Source: Andrej Karpathy, http://cs231n.github.io/linear-classify/

24

Page 25:

• NN pros: simple to implement; decision boundaries not necessarily linear; works for any number of classes; nonparametric method

• NN cons: needs a good distance function; slow at test time

• Linear pros: low-dimensional parametric representation; very fast at test time

• Linear cons: works for two classes; how to train the linear function?; what if the data is not linearly separable?

Nearest neighbor vs. linear classifiers

25

Page 26:

26

Maximum Margin

Linear classifiers can lead to many equally valid choices for the decision boundary.

Are these really “equally valid”?

Page 27:

27

Max Margin

How can we pick which is best? Maximize the size of the margin.

Are these really “equally valid”?

[Figure: a large-margin separator vs. a small-margin separator]

Page 28:

28

Support vectors are the input points (vectors) closest to the decision boundary:

1. They are vectors.

2. They “support” the decision hyperplane.

Support Vectors

Page 29:

29

Define this as a decision problem.

The decision hyperplane: w · x + b = 0

No fancy math, just the equation of a hyperplane.

Support Vectors

Page 30:

30

Any plane can be written in the form w · x + b = 0, where w is the normal of the plane and b is the negative distance of the plane to the origin (up to the scale of w). Any line lying in the plane is perpendicular to the normal w.

Recall: Equation of a plane

Page 31:

31

Define this as a decision problem.

The decision hyperplane: w · x + b = 0

Decision function: f(x) = sign(w · x + b)

Support Vectors

Page 32:

32

Define this as a decision problem.

The decision hyperplane: w · x + b = 0

Margin hyperplanes: w · x + b = +c and w · x + b = −c for some c > 0 (the scaling argument on the next slides lets us set c = 1)

Support Vectors

Page 33:

33

The decision hyperplane: w · x + b = 0

Scale invariance: multiplying w and b (and c) by the same positive constant does not change the hyperplane.

Support Vectors

Page 34:

Support Vectors

The decision hyperplane: w · x + b = 0

Scale invariance

34

Page 35:

Support Vectors

The decision hyperplane: w · x + b = 0, rescaled so that the support vector hyperplanes are w · x + b = ±1

Scale invariance

35

This scaling does not change the decision hyperplane or the support vector hyperplanes, but it eliminates a variable from the optimization.

Page 36:

36

We will represent the size of the margin in terms of w. This will allow us to simultaneously identify a decision boundary and maximize the margin.

What are we optimizing?

Page 37:

37

1. There must be at least one point that lies on each support hyperplane.

How do we represent the size of the margin in terms of w?

Proof outline: if not, we could define a larger-margin support hyperplane that does touch the nearest point(s).

Page 38:

38

How do we represent the size of the margin in terms of w?

1. There must be at least one point that lies on each support hyperplane.

2. Thus: w · x1 + b = +1 for some point x1 on the positive support hyperplane.

3. And: w · x2 + b = −1 for some point x2 on the negative support hyperplane.

Page 39:

How do we represent the size of the margin in terms of w?

1. There must be at least one point that lies on each support hyperplane.

2. Thus: w · x1 + b = +1

3. And: w · x2 + b = −1

39

Page 40:

The vector w is perpendicular to the decision hyperplane: if the dot product of two vectors equals zero, the two vectors are perpendicular, and w · (x − x′) = 0 for any two points x, x′ on the hyperplane.

How do we represent the size of the margin in terms of w?

40

Page 41:

41

The margin is the projection of x1 − x2 onto w, the normal of the hyperplane.

How do we represent the size of the margin in terms of w?

Page 42:

42

Recall: Vector Projection
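For reference, the standard projection identity the following slides rely on (the formula itself is not in the extracted text) can be written in LaTeX as:

\[
\operatorname{proj}_{\mathbf{w}}(\mathbf{v})
  = \frac{\mathbf{v}\cdot\mathbf{w}}{\lVert\mathbf{w}\rVert^{2}}\,\mathbf{w},
\qquad
\bigl\lVert \operatorname{proj}_{\mathbf{w}}(\mathbf{v}) \bigr\rVert
  = \frac{\lvert \mathbf{v}\cdot\mathbf{w}\rvert}{\lVert\mathbf{w}\rVert}.
\]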

Page 43:

43

The margin is the projection of x1 − x2 onto w, the normal of the hyperplane.

How do we represent the size of the margin in terms of w?

Projection of x1 − x2 onto w: ((x1 − x2) · w / ||w||²) w

Size of the margin: |(x1 − x2) · w| / ||w|| = 2 / ||w||

Page 44:

44

Maximizing the margin

Goal: maximize the margin 2/||w||, i.e. minimize ½||w||², subject to yi (w · xi + b) ≥ 1 for all i.

The constraints encode linear separability of the data by the decision boundary.

Page 45:

45

For constrained optimization, use Lagrange multipliers.

Optimize the “Primal”

Max Margin Loss Function

Lagrange Multiplier + Karush-Kuhn-Tucker (KKT) condition
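For reference, the standard max-margin primal Lagrangian (following the Burges tutorial cited on these slides), written in LaTeX:

\[
L_P(\mathbf{w}, b, \boldsymbol{\alpha})
  = \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  - \sum_{i} \alpha_i \bigl[\, y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \,\bigr],
\qquad \alpha_i \ge 0 ,
\]

minimized over w and b and maximized over the multipliers αi; the KKT conditions characterize this saddle point.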

Page 46:

46

Optimize the “Primal”

Max Margin Loss Function

Partial derivative with respect to b: ∂L/∂b = −Σi αi yi = 0, so Σi αi yi = 0

Page 47:

47

Optimize the “Primal”

Max Margin Loss Function

Partial derivative with respect to w: ∂L/∂w = w − Σi αi yi xi = 0, so w = Σi αi yi xi.

Now we have to find the αi: substitute w back into the loss function.

Page 48:

48

Construct the “dual”

Max Margin Loss Function

With: w = Σi αi yi xi and Σi αi yi = 0

Then: LD(α) = Σi αi − ½ Σi Σj αi αj yi yj (xi · xj), to be maximized subject to αi ≥ 0

Page 49:

49

Solve this quadratic program to identify the Lagrange multipliers and thus the weights.

Dual formulation of the error

There exist (rather) fast approaches to quadratic program solving in C, C++, Python, Java, and R.
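As a concrete sketch (assuming scikit-learn, which wraps the LIBSVM solver listed in the resources at the end of this lecture), the fitted model exposes the support vectors and the products αi·yi; the toy data here is only illustrative:

import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # a very large C approximates the hard margin
print(clf.support_vectors_)                   # the support vectors xi
print(clf.dual_coef_)                         # alpha_i * y_i for each support vector
print(clf.intercept_)                         # the bias b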

Page 50:

50

Quadratic Programming

A quadratic program optimizes a quadratic objective f(x) = ½ x · Qx + c · x subject to linear constraints.

• If Q is positive semi-definite, then f(x) is convex.

• If f(x) is convex, then every local optimum is a global optimum, so the problem has a single optimal value.

Page 51:

51

In constrained optimization, at the optimal solution: constraint × Lagrange multiplier = 0 (complementary slackness), i.e. αi [yi(w · xi + b) − 1] = 0.

Karush-Kuhn-Tucker Conditions

So αi > 0 only for points lying on the support hyperplanes: only those points contribute to the solution!

Page 52:

Support Vector Expansion

When αi is non-zero, xi is a support vector; when αi is zero, xi is not a support vector.

52

New decision function: f(x) = sign( Σi αi yi (xi · x) + b ), a sum over the support vectors only
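A small sketch of this expansion, assuming NumPy (variable names are illustrative): once the nonzero αi are known, both the weight vector and the decision value are sums over the support vectors only.

import numpy as np

def weight_vector(support_vectors, alphas, labels):
    # w = sum_i alpha_i y_i x_i (linear case)
    return (alphas * labels) @ support_vectors

def sv_decision(x, support_vectors, alphas, labels, b):
    # f(x) = sign( sum_i alpha_i y_i (x_i · x) + b )
    scores = alphas * labels * (support_vectors @ x)
    return np.sign(scores.sum() + b)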

Page 53:

• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable

Nonlinear SVMs

Φ: x → φ(x)

Image source

53

Page 54:

• Linearly separable dataset in 1D:

• Non-separable dataset in 1D:

• We can map the data to a higher-dimensional space:

[Figures: 1D points on an x axis; mapping each point x to (x, x²) makes the non-separable dataset linearly separable]

Nonlinear SVMs

Slide credit: Andrew Moore

54

Page 55:

• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable.

• The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that

K(x, y) = φ(x) · φ(y)

(to be valid, the kernel function must satisfy Mercer’s condition)

The kernel trick

55

Page 56:

• Linear SVM decision function:

w · x + b = Σi αi yi (xi · x) + b

(xi: support vector; αi: learned weight)

The kernel trick

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

56

Page 57:

The kernel trick

• Linear SVM decision function: w · x + b = Σi αi yi (xi · x) + b

• Kernel SVM decision function: Σi αi yi φ(xi) · φ(x) + b = Σi αi yi K(xi, x) + b

• This gives a nonlinear decision boundary in the original feature space

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

57
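The kernelized version is the same sketch as the linear expansion above, with K(xi, x) in place of the dot product (assuming NumPy; the kernel is passed in as a function):

import numpy as np

def kernel_sv_decision(x, support_vectors, alphas, labels, b, kernel):
    # f(x) = sign( sum_i alpha_i y_i K(x_i, x) + b )
    k = np.array([kernel(xi, x) for xi in support_vectors])
    return np.sign(np.sum(alphas * labels * k) + b)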

Page 58:

Polynomial kernel: K(x, y) = (c + x · y)^d

58
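A one-line sketch of this kernel, with c and d as the constants above:

import numpy as np

def polynomial_kernel(x, y, c=1.0, d=3):
    return (c + np.dot(x, y)) ** d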

Page 59:

Gaussian kernel

• Also known as the radial basis function (RBF) kernel:

K(x, y) = exp( −(1/σ²) ||x − y||² )

[Plot: K(x, y) as a function of ||x − y||]

59
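And the corresponding sketch of the Gaussian (RBF) kernel, with sigma as the bandwidth; either of these kernel functions can be passed to the kernel_sv_decision sketch given two slides earlier:

import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)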

Page 60:

Gaussian kernel

[Figure: decision boundary of a Gaussian-kernel SVM, with the support vectors (SVs) highlighted]

60

Page 61:

• Pros: the kernel-based framework is very powerful and flexible; training is convex optimization, so a globally optimal solution can be found; amenable to theoretical analysis; SVMs work very well in practice, even with very small training sample sizes

• Cons: no “direct” multi-class SVM, so two-class SVMs must be combined (e.g., one-vs-others); computation and memory costs (especially for nonlinear SVMs)

SVMs: Pros and cons

61

Page 62:

• Generalization refers to the ability to correctly classify never-before-seen examples

• It can be controlled by turning “knobs” that affect the complexity of the model

Generalization

Training set (labels known), test set (labels unknown)

62

Page 63:

• Training error: how does the model perform on the data on which it was trained?

• Test error: how does it perform on never-before-seen data?

Diagnosing generalization ability

[Plot: training error and test error vs. model complexity (low to high); test error is high at both extremes, marking underfitting and overfitting]

Source: D. Hoiem

63

Page 64:

• Underfitting: training and test error are both high. The model does an equally poor job on the training and the test set; either the training procedure is ineffective or the model is too “simple” to represent the data.

• Overfitting: training error is low but test error is high. The model fits irrelevant characteristics (noise) in the training data; the model is too complex or the amount of training data is insufficient.

Underfitting and overfitting

[Figure: example fits illustrating underfitting, good generalization, and overfitting]

Figure source

64

Page 65:

Effect of training set size

[Plot: test error vs. model complexity (low to high), with one curve for few training examples and one for many training examples]

Source: D. Hoiem

65

Page 66:

Effect of training set size

[Plot: as on the previous slide, test error vs. model complexity for few vs. many training examples]

Source: D. Hoiem

66

Page 67:

• Split the data into training, validation, and test subsets

• Use training set to optimize model parameters

• Use the validation set to choose the best model

• Use test set only to evaluate performance

Validation

[Plot: training, validation, and test set loss vs. model complexity; the stopping point is where the validation set loss is lowest]

67
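A minimal sketch of this protocol, assuming scikit-learn; the candidate values of C are just an example of a complexity “knob”:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def select_and_evaluate(X, y):
    # Split the data into training, validation, and test subsets (60/20/20)
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

    best_model, best_val = None, -1.0
    for C in [0.01, 0.1, 1, 10, 100]:
        model = SVC(kernel="rbf", C=C).fit(X_train, y_train)   # optimize parameters on the training set
        val_acc = model.score(X_val, y_val)                     # choose the best model on the validation set
        if val_acc > best_val:
            best_model, best_val = model, val_acc

    # Use the test set only once, to evaluate the chosen model
    return best_model, best_model.score(X_test, y_test)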

Page 68:

Histogram of Oriented Gradients (HoG)

[Dalal and Triggs, CVPR 2005]

[Figure: “HoGify” visualizations of an image using 10x10 and 20x20 cells]

68
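A small sketch of computing a HoG descriptor, assuming scikit-image; the parameter values are the common Dalal-Triggs defaults rather than those of the figure, and the input file name is only a placeholder:

from skimage import io
from skimage.feature import hog

image = io.imread("pedestrian.png", as_gray=True)   # placeholder input image
features = hog(image,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm="L2-Hys")
print(features.shape)   # one long vector of block-normalized gradient-orientation histograms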

Page 69:

Histogram of Oriented Gradients (HoG)

69

Page 70:

Many examples on the Internet:

https://www.youtube.com/watch?v=b_DSTuxdGDw

Histogram of Oriented Gradients (HoG)

[Dalal and Triggs, CVPR 2005]

70

Page 71:

71

Page 72:

Demo software:

http://cs.stanford.edu/people/karpathy/svmjs/demo

A useful website: www.kernel-machines.org

Software: LIBSVM: www.csie.ntu.edu.tw/~cjlin/libsvm/

SVMLight: svmlight.joachims.org/

Resources

72

Page 73:

73

Thank you!