Linear Classification Lecture 3 - Artificial...

47
Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015 Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015 1 Lecture 3: Linear Classification

Transcript of Linear Classification Lecture 3 - Artificial...

Page 1: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20151

Lecture 3:Linear Classification

Page 2: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20152

Last time: Image Classification

cat

assume given set of discrete labels{dog, cat, truck, plane, ...}

Page 3: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20153

k-Nearest Neighbortraining set

test images

Page 4: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20154

Linear Classification1. define a score function

class scores

Page 5: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20155

Linear Classification1. define a score function

“weights” “bias vector”

data (image)

class scores“parameters”

Page 6: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20156

Linear Classification1. define a score function(assume CIFAR-10 example so32 x 32 x 3 images, 10 classes)

weights bias vector

data (image)[3072 x 1]

class scores

Page 7: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20157

Linear Classification1. define a score function(assume CIFAR-10 example so32 x 32 x 3 images, 10 classes)

weights[10 x 3072]

bias vector[10 x 1]

data (image)[3072 x 1]

class scores[10 x 1]

Page 8: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20158

Linear Classification

Page 9: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 20159

Interpreting a Linear ClassifierQuestion:what can a linear classifier do?

Page 10: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201510

Interpreting a Linear Classifier

Example training classifiers on CIFAR-10:

Page 11: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201511

Interpreting a Linear Classifier

Page 12: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201512

Bias trick

Page 13: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201513

So far:We defined a (linear) score function:

Page 14: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201514

2. Define a loss function (or cost function, or objective)

Page 15: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201515

2. Define a loss function (or cost function, or objective)

- scores, label loss.

Page 16: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201516

2. Define a loss function (or cost function, or objective)

- scores, label loss.

Example:

Page 17: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201517

2. Define a loss function (or cost function, or objective)

- scores, label loss.

Example: Question: if you were to assign a single number to how “unhappy” you are with these scores, what would you do?

Page 18: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201518

2. Define a loss function (or cost function, or objective)One (of many ways) to do it: Multiclass SVM Loss

Page 19: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201519

2. Define a loss function (or cost function, or objective)One (of many ways) to do it: Multiclass SVM Loss

(One possible generalization of Binary Support Vector Machine to multiple classes)

Page 20: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201520

2. Define a loss function (or cost function, or objective)One (of many ways) to do it: Multiclass SVM Loss

loss due to example i sum over all

incorrect labelsdifference between the correct class score and incorrect class score

Page 21: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201521

loss due to example i sum over all

incorrect labelsdifference between the correct class score and incorrect class score

Page 22: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201522

Example:

e.g. 10

loss = ?

Page 23: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201523

Example:

e.g. 10

Page 24: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201524

Page 25: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201525

There is a bug with the objective…

Page 26: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201526

L2 Regularization

Regularization strength

Page 27: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201527

L2 regularization: motivation

Page 28: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201528

Page 29: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201529

Do we have to cross-validate both and ?

Page 30: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201530

Page 31: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201531

So far…1. Score function

2. Loss function

Page 32: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201532

Page 33: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201533

Softmax Classifier score functionis the same

(extension of Logistic Regression to multiple classes)

Page 34: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201534

Softmax Classifier score functionis the same

Page 35: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201535

Softmax Classifier score functionis the same

i.e. we’re minimizing the negative log likelihood.

softmax function

Page 36: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201536

Softmax Classifier score functionis the same

softmax function

Page 37: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201537

Softmax Classifier score functionis the same

i.e. we’re minimizing the negative log likelihood.

Page 38: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201538

Page 39: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201539

Softmax vs. SVM

- Interpreting the probabilities from the Softmax

Page 40: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201540

Softmax vs. SVM

- Interpreting the probabilities from the Softmax

suppose the weights W were only half as large

(we use a higher regularization strength)

Page 41: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201541

Softmax vs. SVM

- Interpreting the probabilities from the Softmax

suppose the weights W were only half as large:

Page 42: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201542

Softmax vs. SVM- Interpreting the probabilities from the Softmax

suppose the weights W were only half as large:

What happens in the limit, as the regularization strength goes to infinity?

Page 43: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201543

Softmax vs. SVM

scores:[10, -2, 3][10, 9, 9][10, -100, -100]

1

Page 44: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201544

Softmax vs. SVM1

Page 45: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201545

Interactive Web Demo time....

http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/

Page 46: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201546

Summary- We introduced a parametric approach to

image classification- We defined a score function (linear map)- We defined a loss function (SVM / Softmax)

One problem remains: How to find W,b ?

Page 47: Linear Classification Lecture 3 - Artificial Intelligencevision.stanford.edu/teaching/cs231n/slides/2015/lecture3.pdf · Fei-Fei Li & Andrej Karpathy Lecture 2 - 20 7 Jan 2015 2.

Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 2015Fei-Fei Li & Andrej Karpathy Lecture 2 - 7 Jan 201547

Next class: Optimization,

Backpropagation

Find the W,b that minimizes the loss function.