Lecture 3: Loss Functions and Optimization
CS231n, Stanford University (vision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf)

Transcript

Page 1: Lecture 3: Loss Functions and Optimization

Fei-Fei Li & Justin Johnson & Serena Yeung, April 9, 2019

Page 2: Administrative: Assignment 1

Released last week, due Wed 4/17 at 11:59pm.

Reminder: Please double-check that your downloaded file is spring1819_assignment1.zip, not spring1718! As announced in Piazza note @79, a legacy link was being fixed just as the assignment was released.

Page 3: Administrative: Project proposal

Due Wed 4/24.

Page 4: Administrative: Piazza

Please check and read the pinned Piazza posts. There will be announcements soon re: alternative midterm requests, Google Cloud credits, etc.

Page 6: Recall from last time: data-driven approach, kNN

[Figures: decision boundaries of a 1-NN classifier vs. a 5-NN classifier; dataset splits into train / test and into train / validation / test.]

Page 7: Recall from last time: Linear Classifier

f(x, W) = Wx + b

Page 8: Recall from last time: Linear Classifier

TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).

Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain.

Pages 9-12: A loss function tells how good our current classifier is

Suppose: 3 training examples, 3 classes. With some W the scores are:

              cat image   car image   frog image
    cat          3.2         1.3         2.2
    car          5.1         4.9         2.5
    frog        -1.7         2.0        -3.1

Given a dataset of examples {(x_i, y_i)} for i = 1..N, where x_i is an image and y_i is an (integer) label, the loss over the dataset is an average of the loss over the examples:

    L = (1/N) * sum_i L_i(f(x_i, W), y_i)

Pages 13-15: Multiclass SVM loss

Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the vector of scores, the SVM loss has the form:

    L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1)

This is the "hinge loss".

Page 16: Loss on the cat image:

    = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1)
    = max(0, 2.9) + max(0, -3.9)
    = 2.9 + 0
    = 2.9

Page 17: Loss on the car image:

    = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
    = max(0, -2.6) + max(0, -1.9)
    = 0 + 0
    = 0

Page 18: Loss on the frog image:

    = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1)
    = max(0, 6.3) + max(0, 6.6)
    = 6.3 + 6.6
    = 12.9

Page 19: Losses: 2.9, 0, 12.9. The loss over the full dataset is the average:

    L = (2.9 + 0 + 12.9) / 3 = 5.27

Page 20: Q: What happens to the loss if the car image's scores change a bit?

Page 21: Q2: What is the min/max possible loss?

Page 22: Q3: At initialization W is small so all s ≈ 0. What is the loss?

Page 23: Q4: What if the sum was over all classes (including j = y_i)?

Page 24: Q5: What if we used a mean instead of a sum?

Page 25: Q6: What if we used the squared hinge, max(0, s_j - s_{y_i} + 1)^2, instead?

Page 26: Multiclass SVM Loss: Example code
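The slide's code is not in the extracted text; a minimal numpy sketch of the per-example SVM loss defined above (the function name and signature are my own) might look like:

    import numpy as np

    def L_i_vectorized(x, y, W):
        # Sketch: scores for one example, hinge margins against the correct
        # class, zero out the correct class, then sum over the other classes.
        scores = W.dot(x)                                # s = f(x, W)
        margins = np.maximum(0, scores - scores[y] + 1)
        margins[y] = 0                                   # the sum runs over j != y_i
        return np.sum(margins)

For the cat column (scores 3.2, 5.1, -1.7 with y = 0) the margins are 2.9 and 0, giving a loss of 2.9 as on page 16.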

Page 27: E.g. suppose that we found a W such that L = 0. Is this W unique?

Page 28: No! 2W also has L = 0!

Page 29: Checking with the car image:

Before:
    = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
    = max(0, -2.6) + max(0, -1.9)
    = 0 + 0 = 0

With W twice as large:
    = max(0, 2.6 - 9.8 + 1) + max(0, 4.0 - 9.8 + 1)
    = max(0, -6.2) + max(0, -4.8)
    = 0 + 0 = 0

Page 30: No! 2W also has L = 0! How do we choose between W and 2W?

Pages 31-33: Regularization

    L(W) = (1/N) * sum_i L_i(f(x_i, W), y_i) + lambda * R(W)

Data loss: Model predictions should match training data.
Regularization: Prevent the model from doing too well on training data.
lambda = regularization strength (hyperparameter).

Pages 34-35: Simple examples:
- L2 regularization: R(W) = sum over k,l of W_{k,l}^2
- L1 regularization: R(W) = sum over k,l of |W_{k,l}|
- Elastic net (L1 + L2): R(W) = sum over k,l of (beta * W_{k,l}^2 + |W_{k,l}|)

More complex: Dropout, Batch normalization, Stochastic depth, fractional pooling, etc.

Page 36: Why regularize?
- Express preferences over weights
- Make the model simple so it works on test data
- Improve optimization by adding curvature
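To make the data-loss-plus-regularization structure concrete, here is a small, hypothetical numpy sketch (the function name, the choice of SVM data loss, and the variable lam are assumptions, not from the slides):

    import numpy as np

    def full_loss(W, X, y, lam):
        # X: (N, D) data, y: (N,) integer labels, W: (C, D) weights, lam: reg strength.
        scores = X.dot(W.T)                               # (N, C) class scores
        correct = scores[np.arange(len(y)), y][:, None]   # score of the correct class
        margins = np.maximum(0, scores - correct + 1)
        margins[np.arange(len(y)), y] = 0
        data_loss = margins.sum() / len(y)                # average SVM loss over the data
        reg_loss = lam * np.sum(W * W)                    # L2 regularization
        return data_loss + reg_loss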

Pages 37-38: Regularization: Expressing Preferences

L2 Regularization: L2 regularization likes to "spread out" the weights.

Pages 39-41: Regularization: Prefer Simpler Models

[Figure: training points in (x, y); a wiggly curve f1 passes through every point, while a simpler curve f2 does not.]

Regularization pushes against fitting the data too well so we don't fit noise in the data.

Page 42: Softmax Classifier (Multinomial Logistic Regression)

Scores for the cat image: cat 3.2, car 5.1, frog -1.7.
Want to interpret raw classifier scores as probabilities.

Page 43: Softmax Function:

    P(Y = k | X = x_i) = exp(s_k) / sum_j exp(s_j),   where s = f(x_i, W)

Pages 44-46: From scores to probabilities:

    scores (unnormalized log-probabilities / logits):   3.2    5.1    -1.7
    exp -> unnormalized probabilities:                  24.5   164.0   0.18   (probabilities must be >= 0)
    normalize -> probabilities:                          0.13    0.87   0.00  (probabilities must sum to 1)

Page 47: The loss is the negative log-probability of the correct class:

    L_i = -log P(Y = y_i | X = x_i)
    L_i = -log(0.13) = 2.04

Page 48: Maximum Likelihood Estimation: choose weights to maximize the likelihood of the observed data. (See CS 229 for details.)

Pages 49-51: Compare the predicted probabilities [0.13, 0.87, 0.00] to the correct probabilities [1.00, 0.00, 0.00]. This comparison can be written as a Kullback-Leibler divergence, D_KL(P || Q) = sum_y P(y) log(P(y) / Q(y)), or equivalently as a cross-entropy, H(P, Q) = H(P) + D_KL(P || Q).

Page 52: Putting it all together, maximize the probability of the correct class, i.e. minimize its negative log-probability:

    L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) )
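As an illustration of the softmax loss above, a small numpy sketch (the function name is mine; subtracting the max score for numerical stability is standard practice rather than something shown on the slide) could be:

    import numpy as np

    def softmax_loss_from_scores(scores, y):
        # Cross-entropy loss for one example, given its vector of class scores.
        shifted = scores - np.max(scores)              # shift scores for numerical stability
        probs = np.exp(shifted) / np.sum(np.exp(shifted))
        return -np.log(probs[y])                       # -log P(correct class)

For the cat-image scores [3.2, 5.1, -1.7] with y = 0 this gives -log(0.13) ≈ 2.04, matching page 47.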

Page 53: Q: What is the min/max possible loss L_i?

Page 54: A: min 0, max infinity.

Page 55: Q2: At initialization all s will be approximately equal; what is the loss?

Page 56: A: log(C), e.g. log(10) ≈ 2.3.

Pages 57-58: Softmax vs. SVM

    SVM (hinge) loss:  L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1)
    Softmax loss:      L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) )

Page 59: Assume three datapoints with scores [10, -2, 3], [10, 9, 9], [10, -100, -100], where the first class is the correct one.

Q: Suppose I take a datapoint and I jiggle it a bit (changing its scores slightly). What happens to the loss in both cases?

Pages 60-61: Recap

- We have some dataset of (x, y)
- We have a score function: s = f(x; W) = Wx
- We have a loss function, e.g.:

    Softmax:   L_i = -log( exp(s_{y_i}) / sum_j exp(s_j) )
    SVM:       L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1)
    Full loss: L = (1/N) * sum_i L_i + R(W)

How do we find the best W?

Page 62: Optimization

Pages 63-64: [Figures: a person walking through a large valley, as an analogy for searching a loss landscape for its lowest point. Both images are CC0 1.0 public domain.]

Page 65: Strategy #1: A first very bad idea solution: Random search
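The slide's code did not survive extraction; a self-contained toy sketch of random search (the fake data, the SVM-style loss, and all names below are stand-ins, not the lecture's CIFAR-10 setup) might look like:

    import numpy as np

    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((500, 32))     # 500 fake examples, 32 features
    Y_train = rng.integers(0, 10, size=500)      # 10 fake classes

    def loss(W):
        scores = X_train @ W.T                                        # (500, 10)
        correct = scores[np.arange(len(Y_train)), Y_train][:, None]
        margins = np.maximum(0, scores - correct + 1)
        margins[np.arange(len(Y_train)), Y_train] = 0
        return margins.sum() / len(Y_train)

    best_loss, best_W = float("inf"), None
    for attempt in range(1000):
        W = rng.standard_normal((10, 32)) * 0.0001   # random parameter guess
        current = loss(W)
        if current < best_loss:                      # keep the best guess seen so far
            best_loss, best_W = current, W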

Page 66: Let's see how well this works on the test set...

15.5% accuracy! Not bad! (SOTA is ~95%)

Page 67: Strategy #2: Follow the slope

Page 68: In 1 dimension, the derivative of a function is:

    df(x)/dx = lim as h -> 0 of (f(x + h) - f(x)) / h

In multiple dimensions, the gradient is the vector of partial derivatives along each dimension.

The slope in any direction is the dot product of the direction with the gradient. The direction of steepest descent is the negative gradient.

Pages 69-71: Evaluating the gradient numerically, first dimension:

    current W:          [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, ...]   loss 1.25347
    W + h (first dim):  [0.34 + 0.0001, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, ...]   loss 1.25322

    (1.25322 - 1.25347) / 0.0001 = -2.5

    gradient dW:        [-2.5, ?, ?, ?, ?, ?, ?, ?, ?, ...]

Pages 72-73: Second dimension:

    W + h (second dim): [0.34, -1.11 + 0.0001, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, ...]   loss 1.25353

    (1.25353 - 1.25347) / 0.0001 = 0.6

    gradient dW:        [-2.5, 0.6, ?, ?, ?, ?, ?, ?, ?, ...]

Pages 74-75: Third dimension:

    W + h (third dim):  [0.34, -1.11, 0.78 + 0.0001, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, ...]   loss 1.25347

    (1.25347 - 1.25347) / 0.0001 = 0

    gradient dW:        [-2.5, 0.6, 0, ?, ?, ?, ?, ?, ?, ...]

Page 76: Numeric Gradient:
- Slow! Need to loop over all dimensions
- Approximate
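A numpy sketch of this finite-difference procedure (the helper name and the use of np.nditer are my choices, not code from the slides; x is assumed to be a float array):

    import numpy as np

    def eval_numerical_gradient(f, x, h=0.0001):
        # Loop over every dimension of x, nudge it by h, and measure how the loss changes.
        grad = np.zeros_like(x)
        fx = f(x)                            # loss at the current point
        it = np.nditer(x, flags=['multi_index'])
        while not it.finished:
            ix = it.multi_index
            old = x[ix]
            x[ix] = old + h                  # increment one dimension
            grad[ix] = (f(x) - fx) / h       # slope along that dimension
            x[ix] = old                      # restore the original value
            it.iternext()
        return grad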

Pages 77-78: This is silly. The loss is just a function of W:

    L = (1/N) * sum_i L_i(f(x_i, W), y_i) + R(W)

We want grad_W L. Use calculus to compute an analytic gradient. (Both images are in the public domain.)

Page 79: With the analytic expression we get the whole gradient at once:

    current W:    [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, ...]   loss 1.25347
    gradient dW:  [-2.5, 0.6, 0, 0.2, 0.7, -0.5, 1.1, 1.3, -2.1, ...]

    dW = ... (some function of the data and W)

Page 80: In summary:
- Numerical gradient: approximate, slow, easy to write
- Analytic gradient: exact, fast, error-prone

In practice: Always use the analytic gradient, but check your implementation with the numerical gradient. This is called a gradient check.
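The lecture does not show gradient-check code; one common sketch is to compare the two gradients with a relative error (the function name and the 1e-7 rule of thumb are conventions, not from the slides):

    import numpy as np

    def max_relative_error(analytic_grad, numeric_grad):
        # Elementwise relative error between the two gradients; report the worst one.
        num = np.abs(analytic_grad - numeric_grad)
        den = np.maximum(1e-8, np.abs(analytic_grad) + np.abs(numeric_grad))
        return np.max(num / den)   # values around 1e-7 or smaller usually indicate a correct gradient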

Page 81: Gradient Descent
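The slide's gradient descent loop is not in the extracted text; a self-contained toy sketch of the update rule (the quadratic loss and all names below are stand-ins, not the lecture's code) is:

    import numpy as np

    def evaluate_gradient(w):
        # Gradient of the toy loss sum((w - 3)^2); a stand-in for the real loss gradient.
        return 2.0 * (w - 3.0)

    weights = np.zeros(5)
    step_size = 0.1
    for _ in range(100):
        weights_grad = evaluate_gradient(weights)
        weights += -step_size * weights_grad   # step in the negative gradient direction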

Pages 82-83: [Figure: a loss surface over two weight dimensions W_1 and W_2; starting from the original W, each update moves in the negative gradient direction.]

Page 84: Stochastic Gradient Descent (SGD)

    L(W) = (1/N) * sum_i L_i(x_i, y_i, W) + lambda * R(W)
    grad_W L(W) = (1/N) * sum_i grad_W L_i(x_i, y_i, W) + lambda * grad_W R(W)

The full sum is expensive when N is large! Approximate the sum using a minibatch of examples; 32 / 64 / 128 are common sizes.
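A self-contained toy sketch of the minibatch update (the linear-regression data, the batch size of 64, and all names are stand-ins, not the lecture's code):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 10))
    true_w = rng.standard_normal(10)
    y = X @ true_w

    w = np.zeros(10)
    step_size = 0.05
    for step in range(500):
        idx = rng.integers(0, len(X), size=64)        # sample a minibatch of 64 examples
        Xb, yb = X[idx], y[idx]
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)  # gradient of the mean squared error on the batch
        w += -step_size * grad                        # parameter update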

Page 85: Interactive Web Demo

http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/

Pages 86-87: Aside: Image Features

f(x) = Wx gives class scores. Instead of feeding raw pixels into the linear classifier, first compute a feature representation of the image and feed that in.

Page 88: Image Features: Motivation

[Figure: red and blue points in the (x, y) plane that cannot be separated by a linear classifier.]

Page 89: After applying a feature transform to polar coordinates,

    f(x, y) = (r(x, y), θ(x, y)),

the points can be separated by a linear classifier.
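A tiny numpy sketch of this feature transform (the function name is mine):

    import numpy as np

    def polar_features(points):
        # points: (N, 2) array of (x, y) coordinates -> (N, 2) array of (r, theta).
        x, y = points[:, 0], points[:, 1]
        r = np.sqrt(x ** 2 + y ** 2)
        theta = np.arctan2(y, x)
        return np.stack([r, theta], axis=1)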

Page 90: Example: Color Histogram

[Figure: each pixel's color adds +1 to its bin of a color histogram; the bin counts form the feature vector.]
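As a rough illustration (the hue-based binning and all names are assumptions, not details from the slide), such a feature could be computed as:

    import numpy as np

    def color_histogram(hues, num_bins=12):
        # hues: per-pixel hue values in [0, 1); each pixel adds +1 to its hue bin.
        bins = np.clip(np.floor(hues * num_bins).astype(int), 0, num_bins - 1)
        return np.bincount(bins, minlength=num_bins)   # bin counts are the feature vector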

Page 91: Example: Histogram of Oriented Gradients (HoG)

Divide the image into 8x8 pixel regions; within each region, quantize the edge direction into 9 bins.

Example: a 320x240 image gets divided into 40x30 bins; each bin holds 9 numbers, so the feature vector has 40*30*9 = 10,800 numbers.

Lowe, "Object recognition from local scale-invariant features", ICCV 1999.
Dalal and Triggs, "Histograms of oriented gradients for human detection", CVPR 2005.

Page 92: Example: Bag of Words

Step 1: Build a codebook - extract random patches, then cluster the patches to form a "codebook" of "visual words".
Step 2: Encode images.

Fei-Fei and Perona, "A Bayesian hierarchical model for learning natural scene categories", CVPR 2005.

Page 93: Aside: Image Features

Page 94: Image features vs ConvNets

Feature-extraction pipeline: hand-designed feature extraction followed by a classifier f that outputs 10 numbers giving scores for classes; only the classifier is trained.
ConvNet pipeline: the whole network, from pixels to the 10 numbers giving scores for classes, is trained.

Krizhevsky, Sutskever, and Hinton, "ImageNet classification with deep convolutional neural networks", NIPS 2012. Figure copyright Krizhevsky, Sutskever, and Hinton, 2012. Reproduced with permission.

Page 95: Next time:

Introduction to neural networks
Backpropagation

