
CS 4803 / 7643: Deep Learning

Dhruv Batra

Georgia Tech

Topics:
– Regularization
– Neural Networks

Administrativia
• PS1/HW1 out
– Due: 09/09, 11:59pm
– Asks about topics being covered now
– Caveat: one more (extra credit) problem to be added
– Please please please please start early
– More details next class

(C) Dhruv Batra 2

Recap from last time

(C) Dhruv Batra 3

Parametric Approach: Linear Classifier

Image → f(x, W) → 10 numbers giving class scores

– Input x: array of 32x32x3 numbers (3072 numbers total), stretched into a 3072x1 column
– Parameters or weights W: 10x3072; bias b: 10x1
– f(x, W) = Wx + b, a 10x1 vector of class scores

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Example with an image with 4 pixels, and 3 classes (cat/dog/ship)

Input image, pixels stretched into a column: x = [56, 231, 24, 2]

W (3x4):
 0.2  -0.5   0.1   2.0
 1.5   1.3   2.1   0.0
 0.0   0.25  0.2  -0.3

b = [1.1, 3.2, -1.2]

Scores = Wx + b = [-96.8 (cat score), 437.9 (dog score), 61.95 (ship score)]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
(C) Dhruv Batra 7; Image Credit: Andrej Karpathy, CS231n
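This worked example is easy to verify in a few lines of NumPy; a minimal sketch (note that Wx + b gives 60.75 for the ship class, since the slide's 61.95 is the Wx term before the -1.2 bias is added):

```python
import numpy as np

# Toy linear classifier from the slide: 4 pixels, 3 classes (cat/dog/ship)
x = np.array([56.0, 231.0, 24.0, 2.0])        # input image stretched into a column
W = np.array([[0.2, -0.5, 0.1,  2.0],         # one row (template) per class
              [1.5,  1.3, 2.1,  0.0],
              [0.0, 0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])                # one bias per class

scores = W @ x + b                            # f(x, W) = Wx + b
for name, s in zip(["cat", "dog", "ship"], scores):
    print(f"{name}: {s:.2f}")                 # cat: -96.80, dog: 437.90, ship: 60.75
```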

Linear Classifier: Three Viewpoints

– Algebraic viewpoint: f(x, W) = Wx
– Visual viewpoint: one template per class
– Geometric viewpoint: hyperplanes cutting up space

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Recall from last time: Linear Classifier

TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization)

Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Softmax vs. SVM

Multiclass SVM loss (recap)

Suppose: 3 training examples, 3 classes. With some W the scores f(x, W) = Wx are:

            image 1 (cat)   image 2 (car)   image 3 (frog)
  cat            3.2             1.3             2.2
  car            5.1             4.9             2.5
  frog          -1.7             2.0            -3.1

Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form:

L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)

("Hinge loss")

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
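A small NumPy sketch of this hinge loss on the score table above (the table layout is my reconstruction of the slide; the per-image losses it prints, 2.9, 0.0, and 12.9, are the standard CS 231n values):

```python
import numpy as np

# Scores: rows = classes (cat, car, frog), columns = the 3 training images
scores = np.array([[ 3.2, 1.3,  2.2],
                   [ 5.1, 4.9,  2.5],
                   [-1.7, 2.0, -3.1]])
labels = np.array([0, 1, 2])   # correct class per image: cat, car, frog

def multiclass_svm_loss(s, y, margin=1.0):
    """Hinge loss L_i = sum_{j != y} max(0, s_j - s_y + margin) for one score column."""
    margins = np.maximum(0.0, s - s[y] + margin)
    margins[y] = 0.0               # the correct class does not contribute
    return margins.sum()

losses = [multiclass_svm_loss(scores[:, i], labels[i]) for i in range(3)]
print(losses)                      # [2.9, 0.0, 12.9] -> average loss ~5.27
```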

Softmax Classifier (Multinomial Logistic Regression)

We want to interpret the raw classifier scores as probabilities. With the cat-image scores from before (cat: 3.2, car: 5.1, frog: -1.7), the softmax function

P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j},  where s = f(x_i, W),

does this in two steps:

– exp (probabilities must be >= 0): unnormalized probabilities 24.5, 164.0, 0.18
– normalize (probabilities must sum to 1): probabilities 0.13, 0.87, 0.00

The raw scores are therefore unnormalized log-probabilities, or logits.

The loss for this example is the negative log-probability of the correct class:

L_i = -log P(Y = y_i | X = x_i) = -log(0.13) = 2.04

Three equivalent ways to read this loss:

– Maximum Likelihood Estimation: choose probabilities to maximize the likelihood of the observed data.
– Kullback–Leibler divergence: compare the predicted probabilities (0.13, 0.87, 0.00) with the correct probabilities (1.00, 0.00, 0.00).
– Cross Entropy between the predicted and correct distributions.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
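The same numbers fall out of a few lines of NumPy; a minimal sketch:

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])          # cat, car, frog (logits)

# Softmax: exponentiate, then normalize.
# Shifting by max(scores) does not change the result but avoids overflow.
unnormalized = np.exp(scores - scores.max())
probs = unnormalized / unnormalized.sum()
print(probs.round(2))                         # [0.13 0.87 0.  ]

# Cross-entropy loss = negative log-probability of the correct class (cat, index 0)
loss = -np.log(probs[0])
print(round(loss, 2))                         # 2.04
```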

Log-Likelihood / KL-Divergence / Cross-Entropy

(C) Dhruv Batra 22

Plan for Today
• Regularization
• Neural Networks

(C) Dhruv Batra 23

Regularization

L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W)

– Data loss: model predictions should match training data
– Regularization: prevent the model from doing too well on training data
– λ = regularization strength (hyperparameter)

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Regularization Intuition in Polynomial Regression

[Figure: training points in the (x, y) plane, and a fitted curve f.]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Polynomial Regression

(C) Dhruv Batra 29-30

Error Decomposition

[Figure: Reality vs. the model class — Modeling Error, Estimation Error, Optimization Error = 0. Model shown: Input (HxWx3) → FC → Softmax, i.e. Multi-class Logistic Regression.]

(C) Dhruv Batra 31

Polynomial Regression
• Demo: https://arachnoid.com/polysolve/
• You are a scientist studying runners.
– You measure average speeds of the best runners at different ages.
• Data: Age (years), Speed (mph) — a quick fit of this data is sketched below
– 10, 6
– 15, 9
– 20, 11
– 25, 12
– 29, 13
– 40, 11
– 50, 10
– 60, 9

(C) Dhruv Batra 32
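A minimal NumPy sketch of the same experiment the demo runs on this runner data (the degrees chosen are my own illustration): higher-degree polynomials drive the training error toward zero while producing a curve you would not trust between the data points.

```python
import numpy as np

age   = np.array([10, 15, 20, 25, 29, 40, 50, 60], dtype=float)
speed = np.array([ 6,  9, 11, 12, 13, 11, 10,  9], dtype=float)

for degree in (1, 2, 7):
    coeffs = np.polyfit(age, speed, deg=degree)    # least-squares polynomial fit
    pred = np.polyval(coeffs, age)
    mse = np.mean((pred - speed) ** 2)
    print(f"degree {degree}: train MSE = {mse:.3f}")

# Degree 7 (8 points, 8 coefficients) fits the training points almost exactly
# (NumPy may warn that the fit is poorly conditioned), but it oscillates badly
# between and outside the data points.
```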

Regularization (continued)

Recall the full loss: L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W), where λ is the regularization strength (hyperparameter).

Simple examples:
– L2 regularization: R(W) = Σ_k Σ_l W_{k,l}^2
– L1 regularization: R(W) = Σ_k Σ_l |W_{k,l}|
– Elastic net (L1 + L2): R(W) = Σ_k Σ_l (β W_{k,l}^2 + |W_{k,l}|)

More complex:
– Dropout
– Batch normalization
– Stochastic depth, fractional pooling, etc.

Why regularize?
– Express preferences over weights
– Make the model simple so it works on test data
– Improve optimization by adding curvature

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
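A minimal sketch of the full loss with an L2 penalty, i.e. softmax data loss plus λ·R(W) (the toy data and λ value here are made up for illustration):

```python
import numpy as np

def l2_reg(W):
    return np.sum(W * W)           # R(W) = sum of squared weights

def l1_reg(W):
    return np.sum(np.abs(W))       # R(W) = sum of absolute weights

def softmax_data_loss(W, X, y):
    """Average cross-entropy loss; X is N x D, y holds integer labels."""
    scores = X @ W.T                               # N x C class scores
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(y)), y]).mean()

# Full loss L(W) = data loss + lambda * R(W), on made-up toy data
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
X = rng.normal(size=(5, 4))
y = np.array([0, 1, 2, 1, 0])
lam = 0.1
print(softmax_data_loss(W, X, y) + lam * l2_reg(W))
```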

Recap
– We have some dataset of (x, y)
– We have a score function: s = f(x, W) = Wx
– We have a loss function, e.g.
  – Softmax: L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} )
  – SVM: L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)
  – Full loss: L = (1/N) Σ_i L_i + λ R(W)

How do we find the best W?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
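Finding the best W is the subject of the optimization lectures; as a teaser, here is a minimal sketch of the standard answer, gradient descent, using a finite-difference gradient on a made-up toy loss (all names here are my own, not from the slides):

```python
import numpy as np

def numeric_gradient(loss_fn, W, h=1e-5):
    """Finite-difference gradient of loss_fn at W (slow, but good for intuition)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + h; plus  = loss_fn(W)
        W[idx] = old - h; minus = loss_fn(W)
        W[idx] = old
        grad[idx] = (plus - minus) / (2 * h)
    return grad

# Vanilla gradient descent on a toy quadratic loss, minimized at W = 3
loss_fn = lambda W: np.sum((W - 3.0) ** 2)
W = np.zeros((2, 2))
for _ in range(100):
    W -= 0.1 * numeric_gradient(loss_fn, W)    # step against the gradient
print(W.round(3))                              # ~3.0 everywhere
```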

Next: Neural Networks

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

So far: Linear Classifiers

f(x) = Wx → Class scores

Hard cases for a linear classifier

– Class 1: first and third quadrants; Class 2: second and fourth quadrants
– Class 1: 1 <= L2 norm <= 2; Class 2: everything else
– Class 1: three modes; Class 2: everything else

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Aside: Image Features

Image → Feature Representation → f(x) = Wx → Class scores

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Image Features: Motivation

– Cannot separate red and blue points with a linear classifier in the original (x, y) coordinates.
– After applying a feature transform to polar coordinates, f(x, y) = (r(x, y), θ(x, y)), the points can be separated by a linear classifier.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
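The polar-coordinate trick is easy to demo in NumPy (the data generation and the threshold here are my own illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Blue points at radius ~1, red points at radius ~2: not linearly separable in (x, y)
def ring(radius, n=100):
    theta = rng.uniform(0, 2 * np.pi, n)
    r = radius + rng.normal(0, 0.1, n)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

blue, red = ring(1.0), ring(2.0)

# Feature transform: (x, y) -> (r, theta)
def polar(points):
    x, y = points[:, 0], points[:, 1]
    return np.stack([np.hypot(x, y), np.arctan2(y, x)], axis=1)

# In polar coordinates a single linear boundary (r = 1.5) separates the classes
blue_r, red_r = polar(blue)[:, 0], polar(red)[:, 0]
print((blue_r < 1.5).mean(), (red_r > 1.5).mean())   # both ~1.0
```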

Example: Color Histogram

[Figure: each pixel adds +1 to the histogram bin for its color; the histogram of counts is the feature vector.]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
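A minimal sketch of a color-histogram feature in NumPy (the bin count and per-channel layout are arbitrary choices for illustration):

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """image: H x W x 3 uint8 array. Returns a length 3*bins feature vector."""
    feats = []
    for c in range(3):                              # one histogram per RGB channel
        counts, _ = np.histogram(image[:, :, c], bins=bins_per_channel, range=(0, 256))
        feats.append(counts)
    return np.concatenate(feats).astype(float)

# Example on a random "image"
img = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(color_histogram(img).shape)   # (24,)
```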

Example: Histogram of Oriented Gradients (HoG)

– Divide the image into 8x8 pixel regions.
– Within each region, quantize edge directions into 9 bins.
– Example: a 320x240 image gets divided into 40x30 bins; each bin holds 9 numbers, so the feature vector has 40*30*9 = 10,800 numbers.

Lowe, "Object recognition from local scale-invariant features", ICCV 1999
Dalal and Triggs, "Histograms of oriented gradients for human detection", CVPR 2005

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
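If you want to try this yourself, scikit-image ships a HOG implementation; a hedged sketch (assuming scikit-image is installed; `cells_per_block=(1, 1)` is chosen here so the feature length matches the 10,800 arithmetic above, whereas the library's default block normalization would give a different length):

```python
import numpy as np
from skimage.feature import hog   # assumes scikit-image is installed

image = np.random.default_rng(0).random((240, 320))   # grayscale 320x240 stand-in

features = hog(
    image,
    orientations=9,            # 9 edge-direction bins per cell
    pixels_per_cell=(8, 8),    # 8x8 pixel regions
    cells_per_block=(1, 1),    # no block normalization, to match the slide's count
)
print(features.shape)          # (10800,) = 40 * 30 * 9
```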

Image features vs Neural Nets

– Feature-based pipeline: fixed Feature Extraction, then a classifier f producing 10 numbers giving scores for classes; only the classifier is trained.
– Neural net pipeline: the whole mapping from pixels to the 10 numbers giving scores for classes is trained end to end.

Krizhevsky, Sutskever, and Hinton, "ImageNet classification with deep convolutional neural networks", NIPS 2012. Figure copyright Krizhevsky, Sutskever, and Hinton, 2012. Reproduced with permission.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Error Decomposition (revisited)

[Figure repeated from slide 31: Reality vs. the model class — Modeling Error, Estimation Error, Optimization Error = 0. Model shown: Input (HxWx3) → FC → Softmax, i.e. Multi-class Logistic Regression.]

(C) Dhruv Batra 52

Neural networks: without the brain stuff

(Before) Linear score function: f = Wx

(Now) 2-layer Neural Network: f = W2 max(0, W1 x)

x (3072) → W1 → h (100) → W2 → s (10)

or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
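A minimal NumPy sketch of the 2-layer network above (random weights, CIFAR-sized input; biases omitted to match the slide's formula):

```python
import numpy as np

rng = np.random.default_rng(0)

x  = rng.normal(size=3072)                  # input: 32*32*3 image stretched into a vector
W1 = rng.normal(size=(100, 3072)) * 0.01    # first layer: 3072 -> 100
W2 = rng.normal(size=(10, 100)) * 0.01      # second layer: 100 -> 10

h = np.maximum(0, W1 @ x)                   # hidden layer with elementwise max(0, .)
s = W2 @ h                                  # class scores, f = W2 max(0, W1 x)
print(h.shape, s.shape)                     # (100,) (10,)
```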

Multilayer Networks
• Cascaded "neurons"
• The output from one layer is the input to the next
• Each layer has its own set of weights

(C) Dhruv Batra 58; Image Credit: Andrej Karpathy, CS231n

"Fully-connected" layers

"2-layer Neural Net", or "1-hidden-layer Neural Net"
"3-layer Neural Net", or "2-hidden-layer Neural Net"

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Neural networks: Architectures

61

This image by Fotis Bobolas is licensed under CC-BY 2.0

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

[Figure: a biological neuron — dendrites carry impulses toward the cell body; the axon carries impulses away from the cell body to the presynaptic terminal. The corresponding artificial neuron computes a weighted sum of its inputs and passes it through a sigmoid activation function. Image by Felipe Perucho, licensed under CC-BY 3.0.]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
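A minimal sketch of the artificial neuron in the figure (the weights and inputs are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One artificial neuron: weighted sum of inputs plus bias, then sigmoid activation
x = np.array([0.5, -1.2, 3.0])      # inputs arriving on the "dendrites"
w = np.array([0.7,  0.1, -0.4])     # synaptic weights
b = 0.2

output = sigmoid(np.dot(w, x) + b)  # firing rate carried along the "axon"
print(round(output, 3))
```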

Be very careful with your brain analogies!

Biological Neurons:
● Many different types
● Dendrites can perform complex non-linear computations
● Synapses are not a single weight but a complex non-linear dynamical system
● Rate code may not be adequate

[Dendritic Computation. London and Hausser]

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Activation functions

– Sigmoid: σ(x) = 1 / (1 + e^{-x})
– tanh: tanh(x)
– ReLU: max(0, x)
– Leaky ReLU: max(0.1x, x)
– Maxout: max(w1ᵀx + b1, w2ᵀx + b2)
– ELU: x if x > 0, α(e^x − 1) if x ≤ 0

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
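As NumPy one-liners, for reference (Maxout is omitted since it takes two linear functions of the input rather than a single pre-activation):

```python
import numpy as np

# Common activation functions, applied elementwise
def sigmoid(x):           return 1.0 / (1.0 + np.exp(-x))
def tanh(x):              return np.tanh(x)
def relu(x):              return np.maximum(0.0, x)
def leaky_relu(x, a=0.1): return np.maximum(a * x, x)
def elu(x, alpha=1.0):    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

z = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, relu, leaky_relu, elu):
    print(f.__name__, np.round(f(z), 2))
```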

Activation Functions
• sigmoid vs tanh: tanh is a shifted and rescaled sigmoid, tanh(x) = 2σ(2x) − 1, so it is zero-centered

(C) Dhruv Batra 67

A quick note

(C) Dhruv Batra 68; Image Credit: LeCun et al. '98

Rectified Linear Units (ReLU)

(C) Dhruv Batra 69

[Krizhevsky et al., NIPS12]

Demo Time• https://playground.tensorflow.org