L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of...

119
CS 4803 / 7643: Deep Learning Zsolt Kira Georgia Tech Topics: (Finish) Modern CNNs Detection & Segmentation architectures

Transcript of L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of...

Page 1: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

CS 4803 / 7643: Deep Learning

Zsolt KiraGeorgia Tech

Topics: – (Finish) Modern CNNs– Detection & Segmentation architectures

Page 2: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Administrivia• PS2/HW2 out today!

– Due: 03/03, 11:55pm– Less involved than PS1/HW1

• Please send project proposal!– Get feedback before next week

(C) Dhruv Batra and Zsolt Kira 2

Page 3: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Last Time

Page 4: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Handwriting Recognition Example

Page 5: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Translation Invariance

Page 6: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Some Rotation Invariance

Page 7: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Some Scale Invariance

Page 8: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Case Studies• There are several generations of ConvNets

– 2012 – 2014: AlexNet, ZNet, VGGNet• Conv-Relu, Pooling, Fully connected, Softmax• Deeper ones (VGGNet) tend to do better

– 2014• Fully-convolutional networks for semantic segmentation• Matrix outputs rather than just one probability distribution

– 2014-2016• Fully-convolutional networks for classification• Less parameters, faster than comparable Gen1 networks• GoogleNet, ResNet

– 2014-2016• Detection layers (proposals)• Caption generation (combine with RNNs for language)

Page 9: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 10: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 11: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 12: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

AlexNet: 60M params

ZNet: 75M

VGG: 138M

GoogleNet: 5M

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 13: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 14: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 15: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Importance of Depth

• After a while, adding depth decreases performance

• At first, vanishing/exploding gradients

• normalized initialization• Batch normalization• 2nd order methods

• Then, optimization limitation– Deeper network should

be able to mimic shallow ones

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 16: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 17: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 18: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 19: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 20: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 21: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 22: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Localization and Detection

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 23: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Class ScoresCat: 0.9Dog: 0.05Car: 0.01...

So far: Image Classification

This image is CC0 public domain Vector:4096

Fully-Connected:4096 to 1000

Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission. 

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 24: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Computer Vision Tasks

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 25: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Computer Vision Tasks

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 26: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Classification + Localization

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 27: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

CLS - ImageNet

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 28: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Idea 1: Localization as Regression

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 29: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 30: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 31: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 32: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 33: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Per-Class vs. Class Agnostic

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 34: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Where to attach?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 35: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Multiple Objects

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 36: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Human Pose Estimation

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 37: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Sliding Window: Overfeat

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 38: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Sliding Window: Overfeat

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 39: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Sliding Window: Overfeat

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 40: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Sliding Window: Overfeat

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 41: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Sliding Window: Overfeat

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 42: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Sliding Window: Overfeat

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 43: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Sliding Window: Overfeat

Why aren’t boxes across grid?Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 44: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Sliding Window: Overfeat

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 45: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 46: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 47: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 48: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 49: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 50: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 51: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 52: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 53: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 54: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Detection as Classification

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 55: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Detection as Classification

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 56: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Detection as Classification

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 57: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Detection as Classification

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 58: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 59: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Detection as Classification

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 60: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 61: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Training

Page 62: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Training

Page 63: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Training

Page 64: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Training

Page 65: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Training

Page 66: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Training

Page 67: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Datasets

Page 68: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Evaluation Metric

Page 69: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Results

Page 70: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Results

Page 71: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Results

Page 72: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Results

Page 73: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

R-CNN Results

Page 74: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 75: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 76: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Region of Interest (ROI) Pooling

Page 77: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Region of Interest (ROI) Pooling

Page 78: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Region of Interest (ROI) Pooling

Page 79: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Region of Interest (ROI) Pooling

Page 80: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Region of Interest (ROI) Pooling

Page 81: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Fast R-CNN Results

Page 82: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Fast R-CNN Results

Page 83: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Fast R-CNN Results

Page 84: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Fast R-CNN Results

Page 85: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Fast R-CNN Results

Page 86: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 87: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Yacov Hel-Or

Page 88: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Slide Credit: Yacov Hel-Or

Page 89: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

https://www.youtube.com/watch?v=KYNDzlcQMWA

Joint Coordinates

Also used for pose estimation

Page 90: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,
Page 91: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Computer Vision Tasks

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 92: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Semantic Segmentation

Cow

Grass

Sky

Label each pixel in the image with a category label

Don’t differentiate instances, only care about pixels

This image is CC0 public domain

Grass

Cat

Sky

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 93: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Semantic Segmentation Idea: Sliding Window

Full image

Extract patchClassify center pixel with CNN

Cow

Cow

Grass

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 94: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Semantic Segmentation Idea: Sliding Window

Full image

Extract patchClassify center pixel with CNN

Cow

Cow

GrassProblem: Very inefficient! Not reusing shared features between overlapping patches Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 95: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Semantic Segmentation Idea: Fully Convolutional

Input:3 x H x W

Convolutions:D x H x W

Conv Conv Conv Conv

Scores:C x H x W

argmax

Predictions:H x W

Design a network as a bunch of convolutional layers to make predictions for pixels all at once!

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 96: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Semantic Segmentation Idea: Fully Convolutional

Input:3 x H x W

Convolutions:D x H x W

Conv Conv Conv Conv

Scores:C x H x W

argmax

Predictions:H x W

Design a network as a bunch of convolutional layers to make predictions for pixels all at once!

Problem: convolutions at original image resolution will be very expensive ...

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 97: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Dilated Convolutions

(C) Dhruv Batra 98Figure Credit: Dumoulin and Visin, https://arxiv.org/pdf/1603.07285.pdf

Page 98: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Dilated Convolutions (Examples)

(C) Dhruv Batra 99

Page 99: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

(C) Dhruv Batra 100Figure Credit: Yu and Koltun, ICLR16

Page 100: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Different Idea: Still Classify, Change Test Time

• Can run on an image of any size!

(C) Dhruv Batra 101Figure Credit: [Long, Shelhamer, Darrell CVPR15]

Page 101: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Better Semantic Segmentation Idea: Fully Convolutional with Bottleneck

Input:3 x H x W Predictions:

H x W

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

High-res:D1 x H/2 x W/2

High-res:D1 x H/2 x W/2

Med-res:D2 x H/4 x W/4

Med-res:D2 x H/4 x W/4

Low-res:D3 x H/4 x W/4

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 102: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Semantic Segmentation Idea: Fully Convolutional

Input:3 x H x W Predictions:

H x W

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

High-res:D1 x H/2 x W/2

High-res:D1 x H/2 x W/2

Med-res:D2 x H/4 x W/4

Med-res:D2 x H/4 x W/4

Low-res:D3 x H/4 x W/4

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

Downsampling:Pooling, strided convolution

Upsampling:???

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 103: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

In-Network upsampling: “Unpooling”

1 2

3 4

Input: 2 x 2 Output: 4 x 4

1 1 2 2

1 1 2 2

3 3 4 4

3 3 4 4

Nearest Neighbor

1 2

3 4

Input: 2 x 2 Output: 4 x 4

1 0 2 0

0 0 0 0

3 0 4 0

0 0 0 0

“Bed of Nails”

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 104: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

In-Network upsampling: “Max Unpooling”

Input: 4 x 4

1 2 6 3

3 5 2 1

1 2 2 1

7 3 4 8

1 2

3 4

Input: 2 x 2 Output: 4 x 4

0 0 2 0

0 1 0 0

0 0 0 0

3 0 0 4

Max UnpoolingUse positions from pooling layer

5 6

7 8

Max PoolingRemember which element was max!

… Rest of the network

Output: 2 x 2

Corresponding pairs of downsampling and upsampling layers

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 105: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Learnable Upsampling: Transpose Convolution

Recall:Typical 3 x 3 convolution, stride 1 pad 1

Input: 4 x 4 Output: 4 x 4

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 106: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Learnable Upsampling: Transpose Convolution

Recall: Normal 3 x 3 convolution, stride 1 pad 1

Input: 4 x 4 Output: 4 x 4

Dot product between filter and input

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 107: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Learnable Upsampling: Transpose Convolution

Input: 4 x 4 Output: 4 x 4

Dot product between filter and input

Recall: Normal 3 x 3 convolution, stride 1 pad 1

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 108: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Input: 4 x 4 Output: 2 x 2

Learnable Upsampling: Transpose Convolution

Recall: Normal 3 x 3 convolution, stride 2 pad 1

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 109: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Input: 4 x 4 Output: 2 x 2

Dot product between filter and input

Learnable Upsampling: Transpose Convolution

Recall: Normal 3 x 3 convolution, stride 2 pad 1

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 110: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Learnable Upsampling: Transpose Convolution

Input: 4 x 4 Output: 2 x 2

Dot product between filter and input

Filter moves 2 pixels in the input for every one pixel in the output

Stride gives ratio between movement in input and output

Recall: Normal 3 x 3 convolution, stride 2 pad 1

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 111: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Learnable Upsampling: Transpose Convolution

3 x 3 transpose convolution, stride 2 pad 1

Input: 2 x 2 Output: 4 x 4

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 112: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Input: 2 x 2 Output: 4 x 4

Input gives weight for filter

Learnable Upsampling: Transpose Convolution

3 x 3 transpose convolution, stride 2 pad 1

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 113: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Input: 2 x 2 Output: 4 x 4

Input gives weight for filter

Sum where output overlaps

Learnable Upsampling: Transpose Convolution

3 x 3 transpose convolution, stride 2 pad 1

Filter moves 2 pixels in the output for every one pixel in the input

Stride gives ratio between movement in output and input

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 114: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Input: 2 x 2 Output: 4 x 4

Input gives weight for filter

Sum where output overlaps

Learnable Upsampling: Transpose Convolution

3 x 3 transpose convolution, stride 2 pad 1

Filter moves 2 pixels in the output for every one pixel in the input

Stride gives ratio between movement in output and input

Other names:-Deconvolution (bad)-Upconvolution-Fractionally stridedconvolution-Backward stridedconvolution

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 115: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Transpose Convolution: 1D Example

a

b

x

y

z

ax

ay

az + bx

by

bz

Input FilterOutput

Output contains copies of the filter weighted by the input, summing at where at overlaps in the output

Need to crop one pixel from output to make output exactly 2x input

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 116: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Transposed Convolution• https://distill.pub/2016/deconv-checkerboard/

(C) Dhruv Batra 117

Page 117: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Semantic Segmentation Idea: Fully Convolutional

Input:3 x H x W Predictions:

H x W

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

High-res:D1 x H/2 x W/2

High-res:D1 x H/2 x W/2

Med-res:D2 x H/4 x W/4

Med-res:D2 x H/4 x W/4

Low-res:D3 x H/4 x W/4

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

Downsampling:Pooling, strided convolution

Upsampling:Unpooling or strided transpose convolution

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 118: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

skip layers

interp + sum

interp + sum

dense output119

end-to-end, joint learningof semantics and location

Slide credit: Jonathan Long

Page 119: L13 modern cnns training - cc.gatech.edu · Case Studies •There are several generations of ConvNets –2012 –2014: AlexNet, ZNet, VGGNet •Conv-Relu, Pooling, Fully connected,

Next Time• Guest lecture by Zhile Ren (Dhruv Batra post-doc) on 3D

120