CS6501: Deep Learning for Visual Recognition...
![Page 1: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/1.jpg)
CS6501: Deep Learning for Visual Recognition
CNN Architectures
![Page 2: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/2.jpg)
ILSVRC: Imagenet Large Scale Visual Recognition Challenge
[Russakovsky et al 2014]
![Page 3: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/3.jpg)
The Problem: Classification
Classify an image into 1000 possible classes:
e.g. Abyssinian cat, Bulldog, French Terrier, Cormorant, Chickadee, red fox, banjo, barbell, hourglass, knot, maze, viaduct, etc.
cat, tabby cat (0.71)
Egyptian cat (0.22)
red fox (0.11)
…
![Page 4: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/4.jpg)
The Data: ILSVRC
Imagenet Large Scale Visual Recognition Challenge (ILSVRC): Annual Competition
1000 Categories
~1000 training images per Category
~1 million images in total for training
~50k images for validation
Test set: only the images are released, without annotations;
evaluation is performed centrally by the organizers (max 2 submissions per week)
![Page 5: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/5.jpg)
The Evaluation Metric: Top K-error
cat, tabby cat (0.61)
Egyptian cat (0.22)
red fox (0.11)
Abyssinian cat (0.10)
French terrier (0.03)
…
True label: Abyssinian cat
Top-1 error: 1.0 Top-1 accuracy: 0.0
Top-2 error: 1.0 Top-2 accuracy: 0.0
Top-3 error: 1.0 Top-3 accuracy: 0.0
Top-4 error: 0.0 Top-4 accuracy: 1.0
Top-5 error: 0.0 Top-5 accuracy: 1.0
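The top-k numbers above can be reproduced with a few lines of Python. This is a minimal sketch for a single image; the function name `top_k_error` and the dictionary layout are illustrative choices, not part of the slides. The scores are the ones shown in the example prediction.

```python
def top_k_error(scores, true_label, k):
    """Return 0.0 if true_label is among the k highest-scoring labels, else 1.0."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return 0.0 if true_label in ranked[:k] else 1.0

# Scores copied from the example prediction on the slide.
scores = {
    "tabby cat": 0.61,
    "Egyptian cat": 0.22,
    "red fox": 0.11,
    "Abyssinian cat": 0.10,
    "French terrier": 0.03,
}

errors = {k: top_k_error(scores, "Abyssinian cat", k) for k in range(1, 6)}
for k, err in errors.items():
    print(f"Top-{k} error: {err}  Top-{k} accuracy: {1.0 - err}")
```

Over a whole dataset, the reported top-k error is the average of this per-image value.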
![Page 6: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/6.jpg)
Top-5 error on this competition (2012)
![Page 7: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/7.jpg)
AlexNet (Krizhevsky et al., NIPS 2012)
![Page 8: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/8.jpg)
Alexnet
https://www.saagie.com/fr/blog/object-detection-part1
![Page 9: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/9.jpg)
PyTorch Code for AlexNet
• In-class analysis
https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py
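When reading the linked model file, it helps to trace how the spatial size shrinks from layer to layer. The sketch below does that arithmetic in plain Python. The kernel, stride, padding, and channel values are transcribed from the torchvision AlexNet as best recalled, so treat them as assumptions and check them against the linked source; the names `conv_out` and `trace_shapes` are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, out_channels); pooling keeps the channel count.
# Values assumed from torchvision's alexnet.py; verify against the linked file.
ALEXNET_FEATURES = [
    ("conv1", 11, 4, 2, 64),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 192),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 256),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

def trace_shapes(size=224, channels=3):
    """Return (name, channels, height, width) after every feature layer."""
    shapes = []
    for name, kernel, stride, padding, out_channels in ALEXNET_FEATURES:
        size = conv_out(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, channels, size, size))
    return shapes

shapes = trace_shapes()
for row in shapes:
    print(row)

# Number of inputs to the first fully connected layer.
flattened = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
print("flattened features:", flattened)
```

Under these assumptions a 224x224 input ends as a 256x6x6 volume, which is why the classifier's first linear layer takes 256 * 6 * 6 inputs.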
![Page 10: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/10.jpg)
Dropout Layer
Srivastava et al 2014
model.train()
model.eval()
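The two calls above matter because dropout behaves differently in training and evaluation. The following is a small pure-Python sketch of that behavior, not PyTorch's implementation: the `Dropout` class here is a hypothetical stand-in that uses the common "inverted dropout" scaling and assumes `p < 1`.

```python
import random

class Dropout:
    """Toy inverted dropout: zero units with probability p, in training mode only."""

    def __init__(self, p=0.5, seed=None):
        self.p = p                      # drop probability, assumed to be < 1
        self.training = True
        self.rng = random.Random(seed)

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, xs):
        if not self.training or self.p == 0.0:
            return list(xs)             # evaluation: pass inputs through unchanged
        scale = 1.0 / (1.0 - self.p)    # keeps the expected activation the same
        return [0.0 if self.rng.random() < self.p else x * scale for x in xs]

layer = Dropout(p=0.5, seed=0)
x = [1.0, 2.0, 3.0, 4.0]
train_out = layer.train()(x)   # some entries zeroed, survivors scaled by 2
eval_out = layer.eval()(x)     # identical to x
print(train_out, eval_out)
```

Forgetting to switch to evaluation mode leaves units randomly zeroed at test time, which makes predictions noisy.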
![Page 11: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/11.jpg)
Preprocessing and Data Augmentation
![Page 12: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/12.jpg)
Preprocessing and Data Augmentation
256x256
![Page 13: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/13.jpg)
Preprocessing and Data Augmentation
224x224
![Page 14: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/14.jpg)
Preprocessing and Data Augmentation
224x224
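The slides show 224x224 patches taken from a 256x256 image. A minimal sketch of that augmentation in plain Python follows; the horizontal flip is an added assumption (it is a common companion to random cropping but is not shown on the slides), and the nested-list image is a stand-in for real pixel data.

```python
import random

def random_crop(image, size, rng):
    """Cut a size x size patch at a random location from a 2-D list of pixels."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]

def random_hflip(image, rng, p=0.5):
    """Mirror the image left to right with probability p (assumed extra step)."""
    return [row[::-1] for row in image] if rng.random() < p else image

rng = random.Random(0)
# Stand-in for a 256x256 image: each "pixel" stores its own coordinates.
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_hflip(random_crop(image, 224, rng), rng)
print(len(patch), len(patch[0]))
```

Each epoch sees a slightly different patch of every training image, which acts as a cheap way of enlarging the training set.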
![Page 15: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/15.jpg)
True label: Abyssinian cat
![Page 16: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/16.jpg)
Some Important Aspects
• Using ReLUs instead of Sigmoid or Tanh
• Momentum + Weight Decay
• Dropout (randomly sets unit outputs to zero during training)
• GPU Computation!
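To make the "Momentum + Weight Decay" item concrete, here is a sketch of one common formulation of the update for a single scalar weight. The function name `sgd_step`, the toy objective, and the hyperparameter values are illustrative assumptions; other formulations (for example, applying the learning rate inside the velocity) differ in detail.

```python
def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum and L2 weight decay for a scalar weight."""
    grad = grad + weight_decay * w           # weight decay adds a pull toward zero
    velocity = momentum * velocity + grad    # momentum accumulates past gradients
    return w - lr * velocity, velocity

# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w, velocity = 0.0, 0.0
for _ in range(500):
    w, velocity = sgd_step(w, 2.0 * (w - 3.0), velocity)
print(w)
```

The weight settles very slightly below 3 because weight decay keeps pulling it toward zero.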
![Page 17: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/17.jpg)
What is happening?
https://www.saagie.com/fr/blog/object-detection-part1
![Page 18: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/18.jpg)
SIFT + FV + SVM (or softmax): Feature extraction (SIFT) → Feature encoding (Fisher vectors) → Classification (SVM or softmax)
Deep Learning: Convolutional Network (includes both feature extraction and classifier)
![Page 19: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/19.jpg)
VGG Network
https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py
Simonyan and Zisserman, 2014.
Top-5:
https://arxiv.org/pdf/1409.1556.pdf
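VGG networks are built from a short list of repeated 3x3 convolutions and 2x2 max-pooling steps, which makes their size easy to compute by hand. The sketch below counts the parameters of the 16-layer variant. The configuration list is written from memory of the paper's configuration D, and `count_vgg_params` is a name invented for this example, so verify both against the linked paper and code.

```python
# Channel widths of the 3x3 conv layers; "M" marks a 2x2 max-pool.
# Assumed to match configuration D (VGG-16) in the paper.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def count_vgg_params(cfg, in_channels=3, image_size=224, num_classes=1000):
    """Count weights and biases of the conv layers plus three linear layers."""
    params, channels, size = 0, in_channels, image_size
    for item in cfg:
        if item == "M":
            size //= 2                                   # pooling halves the resolution
        else:
            params += 3 * 3 * channels * item + item     # 3x3 kernels plus biases
            channels = item
    fc_in = channels * size * size
    for fc_out in (4096, 4096, num_classes):
        params += fc_in * fc_out + fc_out
        fc_in = fc_out
    return params

total = count_vgg_params(VGG16_CFG)
print(f"{total:,} parameters")
```

Most of the total comes from the first fully connected layer, not from the convolutions.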
![Page 20: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/20.jpg)
Batch Normalization Layer
https://arxiv.org/abs/1502.03167
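As a rough illustration of what the layer computes in training mode, the sketch below normalizes each feature over the batch and then applies a scale and shift. It is a simplified pure-Python version: `gamma` and `beta` are single scalars here (in practice they are learned per feature), and the running statistics used at evaluation time are left out.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column of `batch` (a list of rows) over the batch."""
    n = len(batch)
    out_columns = []
    for col in zip(*batch):
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n      # biased batch variance
        out_columns.append(
            [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in col]
        )
    return [list(row) for row in zip(*out_columns)]

# Two features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = batch_norm(batch)
for row in normalized:
    print(row)
```

After normalization both columns have roughly zero mean and unit variance, regardless of their original scale.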
![Page 21: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/21.jpg)
GoogLeNet
https://github.com/kuangliu/pytorch-cifar/blob/master/models/googlenet.py
Szegedy et al. 2014
https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf
![Page 22: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/22.jpg)
Further Refinements – Inception v3, e.g.
GoogLeNet (Inception v1), Inception v3
![Page 23: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/23.jpg)
ResNet (He et al CVPR 2016)
http://felixlaumon.github.io/assets/kaggle-right-whale/resnet.png
Sorry, does not fit in slide.
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
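The core idea of ResNet is the skip connection: a block outputs its input plus a learned residual. The sketch below shows that structure with small dense layers standing in for the convolutions and batch normalization used in the real blocks; all names and weights are illustrative.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(xs, weights):
    """Multiply a weight matrix (list of rows) by the vector xs."""
    return [sum(w * x for w, x in zip(row, xs)) for row in weights]

def residual_block(xs, w1, w2):
    """y = relu(F(x) + x), with F(x) = W2 * relu(W1 * x)."""
    fx = linear(relu(linear(xs, w1)), w2)
    return relu([f + x for f, x in zip(fx, xs)])   # the skip connection

x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
y = residual_block(x, zeros, zeros)
print(y)  # [1.0, 2.0]
```

With all-zero weights the block reduces to the identity on non-negative inputs, which hints at why very deep stacks of such blocks remain trainable: a block can fall back to passing its input through.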
![Page 24: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/24.jpg)
Slide by Mohammad Rastegari
![Page 25: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/25.jpg)
![Page 26: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/26.jpg)
https://arxiv.org/pdf/1608.06993.pdf
![Page 27: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/27.jpg)
https://arxiv.org/pdf/1608.06993.pdf
![Page 28: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/28.jpg)
Object Detection
cat
deer
![Page 29: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/29.jpg)
Object Detection as Classification
CNN: deer? cat? background?
![Page 30: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/30.jpg)
Object Detection as Classification
CNN: deer? cat? background?
![Page 31: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/31.jpg)
Object Detection as Classification
CNN: deer? cat? background?
![Page 32: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/32.jpg)
Object Detection as Classification with Sliding Window
CNN: deer? cat? background?
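The sliding-window idea can be sketched as a loop that asks a classifier the same question at every window position. In the example below, `toy_classifier` is a hypothetical stand-in for the CNN, and the window size and stride are arbitrary choices for illustration.

```python
def sliding_windows(width, height, window, stride):
    """Yield (left, top, right, bottom) boxes that tile the image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield (left, top, left + window, top + window)

def detect(width, height, window, stride, classify):
    """Classify every window and keep those not labeled background."""
    detections = []
    for box in sliding_windows(width, height, window, stride):
        label = classify(box)
        if label != "background":
            detections.append((box, label))
    return detections

# Hypothetical stand-in for the CNN: reports "cat" for exactly one window.
def toy_classifier(box):
    return "cat" if box == (32, 32, 96, 96) else "background"

boxes = list(sliding_windows(128, 128, window=64, stride=32))
found = detect(128, 128, 64, 32, toy_classifier)
print(len(boxes), found)
```

Even this tiny 128x128 example needs 9 classifier calls at a single scale; covering several window sizes and aspect ratios multiplies that cost.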
![Page 33: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/33.jpg)
Object Detection as Classification with Box Proposals
![Page 34: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/34.jpg)
Box Proposal Method – SS: Selective Search
Segmentation As Selective Search for Object Recognition. van de Sande et al. ICCV 2011
![Page 35: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/35.jpg)
RCNN
Rich feature hierarchies for accurate object detection and semantic segmentation. Girshick et al. CVPR 2014.
https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf
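A heavily simplified sketch of the proposal-based approach is shown below: crop each proposed box, resize it to a fixed input size, and classify it. This is not the R-CNN implementation. The proposals are hard-coded rather than produced by selective search, `toy_classifier` is a hypothetical stand-in for the CNN, and the nearest-neighbor resize and 8x8 input size are choices made only to keep the example small.

```python
def crop(image, box):
    """Extract the region (left, top, right, bottom) from a 2-D list image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def resize_nearest(image, size):
    """Warp any patch to size x size with nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def classify_proposals(image, proposals, classify, size=8):
    """Classify each warped proposal and keep the non-background ones."""
    results = []
    for box in proposals:
        label = classify(resize_nearest(crop(image, box), size))
        if label != "background":
            results.append((box, label))
    return results

# Synthetic 32x32 image with one bright "object" region.
image = [[0] * 32 for _ in range(32)]
for r in range(8, 24):
    for c in range(4, 20):
        image[r][c] = 1

# Hypothetical classifier: calls mostly bright patches "cat".
def toy_classifier(patch):
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return "cat" if mean > 0.5 else "background"

proposals = [(4, 8, 20, 24), (20, 0, 32, 8)]   # assumed output of a proposal method
results = classify_proposals(image, proposals, toy_classifier)
print(results)
```

Compared with a sliding window, the classifier runs only on the proposed boxes, which can have arbitrary sizes and aspect ratios.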
![Page 36: CS6501: Deep Learning for Visual Recognition CNNArchitecturesvicente/deeplearning/2019/slides/lecture10.pdf · cat, tabby cat (0.61) Egyptian cat (0.22) red fox (0.11) Abyssinian](https://reader036.fdocuments.net/reader036/viewer/2022081601/611dc5862b2849525d5a7c75/html5/thumbnails/36.jpg)
Questions?