Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

[course site] Object Detection Day 2 Lecture 4 #DLUPC Amaia Salvador [email protected] PhD Candidate Universitat Politècnica de Catalunya


Page 1: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

[course site]

Object Detection
Day 2 Lecture 4

#DLUPC

Amaia Salvador
[email protected]

PhD Candidate
Universitat Politècnica de Catalunya

Page 2: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection

CAT, DOG, DUCK

The task of assigning a label and a bounding box to all objects in the image


Page 3: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection: Datasets

Pascal VOC: 20 categories, 6k training images, 6k validation images, 10k test images

ILSVRC: 200 categories, 456k training images, 60k validation + test images

COCO: 80 categories, 200k training images, 60k val + test images

Page 4: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection as Classification

Classes = [cat, dog, duck]

Cat ? NO

Dog ? NO

Duck? NO


Page 5: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection as Classification

Classes = [cat, dog, duck]

Cat? NO

Dog? NO

Duck? NO

Page 6: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection as Classification

Classes = [cat, dog, duck]

Cat? YES

Dog? NO

Duck? NO

Page 7: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection as Classification

Classes = [cat, dog, duck]

Cat? NO

Dog? NO

Duck? NO

Page 8: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection as Classification

Problem: Too many positions & scales to test

Solution: If your classifier is fast enough, go for it
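
To get a feel for how many positions and scales a sliding-window classifier has to test, here is a rough counting sketch. The function name, window sizes and stride are illustrative assumptions, not values from the lecture.

```python
def count_windows(img_w, img_h, win_sizes, stride):
    """Count sliding-window positions over all (width, height) window sizes."""
    total = 0
    for win_w, win_h in win_sizes:
        if win_w > img_w or win_h > img_h:
            continue  # this window does not fit in the image
        n_x = (img_w - win_w) // stride + 1  # horizontal positions
        n_y = (img_h - win_h) // stride + 1  # vertical positions
        total += n_x * n_y
    return total


# Hypothetical setting: an 800 x 600 image, three square window sizes, stride 8.
n = count_windows(800, 600, [(64, 64), (128, 128), (256, 256)], stride=8)
print(n, "classifier evaluations for a single image")
```

Every extra scale or aspect ratio adds thousands of evaluations, which is why this only works with a very cheap classifier.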

Page 9: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection with ConvNets?

Convnets are computationally demanding. We can’t test all positions & scales!

Solution: Look at a tiny subset of positions. Choose them wisely :)

Page 10: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Region Proposals

● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
● Look for “blob-like” regions

Slide Credit: CS231n

Page 11: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Region Proposals

Selective Search (SS)

Multiscale Combinatorial Grouping (MCG)

[SS] Uijlings et al. Selective search for object recognition. IJCV 2013

[MCG] Arbeláez, Pont-Tuset et al. Multiscale combinatorial grouping. CVPR 2014

Page 12: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Object Detection with Convnets: R-CNN

Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014


Page 13: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

R-CNN

Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014

1. Train network on proposals

2. Post-hoc training of SVMs & Box regressors on fc7 features


Page 14: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

R-CNN


We expect: We get:

Page 15: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

R-CNN

Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014

1. Train network on proposals

2. Post-hoc training of SVMs & Box regressors on fc7 features

3. Non-Maximum Suppression + score threshold
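
The last step, non-maximum suppression with a score threshold, can be sketched as follows. This is a minimal greedy version in plain Python under assumed conventions: boxes are (x0, y0, x1, y1) tuples, and the two thresholds are illustrative defaults. In a multi-class detector it would typically be run once per class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thresh=0.5, score_thresh=0.0):
    """Greedy NMS over a list of (score, box) pairs.

    Drops detections below score_thresh, then keeps the highest-scoring box
    and discards any remaining box that overlaps a kept one too much.
    """
    candidates = sorted(
        (d for d in detections if d[0] >= score_thresh),
        key=lambda d: d[0],
        reverse=True,
    )
    kept = []
    for score, box in candidates:
        if all(iou(box, kept_box) <= iou_thresh for _, kept_box in kept):
            kept.append((score, box))
    return kept


dets = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # near-duplicate of the first box
    (0.7, (20, 20, 30, 30)),
    (0.1, (50, 50, 60, 60)),  # below the score threshold
]
print(nms(dets, iou_thresh=0.5, score_thresh=0.3))
```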

Page 16: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

R-CNN

Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014


Page 17: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

R-CNN: Problems

1. Slow at test-time: need to run full forward pass of CNN for each region proposal

2. SVMs and regressors are post-hoc: CNN features not updated in response to SVMs and regressors

3. Complex multistage training pipeline

Slide Credit: CS231n

Page 18: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Fast R-CNN

Girshick. Fast R-CNN. ICCV 2015

R-CNN Problem #1: Slow at test-time: need to run full forward pass of CNN for each region proposal

Solution: Share computation of convolutional layers between region proposals for an image

Page 19: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Fast R-CNN: Sharing features

Hi-res input image: 3 x 800 x 600, with region proposal

Convolution and Pooling

Hi-res conv features: C x H x W, with region proposal

Max-pool within each grid cell

RoI conv features: C x h x w, for region proposal

Fully-connected layers (expect low-res conv features: C x h x w)

Slide Credit: CS231n

Girshick. Fast R-CNN. ICCV 2015
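
A minimal sketch of the RoI pooling step in plain Python: the proposal, already projected onto the C x H x W feature map, is divided into an h x w grid and each grid cell is max-pooled. The bin rounding used here (floor for the start, ceil for the end) is one possible choice and an assumption of this sketch; the RoI is assumed to lie inside the feature map.

```python
import math


def roi_max_pool(features, roi, out_h, out_w):
    """Max-pool one region of a feature map to a fixed out_h x out_w grid.

    features: nested lists of shape C x H x W.
    roi: (x0, y0, x1, y1) in feature-map coordinates, end-exclusive.
    """
    x0, y0, x1, y1 = roi
    pooled = []
    for channel in features:
        rows = []
        for i in range(out_h):
            ya = math.floor(y0 + i * (y1 - y0) / out_h)
            yb = max(math.ceil(y0 + (i + 1) * (y1 - y0) / out_h), ya + 1)
            row = []
            for j in range(out_w):
                xa = math.floor(x0 + j * (x1 - x0) / out_w)
                xb = max(math.ceil(x0 + (j + 1) * (x1 - x0) / out_w), xa + 1)
                # Max over every feature inside this grid cell.
                row.append(max(channel[y][x]
                               for y in range(ya, yb)
                               for x in range(xa, xb)))
            rows.append(row)
        pooled.append(rows)
    return pooled


# One channel, 4 x 4 feature map, whole map as the RoI, pooled to 2 x 2.
fmap = [[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]]
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))
```

Whatever the size of the proposal, the output is always C x h x w, which is what lets the fully-connected layers be shared across proposals.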

Page 20: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Fast R-CNN

R-CNN Problems #2 & #3: SVMs and regressors are post-hoc. Complex training.

Solution: Train it all together, end to end (E2E)

Girshick. Fast R-CNN. ICCV 2015

Page 21: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Fast R-CNN

Using VGG-16 CNN on Pascal VOC 2007 dataset:

                      R-CNN        Fast R-CNN
Training time         84 hours     9.5 hours
(Speedup)             1x           8.8x
Test time per image   47 seconds   0.32 seconds
(Speedup)             1x           146x
mAP (VOC 2007)        66.0         66.9

Training: Faster! Test: FASTER! mAP: Better!

Slide Credit: CS231n

Page 22: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Fast R-CNN: Problem

                                            R-CNN        Fast R-CNN
Test time per image                         47 seconds   0.32 seconds
(Speedup)                                   1x           146x
Test time per image with Selective Search   50 seconds   2 seconds
(Speedup)                                   1x           25x

Test-time speeds don’t include region proposals

Slide Credit: CS231n
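
A quick back-of-the-envelope check of these numbers shows why proposals become the bottleneck. The timings are the rounded values from the table, so the result is only approximate.

```python
# Rounded timings from the table above, in seconds per image.
rcnn_total, fast_total = 50.0, 2.0  # with Selective Search
fast_network_only = 0.32            # Fast R-CNN without proposals

speedup = rcnn_total / fast_total
proposal_time = fast_total - fast_network_only
proposal_share = proposal_time / fast_total

print(f"speedup with proposals: {speedup:.0f}x")
print(f"Fast R-CNN time spent on proposals: {proposal_time:.2f} s "
      f"({proposal_share:.0%} of the total)")
```

Under these figures most of the Fast R-CNN test time goes to computing region proposals rather than to running the network.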

Page 23: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Faster R-CNN

[Figure: Conv layers → Conv5_3 → Region Proposal Network → RPN Proposals → RoI Pooling → FC6 → FC7 → FC8 → Class probabilities]

Learn proposals end-to-end sharing parameters with the classification network

Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015

Page 24: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Faster R-CNN

[Figure: Conv layers → Conv5_3 → Region Proposal Network → RPN Proposals → RoI Pooling → FC6 → FC7 → FC8 → Class probabilities, with the Fast R-CNN part highlighted]

Learn proposals end-to-end sharing parameters with the classification network

Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015

Page 25: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Region Proposal Network

Objectness scores (object/no object)

Bounding Box Regression

In practice, k = 9 anchor boxes per sliding-window position (3 different scales and 3 aspect ratios)

Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
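
A minimal sketch of how the k = 9 reference boxes at one position could be generated. The scale values (128, 256, 512 pixels) and aspect ratios (1:2, 1:1, 2:1) are assumptions taken from the defaults reported in the Faster R-CNN paper, and the function name is hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) boxes (x0, y0, x1, y1) centred at (cx, cy).

    Each box has area scale**2 and height/width equal to the ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(400, 300)
print(len(anchors), "anchors at this position")
```

For each of these k boxes the RPN outputs an objectness score and a bounding box regression.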

Page 26: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Faster R-CNN: Training

[Figure: Conv layers → Conv5_3 → Region Proposal Network → RPN Proposals → RoI Pooling → FC6 → FC7 → FC8 → Class probabilities]

RoI Pooling is not differentiable w.r.t. box coordinates. Solutions:
● Alternate training
● Ignore gradient of classification branch w.r.t. proposal coordinates
● Make pooling function differentiable (spoiler D3L6)

Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015

Page 27: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Faster R-CNN

Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015

                                       R-CNN        Fast R-CNN   Faster R-CNN
Test time per image (with proposals)   50 seconds   2 seconds    0.2 seconds
(Speedup)                              1x           25x          250x
mAP (VOC 2007)                         66.0         66.9         66.9

Slide Credit: CS231n

Page 28: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Faster R-CNN

● Faster R-CNN is the basis of the winners of the COCO and ILSVRC 2015 & 2016 object detection competitions.

He et al. Deep residual learning for image recognition. CVPR 2016

Page 29: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: You Only Look Once

Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

Proposal-free object detection pipeline

Page 30: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: You Only Look Once

Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

Page 31: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: You Only Look Once

Each cell predicts:

- For each bounding box:
  - 4 coordinates (x, y, w, h)
  - 1 confidence value
- Some number of class probabilities

For Pascal VOC:

- 7x7 grid
- 2 bounding boxes / cell
- 20 classes

7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
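
The output size can be checked with a small sketch, which also shows one way to split a cell's 30 values into boxes and class probabilities. The ordering inside the vector (boxes first, then classes) is an assumption of this sketch; actual implementations may lay the values out differently.

```python
def yolo_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Each cell holds boxes_per_cell * (4 coords + 1 confidence) + class probs."""
    depth = boxes_per_cell * 5 + num_classes
    return (grid, grid, depth)


def split_cell(vec, boxes_per_cell=2):
    """Split one cell's vector into (x, y, w, h, conf) tuples and class probs."""
    boxes = [tuple(vec[b * 5:(b + 1) * 5]) for b in range(boxes_per_cell)]
    class_probs = list(vec[boxes_per_cell * 5:])
    return boxes, class_probs


shape = yolo_output_shape()
n_outputs = shape[0] * shape[1] * shape[2]
print(shape, n_outputs)

boxes, class_probs = split_cell(list(range(30)))
print(len(boxes), "boxes and", len(class_probs), "class probabilities per cell")
```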

Page 32: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: You Only Look Once

Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

[Figure: class probability map over the grid, with regions labelled Dog, Bicycle, Car, Dining Table]

Predict class probability for each cell

Page 33: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: You Only Look Once

Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

+ NMS
+ Score threshold
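
A sketch of how the scores that get thresholded can be formed for one cell. Following the YOLO paper, each box's confidence is multiplied by the cell's class probabilities to obtain class-specific scores; the threshold value and the function name are illustrative assumptions. NMS would then be applied to the surviving detections.

```python
def cell_detections(boxes, class_probs, score_thresh=0.2):
    """Turn one cell's predictions into scored detections.

    boxes: list of (x, y, w, h, confidence).
    class_probs: class probabilities shared by all boxes of the cell.
    Returns (score, class_index, (x, y, w, h)) tuples, highest score first.
    """
    detections = []
    for x, y, w, h, conf in boxes:
        for cls, prob in enumerate(class_probs):
            score = conf * prob  # class-specific confidence
            if score >= score_thresh:
                detections.append((score, cls, (x, y, w, h)))
    return sorted(detections, reverse=True)


# Toy cell with 2 boxes and 2 classes (values are made up).
dets = cell_detections(
    boxes=[(0.5, 0.5, 0.2, 0.3, 0.9), (0.5, 0.5, 0.4, 0.4, 0.1)],
    class_probs=[0.8, 0.2],
)
print(dets)
```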

Page 34: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

SSD: Single Shot MultiBox Detector

Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016

Same idea as YOLO, plus several predictors at different stages in the network

Page 35: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLOv2

Redmon & Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017

Page 36: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLOv2


Results on Pascal VOC 2007

Page 37: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLOv2


Results on COCO test-dev 2015

Page 38: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Summary

Proposal-based methods
● R-CNN
● Fast R-CNN
● Faster R-CNN
● SPPnet
● R-FCN

Proposal-free methods
● YOLO, YOLOv2
● SSD

Page 39: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Resources

● Official implementations:
  ○ Faster R-CNN [caffe]
  ○ YOLOv2 [darknet]
  ○ SSD [caffe]
  ○ R-FCN [caffe] [MxNet]

● Unofficial ports to other frameworks are likely to exist, e.g. search for “yolo tensorflow” and pick the one you like best.

● Or use the newly released Object Detection API by Google: SSD, R-FCN & Faster R-CNN (code & pretrained models in TensorFlow)

Object detection tutorials (project ideas maybe?):
● Toy object detection (squares, circles, etc.) (keras)
● Object detection (pets dataset) (tensorflow)

Page 40: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

Questions?

Page 41: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

For training, each ground truth bounding box is matched into the right cell

Page 42: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

For training, each ground truth bounding box is matched into the right cell
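
A minimal sketch of this matching step, assuming (as in the YOLO paper) that the "right" cell is the one containing the centre of the ground truth box. The function name and the pixel-coordinate box format are illustrative.

```python
def responsible_cell(box, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell containing the centre of the box.

    box: (x0, y0, x1, y1) in pixels.
    """
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2
    cy = (y0 + y1) / 2
    # Clamp so that a centre exactly on the right/bottom border stays in range.
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col


# Hypothetical 448 x 448 image with one ground truth box.
print(responsible_cell((100, 200, 300, 400), 448, 448))
```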

Page 43: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

Optimize class prediction in that cell: dog: 1, cat: 0, bike: 0, ...

Page 44: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

Predicted boxes for this cell

Page 45: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

Find the best one with respect to the ground truth bounding box and optimize it (i.e. adjust its coordinates to be closer to the ground truth’s coordinates)
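
One way to pick the "best" predicted box is by intersection over union (IoU) with the ground truth, which is the criterion used in the YOLO paper. A minimal sketch, with boxes given as (x0, y0, x1, y1) and illustrative function names:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_box(predicted, ground_truth):
    """Index of the predicted box with the highest IoU against the ground truth."""
    return max(range(len(predicted)), key=lambda k: iou(predicted[k], ground_truth))


# Two predicted boxes for a cell and one ground truth box (made-up values).
preds = [(0, 0, 1, 1), (0, 0, 2, 2)]
gt = (0, 0, 2, 2)
print(best_box(preds, gt))
```

Only the selected box has its coordinates regressed towards the ground truth.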

Page 46: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

Increase the matched box’s confidence, decrease the non-matched boxes’ confidence

Page 47: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

Increase the matched box’s confidence, decrease the non-matched boxes’ confidence

Page 48: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

For cells with no ground truth detections, confidences of all predicted boxes are decreased

Page 49: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training

Slide credit: YOLO Presentation @ CVPR 2016

For cells with no ground truth detections:
● Confidences of all predicted boxes are decreased
● Class probabilities are not adjusted

Page 50: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)

YOLO: Training, formally

Slide credit: YOLO Presentation @ CVPR 2016

Loss terms:
- Bounding box coordinate regression
- Bounding box score prediction
- Class score prediction

$\mathbb{1}_{ij}^{obj}$ = 1 if box j and cell i are matched together, 0 otherwise

$\mathbb{1}_{ij}^{noobj}$ = 1 if box j and cell i are NOT matched together

$\mathbb{1}_{i}^{obj}$ = 1 if cell i has an object present
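
For reference, the three kinds of terms can be written out as follows, assuming the slide shows the loss from Redmon et al. (CVPR 2016). Here S x S is the grid, B the number of boxes per cell, hats denote predictions, C is the box confidence, p(c) the class probability, and the indicator functions are 1 when box j and cell i are matched (obj), when they are not (noobj), and when cell i contains an object. The paper uses \lambda_{coord} = 5 and \lambda_{noobj} = 0.5.

```latex
\begin{aligned}
\mathcal{L} ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
  & + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
  & + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
  & + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
  & + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two lines are the bounding box coordinate regression, the next two the bounding box score (confidence) prediction for matched and non-matched boxes, and the last line the class score prediction.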