CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced...

18
Kian Katanforoosh, Andrew Ng CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University

Transcript of CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced...

Page 1: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

CS230: Lecture 5Advanced topics in Object Detection

Kian Katanforoosh, Andrew Ng Stanford University

Page 2: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

I. Object detection Intuition

II. R-CNN

III. Fast R-CNN

IV. Faster R-CNN

V. YOLO

Today’s outline

We will learn:

- the Intuition behind object detection methods

- a series of computer vision papers

Let’s go

Page 3: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

+(bx ,by )

bh

bw

: confidence of an object being present in the bounding boxpc = 1c = 0 : class of the object being detected (here 0 for “car”)

y = (pc ,bx ,by ,bh ,bw ,c)Object Detection Intuition

Page 4: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

Object Detection Intuition

y = (pc ,bx ,by ,bh ,bw ,c)

Page 5: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

Object Detection Intuition

Classifierypc

⎧⎨⎩

Far too many computations

Page 6: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

R-CNN (Regions with CNN features)

input image region proposal

warped proposal

CNN

0.9310.4330.331!0.9420.1580.039

⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟

4096-d

SVM (car)SVM (pedestrian)

…SVM (traffic light)

“yes”“no”

“no”…

How do you find these regions?

Ross Girshick Jeff Donahue Trevor Darrell Jitendra Malik: Rich feature hierarchies for accurate object detection and semantic segmentation (2014)

Page 7: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

R-CNN (Regions with CNN features)

input image

“segmentation”

J.R.R. Uijlings, K.E.A. van de Sande, T. Gevers , and A.W.M. Smeulders: Selective Search for Object Recognition (2012)

“proposals”

segmentation labels region proposals

At test time on a new image, ≈ 2000 regions proposals!region

CNN

0.9310.4330.331!0.9420.1580.039

⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟

4096-d

SVM (car)SVM (pedestrian)

…SVM (traffic light)

“yes”“no”

“no”…

x 2000

But we only needed two boxes: 1 car + 1 pedestrian …

Page 8: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

Non-max suppression

Still slow… And training is not unified: CNN and each SVM have to be trained separately!

region + scores + classes0.50 “pedestrian”

0.75 “pedestrian”

0.32 “tree”

0.11 “car”0.12 “car”

0.14 “zebra”

0.01 “fire hydrant

0.54 “car”

input image

selective search

region proposals

Intersection Union Intersection over Union

IoU = B1∩ B2B1∪ B2

=

B1B2 B1 B2

0.75 “pedestrian”

0.54 “car”

NMS +

score threshold

CNN + SVMs

Page 9: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

Evaluate the performance of your model

Derek Hoiem, Yodsawalai Chodpathumwan, and Qieyun Dai: Diagnosing Error in Object Detectors

- BG: background and non-labelled objects - Loc: localization error (IoU) - Sim: Confusion with similar objects - Oth: Confusion with dissimilar objects

Page 10: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

Fast R-CNN

R-CNN drawbacks:

- Training is a multi-stage pipeline

- Training is expensive in space and time

- Testing is slow

Fast R-CNN solutions:

- Share computations for the CNN (not per-region anymore)

- Make training single-stage with multi-task loss

- No disk storage

Page 11: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

input image + RoIs

CNN

Fast R-CNN

feature map

RoI Pooling layer

0.9310.4330.331!0.9420.1580.039

⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟

4096-d

FCs

FCs (Softmax)

FCs (Bbox regressor)

#classes

0.140.540.190.34

⎜⎜⎜⎜

⎟⎟⎟⎟

0.310.130.47!0.430.810.09

⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟

0.240.320.180.14

⎜⎜⎜⎜

⎟⎟⎟⎟…

#classes L p, u, t u , v( ) = Lcls p, u( ) + λΙ u ≥ 1[ ]Lloc tu , v( )

How do you train this?

Ross Girshick: Fast R-CNN (2015)

Page 12: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

Let’s make it even faster

Page 13: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

Faster R-CNN

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2016)

input image (no RoI)

CNN

feature map

Region Proposal Network

feature map with RoIs

“attention”

feature mapk anchors

per window FCs (Softmax)

FCs (Bbox regressor)

0.140.540.190.34

⎜⎜⎜⎜

⎟⎟⎟⎟

0.240.320.180.14

⎜⎜⎜⎜

⎟⎟⎟⎟…

k

0.310.130.47!0.430.810.09

⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟

k scores

L pi{ }, {ti}( ) = 1Ncls

Lclsi∑ pi , pi

* ( )+ λ 1Nreg

pi*Lreg

i∑ ti , ti

*( )

Fast R-CNN

Page 14: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

YOLO

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi: You Only Look Once: Unified, Real-Time Object Detection (2015)Joseph Redmon, Ali Farhadi: YOLO9000: Better, Faster, Stronger (2016)

Deep CNN

reduction factor: 32

preprocessed image (608, 608, 3)

encoding (19,19, 5, 85)19

19

box 1

box 2

box 3

box 4

box 5

bwbx by bhpc…

80 class probabilities

Page 15: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

box 180 class probabilities

bx bwby bhpc p1 p2 p3 p4 p5 p76 p77 p78 p79 p80… … … … … … … … … … … … … … …

pc *

p1p2p3!p78p79p80

⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟

=

pc p1pc p2pc p3!pc p78pc p79pc p80

⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟

=

0.120.230.44!0.070.110.09

⎜⎜⎜⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟⎟⎟⎟

find the max

the box has detected c = 3 (“car”) with probability score: 0.44

box: (bx ,by ,bh ,bw )scores =score: 0.44

class: c = 3 (“car”)

(bx ,by ,bh ,bw )

YOLO

Page 16: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

YOLO

Need to filter: - Score thresholding

- NMS

Page 17: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

Before non-max suppression After non-max suppression

Non-MaxSuppression

YOLO

Page 18: CS230: Lecture 5cs230.stanford.edu/fall2017/files/CS230_Handout6.pdf · CS230: Lecture 5 Advanced topics in Object Detection Kian Katanforoosh, Andrew Ng Stanford University. Kian

Kian Katanforoosh, Andrew Ng

YOLO

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi: You Only Look Once: Unified, Real-Time Object Detection

demo