ILSVRC 2015 CLS-LOC. D. Yoo1, K. Paeng1, S. Park1, S. Hwang2 ...

ILSVRC 2015 CLS-LOC. Multi-Class AttentionNet. D. Yoo¹, K. Paeng¹, S. Park¹, S. Hwang², H. E. Kim², J. Lee², M. Jang², A. S. Paek², K. K. Kim¹, S. D. Kim¹, I. S. Kweon¹. ¹KAIST, ²Lunit Inc.



State-of-the-art methods for object localization.


1) Box-regression with a CNN.

[Szegedy et al., NIPS’13], DeepMultiBox [Erhan et al., CVPR’14], OverFeat [Sermanet et al., ICLR’14].


(−) Directly mapping an image to an exact bounding box is relatively difficult for a CNN.

[Figure: an image is fed to a CNN that directly outputs the box corners (X1, Y1) and (X2, Y2).]


2) Region proposal + classifier.

R-CNN [Girshick et al., CVPR’14], Fast R-CNN [Girshick, ICCV’15], Faster R-CNN [Ren et al., NIPS’15], DeepMultiBox [Erhan et al., CVPR’14].


(−) Prone to focusing on a discriminative part (e.g., the face) rather than the entire object (e.g., the human body).


Idea: Ensemble of weak directions.


Stop signal.


Model: Rather than a CNN regression model, we use a CNN classification model.


Weak directions are defined to have a fixed length and to be quantized.
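The following Python is a minimal sketch of what "fixed length and quantized" can mean: one corner moves by a single step in one of a few allowed directions. Several details are illustrative assumptions and do not come from the slides: the 30-pixel step, the `move_corner` helper, and the choice to shift each axis by the full step on diagonal moves. The direction symbols follow the final-architecture slide, where • is the stop signal.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]  # (x, y) in pixels; y grows downward

# Assumed unit moves (dx, dy) per quantized direction; "•" is the stop signal.
# The top-left corner only moves right and/or down and the bottom-right corner
# only moves left and/or up, so a box can only shrink toward the object.
TL_MOVES: Dict[str, Tuple[int, int]] = {"•": (0, 0), "↘": (1, 1), "→": (1, 0), "↓": (0, 1)}
BR_MOVES: Dict[str, Tuple[int, int]] = {"•": (0, 0), "↖": (-1, -1), "←": (-1, 0), "↑": (0, -1)}


def move_corner(point: Point, label: str, moves: Dict[str, Tuple[int, int]],
                step: int = 30) -> Point:
    """Move one corner by a single fixed-length step (assumed: 30 px per axis)."""
    dx, dy = moves[label]
    x, y = point
    return (x + step * dx, y + step * dy)


print(move_corner((0, 0), "↘", TL_MOVES))      # (30, 30)
print(move_corner((200, 150), "←", BR_MOVES))  # (170, 150)
```

Quantizing the output this way turns each corner update into a small classification problem over four labels instead of a regression over pixel coordinates.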


Strengths compared with previous methods.


R-CNN: (−) Focuses on distinctive parts.
Box regression: (−) Relatively difficult for a CNN.
Weak directions: (+) Relatively easy for a CNN.
Stop signal: (+) Supervision of a clear terminal point.


AttentionNet: two output layers, one for each corner.

[Figure: a CNN followed by two output layers, one for the top-left corner and one for the bottom-right corner (outputs labeled F, F).]


AttentionNet: iterative classification.

[Figure: the current window is resized and fed to the CNN, which outputs one label for the top-left corner and one for the bottom-right corner. If both corners output F, the window is rejected. If both output • (stop), the object is detected. Otherwise, each corner moves one step in its predicted direction and the window is classified again.]
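The iterative classification loop can be sketched in Python as shown below. Several parts are assumptions made for illustration. The `predict` callable stands in for the crop, resize, and CNN steps. `toy_predictor` is a hypothetical stand-in that simply points each corner toward a known target. The 30-pixel step and the `max_iters` cap are not from the slides. The slides only specify that F on both corners means reject and • on both means detected. The sketch also makes two further choices of its own: a lone F simply leaves that corner in place, and a window that never converges is dropped.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); y grows downward
Predictor = Callable[[Box], Tuple[str, str]]  # box -> (TL label, BR label)

# Unit moves per label; "•" (stop) and "F" (no object) leave a corner in place.
TL_MOVES = {"•": (0, 0), "F": (0, 0), "↘": (1, 1), "→": (1, 0), "↓": (0, 1)}
BR_MOVES = {"•": (0, 0), "F": (0, 0), "↖": (-1, -1), "←": (-1, 0), "↑": (0, -1)}


def refine(box: Box, predict: Predictor, step: int = 30,
           max_iters: int = 50) -> Optional[Box]:
    """Move both corners until they both stop (detected) or both say F (rejected)."""
    for _ in range(max_iters):
        tl, br = predict(box)  # assumed to crop, resize and run the CNN
        if tl == "F" and br == "F":
            return None  # rejected
        if tl == "•" and br == "•":
            return box  # detected
        (tdx, tdy), (bdx, bdy) = TL_MOVES[tl], BR_MOVES[br]
        x1, y1, x2, y2 = box
        box = (x1 + step * tdx, y1 + step * tdy, x2 + step * bdx, y2 + step * bdy)
    return None  # assumption: give up if the corners never both stop


def toy_predictor(target: Box) -> Predictor:
    """Hypothetical stand-in for the CNN: points each corner toward `target`."""
    tx1, ty1, tx2, ty2 = target

    def predict(box: Box) -> Tuple[str, str]:
        x1, y1, x2, y2 = box
        tl = {(True, True): "↘", (True, False): "→",
              (False, True): "↓", (False, False): "•"}[(x1 < tx1, y1 < ty1)]
        br = {(True, True): "↖", (True, False): "←",
              (False, True): "↑", (False, False): "•"}[(x2 > tx2, y2 > ty2)]
        return tl, br

    return predict


print(refine((0, 0, 300, 300), toy_predictor((60, 60, 210, 210))))  # (60, 60, 210, 210)
```

Because every step has a fixed length, the final box can only land on the step grid. A real predictor would stop a corner once it is within one step of the object boundary.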


Initial box proposal:


Boxes satisfying .


Rejected.


Continue.


Detected.


Multi-{scale, aspect ratio} sliding-window search using a fully convolutional network.
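A fully convolutional network scores every window position of a resized input in one pass, and each output cell corresponds to one window. The Python sketch below maps output cells back to boxes in the original image. It assumes a nominal window of 224 pixels and an output stride of 32, which are GoogLeNet's usual input size and total stride. The six input sizes in the usage lines are hypothetical examples chosen to vary scale and aspect ratio.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in original-image pixels


def window_boxes(img_w: int, img_h: int, in_w: int, in_h: int,
                 win: int = 224, stride: int = 32) -> List[Box]:
    """Original-image boxes of all windows an FCN evaluates on a resized input.

    Assumptions: nominal window `win` and output stride `stride` (GoogLeNet-like).
    """
    cols = (in_w - win) // stride + 1
    rows = (in_h - win) // stride + 1
    boxes = []
    for r in range(rows):
        for c in range(cols):
            x1, y1 = c * stride, r * stride
            # Multiply before dividing so exact ratios stay exact in floating point.
            boxes.append((x1 * img_w / in_w, y1 * img_h / in_h,
                          (x1 + win) * img_w / in_w, (y1 + win) * img_h / in_h))
    return boxes


# Hypothetical set of six resized inputs with different scales and aspect ratios.
sizes = [(224, 224), (256, 224), (224, 256), (320, 320), (384, 320), (320, 384)]
all_boxes = [b for w, h in sizes for b in window_boxes(500, 375, w, h)]
print(len(all_boxes))  # 69
```

Non-square inputs change the aspect ratio of the windows once they are mapped back. Larger inputs yield smaller windows relative to the image.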


Initial detection and refinement.


Extension to multiple classes.

[Figure: AttentionNet (a CNN with one pair of corner layers) is extended to Multi-class AttentionNet: one CNN followed by class-wise direction layers for Class 1, Class 2, Class 3, ⋯, Class N, plus a classification layer.]


Final architecture.

Backbone: GoogLeNet [Szegedy et al., CVPR’15].

Directional layers (one top-left and one bottom-right layer per class; every TL layer outputs •, ↘, →, ↓ and every BR layer outputs •, ↑, ←, ↖):
Conv8-C1-TL: 1×1×1,024×4
Conv8-C1-BR: 1×1×1,024×4
Conv8-C2-TL: 1×1×1,024×4
Conv8-C2-BR: 1×1×1,024×4
⋯
Conv8-CN-TL: 1×1×1,024×4
Conv8-CN-BR: 1×1×1,024×4

Classification layer:
Conv8-CLS: 1×1×1,024×(N+1), outputs F, C1, C2, C3, ⋯, CN
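The Python sketch below shows the output heads. It reads "1×1×1,024×4" as a 1×1 kernel over 1,024 input channels with 4 outputs, which is our interpretation of the diagram. On a 1×1 feature map, such a layer reduces to a matrix-vector product. The helpers (`build_heads`, `conv1x1`, `forward`), the random initialization, and the omission of biases are hypothetical simplifications. The code only illustrates the layer shapes and how one class's corner layers are selected.

```python
import random
from typing import Any, Dict, List

Matrix = List[List[float]]  # [out_channels][in_channels] weights of a 1x1 conv


def init_layer(out_ch: int, in_ch: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.01) for _ in range(in_ch)] for _ in range(out_ch)]


def build_heads(num_classes: int, feat_dim: int = 1024, seed: int = 0) -> Dict[str, Any]:
    """Per-class TL/BR direction layers (4 outputs each) plus an (N+1)-way classifier."""
    rng = random.Random(seed)
    return {
        "tl": [init_layer(4, feat_dim, rng) for _ in range(num_classes)],  # Conv8-Ck-TL
        "br": [init_layer(4, feat_dim, rng) for _ in range(num_classes)],  # Conv8-Ck-BR
        "cls": init_layer(num_classes + 1, feat_dim, rng),                 # Conv8-CLS
    }


def conv1x1(layer: Matrix, feat: List[float]) -> List[float]:
    # On a 1x1 feature map a 1x1 convolution is a matrix-vector product (no bias here).
    return [sum(w * f for w, f in zip(row, feat)) for row in layer]


def forward(heads: Dict[str, Any], feat: List[float], class_idx: int):
    """Class scores (F, C1..CN) and the corner logits of class C(class_idx + 1)."""
    cls_logits = conv1x1(heads["cls"], feat)
    tl_logits = conv1x1(heads["tl"][class_idx], feat)  # •, ↘, →, ↓
    br_logits = conv1x1(heads["br"][class_idx], feat)  # •, ↑, ←, ↖
    return cls_logits, tl_logits, br_logits


heads = build_heads(num_classes=3, feat_dim=8)
cls, tl, br = forward(heads, [1.0] * 8, class_idx=2)
print(len(cls), len(tl), len(br))  # 4 4 4
```

With N = 1,000 classes, this layout gives 2,000 directional layers, which matches the "2K (=1K+1K)" count on the training slide.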


Training multi-class AttentionNet.


• Pre-training:
  • GoogLeNet [Szegedy et al., CVPR’15].
  • ILSVRC-CLS dataset.


• Fine-tuning:
  • # epochs: 5.
  • # training regions: 22M (randomly generated).
  • Learning rate of the classification layer: 0.01.
  • Learning rate of the 2K (= 1K + 1K) directional layers: 0.01.
  • Learning rate of the layers from conv1 to conv21: 0.001.
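The layer-wise learning rates above can be written as a small lookup table, sketched in Python below. The rates come from the slide. The layer-naming scheme is hypothetical: it is modeled on the Conv8-* names in the architecture diagram, and every other layer is treated as part of the pre-trained backbone (conv1 to conv21).

```python
import re

# Learning rates from the fine-tuning slide.
LR_CLASSIFICATION = 0.01  # Conv8-CLS
LR_DIRECTIONAL = 0.01     # Conv8-C<k>-TL / Conv8-C<k>-BR
LR_BACKBONE = 0.001       # conv1 ... conv21 (pre-trained GoogLeNet)

# Hypothetical naming scheme modelled on the architecture diagram.
_DIRECTIONAL = re.compile(r"conv8-c\d+-(tl|br)", re.IGNORECASE)


def directional_layer_names(num_classes: int) -> list:
    """One top-left and one bottom-right direction layer per class."""
    return [f"Conv8-C{k}-{corner}"
            for k in range(1, num_classes + 1) for corner in ("TL", "BR")]


def learning_rate(layer_name: str) -> float:
    """Return the fine-tuning learning rate for a (hypothetically named) layer."""
    if layer_name.lower() == "conv8-cls":
        return LR_CLASSIFICATION
    if _DIRECTIONAL.fullmatch(layer_name):
        return LR_DIRECTIONAL
    return LR_BACKBONE  # everything else is treated as pre-trained backbone


print(len(directional_layer_names(1000)))  # 2000, i.e. 2K = 1K + 1K
```

The new layers get a learning rate 10× higher than the pre-trained layers. This is a common fine-tuning choice: it lets the fresh heads train quickly while changing the backbone only slowly.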


$$\mathrm{Loss} = \tfrac{1}{3}\,\mathrm{Loss}_{TL} + \tfrac{1}{3}\,\mathrm{Loss}_{BR} + \tfrac{1}{3}\,\mathrm{Loss}_{CLS},$$


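$$\mathrm{Loss}_{TL} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[t^{TL}_{c_i} \neq 0\right] \cdot \mathrm{SoftMaxLoss}\!\left(y^{TL}_{c_i},\, t^{TL}_{c_i}\right),$$

$$\mathrm{Loss}_{BR} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[t^{BR}_{c_i} \neq 0\right] \cdot \mathrm{SoftMaxLoss}\!\left(y^{BR}_{c_i},\, t^{BR}_{c_i}\right),$$

$$\mathrm{Loss}_{CLS} = \mathrm{SoftMaxLoss}\!\left(y^{CLS},\, t^{CLS}\right).$$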
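The loss above can be sketched in plain Python as shown below, under stated assumptions. The slide does not say what a target of 0 denotes. This sketch assumes it marks a sample with no direction target for that corner (for example, a background region), and that the four direction labels are stored as 1 to 4. The corner logits passed in are assumed to come from the layer of each sample's ground-truth class c_i. The classification targets are 0-based indices over (F, C1, ..., CN), and averaging Loss_CLS over the mini-batch is also an assumption.

```python
import math
from typing import List, Sequence


def softmax_loss(logits: Sequence[float], target: int) -> float:
    """Cross-entropy between softmax(logits) and a 0-based target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def corner_loss(logits: List[Sequence[float]], targets: List[int]) -> float:
    """(1/N) * sum_i 1[t_i != 0] * SoftMaxLoss(y_i, t_i).

    Assumption: t_i = 0 means "no direction target"; t_i in 1..4 selects
    output t_i - 1. logits[i] come from the layer of sample i's class c_i.
    """
    n = len(targets)
    return sum(softmax_loss(y, t - 1) for y, t in zip(logits, targets) if t != 0) / n


def total_loss(tl_logits, tl_targets, br_logits, br_targets,
               cls_logits, cls_targets) -> float:
    """Equal-weight (1/3 each) sum of the TL, BR and classification losses."""
    # Assumption: Loss_CLS is averaged over the mini-batch; targets index F, C1..CN.
    loss_cls = sum(softmax_loss(y, t)
                   for y, t in zip(cls_logits, cls_targets)) / len(cls_targets)
    return (corner_loss(tl_logits, tl_targets)
            + corner_loss(br_logits, br_targets)
            + loss_cls) / 3.0
```

Masked samples still count in the 1/N normalization, which follows the formula as written on the slide.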


Test: Given the top-5 class predictions, we detect each predicted class with AttentionNet.


• Top-5 class prediction (7% error): ensemble of GoogLeNet, GoogLeNet-BN, and VGG-16.


• Number of multi-{scale, aspect ratio} inputs: 6.
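The test procedure pairs each of the five predicted classes with a box from AttentionNet. The Python sketch below illustrates this under several assumptions. The slides name the ensemble members but not how their scores are combined, so plain averaging is assumed. The `localize` callable is a hypothetical stand-in for running AttentionNet on one class. Falling back to a default box when every window for a class is rejected is also an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]
# Hypothetical interface: AttentionNet's box for one class, or None if rejected.
Localizer = Callable[[int], Optional[Box]]


def ensemble_scores(*model_scores: Sequence[float]) -> List[float]:
    """Assumption: average the per-class scores of the ensemble members."""
    return [sum(s) / len(model_scores) for s in zip(*model_scores)]


def predict_top5(scores: Sequence[float], localize: Localizer,
                 fallback: Box) -> List[Tuple[int, Box]]:
    """Pair each of the 5 highest-scoring classes with a box for that class."""
    top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
    preds = []
    for c in top5:
        box = localize(c)
        # Assumption: use a default box (e.g. the whole image) if AttentionNet rejects.
        preds.append((c, box if box is not None else fallback))
    return preds
```

Classification and localization stay decoupled in this design: the ensemble only chooses which classes to look for, and AttentionNet only decides where each class is.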


Results on the validation set.


Method | Top-5 CLS-LOC error
OverFeat [Sermanet et al., ICLR’14] | 30.00%
VGG [Simonyan and Zisserman, ICLR’15] | 26.90%
GoogLeNet [Szegedy et al., CVPR’15] | 26.70% (test set)
A single “Multi-class AttentionNet”, without test augmentation | 16.11%
A single “Multi-class AttentionNet”, with test augmentation (original and flip) | 14.96%

Note that we use a SINGLE “Multi-class AttentionNet”.
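The Top-5 CLS-LOC error follows the ILSVRC localization rule. An image counts as correct if at least one of up to five (class, box) guesses matches a ground-truth class and overlaps a ground-truth box of that class with an IoU of at least 0.5. The Python below is a simplified sketch of that rule. It uses continuous-coordinate IoU rather than the official devkit's exact pixel conventions, and the function names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def image_error(preds: List[Tuple[int, Box]], gts: List[Tuple[int, Box]],
                thresh: float = 0.5) -> int:
    """0 if any of the first five guesses hits a ground-truth object of its class."""
    for cls, box in preds[:5]:
        for g_cls, g_box in gts:
            if cls == g_cls and iou(box, g_box) >= thresh:
                return 0
    return 1


def top5_clsloc_error(all_preds, all_gts) -> float:
    """Fraction of images with no correct (class, box) guess."""
    errors = [image_error(p, g) for p, g in zip(all_preds, all_gts)]
    return sum(errors) / len(errors)
```

Because one guess must get both the class and the box right, this error is bounded below by the top-5 classification error, which is 7% for the ensemble used here.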


Related publication:

Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony S. Paek, In So Kweon. AttentionNet: Aggregating Weak Directions for Accurate Object Detection. In ICCV, 2015.