Page 1:

Extracting the Essence of Deep Learning Models: Knowledge Distillation

Machine Learning & Visual Computing Lab

김유민


Page 2:

CONTENTS

- Compression Methods
- Distillation
- Papers
- Summary

Page 3:

COMPRESSION METHODS

Page 4:

• Pruning

• Quantization

• Efficient Design

• Distillation

• Etc...

[1] Song Han, Jeff Pool, John Tran and William J. Dally. Learning both Weights and Connections for Efficient Neural Networks. In NIPS, 2015.

[2] Song Han, Huizi Mao and William J. Dally. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. In ICLR, 2016.

[3] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto and Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. In arXiv, 2017.

[4] Chun-Fu (Richard) Chen et al. Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition. In ICLR, 2019.

<Weight Pruning[1]>
<Weight Quantization[2]>
<Depthwise separable convolution[3]>
<Big-Little Net architecture[4]>
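Among these, the efficient-design direction is easy to make concrete: a depthwise separable convolution [3] factorizes a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. A minimal PyTorch sketch, not tied to any particular slide figure:

import torch
import torch.nn as nn

def depthwise_separable_conv(in_ch, out_ch, kernel_size=3):
    return nn.Sequential(
        # depthwise: one filter per input channel (groups=in_ch)
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2, groups=in_ch),
        # pointwise: 1x1 convolution that mixes information across channels
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

x = torch.randn(1, 32, 56, 56)
y = depthwise_separable_conv(32, 64)(x)   # same spatial size, 64 output channels
print(y.shape)                            # torch.Size([1, 64, 56, 56])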

Page 5:

DISTILLATION

Page 6:

Teacher-Student Relation

<Diagram: a teacher passes the teacher's knowledge to several students (TEACHING / LEARNING)>

Page 7:

Teacher-Student Relation in Deep Neural Network

<Teacher Network> = Large Neural Network (Classification Accuracy: 93.33%)
<Student Network> = Small Neural Network (Classification Accuracy: 58.34%)

<Diagram: the teacher's knowledge flows from the teacher network to the student network (teaching / learning)>

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke and Andrew Rabinovich. Going Deeper with Convolutions. In CVPR, 2015.


Page 9:

Teacher-Student Relation in Deep Neural Network

<Teacher Network> = Large Neural Network (Classification Accuracy: 93.33%)
<Learned Student Network> = Small Neural Network (Classification Accuracy: 58.34% → 93.33%)

<Diagram: after distillation, the student has absorbed the teacher's knowledge>

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke and Andrew Rabinovich. Going Deeper with Convolutions. In CVPR, 2015.

Page 10:

Do Deep Nets Really Need to be Deep?
NIPS 2014


Page 12:

<Last section of DNN> (animal pictures: www.freepik.com)

$(n-1)^{\text{th}}$ layer → $n^{\text{th}}$ layer (= last layer) ⇒ logits, e.g. 43, 15, 10, 8

Softmax: $\hat{Y}_i = \dfrac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$

Final output softmax distribution: 0.03, 0.07, 0.7, 0.11, 0.09
True label (one-hot): 0, 0, 1, 0, 0

Cross-entropy loss: $L(Y, \hat{Y}) = -\sum_i Y_i \log \hat{Y}_i$

(Propagation direction: input → $(n-1)^{\text{th}}$ layer → $n^{\text{th}}$ layer → loss.)

Lei Jimmy Ba and Rich Caruana. Do Deep Nets Really Need to be Deep? In NIPS, 2014.
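A minimal NumPy sketch of the two formulas above; the logits are the slide's illustrative values, which need not reproduce the drawn probabilities exactly:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([43.0, 15.0, 10.0, 8.0])   # example logits from the slide
probs = softmax(logits)                       # final output softmax distribution
y_true = np.array([1.0, 0.0, 0.0, 0.0])       # one-hot true label
loss = -np.sum(y_true * np.log(probs))        # cross-entropy L(Y, Y_hat)
print(probs, loss)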

Page 13:

<1st step: Training the Teacher Network>

Input Image → Teacher Network → Output; Classification Loss against the True Label.

(Animal pictures: www.freepik.com)

Lei Jimmy Ba and Rich Caruana. Do Deep Nets Really Need to be Deep? In NIPS, 2014.

Page 14:

<2nd step: Training the Student Network>

The pre-trained teacher network (all weights frozen) and the student network both process the input image. An L2 loss matches the student's logits to the teacher's logits, alongside the classification loss.

<Logits> Cat: 32, Tiger: 24, Dog: 15, Lion: 12, ...
<Softmax probability> Cat: 84%, Tiger: 7%, Dog: 3%, Lion: 2%, ...

(Animal pictures: www.freepik.com)

Lei Jimmy Ba and Rich Caruana. Do Deep Nets Really Need to be Deep? In NIPS, 2014.
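A sketch of this second step in PyTorch, assuming teacher and student are classification networks with matching output dimensions; the names and the single-step structure are illustrative, not the authors' code:

import torch
import torch.nn.functional as F

def student_step(teacher, student, optimizer, images, labels):
    with torch.no_grad():                            # teacher is pre-trained and frozen
        t_logits = teacher(images)
    s_logits = student(images)
    loss = F.mse_loss(s_logits, t_logits)            # L2 loss between the logits
    loss = loss + F.cross_entropy(s_logits, labels)  # plus the classification loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()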

Page 15:

<Better teacher, better student> <Student doesn't overfit>

Lei Jimmy Ba and Rich Caruana. Do Deep Nets Really Need to be Deep? In NIPS, 2014.

Page 16:

Distilling the Knowledge in a Neural Network
NIPS Workshop 2014

Page 17:

(Recap of the last-layer setup from page 12: logits → softmax → cross-entropy loss against the true label.)

Geoffrey Hinton, Oriol Vinyals and Jeff Dean. Distilling the Knowledge in a Neural Network. In NIPS Workshop, 2014.


Page 19:

HARD TARGET
Logits 43, 15, 10, 8, ... → Softmax: $\hat{Y}_i = \dfrac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$ → 0.03, 0.07, 0.7, 0.11, 0.09

SOFT TARGET ⇒ KNOWLEDGE
Logits 43, 15, 10, 8, ... → Soft target function: $\hat{Y}_i = \dfrac{e^{z_i/T}}{\sum_{j=1}^{n} e^{z_j/T}}$ → 0.08, 0.12, 0.4, 0.21, 0.19

<Soften the hard target distribution> (plot: probability vs. class index for the hard target and the soft target)

Geoffrey Hinton, Oriol Vinyals and Jeff Dean. Distilling the Knowledge in a Neural Network. In NIPS Workshop, 2014.
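A small sketch of the soft-target function; raising the temperature T flattens the distribution so that the teacher's inter-class similarities become visible:

import numpy as np

def soft_targets(z, T=1.0):
    e = np.exp((z - z.max()) / T)    # T = 1 recovers the ordinary softmax
    return e / e.sum()

logits = np.array([43.0, 15.0, 10.0, 8.0])
print(soft_targets(logits, T=1.0))    # peaked: close to a hard target
print(soft_targets(logits, T=20.0))   # softened: secondary classes show up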

Page 20:

<Knowledge Distillation structure>

The pre-trained teacher network (all weights frozen) and the student network each produce a soft target; a distillation loss (KL divergence) matches the student's soft target to the teacher's, alongside the classification loss.

<Hard target → soft target distribution>
Example 1: Cat: 74%, Tiger: 12%, Dog: 7%, Lion: 2%, ... → Cat: 40%, Tiger: 33%, Dog: 10%, Lion: 5%, ...
Example 2: Cat: 2%, Tiger: 84%, Dog: 5%, Lion: 3%, ... → Cat: 7%, Tiger: 50%, Dog: 35%, Lion: 3%, ...

Geoffrey Hinton, Oriol Vinyals and Jeff Dean. Distilling the Knowledge in a Neural Network. In NIPS Workshop, 2014.
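A sketch of the combined objective in PyTorch; T and alpha are the usual temperature and mixing weight, and the values here are placeholders:

import torch
import torch.nn.functional as F

def kd_loss(s_logits, t_logits, labels, T=4.0, alpha=0.9):
    # distillation term: KL divergence between softened distributions;
    # multiplying by T*T keeps gradient magnitudes comparable across temperatures
    distill = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                       F.softmax(t_logits / T, dim=1),
                       reduction="batchmean") * (T * T)
    cls = F.cross_entropy(s_logits, labels)      # ordinary classification term
    return alpha * distill + (1.0 - alpha) * cls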

Page 21:

FitNets: Hints for Thin Deep Nets
ICLR 2015

Page 22:

<1st step: Making a thin & deeper student network>

<Pre-trained teacher network>: shallower, with more channels per layer.
<Thin & deeper student network>: deeper, with fewer channels per layer.
(Diagram axes: number of layers × number of channels.)

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta and Yoshua Bengio. FitNets: Hints for Thin Deep Nets. In ICLR, 2015.

Page 23:

<2nd step: Training through a regressor>

<Pre-trained teacher network (Hint)> provides hint layers; in the <Thin & deeper student network (Guided)>, each guided layer is passed through a regressor (R) and matched to the corresponding hint.

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta and Yoshua Bengio. FitNets: Hints for Thin Deep Nets. In ICLR, 2015.
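A sketch of one hint/guided pair, assuming the guided feature map has fewer channels than the hint, so a 1x1 convolutional regressor maps it to the teacher's shape (channel counts are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

regressor = nn.Conv2d(32, 64, kernel_size=1)   # guided (32ch) -> hint (64ch)

def hint_loss(t_hint, s_guided):
    # t_hint: teacher hint features (B, 64, H, W), frozen
    # s_guided: student guided features (B, 32, H, W), same spatial size
    return F.mse_loss(regressor(s_guided), t_hint)

t = torch.randn(4, 64, 8, 8)
s = torch.randn(4, 32, 8, 8, requires_grad=True)
print(hint_loss(t, s))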

Page 24:

<3rd step: Training through KD>

<Pre-trained teacher network (Hint)> and <Thin & deeper student network (Guided)> are connected by the Knowledge Distillation loss.

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta and Yoshua Bengio. FitNets: Hints for Thin Deep Nets. In ICLR, 2015.

Page 25:

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta and Yoshua Bengio. FitNets: Hints for Thin Deep Nets. In ICLR, 2015.

Page 26:

A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
CVPR 2017

Page 27:

<Gram Matrix definition>

$$G(a_1, \ldots, a_n) = \begin{pmatrix} (a_1, a_1) & (a_1, a_2) & \cdots & (a_1, a_n) \\ (a_2, a_1) & (a_2, a_2) & \cdots & (a_2, a_n) \\ \vdots & \vdots & \ddots & \vdots \\ (a_n, a_1) & (a_n, a_2) & \cdots & (a_n, a_n) \end{pmatrix}$$

* (a, b) = inner product of a and b

<Diagram: FLOW between consecutive layer groups of the network>

Junho Yim, Donggyu Joo, Jihoon Bae and Junmo Kim. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In CVPR, 2017.

Page 28:

<Gram Matrix across layers → Flow of Solution Procedure (FSP) matrix>

$$G(a_1, \ldots, a_n, b_1, \ldots, b_m) = \begin{pmatrix} (a_1, b_1) & (a_1, b_2) & \cdots & (a_1, b_m) \\ (a_2, b_1) & (a_2, b_2) & \cdots & (a_2, b_m) \\ \vdots & \vdots & \ddots & \vdots \\ (a_n, b_1) & (a_n, b_2) & \cdots & (a_n, b_m) \end{pmatrix}$$

* (a, b) = inner product of a and b

<Diagram: FLOW between consecutive layer groups of the network>

Junho Yim, Donggyu Joo, Jihoon Bae and Junmo Kim. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In CVPR, 2017.
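A sketch of computing an FSP matrix from two feature maps of one network, following the definition above: entry (i, j) is the inner product of channel i of the first map and channel j of the second, taken over spatial positions:

import torch
import torch.nn.functional as F

def fsp_matrix(feat_a, feat_b):
    # feat_a: (B, n, H, W), feat_b: (B, m, H, W), same spatial size
    B, n, H, W = feat_a.shape
    m = feat_b.shape[1]
    a = feat_a.reshape(B, n, H * W)
    b = feat_b.reshape(B, m, H * W)
    return torch.bmm(a, b.transpose(1, 2)) / (H * W)   # (B, n, m)

def fsp_loss(t_a, t_b, s_a, s_b):
    # L2 distance between the teacher's and the student's FSP matrices
    return F.mse_loss(fsp_matrix(s_a, s_b), fsp_matrix(t_a, t_b))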

Page 29:

<Distillation Structure>

Junho Yim, Donggyu Joo, Jihoon Bae and Junmo Kim. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In CVPR, 2017.

Page 30:

Junho Yim, Donggyu Joo, Jihoon Bae and Junmo Kim. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In CVPR, 2017.

Page 31:

Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
ICLR 2017

Page 32:

<Attention Transfer>

<Normal image & attention map>

Sergey Zagoruyko and Nikos Komodakis. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. In ICLR, 2017.

Page 33:

<Feature characteristic of each layer> <Attention mapping functions>

Sergey Zagoruyko and Nikos Komodakis. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. In ICLR, 2017.
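A sketch of a typical attention mapping function from the paper, the channel-wise sum of squared absolute activations, with the transfer loss taken between L2-normalized maps (a close variant of, not exactly, the authors' code):

import torch
import torch.nn.functional as F

def attention_map(feat, p=2):
    # feat: (B, C, H, W) -> (B, H*W); sum of |activation|^p over the channels
    am = feat.abs().pow(p).sum(dim=1).flatten(1)
    return F.normalize(am, dim=1)                # L2-normalize each map

def at_loss(t_feat, s_feat):
    # distance between normalized teacher and student attention maps
    return (attention_map(s_feat) - attention_map(t_feat)).pow(2).mean()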

Page 34:

<Attention Transfer structure>

Sergey Zagoruyko and Nikos Komodakis. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. In ICLR, 2017.

Page 35:

Sergey Zagoruyko and Nikos Komodakis. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. In ICLR, 2017.


Page 37:

Paraphrasing Complex Network: Network Compression via Factor Transfer
NIPS 2018

Page 38:

<Autoencoder structure> INPUT → ENCODER → CODE → DECODER → OUTPUT

<t-SNE visualization of factor space>

Jangho Kim, SeongUk Park and Nojun Kwak. Paraphrasing Complex Network: Network Compression via Factor Transfer. In NIPS, 2018.

Page 39:

<Factor Transfer: Paraphraser & Translator structure>

Paraphraser: INPUT → FACTOR → OUTPUT (autoencoder-style module on the teacher's features).
Translator: INPUT → FACTOR (module on the student's features).

Jangho Kim, SeongUk Park and Nojun Kwak. Paraphrasing Complex Network: Network Compression via Factor Transfer. In NIPS, 2018.


Page 41:

<Factor Transfer structure>

Jangho Kim, SeongUk Park and Nojun Kwak. Paraphrasing Complex Network: Network Compression via Factor Transfer. In NIPS, 2018.
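A sketch of the factor-matching loss: the paraphraser's factor (teacher side, pre-trained and frozen) and the translator's factor (student side) are L2-normalized and compared with an L1 distance, as in the paper:

import torch
import torch.nn.functional as F

def factor_transfer_loss(t_factor, s_factor):
    # t_factor: paraphraser output on the teacher's features, frozen
    # s_factor: translator output on the student's features
    t = F.normalize(t_factor.flatten(1), dim=1)
    s = F.normalize(s_factor.flatten(1), dim=1)
    return (s - t).abs().mean()                  # L1 between normalized factors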

Page 42:

Jangho Kim, SeongUk Park and Nojun Kwak. Paraphrasing Complex Network: Network Compression via Factor Transfer. In NIPS, 2018.

Page 43:

Born-Again Neural Networks
ICML 2018

Page 44:

<Teacher Network> (Classification Accuracy: 88%) → Distillation Loss → <Student Network> (Classification Accuracy: 56% → 78%)
<Teacher Network> (Classification Accuracy: 88%) → Distillation Loss → <Student Network> (Classification Accuracy: 88% → 94%)

(In the born-again setting on the right, the student shares the teacher's architecture and ends up outperforming it.)

Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti and Anima Anandkumar. Born-Again Neural Networks. In ICML, 2018.
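A toy sketch of the born-again loop implied by the diagram: every generation re-trains the same architecture, distilled from the previous generation (random data; all names and sizes are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    # identical architecture for every generation
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 5))

def train_generation(model, teacher, x, y, steps=200):
    opt = torch.optim.Adam(model.parameters())
    for _ in range(steps):
        logits = model(x)
        loss = F.cross_entropy(logits, y)
        if teacher is not None:              # distill from the previous generation
            with torch.no_grad():
                t_logits = teacher(x)
            loss = loss + F.kl_div(F.log_softmax(logits, dim=1),
                                   F.softmax(t_logits, dim=1),
                                   reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    return model

x, y = torch.randn(64, 10), torch.randint(0, 5, (64,))
net = train_generation(make_model(), None, x, y)   # generation 0: plain training
for _ in range(2):                                 # each student is reborn as teacher
    net = train_generation(make_model(), net, x, y)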

Page 45:

Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti and Anima Anandkumar. Born-Again Neural Networks. In ICML, 2018.


Page 47:

Network Recasting: A Universal Method for Network Architecture Transformation
AAAI 2019

Page 48:

<Teacher Network> (Classification Accuracy: 88%) → Distillation Loss → <Student Network> (Classification Accuracy: 56% → 78%)
(Diagram axes: number of layers × number of channels.)

Joonsang Yu, Sungbum Kang and Kiyoung Choi. Network Recasting: A Universal Method for Network Architecture Transformation. In AAAI, 2019.

Page 49:

<Teacher Network> → <Student Network>: MSE Loss between corresponding block outputs.
(Animation: the teacher's blocks are recast one by one into student blocks, each trained to match the teacher block's output with the MSE loss.)

Joonsang Yu, Sungbum Kang and Kiyoung Choi. Network Recasting: A Universal Method for Network Architecture Transformation. In AAAI, 2019.
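A sketch of one recasting step implied by the animation: a cheaper student block is trained so that its output matches the frozen teacher block's output under the MSE loss (toy block types and shapes):

import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_block = nn.Conv2d(16, 16, 3, padding=1)             # pre-trained, frozen
student_block = nn.Conv2d(16, 16, 3, padding=1, groups=4)   # cheaper replacement

opt = torch.optim.Adam(student_block.parameters())
x = torch.randn(8, 16, 32, 32)                # features entering this block
with torch.no_grad():
    target = teacher_block(x)                 # teacher block output
loss = F.mse_loss(student_block(x), target)   # match the block outputs
opt.zero_grad(); loss.backward(); opt.step()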

Page 61:

<Block recasting case 1> <Block recasting case 2>

Joonsang Yu, Sungbum Kang and Kiyoung Choi. Network Recasting: A Universal Method for Network Architecture Transformation. In AAAI, 2019.

Page 62:

<Method for dimension mismatch> <Recasting methods>

Joonsang Yu, Sungbum Kang and Kiyoung Choi. Network Recasting: A Universal Method for Network Architecture Transformation. In AAAI, 2019.

Page 63:

<Network recasting structure>

Joonsang Yu, Sungbum Kang and Kiyoung Choi. Network Recasting: A Universal Method for Network Architecture Transformation. In AAAI, 2019.

Page 64:

Joonsang Yu, Sungbum Kang and Kiyoung Choi. Network Recasting: A Universal Method for Network Architecture Transformation. In AAAI, 2019.


Page 66:

Relational Knowledge Distillation
CVPR 2019

Page 67:

<Difference between KD and RKD>

Wonpyo Park, Dongju Kim, Yan Lu and Minsu Cho. Relational Knowledge Distillation. In CVPR, 2019.

Page 68:

<Distance-wise distillation loss> <Angle-wise distillation loss>

Wonpyo Park, Dongju Kim, Yan Lu and Minsu Cho. Relational Knowledge Distillation. In CVPR, 2019.
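A sketch of the two relational losses following the paper's definitions: the distance-wise loss compares pairwise distances (normalized by their mean), and the angle-wise loss compares the cosines of angles formed by embedding triplets, each with a Huber (smooth L1) loss:

import torch
import torch.nn.functional as F

def rkd_distance_loss(t_emb, s_emb):
    # pairwise Euclidean distances, normalized by their mean for scale invariance
    td, sd = torch.cdist(t_emb, t_emb), torch.cdist(s_emb, s_emb)
    td = td / td[td > 0].mean()
    sd = sd / sd[sd > 0].mean()
    return F.smooth_l1_loss(sd, td)

def rkd_angle_loss(t_emb, s_emb):
    def angles(e):
        d = e.unsqueeze(0) - e.unsqueeze(1)          # d[j, i] = e_i - e_j
        d = F.normalize(d, dim=2)
        return torch.einsum("jid,jkd->jik", d, d)    # cosines of angles at each j
    return F.smooth_l1_loss(angles(s_emb), angles(t_emb))

t, s = torch.randn(16, 128), torch.randn(16, 64)     # embedding dims may differ
print(rkd_distance_loss(t, s), rkd_angle_loss(t, s))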

Page 69:

Wonpyo Park, Dongju Kim, Yan Lu and Minsu Cho. Relational Knowledge Distillation. In CVPR, 2019.

Page 70:

SUMMARY

Page 71:

• Regularizer
• Teaching & Learning
• Various fields (Model Compression, Transfer Learning, Few-shot Learning, Meta Learning, ...)

Page 72:

Q&A