Understanding Generalization
Transcript of slides, Seminar Deep Learning, University of Munich (LMU)

Page 1

Understanding Generalization

Ou Changkun

WiSe 2017/2018, University of Munich, Seminar Deep Learning

[email protected]

February 1, 2018


Page 2

“This paper explains why deep learning can generalize well”

Page 3

Outline

1 Motivation
● Background Introduction
● Recap of Traditional Generalization Theory
2 Rethinking of Generalization
3 Bounds of Generalization Gap
4 DARC Regularizer


Page 4

What is Generalization?

1 Motivation Background Introduction


Definition: Generalization Bound
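A standard formulation (cf. Shalev-Shwartz et al. 2014; the notation here is mine, not the slide's): a generalization bound is a statement that, with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$,

$$R(h) \le \hat{R}_S(h) + \epsilon(m, \delta, \mathcal{H}) \qquad \text{for all } h \in \mathcal{H},$$

where $R$ denotes the expected (test) risk, $\hat{R}_S$ the empirical (training) risk, and $\epsilon$ shrinks as $m$ grows.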

Page 5

Goals of Generalization Theory

1 Motivation Background Introduction


Generalization theory is all about: Test error - Training error = ????
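In standard notation (symbols are mine, not the slide's), the quantity on the left is the generalization gap:

$$\mathrm{gap}(h) = R(h) - \hat{R}_S(h), \qquad R(h) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\,\ell(h(x), y), \qquad \hat{R}_S(h) = \frac{1}{m}\sum_{i=1}^{m}\ell(h(x_i), y_i).$$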

Page 6

The Force of Generalization

1 Motivation Background Introduction


Page 7

Recap I: Model Complexity

1 Motivation Recap of Generalization Theory


[Figure: test error, training error, and complexity term vs. model complexity, spanning the memorization and generalization regimes; image source: Mohri et al. 2012]

Test error ≤ Training error + Complexity Penalty

Rademacher bound:

VC bound:
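Standard statements of the two bounds, in one common form (constants vary across textbooks, e.g. Mohri et al. 2012): with probability at least $1-\delta$,

$$R(h) \le \hat{R}_S(h) + 2\,\mathfrak{R}_m(\mathcal{H}) + \sqrt{\frac{\ln(1/\delta)}{2m}} \qquad \text{(Rademacher)},$$

$$R(h) \le \hat{R}_S(h) + \sqrt{\frac{2d\ln(em/d)}{m}} + \sqrt{\frac{\ln(1/\delta)}{2m}} \qquad \text{(VC, with VC-dimension } d\text{)}.$$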


Page 8

Recap II: Algorithmic Stability & Robustness

1 Motivation Recap of Generalization Theory


Image Source [Google Inc., 2015]

Test error ≤ Training error + Algorithm Penalty
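A representative algorithm-dependent bound is Bousquet & Elisseeff's uniform-stability result, quoted here in one common form (treat the constants as approximate): if the learning algorithm $A$ is $\beta$-uniformly stable and the loss is bounded by $M$, then with probability at least $1-\delta$,

$$R(A_S) \le \hat{R}_S(A_S) + 2\beta + (4m\beta + M)\sqrt{\frac{\ln(1/\delta)}{2m}}.$$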

Page 9

Recap III: Flat Region Conjecture

1 Motivation Recap of Generalization Theory


[Figure: loss landscape visualization; image source: Li et al. 2018]
[Figure: “flat region” vs. “sharp region” minima; image sources: He et al. 2015; Dauphin et al. 2015]

Page 10

Outline

1 Motivation
2 Rethinking of Generalization
● Understanding generalization failure
● Empirical risk guarantee
3 Bounds of Generalization Gap
4 DARC Regularizer


Page 11

On the origin of “paradox”

2 Rethinking of Generalization


Page 12

Empirical Risk Guarantee (Estimating Test Error)

2 Rethinking of Generalization Empirical Risk Guarantee


Theorem: empirical risk estimation (for finite hypothesis class)
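The standard statement of this result (Hoeffding's inequality plus a union bound over $\mathcal{H}$, for a loss bounded in $[0,1]$): with probability at least $1-\delta$, simultaneously for all $h \in \mathcal{H}$,

$$\big|R(h) - \hat{R}_S(h)\big| \le \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}}.$$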

Page 13

Empirical Risk Guarantee II: An Example

2 Rethinking of Generalization Empirical Risk Guarantee

Theorem: empirical risk estimation (for finite hypothesis class)
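A worked instance of the theorem (numbers chosen for illustration, not taken from the slide): with $|\mathcal{H}| = 10^6$, $m = 10^4$ samples and $\delta = 0.01$, the gap is at most $\sqrt{(\ln 10^6 + \ln 200)/(2\cdot 10^4)} \approx \sqrt{19.1/20000} \approx 0.031$, i.e. the test error can exceed the training error by at most about three percentage points.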

Page 14

Outline

1 Motivation
2 Rethinking of Generalization
3 Bounds of Generalization Gap
● Data-dependent Bound
● Data-independent Bound
4 DARC Regularizer


Page 15

“Good” Dataset

3 Bounds of Generalization Gap Data-dependent Bound

Definition: Concentrated dataset


Page 16

Data-dependent Bound (Generalization Quality)

3 Bounds of Generalization Gap Data-dependent Bound


Theorem: Data-dependent Bound


Page 17

Two-phase Training Technique

3 Bounds of Generalization Gap Data-independent Bound

“Standard” Phase: train the network in the standard way with αm samples of the dataset;
“Freeze” Phase: freeze the activation pattern and keep training with the remaining (1−α)m samples.

* ratio = two-phase/base
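A minimal numpy sketch of the two-phase procedure for a one-hidden-layer ReLU network (an illustration under my own assumptions, including squared loss, full-batch gradient descent, and all hyperparameters; not the paper's implementation):

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: m samples, d inputs, h hidden ReLU units, alpha = phase-1 fraction.
m, d, h, alpha, lr = 200, 10, 32, 0.5, 0.05
X = rng.normal(size=(m, d))
y = np.sign(rng.normal(size=m))

W1 = rng.normal(scale=d ** -0.5, size=(d, h))
w2 = rng.normal(scale=h ** -0.5, size=h)

def forward(Xb, W1, w2, mask=None):
    # mask=None: ordinary ReLU; otherwise reuse a frozen 0/1 activation pattern.
    pre = Xb @ W1
    a = (pre > 0).astype(float) if mask is None else mask
    return (pre * a) @ w2, a

def step(Xb, yb, W1, w2, mask=None):
    # One gradient step on the mean squared error.
    pred, a = forward(Xb, W1, w2, mask)
    err = (pred - yb) / len(yb)
    g2 = ((Xb @ W1) * a).T @ err
    g1 = Xb.T @ (err[:, None] * (a * w2))
    return W1 - lr * g1, w2 - lr * g2

split = int(alpha * m)

# "Standard" phase: ordinary training on the first alpha*m samples.
for _ in range(500):
    W1, w2 = step(X[:split], y[:split], W1, w2)

# "Freeze" phase: record each remaining sample's activation pattern once,
# then keep training with that pattern held fixed.
_, frozen = forward(X[split:], W1, w2)
for _ in range(500):
    W1, w2 = step(X[split:], y[split:], W1, w2, mask=frozen)

In the freeze phase the 0/1 activation pattern is computed once with the phase-1 weights and then held fixed, so on those samples the network output is linear in the weights.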

Page 18

Data-independent Bound (Determine Good Model)

3 Bounds of Generalization Gap Data-independent Bound

Theorem: Data-independent bound

[Bullet text lost in extraction; recoverable symbols: ρ, m_σ]

Page 19

Data-independent Bound: Examples

3 Bounds of Generalization Gap Data-independent Bound

Theorem: Data-independent bound


Page 20

Outline

1 Motivation
2 Rethinking of Generalization
3 Bounds of Generalization Gap
4 DARC Regularizer
● Directly Approximately Regularizing Complexity
● DARC1 Experiments


Page 21

Directly Approximately Regularizing Complexity (DARC) Family

4 DARC Regularizer Directly Approximately Regularizing Complexity


DARC1 Regularization
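As I read Kawaguchi et al. 2017 (treat the exact normalization as an assumption), DARC1 augments the training loss with the largest per-class $\ell_1$ norm of the network outputs over the batch:

$$L_{\mathrm{DARC1}} = L_{\mathrm{original}} + \frac{\lambda}{m}\,\max_{k}\sum_{i=1}^{m}\big|h_k(x_i)\big|,$$

where $h_k(x_i)$ is the $k$-th output unit evaluated on the $i$-th training sample.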

Page 22

Experiment results: DARC1 Regularizer on MNIST & CIFAR-10

4 DARC Regularizer DARC1 Experiments

Test error ratio (DARC1/Base):

              MNIST (ND)       MNIST         CIFAR-10
              mean    stdv     mean   stdv   mean   stdv
DARC1/Base    0.89    0.61     0.95   0.67   0.97   0.79


DARC1 implementation (Keras):

from keras import backend as K

def darc1_loss(model, lamb=0.001):
    def _loss(y_true, y_pred):
        original_loss = K.categorical_crossentropy(y_true, y_pred)
        # DARC1 penalty: the largest column-wise l1 norm of the last
        # layer's outputs over the current batch.
        custom_loss = lamb * K.max(K.sum(K.abs(model.layers[-1].output), axis=0))
        return original_loss + custom_loss
    return _loss
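A hypothetical usage: model.compile(optimizer='adam', loss=darc1_loss(model)). The closure receives the model so the penalty can read the last layer's outputs for the current batch.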

Page 23

TL;DR: Only Model Architecture & Dataset Matter

5 Conclusions


This paper is all about: Test error ≤ Training error + (Model, Data) Penalty


Page 25

Outline

1 Motivation
2 Rethinking of Generalization
3 Bounds of Generalization Gap
4 DARC Regularizer
5 Related Readings


Page 26

Related Readings I: Previous Research

References

[ Shalev-Shwartz et al. 2014 ] Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
[ Neyshabur et al. 2017 ] Exploring generalization in deep learning. arXiv preprint: 1706.08947, NIPS 2017.
[ Xu et al. 2010 ] Robustness and generalization. arXiv preprint: 1005.2243, JMLR 2012.
[ Bousquet et al. 2002 ] Stability and generalization. JMLR 2002.


Page 27

Related Readings II: Rethinking Generalization

References

[ Zhang et al. 2017 ] Understanding deep learning requires rethinking generalization. arXiv preprint: 1611.03530, ICLR 2017.
[ Krueger et al. 2017 ] Deep nets don't learn via memorization. ICLR 2017 Workshop.
[ Kawaguchi et al. 2017 ] Generalization in deep learning. arXiv preprint: 1710.05468.


Page 28

Related Readings III: The Trick of Deep Path

References

[ Baldi et al. 1989 ] Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 1989, Vol. 2, 53–58.
[ Choromanska et al. 2015 ] The Loss Surfaces of Multilayer Networks. arXiv preprint: 1412.0233, AISTATS 2015.
[ Kawaguchi 2016 ] Deep learning without poor local minima. NIPS 2016, oral presentation.


Page 29

Related Readings IV: Recent Research

References

[ Dziugaite et al. 2017 ] Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data. arXiv preprint: 1703.11008.
[ Bartlett et al. 2017 ] Spectrally-normalized margin bounds for neural networks. arXiv preprint: 1706.08498, NIPS 2017.
[ Neyshabur et al. 2018 ] A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks. arXiv preprint: 1707.09564, ICLR 2018.
[ Sinha et al. 2018 ] Certifiable Distributional Robustness with Principled Adversarial Training. arXiv preprint: 1710.10571, ICLR 2018.



Page 33

General Concept Questions

1 Motivation Background Introduction



Page 35

Properties of (over-parameterized) linear models

2 Rethinking of Generalization

Theorem: Generalization failure on linear models


Page 36

On the origin of “paradox”

2 Rethinking of Generalization

Proposition


Page 38

Path Trick in ReLU Net [ Choromanska et al. 2015 ]

3 Bounds of Generalization Gap Data-dependent Bound

[Diagram: fully connected network, layers 1 … H, H+1; a path picks one unit per layer]
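The decomposition itself, in the form commonly used for ReLU networks (standard result; notation mine): each pre-softmax output is a sum over all input-to-output paths $p$,

$$\hat{z}_k(x) = \sum_{p}\, x_{i(p)}\,\sigma_p(x)\prod_{l=1}^{H+1} w_p^{(l)},$$

where $x_{i(p)}$ is the input coordinate at which path $p$ starts, $w_p^{(l)}$ is the weight the path traverses in layer $l$, and $\sigma_p(x) \in \{0,1\}$ indicates whether every ReLU along the path is active on input $x$.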

Page 39

Concentrated Dataset

3 Bounds of Generalization Gap Data-dependent Bound


Definition 3: “good” dataset

Page 40

Data-dependent Guarantee

3 Bounds of Generalization Gap Data-dependent Bound


Proposition 4: Data-dependent Bound

Page 42

Why z?

2 Rethinking of Generalization Empirical Risk Guarantee


Bernstein Inequality

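For reference, one common form of the inequality: if $z_1, \dots, z_m$ are i.i.d. with $|z_i - \mathbb{E}z| \le M$ and variance $\sigma^2$, then for every $\epsilon > 0$

$$\Pr\Big[\tfrac{1}{m}\textstyle\sum_{i=1}^{m} z_i - \mathbb{E}z > \epsilon\Big] \le \exp\Big(-\frac{m\epsilon^2}{2\sigma^2 + \frac{2}{3}M\epsilon}\Big).$$

The variance term is what makes this sharper than Hoeffding when $\sigma^2$ is small, which is presumably the point of choosing $z$ the way it is chosen.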

Page 44

SGD, z, w; relations?

3 Bounds of Generalization Gap Data-independent Bound


Did you forget that σ depends on w? No, they can be independent.

Page 45

Two-phase training

3 Bounds of Generalization Gap Data-independent Bound

Standard Phase: train the network in the standard way with a partial dataset;
Freeze Phase: freeze the activation pattern and keep training with the rest of the dataset.

Page 47

Directly Approximately Regularizing Complexity (DARC)

Backstage Slides Directly Approximately Regularizing Complexity

Bound by Rademacher Complexity [Koltchinskii and Panchenko 2002]
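In one standard form (constants vary across statements): for a fixed margin $\gamma > 0$, with probability at least $1-\delta$,

$$\Pr\big[\text{misclassification}\big] \le \hat{R}_{\gamma}(f) + \frac{2}{\gamma}\,\mathfrak{R}_m(\mathcal{F}) + \sqrt{\frac{\ln(1/\delta)}{2m}},$$

where $\hat{R}_{\gamma}(f)$ is the fraction of training points with margin below $\gamma$.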


Page 49

Experiment results: DARC1 Regularizer with ResNet-50 on ILSVRC2012

Backstage Slides Experiments

[Results lost in extraction; experimental settings: a 10-image-class subset and the entire dataset]

Page 51

Personal Opinions regarding SGD

5 Conclusions

● nonlinearity

● bypass
