DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY

Transcript of wcohen/10-605/deep-1.pdf (72 pages)

Page 1:

DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY

Page 2:

On-line Resources
• http://neuralnetworksanddeeplearning.com/index.html - online book by Michael Nielsen
• http://matlabtricks.com/post-5/3x3-convolution-kernels-with-online-demo - online demo of convolutions
• https://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html - demo of a CNN
• http://scs.ryerson.ca/~aharley/vis/conv/ - 3D visualization
• http://cs231n.github.io/ - Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition
• http://www.deeplearningbook.org/ - MIT Press book by Goodfellow, Bengio, and Courville; free online version

Page 3:

A history of neural networks
• 1940s-60s:
  – McCulloch & Pitts; Hebb: modeling real neurons
  – Rosenblatt, Widrow-Hoff: perceptrons
  – 1969: Minsky & Papert's Perceptrons book showed formal limitations of one-layer linear networks
• 1970s - mid-1980s: …
• mid-1980s - mid-1990s:
  – backprop and multi-layer networks
  – Rumelhart and McClelland's PDP book set
  – Sejnowski's NETtalk, BP-based text-to-speech
  – Neural Info Processing Systems (NIPS) conference starts
• Mid-1990s - early 2000s: …
• Mid-2000s to current:
  – More and more interest and experimental success

Page 4:

Page 5:

Page 6:

Page 7:

Page 8:

Multilayer networks
• Simplest case: classifier is a multilayer network of logistic units
• Each unit takes some inputs and produces one output using a logistic classifier
• Output of one unit can be the input of another

[Figure: input layer (x1, x2, constant 1), a hidden layer with units v1 = σ(wᵀx) and v2 = σ(wᵀx) reached by weights w0,1, w1,1, w2,1 and w0,2, w1,2, w2,2, and an output layer with unit z1 = σ(wᵀv) reached by weights w1, w2]
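To make the wiring concrete, here is a minimal numpy sketch of this network's forward pass (a sketch only; the weight values and variable names are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Invented weights: column i of W_hidden holds (w0,i, w1,i, w2,i),
# the bias weight plus the two input weights for hidden unit i.
W_hidden = np.array([[ 0.1, -0.4],   # w0,1  w0,2
                     [ 0.8,  0.3],   # w1,1  w1,2
                     [-0.5,  0.9]])  # w2,1  w2,2
w_out = np.array([0.7, -1.1])        # w1, w2 for the output unit

x = np.array([1.0, 0.0])                    # one example (x1, x2)
v = sigmoid(np.append(1.0, x) @ W_hidden)   # hidden outputs v1, v2
z1 = sigmoid(v @ w_out)                     # network output z1 = σ(wᵀv)
print(v, z1)
```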

Page 9:

Learning a multilayer network
• Define a loss (simplest case: squared error)
  – But over a network of “units”
• Minimize loss with gradient descent
  – You can do this over complex networks if you can take the gradient of each unit: every computation is differentiable

$$J_{X,y}(w) = \sum_i \left(y_i - \hat{y}_i\right)^2$$

[Figure: the same two-layer network of logistic units as on page 8]

Page 10:

ANNs in the 90s
• Mostly 2-layer networks, or else carefully constructed “deep” networks (e.g., CNNs)
• Worked well, but training was slow and finicky

Nov 1998 - Yann LeCun, Bottou, Bengio, Haffner

Page 11:

ANNs in the 90s
• Mostly 2-layer networks, or else carefully constructed “deep” networks
• Worked well, but training typically took weeks when guided by an expert

SVM: 98.9-99.2% accurate
CNNs: 98.3-99.3% accurate

Page 12:

Learning a multilayer network
• Define a loss (simplest case: squared error)
  – But over a network of “units”
• Minimize loss with gradient descent
  – You can do this over complex networks if you can take the gradient of each unit: every computation is differentiable

$$J_{X,y}(w) = \sum_i \left(y_i - \hat{y}_i\right)^2$$

Page 13:

Example: weight updates for a multilayer ANN with square loss and logistic units

For nodes k in the output layer:

$$\delta_k \equiv (t_k - a_k)\, a_k (1 - a_k)$$

For nodes j in the hidden layer:

$$\delta_j \equiv \Big(\sum_k \delta_k w_{kj}\Big)\, a_j (1 - a_j)$$

For all weights (δ already carries the negative gradient, so gradient descent steps along +δ·a, as sketched below):

$$w_{kj} \leftarrow w_{kj} + \varepsilon\, \delta_k a_j \qquad w_{ji} \leftarrow w_{ji} + \varepsilon\, \delta_j a_i$$

“Propagate errors backward”: BACKPROP

Can carry this recursion out further if you have multiple hidden layers.
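A minimal numpy sketch of one such update on a single example, assuming one hidden layer, logistic units, and square loss (function and variable names are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W_hid, W_out, eps=0.1):
    """One SGD step for a 1-hidden-layer logistic network with square loss."""
    a_hid = sigmoid(x @ W_hid)        # hidden activations a_j
    a_out = sigmoid(a_hid @ W_out)    # output activations a_k

    # delta_k = (t_k - a_k) a_k (1 - a_k) for output nodes k
    delta_out = (t - a_out) * a_out * (1 - a_out)
    # delta_j = (sum_k delta_k w_kj) a_j (1 - a_j) for hidden nodes j
    delta_hid = (W_out @ delta_out) * a_hid * (1 - a_hid)

    # delta already contains the negative gradient: descend along +delta*a
    W_out += eps * np.outer(a_hid, delta_out)   # w_kj += eps * delta_k * a_j
    W_hid += eps * np.outer(x, delta_hid)       # w_ji += eps * delta_j * a_i
    return W_hid, W_out
```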

Page 14:

BACKPROP FOR MLPS

Page 15:

BackProp in Matrix-Vector Notation

Michael Nielsen: http://neuralnetworksanddeeplearning.com/

Page 16:

Notation

Page 17:

Notation

Each digit is 28x28 pixels = 784 inputs

Page 18:

Notation

w^l is the weight matrix for layer l

Page 19:

Notation

• w^l: weight matrix
• a^l and b^l: activation and bias vectors

Page 20:

Notation

• Matrix w^l: weight
• Vector b^l: bias
• Vector a^l: activation
• Vector z^l: pre-sigmoid activation
• σ: vector→vector function, componentwise logistic

Page 21:

Computation is “feedforward”

for l = 1, 2, … L:
  a^l = σ(w^l a^(l-1) + b^l)
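A compact numpy sketch of this loop (a sketch; it assumes the weights and biases are kept in Python lists `Ws` and `bs`, one entry per layer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, Ws, bs):
    """a^l = sigmoid(w^l a^(l-1) + b^l) for l = 1, 2, ..., L."""
    a = x
    for W, b in zip(Ws, bs):   # layer l = 1, 2, ..., L
        z = W @ a + b          # pre-sigmoid activation z^l
        a = sigmoid(z)         # activation a^l
    return a                   # the network output a^L
```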

Page 22:

Notation

• Matrix: w^l
• Vectors: a^l, b^l, z^l
• Vector: y (target output)

Cost function to optimize, a sum over examples x:

$$C = \frac{1}{2n} \sum_x \left\lVert y(x) - a^L(x) \right\rVert^2$$

where a^L(x) is the network's output for input x.

Page 23:

Notation

Page 24:

BackProp: last layer

Level l for l = 1, …, L
• Matrix: w^l
• Vectors: bias b^l, activation a^l, pre-sigmoid activation z^l, target output y, “local error” δ^l

Matrix form:

$$\delta^L = \nabla_a C \odot \sigma'(z^L)$$

where ⊙ is the componentwise product of vectors, and the components are δ^L_j = (∂C/∂a^L_j) · σ'(z^L_j).

Page 25:

BackProp: last layer

Level l for l = 1, …, L
• Matrix: w^l
• Vectors: bias b^l, activation a^l, pre-sigmoid activation z^l, target output y, “local error” δ^l

Matrix form for square loss:

$$\delta^L = (a^L - y) \odot \sigma'(z^L)$$

Page 26:

BackProp: error at level l in terms of error at level l+1

Level l for l = 1, …, L
• Matrix: w^l
• Vectors: bias b^l, activation a^l, pre-sigmoid activation z^l, target output y, “local error” δ^l

$$\delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)$$

which we can use to compute

$$\frac{\partial C}{\partial b^l_j} = \delta^l_j \qquad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\, \delta^l_j$$

Page 27:

BackProp: summary

Level l for l = 1, …, L
• Matrix: w^l
• Vectors: bias b^l, activation a^l, pre-sigmoid activation z^l, target output y, “local error” δ^l

$$\delta^L = (a^L - y) \odot \sigma'(z^L) \qquad \delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)$$

$$\frac{\partial C}{\partial b^l_j} = \delta^l_j \qquad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\, \delta^l_j$$

Page 28:

Computation propagates errors backward

for l = L, L-1, … 1: compute δ^l (from δ^(l+1), as on the previous pages)

for l = 1, 2, … L: update w^l and b^l using ∂C/∂w^l and ∂C/∂b^l

A full forward/backward pass is sketched below.
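Putting pages 21-28 together, here is a minimal numpy sketch of one full forward/backward pass in this matrix-vector notation, assuming square loss and logistic units (names invented; gradients are for a single example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, Ws, bs):
    """Gradients of C = 0.5 * ||y - a^L||^2 for one example (x, y)."""
    # Feedforward: z^l = w^l a^(l-1) + b^l, a^l = sigmoid(z^l).
    a, As, Zs = x, [x], []
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = sigmoid(z)
        Zs.append(z)
        As.append(a)

    # delta^L = (a^L - y) ⊙ sigma'(z^L)
    delta = (As[-1] - y) * sigmoid_prime(Zs[-1])
    grads_W, grads_b = [], []
    for l in range(len(Ws) - 1, -1, -1):           # l = L, L-1, ..., 1
        grads_W.insert(0, np.outer(delta, As[l]))  # dC/dw^l = delta^l (a^(l-1))^T
        grads_b.insert(0, delta)                   # dC/db^l = delta^l
        if l > 0:  # delta^l = ((w^(l+1))^T delta^(l+1)) ⊙ sigma'(z^l)
            delta = (Ws[l].T @ delta) * sigmoid_prime(Zs[l - 1])
    return grads_W, grads_b
```

An SGD step then subtracts a small multiple of each returned gradient from the corresponding w^l and b^l.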

Page 29:

EXPRESSIVENESS OF DEEP NETWORKS

Page 30:

Deep ANNs are expressive
• One logistic unit can implement an AND or an OR of a subset of inputs (sketched below)
  – e.g., (x3 AND x5 AND … AND x19)
• Every boolean function can be expressed as an OR of ANDs
  – e.g., (x3 AND x5) OR (x7 AND x19) OR …
• So one hidden layer can express any BF

(But it might need lots and lots of hidden units)
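For instance, a single logistic unit computes an AND by giving each relevant input a large positive weight and setting the bias so the unit fires only when all of them are on. A sketch (the weight value 10 is invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_and(x, idxs, w=10.0):
    """Output ~1 only when all inputs in idxs are 1."""
    bias = -w * (len(idxs) - 0.5)           # threshold just below "all on"
    return sigmoid(w * sum(x[i] for i in idxs) + bias)

x = [0, 0, 0, 1, 0, 1]                      # x3 = x5 = 1 (0-indexed)
print(logistic_and(x, [3, 5]))              # ~1.0: x3 AND x5 holds
print(logistic_and(x, [3, 4, 5]))           # ~0.0: x4 is off
```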

Page 31:

Deep ANNs are expressive
• One logistic unit can implement an AND or an OR of a subset of inputs
  – e.g., (x3 AND x5 AND … AND x19)
• Every boolean function can be expressed as an OR of ANDs
  – e.g., (x3 AND x5) OR (x7 AND x19) OR …
• So one hidden layer can express any BF
• Example: parity(x1,…,xN) = 1 iff an odd number of the xi's are set to one

Parity(a,b,c,d) = (a & -b & -c & -d) OR (-a & b & -c & -d) OR …   # list all the “1s”
                  (a & b & c & -d) OR (a & b & -c & d) OR …       # list all the “3s”

Size in general is O(2^N)

Page 32:

Deeper ANNs are more expressive
• A two-layer network needs O(2^N) units
• A two-layer network can express binary XOR
• A 2·log(N) layer network can express the parity of N inputs (even/odd number of 1s)
  – With units arranged in a binary tree of depth O(log N) (see the sketch below)
• Deep network + parameter tying ~= subroutines

[Figure: a binary tree of units over inputs x1 … x8]
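A sketch of the binary-tree idea in plain Python (names invented): XOR-ing pairs level by level computes parity in O(log N) levels, mirroring the 2·log(N)-layer network:

```python
def parity(xs):
    """Parity of xs via a binary tree of XORs, depth O(log N)."""
    level = list(xs)
    while len(level) > 1:                 # each pass is one tree level
        if len(level) % 2:                # odd count: pad with a zero
            level.append(0)
        level = [a ^ b for a, b in zip(level[0::2], level[1::2])]
    return level[0]

print(parity([1, 0, 1, 1]))              # 1 (three ones: odd)
print(parity([1, 0, 1, 0, 1, 0, 1, 1]))  # 1 (five ones: odd)
```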

Page 33:

Hypothetical code for face recognition

http://neuralnetworksanddeeplearning.com/chap1.html

….

….

Page 34:

PARALLEL TRAINING FOR ANNS

Page 35:

How are ANNs trained?
• Typically, with some variant of streaming SGD
  – Keep the data on disk, in a preprocessed form
  – Loop over it multiple times
  – Keep the model in memory
• Solution to big data: but long training times!
• However, some parallelism is often used….

Page 36:

Recap: logistic regression with SGD

$$P(Y=1 \mid X=x) = p = \frac{1}{1 + e^{-x\cdot w}}$$

[Figure annotations: the inner part computes the inner product <x,w>; the outer part takes the logistic of <x,w> — see the sketch below]
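The two annotated parts, as a two-line numpy sketch:

```python
import numpy as np

def predict_prob(x, w):
    z = np.dot(x, w)                  # this part: inner product <x, w>
    return 1.0 / (1.0 + np.exp(-z))   # this part: logistic of <x, w>
```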

Page 37:

Recap: logistic regression with SGD

$$P(Y=1 \mid X=x) = p = \frac{1}{1 + e^{-x\cdot w}}$$

On one example this computes the inner product <x,w>. There's some chance to compute this in parallel… can we do more?

[Figure: a single logistic unit with pre-sigmoid value a and output z]

Page 38:

In ANNs we have many, many logistic regression nodes

Page 39:

Recap: logistic regression with SGD

Let x be an example
Let wi be the input weights for the i-th hidden unit
Then the output is ai = x · wi

[Figure: hidden units with pre-sigmoid values ai and outputs zi]

Page 40:

Recap: logistic regression with SGD

Let x be an example
Let wi be the input weights for the i-th hidden unit
Then a = x W is the output for all m units

[Figure: W = [w1 w2 w3 … wm], a matrix whose i-th column holds the weights of hidden unit i, with entries like 0.1, -0.3, -1.7, …]

Page 41:

Recap: logistic regression with SGD

Let X be a matrix with k examples (one row per example)
Let wi be the input weights for the i-th hidden unit
Then A = X W is the output for all m units for all k examples:

X W =   x1·w1   x1·w2   …   x1·wm
        …
        xk·w1   …       …   xk·wm

There are a lot of chances to do this in parallel (see the sketch below)
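A minimal numpy sketch of the batched computation (shapes invented); the single matrix multiply X @ W is exactly what optimized libraries and GPUs parallelize:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, m = 128, 784, 30           # examples, input features, hidden units
X = rng.random((k, d))           # k examples, one per row
W = rng.standard_normal((d, m))  # column i holds the weights w_i of unit i

A = X @ W                        # all pre-sigmoid activations, a k x m matrix
Z = 1.0 / (1.0 + np.exp(-A))     # all unit outputs, still one matrix op
```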

Page 42:

ANNs and multicore CPUs
• Modern libraries (Matlab, numpy, …) do matrix operations fast, in parallel
• Many ANN implementations exploit this parallelism automatically
• Key implementation issue is working with matrices comfortably

Page 43:

ANNs and GPUs
• GPUs do matrix operations very fast, in parallel
  – For dense matrices, not sparse ones!
• Training ANNs on GPUs is common
  – SGD and minibatch sizes of 128
• Modern ANN implementations can exploit this
• GPUs are not super-expensive
  – $500 for a high-end one
  – large models with O(10^7) parameters can fit in a large-memory GPU (12GB)
• Speedups of 20x-50x have been reported

Page 44:

ANNs and multi-GPU systems
• There are ways to set up ANN computations so that they are spread across multiple GPUs
  – Sometimes involves some sort of IPM
  – Sometimes involves partitioning the model across multiple GPUs
  – Often needed for very large networks
  – Not especially easy to implement and do with most current tools

Page 45:

WHY ARE DEEP NETWORKS HARD TO TRAIN?

Page 46:

Recap: weight updates for a multilayer ANN

For nodes k in the output layer L:

$$\delta^L_k \equiv (t_k - a_k)\, a_k (1 - a_k)$$

For nodes j in hidden layer h:

$$\delta^h_j \equiv \Big(\sum_k \delta^{h+1}_k w_{kj}\Big)\, a_j (1 - a_j)$$

What happens as the layers get further and further from the output layer? E.g., what's the gradient for the bias term with several layers after it?

Page 47:

Gradients are unstable

What happens as the layers get further and further from the output layer? E.g., what's the gradient for the bias term with several layers after it in a trivial net?

[Figure: plot of σ'(z), with its maximum at 1/4]

If weights are usually < 1 then we are multiplying by many numbers < 1, so the gradients get very small.

The vanishing gradient problem
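A numeric sketch of the effect (assuming a chain of single-unit layers, each contributing a factor σ'(z)·w to the bias gradient; the values are invented): even with weights near 1, the σ' ≤ 1/4 factors shrink the gradient geometrically with depth:

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1 - s)

w, z = 0.9, 0.0                              # sigmoid_prime(0) = 1/4, the max
for depth in [1, 2, 5, 10]:
    grad = (sigmoid_prime(z) * w) ** depth   # one factor per layer behind the bias
    print(depth, grad)                       # shrinks like (w/4)^depth
# depth 10 -> ~3e-7: the vanishing gradient problem
```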

Page 48:

Gradients are unstable

What happens as the layers get further and further from the output layer? E.g., what's the gradient for the bias term with several layers after it in a trivial net?

[Figure: plot of σ'(z), with its maximum at 1/4]

If weights are usually > 1 then we are multiplying by many numbers > 1, so the gradients get very big.

The exploding gradient problem (less common but possible)

Page 49:

AISTATS 2010

Histogram of gradients in a 5-layer network for an artificial image recognition task

[Figure: per-layer gradient histograms, from the input layer to the output layer]

Page 50:

AISTATS 2010

We will get to these tricks eventually….

Page 51:

It's easy for sigmoid units to saturate

The components of the gradient all carry a σ'(z) factor, so when a unit saturates the learning rate approaches zero and the unit is “stuck”.

[Figure: the sigmoid curve, flat at both tails]

Page 52:

It's easy for sigmoid units to saturate

For a big network there are lots of weighted inputs to each neuron. If any of them are too large then the neuron will saturate. So neurons get stuck with a few large inputs OR many small ones.

Page 53:

It's easy for sigmoid units to saturate
• If there are 500 non-zero inputs initialized with a Gaussian ~N(0,1) then the SD is √500 ≈ 22.4 (see the check below)
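A quick numeric check of that claim (a sketch): summing 500 N(0,1) terms gives a pre-activation whose standard deviation is √500, far out on the sigmoid's flat tails:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, 500)).sum(axis=1)  # 500 N(0,1) inputs each
print(z.std())   # ~22.4 = sqrt(500)
```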

Page 54:

It's easy for sigmoid units to saturate

• Saturation visualization from Glorot & Bengio 2010 -- using a smarter initialization scheme

[Figure: the closest-to-output hidden layer is still stuck for the first 100 epochs]

Page 55:

WHAT'S DIFFERENT ABOUT MODERN ANNS?

Page 56:

Some key differences
• Use of softmax and entropic loss instead of quadratic loss.
• Use of alternate non-linearities
  – reLU and hyperbolic tangent
• Better understanding of weight initialization
• Data augmentation
  – Especially for image data

Page 57:

Cross-entropy loss

$$\frac{\partial C}{\partial w} = \sigma(z) - y$$

Compare to the gradient for square loss when a ≈ 1, y = 0 and x = 1: the square-loss gradient carries an extra σ'(z) factor, which is nearly zero for a saturated unit.

[Figure: a single logistic unit]
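A sketch comparing the two gradients at a badly saturated unit (a ≈ 1, y = 0, x = 1); the square-loss gradient is crushed by the σ'(z) factor while the cross-entropy gradient stays large:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, y = 5.0, 0.0                       # saturated unit: a = sigmoid(5) ~ 0.993
a = sigmoid(z)
grad_square = (a - y) * a * (1 - a)   # square loss: extra sigma'(z) factor
grad_xent = a - y                     # cross-entropy: no sigma'(z) factor
print(grad_square)                    # ~0.0066  (learning crawls)
print(grad_xent)                      # ~0.993   (learning proceeds)
```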

Page 58:

Cross-entropy loss

Page 59:

Cross-entropy loss

Page 60:

Softmax output layer

[Figure: a network whose last layer is a softmax]

The network outputs a probability distribution! Cross-entropy loss after a softmax layer gives a very simple, numerically stable gradient:

$$\Delta w_{ij} = (y_i - z_i)\, y_j$$
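A numpy sketch of the fact being used (a sketch under assumed notation: z for the softmax outputs, one-hot target y, and h standing in for the input activations to the output layer, a name invented here): the cross-entropy gradient reduces to a simple outer product, with no σ' factor to vanish:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())          # subtract max for numerical stability
    return e / e.sum()

h = np.array([0.2, 0.9, 0.4])        # input activations to the output layer
W = np.zeros((3, 3))                 # weights, shape (outputs, inputs)
y = np.array([0.0, 1.0, 0.0])        # one-hot target
z = softmax(W @ h)                   # predicted distribution

grad_W = np.outer(z - y, h)          # d(cross-entropy)/dW = (z - y) h^T
W -= 0.5 * grad_W                    # one SGD step
```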

Page 61:

Some key differences
• Use of softmax and entropic loss instead of quadratic loss.
  – Often learning is faster and more stable, as well as getting better accuracies in the limit
• Use of alternate non-linearities
• Better understanding of weight initialization
• Data augmentation
  – Especially for image data

Page 62:

Some key differences
• Use of softmax and entropic loss instead of quadratic loss.
  – Often learning is faster and more stable, as well as getting better accuracies in the limit
• Use of alternate non-linearities
  – reLU and hyperbolic tangent
• Better understanding of weight initialization
• Data augmentation
  – Especially for image data

Page 63:

Alternative non-linearities
• Changes so far
  – Changed the loss from square error to cross-entropy
  – Proposed adding another output layer (softmax)
• A new change: modifying the nonlinearity
  – The logistic is not widely used in modern ANNs

Page 64:

Alternative non-linearities
• A new change: modifying the nonlinearity
  – The logistic is not widely used in modern ANNs

Alternate 1: tanh

Like the logistic function but shifted to range [-1, +1]
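The shift can be written exactly: tanh is a rescaled, recentered logistic,

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} = 2\sigma(2z) - 1$$

so it keeps the same S-shape but its outputs are centered at zero.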

Page 65:

AISTATS 2010

We will get to these tricks eventually….

[Figure: results for networks of depth 5]

Page 66:

Alternative non-linearities
• A new change: modifying the nonlinearity
  – reLU often used in vision tasks

Alternate 2: rectified linear unit

Linear with a cutoff at zero

(Implement: clip the gradient when you pass zero)

Page 67:

Alternative non-linearities
• A new change: modifying the nonlinearity
  – reLU often used in vision tasks

Alternate 2: rectified linear unit

Soft version: log(exp(x) + 1)

Doesn't saturate (at one end); sparsifies outputs; helps with the vanishing gradient (see the sketch below)
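A small sketch of both variants and their gradients (the zero-clip is the subgradient choice mentioned above):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # linear with a cutoff at zero

def relu_grad(x):
    return (x > 0).astype(float)     # clip the gradient below zero

def softplus(x):
    return np.log1p(np.exp(x))       # soft version: log(exp(x) + 1)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -- no sigma'-style shrinking for x > 0
print(softplus(x))   # smooth approximation; its gradient is sigmoid(x)
```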

Page 68:

Some key differences
• Use of softmax and entropic loss instead of quadratic loss.
  – Often learning is faster and more stable, as well as getting better accuracies in the limit
• Use of alternate non-linearities
  – reLU and hyperbolic tangent
• Better understanding of weight initialization
• Data augmentation
  – Especially for image data

Page 69:

It's easy for sigmoid units to saturate

For a big network there are lots of weighted inputs to each neuron. If any of them are too large then the neuron will saturate. So neurons get stuck with a few large inputs OR many small ones.

Page 70:

It's easy for sigmoid units to saturate
• If there are 500 non-zero inputs initialized with a Gaussian ~N(0,1) then the SD is √500 ≈ 22.4
• Common heuristics for initializing weights (sketched below):

$$w \sim U\!\left[\frac{-1}{\sqrt{\#\text{inputs}}},\ \frac{1}{\sqrt{\#\text{inputs}}}\right] \qquad w \sim N\!\left(0,\ \frac{1}{\#\text{inputs}}\right)$$
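A numpy sketch of both heuristics (a sketch, assuming the Gaussian's second parameter is the variance 1/#inputs, i.e. a standard deviation of 1/√#inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 500, 30

# Uniform heuristic: U[-1/sqrt(n), +1/sqrt(n)]
bound = 1.0 / np.sqrt(n_inputs)
W_uniform = rng.uniform(-bound, bound, size=(n_inputs, n_units))

# Gaussian heuristic: N(0, 1/n), i.e. standard deviation 1/sqrt(n)
W_gauss = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), size=(n_inputs, n_units))

# Pre-activation SD is now ~1 instead of ~22.4, so units start unsaturated.
x = rng.standard_normal(n_inputs)
print((x @ W_gauss).std())
```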

Page 71:

It's easy for sigmoid units to saturate

• Saturation visualization from Glorot & Bengio 2010 using

$$w \sim U\!\left[\frac{-1}{\sqrt{\#\text{inputs}}},\ \frac{1}{\sqrt{\#\text{inputs}}}\right]$$

Page 72:

Initializing to avoid saturation
• Glorot and Bengio suggest initializing the weights of level j (with n_j inputs) from

$$w \sim U\!\left[\frac{-\sqrt{6}}{\sqrt{n_j + n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]$$

This is not always the solution - but good initialization is very important for deep nets!

The first breakthrough deep learning results were based on clever pre-training initialization schemes, where deep networks were seeded with weights learned from unsupervised strategies.
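A sketch of that normalized initialization for a single weight matrix (the layer sizes are invented):

```python
import numpy as np

def glorot_uniform(n_in, n_out, seed=0):
    """Glorot & Bengio's normalized initialization for one weight matrix."""
    rng = np.random.default_rng(seed)
    bound = np.sqrt(6.0) / np.sqrt(n_in + n_out)
    return rng.uniform(-bound, bound, size=(n_in, n_out))

W1 = glorot_uniform(784, 100)   # e.g. 784 MNIST inputs -> 100 hidden units
```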
