Machine Learning Neural Networks: Backpropagation 1 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

Page 1: Neural Networks: Backpropagation - sviveksvivek.com/.../fall2018/slides/neural-networks/neural-networks-backpropagation.pdfNeural Networks: Backpropagation 1 Based on slides and material

Machine Learning

Neural Networks: Backpropagation

Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

This lecture

• What is a neural network?

• Predicting with a neural network

• Training neural networks
– Backpropagation

• Practical concerns

Training a neural network

• Given
– A network architecture (layout of neurons, their connectivity and activations)
– A dataset of labeled examples
   • $S = \{(\mathbf{x}_i, y_i)\}$

• The goal: Learn the weights of the neural network

• Remember: For a fixed architecture, a neural network is a function parameterized by its weights
– Prediction: $y = NN(\mathbf{x}, \mathbf{w})$

Recall: Learning as loss minimization

We have a classifier NN that is completely defined by its weights. Learn the weights by minimizing a loss $L$:

$\min_{\mathbf{w}} \sum_i L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

Perhaps with a regularizer.

So far, we saw that this strategy worked for:
1. Logistic Regression
2. Support Vector Machines
3. Perceptron
4. LMS regression

All of these are linear models. Each minimizes a different loss function.

Same idea for non-linear models too!

Back to our running example

Given an input x, how is the output predicted?

$y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2$

$z_2 = \sigma(w^h_{02} + w^h_{12} x_1 + w^h_{22} x_2)$

$z_1 = \sigma(w^h_{01} + w^h_{11} x_1 + w^h_{21} x_2)$

Suppose the true label for this example is a number $y_i$.

We can write the square loss for this example as:

$L = \frac{1}{2}(y - y_i)^2$
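
The prediction and loss for the running example can be sketched in a few lines of Python. This is a minimal illustration, assuming $\sigma$ is the logistic function (as stated later in the deck); the dictionary layout and the weight values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def sigmoid(s):
    """Logistic function sigma(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

def forward(x1, x2, wh, wo):
    """Forward pass of the 2-input, 2-hidden-unit, 1-output network.

    wh[(i, j)] stands for w^h_{ij} (input i to hidden unit j, i = 0 is the
    bias); wo[(i, 1)] stands for w^o_{i1}. This layout is an assumption.
    """
    z1 = sigmoid(wh[(0, 1)] + wh[(1, 1)] * x1 + wh[(2, 1)] * x2)
    z2 = sigmoid(wh[(0, 2)] + wh[(1, 2)] * x1 + wh[(2, 2)] * x2)
    y = wo[(0, 1)] + wo[(1, 1)] * z1 + wo[(2, 1)] * z2
    return y, z1, z2

def square_loss(y, y_true):
    """L = 1/2 (y - y_true)^2."""
    return 0.5 * (y - y_true) ** 2

# Hypothetical weights: all hidden weights are zero, so z1 = z2 = 0.5.
wh = {(0, 1): 0.0, (1, 1): 0.0, (2, 1): 0.0,
      (0, 2): 0.0, (1, 2): 0.0, (2, 2): 0.0}
wo = {(0, 1): 0.5, (1, 1): 1.0, (2, 1): -1.0}

y, z1, z2 = forward(1.0, 2.0, wh, wo)   # y = 0.5 + 0.5 - 0.5 = 0.5
loss = square_loss(y, 1.0)              # 0.5 * (0.5 - 1.0)^2 = 0.125
```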

Learning as loss minimization

We have a classifier NN that is completely defined by its weights. Learn the weights by minimizing a loss $L$:

$\min_{\mathbf{w}} \sum_i L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

Perhaps with a regularizer.

How do we solve the optimization problem?

Stochastic gradient descent

$\min_{\mathbf{w}} \sum_i L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

Given a training set $S = \{(\mathbf{x}_i, y_i)\}$, $\mathbf{x} \in \Re^d$

1. Initialize parameters w
2. For epoch = 1 … T:
   1. Shuffle the training set
   2. For each training example $(\mathbf{x}_i, y_i) \in S$:
      • Treat this example as the entire dataset. Compute the gradient of the loss $\nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$
      • Update: $\mathbf{w} \leftarrow \mathbf{w} - \gamma_t \nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$
3. Return w

$\gamma_t$: learning rate, many tweaks possible

The objective is not convex. Initialization can be important

Have we solved everything?
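
The loop above can be sketched generically, with the gradient computation passed in as a function. This is only an illustration: the names `sgd`, `grad_fn`, `lr0` and `decay` are made up here, the decaying step size is one assumed choice for $\gamma_t$, and the toy model is a 1-D linear regressor (not a neural network) so that the gradient can be written by hand.

```python
import random

def sgd(examples, w, grad_fn, epochs=200, lr0=0.1, decay=0.001, seed=0):
    """Plain SGD: one update per example, reshuffled every epoch.

    grad_fn(w, x, y) returns the gradient of the per-example loss as a list.
    The schedule lr0 / (1 + decay * t) is an assumed choice for gamma_t.
    """
    rng = random.Random(seed)
    examples = list(examples)
    w = list(w)
    t = 0
    for _ in range(epochs):
        rng.shuffle(examples)                 # shuffle the training set
        for x, y in examples:                 # treat one example as the dataset
            g = grad_fn(w, x, y)
            lr = lr0 / (1.0 + decay * t)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
            t += 1
    return w

def linear_grad(w, x, y):
    """Gradient of 0.5 * (w0 + w1*x - y)^2 with respect to (w0, w1)."""
    err = w[0] + w[1] * x - y
    return [err, err * x]

# Noise-free toy data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-1.0, -0.5, 0.0, 0.5, 1.0]]
w = sgd(data, [0.0, 0.0], linear_grad)
```

On this convex toy problem the result ends up close to (1, 2). For a neural network the objective is not convex, so the outcome depends on the initialization.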

The derivative of the loss function?

$\nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

If the neural network is a differentiable function, we can find the gradient
– Or maybe its sub-gradient
– This is decided by the activation functions and the loss function

It was easy for SVMs and logistic regression
– Only one layer

But how do we find the sub-gradient of a more complex function?
– Eg: A recent paper used a ~150 layer neural network for image classification!

We need an efficient algorithm: Backpropagation

Checkpoint

Where are we?

If we have a neural network (structure, activations and weights), we can make a prediction for an input

If we had the true label of the input, then we can define the loss for that example

If we can take the derivative of the loss with respect to each of the weights, we can take a gradient step in SGD

Questions?

Reminder: Chain rule for derivatives

– If $z$ is a function of $y$ and $y$ is a function of $x$
   • Then $z$ is a function of $x$, as well
– Question: how to find $\frac{\partial z}{\partial x}$

– If $z$ = a function of $y_1$ + a function of $y_2$, and the $y_i$'s are functions of $x$
   • Then $z$ is a function of $x$, as well
– Question: how to find $\frac{\partial z}{\partial x}$

– If $z$ is a sum of functions of $y_i$'s, and the $y_i$'s are functions of $x$
   • Then $z$ is a function of $x$, as well
– Question: how to find $\frac{\partial z}{\partial x}$

Slide courtesy Richard Socher
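
The standard answers are $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}$ for a single intermediate variable, and $\frac{\partial z}{\partial x} = \sum_i \frac{\partial z}{\partial y_i}\frac{\partial y_i}{\partial x}$ when $z$ depends on $x$ through several $y_i$. The sketch below checks the second form numerically. The particular functions ($y_1 = x^2$, $y_2 = \sin x$, $z = e^{y_1} + 3 y_2$) are hypothetical and chosen only for illustration.

```python
import math

def z_of_x(x):
    """z = exp(y1) + 3*y2 with y1 = x^2 and y2 = sin(x)."""
    return math.exp(x ** 2) + 3.0 * math.sin(x)

def dz_dx_chain(x):
    """Chain rule: sum over both paths from x to z."""
    y1 = x ** 2
    dz_dy1, dy1_dx = math.exp(y1), 2.0 * x     # path through y1
    dz_dy2, dy2_dx = 3.0, math.cos(x)          # path through y2
    return dz_dy1 * dy1_dx + dz_dy2 * dy2_dx

def dz_dx_numeric(x, eps=1e-6):
    """Central finite difference, used only as an independent check."""
    return (z_of_x(x + eps) - z_of_x(x - eps)) / (2.0 * eps)

x = 0.7
analytic = dz_dx_chain(x)
numeric = dz_dx_numeric(x)
```

The two values should agree up to the finite-difference error.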

Backpropagation

$L = \frac{1}{2}(y - y^*)^2$

$y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2$

$z_2 = \sigma(w^h_{02} + w^h_{12} x_1 + w^h_{22} x_2)$

$z_1 = \sigma(w^h_{01} + w^h_{11} x_1 + w^h_{21} x_2)$

We want to compute $\frac{\partial L}{\partial w^o_{ij}}$ and $\frac{\partial L}{\partial w^h_{ij}}$

Applying the chain rule to compute the gradient (and remembering partial computations along the way to speed things up)

Output layer

Backpropagation example

$L = \frac{1}{2}(y - y^*)^2$

$y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2$

$\frac{\partial L}{\partial w^o_{01}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^o_{01}}$

$\frac{\partial L}{\partial y} = y - y^*$

$\frac{\partial y}{\partial w^o_{01}} = 1$

Output layer

Backpropagation example

$L = \frac{1}{2}(y - y^*)^2$

$y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2$

$\frac{\partial L}{\partial w^o_{11}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^o_{11}}$

$\frac{\partial L}{\partial y} = y - y^*$

We have already computed this partial derivative for the previous case. Cache to speed up!

$\frac{\partial y}{\partial w^o_{11}} = z_1$
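
The output-layer derivatives above can be sketched as a small function. It is a minimal illustration: the function name and the dictionary keys are made up here, and the entry for $w^o_{21}$ follows the same pattern as $w^o_{11}$ rather than being spelled out on the slides. Note that $\frac{\partial L}{\partial y}$ is computed once and reused.

```python
def output_layer_grads(y, y_true, z1, z2):
    """Gradients of L = 0.5 * (y - y_true)^2 w.r.t. the output weights.

    y is the network output; z1 and z2 are the hidden activations from the
    forward pass. dL/dy is cached and shared by all three weights.
    """
    dL_dy = y - y_true            # computed once, reused below
    return {
        "w01_o": dL_dy * 1.0,     # dy/dw01^o = 1 (bias weight)
        "w11_o": dL_dy * z1,      # dy/dw11^o = z1
        "w21_o": dL_dy * z2,      # dy/dw21^o = z2 (same pattern, assumed)
    }

# Hypothetical forward-pass values, for illustration only.
grads = output_layer_grads(y=0.5, y_true=1.0, z1=0.5, z2=0.5)
# dL/dy = -0.5, so the gradients are -0.5, -0.25 and -0.25.
```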

Hidden layer derivatives

Backpropagation example

$L = \frac{1}{2}(y - y^*)^2$

$y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2$

$z_2 = \sigma(w^h_{02} + w^h_{12} x_1 + w^h_{22} x_2)$

$z_1 = \sigma(w^h_{01} + w^h_{11} x_1 + w^h_{21} x_2)$

We want $\frac{\partial L}{\partial w^h_{22}}$

Hidden layer

Backpropagation example

$L = \frac{1}{2}(y - y^*)^2$

$y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2$

$\frac{\partial L}{\partial w^h_{22}} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w^h_{22}}$

$= \frac{\partial L}{\partial y} \frac{\partial}{\partial w^h_{22}} (w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2)$

$= \frac{\partial L}{\partial y} \left( w^o_{11} \frac{\partial}{\partial w^h_{22}} z_1 + w^o_{21} \frac{\partial}{\partial w^h_{22}} z_2 \right)$

$z_1$ is not a function of $w^h_{22}$, so the first term is 0.

$= \frac{\partial L}{\partial y} w^o_{21} \frac{\partial z_2}{\partial w^h_{22}}$

$z_2 = \sigma(w^h_{02} + w^h_{12} x_1 + w^h_{22} x_2)$. Call the argument of $\sigma$ $s$.

$\frac{\partial L}{\partial w^h_{22}} = \frac{\partial L}{\partial y} w^o_{21} \frac{\partial z_2}{\partial s} \frac{\partial s}{\partial w^h_{22}}$

Each of these partial derivatives is easy

$\frac{\partial L}{\partial y} = y - y^*$

$\frac{\partial z_2}{\partial s} = z_2 (1 - z_2)$
Why? Because $z_2(s)$ is the logistic function we have already seen

$\frac{\partial s}{\partial w^h_{22}} = x_2$

More important: We have already computed many of these partial derivatives because we are proceeding from top to bottom (i.e. backwards)
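
The derivation for $\frac{\partial L}{\partial w^h_{22}}$ can be checked numerically. The sketch below multiplies the three factors from the derivation and compares the result with a central finite difference. The parameter names (`w01_h`, `w21_o`, and so on), the weight values and the input are all hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(p, x1, x2):
    """Forward pass of the running example; p maps weight names to values."""
    z1 = sigmoid(p["w01_h"] + p["w11_h"] * x1 + p["w21_h"] * x2)
    z2 = sigmoid(p["w02_h"] + p["w12_h"] * x1 + p["w22_h"] * x2)
    y = p["w01_o"] + p["w11_o"] * z1 + p["w21_o"] * z2
    return y, z1, z2

def grad_w22_h(p, x1, x2, y_true):
    """dL/dw22^h = dL/dy * w21^o * dz2/ds * ds/dw22^h."""
    y, _, z2 = forward(p, x1, x2)
    dL_dy = y - y_true              # from the square loss
    dz2_ds = z2 * (1.0 - z2)        # derivative of the logistic function
    ds_dw22_h = x2                  # s is linear in w22^h
    return dL_dy * p["w21_o"] * dz2_ds * ds_dw22_h

def numeric_grad_w22_h(p, x1, x2, y_true, eps=1e-6):
    """Central finite difference on w22^h, used only as a check."""
    def loss_at(w):
        y, _, _ = forward(dict(p, w22_h=w), x1, x2)
        return 0.5 * (y - y_true) ** 2
    return (loss_at(p["w22_h"] + eps) - loss_at(p["w22_h"] - eps)) / (2.0 * eps)

# Hypothetical weights and a single hypothetical example.
p = {"w01_h": 0.1, "w11_h": -0.2, "w21_h": 0.3,
     "w02_h": -0.1, "w12_h": 0.4, "w22_h": 0.25,
     "w01_o": 0.2, "w11_o": -0.5, "w21_o": 0.7}
x1, x2, y_true = 1.0, -2.0, 1.0

analytic = grad_w22_h(p, x1, x2, y_true)
numeric = numeric_grad_w22_h(p, x1, x2, y_true)
```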

The Backpropagation Algorithm

The same algorithm works for multiple layers

Repeated application of the chain rule for partial derivatives
– First perform forward pass from inputs to the output
– Compute loss
– From the loss, proceed backwards to compute partial derivatives using the chain rule
– Cache partial derivatives as you compute them
   • Will be used for lower layers
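
The steps above can be sketched for a network with any number of layers. This is a minimal pure-Python illustration, assuming logistic hidden units, a linear output layer and the square loss, as in the running example. The `(W, b)` list representation and the function names are choices made here, not something the slides prescribe.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(layers, x):
    """Return the activations of every layer, input included.

    layers is a list of (W, b) pairs; W[j][i] connects unit i of the layer
    below to unit j. Hidden layers are logistic; the last layer is linear.
    """
    acts = [list(x)]
    for k, (W, b) in enumerate(layers):
        s = [bj + sum(wji * ai for wji, ai in zip(row, acts[-1]))
             for row, bj in zip(W, b)]
        is_last = (k == len(layers) - 1)
        acts.append(s if is_last else [sigmoid(v) for v in s])
    return acts

def backward(layers, acts, y_true):
    """Gradients of L = 0.5 * sum((y - y_true)^2) for every W and b."""
    # delta caches dL/ds for the current layer (s = pre-activation).
    delta = [y - t for y, t in zip(acts[-1], y_true)]
    grads = [None] * len(layers)
    for k in reversed(range(len(layers))):       # top of the network first
        W, _ = layers[k]
        below = acts[k]
        grads[k] = ([[d * a for a in below] for d in delta], list(delta))
        if k > 0:
            # Chain rule: push dL/ds down through W, then through the
            # logistic units below, whose derivative is a * (1 - a).
            delta = [sum(W[j][i] * delta[j] for j in range(len(W)))
                     * below[i] * (1.0 - below[i])
                     for i in range(len(below))]
    return grads

# The running example with hypothetical weights.
layers = [
    ([[-0.2, 0.3], [0.4, 0.25]], [0.1, -0.1]),   # hidden layer: rows z1, z2
    ([[-0.5, 0.7]], [0.2]),                      # output layer
]
x, y_true = [1.0, -2.0], [1.0]
acts = forward(layers, x)
grads = backward(layers, acts, y_true)
# grads[0][0][1][1] is dL/dw22^h in the notation of the slides.
```

The cached `delta` is what makes the backward pass cheap: each layer reuses the partial derivatives already computed for the layer above it.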

Mechanizing learning

• Backpropagation gives you the gradient that will be used for gradient descent
– SGD gives us a generic learning algorithm
– Backpropagation is a generic method for computing partial derivatives

• A recursive algorithm that proceeds from the top of the network to the bottom

• Modern neural network libraries implement automatic differentiation using backpropagation
– Allows easy exploration of network architectures
– Don't have to keep deriving the gradients by hand each time
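
To give a feel for what automatic differentiation does, here is a toy reverse-mode sketch in plain Python. It is not how any particular library is implemented. The `Value` class and its methods are invented for this illustration, and it supports only addition, multiplication and the logistic function. Each operation records its inputs and local derivatives, and `backward` applies the chain rule from the top of the graph down.

```python
import math

class Value:
    """A scalar that remembers how it was computed."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents            # Values this one was built from
        self._local_grads = local_grads    # d(self)/d(parent), per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def sigmoid(self):
        out = 1.0 / (1.0 + math.exp(-self.data))
        return Value(out, (self,), (out * (1.0 - out),))

    def backward(self):
        """Accumulate d(self)/d(node) into node.grad for every node."""
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent in v._parents:
                    visit(parent)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):          # top of the graph first
            for parent, g in zip(v._parents, v._local_grads):
                parent.grad += g * v.grad

# The running example with hypothetical weights; only w22_h is named so
# that its gradient can be read back afterwards.
w22_h = Value(0.25)
x1, x2, y_true = 1.0, -2.0, 1.0
z1 = (Value(0.1) + Value(-0.2) * x1 + Value(0.3) * x2).sigmoid()
z2 = (Value(-0.1) + Value(0.4) * x1 + w22_h * x2).sigmoid()
y = Value(0.2) + Value(-0.5) * z1 + Value(0.7) * z2
diff = y + (-y_true)
loss = 0.5 * diff * diff
loss.backward()
# w22_h.grad now holds dL/dw22^h without any hand derivation.
```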

Stochastic gradient descent

$\min_{\mathbf{w}} \sum_i L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$

Given a training set $S = \{(\mathbf{x}_i, y_i)\}$, $\mathbf{x} \in \Re^d$

1. Initialize parameters w
2. For epoch = 1 … T:
   1. Shuffle the training set
   2. For each training example $(\mathbf{x}_i, y_i) \in S$:
      • Treat this example as the entire dataset
      • Compute the gradient of the loss $\nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$ using backpropagation
      • Update: $\mathbf{w} \leftarrow \mathbf{w} - \gamma_t \nabla L(NN(\mathbf{x}_i, \mathbf{w}), y_i)$
3. Return w

$\gamma_t$: learning rate, many tweaks possible

The objective is not convex. Initialization can be important