![Page 1: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/1.jpg)
Deep Residual Networks: Deep Learning Gets Way Deeper
8:30-10:30am, June 19, ICML 2016 tutorial
Kaiming He, Facebook AI Research*
*as of July 2016. Formerly affiliated with Microsoft Research Asia
Overview
• Introduction
• Background
  • From shallow to deep
• Deep Residual Networks
  • From 10 layers to 100 layers
  • From 100 layers to 1000 layers
• Applications
• Q&A
Introduction
Deep Residual Networks (ResNets)
• “Deep Residual Learning for Image Recognition”. CVPR 2016 (next week)
• A simple and clean framework of training “very” deep nets
• State-of-the-art performance for
  • Image classification
  • Object detection
  • Semantic segmentation
  • and more…
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
ResNets @ ILSVRC & COCO 2015 Competitions
• 1st places in all five main tracks
  • ImageNet Classification: “Ultra-deep” 152-layer nets
  • ImageNet Detection: 16% better than 2nd
  • ImageNet Localization: 27% better than 2nd
  • COCO Detection: 11% better than 2nd
  • COCO Segmentation: 12% better than 2nd
*improvements are relative numbers
Revolution of Depth
[Bar chart: ImageNet Classification top-5 error (%)]
• ILSVRC'15 ResNet: 3.57 (152 layers)
• ILSVRC'14 GoogleNet: 6.7 (22 layers)
• ILSVRC'14 VGG: 7.3 (19 layers)
• ILSVRC'13: 11.7 (8 layers)
• ILSVRC'12 AlexNet: 16.4 (8 layers)
• ILSVRC'11: 25.8 (shallow)
• ILSVRC'10: 28.2 (shallow)
Revolution of Depth
AlexNet, 8 layers (ILSVRC 2012)
11x11 conv, 96, /4, pool/2
5x5 conv, 256, pool/2
3x3 conv, 384
3x3 conv, 384
3x3 conv, 256, pool/2
fc, 4096
fc, 4096
fc, 1000
Revolution of Depth
VGG, 19 layers (ILSVRC 2014)
3x3 conv, 64
3x3 conv, 64, pool/2
3x3 conv, 128
3x3 conv, 128, pool/2
3x3 conv, 256
3x3 conv, 256
3x3 conv, 256
3x3 conv, 256, pool/2
3x3 conv, 512
3x3 conv, 512
3x3 conv, 512
3x3 conv, 512, pool/2
3x3 conv, 512
3x3 conv, 512
3x3 conv, 512
3x3 conv, 512, pool/2
fc, 4096
fc, 4096
fc, 1000
GoogleNet, 22 layers (ILSVRC 2014)
[Architecture diagram: input, a 7x7 conv stem with max pooling and local response normalization, then nine modules of parallel 1x1, 3x3, and 5x5 conv and 3x3 max-pool branches, each joined by DepthConcat, then 7x7 average pooling, FC, and softmax (softmax2), plus two auxiliary classifier branches (softmax0, softmax1)]
Revolution of Depth
ResNet, 152 layers (ILSVRC 2015)
7x7 conv, 64, /2, pool/2
[1x1 conv, 64; 3x3 conv, 64; 1x1 conv, 256] x 3
[1x1 conv, 128; 3x3 conv, 128; 1x1 conv, 512] x 8 (first 1x1 conv with /2)
[1x1 conv, 256; 3x3 conv, 256; 1x1 conv, 1024] x 36 (first 1x1 conv with /2)
[1x1 conv, 512; 3x3 conv, 512; 1x1 conv, 2048] x 3 (first 1x1 conv with /2)
ave pool, fc 1000
(shown next to AlexNet, 8 layers (ILSVRC 2012), and VGG, 19 layers (ILSVRC 2014))
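The figure of 152 layers can be checked by counting the weight layers in the ResNet listing on this slide. The short sketch below does that arithmetic; the variable names are illustrative, and the convention of counting only conv and fc layers (not pooling) is an assumption that matches the totals quoted on the slides.

```python
# Bottleneck blocks per stage, read off the slide's layer listing.
blocks_per_stage = [3, 8, 36, 3]
convs_per_block = 3  # 1x1 conv, 3x3 conv, 1x1 conv

stem_layers = 1        # the initial 7x7 conv
classifier_layers = 1  # the final fc 1000

total_layers = stem_layers + convs_per_block * sum(blocks_per_stage) + classifier_layers
print(total_layers)
```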
Revolution of Depth
[Bar chart: PASCAL VOC 2007 Object Detection mAP (%)]
• HOG, DPM: 34 (shallow)
• AlexNet (RCNN): 58 (8 layers)
• VGG (RCNN): 66 (16 layers)
• ResNet (Faster RCNN)*: 86 (101 layers)
*w/ other improvements & more data
Engines of visual recognition
ResNet’s object detection result on COCO
*the original image is from the COCO dataset
Very simple, easy to follow
• Many third-party implementations (list in https://github.com/KaimingHe/deep-residual-networks)
  • Facebook AI Research’s Torch ResNet:
  • Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves: code
  • Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code: code
  • Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves: code
  • Torch, MNIST, 100 layers: blog, code
  • A winning entry in Kaggle's right whale recognition challenge: blog, code
  • Neon, Place2 (mini), 40 layers: blog, code
  • …
• Easily reproduced results (e.g. Torch ResNet: https://github.com/facebook/fb.resnet.torch)
• A series of extensions and follow-ups
  • >200 citations in 6 months after posted on arXiv (Dec. 2015)
Background
From shallow to deep
Traditional recognition
[Diagram: recognition pipelines, ordered from shallower to deeper]
• pixels → classifier → “bus”?
• edges → classifier → “bus”?
• edges → histogram (SIFT/HOG) → classifier → “bus”?
• edges → histogram → K-means/sparse code → classifier → “bus”?
But what’s next?
Deep Learning
• edges → histogram → K-means/sparse code → classifier → “bus”?
  Specialized components, domain knowledge required
• stacked layers → “bus”?
  Generic components (“layers”), less domain knowledge
• Repeat elementary layers => Going deeper
  • End-to-end learning
  • Richer solution space
Spectrum of Depth
(from shallower to deeper)
• 5 layers: easy
• >10 layers: initialization, Batch Normalization
• >30 layers: skip connections
• >100 layers: identity skip connections
• >1000 layers: ?
Initialization
LeCun et al 1998 “Efficient Backprop”; Glorot & Bengio 2010 “Understanding the difficulty of training deep feedforward neural networks”
input $X$, weight $W$, output $Y = WX$, where a layer has $n^{in}$ inputs and $n^{out}$ outputs
If:
• Linear activation
• $x$, $y$, $w$: independent
Then:
1-layer: $Var[y] = (n^{in}\, Var[w])\, Var[x]$
Multi-layer: $Var[y] = \left(\prod_d n_d^{in}\, Var[w_d]\right) Var[x]$
Initialization
[Plot: signal magnitude vs. depth (1 to 15), with exploding, ideal, and vanishing curves]
Forward: $Var[y] = \left(\prod_d n_d^{in}\, Var[w_d]\right) Var[x]$
Backward: $Var\left[\frac{\partial}{\partial x}\right] = \left(\prod_d n_d^{out}\, Var[w_d]\right) Var\left[\frac{\partial}{\partial y}\right]$
Both forward (response) and backward (gradient) signal can vanish/explode
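As a rough numerical check of the forward formula, the sketch below pushes one random vector through a stack of linear layers using only the Python standard library. The function name `forward_variance`, the width of 64, the depth of 15, and the three values chosen for $n\,Var[w]$ are illustrative assumptions, not part of the tutorial.

```python
import random

def forward_variance(depth, n, weight_var, seed=0):
    """Push one random input through `depth` linear layers of width `n`
    and return the empirical variance of the final response."""
    rng = random.Random(seed)
    std = weight_var ** 0.5
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Var[x] is roughly 1
    for _ in range(depth):
        # y = W x, every weight drawn i.i.d. from N(0, weight_var)
        x = [sum(rng.gauss(0.0, std) * xi for xi in x) for _ in range(n)]
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / n

n, depth = 64, 15
vanishing = forward_variance(depth, n, 0.5 / n)  # n * Var[w] = 0.5
ideal = forward_variance(depth, n, 1.0 / n)      # n * Var[w] = 1
exploding = forward_variance(depth, n, 2.0 / n)  # n * Var[w] = 2
print(vanishing, ideal, exploding)
```

Because the three runs share a seed, the first and third results differ from the middle one by factors of about $0.5^{15}$ and $2^{15}$, which is the per-layer factor compounding over depth.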
Initialization
• Initialization under linear assumption
$\prod_d n_d^{in}\, Var[w_d] = const_{fw}$ (healthy forward) and
$\prod_d n_d^{out}\, Var[w_d] = const_{bw}$ (healthy backward)
$n_d^{in}\, Var[w_d] = 1$ or* $n_d^{out}\, Var[w_d] = 1$
*: $n_d^{out} = n_{d+1}^{in}$, so $\frac{const_{fw}}{const_{bw}} = \frac{n_1^{in}}{n_D^{out}} < \infty$. It is sufficient to use either form.
“Xavier” init in Caffe
Initialization
• Initialization under ReLU activation
$\prod_d \frac{1}{2} n_d^{in}\, Var[w_d] = const_{fw}$ (healthy forward) and
$\prod_d \frac{1}{2} n_d^{out}\, Var[w_d] = const_{bw}$ (healthy backward)
$\frac{1}{2} n_d^{in}\, Var[w_d] = 1$ or $\frac{1}{2} n_d^{out}\, Var[w_d] = 1$
With $D$ layers, a factor of 2 per layer has exponential impact of $2^D$
“MSRA” init in Caffe
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”. ICCV 2015.
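The $2^D$ effect can be seen in a small simulation. The sketch below, assuming a fully connected ReLU net of width 64 and depth 30 (both arbitrary choices), compares weights drawn with $n\,Var[w] = 1$ against weights drawn with $\frac{1}{2} n\,Var[w] = 1$. The function name is hypothetical and only the Python standard library is used.

```python
import random

def last_preactivation_power(depth, n, weight_var, seed=0):
    """Run a random ReLU net and return the mean squared value of the
    last layer's pre-activations (the response before the final ReLU)."""
    rng = random.Random(seed)
    std = weight_var ** 0.5
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    pre = x
    for _ in range(depth):
        pre = [sum(rng.gauss(0.0, std) * xi for xi in x) for _ in range(n)]
        x = [max(0.0, p) for p in pre]  # ReLU
    return sum(p * p for p in pre) / n

n, depth = 64, 30
xavier = last_preactivation_power(depth, n, 1.0 / n)  # n * Var[w] = 1
msra = last_preactivation_power(depth, n, 2.0 / n)    # (1/2) * n * Var[w] = 1
print(xavier, msra, msra / xavier)
```

With a shared seed the two runs use the same random directions, so the ratio of the two results is $2^{30}$ up to rounding: the missing factor of 2 per layer compounds exactly as the slide states.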
Initialization
[Plots: training curves comparing ours ($\frac{1}{2} n\, Var[w] = 1$) with Xavier ($n\, Var[w] = 1$)]
• 22-layer ReLU net: good init converges faster
• 30-layer ReLU net: good init is able to converge
*Figures show the beginning of training
Batch Normalization (BN)
• Normalizing input (LeCun et al 1998 “Efficient Backprop”)
• BN: normalizing each layer, for each mini-batch
• Greatly accelerate training
• Less sensitive to initialization
• Improve regularization
S. Ioffe & C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML 2015
Batch Normalization (BN)
layer → $x$ → $\hat{x} = \frac{x - \mu}{\sigma}$ → $y = \gamma \hat{x} + \beta$
• $\mu$: mean of $x$ in mini-batch
• $\sigma$: std of $x$ in mini-batch
• $\gamma$: scale
• $\beta$: shift
• $\mu$, $\sigma$: functions of $x$, analogous to responses
• $\gamma$, $\beta$: parameters to be learned, analogous to weights
Batch Normalization (BN)
layer → $x$ → $\hat{x} = \frac{x - \mu}{\sigma}$ → $y = \gamma \hat{x} + \beta$
2 modes of BN:
• Train mode:
  • $\mu$, $\sigma$ are functions of $x$; backprop gradients
• Test mode:
  • $\mu$, $\sigma$ are pre-computed* on training set
*: by running average, or post-processing after training
Caution: make sure your BN is in a correct mode
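The two modes can be illustrated with a minimal sketch for a single scalar feature. The class `BatchNorm1D`, its `momentum` and `eps` parameters, and the running-average update rule are assumptions made for illustration; real frameworks differ in details, and gradients are not implemented here.

```python
import math

class BatchNorm1D:
    """Batch Normalization for one scalar feature (hypothetical minimal class)."""

    def __init__(self, momentum=0.9, eps=1e-5):
        self.gamma, self.beta = 1.0, 0.0                # learned scale and shift
        self.running_mean, self.running_var = 0.0, 1.0  # statistics used in test mode
        self.momentum, self.eps = momentum, eps
        self.training = True

    def __call__(self, batch):
        if self.training:
            # Train mode: mu and sigma are functions of the current mini-batch.
            mu = sum(batch) / len(batch)
            var = sum((v - mu) ** 2 for v in batch) / len(batch)
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mu
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Test mode: mu and sigma are the pre-computed running averages.
            mu, var = self.running_mean, self.running_var
        sigma = math.sqrt(var + self.eps)
        return [self.gamma * (v - mu) / sigma + self.beta for v in batch]

bn = BatchNorm1D()
train_out = bn([1.0, 2.0, 3.0, 4.0])  # normalized with batch statistics
bn.training = False
test_out = bn([10.0])                 # normalized with running statistics
print(train_out, test_out)
```

In train mode a batch of one sample would always normalize to zero, which is one way a wrong mode at test time silently corrupts predictions.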
Batch Normalization (BN)
[Plot: accuracy vs. iter., comparing w/o BN with the best of w/ BN. Figure taken from [S. Ioffe & C. Szegedy]]
Deep Residual Networks
From 10 layers to 100 layers
Going Deeper
• Initialization algorithms ✓
• Batch Normalization ✓
• Is learning better networks as simple as stacking more layers?
Simply stacking layers?
[Plots: CIFAR-10 train error (%) and test error (%) vs. iter. (1e4), for 20-layer and 56-layer nets]
• Plain nets: stacking 3x3 conv layers…
• 56-layer net has higher training error and test error than 20-layer net
Simply stacking layers?
[Plots: error (%) vs. iter. (1e4); solid: test/val, dashed: train]
• CIFAR-10: plain-20, plain-32, plain-44, plain-56
• ImageNet-1000: plain-18, plain-34
• “Overly deep” plain nets have higher training error
• A general phenomenon, observed in many datasets
[Diagram: a shallower model (18 layers): 7x7 conv, 64, /2; 3x3 conv, 64 (x4); 3x3 conv, 128 (x4, first with /2); 3x3 conv, 256 (x4, first with /2); 3x3 conv, 512 (x4, first with /2); fc 1000]
[Diagram: a deeper counterpart (34 layers), with “extra” layers: 7x7 conv, 64, /2; 3x3 conv, 64 (x6); 3x3 conv, 128 (x8, first with /2); 3x3 conv, 256 (x12, first with /2); 3x3 conv, 512 (x6, first with /2); fc 1000]
• Richer solution space
• A deeper model should not have higher training error
• A solution by construction:
  • original layers: copied from a learned shallower model
  • extra layers: set as identity
  • at least the same training error
• Optimization difficulties: solvers cannot find the solution when going deeper…
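The solution by construction can be made concrete with a toy example. The sketch below uses tiny fully connected ReLU layers with made-up weights (an assumption; the slide's networks are convolutional) and shows that inserting identity layers leaves the output unchanged.

```python
def relu_layer(weights, x):
    """One fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def forward(layers, x):
    for weights in layers:
        x = relu_layer(weights, x)
    return x

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# A "learned" shallower model (weights are arbitrary illustrative values).
shallow = [
    [[0.5, -1.0], [1.5, 0.25]],
    [[1.0, 2.0], [-0.5, 1.0]],
]
# Deeper counterpart: original layers copied, extra layers set as identity.
# The extra layers sit after a ReLU, so their inputs are non-negative and
# ReLU(identity(x)) == x.
deeper = [shallow[0], identity(2), identity(2), shallow[1], identity(2)]

x = [1.0, 2.0]
print(forward(shallow, x), forward(deeper, x))
```

The deeper model reproduces the shallower one exactly, so its best achievable training error is at least as low; the difficulty lies in whether a solver finds such a solution.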
Deep Residual Learning
• Plain net
[Diagram: any two stacked layers: $x$ → weight layer → relu → weight layer → relu → $H(x)$]
$H(x)$ is any desired mapping,
hope the 2 weight layers fit $H(x)$
Deep Residual Learning
• Residual net
[Diagram: $x$ → weight layer → relu → weight layer gives $F(x)$; an identity shortcut carries $x$ around both layers; the sum $H(x) = F(x) + x$ is followed by relu]
$H(x)$ is any desired mapping,
rather than hope the 2 weight layers fit $H(x)$,
hope the 2 weight layers fit $F(x)$
let $H(x) = F(x) + x$
Deep Residual Learning
• $F(x)$ is a residual mapping w.r.t. identity
• If identity were optimal, easy to set weights as 0
• If optimal mapping is closer to identity, easier to find small fluctuations
[Diagram: residual block: $x$ → weight layer → relu → weight layer gives $F(x)$; identity shortcut $x$; $H(x) = F(x) + x$ followed by relu]
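A minimal sketch of the point about setting weights to 0, assuming small fully connected layers in place of conv layers and a non-negative input (as it would be after a preceding ReLU). The function names are hypothetical.

```python
def matvec(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(v):
    return [max(0.0, vi) for vi in v]

def plain_block(w1, w2, x):
    """Two stacked weight layers, each followed by ReLU: the layers must fit H(x)."""
    return relu(matvec(w2, relu(matvec(w1, x))))

def residual_block(w1, w2, x):
    """The two weight layers fit F(x); the block outputs relu(F(x) + x)."""
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])

zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 3.0]  # non-negative, as after an earlier ReLU

print(plain_block(zeros, zeros, x))     # all-zero weights wipe out the signal
print(residual_block(zeros, zeros, x))  # all-zero weights give the identity
```

In the plain block, identity requires specific non-trivial weights, while in the residual block it is the all-zero solution, which is the sense in which identity is easy to reach.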
Related Works – Residual Representations
• VLAD & Fisher Vector [Jegou et al 2010], [Perronnin et al 2007]
  • Encoding residual vectors; powerful shallower representations.
• Product Quantization (IVF-ADC) [Jegou et al 2011]
  • Quantizing residual vectors; efficient nearest-neighbor search.
• MultiGrid & Hierarchical Precondition [Briggs, et al 2000], [Szeliski 1990, 2006]
  • Solving residual sub-problems; efficient PDE solvers.
Network “Design”
[Diagrams: plain net and ResNet, both with the same 34 layers: 7x7 conv, 64, /2; pool, /2; 3x3 conv, 64 (x6); 3x3 conv, 128 (x8, first with /2); 3x3 conv, 256 (x12, first with /2); 3x3 conv, 512 (x6, first with /2); avg pool; fc 1000]
• Keep it simple
• Our basic design (VGG-style)
  • all 3x3 conv (almost)
  • spatial size /2 => # filters x2 (~same complexity per layer)
  • Simple design; just deep!
• Other remarks:
  • no hidden fc
  • no dropout
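The rule "spatial size /2 => # filters x2" keeps the cost of each 3x3 conv layer roughly constant. The sketch below counts multiply-adds for two such layers; the feature map sizes of 56x56 and 28x28 are an assumption (they correspond to a 224x224 input, which the slide does not state), and the helper name is hypothetical.

```python
def conv_multiply_adds(height, width, kernel, c_in, c_out):
    """Multiply-adds of one conv layer producing a height x width output map."""
    return height * width * kernel * kernel * c_in * c_out

# 3x3 conv, 64 filters on an assumed 56x56 map.
stage_64 = conv_multiply_adds(56, 56, 3, 64, 64)
# After spatial size /2 and # filters x2: 3x3 conv, 128 filters on a 28x28 map.
stage_128 = conv_multiply_adds(28, 28, 3, 128, 128)

print(stage_64, stage_128)
```

Halving both spatial dimensions divides the number of output positions by 4, while doubling input and output channels multiplies the per-position cost by 4, so the two cancel.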
Training
• All plain/residual nets are trained from scratch
• All plain/residual nets use Batch Normalization
• Standard hyper-parameters & augmentation
CIFAR-10 experiments
[Plots: error (%) vs. iter. (1e4); solid: test, dashed: train]
• CIFAR-10 plain nets: plain-20, plain-32, plain-44, plain-56
• CIFAR-10 ResNets: ResNet-20, ResNet-32, ResNet-44, ResNet-56, ResNet-110
• Deep ResNets can be trained without difficulties
• Deeper ResNets have lower training error, and also lower test error
ImageNet experiments
[Plots: error (%) vs. iter. (1e4); solid: test, dashed: train]
• ImageNet plain nets: plain-18, plain-34
• ImageNet ResNets: ResNet-18, ResNet-34
• Deep ResNets can be trained without difficulties
• Deeper ResNets have lower training error, and also lower test error
ImageNet experiments
• A practical design of going deeper
[Diagram: two residual blocks of similar complexity]
• all-3x3: 64-d input → 3x3, 64 → relu → 3x3, 64 → add shortcut → relu
• bottleneck (for ResNet-50/101/152): 256-d input → 1x1, 64 → relu → 3x3, 64 → relu → 1x1, 256 → add shortcut → relu
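The "similar complexity" claim can be checked by counting conv weights in the two blocks. This is a back-of-the-envelope sketch that ignores biases and Batch Normalization parameters (an assumption); the helper name is hypothetical.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel conv layer, biases ignored."""
    return kernel * kernel * c_in * c_out

# all-3x3 block on a 64-d input: two 3x3 convs with 64 filters each.
all_3x3 = conv_weights(3, 64, 64) + conv_weights(3, 64, 64)

# bottleneck block on a 256-d input: 1x1 reduce, 3x3, 1x1 restore.
bottleneck = (
    conv_weights(1, 256, 64)
    + conv_weights(3, 64, 64)
    + conv_weights(1, 64, 256)
)

print(all_3x3, bottleneck)
```

The bottleneck block handles a 4x wider input and output with slightly fewer weights, and it adds one more layer of depth per block.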
ImageNet experiments
[Bar chart: 10-crop testing, top-5 val error (%)]
• ResNet-34: 7.4
• ResNet-50: 6.7
• ResNet-101: 6.1
• ResNet-152: 5.7 (this model has lower time complexity than VGG-16/19)
• Deeper ResNets have lower error
ImageNet experiments
ImageNet Classification top-5 error (%)

| entry | depth | top-5 error (%) |
| --- | --- | --- |
| ILSVRC'15 ResNet | 152 layers | 3.57 |
| ILSVRC'14 GoogleNet | 22 layers | 6.7 |
| ILSVRC'14 VGG | 19 layers | 7.3 |
| ILSVRC'13 | 8 layers | 11.7 |
| ILSVRC'12 AlexNet | 8 layers | 16.4 |
| ILSVRC'11 | shallow | 25.8 |
| ILSVRC'10 | shallow | 28.2 |
![Page 42: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/42.jpg)
Discussions
Representation, Optimization, Generalization
![Page 43: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/43.jpg)
Issues on learning deep models
• Representation ability
  • Ability of model to fit training data, if optimum could be found
  • If model A’s solution space is a superset of B’s, A should be better.
• Optimization ability
  • Feasibility of finding an optimum
  • Not all models are equally easy to optimize
• Generalization ability
  • Once training data is fit, how good is the test performance
![Page 44: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/44.jpg)
How do ResNets address these issues?
• Representation ability
  • No explicit advantage on representation (only re-parameterization), but
  • Allow models to go deeper
• Optimization ability
  • Enable very smooth forward/backward prop
  • Greatly ease optimizing deeper models
• Generalization ability
  • Not explicitly address generalization, but
  • Deeper + thinner is good generalization
![Page 45: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/45.jpg)
On the Importance of Identity Mapping
From 100 layers to 1000 layers
![Page 46: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/46.jpg)
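On identity mappings for optimization
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Identity Mappings in Deep Residual Networks”. arXiv 2016.
$x_{l+1} = f(h(x_l) + F(x_l))$
[Figure: a residual unit from $x_l$ to $x_{l+1}$; the shortcut carries $h(x_l)$ and the two layers compute $F(x_l)$.]
• shortcut mapping: $h$ = identity
• after-add mapping: $f$ = ReLU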
![Page 47: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/47.jpg)
On identity mappings for optimization
$x_{l+1} = f(h(x_l) + F(x_l))$
[Figure: a residual unit from $x_l$ to $x_{l+1}$; the shortcut carries $h(x_l)$ and the two layers compute $F(x_l)$.]
• shortcut mapping: $h$ = identity
• after-add mapping: $f$ = ReLU
• What if $f$ = identity?
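The update rule above, with the shortcut mapping and the after-add mapping as pluggable pieces, can be sketched on scalars. This is an illustration only: `residual_unit` and the residual branch `F` are hypothetical, not the networks used in the experiments.

```python
def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def residual_unit(x, F, h=identity, f=relu):
    """One unit: x_{l+1} = f(h(x_l) + F(x_l)), on a scalar for simplicity."""
    return f(h(x) + F(x))

# Hypothetical residual branch, chosen only for illustration.
F = lambda x: -2.0 * x

x_l = 1.5
original = residual_unit(x_l, F, h=identity, f=relu)      # f = ReLU
variant = residual_unit(x_l, F, h=identity, f=identity)   # f = identity
print(original, variant)  # 0.0 -1.5
```

With $f$ = ReLU the sum is clipped at zero; with $f$ = identity the sum passes to the next unit unchanged.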
![Page 49: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/49.jpg)
Very smooth forward propagation
$x_{l+1} = x_l + F(x_l)$
$x_{l+2} = x_{l+1} + F(x_{l+1})$
![Page 50: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/50.jpg)
Very smooth forward propagation
$x_{l+1} = x_l + F(x_l)$
$x_{l+2} = x_{l+1} + F(x_{l+1})$
$x_{l+2} = x_l + F(x_l) + F(x_{l+1})$
![Page 51: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/51.jpg)
Very smooth forward propagation
$x_{l+1} = x_l + F(x_l)$
$x_{l+2} = x_{l+1} + F(x_{l+1})$
$x_{l+2} = x_l + F(x_l) + F(x_{l+1})$
$x_L = x_l + \sum_{i=l}^{L-1} F(x_i)$
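The unrolled sum can be checked numerically by applying the one-step recursion and comparing against the closed form. A minimal sketch, assuming scalar features and a made-up residual branch `F(i, x)` (hypothetical, for illustration only):

```python
import math

def F(i, x):
    """Hypothetical residual branch of unit i (illustration only)."""
    return 0.1 * (i + 1) * math.tanh(x)

def forward(x_l, l, L):
    """Apply x_{i+1} = x_i + F(x_i) for i = l, ..., L-1; return all states."""
    xs = {l: x_l}
    for i in range(l, L):
        xs[i + 1] = xs[i] + F(i, xs[i])
    return xs

l, L = 2, 10
xs = forward(0.7, l, L)

# Closed form: x_L = x_l + sum_{i=l}^{L-1} F(x_i)
unrolled = xs[l] + sum(F(i, xs[i]) for i in range(l, L))
print(math.isclose(xs[L], unrolled))  # True
```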
![Page 52: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/52.jpg)
Very smooth forward propagation
$x_L = x_l + \sum_{i=l}^{L-1} F(x_i)$
[Figure: the 34-layer ResNet drawn twice (7x7 conv, 64, /2; pool, /2; 6 × 3x3 conv, 64; 8 × 3x3 conv, 128, first with /2; 12 × 3x3 conv, 256, first with /2; 6 × 3x3 conv, 512, first with /2; avg pool; fc 1000), with $x_l$ and $x_L$ marked.]
• Any $x_l$ is directly forward-propagated to any $x_L$, plus residual.
• Any $x_L$ is an additive outcome.
  • in contrast to multiplicative: $x_L = \prod_{i=l}^{L-1} W_i x_l$
![Page 53: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/53.jpg)
Very smooth backward propagation
$x_L = x_l + \sum_{i=l}^{L-1} F(x_i)$
[Figure: the 34-layer ResNet drawn twice (7x7 conv, 64, /2; pool, /2; 6 × 3x3 conv, 64; 8 × 3x3 conv, 128, first with /2; 12 × 3x3 conv, 256, first with /2; 6 × 3x3 conv, 512, first with /2; avg pool; fc 1000), with $\partial E / \partial x_L$ and $\partial E / \partial x_l$ marked.]
$\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L} \frac{\partial x_L}{\partial x_l} = \frac{\partial E}{\partial x_L} \left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i)\right)$
![Page 54: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/54.jpg)
Very smooth backward propagation
[Figure: the 34-layer ResNet drawn twice (7x7 conv, 64, /2; pool, /2; 6 × 3x3 conv, 64; 8 × 3x3 conv, 128, first with /2; 12 × 3x3 conv, 256, first with /2; 6 × 3x3 conv, 512, first with /2; avg pool; fc 1000), with $\partial E / \partial x_L$ and $\partial E / \partial x_l$ marked.]
$\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L} \left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i)\right)$
• Any $\frac{\partial E}{\partial x_L}$ is directly back-propagated to any $\frac{\partial E}{\partial x_l}$, plus residual.
• Any $\frac{\partial E}{\partial x_l}$ is additive; unlikely to vanish
  • in contrast to multiplicative: $\frac{\partial E}{\partial x_l} = \prod_{i=l}^{L-1} W_i \frac{\partial E}{\partial x_L}$
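The contrast between the additive and the multiplicative gradient can be illustrated with scalar linear layers. This sketch assumes a residual branch $F(x_i) = w_i x_i$ and a plain layer $x_{i+1} = w_i x_i$; the weight value and depth are hypothetical.

```python
def plain_gradient(weights):
    """d x_L / d x_l for a plain chain x_{i+1} = w_i * x_i: a product of weights."""
    g = 1.0
    for w in weights:
        g *= w
    return g

def residual_gradient(weights):
    """d x_L / d x_l for x_{i+1} = x_i + w_i * x_i (linear residual branch)."""
    g = 1.0
    for w in weights:
        g *= 1.0 + w
    return g

weights = [0.05] * 50          # hypothetical per-layer weights, 50 layers
plain = plain_gradient(weights)
residual = residual_gradient(weights)

# The residual gradient splits into the direct term 1 plus a residual term.
residual_term = residual - 1.0
print(plain < 1e-12, residual >= 1.0, residual_term > 0.0)  # True True True
```

In this toy setting the plain chain's gradient collapses toward zero, while the residual chain always keeps the direct term 1.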
![Page 55: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/55.jpg)
Residual for every layer
forward: $x_L = x_l + \sum_{i=l}^{L-1} F(x_i)$
backward: $\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L} \left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i)\right)$
Enabled by:
• shortcut mapping: $h$ = identity
• after-add mapping: $f$ = identity
![Page 56: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/56.jpg)
Experiments
• Set 1: what if shortcut mapping $h$ ≠ identity
• Set 2: what if after-add mapping $f$ is identity
• Experiments on ResNets with more than 100 layers
  • deeper models suffer more from optimization difficulty
![Page 57: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/57.jpg)
Experiment Set 1: what if shortcut mapping $h$ ≠ identity?
![Page 58: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/58.jpg)
[Figure: six variants of the residual unit (3x3 conv → ReLU → 3x3 conv → addition → ReLU), differing in the shortcut.]

| variant | extra elements in the diagram | shortcut mapping | error |
| --- | --- | --- | --- |
| (a) original | none | $h(x) = x$ | 6.6% |
| (b) constant scaling | 0.5, 0.5 | $h(x) = 0.5x$ | 12.4% |
| (c) exclusive gating* | 1x1 conv, sigmoid, 1- | $h(x) = \text{gate} \cdot x$ | 8.7% |
| (d) shortcut-only gating | 1x1 conv, sigmoid, 1- | $h(x) = \text{gate} \cdot x$ | 12.9% |
| (e) conv shortcut | 1x1 conv | $h(x) = \text{conv}(x)$ | 12.2% |
| (f) dropout shortcut | dropout | $h(x) = \text{dropout}(x)$ | >20% |

Errors are for ResNet-110 on CIFAR-10.
*similar to “Highway Network”
![Page 59: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/59.jpg)
[Figure repeated from the previous slide (shortcut variants (a) to (f), ResNet-110 on CIFAR-10), with the added annotation: shortcuts blocked by multiplications]
![Page 60: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/60.jpg)
If $h$ is multiplicative, e.g. $h(x) = \lambda x$
forward: $x_L = \lambda^{L-l} x_l + \sum_{i=l}^{L-1} \hat{F}(x_i)$
backward: $\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L} \left(\lambda^{L-l} + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \hat{F}(x_i)\right)$
• if $h$ is multiplicative, shortcuts are blocked
• direct propagation is decayed
*assuming $f$ = identity
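The decay of the direct term can be made concrete by evaluating $\lambda^{L-l}$ for a few depths. A minimal sketch; the helper name `direct_term` and the chosen depths are assumptions for illustration.

```python
def direct_term(lam, depth):
    """Coefficient lam^(L-l) on the direct path after `depth` = L - l units."""
    return lam ** depth

for lam in (1.0, 0.5):
    # depth values are arbitrary illustration points
    print(lam, [direct_term(lam, d) for d in (1, 10, 50)])
```

With $\lambda = 1$ (identity shortcut) the coefficient stays at 1 for any depth; with $\lambda = 0.5$ it is already 1/1024 after 10 units.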
![Page 61: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/61.jpg)
[Figure: curves for two units, "$h$ is identity" (3x3 conv → ReLU → 3x3 conv → addition → ReLU) and "$h$ is gating" (the same unit with 1x1 conv, sigmoid, and 1- elements); solid: test, dashed: train.]
• gating should have better representation ability (identity is a special case), but
• optimization difficulty dominates results
![Page 62: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/62.jpg)
Experiment Set 2: what if after-add mapping $f$ is identity
![Page 63: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/63.jpg)
[Figure: three residual units from $x_l$ to $x_{l+1}$.
 $f$ is ReLU (original ResNet): weight → BN → ReLU → weight → BN → addition → ReLU
 $f$ is BN+ReLU: weight → BN → ReLU → weight → addition → BN → ReLU
 $f$ is identity (pre-activation ResNet): BN → ReLU → weight → BN → ReLU → weight → addition]
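The difference between $f$ = ReLU and $f$ = identity can be sketched with toy layers. Everything here is a simplification under stated assumptions: `weight` is a single scalar multiply standing in for a weight layer, `bn` is a batch norm without learned scale or shift, and the layer orderings follow the original and pre-activation units as named on the slide.

```python
import math

def relu(xs):
    return [max(0.0, x) for x in xs]

def bn(xs, eps=1e-5):
    """Simplified batch norm over a list: zero mean, unit variance, no scale/shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def weight(xs, w):
    """Stand-in for a weight layer: a single scalar multiply (hypothetical)."""
    return [w * x for x in xs]

def original_unit(xs, w1, w2):
    # weight -> BN -> ReLU -> weight -> BN -> addition -> ReLU  (f = ReLU)
    r = bn(weight(relu(bn(weight(xs, w1))), w2))
    return relu([x + ri for x, ri in zip(xs, r)])

def preact_unit(xs, w1, w2):
    # BN -> ReLU -> weight -> BN -> ReLU -> weight -> addition  (f = identity)
    r = weight(relu(bn(weight(relu(bn(xs)), w1))), w2)
    return [x + ri for x, ri in zip(xs, r)]

xs = [-1.0, 0.5, 2.0]
print(original_unit(xs, 0.0, 0.0))  # [0.0, 0.5, 2.0]
print(preact_unit(xs, 0.0, 0.0))    # [-1.0, 0.5, 2.0]
```

With the residual branch zeroed out, the pre-activation unit is an exact identity, while the original unit still clips negative inputs through the after-add ReLU.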
![Page 64: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/64.jpg)
[Figure: curves comparing two units from $x_l$ to $x_{l+1}$, $f$ = ReLU and $f$ = BN+ReLU; solid: test, dashed: train.]
• BN could block prop
• Keep the shortest path as smooth as possible
![Page 65: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/65.jpg)
1001-layer ResNets on CIFAR-10
[Figure: curves comparing two units from $x_l$ to $x_{l+1}$, $f$ = ReLU (original) and $f$ = identity (pre-activation); solid: test, dashed: train.]
• ReLU could block prop when there are 1000 layers
• pre-activation design eases optimization (and improves generalization; see paper)
![Page 66: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/66.jpg)
Comparisons on CIFAR-10/100

CIFAR-10

| method | error (%) |
| --- | --- |
| NIN | 8.81 |
| DSN | 8.22 |
| FitNet | 8.39 |
| Highway | 7.72 |
| ResNet-110 (1.7M) | 6.61 |
| ResNet-1202 (19.4M) | 7.93 |
| ResNet-164, pre-activation (1.7M) | 5.46 |
| ResNet-1001, pre-activation (10.2M) | 4.92 (4.89±0.14) |

CIFAR-100

| method | error (%) |
| --- | --- |
| NIN | 35.68 |
| DSN | 34.57 |
| FitNet | 35.04 |
| Highway | 32.39 |
| ResNet-164 (1.7M) | 25.16 |
| ResNet-1001 (10.2M) | 27.82 |
| ResNet-164, pre-activation (1.7M) | 24.33 |
| ResNet-1001, pre-activation (10.2M) | 22.71 (22.68±0.22) |

*all based on moderate augmentation
![Page 67: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/67.jpg)
ImageNet Experiments
ImageNet single-crop (320x320) val error

| method | data augmentation | top-1 error (%) | top-5 error (%) |
| --- | --- | --- | --- |
| ResNet-152, original | scale | 21.3 | 5.5 |
| ResNet-152, pre-activation | scale | 21.1 | 5.5 |
| ResNet-200, original | scale | 21.8 | 6.0 |
| ResNet-200, pre-activation | scale | 20.7 | 5.3 |
| ResNet-200, pre-activation | scale + aspect ratio | 20.1* | 4.8* |

*independently reproduced by: https://github.com/facebook/fb.resnet.torch/tree/master/pretrained#notes
training code and models available.
![Page 68: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/68.jpg)
Summary of observations
• Keep the shortest path as smooth as possible
  • by making $h$ and $f$ identity
  • forward/backward signals directly flow through this path
• Features of any layers are additive outcomes
• 1000-layer ResNets can be easily trained and have better accuracy
[Figure: pre-activation unit from $x_l$ to $x_{l+1}$ (BN, ReLU, weight, BN, ReLU, weight, addition)]
![Page 69: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/69.jpg)
Future Works
• Representation
  • skipping 1 layer vs. multiple layers?
  • Flat vs. Bottleneck?
  • Inception-ResNet [Szegedy et al 2016]
  • ResNet in ResNet [Targ et al 2016]
  • Width vs. Depth [Zagoruyko & Komodakis 2016]
• Generalization
  • DropOut, MaxOut, DropConnect, …
  • Drop Layer (Stochastic Depth) [Huang et al 2016]
• Optimization
  • Without residual/shortcut?
[Figure: pre-activation unit from $x_l$ to $x_{l+1}$ (BN, ReLU, weight, BN, ReLU, weight, addition)]
![Page 70: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/70.jpg)
Applications
“Features matter”
![Page 71: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/71.jpg)
“Features matter.” (quote [Girshick et al. 2014], the R-CNN paper)

| task | 2nd-place winner | ResNets | margin (relative) |
| --- | --- | --- | --- |
| ImageNet Localization (top-5 error) | 12.0 | 9.0 | 27% |
| ImageNet Detection (mAP@.5) | 53.6 | 62.1 | 16% (absolute 8.5% better!) |
| COCO Detection (mAP@.5:.95) | 33.5 | 37.3 | 11% |
| COCO Segmentation (mAP@.5:.95) | 25.1 | 28.2 | 12% |

• Our results are all based on ResNet-101
• Deeper features are well transferrable
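The relative and absolute margins on the three mAP rows can be recomputed from the two score columns. A minimal sketch, assuming the margin is the gain divided by the 2nd-place score; `relative_margin` is a hypothetical helper, and the localization row (an error rate, lower is better) is not covered.

```python
def relative_margin(second_place, resnet):
    """Relative gain of a higher-is-better score over the runner-up, in percent."""
    return 100.0 * (resnet - second_place) / second_place

rows = {
    "ImageNet Detection": (53.6, 62.1),
    "COCO Detection": (33.5, 37.3),
    "COCO Segmentation": (25.1, 28.2),
}
for task, (second, ours) in rows.items():
    # prints: task, relative margin (%), absolute gain (points)
    print(task, round(relative_margin(second, ours)), round(ours - second, 1))
```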
![Page 72: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/72.jpg)
Revolution of Depth
Engines of visual recognition
PASCAL VOC 2007 Object Detection mAP (%)

| method | depth | mAP (%) |
| --- | --- | --- |
| HOG, DPM | shallow | 34 |
| AlexNet (RCNN) | 8 layers | 58 |
| VGG (RCNN) | 16 layers | 66 |
| ResNet (Faster RCNN)* | 101 layers | 86 |

*w/ other improvements & more data
![Page 73: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/73.jpg)
Deep Learning for Computer Vision
[Figure: ImageNet data → classification network (backbone structure) → pre-train → features → fine-tune on target data → detection network (e.g. R-CNN), segmentation network (e.g. FCN), human pose estimation network, depth estimation network, …]
![Page 74: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/74.jpg)
Example: Object Detection
[Figure: Image Classification (what?): ✓ boat, ✓ person; Object Detection (what + where?)]
![Page 75: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/75.jpg)
Object Detection: R-CNN
Region-based CNN pipeline
[Figure: input image → region proposals (~2,000) → warped region → 1 CNN for each region → classify regions (aeroplane? no. … person? yes. … tvmonitor? no.); figure credit: R. Girshick et al.]
Girshick, Donahue, Darrell, Malik. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014
![Page 76: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/76.jpg)
Object Detection: R-CNN
• R-CNN
[Figure: image with pre-computed Regions-of-Interest (RoIs); each region passes through a CNN to produce a feature (four CNN → feature pairs shown); label: End-to-End training]
![Page 77: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/77.jpg)
Object Detection: Fast R-CNN
• Fast R-CNN
[Figure: image → CNN (shared conv layers) → feature → RoI pooling over pre-computed Regions-of-Interest (RoIs) → per-region features; label: End-to-End training]
Girshick. Fast R-CNN. ICCV 2015
![Page 78: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/78.jpg)
Object Detection: Faster R-CNN
• Faster R-CNN
  • Solely based on CNN
  • No external modules
  • Each step is end-to-end
[Figure: image → CNN → feature map → Region Proposal Net → proposals; feature map and proposals → RoI pooling → features; label: End-to-End training]
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.
![Page 79: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/79.jpg)
Object Detection
[Figure: ImageNet data → classification network (backbone structure) → pre-train → features → detection network, fine-tuned on detection data]
“plug-in” features:
• AlexNet
• VGG-16
• GoogleNet
• ResNet-101
• …
detectors:
• R-CNN
• Fast R-CNN
• Faster R-CNN
• MultiBox
• SSD
• …
independently developed
![Page 80: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/80.jpg)
Object Detection
• Simply “Faster R-CNN + ResNet”
[Figure: image → CNN → feature map → Region Proposal Net → proposals; feature map and proposals → RoI pooling → classifier]
COCO detection results

| Faster R-CNN baseline | mAP@.5 | mAP@.5:.95 |
| --- | --- | --- |
| VGG-16 | 41.5 | 21.5 |
| ResNet-101 | 48.4 | 27.2 |

ResNet-101 has 28% relative gain vs VGG-16
![Page 81: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/81.jpg)
Object Detection
• RPN learns proposals by extremely deep nets
  • We use only 300 proposals (no hand-designed proposals)
• Add components:
  • Iterative localization
  • Context modeling
  • Multi-scale testing
• All are based on CNN features; all are end-to-end
• All benefit more from deeper features: cumulative gains!
![Page 82: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/82.jpg)
ResNet’s object detection result on COCO
*the original image is from the COCO dataset
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. arXiv 2015.
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.
![Page 83: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/83.jpg)
*the original image is from the COCO dataset
![Page 84: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/84.jpg)
*the original image is from the COCO dataset
![Page 85: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/85.jpg)
Results on real video. Models trained on MS COCO (80 categories). (frame-by-frame; no temporal processing)
this video is available online: https://youtu.be/WZmSMkK9VuA
![Page 86: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/86.jpg)
More Visual Recognition Tasks
ResNet-based methods lead on these benchmarks (incomplete list):
• ImageNet classification, detection, localization
• MS COCO detection, segmentation
• PASCAL VOC detection, segmentation
• Human pose estimation [Newell et al 2016]
• Depth estimation [Laina et al 2016]
• Segment proposal [Pinheiro et al 2016]
• …
[Figure: PASCAL detection leaderboard and PASCAL segmentation leaderboard, each labeled ResNet-101]
![Page 87: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/87.jpg)
Potential Applications
ResNets have shown outstanding or promising results on:
• Visual Recognition
• Image Generation (Pixel RNN, Neural Art, etc.)
• Natural Language Processing (Very deep CNN)
• Speech Recognition (preliminary results)
• Advertising, user prediction (preliminary results)
![Page 88: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/88.jpg)
Conclusions of the Tutorial
• Deep Residual Learning:
  • Ultra deep networks can be easy to train
  • Ultra deep networks can gain accuracy from depth
  • Ultra deep representations are well transferrable
  • Now 200 layers on ImageNet and 1000 layers on CIFAR!
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Identity Mappings in Deep Residual Networks”. arXiv 2016.
![Page 89: Deep Residual Networks - 2020 Conference...Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July](https://reader035.fdocuments.net/reader035/viewer/2022081323/5f0930bf7e708231d425a7dc/html5/thumbnails/89.jpg)
Resources
• Models and Code
  • Our ImageNet models in Caffe: https://github.com/KaimingHe/deep-residual-networks
• Many available implementations (list in https://github.com/KaimingHe/deep-residual-networks)
  • Facebook AI Research’s Torch ResNet: https://github.com/facebook/fb.resnet.torch
  • Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves: code
  • Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code: code
  • Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves: code
  • Torch, MNIST, 100 layers: blog, code
  • A winning entry in Kaggle's right whale recognition challenge: blog, code
  • Neon, Place2 (mini), 40 layers: blog, code
  • …