Squeezing Deep Learning Into Mobile Phones
![Page 1: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/1.jpg)
Squeezing Deep Learning into Mobile Phones – A Practitioner's Guide
Anirudh Koul
![Page 2: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/2.jpg)
Anirudh Koul, @anirudhkoul, http://koul.ai
Project Lead, Seeing AI
Applied Researcher, Microsoft AI & Research
akoul at microsoft dot com
Currently working on applying artificial intelligence to productivity, augmented reality and accessibility, along with Eugene Seleznev, Saqib Shaikh and Meher Kasam
![Page 3: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/3.jpg)
Why Deep Learning On Mobile?
Latency
Privacy
![Page 4: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/4.jpg)
Mobile Deep Learning Recipe
Efficient Mobile Inference Engine + Efficient Pretrained Model = DL App
![Page 5: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/5.jpg)
Building a DL App in _ time
![Page 6: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/6.jpg)
Building a DL App in 1 hour
![Page 7: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/7.jpg)
Use Cloud APIs
Microsoft Cognitive Services
Clarifai
Google Cloud Vision
IBM Watson Services
Amazon Rekognition
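As a concrete sketch of the cloud-API route: the client just packages an image and POSTs it to the provider. The endpoint URL and JSON field names below are hypothetical placeholders; each provider above has its own request schema, documented in its API reference.

```python
import base64
import json

def build_vision_request(image_bytes, features=("tags", "description")):
    """Package an image as a JSON payload for a hypothetical cloud
    vision REST API. Field names are illustrative only; consult the
    provider's API reference for the real schema."""
    return json.dumps({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "features": list(features),
    })

payload = build_vision_request(b"\x89PNG\r\n fake image bytes")
# A real app would then send it, e.g. (endpoint and header are placeholders):
# requests.post("https://api.example.com/v1/analyze", data=payload,
#               headers={"Content-Type": "application/json"})
```

The phone does no inference at all here, which is why this route works in an hour: the trade-offs are network latency and sending user images off-device.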
![Page 8: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/8.jpg)
Microsoft Cognitive Services
Models won the 2015 ImageNet Large Scale Visual Recognition Challenge
Vision, Face, Emotion, Video and 21 other topics
![Page 9: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/9.jpg)
Building a DL App in 1 day
![Page 10: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/10.jpg)
http://deeplearningkit.org/2015/12/28/deeplearningkit-deep-learning-for-ios-tested-on-iphone-6s-tvos-and-os-x-developed-in-metal-and-swift/
Energy to train a Convolutional Neural Network vs. energy to use a Convolutional Neural Network
![Page 11: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/11.jpg)
Base PreTrained Model
ImageNet – 1000-category object classifier
Inception
Resnet
![Page 12: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/12.jpg)
Running pre-trained models on mobile
MXNet
Tensorflow
CNNDroid
DeepLearningKit
Caffe
Torch
![Page 13: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/13.jpg)
MXNet
Amalgamation: pack all the code into a single source file
Pros: • Cross-platform (iOS, Android), easy porting • Usable in any programming language
Cons: • CPU only, slow
https://github.com/Leliana/WhatsThis
![Page 14: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/14.jpg)
Tensorflow
Easy pipeline to bring Tensorflow models to mobile
Great documentation
Optimizations to bring the model to mobile
Upcoming: XLA (Accelerated Linear Algebra) compiler to optimize for hardware
![Page 15: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/15.jpg)
CNNdroid
GPU-accelerated CNNs for Android
Supports Caffe, Torch and Theano models
~30-40x speedup using the mobile GPU vs. CPU (AlexNet)
Internally, CNNdroid expresses data parallelism for the different layers, instead of leaving it to the GPU's hardware scheduler
![Page 16: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/16.jpg)
DeepLearningKit
Platform: iOS, OS X and tvOS (Apple TV)
DNN type: CNN models trained in Caffe
Runs on the mobile GPU, uses Metal
Pro: Fast, directly ingests Caffe models
Con: Unmaintained
![Page 17: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/17.jpg)
Caffe
Caffe for Android: https://github.com/sh1r0/caffe-android-lib (sample app: https://github.com/sh1r0/caffe-android-demo)
Caffe for iOS: https://github.com/aleph7/caffe (sample app: https://github.com/noradaiko/caffe-ios-sample)
Pro: Usually a couple of lines to port a pretrained model to the mobile CPU
Con: Unmaintained
![Page 18: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/18.jpg)
Running pre-trained models on mobile
| Mobile Library | Platform | GPU | DNN Architectures Supported | Trained Models Supported |
| --- | --- | --- | --- | --- |
| Tensorflow | iOS/Android | Yes | CNN, RNN, LSTM, etc. | Tensorflow |
| CNNDroid | Android | Yes | CNN | Caffe, Torch, Theano |
| DeepLearningKit | iOS | Yes | CNN | Caffe |
| MXNet | iOS/Android | No | CNN, RNN, LSTM, etc. | MXNet |
| Caffe | iOS/Android | No | CNN | Caffe |
| Torch | iOS/Android | No | CNN, RNN, LSTM, etc. | Torch |
![Page 19: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/19.jpg)
Building a DL App in 1 week
![Page 20: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/20.jpg)
Learning to play an accordion: 3 months
![Page 21: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/21.jpg)
Learning to play an accordion: 3 months
Already know the piano? Fine-tune your skills: 1 week
![Page 22: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/22.jpg)
I got a dataset, Now What?
Step 1: Find a pre-trained model
Step 2: Fine-tune the pre-trained model
Step 3: Run it using an existing framework
“Don’t Be A Hero” - Andrej Karpathy
![Page 23: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/23.jpg)
How to find pretrained models for my task?
Search "Model Zoo"
Microsoft Cognitive Toolkit (previously called CNTK) – 50 models
Caffe Model Zoo
Keras
Tensorflow
MXNet
![Page 24: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/24.jpg)
AlexNet, 2012 (simplified)
[Krizhevsky, Sutskever, Hinton '12]
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng, "Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks"
n-dimensional feature representation
![Page 25: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/25.jpg)
Deciding how to fine tune
| Size of New Dataset | Similarity to Original Dataset | What to do? |
| --- | --- | --- |
| Large | High | Fine tune. |
| Small | High | Don't fine tune, it will overfit. Train a linear classifier on CNN features. |
| Small | Low | Train a classifier from activations in the lower layers. The higher layers are specific to the original dataset. |
| Large | Low | Train the CNN from scratch. |

http://blog.revolutionanalytics.com/2016/08/deep-learning-part-2.html
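The "small dataset, high similarity" row ("train a linear classifier on CNN features") can be sketched in a few lines of numpy. Random vectors stand in for the CNN features here; in practice you would take the penultimate-layer activations of a pretrained model. A minimal sketch, training accuracy only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 64
X = rng.normal(size=(n, d))                       # stand-in for CNN features
y = (X @ rng.normal(size=d) > 0).astype(float)    # synthetic binary labels

# The only trained part: plain logistic regression on frozen features.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))            # sigmoid predictions
    w -= 0.1 * X.T @ (p - y) / n                  # gradient step

train_acc = (((X @ w) > 0) == y.astype(bool)).mean()
```

The pretrained network is never touched, which is why this works with very little data: only d weights are learned.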
![Page 29: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/29.jpg)
Building a DL Website in 1 week
![Page 30: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/30.jpg)
Less Data + Smaller Networks = Faster browser training
![Page 31: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/31.jpg)
Several JavaScript Libraries
Run large CNNs: • Keras-JS • MXNetJS • CaffeJS
Train and run CNNs: • ConvNetJS
Train and run LSTMs: • Brain.js • Synaptic.js
Train and run NNs: • Mind.js • DN2A
![Page 32: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/32.jpg)
ConvNetJS
Both train and test NNs in the browser
Train CNNs in the browser
![Page 33: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/33.jpg)
Keras.js
Run Keras models in browser, with GPU support.
![Page 34: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/34.jpg)
Brain.JS
Train and run NNs in the browser
Supports feedforward networks, RNN, LSTM, GRU; no CNNs
Demo: http://brainjs.com/ – a trained NN recognizing color contrast
![Page 35: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/35.jpg)
MXNetJS
On Firefox and Microsoft Edge, performance is 8x faster than on Chrome, an optimization difference due to asm.js.
![Page 36: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/36.jpg)
Building a DL App in 1 month
(and get featured in the Apple App Store)
![Page 37: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/37.jpg)
Response Time Limits – Powers of 10
0.1 second: reacting instantly
1.0 second: user's flow of thought
10 seconds: keeping the user's attention
[Miller 1968; Card et al. 1991; Jakob Nielsen 1993]
![Page 38: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/38.jpg)
Apple frameworks for Deep Learning Inference
BNNS – Basic Neural Network Subroutines
MPS – Metal Performance Shaders
![Page 39: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/39.jpg)
Metal Performance Shaders (MPS)
Fast, provides GPU acceleration for the inference phase
Faster app load times than Tensorflow (Jan 2017)
About 1/3rd the runtime memory of Tensorflow on Inception-V3 (Jan 2017)
~130 ms to run Inception-V3 on an iPhone 7 Plus
Cons: • Limited documentation • No easy way to programmatically port models • No batch normalization. Solution: fold the BatchNorm weights into the preceding Conv weights
![Page 40: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/40.jpg)
Putting out more frames than an art gallery
![Page 41: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/41.jpg)
Basic Neural Network Subroutines (BNNS)
Runs on the CPU
BNNS is faster than MPS for smaller networks, but slower for bigger ones
![Page 42: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/42.jpg)
BrainCore
NN framework for iOS
Provides LSTM functionality
Fast; uses Metal, runs on the iPhone GPU
https://github.com/aleph7/braincore
![Page 43: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/43.jpg)
Building a DL App in 6 months
![Page 44: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/44.jpg)
What you want: $200,000
What you can afford: $2,000
Images: https://www.flickr.com/photos/kenjonbro/9075514760/ and http://www.newcars.com/land-rover/range-rover-sport/2016
![Page 45: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/45.jpg)
[Figure: AlexNet, 8 layers (ILSVRC 2012) – layer-by-layer architecture diagram]
Revolution of Depth
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition", 2015
![Page 46: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/46.jpg)
[Figure: AlexNet, 8 layers (ILSVRC 2012); VGG, 19 layers (ILSVRC 2014); GoogleNet, 22 layers (ILSVRC 2014) – architecture diagrams]
Revolution of Depth
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition", 2015
![Page 47: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/47.jpg)
[Figure: AlexNet, 8 layers (ILSVRC 2012); VGG, 19 layers (ILSVRC 2014); ResNet, 152 layers (ILSVRC 2015) – architecture diagrams; ResNet is ultra deep]
Revolution of Depth
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition", 2015
![Page 48: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/48.jpg)
[Figure: ResNet, 152 layers – full layer-by-layer architecture diagram]
Revolution of Depth
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition", 2015
![Page 49: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/49.jpg)
ImageNet classification top-5 error (%):

| Year | Model | Depth | Top-5 Error (%) |
| --- | --- | --- | --- |
| ILSVRC'10 | – | shallow | 28.2 |
| ILSVRC'11 | – | shallow | 25.8 |
| ILSVRC'12 | AlexNet | 8 layers | 16.4 |
| ILSVRC'13 | – | 8 layers | 11.7 |
| ILSVRC'14 | VGG | 19 layers | 7.3 |
| ILSVRC'14 | GoogleNet | 22 layers | 6.7 |
| ILSVRC'15 | ResNet | 152 layers | 3.57 |

Revolution of Depth
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition", 2015
![Page 50: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/50.jpg)
Your Budget - Smartphone Floating Point Operations Per Second (2015)
http://pages.experts-exchange.com/processing-power-compared/
![Page 51: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/51.jpg)
Accuracy vs Operations Per Image Inference
Size is proportional to the number of parameters
Alfredo Canziani, Adam Paszke, Eugenio Culurciello, "An Analysis of Deep Neural Network Models for Practical Applications", 2016
VGG-16: 552 MB; AlexNet: 240 MB
What we want: high accuracy at a low operation count
![Page 52: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/52.jpg)
Accuracy Per Parameter
Alfredo Canziani, Adam Paszke, Eugenio Culurciello, "An Analysis of Deep Neural Network Models for Practical Applications", 2016
![Page 53: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/53.jpg)
Pick your DNN Architecture for your mobile architecture
Resnet family: under 150 ms on an iPhone 7 using the Metal GPU
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition", 2015
![Page 54: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/54.jpg)
Strategies to make DNNs even more efficient
Shallow networks
Compressing pre-trained networks
Designing compact layers
Quantizing parameters
Network binarization
![Page 55: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/55.jpg)
Pruning
Aim : Remove all connections with absolute weights below a threshold
Song Han, Jeff Pool, John Tran, William J. Dally, "Learning both Weights and Connections for Efficient Neural Networks", 2015
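A minimal numpy sketch of that pruning rule. The threshold value here is arbitrary; Han et al. choose thresholds per layer and then retrain the sparse network to recover accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(256, 256))   # stand-in weight matrix

def magnitude_prune(w, threshold):
    """Zero every connection whose absolute weight is below the threshold."""
    mask = np.abs(w) >= threshold
    return w * mask, mask

pruned, mask = magnitude_prune(weights, threshold=0.1)
sparsity = 1.0 - mask.mean()    # fraction of connections removed
```

For weights drawn from N(0, 0.1) a threshold of 0.1 removes roughly two-thirds of the connections; the surviving weights are stored in a sparse format.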
![Page 56: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/56.jpg)
Observation : Most parameters in Fully Connected Layers
AlexNet (240 MB): 96% of all parameters are in the fully connected layers
VGG-16 (552 MB): 90% of all parameters are in the fully connected layers
![Page 57: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/57.jpg)
Pruning gets quickest model compression without accuracy loss
AlexNet: 240 MB; VGG-16: 552 MB
The first layer, which directly interacts with the image, is sensitive and cannot be pruned much without hurting accuracy
![Page 58: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/58.jpg)
Weight Sharing
Idea: cluster weights with similar values together and store them in a dictionary
Techniques: codebook, Huffman coding, HashedNets
Simplest implementation: • Round all weights into 256 levels • Tensorflow's export script reduces the Inception zip file from 87 MB to 26 MB, with a 1% drop in precision
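The "round all weights into 256 levels" trick can be sketched as a linear codebook: store one byte per weight plus the range endpoints. (The 87 MB to 26 MB figure above additionally relies on zip compressing the now highly repetitive byte codes.)

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=10_000).astype(np.float32)

lo, hi = float(w.min()), float(w.max())
step = (hi - lo) / 255                               # 256 evenly spaced levels
codes = np.round((w - lo) / step).astype(np.uint8)   # 1 byte per weight
w_restored = lo + codes.astype(np.float32) * step    # decode at load time

max_err = float(np.abs(w - w_restored).max())        # bounded by step / 2
```

Each weight now costs 8 bits instead of 32, and the worst-case rounding error is half a quantization step, which is why accuracy barely moves.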
![Page 59: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/59.jpg)
Selective training to keep networks shallow
Idea: limit data augmentation to how your network will actually be used
Example: if making a selfie app, there is no benefit in rotating training images beyond ±45 degrees; the phone will rotate them anyway. This approach is followed by WordLens / Google Translate.
Example: add blur if analyzing mobile phone camera frames
![Page 60: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/60.jpg)
Design consideration for custom architectures – Small Filters
Three layers of 3x3 convolutions >> one layer of 7x7 convolution
Replace large 5x5 and 7x7 convolutions with stacks of 3x3 convolutions
Replace NxN convolutions with a stack of 1xN and Nx1 convolutions
⇒ Fewer parameters ⇒ Less compute ⇒ More non-linearity
Better, Faster, Stronger
Andrej Karpathy, CS-231n Notes, Lecture 11
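The parameter arithmetic behind the 3x3-stack rule is easy to check (biases ignored, C input and C output channels assumed at every layer):

```python
def conv_params(k, c_in, c_out):
    """Weights in a single k x k convolution layer, biases ignored."""
    return k * k * c_in * c_out

c = 256
one_7x7 = conv_params(7, c, c)        # 49 * c^2 weights
stack_3x3 = 3 * conv_params(3, c, c)  # 27 * c^2 weights, same 7x7 receptive field

saving = 1 - stack_3x3 / one_7x7      # ~45% fewer parameters, plus two
                                      # extra non-linearities in between
```

Three stacked 3x3 convolutions cover the same 7x7 receptive field with 27c² instead of 49c² weights, and the two extra activation functions make the stack more expressive, not less.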
![Page 61: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/61.jpg)
SqueezeNet - AlexNet-level accuracy in 0.5 MB
SqueezeNet base: 4.8 MB; SqueezeNet compressed: 0.5 MB
80.3% top-5 accuracy on ImageNet, 0.72 GFLOPS/image
Fire block
Forrest N. Iandola, Song Han et al., "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size"
![Page 62: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/62.jpg)
Reduced precision
Reduce precision from 32 bits to 16 bits or fewer
Use stochastic rounding for best results
In practice:
• Ristretto + Caffe: automatic network quantization; finds a balance between compression rate and accuracy
• Apple Metal Performance Shaders automatically quantize to 16 bits
• Tensorflow has 8-bit quantization support
• gemmlowp – low-precision matrix multiplication library
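The simplest reduced-precision experiment, casting float32 weights to float16 and measuring the round-trip error, needs only numpy. This is just a storage sketch; the real pipelines above also quantize activations and calibrate value ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
w32 = rng.normal(scale=0.1, size=10_000).astype(np.float32)
w16 = w32.astype(np.float16)                     # half the storage per weight

storage_ratio = w16.nbytes / w32.nbytes          # 0.5
max_rel_err = float(np.abs(w32 - w16.astype(np.float32)).max()
                    / np.abs(w32).max())         # tiny for well-scaled weights
```

Because float16 keeps ~11 bits of mantissa, well-scaled weights survive the cast almost unchanged, which is why 16-bit inference is usually a free 2x memory win.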
![Page 63: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/63.jpg)
Binary weighted Networks
Idea: reduce the weights to -1, +1
Speedup: the convolution operation can be approximated by only summation and subtraction
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”
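The weight-binarization step can be sketched in numpy: following the binary-weight variant in the XNOR-Net paper, each real-valued filter W is approximated by alpha·B, with B = sign(W) and alpha = mean(|W|) as the scaling factor.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3, 64))     # one convolutional filter bank

B = np.sign(W)                      # binary weights in {-1, +1}
alpha = float(np.abs(W).mean())     # scalar scaling factor
W_bin = alpha * B                   # only -alpha and +alpha remain

approx_err = float(np.abs(W - W_bin).mean())
```

Since every weight is now ±alpha, a convolution reduces to adding and subtracting input values and one multiply by alpha per output, with ~32x smaller weight storage.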
![Page 66: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/66.jpg)
XNOR-Net
Idea: reduce both the weights and the inputs to -1, +1
Speedup: the convolution operation can be approximated by XNOR and bitcount operations
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”
![Page 69: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/69.jpg)
XNOR-Net on Mobile
![Page 70: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/70.jpg)
Building a DL App and getting $10 million in funding
(or a PhD)
![Page 71: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/71.jpg)
![Page 72: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/72.jpg)
Minerva
![Page 74: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/74.jpg)
DeepX Toolkit
Nicholas D. Lane et al., "DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit", 2016
![Page 75: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/75.jpg)
EIE : Efficient Inference Engine on Compressed DNNs
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, William Dally, "EIE: Efficient Inference Engine on Compressed Deep Neural Network", 2016
189x faster than CPU, 13x faster than GPU
![Page 76: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/76.jpg)
One Last Question
![Page 77: Squeezing Deep Learning Into Mobile Phones](https://reader035.fdocuments.net/reader035/viewer/2022062218/58d0d1dd1a28ab47238b4adf/html5/thumbnails/77.jpg)
How to access the slides in 1 second
Link posted here -> @anirudhkoul