MCDNN An Approximation-Based Execution Framework for Deep...
Transcript of MCDNN An Approximation-Based Execution Framework for Deep...
![Page 1: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/1.jpg)
MCDNN:AnApproximation-BasedExecutionFrameworkfor DeepStreamProcessingUnderResourceConstraints
HaichenShen([email protected])withSeungyeop Han,Matthai Philipose,
Sharad Agarwal,AlecWolman,ArvindKrishnamurthy
UniversityofWashington MicrosoftResearch
1
![Page 2: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/2.jpg)
Wearable computing
2
???
à more data
![Page 3: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/3.jpg)
When computer vision meets wearable
Consumer Manufacturing Public Safety
“That drink will get you to 2800 calories for today”
“I last saw your keys in the store room”
“Remind Tom of the party”
“You’re on page 263 of this book”
3
![Page 4: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/4.jpg)
Deep learning makes vision work
Recognition Task face scene* object*
Accuracy 97% 88% 92%
Compute/frame (FLOPs) 1.00G 30.9G 39.3G
Do we have enough resources to run deep learning?
4
Compute@1-30fps (FLOPS) 1-30G 30-900G 40G-1.2T
* top-5 accuracy is shown in the table
But...
![Page 5: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/5.jpg)
Resource usage for continuous vision
Imager Processor Radio
OmnivisionOV274090mW
Tegra K1 GPU290GOPS@10W= 34pJ/OP
Qualcomm SD810 LTE>800mWAtheros 802.11 a/g15Mbps@700mW= 47nJ/b
Cloud
Amazon EC2CPU c4.large 2x400GFLOPS $0.1/hGPU g2.2xlarge 2.3TFLOPS $0.65/h
5Huge gap between workload and budget
BudgetDevice power Cloud cost
30% of 10Wh for 10h = 300mW $10 person/year
Workload Deep learning 300GFLOPS @ 30GFLOPs/frame, 10fps
Compute power 9GFLOPS 3.5GFLOPS (GPU) / 8GFLOPS (CPU)
![Page 6: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/6.jpg)
Neural network
(c)convolution(f)fullyconnect(m)maxpool(r)relu(s)softmax
6
c
r
m
c
r
m
m
r
c
r
c
r
c
f
r
f
r
f
s
imgRimgGimgB
classes+scores
![Page 7: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/7.jpg)
Neural network ≈ matrix multiplications
7
x x x
Low rank approximation(Y. Kim, et al. 2016)
x
Matrix sparsification(S. Han, et al. 2015)
(c)convolution(f)fullyconnect(m)maxpool(r)relu(s)softmax
c
r
m
c
r
m
m
r
c
r
c
r
c
f
r
f
r
f
s
imgRimgGimgB
classes+scores
c
c
r
m m
r
f
r
c
r
f
s
imgRimgGimgB
classes+scores
Architectural changes(J. Ba, et al. 2014)
![Page 8: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/8.jpg)
Managing the approx. / resource trade-off
‣Detailed characterization of the approximation / resource trade-off for many optimizations
‣ Two new optimizations for streaming, multi-application settings
‣New scheduling problem, Approximate Model Scheduling, with a heuristic solution
8
![Page 9: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/9.jpg)
Outline
‣Detailed characterization of the approximation / resource trade-off for many optimizations
‣ Two new optimizations for streaming, multi-application settings
‣New scheduling problem, Approximate Model Scheduling, with a heuristic solution
9
![Page 10: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/10.jpg)
30 40 50 60 70 80 90 100aFFuraFy (%)
101
102
103
mem
ory
(0
%)
2bM
6Fene
)aFe
Memory / accuracy trade-off
10
![Page 11: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/11.jpg)
30 40 50 60 70 80 90 100aFFuraFy (%)
101
102
103
mem
ory
(0
%)
2bM
6Fene
)aFe
Memory / accuracy trade-off
11
Substantially reduce memory use with gradual accuracy loss
![Page 12: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/12.jpg)
45 50 55 60 65 70 75 80 85aFFuraFy (%)
0
2
4
6
8
10
en
erJ
y (
J)
2Ej((xeF)
6Fene((xeF)
)aFe((xeF)
Energy / accuracy trade-off
Wifi xmit cost (0.5J)
LTE xmit cost (0.9J)
Compute energy budget (2.3J)
12
Always execute locally
Can execute locally under energy budget
Exceed energy budget when execute locally
energy budget = total energy / total time(10h) / requests per second(1 req/sec)
Nvidia Jetson TK1
![Page 13: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/13.jpg)
Outline
‣Detailed characterization of the approximation / resource trade-off for many optimizations
‣ Two new optimizations for streaming, multi-application settings‣ Specialization‣ Model sharing
‣New scheduling problem, Approximate Model Scheduling, with a heuristic solution
13
![Page 14: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/14.jpg)
Exploiting stream locality by specialization
• Standard deep neural network recognizes 4000 people• Most of videos are dominated by less than 10 faces over minutes
14
Timeline
Produce more compact models for skewed classes
![Page 15: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/15.jpg)
Specialization runtime
15
input(withskeweddistribution)
class
compactmodel
class
fullmodel
checkfor“other”
class
input
fullmodel
![Page 16: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/16.jpg)
Better resource/accuracy trade-off
16
![Page 17: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/17.jpg)
![Page 18: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/18.jpg)
Outline
‣Detailed characterization of the approximation / resource trade-off for many optimizations
‣ Two new optimizations for streaming, multi-application settings
‣New scheduling problem, Approximate Model Scheduling, with a heuristic solution
18
![Page 19: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/19.jpg)
Energy 14
Approximate model scheduling
19
Mobile device
Cloud
Model Pool
memory
energy
cost
Accuracy
Task 1
8 2 2model 1 (90%) model 2 (80%)
5 1 1
Task 2
9 3 3model 3 (80%) model 4 (70%)
6 2 2
Memory 10
Cost 14
Goal: maximize the overall accuracy
![Page 20: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/20.jpg)
Approximate model scheduling
20
Mobile device
Cloud
Model Pool
memory
energy
cost
Accuracy
Requests:1. task 1 2. task 2 3. task 1 4. task 1 5. task 2
Task 1
8 2 2model 1 (90%) model 2 (80%)
5 1 1
Task 2
9 3 3model 3 (80%) model 4 (70%)
6 2 2
Energy 14 Memory 10
Cost 14 à cloud, model 1à device, model 4à cloud, model 1à cloud, model 2à device, model 4
model 4
model 1 model 1 m2
model 4
1 2 3 4 5
Packing problem
![Page 21: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/21.jpg)
Approximate model scheduling
21
Mobile device
Cloud
Model Pool
Accuracy
Requests:1. task 1 à cloud, model 12. task 2 à device, model 43. task 1 à cloud, model 14. task 1 à cloud, model 25. task 2 à device, model 46. task 1
Energy 14 Memory 10
Cost 14
model 4
model 2
memory
energy
cost
Task 1
8 2 2model 1 (90%) model 2 (80%)
5 1 1
Task 2
9 3 3model 3 (80%) model 4 (70%)
6 2 2
model 4
model 1 model 1
1 2 3
Paging problem
m2
model 4
4 5
![Page 22: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/22.jpg)
Approximate model scheduling
• Packing problem: pick versions that satisfy energy/cost budgets" 𝑒$𝑥$&
�
&≤ 𝐸," 𝑐$𝑥$&,
�
&≤ 𝐶(𝑥$&, 𝑥$&, ∈ 0,1 , 𝑥$& 3 𝑥$&, = 0)
• Paging problem: pick versions that fit in memory∀1 ≤ 𝑡 ≤ 𝑇," 𝑠$𝑥$&
:
$;<≤ 𝑆
• Goal: maximize the accuracymaxA" " 𝑎$(𝑥$& + 𝑥$&, )
�
$
�
&
No known optimal online algorithms22
![Page 23: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/23.jpg)
Heuristic scheduler
• Estimate future resource use and compute the budget for each request
• Account for paging cost to reduce oscillations
• Use increasingly more accurate versions of more heavily used models
23
![Page 24: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/24.jpg)
Trace-driven evaluation
24
Run out of battery
Lose connectivity
![Page 25: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/25.jpg)
development time
MCDNN frameworkcompiler
input typemodel schema
training/validationdata
cloudruntimeschedulerdatarouterprofiler
apps
input input
classesclasses
trainedmodelcatalog
deviceruntimeschedulerdatarouter
clouddevice
runtime
![Page 26: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/26.jpg)
development time
MCDNN frameworkcompiler
input typemodel schema
training/validationdata
cloudruntimeschedulerdatarouterprofiler
apps
input input
classesclasses
trainedmodelcatalog
deviceruntimeschedulerdatarouter
specializer
statsspecializedmodels
clouddevice
specializationtime
runtime
![Page 27: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/27.jpg)
Conclusion
• MCDNN makes efficient trade-offs between resource use and accuracy• Formulate the approximate model scheduling problem and
devise a heuristic algorithm• Design a generic approximation-based execution framework
for continuous mobile vision
27
Thank you! Questions?
![Page 28: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/28.jpg)
BackupSlides
28
![Page 29: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/29.jpg)
Cloud cost / accuracy trade-off
45 50 55 60 65 70 75 80 85
aFFuUaFy (%)
100
101
102
103
104
Oate
nFy
(P
s)2bj(COouG G38)
2bj(COouG C38)
6Fene(COouG G38)
6Fene(COouG C38)
)aFe(COouG G38)
)aFe(COouG C38)
)aFe(COouG G38)
)aFe(COouG C38)
cloud GPU latency budget (9.7ms)@ $10/yr, 1 request/s@ AWS g2.2xlarge
cloud GPU latency budget (582ms)@ $10/yr, 1 request/min@ AWS g2.2xlarge
cloud CPU latency budget (7645ms)@ $10/yr, 1 request/min@ AWS c4.large
29latency budget = cost budget / cost per hour / #requests
![Page 30: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/30.jpg)
Model sharing
30
inputintermediate
values
face ID
race age gender …
routerinput
(a) (b)
model-fragment cache
![Page 31: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework](https://reader030.fdocuments.net/reader030/viewer/2022041203/5d5100d588c993cb1f8bb826/html5/thumbnails/31.jpg)
Dynamically-sized caching scheme
31