Optimizing FPGA Accelerator Design for Deep Convolutional Neural Networks
By: Mohamad Kanafanai
![Page 1: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/1.jpg)
Optimizing FPGA Accelerator Design for Deep Convolutional Neural Networks
By: Mohamad Kanafanai
![Page 2: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/2.jpg)
Outline
- Introduction
- Background
- Methodology
- Results
- Evaluation of the system
- Criticism
- Q&A
![Page 3: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/3.jpg)
Introduction
- CNNs are an extension of artificial neural networks
- Applications include image processing
- Require high-performance computation hardware
- Design exploration is a must!
![Page 4: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/4.jpg)
What are deep convolutional neural networks?
- A type of machine learning
- 8 steps
- Limitations
- Feed-forward computation
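The feed-forward computation of a single convolutional layer can be sketched as a six-deep loop nest. This is a minimal illustration, not the presenter's code: the function name, the argument layout (`inputs[n][row][col]`, `weights[m][n][i][j]`), and the `stride` parameter are assumptions made for the example.

```python
def conv_layer(inputs, weights, stride=1):
    """Feed-forward pass of one convolutional layer (no bias, no activation).

    inputs  : N input feature maps, indexed inputs[n][row][col]
    weights : M x N kernels of size K x K, indexed weights[m][n][i][j]
    returns : M output feature maps, indexed out[m][r][c]
    """
    n_in = len(inputs)
    n_out = len(weights)
    k = len(weights[0][0])
    rows = (len(inputs[0]) - k) // stride + 1
    cols = (len(inputs[0][0]) - k) // stride + 1
    out = [[[0.0] * cols for _ in range(rows)] for _ in range(n_out)]

    for m in range(n_out):              # output feature maps
        for n in range(n_in):           # input feature maps
            for r in range(rows):       # output rows
                for c in range(cols):   # output columns
                    for i in range(k):          # kernel rows
                        for j in range(k):      # kernel columns
                            out[m][r][c] += (
                                weights[m][n][i][j]
                                * inputs[n][stride * r + i][stride * c + j]
                            )
    return out
```

The loop scheduling, tiling, unrolling, and pipelining topics on the later slides are all transformations of a loop nest of this shape.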
![Page 5: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/5.jpg)
Roofline model
- Provides a graphical representation of performance and productivity
  ◦ Rates and efficiencies (GFLOPS, % of peak)
  ◦ Limitations
  ◦ Benefits
- Focus
  ◦ Computation
  ◦ Communication
  ◦ Locality
- Not for fine-tuning
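The core of the roofline model fits in one line: attainable performance is the smaller of the computational roof and the product of the computation-to-communication ratio and the memory bandwidth. The sketch below uses hypothetical parameter names and made-up numbers purely for illustration.

```python
def attainable_gflops(compute_roof_gflops, flops_per_byte, bandwidth_gb_per_s):
    """Roofline bound: performance is capped by compute or by memory traffic.

    compute_roof_gflops : peak compute throughput of the design (GFLOPS)
    flops_per_byte      : operations performed per byte moved off-chip
    bandwidth_gb_per_s  : sustained off-chip memory bandwidth (GB/s)
    """
    bandwidth_roof = flops_per_byte * bandwidth_gb_per_s
    return min(compute_roof_gflops, bandwidth_roof)


# Hypothetical design points: the first is memory bound, the second compute bound.
memory_bound = attainable_gflops(100.0, 10.0, 4.0)
compute_bound = attainable_gflops(100.0, 50.0, 4.0)
```

A design that is memory bound gains nothing from extra compute units until data reuse raises its operations-per-byte ratio, which is why the later slides pair loop transformations with memory optimization.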
![Page 6: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/6.jpg)
Types of data
- Irrelevant
- Independent
- Dependent
![Page 7: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/7.jpg)
Double buffering
- Allows for two-way communication
- Increases throughput
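Double buffering (ping-pong buffering) can be sketched as two buffers that swap roles each step: one is being computed on while the other is being filled with the next tile. The Python below runs sequentially, so it only illustrates the control flow; the function names and the simple cycle model are assumptions for the example, not figures from the presentation.

```python
def process_tiles(tiles, compute):
    """Process tiles with two buffers that alternate between load and compute."""
    results = []
    if not tiles:
        return results
    buffers = [None, None]
    buffers[0] = list(tiles[0])              # initial load before any compute
    for t in range(len(tiles)):
        current = t % 2
        other = 1 - current
        if t + 1 < len(tiles):
            # On hardware this load would overlap with the compute below.
            buffers[other] = list(tiles[t + 1])
        results.append(compute(buffers[current]))
    return results


def cycles_single_buffer(n_tiles, load, compute):
    """Load and compute strictly alternate."""
    return n_tiles * (load + compute)


def cycles_double_buffer(n_tiles, load, compute):
    """First load, then each step costs the slower of load and compute."""
    if n_tiles == 0:
        return 0
    return load + (n_tiles - 1) * max(load, compute) + compute
```

With equal load and compute times the overlapped schedule approaches half the cycles of the serial one as the number of tiles grows.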
![Page 8: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/8.jpg)
Main concerns
- Communication overhead
- Buffer management
- Bandwidth optimization
- Better utilization of the FPGA
![Page 9: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/9.jpg)
Design exploration
- Computation
  ◦ Loop scheduling
  ◦ Loop tile sizes
- Communication ratio
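Exploring loop tile sizes can be sketched as a brute-force search. The example below is a simplified assumption, not the presenter's method: it tiles only the output-map and input-map loops (`tm`, `tn`), treats `tm * tn` as the number of parallel multiply-accumulate units, and ranks candidates by cycle count alone. A fuller exploration would also weigh the communication ratio of each candidate against the available bandwidth.

```python
from math import ceil


def best_tiling(m, n, r, c, k, unit_budget):
    """Search tile sizes (tm, tn) that minimize estimated compute cycles.

    m, n        : number of output and input feature maps
    r, c        : output rows and columns
    k           : kernel size
    unit_budget : hypothetical cap on parallel multiply-accumulate units
    """
    best = None
    for tm in range(1, m + 1):
        for tn in range(1, n + 1):
            if tm * tn > unit_budget:
                continue
            # Each (tm x tn) tile of the two outer loops is computed in parallel.
            cycles = ceil(m / tm) * ceil(n / tn) * r * c * k * k
            if best is None or cycles < best[0]:
                best = (cycles, tm, tn)
    return best


# Hypothetical layer and budget, chosen only to exercise the search.
cycles, tm, tn = best_tiling(m=8, n=6, r=4, c=4, k=3, unit_budget=12)
```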
![Page 10: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/10.jpg)
Directives: loop pipelining
- Software pipelining
- Increases throughput
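The throughput gain from pipelining a loop can be estimated with a standard cycle model: without pipelining every iteration waits for the previous one to finish, while a pipelined loop starts a new iteration every initiation interval. The function names and example numbers below are illustrative assumptions.

```python
def cycles_unpipelined(trip_count, iteration_latency):
    """Each iteration starts only after the previous one has finished."""
    return trip_count * iteration_latency


def cycles_pipelined(trip_count, iteration_latency, initiation_interval=1):
    """A new iteration starts every `initiation_interval` cycles."""
    if trip_count == 0:
        return 0
    return iteration_latency + initiation_interval * (trip_count - 1)


# Hypothetical loop: 100 iterations, 5 cycles each, one new iteration per cycle.
before = cycles_unpipelined(100, 5)
after = cycles_pipelined(100, 5, 1)
```

The model makes the role of the initiation interval explicit: a loop-carried dependency that forces a larger interval erodes most of the benefit.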
![Page 11: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/11.jpg)
Directives: loop unrolling
- Maximizes computation
- Data flow design
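Loop unrolling replicates the loop body so that several iterations are expressed side by side. The sketch below unrolls a dot product by a factor of four by hand; in an HLS flow the tool would perform this through a directive and map the copies onto parallel hardware, which Python cannot show. The function name and the factor are assumptions for the example.

```python
def dot_unrolled4(x, w):
    """Dot product with the loop body manually unrolled four times."""
    n = len(x)
    main = n - n % 4                    # largest multiple of 4 not exceeding n
    acc0 = acc1 = acc2 = acc3 = 0.0
    for i in range(0, main, 4):
        # Four independent accumulators: no dependency between the four lines.
        acc0 += x[i] * w[i]
        acc1 += x[i + 1] * w[i + 1]
        acc2 += x[i + 2] * w[i + 2]
        acc3 += x[i + 3] * w[i + 3]
    total = acc0 + acc1 + acc2 + acc3
    for i in range(main, n):            # leftover iterations
        total += x[i] * w[i]
    return total
```

Using separate accumulators is what lets the four multiply-accumulates proceed independently; a single shared accumulator would serialize them again.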
![Page 12: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/12.jpg)
Directives: loop tiling
- Divides loops into smaller loops
  ◦ Ensures data stays in cache
  ◦ Great for data reuse
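Loop tiling can be illustrated with a matrix multiply whose three loops are each split into an outer loop over tiles and an inner loop within a tile. The result is identical to the untiled version; only the order of accesses changes, so that each small block is reused while it is still in fast memory. The function names and the default tile size are assumptions for the example.

```python
def matmul_naive(a, b):
    """Reference matrix multiply with plain triple loop."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation, with each loop split into tile and intra-tile loops."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on one tile; min() handles sizes not divisible by tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c
```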
![Page 13: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/13.jpg)
Memory optimization
- Polyhedral-based optimization
- Local memory promotion for irrelevant-type communications
- Data reuse
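One way to picture local memory promotion: when an array index does not depend on the innermost loop variable (the access is irrelevant to that loop), the value can be held in a local register for the whole inner loop and written back once. The sketch below counts accesses to show the effect. The `CountingArray` class and both function names are hypothetical helpers for this illustration and are not taken from the presentation.

```python
class CountingArray:
    """List wrapper that counts element reads and writes."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def __getitem__(self, i):
        self.reads += 1
        return self.data[i]

    def __setitem__(self, i, value):
        self.writes += 1
        self.data[i] = value


def matvec_naive(w, x, out):
    """out[m] is read and written on every inner iteration."""
    for m in range(len(w)):
        for n in range(len(x)):
            out[m] = out[m] + w[m][n] * x[n]


def matvec_promoted(w, x, out):
    """out[m] does not depend on n, so it is kept local and stored once."""
    for m in range(len(w)):
        acc = 0                      # local register replaces out[m]
        for n in range(len(x)):
            acc += w[m][n] * x[n]
        out[m] = acc                 # single write-back per row
```

For a 3 x 4 weight matrix the naive version touches `out` 24 times, while the promoted version writes it only 3 times and never reads it.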
![Page 14: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/14.jpg)
Designed Model
![Page 15: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/15.jpg)
Detail of the final design
![Page 16: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/16.jpg)
Results
- Virtex 7 at 100 MHz, implemented as an IP using VHLS
- Intel Xeon E5 at 2.2 GHz with 15 MB cache
- Pre-synthesis report used for performance and exploration
![Page 17: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/17.jpg)
Evaluation of the system
- 17.42x speedup over the 1-thread GP implementation
- 4.8x speedup over the 16-thread GP implementation
- 18.6 watts vs 95 watts for the GP
- 3.62x speedup over the ICCD 2013 design
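The speedup and power figures can be combined into a rough energy comparison, since energy is power multiplied by run time. The sketch below assumes that the 95 watt figure applies to both the 1-thread and the 16-thread runs, which the slide does not state, so the outputs are back-of-envelope estimates rather than reported results.

```python
def energy_ratio(speedup, baseline_watts, accelerator_watts):
    """How many times less energy the accelerator uses for the same work.

    Energy = power * time, and the accelerator's run time is the baseline
    run time divided by `speedup`.
    """
    return speedup * baseline_watts / accelerator_watts


# Inputs from the slide; constant 95 W for both thread counts is an assumption.
FPGA_WATTS = 18.6
GP_WATTS = 95.0
vs_one_thread = energy_ratio(17.42, GP_WATTS, FPGA_WATTS)
vs_sixteen_threads = energy_ratio(4.8, GP_WATTS, FPGA_WATTS)
```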
![Page 18: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/18.jpg)
My opinion
- The techniques used to optimize loops are well thought out
- It's a unique way of looking at an accelerator
- The memory enhancements offer great insight
![Page 19: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/19.jpg)
Pitfalls of the claims
- Pre-cached data tests
- Evaluation metrics when comparing with other designs
- Only tested using one image
- Technology difference
- Claiming the design has the best utilization
![Page 20: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/20.jpg)
Q&A
![Page 21: Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.](https://reader036.fdocuments.net/reader036/viewer/2022062308/56649d345503460f94a0b348/html5/thumbnails/21.jpg)
References
- http://crd.lbl.gov/assets/pubs_presos/parlab08-roofline-talk.pdf
- https://www.youtube.com/watch?v=n6hpQwq7Inw
- http://en.wikipedia.org/wiki/Loop_tiling
- http://en.wikipedia.org/wiki/Polytope_model
- Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, Jason Cong. Center for Energy-Efficient Computing and Applications, Peking University, China; Computer Science Department, University of California, Los Angeles, USA