A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms
Yu (Emma) Wang, Gu-Yeon Wei, David Brooks
Harvard University
3/3/2020
Contact: [email protected]
ParaDnn: github.com/Emma926/paradnn
Acknowledgement
Frank Chen, Glenn Holloway, Dan Janni, Peter Mattson, Lifeng Nai, David Patterson, Francesco Pontiggia, Parthasarathy Ranganathan, Vijay Reddi, Brennan Saeta, Zak Stone, Anitha Vijayakumar, Shibo Wang, Qiumin Xu, Doe Hyun Yoon, Cliff Young
Challenges with ML Benchmarking
● Diversity in deep learning models used
  ○ Problem domains, models, datasets
● Pace of the field
  ○ State-of-the-art models evolve every few months
● Varying evaluation metrics
  ○ Accuracy, time to train, inference latency
● Multi-disciplinary field
  ○ Algorithms, systems, hardware, ML software stacks
State of the art: MLPerf 0.6

| Area | Benchmark | Dataset | Model | Reference Implementation |
|---|---|---|---|---|
| Vision | Image classification | ImageNet | ResNet-50 | TensorFlow |
| Vision | Object detection | COCO 2017 | Mask R-CNN | PyTorch |
| Vision | Object detection | COCO 2017 | SSD-ResNet34 | PyTorch |
| Language/Audio | Translation (non-recurrent) | WMT Eng-Germ | Transformer | TensorFlow |
| Language/Audio | Translation (recurrent) | WMT Eng-Germ | GNMT | PyTorch |
| Commerce | Recommendation | MovieLens-20M | NCF | PyTorch |
| Action | Reinforcement learning | Go | Mini-go | TensorFlow |
Our Methodology
ParaDnn
ParaDnn vs MLPerf
ParaDnn:
- Avoid drawing conclusions based on a few arbitrary models
- Generate thousands of parameterized, end-to-end models
- Prepare hardware designs for future models
- Complement the use of existing real-world models, i.e., MLPerf
MLPerf:
- Good for studying accuracy or convergence with real datasets
- Represents the specific models some people care about
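To make the "thousands of parameterized models" idea concrete, here is a minimal Python sketch of a Cartesian-product sweep over hyperparameters. The helper `fc_sweep` and every range in it are hypothetical, chosen only for illustration; they are not ParaDnn's actual ranges or code.

```python
from itertools import product

# Hypothetical hyperparameter ranges for an FC model family (illustrative only,
# NOT ParaDnn's real settings).
LAYERS = [4, 8, 16, 32]               # number of hidden layers
NODES = [32, 128, 512, 2048, 8192]    # nodes per hidden layer
BATCH = [64, 256, 1024]               # batch size
INPUT = [2000, 4000]                  # input width
OUTPUT = [200, 1000]                  # output width


def fc_sweep():
    """Yield one model config per point in the Cartesian product of the ranges."""
    for layers, nodes, batch, n_in, n_out in product(LAYERS, NODES, BATCH, INPUT, OUTPUT):
        yield {"layers": layers, "nodes": nodes, "batch": batch,
               "input": n_in, "output": n_out}


configs = list(fc_sweep())
print(len(configs))  # 4 * 5 * 3 * 2 * 2 = 240
```

Each added hyperparameter multiplies the number of configurations, which is how a handful of ranges quickly grows into thousands of end-to-end models.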
ParaDnn Canonical Models
- Fully Connected (FC): Input → # of layers, each with # of nodes → Output
- CNNs (Residual, Bottleneck): Input → # of Res/Bottleneck blocks (filter size), ×4 → FC layer → Output
- RNNs (RNN, LSTM, GRU): Input → # of layers of RNN, LSTM, or GRU cells (size) → Output
Models
- ParaDnn covers a larger range than the real models: from 10k to ~1 billion parameters
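One rough way to see where such a parameter range comes from is to count the weights of an FC model. This sketch assumes dense layers with biases and `layers` hidden layers of equal width; `fc_params` is a hypothetical helper, and ParaDnn's exact layer definitions may differ.

```python
def fc_params(layers: int, nodes: int, n_in: int, n_out: int) -> int:
    """Weights + biases of an MLP: n_in -> [nodes] * layers -> n_out (assumes layers >= 1)."""
    first = n_in * nodes + nodes                     # input -> first hidden layer
    hidden = (layers - 1) * (nodes * nodes + nodes)  # hidden -> hidden layers
    last = nodes * n_out + n_out                     # last hidden -> output layer
    return first + hidden + last


print(fc_params(4, 32, 2000, 200))  # 73800
```

Because the hidden-to-hidden term grows with `nodes ** 2`, sweeping the node count alone moves a model across several orders of magnitude in size.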
Analysis Enabled by ParaDnn
- Roofline analysis of TPU v2
- Homogeneous platform comparison: TPU v2 vs v3
- Heterogeneous platform comparison: TPU vs GPU
The Roofline Model
David Brooks, Gu-Yeon Wei
[Figure: roofline plot bounded by Peak FLOPS (flat roof) and Memory Bandwidth (sloped roof), with the compute-intensive and memory-intensive regions marked.]
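The roofline bound can be written as attainable throughput = min(peak FLOPS, memory bandwidth × operational intensity), where operational intensity is FLOPs per byte of memory traffic. A minimal sketch follows; the peak and bandwidth values are made-up round numbers, not measurements of any platform discussed here.

```python
def roofline(peak_flops: float, mem_bw: float, intensity: float) -> float:
    """Attainable FLOP/s = min(peak compute, bandwidth * operational intensity)."""
    return min(peak_flops, mem_bw * intensity)


def ridge_point(peak_flops: float, mem_bw: float) -> float:
    """Operational intensity (FLOPs/byte) where the sloped and flat roofs meet."""
    return peak_flops / mem_bw


def bound(peak_flops: float, mem_bw: float, intensity: float) -> str:
    """Classify a workload by which side of the ridge point it falls on."""
    return "compute-bound" if intensity >= ridge_point(peak_flops, mem_bw) else "memory-bound"


# Illustrative (assumed) platform: 100 TFLOP/s peak, 1 TB/s memory bandwidth.
PEAK = 100e12
BW = 1e12

print(ridge_point(PEAK, BW))  # 100.0 FLOPs/byte
print(bound(PEAK, BW, 10))    # memory-bound
print(bound(PEAK, BW, 500))   # compute-bound
```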
Transformer
FC Models
ParaDnn sweeps a large range of models, from memory-bound to compute-bound.
[Figure: roofline of the FC models, with the compute-bound and memory-bound models highlighted.]
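To see why one model family can span both regimes, here is a simplified operational-intensity estimate for a single FC layer (`fc_intensity` is a hypothetical helper). It assumes the input, weight, and output tensors each move through memory exactly once and uses 4-byte (fp32) elements; real TPU runs use other precisions and operator fusion, so treat the numbers as illustrative.

```python
def fc_intensity(batch: int, n_in: int, n_out: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for y = x @ W, with x: batch x n_in and W: n_in x n_out."""
    flops = 2 * batch * n_in * n_out  # one multiply + one add per MAC
    traffic = bytes_per_elem * (batch * n_in + n_in * n_out + batch * n_out)
    return flops / traffic


for b in (16, 128, 1024):
    print(b, round(fc_intensity(b, 4096, 4096), 1))
# 16 7.9
# 128 60.2
# 1024 341.3
```

Under these assumptions, a larger batch reuses each weight more times, pushing the same layer from the memory-bound side of the roof toward the compute-bound side.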
TPU v2 vs v3?
How to upgrade to TPU v3?
[Figure: roofline of TPU v2 alongside candidate TPU v3 rooflines with higher FLOPS, higher memory bandwidth (Mem BW), or both; by how many times (? x) should each be scaled?]
Architecture of TPU v2 vs v3
Figure is from https://cloud.google.com/tpu/docs/system-architecture
TPU v2: 180 TFLOPS / board
TPU v3: 420 TFLOPS / board
Google’s Choice of TPU v3
[Figure: TPU v2 vs TPU v3 rooflines; peak FLOPS scaled 2.3x, memory bandwidth scaled ? x]
TPU v3 vs v2: FC Operation Breakdown
- Compute-bound: 2.3x speedup
- Memory-bound: 1.5x speedup
- Memory-bound, but benefiting from the 2x memory capacity: 3x speedup
Google’s Choice of TPU v3
[Figure: TPU v2 vs TPU v3 rooflines; peak FLOPS scaled 2.3x, memory bandwidth scaled 1.5x]
ParaDnn provides a diverse set of operations and shows that different operations are sensitive to different system component upgrades.
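The 2.3x and 1.5x figures are what a roofline model predicts: memory-bound operations scale with the bandwidth ratio and compute-bound ones with the FLOPS ratio. This sketch takes the 180 and 420 TFLOPS per board and the 1.5x bandwidth ratio from the slides; the absolute TPU v2 bandwidth (`V2_BW`) is an assumed value that only shifts where the transition happens.

```python
def roofline(peak: float, bw: float, oi: float) -> float:
    """Attainable FLOP/s for operational intensity oi (FLOPs/byte)."""
    return min(peak, bw * oi)


V2_PEAK = 180e12      # FLOP/s per board (from the slides)
V3_PEAK = 420e12      # FLOP/s per board (from the slides)
V2_BW = 2.4e12        # bytes/s per board: ASSUMED for illustration
V3_BW = 1.5 * V2_BW   # 1.5x bandwidth upgrade (from the slides)


def predicted_speedup(oi: float) -> float:
    """TPU v3 over TPU v2 throughput for one operation, per the roofline model."""
    return roofline(V3_PEAK, V3_BW, oi) / roofline(V2_PEAK, V2_BW, oi)


for oi in (1, 75, 1000):
    print(oi, round(predicted_speedup(oi), 2))
# 1 1.5
# 75 1.5
# 1000 2.33
```

A pure roofline cannot reproduce the 3x case, which comes from the doubled memory capacity rather than from higher FLOPS or bandwidth.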
TPU vs GPU?
Hardware Platforms
[Figure: hardware platform specifications, annotated "300 GB/s per core"]
FC and CNN
[Figure: training dataflow of an FC layer (weights W, activations A, FC and FC Gradient ops, Weighted Sum, gradients G) beside the corresponding CNN layer (Conv and Conv Gradient ops, Weighted Sum, G, W).]
Compared with FC, CNNs have fewer weights and larger Conv ops.
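One way to quantify "fewer weights, larger Conv ops" is to count how many FLOPs each weight takes part in during one forward pass. The sketch assumes stride-1 convolution with an H×W output and ignores biases; both helper names are hypothetical.

```python
def fc_flops_per_weight(batch: int, n_in: int, n_out: int) -> float:
    """Each FC weight is used once per example: 2 * batch FLOPs."""
    weights = n_in * n_out
    flops = 2 * batch * n_in * n_out
    return flops / weights


def conv_flops_per_weight(batch: int, h: int, w: int, k: int, c_in: int, c_out: int) -> float:
    """Each conv weight is reused at every output position: 2 * batch * h * w FLOPs."""
    weights = k * k * c_in * c_out
    flops = 2 * batch * h * w * k * k * c_in * c_out
    return flops / weights


print(fc_flops_per_weight(128, 1024, 1024))           # 256.0
print(conv_flops_per_weight(128, 56, 56, 3, 64, 64))  # 802816.0
```

The reuse factor of H×W per weight is why convolutions tend to sit far to the right on the roofline, where raw compute matters more than memory bandwidth.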
FC TPU/GPU Speedup colored with Batch Size
[Figure: TPU/GPU speedup of FC models, colored by batch size; labeled speedups range from 0.35x to 9x. Regions: "TPU is better" (speedup > 1) and "GPU is better" (speedup < 1).]
FC TPU/GPU Speedup colored with Node Size
[Figure: TPU/GPU speedup of FC models, colored by node size; maximum labeled 9x.]
More nodes → more weights → more memory-bound
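A simplified way to read the "more nodes, more weights" chain: at a fixed batch size, weights make up a growing share of an FC layer's memory traffic as the node count grows. This sketch assumes a square layer (nodes in, nodes out) whose input, output, and weights are each moved once; `weight_traffic_fraction` is a hypothetical helper, and the model ignores caching, data layout, and other real effects.

```python
def weight_traffic_fraction(batch: int, nodes: int) -> float:
    """Share of a square FC layer's memory traffic that is weights."""
    weights = nodes * nodes            # W: nodes x nodes
    activations = 2 * batch * nodes    # input + output activations
    return weights / (weights + activations)


for n in (256, 1024, 8192):
    print(n, round(weight_traffic_fraction(256, n), 2))
# 256 0.33
# 1024 0.67
# 8192 0.94
```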
Hardware Platforms
[Figure: hardware platform specifications, annotated "300 GB/s per core" and "1.44x"]
CNN TPU/GPU Speedup colored with Batch Size
- Up to 6x speedup
- TPU architecture and software are highly optimized for CNNs
- All models run faster on TPU
- Larger batch sizes lead to higher speedups
CNN TPU/GPU Speedup colored with Filters
- Models with more filters have a higher lower bound on speedup
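Under the usual roofline simplifications (each tensor read or written once, fp32 elements, stride 1, input and output of the same spatial size, C_in = C_out = filters), a convolution's operational intensity grows with the number of filters. This is one hedged reading of the filter trend, not the authors' analysis; `conv_intensity` is a hypothetical helper.

```python
def conv_intensity(batch: int, h: int, w: int, k: int, filters: int,
                   bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for a k x k, stride-1 conv with C_in = C_out = filters (assumed)."""
    flops = 2 * batch * h * w * k * k * filters * filters
    activations = 2 * batch * h * w * filters  # input + output feature maps
    weights = k * k * filters * filters
    return flops / (bytes_per_elem * (activations + weights))


for f in (64, 256, 512):
    print(f, round(conv_intensity(32, 14, 14, 3, f), 1))
```

Higher intensity moves a model toward the compute-bound region, where a platform's peak FLOPS, rather than its memory bandwidth, sets the ceiling.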
Conclusion
- Parameterized methodology: ParaDnn + a set of analysis methods
- Single platform analysis: TPU v2
- Homogeneous platform comparison: TPU v2 vs v3
- Heterogeneous platform comparison: TPU vs GPU
Limitations of this Work
- Does not include:
  - Inference
  - Multi-node systems: multi-GPU or TPU pods
  - Accuracy and convergence
  - Cloud overhead
- Tractability:
  - Limited range of hyperparameters and datasets
  - Small batch sizes (<16) and large batch sizes (>2k) are not studied
  - Synthetic datasets do not include data infeed overhead
  - The TPU loop runs 100 iterations; larger numbers can slightly increase performance
Questions?
ParaDnn available: github.com/Emma926/paradnn