Team Tsinghua Student Cluster Competition @SC’19 · Team Tsinghua . Student Cluster Competition...
Team Tsinghua Student Cluster Competition @SC’19
Jiaao He, Shengqi Chen, Liyan Zheng, Kezhao Huang, Chen Zhang, Chenggang Zhao, Wentao Han, Jidong Zhai
Team Members
Prof. Jidong Zhai
Dr. Wentao Han
Chen Zhang: Reproducibility
Chenggang Zhao: Reproducibility
Jiaao He: Arch, HPL/HPCG, SST
Shengqi Chen: Networking, IO-500, SST
Kezhao Huang: VPIC
Liyan Zheng: VPIC
Junior Backups: Working in all aspects
Cluster Architecture
Intel 8280 x2, NVIDIA V100 x8
Intel 8280 x2, Intel PCIe SSD RAID0
InfiniBand
Ethernet
Redundant Backup Nodes
Software Stack
Debian 9
NFS on ZFS
Spack
Compilers: GCC, ICC, LLVM, etc.
Libraries: CUDA, CUDNN, BLAS, MPI, etc.
Power control & monitor: fan, CPU, GPU, IPMI
SST - Computer Simulator
• Compilers: GCC, ICC (with bugs fixed), LLVM
• Core: Manual partitioner + Human Intelligence
• Components:
  • Miranda: Callback lookup table -> 7x speedup
  • MemH: Optimized data structure initialization -> 1.1x speedup
  • Ember: Remove debug log generation -> 1.05x speedup
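The Miranda item credits a callback lookup table for its speedup, but the slide does not show the change itself. The following is a minimal, generic sketch of that kind of optimization in Python, not SST code: it assumes the original bottleneck was a linear scan over pending requests to find the matching callback, which is a guess. All names (`find_callback_linear`, `build_callback_table`, `complete_request`, `on_read`, `on_write`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback type: takes a request id, returns a status string.
Callback = Callable[[int], str]


def find_callback_linear(pending: List[Tuple[int, Callback]], req_id: int) -> Callback:
    """Assumed baseline: scan the pending list until the id matches (O(n) per lookup)."""
    for pending_id, callback in pending:
        if pending_id == req_id:
            return callback
    raise KeyError(req_id)


def build_callback_table(pending: List[Tuple[int, Callback]]) -> Dict[int, Callback]:
    """Build a hash table from request id to callback (average O(1) per lookup)."""
    return dict(pending)


def complete_request(table: Dict[int, Callback], req_id: int) -> str:
    """Look up the callback for req_id, remove it from the table, and invoke it."""
    callback = table.pop(req_id)
    return callback(req_id)


def on_read(req_id: int) -> str:
    return f"read {req_id} done"


def on_write(req_id: int) -> str:
    return f"write {req_id} done"


# Three outstanding requests, each paired with the callback to run on completion.
pending_requests = [(10, on_read), (11, on_write), (12, on_read)]
```

Both paths resolve the same callback; the table only changes the lookup cost, which matters when the lookup sits on the simulator's hot path and many requests are outstanding.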
Reproduction - Planet Normal Modes
• Local data generation
• Accurate runtime estimation
• One-key task scheduler with fancy functions
  • Switch datasets
  • Run monitor
  • Auto plotting
VPIC
• Well vectorized, largely scalable, compute-intensive
• Performance insensitive across nodes (with IB)
• Optimize AVX load instruction imbalance by specifying core affinity -> 1.2x speedup
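The last item attributes the speedup to explicit core affinity, without giving the mapping that was used. As a rough illustration only, the sketch below assigns each rank one core, spread evenly over the available cores, and pins the calling process with `os.sched_setaffinity`, which Python exposes only on some Unix platforms such as Linux. The even-stride mapping and the names `plan_affinity` and `pin_current_process` are assumptions, not the team's actual configuration.

```python
import os
from typing import List, Set


def plan_affinity(num_ranks: int, cores: List[int]) -> List[Set[int]]:
    """Assign each rank a single core, spaced evenly across `cores`.

    The even-stride policy is an assumption for illustration; the slide
    does not say which mapping was used.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    if num_ranks > len(cores):
        raise ValueError("more ranks than available cores")
    stride = len(cores) // num_ranks
    return [{cores[rank * stride]} for rank in range(num_ranks)]


def pin_current_process(core_set: Set[int]) -> bool:
    """Pin the calling process to `core_set`; return False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, core_set)  # pid 0 means the calling process
    return True


# Example plan: 4 ranks over 8 cores, so every second core is used.
example_plan = plan_affinity(4, list(range(8)))
```

In an MPI job, each rank would call `pin_current_process` with its own entry from the plan. MPI launchers usually provide their own binding options (for example `--bind-to core` in Open MPI), which may be the more practical way to get the same effect.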