Automatic Performance Tuning of SpMV on GPGPU

Xianyi Zhang

Lab of Parallel Computing

Institute of Software Chinese Academy of Sciences

zxy@mail.rdcps.ac.cn

Outline

Motivation

SpMV Introduction

AMD Stream Computing

GOSpMV Overview

GOSpMV Performance Evaluation

Conclusion & Future Work

Motivation

Sparse Matrix-Vector Multiplication (SpMV): y = y + Ax

An important kernel in scientific applications (PDE solvers, simulation, etc.)

Low performance, caused by an irregular memory access pattern

Motivation

GPU: huge computation power

Jason Yang, James Goodman. Symmetric Key Cryptography on Modern Graphics Hardware. http://ati.amd.com/technology/streamcomputing/asiacrypt2007.pdf

SpMV Introduction

CSR (Compressed Sparse Row)

A_val = [1, 2, 4, 1]

A_col = [0, 2, 1, 2]

A_ptr = [0, 2, 3, 4]

for (i = 0; i < n; i++)
{
    value = 0;
    for (j = A_ptr[i]; j < A_ptr[i+1]; j++)
        value = value + A_val[j] * x[A_col[j]];
    y[i] += value;
}

Note: x is accessed indirectly (through A_col) and therefore irregularly.
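The loop above can be wrapped into a small C function and checked against the example arrays. This is only a sketch: the function name `csr_spmv` and the use of single-precision `float` are assumptions, not taken from the slides. Read back from A_ptr, A_col and A_val, the example describes the 3 × 3 matrix with rows (1, 0, 2), (0, 4, 0) and (0, 0, 1).

```c
/* y = y + A*x for an n-row matrix A stored in CSR form.
 * A_ptr has n+1 entries; row i owns entries A_ptr[i] .. A_ptr[i+1]-1.
 * The name csr_spmv and the float type are illustrative assumptions. */
void csr_spmv(int n, const int *A_ptr, const int *A_col,
              const float *A_val, const float *x, float *y)
{
    for (int i = 0; i < n; i++) {
        float value = 0.0f;
        for (int j = A_ptr[i]; j < A_ptr[i + 1]; j++)
            value += A_val[j] * x[A_col[j]];   /* indirect access to x */
        y[i] += value;
    }
}
```

Called with the example arrays, x = [1, 2, 3] and y initialised to zero, the function leaves y = [7, 8, 3].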

SpMV Introduction

BCSR (Block Compressed Sparse Row)

Example: BCSR with 2 × 3 blocks
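A minimal CPU sketch of SpMV over a BCSR matrix may help show what the format stores. The array names (`B_ptr`, `B_col`, `B_val`), the row-major layout inside each block, and the requirement that the matrix dimensions are multiples of r and c are all assumptions made for illustration; the slides do not specify GOSpMV's actual layout.

```c
/* y = y + A*x for A stored in BCSR with r x c blocks (illustrative layout).
 * nb_rows : number of block rows; B_ptr has nb_rows+1 entries.
 * B_col[k]: block-column index of block k.
 * B_val   : r*c values per block, row-major, explicit zeros included. */
void bcsr_spmv(int nb_rows, int r, int c,
               const int *B_ptr, const int *B_col,
               const float *B_val, const float *x, float *y)
{
    for (int bi = 0; bi < nb_rows; bi++) {
        for (int k = B_ptr[bi]; k < B_ptr[bi + 1]; k++) {
            const float *blk = B_val + k * r * c;   /* values of block k */
            const float *xs = x + B_col[k] * c;     /* one index lookup per block */
            for (int ii = 0; ii < r; ii++) {
                float value = 0.0f;
                for (int jj = 0; jj < c; jj++)
                    value += blk[ii * c + jj] * xs[jj];
                y[bi * r + ii] += value;
            }
        }
    }
}
```

Compared with CSR, one column index is read per block instead of per nonzero, at the cost of storing and multiplying the explicit zeros that pad each block.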

AMD Stream Computing

Programming Model

AMD Stream Computing User Guide

AMD Stream Computing

AMD Brook+

AMD Stream Computing User Guide

GOSpMV Overview

GOSpMV Software Architecture

GOSpMV Overview

BCSR SpMV implementation on GPGPU

GOSpMV Overview

Automatic Performance Tuning

GOSpMV Overview

Off-line GPGPU Benchmark

Dense matrix (different sizes)

Every BCSR block size

[Figure: off-line benchmark plot. Axis label: nzCount; axis ticks from 1000 to 5000; series: 1x1, 2x2, 3x3, 4x4]

GOSpMV Overview

Run-Time Evaluation (search for the optimal BCSR block size)

Input: sparse matrix A; GPGPU benchmark data P_dense(block-format, nz_d)

Output: the maximum P(A, block-format, σ) and the corresponding optimal BCSR block size

For each BCSR r × c block size:

Calculate the fill ratio f^E_rc(A, σ) with sample rate σ

P_sp(block-format, nz^E_BCSR) = P_dense(block-format, nz_d), where nz_d is the benchmark point nearest to nz^E_BCSR

P(A, block-format, σ) = P_sp(block-format, nz^E_BCSR) / f^E_rc(A, σ)
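The selection procedure above can be sketched in C as follows. Several things here are assumptions rather than details from the slides: the `bench_t` layout for the off-line benchmark data, the function names, computing the fill ratio exactly (that is, with sample rate σ = 1) instead of sampling, and taking the number of stored BCSR values as the fill ratio times the number of true nonzeros.

```c
#include <stdlib.h>

/* Hypothetical layout of the off-line benchmark data for one block size:
 * perf[k] is the dense-matrix rate measured at nz[k] stored values. */
typedef struct {
    int r, c;
    int n_points;
    const double *nz;
    const double *perf;
} bench_t;

/* Exact fill ratio of r x c BCSR for a CSR pattern (sample rate 1):
 * stored values including explicit zeros, divided by true nonzeros.
 * Returns -1.0 if the matrix is empty or allocation fails. */
double fill_ratio(int n_rows, int n_cols, const int *A_ptr, const int *A_col,
                  int r, int c)
{
    int nb_cols = (n_cols + c - 1) / c;
    long blocks = 0;
    if (A_ptr[n_rows] == 0) return -1.0;
    int *seen = calloc((size_t)nb_cols, sizeof *seen); /* last block row + 1 */
    if (seen == NULL) return -1.0;
    for (int i = 0; i < n_rows; i++) {
        int tag = i / r + 1;                 /* identifies the block row */
        for (int j = A_ptr[i]; j < A_ptr[i + 1]; j++) {
            int bc = A_col[j] / c;
            if (seen[bc] != tag) { seen[bc] = tag; blocks++; }
        }
    }
    free(seen);
    return (double)blocks * r * c / (double)A_ptr[n_rows];
}

/* P_dense at the benchmark point whose nz_d is nearest to nz_est. */
double nearest_perf(const bench_t *b, double nz_est)
{
    int best = 0;
    double best_d = b->nz[0] - nz_est;
    if (best_d < 0) best_d = -best_d;
    for (int k = 1; k < b->n_points; k++) {
        double d = b->nz[k] - nz_est;
        if (d < 0) d = -d;
        if (d < best_d) { best_d = d; best = k; }
    }
    return b->perf[best];
}

/* Index of the candidate block size with the highest predicted rate
 * P = P_dense(nearest nz_d) / fill ratio, or -1 if none is usable. */
int select_block_size(int n_rows, int n_cols, const int *A_ptr,
                      const int *A_col, const bench_t *cands, int n_cands)
{
    int best = -1;
    double best_p = -1.0;
    for (int k = 0; k < n_cands; k++) {
        double f = fill_ratio(n_rows, n_cols, A_ptr, A_col,
                              cands[k].r, cands[k].c);
        if (f <= 0.0) continue;
        double nz_est = f * A_ptr[n_rows];   /* assumed stored values in BCSR */
        double p = nearest_perf(&cands[k], nz_est) / f;
        if (p > best_p) { best_p = p; best = k; }
    }
    return best;
}
```

Dividing by the fill ratio penalises block sizes that pad the matrix with many explicit zeros, so a larger block wins only when its measured dense rate outweighs the extra work.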

GOSpMV Performance Evaluation

Test box

CPU: Intel Pentium Dual Core E2160 / 1.8 GHz, 2.0 GB memory

GPU: AMD Radeon HD 3690 (RV670), theoretical peak: 428.8 GFLOPS (single precision)

AMD Stream SDK v1.1-beta

Ubuntu 8.04, Linux 2.6.24, gcc 4.2.3

Test matrices

8 sparse matrices of different sizes (small, medium, large)

Small (nonzeros < 100,000)

Medium (100,000 <= nonzeros < 1,000,000)

Large (nonzeros >= 1,000,000)

Sources: Matrix Market and the UF Sparse Matrix Collection

Test matrices

AMD Radeon HD 3690 Results

SpMV BCSR on GPGPU (1500 iterations)

[Figure: performance chart. Series: 1x1, 2x2, 3x3, 4x4, CPU]

Different iterations (100,300,500,1000,1500)

The automatic performance tuning (1500 iterations)

The average speedup: 3.11

Conclusion

GOSpMV performance speedup on AMD Radeon HD 3690: average 3.11, max 5.96 (1500 iterations)

GOSpMV is suited for:

Medium and large matrices

Iteration counts >= 300

Regular matrices (low fill ratio)

In general, GOSpMV selects a better BCSR block size through automatic performance tuning.

Future Work

Double precision

Support for other BCSR block sizes (e.g. 8x8)

New hardware (AMD RV770)

Automatic performance tuning strategy

Matrix re-ordering

Thank you! Q&A
