PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che, Bradford Bechmann,...

ACCELERATING AND EVALUATING OPENCL GRAPH APPLICATIONS

SHUAI CHE, BRAD BECKMANN, STEVE REINHARDT AND KEVIN SKADRON

| Accelerating and Evaluating OpenCL Graph Applications | November 20, 2013 | CONFIDENTIAL

AGENDA

Background and Graph Applications

Pannotia OpenCL™ Graph Applications

Performance Evaluation and Discussion

GRAPH APPLICATIONS

! Intelligence
  ‒ Business analytics, security and scientific discovery
! Social networks
  ‒ Facebook, Twitter, LinkedIn, Weibo, etc.
! Life science and healthcare
  ‒ Disease and drug research, life system research
! Infrastructure
  ‒ Transportation, power grid, energy and water supply
! Scientific and engineering simulations

GRAPH APPLICATIONS

! Low arithmetic intensity and data reuse
! Not floating-point intensive
! Branch divergence
  ‒ Only some of the threads in a wavefront are active
! Memory divergence
  ‒ Data distributed in different regions of memory
  ‒ A challenge to optimize data layouts and memory accesses
! Load imbalance
  ‒ Uneven work distribution across different threads
  ‒ Short-running threads wait for long-running threads
! Parallelism
  ‒ Changing degree of parallelism across iterations
  ‒ Underutilization of compute units for certain phases
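The load-imbalance point can be made concrete with a rough model. The sketch below is illustrative Python, not Pannotia code: it assumes one vertex per SIMD lane, that each lane loops over its vertex's neighbors, and that a wavefront stays busy until its highest-degree vertex finishes. The function name and the default wavefront size of 64 are assumptions of this sketch.

```python
def wavefront_utilization(degrees, wavefront_size=64):
    """Estimate the fraction of lane-iterations that do useful work.

    Model (an assumption of this sketch): vertex i is handled by lane i,
    each lane iterates once per neighbor, and a wavefront runs for as many
    iterations as its highest-degree vertex needs.
    """
    useful = 0   # lane-iterations that actually process an edge
    issued = 0   # lane-iterations the hardware has to issue
    for start in range(0, len(degrees), wavefront_size):
        group = degrees[start:start + wavefront_size]
        useful += sum(group)
        # Idle lanes (short-running or unassigned) still occupy the wavefront.
        issued += max(group) * wavefront_size
    return useful / issued if issued else 1.0
```

Under this model, a wavefront of four lanes with degrees 1, 1, 1 and 5 does 8 useful lane-iterations out of 20 issued, while uniform degrees give full utilization.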

PANNOTIA

! A graph application suite for GPGPU
! Eight diverse graph algorithms, e.g., shortest path, graph partitioning, web analysis and social network
! Implemented in C + OpenCL™
! Some are OpenCL implementations based on algorithms of prior work
! Initial implementation is for a single GPU node
! Further algorithmic and hardware-specific optimizations are in progress
! Details of Pannotia can be found in our paper published in the 2013 IEEE International Symposium on Workload Characterization

PANNOTIA

Applications                   Domains
Single-Source Shortest Path    Shortest Path
Connected Component Labeling   Graph Clustering
Graph Coloring                 Graph Partitioning
Floyd-Warshall                 Shortest Path
Maximal Independent Set        Graph Partitioning
Betweenness Centrality         Social Network
Friend Recommendation          Social Network
Page Rank                      Web Analysis

GRAPH INPUT AND DATA STRUCTURE

! Real-world graphs
  ‒ The University of Florida Sparse Matrix Collection
  ‒ The 9th DIMACS Implementation Challenge
  ‒ The 10th DIMACS Implementation Challenge
! Synthetic graphs
  ‒ Random-graph generator from Georgia Tech
! Graph input formats
  ‒ Coordinate format
  ‒ METIS
  ‒ Matrix Market
! Data structure representation
  ‒ CSR, COO, ELL …
  ‒ 2D adjacency matrix
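As an illustration of the CSR representation listed above, here is a minimal Python sketch that converts a coordinate-format (COO) edge list into CSR arrays. The function and array names (`coo_to_csr`, `row_ptr`, `col_idx`, `weights`) are chosen for this sketch and are not taken from Pannotia.

```python
def coo_to_csr(num_vertices, edges):
    """Convert a COO edge list [(src, dst, weight), ...] into CSR arrays.

    Returns (row_ptr, col_idx, weights), where the outgoing edges of vertex v
    occupy positions row_ptr[v] .. row_ptr[v + 1] - 1 of col_idx and weights.
    """
    # Count the outgoing edges of each vertex.
    row_ptr = [0] * (num_vertices + 1)
    for src, _dst, _w in edges:
        row_ptr[src + 1] += 1
    # Prefix sum turns the counts into starting offsets.
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]
    # Scatter each edge into the next free slot of its source row.
    col_idx = [0] * len(edges)
    weights = [0] * len(edges)
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr is left unchanged
    for src, dst, w in edges:
        slot = next_slot[src]
        col_idx[slot] = dst
        weights[slot] = w
        next_slot[src] += 1
    return row_ptr, col_idx, weights
```

CSR keeps each vertex's neighbor list contiguous in memory, which is one reason it is a common choice for GPU graph kernels.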

SINGLE SOURCE SHORTEST PATH

! Finds the shortest path from the source node to every other node in the graph

[Figure: example graph with a table of columns Vid and Dist]
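A minimal sequential sketch of single-source shortest path over CSR arrays follows. It uses Bellman-Ford-style edge relaxation driven by a set of active vertices, which is one common way to expose parallelism; it is not necessarily the kernel Pannotia uses. It assumes non-negative edge weights, and all names are illustrative.

```python
import math

def sssp(row_ptr, col_idx, weights, source):
    """Return the shortest distance from source to every vertex.

    Assumes non-negative edge weights. Unreachable vertices keep math.inf.
    """
    n = len(row_ptr) - 1
    dist = [math.inf] * n
    dist[source] = 0
    active = {source}              # vertices whose distance changed last round
    while active:
        next_active = set()
        for u in active:           # on a GPU, each active vertex maps to a work-item
            for e in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[e]
                candidate = dist[u] + weights[e]
                if candidate < dist[v]:
                    dist[v] = candidate   # relax edge (u, v)
                    next_active.add(v)
        active = next_active
    return dist
```

The size of `active` in each round is the "active vertices" quantity that varies over the run.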

CONNECTED COMPONENT LABELING

! Detect connected regions in graphs and images
! A connected component is a set of nodes in a graph that all point to the same root
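The "same root" idea can be sketched with a small union-find structure. This is a sequential Python illustration under the assumption of an undirected edge list; it is a stand-in technique and not the OpenCL implementation in the suite. Names are hypothetical.

```python
def connected_components(num_vertices, edges):
    """Label every vertex with the root of its connected component.

    The root chosen here is the smallest vertex ID in the component.
    """
    parent = list(range(num_vertices))   # every vertex starts as its own root

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return [find(v) for v in range(num_vertices)]
```

Two vertices are in the same component exactly when their labels are equal.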

GRAPH COLORING

! Assign colors (integers) to vertices so that no two adjacent vertices share the same color
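One round-based way to color a graph in a data-parallel style is sketched below: in each round, every uncolored vertex that has the highest priority among its uncolored neighbors takes the current color. This sketch uses the vertex ID as the priority to stay deterministic (practical implementations often use random priorities) and assumes an undirected graph without self-loops. It produces a valid coloring but does not minimize the number of colors.

```python
def color_graph(adj):
    """Color an undirected graph given as adjacency lists.

    Assumes no self-loops. Returns a list where color[v] is an integer color.
    """
    n = len(adj)
    color = [-1] * n                      # -1 means "not colored yet"
    current = 0
    while any(c == -1 for c in color):
        # A vertex wins the round if every uncolored neighbor has a lower ID.
        # Winners form an independent set, so they can share one color.
        winners = [v for v in range(n)
                   if color[v] == -1
                   and all(color[u] != -1 or u < v for u in adj[v])]
        for v in winners:
            color[v] = current
        current += 1
    return color
```

Each round corresponds to one kernel launch in a GPU setting, and the number of uncolored vertices shrinks from round to round.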

FLOYD-WARSHALL

! Solves the all-pairs shortest path (APSP) problem: finding the shortest path from every possible source to every possible destination

! A dynamic programming approach: shortestPath(i, j, k) returns the shortest path from i to j using only intermediate vertices from {1, 2, ..., k}
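The recurrence behind shortestPath(i, j, k) can be written as a triple loop over a distance matrix. The Python sketch below is a sequential illustration with 0-based vertex indices; it assumes the input is a dense matrix with `math.inf` for missing edges and no negative cycles.

```python
import math

def floyd_warshall(dist):
    """Return the all-pairs shortest distance matrix.

    dist[i][j] is the weight of edge (i, j), math.inf if there is no edge,
    and 0 on the diagonal. The input matrix is not modified.
    """
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):            # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                through_k = d[i][k] + d[k][j]
                if through_k < d[i][j]:
                    d[i][j] = through_k
    return d
```

For a fixed k the (i, j) updates are independent of each other, which is what makes the two inner loops a natural fit for a 2D GPU kernel.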

MAXIMAL INDEPENDENT SET

! Independent set: no two vertices in the set are neighbors
! Maximal independent set: no further vertex can be added while keeping the set independent

[Figure: example graph] S = {0, 4, 6} is a maximal independent set
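A round-based construction of a maximal independent set can be sketched as follows. The example graph from the slide is not recoverable here, so the sketch works on any adjacency list. It uses the vertex ID as a deterministic priority (Luby-style algorithms use random priorities instead) and assumes an undirected graph without self-loops.

```python
def maximal_independent_set(adj):
    """Return a maximal independent set of an undirected graph as a sorted list."""
    UNDECIDED, IN_SET, EXCLUDED = 0, 1, 2
    n = len(adj)
    state = [UNDECIDED] * n
    while UNDECIDED in state:
        # A vertex joins the set if it has the lowest ID among its
        # still-undecided neighbors, so winners are never adjacent.
        winners = [v for v in range(n)
                   if state[v] == UNDECIDED
                   and all(state[u] != UNDECIDED or v < u for u in adj[v])]
        for v in winners:
            state[v] = IN_SET
            for u in adj[v]:
                if state[u] == UNDECIDED:
                    state[u] = EXCLUDED   # neighbors of a member can never join
    return [v for v in range(n) if state[v] == IN_SET]
```

The result is maximal because every vertex outside the set was excluded by a neighbor inside it.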

BETWEENNESS CENTRALITY

! Centrality determines the relative importance of a vertex within the graph (e.g. degree, betweenness, closeness …)

! Betweenness Centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.

$$BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths between nodes s and t, and $\sigma_{st}(v)$ is the number of shortest paths between nodes s and t passing through v.
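For unweighted graphs, the betweenness centrality sum is commonly evaluated with Brandes' algorithm: one breadth-first search per source to count shortest paths, followed by a backward pass that accumulates dependencies. The Python sketch below is a sequential illustration, not the suite's OpenCL code. It sums over ordered (s, t) pairs, so on an undirected graph each pair contributes once in each direction.

```python
from collections import deque

def betweenness_centrality(adj):
    """Betweenness centrality of every vertex of an unweighted graph."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        sigma = [0] * n                  # sigma[v]: number of shortest s-v paths
        dist = [-1] * n
        preds = [[] for _ in range(n)]   # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []                       # vertices in non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: push path fractions from far vertices toward s.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the path 0 - 1 - 2, only the middle vertex lies between two others, so it is the only vertex with a non-zero score.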

FRIEND RECOMMENDATION

! Recommend friend connections – a common feature in social websites
! A simple MapReduce-like algorithm

“Andy” = [ “Brad”, “Derek”, “Shuai”, … ]

Andy → <“Brad”, “Derek”, “Andy”>
       <“Brad”, “Shuai”, “Andy”>
       <“Derek”, “Brad”, “Andy”>
       <“Derek”, “Shuai”, “Andy”>
       <“Shuai”, “Derek”, “Andy”>
       <“Shuai”, “Brad”, “Andy”>   (Andy recommends Brad to Shuai)
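The triple generation in the example can be sketched in Python. The map step mirrors the slide: for each user, emit one (target, candidate, recommender) triple per ordered pair of that user's friends. The reduce step shown here, which counts recommenders and skips existing friends, is an assumption of this sketch, since the slide only shows the emitted triples.

```python
from collections import Counter
from itertools import permutations

def emit_triples(user, friends):
    """Map step: `user` recommends `candidate` to `target` for each friend pair."""
    return [(target, candidate, user)
            for target, candidate in permutations(friends, 2)]

def recommend(friend_lists):
    """Reduce step (assumed): count how many users recommend each candidate.

    friend_lists maps a user name to a list of friend names.
    Returns {target: Counter({candidate: number_of_recommenders})}.
    """
    counts = {}
    for user, friends in friend_lists.items():
        for target, candidate, _recommender in emit_triples(user, friends):
            if candidate in friend_lists.get(target, []):
                continue                  # already friends, nothing to recommend
            counts.setdefault(target, Counter())[candidate] += 1
    return counts
```

With Andy's friend list from the example, `emit_triples("Andy", ["Brad", "Derek", "Shuai"])` yields six triples, including `("Shuai", "Brad", "Andy")`.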

PAGERANK

PERFORMANCE BENEFITS

! Speedups are up to 11x (an AMD “Tahiti” discrete GPU vs. 4 CPU cores on an A8)
! PCI-E overhead is included
! Performance benefits depend on graph input datasets

EXECUTION TIME BREAKDOWN (D-GPU)

! The portion of time spent in GPU execution: 8% - 99%
! Some further GPU offload can be done (e.g. FRD and MIS)

PARALLELISM (ACTIVE VERTICES OVER TIME)

[Figures: active vertices over time for Single-Source Shortest Path (Road Network - NY) and Graph Coloring (G3 Circuit)]
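A profile of active vertices over time can be collected by recording the frontier size at each iteration of a traversal. The Python sketch below does this for a plain breadth-first search, used here as the simplest proxy; the measurements in the slides come from the actual SSSP and graph coloring runs, not from this code.

```python
def frontier_sizes(adj, source):
    """Return the number of active (frontier) vertices in each BFS iteration."""
    visited = {source}
    frontier = [source]
    sizes = []
    while frontier:
        sizes.append(len(frontier))      # available parallelism this iteration
        next_frontier = []
        for v in frontier:
            for u in adj[v]:
                if u not in visited:
                    visited.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return sizes
```

A long, narrow graph such as a road network produces many iterations with small frontiers, whereas a graph with a short diameter produces a few iterations with large ones.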

LOAD IMBALANCE (DEGREE DISTRIBUTION)

[Figures: degree distribution for Single-Source Shortest Path (Road Network) and Graph Coloring (G3 Circuit)]

HIERARCHICAL CLUSTERING

! Different program-input pairs may have vastly different characteristics!

[Figure: dendrogram of program-input pairs, axis range 0.0 to 4.6. Leaves: CLR-G3-circuit, CLR-ecology, DJK-US-NW, DJK-US-CA, BC-2k, BC-1k, CCL-lena, CCL-deposit, FW-512-64k, FW-256-16k, MIS-US-NW, MIS-shell, CLR-shell, MIS-ecology, PRK-flicker, FRD-coAuthor, PRK-2k]

L2 HIT RATE OVER TIME (SSSP)

! The cache hit rate first improves, then degrades, improves again and finally degrades with some fluctuations in the middle

[Figure: L2 hit rate over time for SSSP]

ARCHITECTURAL IMPLICATIONS (SCALAR UNIT)

[Figure: graph traversal, with labels Scalar and SIMD]

! Possibly include narrower SIMD units or heterogeneous SIMD units

! Resource management and scheduling
  ‒ Switch the task between the CPU and the GPU based on parallelism
  ‒ Use only “enough” SIMD engines and save power
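The scheduling idea can be illustrated with a purely hypothetical policy: pick the device from the number of active vertices, and power only as many SIMD engines as the active set can fill. The threshold, lane count and engine count below are made-up parameters for this Python sketch, not measured or recommended values.

```python
import math

def choose_device(active_vertices, gpu_threshold=4096):
    """Hypothetical policy: run on the GPU only when there is enough parallelism."""
    return "GPU" if active_vertices >= gpu_threshold else "CPU"

def plan(active_per_iteration, gpu_threshold=4096):
    """Pick a device for every iteration of an active-vertex profile."""
    return [choose_device(a, gpu_threshold) for a in active_per_iteration]

def simd_engines_needed(active_vertices, lanes_per_engine=16, total_engines=128):
    """Hypothetical estimate of how many SIMD engines the active set can fill."""
    wanted = math.ceil(active_vertices / lanes_per_engine)
    return min(total_engines, max(1, wanted))
```

For a profile such as `[1, 3000, 120000, 50]`, this policy would keep the first, second and last iterations on the CPU and move only the wide middle phase to the GPU. A real policy would also have to account for data-transfer and kernel-launch costs.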

ARCHITECTURAL IMPLICATIONS

[Figure: labels Scalar, Narrow SIMD, Wide SIMD; CPU, GPU]

CONCLUSION AND FUTURE WORK

! Graph applications are an emerging workload domain
! Pannotia is a first-step attempt to evaluate diverse graph building blocks on GPUs

Next-Step Goals:
! Add more applications (e.g. matching, spanning tree, flow)
! Optimize Pannotia applications
! Extend to multiple GPU nodes and across CPU and GPU

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a registered trademark of Apple Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
