AMD 2012: HSA in Gaming

GPGPU ALGORITHMS IN GAMES
How Heterogeneous Systems Architecture can be leveraged to optimize algorithms in video games
Matthijs De Smedt, Lead Graphics Programmer, Nixxes Software B.V.

description

In this presentation from 2012, AMD details how developers could use the HSA capabilities of selected silicon to gain performance, efficiency, and parallelism in games.

Transcript of AMD 2012: HSA in Gaming

Page 1: AMD 2012: HSA in Gaming

Page 2: AMD 2012: HSA in Gaming

| HSA Algorithms in Games | June 13th, 2012

CONTENTS

A short introduction
Current usage of GPGPU in games
Heterogeneous Systems Architecture
Examples made possible by HSA

Page 3: AMD 2012: HSA in Gaming

INTRODUCTION

Page 4: AMD 2012: HSA in Gaming

VIDEOGAMES

Games are near real-time simulations
Response time is key
Most systems run in sync with the output frequency
– Rendering 60 frames per second
– Allows for 16 ms of processing time

Framerate is limited either by:
– GPU
– CPU
– Display (VSync)

[Diagram: CPU and GPU frame timelines; stages labelled Input, Simulate, Render, Render]
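The frame-time arithmetic on this slide can be sketched as follows. This is a minimal illustration, not something from the presentation: the function names are hypothetical, and it assumes CPU and GPU work overlap completely within a frame.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at a given output frequency."""
    return 1000.0 / target_fps


def limiting_stage(cpu_ms: float, gpu_ms: float, target_fps: float = 60.0) -> str:
    """Name what bounds the framerate (assumes CPU and GPU work fully overlap)."""
    budget = frame_budget_ms(target_fps)
    if max(cpu_ms, gpu_ms) <= budget:
        return "display"  # both fit in the budget, so VSync sets the pace
    return "cpu" if cpu_ms > gpu_ms else "gpu"
```

At 60 frames per second the budget works out to 1000 / 60, or roughly 16.7 ms, which the slide rounds to 16 ms.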

Page 5: AMD 2012: HSA in Gaming

HARDWARE

Typical hardware target for PC games:
– One multicore CPU
– One GPU

Multiple GPUs: CrossFire
– Transparent to the application
– Driver alternates frames between GPUs

GPUs are becoming more general purpose:
– General Purpose GPU algorithms (GPGPU)

[Image: CrossFire]

Page 6: AMD 2012: HSA in Gaming

GPGPU IN GAMES

Page 7: AMD 2012: HSA in Gaming

INTRODUCTION TO GPGPU

Rendering is a sequence of parallel algorithms
GPUs are great at parallel computation
Evolution of hardware and software to general purpose

First GPGPU was accomplished with programmable rendering:
– DirectX
– OpenGL

Second generation using dedicated GPGPU APIs:
– CUDA
– OpenCL
– DirectCompute

Third generation of GPGPU on the way:
– Heterogeneous Systems Architecture

Page 8: AMD 2012: HSA in Gaming

GPGPU IN GAMES

Some GPGPU algorithms are being used in games right now. For example:
– Physics
  – Particles
  – Fluid simulation
  – Destruction
– Specialized graphics algorithms
  – Post-processing

All these algorithms drive visual effects

[Image: GPU particle system by Fairlight]

Page 9: AMD 2012: HSA in Gaming

CURRENT PHYSICS EXAMPLE

GPGPU particle simulation using DirectCompute
Great for simulating thousands of visible particles
Results of simulation are never copied back to CPU
– Cannot interfere with gameplay
– Not synced in networked games

Example of what this rules out: smoke particles that affect game AI

[Diagram: CPU and GPU timelines; stages labelled Call GPU, Simulate particles, Render particles]
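A particle update of the kind described above might look like the sketch below. It is written as plain Python on the CPU rather than as a DirectCompute shader, and the `Particle` type, the gravity constant, and the integration scheme are all assumptions made for illustration. The point it shows is that each particle is updated independently, which is what lets a GPU run one thread per particle.

```python
from dataclasses import dataclass

GRAVITY = (0.0, -9.81, 0.0)  # assumed constant acceleration in m/s^2


@dataclass
class Particle:
    position: tuple
    velocity: tuple


def step_particle(p: Particle, dt: float) -> Particle:
    """Advance one particle by dt seconds using semi-implicit Euler integration."""
    velocity = tuple(v + g * dt for v, g in zip(p.velocity, GRAVITY))
    position = tuple(x + v * dt for x, v in zip(p.position, velocity))
    return Particle(position, velocity)


def step_all(particles, dt):
    """Update every particle; no element reads another, so the loop is parallel."""
    return [step_particle(p, dt) for p in particles]
```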

Page 10: AMD 2012: HSA in Gaming

GPGPU LIMITATIONS

Why isn’t GPGPU used more for non-graphics?

Latency
– DirectX has many layers and buffers
– DirectX commands are buffered up to multiple frames
– Actual execution on the GPU is delayed

Copy overhead
– GPU cannot directly access application memory
– Must copy all data from and to the application

Functionality
– Constrained programming models

Page 11: AMD 2012: HSA in Gaming

HETEROGENEOUS SYSTEMS ARCHITECTURE

Page 12: AMD 2012: HSA in Gaming

HETEROGENEOUS SYSTEMS ARCHITECTURE

New hardware and software

Hardware
New features on discrete GPUs
Accelerated Processing Unit
– Next generation processor
– Multiple CPU and GPU cores on the same die
– Shared memory access
– Soon to be as widespread as multicore CPUs

Software
"Drivers"
– HSA provides a new, thin Compute API
– Very low latency
– Unified Address Space
– Exposes more hardware capabilities
HSA Intermediate Language
– Virtual ISA
– Introduces CPU programming features to the GPU

Page 13: AMD 2012: HSA in Gaming

USING THE APU

Distinction between two hardware configurations

APU without discrete GPU:
– Found in many laptops, soon in many desktops
– Use the on-die GPU for rendering

APU with discrete GPU:
– Hard-core gamers will still use discrete GPUs
– Asymmetrical CrossFire
– Or: Dedicate the on-die GPU to Compute algorithms
  – Could result in massive speedup of algorithms

Using SIMD co-processors to offload the CPU is familiar to PS3 developers

Page 14: AMD 2012: HSA in Gaming

COPY OVERHEAD

Current Compute APIs require the application to explicitly copy all input and output memory
– Copying can easily take longer than processing on the CPU!
– Only small datasets or very expensive computations benefit from GPGPU

HSA introduces a Unified Address Space for CPU and GPU memory
– CPU pointers on the GPU
– Virtual memory on the GPU
  – Paging over PCI-Express (discrete) or shared memory controller (APU)
– Fully coherent
– Will make GPGPU an option for many more algorithms
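The break-even reasoning behind the copy-overhead argument can be sketched as a small cost model. Everything here is an assumption for illustration: the function names, the single bus-bandwidth figure, and especially the idealization that a unified address space makes the copy cost exactly zero (in practice paging and memory bandwidth still cost something).

```python
def offload_time_ms(bytes_in, bytes_out, gpu_compute_ms, bus_gb_per_s,
                    unified_memory=False):
    """Estimated wall time for running a job on the GPU, in milliseconds."""
    if unified_memory:
        copy_ms = 0.0  # idealized: GPU works on application memory in place
    else:
        copy_ms = (bytes_in + bytes_out) / (bus_gb_per_s * 1e9) * 1e3
    return copy_ms + gpu_compute_ms


def worth_offloading(cpu_ms, **job):
    """True when the GPU path, copies included, beats running on the CPU."""
    return offload_time_ms(**job) < cpu_ms
```

With hypothetical numbers, a job that takes 5 ms on the CPU and 1 ms on the GPU but moves 64 MB over an 8 GB/s bus spends about 8 ms copying, so offloading loses; without the copies the same job wins.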

Page 15: AMD 2012: HSA in Gaming

LATENCY

DirectX commands are buffered
When the GPU is fully loaded this buffer is saturated
The delay between scheduling and executing a GPGPU program on a busy GPU can span multiple frames
– Results will be several frames behind
– Game simulation needs all objects to be in sync

GPGPU is currently impractical to use for anything but visual effects
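The effect of a saturated command buffer can be illustrated with a toy model. This is a sketch under a strong assumption, namely a strict FIFO that always holds `queue_depth` frames of earlier work ahead of any new submission; real drivers are more complicated, and the function name is hypothetical.

```python
from collections import deque


def simulate_buffered_dispatch(frames, queue_depth):
    """Return (submit_frame, execute_frame) pairs for one compute job per frame."""
    pending = deque()
    timeline = []
    for frame in range(frames):
        pending.append(frame)            # CPU schedules this frame's job
        if len(pending) > queue_depth:   # GPU only now reaches the oldest job
            submitted = pending.popleft()
            timeline.append((submitted, frame))
    return timeline
```

In this model every job executes `queue_depth` frames after it was scheduled, so a depth of two means the results the CPU reads describe the world as it was two frames ago.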

Page 16: AMD 2012: HSA in Gaming

Page 17: AMD 2012: HSA in Gaming

Page 18: AMD 2012: HSA in Gaming

Page 19: AMD 2012: HSA in Gaming

Page 20: AMD 2012: HSA in Gaming

LATENCY

HSA’s new Compute API will reduce latency
How to deal with a saturated GPU?

A second GPU
– Dedicate the APU to Compute
– Virtually no latency

HSA feature: Graphics pre-emption
– Context switching on the GPU
  – Interrupt a graphics task (typically a large command list)
  – Execute Compute algorithm
  – Switch back to graphics
– Can be used on both discrete GPUs and the APU

Choose the solution best suited to your needs

Page 21: AMD 2012: HSA in Gaming

APU USAGE EXAMPLE

[Diagram: CPU and GPU timelines for HSA and DirectCompute; labels: Frame, Schedule, Execute, Execute]

Page 22: AMD 2012: HSA in Gaming

PROGRAMMING MODEL

HSA Intermediate Language: HSAIL
Designed for parallel algorithms
JIT compiles your algorithm to CPU or GPU hardware
– Also makes multi-core SIMD programming easy!

High level language features
– Object-oriented programming
– Virtual functions
– Exceptions

Debugging
SysCall support
– I/O

Page 23: AMD 2012: HSA in Gaming

EXAMPLE ALGORITHMS

Page 24: AMD 2012: HSA in Gaming

PHYSICS

Current GPGPU physics solutions only output to the renderer
With HSA you can simulate physics on the GPU and get the results back in the same frame
Use hardware acceleration to compute physics for gameplay objects
Reduced CPU load
More objects, higher fidelity

Page 25: AMD 2012: HSA in Gaming

FRUSTUM CULLING

Videogames tend to be GPU-bound
Avoid rendering what cannot be seen
Cull objects outside the camera viewport
– Test the bounding box of every object against the camera frustum
– Currently done on the CPU
– Lots of vector math
– Can be computed completely in parallel!

CPU needs the results immediately
– HSA will allow low-latency execution
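The bounding-box test described above can be sketched as follows. This is a common conservative formulation rather than anything taken from the presentation: it assumes axis-aligned bounding boxes and frustum planes given as `(nx, ny, nz, d)` with the inside defined by `n·p + d >= 0`. All names are hypothetical.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the whole box lies on the negative (outside) side of the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner furthest along the plane normal (the "positive vertex").
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0


def is_visible(box_min, box_max, planes):
    """Conservative test: may keep a few boxes that sit just outside a corner."""
    return not any(aabb_outside_plane(box_min, box_max, p) for p in planes)


def cull(boxes, planes):
    """One independent test per object, so each could run on its own GPU thread."""
    return [is_visible(lo, hi, planes) for lo, hi in boxes]
```

Because no test depends on another object's result, the loop in `cull` is the part that maps onto a parallel Compute dispatch.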

Page 26: AMD 2012: HSA in Gaming

OCCLUSION CULLING

Objects may be hidden behind others: Occlusion
Final per-pixel occlusion is only known after rendering the scene
Approximate occlusion by rendering low-detail geometry
– This kind of occlusion culling is currently being done on CPU or on SPUs
– Rendering is better suited to GPUs

HSA solution:
– Software rasterization in Compute on the GPU
– HSA does not yet expose graphics pipeline
– Still much faster than a multicore CPU

[Image: Software occlusion culling in Battlefield 3]
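The idea of approximating occlusion with a low-detail software depth buffer can be sketched like this. It is heavily simplified and is not the technique used in any particular game: occluders are screen-space rectangles at a constant depth instead of rasterized triangles, the buffer is tiny, and all names and sizes are assumptions.

```python
WIDTH, HEIGHT = 16, 8   # deliberately low resolution
FAR = float("inf")


def make_depth_buffer():
    return [[FAR] * WIDTH for _ in range(HEIGHT)]


def rasterize_occluder(depth, rect, z):
    """Write depth z into rect = (x0, y0, x1, y1), keeping the nearest value."""
    x0, y0, x1, y1 = rect
    for y in range(max(y0, 0), min(y1, HEIGHT)):
        row = depth[y]
        for x in range(max(x0, 0), min(x1, WIDTH)):
            if z < row[x]:
                row[x] = z


def is_occluded(depth, rect, nearest_z):
    """True if every covered pixel already holds a strictly nearer occluder.

    A rect entirely off screen also reports True; frustum culling is
    assumed to have handled that case already.
    """
    x0, y0, x1, y1 = rect
    for y in range(max(y0, 0), min(y1, HEIGHT)):
        for x in range(max(x0, 0), min(x1, WIDTH)):
            if nearest_z <= depth[y][x]:
                return False
    return True
```

Both loops touch each pixel independently, which is why this kind of rasterization suits a Compute kernel.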

Page 27: AMD 2012: HSA in Gaming

SORTING

Typically several long lists per frame need sorting
Sorting on the GPU using a parallel sort algorithm
– Ken Batcher: Bitonic or Odd-even mergesort
Copy overhead currently negates the performance advantage of using a GPU sorting algorithm

HSA solution:
– Unified Address Space
– GPU can sort in-place in system memory
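For reference, Batcher's bitonic sort can be sketched as below. This is a sequential Python rendering of the sorting network, assuming the input length is a power of two; on a GPU the inner `for i` loop would be the part executed by one thread per element. The function name is hypothetical.

```python
def bitonic_sort(values):
    """Sort a list in place, ascending, using a bitonic sorting network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # distance between compared elements
            for i in range(n):  # every compare-exchange here is independent
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (values[i] > values[partner]) == ascending:
                        values[i], values[partner] = values[partner], values[i]
            j //= 2
        k *= 2
    return values
```

The sort works in place on the caller's list, which mirrors the slide's point that with a unified address space the GPU could sort data where it already lives in system memory.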

Page 28: AMD 2012: HSA in Gaming

ASSET DECOMPRESSION

Game assets are stored compressed on disk
Decompression is expensive
The usage of some compression algorithms is prevented by CPU speed
Games are moving away from loading screens

An APU with Unified Address Space
– Can be used to decompress new assets without taxing the CPU or discrete GPU
– Perhaps even use HSAIL I/O to read from disk
– A better streaming experience for gamers

Page 29: AMD 2012: HSA in Gaming

PATHFINDING

Some strategy games simulate thousands of units
Pathfinding over complex terrain with thousands of moving units is very expensive
Clever approximate solutions are often used
– Supreme Commander 2 “Flow field”

GPGPU pathfinding with HSA
– Use one GPU thread per unit to do a deep search for an optimal path
– With HSA such an algorithm can page all requisite data from system memory and write back found paths
– APU could be fully saturated with pathfinding without impacting framerate
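To illustrate the "flow field" style of approximate solution mentioned above, here is a minimal grid sketch. It is a generic breadth-first construction, not Supreme Commander 2's implementation: the grid format (`#` for blocked cells), four-way movement, and all names are assumptions. One search from the goal yields a direction for every reachable cell, so any number of units can share the result.

```python
from collections import deque


def build_flow_field(grid, goal):
    """Map each reachable (x, y) cell to the (dx, dy) step that leads to goal."""
    height, width = len(grid), len(grid[0])
    flow = {goal: (0, 0)}
    queue = deque([goal])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if (0 <= nx < width and 0 <= ny < height
                    and grid[ny][nx] != "#" and (nx, ny) not in flow):
                flow[(nx, ny)] = (-dx, -dy)  # step back toward the cell we came from
                queue.append((nx, ny))
    return flow


def follow(flow, start):
    """Walk from start to the goal; raises KeyError if start is unreachable."""
    path = [start]
    while flow[path[-1]] != (0, 0):
        x, y = path[-1]
        dx, dy = flow[(x, y)]
        path.append((x + dx, y + dy))
    return path
```

The per-unit deep search proposed on the slide is a different trade: instead of one shared field, each GPU thread would run its own search for its own unit.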

Page 30: AMD 2012: HSA in Gaming

CONCLUSION

Many algorithms in games are suitable for offloading to the GPU
Heterogeneous Systems Architecture solves two major obstacles
– Latency
– Memory access

HSAIL allows for entirely new kinds of GPGPU programs
APUs can be used to offload the CPU
HSA will finally make GPUs available to developers as full-featured co-processors

Page 31: AMD 2012: HSA in Gaming

THANK YOU

Any questions?

Page 32: AMD 2012: HSA in Gaming
Page 33: AMD 2012: HSA in Gaming

Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.