GRAMPS: A Programming Model For Graphics Pipelines
description
Transcript of GRAMPS: A Programming Model For Graphics Pipelines
![Page 1: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/1.jpg)
1
GRAMPS: A Programming Model
For Graphics PipelinesJeremy Sugerman,
Kayvon Fatahalian, Solomon Boulos,Kurt Akeley, Pat Hanrahan
![Page 2: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/2.jpg)
2
“The Graphics Pipeline”
Central to the rise of 3D hardware and software.
A stable and universal abstraction
Shaped the evolution of the field…
… while leaving enormous room for innovation.
InputAssembler
VertexShader
RastOutputMerger
PixelShader
![Page 3: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/3.jpg)
3
RastOutputMerger
PixelShader
Fixed Function
The Graphics Pipeline is evolving
Programmable Shading
InputAssembler
VertexShader
Direct3D 10Direct3D 11
PixelShader
VertexShader
HullShader
DomainShader
Tessellator
GeometryShader
StreamOutput
![Page 4: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/4.jpg)
4
“GPU” is evolving, too
Continued drive for algorithmic innovation and advanced rendering techniques
First class programming models for compute:
– OpenCL, compute shaders, vendor specific, …
New / different hardware implementations:
– E.g., Larrabee, CPU-GPU combinations / hybrids
– Even NVIDIA and AMD GPUs are very different
![Page 5: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/5.jpg)
5
From fixed to programmable (again)
Idea: Evolve the pipeline itself from preset configurations to a programmable entity
![Page 6: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/6.jpg)
6
Programming model and run-time for parallel hardware
Graphs of stages and queues
GRAMPS handles scheduling, parallelism, data-flow
GRAMPS
Example: Simple GRAMPS Graph
ShaderThreadFixed
Function
![Page 7: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/7.jpg)
7
The Graphics Pipeline becomes an app!
Structure/setup is (application) software
– Customized or completely novel renderers
Reuses current hardware: FIFOs, shader cores, rast, …
Analogous to the transition to programmable shading
– Proliferation of new use cases and parameters
– Not (unthinkably) radical
Fra
me
Buf
fer
VertexInput PixelRast Merge
![Page 8: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/8.jpg)
8
Writing a GRAMPS application
Design the execution graph:
Design the stages:
– Shaders
– Threads (and Fixed Function stages)
Instantiate and launch.
VertexInput Pixel MergeRast
Fra
me
Buf
fer
Merge
![Page 9: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/9.jpg)
9
Fra
me
Buf
fer
VertexInput Pixel MergeRast
More Detail – Queues
Queues operate at a “packet” granularity
– “Large bundles of coherent work”
GRAMPS can optionally enforce ordering
– Required for some workloads, adds overhead
![Page 10: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/10.jpg)
10
Fra
me
Buf
fer
RastVertex PixelInput Merge
More Detail – Shaders
Shaders: Like pixel (or compute) shaders, stateless
– Automatic instancing, pre-reserve/post-commit
“Collection” packets: shared header and N elements
New: “Push” operation to coalesce variable outputs
Vertex
![Page 11: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/11.jpg)
11
Fra
me
Buf
fer
RastVertex PixelInput Merge
More Detail – Thread/Fixed Function
Threads: Like POSIX threads, stateful
– Explicit reserve/commit on queues
Fixed Function: Effectively non-programmable Threads
Rast
![Page 12: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/12.jpg)
12
More Detail – Queue Sets
Queue sets enable binning-style algorithms
One logical queue with multiple lanes (or bins)
– One consumer at a time per lane
– Many lanes with data allows many parallel consumers
VertexInput Rast
Fra
me
Buf
fer
Pixel MergeMerge
Fra
me
Buf
fer
Pixel
![Page 13: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/13.jpg)
13
Quick Comparison to “Streaming”
Streaming: “squeeze out every FLOP”
– Goals: throughput, bulk transfer, arithmetic intensity
– Intensive static analysis, program transformation
– Bound space, data access, execution time
GRAMPS: “interesting applications are irregular”
– Goals: throughput, dynamic, data-dependent code
– Aggregate work at run-time, heterogeneous hardware
– Streaming apps are GRAMPS apps
![Page 14: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/14.jpg)
14
Evaluation: Design Goals
Broad application scope: preferable to roll-your-own
Multi-platform: suits many hardware configurations
High performance: competitive with roll-your-own
Tunable: expert users can optimize their apps
Optimized Implementations: inform, and are informed by, hardware
![Page 15: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/15.jpg)
15
Broad Application Scope
Ray Tracing Graph
= Thread = Queue
= Shader = Stage Output
= Fixed-Func = Push Output
Rasterization Pipeline (with ray tracing extension)
Ray Tracing Extension
Fra
me
Buf
fer
PSRastRO
Ver
tex
Buf
fers
VSNIAN
VS1IA1 OM
Trace PS2
Tiler Sampler Camera Intersect
ShadeShadowIntersect
Blend
Fra
me
Buf
fer
![Page 16: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/16.jpg)
16
Multi-Platform: Two (Simulated) Machines
CPU-Like: 8 Fat Cores, Rast
GPU-Like: 1 Fat Core, 4 Micro Cores, Rast, Sched
![Page 17: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/17.jpg)
17
High Performance – Metrics
Priority #1: Show scale out parallelism
– Can GRAMPS exploit the application parallelism and fill the machine?
Priority #2: Show ‘reasonable’ bandwidth / storage requirements for queueing
– What is the worst case total footprint of all queues?
– A scheduling problem: trade-off with possible parallelism
![Page 18: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/18.jpg)
18
High Performance – Scheduling
Very simple static prototype scheduler (both platforms):
Static stage priorities:
Limited pre-emption points
No dynamic weighting of current queue depths
(Lowest)
(Highest)
1 2 3 4
5 6 7
Fra
me
Buf
fer
![Page 19: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/19.jpg)
19
High Performance – Results
Three scenes x { Rasterization, Ray Tracer, Hybrid }
Parallelism is 95+% for all but rasterized fairy (~80%).
Queues are small: < 600KB CPU-like, < 1.5MB GPU-like
Order costs footprint
![Page 20: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/20.jpg)
20
Also: raw counters, statistics, text log of run-time activity
Tunability – Understanding Performance
GRAMPSviz:
![Page 21: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/21.jpg)
21
Tiler Sampler Camera Intersect
ShadeShadowIntersect
Blend
Fra
me
Buf
fer
Execution Graph topology / design:
Sizing critical queues:
Tunability – Lessons Learned
PSRast
Fra
me
Buf
fer
PS + OMRast
Fra
me
Buf
fer
Sort-Middle Sort-Last
OM
![Page 22: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/22.jpg)
22
Summary
After a long era of stability, the Graphics Pipeline is undergoing rapid change.
GRAMPS enables software-defined custom pipelines.
– The Graphics Pipeline becomes an app
– Prototypes show plausible performance, resource needs
– Handles heterogeneous parallelism well
– Applicable beyond rendering and beyond GPUs
![Page 23: GRAMPS: A Programming Model For Graphics Pipelines](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681448f550346895db128df/html5/thumbnails/23.jpg)
23
Thank You
Our funding agencies:
Stanford Pervasive Parallelism Lab
Department of the Army Research
Intel Corporation
Rambus Stanford Graduate Fellowship
Intel PhD Fellowship
NSF Graduate Research Fellowship
http://graphics.stanford.edu/papers/gramps-tog/