A User-Programmable Vertex Engine

Erik Lindholm, Mark Kilgard, Henry Moreton, NVIDIA Corporation. Presented by Han-Wei Shen.


Page 1: A User-Programmable Vertex Engine

A User-Programmable Vertex Engine

Erik Lindholm

Mark Kilgard

Henry Moreton

NVIDIA Corporation

Presented by Han-Wei Shen

Page 2: A User-Programmable Vertex Engine

Where does the Vertex Engine fit?

Traditional Graphics Pipeline

Transform & Lighting -> setup / rasterizer -> texture blending -> frame-buffer anti-aliasing

Page 3: A User-Programmable Vertex Engine

GeForce 3 Vertex Engine

Transform & Lighting or Vertex Program -> setup / rasterizer -> texture blending -> frame-buffer anti-aliasing

Page 4: A User-Programmable Vertex Engine

API Support

• Designed to fit into the OpenGL and D3D APIs

• Program mode vs. fixed-function mode

• Load and bind program

• Simple to add to old D3D and OpenGL programs

Page 5: A User-Programmable Vertex Engine

Programming Model

• Enable vertex program: glEnable(GL_VERTEX_PROGRAM_NV);

• Create vertex program object

• Bind vertex program object

• Execute vertex program object

Page 6: A User-Programmable Vertex Engine

Create Vertex Program

• Programs (assembly) are defined inline as character strings

static const GLubyte vpgm[] =
    "!!VP1.0 "
    "DP4 o[HPOS].x, c[0], v[0]; "
    "DP4 o[HPOS].y, c[1], v[0]; "
    "DP4 o[HPOS].z, c[2], v[0]; "
    "DP4 o[HPOS].w, c[3], v[0]; "
    "MOV o[COL0], v[3]; "
    "END";

Page 7: A User-Programmable Vertex Engine

Create Vertex Program (2)

• Load and bind vertex programs, similar to texture objects

glLoadProgramNV(GL_VERTEX_PROGRAM_NV, 7, strlen(programString), programString);

….

glBindProgramNV(GL_VERTEX_PROGRAM_NV, 7);

Page 8: A User-Programmable Vertex Engine

Invoke Vertex Program

• The vertex program is initiated when a vertex is given, i.e., each time glVertex is called between glBegin and glEnd:

glBegin(…)

glVertex3f(x, y, z)

…

glEnd()

Page 9: A User-Programmable Vertex Engine

Let’s look at the sample program

static const GLubyte vpgm[] =
    "!!VP1.0 "
    "DP4 o[HPOS].x, c[0], v[0]; "
    "DP4 o[HPOS].y, c[1], v[0]; "
    "DP4 o[HPOS].z, c[2], v[0]; "
    "DP4 o[HPOS].w, c[3], v[0]; "
    "MOV o[COL0], v[3]; "
    "END";

o[HPOS] = M(c[0], c[1], c[2], c[3]) * v[0] (HPOS?)

o[COL0] = v[3] (COL0?)

Calculate the clip-space position of the vertex, and assign v[3] to the vertex as its diffuse color.
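To make the four DP4 instructions concrete, here is a minimal CPU-side sketch in C of what the sample program computes. The names vec4, vertex_out, and run_sample_program are hypothetical helpers introduced for illustration only; they are not part of OpenGL or of the vertex program extension.

```c
/* A four-component register ("quad float"). */
typedef struct { float x, y, z, w; } vec4;

/* DP4: four-component dot product of two registers. */
static float dp4(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Hypothetical record for the two outputs the sample program writes. */
typedef struct { vec4 hpos, col0; } vertex_out;

/* Emulates the sample program on the CPU.
   c[0..3] are the rows of the matrix M; v[0] is the position, v[3] the color. */
static vertex_out run_sample_program(const vec4 c[4], const vec4 v[4]) {
    vertex_out o;
    o.hpos.x = dp4(c[0], v[0]);   /* DP4 o[HPOS].x, c[0], v[0]; */
    o.hpos.y = dp4(c[1], v[0]);   /* DP4 o[HPOS].y, c[1], v[0]; */
    o.hpos.z = dp4(c[2], v[0]);   /* DP4 o[HPOS].z, c[2], v[0]; */
    o.hpos.w = dp4(c[3], v[0]);   /* DP4 o[HPOS].w, c[3], v[0]; */
    o.col0 = v[3];                /* MOV o[COL0], v[3];         */
    return o;
}
```

Each DP4 is one row of a 4x4 matrix times the position, so the four instructions together form a full matrix-vector product. For example, if c[0..3] hold a translation by (2, 3, 4), the position (1, 1, 1, 1) comes out as (3, 4, 5, 1).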

Page 10: A User-Programmable Vertex Engine

Programming Model

• Vertex Source: 16x4 registers, v[0] … v[15]

• Program Constants: 96x4 registers, c[0] … c[95]

• Temporary Registers: 12x4 registers, R0 … R11

• Vertex Program: 128 instructions

• Vertex Output: 15x4 registers, o[HPOS], o[COL0], o[COL1], o[FOGP], o[PSIZ], o[TEX0] … o[TEX7]

• All quad floats

Page 11: A User-Programmable Vertex Engine

Input Vertex Attributes

• v[0] – v[15]

• Aliased (tracked) with conventional per-vertex attributes (Table 3)

• Use glVertexAttribNV() to explicitly assign values

• Can also set several vertex attributes at once from an array: glVertexAttribsNV()

• Can change values inside or outside a glBegin()/glEnd() pair

Page 12: A User-Programmable Vertex Engine

Program Constants

• Can only change values outside a glBegin()/glEnd() pair

• No automatic aliasing

• Can be used to track OpenGL matrices (modelview, projection, texture, etc.)

• Example:

glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 0, GL_MODELVIEW_PROJECTION_NV, GL_IDENTITY_NV)

- tracks 4 contiguous program constants starting with c[0]

Page 13: A User-Programmable Vertex Engine

Program Constants (cont’d)

DP4 o[HPOS].x, c[0], v[OPOS];

DP4 o[HPOS].y, c[1], v[OPOS];

DP4 o[HPOS].z, c[2], v[OPOS];

DP4 o[HPOS].w, c[3], v[OPOS];

What does it do?

Page 14: A User-Programmable Vertex Engine

Program Constants (cont’d)

glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 4, GL_MODELVIEW, GL_INVERSE_TRANSPOSE_NV)

DP3 R0.x, c[4], v[NRML];

DP3 R0.y, c[5], v[NRML];

DP3 R0.z, c[6], v[NRML];

What does it do?
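One way to answer the question: with c[4] onward tracking the inverse transpose of the modelview matrix, the three DP3 instructions transform the vertex normal into eye space. The C sketch below emulates that on the CPU; vec4 and transform_normal are hypothetical names for illustration, and the result is not renormalized, just as in the three-instruction fragment.

```c
/* A four-component register ("quad float"). */
typedef struct { float x, y, z, w; } vec4;

/* DP3: three-component dot product; the w components are ignored. */
static float dp3(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Emulates the three DP3 instructions. c points at the constant bank;
   c[4], c[5], c[6] are assumed to hold the first three rows of the
   tracked inverse-transpose modelview matrix. */
static vec4 transform_normal(const vec4 *c, vec4 normal) {
    vec4 r0 = { 0.0f, 0.0f, 0.0f, 0.0f };  /* the fragment never writes R0.w; 0 here is a placeholder */
    r0.x = dp3(c[4], normal);              /* DP3 R0.x, c[4], v[NRML]; */
    r0.y = dp3(c[5], normal);              /* DP3 R0.y, c[5], v[NRML]; */
    r0.z = dp3(c[6], normal);              /* DP3 R0.z, c[6], v[NRML]; */
    return r0;
}
```

DP3 is used rather than DP4 because a normal is a direction: only the upper-left 3x3 part of the matrix applies, and translation must not affect it.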

Page 15: A User-Programmable Vertex Engine

Hardware Block Diagram

Vertex In -> Vertex Attribute Buffer (VAB) -> Vector FP Core -> Vertex Out

Page 16: A User-Programmable Vertex Engine

Vertex Attribute Buffer (VAB)

[Diagram: VAB with dirty bits; 128-bit (32 x 4) data paths; entries labeled 0, 1, …, 14, 15; output to IB]

Page 17: A User-Programmable Vertex Engine

 

HW Block Diagram

[Diagram components: IB 0 … n-1; OB 0 … n-1; SIMD Vector Unit; Special Function Unit; Constant Memory; Instruction Memory; Registers; swizzle/negate (sw/neg) on each source; write mask on each result]

Page 18: A User-Programmable Vertex Engine

Data Path

[Diagram: three source operands (X Y Z W each), each passing through Negate/Swizzle -> FPU Core -> Write Mask -> result (X Y Z W)]
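The source modifiers and the write mask in this data path can be sketched in C as follows. This is an illustrative emulation under stated assumptions, not the hardware design: vec4, read_source, write_masked, and the MASK_* constants are hypothetical names, and negation is assumed to apply to the whole source operand.

```c
/* A four-component register; f[0..3] correspond to x, y, z, w. */
typedef struct { float f[4]; } vec4;

enum { X = 0, Y = 1, Z = 2, W = 3 };
enum { MASK_X = 1 << X, MASK_Y = 1 << Y, MASK_Z = 1 << Z, MASK_W = 1 << W };

/* Source modifier: sel[i] names the source component that feeds lane i
   (the swizzle); negate, if nonzero, flips the sign of every lane. */
static vec4 read_source(vec4 src, const int sel[4], int negate) {
    vec4 out;
    for (int i = 0; i < 4; ++i) {
        float s = src.f[sel[i]];
        out.f[i] = negate ? -s : s;
    }
    return out;
}

/* Write mask: only components whose mask bit is set are overwritten. */
static void write_masked(vec4 *dst, vec4 result, unsigned mask) {
    for (int i = 0; i < 4; ++i) {
        if (mask & (1u << i)) {
            dst->f[i] = result.f[i];
        }
    }
}
```

For instance, reading (1, 2, 3, 4) with swizzle yzxw and negate gives (-2, -3, -1, -4); writing that with mask xz into (9, 9, 9, 9) leaves (-2, 9, -1, 9).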

Page 19: A User-Programmable Vertex Engine

Instruction Set: The ops

• 17 instructions total

• MOV, MUL, ADD, MAD, DST

• DP3, DP4

• MIN, MAX, SLT, SGE

• RCP, RSQ, LOG, EXP, LIT

• ARL

Page 20: A User-Programmable Vertex Engine

Instruction Set: The Core Features

• Immediate access to sources

• Swizzle/negate on all sources

• Write mask on all destinations

• DP3, DP4 most common graphics ops

• Cross product is MUL+MAD with swizzling

• LIT instruction implements Phong lighting

Page 21: A User-Programmable Vertex Engine

Dot Product Instruction

DP3 R0.x, R1, R2

R0.x = R1.x * R2.x + R1.y * R2.y + R1.z * R2.z

DP4 R0.x, R1, R2

4-component dot product

Page 22: A User-Programmable Vertex Engine

MUL instruction

MUL R1, R0, R2 (component-wise mult.)

R1.x = R0.x * R2.x

R1.y = R0.y * R2.y

R1.z = R0.z * R2.z

R1.w = R0.w * R2.w

Page 23: A User-Programmable Vertex Engine

MAD instruction

MAD R1, R2, R3, R4

R1 = R2 * R3 + R4

*: component-wise multiplication

Example:

MAD R1, R0.yzxw, R2.zxyw, -R1

What does it do?

Page 24: A User-Programmable Vertex Engine

Cross Product Coding Example

# Cross product R2 = R0 x R1

MUL R2, R0.zxyw, R1.yzxw;

MAD R2, R0.yzxw, R1.zxyw, -R2;
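To check that the MUL+MAD pair really yields a cross product, the two instructions can be traced with a small C emulation. The helpers swz, neg, mul, mad, and cross_mul_mad are hypothetical names for this sketch; they simply mirror the swizzle, negate, MUL, and MAD semantics described on the earlier slides.

```c
/* A four-component register; f[0..3] correspond to x, y, z, w. */
typedef struct { float f[4]; } vec4;

enum { X = 0, Y = 1, Z = 2, W = 3 };

/* Swizzle: pick source components a, b, c, d for lanes x, y, z, w. */
static vec4 swz(vec4 s, int a, int b, int c, int d) {
    vec4 r = {{ s.f[a], s.f[b], s.f[c], s.f[d] }};
    return r;
}

/* Negate every component of a source operand. */
static vec4 neg(vec4 s) {
    vec4 r = {{ -s.f[0], -s.f[1], -s.f[2], -s.f[3] }};
    return r;
}

/* MUL: component-wise product. */
static vec4 mul(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] * b.f[i];
    return r;
}

/* MAD: component-wise a * b + c. */
static vec4 mad(vec4 a, vec4 b, vec4 c) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.f[i] = a.f[i] * b.f[i] + c.f[i];
    return r;
}

/* R2 = R0 x R1, written exactly as the two-instruction sequence. */
static vec4 cross_mul_mad(vec4 r0, vec4 r1) {
    vec4 r2 = mul(swz(r0, Z, X, Y, W), swz(r1, Y, Z, X, W));       /* MUL R2, R0.zxyw, R1.yzxw;      */
    r2 = mad(swz(r0, Y, Z, X, W), swz(r1, Z, X, Y, W), neg(r2));   /* MAD R2, R0.yzxw, R1.zxyw, -R2; */
    return r2;
}
```

After the MUL, R2.x holds R0.z * R1.y; the MAD then computes R0.y * R1.z minus that value, which is the x component of the cross product, and likewise for y and z. The w component comes out as R0.w * R1.w - R0.w * R1.w = 0.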

Page 25: A User-Programmable Vertex Engine

Lighting instruction

LIT R1, R0 (Phong light model)

Input: R0 = (diffuse, specular, ??, shininess)

Output: R1 = (1, diffuse, specular^shininess, 1)

Usually followed by

DP3 o[COL0], c[21], R1 (assuming the coefficients are in c[21])

where c[21] = (ka, kd, ks, ??)

Page 26: A User-Programmable Vertex Engine

Ready to trace some programs?

Page 27: A User-Programmable Vertex Engine

Previous Work: Geometry Engine

• High bandwidth + lots of Flops

• Low clock rate

• No architectural continuity

• VERY hard to program

• Some high-level language support (maybe)

• A compromise solution (vtx, prim, pix, …)

Page 28: A User-Programmable Vertex Engine

Alternative: The CPU

• Low bandwidth + reasonable Flops

• High clock rate

• Excellent architectural continuity

• VERY hard to use efficiently

• Excellent high-level language support

• Flexible, but often too slow

Page 29: A User-Programmable Vertex Engine

New Design: The Vertex Engine

• Simple hardware for a commodity GPU

• Allows user to manipulate vertex transform

• Simple-to-use programming model

• Superset of fixed-function mode

Page 30: A User-Programmable Vertex Engine

Why Vertex Processing?

• Very parallel

• Use single vertex programming model

• Hardware can batch or interleave

• KISS

Page 31: A User-Programmable Vertex Engine

Why Not Primitive Processing?

• Face culling and clipping break parallelism

• Complicates memory accesses

• Inefficient (control takes time)

• Let hardware designers optimize

Page 32: A User-Programmable Vertex Engine

Programming Model: Vertex I/O

• Streaming vertex architecture

• Source data converted to floats

• Source data loaded

• Run program

• Destination data drained

• Destination data re-formatted for hardware
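The streaming steps listed above can be pictured with a small C loop. This is only a schematic sketch under assumptions: the source format (four unsigned bytes per vertex, normalized to [0, 1]) is one arbitrary choice, the names vec4, vertex_program, ubyte4_to_vec4, stream_vertices, and halve are hypothetical, and the final hardware-specific re-formatting step is left out.

```c
#include <stddef.h>

/* A four-component register ("quad float"). */
typedef struct { float x, y, z, w; } vec4;

/* Hypothetical stand-in for a loaded vertex program: one vertex in, one out. */
typedef vec4 (*vertex_program)(vec4 source);

/* Convert one packed 4 x unsigned byte source value to floats in [0, 1]. */
static vec4 ubyte4_to_vec4(const unsigned char s[4]) {
    vec4 v = { s[0] / 255.0f, s[1] / 255.0f, s[2] / 255.0f, s[3] / 255.0f };
    return v;
}

/* Stream vertices one at a time: convert, load, run, drain. Each vertex is
   processed independently of the others, which is what allows hardware to
   batch or interleave the work. */
static void stream_vertices(const unsigned char (*source)[4], vec4 *dest,
                            size_t count, vertex_program program) {
    for (size_t i = 0; i < count; ++i) {
        vec4 loaded = ubyte4_to_vec4(source[i]);  /* source data converted to floats and loaded */
        dest[i] = program(loaded);                /* run program, then drain the result */
    }
}

/* Example program for the sketch: halves every component. */
static vec4 halve(vec4 v) {
    vec4 r = { v.x * 0.5f, v.y * 0.5f, v.z * 0.5f, v.w * 0.5f };
    return r;
}
```

A caller would pass an array of packed source vertices, an output array of the same length, and a program such as halve.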

Page 33: A User-Programmable Vertex Engine

Hardware Implementation

• Vector SIMD Unit + Special Function Unit

• Multithreaded and pipelined to hide latency

• Any one instruction per cycle

• All instructions have equal latency

• Free swizzling/negate/write mask support

Page 34: A User-Programmable Vertex Engine

Conclusion

• Very simple, efficient implementation

• Allows vertex programming continuity

• Stanford Imagine Architecture

• A work in progress, lots more to come…

• We welcome your feedback