Graphics Processing Unit Architecture (GPU Arch)data0003/Talks/gpuarch.pdf · 04/14/05 Ajit Datar,...

04/14/05 Ajit Datar, Apurva Padhye Computer Architecture

Graphics Processing Unit Architecture (GPU Arch)With a focus on NVIDIA GeForce

6800 GPU

What is a GPU• From Wikipedia : A specialized processor

efficient at manipulating and displaying computer graphics

• 2D primitive support – bit block transfers• Some might have video support • And of course 3D support (a topic at the heart of

this presentation)• GPUs are optimized for raster graphics

The Graphics pipeline

Modern graphics pipeline (left) (ref: http://graphics.stanford.edu/courses/cs448a-01-fall/lectures/lecture2/walk010.html)

OpenGL 3D pipeline (right) (ref: http://www.vorlesungen.uos.de/informatik/ifc99-00/opengl/images/pipeline.gif)

3D graphics software interfaces

• Low level• Specification not an API• Crossplatform implementations• Popular with some games• A simple seq of opengl instr (in C)

glClearColor(0.0,0.0,0.0,0.0);glClear(GL_COLOR_BUFFER_BIT);glColor3f(1.0,1.0,1.0);glOrtho(0.0,1.0,0.0,1.0,-1.0,1.0);glBegin(GL_POLYGON);

glVertex(0.25,0.25,0.0);glVertex(0.75,0.25,0.0);glVertex(0.75,0.75,0.0);glVertex(0.25,0.75,0.0);

glEnd();

OpenGL (v2.0 as of now)

3D graphics software interfaces

• High level• 3D API – part of DirectX• Very popular in the gaming industry• Microsoft platforms only

Direct 3D (v9.0c as of now)

NVIDIA GeForce 6800

• Impressive performance stats 600 Million vertices/s 6.4 billion texels/s 12.8 billion pixels/s rendering z/stencil only 64 pixels per clock cycle early z-cull (reject rate)

• Riva series (1st DirectX compatible)– Riva 128, Riva TNT, Riva TNT2

• GeForce Series– GeForce 256, GeForce 3 (DirectX 8), GeForce FX, GeForce 6 series

General info

NVIDIA GeForce 6800Block Diagram

• Allow shader to be applied to each vertex

• Transformation and other per vertex ops

• Allow vertex shader to fetch texture data (6 series only)

NVIDIA GeForce 6800Vertex Processor (or vertex shader)

• Cull/clip – per primitive operation and data preparation for rasterization

• Rasterization: primitive to pixel mapping

• Z culling : quick pixel elimination based on depth

NVIDIA GeForce 6800Clipping, Z Culling and Rasterization

• Fragment : a candidate pixel

• Varying number of pixel pipelines

• Operates on quads – for texture LOD

• SIMD processing hides texture fetch latency

• Texture caches

NVIDIA GeForce 6800Fragment processor and Texel pipeline

• Texture unit can apply filters.• Shader units can perform 8

math ops (w/o texture load) or 4 math ops (with texture load) in a clock

• Fog calculation done in the end

• Pixels almost ready for framebuffer

NVIDIA GeForce 6800Fragment processor and Texel pipeline

• Depth testing• Stencil tests• Alpha operations• Render final color to

target buffer

NVIDIA GeForce 6800Z compare and blend

NVIDIA GeForce 6800

• Vertex stream frequency– hardware support for looping over a subset of

vertices• Example: rendering the same object

multiple times at diff locations (grass, soldiers, people in stadium)

Features – Geometry Instancing

NVIDIA GeForce 6800

• Early culling and clipping;– cull nonvisible primitives at high rate

• Rasterization– supports Point Sprite, Aliased and anti-aliasing and triangles, etc

• Z-Cull– Allows high-speed removal of hidden surfaces

• Occlusion Query– Keeps a record of the number of fragments passing or failing the

depth test and reports it to the CPU

Features - continued

NVIDIA GeForce 6800

• Texturing– Extended support for non power of two textures to match support

for power of two textures - Mipmapping, Wrapping and clamping, Cube map and 3D textures.

• Shadow Buffer Support– Fetches shadow buffer as a projective texture and performs z-

compares of the shadow buffer data to distance from light.

Features Continued

NVIDIA GeForce 6800

• Increased instruction count (upto 65535 instructions.)

• Fragment processor; multiple render targets.• Dynamic flow control branching• Vertex texturing• More temporary registers.

Features – Shader Support

NVIDIA GeForce 6800

• Co-issue:– Each four-component-wide vector unit is

capable of executing two independent instructions in parallel

– More scalar computations done in less time.

• Dual issue: – two independent instructions can be

executed on different parts of the shader pipeline

– Makes scheduling easy and more efficient.

Features – Co-issue and Dual Issue

GPGPU• Look at GPU as a fast SIMD processor• It is a specialized processor, so not all

programs can be run• Example computational programs – FFT,

Cryptography, Ray Tracing, Segmentation and even sound processing!

GPU from comp arch perspective

• Focus on Floating point math• fp32 and fp16 precision support for intermediate

calculations• 6 four-wide fp32 vector MADs/clock in shaders and 1

scalar multifunction op• 16 four-wide fp32 vector MADs/clock in frag-proc plus 16

four-wide fp32 MULs• Dedicated fp16 normalization hardware

Processing units

• Use dedicated but standard memory architectures (eg DRAM)

• Multiple small independent memory partitions for improved latency

• Memory used to store buffers and optionally textures• In low-end system (Intel 855GM) system memory is

shared as the Graphics memory

Memory

• GPU interfaces with the CPU using fast buses like AGP and PCI Express

• Port speeds– PCI express upto 8GB/sec ( 4 + 4 )

Practically upto ( 3.2 + 3.2 )– AGP upto 2 GB/sec (for 8x AGP)

• Such bus speeds are important because textures and vertex data needs to come from CPU to GPU (after that it's the internal GPU bandwidth that matters)

System Interface

• Texture caches (2 level)– Shared between vertex procs and fragment procs– Cache processed/filtered textures

• Vertex caches– cache processed and unprocessed vertexes– improve computation and fetch performance

• Z and buffer cache and write queues

Caches

• http://download.nvidia.com/downloads/nZone/videos/nvidia/nalu.wmv

References• Nvidia 6800 chapter from GPU Gems 2

http://download.nvidia.com/developer/GPU_Gems_2/GPU_Gems2_ch30.pdf

• OpenGL designhttp://graphics.stanford.edu/courses/cs448a-01-fall/design_opengl.pdf

• OpenGL programming guide (ISBN: 0201604582)• Real time graphics architectures lecture notes

http://graphics.stanford.edu/courses/cs448a-01-fall/

• GeForce 256 overviewhttp://www.nvnews.net/reviews/geforce_256/gpu_overview.shtml

• NVIDIA websitehttp://nvidia.com

So long and thanks for all the fish

(Oh yeah ... any questions?)

Graphics Processing Unit Architecture (GPU Arch)data0003/Talks/gpuarch.pdf · 04/14/05 Ajit Datar,...

Documents

Transcript of Graphics Processing Unit Architecture (GPU Arch)data0003/Talks/gpuarch.pdf · 04/14/05 Ajit Datar,...

Efficient Rendering with DirectX 12 on Intel Graphics · –Code-named “Haswell”, Gen 7.5 GPU architecture –Intel HD Graphics 4400/4600/5000 –Intel Iris Graphics 5100, Iris

Modern GPU Architecture

AMD GPU Architecture - GPGPU€¦ · 2 | AMD GPU Architecture | September 13th, 2009 Overview •AMD GPU architecture •How OpenCL maps on GPU and CPU •How to …

CS 380 - GPU and GPGPU Programming Lecture 1: Introduction · 2 Lecture Overview Goals • Learn GPU architecture and programming; both for graphics and for computing (GPGPU) •

Graphics Processing Unit GPU

Graphics Processing Unit (GPU) Acceleration of Machine ...flightsoftware.jhuapl.edu/files/FSW09_Tweddle.pdf · Graphics Processing Unit (GPU) Acceleration of Machine Vision Software

GPU Architecture: Implications & Trendss08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf · GPU Architecture: Implications & Trends ... GPU architecture increasingly centers

ARM Mali Architecture - Microsoft · ARM Mali GPU Architecture Sam Martin ARM Game Developer Day - London Graphics Architect, ARM 03/12/2015

GPU-REMuSiC on Graphics Processing Units

Graphics Processing Unit (GPU) Architecture and Programming

The Compute Architecture of Intel® Processor Graphics Gen7parallel.vub.ac.be/education/gpu/doc/Compute_Architecture_of_Intel... · The Compute Architecture of Intel® Processor Graphics

Lecture 5: GPU Architecture & CUDA Programming15418.courses.cs.cmu.edu/.../lectures/05_gpuarch/05_gpuarch_slide… · NVIDIA Tesla architecture (2007) Provided alternative, non-graphics-speci!c

Real-Time Graphics Architecturegraphics.stanford.edu/courses/cs448-07-spring/lectures/gpu/gpu.pdf · Real-Time Graphics Architecture Lecture 9: Programming GPUs Kurt Akeley CS448

The Graphics Processing Unit (GPU) revolutioncourses.cs.vt.edu/cs4414/S13/LECTURES/gpu-lecture1.cs... · 2013-02-19 · The Graphics Processing Unit (GPU) revolution Ramu Anandakrishnan.

Graphics Hardware - GAMMAgamma.cs.unc.edu/graphicscourse/11_GPU.pdf · Graphics Hardware ! What is a GPU ! Evolution of GPU ! GPU Design ! Modern Features ! ... than drawing directly

GPU Architectures - University of Washingtoncourses.cs.washington.edu/courses/cse471/13sp/lectures/GPUs... · Architecture Multicore CPUs x86 ... AMD Graphics Core Next GPU ARCHITECTURES:

Global Graphics Processing Unit (GPU) Market

GPU Architecture Overview - GitHub Pagescis565-fall-2012.github.io/.../09-10-GPU-Architecture-Overview.pdf · GPU Architecture Overview ... AMD. 2 CPU and GPU Trends FLOPS – FL

Gpu Architecture

Graphics and Game Engines GPU overviewusers.iit.uni-miskolc.hu/~mileff/graphics/GraphicsChapter3.pdf · GPU Overview ⦿ Graphics Processing Unit (GPU) is the central unit of your