![Page 1: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/1.jpg)
An Introduction to GPU: 3D Games to HPC
Krishnaraj Rao
Presented at Bangalore DV Club, 03/12/2010
![Page 2: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/2.jpg)
Agenda
3D Graphics:
The Big Picture
Quick Overview
Programming Model
Importance of 3D
High Performance Parallel Computing:
Why GPUs for HPPC?
Available APIs
GPU Computing architecture
Q & A
![Page 3: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/3.jpg)
The Big Picture - Movies
Creation
Capture Models Scene API
Rendering Post Processing
Creation
![Page 4: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/4.jpg)
The Big Picture - Games
Creation
Capture Models Scene API
Rendering Post Processing
Creation
Drivers
HLSL, Cg
![Page 5: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/5.jpg)
Models end up in World Space
Y
X
Z
Light Source
Screen
View Point or Camera
World Coordinate Space
World space includes everything! Position and orientation for all items are needed to accurately calculate transformations into screen space.
![Page 6: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/6.jpg)
View Transformation: world ends up on Screen
Screen Coordinate Space
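As a rough illustration of the step from world space to screen space, the sketch below projects a single world-space point onto a pixel grid. It assumes a camera that looks down the negative Z axis with no rotation, so the view transform reduces to a translation. The function name `world_to_screen` and the parameters `camera_pos`, `focal_length`, `width`, and `height` are illustrative assumptions, not taken from the slides.

```python
def world_to_screen(point, camera_pos, focal_length, width, height):
    """Project a world-space point onto a width x height pixel screen.

    Assumes (for illustration) a camera at camera_pos looking down -Z with
    no rotation, so the view transform is a pure translation.
    """
    # View transform: express the point relative to the camera.
    x = point[0] - camera_pos[0]
    y = point[1] - camera_pos[1]
    z = point[2] - camera_pos[2]
    if z >= 0:
        return None  # point is behind (or level with) the camera
    # Perspective divide: farther points move toward the screen centre.
    ndc_x = focal_length * x / -z
    ndc_y = focal_length * y / -z
    # Viewport transform: map [-1, 1] to pixel coordinates (Y grows downward).
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    return (screen_x, screen_y)


centre = world_to_screen((0.0, 0.0, -5.0), (0.0, 0.0, 0.0), 1.0, 640, 480)
corner = world_to_screen((5.0, 5.0, -5.0), (0.0, 0.0, 0.0), 1.0, 640, 480)
behind = world_to_screen((0.0, 0.0, 5.0), (0.0, 0.0, 0.0), 1.0, 640, 480)
```

A point straight ahead of the camera lands in the middle of the screen, and a point behind the camera is rejected. A real application would also apply the camera rotation and clip against the view volume.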
![Page 7: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/7.jpg)
Simple Interactive 3D Graphics App
A simple example: static scene geometry, moving viewer
Repeat this loop:
CPU takes user input from joystick or mouse
CPU re-calculates viewer position, view direction, and light positions in 3-D world space
GPU clears memory and draws the complete scene geometry with the new viewer and light positions
Repeat forever
Vertex Engine
Setup Raster
Z Cull
Fragment Engine
Texture
Raster Ops
Read Joystick Position
Update Viewer Position and Light Direction
Draw all Scene Objects
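The loop on this slide can be sketched in a few lines. This is a minimal model only: the joystick is a list of pre-recorded `(turn, forward)` samples, the "GPU" is an ordinary function that returns a list of drawn objects, and the light positions are left static. All names (`update_viewer`, `draw_scene`, `run`) are hypothetical.

```python
import math


def update_viewer(viewer, joystick, dt):
    """CPU step: move the viewer according to joystick input."""
    turn, forward = joystick
    heading = viewer["heading"] + turn * dt
    return {
        "x": viewer["x"] + math.cos(heading) * forward * dt,
        "z": viewer["z"] + math.sin(heading) * forward * dt,
        "heading": heading,
    }


def draw_scene(scene, viewer):
    """Stand-in for the GPU step: clear, then draw every scene object."""
    frame = []  # "clearing memory" is modelled as starting from an empty frame
    for name in scene:
        frame.append((name, round(viewer["x"], 3), round(viewer["z"], 3)))
    return frame


def run(scene, inputs, dt=1.0):
    viewer = {"x": 0.0, "z": 0.0, "heading": 0.0}
    frames = []
    for joystick in inputs:  # a real application would loop forever
        viewer = update_viewer(viewer, joystick, dt)  # CPU work
        frames.append(draw_scene(scene, viewer))      # GPU work
    return viewer, frames


final_viewer, frames = run(["floor", "teapot"], [(0.0, 1.0), (0.0, 2.0)])
```

The scene list never changes between frames. Only the viewer state does, which is why the whole scene is redrawn each time.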
![Page 8: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/8.jpg)
Adding Programmability to the Graphics Pipeline
3D Application or Game
3D API: OpenGL or Direct3D
Programmable Vertex Processor
Primitive Assembly
Rasterization & Interpolation
3D API Commands
Transformed Vertices
Assembled Polygons, Lines, and Points
GPU Command & Data Stream
Programmable Fragment Processor
Rasterized Pre-transformed Fragments
Transformed Fragments
Raster Operations
Framebuffer
Pixel Updates
GPU Front End
Pre-transformed Vertices
Vertex Index Stream
Pixel Location Stream
CPU - GPU Boundary
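To make the stages named on this slide concrete, here is a toy software version of the pipeline. The two programmable stages are passed in as ordinary functions, and the fixed stages (primitive assembly, rasterization and interpolation, raster operations) are hard-coded. It is a sketch under simplifying assumptions: vertices are already in pixel coordinates as `(x, y, z)`, only triangles are handled, and coverage is tested at every pixel centre. The function and parameter names are hypothetical.

```python
def run_pipeline(vertices, indices, vertex_shader, fragment_shader, width, height):
    """Toy pipeline: vertex processing, primitive assembly, rasterization,
    fragment processing, and raster operations into a framebuffer."""
    # Programmable vertex processor: pre-transformed -> transformed vertices.
    transformed = [vertex_shader(v) for v in vertices]
    # Primitive assembly: the vertex index stream groups vertices into triangles.
    triangles = [
        tuple(transformed[i] for i in indices[k:k + 3])
        for k in range(0, len(indices), 3)
    ]
    framebuffer = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]

    def edge(a, b, px, py):
        # Signed area test: which side of edge a->b the point lies on.
        return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

    for a, b, c in triangles:
        area = edge(a, b, c[0], c[1])
        if area == 0:
            continue  # degenerate triangle covers no pixels
        # Rasterization: visit each pixel centre and test coverage.
        for y in range(height):
            for x in range(width):
                px, py = x + 0.5, y + 0.5
                w0 = edge(b, c, px, py) / area
                w1 = edge(c, a, px, py) / area
                w2 = edge(a, b, px, py) / area
                if w0 < 0 or w1 < 0 or w2 < 0:
                    continue
                # Interpolation: depth from the barycentric weights.
                z = w0 * a[2] + w1 * b[2] + w2 * c[2]
                # Programmable fragment processor: compute the colour.
                colour = fragment_shader(x, y, z)
                # Raster operations: depth test, then framebuffer update.
                if z < depth[y][x]:
                    depth[y][x] = z
                    framebuffer[y][x] = colour
    return framebuffer


def identity_vertex_shader(vertex):
    return vertex  # a real shader would apply world/view/projection transforms


def flat_fragment_shader(x, y, z):
    return "R"


framebuffer = run_pipeline(
    [(0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 4.0, 1.0)],
    [0, 1, 2], identity_vertex_shader, flat_fragment_shader, 4, 4)
covered = sum(pixel is not None for row in framebuffer for pixel in row)
```

Swapping in a different `vertex_shader` or `fragment_shader` changes the result without touching the fixed stages, which is the point of a programmable pipeline.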
![Page 9: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/9.jpg)
A History of Innovation
1995: NV1, 1 Million Transistors
1999: GeForce 256, 22 Million Transistors
2002: GeForce4, 63 Million Transistors
2003: GeForce FX, 130 Million Transistors
2004: GeForce 6, 222 Million Transistors
2005: GeForce 7, 302 Million Transistors
2006-2007: GeForce 8, 754 Million Transistors
2008: GeForce GTX 200, 1.4 Billion Transistors
But what do all these extra transistors do?
![Page 10: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/10.jpg)
GPU continues to offload CPU work
[Figure: the pipeline stages Scene Mgmt, Physics and AI, Geom Gather, Geom Proc, Triangle Proc, Pixel Proc, and Z / Blend, divided between CPU and GPU, shown for 1996, 2000, 2004, and 2008]
![Page 11: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/11.jpg)
Programming Model
API: Set of functions, procedures or classes that an OS, library or service provides to support requests made by computer programs
DirectX: Collection of APIs for handling multimedia tasks, especially game programming and video, on Microsoft platforms.
OpenGL (Open Graphics Library) is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.
![Page 12: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/12.jpg)
Why is 3D Graphics important?
More than just Fun and Games...
Tokyo, Japan; California Coastline
![Page 13: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/13.jpg)
3D Consumer Applications
Music
Vista
Photos Maps
PDFs Office
![Page 14: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/14.jpg)
GPUS IN HPC
![Page 15: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/15.jpg)
[Figure: workload map with two axes, Instruction Level Parallelism to Massive Data Parallelism, and Data Fits in Cache to Huge Data Sets]
![Page 16: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/16.jpg)
GPU Processing Power
GPU: NVIDIA GTX 285, 240 cores, 1.04 TFLOPS
CPU: Intel Core i7, 4 cores, 102 GFLOPS
CPU, meet your new partner!
![Page 17: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/17.jpg)
Beyond Graphics
With floating-point math and textures, graphics processors can be used for more than just graphics
GPGPU = "General Purpose Computing on GPUs"
Lots of ongoing research mapping algorithms and problems onto programmable GPUs
Solving Linear Equations
Black-Scholes Options Pricing
Rigid- and Soft-Body Dynamics
Middleware layers being developed to accelerate "eye candy" game physics on GPUs (HavokFX)
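Black-Scholes options pricing, one of the problems listed on this slide, maps well onto a GPU because every option is priced independently of the others. The sketch below uses the standard closed-form price of a European call. The list comprehension stands in for what would be one GPU thread per option, and the sample inputs are made up for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, years, rate, volatility):
    """Price of a European call under the Black-Scholes model."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / (
        volatility * math.sqrt(years)
    )
    d2 = d1 - volatility * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Each option is priced independently, so on a GPU each element of this
# list could be handled by its own thread.
options = [(100.0, 100.0, 1.0, 0.05, 0.2), (100.0, 110.0, 1.0, 0.05, 0.2)]
prices = [black_scholes_call(*option) for option in options]
```

No option reads another option's result, so the work scales with the number of processing elements available.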
![Page 18: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/18.jpg)
What is GPGPU?
General Purpose computation using GPU in applications other than 3D graphics
GPU accelerates critical path of application
Data parallel algorithms leverage GPU attributes
Large data arrays, streaming throughput
Fine-grain SIMD parallelism
Floating point (FP) computation
Great for "embarrassingly parallel" algorithms
Applications - see //GPGPU.org
Game effects (FX) physics, image processing
Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting
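The data-parallel style described on this slide can be modelled as a small kernel function that is applied once per array element. The sketch below uses SAXPY (`a*x + y`) as the example. The `launch` helper is a hypothetical stand-in for a GPU launch and simply loops over the indices in order, so it shows the programming style rather than any real speedup.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one 'thread': a single element of a*x + y."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Run the kernel once per index. A GPU would run these in parallel;
    this sequential loop only models the programming style."""
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, result)
```

Each call writes only to its own `out[i]`, so the calls could run in any order, or all at once.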
![Page 19: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/19.jpg)
Why Computation on the GPU?
A quiet buildup of potential
Calculation Throughput and Memory Bandwidth: 10X
Equivalent performance at fraction of power & cost
GPU in every PC - pervasive presence and massive impact
GPUs have always been parallel "multi-core"
Natively designed to handle massive threading
Every pixel is a thread
Increased precision (fp32), programmability, flexibility
GPUs are a mass-market parallel processor
Economies of scale
Peak floating point performance is much higher than comparable CPUs
ATI x1900XT - $400 (video card) - 250 GFLOPS (SP Float) - 46 GB/s main memory BW
Intel Core 2 Duo E6600 - $400 (processor only) - 40 GFLOPS (SP Float) - 8.5 GB/s main memory BW
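The comparison on this slide reduces to a few ratios. The short calculation below uses only the figures quoted above (price, single-precision GFLOPS, and main memory bandwidth). The dictionary layout and variable names are illustrative.

```python
gpu = {"name": "ATI x1900XT", "price_usd": 400, "gflops": 250, "bandwidth": 46}
cpu = {"name": "Intel Core 2 Duo E6600", "price_usd": 400, "gflops": 40, "bandwidth": 8.5}

# Ratio of peak single-precision throughput at the same price point.
flops_ratio = gpu["gflops"] / cpu["gflops"]
# Ratio of main memory bandwidth.
bandwidth_ratio = gpu["bandwidth"] / cpu["bandwidth"]
# Peak GFLOPS bought per dollar for each part.
gpu_gflops_per_dollar = gpu["gflops"] / gpu["price_usd"]
cpu_gflops_per_dollar = cpu["gflops"] / cpu["price_usd"]
```

On these quoted peak numbers the GPU offers roughly six times the floating-point throughput and a little over five times the memory bandwidth for the same price. Peak figures say nothing about how much of that a given application can actually use.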
![Page 20: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/20.jpg)
Why Computation on the GPU?
Supercomputing Performance
Inherently Parallel Architecture
1000+ cores, massively parallel processing
250x the compute performance of a PC
Personal
"One Researcher, One Supercomputer"
Supercomputer in a desktop system
Plugs into standard power strip
Accessible
Program in C, C++, Fortran for Windows or Linux
Available from OEMs and resellers worldwide and priced like a workstation
![Page 21: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/21.jpg)
Compute Applications
Computational Fluid Dynamics
Computer Aided Engineering
Digital Content Creation
Electronic Design Automation
Finance
Game Physics
Graphics
Imaging and Computer Vision
Medical Imaging
Numerics
Bio-Informatics and Life Sciences
Computational Chemistry
Computational Electromagnetics & Electrodynamics
Data Mining, Analytics & Databases
MATLAB Acceleration
Molecular Dynamics
Weather, Atmospheric, Ocean Modeling, and Space Sciences
Libraries
Oil & Gas
Programming Tools
Ray Tracing
Signal Processing
Video & Audio
![Page 22: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/22.jpg)
Heterogeneous Computing
Multi-Core CPU
Parallel-Core GPU
![Page 23: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/23.jpg)
APIS FOR HETEROGENEOUS COMPUTING
![Page 24: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/24.jpg)
APIs for Heterogeneous Computing
CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. Both low-level and high-level APIs are provided.
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.
Microsoft DirectCompute is an API that supports general-purpose computing on GPUs on Microsoft Windows Vista or Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs.
![Page 26: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/26.jpg)
OpenCL: Platform Model & Program Structure
One Host + one or more Compute Devices
Each Compute Device is composed of one or more Compute Units
Each Compute Unit is further divided into one or more Processing Elements
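The host / device / unit / element hierarchy of the platform model can be pictured as nested data. The sketch below is plain Python rather than the OpenCL API: the class names mirror the terms on the slide, while the device names and the counts (30 units of 8 elements, 4 units of 1 element) are made-up example values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeUnit:
    processing_elements: int  # number of Processing Elements in this unit


@dataclass
class ComputeDevice:
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        return sum(cu.processing_elements for cu in self.compute_units)


@dataclass
class Host:
    devices: List[ComputeDevice] = field(default_factory=list)


# Hypothetical example: one host driving a GPU-like and a CPU-like device.
example_gpu = ComputeDevice("example-gpu", [ComputeUnit(8) for _ in range(30)])
example_cpu = ComputeDevice("example-cpu", [ComputeUnit(1) for _ in range(4)])
host = Host([example_gpu, example_cpu])
totals = {d.name: d.total_processing_elements() for d in host.devices}
```

The same structure describes very different hardware, which is how one model can cover CPUs, GPUs, and other processors.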
![Page 27: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/27.jpg)
CUDA Parallel Computing Architecture
ISA and hardware compute engine
Includes a C-compiler plus support for OpenCL and DX11 Compute
Architected to natively support all computational interfaces (standard languages and APIs)
![Page 28: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/28.jpg)
OpenCL and C for CUDA
C for CUDA: entry point for developers who prefer high-level C
OpenCL: entry point for developers who want low-level API
Shared back-end compiler and optimization technology
PTX
GPU
Option 1
![Page 29: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/29.jpg)
CUDA Success: Science & Computation
Not 2x or 3x, but speedups are 20x to 150x
146X: Medical Imaging, U of Utah
36X: Molecular Dynamics, U of Illinois, Urbana
18X: Video Transcoding, Elemental Tech
50X: Matlab Computing, AccelerEyes
100X: Astrophysics, RIKEN
149X: Financial simulation, Oxford
47X: Linear Algebra, Universidad Jaime
20X: 3D Ultrasound, Techniscan
130X: Quantum Chemistry, U of Illinois, Urbana
30X: Gene Sequencing, U of Maryland
![Page 30: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/30.jpg)
Performance vs. Accessibility
Supercomputing Cluster: $100K - $1M
Tesla Personal Supercomputer: < $10K
Today's Workstations: 1x
250x Faster
100x more affordable
20x less power consumption
![Page 31: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/31.jpg)
Solving the World's Most Complex Challenges
Oil & Gas
Science
Medicine
Broadcast
Space Exploration
Film
Auto Design
![Page 32: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/32.jpg)
Grand Computing Challenges
Renewable Energy
Personalized Medicine
Mathematics for Scientific Discovery
Information Data Mining
Machines That Think
Natural Human Machine Interaction
Predict Environmental Changes
Economic Analysis
![Page 33: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/33.jpg)
Final Thoughts
GPU and heterogeneous parallel architecture will revolutionize computing
Parallel computing needed to solve some of the most interesting and important human challenges ahead
Learning parallel programming is imperative for students in computing and sciences
![Page 34: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/34.jpg)
From Virtua Fighter to Tsubame
1995 - NV1 2008 - GT200
0.8M transistors 1,200M transistors
50MHz 1.3GHz
1M Bytes 4G Bytes
0 GFLOPS 1 TFLOPS
Another 1000x in 15 years?
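The question on this slide can be restated as a yearly growth rate. The short calculation below takes only the two numbers quoted, a 1000x improvement over 15 years, and derives the implied per-year factor and doubling time. The variable names are illustrative.

```python
import math

factor = 1000.0  # overall improvement quoted on the slide
years = 15.0     # period over which it happened

# Constant yearly multiplier that compounds to `factor` over `years`.
annual_growth = factor ** (1.0 / years)
# Time needed to double at that constant rate.
doubling_time = years * math.log(2.0) / math.log(factor)
```

A 1000x gain in 15 years corresponds to growing by a factor of about 1.58 every year, or doubling about every 1.5 years.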
![Page 35: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/35.jpg)
BACKUP
![Page 36: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/36.jpg)
Graphics API History
![Page 37: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/37.jpg)
OpenGL
1992: OpenGL 1.0
1996: OpenGL 1.1 (Vertex Arrays, Improved Texturing)
1998: OpenGL 1.2 (3D Textures, BGRA pixel format)
1998: OpenGL 1.2.1 (Multi-Texture)
2001: OpenGL 1.3 (Multi-sample AA, Cube/Compressed Textures)
2002: OpenGL 1.4 (Depth/Shadow mapping, Auto mipmap generation)
2003: OpenGL 1.5 (Vertex Attr from Vid Mem)
2004: OpenGL 2.0 (GLSL, Vertex/Pixel Shaders, MRT, Non P-of-2 Tex)
2006: OpenGL 2.1 (GLSL1.2, sRGB Textures)
2008: OpenGL 3.0 (GLSL1.3, 32b FP Textures)
2009: OpenGL 3.1 (March 2009, GLSL1.4, Perf, CopyBufferAPI)
2009: OpenGL 3.2 (Aug 2009, GLSL1.5, Geom Shaders)
![Page 38: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/38.jpg)
OpenGL ES
Designed for hand-held and embedded devices
Goal is smaller footprint to support OpenGL
PlayStation 3 and cell phone industry adopting ES
OpenGL ES 1.1
Strips out anything deemed extra in OpenGL
Keeps conventional fixed-function vertex and fragment processing
OpenGL ES 2.0
Adds programmable vertex and fragment shaders
Shaders specified in binary format
Drops support for fixed-function vertex and fragment processing
![Page 39: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/39.jpg)
OpenGL ES - Cont.
OpenGL ES 1.0 : Symbian OS, Android Platform
OpenGL ES 1.0+ : PlayStation 3
OpenGL ES 1.1 : iPhone SDK, BlackBerry (some models)
OpenGL ES 2.0 : iPhone 3GS, iPod touch
![Page 40: 3 d to _hpc](https://reader031.fdocuments.net/reader031/viewer/2022020306/554a0d76b4c9055c598b47c2/html5/thumbnails/40.jpg)
DirectX
GDI: legacy Windows graphics API ~1985
DirectX 1.0 - 1995/6 (No 3D support, DirectDraw, DirectSound, DirectInput)
DirectX 3.0 - 1996 (Rasterization-only 3D support, awkward prog. model, not successful)
DirectX 5.0 - 1997 (Draw Primitives, DirectX vs OpenGL War)
DirectX 6.0 - 1998 (Multitexture, OGL/Glide features, Texture Compression)
DirectX 7.0 - 1999 (Geometry HW acceleration and Blending, Cube mapping)
DirectX 8.0 - 2000/1 (Programmable VS/PS Shaders, Xbox)
DirectX 9.0 - 2002-2003 (More programmability, Branching, FP pixel prog.)
DirectX 9.0c - 2004 (Shader Model 3.0)
DirectX 10.0 - 2006 (SM4.0, WinVista, Geometry Shaders, Streaming Output)
DirectX 10.1 - 2008 (SM4.1, Better Image Quality)
DirectX 11.0 - 2009 (SM5.0, DirectCompute, Tessellation, WinVista SP2, Win7)