Programming with CUDA and Parallel Algorithms

Page 1: Programming with CUDA and Parallel Algorithms

Programming with CUDA, WS09

Waqar Saleem, Jens Müller

Programming with CUDA and Parallel Algorithms

Waqar Saleem

Jens Müller

Page 2: Programming with CUDA and Parallel Algorithms

Organization

• People

• Waqar Saleem, [email protected]

• Jens Mueller, [email protected]

• Room 3335, Ernst-Abbe-Platz 2

• The course will be conducted in English

• 6 points

• Wahl/Wahlpflicht

• Theoretical/Practical

Page 3: Programming with CUDA and Parallel Algorithms

Organization

• Meetings, before winter break

• Tue 12-14, CZ 129

• Thu 16-18, CZ 129

• Every second week

• Starting next week

• Exercises: Wed 8-10, CZ 125

• Starting tomorrow in the pool

Page 4: Programming with CUDA and Parallel Algorithms

The course

• 2 parts

• Before winter break: Lectures and assignments

• Need at least 50% in assignments to qualify for ...

• After the break: Group projects

• Project chosen by or assigned to each group

• Regular meetings

• Presentation of each project at the end of the semester

Page 5: Programming with CUDA and Parallel Algorithms

Assignments

• Build up a minimal ray tracer on the GPU

• Implement basic ray tracer on CPU

• Port to GPU

• Make ray tracer more interesting/efficient

• Utilize CUDA concepts

• Basic framework will be provided

• Scene format and scenes

• Introduction to ray tracing concepts

Page 6: Programming with CUDA and Parallel Algorithms

Requirements

•Strong background in C programming

•Familiarity with your OS

•Modifying default settings

•Writing/understanding Makefiles

•Compiler flags and options

Page 7: Programming with CUDA and Parallel Algorithms

Course content

•Parallel programming models and platforms

•GPGPU

•GPGPU on NVIDIA cards: CUDA

•Architecture and programming model

•OpenCL

Page 8: Programming with CUDA and Parallel Algorithms

Today

•Organization

•Brief introduction to parallel programming and CUDA

•Short introduction to ray tracing

Page 9: Programming with CUDA and Parallel Algorithms

Growth of Compute Capability

•Moore’s law: the number of transistors that can be placed ... on an integrated circuit [doubles] approximately every two years

source: Wikipedia

Page 10: Programming with CUDA and Parallel Algorithms

Growth of Compute Capability

•Moore’s law

source: Wikipedia

Page 11: Programming with CUDA and Parallel Algorithms

Need for increasing compute capability

•Problems are getting more complex

•e.g. from text editing to image editing to video editing

•Current hardware complexity is never enough

•Impractical to stop development at current state of the art

Page 12: Programming with CUDA and Parallel Algorithms

Barriers to growth

•Natural limit on transistor size: the size of an atom

•More transistors per unit area lead to higher power consumption and heat dissipation

Page 13: Programming with CUDA and Parallel Algorithms

Solution: Parallel architectures

Page 14: Programming with CUDA and Parallel Algorithms

Parallel architectures

•Multiple Instructions Multiple Data (MIMD)

•multi-threaded, multi-core architectures, clusters, grids

•Single Instruction Multiple Data (SIMD)

•Cell processor, GPUs, clusters, grids

•GPU: Graphics Processing Unit

•Parallel programming means writing programs that exploit parallel architectures

Page 15: Programming with CUDA and Parallel Algorithms

GPU architecture

•Simpler architecture than MIMD

•Little overhead for instruction scheduling, branch prediction, etc.

Subsequent figures are from the NVIDIA CUDA Programming Guide 2.3.1 unless mentioned otherwise

Page 16: Programming with CUDA and Parallel Algorithms

GPU architecture

•Simpler architecture leads to higher performance (compared to CPUs)

Page 17: Programming with CUDA and Parallel Algorithms

General Purpose computing on GPU, GPGPU

•Attractive because of raw GPU power

•Traditionally hard because GPU programming was closely tied to graphics

•Simplicity of GPU architecture limits the kind of problems suitable for GPGPU

•or at least requires some problems to be reformulated
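A common illustration of such a reformulation (an example chosen here, not taken from the slides) is summing an array. The serial loop forms a chain in which every addition waits for the previous one. A pairwise, tree-shaped sum produces the same total in about log2(n) passes, and within a pass every addition is independent. The sketch below is plain C: the inner loop only stands in for work that a GPU could hand to separate threads, and the function names are hypothetical.

```c
#include <stddef.h>

/* Serial sum: each addition depends on the result of the previous one. */
float sum_serial(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Pairwise (tree) sum, computed in place; overwrites x.
 * Within one pass of the outer loop the additions touch disjoint
 * elements, so they do not depend on each other. */
float sum_tree(float *x, size_t n)
{
    if (n == 0)
        return 0.0f;
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* each iteration of this inner loop could be a separate thread */
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            x[i] += x[i + stride];
    }
    return x[0];
}
```

Both functions return the same value for inputs whose sums are exactly representable; for general floating-point data the results can differ slightly because the additions happen in a different order.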

Page 18: Programming with CUDA and Parallel Algorithms

GPGPU for the masses*

•Freeing the GPU from graphics: Nvidia CUDA, ATI Stream

•C-like programming interface to the GPU

•* - knowledge of underlying architecture required to achieve peak performance

Page 19: Programming with CUDA and Parallel Algorithms

Freeing Parallel Programming

•OpenCL: code once, run anywhere

•single core, multi core, GPU, ...

•platform details transparent to the user

•supported by major vendors: Apple, Intel, AMD, Nvidia, ...

•OpenCL drivers made available by ATI and Nvidia for their cards

Page 20: Programming with CUDA and Parallel Algorithms

This course

•chiefly CUDA: Nvidia-specific, mature, well documented, literature easily available

•some OpenCL: open standard, very new, limited documentation available, very similar concepts to CUDA

•no ATI Stream

Page 21: Programming with CUDA and Parallel Algorithms

CUDA, Compute Unified Device Architecture

•Software: C-like programming interface to the GPU

•Hardware: the hardware that supports the above programming model

Page 22: Programming with CUDA and Parallel Algorithms

CUDA hardware model

Page 23: Programming with CUDA and Parallel Algorithms

CUDA programming model

•CPU=host, GPU=device, work unit=thread
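To make "work unit = thread" concrete, here is a minimal sketch in plain C rather than CUDA. It assumes the usual one-dimensional arrangement in which threads are grouped into blocks and each thread derives the element it owns from its block and thread numbers. The parameters block_idx, block_dim and thread_idx are hypothetical stand-ins for the values a CUDA thread reads from blockIdx.x, blockDim.x and threadIdx.x, and the two loops in vec_add only emulate, one step at a time, what the device would run in parallel.

```c
#include <stddef.h>

/* Work of ONE thread: add a single pair of elements.
 * block_idx, block_dim, thread_idx are hypothetical stand-ins for
 * CUDA's blockIdx.x, blockDim.x and threadIdx.x. */
static void add_one(const float *a, const float *b, float *c, size_t n,
                    size_t block_idx, size_t block_dim, size_t thread_idx)
{
    size_t i = block_idx * block_dim + thread_idx; /* global element index */
    if (i < n)               /* the last block may extend past the array */
        c[i] = a[i] + b[i];
}

/* Host-side stand-in for launching the work on the device.
 * Assumes block_dim > 0. The loops visit every (block, thread) pair
 * sequentially; a GPU would execute these threads in parallel. */
void vec_add(const float *a, const float *b, float *c, size_t n,
             size_t block_dim)
{
    size_t num_blocks = (n + block_dim - 1) / block_dim; /* round up */
    for (size_t blk = 0; blk < num_blocks; ++blk)
        for (size_t t = 0; t < block_dim; ++t)
            add_one(a, b, c, n, blk, block_dim, t);
}
```

Calling vec_add(a, b, c, 10, 4) covers ten elements with three blocks of four threads; the bounds check keeps the two surplus threads of the last block from writing outside the array.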

Page 24: Programming with CUDA and Parallel Algorithms

Page 25: Programming with CUDA and Parallel Algorithms

Ray tracing

•A method to render a given scene

•Cast rays from a camera into the scene

•Compute ray intersections with scene geometry

•Render pixel

image source: Wikipedia
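The three steps above can be sketched in a few lines of C. Everything specific here is an assumption made for illustration and is not the course framework: the scene is a single sphere, the camera sits at the origin looking down the negative z axis through an image plane at z = -1, and the "image" is a character buffer. The names Vec3, ray_hits_sphere and render are hypothetical.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  sub(Vec3 a, Vec3 b) { Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }

/* Does the ray origin + t*dir, t > 0, hit the sphere? dir must be non-zero.
 * Substituting the ray into |p - center|^2 = radius^2 gives the quadratic
 * a*t^2 + b*t + c = 0; a real root means the line meets the sphere. */
int ray_hits_sphere(Vec3 origin, Vec3 dir, Vec3 center, float radius)
{
    Vec3  oc   = sub(origin, center);
    float a    = dot(dir, dir);
    float b    = 2.0f * dot(oc, dir);
    float c    = dot(oc, oc) - radius * radius;
    float disc = b * b - 4.0f * a * c;
    if (disc < 0.0f)
        return 0;                       /* the line misses the sphere */
    /* c < 0: origin is inside the sphere, so one root is positive.
     * Otherwise both roots share a sign, and they are positive only
     * if b < 0, i.e. the sphere lies in front of the origin. */
    return c < 0.0f || b < 0.0f;
}

/* Cast one view ray per pixel and record hit ('#') or miss ('.').
 * img must hold width * height characters. */
void render(char *img, int width, int height, Vec3 center, float radius)
{
    Vec3 eye = { 0.0f, 0.0f, 0.0f };
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            /* pixel centre mapped to [-1, 1]; y flipped so row 0 is on top */
            Vec3 dir = {
                2.0f * (px + 0.5f) / width - 1.0f,
                1.0f - 2.0f * (py + 0.5f) / height,
                -1.0f
            };
            img[py * width + px] =
                ray_hits_sphere(eye, dir, center, radius) ? '#' : '.';
        }
    }
}
```

The hit test deliberately stops at yes/no. A fuller tracer would also compute the nearest intersection distance (which needs a square root) so that it can find the hit point and shade it.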

Page 26: Programming with CUDA and Parallel Algorithms

Ray tracer complexity

•A ray tracer can be arbitrarily complex

•Recursively compute intersections for reflected, refracted and shadow rays

•Account for diffuse lighting

•Consider multiple light sources

•Consider light sources other than point lights

•Account for textures: object materials
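As one example of these extensions, diffuse lighting with several light sources is commonly modelled with Lambert's cosine law: each light contributes its intensity times the cosine of the angle between the surface normal and the direction to the light, and lights behind the surface contribute nothing. The sketch below assumes that all vectors are already normalised and that the direction to each light has been computed for the surface point in question; the Light struct and the function name are hypothetical.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

typedef struct {
    Vec3  dir;        /* unit vector from the surface point toward the light */
    float intensity;  /* scalar brightness of the light */
} Light;

/* Lambertian diffuse term summed over all lights.
 * normal must be a unit vector. */
float diffuse(Vec3 normal, const Light *lights, size_t num_lights)
{
    float total = 0.0f;
    for (size_t i = 0; i < num_lights; ++i) {
        /* cosine of the angle between normal and light direction */
        float cos_theta = normal.x * lights[i].dir.x
                        + normal.y * lights[i].dir.y
                        + normal.z * lights[i].dir.z;
        if (cos_theta > 0.0f)           /* light is on the visible side */
            total += lights[i].intensity * cos_theta;
    }
    return total;
}
```

A light directly above the surface contributes its full intensity, a grazing light contributes nothing, and a light behind the surface is ignored. Shadow rays would additionally skip lights that are blocked by other geometry.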

Page 27: Programming with CUDA and Parallel Algorithms

Coding a ray tracer

•Relatively easy to code on the CPU

•Call the same intersection function recursively on secondary rays

•CPU code is not so complex

•Tricky to code on the GPU as recursion is not yet supported in GPGPU models
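One way around the missing recursion, shown here for reflection only, is to turn the recursive call into a loop that carries an attenuation weight. The sketch below replaces the scene with a hypothetical table: bounce k of a ray hits a surface with local colour local[k] and reflectivity refl[k]. This is an assumption made to keep the example short; a real tracer would compute both values from an intersection test at every bounce.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scene: the surfaces one ray meets,
 * in order, as it bounces through the scene. */
typedef struct {
    const float *local;    /* local colour at bounce k */
    const float *refl;     /* reflectivity at bounce k, in [0, 1] */
    size_t       bounces;  /* number of surfaces hit before the ray escapes */
} Path;

/* CPU style: colour = local colour + reflectivity * colour of reflected ray. */
float shade_recursive(const Path *p, size_t k)
{
    if (k >= p->bounces)
        return 0.0f;                     /* ray left the scene */
    return p->local[k] + p->refl[k] * shade_recursive(p, k + 1);
}

/* Recursion-free version: weight holds the product of the reflectivities
 * of all surfaces hit so far, so each local colour is scaled correctly. */
float shade_iterative(const Path *p)
{
    float colour = 0.0f;
    float weight = 1.0f;
    for (size_t k = 0; k < p->bounces; ++k) {
        colour += weight * p->local[k];
        weight *= p->refl[k];
    }
    return colour;
}
```

The loop works because each hit spawns only one secondary ray. When a hit spawns both a reflected and a refracted ray, the call tree branches, and a recursion-free version needs an explicit stack or queue of pending rays instead of a single weight.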

Page 28: Programming with CUDA and Parallel Algorithms

This course

•Build a trivial ray tracer on the CPU

•compute view rays only

•part of tomorrow’s exercise

•Port to GPU

•Add complexity to your GPU ray tracer

Page 29: Programming with CUDA and Parallel Algorithms

Reminders

•Exercise session tomorrow

•Register on CAJ

Page 30: Programming with CUDA and Parallel Algorithms

See you next time!