GPU Acceleration in Registration

Post on 02-Jan-2016

52 views 3 download

Tags:

description

GPU Acceleration in Registration. Danny Ruijters 26 April 2007. Outline. The GPU Rigid 3D-3D Registration Elastic Registration Conclusions. The GPU. The graphics card. Raserization of primitives Texture mapping Colour interpolation. The GPU. Graphics Processing Unit - PowerPoint PPT Presentation

Transcript of GPU Acceleration in Registration

GPU Acceleration in Registration

Danny Ruijters26 April 2007

GPU Acceleration in Registration, Danny Ruijters 2

Outline

• The GPU• Rigid 3D-3D Registration• Elastic Registration• Conclusions

GPU Acceleration in Registration, Danny Ruijters 3

The GPU

GPU Acceleration in Registration, Danny Ruijters 4

The graphics card

• Raserization of primitives• Texture mapping• Colour interpolation

GPU Acceleration in Registration, Danny Ruijters 5

The GPU

• Graphics Processing Unit• Programmable processor in the

graphics rendering pipeline• Parallel execution (SIMD like)

GPU Acceleration in Registration, Danny Ruijters 6

on-chip cache memoryvideo memory

system memory

rasterization

CPU

vertex shading

(T&L)

triangle setup

fragment shading

andraster

operations

textures

frame buffer

geometry

commands

pre-TnL cache

post-TnL cache

texture cache

Graphics rendering pipeline

GPU Acceleration in Registration, Danny Ruijters 7

Bottleneckson-chip cache memoryvideo memory

system memory

rasterization

CPU

vertex shading

(T&L)

triangle setup

fragment shading

andraster

operations

textures

frame buffer

geometry

commands

pre-TnL cache

post-TnL cache

texture cache

transform limited

fragment shader limited

CPU limited

texture limited

frame buffer limited

setup limited

raster limited

transfer limited

GPU Acceleration in Registration, Danny Ruijters 8

128 processing units

Local cache

Shared memory

GPU Acceleration in Registration, Danny Ruijters 9

Performance

GPU Acceleration in Registration, Danny Ruijters 10

Performance• Parallelism & pipelining (up to 16 parallel pipelines)• Vector processor• Moore’s Law: CPU: 2* performance per 18 months• GPU: 2* performance per 6 months

GeForce 7900 GTX GeForce 8800 GTX

Code name G71 G80

Release date 3 / 2006 11 / 2006

Transistors 278 M (90 nm) 681 M (90 nm)

Clock speed 650 MHz 1350 MHz

Processing units 24+8 (pixel + vertex) 128 (unified)

Peak pixel fill rate 10.4 Gigapixels/s 36.8 Gigapixels/s

Peak memory bandwidth

51.2 GB/s (256 bit) 86.4 GB/s (384 bit)

Memory 512 MB 768 MB

Peak performance 250 Gigaflops 520 Gigaflops

GPU Acceleration in Registration, Danny Ruijters 11

Textures & buffers

• 1D, 2D, 3D textures

• 2D output buffers (frame buffer, accumulation buffer, stencil buffer, p-buffer)

• 8, 10, 12, 16 bit integers, 16, 32 bit floating point

• 1 (intensity), 2 (luminance-alpha), 3 (RGB), 4 (RGBA) components per pixel

GPU Acceleration in Registration, Danny Ruijters 12

Historic overview GPU• RenderMan (1988, pre-history)• Intel MMX (SIMD, 1997, pre-history)• Register combiners (nVidia, 1999, bronze age)• Vender specific APIs (2001, iron age)• Generic assembly-like language (2002, middle-

ages) • Different high-level languages (2003, industrial

age)• CUDA: general purpose C-like language (2007,

modern age)

GPU Acceleration in Registration, Danny Ruijters 13

Register combiners (1999, bronse age)// Stage 0// spare0.rgb = gradient dot ViewDir, spare1.rgb = -(gradient dot ViewDir)glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_A_NV,GL_TEXT

URE0_ARB,GL_EXPAND_NORMAL_NV,GL_RGB);glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_B_NV,GL_CONS

TANT_COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB);glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_C_NV,GL_TEXT

URE0_ARB,GL_EXPAND_NEGATE_NV,GL_RGB);glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_D_NV,GL_CONS

TANT_COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB);glCombinerOutputNV(GL_COMBINER0_NV,GL_RGB,GL_SPARE0_NV,GL_SPARE1_

NV,GL_DISCARD_NV,GL_NONE,GL_NONE,GL_TRUE,GL_TRUE,GL_FALSE);

GPU Acceleration in Registration, Danny Ruijters 14

GL_ARB_fragment_program (2002)

!!ARBfp1.0

ATTRIB coord = fragment.texcoord[0];ATTRIB color = fragment.color;OUTPUT out = result.color;TEMP texel;TEMP lookup;

TEX texel, coord, texture[0], 3D;TEX lookup, texel, texture[1], 1D;

MUL out, lookup, color;END

GPU Acceleration in Registration, Danny Ruijters 15

GLSlang (2003)

uniform vec3 ViewDir;

void main (void){

float value;vec3 gradient;gradient = texture3(0, gl_TexCoord0) * 2.0 - 1.0;value = 1.0 - abs(dot(gradient, ViewDir));value *= 1.3 * dot(gradient, gradient);value = clamp(value, 0.0, 1.0);gl_FragColor = vec4(value);

}

GPU Acceleration in Registration, Danny Ruijters 16

CUDA (2007)

• Compute Unified Device Architecture• General purpose C-like language• nVidia only• Very recently released

GPU Acceleration in Registration, Danny Ruijters 17

Rigid 3D-3D Registration

GPU Acceleration in Registration, Danny Ruijters 18

3DRA – MR registration

GPU Acceleration in Registration, Danny Ruijters 19

3DRA – XperCT Registration 1

Pre-operative

GPU Acceleration in Registration, Danny Ruijters 20

3DRA – XperCT Registration 2

Post-operative:verification of the embolization

GPU Acceleration in Registration, Danny Ruijters 21

3DRA Slice

GPU Acceleration in Registration, Danny Ruijters 22

Mutual information

F. Maes et al., "Multimodality Image Registration by Maximization of Mutual Information,“IEEE Transactions on Medical Imaging 16(2), pp. 187-198, April 1997

GPU Acceleration in Registration, Danny Ruijters 23

Joint histogram

GPU Acceleration in Registration, Danny Ruijters 24

Resampling

Joint histogram: increment(g,g)

GPU Acceleration in Registration, Danny Ruijters 25

3DRA – MR, before, after

GPU Acceleration in Registration, Danny Ruijters 26

3DRA – MR: CPU interpolation

GPU Acceleration in Registration, Danny Ruijters 27

3DRA – MR: GPU interpolation

GPU Acceleration in Registration, Danny Ruijters 28

Elastic Registration

GPU Acceleration in Registration, Danny Ruijters 29

Elastic deformation

• Parameterized deformation:

• B-spline deformation:

GPU Acceleration in Registration, Danny Ruijters 30

Cubic B-spline

GPU Acceleration in Registration, Danny Ruijters 31

GPU linear interpolation

• Hardwired: linear interpolation is much faster than separate lookups

GPU Acceleration in Registration, Danny Ruijters 32

GPU Cubic Interpolation

• Compose cubic interpolation from weighted sum of linear interpolations:

=

C. Sigg, M. Hadwiger, “Fast Third-Order Texture Filtering”, GPU Gems 2

GPU Acceleration in Registration, Danny Ruijters 33

Outline of proof

=

GPU Acceleration in Registration, Danny Ruijters 34

GPU Cubic Interpolation

• 2D: 4 linear-interpolated lookups, instead of 16 direct lookups

• 3D: 8 linear-interpolated lookups, instead of 64 direct lookups

GPU Acceleration in Registration, Danny Ruijters 35

GPU Linear Interpolation AccuracynVidia QuadroFX 3500

-1

0

1

2

3

4

5

6

7

8

9

10

1 13 25 37 49 61 73 85 97 109 121 133 145 157 169 181 193 205 217 229 241 253

Err

or

* -1

0^-8

GPU Acceleration in Registration, Danny Ruijters 36

Linear deformation, linear interpolation

GPU Acceleration in Registration, Danny Ruijters 37

Linear deformation, cubic interpolation

GPU Acceleration in Registration, Danny Ruijters 38

Cubic deformation, linear interpolation

GPU Acceleration in Registration, Danny Ruijters 39

Cubic deformation, cubic interpolation

GPU Acceleration in Registration, Danny Ruijters 40

Optimization

• Many parameters: huge parameter space

• Solution: use derivatives like Jacobian, Hessian

• Examples: Gradient Descent, Quasi-Newton, Levenberg-Marquardt

GPU Acceleration in Registration, Danny Ruijters 41

GPU Elastic Registration Iteration

1. Generate deformed image on GPU & store to texture

2. Calculate Similarity Measure & First-Order Derivative on GPU

– Texture with reference image– Texture with deformed image

GPU Acceleration in Registration, Danny Ruijters 42

First-Order Derivative of Sim. Measure

J. Kybic, M. Unser, “Fast Parametric Elastic Image Registration”

GPU Acceleration in Registration, Danny Ruijters 43

Derivative of the Similarity Measure

SSD:

GPU Acceleration in Registration, Danny Ruijters 44

Derivative of the Deformed Image

• Sobel operator to calculate gradients:

-1 0 1

-4 0 4

-1 0 1

1 4 1

0 0 0

-1 -4 -1

GPU Acceleration in Registration, Danny Ruijters 45

Derivative of the Control Points

• Constant• B-spline: separatable kernel of fixed size

GPU Acceleration in Registration, Danny Ruijters 46

Original Fluoroscopy Sequence

GPU Acceleration in Registration, Danny Ruijters 47

2 * 2 Control Points

GPU Acceleration in Registration, Danny Ruijters 48

8 * 8 Control Points

GPU Acceleration in Registration, Danny Ruijters 49

Deformation Field

GPU Acceleration in Registration, Danny Ruijters 50

GPU Elastic Registration

• 40 images: Quasi Newton: 16 seconds

• Gradient Descent: 63 seconds• 8 * 8 Control Points: rest motion• Multi-resolution deformation field,

with reduced parameters (discussed with Dirk Loeckx)

GPU Acceleration in Registration, Danny Ruijters 51

CUDA Libraries

GPU Acceleration in Registration, Danny Ruijters 52

CUDA Software Stack

GPU Acceleration in Registration, Danny Ruijters 53

CUDA Libraries

• CUBLAS• CUFFT

GPU Acceleration in Registration, Danny Ruijters 54

CUBLAS

• Basic Linear Algebra Subprograms• Vector, Matrix, Numerical Math• Almost no initialization• Function calls

GPU Acceleration in Registration, Danny Ruijters 55

CUBLAS performanceexecution times scalar vector add dual-core Woodcrest and G80 core

0

50

100

150

200

250

300

350

400

450

500

0 500 1000 1500 2000 2500 3000 3500 4000 4500

data s ize vector (kB)

exec

utio

n tim

e (m

s)

G80 (ms)

Woodcrest (ms)

GPU Acceleration in Registration, Danny Ruijters 56

CUBLAS performanceexecution times vector inproduct dual-core Woodcrest and G80 core

0.0000

50.0000

100.0000

150.0000

200.0000

250.0000

300.0000

350.0000

400.0000

450.0000

500.0000

0 500 1000 1500 2000 2500 3000 3500 4000 4500

data s ize vector (kB)

exec

utio

n tim

e (m

s)

G80 (ms)

Woodcrest (ms)

GPU Acceleration in Registration, Danny Ruijters 57

CUFFT performanceexecution times 2D FFT single-core Woodcrest and G80 core

(size 2^n)

0.001

0.01

0.1

1

10

100

1000

10000

1 10 100 1000 10000

N point 2D FFT

Ex

ec

uti

on

tim

e (

ms

)

G80 (CudaFFT)

Woodcrest (FFTW)

GPU Acceleration in Registration, Danny Ruijters 58

Conclusion & Future work

GPU Acceleration in Registration, Danny Ruijters 59

Conclusions

• GPU: powerful parallel processor, but has its limitations

• Rigid Registration: interpolation on the GPU

• Elastic Registration: calculation of the Similarity Measure & first order derivative on the GPU

GPU Acceleration in Registration, Danny Ruijters 60

Future work

• Multi-resolution deformation fields• 2D-3D registration of the Coronary

Arteries (not presented)

GPU Acceleration in Registration, Danny Ruijters 61

Questions?