GPU Acceleration in Registration

GPU Acceleration in Registration Danny Ruijters 26 April 2007

Transcript of GPU Acceleration in Registration

Page 1: GPU Acceleration in Registration

GPU Acceleration in Registration

Danny Ruijters, 26 April 2007

Page 2: GPU Acceleration in Registration

GPU Acceleration in Registration, Danny Ruijters 2

Outline

• The GPU
• Rigid 3D-3D Registration
• Elastic Registration
• Conclusions

Page 3: GPU Acceleration in Registration

The GPU

Page 4: GPU Acceleration in Registration

The graphics card

• Rasterization of primitives
• Texture mapping
• Colour interpolation

Page 5: GPU Acceleration in Registration

The GPU

• Graphics Processing Unit
• Programmable processor in the graphics rendering pipeline
• Parallel execution (SIMD-like)

Page 6: GPU Acceleration in Registration

Graphics rendering pipeline

[Figure: pipeline diagram. Stages: CPU, vertex shading (T&L), triangle setup, rasterization, fragment shading and raster operations. Memory levels: system memory, video memory, on-chip cache memory. Data: commands, geometry, textures, frame buffer. Caches: pre-TnL cache, post-TnL cache, texture cache.]

Page 7: GPU Acceleration in Registration

Bottlenecks

[Figure: the graphics rendering pipeline diagram of the previous slide, annotated with the possible bottlenecks: CPU limited, transfer limited, transform limited, setup limited, raster limited, fragment shader limited, texture limited, frame buffer limited.]

Page 8: GPU Acceleration in Registration

128 processing units

Local cache

Shared memory

Page 9: GPU Acceleration in Registration

Performance

Page 10: GPU Acceleration in Registration

Performance

• Parallelism & pipelining (up to 16 parallel pipelines)
• Vector processor
• Moore’s Law: CPU: 2× performance per 18 months
• GPU: 2× performance per 6 months

                        GeForce 7900 GTX         GeForce 8800 GTX
Code name               G71                      G80
Release date            3 / 2006                 11 / 2006
Transistors             278 M (90 nm)            681 M (90 nm)
Clock speed             650 MHz                  1350 MHz
Processing units        24+8 (pixel + vertex)    128 (unified)
Peak pixel fill rate    10.4 Gigapixels/s        36.8 Gigapixels/s
Peak memory bandwidth   51.2 GB/s (256 bit)      86.4 GB/s (384 bit)
Memory                  512 MB                   768 MB
Peak performance        250 Gigaflops            520 Gigaflops
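
The doubling rates quoted on this slide can be turned into growth factors with a few lines of arithmetic. The sketch below is only an illustration: the function name growth_factor and the three-year horizon are assumptions, and the last line simply divides the two peak-performance figures from the table.

```python
def growth_factor(months, doubling_period_months):
    """Performance multiplier after `months` at a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)

# Three years at the two doubling rates quoted on the slide.
cpu_3_years = growth_factor(36, 18)
gpu_3_years = growth_factor(36, 6)

# Ratio of the two peak-performance figures in the table
# (250 Gigaflops in 3/2006, 520 Gigaflops in 11/2006, 8 months apart).
table_ratio = 520.0 / 250.0
```

At these rates a CPU would gain a factor 4 in three years and a GPU a factor 64; the two cards in the table differ by a factor of about 2.08 in peak performance.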

Page 11: GPU Acceleration in Registration

Textures & buffers

• 1D, 2D, 3D textures

• 2D output buffers (frame buffer, accumulation buffer, stencil buffer, p-buffer)

• 8, 10, 12, 16 bit integers, 16, 32 bit floating point

• 1 (intensity), 2 (luminance-alpha), 3 (RGB), 4 (RGBA) components per pixel

Page 12: GPU Acceleration in Registration

Historic overview GPU

• RenderMan (1988, pre-history)
• Intel MMX (SIMD, 1997, pre-history)
• Register combiners (nVidia, 1999, bronze age)
• Vendor-specific APIs (2001, iron age)
• Generic assembly-like language (2002, middle ages)
• Different high-level languages (2003, industrial age)
• CUDA: general-purpose C-like language (2007, modern age)

Page 13: GPU Acceleration in Registration

Register combiners (1999, bronze age)

// Stage 0
// spare0.rgb = gradient dot ViewDir, spare1.rgb = -(gradient dot ViewDir)
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV, GL_TEXTURE0_ARB, GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV, GL_CONSTANT_COLOR1_NV, GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_C_NV, GL_TEXTURE0_ARB, GL_EXPAND_NEGATE_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_D_NV, GL_CONSTANT_COLOR1_NV, GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB, GL_SPARE0_NV, GL_SPARE1_NV, GL_DISCARD_NV, GL_NONE, GL_NONE, GL_TRUE, GL_TRUE, GL_FALSE);

Page 14: GPU Acceleration in Registration

GL_ARB_fragment_program (2002)

!!ARBfp1.0

ATTRIB coord = fragment.texcoord[0];
ATTRIB color = fragment.color;
OUTPUT out = result.color;
TEMP texel;
TEMP lookup;

TEX texel, coord, texture[0], 3D;
TEX lookup, texel, texture[1], 1D;

MUL out, lookup, color;
END

Page 15: GPU Acceleration in Registration

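GLSlang (2003)

uniform vec3 ViewDir;
// Sampler name is an assumption; it reads the 3D gradient texture on texture unit 0
uniform sampler3D GradientTexture;

void main(void)
{
    float value;
    vec3 gradient;
    // Map the stored gradient from [0, 1] back to [-1, 1]
    gradient = texture3D(GradientTexture, gl_TexCoord[0].xyz).rgb * 2.0 - 1.0;
    // Largest where the gradient is perpendicular to the view direction
    value = 1.0 - abs(dot(gradient, ViewDir));
    // Scale by the squared gradient magnitude
    value *= 1.3 * dot(gradient, gradient);
    value = clamp(value, 0.0, 1.0);
    gl_FragColor = vec4(value);
}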

Page 16: GPU Acceleration in Registration

CUDA (2007)

• Compute Unified Device Architecture
• General purpose C-like language
• nVidia only
• Very recently released

Page 17: GPU Acceleration in Registration

Rigid 3D-3D Registration

Page 18: GPU Acceleration in Registration

3DRA – MR registration

Page 19: GPU Acceleration in Registration

3DRA – XperCT Registration 1

Pre-operative

Page 20: GPU Acceleration in Registration

3DRA – XperCT Registration 2

Post-operative: verification of the embolization

Page 21: GPU Acceleration in Registration

3DRA Slice

Page 22: GPU Acceleration in Registration

Mutual information

F. Maes et al., “Multimodality Image Registration by Maximization of Mutual Information”, IEEE Transactions on Medical Imaging 16(2), pp. 187-198, April 1997

Page 23: GPU Acceleration in Registration

Joint histogram

Page 24: GPU Acceleration in Registration

Resampling

Joint histogram: increment(g,g)
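
The joint histogram update and the mutual information derived from it can be sketched as follows. This is a minimal CPU-side illustration, not the implementation behind the slides: the function names, the tiny 4-bin test signals and the use of natural logarithms are all assumptions, and the resampling step is taken as already done.

```python
import math

def joint_histogram(ref, flt, bins):
    """Count co-occurring grey values; ref and flt hold ints in [0, bins)."""
    hist = [[0] * bins for _ in range(bins)]
    for g_ref, g_flt in zip(ref, flt):
        hist[g_ref][g_flt] += 1  # one increment per resampled voxel pair
    return hist

def mutual_information(hist):
    """I(A,B) = sum over (a,b) of p(a,b) * log(p(a,b) / (p(a) * p(b))), in nats."""
    total = sum(sum(row) for row in hist)
    p_ref = [sum(row) / total for row in hist]        # marginal of the reference
    p_flt = [sum(col) / total for col in zip(*hist)]  # marginal of the floating image
    mi = 0.0
    for a, row in enumerate(hist):
        for b, count in enumerate(row):
            if count:
                p_ab = count / total
                mi += p_ab * math.log(p_ab / (p_ref[a] * p_flt[b]))
    return mi

# Hypothetical test data with 4 grey levels.
reference = [0, 0, 1, 1, 2, 2, 3, 3]
aligned = [3, 3, 2, 2, 1, 1, 0, 0]      # different grey mapping, same structure
misaligned = [0, 1, 0, 1, 0, 1, 0, 1]   # statistically unrelated to the reference

mi_aligned = mutual_information(joint_histogram(reference, aligned, 4))
mi_misaligned = mutual_information(joint_histogram(reference, misaligned, 4))
```

The aligned pair scores higher than the misaligned pair even though its grey values are inverted, which is the property that makes mutual information suitable for multimodality registration.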

Page 25: GPU Acceleration in Registration

3DRA – MR, before, after

Page 26: GPU Acceleration in Registration

3DRA – MR: CPU interpolation

Page 27: GPU Acceleration in Registration

3DRA – MR: GPU interpolation

Page 28: GPU Acceleration in Registration

Elastic Registration

Page 29: GPU Acceleration in Registration

Elastic deformation

• Parameterized deformation:

• B-spline deformation:

Page 30: GPU Acceleration in Registration

Cubic B-spline
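
As an illustration of a B-spline deformation, the sketch below evaluates the standard cubic B-spline basis function and uses it to displace a 1D position. The deformation formulas of the slides are not reproduced in this transcript, so the form x + sum_j c_j * beta3(x / spacing - j), the names bspline3 and deform, and the control point values are assumptions.

```python
def bspline3(t):
    """Cubic B-spline basis function, non-zero only for |t| < 2."""
    t = abs(t)
    if t < 1.0:
        return 2.0 / 3.0 - t * t + 0.5 * t ** 3
    if t < 2.0:
        return (2.0 - t) ** 3 / 6.0
    return 0.0

def deform(x, coeffs, spacing):
    """Displaced position: x + sum_j coeffs[j] * bspline3(x / spacing - j)."""
    return x + sum(c * bspline3(x / spacing - j) for j, c in enumerate(coeffs))

# The four basis functions overlapping any position sum to one.
weights_sum = sum(bspline3(0.3 - j) for j in range(-1, 3))

# Hypothetical grid: 8 control points, 8 pixels apart, all with displacement 1.5.
# Away from the border, equal coefficients give a pure translation.
shifted = deform(20.0, [1.5] * 8, 8.0)
```

Because each position is influenced by only four control points per dimension, changing one coefficient deforms the image locally.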

Page 31: GPU Acceleration in Registration

GPU linear interpolation

• Hardwired: linear interpolation is much faster than separate lookups

Page 32: GPU Acceleration in Registration

GPU Cubic Interpolation

• Compose cubic interpolation from weighted sum of linear interpolations:

C. Sigg, M. Hadwiger, “Fast Third-Order Texture Filtering”, GPU Gems 2

Page 33: GPU Acceleration in Registration

Outline of proof

Page 34: GPU Acceleration in Registration

GPU Cubic Interpolation

• 2D: 4 linear-interpolated lookups, instead of 16 direct lookups

• 3D: 8 linear-interpolated lookups, instead of 64 direct lookups
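
The reduction in lookups can be checked in 1D, where four direct lookups collapse into two linear-interpolated ones. The sketch below follows the construction described by Sigg and Hadwiger; linear_lookup is a software stand-in for the hardwired texture filter, and all function names and sample values are assumptions.

```python
import math

def bspline_weights(a):
    """Cubic B-spline weights of the four samples around fraction a in [0, 1)."""
    w0 = (1.0 - a) ** 3 / 6.0
    w1 = (3.0 * a ** 3 - 6.0 * a ** 2 + 4.0) / 6.0
    w2 = (-3.0 * a ** 3 + 3.0 * a ** 2 + 3.0 * a + 1.0) / 6.0
    w3 = a ** 3 / 6.0
    return w0, w1, w2, w3

def linear_lookup(samples, pos):
    """Stand-in for one hardware linear-interpolated texture lookup."""
    i = int(math.floor(pos))
    frac = pos - i
    return (1.0 - frac) * samples[i] + frac * samples[i + 1]

def cubic_direct(samples, x):
    """Reference result: four direct lookups weighted by the B-spline weights."""
    i = int(math.floor(x))
    weights = bspline_weights(x - i)
    return sum(w * samples[i - 1 + k] for k, w in enumerate(weights))

def cubic_from_linear(samples, x):
    """Same value from two linear lookups at shifted positions."""
    i = int(math.floor(x))
    w0, w1, w2, w3 = bspline_weights(x - i)
    g0, g1 = w0 + w1, w2 + w3
    # Each lookup position is chosen so that the linear filter reproduces
    # the ratio of the two weights it replaces.
    return (g0 * linear_lookup(samples, i - 1 + w1 / g0)
            + g1 * linear_lookup(samples, i + 1 + w3 / g1))

samples = [0.0, 1.0, 4.0, 2.0, 7.0, 3.0]   # hypothetical 1D texture
direct = cubic_direct(samples, 2.4)
fast = cubic_from_linear(samples, 2.4)
```

Applying the same split along each axis gives 2 * 2 = 4 lookups in 2D and 2 * 2 * 2 = 8 in 3D, matching the counts on this slide.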

Page 35: GPU Acceleration in Registration

GPU Linear Interpolation Accuracy

[Plot: nVidia QuadroFX 3500. Vertical axis: error × 10^-8, from -1 to 10; horizontal axis: 1 to 253.]

Page 36: GPU Acceleration in Registration

Linear deformation, linear interpolation

Page 37: GPU Acceleration in Registration

Linear deformation, cubic interpolation

Page 38: GPU Acceleration in Registration

Cubic deformation, linear interpolation

Page 39: GPU Acceleration in Registration

Cubic deformation, cubic interpolation

Page 40: GPU Acceleration in Registration

Optimization

• Many parameters: huge parameter space

• Solution: use derivatives like Jacobian, Hessian

• Examples: Gradient Descent, Quasi-Newton, Levenberg-Marquardt
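
Of the optimizers listed, gradient descent is the simplest to write down: it only needs the first-order derivative of the cost function. The sketch below uses a hypothetical two-parameter quadratic cost in place of a real similarity measure; the step size and iteration count are likewise assumptions.

```python
def gradient_descent(gradient, params, step=0.1, iterations=200):
    """Repeatedly move the parameters against the gradient of the cost."""
    params = list(params)
    for _ in range(iterations):
        grad = gradient(params)
        params = [p - step * g for p, g in zip(params, grad)]
    return params

def cost_gradient(p):
    """Gradient of the hypothetical cost E(p) = (p0 - 3)^2 + 2 * (p1 + 1)^2."""
    return [2.0 * (p[0] - 3.0), 4.0 * (p[1] + 1.0)]

solution = gradient_descent(cost_gradient, [0.0, 0.0])
```

Quasi-Newton and Levenberg-Marquardt methods additionally build or approximate second-order information, which typically reduces the number of iterations.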

Page 41: GPU Acceleration in Registration

GPU Elastic Registration Iteration

1. Generate deformed image on GPU & store to texture

2. Calculate Similarity Measure & First-Order Derivative on GPU

– Texture with reference image
– Texture with deformed image

Page 42: GPU Acceleration in Registration

First-Order Derivative of Sim. Measure

J. Kybic, M. Unser, “Fast Parametric Elastic Image Registration”

Page 43: GPU Acceleration in Registration

Derivative of the Similarity Measure

SSD:
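
The SSD formula itself did not survive in this transcript. As a sketch, assuming the common definition as the mean of squared differences over N pixels, the measure and its derivative with respect to each pixel of the deformed image can be written as below; the function names and the test values are hypothetical.

```python
def ssd(reference, deformed):
    """Mean of squared differences between corresponding pixels."""
    n = len(reference)
    return sum((d - r) ** 2 for r, d in zip(reference, deformed)) / n

def ssd_derivative(reference, deformed):
    """dE / d(deformed[i]) = 2 * (deformed[i] - reference[i]) / N."""
    n = len(reference)
    return [2.0 * (d - r) / n for r, d in zip(reference, deformed)]

reference = [1.0, 2.0, 3.0, 4.0]
deformed = [1.0, 3.0, 2.0, 6.0]
value = ssd(reference, deformed)
derivative = ssd_derivative(reference, deformed)
```

Each pixel contributes independently to both quantities, so the computation maps naturally onto one fragment per pixel.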

Page 44: GPU Acceleration in Registration

Derivative of the Deformed Image

• Sobel operator to calculate gradients:

-1  0  1         1  4  1
-4  0  4         0  0  0
-1  0  1        -1 -4 -1
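
Applying the two kernels of this slide to an image neighbourhood can be sketched as follows. The kernel values are copied from the slide; the helper name apply_kernel, the ramp test image and the division by 12 (the kernel's response to a unit ramp) are assumptions.

```python
KERNEL_X = [[-1, 0, 1],
            [-4, 0, 4],
            [-1, 0, 1]]
KERNEL_Y = [[1, 4, 1],
            [0, 0, 0],
            [-1, -4, -1]]

def apply_kernel(image, kernel, row, col):
    """Correlate a 3x3 kernel with the neighbourhood of an interior pixel."""
    return sum(kernel[dr + 1][dc + 1] * image[row + dr][col + dc]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))

# Test image: intensity rises by 1 per column and is constant along each column.
ramp = [[float(c) for c in range(5)] for _ in range(5)]

gx = apply_kernel(ramp, KERNEL_X, 2, 2) / 12.0   # horizontal gradient
gy = apply_kernel(ramp, KERNEL_Y, 2, 2) / 12.0   # vertical gradient
```

On this ramp the horizontal gradient is 1 and the vertical gradient is 0, as expected.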

Page 45: GPU Acceleration in Registration

Derivative of the Control Points

• Constant
• B-spline: separable kernel of fixed size

Page 46: GPU Acceleration in Registration

Original Fluoroscopy Sequence

Page 47: GPU Acceleration in Registration

2 * 2 Control Points

Page 48: GPU Acceleration in Registration

8 * 8 Control Points

Page 49: GPU Acceleration in Registration

Deformation Field

Page 50: GPU Acceleration in Registration

GPU Elastic Registration

• 40 images: Quasi-Newton: 16 seconds
• Gradient Descent: 63 seconds
• 8 * 8 Control Points: rest motion
• Multi-resolution deformation field, with reduced parameters (discussed with Dirk Loeckx)

Page 51: GPU Acceleration in Registration

CUDA Libraries

Page 52: GPU Acceleration in Registration

CUDA Software Stack

Page 53: GPU Acceleration in Registration

CUDA Libraries

• CUBLAS
• CUFFT

Page 54: GPU Acceleration in Registration

CUBLAS

• Basic Linear Algebra Subprograms
• Vector, Matrix, Numerical Math
• Almost no initialization
• Function calls
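
For reference, the two vector operations timed on the following slides are known in BLAS as AXPY (scaled vector addition) and DOT (inner product). The plain-Python sketch below only shows what they compute; it does not call CUBLAS, and the function names are chosen for illustration.

```python
def axpy(alpha, x, y):
    """Scaled vector addition: returns alpha * x + y, element by element."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Inner product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

scaled_sum = axpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
inner = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Both operations touch every element once and do very little arithmetic per element, so their speed depends mostly on memory bandwidth and transfer cost.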

Page 55: GPU Acceleration in Registration

CUBLAS performance

[Plot: execution times of scalar vector add, dual-core Woodcrest and G80 core. Horizontal axis: data size vector (kB), 0 to 4500; vertical axis: execution time (ms), 0 to 500. Series: G80 (ms), Woodcrest (ms).]

Page 56: GPU Acceleration in Registration

CUBLAS performance

[Plot: execution times of vector inner product, dual-core Woodcrest and G80 core. Horizontal axis: data size vector (kB), 0 to 4500; vertical axis: execution time (ms), 0 to 500. Series: G80 (ms), Woodcrest (ms).]

Page 57: GPU Acceleration in Registration

CUFFT performance

[Plot: execution times of 2D FFT (size 2^n), single-core Woodcrest and G80 core. Horizontal axis: N point 2D FFT, 1 to 10000 (logarithmic); vertical axis: execution time (ms), 0.001 to 10000 (logarithmic). Series: G80 (CudaFFT), Woodcrest (FFTW).]

Page 58: GPU Acceleration in Registration

Conclusion & Future work

Page 59: GPU Acceleration in Registration

Conclusions

• GPU: powerful parallel processor, but has its limitations

• Rigid Registration: interpolation on the GPU

• Elastic Registration: calculation of the Similarity Measure & first order derivative on the GPU

Page 60: GPU Acceleration in Registration

Future work

• Multi-resolution deformation fields
• 2D-3D registration of the Coronary Arteries (not presented)

Page 61: GPU Acceleration in Registration

Questions?

Page 62: GPU Acceleration in Registration