Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Tom VanCourt, Yongfeng Gu, Martin Herbordt. CAAD lab, ECE Department, Boston University. www.bu.edu/caadlab

Transcript of Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Page 1: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Tom VanCourt, Yongfeng Gu, Martin Herbordt
CAAD lab, ECE Department, Boston University

www.bu.edu/caadlab

Page 2: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

CAMP '05, 3D Template Matching

3D Template Matching

Increasing use of volumetric data sets
  MRI / CAT, confocal microscopy, molecule structure

Increased complexity of correlation
  2D: $O(n^2)$ positions (x,y) $\times$ $O(n^1)$ rotations = $O(n^3)$
  3D: $O(n^3)$ positions (x,y,z) $\times$ $O(n^3)$ rotations = $O(n^6)$

Transform techniques help a little:
  $O(n^3) \to O(n^2 \log n)$
  $O(n^6) \to O(n^4 \log n)$

Solution: Application-specific accelerators
  Programmable off-the-shelf hardware
  Custom logic design, unique to each application
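To make these growth rates concrete, the sketch below evaluates the slide's counts for one illustrative size, n = 100. Constant factors are ignored, and the base-2 logarithm and the function name `search_space` are assumptions made only for this example.

```python
import math


def search_space(n: int) -> dict:
    """Order-of-magnitude counts from the slide, ignoring constant factors."""
    return {
        "2D direct": n**2 * n**1,             # (x,y) positions times rotations
        "3D direct": n**3 * n**3,             # (x,y,z) positions times rotations
        "2D transform": n**2 * math.log2(n),  # O(n^2 log n)
        "3D transform": n**4 * math.log2(n),  # O(n^4 log n)
    }


counts = search_space(100)
for name, value in counts.items():
    print(f"{name:13s} ~ {value:.2e}")
```

Even with the transform, the 3D count for n = 100 stays several orders of magnitude above the 2D counts, which is the motivation for a dedicated accelerator.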

Page 3: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Volumetric Data Sets

Complex data types
  Multiple fluorescence channels
  Oriented data: flow vectors
  Nonlinear scoring models

True 3D data acquisition
  Medical imaging (MRI, PET, CAT, …)
  Confocal microscopy
  Emerging techniques: diffusion tensor tomography

Page 4: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

COTS (commercial off-the-shelf) AND Custom? How?

Field Programmable Gate Arrays
  1000s of uncommitted elements
  Custom processor built on demand
  On-chip RAM bandwidth: >1 Tbit/sec
  Massive parallelism: 100s-1000s of processing elements (PEs)

Accelerator is tailored to each application
  ~100% payload computation cycles
    No load/store cycles
    No loop overhead cycles
    No address arithmetic cycles
  ~0% logic dedicated to unused features

Page 5: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Acceleration Strategy

Standard approach:
  [Diagram labels: Rotated Image, Molecule Grid, Transform Per Channel, Products of Transforms, Correlation Result; FFT × FFT^-1]

Accelerated approach:
  [Diagram labels: Molecule Grid, Rotated Addressing, Direct Correlation by Systolic Array, Correlation Result]

Page 6: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Correlation Pipeline

[Diagram labels: Rotated Image Access, Voxel Value Rotation, Systolic 3D Correlation, Data Reduction Filtering]

Customizable functions
High data reuse

Direct correlation
  Beats FFT for modest problems
  Generalizes correlation sum: $\sum_{ijk} F(A_{xyz}, T_{ijk})$

Natural for FPGA implementation
  Regular structure
  Simple data elements
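The generalized correlation sum can be sketched in software as a plain nested loop with a pluggable scoring function F. This is an illustrative reference model, not the hardware design: the names `direct_correlation` and `score` are hypothetical, and for brevity it evaluates only positions where the template lies fully inside the image, with no padding.

```python
from itertools import product


def direct_correlation(image, template, score=lambda a, t: a * t):
    """R[x][y][z] = sum over (i,j,k) of score(A[x+i][y+j][z+k], T[i][j][k])."""
    nx, ny, nz = len(image), len(image[0]), len(image[0][0])
    tx, ty, tz = len(template), len(template[0]), len(template[0][0])
    rx, ry, rz = nx - tx + 1, ny - ty + 1, nz - tz + 1
    result = [[[0] * rz for _ in range(ry)] for _ in range(rx)]
    for x, y, z in product(range(rx), range(ry), range(rz)):
        total = 0
        for i, j, k in product(range(tx), range(ty), range(tz)):
            total += score(image[x + i][y + j][z + k], template[i][j][k])
        result[x][y][z] = total
    return result


# Usage: a 3x3x3 image of ones against a 2x2x2 template of ones.
ones_image = [[[1] * 3 for _ in range(3)] for _ in range(3)]
ones_template = [[[1] * 2 for _ in range(2)] for _ in range(2)]
linear = direct_correlation(ones_image, ones_template)
# A nonlinear score: reward exact matches, penalize mismatches.
matching = direct_correlation(
    ones_image, ones_template, score=lambda a, t: 1 if a == t else -1
)
```

Passing a different `score` changes the comparison without touching the loop structure, which mirrors the slide's point that direct correlation accommodates nonlinear scoring where a transform-based method would not.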

Page 7: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Rotated Memory Access

Load image once & reuse
Access image in rotated order via index transformation

\[
\begin{pmatrix} x_i & x_j & x_k \\ y_i & y_j & y_k \\ z_i & z_j & z_k \end{pmatrix}
\begin{pmatrix} i \\ j \\ k \end{pmatrix}
=
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
\]

Allows axis scaling, mirror reversal
  Anisotropic: e.g. X,Y resolution ≠ Z
  No need for resampling

~0 delay & buffer overhead
  Strength reduction eliminates multiplication
  Arithmetic cost hidden by pipelining

[Figure: image axes (x, y) and rotated index axes (i, j)]
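Strength reduction here means that, because the index transformation is linear, stepping k by one simply adds the matrix's k column to the current (x, y, z), and likewise for j and i. The sketch below illustrates that idea using additions only. The function name `rotated_addresses` is hypothetical, the matrix has integer entries so the result is exact, and a real design would presumably use fixed-point entries, an origin offset, and rounding to the nearest voxel, none of which are specified on the slide.

```python
def add3(p, q):
    """Component-wise sum of two 3-vectors."""
    return [a + b for a, b in zip(p, q)]


def rotated_addresses(matrix, size):
    """Yield ((i, j, k), (x, y, z)) with (x, y, z) = matrix * (i, j, k).

    No multiplication is performed: each loop step adds one matrix column.
    """
    col_i = [row[0] for row in matrix]
    col_j = [row[1] for row in matrix]
    col_k = [row[2] for row in matrix]
    base_i = [0, 0, 0]
    for i in range(size):
        base_j = base_i
        for j in range(size):
            pos = base_j
            for k in range(size):
                yield (i, j, k), tuple(pos)
                pos = add3(pos, col_k)    # step along k
            base_j = add3(base_j, col_j)  # step along j
        base_i = add3(base_i, col_i)      # step along i


# Usage: 90-degree rotation about z combined with 2x scaling of the z axis.
M = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 2]]
addresses = dict(rotated_addresses(M, 4))
```

The same matrix covers anisotropic scaling and mirror reversal, since those only change the column values being added.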

Page 8: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Voxel Value Rotation

Not needed for scalar data (RGB, gray scale, etc)
  Step exists architecturally, as identity transform

For spatially oriented data (e.g. fluid flow in brain tissue)
  Perform rigid rotation of image …
  Then rotate oriented voxel values
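The two cases can be sketched as interchangeable per-voxel functions: an identity for scalar data and a 3x3 matrix product for vector-valued voxels. The names `identity_value` and `rotate_voxel_value` are hypothetical, and whether the rotation or its inverse applies depends on the convention chosen for the index transformation, which the slides do not state.

```python
def identity_value(value):
    """Scalar data (gray scale, RGB) passes through unchanged."""
    return value


def rotate_voxel_value(rotation, value):
    """Apply a 3x3 rotation matrix to one vector-valued voxel."""
    return tuple(sum(r * v for r, v in zip(row, value)) for row in rotation)


# Usage: a flow vector pointing along +x, rotated 90 degrees about z.
ROT_Z_90 = [[0, -1, 0],
            [1, 0, 0],
            [0, 0, 1]]
rotated_flow = rotate_voxel_value(ROT_Z_90, (1, 0, 0))
```

Keeping the identity case as an explicit stage matches the slide's remark that the step always exists architecturally, even when it does nothing.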

Page 9: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Correlation Array

3D extension of conventional array

Custom unit cell
  Holds constant value for template
  Custom F(a, b)

… 1D array + line buffer
  Extend line to result width

… 2D array + plane buffer
  Extend plane to result size

… 3D array
  One input voxel per cycle, padded
  One output correlation point per cycle

[Figure: unit cell with inputs A and Sin, stored template value T, function F, adder (+), and output Sout; arrays chained through RAM FIFOs]
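The behavior of the unit cell can be illustrated with a small cycle-by-cycle software model of the 1D case, where each cell holds one template value and computes Sout = Sin + F(A, T). This is a sketch under assumptions: the name `systolic_1d` is hypothetical, the input sample is broadcast to every cell for simplicity (the actual data movement may differ), and input padding and the line and plane FIFOs of the 2D and 3D arrays are omitted.

```python
def systolic_1d(samples, template, score=lambda a, t: a * t):
    """Simulate a 1D chain of cells, one template value per cell.

    Every cycle each cell computes s_out = s_in + score(a, t) and latches it.
    """
    m = len(template)
    regs = [0] * m                 # latched s_out of each cell
    outputs = []
    for cycle, a in enumerate(samples):
        s_in = [0] + regs[:-1]     # each cell reads its left neighbour's register
        regs = [s + score(a, t) for s, t in zip(s_in, template)]
        if cycle >= m - 1:         # pipeline is full: one result per cycle
            outputs.append(regs[-1])
    return outputs


# Usage: compare the pipeline against the direct sum it is meant to compute.
samples = [3, 1, 4, 1, 5, 9]
template = [2, 7, 1]
pipelined = systolic_1d(samples, template)
direct = [
    sum(samples[x + i] * template[i] for i in range(len(template)))
    for x in range(len(samples) - len(template) + 1)
]
```

Once the chain has filled, the model consumes one input and emits one correlation point per cycle, which is the throughput property the slide claims for the full 3D array.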

Page 10: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

3D Correlation Result

Template is stored in computation array
FIFOs hold partial correlation sums

[Animation labels:]
  Template data and computation array
  3D correlation result (whole volume shown)
  FIFO line buffers: pad to result width
  FIFO plane buffers: pad to result depth
  Correlation complete: result passed to data reduction filter

Page 11: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Peak Capture / Data Reduction

3D result ≥ image size
  Full result would slow host

Template may occur > 1x
  Find multiple maxima
  Reporting N highest points is not effective

Instead: Local max by region
  8x8x8 region: 512:1 reduction
  More maxima, less redundancy
  Record exact (x,y,z) in region
  BUT may miss close maxima

    Region [?] template size may be OK

[Figure annotations: "Broad maximum reported redundantly"; "Local maxima missed"]
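Region-based peak capture can be sketched as follows: the result volume is tiled into cubes, and each cube reports only its best score together with the exact position of that score. The function name `region_maxima` and the tie-breaking rule (ties go to the largest coordinates, a side effect of tuple comparison) are assumptions of this example, not details from the slides.

```python
def region_maxima(volume, region=8):
    """Return one (score, (x, y, z)) per region x region x region block."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    peaks = []
    for bx in range(0, nx, region):
        for by in range(0, ny, region):
            for bz in range(0, nz, region):
                best = max(
                    (volume[x][y][z], (x, y, z))
                    for x in range(bx, min(bx + region, nx))
                    for y in range(by, min(by + region, ny))
                    for z in range(bz, min(bz + region, nz))
                )
                peaks.append(best)
    return peaks


# Usage: a 16x16x16 result volume with two peaks in different regions.
result = [[[0] * 16 for _ in range(16)] for _ in range(16)]
result[3][4][5] = 9
result[12][1][15] = 7
peaks = region_maxima(result)
```

With 8x8x8 regions, the 4096-point volume above reduces to 8 reported peaks, the 512:1 ratio quoted on the slide. Two maxima that fall inside the same region would still collapse to one, which is the limitation the slide notes.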

Page 12: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Why Reconfigurable?

Massive parallelism, modest cost
  COTS hardware, tracks technology

Application-optimized processing
  Tracks application changes
    Ex: 1, 2, 3-channel fluorescence
  Flexible performance tradeoffs
  Allows non-linear scoring

Available now
  PC add-ins
  SGI Altix
  Cray XD1

[Figure labels: 24-bit RGB, 8-bit mono, 4-bit]

Page 13: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Performance Results

Voxel value          Voxel bits  Logic per PE (slices)  Number of PEs  Clock (MHz)  Speed (10^9 SAC/sec)
2-tuple              2           11                     2744 = 14^3    51.5         141.9
3-tuple              7           21                     1331 = 11^3    46.1         61.3
2-tuple (nonlinear)  5           44                     729 = 9^3      30.6         22.2
2-tuple              6           35                     729 = 9^3      38.3         27.9
4-tuple (oriented)   7           16                     1331 = 11^3    46.3         61.7

Xilinx Virtex-II Pro VP70
Measured: Score-accumulate per sec (SAC/sec)

Complex models not limited in number of bits
Simple models not limited by worst-case speed
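As a rough cross-check of the table, the reported speeds can be compared with the product of PE count and clock rate. This assumes each PE completes one score-accumulate per clock cycle, which the slides suggest but do not state outright; the helper name `estimated_gsac` is hypothetical.

```python
rows = [
    # (voxel value, number of PEs, clock in MHz, reported 10^9 SAC/sec)
    ("2-tuple", 2744, 51.5, 141.9),
    ("3-tuple", 1331, 46.1, 61.3),
    ("2-tuple (nonlinear)", 729, 30.6, 22.2),
    ("2-tuple", 729, 38.3, 27.9),
    ("4-tuple (oriented)", 1331, 46.3, 61.7),
]


def estimated_gsac(pes, clock_mhz):
    """Estimate 10^9 SAC/sec, assuming one score-accumulate per PE per cycle."""
    return pes * clock_mhz * 1e6 / 1e9


for name, pes, mhz, reported in rows:
    estimate = estimated_gsac(pes, mhz)
    print(f"{name:20s} estimated {estimate:6.1f}  reported {reported:6.1f}")
```

Under that assumption the estimates land within about half a percent of the reported figures, consistent with the earlier claim of nearly 100% payload computation cycles.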

Page 14: Three-Dimensional Template Correlation: Object Recognition in 3D Voxel Data

Conclusions

Accelerators enable 3D template matching
  >100x speedup over 3D FFT (n~100)
  Complex data types, including vector values
  Nonlinear comparisons supported

Programmability avoids common limitations
  No penalty due to over-generalization
  No limit due to data/function restrictions

3D data and FPGA coprocessors match well
  Both are emerging and expanding
  FPGAs three years ago couldn't do it!