Video on DSP and FPGA
John Johansson
April 12, 2004
Agenda
► Overview of video processing
► A typical video encoder and the DCT
► Requirements of DCT
► Comparison of DSP and FPGA chips
► Analysis and conclusions
► Questions
Overview of Video Processing
Video processing generally involves
► Compression / Decompression
► Special Effects
► TV Broadcasting
► Focus on Compression
Video Encoding
Typical Video Encoder
► Focus on DCT algorithm
The Discrete Cosine Transformation
► DCT is a transform from the spatial domain to the frequency domain, like the FFT
► Rearranges data into a more compressible format
► Typically done on 64 (8x8) pixels at a time
► Big nasty equation …
► … But no sharp teeth (optimizes extremely well)
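The equation itself did not survive in this transcript. As a hedged reconstruction, a commonly used form of the 8x8 forward DCT (the DCT-II used in JPEG- and MPEG-style encoders) is given below. The symbols f, F, C, x, y, u and v are assumptions for illustration and are not taken from the slide.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v)
         \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right],
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k \neq 0 \end{cases}
```

Here f(x,y) is the pixel value at position (x,y) in the block and F(u,v) is the coefficient for horizontal and vertical frequency indices u and v. The double sum factors into two 1-D sums, which is one reason the transform optimizes so well.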
Requirements for DCT
Basic Idea
► Read in data (64 values, 8-24 bits signed / unsigned)
► Do transformation
► Write out data
► Profit !!!
► Easy, right ??
Requirements for DCT
Memory Limitations
► Load an entire frame?
► One frame can vary from 50 KB to 50 MB in size when uncompressed
► External memory is much slower, but more plentiful
► Do the DCT in chunks (8x8 block)
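The chunked approach can be sketched in Python as follows. This is a minimal, unoptimized illustration, not the implementation used on either chip: the function names `dct_8x8`, `iter_blocks` and `dct_frame` are hypothetical, the frame is assumed to be a list of rows whose height and width are multiples of 8, and the direct four-loop DCT-II is used only for clarity.

```python
import math

N = 8  # block edge length; the slides process 8x8 = 64 pixels at a time


def dct_8x8(block):
    """Direct (unoptimized) 2-D DCT-II of one 8x8 block of numbers."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def iter_blocks(frame):
    """Yield (row, col, block) for each 8x8 tile of the frame.

    Assumes the frame height and width are multiples of 8.
    """
    height, width = len(frame), len(frame[0])
    for r in range(0, height, N):
        for col in range(0, width, N):
            yield r, col, [row[col:col + N] for row in frame[r:r + N]]


def dct_frame(frame):
    """Transform a frame tile by tile; the DCT only ever sees one 8x8 block."""
    return {(r, col): dct_8x8(block) for r, col, block in iter_blocks(frame)}
```

Only 64 input values need to be in fast internal memory at any moment, so the rest of the frame can stay in the slower external memory until its block is fetched.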
Requirements for DCT
Degree of Parallelism
► DCT can be done serially, or broken up and done in parallel
► Parallelism depends largely on available memory
► Price / Performance tradeoffs
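One common way to break the DCT up is the row-column decomposition: an 8x8 DCT is computed as eight 1-D transforms over the rows followed by eight over the columns. The sketch below is an illustration under that assumption and is not necessarily how the designs discussed here are partitioned. The names `dct_1d` and `dct_2d_separable` are hypothetical.

```python
import math

N = 8  # points per 1-D transform


def dct_1d(vec):
    """8-point DCT-II with orthonormal scaling."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(
            vec[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)))
    return out


def dct_2d_separable(block):
    """2-D DCT of an 8x8 block as row transforms followed by column transforms."""
    # Pass 1: the eight row transforms do not depend on each other,
    # so they could run serially on one unit or on eight units at once.
    rows = [dct_1d(row) for row in block]
    # Pass 2: the same holds for the eight column transforms.
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so that out[u][v] is indexed like the input block.
    return [list(r) for r in zip(*cols)]
```

Each pass exposes eight independent units of work. How many of them run at the same time is the price / performance tradeoff: more parallel units need more multipliers and more buffer memory.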
The Challengers
Xilinx Spartan-3 FPGA
► 50K – 5M gates
► 326 MHz
► 100 KB – 2.3 MB internal memory
► 4 - 104 dedicated multipliers
► Oodles of I/O pins (up to 784)
Look at XC3S1000
► 1M gates, 560 KB memory, 24 multipliers, 376 I/O pins
The Challengers
ADSP-BF5xx Blackfin Processor
► 200 – 750 MHz
► Single or dual core
► DMA memory controller
► 52 KB – 326 KB internal memory
► Other processor goodies
Look at ADSP-BF533
► 500 MHz, single core, 148 KB memory
Performance
How do we correctly benchmark an algorithm between two completely different processors?
► I don’t really know
► Look at some rough performance indicators and try to draw a conclusion
Performance
FPGA
► Varies from 1-25 cycle(s) / pixel for DCT
► Reading and writing of data takes additional time
► Clock speed limited by degree of parallelism
DSP
► Roughly 5 cycles / pixel for DCT
► DMA controller allows parallel reading and writing with some setup overhead
(Ideal) Performance
Spartan-3
► 64 read + 64 compute + 64 write = 196 cycles / block (as stated; the three terms listed sum to 192)
► At 326 MHz: about 1.66 Mblocks / second
Blackfin
► 319 compute + 10 DMA transfer = 329 cycles / block
► At 500 MHz: about 1.52 Mblocks / second
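The block rates above are simply the clock frequency divided by the cycles per block. The sketch below reproduces that arithmetic from the figures on the slide. The function names and the 720x480 frame size are assumptions added for illustration, and the frame-rate estimate counts a single 8-bit plane only.

```python
def blocks_per_second(clock_hz, cycles_per_block):
    """Ideal throughput: one 8x8 block finishes every cycles_per_block cycles."""
    return clock_hz / cycles_per_block


def frames_per_second(clock_hz, cycles_per_block, width, height):
    """Ideal frame rate for one plane of width x height pixels (multiples of 8)."""
    blocks_per_frame = (width // 8) * (height // 8)
    return blocks_per_second(clock_hz, cycles_per_block) / blocks_per_frame


# Figures as given on the slide.
spartan3 = blocks_per_second(326e6, 196)  # XC3S1000 at 326 MHz
blackfin = blocks_per_second(500e6, 329)  # ADSP-BF533 at 500 MHz

print(f"Spartan-3: {spartan3 / 1e6:.2f} Mblocks/s")
print(f"Blackfin:  {blackfin / 1e6:.2f} Mblocks/s")

# Hypothetical 720x480 frame, one plane, DCT stage only.
print(f"Spartan-3: {frames_per_second(326e6, 196, 720, 480):.0f} frames/s")
print(f"Blackfin:  {frames_per_second(500e6, 329, 720, 480):.0f} frames/s")
```

These are best-case numbers for the DCT stage alone. A full encoder spends additional cycles on motion estimation, quantization and entropy coding.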
Advantages
FPGA
► Potential for very high parallelism
► Existing video designs available for purchase
► Good middleman functionality
DSP
► Higher potential clock speed
► Much more flexible design
► DMA memory controller
Disadvantages
FPGA
► Low flexibility
► Hard to optimize
► Limited logic blocks
DSP
► Difficult to achieve full utilization
► Higher power consumption
Conclusions
FPGA
► Best for well defined roles, like DCT
► Faster in situations where throughput matters
► Can be very expensive
DSP
► Better off for more flexible roles, like full encoder
► Situations where large amounts of (additional) memory are needed
Questions?
References
Xilinx Spartan III
http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=Spartan-3
Analog Devices Blackfin
http://www.analog.com/processors/processors/blackfin/index.html
References
Other articles
http://www.xilinx.com/publications/products/services/xc_pdf/xc_videoapps44.pdf
http://www.xilinx.com/publications/products/sp2e/xc_dspvid43.htm
http://www.reed-ectronics.com/ednmag/article/CA336860?stt=000&pubdate=11%2F27%25