Video on DSP and FPGA

John Johansson, April 12, 2004


Transcript of Video on DSP and FPGA

Page 1: Video on DSP and FPGA

Video on DSP and FPGA

John Johansson

April 12, 2004

Page 2: Video on DSP and FPGA

Agenda

► Overview of video processing
► A typical video encoder and the DCT
► Requirements of DCT
► Comparison of DSP and FPGA chips
► Analysis and conclusions
► Questions

Page 3: Video on DSP and FPGA

Overview of Video Processing

Video processing generally involves

► Compression / Decompression
► Special Effects
► TV Broadcasting

► Focus on Compression

Page 4: Video on DSP and FPGA

Video Encoding

Typical Video Encoder

► Focus on DCT algorithm

Page 5: Video on DSP and FPGA

The Discrete Cosine Transformation

► DCT is a spatial transform, like the FFT
► Rearranges data into a more compressible format
► Typically done on 64 (8x8) pixels at a time
► Big nasty equation …
► … But no sharp teeth (optimizes extremely well)
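The equation itself did not survive in this transcript. As a rough sketch, assuming the slide showed the usual 8x8 2-D DCT-II in the scaling used by JPEG/MPEG-style encoders, the unoptimized form can be written directly as four nested loops. The function names are illustrative, not from the presentation.

```python
import math

N = 8  # block edge length from the slide: 8x8 = 64 pixels


def c(k):
    """Normalization factor: 1/sqrt(2) for the DC term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_8x8_direct(block):
    """Direct 2-D DCT-II of an 8x8 block (assumed JPEG-style scaling).

    F(u,v) = 1/4 * C(u) * C(v) * sum_x sum_y f(x,y)
             * cos((2x+1)*u*pi/16) * cos((2y+1)*v*pi/16)
    """
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


flat = [[10] * N for _ in range(N)]   # a uniformly gray block
coeffs = dct_8x8_direct(flat)
print(round(coeffs[0][0], 6))         # all of the energy lands in the DC term
```

Written this way, each of the 64 outputs visits all 64 inputs, which is 4096 inner-loop terms per block. That count is what practical implementations cut down.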

Page 6: Video on DSP and FPGA

Requirements for DCT

Basic Idea

► Read in data (64 values, 8-24 bits signed / unsigned)
► Do transformation
► Write out data
► Profit !!!

► Easy, right ??

Page 7: Video on DSP and FPGA

Requirements for DCT

Memory Limitations
► Load an entire frame?
► One frame can vary from 50 KB to 50 MB in size when uncompressed
► External memory is much slower, but more plentiful
► Do the DCT in chunks (8x8 block)
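Processing in chunks means only one 8x8 block has to sit in fast internal memory at a time. A minimal sketch, assuming a row-major frame whose width and height are multiples of 8 (the helper name `iter_blocks` is hypothetical):

```python
def iter_blocks(frame, width, height, n=8):
    """Yield (bx, by, block) for every n x n tile of a row-major frame.

    Assumes width and height are multiples of n; a real encoder would
    pad the frame edges instead.
    """
    for by in range(0, height, n):
        for bx in range(0, width, n):
            block = [
                frame[(by + r) * width + bx:(by + r) * width + bx + n]
                for r in range(n)
            ]
            yield bx, by, block


# Tiny 16x16 test frame held in "external" memory as a flat list.
width, height = 16, 16
frame = list(range(width * height))

for bx, by, block in iter_blocks(frame, width, height):
    # Only these 64 values need to be resident while the DCT runs.
    print(bx, by, block[0][0])
```

The working set per block stays at 64 values regardless of frame size, so the frame itself can live in the slower external memory.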

Page 8: Video on DSP and FPGA

Requirements for DCT

Degree of Parallelism
► DCT can be done serially, or broken up and done in parallel
► Parallelism depends largely on available memory
► Price / Performance tradeoffs
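One common way to break the DCT up is the row-column decomposition: eight independent 1-D transforms on the rows, then eight on the columns. The sketch below assumes the orthonormal 8-point DCT-II. It is one illustrative decomposition, not necessarily the one the presentation had in mind.

```python
import math

N = 8


def dct_1d(vec):
    """Orthonormal 8-point DCT-II of a single row or column."""
    out = []
    for k in range(N):
        scale = 0.5 * (1 / math.sqrt(2) if k == 0 else 1.0)
        out.append(scale * sum(
            vec[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)))
    return out


def dct_8x8_row_column(block):
    """2-D DCT as two passes of independent 1-D transforms."""
    # Pass 1: the 8 row transforms do not depend on each other, so
    # hardware could run 1, 2, 4 or 8 of them side by side.
    rows = [dct_1d(row) for row in block]
    # Pass 2: the same holds for the 8 column transforms.
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols[v][u] holds F(u, v); transpose back to out[u][v].
    return [list(r) for r in zip(*cols)]


flat = [[10] * N for _ in range(N)]
print(round(dct_8x8_row_column(flat)[0][0], 6))
```

Each pass is 8 transforms of 64 multiply-accumulates, so 1024 in total. How many of the 16 transforms run at once is the knob that trades logic and memory against cycles per block.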

Page 9: Video on DSP and FPGA

The Challengers

Xilinx Spartan-3 FPGA
► 50K – 5M gates
► 326 MHz
► 100 KB – 2.3 MB internal memory
► 4 - 104 dedicated multipliers
► Oodles of I/O pins (up to 784)

Look at XC3S1000
► 1M gates, 560 KB memory, 24 multipliers, 376 I/O pins

Page 10: Video on DSP and FPGA

The Challengers

ADSP-BF5xx Blackfin Processor
► 200 – 750 MHz
► Single or dual core
► DMA memory controller
► 52 KB – 326 KB internal memory
► Other processor goodies

Look at ADSP-BF533
► 500 MHz, single core, 148 KB memory

Page 11: Video on DSP and FPGA

Performance

How do we correctly benchmark an algorithm between two completely different processors?

► I don’t really know
► Look at some rough performance indicators and try to draw a conclusion

Page 12: Video on DSP and FPGA

Performance

FPGA
► Varies from 1-25 cycle(s) / pixel for DCT
► Reading and writing of data takes additional time
► Clock speed limited by degree of parallelism

DSP
► Roughly 5 cycles / pixel for DCT
► DMA controller allows parallel reading and writing with some setup overhead

Page 13: Video on DSP and FPGA

(Ideal) Performance

Spartan-3
► 64 read + 64 compute + 64 write = 192 cycles / block
► 326 MHz = 1.70 Mblocks / second

Blackfin
► 319 compute + 10 DMA transfer = 329 cycles / block
► 500 MHz = 1.52 Mblocks / second
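The ideal figures are simply clock rate divided by cycles per block. The sketch below redoes that arithmetic from the per-stage cycle counts listed on the slide. The 720x480 luma-only frame is an assumption added purely to put the block rates on a frames-per-second scale.

```python
def blocks_per_second(clock_hz, cycles_per_block):
    """Ideal throughput: no stalls, every cycle does useful work."""
    return clock_hz / cycles_per_block


# Per-stage cycle counts taken from the slide.
spartan_cycles = 64 + 64 + 64     # read + compute + write
blackfin_cycles = 319 + 10        # compute + DMA transfer setup

spartan = blocks_per_second(326e6, spartan_cycles)
blackfin = blocks_per_second(500e6, blackfin_cycles)

# Hypothetical frame size (not from the slides): 720x480, luma only.
blocks_per_frame = (720 // 8) * (480 // 8)

for name, cycles, rate in (("Spartan-3", spartan_cycles, spartan),
                           ("Blackfin", blackfin_cycles, blackfin)):
    print(f"{name}: {cycles} cycles/block, "
          f"{cycles / 64:.2f} cycles/pixel, "
          f"{rate / 1e6:.2f} Mblocks/s, "
          f"{rate / blocks_per_frame:.0f} frames/s (DCT only)")
```

These are upper bounds for the DCT stage alone; a full encoder spends cycles on motion estimation, quantization and entropy coding as well.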

Page 14: Video on DSP and FPGA

Advantages

FPGA
► Potential for very high parallelism
► Existing video designs available for purchase
► Good middleman functionality

DSP
► Higher potential clock speed
► Much more flexible design
► DMA memory controller

Page 15: Video on DSP and FPGA

Disadvantages

FPGA
► Low flexibility
► Hard to optimize
► Limited logic blocks

DSP
► Difficult to achieve full utilization
► Higher power consumption

Page 16: Video on DSP and FPGA

Conclusions

FPGA
► Best for well defined roles, like DCT
► Faster in situations where throughput matters
► Can be very expensive

DSP
► Better off for more flexible roles, like full encoder
► Situations where large amounts of (additional) memory are needed

Page 17: Video on DSP and FPGA

Questions?
