Mapping the FFT Algorithm to the IBM Cell Processor
![Page 1: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/1.jpg)
Mapping the FFT Algorithm to the IBM Cell Processor
Andy Polidore
Advisors: Brendan Burns, Joseph Czechowski
![Page 2: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/2.jpg)
Motivation
MRI imaging
Fast Fourier Transform (FFT)
◦ Efficient algorithm for computing a Discrete Fourier Transform (DFT)
◦ The DFT converts time-domain data to the frequency domain
2D FFT: perform a 1D FFT on each row of an image, then perform a 1D FFT on each resulting column
The Cell
◦ Nine cores
◦ 1 Power Processing Unit (PPU)
◦ 8 Synergistic Processing Units (SPU)
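The row-then-column recipe works because the 2D DFT is separable. As a sketch in standard notation (the symbols x, X, M and N are not from the slides), for an M × N image:

```latex
\begin{aligned}
X[k,l] &= \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x[m,n]\, e^{-2\pi i \left( \frac{km}{M} + \frac{ln}{N} \right)} \\
       &= \sum_{m=0}^{M-1} e^{-2\pi i \frac{km}{M}}
          \left( \sum_{n=0}^{N-1} x[m,n]\, e^{-2\pi i \frac{ln}{N}} \right)
\end{aligned}
```

The inner sum is a 1D DFT along row m; the outer sum is a 1D DFT down column l of those row results.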
![Page 3: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/3.jpg)
Strategy
Cell comes with a 2D routine
◦ Needs to be called twice
◦ First call organizes the data in contiguous column form
Striping
Limited SPU memory
◦ Quad buffering
![Page 4: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/4.jpg)
[Figure: two panels showing data flow between the PPU and SPU 0. Labels: Input Buffer, Output Buffer, Input, DMA In, FFT, FFT out, DMA Out.]
![Page 5: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/5.jpg)
[Figure: data flow between the PPU and SPU 0, SPU 1 and SPU 7. Labels: Input Buffer, Output Buffer, DMA In, Input, FFT, FFT out.]
![Page 6: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/6.jpg)
[Figure: two panels showing data flow between the PPU and SPU 0, SPU 1 and SPU 2, with a Sync Point. Labels: Input Buffer, Output Buffer, Input, DMA In, FFT, FFT out, DMA Out.]
![Page 7: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/7.jpg)
Quad buffering
Why is it required?
◦ Space problems
◦ Maximizing processing power
Buffers
◦ IN to handle incoming data
◦ FFTin and FFTout to process the data
◦ OUT stores the data ready to be DMA’ed back to main memory
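The four roles above can be sketched as a rotation: a buffer is filled, then serves as the FFT input, then is reused as an FFT output, then is drained, and then becomes free to fill again. The code below is a portable simulation under stated assumptions, not the author's SPU code: `memcpy` stands in for DMA (which on the Cell would run asynchronously alongside the computation), `transform_row` is a placeholder kernel rather than an FFT, and `ROW_LEN` and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define ROW_LEN 8   /* hypothetical row length, in floats */

/* Placeholder for the out-of-place FFT kernel: reads in, writes out. */
static void transform_row(const float *in, float *out)
{
    for (int i = 0; i < ROW_LEN; i++)
        out[i] = 2.0f * in[i];
}

/* Stream n_rows rows from src to dst through four local buffers.
   Row k is filled at step k, transformed at step k + 1 and drained
   at step k + 2, so up to three rows are in flight at once. */
void quad_buffered_process(const float *src, float *dst, int n_rows)
{
    assert(n_rows >= 0);
    float buf[4][ROW_LEN];
    float *fill = buf[0], *fft_in = buf[1], *out = buf[2], *fft_out = buf[3];

    for (int step = 0; step < n_rows + 2; step++) {
        if (step < n_rows)                     /* IN: "DMA" next row in */
            memcpy(fill, src + (size_t)step * ROW_LEN, sizeof buf[0]);
        if (step >= 1 && step <= n_rows)       /* FFTin -> FFTout */
            transform_row(fft_in, fft_out);
        if (step >= 2)                         /* OUT: "DMA" result back */
            memcpy(dst + (size_t)(step - 2) * ROW_LEN, out, sizeof buf[0]);

        /* Rotate roles: FILL -> FFTIN -> FFTOUT -> OUT -> FILL. */
        float *drained = out;
        out = fft_out;
        fft_out = fft_in;
        fft_in = fill;
        fill = drained;
    }
}
```

Within one step the three operations touch three different buffers, which is what would let the transfers overlap with the computation on real hardware.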
![Page 8: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/8.jpg)
Buffering

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | FILL | --- | --- | --- |
| 1 |  |  |  |  |
| 2 |  |  |  |  |
| 3 |  |  |  |  |
| 4 |  |  |  |  |
| 5 |  |  |  |  |
| 6 |  |  |  |  |
![Page 9: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/9.jpg)
Buffering

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | FILL | --- | --- | --- |
| 1 | FFTIN | FFTOUT | FILL | --- |
| 2 |  |  |  |  |
| 3 |  |  |  |  |
| 4 |  |  |  |  |
| 5 |  |  |  |  |
| 6 |  |  |  |  |
![Page 10: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/10.jpg)
Buffering

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | FILL | --- | --- | --- |
| 1 | FFTIN | FFTOUT | FILL | --- |
| 2 | FFTOUT | OUT | FFTIN | FILL |
| 3 |  |  |  |  |
| 4 |  |  |  |  |
| 5 |  |  |  |  |
| 6 |  |  |  |  |
![Page 11: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/11.jpg)
Buffering

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | FILL | --- | --- | --- |
| 1 | FFTIN | FFTOUT | FILL | --- |
| 2 | FFTOUT | OUT | FFTIN | FILL |
| 3 | OUT | FILL | FFTOUT | FFTIN |
| 4 |  |  |  |  |
| 5 |  |  |  |  |
| 6 |  |  |  |  |
![Page 12: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/12.jpg)
Buffering

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | FILL | --- | --- | --- |
| 1 | FFTIN | FFTOUT | FILL | --- |
| 2 | FFTOUT | OUT | FFTIN | FILL |
| 3 | OUT | FILL | FFTOUT | FFTIN |
| 4 | FILL | FFTIN | OUT | FFTOUT |
| 5 | FFTIN | FFTOUT | FILL | OUT |
| 6 | FFTOUT | OUT | FFTIN | FILL |
![Page 13: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/13.jpg)
Striping
[Figure: Main Memory mapped to SPU 0 through SPU 7.]
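As a minimal sketch of striping, the function below splits the rows of an image into eight contiguous stripes, one per SPU. The slides do not state how the stripes are laid out, so the block partitioning, the `stripe` struct and the name `stripe_for_spu` are all assumptions made for illustration.

```c
#include <assert.h>

#define NUM_SPUS 8   /* the slides list SPU 0 through SPU 7 */

struct stripe {
    int first_row;   /* index of the first row in this stripe */
    int row_count;   /* number of rows in this stripe */
};

/* Contiguous block of rows assigned to one SPU. When total_rows is not
   a multiple of NUM_SPUS, the first (total_rows % NUM_SPUS) SPUs each
   take one extra row so the load stays balanced. */
struct stripe stripe_for_spu(int spu, int total_rows)
{
    assert(spu >= 0 && spu < NUM_SPUS && total_rows >= 0);
    int base = total_rows / NUM_SPUS;
    int extra = total_rows % NUM_SPUS;
    struct stripe s;
    s.first_row = spu * base + (spu < extra ? spu : extra);
    s.row_count = base + (spu < extra ? 1 : 0);
    return s;
}
```

For a 256-row image every SPU gets 32 rows, and SPU 3 starts at row 96. A round-robin assignment (row r goes to SPU r mod 8) would balance equally well; contiguous blocks are used here only because they map to simple, large transfers.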
![Page 14: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/14.jpg)
Challenges
Simulator
◦ Testing is slow
◦ Alignment
◦ Compiler
C coding
◦ Working with bytes
Parallel processing
◦ Data movement
◦ Debugging
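The slide names alignment without detail. One constraint that commonly causes trouble on the Cell is that DMA transfers of 16 bytes or more need 16-byte-aligned addresses, with 128-byte alignment giving the best performance. The sketch below shows one way to get such a buffer in standard C11; `alloc_dma_buffer`, `is_dma_aligned` and `round_up` are hypothetical helper names, and whether this was the specific issue the author hit is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DMA_ALIGN 128   /* bytes; assumed target alignment for DMA buffers */

/* Round n up to the next multiple of `multiple`. */
static size_t round_up(size_t n, size_t multiple)
{
    return (n + multiple - 1) / multiple * multiple;
}

/* Allocate a buffer whose address is a multiple of DMA_ALIGN. The size
   is padded because C11 aligned_alloc expects a multiple of the
   alignment. Release the buffer with free(). */
void *alloc_dma_buffer(size_t bytes)
{
    assert(bytes > 0);
    return aligned_alloc(DMA_ALIGN, round_up(bytes, DMA_ALIGN));
}

/* Return 1 if p is aligned to DMA_ALIGN, 0 otherwise. */
int is_dma_aligned(const void *p)
{
    return ((uintptr_t)p % DMA_ALIGN) == 0;
}
```

Checking `is_dma_aligned` on both ends of a transfer before issuing it turns a hard-to-trace hardware fault into an ordinary assertion failure.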
![Page 15: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/15.jpg)
Knowledge Gained
Mastering Linux
C makefiles, linking, etc.
Data movement strategies
Multi-core processing
Debugging!
![Page 16: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/16.jpg)
Results and Conclusions
Success?
Future Work
◦ Arbitrary size input
![Page 17: Mapping the FFT Algorithm to the IBM Cell Processor](https://reader036.fdocuments.net/reader036/viewer/2022062421/56812a43550346895d8d6f3d/html5/thumbnails/17.jpg)
Questions?