HW/SW Co-Design of an MPEG-2 Decoder
Pradeep Dhananjay, Kiran Divakar
Leela Kishore Kothamasu, Anthony Weerasinghe
Outline
• Objectives
• Background
• HW-SW Partitioning
• SW/HW Design
• Testing and Debug
• VGA Display Driver
• Results
• Lessons Learned
• Future Work
Objectives
• Accelerate MPEG-2 Decoder
  – Identify bottlenecks
  – Isolate bottleneck functions and partition design
  – Convert SW functions to HW blocks
  – Design HW/SW interfaces for communication
  – Measure accelerated performance
• Design VGA display driver on FPGA
  – Attempt to display decoded stream in real-time
Background
• Development Platform
  – TLL-5000 prototyping board
    • ARMv9, Spartan3 FPGA, VGA DAC (ADV7125)
• Source code for MPEG2 Decoder
  – Obtained from sourceforge.net
Background – MPEG2
• Consists of Group of pictures (GOP) sequence
• Types of pictures
  – I-picture (Intra coded)
  – P-picture (Forward predicted)
  – B-picture (Bidirectional predicted)
HW-SW Partitioning
• Linux profiling done to determine critical functions
  – Results based on a particular input (MPEG file)
  – Assumed to be representative of a typical use case
  – Profiling done on x86 Linux as well as on the board
    • gmon.out generated on board
Profiling on x86-Linux
[Bar chart: % time spent in function, axis from 0 to 30. Functions shown: mpeg2_idct_add_c, rgb_c_32_420, get_mpeg1_non_intra_block, MC_put_xy_16_c, MC_avg_xy_16_c, mpeg2_idct_copy_c, MC_put_o_16_c, mpeg2_slice]
Profiling on ARM-Linux
[Bar chart: % time spent in function, axis from 0 to 50. Functions shown: mpeg2_idct_add_c, mpeg2_idct_copy_c, MC_put_o_16_c, get_mpeg1_non_intra_block, MC_put_o_8_c, mpeg2_slice]
SW Design
• IDCT function uses pointers to access an input array
  – Not suitable for synthesis by Catapult-C
  – Converted all pointer accesses to array accesses
• IDCT performs non-sequential accesses with varying stride
  – Modified caller of the IDCT function to re-organize access pattern into sequential form
  – Created temporary array, which is passed to function
  – Return array from function is re-distributed to correct locations
• Changes to software verified using golden code
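The caller-side re-organization can be sketched as a gather, compute, scatter sequence. This is a minimal illustration, not the project's code: `idct_block` is a hypothetical stand-in that only doubles each value, and the names `gather_block`, `scatter_block` and `idct_strided`, the 16-bit element type and the row-stride layout are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_DIM 8                        /* IDCT operates on 8x8 blocks */
#define BLOCK_LEN (BLOCK_DIM * BLOCK_DIM)

/* Hypothetical stand-in for the IDCT: it only doubles each value, so the
 * data movement around it can be checked. Array indexing only, no pointer
 * arithmetic. */
static void idct_block(int16_t block[BLOCK_LEN])
{
    for (int i = 0; i < BLOCK_LEN; i++)
        block[i] = (int16_t)(block[i] * 2);
}

/* Copy an 8x8 block whose rows are `stride` elements apart into a
 * contiguous temporary array (sequential access pattern). */
static void gather_block(const int16_t frame[], int start, int stride,
                         int16_t tmp[BLOCK_LEN])
{
    assert(stride >= BLOCK_DIM);
    for (int r = 0; r < BLOCK_DIM; r++)
        for (int c = 0; c < BLOCK_DIM; c++)
            tmp[r * BLOCK_DIM + c] = frame[start + r * stride + c];
}

/* Re-distribute the contiguous result to its original strided locations. */
static void scatter_block(int16_t frame[], int start, int stride,
                          const int16_t tmp[BLOCK_LEN])
{
    assert(stride >= BLOCK_DIM);
    for (int r = 0; r < BLOCK_DIM; r++)
        for (int c = 0; c < BLOCK_DIM; c++)
            frame[start + r * stride + c] = tmp[r * BLOCK_DIM + c];
}

/* Modified caller: gather, run the IDCT on the temporary array, scatter. */
static void idct_strided(int16_t frame[], int start, int stride)
{
    int16_t tmp[BLOCK_LEN];
    gather_block(frame, start, stride, tmp);
    idct_block(tmp);
    scatter_block(frame, start, stride, tmp);
}
```

Only the 64 elements of the selected block change; everything else in the surrounding array is left untouched, which is what a comparison against golden code would check.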
SW Flow Chart
[Flow chart: within the MPEG2 SW code, the IDCT function call expands into the following steps]
• Create temporary buffer
• Pass input values in temporary buffer to FPGA memory
• Issue Start command to FPGA
• Wait for interrupt
  – IDCT does computation and stores data back in FPGA memory
  – Generates interrupt signal after computation is done
• Reads values from FPGA memory to temporary buffer
• Stores values from temp buffer back to original array in order
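The handshake in the flow chart can be sketched against a software model of the FPGA. Everything here is hypothetical: `struct fpga_model`, `fpga_step` and `idct_offload` are illustrative names, the "hardware" merely negates its input, and a polled `done` flag stands in for the interrupt. On the board the memory and flags would be memory-mapped and accessed through the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_LEN 64

/* Hypothetical model of the FPGA side: a memory window plus start and
 * done flags. */
struct fpga_model {
    int16_t mem[BLOCK_LEN];
    int start;
    int done;               /* stands in for the interrupt signal */
};

/* Stand-in for the hardware block: when started, it processes the data in
 * FPGA memory in place (here: negation) and raises `done`. */
static void fpga_step(struct fpga_model *f)
{
    if (!f->start)
        return;
    for (int i = 0; i < BLOCK_LEN; i++)
        f->mem[i] = (int16_t)(-f->mem[i]);
    f->start = 0;
    f->done = 1;
}

/* Software side of the flow chart. */
static void idct_offload(struct fpga_model *f, int16_t block[BLOCK_LEN])
{
    assert(f != NULL && block != NULL);

    for (int i = 0; i < BLOCK_LEN; i++)   /* pass inputs to FPGA memory */
        f->mem[i] = block[i];

    f->start = 1;                         /* issue Start command */

    while (!f->done)                      /* wait for "interrupt" */
        fpga_step(f);                     /* model only: lets the device run */
    f->done = 0;                          /* acknowledge */

    for (int i = 0; i < BLOCK_LEN; i++)   /* read results back */
        block[i] = f->mem[i];
}
```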
HW Design
• Mentor Catapult-C Synthesis Tool
  – High-level synthesis from C/C++ to Verilog RTL
HW Design
• High Level Synthesis
  – Tool schedules operations on a cycle-by-cycle basis
  – Constrained to available resources
    • Uses target device and library information
  – Built RTL as an interface + controller + datapath
Example: Y = A*C + B*D
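As a rough illustration of resource-constrained scheduling for this expression, the sketch below pairs the C-level description with a deliberately simplified cost model. The model is an assumption, not Catapult-C's actual scheduler: every multiply and add is taken to cost one cycle, and `mac2` and `schedule_cycles` are hypothetical names.

```c
#include <assert.h>

/* C-level description of Y = A*C + B*D, the kind of function a
 * high-level synthesis tool takes as input. */
static int mac2(int a, int b, int c, int d)
{
    return a * c + b * d;
}

/* Simplified cycle count: the add must wait for both products.
 * With two multipliers the products are computed in the same cycle;
 * with one multiplier they are serialized over two cycles. */
static int schedule_cycles(int num_multipliers)
{
    assert(num_multipliers >= 1);
    int multiply_cycles = (num_multipliers >= 2) ? 1 : 2;
    return multiply_cycles + 1;   /* plus one cycle for the add */
}
```

Under this model the same C source maps to a 2-cycle datapath with two multipliers or a 3-cycle datapath with one, which is the area versus latency trade the tool explores.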
HW Design
• Code conversion for synthesis
  – Isolate IDCT function from MPEG2 code
  – Merge initialization functions
    • One initialization construct was needed
  – Remove all global variables
    • Few dependencies for the IDCT function
  – Convert pointer arithmetic to array offsets
    • Most work needed for this conversion
    • No standard guidelines available
HW Design
• Pointer conversions
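A minimal sketch of the kind of rewrite involved, using a made-up operation (adding a bias to the first element of each row of an 8x8 block). Both function names are hypothetical and this is not taken from the decoder source; it only contrasts a walking pointer with explicit array offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer style: a walking pointer is advanced by one row per iteration. */
static void bias_first_column_ptr(int16_t *block, int16_t bias)
{
    int16_t *p = block;
    for (int row = 0; row < 8; row++) {
        p[0] = (int16_t)(p[0] + bias);
        p += 8;                      /* pointer arithmetic */
    }
}

/* Array style: the same accesses written as fixed-size array offsets,
 * so every index is an explicit function of the loop counter. */
static void bias_first_column_idx(int16_t block[64], int16_t bias)
{
    assert(block != NULL);
    for (int row = 0; row < 8; row++)
        block[8 * row] = (int16_t)(block[8 * row] + bias);
}
```

Running both versions on the same input and comparing the outputs element by element is a cheap way to confirm that a conversion preserved behavior.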
HW Design
• Hardware Interface
HW Design
• Verifying isolated IDCT function in C and RTL
  – C testbench written to test isolated IDCT function
  – Catapult-C allows testing of C function vs. RTL
    • Ensure RTL generation matches expected behavior
    • Un-converted pointer code generated wrong RTL
HW Design
• Integration with communication interface
  – Communication FSM given
  – Integrate IDCT block
Problems Faced
• IDCT RTL would not synthesize to 66 MHz
  – 27 MHz clock used instead
• IDCT code takes ~30 minutes to synthesize
  – Inefficiency of using Catapult-C to generate code
• Catapult code difficult to debug
• Some reads not returning correct values
  – Read/Write alignment
  – Synthesis could be a problem
Debug Techniques
• Removed IDCT block for fast synthesis
  – Used to check interface memory writes
  – Showed 16-bit writes were not successful
• Routed state bits to board LEDs
  – Helpful when program hangs due to lack of DTACK
  – OR'd DTACK with DIP switch to prevent hang
• printf and printk statements to check addresses and data being sent
Delay Values
• Hardware Delay
  – Approximately 10 μs to compute IDCT
    • Based on cycle count provided by Catapult-C and 27 MHz clock frequency of FPGA
• Pure software implementation
  – Approximately 30 μs
• Overhead for communication
  – ~15000 μs
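The arithmetic behind these figures can be worked through directly. The sketch below uses only the approximate values quoted above; the function names are hypothetical and the results are as rough as the inputs.

```c
#include <assert.h>

/* Cycle count implied by a duration at a given clock:
 * microseconds times MHz gives cycles. */
static double cycles_for(double time_us, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return time_us * clock_mhz;
}

/* Time for one offloaded IDCT call: hardware compute plus communication. */
static double offload_time_us(double hw_us, double comm_us)
{
    return hw_us + comm_us;
}

/* How many times slower the offloaded call is than pure software. */
static double slowdown(double sw_us, double hw_us, double comm_us)
{
    assert(sw_us > 0.0);
    return offload_time_us(hw_us, comm_us) / sw_us;
}
```

With the quoted numbers, `cycles_for(10, 27)` gives about 270 cycles, `offload_time_us(10, 15000)` gives about 15010 μs per call, and `slowdown(30, 10, 15000)` is roughly 500: the 3x gain in raw compute is swamped by the communication overhead.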
VGA Display Block Diagram
[Block diagram. ARM: generated ppm files, VGA application, driver. FPGA: VGA controller with main FSM, RAM 1 and RAM 2. On board: ADV7125, connected over VGA to the monitor.]
VGA Hardware: ADV7125 Video DAC
• ADV7125 has triple 8-bit video DACs
• VGA DAC requires R, G, B 8-bit values
• Needs H-Sync and V-Sync signals
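The slides do not state which display mode was used. As a sketch only, the code below assumes the common 640×480 at 60 Hz mode (800×525 total counts including blanking) and, as a further assumption, that each pixel of a 64×48 frame is replicated 10× in both directions. All names are hypothetical.

```c
#include <assert.h>

/* Assumed 640x480 @ 60 Hz timing, in pixel clocks (horizontal) and
 * lines (vertical): visible, front porch, sync pulse, back porch. */
enum {
    H_VISIBLE = 640, H_FRONT = 16, H_SYNC = 96, H_BACK = 48,
    V_VISIBLE = 480, V_FRONT = 10, V_SYNC = 2,  V_BACK = 33,
    H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK,   /* 800 */
    V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK    /* 525 */
};

/* True while horizontal counter x is inside the H-Sync pulse. */
static int hsync_active(int x)
{
    return x >= H_VISIBLE + H_FRONT && x < H_VISIBLE + H_FRONT + H_SYNC;
}

/* True while line counter y is inside the V-Sync pulse. */
static int vsync_active(int y)
{
    return y >= V_VISIBLE + V_FRONT && y < V_VISIBLE + V_FRONT + V_SYNC;
}

/* Map a visible screen position to a word index in a 64x48 frame,
 * assuming 10x pixel replication (an assumption, not from the slides). */
static int frame_index(int x, int y)
{
    assert(x >= 0 && x < H_VISIBLE && y >= 0 && y < V_VISIBLE);
    return (y / 10) * 64 + (x / 10);
}
```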
VGA Controller
• Used double buffer to store frame data
  – FIFO implementation didn't work
    • ARM cannot keep up with the display data rate requirement
  – Frame resolution: 64×48
  – Each frame transfer requires 3072 words
  – Used 12 KB RAM memory to implement double buffer
• One full frame transferred with single driver call
  – Reduces system call overhead
  – Each call overhead ~26 μs
• Interrupt used to communicate to user application
  – Fills the next buffer
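A double (ping-pong) buffer of this size can be sketched as follows. The 16-bit word width is an assumption, chosen because two buffers of 3072 16-bit words come to exactly 12 KB; the struct and function names are hypothetical, and `db_swap` stands in for what the controller would do at a frame boundary before interrupting the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_W 64
#define FRAME_H 48
#define FRAME_WORDS (FRAME_W * FRAME_H)    /* 3072 words per frame */

struct double_buffer {
    uint16_t ram[2][FRAME_WORDS];   /* RAM 1 and RAM 2 (assumed 16-bit words) */
    int display;                    /* index of the buffer being displayed */
};

/* Buffer currently being scanned out to the display. */
static const uint16_t *db_front(const struct double_buffer *db)
{
    return db->ram[db->display];
}

/* One call transfers a whole frame into the buffer NOT being displayed. */
static void db_fill_back(struct double_buffer *db,
                         const uint16_t frame[FRAME_WORDS])
{
    assert(db != NULL && frame != NULL);
    memcpy(db->ram[1 - db->display], frame, sizeof db->ram[0]);
}

/* At a frame boundary the roles swap; the application is then free to
 * fill the other buffer with the next frame. */
static void db_swap(struct double_buffer *db)
{
    db->display = 1 - db->display;
}
```

Because the writer only ever touches the buffer that is not being displayed, the display side never sees a partially written frame, even when the ARM fills it more slowly than the scan-out rate.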
VGA Display Demonstration
Lessons Learned
• Debugging on an FPGA is difficult!
• Hand-conversion of C code could have been more efficient
• Create test bench to simulate ARM-FPGA communication
  – Allows quick debug of FPGA hardware
  – Visibility into internal signals
• Hardware partition should have high computation to communication ratio
  – IDCT called many times with small computation time
  – ~10 μs of computation; ~15000 μs of communication
Future Work
• Fix erroneous reads from IDCT
• Integrate VGA display driver and MPEG2 Decoder
Thank you!