HW/SW Co-Design of an MPEG-2 Decoder

Pradeep Dhananjay, Kiran Divakar

Leela Kishore Kothamasu, Anthony Weerasinghe

Outline

• Objectives
• Background
• HW-SW Partitioning
• SW/HW Design
• Testing and Debug
• VGA Display Driver
• Results
• Lessons Learned
• Future Work

Objectives

• Accelerate MPEG-2 decoder
  – Identify bottlenecks
  – Isolate bottleneck functions and partition the design
  – Convert SW functions to HW blocks
  – Design HW/SW interfaces for communication
  – Measure accelerated performance

• Design VGA display driver on FPGA
  – Attempt to display the decoded stream in real time

Background

• Development platform
  – TLL-5000 prototyping board
    • ARM9, Spartan-3 FPGA, VGA DAC (ADV7125)

• Source code for MPEG2 decoder
  – Obtained from sourceforge.net

Background – MPEG2

• An MPEG-2 stream consists of a sequence of groups of pictures (GOPs)
• Types of pictures
  – I-picture (intra coded)
  – P-picture (forward predicted)
  – B-picture (bidirectionally predicted)

Background – MPEG2

HW-SW Partitioning

• Linux profiling done to determine critical functions
  – Results based on a particular input (MPEG file)
  – Assumed to be representative of a typical use case
  – Profiling done on x86 Linux as well as on the board
    • gmon.out generated on board

Profiling on x86-Linux

[Bar chart: % time spent in function (axis 0 to 30). Functions shown: mpeg2_idct_add_c, rgb_c_32_420, get_mpeg1_non_intra_block, MC_put_xy_16_c, MC_avg_xy_16_c, mpeg2_idct_copy_c, MC_put_o_16_c, mpeg2_slice.]

Profiling on ARM-Linux

[Bar chart: % time spent in function (axis 0 to 50). Functions shown: mpeg2_idct_add_c, mpeg2_idct_copy_c, MC_put_o_16_c, get_mpeg1_non_intra_block, MC_put_o_8_c, mpeg2_slice.]

HW-SW Partitioning

SW Design

• IDCT function uses pointers to access an input array
  – Not suitable for synthesis by Catapult-C
  – Converted all pointer accesses to array accesses
• IDCT performs non-sequential accesses with varying stride
  – Modified caller of the IDCT function to re-organize the access pattern into sequential form
  – Created temporary array, which is passed to the function
  – Array returned from the function is re-distributed to the correct locations
• Changes to software verified against golden code
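
The caller-side re-organization described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `gather_kernel_seq` and `process_strided_block`, the 8x8 block size, and the stand-in arithmetic (halving each sample rather than a real IDCT) are hypothetical and not taken from the project code.

```c
#include <stdint.h>

#define GS_DIM  8                    /* assumed 8x8 block */
#define GS_SIZE (GS_DIM * GS_DIM)

/* Stand-in for the synthesizable kernel: it touches only contiguous
 * arrays through array indexing. NOT a real IDCT; it halves each
 * sample so the data flow is easy to check. */
static void gather_kernel_seq(const int16_t in[GS_SIZE], int16_t out[GS_SIZE])
{
    for (int i = 0; i < GS_SIZE; i++)
        out[i] = (int16_t)(in[i] / 2);
}

/* Caller: gathers a strided 8x8 block into a sequential temporary
 * array, runs the kernel, then scatters the results back to their
 * original (strided) locations. */
void process_strided_block(int16_t *frame, int stride)
{
    int16_t tmp_in[GS_SIZE];
    int16_t tmp_out[GS_SIZE];

    for (int r = 0; r < GS_DIM; r++)
        for (int c = 0; c < GS_DIM; c++)
            tmp_in[r * GS_DIM + c] = frame[r * stride + c];

    gather_kernel_seq(tmp_in, tmp_out);

    for (int r = 0; r < GS_DIM; r++)
        for (int c = 0; c < GS_DIM; c++)
            frame[r * stride + c] = tmp_out[r * GS_DIM + c];
}
```

The extra copies cost time in software, but they leave the kernel with a simple sequential access pattern.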

SW Flow Chart

[Flow chart: the MPEG2 SW code reaches the IDCT function call, hands the block to the FPGA, waits for the interrupt, and then resumes.]

Steps at the IDCT function call:

• Create temporary buffer
• Pass input values in temporary buffer to FPGA memory
• Issue Start command to FPGA
• IDCT does computation and stores data back in FPGA memory
• FPGA generates interrupt signal after computation is done (software waits for the interrupt)
• Software reads values from FPGA memory into temporary buffer
• Software stores values from temp buffer back to original array in order
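
The start/interrupt handshake in this flow can be mimicked in plain C with a mock of the FPGA side. Everything here is a hypothetical stand-in: `mock_fpga_mem`, `mock_irq_pending`, and `mock_fpga_start` replace the real memory-mapped interface and driver, and the "hardware" simply negates each value instead of computing an IDCT.

```c
#include <stdint.h>
#include <string.h>

#define HS_SIZE 64                        /* assumed 8x8 block */

/* Mock of the FPGA side (hypothetical): shared memory plus an
 * interrupt flag that the "hardware" sets when it finishes. */
static int16_t mock_fpga_mem[HS_SIZE];
static volatile int mock_irq_pending = 0;

/* Placeholder for the hardware IDCT: negates each value in place,
 * then raises the interrupt flag. */
static void mock_fpga_start(void)
{
    for (int i = 0; i < HS_SIZE; i++)
        mock_fpga_mem[i] = (int16_t)(-mock_fpga_mem[i]);
    mock_irq_pending = 1;
}

/* Software side of the handshake, following the steps on the slide. */
void idct_offload_sketch(int16_t block[HS_SIZE])
{
    int16_t tmp[HS_SIZE];

    memcpy(tmp, block, sizeof tmp);           /* create temporary buffer   */
    memcpy(mock_fpga_mem, tmp, sizeof tmp);   /* pass inputs to FPGA memory */
    mock_irq_pending = 0;
    mock_fpga_start();                        /* issue Start command       */
    while (!mock_irq_pending) { }             /* wait for interrupt        */
    memcpy(tmp, mock_fpga_mem, sizeof tmp);   /* read results back         */
    memcpy(block, tmp, sizeof tmp);           /* store to original array   */
}
```

On the real board the wait would normally block inside the driver rather than spin on a flag; the busy loop is used here only to keep the sketch self-contained.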

HW Design

• Mentor Catapult-C Synthesis Tool
  – High-level synthesis from C/C++ to Verilog RTL

HW Design

• High-level synthesis
  – Tool schedules operations on a cycle-by-cycle basis
  – Constrained to available resources
    • Uses target device and library information
  – Builds RTL as an interface + controller + datapath

Example: Y = A*C + B*D
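
As a rough illustration of resource-constrained scheduling for this expression, the sketch below counts cycles for two hypothetical resource budgets. It assumes every operation takes exactly one cycle; the type `sched_result_t` and both function names are made up for the example, and the schedules are not claimed to match what Catapult-C produced.

```c
/* Result value plus the number of cycles the schedule used. */
typedef struct {
    int y;
    int cycles;
} sched_result_t;

/* One multiplier, one adder: the two products must be serialized. */
sched_result_t schedule_one_multiplier(int a, int b, int c, int d)
{
    sched_result_t r = {0, 0};
    int t1 = a * c;  r.cycles++;   /* cycle 1: multiplier computes A*C  */
    int t2 = b * d;  r.cycles++;   /* cycle 2: multiplier reused for B*D */
    r.y = t1 + t2;   r.cycles++;   /* cycle 3: adder produces Y          */
    return r;
}

/* Two multipliers, one adder: both products fit in the same cycle. */
sched_result_t schedule_two_multipliers(int a, int b, int c, int d)
{
    sched_result_t r = {0, 0};
    int t1 = a * c;
    int t2 = b * d;  r.cycles++;   /* cycle 1: A*C and B*D in parallel */
    r.y = t1 + t2;   r.cycles++;   /* cycle 2: adder produces Y        */
    return r;
}
```

Both schedules compute the same Y. The second uses one more multiplier and finishes one cycle sooner.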

HW Design

• Code conversion for synthesis
  – Isolate IDCT function from MPEG2 code
  – Merge initialization functions
    • One initialization construct was needed
  – Remove all global variables
    • Few dependencies for the IDCT function
  – Convert pointer arithmetic to array offsets
    • Most work needed for this conversion
    • No standard guidelines available

HW Design

• Pointer conversions
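
A minimal sketch of the kind of rewrite involved, assuming a simple row-by-row loop: the first function walks the data with pointer arithmetic, the second performs the same work with explicit array offsets. Both functions (`double_block_ptr`, `double_block_idx`) are hypothetical examples, not code from the decoder.

```c
#include <stdint.h>

/* Pointer-arithmetic style: the position in the block is implicit in
 * the moving pointers, which a synthesis tool may not handle well. */
void double_block_ptr(int16_t *block, int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        int16_t *p = block;
        for (int c = 0; c < cols; c++) {
            *p = (int16_t)(*p * 2);
            p++;
        }
        block += cols;              /* advance to the next row */
    }
}

/* Array-offset style: every access is an explicit index computed from
 * the loop counters, with no pointer being modified. */
void double_block_idx(int16_t block[], int rows, int cols)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            block[r * cols + c] = (int16_t)(block[r * cols + c] * 2);
}
```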

HW Design

• Hardware Interface

HW Design

• Verifying isolated IDCT function in C and RTL
  – C testbench written to test isolated IDCT function
  – Catapult-C allows testing of C function vs. RTL
    • Ensures generated RTL matches expected behavior
    • Un-converted pointer code generated wrong RTL
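
A C testbench of this kind might look roughly like the sketch below: it drives a golden function and a converted function with the same pseudo-random vectors and counts mismatches. The functions `tb_golden` and `tb_converted`, the halving operation, and the LCG used for stimulus are all assumptions made for illustration; the real testbench compared IDCT implementations.

```c
#include <stdint.h>

#define TB_SIZE 64                       /* assumed 8x8 block */

/* Golden reference (pointer style). Stand-in operation, not an IDCT. */
static void tb_golden(int16_t *blk)
{
    for (int i = 0; i < TB_SIZE; i++, blk++)
        *blk = (int16_t)(*blk / 2);
}

/* Converted version under test (array style). */
static void tb_converted(int16_t blk[TB_SIZE])
{
    for (int i = 0; i < TB_SIZE; i++)
        blk[i] = (int16_t)(blk[i] / 2);
}

/* Feeds identical vectors to both versions; returns the number of
 * output samples that differ (0 means the versions agree). */
int tb_count_mismatches(int n_vectors)
{
    uint32_t state = 12345u;             /* fixed seed: repeatable runs */
    int mismatches = 0;

    for (int v = 0; v < n_vectors; v++) {
        int16_t a[TB_SIZE], b[TB_SIZE];
        for (int i = 0; i < TB_SIZE; i++) {
            state = state * 1664525u + 1013904223u;   /* simple LCG */
            int sample = (int)((state >> 16) & 0x0FFFu) - 2048;
            a[i] = b[i] = (int16_t)sample;            /* -2048..2047 */
        }
        tb_golden(a);
        tb_converted(b);
        for (int i = 0; i < TB_SIZE; i++)
            if (a[i] != b[i])
                mismatches++;
    }
    return mismatches;
}
```

A nonzero return value pinpoints a functional difference before any RTL is generated.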

HW Design

• Integration with communication interface
  – Communication FSM given
  – Integrate IDCT block

Problems Faced

• IDCT RTL would not synthesize to 66 MHz
  – 27 MHz clock used instead
• IDCT code takes ~30 minutes to synthesize
  – Inefficiency of using Catapult-C to generate code
• Catapult code difficult to debug
• Some reads not returning correct values
  – Read/write alignment
  – Synthesis could be a problem

Debug Techniques

• Removed IDCT block for fast synthesis
  – Used to check interface memory writes
  – Showed 16-bit writes were not successful
• Routed state bits to board LEDs
  – Helpful when program hangs due to lack of DTACK
  – OR’d DTACK with DIP switch to prevent hang
• printf and printk statements to check addresses and data being sent

Delay Values

• Hardware delay
  – Approximately 10 μs to compute IDCT
    • Based on cycle count provided by Catapult-C and 27 MHz clock frequency of FPGA
• Pure software implementation
  – Approximately 30 μs
• Overhead for communication
  – ~15000 μs
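
The arithmetic behind these figures can be worked through in a few lines. The constants are the approximate values from the slide; treating the communication overhead as a per-call cost is an assumption, and the function names are hypothetical.

```c
/* Approximate figures from the slide. */
#define DELAY_HW_COMPUTE_US    10.0     /* hardware IDCT compute time   */
#define DELAY_SW_COMPUTE_US    30.0     /* pure software IDCT           */
#define DELAY_COMM_OVERHEAD_US 15000.0  /* assumed to apply per call    */
#define DELAY_FPGA_CLOCK_MHZ   27.0     /* cycles per microsecond       */

/* Cycle count implied by ~10 us at 27 MHz: about 270 cycles. */
double delay_hw_cycles(void)
{
    return DELAY_HW_COMPUTE_US * DELAY_FPGA_CLOCK_MHZ;
}

/* Total time for one offloaded call: compute plus communication. */
double delay_offload_total_us(void)
{
    return DELAY_HW_COMPUTE_US + DELAY_COMM_OVERHEAD_US;
}

/* How many times slower the offloaded call is than pure software. */
double delay_slowdown_vs_software(void)
{
    return delay_offload_total_us() / DELAY_SW_COMPUTE_US;
}
```

Under these assumptions, the hardware computes about 3x faster than software (10 μs vs. 30 μs), but the whole offloaded call takes about 15010 μs, roughly 500x longer than the software call.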

VGA Display Block Diagram

[Block diagram. ARM side: VGA application (input: generated ppm files) and driver. FPGA side: VGA controller with main FSM, RAM 1, and RAM 2. On board: ADV7125, which outputs VGA to the monitor.]

VGA Hardware: ADV7125 Video DAC

• ADV7125 has triple 8-bit video DACs
• VGA DAC requires 8-bit R, G, B values
• Needs H-Sync and V-Sync signals

VGA Controller

• Used double buffer to store frame data
  – FIFO implementation didn’t work
• ARM cannot keep up with the display data rate requirement
  – Frame resolution: 64×48
  – Each frame transfer requires 3072 words
  – Used 12 KB of RAM to implement double buffer
• One full frame transferred with a single driver call
  – Reduces system call overhead
  – Each call overhead ~26 μs
• Interrupt used to communicate to user application
  – Fills the next buffer
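
A double (ping-pong) buffer of this size can be sketched as below. The 16-bit word width is an assumption, chosen because two buffers of 3072 such words occupy exactly 12 KB; the type `vga_double_buffer_t` and the function names are hypothetical and do not describe the actual FPGA design.

```c
#include <stdint.h>

#define VGA_FRAME_W     64
#define VGA_FRAME_H     48
#define VGA_FRAME_WORDS (VGA_FRAME_W * VGA_FRAME_H)   /* 3072 words */

typedef struct {
    uint16_t ram[2][VGA_FRAME_WORDS];  /* RAM 1 and RAM 2 (16-bit words assumed) */
    int display_idx;                   /* buffer currently being displayed       */
} vga_double_buffer_t;

/* Writer side: fill the buffer that is NOT being displayed. */
void vga_write_frame(vga_double_buffer_t *db, const uint16_t *frame)
{
    uint16_t *back = db->ram[1 - db->display_idx];
    for (int i = 0; i < VGA_FRAME_WORDS; i++)
        back[i] = frame[i];
}

/* Swap roles once the back buffer is complete; in the design described
 * above, an interrupt tells the application to fill the next buffer. */
void vga_swap(vga_double_buffer_t *db)
{
    db->display_idx = 1 - db->display_idx;
}

/* Reader side: the buffer the display logic scans out. */
const uint16_t *vga_front(const vga_double_buffer_t *db)
{
    return db->ram[db->display_idx];
}
```

The display side never reads a half-written frame, because writes always go to the buffer that is not on screen.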

VGA Display Demonstration

Lessons Learned

• Debugging on an FPGA is difficult!
• Hand-conversion of C code could have been more efficient
• Create test bench to simulate ARM-FPGA communication
  – Allows quick debug of FPGA hardware
  – Visibility into internal signals
• Hardware partition should have high computation-to-communication ratio
  – IDCT called many times with small computation time
  – ~10 μs of computation; ~15000 μs of communication
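
One way to see why the ratio matters is to ask how many blocks a single transfer would have to carry before offloading beats software. The sketch below rests on a strong simplifying assumption, namely that the communication overhead is a fixed cost per transfer regardless of how many blocks it carries; real transfer time would grow with the data size. The function name is hypothetical.

```c
/* Smallest batch size n for which
 *     overhead_us + n * hw_us  <  n * sw_us
 * i.e. n > overhead_us / (sw_us - hw_us).
 * Returns -1 if hardware is not faster per block than software.
 * ASSUMPTION: overhead_us is fixed per transfer, independent of n. */
long min_batch_for_speedup(long hw_us, long sw_us, long overhead_us)
{
    if (hw_us >= sw_us)
        return -1;
    return overhead_us / (sw_us - hw_us) + 1;
}
```

With the figures quoted above (10 μs, 30 μs, 15000 μs), the break-even point under this assumption is 751 blocks per transfer.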

Future Work

• Fix erroneous reads from IDCT
• Integrate VGA display driver and MPEG2 decoder

Thank you!