Implementation in Hardware of Video Processing Algorithm
Performed by: Yony Dekell & Tsion Bublil
Supervisor: Mike Sumszyk
SPRING 2008
High Speed Digital System Lab
Project Goals
Real-time video signal filtering based on a nonlinear diffusion algorithm.
• Studying the nonlinear diffusion algorithm.
• Studying the SynplifyDSP work environment.
• Implementing a real-time video processing algorithm on an FPGA.
Nonlinear Diffusion Filtering
Nonlinear diffusion is an iterative algorithm that smooths the picture locally while preserving its edges.
[Figure: three steps of the iterative process. Panels: Original image, Step one, Step two, Step three]
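The slides do not give the update rule, so the following is only a minimal sketch of one common form of nonlinear diffusion. It assumes a Perona-Malik-style scheme with an exponential edge-stopping weight; `kappa`, `dt` and the sample image are hypothetical values chosen for illustration, not the project's parameters.

```python
import math

def diffusion_step(img, kappa=10.0, dt=0.2):
    """One explicit Perona-Malik-style step on a 2-D list of floats."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            centre = img[r][c]
            flow = 0.0
            # Differences to the four neighbours; borders are replicated.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr = min(max(r + dr, 0), h - 1)
                cc = min(max(c + dc, 0), w - 1)
                d = img[rr][cc] - centre
                # Edge-stopping weight: close to 1 for small differences,
                # close to 0 across strong edges.
                g = math.exp(-(d / kappa) ** 2)
                flow += g * d
            out[r][c] = centre + dt * flow
    return out

def diffuse(img, iterations):
    """Apply the step repeatedly, as in the iterative process above."""
    for _ in range(iterations):
        img = diffusion_step(img)
    return img

# A noisy step edge: small ripples on each side of a jump from ~10 to ~100.
image = [[10.0, 12.0, 9.0, 100.0, 102.0, 99.0] for _ in range(4)]
smoothed = diffuse(image, 3)
```

After three iterations the ripples on each side of the jump are flattened while the jump itself is almost unchanged, which is the smoothing-with-edge-preservation behaviour described above.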
Project stages
• Simulink design of an existing Matlab code.
• Adaptation of the Simulink design to SynplifyDSP components and constraints.
• Synthesis of the VHDL code produced by SynplifyDSP using Synplify Pro.
• Integration of the above RTL component within the Gidel card architecture using Quartus II and ProcWizard.
• Place and route using Quartus II.
• Loading the RBF file to Gidel’s Procstar II card using ProcWizard.
Comparison between SynplifyDSP and direct VHDL implementation
Pros:
• The SynplifyDSP tool plugs into the familiar Simulink environment.
• Development is fast.
Cons:
• It is hard to obtain an optimal implementation (non-optimal critical path).
• The generated VHDL code is hard to understand and therefore difficult to change.
Simulink design
[Figure: Simulink block diagram with R, G and B channels, shown over two slides]
From Simulink to SynplifyDSP
We had to change our design because:
1) We chose not to use any buffer between the DVI connection and the processing of the input.
2) In the Simulink design we use matrices to represent images, but SynplifyDSP can only use vectors.
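Image representations
• Image as matrix
\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\]
• Image as vector
\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{21} & a_{22} & a_{23} & a_{31} & a_{32} & a_{33}
\end{pmatrix}
\]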
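The matrix-to-vector mapping can be sketched as a row-major scan, which is the order in which pixels arrive on a video link. The helper names `flatten` and `index_of` are hypothetical and serve only as illustration.

```python
def flatten(matrix):
    """Row-major scan: rows are concatenated left to right, top to bottom."""
    return [pixel for row in matrix for pixel in row]

def index_of(row, col, width):
    """Position in the vector of the pixel at (row, col), both 0-based."""
    return row * width + col

matrix = [["a11", "a12", "a13"],
          ["a21", "a22", "a23"],
          ["a31", "a32", "a33"]]
vector = flatten(matrix)

# The pixel in the second row, third column sits at position 1 * 3 + 2 = 5.
pixel = vector[index_of(1, 2, width=3)]
```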
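Computing the derivative
Matrix derivation: the image minus a copy of itself shifted down by one row, with the first row repeated.
\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
-
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23}
\end{pmatrix}
=
\begin{pmatrix}
a_{11}-a_{11} & a_{12}-a_{12} & a_{13}-a_{13} \\
a_{21}-a_{11} & a_{22}-a_{12} & a_{23}-a_{13} \\
a_{31}-a_{21} & a_{32}-a_{22} & a_{33}-a_{23}
\end{pmatrix}
\]
Vector derivation: the vector minus a copy of itself delayed by one row (three samples), with zeros filling the gaps.
\[
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{33} & 0 & 0 & 0 \end{pmatrix}
-
\begin{pmatrix} 0 & 0 & 0 & a_{11} & a_{12} & \cdots & a_{33} \end{pmatrix}
\]
\[
= \Big(
\underbrace{a_{11}-0,\ a_{12}-0,\ a_{13}-0}_{\text{false result}},\
\underbrace{a_{21}-a_{11},\ a_{22}-a_{12},\ a_{23}-a_{13},\ a_{31}-a_{21},\ a_{32}-a_{22},\ a_{33}-a_{23}}_{\text{true result}},\
\underbrace{0-a_{31},\ 0-a_{32},\ 0-a_{33}}_{\text{false result}}
\Big)
\]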
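The vector derivation can be sketched as a subtraction between the pixel stream and the same stream delayed by one image row. This is an illustrative model only: the function name, the validity flags and the sample pixel values are assumptions, and the hardware design would use a delay line rather than Python lists.

```python
def vector_derivative(vector, width):
    """Difference between each sample and the sample one row earlier.

    Returns (values, valid). valid[i] is False where a padding zero took
    part in the subtraction, i.e. at the "false result" positions.
    """
    n = len(vector)
    current = list(vector) + [0] * width      # input followed by zeros
    delayed = [0] * width + list(vector)      # input delayed by one row
    values = [c - d for c, d in zip(current, delayed)]
    valid = [width <= i < n for i in range(n + width)]
    return values, valid

# A hypothetical 3 x 3 image stored row by row.
pixels = [1, 2, 3,
          4, 6, 8,
          9, 9, 9]
values, valid = vector_derivative(pixels, width=3)
true_results = [v for v, ok in zip(values, valid) if ok]
```

Only the middle six values are differences between vertically adjacent pixels; the first three and last three involve padding zeros and have to be discarded or handled separately.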
SynplifyDSP design
[Figure: SynplifyDSP block diagram with R, G and B channels, shown over two slides]
ROM component in SynplifyDSP design
• In SynplifyDSP we can’t implement the mathematical expression: [expression not recovered; only the fragment "5.0" survives]
  To overcome this problem we use ROM components that function as a LUT. The ROM is loaded by creating an array.
• SynplifyDSP automatically uses a LUT to calculate the LOG function.
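Loading a ROM from an array can be sketched as follows. The expression tabulated in the design is not legible in this transcript, so `math.sqrt` is used purely as a stand-in; the address width, the input range and the helper names are likewise assumptions.

```python
import math

ADDRESS_BITS = 8      # hypothetical: a 256-entry ROM
FRACTION_BITS = 12    # fractional bits stored per entry

def build_rom(func, address_bits, fraction_bits, x_max):
    """Tabulate func on [0, x_max) as fixed-point integers."""
    depth = 1 << address_bits
    scale = 1 << fraction_bits
    return [round(func(addr * x_max / depth) * scale) for addr in range(depth)]

def rom_lookup(rom, x, x_max, fraction_bits):
    """Read the entry whose address is the truncated, scaled input."""
    addr = min(int(x * len(rom) / x_max), len(rom) - 1)
    return rom[addr] / (1 << fraction_bits)

# math.sqrt is a stand-in for the function the design actually tabulates.
rom = build_rom(math.sqrt, ADDRESS_BITS, FRACTION_BITS, x_max=256.0)
approx = rom_lookup(rom, 100.0, 256.0, FRACTION_BITS)
```

The ROM depth sets how finely the input is sampled and the entry width sets the output precision, so both contribute to the total ROM size.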
Fixed point precision
• In Matlab and Simulink we work at full precision.
• But when we implement the above design on an FPGA, we have to work with fixed point precision. Hence we need to estimate how many bits to use per signal in order to keep the error acceptable.
• Using 12 bits for the fraction of each signal provides satisfactory precision.
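The effect of the fraction width can be estimated with a small experiment like the one below. It is a sketch under assumptions: the test signal is hypothetical and rounding to nearest is assumed, which may differ from the rounding mode used in the actual design.

```python
def quantize(value, fraction_bits):
    """Round value to the nearest multiple of 2**-fraction_bits."""
    scale = 1 << fraction_bits
    return round(value * scale) / scale

def worst_error(samples, fraction_bits):
    """Largest absolute quantization error over the samples."""
    return max(abs(s - quantize(s, fraction_bits)) for s in samples)

# Hypothetical test signal: 1000 values in [0, 1).
samples = [i / 1000.0 for i in range(1000)]
errors = {bits: worst_error(samples, bits) for bits in (4, 8, 12, 16)}
```

With rounding to nearest, the worst-case error for `b` fractional bits is bounded by `2**-(b+1)`, so 12 fractional bits keep each signal within about 0.00012 of its full-precision value.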
Matlab and Synplify comparison
• We measure the error between the Matlab code output and the SynplifyDSP output.
• For 1 iteration: relative root MSE = 1%
[Figure: Matlab result and Synplify result, side by side]
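The slides do not define the error measure, so the sketch below assumes one common definition of relative root MSE: the root of the summed squared differences divided by the summed squared reference values. The two sample sequences are hypothetical.

```python
import math

def relative_rmse(reference, test):
    """sqrt(sum((ref - test)^2) / sum(ref^2)); assumed definition."""
    num = sum((r - t) ** 2 for r, t in zip(reference, test))
    den = sum(r ** 2 for r in reference)
    return math.sqrt(num / den)

reference = [100.0, 120.0, 90.0, 110.0]   # hypothetical full-precision output
test = [101.0, 119.0, 91.0, 109.0]        # hypothetical fixed-point output
error_percent = 100.0 * relative_rmse(reference, test)
```

In this example each pixel is off by one grey level, which gives a relative error a little under 1%.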
SynplifyDSP – VHDL code
[Figure: VHDL code generated by SynplifyDSP]
Synplify Pro
[Figure: Synplify Pro screenshots]
Performance Summary
Worst slack in design: 13.447

Starting Clock | Requested Frequency | Estimated Frequency | Requested Period [ns] | Estimated Period [ns] | Slack
clk            | 44.0 MHz            | 107.8 MHz           | 22.727                | 9.280                 | 13.447

• Requested Frequency – the minimal frequency we want to achieve.
• Estimated Frequency – the frequency of the current design.
• Requested Period – the maximal period we want to achieve for a single cycle.
• Estimated Period – the single cycle time of the current design.
• Slack – the extra time we have in a single cycle. A negative value indicates that timing constraints could not be met.
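The relation between the reported quantities can be checked with a few lines of arithmetic. The function names are hypothetical; the numbers are the ones in the performance summary.

```python
def period_ns(frequency_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / frequency_mhz

def slack_ns(requested_period_ns, estimated_period_ns):
    """Spare time per cycle; negative means the constraint is not met."""
    return requested_period_ns - estimated_period_ns

requested_period = round(period_ns(44.0), 3)
estimated_period = 9.280
slack = round(slack_ns(requested_period, estimated_period), 3)
estimated_frequency = round(1000.0 / estimated_period, 1)
```

The results reproduce the report: a 22.727 ns requested period, 13.447 ns of slack and an estimated frequency of 107.8 MHz.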
ProcWizard + Quartus
• In ProcWizard we create the interface between the FPGA and the daughter board DVI port.
• Quartus performs the place and route according to the ProcWizard interface and the Synplify Pro node-level netlist.
ProcWizard
[Figure not recovered]
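Block Diagram
[Figure: block diagram. Video in (DVD) → DVI connector → DVI receiver → Top Level Design → DVI transmitter → DVI connector → Video out (computer screen). The DVI receiver and transmitter are on the DVI daughter board; the Top Level Design is on the Procstar II board. Each DVI link is labelled CLK, Data (3) and I2C (2); the daughter board and the Top Level Design exchange 24-bit pixel data, clock, VSYNC and HSYNC.]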
Rates & Frequencies
• The DVI connection provides one pixel (24 bits) per clock.
• DVI frame rate is 60 frames per second.
• Minimum clock frequency of the DVI standard is 25.175 MHz.
• Our goal was 43 MHz (for 800 × 600).
• Achieved frequency: 107.8 MHz.
• We achieved our goal by using pipelining.
• The bit rate is 43M × 24 bit ≈ 1 Gbit/sec.
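The bit-rate figure follows directly from the pixel clock and the pixel width. The sketch below also computes the active pixel rate for 800 × 600 at 60 frames per second as a sanity check; the constant names are hypothetical.

```python
PIXEL_CLOCK_HZ = 43_000_000   # target pixel clock from the slide
BITS_PER_PIXEL = 24           # one RGB pixel per clock

bit_rate = PIXEL_CLOCK_HZ * BITS_PER_PIXEL   # bits per second
bit_rate_gbit = bit_rate / 1e9

# Visible pixels alone need fewer clocks per second than the target clock;
# in a video signal the remaining cycles fall in the blanking intervals.
active_pixel_rate = 800 * 600 * 60           # pixels per second
```

This gives 1.032 Gbit/sec for the link and 28.8 million visible pixels per second, below the 43 MHz target clock.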
Memory
• For 10 iterations we use ten 55 KB ROMs, three 0.4 KB log ROMs and three 8 KB ROMs.
• ROM size = 3*0.4K + 3*8K + 10*55K ≈ 575 KB
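The ROM budget can be tallied from the listed counts and sizes. The grouping below simply restates the figures on the slide; the variable names are hypothetical.

```python
# (count, size in KB) for each group of ROMs listed on the slide.
rom_groups_kb = [
    (10, 55.0),   # ten 55 KB ROMs for 10 iterations
    (3, 0.4),     # log ROMs
    (3, 8.0),     # 8 KB ROMs
]
total_kb = sum(count * size for count, size in rom_groups_kb)
```

Summing the sizes as listed gives about 575 KB, dominated by the per-iteration ROMs.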
Time table
Date (week starting at…): NOV 23, NOV 30, DEC 7, DEC 14, DEC 21, DEC 28, JAN 4
Assignment:
• Working on minimizing the fixed point precision of the SynplifyDSP components in the Simulink implementation
• Working on minimizing the ROM size
• Studying the DVI protocol and fitting the implementation for working with DVI
• Planning and creating the parallel implementation
• Final presentation