TKT-2431 SoC design
Introduction to exercises
SoC design / Fall 2011
Exercises
Assistants:
Antti Alhonen [email protected]
Jussi Raasakka [email protected]
(Otto Esko [email protected])
In the project work, a simplified H.263 video encoder is
implemented on Altera DE2 FPGA Development and Education
board
The project work consists of a set of exercises
After successfully finishing all the exercises, one should have a
working H.263 video encoder
Exercises: Mon 14-16, Tue 14-16, Wed 16-18 (TC417)
Assistance is not available at any other time
All needed software is installed on the PCs of the class and can
be used whenever the class is not reserved for other courses
Exercises cont.
Attending the exercise hours is voluntary
The following assignment is introduced
Tools and algorithms are introduced
Hints are given
Questions are answered
Completing each of the exercises is mandatory
The returns have to be made on time
The returns have to be accepted
Exercise work is carried out in groups of 1-2 students
Groups of 2 persons are preferred
Exercises cont.
The exercise work consists of several phases and sub-tasks
Receiving and understanding the system requirements
Writing a system specification
Software implementation of the encoder
Functional verification on PC workstation
Migrating the SW implementation onto FPGA
Verification and performance profiling for pure SW implementation
HW/SW partitioning and hardware acceleration
Verification and performance profiling for accelerated implementation
Documentation
Exercises cont.
Completed exercise work is valid for three successive exams
Points from the exercise work
You can gain points from some of the exercises
See exercise pages for more detail
Bonus point criteria will be explained during the first exercises
http://www.tkt.cs.tut.fi/kurssit/2431
Exercise 1 / Part 1
Introduction to topic
Topic of the work
A simplified H.263 video encoder on the Altera DE2 FPGA Development
and Education board
The system design flow
Introducing the requirements for video encoder
Functional specification is written
A software implementation of the video encoder algorithm is written in
ANSI C and verified on a PC workstation
Initial hardware architecture containing a single Nios II softcore CPU and
necessary peripherals is synthesized for FPGA
Software version is migrated to Nios II processor on FPGA
Design is partitioned into software and hardware according to the
profiling result of software implementation
DCT algorithm is accelerated with dedicated logic
Accelerated system is implemented and verified on FPGA
Performance analysis is carried out for the accelerated system as well
and compared with the pure software implementation
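The comparison between the pure software and the accelerated implementation can be expressed with two small helpers. This is a minimal sketch, not part of the course code: the function names are hypothetical, and the 50 MHz clock is an assumption taken from the board's 50 MHz oscillator, so the actual system clock may differ.

```c
/* Assumed CPU clock frequency in Hz (hypothetical value, see lead-in). */
#define CPU_CLOCK_HZ 50000000.0

/* Frame rate achieved when encoding `frames` frames takes `cycles`
   clock cycles. Returns 0.0 if no cycles were measured. */
double frames_per_second(unsigned long cycles, unsigned long frames)
{
    if (cycles == 0UL)
        return 0.0;
    return (double)frames * CPU_CLOCK_HZ / (double)cycles;
}

/* Speed-up of the accelerated implementation over the pure software
   implementation, both measured in clock cycles for the same input. */
double speedup(unsigned long sw_cycles, unsigned long hw_cycles)
{
    if (hw_cycles == 0UL)
        return 0.0;
    return (double)sw_cycles / (double)hw_cycles;
}
```

For example, 25 frames encoded in 50,000,000 cycles correspond to 25 fps under the assumed clock, and a drop from 4,000,000 to 1,000,000 cycles is a speed-up of 4.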
H.263
The basics of H.263 video encoding are explained during the following
exercises
Students are encouraged to get familiar with video encoding algorithms
in general before they start the project
H.263 has a lot in common with algorithms like JPEG and MPEG-2
A very simplified version of H.263 video encoder (resembling motion
JPEG) is used.
Only INTRA coding (i.e. prediction of subsequent frames is not applied)
Algorithms used are DCT (Discrete Cosine Transform), Quantization,
RLE (Run-Length Encoding), and VLC coding
Software
Altera Quartus II v7.2
System development front-end
Schematic editing
FPGA synthesis
SOPC builder for building Avalon/Nios based systems
Integrated logic analyzer
Nios II IDE
Software development environment for Nios II processor
Part of Nios II development kit
Mentor Graphics ModelSim
Simulating own VHDL blocks/designs
ffplay
video player
tmndec
H.263 decoder
nios2-terminal
Terminal software for reading from the JTAG UART
Hardware
Altera DE2 Development and Education Board
Cyclone II 2C35 FPGA
33,216 logic elements
483,840 bits of embedded RAM
35 Embedded multipliers
4 PLLs
475 User I/O pins (at maximum)
External memory devices
4 MB Flash
512 KB SRAM
8 MB SDRAM
RS-232 serial port
Used for communication between PC and Nios II processor
USB blaster port
Used for programming the FPGA (memory contents and HW configuration)
In addition, the board contains the following peripherals (not so relevant for the project)
Ethernet MAC/PHY device
4x user push-buttons, 18x toggle switches
18x red user leds, 9x green user leds
8x dual 7-segment display
2x expansion headers (40 user I/O pins / header)
SD flash connector header
50 MHz and 27 MHz Oscillators
Exercise returns
Exercises are returned as follows:
The return for an exercise has to be made by e-mail before 23:59 on the
Sunday of the following week
Return your exercises to [email protected]
All the required documents have to be in either pdf or pure text-file
format
The subject for the email has the following form:
SOCD_Ex<exercise_number>_G<group_number> where
<exercise_number> is the number of the exercise in question and
<group_number> is the number of your group.
Bonus points
Three main exercise returns are rated
Excellent: 1 bonus point for the exam
The returned document is very good and/or the returned source codes
work correctly and are well done
Accepted: no bonus
The returned document or code is acceptable
Rejected: no bonus, the return has to be corrected
Use common sense: Do not return rubbish!
All the exercises have to be accepted
Exercise points for the exam can be obtained:
1 point can be obtained from each of the exercises 2, 5, 12
Encoder achieves the given frame rate criteria (2p if > 75 fps, 1p if > 50 fps)
The 2 most optimized encoders are awarded extra points
Bonus exercise: Dual Nios II encoder implementation
Up to 3 bonus points can be achieved
Exercise 1, Part 2
Introduction to algorithms
Requirements for Video Transmission
Communication delay
More important in video conferencing applications than in file-based
streaming applications
Should be as low as possible (< 250 ms, even 150 ms)
Should be kept as constant as possible
Avoiding burst of frames followed by a still image
Buffering
Frame rate
Affects the perceived smoothness of motion
A video stream under 10 fps is perceived as a “fast slide show”
Image resolution
Directly proportional to data size of a raw image
Depends on the application
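The relation between resolution, frame rate and raw data size can be made concrete with a short calculation. This is a minimal sketch with hypothetical function names. It assumes 8-bit samples and YCbCr 4:2:0 sampling (chroma subsampled by two in both directions), so the input format used in the exercises may differ.

```c
/* Bytes in one raw frame, assuming 8-bit samples and 4:2:0 sampling:
   one luma sample per pixel plus two chroma planes of quarter size. */
unsigned long raw_frame_bytes(unsigned long width, unsigned long height)
{
    unsigned long luma = width * height;
    unsigned long chroma = 2UL * (width / 2UL) * (height / 2UL);
    return luma + chroma;
}

/* Raw (uncompressed) bit rate in bits per second at the given frame rate. */
unsigned long raw_bit_rate(unsigned long width, unsigned long height,
                           unsigned long fps)
{
    return raw_frame_bytes(width, height) * 8UL * fps;
}
```

Under these assumptions a QCIF frame (176 x 144) takes 38,016 bytes, and 25 such frames per second amount to 7,603,200 bit/s before compression. Doubling both dimensions to 352 x 288 quadruples the frame size to 152,064 bytes.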
Introduction to H.263 Standard
May 1996, ITU-T recommendation v1
Block-based ( Macroblock size is 16 pixels by 16 lines )
Motion estimation for temporal redundancy reduction
Same objects are likely to be present in adjacent frames
Half pixel accurate motion vectors
DCT for spatial redundancy reduction
8 x 8 blocks
Adjacent pixel values differ only slightly
Quantization (lossy)
Control of compression ratio
RLE and Huffman as entropy coding algorithms
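How quantization controls the compression ratio can be illustrated with a uniform quantizer for one 8 x 8 block of DCT coefficients. This is a minimal sketch, not the pre-given course code: the function name is hypothetical, and the step sizes (a fixed step of 8 for the DC coefficient, 2*QP for all other coefficients) are assumptions modelled on the way INTRA blocks are commonly quantized in H.263 encoders.

```c
/* Divide the magnitude of `value` by `step`, adding `round` first, and
   restore the sign. Working on the magnitude avoids the
   implementation-defined division of negative integers in C89. */
static int quant_div(int value, int step, int round)
{
    int mag = (value < 0) ? -value : value;
    int q = (mag + round) / step;
    return (value < 0) ? -q : q;
}

/* Quantize one block of 64 DCT coefficients (row-major order).
   qp is the quantization parameter; a larger qp gives coarser levels
   and therefore more zeros. Assumed step sizes: see the lead-in. */
void quantize_intra_block(const int coef[64], int qp, int level[64])
{
    int i;
    level[0] = quant_div(coef[0], 8, 4);          /* DC: step 8, rounded  */
    for (i = 1; i < 64; i++)
        level[i] = quant_div(coef[i], 2 * qp, 0); /* AC: step 2*qp, truncated */
}
```

With qp = 5 an AC coefficient of 47 becomes level 4 and a coefficient of 9 becomes 0; with qp = 30 the coefficient 47 is also quantized to 0, which is the loss of information that buys the higher compression ratio.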
Block Diagram of H.263 Encoder
[Figure: block diagram of the H.263 encoder]
Forward path: pre-processing, DCT, Q, Entropy coding (Huffman, VLC), bits out
Reconstruction path: Q-1, IDCT, Mot. Comp. and Mot. Est. on the previous
reconstructed pictures (same image as the decoder observes)
Motion estimation is 1/2 pixel accurate (interpolation) and produces the
motion vector v(u,v)
Prediction error computation
Example of a quantized 8x8 block:
7 0 4 0 0 0 0 1
1 9 3 0 0 0 0 0
2 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
No need to send zeros in 8x8 block to the decoder
In Intra mode, MBs are coded directly
Discrete Cosine Transform (DCT)
Assumption: Adjacent pixels differ only a little from each other
Thus, data in the frequency domain is easier to compress
Spatial domain compression
Pixels are grouped into blocks and the blocks are then transformed
into frequency domain
Essential information is then in more compact form
Important DCT-coefficients in upper-left corner, that is, in low frequencies
Compression is achieved by discarding the less important information
of the transformed block
Quantization of coefficients
DCT itself is a lossless transform
Limited accuracy with coefficients, however, leads to some loss of
information
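The transform described above can be sketched as a direct, unoptimized 8 x 8 forward DCT. This is only an illustration under stated assumptions: the function names are hypothetical, the pre-given course code may use a different (for example fixed-point or separable) formulation, and the cosine values come from a small constant table so that the sketch does not depend on the math library.

```c
/* cos(n*pi/16) for n = 0..8; other angles follow by symmetry. */
static const double COS_TAB[9] = {
    1.0, 0.980785280403230, 0.923879532511287, 0.831469612302545,
    0.707106781186548, 0.555570233019602, 0.382683432365090,
    0.195090322016128, 0.0
};

/* Return cos(n*pi/16) for any non-negative integer n. */
static double cos_pi16(int n)
{
    n %= 32;                    /* the cosine has period 2*pi       */
    if (n > 16)
        n = 32 - n;             /* cos(2*pi - a) = cos(a)           */
    return (n > 8) ? -COS_TAB[16 - n] : COS_TAB[n]; /* cos(pi-a) = -cos(a) */
}

/* Forward 8x8 DCT of one block of pixels (row-major order).
   coef[0] is the lowest frequency (upper-left corner). */
void dct8x8(const unsigned char pixels[64], double coef[64])
{
    int u, v, x, y;
    for (v = 0; v < 8; v++) {
        for (u = 0; u < 8; u++) {
            double cu = (u == 0) ? COS_TAB[4] : 1.0;   /* 1/sqrt(2) */
            double cv = (v == 0) ? COS_TAB[4] : 1.0;
            double sum = 0.0;
            for (y = 0; y < 8; y++)
                for (x = 0; x < 8; x++)
                    sum += pixels[y * 8 + x]
                         * cos_pi16((2 * x + 1) * u)
                         * cos_pi16((2 * y + 1) * v);
            coef[v * 8 + u] = 0.25 * cu * cv * sum;
        }
    }
}
```

A block in which every pixel has the value 128 transforms to a single coefficient of about 1024 in the upper-left corner and zeros elsewhere, which shows how a smooth block collapses into a few low-frequency coefficients.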
Entropy Encoding
After quantization, the quantized coefficients are
compressed in a lossless manner using entropy
encoding
Run-length coding
o Low-amplitude coefficients are likely to be quantized to zero
o Arrange successive quantized non-zero coefficients
into combinations of (LAST, RUN, LEVEL)
• LAST = whether this is the final non-zero coefficient in the
block
• RUN = Number of preceding zeros
• LEVEL = sign and magnitude of the non-zero coefficient
o Coefficients are processed in zig-zag order
• Because runs of zeros are most likely located at higher
frequencies
Huffman coding (variable length coding)
o After RLE, the symbols are encoded based on their
statistical characteristics
• Shorter codewords for symbols which occur with high
probability
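The run-length step can be sketched as follows: the block is read in zig-zag order and every non-zero coefficient becomes one (LAST, RUN, LEVEL) event. This is a minimal sketch with hypothetical type and function names. It scans all 64 coefficients, so any separate treatment of the DC coefficient in the pre-given code is not modelled here.

```c
/* Zig-zag scan order: position in the scan -> row-major index. */
static const int ZIGZAG[64] = {
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
};

typedef struct {
    int last;   /* 1 if this is the final non-zero coefficient */
    int run;    /* number of zeros preceding the coefficient   */
    int level;  /* signed value of the non-zero coefficient    */
} rle_event;

/* Convert one quantized block (row-major order) into events.
   Returns the number of events written to out (0 for an all-zero block). */
int rle_encode_block(const int coef[64], rle_event out[64])
{
    int i, count = 0, run = 0;
    for (i = 0; i < 64; i++) {
        int c = coef[ZIGZAG[i]];
        if (c == 0) {
            run++;                 /* extend the current run of zeros */
            continue;
        }
        out[count].last = 0;
        out[count].run = run;
        out[count].level = c;
        count++;
        run = 0;
    }
    if (count > 0)
        out[count - 1].last = 1;   /* trailing zeros are never sent */
    return count;
}
```

For a block whose only non-zero values are 7 at row 0, column 0 and 1 at row 1, column 0, the result is the two events (LAST=0, RUN=0, LEVEL=7) and (LAST=1, RUN=1, LEVEL=1); the remaining 61 zeros produce no output at all.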
H.263 – Project work
A simplified version of H.263 video encoder (resembling motion
JPEG) is used.
only INTRA coding (i.e. prediction of subsequent frames is not applied)
used algorithms are DCT (Discrete Cosine Transform), quantization, RLE
(Run-Length Encoding), and VLC coding.
Image resolution used is QCIF (176 x 144)
Encoder: pre-processing -> DCT -> Q -> Entropy coding -> 011001011
Decoder: 011001011 -> Entropy decoding -> Q-1 -> IDCT -> Reconstructed pictures
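The bit string passed from the encoder to the decoder is built by appending variable-length codewords one after another. A minimal sketch of such a bit writer is given here; the type and function names are hypothetical, and the codewords in the usage note are made up for illustration and are not taken from the H.263 VLC tables.

```c
typedef struct {
    unsigned char *buf;    /* output buffer, must be zero-initialized */
    unsigned long size;    /* capacity of buf in bytes                */
    unsigned long bitpos;  /* number of bits written so far           */
} bit_writer;

/* Append the `length` least significant bits of `code` to the stream,
   most significant bit first. Returns 0 on success, -1 if the
   codeword does not fit into the buffer. */
int put_bits(bit_writer *bw, unsigned int code, int length)
{
    int i;
    if (length < 0 || bw->bitpos + (unsigned long)length > 8UL * bw->size)
        return -1;
    for (i = length - 1; i >= 0; i--) {
        if ((code >> i) & 1u)
            bw->buf[bw->bitpos >> 3] |=
                (unsigned char)(0x80u >> (bw->bitpos & 7UL));
        bw->bitpos++;
    }
    return 0;
}
```

Writing the three made-up codewords 011, 00101 and 1 in this order produces the nine bits 011001011, stored as the bytes 0x65 and 0x80 with the unused bits of the last byte left at zero.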
Design flow
[Figure: design flow diagram with the stages Requirements, Specification,
SW Implementation, HW/SW partitioning and Final Implementation. The labels
Verification and Documentation run alongside the flow, and Performance
analysis appears three times in the diagram.]
Specification
This week, work on the specification of the encoder is started
The required C source code for the encoder is provided
Can be downloaded from course web-pages
You have to write a simple specification for the video encoder system
you are going to implement
Specification does not have to be long
It is the quality of the contents that matters
4-7 pages in total (including the chapters introduced next week)
The specification should be written before the implementation
An implementation document will be written later
A diagram of the video encoding flow is required
Control and data flow diagram describing how the pre-given H.263
functions are used
Specification (2)
1. Introduction
What is being specified
2. Flow of encoding
Present different phases of the encoding
Explain the encoding flow briefly
A flow diagram of encoding is required!
3. Encoder interface
Inputs and outputs of encoder
What kind of data is read in?
What is the output data like?
4. Description of algorithms
Function prototypes
Description of function parameters and return values
Description of function behavior and purpose in this design
At least DCT, quantization, RLE, and VLC have to be covered here
The subsequent sections will be written in exercise 2.
Links on H.263 related material
http://www.itu.int/rec/T-REC-H.263/
ITU-T specification of H.263
http://www.jaxstream.com/products/jaxspeed/wp_m4venc.pdf
Basics of MPEG-4 video encoding
http://www.ece.purdue.edu/~ace/jpeg-tut/jpegtut1.html
JPEG tutorial