Posted on 03-Jan-2016
Video Coding Standards
Heejune AHN, Embedded Communications Laboratory
Seoul National Univ. of Technology, Fall 2011
Last updated 2011. 5. 13
Heejune AHN: Image and Video Compression p. 2
Agenda
• History and Concepts
• JPEG and JPEG-2000
• MPEG-1 and MPEG-2
• MPEG-4
• H.261 and H.263
• H.264
• Beyond H.264
1. Standards and Standards Bodies
VCEG (Video Coding Experts Group) in ITU-T (formerly CCITT): focus on real-time, two-way video communication
MPEG (Moving Picture Experts Group) and JPEG (Joint Photographic Experts Group) in ISO/IEC: focus on multimedia storage and distribution for entertainment
Some standards are shared by both bodies: MPEG-2 is also H.262, and H.264 is also MPEG-4 AVC.
(Figure: standards by body. ISO MPEG/JPEG: JPEG, JPEG-2000, MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21. ITU VCEG: H.261, H.263, H.264. Later work: H.264 High Profile, H.264 SVC, H.264 MVC, HEVC (H.265).)
History of Video Coding Standards
(Figure: timeline of the standards above, continuing to H.264 High Profile (HP), SVC, MVC, and HEVC as of 2011.)
ISO MPEG/JPEG
• JPEG (1992): compression of still images (DCT)
• MPEG-1 (1993): real-time playback of VHS-quality video on Video CD (1.4 Mbps)
• MPEG-2 (1995): broadcast-quality video service (3~5 Mbps)
• MPEG-4 (1998): wide range of bit rates (from 20 kbps upward) and object-oriented coding
• JPEG-2000 (2000): better-quality still images
ITU VCEG
• H.261 (1990): video telephony over ISDN (p x 64 kbps)
• H.263 (1995): video telephony over circuit- and packet-switched networks, from 20 kbps up to high bandwidth
• H.264 (2003): general-purpose, higher-quality video coding
Others
• MPEG-7 (Multimedia Content Description Interface): for search and retrieval in multimedia databases
• MPEG-21 (Multimedia Framework): for interoperable multimedia delivery
Standards process and usage
Standards process
(Figure: scope & aim of the standard -> proposals from companies and universities -> test model (documents & reference SW) -> improvement proposals, with performance & complexity evaluation -> draft standard -> international standard)
Understanding standards
• Only the bitstream syntax and the decoder are defined in a standard; the encoder, the application, and the implementation are left open to users.
• Standards provide “profiles and levels” and recommended usage to help users choose among the many technical options.
2. JPEG
• ISO/IEC IS 10918, developed by ISO/IEC JTC1/SC29/WG10 (1984-1992)
• Widely used on the WWW and in digital photography
• Motion-JPEG is just a successive stream of JPEG images
Baseline JPEG Codec
RGB or YCbCr components are coded either separately or in interleaved order.
(Figure: baseline JPEG encoder. Input image -> 8x8 blocks -> level offset ([0,255] => [-128,127]) -> 8x8 DCT -> uniform scalar quantization (quantization tables). DC quantization indices -> differential coding -> VLC (DC Huffman tables, SSSS-value) -> bits. AC quantization indices -> zig-zag scan -> run-level coding -> VLC (AC Huffman tables, RRRRSSSS-value) -> bits.)
Example quantization table (luminance):
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
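The encoder pipeline above can be sketched in Python for a single 8-bit luminance block. This is a minimal illustration, not a conforming encoder: it uses a direct (slow) DCT, the example luminance table from the JPEG specification (ITU-T T.81 Annex K), and Python's `round()`, and it stops at the DC/AC symbols, so the Huffman table lookup that turns symbols into bits is omitted. All function names are hypothetical.

```python
import math

# Example luminance quantization table (ITU-T T.81 Annex K, Table K.1).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_8x8(block):
    """Direct 2-D DCT-II of an 8x8 block with JPEG's scaling (slow, for clarity)."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * cu * cv * s
    return out


def zigzag_indices(n=8):
    """(row, col) positions in zig-zag order: (0,0), (0,1), (1,0), (2,0), ..."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()  # even anti-diagonals run from bottom-left up to top-right
        order.extend(diag)
    return order


def encode_block(pixels):
    """Level offset, DCT, quantization and zig-zag scan of one 8x8 block."""
    shifted = [[p - 128 for p in row] for row in pixels]  # [0,255] -> [-128,127]
    coeffs = dct_8x8(shifted)
    # round() rounds halves to even; a real encoder may round differently.
    quant = [[round(coeffs[u][v] / Q_LUMA[u][v]) for v in range(8)] for u in range(8)]
    return [quant[r][c] for r, c in zigzag_indices()]


def size_category(v):
    """SSSS: number of bits needed to represent |v| (0 for v == 0)."""
    return abs(v).bit_length()


def dc_symbols(dc_values):
    """Differential coding of successive DC indices -> (SSSS, difference) pairs."""
    prev, out = 0, []
    for dc in dc_values:
        diff = dc - prev
        out.append((size_category(diff), diff))
        prev = dc
    return out


def ac_symbols(zz):
    """Run-level symbols (RRRR = zero run, SSSS = size, value) for the 63 AC indices."""
    symbols, run = [], 0
    for v in zz[1:]:
        if v == 0:
            run += 1
            continue
        while run > 15:
            symbols.append((15, 0, None))  # ZRL: a run of 16 zeros
            run -= 16
        symbols.append((run, size_category(v), v))
        run = 0
    if run > 0:
        symbols.append((0, 0, None))  # EOB: the rest of the block is zero
    return symbols
```

For a flat block of value 200, the level-shifted DC coefficient is 576, which quantizes to 36 with step 16; every AC index is zero, so `ac_symbols` emits only an EOB.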
Lossless JPEG
• DPCM, with prediction from 3 neighboring pixels
Optional modes
• Progressive encoding: store image data in the order DC only, low-frequency AC, high-frequency AC
• Hierarchical encoding: store image data from low resolution to high resolution
Motion-JPEG
• Just a sequence of JPEG still images
• Low complexity, error tolerance, market awareness
• Used for video conferencing and surveillance before cheap MPEG-1/2/4 solutions became widely available in the market
JPEG-2000
Features
• Better compression performance than JPEG, with no blocking artifacts at high compression ratios
• Good compression for both continuous-tone and bi-level (text) images
• Lossless and lossy compression in one framework
• ROI (region of interest) support
• Error resilience support (data partitioning)
• Rather slow on current embedded systems due to its complexity
Encoding process
(Figure: image -> (tiling) -> wavelet transform -> quantizer -> arithmetic encoder -> bits)
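The wavelet transform step can be illustrated with a one-level, one-dimensional version of the reversible LeGall 5/3 filter that JPEG-2000 uses for lossless coding, written in lifting form. Because each lifting step is undone exactly in integer arithmetic, the same framework covers lossless coding, while quantizing the high-pass output gives lossy coding. This sketch assumes an even-length signal with symmetric extension at the borders; the 2-D transform (rows, then columns), multiple levels, quantization, and arithmetic coding are omitted, and the function names are hypothetical.

```python
def fwd_53(x):
    """One level of the reversible 5/3 lifting transform -> (low-pass s, high-pass d)."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even-length signal")
    n = len(x) // 2

    def xe(i):
        # symmetric extension past the right edge: x[len] -> x[len - 2]
        return x[i] if i < len(x) else x[2 * len(x) - 2 - i]

    # predict step: odd samples minus the floor-average of their even neighbors
    d = [x[2 * k + 1] - (x[2 * k] + xe(2 * k + 2)) // 2 for k in range(n)]
    # update step: even samples plus a rounded quarter of the neighboring details
    # (d[-1] is taken as d[0], the symmetric extension)
    s = [x[2 * k] + (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(n)]
    return s, d


def inv_53(s, d):
    """Exact inverse of fwd_53: undo the update step, then the predict step."""
    n = len(s)
    x = [0] * (2 * n)
    for k in range(n):
        x[2 * k] = s[k] - (d[max(k - 1, 0)] + d[k] + 2) // 4
    for k in range(n):
        right = x[2 * k + 2] if 2 * k + 2 < 2 * n else x[2 * n - 2]
        x[2 * k + 1] = d[k] + (x[2 * k] + right) // 2
    return x
```

For the signal [10, 12, 14, 200, 13, 11, 9, 8], the high-pass output d is [0, 187, 0, -1]: near zero in smooth regions and large only around the jump to 200, which is what makes it cheap to code.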
Comparison between JPEG and JPEG-2000
(Figure: Lenna, 256x256 RGB, coded to the same size of 4572 bytes with baseline JPEG and with JPEG-2000.)
MPEG-1/2
MC-DCT Hybrid Coding
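(Figure: MC-DCT hybrid coder. The coder control drives an intra/inter switch; the input minus the motion-compensated prediction (or 0 in intra mode) passes through the intra-frame DCT coder (DCT and quantization, Quant) and the entropy coder. An embedded intra-frame decoder (dequantization, DeQ, and inverse DCT) rebuilds the reference picture for the motion-compensated predictor, and the motion estimator supplies the motion vectors. Outputs: control data, DCT coefficients, and motion data.)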
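The motion estimator in the hybrid coder searches the reference picture for the block that best matches the current macroblock. A minimal full-search block-matching sketch follows, assuming integer-pel motion, the sum of absolute differences (SAD) as the matching cost, and a hypothetical ±7-pixel search range; practical encoders use faster search patterns and half-pel refinement, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n=16, search=7):
    """Return (best_sad, dx, dy) for the n x n block whose top-left corner is (bx, by)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # keep candidate blocks entirely inside the reference picture
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The returned (dx, dy) is the motion vector that would be sent as motion data, and the SAD hints at how much residual the DCT coder still has to encode.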
MPEG-1
• Targeted VHS quality (SIF: 352x240 at 30 fps or 352x288 at 25 fps, YCbCr 4:2:0) on Video CD (about 600 MB)
• About 1.4 Mbps (1.2 Mbps video + 0.2 Mbps audio), about 70 minutes per VCD
• Three parts: Part 1 System, Part 2 Video, Part 3 Audio
Technology
• MC-DCT hybrid
  - Macroblock (16x16 pixels): motion estimation unit
  - Block (8x8 pixels): DCT and quantization unit
• GOP structure
  - I, P, B pictures
  - Trade-off between random access and coding efficiency
• Asymmetric complexity
  - Larger memory and more computation required at the encoder
MPEG-1 Structure
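Syntax hierarchy: sequence layer -> GOP layer -> picture layer -> slice layer -> MB layer -> block layer
(Figure: a sequence is SH GOP SH GOP ...; a GOP holds pictures such as I B B P B B P ... P; a picture is divided into slices of consecutive macroblocks (MB MB MB ...); a 4:2:0 macroblock covers 16x16 luma pixels and holds four 8x8 Y blocks (1 2 / 3 4), one 8x8 Cb block (5), and one 8x8 Cr block (6).)
SH: Sequence Header, GOP: Group of Pictures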
Picture coding
• I picture: no inter-frame prediction
• P picture: inter-frame prediction from one causal (previous) reference picture
• B picture: inter-frame prediction from one previous and one future picture
GOP and picture order
• Display order (input order at the encoder): I1 B1 B2 P1 B4 B5 P2 B6 B7 I2
• Transmission order (encoding/decoding order): I1 P1 B1 B2 P2 B4 B5 I2 B6 B7
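The reordering between display order and transmission order can be sketched as below: a B picture needs its future anchor (I or P) to be decoded first, so the encoder holds B pictures back until the next anchor has been sent. This assumes picture labels that start with their type letter; flushing trailing B pictures at the end of the list is an assumption of this sketch (a real encoder would close the GOP differently), and the function name is hypothetical.

```python
def display_to_coding_order(display):
    """Reorder picture labels from display order to transmission (coding) order."""
    coding, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)  # wait until the future anchor has been sent
        else:
            coding.append(pic)  # I or P anchor goes first
            coding.extend(pending_b)  # then the B pictures that depend on it
            pending_b = []
    coding.extend(pending_b)  # assumption: trailing B pictures are simply flushed
    return coding
```

For example, the display order I1 B1 B2 P1 B4 B5 P2 B6 B7 I2 becomes I1 P1 B1 B2 P2 B4 B5 I2 B6 B7, which is also the order the decoder receives the pictures in.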
MPEG-2
• Major target application: digital television quality (720x576/480, 25/30 fps) at 3 ~ 4 Mbps
• Interlaced video support
  - Frame picture vs. field picture as the motion compensation unit
  - Frame DCT vs. field DCT within a frame picture
(Figure: two field pictures vs. one frame picture; field DCT vs. frame DCT)
Scalability support
• Spatial scalability
  - Low resolution at the base layer (BL) and high resolution at the enhancement layer (EL)
  - The BL is used for prediction of the EL
  - E.g. SD resolution at the BL, HD resolution at the EL
• Temporal scalability
  - E.g. 30 fps at the BL, 60 fps at the EL
• SNR scalability
  - Same resolution but different quality
• Data partitioning
  - Coded data is split into different streams
(Figure: the input video is downsampled and coded by the BL encoder into the BL bitstream; the BL decoder output (lower quality) is used by the EL encoder to produce the EL bitstream (higher quality).)
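Spatial scalability can be sketched with a toy two-layer model: the base layer is a 2:1 downsampled picture, and the enhancement layer carries the difference between the input and the upsampled base layer. The 2x2 averaging and pixel-replication filters are assumptions chosen for brevity, the picture dimensions are assumed even, and neither layer is actually DCT-coded or quantized here; in MPEG-2 the EL prediction happens inside a full MC-DCT loop. The function names are hypothetical.

```python
def downsample2(img):
    """Average each 2x2 group of pixels (integer floor division)."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(len(img[0]) // 2)] for y in range(len(img) // 2)]


def upsample2(img):
    """Pixel replication back to double width and height."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out


def encode_layers(img):
    base = downsample2(img)  # BL: low-resolution picture
    pred = upsample2(base)  # the BL is used to predict the EL
    enh = [[img[y][x] - pred[y][x] for x in range(len(img[0]))] for y in range(len(img))]
    return base, enh


def decode_layers(base, enh=None):
    if enh is None:
        return base  # BL-only decoder: lower resolution, lower quality
    pred = upsample2(base)
    return [[pred[y][x] + enh[y][x] for x in range(len(pred[0]))] for y in range(len(pred))]
```

A receiver that gets only the BL bitstream still decodes a usable low-resolution picture; adding the EL restores full resolution.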
Profile & Level
• MPEG-2 has many options, and not every implementation needs all of them
Profiles
• Simple: 4:2:0 input, I and P pictures only, low complexity and low performance
• Main: 4:2:0 input, I, P, B pictures, interlaced video
• 4:2:2: 4:2:2 input (chroma keeps the full vertical resolution)
• SNR: SNR scalable
• Spatial: spatially scalable
• High: spatial scalability and 4:2:2
Levels
• Low (352x288), Main (720x576), High 1440 (1440x1152), High (1920x1152)
E.g.
• MPEG-1: Main profile & Low level
• SD DTV, DVD: Main profile & Main level
• HDTV: Main profile & High level (historically the target application of MPEG-3)
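Choosing a level can be sketched as finding the smallest level whose maximum picture size covers the input. This sketch checks only the picture sizes listed above; real MPEG-2 levels also cap frame rate, sample rate, bit rate, and buffer size, so treat it as an illustration. The names are hypothetical.

```python
# Maximum luma picture size per MPEG-2 level, from the level list above.
LEVELS = [
    ("Low", 352, 288),
    ("Main", 720, 576),
    ("High 1440", 1440, 1152),
    ("High", 1920, 1152),
]


def lowest_level(width, height):
    """Smallest level whose maximum picture size covers width x height, else None."""
    for name, max_w, max_h in LEVELS:
        if width <= max_w and height <= max_h:
            return name
    return None
```

For instance, 720x480 SD video fits the Main level, while 1920x1080 HD video needs the High level.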
MPEG-4
Features
• Support for low bit rates (from 20 kbps)
• Support for object-based coding
  - Reuse of components, composition, and interactivity support
  - In practice, object-based coding is not widely used
Object-based coding
• Video object shape coding: transparent/opaque regions, binary or grey scale
• Texture coding with arbitrary shape
  - DCT after zero filling in inter blocks and extrapolation in intra blocks
(Figure: a scene composed of video objects VO1, VO2, and VO3)
Visual data structure
(Hierarchy from top to bottom:)
• VS: visual sequence (video session)
• VO1, VO2, ...: video objects (VO), or 2-D/3-D synthetic objects
• VOL1, VOL2, ...: video object layers (VOL)
• GOV1, GOV2, ...: groups of VOPs (GOV)
• VOP1, VOP2, ...: video object planes (VOP)
• MB: macroblock
H.261
ITU: mostly focused on real-time communication
H.261
• First practical video coding standard (1990), for N-ISDN (1990s)
  - p x 64 kbps (p = 1, ..., 30), typically 64 ~ 384 kbps
  - Circuit-network based: low delay, reliable
H.261 key features
• YCbCr 4:2:0, CIF and QCIF input
• MC-DCT with integer-pel motion
• Optional loop filter (for deblocking)
  - Separable [1/4, 1/2, 1/4] smoothing filter applied within each 8x8 block
• FEC used
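The loop filter can be sketched as a separable filter applied to one 8x8 block: taps 1/4, 1/2, 1/4 in each direction, switched to 0, 1, 0 at the block edges where a tap would fall outside the block. This sketch keeps full precision by working in units of 1/16 and rounds once at the end with halves rounded up; treat it as an illustration rather than a bit-exact reference. The function name is hypothetical.

```python
def h261_loop_filter(block):
    """Separable [1/4, 1/2, 1/4] smoothing inside an 8x8 block; edge samples use [0, 1, 0]."""
    n = 8

    def filt(v, i):
        # returns 4x the 1-D filtered value so everything stays in integers
        if i == 0 or i == n - 1:
            return 4 * v[i]
        return v[i - 1] + 2 * v[i] + v[i + 1]

    rows = [[filt(row, x) for x in range(n)] for row in block]  # horizontal pass (x4)
    cols = [[filt([rows[k][x] for k in range(n)], y) for x in range(n)]  # vertical pass (x16)
            for y in range(n)]
    return [[(v + 8) // 16 for v in row] for row in cols]  # back to pixel units, halves up
```

A single bright pixel of 160 at (3, 3) is spread into 40 at the center, 20 at its 4-neighbors, and 10 at its diagonal neighbors, while flat areas and the block corners are left unchanged.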
H.261 syntax structure
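H.261 bitstream structure (each field is either a fixed-length or a variable-length code):
• Picture layer: PSC, TR, PTYPE, PEI, PSPARE, followed by the GOB layers
• GOB (group of blocks) layer: GBSC, GN, GQUANT, GEI, GSPARE, followed by the macroblock layers
• Macroblock layer: MBA (or MBA stuffing), MTYPE, MQUANT, MVD, CBP, followed by the block layers
• Block layer: TCOEFF ..., EOB
Picture partitioning
• Picture: CIF (352x288) holds 12 GOBs, QCIF (176x144) holds 3 GOBs
    CIF:      QCIF:
     1  2      1
     3  4      3
     5  6      5
     7  8
     9 10
    11 12
• GOB: 33 macroblocks in 3 rows of 11
     1  2  3  4  5  6  7  8  9 10 11
    12 13 14 15 16 17 18 19 20 21 22
    23 24 25 26 27 28 29 30 31 32 33
• Macroblock: 16x16 Y plus 8x8 Cb and 8x8 Cr
• Block: 8x8 pixels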
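The picture partitioning can be checked with a little arithmetic: a GOB is 11 x 3 = 33 macroblocks (176 x 48 luma pixels), so CIF holds 22 x 18 = 396 macroblocks, or 12 GOBs, and QCIF holds 99 macroblocks, or 3 GOBs. The helpers below compute this and map GOB and macroblock numbers to pixel positions; the same two-column numbering also yields QCIF's GOBs 1, 3, 5 in a single column. The function names are hypothetical.

```python
def h261_layout(width, height):
    """(macroblock count, GOB count) for a picture; a GOB is 11 x 3 = 33 macroblocks."""
    total_mbs = (width // 16) * (height // 16)
    return total_mbs, total_mbs // 33


def gob_origin(gob_number):
    """Top-left luma pixel of a GOB (176 x 48 pixels each, numbered two per row)."""
    row, col = divmod(gob_number - 1, 2)
    return col * 176, row * 48


def mb_origin_in_gob(mb_number):
    """Top-left pixel of macroblock 1..33 relative to its GOB (3 rows of 11)."""
    row, col = divmod(mb_number - 1, 11)
    return col * 16, row * 16
```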
H.263
H.263 versions
• Version 1 (1995)
  - Improvement over H.261
  - 4 optional modes
• Version 2 (1998, H.263+)
  - 12 optional modes
• Version 3 (2000, H.263++)
  - 19 optional modes
Key features
• Targets bit rates down to 20 kbps, and packet-based networks as well
• Half-pel motion prediction
• Redesigned 3-D VLC (LAST, RUN, LEVEL) for transform coefficients
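Half-pel prediction can be sketched as bilinear interpolation of the reference picture with rounding: a sample midway between two integer pixels is (A + B + 1) / 2, and one in the middle of four is (A + B + C + D + 2) / 4, using integer division. Coordinates here are in half-pel units; picture-boundary handling and the rounding-control option of later versions are omitted, and the function name is hypothetical.

```python
def h263_interp(ref, x2, y2):
    """Sample ref at half-pel position (x2 / 2, y2 / 2) by bilinear interpolation."""
    x, y = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2
    a = ref[y][x]
    if not fx and not fy:
        return a  # integer position
    if fx and not fy:
        return (a + ref[y][x + 1] + 1) // 2  # horizontal half-pel
    if fy and not fx:
        return (a + ref[y + 1][x] + 1) // 2  # vertical half-pel
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4  # center
```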
H.263 optional modes
• Annex D: Unrestricted motion vectors
• Annex E: Syntax-based arithmetic coding
• Annex F: Advanced prediction
• Annex G: PB frames
• Annex I: Advanced intra coding
• Annex J: Deblocking filter
• Annex K: Slice structured mode
• Annex L: Supplemental enhancement information
• Annex M: Improved PB frames
• Annex N: Reference picture selection
• Annex O: Scalability
• Annex P: Reference picture resampling
• Annex Q: Reduced-resolution update
• Annex R: Independent segment decoding
• Annex S: Alternative inter VLC
• Annex T: Modified quantization
• Annex U: Enhanced reference picture selection
• Annex V: Data-partitioned slice
• Annex W: Additional supplemental enhancement information
Performance
H.264
• Name: ITU-T H.264 = ISO/IEC MPEG-4 Part 10 / AVC
• Developed as H.26L (long-term enhancement); not compatible with H.263
• Now adopted in T-DMB/S-DMB and IPTV, replacing many MPEG-2 solutions
• Aims at about 50% bit-rate savings relative to H.263+
Key features
• Smaller processing units (down to 4x4 pixel blocks)
• Intra prediction
• Inter prediction
  - Macroblock-based selection of inter-frame prediction
  - ¼-pixel motion vector support
  - Motion vector options for sub-blocks
• 4x4 integer DCT
• Deblocking filter
• Universal VLC (Exp-Golomb codes)
• CABAC (context-based adaptive binary arithmetic coding)
Intra-frame Prediction
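• Luma 4x4: 9 modes
• Luma 16x16: 4 modes
• Chroma 8x8: 4 modes; the same prediction mode is always applied to both chroma blocks
(Figure: a 4x4 luma block is predicted from previously decoded neighbors A-D (above), E-H (above-right), I-L (left), and M (above-left), e.g. in vertical, horizontal, and DC = mean(A-D, I-L) modes plus six diagonal modes; 16x16 luma blocks use vertical (V), horizontal (H), DC = mean(H, V), and plane prediction.)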
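Three of the 4x4 luma modes can be sketched as follows, using H.264's numbering 0 (vertical), 1 (horizontal), and 2 (DC, the rounded mean of A-D and I-L). The six diagonal modes, neighbor-availability rules, and the rate-distortion mode decision of real encoders are omitted; `choose_mode` is a hypothetical SAD-based decision for illustration only.

```python
def intra4x4_pred(above, left, mode):
    """Predict a 4x4 block from above = [A, B, C, D] and left = [I, J, K, L]."""
    if mode == 0:  # vertical: copy the row above downward
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: copy the left column across
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: rounded mean of the 8 neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")


def choose_mode(block, above, left):
    """Hypothetical encoder decision: the sketched mode with the smallest SAD."""
    def cost(mode):
        pred = intra4x4_pred(above, left, mode)
        return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))
    return min(range(3), key=cost)
```

Only the chosen mode number and the (small) prediction residual need to be coded, which is where the intra-prediction bit savings come from.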
Inter-frame Prediction
References
• H.264: permits up to 15 reference pictures (2 mostly used); bi-predictive B-slices; a P-slice may reference a picture that contains B-slices; supports explicit weighting coefficients as well as (a+b)/2-type prediction
• MPEG-1/2/4, H.261/3: a P-picture references only one previous I- or P-picture; bi-directional B-pictures only permit (a+b)/2-type prediction weighting
Block sizes
• H.264: tree-structured (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4)
• MPEG-1/2/4, H.261/3: either 16x16 or 8x8
Motion estimation
• H.264: half- and ¼-pixel accuracy; 6-tap interpolation for half-pixel positions and 2-tap linear interpolation for ¼-pixel positions
• MPEG-1/2/4, H.261/3: MPEG-2 permits half-pixel accuracy and MPEG-4 permits ¼-pixel accuracy, with 2-tap linear interpolation
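The H.264 sub-pel interpolation can be sketched in one dimension: a half-pel sample uses the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding and a shift by 5 (division by 32), clipped to 8 bits, and a quarter-pel sample is the rounded average of its two nearest integer or half-pel samples. Picture-border padding and the higher-precision intermediate used for the center half-pel position in 2-D are omitted; the function names are hypothetical.

```python
def clip_pixel(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-pel sample between row[i] and row[i + 1] (needs row[i - 2] .. row[i + 3])."""
    e, f, g, h, j, k = row[i - 2:i + 4]
    return clip_pixel((e - 5 * f + 20 * g + 20 * h - 5 * j + k + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of the two nearest samples."""
    return (p + q + 1) >> 1
```

On a linear ramp 0, 10, 20, 30, 40, 50, the half-pel sample between 20 and 30 is 25 and the quarter-pel sample between 20 and 25 is 23; on sharp edges, the negative taps make the 6-tap filter overshoot, which is why the result is clipped.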
Transform and Quantization
• Integer DCT: no encoder/decoder mismatch
• Three types of transform, each followed by quantization:
  - Type 1: for the 4x4 array of luma DC coefficients in intra MBs predicted in 16x16 mode (block -1)
  - Type 2: for the 2x2 arrays of chroma DC coefficients (blocks 16-17)
  - Type 3: for all other 4x4 blocks (blocks 0-15, 18-25)
(Figure: order of the 4x4-pixel blocks within a macroblock; data is transmitted in the numbered order.
Block -1: luma DC (16x16 intra mode only)
Luma:
 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15
Chroma DC: 16 (Cb), 17 (Cr)
Cb:      Cr:
18 19    22 23
20 21    24 25)
Transform and Quantization
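4×4 DCT (X: input, Y: output):
$$Y = A X A^T,\quad A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix},\quad a = \tfrac{1}{2},\ b = \sqrt{\tfrac{1}{2}}\cos\tfrac{\pi}{8},\ c = \sqrt{\tfrac{1}{2}}\cos\tfrac{3\pi}{8}$$
4×4 integer transform, forward:
$$Y = \left(C_f X C_f^T\right) \otimes E_f,\quad C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix},\quad E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$
4×4 integer transform, backward:
$$X = C_i^T \left(Y \otimes E_i\right) C_i,\quad C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \frac{1}{2} & -\frac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \frac{1}{2} & -1 & 1 & -\frac{1}{2} \end{bmatrix},\quad E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$
For the integer transform, $a = \frac{1}{2}$ and $b = \sqrt{\frac{2}{5}}$; $\otimes$ denotes element-wise multiplication, and the post-scaling factor (PF) $E_f$ is absorbed into quantization.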
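The "no encoder/decoder mismatch" property comes from the core transform using only integer additions, subtractions, and multiplications by small integers. The sketch below applies the H.264 forward core matrix C_f as a plain matrix product; the post-scaling factor (PF) is assumed to be folded into quantization and is not applied. Real codecs evaluate the same product with add/shift butterflies, and the helper names are hypothetical.

```python
# Forward core matrix C_f of the 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_core(x):
    """C_f X C_f^T with integer arithmetic only (post-scaling left to quantization)."""
    return matmul(matmul(CF, x), transpose(CF))
```

A flat block of ones maps to a single DC value of 16 with all other coefficients zero; because every step is exact integer arithmetic, any decoder reproduces the encoder's reconstruction bit for bit.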
Entropy Coding
Parameters to be coded, by entropy_coding_mode:
• Macroblock type (intra/inter), coded block pattern, quantizer parameter, reference frame index, motion vector
  - entropy_coding_mode = 0: variable length coding (VLC) with Exponential Golomb codes (Exp-Golomb)
  - entropy_coding_mode = 1: context-based adaptive binary arithmetic coding (CABAC)
• Residual data
  - entropy_coding_mode = 0: context-adaptive variable length coding (CAVLC)
  - entropy_coding_mode = 1: CABAC
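The Exp-Golomb codes used for the non-residual syntax elements when entropy_coding_mode = 0 can be sketched as below. Codewords are built as strings of '0'/'1' for readability (a real bitstream writer packs bits); `ue` is the unsigned code and `se` the signed-to-unsigned mapping, while the mapped variant used for some elements is not shown. The function names are hypothetical.

```python
def ue(k):
    """Unsigned Exp-Golomb codeword for k >= 0: M zeros, then the (M+1)-bit binary of k + 1."""
    code = k + 1
    m = code.bit_length() - 1
    return "0" * m + format(code, "b")


def se(v):
    """Signed mapping: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ... then ue()."""
    return ue(2 * v - 1 if v > 0 else -2 * v)


def decode_ue(bits, pos=0):
    """Decode one ue() codeword from a '0'/'1' string; returns (value, next_pos)."""
    m = 0
    while bits[pos + m] == "0":
        m += 1
    code = int(bits[pos + m:pos + 2 * m + 1], 2)
    return code - 1, pos + 2 * m + 1
```

Small values get short codes (0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100"), and no code tables are needed, which is why a single "universal" VLC can serve many syntax elements.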
Deblocking Filters
• A boundary-strength (BS) parameter is assigned to every edge between 4×4 blocks
  - BS = 0: no filtering
  - BS = 1-3: slight filtering
  - BS = 4: strong filtering
• Filtering is applied only when |P0 - Q0| < α, |P1 - P0| < β, and |Q1 - Q0| < β, where P3 P2 P1 P0 | Q0 Q1 Q2 Q3 are the samples on either side of the edge
• Thresholds α and β depend on the average quantization parameter (QP) of the two blocks
• Deblocking filtering accounts for about 1/3 of the computational complexity of a decoder
Block modes and conditions -> boundary-strength parameter (BS)
• One of the blocks is intra-coded and the edge is an MB edge: 4
• One of the blocks is intra-coded: 3
• One of the blocks has coded residuals: 2
• Difference of block motion ≥ one luma sample distance: 1
• Motion compensation from different reference frames: 1
• Else: 0
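The BS table and the filtering condition can be sketched as two small functions. Motion vectors are assumed to be in quarter-sample units, so one luma sample is 4 units; α and β are passed in as plain parameters because the QP-indexed threshold tables are not reproduced here, and the actual sample modification performed by the filter is not shown. `Block4x4` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block4x4:
    intra: bool  # block is intra-coded
    coded: bool  # block has coded (nonzero) residual coefficients
    ref: int  # reference picture identifier (hypothetical)
    mv: tuple  # motion vector (x, y) in quarter-sample units


def boundary_strength(p, q, mb_edge):
    """BS for the edge between 4x4 blocks p and q, following the table above."""
    if (p.intra or q.intra) and mb_edge:
        return 4
    if p.intra or q.intra:
        return 3
    if p.coded or q.coded:
        return 2
    if p.ref != q.ref:
        return 1
    # one luma sample = 4 quarter-sample units
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0


def edge_is_filtered(bs, p1, p0, q0, q1, alpha, beta):
    """Filter only when BS > 0 and the step across the edge is small enough to be an artifact."""
    return bs > 0 and abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta
```

The α test is what keeps real image edges sharp: a large step across the boundary (here, 62 to 90 with α = 15) is treated as content and left unfiltered.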
Network Adaptation
VCL & NAL
• VCL (video coding layer): the coded video data
• NAL (network abstraction layer): packages the VCL data for the transport network
Error resilience tools
• Flexible macroblock ordering (FMO)
  - Allows MBs to be assigned to slices in an order other than raster scan order
• Arbitrary slice ordering (ASO)
  - Improves end-to-end delay in real-time applications
• Redundant slices (RS)
  - Redundant representations are coded using different coding parameters
(Figure: an FMO example with slice group #0 and slice group #1)
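FMO can be illustrated with a checkerboard (dispersed) assignment of macroblocks to two slice groups: if the packet carrying one slice group is lost, each missing macroblock still has decoded neighbors from the other group to conceal it from. This sketch only builds such a map and inspects it; it is an illustration of the idea, not the slice-group map syntax of the standard, and all names are hypothetical.

```python
def checkerboard_map(width_mbs, height_mbs):
    """Slice group (0 or 1) of each macroblock in a two-group checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def slice_group_members(group_map):
    """Macroblock addresses (raster order) belonging to each slice group."""
    members = {}
    width = len(group_map[0])
    for y, row in enumerate(group_map):
        for x, g in enumerate(row):
            members.setdefault(g, []).append(y * width + x)
    return members


def neighbors_in_other_group(group_map, x, y):
    """4-neighbors of MB (x, y) that belong to a different slice group."""
    h, w = len(group_map), len(group_map[0])
    out = []
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h and group_map[ny][nx] != group_map[y][x]:
            out.append((nx, ny))
    return out
```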
Profile & Level
Main applications
• Baseline: video telephony
• Main: DTV and storage
• Extended: streaming
(Figure: profiles and their tools)
Performance comparison
Contributions of the VCL Tools
• Spatial prediction for intra-coded macroblocks: saves 6-9% bits
• Temporal prediction: saves around 50% bits
• Transforms: PSNR difference of less than 0.02 dB
• Logarithmic quantization: increasing the step size by 12% saves about 12% bits
• CAVLC: saves 5-8% bits
• CABAC: saves 5-15% bits over CAVLC
• Picture-adaptive frame/field (PAFF) coding: saves 16-20% bits
• MB-adaptive frame/field (MBAFF) coding: saves 14-16% bits over PAFF
• Deblocking filter: saves 5-10% bits
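The "logarithmic quantization" entry refers to how the H.264 quantizer step size grows with QP: the step doubles every 6 QP values, so each increment multiplies it by 2^(1/6) ≈ 1.12 on average, which is the 12% in the list. The base values below are the H.264 step sizes for QP 0-5; the function name is hypothetical.

```python
# Quantizer step sizes for QP 0..5; each further +6 in QP doubles the step.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """H.264 quantizer step size for QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))
```

For example, QP 28 gives a step of 16 and QP 34 a step of 32, so a rate controller can trade bits for quality in roughly uniform 12% steps.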
Conclusion
• Many video coding standards exist
• Standards reflect both coding technology and implementation technology
• Coding performance has improved more than 4 times since H.261 (1990)
What’s next
• SVC (Scalable Video Coding) in H.264 (done)
• H.264ext (further improvement of H.264)
• 3-D and MVC (Multi-View Coding): ongoing
• UDTV (Ultra Definition TV: 3840x2160)
• And what’s next?