MPEG4 Natural Video Coding

• Functionalities:

– Coding of arbitrarily shaped objects

– Efficient compression of video and images over a wide range of bit rates

– Spatial, temporal and quality scalability

– Robust transmission over error-prone lines

Update: May 2003

Basic Principles

• Profiles: for interoperability reasons, a decoder should at least support a clearly defined set of tools.

• For each profile, the tools to be supported are specified.

• For each profile different levels are specified, setting certain complexity bounds (memory, number of objects, bit rate etc.)

MPEG-4 Natural Video Decoding Tools

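[Figure: block diagram of the MPEG-4 natural video decoder. Blocks shown: De-Mux, FBA Decoding, Mesh Decoding, Visual Texture Decoding, Motion Decoding (and Compensation), Texture Decoding, Shape Decoding, Scene Composition. Input: Bit-Stream; output: Decoded Video.]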

Object Based Video Coding

• Scene: Composed of one or several Visual Objects (VOs)

• A VO of type video is defined as a sequence of video object planes (VOPs)

• Each VOP is an instance of a VO at a certain time and can be either rectangular or of arbitrary shape.

• In the non-rectangular case, a VOP is also defined by a “Shape Matrix” (Alpha plane).

Hierarchical Structure of MPEG-4 Video

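[Figure: hierarchy of an MPEG-4 video bit-stream, from top to bottom]

• Visual Object Sequence: VS1, VS2

• Visual Object: VO1, VO2

• Video Object Layer: VOL1, VOL2

• Group of VOPs: GOV1, GOV2

• Video Object Plane: VOP1, VOPk, VOPk+1

Further labels in the figure: Layer 1, Layer 2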

VOP Processing

• Each VOP is processed blockwise

• A VOP bounding box is built: the smallest surrounding rectangle.

• The box is used to represent the VOP by means of its texture and shape.

VOP Coding

• For coding, the VOP is divided into macroblocks (MBs) of 16x16 pixels, and the bounding box is extended to a multiple of this size (in both X and Y directions)
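The box construction can be sketched in a few lines of Python. This is only an illustration: the function name and the list-of-rows alpha plane are assumptions, and the box is simply grown to the right and to the bottom, whereas a real encoder may position it differently.

```python
MB_SIZE = 16  # macroblock edge length in luminance pixels


def vop_bounding_box(alpha, mb_size=MB_SIZE):
    """Return (left, top, width, height) of the extended box, or None.

    alpha is a list of rows; a pixel belongs to the VOP if its value is non-zero.
    """
    rows = [y for y, row in enumerate(alpha) if any(row)]
    if not rows:
        return None  # completely transparent plane: nothing to code
    cols = [x for x in range(len(alpha[0])) if any(row[x] for row in alpha)]
    left, top = cols[0], rows[0]
    width = cols[-1] - left + 1
    height = rows[-1] - top + 1
    # Round both dimensions up to the next multiple of the macroblock size.
    width = -(-width // mb_size) * mb_size
    height = -(-height // mb_size) * mb_size
    return left, top, width, height
```

For an object that is 33 pixels wide and 20 pixels high, the box becomes 48x32, i.e. 3x2 MBs.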

MB Types within the VOP Bounding Box

MB Types

• Transparent: MBs that are completely outside the VOP – NO YUV data !

• Opaque: MBs that are completely inside the VOP – coded as regular intra or inter MB.

• Boundary: MBs that are at the boundary of the VOP – arbitrary shape coding tools used
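The three MB types follow directly from the alpha values covered by each MB. A minimal sketch, assuming the alpha plane is a list of rows and that pixels of the extended box lying outside the plane count as transparent (the function name is hypothetical):

```python
def classify_mb(alpha, mb_x, mb_y, mb_size=16):
    """Classify the MB whose top-left corner is (mb_x, mb_y)."""
    height, width = len(alpha), len(alpha[0])
    opaque = 0
    for y in range(mb_y, mb_y + mb_size):
        for x in range(mb_x, mb_x + mb_size):
            # Pixels beyond the alpha plane are treated as transparent.
            if y < height and x < width and alpha[y][x] > 0:
                opaque += 1
    if opaque == 0:
        return "transparent"  # no texture data is coded
    if opaque == mb_size * mb_size:
        return "opaque"       # coded like a regular intra or inter MB
    return "boundary"         # needs the arbitrary-shape tools
```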

Rectangular VO Coding

• Rectangular VO coding is identical to frame coding in MPEG-1 and MPEG-2:

– Motion-compensated block-based coding

– No shape information is needed in that case

• New tools are also introduced

[Figure: decoder block diagram for rectangular VOs. No shape decoding!]

Motion Modes

• 1MV:

– One motion vector (MV) is transmitted for the 16x16 luminance pixels

• 4MV:

– The MB is subdivided into 4 blocks of 8x8 pixels (luminance!), and 4 MVs are transmitted.

New Motion Compensation Tools

• Quarter-Pel Motion Compensation

– Calculating MVs with an increased resolution of ¼ pel (instead of ½ and full pel)

• Global Motion Compensation

– A single set of parameters represents the global motion of the VOP (no need for local MB MVs)

• Direct Mode in Bidirectional Prediction

– An improved bidirectional prediction, which reuses the MVs of the neighboring P-VOP.

– More general than the PB frames of H.263
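As an illustration of the direct mode, the sketch below scales the MV of the neighboring P-VOP by temporal distance, in the style of the H.263 PB-frame formulas. The names `trb` (distance from the past reference to the B-VOP), `trd` (distance between the two references) and `mvd` (a transmitted delta vector), as well as the truncating division, are assumptions made for this example and should be checked against the standard before reuse.

```python
def div_trunc(a, b):
    """Integer division that truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def direct_mode_vectors(mv, trb, trd, mvd=(0, 0)):
    """Derive forward and backward MVs of a B-VOP block from a P-VOP MV.

    mv and mvd are (x, y) tuples; each component is handled separately.
    """
    forward, backward = [], []
    for comp, delta in zip(mv, mvd):
        mvf = div_trunc(trb * comp, trd) + delta
        if delta == 0:
            mvb = div_trunc((trb - trd) * comp, trd)
        else:
            mvb = mvf - comp
        forward.append(mvf)
        backward.append(mvb)
    return tuple(forward), tuple(backward)
```

With the B-VOP one third of the way between its references (`trb=1`, `trd=3`), the forward vector is roughly one third of the P-VOP vector and the backward vector roughly minus two thirds of it.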

New Texture Coding Tools

• N-bit Tool

– Supports 4-12 bits/pixel for Y/UV blocks

[Figure: texture coding chain. 8x8 prediction error (Y/UV) → DCT → Quantization → AC/DC Prediction (for Intra ONLY!) → Scan → VLC (Huffman) → Bit-Stream]

Quantization

• Two optional quantization methods:

– MPEG-2 (H.262) based

– H.263 based

• The DC coefficient is quantized using a fixed quantizer step size

• The quantization step can take values from 1 to 31 (coded once per VOP)

• In a special MB mode it is possible to modify the quantizer scaling within a VOP
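For the H.263-based method, the reconstruction of a quantized AC level can be sketched as follows. The function name is hypothetical, `qp` stands for the quantization step parameter (1 to 31), and the clipping of the result to the legal coefficient range is left out.

```python
def dequantize_h263(level, qp):
    """Reconstruct one AC coefficient from its quantized level (H.263 style)."""
    if not 1 <= qp <= 31:
        raise ValueError("qp must be in the range 1..31")
    if level == 0:
        return 0
    magnitude = qp * (2 * abs(level) + 1)
    if qp % 2 == 0:
        magnitude -= 1  # even qp: subtract 1 so the magnitude stays odd
    return magnitude if level > 0 else -magnitude
```

A level of 3 with `qp = 5` is reconstructed as 35; a level of -3 with `qp = 4` as -27.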

AC/DC Prediction

• Due to statistical dependencies, the values of one block can be predicted from the values of the neighboring blocks, for Intra blocks ONLY !

No side information is needed: the gradient from B to A and the gradient from B to C (which are known to the decoder!) are compared, and the direction with the lower gradient is selected for the AC/DC prediction.
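The direction decision can be sketched as below. The block positions are an assumption, since the slide defines them only in a figure: A is taken as the block to the left, B the block above-left and C the block above the current block, and the inputs are their reconstructed DC values.

```python
def dc_prediction_direction(dc_a, dc_b, dc_c):
    """Pick the prediction direction and predictor from neighboring DC values.

    Assumed layout: A = left, B = above-left, C = above the current block.
    """
    if abs(dc_a - dc_b) < abs(dc_b - dc_c):
        # Little change between B and A: predict from the block above (C).
        return "vertical", dc_c
    # Otherwise predict from the block to the left (A).
    return "horizontal", dc_a
```

Because the decoder has already reconstructed A, B and C, it can repeat the same comparison, which is why no side information is transmitted.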

Alternative Scan Modes

• In addition to the well-known zigzag scan, alternate-vertical and alternate-horizontal scans are available (Intra ONLY!)

If AC/DC prediction is not used, the zigzag scan is selected. If the DC prediction refers to the horizontally adjacent block, the alternate-vertical scan is selected, and vice versa.
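The selection rule, together with the standard 8x8 zigzag order, can be sketched as follows. The function names and the string labels are assumptions; the tables for the two alternate scans are not reproduced here.

```python
def select_scan(acdc_prediction_used, dc_direction):
    """Choose the scan for an intra block.

    dc_direction is "horizontal" when the DC is predicted from the
    horizontally adjacent block, and "vertical" otherwise.
    """
    if not acdc_prediction_used:
        return "zigzag"
    if dc_direction == "horizontal":
        return "alternate-vertical"
    return "alternate-horizontal"


def zigzag_order(size=8):
    """Return the zigzag scan as a list of raster indices (row * size + col)."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    # Walk the anti-diagonals; the direction alternates from one to the next.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [r * size + c for r, c in cells]
```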

Arbitrary Shaped VO Coding

• Shape information is represented by means of alpha masks, which define the level of transparency of a VOP

• Each alpha value is an 8-bit word ranging from completely transparent (0) to completely opaque (255)

• Pixels that belong to a VO must have an alpha value larger than 0!

• A binary alpha mask is also an option for a VO.

Binary Shape Coding

• The shape coding is performed blockwise: for each MB the alpha values are coded separately (BAB: Binary Alpha Block)

• The binary values are 0 and 255, but they are treated as ‘0’ and ‘1’

• Context-Based Arithmetic Encoding (CAE):

– The codewords are assigned according to a probability that depends on the context of the shape element.

– Both Intra and Inter modes exist, and they must be the same for all MBs in a VOP.

– Shape MC is also performed, with some restrictions (full-pel resolution, no 4MV mode, no bidirectional MC)

Gray Level Shape Coding

• A Binary Support Mask is needed

– In this mask, ‘0’ is assigned to every pixel whose value in the Gray-Level Mask is 0, and 255 to all other pixels (non-transparent pixels)

– The support mask is coded like the binary mask

• The gray-scale Alpha plane is coded as texture data with arbitrary shape.
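Deriving the support mask from a gray-level alpha plane is a simple thresholding step. A minimal sketch, assuming the plane is stored as a list of rows with values from 0 to 255 (the function name is hypothetical):

```python
def binary_support_mask(gray_alpha):
    """Map every non-zero gray-level alpha value to 255 and keep 0 as 0."""
    return [[255 if value > 0 else 0 for value in row] for row in gray_alpha]
```

The resulting mask is then coded like any binary shape, while the gray-level values themselves are coded as texture.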

Boundary MBs Coding

Some special tools are used:

• Motion Compensation

– A special padding process is performed to avoid MVs that refer to transparent pels in the reference VOP

• Predictive MV Coding

– MVs are coded predictively from neighboring blocks to reduce the bit rate for the motion information

Boundary MBs Coding cont’d

Texture Coding

• Two algorithms can be applied:

– Padding: the transparent pels in the boundary MBs are padded, i.e. filled with values according to some rules

– Shape-Adaptive DCT: based on the regular DCT, but applied with respect to the binary alpha values (so it codes only the non-transparent pixels!). It has higher complexity and higher efficiency.

Texture Coding by Padding

• Since transparent pixels will be set to transparent again after the binary shape mask is decoded, they can be treated as ‘don’t care’, and any value can be filled in according to the encoder’s strategy.

– Finding the OPTIMAL values for these pels is very complex, so zero padding is often used, in particular for coding the prediction error, which is close to zero anyway.
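Two simple padding strategies can be sketched as below. Zero padding is the one named on the slide; filling with the mean of the opaque pels is shown only as an illustrative alternative and is an assumption of this example. Blocks and masks are lists of rows, and a mask entry is non-zero for opaque pels.

```python
def pad_zero(block, mask):
    """Replace every transparent pel with 0."""
    return [[pel if m else 0 for pel, m in zip(row, mask_row)]
            for row, mask_row in zip(block, mask)]


def pad_mean(block, mask):
    """Replace every transparent pel with the rounded mean of the opaque pels."""
    opaque = [pel for row, mask_row in zip(block, mask)
              for pel, m in zip(row, mask_row) if m]
    fill = round(sum(opaque) / len(opaque)) if opaque else 0
    return [[pel if m else fill for pel, m in zip(row, mask_row)]
            for row, mask_row in zip(block, mask)]
```

Either way, the decoder discards the padded values once the shape mask is applied.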

Texture Coding by SA-DCT

• Coding only the opaque pels in a boundary MB with a four-step algorithm:

– Vertical shifting to the top

– Vertical 1-D DCT (for each column)

– Horizontal shifting to the left

– Horizontal 1-D DCT (on the shifted coefficients)

• Since the number of pels/coefficients in every column/row can vary, a specific DCT kernel is created for each case.
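The four steps can be sketched in plain Python. This is a simplified illustration, not a reference implementation: it uses an orthonormal DCT-II of variable length as the kernel, which is an assumption about the normalization, and the function names are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II whose length adapts to the number of inputs."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(values))
        out.append(scale * acc)
    return out


def sa_dct(block, mask):
    """Shape-adaptive DCT of a square block; mask is non-zero for opaque pels.

    Returns a list of rows of varying length, each aligned to the left.
    """
    size = len(block)
    # Steps 1 and 2: shift the opaque pels of each column to the top,
    # then transform each column with a DCT of matching length.
    columns = []
    for x in range(size):
        pels = [block[y][x] for y in range(size) if mask[y][x]]
        columns.append(dct_1d(pels) if pels else [])
    # Steps 3 and 4: shift the coefficients of each row to the left,
    # then transform each row with a DCT of matching length.
    rows = []
    for y in range(size):
        coeffs = [col[y] for col in columns if len(col) > y]
        rows.append(dct_1d(coeffs) if coeffs else [])
    return rows
```

For a triangular 4x4 mask with 10 opaque pels, the output rows hold 4, 3, 2 and 1 coefficients, so the coefficient count matches the number of opaque pels.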

SA-DCT Example

Shape Adaptive DCT cont’d

• The number of coefficients is equal to the number of non-transparent pixels in the boundary MB

• The resulting coefficients are scanned in a modified zigzag that takes the transparent pixels into account

– If the scan covers such a pixel, it is omitted from the coefficient vector

• Optional: the mean (DC) of the opaque pixels in the MB is subtracted from all pixel values and added back after decoding, overwriting the DC value from the SA-DCT procedure
