Audio/Video Compression Techniques (Journée Télécoms & Multimédia, by Zouhair Guennoun)

A/V COMPRESSION TECHNIQUES & COMPRESSION STANDARDS

Journée TELECOMS & MULTIMEDIA, 14 May 2010 – ENSA TANGER

Pr. Zouhair GUENNOUN

Ecole Mohammadia d’ingénieurs – EMI

Laboratoire d’Electronique et Communications – LEC

Communication Model

[Figure: block diagram with information source, data reduction, transmitted information, channel with noise source, received information and receiver]

Audio/Video Coding Applications

Detailed Communication Model

[Figure: block diagram. Transmitter: information source, source coding (data reduction), encrypt, channel coding. Channel with noise. Receiver: channel decoding, decrypt, source decoding (data reconstruction), information destination.]

Agenda

• Introduction

• Audio & Video compression principles

• A/V Compression standards

• Conclusion

Agenda

• Introduction

– Why compress?

– Audio & Video basics

– MPEGx & H.26x Compression Standards Overview

• Conclusion

Why compress?

The need for compression

• Audio: Compression needed in spectral domain

• Bit rate of a stereo audio source (CD-DA encoding)

– Sampling frequency: 44.1 kHz

– Stereo, 16 bits per sample

– Bit rate = 44100 * 2 * 16 = 1.41 Mbit/s

[Figure: audio waveform versus time]

Digital Audio

Type                                   Sampling frequency (kHz)   Bits per sample   Channels   Bit rate (Mbps)
Telephone signal (G.711)               8                          8                 1          0.064 (ISDN)
CD-DA (Compact Disc – Digital Audio)   44.1                       16                2          1.411 (CD-ROM 1x)
DAT (Digital Audio Tape)               48                         16                2          1.536
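The bit rates in the Digital Audio table follow directly from sampling frequency, sample size and channel count. A minimal sketch, assuming uncompressed PCM (the function name `pcm_bit_rate` and the `SOURCES` dictionary are illustrative, not part of the original slides):

```python
def pcm_bit_rate(sampling_hz: float, bits_per_sample: int, channels: int) -> float:
    """Return the raw (uncompressed) PCM bit rate in bit/s."""
    return sampling_hz * bits_per_sample * channels


# Rows of the Digital Audio table: (sampling frequency in Hz, bits per sample, channels)
SOURCES = {
    "Telephone (G.711)": (8_000, 8, 1),
    "CD-DA": (44_100, 16, 2),
    "DAT": (48_000, 16, 2),
}

for name, params in SOURCES.items():
    print(f"{name}: {pcm_bit_rate(*params) / 1e6:.3f} Mbit/s")
```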

• Video: Compression needed in spatial domain

• Bit rate of a video source (CCIR 601 - 50Hz countries)

– 25 images per second

– YUV colour coding (Y: luminance; U, V: chrominance)

• Y: 8 bits per pixel

• U, V: one pixel out of two is coded, 8 bits per pixel

Bit rate = (576*720)*25*16 = 166 Mbit/s

[Figure: video image, 720 samples per line by 576 lines]

Bit Rate versus Spatial Resolution

Format        Resolution     Bit rate (Mbps)
SQCIF         128 * 96       3.69
QCIF          176 * 144      7.60
CIF           352 * 288      30.41
4CIF          704 * 576      162.20
16CIF 4:3     1408 * 1152    648.81
16CIF 16:9    1920 * 1152    1061.68


• Channels available for A/V transmission

– Analog television channel (compatibility)

• Cable (bandwidth = 8 MHz)

• Satellite (bandwidth = 30-40 MHz)

Capacity around 40 Mbit/s

– Compact Disc (CD – 650 MB): for 74 min. play time, 1.41 Mbit/s

– Digital Versatile Disc (DVD – 4.7 GB): for 135 min. play time, 4.6 Mbit/s

Illustrative example

• PSTN modem, maximum bit rate: 56 kbps

• Video frame sequence

– Resolution: 288x352 (CIF format)

– RGB colors: 8x3 bits per pixel

– Frame rate: 30 frames per second

• Required bit rate: 288x352x8x3x30 = 72.99 Mbps

• Ratio between the required bit rate and the largest possible bit rate: 72.99 Mbps / 56 kbps ≈ 1303

– To transmit this sequence over the PSTN, the data must be compressed by a factor of at least about 1300.
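The arithmetic of this example can be checked in a few lines. A small sketch, assuming decimal prefixes (1 kbps = 1000 bit/s); with that convention the ratio comes out at roughly 1300. The helper name `raw_video_bit_rate` is illustrative:

```python
def raw_video_bit_rate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Return the uncompressed video bit rate in bit/s."""
    return width * height * bits_per_pixel * fps


MODEM_BIT_RATE = 56_000  # PSTN modem, bit/s (decimal prefix assumed)

cif_rgb = raw_video_bit_rate(352, 288, 8 * 3, 30)   # CIF, RGB, 30 frames/s
ccir_601 = raw_video_bit_rate(720, 576, 16, 25)     # CCIR 601, 16 bits/pixel, 25 frames/s
ratio = cif_rgb / MODEM_BIT_RATE

print(f"CIF RGB: {cif_rgb / 1e6:.2f} Mbit/s, required compression ratio: {ratio:.0f}")
print(f"CCIR 601: {ccir_601 / 1e6:.1f} Mbit/s")
```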

• MPEG-1 target (Video-CD: 74 min. constraint)

[Figure: video at 166 Mbit/s and audio at 1.4 Mbit/s → compression → 1.4 Mbit/s]

But quality was judged too poor (about VHS quality)

• MPEG-2 target

– Program stream (DVD)

[Figure: 1 program (video, multichannel audio, ...) → compression → 3-9 Mbit/s (variable bit rate), but higher quality than MPEG-1 = motivation for the capacity increase of the CD (→ DVD)]

– Transport stream (DVB)

[Figure: n programs (video, multichannel audio, ...) → compression → about 40 Mbit/s (constant bit rate) (DVB-Satellite & DVB-Cable)]


• Compression extends the playing time of a given storage device.

• Compression allows a reduction in bandwidth

• For the same bandwidth, compression allows faster transmission, and better quality.

• Compression removes redundancy from signals.

– Redundancy is, however, essential to making data more resistant to errors.

– Compressed data are more sensitive to errors than uncompressed data.

Principles of Compression

• Compression (or Source Coding) is achieved by suppressing information:

– redundant information

– irrelevant information

• Suppression of redundant information → lossless compression

The original signal and the one obtained after encoding and decoding are identical

[Figure: Fc(x,y,t) at rate Rc (bps) → Compression → rate Ri < Rc → Decompression → Fp(x,y,t) = Fc(x,y,t) at rate Rp = Rc]

Principles of Compression

• Suppression of irrelevant information → lossy compression (Perceptual Coding)

Example: bandwidth limitation, masking in audio

The original signal and the one obtained after encoding and decoding are different but are perceived as identical

[Figure: Fc(x,y,t) at rate Rc (bps) → Compression → rate Ri < Rc → Decompression → Fp(x,y,t) ≠ Fc(x,y,t) at rate Rp = Rc]


• Lossless vs. lossy data compression

– Source entropy H(X)

– Rate-Distortion function R(D) or D(R)

• Probabilistic modeling is at the heart of data compression

– What is P(X) for video source X?

– Is video coding more difficult than image coding?

[Figure: rate versus distortion curve. Distortion axis from 0 to Dmax; rate levels marked H(S) and L0; regions labelled "Lossless methods" and "Lossy methods"]
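The source entropy H(X) is the lower bound, in bits per symbol, for lossless coding of a memoryless source. A minimal sketch that estimates it from observed symbol frequencies (treating relative frequencies as P(X) is an assumption of this example, as is the function name):

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols) -> float:
    """Estimate H(X) = -sum p(x) log2 p(x) from the relative symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(entropy_bits_per_symbol("aabb"))      # two equally likely symbols
print(entropy_bits_per_symbol("aaaaaaab"))  # a skewed source needs fewer bits per symbol
```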


• Reversible (lossless): data files (e.g. the V.42bis standard in modems, zip files)

• Non-reversible (lossy): audio & video signals

• More compression usually means lower quality and higher CPU consumption.

– Compression algorithms also differ in their computational complexity: for the same bit rate, more complex techniques generally give better quality at the expense of more CPU.

– Compression algorithms designed for telephony should introduce very little delay; otherwise loss of interactivity, echoes and poor sound quality become problems.


• More complex scene → higher bit rate for the same quality

• CBR → variable quality (example: Video CD artefacts)

• Constant quality → VBR necessary (e.g. DVD-Video)

For a Gaussian source $N(0, \sigma^2)$:

$D(R) = \sigma^2 \, 2^{-2R}$

[Figure: distortion versus bit rate curves for a simple and a complex scene, with the constant bit rate and constant quality operating lines]
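The rate-distortion function of the Gaussian source can be evaluated numerically. A small sketch, assuming a memoryless Gaussian source with mean-square-error distortion (function names are illustrative); each extra bit divides the distortion by four, i.e. about 6 dB:

```python
import math


def gaussian_distortion(rate_bits: float, variance: float) -> float:
    """D(R) = sigma^2 * 2^(-2R) for a memoryless Gaussian source."""
    return variance * 2.0 ** (-2.0 * rate_bits)


def gaussian_rate(distortion: float, variance: float) -> float:
    """R(D) = max(0, 0.5 * log2(sigma^2 / D)), the inverse of D(R)."""
    if distortion >= variance:
        return 0.0
    return 0.5 * math.log2(variance / distortion)


for rate in range(4):
    d = gaussian_distortion(rate, variance=1.0)
    print(f"R = {rate} bit/sample -> D = {d:.4f} ({-10 * math.log10(d):.2f} dB below the variance)")
```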


• Constant Bit Rate (CBR) systems (G.711, G.722, G.729) are better suited to connection-oriented services.

• Variable Bit Rate (VBR) systems (MPEG, G.723.1) are better suited to networks that do not reserve a constant bit rate.

– MPEG compression is the most efficient and gives better quality, but it consumes a lot of CPU and introduces so much delay that it cannot be used in interactive applications (video conferencing or telephony).


• Video codec key issues:

– Compression efficiency and image quality

– Computational complexity

– Frame rate

[Figure: Encoder → Channel → Decoder]


• General-purpose compression: Entropy encoding

– Remove statistical redundancy from data

– E.g. encode common values with short codes, uncommon values with longer codes

– Good for text files, poor for images/video

[Figure: Source Data → Entropy Encoder → Channel → Entropy Decoder → Decoded Data]


• Add a model that attempts to represent the image/video signal in a form that can be easily compressed by the entropy encoder

• Model exploits the subjective redundancy of images and video (Spatial, Temporal, Chromatic redundancies)

• Decoded image may not be identical to original image

• Image properties that are useful for compression:

– Many of the pixels of a typical photographic image contain little or no "useful" detail (e.g. flat areas)

– The eye is insensitive to "high frequency" image information

[Figure: Image Model → Entropy Encoder → Channel → Entropy Decoder → Image Model]


• Trade-off Complexity/Quality/Bit Rate

• New technique may result in new trade-off

[Figure: quality, bit rate and complexity axes, positioning MPEG Layer 1, MPEG Layer 2, MPEG Layer 3, MPEG AAC, speech coding and other techniques]


Redundancies

• Statistical redundancy

– Interpixel redundancy: spatial (intraframe) redundancy, temporal (interframe) redundancy

– Coding redundancy: variable-length coding (Huffman, arithmetic), run-length coding, …

• Psychovisual redundancy (HVS)

– Luminance (contrast) masking, texture masking, color masking, frequency masking, temporal masking

Quality Measurements

• Objective

– Mean Square Error (MSE)

– Peak Signal-to-Noise Ratio (PSNR)

– Measure the fidelity to the original video

• Subjective

– Human Visual System (HVS) based

– Emphasize audiovisual quality rather than fidelity
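The two objective measures are straightforward to compute. A minimal sketch for 8-bit samples, assuming the usual definition PSNR = 10 log10(peak² / MSE) with peak = 255 (function names and the flat-list input format are choices made for this example):

```python
import math


def mse(original, decoded) -> float:
    """Mean square error between two equally long sequences of samples."""
    if len(original) != len(decoded) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr_db(original, decoded, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the signals are identical."""
    error = mse(original, decoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


reference = [139, 144, 149, 153]
degraded = [140, 143, 149, 155]
print(f"MSE = {mse(reference, degraded):.2f}, PSNR = {psnr_db(reference, degraded):.2f} dB")
```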

Quality Measurements

• Signal distortion alone is not a good measure of the performance of a lossy compression method; another method is necessary: the MOS (Mean Opinion Score) scale

• The five-grade CCIR impairment scale (Rec. 562)

– 1: unsatisfactory (very annoying)

– 2: poor (annoying)

– 3: satisfactory (slightly annoying)

– 4: good (perceptible but not annoying)

– 5: excellent (imperceptible)

• Example: double-blind test

Quality Measurements: Speech Coding, Compression vs Quality

[Figure: bit rate (kb/s, 0 to 64) versus MOS (Mean Opinion Score, 0 to 5) for PCM (G.711), ADPCM 32 (G.726), ADPCM 24 (G.726), ADPCM 16 (G.726), LD-CELP 16 (G.728), CS-ACELP 8 (G.729), CS-ACELP (G.729a), MP-MLQ 6.3 (G.723.1) and LPC 4.8; some coders are annotated as requiring special hardware (DSP)]

Standard              MOS
G.711 (64 kb/s)       4.10
G.729 (8 kb/s)        3.92
G.726 (32 kb/s)       3.85
G.729a (8 kb/s)       3.70
G.723.1 (5.3 kb/s)    3.65
G.728 (16 kb/s)       3.61

Audio & Video Basics


Audio Basics

• Analog signal sampled at constant rate

– telephone: 8,000 samples/sec

– CD music: 44,100 samples/sec

• Each sample is quantized, i.e., rounded

– e.g., 2^8 = 256 possible quantized values

• Each quantized value is represented by bits

– 8 bits for 256 values

– 16 bits for 65,536 values

• Mono, stereo, or surround?

– 1, 2 or more channels

• Example: 8,000 mono samples/sec, 256 quantized values → 64 kbps

• Receiver converts it back to an analog signal:

– some quality reduction

Example rates

• CD: 1.411 Mbps

• MP3: 96, 128, 160 kbps

• Internet telephony: 5.3 - 13 kbps (G.723.1, G.729, and GSM – Global System for Mobile communication)

Audio Basics: Speech Coding and Compression

• 5 quality ranges (human ear sensitivity: 20 Hz to 20 kHz):

Range                 Frequency bandwidth   Quality and applications
Telephone channel     300 Hz – 3.4 kHz      intelligible, natural-sounding but noisy speech
Expanded bandwidth    50 Hz – 7 kHz         speech with its natural character preserved
Hi-Fi bandwidth       20 Hz – 15 kHz        excellent speech and music
Stereo bandwidth      20 Hz – 20 kHz        CD quality
Stereo bandwidth      20 Hz – 48 kHz        perfect quality, studio, cinema, DVD

Video Basics

• Operation of analogue television: the image captured by the camera lens is converted into three monochrome images obtained by applying filters of the three fundamental (primary) colors: R (Red), G (Green), B (Blue).

– All colors are produced by mixing these primary colors in different proportions

• Additive color mixing on a black surface

• Subtractive color mixing on a white surface

– The correct combination of the three monochrome images reconstructs the original image.

– The RGB signals thus obtained are available in some cameras, though it is unusual to work with them directly

Video Basics: Digital Video & Pixels

• Digital video is a sequence of frames, each consisting of a rectangular grid of picture elements or pixels.

– For purely black-and-white video, each pixel is represented as a single bit, 0 for black or 1 for white.

– For grey-scale video, 8 bits per pixel can be used to represent 256 levels of grey … good enough for most cases.

– For good colour video, 8 bits are used per pixel for each of the RGB colours, resulting in 24 bits per pixel.

[Figure: image acquisition with a digital camera, the eye and film. Source: Digital Image Processing – Gonzalez, Woods. Prentice Hall]

Video Basics: Sampling & Quantization

[Figure: sampling and quantization. Source: Digital Image Processing – Gonzalez, Woods. Prentice Hall]

Video Basics: Scanning

• When an image (frame) appears on the retina of the human eye, the image is retained for several milliseconds before decaying.

• Consequently, if a sequence of images is displayed at the appropriate rate, the eye does not notice that it is looking at discrete images.

– This is how you get smooth motion in videos!

• What that rate is depends on the eye in question and how the images are displayed.

[Figure: spatial and temporal sampling of a video sequence. Source: H.264 and MPEG-4 Video Compression. Video Coding for next generation multimedia. I.E.G. Richardson. John Wiley & Sons, Ltd. 2003. Chapter 2.]

Video Basics: Color Format

• RGB is not efficient since it uses equal bandwidth for each color component.

• R, G, B components are correlated

– Transmitting R, G, B components separately is redundant

– More efficient use of bandwidth is desired

• To store or transmit video signals (a sequence of images, or frames, at a constant rate), the RGB signals are transformed into three linear combinations of these signals.

• The combination is performed such that:

– One of the new signals, Y, collects all the light or brightness information of the image; this signal is called luminance.

– The other two signals, called U and V, are different combinations of the three original signals, chosen so that they capture all the color information, which is why these two signals are generically referred to as chrominance.

• Various formulae have been devised to convert RGB values to chrominance and luminance values, depending on the format: YUV, YIQ, YCbCr, …

• Switching from RGB to YUV can be seen as a change of coordinate system that keeps the same number of degrees of freedom but makes the problem easier to solve.

• For backward compatibility, colour signals had to be receivable and watchable on a black-and-white set.

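Color Formats Conversion

• kr, kg, kb are weighting factors

$Y = k_r R + k_g G + k_b B$

$C_b = B - Y, \quad C_g = G - Y, \quad C_r = R - Y$

$k_r C_r + k_g C_g + k_b C_b = 0$ (since $k_r + k_g + k_b = 1$), so only two of the three colour differences need to be transmitted

$Y = k_r R + (1 - k_b - k_r) G + k_b B$

$C_b = \frac{0.5}{1 - k_b} (B - Y)$

$C_r = \frac{0.5}{1 - k_r} (R - Y)$

$R = Y + \frac{1 - k_r}{0.5} C_r$

$G = Y - \frac{2 k_b (1 - k_b)}{1 - k_b - k_r} C_b - \frac{2 k_r (1 - k_r)}{1 - k_b - k_r} C_r$

$B = Y + \frac{1 - k_b}{0.5} C_b$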

Color Formats Conversion

• ITU-R recommendation BT.601 defines kr = 0.299 and kb = 0.114.

$Y = 0.299 R + 0.587 G + 0.114 B$

$C_b = 0.564 (B - Y)$

$C_r = 0.713 (R - Y)$

$R = Y + 1.402 C_r$

$G = Y - 0.344 C_b - 0.714 C_r$

$B = Y + 1.772 C_b$
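The BT.601 conversion can be sketched in a few lines, starting from the weighting factors kr = 0.299 and kb = 0.114 rather than from the rounded coefficients. This is an illustrative floating-point version without the offsets and clipping of a real 8-bit pipeline; the function names are choices made for this example:

```python
KR, KB = 0.299, 0.114   # BT.601 weighting factors
KG = 1.0 - KR - KB      # green weight, since the three weights sum to 1


def rgb_to_ycbcr(r: float, g: float, b: float):
    """Forward conversion: luminance plus two scaled colour differences."""
    y = KR * r + KG * g + KB * b
    cb = 0.5 / (1.0 - KB) * (b - y)   # about 0.564 * (B - Y)
    cr = 0.5 / (1.0 - KR) * (r - y)   # about 0.713 * (R - Y)
    return y, cb, cr


def ycbcr_to_rgb(y: float, cb: float, cr: float):
    """Inverse conversion: recover R and B first, then G from the luminance equation."""
    r = y + (1.0 - KR) / 0.5 * cr     # about Y + 1.402 * Cr
    b = y + (1.0 - KB) / 0.5 * cb     # about Y + 1.772 * Cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


print(rgb_to_ycbcr(255, 0, 0))                 # pure red
print(ycbcr_to_rgb(*rgb_to_ycbcr(255, 0, 0)))  # round trip back to RGB
```

A grey pixel (R = G = B) gives Cb = Cr = 0, which is why a black-and-white receiver can simply use Y.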

http://www.yorku.ca/eye/photopik.htm

• The human eye is more sensitive to the luminance (brightness) component than to the color component: the latter need not be transmitted as accurately.

– The luminance is broadcast at the same frequency as a black-and-white signal, and the chrominance is ignored on black-and-white sets.

– The two chrominance signals are broadcast in narrow bands at higher frequencies.

• Called hue and saturation, or tint and colour

Video Basics: Chrominance Downsampling

• Reducing the resolution of the chroma components is called downsampling (subsampling).

• Subsampling relies on the human eye being less sensitive to chrominance than to luminance.

• (Y, Cr, Cb) may use different resolutions, written 4:n:m: the numbers indicate the relative sampling rate of each component in the horizontal direction.

• 4:4:4 sampling: the three components have the same resolution (3n bits per pixel, for n bits per sample)

– A sample of each component exists at every pixel position.

– The full fidelity of the chrominance components is preserved.

• 4:2:2 sampling: Cb and Cr have the same vertical resolution as Y, but half the horizontal resolution (2n bits per pixel).

– 4:2:2 video is used for high-quality color reproduction.

• 4:1:1 sampling: Cb and Cr have the same vertical resolution as Y, but a quarter of the horizontal resolution (1.5n bits per pixel).

• 4:2:0 sampling: Cb and Cr each have half the horizontal and vertical resolution of Y (1.5n bits per pixel).

– 4:2:0 video requires exactly half as many samples as 4:4:4 video

– 4:2:0 is widely used for consumer applications such as video conferencing.
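The bits-per-pixel figures quoted for each sampling pattern can be derived from how much each chroma plane is reduced. A minimal sketch; the divisor table and the 2x2 averaging filter used for 4:2:0 are assumptions of this example, since real encoders may use other downsampling filters:

```python
# (horizontal divisor, vertical divisor) applied to each of the two chroma planes
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:1:1": (4, 1),
    "4:2:0": (2, 2),
}


def bits_per_pixel(scheme: str, n: int = 8) -> float:
    """Average bits per pixel: one luma sample plus two reduced chroma samples."""
    h_div, v_div = CHROMA_DIVISORS[scheme]
    return n * (1 + 2 / (h_div * v_div))


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 neighbourhoods.

    The plane is a list of rows; even width and height are assumed.
    """
    return [
        [
            (plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]


for scheme in CHROMA_DIVISORS:
    print(scheme, bits_per_pixel(scheme), "bits per pixel")
```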

Video Basics: Spatial Resolution Formats

• CIF: Common Intermediate Format (also called Common Interchange Format), an intermediate format used in videoconferencing (communication between US & Europe)

– Luma resolution: 352x288 (360x288) pixels

– Frame rate: 30 Hz (30 frames/second, fps), non-interlaced, 4:2:0 sampling

• QCIF (Quarter CIF): 176x144 pixels, 30 fps, used in video telephony applications

• SQCIF (Sub-QCIF): 128x96 pixels, 30 fps, used in mobile multimedia applications

• 4CIF: 704x576 pixels, 30 fps, appropriate for standard-definition television and DVD-video

• 16CIF: 1408x1152 pixels, 50 fps

Spatial Resolution Formats

[Figure: relative frame sizes of SQCIF, QCIF, CIF, 4CIF, 16CIF 4:3 and 16CIF 16:9]

Video Basics: Spatial Resolution Formats

• SIF: Source Input Format (Source Intermediate Format), half the vertical and horizontal resolution of CCIR 601. Used in Video Cassette Recorders (VCRs)

– 360x242 (352x240) pixels, 30 frames/second for NTSC, 4:2:0 sampling

– 360x288 (352x288) pixels, 25 frames/second for PAL and SECAM, 4:2:0 sampling

• CCIR-601 (ITU-R 601 or BT.601)

– 720x525 pixels, 30 frames/second, 4:4:4 and 4:2:2 sampling

– 720x625 pixels, 25 frames/second, 4:4:4 and 4:2:2 sampling

MPEG, what is it?

International Organizations

• ISO (1947): International Organization for Standardization

• IEC (1906): International Electrotechnical Commission

• ISO/IEC JTC 1 (1987): Joint Technical Committee 1 of the ISO and the IEC. It deals with all matters of information technology.

• ITU-T: the Telecommunication Standardization Sector coordinates standards for telecommunications on behalf of the International Telecommunication Union (ITU). It was created in 1993 and succeeded the CCITT, founded in 1956.

International Organizations (Cont’d)

• JPEG (ITU-T T.81, ISO/IEC IS 10918-1): Joint Photographic Experts Group, one of two sub-groups of ISO/IEC Joint Technical Committee 1, Subcommittee 29, Working Group 1 (ISO/IEC JTC 1/SC 29/WG 1), titled "Coding of still pictures".

• MPEG: Moving Picture Experts Group (ISO/IEC JTC 1/SC 29/WG 11), a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video and related data.

• ITU-T SG15: H.26x – Videophone & Videoconference standards

• JVT: Joint Video Team, a group of video coding experts from ITU-T Study Group 16 (VCEG) and ISO/IEC JTC 1/SC 29/WG 11 (MPEG), created to develop an advanced video coding specification.

– Formed in 2001, the JVT’s main result has been ITU-T Rec. H.264 | ISO/IEC 14496-10, commonly referred to as H.264/MPEG-4-AVC, H.264/AVC, or MPEG-4 Part 10 AVC.

http://en.wikipedia.org/wiki/Joint_Photographic_Experts_Group

http://en.wikipedia.org/wiki/International_Organization_for_Standardization

http://en.wikipedia.org/wiki/International_Electrotechnical_Commission

http://en.wikipedia.org/wiki/ISO/IEC_JTC_1

MPEG: Moving Picture Experts Group

• Moving Picture Experts Group, established in 1988 for the development of digital video

– Still active (MPEG-21 is currently in development)

• International standard (ISO/IEC) → interoperability & economy of scale

• Compression of audio and video and multiplexing in a single stream

• Definition of the interface, not of the codecs → room for improvement

MPEG: Moving Picture Experts Group

• Official home page of the Moving Picture Experts Group (MPEG): http://www.chiariglione.org/mpeg/

• In charge of the development of standards for coded representation of digital audio and video and related data.

• The group produces standards that help the industry offer end users an ever more enjoyable digital media experience.

List of MPEG standards

• MPEG-1 (ISO 11172) The standard on which such products as Video CD and MP3 are based (approved in Nov. 1992)

– Video-oriented CD-ROM, SIF format (video progressive)

– Objective: VHS quality. Typical bit rate 1.5Mb/s

– Useful for tele-education, enterprise applications, business, etc.

http://mpeg.chiariglione.org/standards/mpeg-1/mpeg-1.htm




List of MPEG standards (Cont’d)

• MPEG-2 (ISO 13818): the standard on which such products as digital television set-top boxes and DVD are based (approved in 1994, 1996)

– Compatible 'upward' extension of MPEG-1

– Broadcast oriented (interlaced video)

– Multiple resolutions standardized, from SIF (compatible with MPEG-1) up to high-definition formats, for DVDs and so on.

– Intended for studio-quality audio and video, and also for broadcast-quality HDTV.

– Various bit rates, 4-100 Mb/s (CBR & VBR)

– Useful for all types of applications (business, entertainment, etc.).

• MPEG-3: originally designed for HDTV, finally covered by a reparameterization of MPEG-2.

List of MPEG standards (cont’d)

• MPEG-4 (ISO 14496): the standard for multimedia for the fixed and mobile web (Version 1 approved in Oct. 1998, Version 2 approved in Dec. 1999, Versions 3, 4, 5); computer graphics applications

– Originally intended for applications similar to those of H.263, but expanded to cover a wider range of multimedia applications.

– 'Downward' extension of MPEG-1, oriented towards Internet video

– Useful in the range 28.8-500 kb/s, with new compression algorithms. Typically less than 1 Mbps, but could be as high as tens of Mbps.


• MPEG-4 (ISO 14496) …

– Coding of Audiovisual Objects - Standard for audio, video and graphics in interactive 2D and 3D multimedia communication - MPEG-4 v.2 & 3

– Supports scene composition and content-based functionalities, in which scenes are expressed in terms of multiple audio-visual objects (AVOs) that can be manipulated together or individually.

– Supports layering/scaling: multiple versions of AVOs can be provided and matched against needs and available resources.

• For example, a base level AVO can be provided to give the bare essentials, with multiple optional AVOs that provide levels of enhancement details.

• If we don’t have enough network resources, drop the enhancements and stick with the basics!



• MPEG-7 (ISO 15938): the standard for description and search of audio and visual content (approved in Jul. 2001)

– Audiovisual content description (indexing, searching, databases, etc.). Interprets the semantics of audiovisual information

– More to do with structuring, describing and searching through multimedia content

• MPEG-21 (ISO 21000): the Multimedia Framework.

– Focus on multimedia distribution and on DRM aspects


• MPEG-A (23000) – Application-specific formats, integrating multiple MPEG technologies

• MPEG-B (23001) – Systems specific standards

• MPEG-C (23002) – Video specific standards

• MPEG-D (23003) – Audio specific standards

• MPEG-E (23004) – MPEG multimedia Middleware - support for the download and execution of multimedia applications

• MPEG-V (23005) – Context and media control - interchange with virtual worlds

• MPEG-M (23006) – MPEG extensible Middleware - packaging and reusability of MPEG technologies

• MPEG-U (23007) – MPEG Rich Media User Interface

List of ITU-T Standards

• H.261 (1983-1990)

– A standard for video telephony and video conferencing over the PSTN (Public Switched Telephone Network) and wireless networks.

– Uses either the CIF or QCIF format.

– Uses p x 64 kbps, where p can be between 1 and 30.

– Originally designed for ISDN (Integrated Services Digital Network) usage.

– Still in use

• Low complexity, low latency

• Mostly as a backward-compatibility feature

• Overtaken by H.263

List of ITU-T Standards (cont’d)

• H.263, H.263+, H.263++ (1993-1999)

– Based on H.261 but offers a significant improvement in coding efficiency; employs advanced coding options and lower resolutions to preserve quality over lower bit rate channels.

– Uses either the QCIF or SQCIF format.

– Uses less than 64 kbps.

– PSTN and mobile networks: 10 to 24 kbps

– Adopted by several videophone terminal standards: H.324 (PSTN), H.320 (ISDN), H.310 (B-ISDN)

• H.264/AVC (1999-2003)

– Doubles the coding efficiency in comparison with any other existing video coding standard

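Chronological Table of Video Coding Standards

Body                            Standard                  Date
ITU-T VCEG                      H.261                     1990
ISO/IEC MPEG                    MPEG-1                    1991
ISO/IEC MPEG and ITU-T VCEG     MPEG-2 (H.262)            1994/95
ITU-T VCEG                      H.263                     1995/96
ITU-T VCEG                      H.263+                    1997/98
ISO/IEC MPEG                    MPEG-4 v1                 1998/99
ISO/IEC MPEG                    MPEG-4 v2                 1999/00
ITU-T VCEG                      H.263++                   2000
ISO/IEC MPEG                    MPEG-4 v3                 2001
ISO/IEC MPEG and ITU-T VCEG     H.264 (MPEG-4 Part 10)    2002

[Figure: the same standards placed on a time axis from 1990 to 2003]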

Agenda

• Introduction

• Audio & Video compression principles

– Audio compression

– Video compression

– Audio/Video synchronisation

• Conclusion

Audio Compression principles

Speech Coding and Compression

• Waveform coding (PCM, DPCM, ADPCM)

– Coding of samples (G.711, G.721, G.722, G.723, G.725, G.726, …)

• Source coding

– Speech modelling and transmission of the model parameters (G.728, G.729, …)

• Hybrid coding

Audio compression

• By identifying what can and, more importantly, what cannot be heard, the schemes described obtain much of their compression by discarding information that cannot be perceived.

• Over the course of our evolutionary history we have developed limitations on what we can hear.

– Some of these limitations are physiological, based on the machinery of hearing.

– Others are psychological, based on how our brain processes auditory stimuli.

Audio Compression

• Sub-band Coding

– Techniques used in Layer I and II of MPEG audio are based on sub-band coding.

• Transform Coding

– The MDCT (modified DCT) is used in Layer III of MPEG audio.

• Predictive Coding

– Frequency prediction is used in AC-3 and MPEG AAC.

Common Audio Formats and Standards

• Pulse Code Modulation (PCM)

– Differential Pulse Code Modulation (DPCM)

– Adaptive Differential Pulse Code Modulation (ADPCM)

• Compact Disc Digital Audio (CD-DA)

• MPEG Audio

– Layer I

– Layer II

– Layer III

Audio compression

• Based on psycho-acoustics

• Compress the bit rate without affecting the quality perceived by the human ear (based on the imperfection of human hearing)

• Removal of irrelevancies

• 4 main principles :

– Threshold of audibility

– Frequency masking

– Critical bands

– Temporal masking

113

Audio compression

• Principle 1: Threshold of audibility

Not all frequency components need to be encoded with the same resolution:

$Nr\_bit(f) = \frac{(\text{signal}/\text{threshold})_{dB}}{6}$

http://www.audiodesignline.com
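The rule of roughly 6 dB per bit behind Principle 1 can be turned into a small helper. A sketch under stated assumptions: levels are given in dB, the result is rounded up to a whole number of bits, and a component at or below the threshold gets no bits at all (the rounding and the function name are choices made for this example, not part of the slide):

```python
import math


def bits_needed(signal_db: float, threshold_db: float) -> int:
    """Bits required so that quantization noise stays below the audibility threshold.

    Each bit buys about 6 dB of signal-to-noise ratio.
    """
    margin_db = signal_db - threshold_db
    if margin_db <= 0:
        return 0  # inaudible component: nothing to encode
    return math.ceil(margin_db / 6.0)


print(bits_needed(60, 0))    # component 60 dB above the threshold
print(bits_needed(10, 20))   # component below the threshold
```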

Audio compression

• Principle 2: Frequency masking

Analysis of the incoming signal

Audio compression

• Principle 3: Critical bands

– Human ear may be modelled as a collection of narrow band filters

– Bandwidth of these filters = critical band

– Critical bandwidth: less than 100 Hz for the lowest audible frequencies, about 4 kHz for the highest audible frequencies

– The human ear cannot distinguish between two sounds having two different frequencies within the same critical band. Example: when we hear 50 Hz and 60 Hz at the same time, we cannot distinguish them.

– Consequence: the noise masking threshold depends solely on the signal energy within a limited bandwidth. The loudest sound is taken as the representative of the critical band. → The signal must be analysed at 100 Hz resolution at low frequencies

Audio compression

• Principle 4: Temporal masking

The masking that occurs when a sound raises the audibility threshold for a brief interval preceding and following the sound. → Selection of the frame duration for frequency analysis and encoding.

The MPEG encoder


122

Audio features in MPEG

• MPEG-1:

– Mono/stereo/dual/joint stereo (Dolby Surround possible)

– Sampling frequencies: 32, 44.1 & 48 kHz

– 3 layers: trade-off between complexity/delay and coding efficiency

– Various bit rates: trade-off between quality and bit rate

• MPEG-2:

– 5.1 channels

– Sampling frequencies extended to 16, 22.05 & 24 kHz

Layer I coding

• The Layer I coding scheme provides a 4:1 compression.

• In Layer I coding the time frequency mapping is accomplished using a bank of 32 subband filters.

• The output of the subband filters is critically sampled. That is, the output of each filter is down-sampled by 32.

• The samples are divided into groups of 12 samples each.

– Twelve samples from each of the 32 subband filters, or a total of 384 samples, make up one frame of the Layer I coder.
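The Layer I framing numbers can be checked with a little arithmetic. A minimal sketch, assuming 32 equal-width subbands spanning 0 to half the sampling frequency (the constant and function names are illustrative):

```python
SUBBANDS = 32              # subband filters in the Layer I filter bank
SAMPLES_PER_SUBBAND = 12   # samples taken from each subband per frame


def frame_samples() -> int:
    """PCM samples covered by one Layer I frame."""
    return SUBBANDS * SAMPLES_PER_SUBBAND


def frame_duration_ms(sampling_hz: float) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_samples() / sampling_hz


def subband_width_hz(sampling_hz: float) -> float:
    """Width of each subband, assuming 32 equal bands from 0 to fs/2."""
    return sampling_hz / 2 / SUBBANDS


for fs in (32_000, 44_100, 48_000):
    print(f"fs = {fs} Hz: frame = {frame_duration_ms(fs):.2f} ms, "
          f"subband width = {subband_width_hz(fs):.1f} Hz")
```

At 48 kHz each subband is 750 Hz wide, far wider than the critical bands of about 100 Hz at low frequencies.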

Layer II Coding

• The Layer II coder provides a higher compression rate by making some relatively minor modifications to the Layer I coding scheme.

• The compression ratio in Layer II coding can be increased from 4:1 to between 6:1 and 8:1.

• These modifications include:

– how the samples are grouped together,

– the representation of the scale factors, and

– the quantization strategy.

Layer III Coding - MP3

• One of the problems with the Layer I and Layer II coding schemes was that with the 32-band decomposition, the bandwidth of the subbands at lower frequencies is significantly larger than the critical bands.

• This makes it difficult to make an accurate judgment of the mask-to-signal ratio.

– If we get a high-amplitude tone within a subband, and if the subband is narrow enough, we can assume that it masks other tones in the band.

– However, if the bandwidth of the subband is significantly larger than the critical bandwidth at that frequency, it becomes more difficult to determine whether other tones in the subband will be masked.


• Layer III offers almost CD quality with less than 2 bits/sample (enables transferring music files via Internet over 28.8kbps modems)

• A simple way to increase the spectral resolution would be to decompose the signal directly into a higher number of bands.

• However, one of the requirements on the Layer III algorithm is that it be backward compatible with Layer I and Layer II coders.

• To satisfy this backward compatibility requirement, the spectral decomposition in the Layer III algorithm is performed in two stages.

133


• First the 32-band subband decomposition used in Layer I and Layer II is employed.

• The output of each subband is transformed using a modified discrete cosine transform (MDCT) with a 50% overlap.

• The Layer III algorithm specifies two sizes for the MDCT, 6 or 18. This means that the output of each subband can be decomposed into 18 frequency coefficients or 6 frequency coefficients.
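The 50% overlap works because the aliasing introduced by one MDCT block is cancelled by its neighbour when the inverse transforms are added together. A minimal sketch of that property for the long block size (36 input samples, 18 coefficients); it deliberately omits the window functions that the actual Layer III algorithm applies, so it illustrates the principle and is not the standard's transform:

```python
import math


def mdct(frame):
    """Map 2N time samples to N coefficients (no window, for clarity)."""
    n_coeffs = len(frame) // 2
    return [
        sum(
            x * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n, x in enumerate(frame)
        )
        for k in range(n_coeffs)
    ]


def imdct(coeffs):
    """Map N coefficients back to 2N time samples that still contain aliasing."""
    n_coeffs = len(coeffs)
    return [
        sum(
            c * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)
        ) / n_coeffs
        for n in range(2 * n_coeffs)
    ]


N = 18  # coefficients per long block; each block spans 2N = 36 samples
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(3 * N)]

# Two consecutive blocks overlapping by 50% (N samples)
first = imdct(mdct(signal[0:2 * N]))
second = imdct(mdct(signal[N:3 * N]))

# Overlap-add: the aliasing terms cancel and the middle N samples are recovered
middle = [first[N + i] + second[i] for i in range(N)]
print(max(abs(middle[i] - signal[N + i]) for i in range(N)))
```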

Advanced Audio Coding

• AAC (Advanced Audio Coding): an audio compression format defined by the MPEG-2 standard.

• AAC was initially known as NBC (Non-Backward-Compatible) because it is not compatible with MPEG-1 audio formats.

• AAC can handle more channels than MP2 or MP3 (48 full audio channels and 16 low-frequency enhancement channels, compared with 5 full audio channels and 1 low-frequency enhancement channel for MP2 or MP3).

• AAC can handle higher sampling frequencies than MP3 (up to 96 kHz compared with 48 kHz).

Video Compression principles

Video Compression

• Two techniques are applied for video compression:

– Spatial or intraframe compression: removal of intra-picture redundancy within each frame, as in JPEG images

– Temporal or interframe compression: removal of inter-picture redundancy (between consecutive frames). Coding of the difference with an interpolated picture (motion vectors).

Video compression

• Result

– 4:2:0 SIF resolution: 30 Mbps (= 25 images/sec * 288 lines * 352 pixels * 1.5 (lum & chrom) * 8 bits) → about 1.2 Mbps (CBR) in Video CD (MPEG-1)

– 4:2:2 CCIR 601 resolution: 166 Mbps (= 25 images/sec * 576 lines * 720 pixels * 2 (lum & chrom) * 8 bits) → about 3-4 Mbps (mean) in MPEG-2

Image Codec (e.g. JPEG)

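[Figure: encoder chain Block → DCT → Quantize → Zigzag → RLE → VLC → Transmit/Store, followed by the decoder chain VLD → RLD → IZigzag → IQuantize → IDCT → IBlock; the stages are labelled as image model and entropy encoder/decoder]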

Blocks

• Process the data in blocks (sub-images) of 8x8 samples

• Convert Red-Green-Blue into luminance (grayscale) and chrominance (blue color difference and red color difference)

• Use half resolution for chrominance (because the eye is more sensitive to grayscale than to color)

• Each block contains redundant information.

Discrete Cosine Transform

• DCT transformation (in frequency domain) decorrelates the input signal.

• Transform each block of 8x8 samples into a block of 8x8 spatial frequency coefficients.

• Most image blocks only contain a few significant coefficients (usually the lowest "frequencies")

– Energy tends to be concentrated into a few significant coefficients (most energy in low spatial frequencies)

– Other coefficients are close to zero / insignificant

Discrete Cosine Transform

• Any 8x8 block of pixels can be represented as a sum of 64 basis patterns (black and white patterns)

• Output of the DCT is the set of weights for these basis patterns (the DCT coefficients)

– Multiply each basis pattern by its weight and add them together

– Result is the original image block

Quantize and zig-zag scanning

• Divide each DCT coefficient by an integer, discard remainder

• High spatial frequencies are quantized with lower resolution than low ones (removes irrelevancy). Result: loss of precision. Typically, only a few non-zero coefficients are left

• Scan quantized coefficients in a zig-zag order: non-zero coefficients tend to be grouped together
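The quantization and zig-zag steps above can be sketched as follows. The function names, the tiny 2x2 set of step sizes, and the test block are illustrative assumptions; real codecs use standardized quantization tables.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                     # s = row + col: one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked upwards (row decreasing), odd ones downwards.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def quantize(coeffs, steps):
    """Divide each coefficient by its step and discard the remainder."""
    return [[int(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, steps)]

def zigzag_scan(block):
    """Scan a quantized block in zig-zag order and cut the trailing zeros."""
    seq = [block[r][c] for r, c in zigzag_order(len(block))]
    while seq and seq[-1] == 0:
        seq.pop()
    return seq + ["EOB"]

# Hypothetical 2x2 corner of coefficients and step sizes, to show truncation.
print(quantize([[100, -37], [9, 3]], [[16, 11], [12, 12]]))   # [[6, -3], [0, 0]]

# Quantized block with a handful of low-frequency values, all others zero.
quantized = [[0] * 8 for _ in range(8)]
quantized[0][0], quantized[0][2] = 158, -1
quantized[1][0], quantized[1][1], quantized[2][0] = -1, -1, -1
scan = zigzag_scan(quantized)
print(scan)   # [158, 0, -1, -1, -1, -1, 'EOB']
```

The scan visits low frequencies first, so the non-zero values cluster at the start and the long tail of zeros collapses into a single end-of-block marker.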

143

ZigzagQuantize

144

Video compression

• Spatial redundancy reduction (DCT example)

Original 8x8 block:

139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160 160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158

After DCT:

1260 -1 -12 -5 2 -2 -3 1
-23 -17 -6 -3 -3 0 0 -1
-11 -9 -2 2 0 -1 -1 0
-7 -2 0 1 1 0 0 0
-1 -1 1 2 0 -1 1 1
2 0 2 0 -1 1 1 -1
-1 0 0 -1 0 2 1 -1
-3 2 -4 -2 2 1 -1 0

After quantisation:

158 0 -1 0 0 0 0 0
-1 -1 0 0 0 0 0 0
-1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

After zig-zag scan:

158 0 -1 -1 -1 -1 EOB

Run-Length Encoding

• Encode each coefficient value as a (run, level) pair:
– Run = number of zeros preceding value
– Level = non-zero value

• Usually, the block data is reduced to a short sequence of (run, level) pairs
– This is now easy to compress using an entropy encoder
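A minimal sketch of the (run, level) conversion described above, assuming the input is the 64 coefficients of one block already in zig-zag order. The function name and the "EOB" marker string are illustrative choices.

```python
def run_length_encode(seq):
    """Turn a zig-zag scanned coefficient list into (run, level) pairs."""
    pairs, run = [], 0
    for value in seq:
        if value == 0:
            run += 1                    # count zeros preceding the next value
        else:
            pairs.append((run, value))  # (zeros skipped, non-zero level)
            run = 0
    pairs.append("EOB")                 # trailing zeros are implied
    return pairs

# 64 coefficients: a few non-zero values followed by zeros only.
coefficients = [158, 0, -1, -1, -1, -1] + [0] * 58
pairs = run_length_encode(coefficients)
print(pairs)   # [(0, 158), (1, -1), (0, -1), (0, -1), (0, -1), 'EOB']
```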

145

RLE

Variable-Length Encoding

• Encode each (run, level) pair using a variable-length code
– Frequently occurring groups: assign a short code
– Infrequently occurring groups: assign a long code

• Result: compressed version of the image
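To illustrate "short codes for frequent pairs", the sketch below builds a Huffman code from the pair statistics of a made-up sequence. This is an assumption for illustration only: JPEG and MPEG coders normally rely on predefined code tables rather than building a code per image, and the input data here is invented.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get short codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:                         # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)        # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical (run, level) statistics: (0, 1) is by far the most common pair.
pairs = [(0, 1)] * 8 + [(1, -1)] * 4 + [(0, 5)] * 2 + [(3, 2)]
code = huffman_code(pairs)
total_bits = sum(len(code[p]) for p in pairs)
for pair, word in sorted(code.items(), key=lambda kv: len(kv[1])):
    print(pair, word)
print("total bits:", total_bits, "versus", 2 * len(pairs), "with a fixed 2-bit code")
```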

146

VLC

Image decoding

• Reverse the stages to recover the image

• Information was thrown away during quantization
– Decoded image will not be identical to the original

• In general: more compression = more quality loss

• Too much compression:
– Block edges start to show (“blockiness”)
– High-frequency patterns start to appear (“mosquito noise”)

147

Video coding

• Moving images contain significant temporal redundancy
– Successive frames are very similar

• Add an extra “motion model” at the “front end” of the image encoder

• The amount of data to be coded can be reduced significantly if the previous frame is subtracted from the current frame.

148

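Video Encoder

[Block diagram: video frames enter the motion model (motion estimation and motion compensation); the residual goes through DCT → Quantize → Zigzag → RLE → VLC, where it is combined with headers and motion vectors. A local decoding loop (Rescale → IDCT → Recon. → Buffer) supplies the reference frame to the motion model.]

Video Decoder

[Block diagram: VLD → RLD → IZigzag → Rescale → IDCT → Recon., with headers and motion vectors extracted from the stream and a buffer holding the reference frame.]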

Motion Estimation

• Process 16x16 luminance samples at a time (“macroblock”)

• Compare with neighboring area in previous frame

• Find closest matching area
– Prediction reference

• Calculate offset between current macroblock and prediction reference area
– Motion vector
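The steps above amount to block matching. The sketch below uses an exhaustive (full) search with the sum of absolute differences (SAD) as the matching criterion; both choices, the ±7 search range, and the synthetic frames are assumptions for illustration, since the slides do not prescribe a search strategy.

```python
import random

def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size areas."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def motion_estimate(cur, ref, cx, cy, size=16, search=7):
    """Full-search block matching; returns (dx, dy, best_sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:   # stay inside the frame
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best

# Synthetic test: random texture in the previous frame; the current frame is
# the same texture displaced by a known amount (with wrap-around at the edges).
rng = random.Random(0)
H = W = 48
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(y + 2) % H][(x - 3) % W] for x in range(W)] for y in range(H)]

dx, dy, cost = motion_estimate(cur, ref, cx=16, cy=16)
print("motion vector:", (dx, dy), "SAD:", cost)   # motion vector: (-3, 2) SAD: 0
```

The returned offset points from the current macroblock to its prediction reference in the previous frame; a SAD of zero means the difference macroblock would be empty.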

151

152

MotionEstimation

Motion Compensation

• Subtract the reference area from the current macroblock
– Difference macroblock

• Encode the difference macroblock with an image encoder

• If motion estimation was effective
– Little data left in difference macroblock
– More efficient compression

153

154

Motion Compensation

– In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best matching MB from the previously coded I or P frame - prediction.

– prediction error: The difference between the MB and its matching MB, sent to DCT and its subsequent encoding steps.

– The prediction is from a previous frame — forward prediction.


155

Motion Compensation

• MPEG introduces a third frame type — B-frames, and its accompanying bi-directional motion compensation.

– Each MB from a B-frame will have up to two motion vectors (MVs) (one from the forward and one from the backward prediction).

– If matching in both directions is successful, then two MVs will be sent and the two corresponding matching MBs are averaged before comparing to the Target MB for generating the prediction error.

– If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB will be used from either the forward or backward prediction.
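The averaging step for B-frame macroblocks can be sketched as below. The block values model a hypothetical fade (brightness rising from the previous to the next frame), and rounding the average upwards is an assumption of this sketch.

```python
def average_blocks(forward, backward):
    """Average the forward and backward matching blocks (rounded up)."""
    return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward, backward)]

def prediction_error(target, prediction):
    """Difference block that would be sent to the DCT."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]

def energy(block):
    return sum(v * v for row in block for v in row)

SIZE = 4  # small block instead of 16x16, to keep the example readable
prev_mb = [[100] * SIZE for _ in range(SIZE)]    # match in the previous I/P frame
next_mb = [[120] * SIZE for _ in range(SIZE)]    # match in the next I/P frame
target = [[110] * SIZE for _ in range(SIZE)]     # macroblock of the B-frame

forward_only = prediction_error(target, prev_mb)
bidirectional = prediction_error(target, average_blocks(prev_mb, next_mb))
print("forward-only error energy :", energy(forward_only))    # 1600
print("bidirectional error energy:", energy(bidirectional))   # 0
```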


156

B-frame Coding Based on Bidirectional Motion Compensation.


157

The Need for Bidirectional Search.

The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame because half of the ball was occluded by another object. A match however can readily be obtained from the next frame.


Motion Compensation

158

Video MPEG - Frame Types

• I (Intra): self-contained, only spatial compression (like JPEG)

• P (Predictive): references the preceding I or P frame. Temporal compression by extrapolation using macroblocks. A macroblock can be:

• Same: no change with respect to the reference frame

• Moved: (e.g. a ball in motion) described by a motion vector and, if needed, a correction (difference from the original)

• New: (e.g. what appears behind a door that opens) described by spatial compression (like an I-frame)

• B (Bidirectional): temporal compression with interpolation; references the preceding and the following I or P frame. Maximum compression, maximum computational complexity. It softens the image, reducing noise.

159

I Frames (Intra)

Intra frames are coded as self-contained,

without reference to other frames

25 frames per second

Sequence: I I I I I (18 KBytes each)

72 x 1024 x 8 / 0.16 = 3.7 Mbps (four 18-KByte frames every 0.16 s)

160

P frames (Predictive)

Predictive frames are encoded using motion compensation based on

previous I or P frame

Sequence: I (18 KB), P (6 KB), P (6 KB), I (18 KB), P (6 KB), P (6 KB), I (18 KB)

60 x 1024 x 8 / 0.24 = 2.0 Mbps

161

B frames (Bidirectional)

Bidirectional frames are encoded using motion compensation based on the nearest previous and subsequent I or P frames.

Presentation order (frame number: type, common size):

1: I, 18 KB
2: B, 4 KB
3: B, 4 KB
4: P, 6 KB
5: B, 4 KB
6: B, 4 KB
7: P, 6 KB
8: B, 4 KB
9: B, 4 KB
10: I, 18 KB

Transmission order: 1,4,2,3,7,5,6,10,8,9,…

54 x 1024 x 8 / 0.36 = 1.2 Mbps
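The three bit-rate figures quoted for I-only, I/P and I/P/B sequences follow from the common frame sizes (18 KB, 6 KB, 4 KB) at 25 frames per second. A short sketch of that arithmetic, with an illustrative helper name:

```python
FRAME_SIZE_KB = {"I": 18, "P": 6, "B": 4}   # common values quoted on the slides

def gop_bit_rate(pattern, fps=25):
    """Average bit rate (bits/s) of a repeating frame pattern such as 'IBBPBBPBB'."""
    total_bits = sum(FRAME_SIZE_KB[t] for t in pattern) * 1024 * 8
    duration = len(pattern) / fps           # seconds covered by the pattern
    return total_bits / duration

for pattern in ("IIII", "IPPIPP", "IBBPBBPBB"):
    print(f"{pattern:10s} {gop_bit_rate(pattern) / 1e6:.1f} Mbps")
```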

162

Bidirectional Motion Compensation

[Figure: frame sequence Intra 0, B1, B2, Pred 3, B4, B5, Pred 6, B7, B8, Intra 9, B10, B11, Pred 12]

16 x 16 bidirectional macroblocks can be coded as:
- Intra
- Forward
- Reverse
- Bidirectional

Group of Picture Structure

I-frames: for random access

intraframe coded; lowest compression

P-frames: predictive encoded

most recent I- or P- frame, medium compression

B-frames: interpolation

most recent & subsequent I- or P-frame, highest compression

163

Video compression

• Temporal redundancy reduction

I: Intra-coded picture
P: Predicted picture
B: Bi-directionally interpolated picture

[Figure: a sequence of I, B and P pictures numbered 0 to 9, shown in order of presentation and in order of transmission, with arrows for prediction and bi-directional prediction; the compression rate increases from I to P to B pictures]

164

Synchronisation - Getting data on time

• Synchronisation in the multimedia context refers to the mechanism that ensures a temporally consistent presentation of the audio-visual information to the user

• “On time”: not too late, not too early; no buffer overflow or underflow

• Flow control: not applicable in broadcasting

• Common time base and definition of a standard target decoder that describes the data consumption pattern of the receiver
– Remark: Direct MPEG (Microsoft) does not use time information for clock recovery but relies on flow control

165

Streams

• Idea of continuity (pipelining): Carry time information for clock recovery

• No flow control (allows broadcasting): The emitter must have a precise knowledge of the receiver data consumption pattern (explicit in MPEG STD)

• Just-in-time: Shorter delay and smaller buffer size than with flow control

• Two aspects in synchronisation :Clock recovery & timing control (model & buffering)

166

Requirements for stream transport

• Data information: BER (Bit Error Rate) requirement; no retransmission of frames is possible, hence FEC (Forward Error Correction)

• Time information: no jitter

167

Agenda

• Introduction

• A/V Compression standards
– The MPEG model and its situation in a communication context
– JPEG & MJPEG
– H.261 & MPEG-1
– H.263 & MPEG-2
– Videoconference

• Conclusion

168

MPEG Versions

• MPEG-1
– For video storage on CD-ROM & transmission over T-1 lines (1.5 Mbps)

• MPEG-2
– Many options: 352x240 pixel; 720x480 pixel; 1440x1152 pixel; 1920x1080 pixel
– Many profiles (set of coding tools & parameters)
• Main Profile
– I, P & B frames; 720x480 conventional TV
– Very good quality @ 4-6 Mbps

• MPEG-4
– <64 kbps to 4 Mbps
– Designed to enable viewing, access & manipulation of objects, not only pixels
– For digital TV, streaming video, mobile multimedia & games

169

MPEG Coding Standard

• Moving Picture Experts Group (MPEG)
– Video and audio compression & multiplexing
– Video display controls: fast forward, reverse, random access

• Elements of encoding
– Intra- and inter-frame coding using DCT
– Bidirectional motion compensation
– Group of Picture structure
– Scalability options

• MPEG only standardizes the decoder

170

Video H.26x

• ITU-T video standards for video conferencing: low bit rate, little change between frames (less motion than in movies).
– H.261: developed in the late 1980s for ISDN (constant bit rate).
– H.263, H.263+, H.264: more modern and efficient.

• Simplified MPEG compression algorithms:
– More restricted motion vectors (less motion)
– In H.261: no B frames (excessive latency and complexity)

• Less CPU intensive; a real-time software codec is feasible

171

Video H.26x (Cont’d)

• Subsampling 4:2:0

• Resolutions:
– CIF (Common Intermediate Format): 352 x 288
– QCIF (Quarter CIF): 176 x 144
– SCIF (Super CIF): 704 x 576

• Independent Audio: G.722 (quality), G.723.1, G.728, G.729

• Audio-video synchronization using H.320 (ISDN) and H.323 (Internet)

172

The MPEG model

[Block diagram: captured audio and video signals → audio encoder and video encoder → multiplexer → digital storage medium or network (transmission channel) → demultiplexer → audio decoder and video decoder → presented audio and video signals]

174

Components of the MPEG standard

• The MPEG standard is composed of 3 main parts:
– Audio: specifies the compression of audio signals
– Video: specifies the compression of video signals
– System: specifies how the compressed audio and video signals are combined in the multiplexed stream (program stream or transport stream).

• Each part specifies:
– The bitstream syntax
– The timing requirement and the related information (bit rate, buffer needs)

178

MPEG in a communication context

• A simple view of MPEG in the communication context

[Block diagram: audio and video sources → audio encoder and video encoder (MPEG2 compression layer) → ES (Elementary Streams) → MPEG2 system layer, with PS multiplexing producing a PS (Program Stream, 1 program) and TS multiplexing producing a TS (Transport Stream, n programs) → adaptation to the channel (DVB, DVD ...) → disc, satellite or cable]

JPEG & MJPEG

179

180

JPEG Coding Standard

• Key Components:
– Transform:
• 8×8 DCT
• boundary padding
– Quantization:
• uniform quantization
• DC/AC coefficients
– Coding:
• Zigzag scan
• run length/Huffman coding

181

JPEG Baseline Coder

Tour Example

183 160  94 153 194 163 132 165
183 153 116 176 187 166 130 169
179 168 171 182 179 170 131 167
177 177 179 177 179 165 131 167
178 178 179 176 182 164 130 171
179 180 180 179 183 169 132 169
179 179 180 182 183 170 129 173
180 179 181 179 181 170 130 169

182

Step 1: Transform

• DC level shifting (subtract 128 from every sample)

• 2D DCT

Original block:

183 160  94 153 194 163 132 165
183 153 116 176 187 166 130 169
179 168 171 182 179 170 131 167
177 177 179 177 179 165 131 167
178 178 179 176 182 164 130 171
179 180 180 179 183 169 132 169
179 179 180 182 183 170 129 173
180 179 181 179 181 170 130 169

After level shifting (-128):

55 32 -34 25 66 35 4 37
55 25 -12 48 59 38 2 41
51 40  43 54 51 42 3 39
49 49  51 49 51 37 3 39
50 50  51 48 54 36 2 43
51 52  52 51 55 41 4 41
51 51  52 54 55 42 1 45
52 51  53 51 53 42 2 41

After DCT:

313  56 -27  18  78 -60  27 -27
-38 -27  13  44  32  -1 -24 -10
-20 -17  10  33  21  -6 -16  -9
-10  -8   9  17   9 -10 -13   1
 -6   1   6   4  -3  -7  -5   5
  2   3   0  -3  -7  -4   0   3
  4   4  -1  -2  -9   0   2   4
  3   1   0  -4  -2  -1   3   1

183

Step 2: Quantization

DCT coefficients:

313  56 -27  18  78 -60  27 -27
-38 -27  13  44  32  -1 -24 -10
-20 -17  10  33  21  -6 -16  -9
-10  -8   9  17   9 -10 -13   1
 -6   1   6   4  -3  -7  -5   5
  2   3   0  -3  -7  -4   0   3
  4   4  -1  -2  -9   0   2   4
  3   1   0  -4  -2  -1   3   1

Q-table:

16 11 10 16  24  40  51  61
12 12 14 19  26  58  60  55
14 13 16 24  40  57  69  56
14 17 22 29  51  87  80  62
18 22 37 56  68 109 103  77
24 35 55 64  81 104 113  92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103  99

Why increase from top-left to bottom-right?

Quantized coefficients (Q):

20  5 -3  1  3 -2  1  0
-3 -2  1  2  1  0  0  0
-1 -1  1  1  1  0  0  0
-1  0  0  1  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

184

Step 3: Entropy Coding

Zigzag Scan

20  5 -3  1  3 -2  1  0
-3 -2  1  2  1  0  0  0
-1 -1  1  1  1  0  0  0
-1  0  0  1  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

(20,5,-3,-1,-2,-3,1,1,-1,-1,0,0,1,2,3,-2,1,1,0,0,0,0,0,0,1,1,0,1,EOB)

End Of the Block: All following coefficients are zero
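The three steps of the tour example (level shift and DCT, quantization, zig-zag scan) can be chained in a short sketch. The pixel values and the quantization table below are read from the tour example figures; the rounding rule (Python's `round`) is an assumption, so an individual quantized value may differ by one from the slide where a coefficient sits close to a rounding boundary.

```python
import math

PIXELS = [
    [183, 160,  94, 153, 194, 163, 132, 165],
    [183, 153, 116, 176, 187, 166, 130, 169],
    [179, 168, 171, 182, 179, 170, 131, 167],
    [177, 177, 179, 177, 179, 165, 131, 167],
    [178, 178, 179, 176, 182, 164, 130, 171],
    [179, 180, 180, 179, 183, 169, 132, 169],
    [179, 179, 180, 182, 183, 170, 129, 173],
    [180, 179, 181, 179, 181, 170, 130, 169],
]
QTABLE = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block (direct form)."""
    n = len(block)
    def c(k):
        return math.sqrt((1 if k == 0 else 2) / n)
    return [[c(u) * c(v) * sum(block[x][y]
                               * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                               * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                               for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def zigzag(block):
    """Zig-zag scan with trailing zeros replaced by an end-of-block marker."""
    n = len(block)
    seq = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        seq.extend(block[r][col] for r, col in (reversed(diag) if s % 2 == 0 else diag))
    while seq and seq[-1] == 0:
        seq.pop()
    return seq + ["EOB"]

shifted = [[p - 128 for p in row] for row in PIXELS]          # Step 1a: level shift
coeffs = dct_2d(shifted)                                      # Step 1b: 2D DCT
quantized = [[round(cf / q) for cf, q in zip(c_row, q_row)]   # Step 2: quantization
             for c_row, q_row in zip(coeffs, QTABLE)]
scan = zigzag(quantized)                                      # Step 3: zig-zag scan
print("DC coefficient:", round(coeffs[0][0]))
print(scan)
```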

185

Video M-JPEG (Motion JPEG)

• The simplest approach: treat the video as a sequence of JPEG photos, without taking advantage of redundancy between frames.

• DCT Algorithm (Discrete Cosine Transform)

• Less efficient, but low delay.

• Used in:
– Some digital recording systems and nonlinear editing (each frame can be edited independently)
– Some videoconferencing systems (low delay).

• It does not include standard audio support. The audio has to be encoded by some other means (e.g. CD-DA) and synchronized by non-standard mechanisms.

186

H.261 & MPEG1

187187

H.261 Coding Standard

• Background:
– Facilitate video conferencing and videophone service over ISDN
– p×64 kbps
• p=1: videophone
• p>5: videoconference
• p=30: VHS-quality
– Basis of MPEG-1 and MPEG-2

• Features
– Maximum coding delay of 150 ms
– Amenable to low-cost VLSI implementation

188

Input Image Formats

                        CIF                QCIF
# of pels/line (Y)      360 (352)          180 (176)
# of pels/line (U/V)    180 (176)          90 (88)
# of lines/pic (Y)      288                144
# of lines/pic (U/V)    144                72
Interlacing             1:1                1:1
Temporal rate           30, 15, 10, 7.5    30, 15, 10, 7.5
Aspect ratio            4:3                4:3

189189

Video Multiplex

• It defines a data structure so that a decoder can interpret the received bit stream without any ambiguity

• Hierarchical data structure
– Picture layer
– Group of blocks (GOB) layer
– Macroblock (MB) layer
– Block layer

– Each layer has a distinct header

190190

Picture and GOB Layers

• Picture layer consists of picture header followed by the data for GOBs

– Picture header contains data such as picture format (CIF or QCIF)

• GOB layer is always composed of 33 MBs

– GOB header contains a MB address and compression mode followed by the data for the blocks

191

Macroblock and Block Layers

Macroblock: the smallest unit to select the compression mode

Y1 Y2
Y3 Y4
Cr Cb

A MB always consists of 6 blocks (Y1 – Y4, Cr, Cb)

MBA MTYPE MQUANT MVD CBP Block Data

192192

Compression Modes

• Intra Mode
– Similar to JPEG coding
– Supports two compression modes

• Inter Mode
– ME is not specified (MC is optional)
– Usually, 16-by-16 BMA, integer-pel accuracy, search range [-15,15]
– Supports various compression modes

197

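H.261 Encoder

[Block diagram: each 8x8 block passes through an intra/inter switch, DCT, quantizer (Q) and Huffman VLC to the p x 64 output, with CRC error and fixed-length control. The prediction loop consists of the inverse quantizer (Q-1), I-DCT, frame memory, motion estimation (producing the motion vector) and a filter.]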

• Intended for videoconferencing applications

• Bit rates = p x 64 kbps, p = 2, 6, 24 common

198

MPEG-1

• MPEG-1 adopts the SIF (Source Input Format) digital TV format.

• MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
– 352×240 for NTSC video at 30 fps
– 352×288 for PAL video at 25 fps
– It uses 4:2:0 chroma subsampling

• The MPEG-1 standard is also referred to as ISO/IEC 11172. It has five parts:
– 11172-1 Systems
– 11172-2 Video
– 11172-3 Audio
– 11172-4 Conformance
– 11172-5 Software

198

200

Hierarchical Data Structure

• Sequences are formed by Group Of Pictures (GOP)

• GOP are made up of pictures (frames)

• Pictures consist of slices

• Slices are made up of macro-blocks (MB)

• Macro-blocks consist of blocks

• Blocks are 8×8 pixels arrays

201

Layers of MPEG-1 Video Bitstream.

201

Hierarchical Data Structure

Example of temporal picture structure

202

203

Slices in an MPEG-1 Picture.

203

204

Video MPEG (MPEG-1)

• Subsampling 4:2:0 (25% more savings than 4:2:2)

• Two possible formats:
– SIF (Source Input Format), in PAL (396 MBs): Y: 352x288 pixels; Cr & Cb: 176x144 pixels
– QSIF (Quarter SIF) (99 MBs): Y: 176x144 pixels; Cr & Cb: 88x72 pixels

• Two compression types (used simultaneously):
– Spatial: as in JPEG
– Temporal: takes advantage of the similarity between each frame and those around it.

206

MPEG-1 Video

• Typical Sequence (360ms): I1 B2 B3 P4 B5 B6 P7 B8 B9 I10

• Order of encoding / decoding : I1 P4 B2 B3 P7 B5 B6 I10 B8 B9

• Typical size of frames (SIF, 352x288):

– I: 18kBytes (7:1)

– P: 6kBytes (20:1)

– B: 2.5 - 4kBytes (50:1)

– Average bit rate (IBBPBBPBBI): 1.2Mbps

– With QSIF the bit rate is reduced to 300kbps

• Compression latency (typical values):
– M-JPEG: 45 ms
– MPEG with I frames only: 200 - 400 ms
– MPEG with I & P frames: 200 - 500 ms
– MPEG with I, P & B frames: 400 - 850 ms
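The relation between the typical display sequence and the order of encoding/decoding listed above can be sketched as a simple reordering: each B frame is held back until the I or P frame that follows it has been sent. The function name is an illustrative choice.

```python
def transmission_order(display):
    """Reorder frames so every B frame follows both of its reference frames."""
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)      # wait for the next I or P reference
        else:
            out.append(frame)            # the reference frame goes first...
            out.extend(pending_b)        # ...then the B frames that needed it
            pending_b = []
    return out + pending_b

display = "I1 B2 B3 P4 B5 B6 P7 B8 B9 I10".split()
print(" ".join(transmission_order(display)))
# I1 P4 B2 B3 P7 B5 B6 I10 B8 B9
```

This reordering is one reason why sequences with B frames show the highest latency in the list above: the encoder has to wait for a later frame before it can code the B frames.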

210

MB Types in MPEG-I

I-pictures    P-pictures    B-pictures
Intra         Intra         Intra
Intra-A       Intra-A       Intra-A
              Inter-D       Inter-F
              Inter-DA      Inter-FD
              Inter-F       Inter-FDA
              Inter-FD      Inter-B
              Inter-FDA     Inter-BD
              Skipped       Inter-BDA
                            Inter-I
                            Inter-ID
                            Inter-IDA
                            Skipped

A – adaptive quantization
F – forward prediction with MC
D – DCT of prediction error will be coded
B – backward prediction with MC
I – interpolated prediction with MC

214

Audio MPEG-1

• Mono or stereo, sampled at 32, 44.1 (CD) or 48 (DAT) kHz. At reduced bit rates, sampling at 32 kHz is preferable.

• Lossy, asymmetric psychoacoustic compression.

• From 32 to 448 kbps per audio channel

• Three layers in ascending order of complexity/quality:
– Layer I: good quality at 192-256 kbps per channel; not used in practice
– Layer II: CD quality at 96-128 kbps per channel
– Layer III: CD quality at 64 kbps per channel

• Each layer introduces new algorithms and includes those of the preceding layers.

• Layer II is used in DAB (Digital Audio Broadcasting); Layer III is the MP3 format

216

MPEG-1 System

• Responsible for ensuring the synchronization between audio and video by means of time stamps based on a 90 kHz clock.

• Only necessary when audio and video are used together (not for MP3 streams, for example)

• Requires only a small bit rate (5-50 kbps)
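A small sketch of what a 90 kHz time base means in practice: frame instants are expressed as integer tick counts, and the audio/video offset is the difference between two such counts. The helper names and the sample values are illustrative assumptions, not part of the MPEG-1 syntax.

```python
CLOCK_HZ = 90_000   # system time base quoted above

def ticks_per_frame(fps):
    """Number of 90 kHz clock ticks between two consecutive frames."""
    return CLOCK_HZ / fps

def video_timestamps(num_frames, fps=25):
    """Presentation instants of the first frames, in clock ticks."""
    step = ticks_per_frame(fps)
    return [round(n * step) for n in range(num_frames)]

def av_offset_ms(video_ts, audio_ts):
    """Positive result: video is presented later than audio."""
    return (video_ts - audio_ts) * 1000 / CLOCK_HZ

print(video_timestamps(4))          # [0, 3600, 7200, 10800]
print(ticks_per_frame(30))          # 3000.0
print(av_offset_ms(7200, 6300))     # 10.0
```

Both 25 and 30 frames per second divide 90,000 exactly, so frame instants fall on whole ticks at either rate.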

217

Synchronization of audio and video MPEG

[Block diagram: the analog audio signal and the analog video signal enter the audio encoder and the video encoder, both driven by a 90 kHz clock; the resulting digital audio and video streams, each carrying time stamps, are combined by the system multiplexer into the MPEG-1 stream]

During decoding the reverse process is performed

Prototypical Decoder ISO/IEC 11172

219

220

Major Differences from H.261

• Source formats supported:
– H.261 only supports CIF (352 × 288) and QCIF (176 × 144) source formats; MPEG-1 supports SIF (352 × 240 for NTSC, 352 × 288 for PAL).
– MPEG-1 also allows specification of other formats as long as the Constrained Parameter Set (CPS) shown in the following table is satisfied:

The MPEG-1 Constrained Parameter Set

220

Parameter Value

Horizontal size of picture ≤ 768

Vertical size of picture ≤ 576

No. of MBs / picture ≤ 396

No. of MBs / second ≤ 9,900

Frame rate ≤ 30 fps

Bit-rate ≤ 1,856 kbps

225

MPEG-I vs. H.261

H.261                                        MPEG-1
Sequential access                            Random access
One basic frame rate                         Flexible frame rate
CIF and QCIF images only                     Flexible image size
I and P frames only                          I, P and B frames
MC over 1 frame                              MC over 1 or more frames
Integer-pel MV accuracy                      Half-pel MV accuracy
Spatial filtering in the loop                No filter
Variable threshold + uniform quantization    Quantization matrix
No GOP structure                             GOP structure
GOB structure                                Slice structure

226

H.263/H.263+ & MPEG2

227

Video Codecs: H.263

• Frame-based coding

• Low bit rate coding:
– < 64 kbps (typical)

• H.261 coding with improvements
– I/P/B frames
– Additional image formats: 4CIF, 16CIF

• Suitable for desktop video conferencing over low-speed links

230

H.263 Baseline Coding Algorithm

• Video Frame Structure
– Supports sub-QCIF, QCIF, CIF, 4CIF and 16CIF

• Video Coding Tools
– Motion estimation and compensation
• range: [-16, 15.5], accuracy: half-pel
– Transform: 8×8 DCT
– Quantization: Q factor, $c^{q}_{m,n} = c_{m,n} / Q, \quad m, n = 0, \dots, 7$
– Entropy Coding: 3D VLC (LAST, RUN, LEVEL)

• Coding Control
– Intra/Inter switch

231

Advanced Coding Modes in H.263

• Unrestricted motion vector mode
– range: [-31.5, 31.5]
– Allows MVs to point outside the picture boundaries

• Syntax-based arithmetic coding mode
– About 5% savings over VLC

• Advanced prediction mode
– Overlapped Block Motion Compensation (OBMC)

• PB-frame mode
– I B P B P …

235

H.263+

• Advanced intra coding mode

• Deblocking filter mode

• Slice structure mode

• Supplemental enhancement information mode

• Improved PB-frame mode

• Reference picture selection mode

• Temporal, SNR and Spatial scalability mode

• Reference picture resampling mode

• Reduced resolution update mode

• Independently segmented decoding mode

• Alternative Inter VLC mode

• Modified quantization mode

244

MPEG-2

• MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps.

• Defines seven profiles aimed at different applications (toolboxes):
– Simple profile (no B pictures)
– Main profile (= MPEG-1 + interlaced; does not support scalability)
– SNR scalable profile (allows graceful degradation: noise improvement at the same resolution)
– Spatial scalable profile (hierarchical coding: improvement at higher resolution)
– High profile
– 4:2:2 profile
– Multiview profile

244

246

Video MPEG-2

• Compatible extension of MPEG-1

• Designed for digital TV:– Optimized for transmission, not storage

– Provides interlaced video (TV) as well as progressive (MPEG-1 was only progressive)

• MPEG-2 defines four levels, according to the values of the sampling parameters used:
– Low: 352x288 (supports MPEG-1)

– Main: 720x576 (equivalent CCIR 601)

– High-1440: 1440x1152 (HDTV 4:3)

– High: 1920x1152 (HDTV 16:9)

247

Profiles and Levels in MPEG-2

Level        Simple   Main   SNR Scalable   Spatially Scalable   High   4:2:2   Multiview
High         -        *      -              -                    *      -       -
High 1440    -        *      -              *                    *      -       -
Main         *        *      *              -                    *      *       *
Low          -        *      *              -                    -      -       -

Four Levels in the Main Profile of MPEG-2

Level        Max. Resolution   Max fps   Max pixels/sec   Max coded Data Rate (Mbps)   Application
High         1,920 × 1,152     60        62.7 × 10^6      80                           film production
High 1440    1,440 × 1,152     60        47.0 × 10^6      60                           consumer HDTV
Main         720 × 576         30        10.4 × 10^6      15                           studio TV
Low          352 × 288         30        3.0 × 10^6       4                            consumer tape equiv.

248

Bit rates of Levels and Profiles MPEG-2

Profiles                        Simple     Main       SNR Scalability   Spatial Scalability   High        4:2:2 (Studio)
Subsampling                     4:2:0      4:2:0      4:2:0             4:2:0                 4:2:0/2     4:2:2
Levels:
High 1920x1152 (HDTV 16:9)      -          80 Mbps    -                 -                     100 Mbps    -
High-1440 1440x1152 (HDTV 4:3)  -          60 Mbps    -                 60 Mbps               80 Mbps     -
Main 720x576 (CCIR 601)         15 Mbps    15 Mbps    15 Mbps           -                     20 Mbps     50 Mbps
Low 352x288 (MPEG1)             -          4 Mbps     4 Mbps            -                     -           -

The table shows the peak rate allowed by the standard for each combination of profile and level.

249

Five Modes of Predictions

• MPEG-2 defines Frame Prediction and Field Prediction as well as five prediction modes:

1. Frame Prediction for Frame-pictures: Identical to MPEG-1 MC-based prediction methods in both P-frames and B-frames.

2. Field Prediction for Field-pictures: A macroblock size of 16×16 from Field-pictures is used.

249

250

3. Field Prediction for Frame-pictures: The top-field and bottom-field of a Frame-picture are treated separately. Each 16×16 macroblock (MB) from the target Frame-picture is split into two 16×8 parts, each coming from one field. Field prediction is carried out for these 16×8 parts.

4. 16×8 MC for Field-pictures: Each 16×16 macroblock (MB) from the target Field-picture is split into top and bottom 16×8 halves. Field prediction is performed on each half. This generates two motion vectors for each 16×16 MB in the P-Field-picture, and up to four motion vectors for each MB in the B-Field-picture.

This mode is good for a finer MC when motion is rapid and irregular.

250


251

5. Dual-Prime for P-pictures: First, Field prediction from each previous field with the same parity (top or bottom) is made. Each motion vector mv is then used to derive a calculated motion vector cv in the field with the opposite parity, taking into account the temporal scaling and vertical shift between lines in the top and bottom fields. For each MB the pair mv and cv yields two preliminary predictions. Their prediction errors are averaged and used as the final prediction error.

This mode mimics B-picture prediction for P-pictures without adopting backward prediction (and hence with less encoding delay).

This is the only mode that can be used for either Frame-pictures or Field-pictures.

251


252

Supporting Interlaced Video

• MPEG-2 must support interlaced video as well since this is one of the options for digital broadcast TV and HDTV.

• In interlaced video each frame consists of two fields, referred to as the top-field and the bottom-field.

– In a Frame-picture, all scanlines from both fields are interleaved to form a single frame, then divided into 16×16 macroblocks and coded using MC.

– If each field is treated as a separate picture, then it is called Field-picture.

252

259

Audio MPEG-2

• Algorithms:
– Version compatible with MPEG-1 Layer I, II and III
– Improved compression system: Advanced Audio Coding (AAC). Quality comparable to MPEG-1 Layer III at 50-70% of the bit rate. Not compatible with MPEG-1.

• Channels:
– Stereo version compatible with MPEG-1
• Independent (each channel coded separately)
• Joint (exploits redundancy between channels)
– Support for multi-channel (languages) and 5.1 (5 surround channels)

261

MPEG-2 Scalabilities

• The MPEG-2 scalable coding: A base layer and one or more enhancement layers can be defined — also known as layered coding.

– The base layer can be independently encoded, transmitted and decoded to obtain basic video quality.

– The encoding and decoding of the enhancement layer is dependent on the base layer or the previous enhancement layer.

• Scalable coding is especially useful for MPEG-2 video transmitted over networks with the following characteristics:
– Networks with very different bit-rates.
– Networks with variable bit rate (VBR) channels.
– Networks with noisy connections.

261

262

MPEG-2 Scalabilities (Cont’d)

• MPEG-2 supports the following scalabilities:

1. SNR Scalability—enhancement layer provides higher SNR (Different levels of quality), base/enhancement layer uses a coarse/fine quantizer for DCT coefficients.

2. Spatial Scalability — enhancement layer provides higher spatial resolution (Different resolutions), base/enhancement layer is a low/high spatial resolution of the video.

3. Temporal Scalability—enhancement layer facilitates higher frame rate (Different frame rates), allow the decodability at different frame rates.

4. Hybrid Scalability — combination of any two of the above three scalabilities.

5. Data Partitioning — quantized DCT coefficients are split into partitions (Separate headers and payloads apart).

• Limited scalability capabilities: Three layers only

262

264

Non-Scalable

Non-scalable Bit stream

Decoder 1 Decoder 2 Decoder 3

265

Spatial Scalability

Scalable bit stream

Decoder 1

Decoder 4Decoder 3

Decoder 2

268

PSNR Scalability (Quality)

Scalable Bit stream

Decoder 1 Decoder 2 Decoder 3

272272

Temporal scalability

[Figure: one scalable bit stream (1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0) decoded at three frame rates]

30 Hz: Frame 0,1,2,3,4,5,…
15 Hz: Frame 0,2,4,6,8,…
7.5 Hz: Frame 0,4,8,12,…
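The frame subsets shown for the three decoding rates can be reproduced with a simple selection rule. This sketch only models which frame numbers each decoder presents; it says nothing about how the layers are actually coded, and the function name is an illustrative choice.

```python
def frames_for_rate(num_frames, full_rate_hz, target_hz):
    """Frame numbers presented by a decoder running at target_hz."""
    step = int(full_rate_hz / target_hz)   # keep every step-th frame
    return list(range(0, num_frames, step))

for rate in (30, 15, 7.5):
    print(f"{rate:>4} Hz:", frames_for_rate(13, 30, rate))
```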

276

Hybrid Scalability

• Any two of the above three scalabilities can be combined to form hybrid scalability:

1. Spatial and Temporal Hybrid Scalability.
2. SNR and Spatial Hybrid Scalability.
3. SNR and Temporal Hybrid Scalability.

• Usually, a three-layer hybrid coder will be adopted, which consists of:
– Base Layer
– Enhancement Layer 1
– Enhancement Layer 2

276

277

Data Partitioning

• The Base partition contains lower-frequency DCT coefficients, enhancement partition contains high-frequency DCT coefficients.

• Strictly speaking, data partitioning is not layered coding, since a single stream of video data is simply divided up and there is no further dependence on the base partition in generating the enhancement partition.

• Useful for transmission over noisy channels and for progressive transmission.

277

278

Major Differences from MPEG-1

• Better resilience to bit-errors: In addition to Program Stream, a Transport Stream is added to MPEG-2 bit streams.

• Support of 4:2:2 and 4:4:4 chroma subsampling.

• More restricted slice structure: MPEG-2 slices must start and end in the same macroblock row. In other words, the left edge of a picture always starts a new slice and the longest slice in MPEG-2 can have only one row of macroblocks.

• More flexible video formats: It supports various picture resolutions as defined by DVD, ATV and HDTV.

278

279

Major Differences from MPEG-1 (Cont’d)

• Nonlinear quantization — two types of scales are allowed:

1. For the first type, scale is the same as in MPEG-1, in which it is an integer in the range of [1, 31] and scale_i = i.

2. For the second type, a nonlinear relationship exists, i.e., scale_i ≠ i. The i-th scale value can be looked up from the following Table.

Table : Possible Nonlinear Scale in MPEG-2

279

280280

Other Improvements

                       MPEG-I        MPEG-II
Intra MB DC Coeff.     8 bits        11 bits
Intra MB AC Coeff.     [-256,255]    [-2048,2047]
Non-intra MB Coeff.    [-256,255]    [-2048,2047]

Finer Quantization of the DCT Coefficients

282

Videoconference

• Interactive communication through audio, video and data sharing

• It can be:

– Point to point

– Point to multipoint

– Multipoint to multipoint

283

Requirements / Features of the videoconference

• Compression / Decompression in real time.

• 200-400 ms maximum delay.

• Mobility disabled.

• Telephone-quality audio is normally acceptable.

• Need to synchronize audio and video.

• Need for signaling protocol (connectionless service).

284

Videoconference Standards

• Videoconferencing systems have been standardized by the ITU-T (International Telecommunication Union, Telecommunication sector) in the H series of standards (multimedia and audiovisual systems)

• The H.32x standards are the videoconferencing standards. The 'x' depends on the type of network used

285

H.32x Standards

Standard   Physical environment   Service                            Type      Year approval
H.320      ISDN                   Streaming a/v 128 to 384 Kb/s      Circuit   1990
H.321      ATM                                                       Circuit
H.322      IsoEthernet                                               TDM
H.323      Ethernet               Streaming a/v 14.4 - 512 Kb/s      Packet    1996
H.324      analog Modem                                              Circuit

The H.32x are umbrella standards. Each builds on a set of earlier standards to specify all the services needed in a videoconference, e.g. G.711 for audio coding

286

H.320 Standard

287

H.323 Standard

• Packet-based multimedia communications systems

288

H.320 & H.323 Standards

ISDN (H.320):
– H.261: Video Coding
– H.221: Bit stream conversion
– H.243: Multi Point
– G.711: 3.1 kHz audio, 64/56 kbps
– H.242: Control Protocol
– H.230: Signalling and Control
– G.722: 7 kHz audio, 64/56/48 kbps
– G.728: 3.1 kHz audio, 16 kbps

IP (H.323):
– H.261/263: Video Coding
– H.245: Control Protocol
– H.225: Packetization
– G.711: 3.1 kHz audio, 64/56 kbps
– G.723: 3.1 kHz audio, 5.3 kbps
– Q.931: Call Signalling
– RAS: Gatekeeper Signalling

T.120: Data Protocols for Multimedia Communication

289

H.320 & H.323 Standards

           H.323                             H.320
Control
  Call Control      H.225.0                  Q.931
  System Control    H.245                    H.242
  Multiplexing      H.225.0                  H.221
Media
  Audio             G.711, G.722,            G.711, G.722, G.728
                    G.723.1, G.728
  Video             H.261, H.263             H.261, H.263
  Data              T.120                    T.120

290

H.32x Audio Formats

| Codec | Original bandwidth (kbps) | Compression ratio | Compressed bandwidth (kbps) |
|---|---|---|---|
| G.711 | 64 | 1:1 | 64 |
| G.722 | 224 | 3.5-4.6:1 | 48-64 |
| G.723.1 | 64 | 10:1 | 6.4 |
| G.728 | 64 | 4:1 | 16 |
| G.729 | 64 | 8:1 | 8 |
| MPEG | 706 | 3-11:1 | 64-256 |

MPEG is not an H.323 audio format. It is listed only for comparison.
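
The compression ratios in the table above follow directly from the two bandwidth columns: ratio = original bandwidth / compressed bandwidth. The short Python sketch below recomputes them from the slide's figures. The names `CODECS`, `compression_ratio` and `ratio_range` are illustrative only and do not come from any standard or library.

```python
# Figures copied from the slide: original kbps and (min, max) compressed kbps.
CODECS = {
    "G.711": (64, (64, 64)),
    "G.722": (224, (48, 64)),
    "G.723.1": (64, (6.4, 6.4)),
    "G.728": (64, (16, 16)),
    "G.729": (64, (8, 8)),
    "MPEG": (706, (64, 256)),
}


def compression_ratio(original_kbps, compressed_kbps):
    """Return how many times smaller the compressed stream is."""
    return original_kbps / compressed_kbps


def ratio_range(name):
    """Return the (lowest, highest) compression ratio for one codec."""
    original, (min_rate, max_rate) = CODECS[name]
    # The highest compressed rate gives the lowest ratio, and vice versa.
    return (compression_ratio(original, max_rate),
            compression_ratio(original, min_rate))


if __name__ == "__main__":
    for name in CODECS:
        low, high = ratio_range(name)
        print(f"{name:8s} {low:5.1f}:1 to {high:5.1f}:1")
```

For G.722, for instance, 224/64 = 3.5 and 224/48 is about 4.67, which matches the range quoted on the slide.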

Some Digital Audio Formats

| Format | Sampling freq. (kHz) | # Channels | Capacity per channel (kb/s) | Application |
|---|---|---|---|---|
| PCM (G.711) | 8 | 1 | 64 | Telephony |
| ADPCM (G.721) | 8 | 1 | 32 | Telephony |
| SB-ADPCM (G.722) | 16 | 1 | 48/56/64 | Videoconferencing |
| MP-MLQ (G.723.1) | 8 | 1 | 6.3/5.3 (variable) | Internet telephony |
| ADPCM (G.726) | 8 | 1 | 16/24/32/40 | Telephony |
| E-ADPCM (G.727) | 8 | 1 | 16/24/32/40 | Telephony |
| LD-CELP (G.728) | 8 | 1 | 16 | Telephony / Videoconferencing |
| CS-ACELP (G.729) | 8 | 1 | 8 | Internet telephony |
| RPE-LTP (GSM 06.10) | 8 | 1 | 13.2 | GSM telephony |
| CELP (FS 1016) | 8 | 1 | 4.8 | |
| LPC-10E (FS 1015) | 8 | 1 | 2.4 | |
| CD-DA / DAT | 44.1/48 | 2 | 705.6/768 | Hi-Fi audio |
| MPEG-1 Layer I | 32/44.1/48 | 2 | 192-256 (variable) | |
| MPEG-1 Layer II | 32/44.1/48 | 2 | 96-128 (variable) | |
| MPEG-1 Layer III (MP3) | 32/44.1/48 | 2 | 64 (variable) | Hi-Fi Internet |
| MPEG-2 AAC | 32/44.1/48 | 5.1 | 32-44 (variable) | Hi-Fi Internet |

Delay annotations: High delay; Low delay.
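
For the uncompressed (PCM) rows, the capacity per channel is simply the sampling frequency multiplied by the number of bits per sample. The Python sketch below checks this and, going the other way, works out how many bits each codec spends per sample on average. The bits-per-sample values (8 for G.711, 16 for CD-DA and DAT) are taken from the Digital Audio slide earlier in this talk; the function names are illustrative.

```python
def pcm_rate_kbps(sampling_khz, bits_per_sample):
    """Per-channel PCM bit rate in kb/s (kHz x bits = kb/s)."""
    return sampling_khz * bits_per_sample


def average_bits_per_sample(rate_kbps, sampling_khz):
    """Average number of bits a codec spends on each input sample."""
    return rate_kbps / sampling_khz


if __name__ == "__main__":
    # Uncompressed formats: (sampling frequency in kHz, bits per sample).
    for name, (fs, bits) in {"G.711": (8, 8),
                             "CD-DA": (44.1, 16),
                             "DAT": (48, 16)}.items():
        print(f"{name}: {pcm_rate_kbps(fs, bits):.1f} kb/s per channel")

    # Compressed speech codecs, all sampled at 8 kHz: (rate in kb/s).
    for name, rate in {"G.721": 32, "G.728": 16, "G.729": 8}.items():
        print(f"{name}: {average_bits_per_sample(rate, 8):.1f} bits/sample")
```

Seen this way, G.729 at 8 kb/s spends on average one bit per 8 kHz sample, against eight bits for G.711.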

Digital Video Formats

| Application | Video format | Y size | Color sampling | Frame rate (Hz) | Raw data rate (Mbps) |
|---|---|---|---|---|---|
| HDTV over air, cable, satellite; MPEG-2 video, 20-45 Mbps | SMPTE 296M | 1280x720 | 4:2:0 | 24P/30P/60P | 265/332/664 |
| HDTV over air, cable, satellite; MPEG-2 video, 20-45 Mbps | SMPTE 295M | 1920x1080 | 4:2:0 | 24P/30P/60I | 597/746/746 |
| Video production; MPEG-2, 15-50 Mbps | BT.601 | 720x480/576 | 4:4:4 | 60I/50I | 249 |
| Video production; MPEG-2, 15-50 Mbps | BT.601 | 720x480/576 | 4:2:2 | 60I/50I | 166 |
| High-quality video distribution (DVD, SDTV); MPEG-2, 4-10 Mbps | BT.601 | 720x480/576 | 4:2:0 | 60I/50I | 124 |
| Intermediate-quality video distribution (VCD, WWW); MPEG-1, 1.5 Mbps | SIF | 352x240/288 | 4:2:0 | 30P/25P | 30 |
| Videoconferencing over ISDN/Internet; H.261/H.263, 128-384 kbps | CIF | 352x288 | 4:2:0 | 30P | 37 |
| Video telephony over wired/wireless modem; H.263, 20-64 kbps | QCIF | 176x144 | 4:2:0 | 30P | 9.1 |
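
The raw data rates in this table can be reproduced as width x height x bits per pixel x frames per second. The Python sketch below does this under two assumptions that are not stated on the slide but are consistent with its figures: samples are 8 bits each (so 4:4:4, 4:2:2 and 4:2:0 average 24, 16 and 12 bits per pixel), and an interlaced 60I signal is counted as 30 full frames per second. The names `BITS_PER_PIXEL` and `raw_rate_mbps` are illustrative.

```python
# Average bits per pixel for 8-bit samples (assumption, see lead-in).
BITS_PER_PIXEL = {"4:4:4": 24, "4:2:2": 16, "4:2:0": 12}


def raw_rate_mbps(width, height, sampling, frames_per_second):
    """Uncompressed video bit rate in Mbps (1 Mbps = 10**6 bits/s)."""
    bits_per_frame = width * height * BITS_PER_PIXEL[sampling]
    return bits_per_frame * frames_per_second / 1e6


if __name__ == "__main__":
    examples = [
        ("QCIF 30P", 176, 144, "4:2:0", 30),
        ("SIF 30P", 352, 240, "4:2:0", 30),
        ("BT.601 4:2:2 60I", 720, 480, "4:2:2", 30),  # 60 fields = 30 frames
        ("SMPTE 296M 60P", 1280, 720, "4:2:0", 60),
    ]
    for label, w, h, sampling, fps in examples:
        print(f"{label}: {raw_rate_mbps(w, h, sampling, fps):.1f} Mbps")
```

QCIF at 30 frames per second, for example, comes to 176 x 144 x 12 x 30 = 9,123,840 bits/s, i.e. the 9.1 Mbps shown in the table. The same formula gives about 36.5 Mbps for CIF, which the slide rounds up to 37.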

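Compressed video standard resolutions

| Format | SQCIF | QCIF | CIF | 4CIF or SCIF | 16CIF 4:3 | 16CIF 16:9 |
|---|---|---|---|---|---|---|
| Resolution | 128x96 | 176x144 | 352x288 | 704x576, 720x576 | 1408x1152, 1440x1152 | 1920x1152 |

Standards (Op. = optional; the per-format cell positions are not recoverable from the slide):

• H.261: Op.
• H.263: Op., Op.
• MPEG-4
• MPEG-1
• MPEG-2 levels: Low, Main, High 1440, High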

Video compression formats

| System | Spatial compression (DCT) | Temporal compression | Complexity | Compression efficiency | Delay |
|---|---|---|---|---|---|
| M-JPEG | Yes | No | Medium | Low | Very small |
| H.261 | Yes | Limited (I & P) | High | Medium | Small |
| MPEG-1/2 | Yes | Extended (I, P & B) | Very high | High | High |
| H.263, MPEG-4 | Yes | Extended (I, P & B) | Extremely high | High | Medium-high |

Video compression formats: bit rates

| Standard/Format | Typical bandwidth | Compression ratio |
|---|---|---|
| CCIR 601 | 170 Mbps | 1:1 (reference) |
| M-JPEG | 10-20 Mbps | 7-27:1 |
| H.261 | 64-2000 kbps | 24:1 |
| H.263 | 28.8-768 kbps | 50:1 |
| MPEG-1 | 0.4-2.0 Mbps | 100:1 |
| MPEG-2 | 1.5-60 Mbps | 30-100:1 |
| MPEG-4 | 28.8-500 kbps | 100-200:1 |

Delay annotations: Low delay; High delay.

Video compression formats

| Type | Method | Format | Original | Compressed |
|---|---|---|---|---|
| Videoconference | H.261 | 176x144 or 352x288 @ 10-30 fr/sec | 2-36 Mbps | 64-1544 kbps |
| Full motion | MPEG-2 | 720x480 @ 30 fr/sec | 249 Mbps | 2-6 Mbps |
| HDTV | MPEG-2 | 1920x1080 @ 30 fr/sec | 1.6 Gbps | 19-38 Mbps |
