Audio/Video Compression Techniques (Journée Télécoms & Multimédia, by Zouhair Guennoun)
A/V COMPRESSION TECHNIQUES & COMPRESSION STANDARDS
Journée TELECOMS & MULTIMEDIA, 14 May 2010 – ENSA TANGER
Pr. Zouhair GUENNOUN
Ecole Mohammadia d’ingénieurs – EMI
Laboratoire d’Electronique et Communications – LEC
Communication Model
[Figure: block diagram with information source, data reduction, channel with noise source, and receiver; labels: transmitted information, received information]
Audio/Video Coding Applications
Detailed Communication Model
[Figure: block diagram. Transmitter: information source, source coding (data reduction), encryption, channel coding. Information channel with noise. Receiver: channel decoding, decryption, source decoding (data reconstruction), destination]
Agenda
• Introduction
• Audio & Video compression principles
• A/V Compression standards
• Conclusion
Agenda
• Introduction
– Why compress?
– Audio & Video basics
– MPEGx, & H.26x Compression Standards Overview
• Audio & Video compression principles
• A/V Compression standards
• Conclusion
Why compress?
The need for compression
• Audio: compression needed in spectral domain
• Bit rate of a stereo audio source (CD-DA encoding)
– Sampling frequency: 44.1 kHz
– Stereo, 16 bits per sample
– Bit rate = 44100 * 2 * 16 = 1.41 Mbit/s
[Figure: audio waveform versus time]
Digital Audio
Type                                   Sampling frequency (kHz)   Bits per sample   Channels   Bit rate (Mbps)
Telephone signal (G.711)               8                          8                 1          0.064 (ISDN)
CD-DA (Compact Disc – Digital Audio)   44.1                       16                2          1.411 (CD-ROM 1x)
DAT (Digital Audio Tape)               48                         16                2          1.536
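The bit rates in the Digital Audio table follow directly from sampling rate, sample size and channel count. A minimal sketch of that arithmetic is given here; the function name and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Uncompressed PCM bit rate = sampling rate x bits per sample x channels.
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# (sampling rate in Hz, bits per sample, channels) for the three table rows
SOURCES = {
    "Telephone (G.711)": (8_000, 8, 1),
    "CD-DA": (44_100, 16, 2),
    "DAT": (48_000, 16, 2),
}

for name, params in SOURCES.items():
    print(f"{name}: {pcm_bit_rate(*params) / 1e6:.3f} Mbps")
```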
The need for compression
• Video: compression needed in spatial domain
• Bit rate of a video source (CCIR 601, 50 Hz countries)
– 25 images per second
– YUV colour coding (Y: luminance; U, V: chrominance)
• Y: 8 bits per pixel
• U, V: one pixel in two is coded, 8 bits per pixel
Bit rate = (576*720)*25*16 = 166 Mbit/s
[Figure: video image of 720 samples by 576 lines]
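The 166 Mbit/s figure can be checked with a short calculation. The sketch below assumes 16 bits per pixel on average (8 bits of luminance plus 8 bits of alternating U/V chrominance, as described in the bullets); the function name is illustrative.

```python
def raw_video_bit_rate(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Return the uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel


# CCIR 601, 50 Hz countries: 720 x 576 pixels, 25 images/s,
# 8 bits of Y per pixel + 8 bits of U or V per pixel (one pixel in two each)
ccir601_bps = raw_video_bit_rate(720, 576, 25, 16)
print(f"CCIR 601 raw bit rate: {ccir601_bps / 1e6:.1f} Mbit/s")
```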
Bit Rate versus Spatial Resolution
Format       Resolution     Bit rate (Mbps)
SQCIF        128 x 96       3.69
QCIF         174 x 144      7.52
CIF          352 x 288      30.41
4CIF         704 x 576      162.20
16CIF 4:3    1408 x 1152    648.81
16CIF 16:9   1920 x 1152    1061.68
The need for compression
• Channels available for A/V transmission
– Analog television channel (compatibility)
• Cable (bandwidth = 8 MHz)
• Satellite (bandwidth = 30-40 MHz)
Capacity around 40 Mbit/s
– Compact Disc (CD – 650 MB): for 74 min. play time, 1.41 Mbit/s
– Digital Versatile Disc (DVD – 4.7 GB): for 135 min. play time, 4.6 Mbit/s
Illustrative example
• PSTN modem, maximum bit rate: 56 kbps
• Video frame sequence
– Resolution: 288x352 (CIF format)
– RGB colors: 8x3 bits per pixel
– Frame rate: 30 frames per second
• Required bit rate: 288x352x8x3x30 = 72.99 Mbps
• Ratio between the required bit rate and the largest possible bit rate: 72.99 Mbps / 56 kbps ≈ 1303
– To transmit this sequence over the PSTN, the data must be compressed by a factor of at least 1303.
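The required compression factor in this example can be reproduced as follows. This is a small sketch that assumes decimal prefixes (1 kbps = 1000 bit/s, 1 Mbps = 10^6 bit/s); the variable names are illustrative.

```python
# CIF-sized RGB video: 288 x 352 pixels, 3 x 8 bits per pixel, 30 frames/s
required_bps = 288 * 352 * 8 * 3 * 30

# PSTN modem ceiling, assuming 1 kbps = 1000 bit/s
modem_bps = 56_000

compression_ratio = required_bps / modem_bps
print(f"Required bit rate: {required_bps / 1e6:.2f} Mbps")
print(f"Minimum compression ratio: {compression_ratio:.0f}")
```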
The need for compression
• MPEG-1 target (Video-CD: 74 min. constraint)
Video: 166 Mbit/s + Audio: 1.4 Mbit/s → Compression → 1.4 Mbit/s
But quality was judged too poor (about VHS quality)
The need for compression
• MPEG-2 target
– Program stream (DVD): 1 program (video, multichannel audio, ...) → Compression → 3-9 Mbit/s (variable bit rate), but higher quality than MPEG-1
= motivation for the capacity increase of the CD (→ DVD)
– Transport stream (DVB): n programs (video, multichannel audio, ...) → Compression → about 40 Mbit/s (constant bit rate) (DVB-Satellite & DVB-Cable)
The need for compression
• Compression extends the playing time of a given storage device.
• Compression allows a reduction in bandwidth
• For the same bandwidth, compression allows faster transmission, and better quality.
• Compression removes redundancy from signals.
– Redundancy is however essential to making data more resistant to errors.
– Compressed data are more sensitive to errors than uncompressed data.
Principles of Compression
• Compression (or Source Coding) is achieved by suppressing information:
– redundant information
– irrelevant information
• Suppression of redundant information → lossless compression
• The original signal and the one obtained after encoding and decoding are identical
[Figure: Fc(x,y,t) at rate Rc (bps) → Compression → rate Ri < Rc → Decompression → Fp(x,y,t) = Fc(x,y,t) at rate Rp = Rc]
Principles of Compression
• Suppression of irrelevant information → lossy compression (perceptual coding)
Example: bandwidth limitation, masking in audio
The original signal and the one obtained after encoding and decoding are different but are perceived as identical
[Figure: Fc(x,y,t) at rate Rc (bps) → Compression → rate Ri < Rc → Decompression → Fp(x,y,t) ≠ Fc(x,y,t) at rate Rp = Rc]
Principles of Compression
• Lossless vs. lossy data compression
– Source entropy H(X)
– Rate-Distortion function R(D) or D(R)
• Probabilistic modeling is at the heart of data compression
– What is P(X) for video source X?
– Is video coding more difficult than image coding?
[Figure: rate versus distortion curve, with distortion from 0 to Dmax; labels: lossless methods, lossy methods, L0, H(S)]
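The source entropy H(X) sets the lower bound, in bits per symbol, for lossless coding of a memoryless source. The sketch below computes it for a discrete distribution; the function name and the example probabilities are assumptions chosen for illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy H(X) in bits per symbol for a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# A skewed four-symbol source needs fewer bits than the 2 bits of a fixed-length code
print(entropy([0.5, 0.25, 0.125, 0.125]))
# A uniform four-symbol source cannot be compressed below 2 bits per symbol
print(entropy([0.25, 0.25, 0.25, 0.25]))
```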
Principles of Compression
• Reversible (lossless): data files (e.g. V.42bis standard in modems, zip files)
• Non-reversible (lossy): audio & video signals
• More compression usually means lower quality and higher CPU consumption.
– Compression algorithms also differ in their computational complexity: for the same bit rate, more complex techniques generally give better quality at the expense of more CPU.
– Compression algorithms designed for telephony should introduce very little delay; otherwise interactivity is lost and echoes degrade the sound quality.
Principles of Compression
• More complex scene → higher bit rate for the same quality
• CBR → variable quality (example: Video CD artefacts)
• Constant quality → VBR necessary (e.g. DVD-Video)
For a Gaussian source N(0, σ²):
\[ D(R) = \sigma^2 \, 2^{-2R} \]
[Figure: distortion versus bit rate curves for a simple and a complex scene, with constant-bit-rate and constant-quality operating lines]
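The Gaussian rate-distortion relation D(R) = σ² 2^(−2R) can be explored numerically. The following is a minimal sketch; the function names are illustrative, and the inverse R(D) = ½ log2(σ²/D) is the standard companion formula rather than something stated on the slide.

```python
import math


def gaussian_distortion(rate_bits: float, variance: float) -> float:
    """D(R) = variance * 2^(-2R): minimum mean squared error at R bits/sample."""
    return variance * 2.0 ** (-2.0 * rate_bits)


def gaussian_rate(distortion: float, variance: float) -> float:
    """R(D) = 0.5 * log2(variance / D), and 0 when D >= variance."""
    if distortion >= variance:
        return 0.0
    return 0.5 * math.log2(variance / distortion)


# Each additional bit per sample divides the distortion by 4 (about 6 dB)
for rate in range(4):
    print(rate, gaussian_distortion(rate, variance=4.0))
```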
Principles of Compression
• Constant Bit Rate (CBR) systems (G.711, G.722, G.729) are better suited to connection-oriented services.
• Variable Bit Rate (VBR) systems (MPEG, G.723.1) are best suited to networks without a reserved constant bit rate.
– MPEG compression is the most efficient and gives better quality, but it consumes much CPU and introduces so much delay that it cannot be used in interactive applications (video conferencing or telephony).
Principles of Compression
• Video codec key issues:
– Compression efficiency and image quality
– Computational complexity
– Frame rate
[Figure: Encoder → Channel → Decoder]
Principles of Compression
• General-purpose compression: Entropy encoding
– Remove statistical redundancy from data
– E.g. encode common values with short codes, uncommon values with longer codes
– Good for text files, poor for images/video
[Figure: Source data → Entropy encoder → Channel → Entropy decoder → Decoded data]
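The idea of giving common values short codes and uncommon values longer codes can be illustrated with Huffman coding, which is one possible entropy coder (the slide does not name a specific one at this point). This is a minimal sketch with illustrative function names; it works on the characters of a string.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing their codes
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[s] for s in text)


def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)


message = "aaaaaaaabbbccd"
code = huffman_code(message)
bits = encode(message, code)
print(code, len(bits), "bits instead of", 8 * len(message))
```

The most frequent symbol receives the shortest code, so the encoded message is much shorter than a fixed 8-bit-per-character representation.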
Principles of Compression
• Add a model that attempts to represent the image/video signal in a form that can be easily compressed by the entropy encoder
• Model exploits the subjective redundancy of images and video (Spatial, Temporal, Chromatic redundancies)
• Decoded image may not be identical to original image
• Image properties that are useful for compression:
– Many of the pixels of a typical photographic image contain little or no "useful" detail (e.g. flat area)
– The eye is insensitive to "high frequency" image information
[Figure: Image → Image model → Entropy encoder → Channel → Entropy decoder → Image model → Decoded image]
Principles of Compression
• Trade-off Complexity/Quality/Bit Rate
• New technique may result in new trade-off
[Figure: trade-off between quality, bit rate and complexity, positioning MPEG Layer 1, MPEG Layer 2, MPEG Layer 3, MPEG AAC, speech coding and other techniques]
Principles of Compression
Redundancies
• Statistical redundancy
– Interpixel redundancy: spatial (intraframe) redundancy, temporal (interframe) redundancy
– Coding redundancy: variable-length coding (Huffman, arithmetic), run-length coding, …
• Psychological redundancy (HVS)
– Luminance (contrast) masking
– Texture masking
– Color masking
– Frequency masking
– Temporal masking
Quality Measurements
• Objective
– Mean Square Error (MSE)
– Peak Signal-to-Noise-Ratio (PSNR)
– Measure the fidelity to original video
• Subjective
– Human Vision System (HVS) based
– Emphasize audiovisual quality rather than fidelity
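The two objective measures can be sketched in a few lines. The example assumes 8-bit samples (peak value 255) and represents an image as a list of rows; these choices and the function names are illustrative.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equally sized images (lists of rows)."""
    squared = [
        (o - d) ** 2
        for row_o, row_d in zip(original, decoded)
        for o, d in zip(row_o, row_d)
    ]
    return sum(squared) / len(squared)


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, decoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


reference = [[100] * 8 for _ in range(8)]
degraded = [row[:] for row in reference]
degraded[0][0] = 116  # a single sample is off by 16
print(mse(reference, degraded), psnr(reference, degraded))
```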
Quality Measurements
• Signal distortion is not a good measure of the performance of a lossy compression method; another method is necessary: the MOS scale (Mean Opinion Score)
• The five-grade CCIR impairment scale (Rec. 562)
– 1: unsatisfactory (very annoying)
– 2: poor (annoying)
– 3: satisfactory (slightly annoying)
– 4: good (perceptible but not annoying)
– 5: excellent (imperceptible)
• Example: double-blind test
Quality Measurements: Speech Coding - Compression vs quality
[Figure: bit rate (kb/s, 0 to 64) versus MOS (Mean Opinion Score, 0 to 5) for PCM (G.711), ADPCM 32 (G.726), ADPCM 24 (G.725), ADPCM 16 (G.726), LD-CELP 16 (G.728), CS-ACELP 8 (G.729), CS-ACELP (G.729a), MP-MLQ 6.4 (G.723.1) and LPC 4.8; chart note: require special hardware (DSP)]
Standard   Bit rate (kb/s)   MOS
G.711      64                4.10
G.729      8                 3.92
G.726      32                3.85
G.729a     8                 3.70
G.723.1    5.3               3.65
G.728      16                3.61
Audio & Video Basics
Audio Basics
• Analog signal sampled at constant rate
– telephone: 8,000 samples/sec
– CD music: 44,100 samples/sec
• Each sample quantized, i.e., rounded
– e.g., 2^8 = 256 possible quantized values
• Each quantized value represented by bits
– 8 bits for 256 values
– 16 bits for 65536 values
• Mono, stereo, or surround?
– 1, 2 or more channels
• Example: 8,000 mono samples/sec, 256 quantized values → 64 kbps
• Receiver converts it back to analog signal:
– some quality reduction
Example rates
• CD: 1.411 Mbps
• MP3: 96, 128, 160 kbps
• Internet telephony: 5.3 - 13 kbps (G.723.1, G.729, and GSM – Global System for Mobile communication)
Audio Basics: Speech Coding and compression
• 5 quality ranges (human ear sensitivity: 20 Hz to 20 kHz):
Range                 Frequency bandwidth   Quality and applications
Telephone channel     300 Hz – 3.4 kHz      intelligible speech, noisy, natural
Expanded bandwidth    50 Hz – 7 kHz         speech with naturalness preserved
Hi-Fi bandwidth       20 Hz – 15 kHz        excellent speech and music
Stereo bandwidth      20 Hz – 20 kHz        CD quality
Stereo bandwidth      20 Hz – 48 kHz        perfect quality, studio, cinema, DVD
Video Basics
• Operation of analogue television: the image captured by the camera lens is converted into three monochrome images obtained by applying filters of the three fundamental (primary) colors: R (Red), G (Green), B (Blue).
– All colors are produced by using different proportions of these primary colors
• Additive color mixing on a black surface
• Subtractive color mixing on a white surface
– The correct combination of the three monochrome images can reconstruct the original image.
– The RGB signals thus obtained are available in some cameras, though it is unusual to work with them
Video Basics: Digital Video & Pixels
• Digital video is a sequence of frames, each consisting of a rectangular grid of picture elements or pixels.
– For purely black-and-white video, each pixel is represented as a single bit, 0 for black or 1 for white.
– For grey-scale video, 8 bits per pixel can be used to represent 256 levels of grey … good enough for most cases.
– For good colour video, 8 bits are used per pixel for each of the RGB colours, resulting in 24 bits per pixel.
Video Basics : Digital Video & Pixels
[Figure: image formation in a digital camera, the eye, and film]
Source: Digital Image Processing – Gonzalez, Woods. Prentice Hall
Video Basics: Sampling & Quantization
[Figure: sampling and quantization of an image]
Source: Digital Image Processing – Gonzalez, Woods. Prentice Hall
Video Basics: Scanning
• When an image (frame) appears on the retina of the human eye, the image is retained for several milliseconds before decaying.
• Consequently, if a sequence of images is displayed at the appropriate rate, the eye does not notice that it is looking at discrete images.
– This is how you get smooth motion in videos!
• What that rate is depends on the eye in question and how the images are displayed.
Video Basics: Scanning
Spatial and Temporal Sampling of a Video Sequence
Source: H.264 and MPEG-4 Video Compression. Video Coding for next generation multimedia. I.E.G. Richardson. John Wiley & Sons, Ltd. 2003. Chapter 2.
Video Basics: Color Format
• RGB is not efficient since it uses equal bandwidth for each color component.
• R,G,B components are correlated– Transmitting R,G,B components separately is redundant
– More efficient use of bandwidth is desired
• To store or transmit video signals (a sequence of images, or frames, at a constant rate), the RGB signals are transformed into three linear combinations of these signals.
Video Basics: Color Format
• The combination is performed such that:
– One of the new signals, Y, collects all the light or brightness information of the image; this signal is called luminance.
– The other two signals, called U and V, correspond to different combinations of the three original signals, chosen so that they capture all the color information, which is why these two signals are generically referred to as chrominance.
• Various formulae have been devised to convert RGB values to chrominance and luminance values, depending on the format: YUV, YIQ, YCbCr, …
• Consider switching from RGB to YUV as a change of a coordinate system to one that maintains the same number of degrees of freedom but can solve the problem more easily.
• For backward compatibility, colour signals had to be receivable and watchable on a black-and-white set.
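Color Formats Conversion
• kr, kg, kb are weighting factors
\[ Y = k_r R + k_g G + k_b B \]
\[ C_b = B - Y, \qquad C_g = G - Y, \qquad C_r = R - Y \]
\[ C_b + C_g + C_r = \text{constant} \]
\[ Y = k_r R + (1 - k_b - k_r)\, G + k_b B \]
\[ C_b = \frac{0.5}{1 - k_b}\,(B - Y) \]
\[ C_r = \frac{0.5}{1 - k_r}\,(R - Y) \]
\[ R = Y + \frac{1 - k_r}{0.5}\, C_r \]
\[ G = Y - \frac{2 k_b (1 - k_b)}{1 - k_b - k_r}\, C_b - \frac{2 k_r (1 - k_r)}{1 - k_b - k_r}\, C_r \]
\[ B = Y + \frac{1 - k_b}{0.5}\, C_b \]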
Color Formats Conversion
• ITU-R recommendation BT.601 defines kr = 0.299 and kb = 0.114.
\[ Y = 0.299 R + 0.587 G + 0.114 B \]
\[ C_b = 0.564\,(B - Y) \]
\[ C_r = 0.713\,(R - Y) \]
\[ R = Y + 1.402\, C_r \]
\[ G = Y - 0.344\, C_b - 0.714\, C_r \]
\[ B = Y + 1.772\, C_b \]
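The BT.601 conversion can be sketched from the weighting factors kr = 0.299 and kb = 0.114 alone. The code below works on real-valued components without the offset and rounding used for 8-bit storage, which is a simplifying assumption; the function names are illustrative.

```python
KR, KB = 0.299, 0.114      # ITU-R BT.601 weighting factors
KG = 1.0 - KR - KB         # 0.587


def rgb_to_ycbcr(r, g, b):
    """Forward conversion: luminance Y plus scaled colour differences Cb, Cr."""
    y = KR * r + KG * g + KB * b
    cb = 0.5 / (1.0 - KB) * (b - y)   # about 0.564 * (B - Y)
    cr = 0.5 / (1.0 - KR) * (r - y)   # about 0.713 * (R - Y)
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion, recovering G from the definition of Y."""
    r = y + (1.0 - KR) / 0.5 * cr     # Y + 1.402 * Cr
    b = y + (1.0 - KB) / 0.5 * cb     # Y + 1.772 * Cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


y, cb, cr = rgb_to_ycbcr(200, 120, 50)
print(y, cb, cr)
print(ycbcr_to_rgb(y, cb, cr))
```

A grey pixel (R = G = B) gives Cb = Cr = 0, which is why a black-and-white receiver can use Y alone.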
Video Basics: Color Format
Source: http://www.yorku.ca/eye/photopik.htm
Video Basics: Color Format
• Human eye is more sensitive to the luminance (brightness) component than the color component: the latter need not be transmitted as accurately.
– The luminance is broadcast at the same frequency as a black-and-white signal, and the chrominance is ignored on black-and-white sets.
– The two chrominance signals are broadcast in narrow bands at higher frequencies.
• Called hue and saturation or tint and colour
Video Basics: Chrominance Downsampling
• The reduced resolution in the chroma components is called downsampling (subsampling).
• Subsampling relies on the human eye being less sensitive to chrominance than to luminance.
• (Y, Cr, Cb) may use different resolutions 4:n:m: The numbers indicate the relative sampling rate of each component in the horizontal direction.
Video Basics: Chrominance Downsampling
• 4:4:4 sampling: the three components have the same resolution (3n bits per pixel)
– a sample of each component exists at every pixel position.
– Preservation of the full fidelity of the chrominance components.
• 4:2:2 sampling: Cb and Cr have the same vertical resolution as Y, but half the horizontal resolution (2n bits per pixel).
– 4:2:2 video is used for high-quality color reproduction.
Video Basics: Chrominance Downsampling
• 4:1:1 sampling: Cb and Cr have the same vertical resolution as Y, but quarter the horizontal resolution (1.5n bits per pixel).
• 4:2:0 sampling: Cb and Cr each have half the horizontal and vertical resolution of Y (1.5n bits per pixel).
– 4:2:0 video requires exactly half as many samples as 4:4:4 video
– 4:2:0 is widely used for consumer applications such as video conferencing.
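The bits-per-pixel figures quoted for each sampling pattern can be derived from the horizontal and vertical chroma subsampling factors. A minimal sketch, with an illustrative lookup table and function names:

```python
# (horizontal divisor, vertical divisor) applied to each chroma component
SCHEMES = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:1:1": (4, 1),
    "4:2:0": (2, 2),
}


def bits_per_pixel(scheme: str, n: int = 8) -> float:
    """Average bits per pixel with n bits per sample (Y plus two chroma)."""
    h, v = SCHEMES[scheme]
    return n * (1 + 2 / (h * v))


def samples_per_frame(width: int, height: int, scheme: str) -> int:
    """Total number of Y, Cb and Cr samples in one frame."""
    h, v = SCHEMES[scheme]
    return width * height + 2 * (width // h) * (height // v)


for name in SCHEMES:
    print(name, bits_per_pixel(name), samples_per_frame(352, 288, name))
```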
Video Basics: Spatial Resolution Formats
• CIF: Common Intermediate Format (also called Common Interchange Format), an intermediate format used in videoconferencing (communication between US & Europe)
– Luma resolution: 352x288 (360x288) pixels
– Frame rate: 30 Hz (30 frames/second, fps), non-interlaced, 4:2:0 sampling
• QCIF (Quarter CIF): 176x144 pixels, 30 fps, used in video telephony applications
• SQCIF (Sub-QCIF): 128x96 pixels, 30 fps, mobile multimedia applications
• 4CIF: 704x576 pixels, 30 fps, appropriate for standard-definition television and DVD-video
• 16CIF: 1408x1152 pixels, 50 fps
Spatial Resolution Formats
[Figure: relative frame sizes of 16CIF 16:9, 16CIF 4:3, 4CIF, CIF, QCIF and SQCIF]
Video Basics: Spatial Resolution Formats
• SIF: Simple Input Format (Source Intermediate Format). Half the vertical and horizontal resolution of CCIR-601, with 4:2:0 sampling. Used in Video Cassette Recorders (VCRs)
– 360x242 (352x240) pixels, 30 frames/second for NTSC, 4:2:0 sampling
– 360x288 (352x288) pixels, 25 frames/second for PAL, SECAM, 4:2:0 sampling
• CCIR-601 (ITU-R 601 or BT 601)
– 720x525 pixels, 30 frames/second, 4:4:4 & 4:2:2 sampling
– 720x625 pixels, 25 frames/second, 4:4:4 & 4:2:2 sampling
MPEG, what is it?
International Organizations
• ISO (1947): International Organization for Standardization
• IEC (1906): International Electrotechnical Commission
• ISO/IEC JTC 1 (1987): Joint Technical Committee 1 of the ISO and the IEC. It deals with all matters of information technology.
• ITU-T: the Telecommunication Standardization Sector coordinates standards for telecommunications on behalf of the International Telecommunication Union (named ITU-T since 1993; CCITT from 1956).
International Organizations (Cont’d)
• JPEG (ITU-T T.81, ISO/IEC IS 10918-1): Joint Photographic Experts Group, one of two sub-groups of ISO/IEC Joint Technical Committee 1, Subcommittee 29, Working Group 1 (ISO/IEC JTC 1/SC 29/WG 1), titled "Coding of still pictures".
• MPEG: Moving Picture Experts Group (ISO/IEC JTC 1/SC 29/WG 11), a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video and related data.
• ITU-T SG15: H.26x – Videophone & Videoconference standards
• JVT: Joint Video Team, a group of video coding experts from ITU-T Study Group 16 (VCEG) and ISO/IEC JTC 1 SC 29 / WG 11 (MPEG), created to develop an advanced video coding specification.
– Formed in 2001, the JVT’s main result has been ITU-T Rec. H.264 | ISO/IEC 14496-10, commonly referred to as H.264/MPEG-4-AVC, H.264/AVC, or MPEG-4 Part 10 AVC.
MPEG: Moving Picture Experts Group
• Moving Picture Experts Group established in 1988 for the development of digital video
– Still active (MPEG-21 is currently in development)
• International standard (ISO/IEC) → interoperability & economy of scale
• Compression of audio and video and multiplexing in a single stream
• Definition of the interface, not of the codecs → room for improvement
MPEG: Moving Picture Experts Group
• Official home page of the Moving Picture Experts Group (MPEG): www.chiariglione.org/mpeg/
• In charge of the development of standards for coded representation of digital audio and video and related data.
• The group produces standards that help the industry offer end users an ever more enjoyable digital media experience.
List of MPEG standards
• MPEG-1 (ISO 11172) The standard on which such products as Video CD and MP3 are based (approved in Nov. 1992)
– Video-oriented CD-ROM, SIF format (video progressive)
– Objective: VHS quality. Typical bit rate 1.5Mb/s
– Useful for tele-education, enterprise applications, business, etc.
List of MPEG standards (Cont’d)
• MPEG-2 (ISO 13818) The standard on which such products as Digital Television set top boxes and DVD are based (approved in 1994, 1996);
– Upward-compatible extension of MPEG-1
– Broadcast oriented (interlaced video)
– Multiple resolutions standardized, from SIF (compatible with MPEG-1) up to high-definition formats, for DVDs and so on.
– Intended for studio-quality audio and video, and also for broadcast-quality HDTV.
– Various bit rates, 4-100 Mb/s (CBR & VBR)
– Useful for all types of applications (business, entertainment, etc.).
• MPEG-3: originally designed for HDTV, eventually covered by re-parameterization of MPEG-2.
List of MPEG standards (cont’d)
• MPEG-4 (ISO 14496) The standard for multimedia for the fixed and mobile web (Version 1 approved in Oct. 1998, Version 2 approved in Dec. 1999, Versions 3, 4, 5) – Computer Graphics Applications;
– Originally intended for applications similar to those of H.263, but expanded to cover a wider range of multimedia applications.
– 'Downward' extension of MPEG-1, oriented toward Internet video
– Useful in the range 28.8-500 kb/s. New compression algorithms. Typically less than 1 Mbps but could be as high as tens of Mbps.
List of MPEG standards (cont’d)
• MPEG-4 (ISO 14496) …
– Coding of Audiovisual Objects - Standard for audio, video and graphics in interactive 2D and 3D multimedia communication - MPEG-4 v.2 & 3
– Supports scene composition and content-based functionalities, in which scenes are expressed in terms of multiple audio-visual objects (AVOs) that can be manipulated together or individually.
– Supports layering/scaling: multiple versions of AVOs can be provided and matched against needs and available resources.
• For example, a base level AVO can be provided to give the bare essentials, with multiple optional AVOs that provide levels of enhancement details.
• If we don’t have enough network resources, drop the enhancements and stick with the basics!
List of MPEG standards (cont’d)
• MPEG-7 (ISO 15938) The standard for description and search of audio and visual content (approved in Jul. 2001);
– Audiovisual content description (indexing, searching, databases, etc.). Interprets the semantics of audiovisual information
– More to do with structuring, and describing and searching through multimedia content
• MPEG-21 (21000) The Multimedia Framework.
– Focus on multimedia distribution and on DRM aspects;
List of MPEG standards (cont’d)
• MPEG-A (23000) – Application-specific formats, integrating multiple MPEG technologies
• MPEG-B (23001) – Systems specific standards
• MPEG-C (23002) – Video specific standards
• MPEG-D (23003) – Audio specific standards
• MPEG-E (23004) – MPEG multimedia Middleware - support to download and execution of multimedia applications
• MPEG-V (23005) – Context and media control - interchange with virtual worlds
• MPEG-M (23006) – MPEG extensible Middleware - packaging and reusability of MPEG technologies
• MPEG-U (23007) – MPEG Rich Media User Interface
List of ITU-T Standards
• H.261 (1983-1990)
– A standard for video telephony and video conferencing over PSTN (Public Switched Telephone Network) and wireless networks.
– Uses either the CIF or QCIF format.
– Uses p x 64 kbps where p can be between 1 and 30.
– Originally designed for ISDN usage (Integrated Services Digital Network).
– Still in use
• Low complexity, low latency
• Mostly as a backward-compatibility feature
• Overtaken by H.263
List of ITU-T Standards (cont’d)
• H.263, H.263+, H.263++ (1993-1999)
– Based on H.261 but offers a significant improvement in coding efficiency; employs advanced coding options and lower resolutions to preserve quality over lower bit rate channels.
– Uses either the QCIF or SQCIF formats.
– Uses less than 64 kbps.
– PSTN and mobile network: 10 to 24 kbps
– Adopted by several videophone terminal standards: H.324 (PSTN), H.320 (ISDN), H.310 (B-ISDN)
• H.264/AVC (1999-2003)
– Double the coding efficiency in comparison to any other existing video coding standard
Chronological Table of Video Coding Standards
[Figure: timeline from 1990 to 2003]
ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
ISO/IEC MPEG: MPEG-1 (1991), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
Joint ITU-T / ISO/IEC: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)
Agenda
• Introduction
• Audio & Video compression principles
– Audio compression
– Video compression
– Audio/Video synchronisation
• A/V Compression standards
• Conclusion
Audio Compression principles
Speech Coding and Compression
• Waveform coding (PCM, DPCM, ADPCM)
– Samples coding (G.711, G.721, G.722, G.723, G.725, G.726, …)
• Source Coding
– Speech modeling and transmission of the model parameters (G.728, G.729, …)
• Hybrid Coding
Audio compression
• By identifying what can and, more importantly, what cannot be heard, the schemes described obtain much of their compression by discarding information that cannot be perceived.
• Over the course of our evolutionary history we have developed limitations on what we can hear.
– Some of these limitations are physiological, based on the machinery of hearing.
– Others are psychological, based on how our brain processes auditory stimuli.
Audio Compression
• Sub-band Coding
– Techniques used in Layer I and II of MPEG audio are based on sub-band coding.
• Transform Coding
– DCT is used in Layer III of MPEG audio.
• Predictive Coding
– Frequency prediction is used in AC-3 and MPEG AAC.
Common Audio Formats and Standards
• Pulse Code Modulation (PCM)
– Differential Pulse Code Modulation (DPCM)
– Adaptive Differential Pulse Code Modulation (ADPCM)
• Compact Disc Digital Audio (CD-DA)
• MPEG Audio
– Layer I
– Layer II
– Layer III
Audio compression
• Based on psycho-acoustics
• Compress the bit rate without affecting the quality perceived by the human ear (based on the imperfection of human hearing)
• Removal of irrelevancies
• 4 main principles :
– Threshold of audibility
– Frequency masking
– Critical bands
– Temporal masking
Audio compression
• Principle 1: Threshold of audibility
Not all frequency components need to be encoded with the same resolution. Nr_bit(f) = (signal/threshold) in dB / 6
http://www.audiodesignline.com
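The rule of thumb Nr_bit(f) = (signal/threshold) in dB / 6 reflects that each bit of resolution adds roughly 6 dB of signal-to-noise ratio. The sketch below applies it per frequency band; rounding up to a whole number of bits and returning zero for inaudible components are assumptions added for illustration, as is the function name.

```python
import math


def bits_needed(signal_db: float, threshold_db: float, db_per_bit: float = 6.0) -> int:
    """Bits required so quantization noise stays below the audibility threshold."""
    margin_db = signal_db - threshold_db   # signal-to-threshold ratio in dB
    if margin_db <= 0:
        return 0                           # component is inaudible: no bits
    return math.ceil(margin_db / db_per_bit)


# A component 36 dB above the threshold needs about 6 bits
print(bits_needed(60.0, 24.0))
# A component below the threshold does not need to be coded at all
print(bits_needed(10.0, 20.0))
```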
Audio compression
• Principle 2: Frequency masking
Analysis of the incoming signal
http://www.audiodesignline.com
Audio compression
• Principle 3: Critical bands
– Human ear may be modelled as a collection of narrow band filters
– Bandwidth of these filters = critical band
– Critical bandwidth: < 100 Hz for the lowest audible frequencies, about 4 kHz for the highest audible frequencies
– The human ear cannot distinguish between two sounds having two different frequencies within the same critical band. Example: when we hear 50 Hz and 60 Hz at the same time, we cannot distinguish them.
– Consequence: the noise masking threshold depends solely on the signal energy within a limited bandwidth. The loudest sound is taken as the representative of the critical band. The signal must therefore be analysed at 100 Hz resolution at low frequencies.
Audio compression
• Principle 4: Temporal masking
A sound raises the audibility threshold for a brief interval preceding and following it. This guides the selection of the frame duration for frequency analysis and encoding.
http://www.audiodesignline.com
The MPEG encoder
http://www.audiodesignline.com
Audio features in MPEG
• MPEG-1:
– Mono/stereo/dual/joint stereo (Dolby Surround possible)
– Sampling frequencies: 32, 44.1 & 48 kHz
– 3 layers: trade-off between complexity/delay and coding efficiency
– Various bit rates: trade-off between quality and bit rate
• MPEG-2:
– 5.1 channels
– Sampling frequencies extended to 16, 22.05 & 24 kHz
Layer I coding
• The Layer I coding scheme provides a 4:1 compression.
• In Layer I coding the time frequency mapping is accomplished using a bank of 32 subband filters.
• The output of the subband filters is critically sampled. That is, the output of each filter is down-sampled by 32.
• The samples are divided into groups of 12 samples each.
– Twelve samples from each of the 32 subband filters, or a total of 384 samples, make up one frame of the Layer I coder.
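The frame structure of Layer I translates into simple numbers. This is a minimal sketch that assumes equal-width subbands covering 0 to half the sampling frequency; the constant and function names are illustrative.

```python
SUBBANDS = 32               # subband filters in the Layer I filter bank
SAMPLES_PER_SUBBAND = 12    # samples grouped per subband in one frame
FRAME_SAMPLES = SUBBANDS * SAMPLES_PER_SUBBAND


def frame_duration_ms(sample_rate_hz: float) -> float:
    """Duration of one Layer I frame in milliseconds."""
    return 1000.0 * FRAME_SAMPLES / sample_rate_hz


def subband_width_hz(sample_rate_hz: float) -> float:
    """Width of each subband, assuming 32 equal bands up to half the sampling rate."""
    return sample_rate_hz / 2 / SUBBANDS


for rate in (32_000, 44_100, 48_000):
    print(rate, round(frame_duration_ms(rate), 3), subband_width_hz(rate))
```

At 48 kHz each subband is 750 Hz wide, far wider than the critical bands at low frequencies.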
Layer II Coding
• The Layer II coder provides a higher compression rate by making some relatively minor modifications to the Layer I coding scheme.
• The compression ratio in Layer II coding can be increased from 4:1 to 6:1 or 8:1.
• These modifications include:
– how the samples are grouped together,
– the representation of the scale factors, and
– the quantization strategy.
Layer III Coding - MP3
• One of the problems with the Layer I and Layer II coding schemes was that with the 32-band decomposition, the bandwidth of the subbands at lower frequencies is significantly larger than the critical bands.
• This makes it difficult to make an accurate judgment of the mask-to-signal ratio.
– If we get a high-amplitude tone within a subband and the subband is narrow enough, we can assume that it masks other tones in the band.
– However, if the bandwidth of the subband is significantly larger than the critical bandwidth at that frequency, it becomes more difficult to determine whether other tones in the subband will be masked.
Layer III Coding - MP3
• Layer III offers almost CD quality with less than 2 bits/sample (enables transferring music files via Internet over 28.8kbps modems)
• A simple way to increase the spectral resolution would be to decompose the signal directly into a higher number of bands.
• However, one of the requirements on the Layer III algorithm is that it be backward compatible with Layer I and Layer II coders.
• To satisfy this backward compatibility requirement, the spectral decomposition in the Layer III algorithm is performed in two stages.
Layer III Coding - MP3
• First the 32-band subband decomposition used in Layer I and Layer II is employed.
• The output of each subband is transformed using a modified discrete cosine transform (MDCT) with a 50% overlap.
• The Layer III algorithm specifies two sizes for the MDCT, 6 or 18. This means that the output of each subband can be decomposed into 18 frequency coefficients or 6 frequency coefficients.
Advanced Audio Coding
• AAC (Advanced Audio Coding): an audio compression format defined by the MPEG-2 standard.
• AAC was known as NBC (Non-Backward-Compatible) because it is not compatible with MPEG-1 audio formats.
Advanced Audio Coding
• AAC supports more channels than MP2 or MP3: 48 full audio channels and 16 enhanced low-frequency channels, compared with 5 full audio channels and 1 enhanced low-frequency channel for MP2 or MP3.
• AAC supports higher sampling frequencies than MP3 (up to 96 kHz compared with 48 kHz).
Video Compression principles
Video Compression
• Two applied techniques for video compression:
– Spatial or intraframe compression: removal of intra-picture redundancy in the image of each frame as in JPEG images
– Temporal or interframe compression: removal of inter-picture redundancy (between consecutive frames). Coding of the difference with an interpolated picture (motion vectors).
Video compression
• Result
– 4:2:0 SIF resolution: 30 Mbps (= 25 images/s * 288 lines * 352 pixels * 1.5 (lum & chrom) * 8 bits) → about 1.2 Mbps (CBR) in Video CD (MPEG-1)
– 4:2:2 CCIR 601 resolution: 166 Mbps (= 25 images/s * 576 lines * 720 pixels * 2 (lum & chrom) * 8 bits) → about 3-4 Mbps (mean) in MPEG-2
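Image Codec (e.g. JPEG)
[Figure: encoder chain Block → DCT → Quantize → Zigzag → RLE (image model) → VLC (entropy encoder) → Transmit/Store; decoder chain VLD (entropy decoder) → RLD → IZigzag → IQuantize → IDCT → IBlock (image model)]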
Blocks
• Process the data in blocks (sub-images) of 8x8 samples
• Convert Red-Green-Blue into luminance (grayscale) and chrominance (blue color difference and red color difference)
• Use half resolution for chrominance (because the eye is more sensitive to grayscale than to color)
• Each block contains redundant information.
Discrete Cosine Transform
• DCT transformation (in frequency domain) decorrelates the input signal.
• Transform each block of 8x8 samples into a block of 8x8 spatial frequency coefficients.
• Most image blocks only contain a few significant coefficients (usually the lowest "frequencies")
– Energy tends to be concentrated into a few significant coefficients (most energy in low spatial frequencies)
– Other coefficients are close to zero / insignificant
Discrete Cosine Transform
• Any 8x8 block of pixels can be represented as a sum of 64 basis patterns (black and white patterns)
• Output of the DCT is the set of weights for these basis patterns (the DCT coefficients)
– Multiply each basis pattern by its weight and add them together
– Result is the original image block
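The decomposition into 64 basis patterns can be sketched with a direct (unoptimized) 8x8 DCT. The code assumes the orthonormal DCT-II scaling, under which a flat block of value v gives a single coefficient of 8v; function names and the test blocks are illustrative.

```python
import math

N = 8


def _basis(u: int, x: int) -> float:
    """Orthonormal DCT-II basis value for frequency u at position x."""
    scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * x + 1) * u * math.pi / (2 * N))


C = [[_basis(u, x) for x in range(N)] for u in range(N)]


def dct2(block):
    """Forward 2-D DCT: weights of the 64 basis patterns."""
    return [[sum(C[u][x] * block[x][y] * C[v][y]
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT: weighted sum of the basis patterns."""
    return [[sum(C[u][x] * coeffs[u][v] * C[v][y]
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


flat = [[100] * N for _ in range(N)]
ramp = [[8 * x + y for y in range(N)] for x in range(N)]
flat_coeffs = dct2(flat)
ramp_back = idct2(dct2(ramp))
print(round(flat_coeffs[0][0], 6))  # all the energy sits in the DC coefficient
```

Real codecs use fast factorizations of this transform; the quadruple loop here only makes the "sum of weighted basis patterns" interpretation explicit.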
Quantize and zig-zag scanning
• Divide each DCT coefficient by an integer, discard remainder
• High spatial frequencies are quantized with lower resolution than low ones (removes irrelevancy). Result: loss of precision. Typically, only a few non-zero coefficients are left
• Scan quantized coefficients in a zig-zag order: non-zero coefficients tend to be grouped together
Video compression
• Spatial redundancy reduction (DCT example)
Original 8x8 block:
139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160 160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158
After DCT:
1260  -1 -12  -5   2  -2  -3   1
 -23 -17  -6  -3  -3   0   0  -1
 -11  -9  -2   2   0  -1  -1   0
  -7  -2   0   1   1   0   0   0
  -1  -1   1   2   0  -1   1   1
   2   0   2   0  -1   1   1  -1
  -1   0   0  -1   0   2   1  -1
  -3   2  -4  -2   2   1  -1   0
After quantisation:
158   0  -1   0   0   0   0   0
 -1  -1   0   0   0   0   0   0
 -1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
After zig-zag scan:
158 0 -1 -1 -1 -1 EOB
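The "sum of 64 weighted basis patterns" view of the DCT can be sketched in plain Python. The example below assumes the orthonormal 2D DCT-II without level shift, which is consistent with the DC value 1260 of the example block; the function names are illustrative only.

```python
import math

N = 8

def basis(u, v):
    """8x8 basis pattern for vertical frequency u and horizontal frequency v."""
    cu = math.sqrt((1 if u == 0 else 2) / N)
    cv = math.sqrt((1 if v == 0 else 2) / N)
    return [[cu * cv
             * math.cos((2 * y + 1) * u * math.pi / (2 * N))
             * math.cos((2 * x + 1) * v * math.pi / (2 * N))
             for x in range(N)] for y in range(N)]

def dct2(block):
    """Weight of each basis pattern: inner product of the block with it."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            p = basis(u, v)
            coeffs[u][v] = sum(block[y][x] * p[y][x]
                               for y in range(N) for x in range(N))
    return coeffs

def idct2(coeffs):
    """Multiply each basis pattern by its weight and add them together."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            p = basis(u, v)
            for y in range(N):
                for x in range(N):
                    out[y][x] += coeffs[u][v] * p[y][x]
    return out

block = [
    [139, 144, 149, 153, 155, 155, 155, 155],
    [144, 151, 153, 156, 159, 156, 156, 156],
    [150, 155, 160, 163, 158, 156, 156, 156],
    [159, 161, 162, 160, 160, 159, 159, 159],
    [159, 160, 161, 162, 162, 155, 155, 155],
    [161, 161, 161, 161, 160, 157, 157, 157],
    [162, 162, 161, 163, 162, 157, 157, 157],
    [162, 162, 161, 161, 163, 158, 158, 158],
]

coeffs = dct2(block)
restored = [[round(value) for value in row] for row in idct2(coeffs)]
print(round(coeffs[0][0]))   # DC coefficient of the example block
print(restored == block)     # without quantization the transform is reversible
```

The loss in the codec therefore comes from quantization, not from the transform itself.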
Run-Length Encoding
• Encode each coefficient value as a (run, level) pair:
– Run = number of zeros preceding value
– Level = non-zero value
• Usually, the block data is reduced to a short sequence of (run, level) pairs
– This is now easy to compress using an entropy encoder
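A minimal sketch of (run, level) coding, applied to the zig-zag sequence of the DCT example (158 0 -1 -1 -1 -1 followed by zeros). Treating the DC value as an ordinary pair and representing the end of block by the string "EOB" are simplifying assumptions; real codecs handle the DC coefficient separately.

```python
def run_level_encode(scanned):
    """Turn a zig-zag scanned block into (run, level) pairs plus an EOB marker."""
    # Everything after the last non-zero value is covered by EOB
    last = len(scanned)
    while last > 0 and scanned[last - 1] == 0:
        last -= 1
    pairs, run = [], 0
    for value in scanned[:last]:
        if value == 0:
            run += 1              # count zeros preceding the next level
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs

def run_level_decode(pairs, size=64):
    """Rebuild the scanned block from (run, level) pairs."""
    out = []
    for item in pairs:
        if item == "EOB":
            break
        run, level = item
        out.extend([0] * run + [level])
    return out + [0] * (size - len(out))

scanned = [158, 0, -1, -1, -1, -1] + [0] * 58
pairs = run_level_encode(scanned)
print(pairs)
```

The 64 coefficients of the block shrink to five pairs and an end-of-block marker.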
Variable-Length Encoding
• Encode each (run, level) pair using a variable-length code
– Frequently occurring groups: assign a short code
– Infrequently occurring groups: assign a long code
• Result: compressed version of the image
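The idea of short codes for frequent pairs can be illustrated with a Huffman code built from symbol counts. This is a sketch under assumptions: JPEG and MPEG use predefined or transmitted code tables rather than building one per block, and the sample pairs below are invented for illustration.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical stream of (run, level) pairs
pairs = [(0, -1)] * 6 + [(1, -1)] * 3 + [(0, 2)] * 2 + [(5, 1)]
code = huffman_code(pairs)
for pair, bits in sorted(code.items(), key=lambda kv: len(kv[1])):
    print(pair, bits)
total_bits = sum(len(code[p]) for p in pairs)
print(total_bits, "bits for", len(pairs), "pairs")
```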
Image decoding
• Reverse the stages to recover the image
• Information was thrown away during quantization
– Decoded image will not be identical to the original
• In general: more compression = more quality loss
• Too much compression:
– Block edges start to show (“blockiness”)
– High-frequency patterns start to appear (“mosquito noise”)
Video coding
• Moving images contain significant temporal redundancy
– Successive frames are very similar
• Add an extra “motion model” at the “front end” of the image encoder
• The amount of data to be coded can be reduced significantly if the previous frame is subtracted from the current frame.
Video Encoder
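• Video frames → Motion Estim. → Motion Comp. → DCT → Quantize → Zigzag → RLE → VLC
• Motion model: motion estimation produces the motion vectors, which go to the VLC together with the headers
• Reconstruction loop: Rescale → IDCT → Recon. → Buffer (reference for motion estimation and compensation)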
Video Decoder
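• VLD → RLD → IZigzag → Rescale → IDCT → Recon.
• The VLD also delivers the headers and the motion vectors; the Buffer holds the reference frame used by Recon.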
Motion Estimation
• Process 16x16 luminance samples at a time (“macroblock”)
• Compare with neighboring area in previous frame
• Find the closest matching area
– Prediction reference
• Calculate the offset between the current macroblock and the prediction reference area
– Motion vector
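The search described above can be sketched as a full-search block matching algorithm. The sum of absolute differences (SAD) as the matching criterion, the ±7 search range and the synthetic test frames are assumptions made for the example; the standards do not prescribe how an encoder estimates motion.

```python
import random

def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a current and a reference block."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def motion_search(cur, ref, cx, cy, size=16, search=7):
    """Return (dx, dy, cost) of the best match for the macroblock at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2], best[0]

# Synthetic previous frame, and a current frame whose content has moved
# 3 pixels to the right and 2 pixels down.
rng = random.Random(0)
SIZE = 48
ref = [[rng.randint(0, 255) for _ in range(SIZE)] for _ in range(SIZE)]
cur = [[ref[y - 2][x - 3] if y >= 2 and x >= 3 else 0
        for x in range(SIZE)] for y in range(SIZE)]

dx, dy, cost = motion_search(cur, ref, 16, 16)
print("motion vector:", (dx, dy), "SAD:", cost)
```

The vector points from the current macroblock to its prediction reference in the previous frame, so content that moved right and down yields a negative offset.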
Motion Compensation
• Subtract the reference area from the current macroblock
– Difference macroblock
• Encode the difference macroblock with an image encoder
• If motion estimation was effective
– Little data left in difference macroblock
– More efficient compression
Motion Compensation
– In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best matching MB from the previously coded I or P frame - prediction.
– prediction error: The difference between the MB and its matching MB, sent to DCT and its subsequent encoding steps.
– The prediction is from a previous frame — forward prediction.
Motion Compensation
• MPEG introduces a third frame type — B-frames, and its accompanying bi-directional motion compensation.
– Each MB from a B-frame will have up to two motion vectors (MVs) (one from the forward and one from the backward prediction).
– If matching in both directions is successful, then two MVs will be sent and the two corresponding matching MBs are averaged before comparing to the Target MB for generating the prediction error.
– If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB will be used from either the forward or backward prediction.
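The B-frame rule above (average the two matches when both exist, otherwise use the single match) can be sketched as follows. Blocks are small lists of lists here, and the rounding used in the average is an assumption of this example.

```python
def predict_b_block(target, forward=None, backward=None):
    """Return (prediction, prediction_error) for one block of a B-frame.

    forward / backward are the matching blocks found in the previous and
    next reference frames, or None when no acceptable match was found.
    """
    if forward is None and backward is None:
        raise ValueError("at least one matching block is required")
    if forward is not None and backward is not None:
        # Both matches succeeded: average them (rounding is an assumption)
        prediction = [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
                      for f_row, b_row in zip(forward, backward)]
    else:
        prediction = forward if forward is not None else backward
    error = [[t - p for t, p in zip(t_row, p_row)]
             for t_row, p_row in zip(target, prediction)]
    return prediction, error

target = [[100, 102], [104, 106]]
fwd = [[98, 100], [102, 104]]      # match from the previous reference frame
bwd = [[102, 104], [106, 108]]     # match from the next reference frame

_, err_both = predict_b_block(target, fwd, bwd)
_, err_fwd = predict_b_block(target, forward=fwd)
print(err_both)   # interpolated prediction
print(err_fwd)    # forward prediction only
```

In this toy case the interpolated prediction leaves no error at all, while forward prediction alone leaves a residual of 2 per sample.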
B-frame Coding Based on Bidirectional Motion Compensation.
The Need for Bidirectional Search.
The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame because half of the ball was occluded by another object. A match however can readily be obtained from the next frame.
Video MPEG - Frame Types
• I (Intra): self-contained, only spatial compression (like JPEG)
• P (Predictive): references the preceding I or P frame. Temporal compression by extrapolation using macroblocks. A macroblock can be:
– Same: no change with respect to the reference frame
– Moved (e.g. a ball in motion): described by a motion vector and, if needed, a correction (difference from the original)
– New (e.g. what appears behind a door that opens): described by spatial compression (like an I-frame)
• B (Bidirectional): temporal compression with interpolation; references the preceding I or P frame and the following I or P frame. Maximum compression, maximum computational complexity. It softens the image, reducing noise.
I Frames (Intra)
Intra frames are coded as self-contained, without reference to other frames.
Sequence at 25 frames per second: I I I I I (18 KBytes each)
Four frame periods (0.16 s) carry 72 KBytes: 72 x 1024 x 8 / 0.16 = 3.7 Mbps
P frames (Predictive)
Predictive frames are encoded using motion compensation based on the previous I or P frame.
Sequence: I P P I P P I (I: 18 KB, P: 6 KB)
Six frame periods (0.24 s) carry 60 KBytes: 60 x 1024 x 8 / 0.24 = 2.0 Mbps
B frames (Bidirectional)
Bidirectional frames are encoded using motion compensation based on the nearest previous and subsequent I or P frames.
Display order: I1 B2 B3 P4 B5 B6 P7 B8 B9 I10 (common values: I: 18 KB, P: 6 KB, B: 4 KB)
Transmission order: 1, 4, 2, 3, 7, 5, 6, 10, 8, 9, …
Nine frame periods (0.36 s) carry 54 KBytes: 54 x 1024 x 8 / 0.36 = 1.2 Mbps
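The three bit-rate figures for I-only, I/P and I/P/B sequences follow from the same calculation, sketched here with the frame sizes quoted on the slides (18, 6 and 4 KBytes). The helper name and the dictionary are illustrative.

```python
FRAME_KBYTES = {"I": 18, "P": 6, "B": 4}   # common values from the slides

def mean_bit_rate(pattern, fps=25):
    """Average bit rate in bits/s of a repeating frame pattern such as 'IBBPBBPBB'."""
    total_bits = sum(FRAME_KBYTES[t] for t in pattern) * 1024 * 8
    duration = len(pattern) / fps          # seconds covered by the pattern
    return total_bits / duration

rates = {p: mean_bit_rate(p) for p in ("IIII", "IPPIPP", "IBBPBBPBB")}
for pattern, rate in rates.items():
    print(f"{pattern:10s} {rate / 1e6:.1f} Mbps")
```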
Bidirectional Motion Compensation
Frame sequence (display order): Intra 0, B1, B2, Pred 3, B4, B5, Pred 6, B7, B8, Intra 9, B10, B11, Pred 12
16 x 16 bidirectional macroblocks can be coded as: Intra, Forward, Reverse or Bidirectional
Group of Picture Structure
• I-frames: for random access; intraframe coded; lowest compression
• P-frames: predictively encoded from the most recent I- or P-frame; medium compression
• B-frames: interpolated from the most recent and the subsequent I- or P-frame; highest compression
Video compression
• Temporal redundancy reduction
– I: Intra-coded picture; P: Predicted picture; B: Bi-directionally interpolated picture
– Figure: prediction and bi-directional prediction between I, B and P pictures, shown in order of presentation and in order of transmission, with the compression rate increasing from I to P to B pictures
Synchronisation - Getting data on time
• Synchronisation in the multimedia context refers to the mechanism that ensures a temporally consistent presentation of the audio-visual information to the user
• “On time”: not too late, not too early; no buffer overflow or underflow
• Flow control: not applicable in broadcasting
• Common time base and definition of a standard target decoder that describes the data consumption pattern of the receiver
– Remark: Direct MPEG (Microsoft) does not use time information for clock recovery but relies on flow control
Streams
• Idea of continuity (pipelining): Carry time information for clock recovery
• No flow control (allows broadcasting): The emitter must have a precise knowledge of the receiver data consumption pattern (explicit in MPEG STD)
• Just-in-time: Shorter delay and smaller buffer size than with flow control
• Two aspects in synchronisation :Clock recovery & timing control (model & buffering)
Requirements for stream transport
• Data information: BER (Bit Error Rate) requirement. No frame retransmission is possible, hence FEC (Forward Error Correction)
• Time information: no jitter
Agenda
• Introduction
• Audio & Video compression principles
• A/V Compression standards
– The MPEG model and its situation in a communication context
– JPEG & MJPEG
– H.261 & MPEG-1
– H.263 & MPEG-2
– Videoconference
• Conclusion
MPEG Versions
• MPEG-1
– For video storage on CD-ROM & transmission over T-1 lines (1.5 Mbps)
• MPEG-2
– Many options: 352x240 pixel; 720x480 pixel; 1440x1152 pixel; 1920x1080 pixel
– Many profiles (sets of coding tools & parameters)
• Main Profile
– I, P & B frames; 720x480 conventional TV
– Very good quality @ 4-6 Mbps
• MPEG-4
– <64 kbps to 4 Mbps
– Designed to enable viewing, access & manipulation of objects, not only pixels
– For digital TV, streaming video, mobile multimedia & games
MPEG Coding Standard
• Moving Picture Experts Group (MPEG)
– Video and audio compression & multiplexing
– Video display controls: fast forward, reverse, random access
• Elements of encoding– Intra- and inter-frame coding using DCT
– Bidirectional motion compensation
– Group of Picture structure
– Scalability options
• MPEG only standardizes the decoder
Video H.26x
• ITU-T video standards for video conferencing: low bit rate, low delay, less action than in movies.
– H.261: developed in the late 1980s for ISDN (constant bit rate).
– H.263, H.263+, H.264: more modern and efficient.
• Simplified MPEG compression algorithms:
– More restricted motion vectors (less action)
– In H.261: no B-frames (excessive latency and complexity)
• Less CPU intensive. Real-time software codecs are feasible
Video H.26x (Cont’d)
• Subsampling 4:2:0 (chrominance at half resolution horizontally and vertically; sometimes loosely written 4:1:1)
• Resolutions:
– CIF (Common Intermediate Format): 352 x 288
– QCIF (Quarter CIF): 176 x 144
– SCIF (Super CIF): 704 x 576
• Independent Audio: G.722 (quality), G.723.1, G.728, G.729
• Audio-video synchronization using H.320 (ISDN) and H.323 (Internet)
The MPEG model
• Encoding side: captured audio and video signals → audio encoder and video encoder → multiplexer
• Transmission channel: digital storage medium or network
• Decoding side: demultiplexer → audio decoder and video decoder → presented audio and video signals
Components of the MPEG standard
• The MPEG standard is composed of 3 main parts:
– Audio: specifies the compression of audio signals
– Video: specifies the compression of video signals
– System: specifies how the compressed audio and video signals are combined in the multiplexed stream (program stream or transport stream).
• Each part specifies:
– The bitstream syntax
– The timing requirement and the related information (bit rate, buffer needs)
MPEG in a communication context
• A simple view of MPEG in the communication context
– MPEG2 compression layer: audio and video sources → audio encoder and video encoder → ES (Elementary Streams)
– MPEG2 system layer: PS multiplexing → PS (Program Stream, 1 program), or TS multiplexing → TS (Transport Stream, n programs)
– Adaptation to the channel (DVB, DVD ...): disc, satellite, cable
JPEG & MJPEG
JPEG Coding Standard
• Key Components:
– Transform:
• 8×8 DCT
• boundary padding
– Quantization:
• uniform quantization
• DC/AC coefficients
– Coding:
• Zigzag scan
• run length/Huffman coding
JPEG Baseline Coder
Tour Example: 8×8 block of pixel values
183 160  94 153 194 163 132 165
183 153 116 176 187 166 130 169
179 168 171 182 179 170 131 167
177 177 179 177 179 165 131 167
178 178 179 176 182 164 130 171
179 180 180 179 183 169 132 169
179 179 180 182 183 170 129 173
180 179 181 179 181 170 130 169
Step 1: Transform
• DC level shifting
• 2D DCT
Example block after DC level shifting (each pixel value minus 128):
55 32 -34 25 66 35  4 37
55 25 -12 48 59 38  2 41
51 40  43 54 51 42  3 39
49 49  51 49 51 37  3 39
50 50  51 48 54 36  2 43
51 52  52 51 55 41  4 41
51 51  52 54 55 42  1 45
52 51  53 51 53 42  2 41
After the 2D DCT (coefficients rounded to integers):
313  56 -27  18  78 -60  27 -27
-38 -27  13  44  32  -1 -24 -10
-20 -17  10  33  21  -6 -16  -9
-10  -8   9  17   9 -10 -13   1
 -6   1   6   4  -3  -7  -5   5
  2   3   0  -3  -7  -4   0   3
  4   4  -1  -2  -9   0   2   4
  3   1   0  -4  -2  -1   3   1
Step 2: Quantization
Q-table:
16 11 10 16  24  40  51  61
12 12 14 19  26  58  60  55
14 13 16 24  40  57  69  56
14 17 22 29  51  87  80  62
18 22 37 56  68 109 103  77
24 35 55 64  81 104 113  92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103  99
Quantized coefficients (each DCT coefficient divided by its Q-table entry and rounded):
20  5 -3  1  3 -2  1  0
-3 -2  1  2  1  0  0  0
-1 -1  1  1  1  0  0  0
-1  0  0  1  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
Why do the Q-table entries increase from top-left to bottom-right?
Step 3: Entropy Coding
Zigzag scan of the quantized block:
(20,5,-3,-1,-2,-3,1,1,-1,-1,0,0,1,2,3,-2,1,1,0,0,0,0,0,0,1,1,0,1,EOB)
EOB (End Of Block): all following coefficients are zero
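The zigzag sequence can be regenerated from the quantized block. This is a sketch: the quantized values are written with the signs implied by the sequence above, and `zigzag_order` is an illustrative helper that walks the anti-diagonals in alternating directions.

```python
def zigzag_order(n=8):
    """List of (row, col) positions in zigzag scan order for an n x n block."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals are walked bottom-left to top-right, odd ones the reverse
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

def zigzag_scan(block):
    """Scan a block in zigzag order and replace the trailing zeros by 'EOB'."""
    values = [block[r][c] for r, c in zigzag_order(len(block))]
    while values and values[-1] == 0:
        values.pop()
    return values + ["EOB"]

QUANTIZED = [
    [20,  5, -3, 1, 3, -2, 1, 0],
    [-3, -2,  1, 2, 1,  0, 0, 0],
    [-1, -1,  1, 1, 1,  0, 0, 0],
    [-1,  0,  0, 1, 0,  0, 0, 0],
] + [[0] * 8 for _ in range(4)]

sequence = zigzag_scan(QUANTIZED)
print(sequence)
```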
Video M-JPEG (Motion JPEG)
• The simplest approach: treat the video as a sequence of JPEG pictures, without taking advantage of the redundancy between frames.
• DCT algorithm (Discrete Cosine Transform)
• Less efficient, but low delay.
• Used in:
– Some digital recording and nonlinear editing systems (each frame can be edited independently)
– Some videoconferencing systems (low delay).
• It does not include standard audio support. The audio has to be encoded by some other means (e.g. CD-DA) and synchronized by non-standard mechanisms.
H.261 & MPEG1
H.261 Coding Standard
• Background:
– Facilitate video conferencing and videophone service over ISDN
– p×64 kbps
• p=1: videophone
• p>5: videoconference
• p=30: VHS quality
– Basis of MPEG-1 and MPEG-2
• Features
– Maximum coding delay of 150 ms
– Amenable to low-cost VLSI implementation
Input Image Formats
Parameter               CIF               QCIF
# of pels/line (Y)      360 (352)         180 (176)
# of pels/line (U/V)    180 (176)         90 (88)
# of lines/pic (Y)      288               144
# of lines/pic (U/V)    144               72
Interlacing             1:1               1:1
Temporal rate           30, 15, 10, 7.5   30, 15, 10, 7.5
Aspect ratio            4:3               4:3
Video Multiplex
• It defines a data structure so that a decoder can interpret the received bit stream without any ambiguity
• Hierarchical data structure
– Picture layer
– Group of blocks (GOB) layer
– Macroblock (MB) layer
– Block layer
– Each layer has a distinct header
Picture and GOB Layers
• Picture layer consists of picture header followed by the data for GOBs
– Picture header contains data such as picture format (CIF or QCIF)
• GOB layer is always composed of 33 MBs
– GOB header contains a MB address and compression mode followed by the data for the blocks
Macroblock and Block Layers
Macroblock: the smallest unit to select the compression mode
Y1 Y2
Y3 Y4
Cb Cr
A MB always consists of 6 blocks (Y1 – Y4, Cr, Cb)
MBA MTYPE MQUANT MVD CBP Block Data
Compression Modes
• Intra Mode
– Similar to JPEG coding
– Supports two compression modes
• Inter Mode
– ME is not specified (MC is optional)
– Usually, 16-by-16 BMA, integer-pel accuracy, search range [-15,15]
– Supports various compression modes
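H.261 Encoder
Block diagram: 8x8 blocks pass through the Intra/Inter switch, the DCT, the quantizer (Q) and the Huffman VLC, followed by CRC error and fixed-length control for the p x 64 channel. The prediction loop consists of Q-1, I-DCT, frame memory, motion estimation (which produces the motion vector) and the loop filter.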
• Intended for videoconferencing applications
• Bit rates = p x 64 kbps, p = 2, 6, 24 common
MPEG-1
• MPEG-1 adopts the SIF (Source Input Format) digital TV format.
• MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
– 352×240 for NTSC video at 30 fps
– 352×288 for PAL video at 25 fps
– It uses 4:2:0 chroma subsampling
• The MPEG-1 standard is also referred to as ISO/IEC 11172. It has five parts:
– 11172-1 Systems
– 11172-2 Video
– 11172-3 Audio
– 11172-4 Conformance
– 11172-5 Software
Hierarchical Data Structure
• Sequences are formed by Group Of Pictures (GOP)
• GOP are made up of pictures (frames)
• Pictures consist of slices
• Slices are made up of macro-blocks (MB)
• Macro-blocks consist of blocks
• Blocks are 8×8 pixels arrays
Layers of MPEG-1 Video Bitstream.
Hierarchical Data Structure
Example of temporal picture structure
Slices in an MPEG-1 Picture.
Video MPEG (MPEG-1)
• Subsampling 4:2:0 (25% more savings than 4:2:2)
• Two possible formats:
– SIF (Source Input Format), PAL version (396 MBs): Y: 352x288 pixels; Cr & Cb: 176x144 pixels
– QSIF (Quarter SIF) (99 MBs): Y: 176x144 pixels; Cr & Cb: 88x72 pixels
• Two compression types (simultaneously):
– Spatial: as in JPEG
– Temporal: takes advantage of each frame having similarity with those around.
MPEG-1 Video
• Typical Sequence (360ms): I1 B2 B3 P4 B5 B6 P7 B8 B9 I10
• Order of encoding / decoding : I1 P4 B2 B3 P7 B5 B6 I10 B8 B9
• Typical size of frames (SIF, 352x288):
– I: 18kBytes (7:1)
– P: 6kBytes (20:1)
– B: 2.5 - 4kBytes (50:1)
– Average bit rate (IBBPBBPBBI): 1.2Mbps
– With QSIF the bit rate is reduced to 300kbps
• Compression Latency (Typical values):
– M-JPEG: 45 ms
– MPEG, I-frames only: 200 - 400 ms
– MPEG, I- and P-frames: 200 - 500 ms
– MPEG, I-, P- and B-frames: 400 - 850 ms
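The relation between the typical display sequence and the order of encoding / decoding can be sketched as a small reordering function: every B-frame has to wait until the I- or P-frame that follows it in display order has been sent. The function name and the string labels are illustrative.

```python
def coding_order(display_order):
    """Reorder frames so that each B-frame follows both of its references."""
    out, waiting_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            waiting_b.append(frame)      # needs the next I/P frame first
        else:
            out.append(frame)            # I or P: send it right away
            out.extend(waiting_b)        # then the B-frames that depend on it
            waiting_b = []
    return out + waiting_b

display = "I1 B2 B3 P4 B5 B6 P7 B8 B9 I10".split()
print(" ".join(coding_order(display)))
```

This reordering is one reason why sequences with B-frames show the highest latency in the list above.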
MB Types in MPEG-I
• I-pictures: Intra, Intra-A
• P-pictures: Intra, Intra-A, Inter-D, Inter-DA, Inter-F, Inter-FD, Inter-FDA, Skipped
• B-pictures: Intra, Intra-A, Inter-F, Inter-FD, Inter-FDA, Inter-B, Inter-BD, Inter-BDA, Inter-I, Inter-ID, Inter-IDA, Skipped
Legend:
A: adaptive quantization
F: forward prediction with MC
D: DCT of prediction error will be coded
B: backward prediction with MC
I: interpolated prediction with MC
Audio MPEG-1
• Mono or stereo, sampled at 32, 44.1 (CD) or 48 (DAT) kHz. At reduced bit rates, sampling at 32 kHz is preferable.
• Lossy, asymmetric psychoacoustic compression.
• From 32 to 448 kbps per audio channel
• Three layers in ascending order of complexity/quality:
– Layer I: good quality at 192-256 kbps per channel; rarely used
– Layer II: CD quality at 96-128 kbps per channel
– Layer III: CD quality at 64 kbps per channel
• Each layer introduces new algorithms and includes those of the layers below it.
• Layer II is used in DAB (Digital Audio Broadcast); Layer III is the MP3 format
MPEG-1 System
• Responsible for ensuring the synchronization between audio and video through time stamps based on a 90 kHz clock.
• Only necessary when audio and video are used simultaneously (not for MP3 streams, for example)
• Adds a small overhead (5-50 kbps)
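How a common 90 kHz clock ties audio and video together can be sketched by computing time stamps for both streams. The 25 fps video rate, the 44.1 kHz sampling rate and the 1152 samples per audio frame (Layer II/III) are assumptions chosen for the example; the function names are illustrative.

```python
CLOCK_HZ = 90_000   # common time base

def video_timestamp(frame_index, fps=25):
    """Presentation time of a video frame in 90 kHz clock ticks."""
    return frame_index * CLOCK_HZ // fps

def audio_timestamp(frame_index, sample_rate=44_100, samples_per_frame=1152):
    """Presentation time of an audio frame in 90 kHz clock ticks."""
    return frame_index * samples_per_frame * CLOCK_HZ // sample_rate

print(video_timestamp(1))    # ticks per video frame at 25 fps
print(audio_timestamp(1))    # ticks per audio frame (rounded down)

# Audio frame that should start playing together with video frame 50 (t = 2 s)
target = video_timestamp(50)
audio_frame = max(n for n in range(200) if audio_timestamp(n) <= target)
print(audio_frame, audio_timestamp(audio_frame), target)
```

Because both streams are stamped against the same clock, the decoder can align them even though audio and video frames have different durations.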
Synchronization of audio and video MPEG
• Analog audio signal → audio encoder → digital audio stream with time stamps
• Analog video signal → video encoder → digital video stream with time stamps
• Both encoders share a 90 kHz clock; the system multiplexer combines the two streams into the MPEG-1 stream
• During decoding the reverse process is performed
Prototypical Decoder ISO/IEC 11172
Major Differences from H.261
• Source formats supported:
– H.261 only supports CIF (352 × 288) and QCIF (176 × 144) source formats; MPEG-1 supports SIF (352 × 240 for NTSC, 352 × 288 for PAL).
– MPEG-1 also allows specification of other formats as long as the Constrained Parameter Set (CPS) shown in the following table is satisfied:
The MPEG-1 Constrained Parameter Set
Parameter Value
Horizontal size of picture ≤ 768
Vertical size of picture ≤ 576
No. of MBs / picture ≤ 396
No. of MBs / second ≤ 9,900
Frame rate ≤ 30 fps
Bit-rate ≤ 1,856 kbps
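The limits in the table can be turned into a simple check. This sketch assumes 16×16 macroblocks and picture dimensions rounded up to a whole number of macroblocks; `satisfies_cps` is an illustrative name.

```python
import math

def satisfies_cps(width, height, fps, bit_rate_kbps):
    """Check a video format against the MPEG-1 Constrained Parameter Set."""
    mbs_per_picture = math.ceil(width / 16) * math.ceil(height / 16)
    return (width <= 768
            and height <= 576
            and mbs_per_picture <= 396
            and mbs_per_picture * fps <= 9900
            and fps <= 30
            and bit_rate_kbps <= 1856)

print(satisfies_cps(352, 288, 25, 1200))   # SIF, PAL
print(satisfies_cps(352, 240, 30, 1200))   # SIF, NTSC
print(satisfies_cps(720, 576, 25, 1200))   # CCIR 601 size: too many macroblocks
```

Both SIF variants sit exactly on the limit of 9,900 macroblocks per second (396 × 25 and 330 × 30).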
MPEG-I vs. H.261
H.261                                        MPEG-1
Sequential access                            Random access
One basic frame rate                         Flexible frame rate
CIF and QCIF images only                     Flexible image size
I and P frames only                          I, P and B frames
MC over 1 frame                              MC over 1 or more frames
Integer-pel MV accuracy                      Half-pel MV accuracy
Spatial filtering in the loop                No filter
Variable threshold + uniform quantization    Quantization matrix
No GOP structure                             GOP structure
GOB structure                                Slice structure
H.263/H.263+ & MPEG2
Video Codecs: H.263
• Frame-based coding
• Low bit rate coding:
– < 64 kbps (typical)
• H.261 coding with improvements
– I/P/B frames
– Additional image formats: 4CIF, 16CIF
• Suitable for desktop video conferencing over low-speed links
H.263 Baseline Coding Algorithm
• Video Frame Structure
– Supports sub-QCIF, QCIF, CIF, 4CIF and 16CIF
• Video Coding Tools
– Motion estimation and compensation
• range: [-16, 15.5]; accuracy: half-pel
– Transform: 8×8 DCT
– Quantization: Q factor, $c^{q}_{m,n} = c_{m,n} / Q,\quad m,n = 0,\dots,7$
– Entropy Coding: 3D VLC (LAST, RUN, LEVEL)
• Coding Control
– Intra/Inter switch
Advanced Coding Modes in H.263
• Unrestricted motion vector mode
– range: [-31.5, 31.5]
– Allows MVs to point outside the picture boundaries
• Syntax-based arithmetic coding mode
– About 5% savings over VLC
• Advanced prediction mode
– Overlapped Block Motion Compensation (OBMC)
• PB-frame mode
– I B P B P …
H.263+
• Advanced intra coding mode
• Deblocking filter mode
• Slice structure mode
• Supplemental enhancement information mode
• Improved PB-frame mode
• Reference picture selection mode
• Temporal, SNR and Spatial scalability mode
• Reference picture resampling mode
• Reduced resolution update mode
• Independently segmented decoding mode
• Alternative Inter VLC mode
• Modified quantization mode
MPEG-2
• MPEG-2: for higher quality video at a bit rate of more than 4 Mbps.
• Defines seven profiles aimed at different applications (toolboxes):
– Simple profile (no B pictures)
– Main profile (= MPEG-1 + interlaced; does not support scalability)
– SNR scalable profile (allows graceful degradation: noise improvement at the same resolution)
– Spatially scalable profile (hierarchical coding: improvement at higher resolution)
– High profile
– 4:2:2 profile
– Multiview profile
Video MPEG-2
• Compatible extension of MPEG-1
• Designed for digital TV:
– Optimized for transmission, not storage
– Provides interlaced video (TV) as well as progressive (MPEG-1 was only progressive)
• MPEG-2 defines four levels, according to the values of the sampling parameters:
– Low: 352x288 (supports MPEG-1)
– Main: 720x576 (equivalent CCIR 601)
– High-1440: 1440x1152 (HDTV 4:3)
– High: 1920x1152 (HDTV 16:9)
Profiles and Levels in MPEG-2
Level       Simple   Main   SNR Scalable   Spatially Scalable   High   4:2:2   Multiview
High                 *                                          *
High 1440            *                     *                    *
Main        *        *      *                                   *      *       *
Low                  *      *

Four Levels in the Main Profile of MPEG-2
Level       Max. resolution   Max fps   Max pixels/sec   Max coded data rate (Mbps)   Application
High        1,920 × 1,152     60        62.7 × 10^6      80                           film production
High 1440   1,440 × 1,152     60        47.0 × 10^6      60                           consumer HDTV
Main        720 × 576         30        10.4 × 10^6      15                           studio TV
Low         352 × 288         30        3.0 × 10^6       4                            consumer tape equiv.
Bit rates of Levels and Profiles MPEG-2
Level \ Profile                   Simple    Main      SNR Scalability   Spatial Scalability   High       4:2:2 (Studio)
Subsampling                       4:2:0     4:2:0     4:2:0             4:2:0                 4:2:0/2    4:2:2
High: 1920x1152 (HDTV 16:9)       -         80 Mbps   -                 -                     100 Mbps   -
High-1440: 1440x1152 (HDTV 4:3)   -         60 Mbps   -                 60 Mbps               80 Mbps    -
Main: 720x576 (CCIR 601)          15 Mbps   15 Mbps   15 Mbps           -                     20 Mbps    50 Mbps
Low: 352x288 (MPEG1)              -         4 Mbps    4 Mbps            -                     -          -
The rates shown are the peak rates allowed by the standard for each combination of profile and level.
Five Modes of Predictions
• MPEG-2 defines Frame Prediction and Field Prediction as well as five prediction modes:
1. Frame Prediction for Frame-pictures: Identical to MPEG-1 MC-based prediction methods in both P-frames and B-frames.
2. Field Prediction for Field-pictures: A macroblock size of 16×16 from Field-pictures is used.
3. Field Prediction for Frame-pictures: The top-field and bottom-field of a Frame-picture are treated separately. Each 16×16 macroblock (MB) from the target Frame-picture is split into two 16×8 parts, each coming from one field. Field prediction is carried out for these 16×8 parts.
4. 16×8 MC for Field-pictures: Each 16×16 macroblock (MB) from the target Field-picture is split into top and bottom 16×8 halves. Field prediction is performed on each half. This generates two motion vectors for each 16×16 MB in the P-Field-picture, and up to four motion vectors for each MB in the B-Field-picture.
This mode is good for a finer MC when motion is rapid and irregular.
5. Dual-Prime for P-pictures: First, Field prediction from each previous field with the same parity (top or bottom) is made. Each motion vector mv is then used to derive a calculated motion vector cv in the field with the opposite parity, taking into account the temporal scaling and vertical shift between lines in the top and bottom fields. For each MB the pair mv and cv yields two preliminary predictions. Their prediction errors are averaged and used as the final prediction error.
This mode mimics B-picture prediction for P-pictures without adopting backward prediction (and hence with less encoding delay).
This is the only mode that can be used for either Frame-pictures or Field-pictures.
Supporting Interlaced Video
• MPEG-2 must support interlaced video as well since this is one of the options for digital broadcast TV and HDTV.
• In interlaced video each frame consists of two fields, referred to as the top-field and the bottom-field.
– In a Frame-picture, all scanlines from both fields are interleaved to form a single frame, then divided into 16×16 macroblocks and coded using MC.
– If each field is treated as a separate picture, then it is called Field-picture.
Audio MPEG-2
• Algorithms:
– Version compatible with MPEG-1 Layers I, II and III
– Improved compression system, Advanced Audio Coding (AAC): quality comparable to MPEG-1 Layer III at 50-70% of the bit rate. Not compatible with MPEG-1.
• Channels:
– Stereo version compatible with MPEG-1
• Independent (each channel coded separately)
• Joint (exploits redundancy between channels)
– Multi-channel support (languages) and 5.1 (5 surround channels)
MPEG-2 Scalabilities
• The MPEG-2 scalable coding: A base layer and one or more enhancement layers can be defined — also known as layered coding.
– The base layer can be independently encoded, transmitted and decoded to obtain basic video quality.
– The encoding and decoding of the enhancement layer is dependent on the base layer or the previous enhancement layer.
• Scalable coding is especially useful for MPEG-2 video transmitted over networks with the following characteristics:
– Networks with very different bit rates.
– Networks with variable bit rate (VBR) channels.
– Networks with noisy connections.
MPEG-2 Scalabilities (Cont’d)
• MPEG-2 supports the following scalabilities:
1. SNR Scalability—enhancement layer provides higher SNR (Different levels of quality), base/enhancement layer uses a coarse/fine quantizer for DCT coefficients.
2. Spatial Scalability — enhancement layer provides higher spatial resolution (Different resolutions), base/enhancement layer is a low/high spatial resolution of the video.
3. Temporal Scalability—enhancement layer facilitates higher frame rate (Different frame rates), allow the decodability at different frame rates.
4. Hybrid Scalability — combination of any two of the above three scalabilities.
5. Data Partitioning — quantized DCT coefficients are split into partitions (Separate headers and payloads apart).
• Limited scalability capabilities: Three layers only
Non-Scalable
Non-scalable Bit stream
Decoder 1 Decoder 2 Decoder 3
Spatial Scalability
Scalable bit stream
Decoder 1
Decoder 4Decoder 3
Decoder 2
PSNR Scalability (Quality)
Scalable Bit stream
Decoder 1 Decoder 2 Decoder 3
Temporal scalability
Scalable bit stream (1 0 1 1 1 … 0 1 0 1 0 0 0 … 1 1 0 1 0 0) decoded at three frame rates:
– 30 Hz: frames 0, 1, 2, 3, 4, 5, …
– 15 Hz: frames 0, 2, 4, 6, 8, …
– 7.5 Hz: frames 0, 4, 8, 12, …
Hybrid Scalability
• Any two of the above three scalabilities can be combined to form hybrid scalability:
1. Spatial and Temporal Hybrid Scalability.
2. SNR and Spatial Hybrid Scalability.
3. SNR and Temporal Hybrid Scalability.
• Usually, a three-layer hybrid coder will be adopted, which consists of:
– Base Layer
– Enhancement Layer 1
– Enhancement Layer 2
Data Partitioning
• The Base partition contains lower-frequency DCT coefficients, enhancement partition contains high-frequency DCT coefficients.
• Strictly speaking, data partitioning is not layered coding, since a single stream of video data is simply divided up and there is no further dependence on the base partition in generating the enhancement partition.
• Useful for transmission over noisy channels and for progressive transmission.
Major Differences from MPEG-1
• Better resilience to bit errors: in addition to the Program Stream, a Transport Stream is added to MPEG-2 bit streams.
• Support of 4:2:2 and 4:4:4 chroma subsampling.
• More restricted slice structure: MPEG-2 slices must start and end in the same macroblock row. In other words, the left edge of a picture always starts a new slice and the longest slice in MPEG-2 can have only one row of macroblocks.
• More flexible video formats: It supports various picture resolutions as defined by DVD, ATV and HDTV.
Major Differences from MPEG-1 (Cont’d)
• Nonlinear quantization — two types of scales are allowed:
1. For the first type, the scale is the same as in MPEG-1: it is an integer in the range [1, 31] and scale_i = i.
2. For the second type, a nonlinear relationship exists, i.e., scale_i ≠ i. The i-th scale value can be looked up in the following table.
Table : Possible Nonlinear Scale in MPEG-2
Other Improvements
Finer Quantization of the DCT Coefficients
                       MPEG-1         MPEG-2
Intra MB DC coeff.     8 bits         11 bits
Intra MB AC coeff.     [-256, 255]    [-2048, 2047]
Non-intra MB coeff.    [-256, 255]    [-2048, 2047]
Videoconference
• Interactive communication through audio, video and data sharing
• It can be:
– Point to point
– Point to multipoint
– Multipoint to multipoint
Requirements / Features of the videoconference
• Compression / Decompression in real time.
• 200-400 ms maximum delay.
• Mobility disabled.
• Normally acceptable quality audio phone.
• Need to synchronize audio and video.
• Need for signaling protocol (connectionless service).
Videoconference Standards
• Videoconferencing systems have been standardized by the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) in the H series of standards (multimedia and audiovisual systems)
• The H.32x are videoconferencing standards. The 'x' depends on the type of network used
H.32x Standards
Standard   Physical environment   Service                            Type      Year of approval
H.320      ISDN                   Streaming a/v, 128 to 384 kb/s     Circuit   1990
H.321      ATM                                                       Circuit
H.322      IsoEthernet                                               TDM
H.323      Ethernet               Streaming a/v, 14.4 - 512 kb/s     Packet    1996
H.324      Analog modem                                              Circuit
The H.32x are umbrella standards. Each relies on a set of existing standards to specify all the services needed in a videoconference, e.g. G.711 for audio coding
H.320 Standard
H.323 Standard
• Packet-based multimedia communications systems
H.320 & H.323 Standards
ISDN (H.320):
– H.261: video coding
– H.221: bit stream conversion
– H.243: multipoint
– G.711: 3.1 kHz audio, 64/56 kbps
– H.242: control protocol
– H.230: signalling and control
– G.722: 7 kHz audio, 64/56/48 kbps
– G.728: 3.1 kHz audio, 16 kbps
IP (H.323):
– H.261/263: video coding
– H.245: control protocol
– H.225: packetization
– G.711: 3.1 kHz audio, 64/56 kbps
– G.723: 3.1 kHz audio, 5.3 kbps
– Q.931: call signalling
– RAS: gatekeeper signalling
Both:
– T.120: data protocols for multimedia communication
H.320 & H.323 Standards
Function                   H.323                          H.320
Control: call control      H.225.0                        Q.931
Control: system control    H.245                          H.242
Control: multiplexing      H.225.0                        H.221
Media: audio               G.711, G.722, G.723.1, G.728   G.711, G.722, G.728
Media: video               H.261, H.263                   H.261, H.263
Media: data                T.120                          T.120
H.32x audio Formats
Codec     Original bandwidth (kbps)   Compression ratio   Compressed bandwidth (kbps)
G.711     64                          1 : 1               64
G.722     224                         3.5-4.6 : 1         48-64
G.723.1   64                          10 : 1              6.4
G.728     64                          4 : 1               16
G.729     64                          8 : 1               8
MPEG      706                         3-11 : 1            64-256
MPEG is not an H.323 audio format. It appears only for comparison
Agenda
• Introduction
• Audio & Video compression principles
• A/V Compression standards
• Conclusion
Some Digital Audio Formats
Format | Sampling Freq. (kHz) | # Channels | Capacity per Channel (kb/s) | Application
PCM (G.711) 8 1 64 Telephony
ADPCM (G.721) 8 1 32 Telephony
SB-ADPCM (G.722) 16 1 48/56/64 Vídeoconferenc.
MP-MLQ (G.723.1) 8 1 6,3/5,3 variable Internet Telephony
ADPCM (G.726) 8 1 16/24/32/40 Telephony
E-ADPCM (G.727) 8 1 16/24/32/40 Telephony
LD-CELP (G.728) 8 1 16 Telephony /Videoc.
CS-ACELP (G.729) 8 1 8 Internet Telephony
RPE-LTP (GSM 06.10) 8 1 13,2 GSM Telephony
CELP (FS 1016) 8 1 4,8
LPC-10E (FS 1015) 8 1 2,4
CD-DA / DAT 44,1/48 2 705,6/768 Hi-Fi Audio
MPEG-1 Layer I 32/44,1/48 2 192-256 variable
MPEG-1 Layer II 32/44,1/48 2 96-128 variable
MPEG-1 Layer III (MP3) 32/44,1/48 2 64 variable Hi-Fi Internet
MPEG-2 AAC 32/44,1/48 5.1 32-44 variable Hi-Fi Internet
High delay
Low delay
Digital Video FormatsVideo Format Y Size
Color Sampling
Frame Rate (Hz)
Raw Data Rate (Mbps)
HDTV Over air. cable, satellite, MPEG2 video, 20-45 Mbps SMPTE296M 1280x720 4:2:0 24P/30P/60P 265/332/664 SMPTE295M 1920x1080 4:2:0 24P/30P/60I 597/746/746 Video production, MPEG2, 15-50 Mbps BT.601 720x480/576 4:4:4 60I/50I 249 BT.601 720x480/576 4:2:2 60I/50I 166 High quality video distribution (DVD, SDTV), MPEG2, 4-10 Mbps BT.601 720x480/576 4:2:0 60I/50I 124 Intermediate quality video distribution (VCD, WWW), MPEG1, 1.5 Mbps SIF 352x240/288 4:2:0 30P/25P 30 Video conferencing over ISDN/Internet, H.261/H.263, 128-384 Kbps CIF 352x288 4:2:0 30P 37 Video telephony over wired/wireless modem, H.263, 20-64 Kbps QCIF 176x144 4:2:0 30P 9.1
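The raw data rates in this table can be reproduced as width x height x frames per second x average bits per pixel. The sketch below assumes 8 bits per sample, so that 4:4:4, 4:2:2 and 4:2:0 sampling cost 24, 16 and 12 bits per pixel on average, and it treats 60I as 30 full frames per second. The names `BITS_PER_PIXEL` and `raw_rate_mbps` are illustrative only.

```python
# Average bits per pixel with 8-bit samples (assumption):
# 4:4:4 keeps full-resolution chroma, 4:2:2 halves it horizontally,
# 4:2:0 halves it both horizontally and vertically.
BITS_PER_PIXEL = {"4:4:4": 24, "4:2:2": 16, "4:2:0": 12}


def raw_rate_mbps(width, height, frames_per_second, sampling):
    """Uncompressed video bit rate in Mbps (1 Mbps = 10**6 bits per second)."""
    bits_per_second = width * height * frames_per_second * BITS_PER_PIXEL[sampling]
    return bits_per_second / 1e6


if __name__ == "__main__":
    rows = [
        ("SMPTE 296M 24P", 1280, 720, 24, "4:2:0"),
        ("BT.601 4:4:4 60I", 720, 480, 30, "4:4:4"),
        ("BT.601 4:2:2 60I", 720, 480, 30, "4:2:2"),
        ("BT.601 4:2:0 60I", 720, 480, 30, "4:2:0"),
        ("QCIF 30P", 176, 144, 30, "4:2:0"),
    ]
    for name, w, h, fps, sampling in rows:
        print(f"{name}: {raw_rate_mbps(w, h, fps, sampling):.1f} Mbps")
```

The 720x576 variant at 25 frames per second gives the same results as 720x480 at 30, because both carry 10,368,000 pixels per second.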
Compressed video standard resolutions
Format | SQCIF | QCIF | CIF | 4CIF or SCIF | 16CIF 4:3 | 16CIF 16:9
Resolution | 128x96 | 176x144 | 352x288 | 704x576, 720x576 | 1408x1152, 1440x1152 | 1920x1152
MPEG-2 level | - | - | Low | Main | High 1440 | High
Standards compared (Op. = optional format): H.261 (Op.), H.263 (Op., Op.), MPEG-4, MPEG-1, MPEG-2
Video compression formats
System | Spatial compression (DCT) | Temporal compression | Complexity | Compression efficiency | Delay
M-JPEG | Yes | No | Medium | Low | Very small
H.261 | Yes | Limited (I & P) | High | Medium | Small
MPEG-1/2 | Yes | Extended (I, P & B) | Very high | High | High
H.263, MPEG-4 | Yes | Extended (I, P & B) | Extremely high | High | Medium to high
Video compression formats: bit rates
Standard/Format | Typical bandwidth | Compression ratio
CCIR 601 | 170 Mbps | 1:1 (reference)
M-JPEG | 10-20 Mbps | 7-27:1
H.261 | 64-2000 kbps | 24:1
H.263 | 28.8-768 kbps | 50:1
MPEG-1 | 0.4-2.0 Mbps | 100:1
MPEG-2 | 1.5-60 Mbps | 30-100:1
MPEG-4 | 28.8-500 kbps | 100-200:1
Delay annotation on the slide: Low delay / High delay
Video compression formats
Type | Method | Format | Original | Compressed
Video conference | H.261 | 176x144 or 352x288 @ 10-30 fr/sec | 2-36 Mbps | 64-1544 kbps
Full motion | MPEG-2 | 720x480 @ 30 fr/sec | 249 Mbps | 2-6 Mbps
HDTV | MPEG-2 | 1920x1080 @ 30 fr/sec | 1.6 Gbps | 19-38 Mbps