T.Sharon-A.Frank Multimedia Video/Audio Compression.


Page 1: T.Sharon-A.Frank Multimedia Video/Audio Compression.

T.Sharon-A.Frank

Multimedia

Video/Audio Compression

Page 2: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Hybrid coding

• Images:
  – JPEG
• Video/Audio:
  – M-JPEG
  – MPEG (1, 2, 4)
  – Other codings
  – H.26x

Page 3: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Video Coding Requirements

• Random access
• Fast forward/reverse searches
• Reverse playback
• Audio-visual synchronization
• Robustness to errors
• Low coding/decoding delay
• Editability
• Format flexibility
• Cost tradeoffs

Page 4: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Video Compression

• Spatial (intra-frame) compression:
  – Compresses each frame in isolation, treating it as a bitmapped image.
  – Based on quantization of DCT coefficients.
• Temporal (inter-frame) compression:
  – Compresses sequences of frames by only storing differences between them.
  – Record displacement of object plus changed pixels in area exposed by its movement.
  – Based on Motion Compensation (MC).

Page 5: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Spatial Compression

• Image compression applied to each frame.
• Can therefore be lossless or lossy, but lossless rarely produces sufficiently high compression ratios for the volume of data.
• Lossy compression implies a loss of quality if decompressed then recompressed.
• Ideally, work with uncompressed video during post-production.

Page 6: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Temporal Compression

• Key frames are spatially compressed only.
  – Key frames often regularly spaced (e.g., every 12 frames).
• Difference frames only store the differences between the frame and the preceding frame or most recent key frame.
• Difference frames can be efficiently spatially compressed.
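The key-frame / difference-frame idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: frames are flat lists of pixel values, each difference frame is taken against the preceding frame, and the names `encode` and `decode` are hypothetical rather than taken from any codec.

```python
KEY_INTERVAL = 12  # one key frame every 12 frames, as in the slide's example


def encode(frames, key_interval=KEY_INTERVAL):
    """Return ('key', pixels) or ('diff', [(position, new_value), ...]) records."""
    encoded = []
    previous = None
    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            # Key frame: stored whole (a real codec would spatially compress it).
            encoded.append(("key", list(frame)))
        else:
            # Difference frame: keep only the pixels that changed.
            changes = [(i, new) for i, (new, old) in enumerate(zip(frame, previous))
                       if new != old]
            encoded.append(("diff", changes))
        previous = frame
    return encoded


def decode(encoded):
    """Rebuild every frame by applying the differences to the running picture."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)
            for position, value in payload:
                current[position] = value
        frames.append(current)
    return frames


frames = [[10, 10, 10, 10], [10, 12, 10, 10], [10, 12, 10, 99]]
encoded = encode(frames)
```

In this toy input each difference frame holds a single changed pixel, which is why storing differences pays off when consecutive frames are similar.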

Page 7: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Motion-JPEG (M-JPEG)

• Purely spatial compression.
• Apply JPEG compression to each video frame.
• Compression rates: 2:1 to 12:1
  – lossy: up to 5:1 is considered broadcast quality.

• No standard, but MJPEG-A format widely supported.

• Excellent when there are rapid scene changes in the video.

• Easy to edit.

Page 8: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Video Compression

Coding of video is carried out in a series of steps:

• Divide the image into blocks, usually:
  – 16x16 luminance
  – 8x8 chrominance (color)
• Use DCT-based techniques for spatial redundancy removal (intra-frame compression).
• Use MC (Motion Compensation) techniques for temporal redundancy removal (inter-frame compression).
• Final stage is two-dimensional run-length coding.
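The DCT step can be illustrated with a direct (slow) 8x8 transform followed by quantization. This is a sketch under stated assumptions: it uses the orthonormal DCT-II, a single uniform quantization step of 16 instead of a real quantization table, and no level shift; `dct_2d` and `quantize` are illustrative names.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, computed directly (O(N^4))."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=16):
    """Uniform quantization: small coefficients collapse to zero."""
    return [[round(value / step) for value in row] for row in coeffs]


# A perfectly flat block has no spatial detail, so only the DC term survives.
flat = [[128] * N for _ in range(N)]
q = quantize(dct_2d(flat))
```

For the flat block, all the energy ends up in `q[0][0]`; the long runs of zeros that remain are what the later run-length stage exploits.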

Page 9: T.Sharon-A.Frank Multimedia Video/Audio Compression.


Three consecutive video frames

Page 10: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Motion Compensation

• Motion compensation compensates for inter-frame differences.
• Real-time communication consideration: only the closest previous frame is used for prediction, to reduce the encoding delay.

(Figure labels: previous frame, current frame, best match)

Page 11: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Motion Compensation Algorithm

• Sends the new location of the block.
• If the block changed by more than a certain threshold, resends the whole block.
• Refreshes the whole image once in a while.

(Figure labels: previous frame, current frame, best match)
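The block-matching step can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). This is an illustration, not the algorithm of any particular standard: the SAD cost, the 4x4 block size, the ±2 pixel search window, the threshold of 10 and all function names are assumptions chosen for the example.

```python
def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between a block of prev and a block of cur."""
    return sum(abs(prev[py + j][px + i] - cur[cy + j][cx + i])
               for j in range(size) for i in range(size))


def best_match(prev, cur, cx, cy, size=4, search=2):
    """Search prev within +/-search pixels of (cx, cy); return (dx, dy, cost)."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = cx + dx, cy + dy
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(prev, cur, px, py, cx, cy, size)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best


def encode_block(prev, cur, cx, cy, size=4, search=2, threshold=10):
    """Send a motion vector, or the whole block if the best match is too poor."""
    dx, dy, cost = best_match(prev, cur, cx, cy, size, search)
    if cost > threshold:
        pixels = [row[cx:cx + size] for row in cur[cy:cy + size]]
        return ("intra", pixels)
    return ("motion", (dx, dy))


def make_frame(x0, y0, size=8, square=4):
    """Black frame with a bright square whose top-left corner is (x0, y0)."""
    return [[200 if x0 <= x < x0 + square and y0 <= y < y0 + square else 0
             for x in range(size)] for y in range(size)]


prev_frame = make_frame(1, 1)
cur_frame = make_frame(2, 2)  # the square moved one pixel right and down
result = encode_block(prev_frame, cur_frame, 2, 2)
```

Here the block at (2, 2) in the current frame is found at offset (-1, -1) in the previous frame, so only that displacement needs to be sent.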

Page 12: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Frame Types in Compressed Video

• Key Frame
  – Compression is based on content of this frame.
• Difference/Delta Frame
  – Compression is based on last key frame.

Page 13: T.Sharon-A.Frank Multimedia Video/Audio Compression.


Bi-directional Motion Compensated Interpolation

Page 14: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG Dynamics

• Delicate balance between intra-frame and inter-frame coding.
• Two basic techniques:
  – Transform domain DCT-based compression for the reduction of spatial redundancy (intra-frame).
  – Block-based bi-directional MC for reduction of the temporal redundancy (inter-frame).

Page 15: T.Sharon-A.Frank Multimedia Video/Audio Compression.

The MPEG Standard

Three types of MPEG-2 frames processed by the viewing program:

1. I (Intracoded) frames: self-contained still pictures, encoded much like JPEG images.

2. P (Predictive) frames: block-by-block difference with the last frame.

3. B (Bidirectional) frames: differences with the last and next frame.

Page 16: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Use of MPEG Image Types

<I> Intra-picture/frame/image
  – Access points for random access
  – Moderate compression

<P> Predicted pictures
  – Coded with a reference to a past picture
  – Used as reference for future predicted pictures

<B> Bi-directional prediction (interpolated pictures)
  – Require past and future reference for prediction
  – Highest compression

Page 17: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG GOPs

• Group of Pictures (GOP):
  – Repeating sequence of I-, P- and B-pictures.
  – Always begins with an I-picture.
  – Display order: frames in the order they will be displayed.
  – Bitstream order: re-ordered so that every P- or B-picture comes after the frames it depends on, allowing reconstruction of the complete frames.

Page 18: T.Sharon-A.Frank Multimedia Video/Audio Compression.

A Typical MPEG Picture Display Order

(Figure: a sequence of I-, P- and B-pictures in display order with forward-prediction arrows; annotation: 25fps (9 I/P, 17B))

Page 19: T.Sharon-A.Frank Multimedia Video/Audio Compression.

A Typical MPEG Picture Bitstream Order

Display position:  1 2 3 4 5 6 7 8 9
Picture type:      I B B B P B B B I

• Transmitting order: 1, 5, 2, 3, 4, 9, 6, 7, 8

(Figure labels: Forward prediction, Bi-directional prediction)
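The transmitting order on this slide follows from one rule: every B-picture must come after both the earlier and the later I- or P-picture it is predicted from. A minimal Python sketch of that reordering, with `bitstream_order` as an illustrative name:

```python
def bitstream_order(types):
    """Map display order to bitstream order (1-based display positions).

    Each I- or P-picture is emitted before the B-pictures that precede it
    in display order, because those B-pictures need it as a reference.
    """
    order = []
    pending_b = []
    for position, kind in enumerate(types, start=1):
        if kind == "B":
            pending_b.append(position)
        else:
            order.append(position)   # the reference picture goes first
            order.extend(pending_b)  # then the B-pictures waiting for it
            pending_b = []
    # B-pictures after the last reference are appended unchanged (assumption).
    order.extend(pending_b)
    return order


transmit = bitstream_order("IBBBPBBBI")
```

For the slide's sequence I B B B P B B B I this gives 1, 5, 2, 3, 4, 9, 6, 7, 8.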

Page 20: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG Standards

• MPEG-1
  – 352x240 at 30 fps.
  – Quality is slightly below standard VCR videos.
• MPEG-2
  – 720x480 & 1280x720 at 60 fps, with full CD-quality audio.
  – Sufficient for television (including HDTV).
  – Used on DVD-ROMs.
• MP3
  – Audio compression.
  – Reduces digital sound files by 12:1 ratio with virtually no loss in quality.

Page 21: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG-1 Compression

• Source Input Format (SIF)
  – 4:2:0 chrominance sub-sampling
  – 352x240 pixel frame
• MPEG-1 compressed SIF video at 30 frames per second has a data rate of 1.86 Mbps (CD video: 40 mins of video at that rate).
• MPEG-1 can be scaled up to larger frames, but cannot handle interlacing.
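To put the 1.86 Mbps figure in context, the arithmetic below compares it with the uncompressed SIF data rate and works out the storage needed for 40 minutes. It is a back-of-the-envelope sketch that assumes 8 bits per sample, which the slide does not state.

```python
WIDTH, HEIGHT, FPS = 352, 240, 30
BITS_PER_SAMPLE = 8          # assumption: 8-bit luminance and chrominance samples
MPEG1_BPS = 1.86e6           # compressed data rate quoted on the slide

# 4:2:0 sub-sampling: each of the two chrominance planes is half size in both directions.
luma_samples = WIDTH * HEIGHT
chroma_samples = 2 * (WIDTH // 2) * (HEIGHT // 2)

raw_bps = (luma_samples + chroma_samples) * BITS_PER_SAMPLE * FPS
ratio = raw_bps / MPEG1_BPS
megabytes_40_min = MPEG1_BPS * 40 * 60 / 8 / 1e6

print(f"uncompressed SIF: {raw_bps / 1e6:.1f} Mbit/s")
print(f"compression ratio: about {ratio:.1f}:1")
print(f"40 minutes at 1.86 Mbps: {megabytes_40_min:.0f} MB")
```

Under these assumptions the compression ratio is roughly 16:1, and 40 minutes of video takes about 558 MB.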

Page 22: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG Profiles & Levels

• Profiles define subsets of the features of the data stream.
• Levels define parameters such as frame size and data rate.
• Each profile may be implemented at one or more levels.
• Notation: profile@level, e.g. MP@ML.

Page 23: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG-2 Main Profile & Level

• MPEG-2 Main Profile at Main Level (MP@ML) used for DVD video:
  – CCIR 601 scanning
  – 4:2:0 chrominance sub-sampling
  – 15 Mbits per second
  – Most elaborate representation of MPEG-2 compressed data.

Page 24: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG-4 (1)

• Refinement of MPEG-1 compression:
  – I-pictures compressed by quantizing and Huffman coding DCT coefficients.
  – Improved motion compensation leads to better quality than MPEG-1 at same bit rates.
• Designed to support a range of multimedia data at bit rates from 10Kbps to >1.8Mbps.
• Applications from mobile phones to HDTV.
• Video codec becoming popular for Internet use; it is incorporated in QuickTime, RealMedia and DivX.

Page 25: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG-4 (2)

• Standard defines an encoding for multimedia streams made up of different sorts of object: video, still images, animation, 3-D models…
• Higher profiles divide a scene into arbitrarily shaped video objects, where each one may be compressed and transmitted separately; the scene is composed at the receiving end by combining them.
• SP and ASP profiles restricted to rectangular objects, usually complete frames.

Page 26: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MPEG-4 Profiles & Levels

• Simple Profile (SP), suitable for low bandwidth streaming over Internet:
  – P-pictures only
  – Efficient decompression, suitable for PDAs, etc.
  – SP@L1, 64 kbps, 176x144 pixel frame.
• Advanced Simple Profile (ASP), suitable for broadband streaming:
  – B-pictures
  – Global Motion Compensation
  – Sub-pixel motion compensation
  – ASP@L5, 8000 Kbps, full CCIR 601 frame.

Page 27: T.Sharon-A.Frank Multimedia Video/Audio Compression.

DV Compression

• Starts with chrominance sub-sampling of CCIR 601.
• Constant data rate 25 Mbits per second; higher quality than MJPEG at same rate.
• Apply DCT, quantization, run-length and Huffman coding on zig-zag sequence (like JPEG) to 8x8 blocks of pixels.
• If little or no difference between fields (almost static frame), apply DCT to block containing alternate lines from odd and even fields.
• If motion between fields, apply DCT to two 8x4 blocks (one from each field) separately, leading to more efficient compression of frames with motion.
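The zig-zag scan and run-length step mentioned above can be sketched as follows. This is a simplified illustration: it emits (zero-run, value) pairs followed by an end-of-block marker, and it omits the Huffman stage and any special handling of the DC coefficient that real JPEG or DV coders apply. The function names are hypothetical.

```python
def zigzag_indices(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked bottom-left to top-right, odd ones the other way.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order


def run_length(values):
    """Encode non-zero values as (preceding_zero_count, value), then 'EOB'."""
    pairs, zeros = [], 0
    for value in values:
        if value == 0:
            zeros += 1
        else:
            pairs.append((zeros, value))
            zeros = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


# Quantized block with a few non-zero low-frequency coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 64, -3, 5

scanned = [block[r][c] for r, c in zigzag_indices()]
encoded = run_length(scanned)
```

The 64 coefficients of this block shrink to three pairs and a marker, because the zig-zag order groups the high-frequency zeros into one long trailing run.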

Page 28: T.Sharon-A.Frank Multimedia Video/Audio Compression.

DVI (Digital Video Interactive)

• Developed by General Electric.
• Uses specialized processors for compression.
• Hardware-only codec – lossless transforms.
• Compression rate: 80:1-160:1
  – 10 sec video clip is compressed to ~2MB.
• Intel: software version of DVI algorithms, marketed as Indeo (a software-only codec):
  – there is also an audio version of Indeo.
  – latest version uses hybrid wavelet transform for compression algorithm.

Page 29: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Cinepak

• Developed by Apple and SuperMac.
• Outputs 320x240 (quarter screen) at 15 fps with good quality
  – data rate that even slow single-speed and 2x CD-ROM players can deliver.
• Software-only codec supported by Microsoft’s Video for Windows and Apple’s QuickTime.
• Better color definition than other codecs, so good for natural video without graphics or animation.

Page 30: T.Sharon-A.Frank Multimedia Video/Audio Compression.

QuickTime

• Developed by Apple but is now cross-platform.
• Supports Cinepak, Indeo, M-JPEG and MPEG-1, and is extensible to support future codecs, such as DVCAM.
• Synchronizes all types of digital media.
• For example, video frames are dropped if necessary for synchronization with audio.

Page 31: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Video For Windows

• Microsoft (therefore, not cross-platform).
• Uses generic AVI (audio video interleaved) format which is provided by MCI (media control interface).
• Supports a number of compression methods in real-time, non-real-time, with or without hardware assistance
  – Cinepak, Indeo, Microsoft Video-1.

Page 32: T.Sharon-A.Frank Multimedia Video/Audio Compression.

ActiveMovie (API from Microsoft)

• Now called DirectShow (supports DVD).
• Solves problems of VfW and QuickTime.
• Cross-platform.
• Supports codecs supported by VfW as well as MPEG audio, WAV audio, MPEG video, and Apple QuickTime video.
• Fully integrated with DirectX technology, allowing use of DirectX components and more graphics card features.

Page 33: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Video Streaming Players

• RealVideo (from RealNetworks)
  – G2 Player also plays RealAudio.
  – Uses a variety of compression techniques.
• RealProducer (also from RealNetworks)
  – Allows you to create streaming audio and video.
  – Free software just like G2!

Page 34: T.Sharon-A.Frank Multimedia Video/Audio Compression.

H.261 (Px64)

• Video compression for videoconferences
  – Compression in real-time
  – Targeted to ISDN
• Compressed data stream: p*64 Kbits/s (p = 1, …, 30)
• 2 resolutions:
  – Common Intermediate Format (CIF)
  – Quarter CIF (QCIF)

Page 35: T.Sharon-A.Frank Multimedia Video/Audio Compression.

H.261 (Px64) Resolutions

                    QCIF                        CIF
                    Lines/frame  Pixels/line    Lines/frame  Pixels/line
Luminance (Y)       144          176            288          352
Chrominance (Cb)    72           88             144          176
Chrominance (Cr)    72           88             144          176

• Common Intermediate Format (CIF)
• Quarter CIF (QCIF)

Page 36: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Image Preparation

• Uncompressed CIF
  – One frame = 288*352*8 + 2*144*176*8 = 1,216,512 bits
  – 30 fps
  – Bandwidth = 1,216,512*30 ≈ 36.5 Mbits/s
• Uncompressed QCIF ≈ 9.1 Mbits/s
• ISDN channels: 64 Kbits/s to 2 Mbits/s

=> bit reduction required

Page 37: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Desktop Videophone Applications

• Channel capacity (p=1) = 64 Kbits/s
• QCIF at 10 fps --> 3 Mbits/s
• Required compression ratio = 3 Mbits/s / 64 Kbits/s ≈ 47

• Channel capacity (p=10) = 640 Kbits/s
• CIF at 30 fps --> 36.5 Mbits/s
• Required compression ratio = 36.5 Mbits/s / 640 Kbits/s ≈ 57
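The uncompressed bandwidths and required compression ratios on the last two slides can be reproduced with a short calculation. The sketch assumes 8 bits per sample, chrominance planes at half resolution in each direction (as in the H.261 resolution table), and 64 Kbits/s taken as 64,000 bits/s; the function names are illustrative.

```python
def frame_bits(lines, pixels, bits_per_sample=8):
    """Bits in one frame: one luminance plane plus two half-size chrominance planes."""
    luminance = lines * pixels * bits_per_sample
    chrominance = 2 * (lines // 2) * (pixels // 2) * bits_per_sample
    return luminance + chrominance


def bandwidth_bps(lines, pixels, fps):
    """Uncompressed data rate in bits per second."""
    return frame_bits(lines, pixels) * fps


def required_ratio(lines, pixels, fps, p):
    """Compression ratio needed to fit a p x 64 Kbits/s H.261 channel."""
    return bandwidth_bps(lines, pixels, fps) / (p * 64_000)


CIF = (288, 352)
QCIF = (144, 176)

cif_30 = bandwidth_bps(*CIF, fps=30)     # 36,495,360 bits/s
qcif_30 = bandwidth_bps(*QCIF, fps=30)   # 9,123,840 bits/s
videophone = required_ratio(*QCIF, fps=10, p=1)
conference = required_ratio(*CIF, fps=30, p=10)

print(f"CIF at 30 fps: {cif_30 / 1e6:.1f} Mbits/s")
print(f"QCIF at 30 fps: {qcif_30 / 1e6:.1f} Mbits/s")
print(f"QCIF at 10 fps over p=1: about {videophone:.1f}:1")
print(f"CIF at 30 fps over p=10: about {conference:.1f}:1")
```

Without rounding the QCIF rate down to 3 Mbits/s first, the videophone ratio comes out at about 47.5 rather than 47; the CIF case gives about 57.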

Page 38: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Audio Compression

• In general, lossy methods are required because of the complex and unpredictable nature of audio data.
• CD quality, stereo, 3-minute song requires over 25 Mbytes
  – Data rate exceeds bandwidth of dial-up Internet connection.
• Difference in the way we perceive sound and image means a different approach from image compression is needed.
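The size of the 3-minute song can be checked from the CD-DA parameters (44.1 KHz sampling, 16-bit samples, two channels). The 56 Kbits/s dial-up rate used for comparison is an assumption, not a figure from the slides.

```python
SAMPLE_RATE = 44_100      # samples per second per channel (CD-DA)
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo
DIALUP_BPS = 56_000       # assumed dial-up modem rate


def pcm_bytes(seconds):
    """Size in bytes of uncompressed CD-quality stereo audio."""
    return SAMPLE_RATE * (BITS_PER_SAMPLE // 8) * CHANNELS * seconds


song_bytes = pcm_bytes(3 * 60)
bit_rate = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS

print(f"3-minute song: {song_bytes / 1e6:.1f} MB")
print(f"data rate: {bit_rate / 1000:.1f} Kbits/s, "
      f"about {bit_rate / DIALUP_BPS:.0f}x the assumed dial-up rate")
```

This gives roughly 31.8 MB for the song and a data rate of 1,411.2 Kbits/s, consistent with the slide's "over 25 Mbytes".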

Page 39: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Audio Compression Techniques

Sampling frequency (KHz)  Quantization (bits)  Format  Quality                 CD quality
44.1                      16                   PCM     HiFi music              CD-DA
37.8                      8                    ADPCM   HiFi music              CD-I level
37.8                      8                    ADPCM   FM broadcast (music)    CD-I level B
18.9                      4                    ADPCM   AM broadcast (speech)   CD-I level
8                         8                    PCM     Telephone               N/A

Page 40: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Standards of Speech Encoding

Standard  Description
G.711     PCM of voice frequencies
G.722     Audio coding at 7 KHz within 64 Kbit/s (ADPCM)
G.728     Coding of speech at 16 Kbit/s using low-delay code excited linear prediction (LD-CELP)

Page 41: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Basic Steps of Audio Encoding

(Block diagram labels: Uncompressed audio data, Filter-Banks (32 Sub-Bands), Psychoacoustical Model (Control), Quantization, Entropy Coder, Multiplexer, Compressed audio data)

Page 42: T.Sharon-A.Frank Multimedia Video/Audio Compression.

MP3

• MP3 = MPEG-1 Audio, Layer 3
• Three layers of audio compression in MPEG-1 (MPEG-2 essentially identical).
• From Layer 1 to Layer 3, the encoding process increases in complexity and the data rate for the same quality decreases
  – e.g. same quality: 192kbps at Layer 1, 128kbps at Layer 2, 64kbps at Layer 3.
• 10:1 compression ratio at high quality.
• Variable bit rate coding (VBR).
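A quick calculation relates the layer bit rates above to compression ratios. It assumes the quoted rates are per channel and compares them with one channel of CD audio (44.1 KHz, 16 bits, i.e. 705.6 kbps); the slide does not say whether its figures are per channel or stereo, so treat the result as indicative only.

```python
PCM_PER_CHANNEL_BPS = 44_100 * 16   # one channel of CD audio: 705,600 bits/s

# Bit rates quoted on the slide for the same perceived quality
# (assumed here to be per channel).
LAYER_BPS = {1: 192_000, 2: 128_000, 3: 64_000}

ratios = {layer: PCM_PER_CHANNEL_BPS / bps for layer, bps in LAYER_BPS.items()}

for layer, ratio in ratios.items():
    print(f"Layer {layer}: about {ratio:.1f}:1")
```

Under that assumption Layer 3 comes out at about 11:1, in line with the 10:1 to 12:1 ratios quoted for MP3 in these slides.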

Page 43: T.Sharon-A.Frank Multimedia Video/Audio Compression.

Voice Quality - QoS

The Objective: Provide unfailing, ubiquitous, toll quality service

(Figure: One-Way Delay (ms), ticks 0, 160, 200, 400, plotted against Packet Loss (%), ticks 0, 1, 5, 10, with low and high thresholds; regions labelled Acceptable Operation, Marginal Acceptance, and Area of Unacceptable Operation / Service Level Agreement Violation)

The Challenge: Eliminate the impact of delay-insensitive traffic on real-time traffic

Page 44: T.Sharon-A.Frank Multimedia Video/Audio Compression.

QoS Parameters

                     Best            High            Medium          Best Effort
Mouth-to-Ear Delay:  0ms - 150ms     150ms - 250ms   250ms - 450ms   450ms and above
Call Setup:          0 sec - 1 sec   1 sec - 3 sec   3 sec - 5 sec   5 sec and above

Delay Budgets

(Figure labels: PSTN, G, EC, IP network, EC, G, PSTN; echo path; segment delays marked "few ms" and "hundred ms")
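The QoS classes in the table can be expressed as a simple threshold lookup. In this sketch the upper bound of each range is treated as exclusive, which is an assumption: the table's ranges share their boundary values and do not say which class a boundary belongs to. The function and constant names are illustrative.

```python
# (exclusive upper bound, class name), best class first
DELAY_CLASSES_MS = [(150, "Best"), (250, "High"), (450, "Medium")]
SETUP_CLASSES_SEC = [(1, "Best"), (3, "High"), (5, "Medium")]


def classify(value, classes, fallback="Best Effort"):
    """Return the first class whose upper bound exceeds the measured value."""
    for upper, name in classes:
        if value < upper:
            return name
    return fallback


def delay_class(mouth_to_ear_ms):
    return classify(mouth_to_ear_ms, DELAY_CLASSES_MS)


def setup_class(call_setup_sec):
    return classify(call_setup_sec, SETUP_CLASSES_SEC)


print(delay_class(200), setup_class(0.5))
```

For example, a 200 ms mouth-to-ear delay falls in the High class, while a 0.5 sec call setup falls in the Best class.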