Audio and video compression - EIEenyhchan/mt-cmpav.pdf · Audio and video compression 4.1...


CYH/MMT/CmpAV/p.1

Audio and video compression

4.1 introduction

• Unlike text and images, both audio and most video signals are continuously varying analog signals.

• Compression algorithms associated with digitized audio and video are different from those associated with text and images.


4.2 Audio compression

• Speech and non-speech signals are encoded using different approaches.

4.2.1 Speech coding

• Differential pulse code modulation (DPCM) is a derivative of standard PCM and exploits the fact that, for most audio signals, the range of the differences in amplitude between successive samples of the audio waveform is less than the range of the actual sample amplitudes. (G.711)

• In adaptive differential PCM (ADPCM), fewer bits are used to encode smaller difference values than larger ones. (G.721, G.722 & G.726)

• DPCM and ADPCM can also be used to encode non-speech signals.

• In linear predictive coding (LPC), a speech signal is analyzed to extract its perceptual features, including pitch and formant frequencies, and these features are then encoded. (LPC-10, G.728, G.723 & G.729)
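
The DPCM idea above can be sketched in a few lines. This is a minimal, lossless illustration only: it stores the first sample followed by successive differences and omits the quantizer and predictor that a real DPCM codec would use. The sample values and the helper names (`dpcm_encode`, `dpcm_decode`, `bits_needed`) are assumptions made for the example.

```python
def dpcm_encode(samples):
    """Return the first sample followed by the successive differences."""
    if not samples:
        return []
    diffs = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        diffs.append(cur - prev)
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    total = 0
    for d in diffs:
        total += d
        samples.append(total)
    return samples


def bits_needed(values):
    """Bits of a signed integer wide enough to hold every value."""
    return max(abs(v) for v in values).bit_length() + 1


samples = [1000, 1012, 1019, 1015, 1002, 990]  # hypothetical PCM sample values
diffs = dpcm_encode(samples)

print(diffs)                    # → [1000, 12, 7, -4, -13, -12]
print(bits_needed(samples))     # → 11
print(bits_needed(diffs[1:]))   # → 5
print(dpcm_decode(diffs) == samples)  # → True
```

In this example the differences fit in 5 bits while the samples themselves need 11, which is the saving DPCM relies on.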


• Summary of speech compression standards and their applications:

Standard | Compression technique | Compressed bit rate (kbps) | Quality | Example applications
G.711 | PCM + companding | 64 | Good | PSTN/ISDN telephony
G.721 | ADPCM | 32; 16 | Good; Fair | Telephony at reduced bit rates
G.722 | ADPCM with subband coding | 64; 56/48 | Excellent; Good | Audio conferencing
G.726 | ADPCM with subband coding | 40/32; 24/16 | Good; Fair | General telephony at reduced bit rates
LPC-10 | LPC | 2.4/1.2 | Poor | Telephony in military networks
G.728 | Code-excited LPC (CELP) | 16 | Good | Low delay/low bit rate telephony
G.729 | CELP | 8 | Good | Telephony in cellular networks
G.729(A) | CELP | 8 | Good | Simultaneous telephony and data (fax)
G.723.1 | CELP | 6.3; 5.3 | Good; Fair | Video and internet telephony


4.2.2 Perceptual coding

• The audio signal is coded based on a psychoacoustic model, which describes the limitations of the human ear.

• The ear is more sensitive to some signals than to others.

• Frequency masking: A strong signal may reduce the level of sensitivity of the ear to other signals which are near to it in frequency.

• Temporal masking: When the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.


MPEG audio coders

• An international standard based on this approach is defined in ISO Recommendation 11172-3.

• Summary of MPEG layer 1, 2 and 3 perceptual encoders:

Layer | Application | Compressed bit rate | Quality | Example input-to-output delay
1 | Digital audio cassette | 32-448 kbps | Hi-fi quality at 192 kbps per channel | 20 ms
2 | Digital audio and digital video broadcasting | 32-192 kbps | Near CD-quality at 128 kbps per channel | 40 ms
3 | CD-quality audio over low bit rate channel | 64 kbps | CD-quality at 64 kbps per channel | 60 ms

• A higher layer makes better use of the psychoacoustic model and hence a higher compression rate can be achieved.

• The 3 layers require increasing levels of complexity (and hence cost) to achieve a particular perceived quality. The choice of layer and bit rate is often a compromise between the desired perceived quality and the available bit rate.
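
To put the per-channel bit rates in the table above into perspective, the following sketch compares them with uncompressed PCM. It assumes a CD-style source of 44.1 kHz sampling and 16 bits per sample per channel; those two figures are not given in these notes and are an assumption of the example, as is the helper name `compression_ratio`.

```python
CD_SAMPLE_RATE_HZ = 44_100  # assumption: CD-style source, not stated in the notes
CD_BITS_PER_SAMPLE = 16     # assumption: 16-bit linear PCM per channel


def compression_ratio(compressed_kbps,
                      sample_rate_hz=CD_SAMPLE_RATE_HZ,
                      bits_per_sample=CD_BITS_PER_SAMPLE):
    """Ratio of the uncompressed PCM bit rate to the compressed bit rate."""
    pcm_kbps = sample_rate_hz * bits_per_sample / 1000
    return pcm_kbps / compressed_kbps


# Per-channel rates quoted in the quality column of the layer summary table.
layer_rates_kbps = {1: 192, 2: 128, 3: 64}

for layer, rate in layer_rates_kbps.items():
    ratio = compression_ratio(rate)
    print(f"Layer {layer}: {rate} kbps per channel, about {ratio:.1f}:1")
```

Under these assumptions the ratio roughly triples from layer 1 to layer 3, which matches the statement that a higher layer achieves a higher compression rate.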


Dolby audio coders

• In AC-1, the bit allocation information of the quantized subband samples is directly encoded and embedded in the bit-stream.

• In AC-2, this information is indirectly encoded and has to be estimated at the decoder.

• In AC-3, additional information is transmitted to compensate for the estimation error.

• The acoustic quality of the MPEG and Dolby audio coders was found to be comparable.

• Summary of compression standards for general audio:

Standard | Compressed bit rate | Quality | Example applications
MPEG audio, Layer 1 | 32-448 kbps | Hi-fi quality at 192 kbps | Digital audio cassettes
MPEG audio, Layer 2 | 32-192 kbps | Near CD at 128 kbps | Digital audio and digital video broadcasting
MPEG audio, Layer 3 | 64 kbps | CD quality | CD-quality over low bit rate channels
Dolby audio coders, AC-1 | 512 kbps | Hi-fi quality | Radio and television satellite relays
Dolby audio coders, AC-2 | 256 kbps | Hi-fi quality | PC sound cards
Dolby audio coders, AC-3 | 192 kbps | Near CD quality | Digital video broadcasting


4.3 Video compression

• There is not just a single standard associated with video but rather a range of standards, each targeted at a particular application domain.

4.3.1 Video compression principles

• Video is simply a sequence of digitized pictures and it is also referred to as moving pictures.

• A video sequence can be encoded with the JPEG algorithm frame by frame, and this approach is known as motion JPEG.

• In addition to the spatial redundancy present in each frame, considerable redundancy is often present between successive frames.

• Frames are classified as one of three basic frame types (I-, P- and B-frames) and encoded differently.


• I-frames:

  • I-frames are encoded independently using the JPEG algorithm.

  • I-frames are inserted into the output stream relatively frequently.

  • I-frames are used as access points for random access and FF/FR functionality in the bit stream.

• P-frames:

  • Frames are partitioned into blocks of size 16x16 (macroblocks).

  • To encode a P-frame, the contents of each macroblock in the target frame are compared on a pixel-by-pixel basis with the contents of the reference frame to find a best-matched block of equal size.

  • The reference frame can be a P- or I-frame.

  • The (x, y) offset between the macroblock being encoded and the best-matched block is known as the motion vector.

  • This motion-vector-searching process is known as motion estimation.


  • A prediction of the target frame is made with the reference frame based on the motion vectors obtained.

  • The difference between the predicted frame and the actual target frame is known as the prediction error.

  • Motion compensation: Additional bits are required to encode the prediction error so as to compensate for the difference if necessary.
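
The motion estimation and prediction-error steps described above can be sketched as an exhaustive block search. This is an illustrative sketch only: the sum of absolute differences (SAD) as the matching cost, the search range, the restriction of candidates to blocks inside the reference frame, and all function names are assumptions of the example rather than details taken from any particular standard. A 4x4 block on an 8x8 frame is used to keep the demo small; the notes use 16x16 macroblocks.

```python
def sad(target, reference, ty, tx, ry, rx, size):
    """Sum of absolute pixel differences between two size x size blocks."""
    return sum(
        abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def find_motion_vector(target, reference, top, left, size=16, search=7):
    """Try every offset within +/-search pixels; return ((dx, dy), best_sad)."""
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block must lie inside the reference frame
            cost = sad(target, reference, top, left, ry, rx, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


def prediction_error(target, reference, top, left, vector, size=16):
    """Target block minus the motion-compensated block from the reference."""
    dx, dy = vector
    return [
        [target[top + j][left + i] - reference[top + dy + j][left + dx + i]
         for i in range(size)]
        for j in range(size)
    ]


# Demo: the target frame is the reference shifted by 2 pixels in x and 1 in y.
reference = [[y * 8 + x for x in range(8)] for y in range(8)]
target = [
    [reference[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
    for y in range(8)
]

vector, cost = find_motion_vector(target, reference, top=2, left=2, size=4, search=2)
residual = prediction_error(target, reference, 2, 2, vector, size=4)
print(vector, cost)  # → (2, 1) 0
```

Because the demo block is an exact copy of part of the reference frame, the prediction error is all zeros. With real video the residual is usually non-zero and is what motion compensation has to encode.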

• B-frames:

  • To encode a B-frame, any motion is estimated with reference to both the immediately preceding I- or P-frame and the immediately succeeding P- or I-frame.

  • B-frames provide the highest level of compression.

  • B-frames are not involved in the coding of other frames and hence they do not propagate errors.


• The number of frames between successive I-frames is known as a group of pictures (GOP).

• The number of frames between a P-frame and the immediately preceding I- or P-frame is called the prediction span.

• The order of encoding and transmission of the frames is changed to minimize the time required to decode the frames.

• A 4th type of frame known as a PB-frame has also been defined. Two neighboring P- and B-frames are encoded as if they were a single frame.

• A 5th type of frame known as a D-frame has been defined for use in movie/video-on-demand applications.
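
The reordering of frames for transmission mentioned above can be illustrated as follows. The rule assumed here is that every I- or P-frame is sent before the B-frames that precede it in display order, since those B-frames need it as a reference. The frame pattern and the function name `transmission_order` are hypothetical, and trailing B-frames with no later reference in the input are simply emitted last.

```python
def transmission_order(display_order):
    """Reorder frames so each B-frame follows both of its reference frames."""
    out = []
    pending_b = []  # B-frames waiting for their succeeding I- or P-frame
    for index, frame_type in enumerate(display_order):
        label = f"{frame_type}{index}"
        if frame_type == "B":
            pending_b.append(label)
        else:
            out.append(label)       # send the reference frame first
            out.extend(pending_b)   # then the B-frames that depend on it
            pending_b.clear()
    out.extend(pending_b)
    return out


print(transmission_order("IBBPBBP"))
# → ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder can then decode each B-frame as soon as it arrives, because both of its reference frames have already been received.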


• Basic bitstream format:

  • Type: type of frame (I, P or B)

  • Address: identifies the location of the macroblock in the frame

  • Quantization value: the threshold value used to quantize all DCT coefficients in the macroblock

  • Motion vector: encoded vector

  • Block present: indicates which blocks in the macroblock are present

• Typical figures of the compression ratios:

  • I-frames: 10~20:1

  • P-frames: 20~30:1

  • B-frames: 30~50:1
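
The per-frame-type figures above can be combined into an overall ratio for a group of pictures. The sketch below assumes that every frame has the same uncompressed size, picks the lower end of each quoted range, and uses a hypothetical 12-frame pattern; the function name `gop_compression_ratio` is also an assumption of the example.

```python
def gop_compression_ratio(pattern, ratios):
    """Overall ratio: raw size of all frames / total compressed size.

    Sizes are measured in units of one uncompressed frame, so a frame
    compressed at r:1 contributes 1/r to the compressed total.
    """
    compressed = sum(1 / ratios[frame_type] for frame_type in pattern)
    return len(pattern) / compressed


ratios = {"I": 10, "P": 20, "B": 30}   # lower ends of the ranges quoted above
pattern = "IBBPBBPBBPBB"               # hypothetical GOP structure

print(f"{gop_compression_ratio(pattern, ratios):.1f}:1")
```

With these assumed values the overall ratio comes out at roughly 23:1, well above the I-frame figure, because most frames in the pattern are B-frames.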


4.3.2 H.261

• H.261 has been defined by the ITU-T for the provision of video telephony and videoconferencing services over an ISDN.

• Supports I- and P-frames only.


• Encoding format:

  • Type: indicates if the macroblock is intracoded or intercoded

  • Address: identifies the location of the macroblock in the frame

  • Quantization value: the threshold value used to quantize all DCT coefficients in the macroblock

  • Motion vector: encoded vector

  • Coded block pattern: indicates which blocks in the macroblock are present

  • Picture start code: indicates the start of a new frame

  • Temporal reference: a timestamp for the decoder to synchronize the video information with the audio information

  • Picture type: indicates if the frame is encoded as an I- or P-frame

  • GOB start code: a marker used for resynchronization in case of error

• A group of (macro)blocks (GOB) is a structure that consists of 3x11 macroblocks.
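
As a quick check of the GOB structure, the sketch below counts macroblocks and GOBs per frame for the two H.261 picture formats. The luminance resolutions used for CIF (352x288) and QCIF (176x144) are not stated in these notes and are supplied here as an assumption; the function name `gob_count` is hypothetical.

```python
MACROBLOCK_SIZE = 16          # macroblocks are 16x16 pixels
GOB_ROWS, GOB_COLS = 3, 11    # one GOB holds 3x11 macroblocks

# Assumption: luminance resolutions of the two formats (width, height).
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}


def gob_count(width, height):
    """Return (macroblocks per frame, GOBs per frame)."""
    macroblocks = (width // MACROBLOCK_SIZE) * (height // MACROBLOCK_SIZE)
    return macroblocks, macroblocks // (GOB_ROWS * GOB_COLS)


for name, (width, height) in FORMATS.items():
    macroblocks, gobs = gob_count(width, height)
    print(f"{name}: {macroblocks} macroblocks, {gobs} GOBs")
# → CIF: 396 macroblocks, 12 GOBs
# → QCIF: 99 macroblocks, 3 GOBs
```

Each GOB start code therefore gives the decoder a resynchronization point every 33 macroblocks.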


4.3.3 H.263

• H.263 has been defined by the ITU-T for use in a range of real-time video applications over wireless networks and PSTNs.

• The applications include video telephony, videoconferencing, security surveillance, interactive game playing and so on.

• The H.263 standard has a number of advanced coding options compared with H.261:

  • Progressive scanning with a refresh rate of either 15 or 7.5 fps.

  • Supports I-, P-, B- and PB-frames.

  • Motion vectors, if necessary, are allowed to point outside of the frame area.

  • Schemes such as error tracking, independent segment decoding and reference picture selection are included in the standard; they aim at minimizing the effects of errors on neighboring GOBs.

  • An error concealment scheme is incorporated into the decoder to mask the error from the viewer.


4.3.4 MPEG

• The Moving Picture Experts Group (MPEG) was formed by the ISO to formulate a set of standards relating to a range of multimedia applications that involve the use of video with sound.

MPEG-1: ISO Recommendation 11172

• Uses a video compression technique similar to that of H.261.

• Progressive scanning with a refresh rate of 30 Hz (for NTSC) and 25 Hz (for PAL).

• Supports I-, P- and B-frames.

• I-frames must be used for the various random-access functions associated with VCRs.

• Improvements with respect to H.261:

  1. A new layer called a slice is added to the structure of the stream so that the decoder can resynchronize more quickly in case of error.

  2. Support for B-frames.

  3. A larger search window for motion vectors and a finer resolution for their representation.


• Typical figures of the compression ratios

• I-frames: 10:1

• P-frames: 20:1

• B-frames: 50:1


• Bitstream format:

  • Sequence start code: indicates the start of a sequence

    • Video parameters: specify the screen size and aspect ratio

    • Bitstream parameters: indicate the bit rate and the size of the memory/frame buffers that are required

    • Quantization parameters: contain the contents of the quantization tables that are to be used

  • GOP start code: indicates the start of a GOP

    • Time stamp: used for synchronization purposes

    • Parameters: define the particular sequence of frame types that are used in each GOP (e.g. IPPBPP)

  • Picture start code: indicates the start of a frame

    • Type: indicates if it is an I-, P- or B-frame

    • Buffer parameters: indicate how full the buffer should be before the decoding operation starts

    • Encode parameters: indicate the resolution of a motion vector

  • Slice start code: indicates the start of a slice

    • Vertical position: indicates the scan line in which the slice is located

    • Quantization parameters: indicate the scaling factor that applies to this slice


MPEG-2: ISO Recommendation 13818

• It supports four levels (low, main, high 1440 and high), each targeted at a particular application domain.

• There are 5 profiles associated with each level: simple, main, spatial resolution, quantization accuracy and high.

• The different combinations of levels and profiles form a framework for all standards activities associated with MPEG-2.

• One of the most popular settings is the MP@ML standard, which is for digital television broadcasting.

• There are 3 standards associated with HDTV: advanced television (ATV) in North America, digital video broadcast (DVB) in Europe, and multiple sub-Nyquist sampling encoding (MUSE) in Japan.

 | ATV | DVB | MUSE
Aspect ratio | 16/9 | 4/3 | 16/9
Resolution | 1280x720 | 1440x1152 | 1920x1035
Compression (video) | MP@HL of MPEG2 | SSP@H1440 of MPEG2 | Similar to MP@HL
Compression (audio) | Dolby AC-3 | MP2 |


• Summary of video compression standards:

Standard | Digitization format | Compressed bit rate | Example applications
H.261 | CIF/QCIF | p x 64 kbps | Video telephony/conferencing over ISDN and LANs
H.263 | S-QCIF/QCIF | <64 kbps | Video telephony/conferencing and security surveillance over low bit rate channels
MPEG-1/ISO 11172 | SIF | <1.5 Mbps | Storage of VHS-quality video on CD-ROMs
MPEG-2/ISO 13818, Low | SIF | <4 Mbps | Recording of VHS-quality video
MPEG-2/ISO 13818, Main | 4:2:0; 4:2:2 | <15 Mbps; <20 Mbps | Digital video broadcasting
MPEG-2/ISO 13818, High 1440 | 4:2:0; 4:2:2 | <60 Mbps; <80 Mbps | HDTV (4/3 aspect ratio)
MPEG-2/ISO 13818, High | 4:2:0; 4:2:2 | <80 Mbps; <100 Mbps | HDTV (16/9 aspect ratio)
MPEG-4 | Various | 5 kbps to tens of Mbps | Versatile multimedia coding standard