MPEG Audio Compression

by V. Loumos

Introduction

• Moving Picture Experts Group (MPEG)

• International Organization for Standardization (ISO)

• First High Fidelity Audio standard

• Part of a multi-part standard, at an aggregate rate of 1.5 Mbit/sec, for
  – Video compression
  – Audio compression
  – Audio, video and data synchronization

MPEG Audio

• Physically lossy compression algorithm

• Perceptually lossless, transparent algorithm

• Exploits perceptual properties of human ear

• Psychoacoustic modeling

Medium quality audio compression

• Code Excited Linear Prediction (CELP) for speech coding

• μ-law

• Adaptive Differential Pulse Code Modulation

The MPEG Audio standard

• Ensures inter-operability

• Defines coded bitstream syntax

• Defines decoding process

• Guarantees decoder’s accuracy

MPEG audio acceptance

• Wide acceptance

• Large number of MPEG audio codecs produced

• Stand-alone, mobile phone add-ons, etc.

MPEG audio features

• No assumptions about the nature of the audio source

• Exploitation of human auditory system perceptual limitations

• Removal of perceptually irrelevant parts of audio signal

MPEG audio sampling rates

• 32 kHz

• 44.1 kHz

• 48 kHz

MPEG audio supports

• One or two audio channels in
  – a monophonic mode for a single audio channel
  – a dual monophonic mode for two independent audio channels
  – a stereo mode with sharing of bits
  – a joint stereo mode based on the correlation or the phase difference between channels
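The joint stereo mode listed above exploits the similarity between channels. The slides do not say which technique is meant, so the following is only a minimal sketch of one common approach, mid/side coding, shown here as an assumption. The function names `ms_encode` and `ms_decode` are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # keeps total energy unchanged by the transform


def ms_encode(left, right):
    """Turn left/right samples into mid (sum) and side (difference) samples."""
    mid = [(l + r) * SCALE for l, r in zip(left, right)]
    side = [(l - r) * SCALE for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode and recover the left/right samples."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right


# Two strongly correlated channels: the side signal is small,
# so it can be coded with fewer bits than either original channel.
left = [0.50, 0.80, -0.30, 0.10]
right = [0.48, 0.79, -0.31, 0.12]
mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)
```

When the two channels are nearly identical, almost all of the energy ends up in `mid`, which is where the bit saving comes from.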

MPEG audio supports

• Several predefined fixed bit rates ranging from 32 to 224 kbits/sec per channel

• Free bit rate other than the predefined rates

MPEG audio offers

• Three independent layers of compression

• A wide range of tradeoffs between codec complexity and compressed audio quality

MPEG Audio Layer I

• Simplest coding

• Suitable for bit rates above 128 kbits/sec per channel

• Philips Digital Compact Cassette

MPEG Audio Layer II

• Intermediate complexity

• Bit rates around 128 kbits/sec per channel

• Digital Audio Broadcasting (DAB)

• Synchronized Video and Audio on CD-ROM

• Full motion CD-I

• Video-CD

MPEG Audio Layer III

• Most complex coding

• Best audio quality

• Bit rates around 64 kbits/sec per channel

• Suitable for audio over ISDN

MPEG Audio extras

• All three layers allow single chip real-time decoder implementation

• Optional Cyclic Redundancy Check (CRC) error detection

• Ancillary data may be included in the bit stream

Overview

• Quantization, the key to MPEG audio compression

• Transparent, perceptually lossless compression

• No perceptible distinction between original and 6-to-1 compressed audio clips (stereo, 16 bits/sample, sampled at 48 kHz, compressed to 256 kbits/sec)
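The 6-to-1 figure follows from the numbers on this slide. A short sketch of the arithmetic (the helper name `pcm_bit_rate` is hypothetical):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Stereo, 16 bits/sample, sampled at 48 kHz.
uncompressed_bps = pcm_bit_rate(48_000, 16, 2)

# Total compressed rate quoted on the slide.
compressed_bps = 256_000

compression_ratio = uncompressed_bps / compressed_bps
print(uncompressed_bps, compression_ratio)
```

The uncompressed stream runs at 1,536,000 bits/sec, so 256 kbits/sec corresponds to a ratio of exactly 6.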

The Polyphase Filter Bank

• Key component common to all layers

• Divides the audio signal into 32 equal-width frequency subbands

• The filters provide good time and reasonable frequency resolution

• Critical bands associated with psychoacoustic models
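To make the "32 equal-width subbands" concrete, the sketch below computes the subband edges for the three supported sampling rates. It assumes the subbands evenly split the range from 0 Hz to half the sampling rate. The helper name `subband_edges` is hypothetical.

```python
NUM_SUBBANDS = 32


def subband_edges(sample_rate_hz, num_subbands=NUM_SUBBANDS):
    """Return (low, high) edges in Hz for each equal-width subband."""
    nyquist_hz = sample_rate_hz / 2
    width_hz = nyquist_hz / num_subbands
    return [(i * width_hz, (i + 1) * width_hz) for i in range(num_subbands)]


for rate in (32_000, 44_100, 48_000):
    low, high = subband_edges(rate)[0]
    print(f"{rate} Hz sampling: each subband is {high - low} Hz wide")
```

This gives widths of 500 Hz, about 689 Hz and 750 Hz. The lowest subband is therefore much wider than the critical bands at low frequencies, so the equal-width split does not match the ear's frequency resolution there.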

Psychoacoustics

• The aim is to remove acoustically irrelevant parts of the audio signal

• The human auditory system is unable to hear quantization noise under conditions of auditory masking

• Masking occurs whenever a strong signal makes a neighborhood of weaker audio signals imperceptible

Critical bands

• The human auditory system has a limited, frequency dependent resolution

• This frequency dependence is expressed in the form of critical bandwidths, less than 100 Hz for low and more than 4 kHz for high frequencies

• The human ear blurs the various signal components inside a critical band

Noise masking threshold

• Human ear resolving power is frequency dependent

• Noise masking threshold, at any frequency, depends only on the signal energy within a limited bandwidth neighborhood of that frequency

The Psychoacoustic Model

• Analyzes the audio signal and computes the amount of noise masking as a function of frequency

• The encoder decides how best to represent the input signal with a minimum number of bits

Basic Steps

• Time align audio data

• Convert audio to frequency domain representation

• Process spectral values into tonal and non-tonal components

• Apply a spreading function

• Set a lower bound for threshold values

• Find the threshold values for each subband

• Calculate the signal to mask ratio
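The last step, and how an encoder might use its result, can be sketched as follows. This is a simplified illustration, not the procedure defined by the standard. It assumes the per-subband signal levels and masking thresholds are already available in dB, and that each extra quantizer bit buys about 6.02 dB of signal-to-noise ratio. The function names, the example values and the greedy loop are all hypothetical.

```python
DB_PER_BIT = 6.02  # assumption: SNR gain of a uniform quantizer per extra bit


def signal_to_mask_ratios(signal_db, mask_db):
    """Signal-to-mask ratio (SMR) per subband, in dB."""
    return [s - m for s, m in zip(signal_db, mask_db)]


def allocate_bits(smr_db, bit_budget, max_bits=15):
    """Greedy sketch: give one bit at a time to the subband whose
    quantization noise is furthest above the masking threshold."""
    bits = [0] * len(smr_db)
    for _ in range(bit_budget):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        # Mask-to-noise ratio: quantizer SNR minus SMR; negative means audible noise.
        worst = min(candidates, key=lambda i: bits[i] * DB_PER_BIT - smr_db[i])
        if bits[worst] * DB_PER_BIT - smr_db[worst] >= 0:
            break  # noise is below the mask in every remaining subband
        bits[worst] += 1
    return bits


signal_db = [60, 40, 20, 10]  # made-up subband levels
mask_db = [30, 35, 25, 15]    # made-up masking thresholds
smr = signal_to_mask_ratios(signal_db, mask_db)
allocation = allocate_bits(smr, bit_budget=8)
```

Subbands whose signal already lies below the masking threshold (negative SMR) receive no bits at all, which is how the perceptually irrelevant parts of the signal are discarded.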

MPEG Layer III coding

• Based on the Layer I and II filter bank

• Compensation of filter deficiencies by processing outputs with a Modified Discrete Cosine Transform

Layer III enhancements

• Alias reduction

• Non-uniform quantization

• Scalefactor bands

• Entropy coding of data values

• Use of a “bit reservoir”
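As an illustration of non-uniform quantization, the sketch below uses a power-law quantizer with exponent 3/4, the general form associated with Layer III. It is deliberately simplified: the single `step` parameter stands in for the scalefactor and global gain handling of a real encoder, the rounding offset is omitted, and the function names are hypothetical.

```python
import math


def quantize(value, step):
    """Compress the magnitude with a 3/4 power law, then round to an integer."""
    magnitude = (abs(value) / step) ** 0.75
    return int(math.copysign(round(magnitude), value))


def dequantize(index, step):
    """Expand the integer index with the inverse 4/3 power law."""
    return math.copysign(abs(index) ** (4.0 / 3.0) * step, index)


# Spacing between neighbouring reconstruction levels grows with magnitude,
# so large spectral values are quantized more coarsely than small ones.
small_gap = dequantize(2, 1.0) - dequantize(1, 1.0)
large_gap = dequantize(101, 1.0) - dequantize(100, 1.0)
print(small_gap, large_gap)
```

The growing gap between levels means the quantization noise scales with signal amplitude, which suits masking: louder components can hide more noise.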