Coding of Text, Voice, Image, And Video




7/23/2019 Coding of Text, Voice, Image, And Video

http://slidepdf.com/reader/full/coding-of-text-voice-image-and-video 1/15

UCCN2043 – Lecture Notes

Page 1 of 15 

4.0  Coding of Text, Voice, Image, and Video Signals

The information that has to be exchanged between two entities (persons or machines) in a communication system can be in one of the following formats:

•  Text

•  Voice

•  Image

•  Video

In an electrical communication system, the information is first converted into an electrical signal. For instance,

•  A microphone is the transducer that converts the human voice into an analog signal.

•  Similarly, the video camera converts the real-life scenery into an analog signal.

In a digital communication system, the first step is to convert the analog signal into digital format using analog-to-digital conversion techniques. This digital signal representation for various types of information is the topic of this lesson.

4.1 Text Messages

Text messages are generally represented in ASCII (American Standard Code for Information Interchange), in which a 7-bit code is used to represent each character. Another code form called EBCDIC (Extended Binary Coded Decimal Interchange Code) is also used. ASCII is the most widely used coding scheme for representation of text in computers.

Using ASCII, the number of characters that can be represented is limited to 128 because only a 7-bit code is used. Out of these 128 characters, 33 are non-printing control characters (many now obsolete) that affect how text and space are processed. The other 95 are printable characters, including the space (which is considered an invisible graphic). The ASCII code is used for representing many European languages as well.

To transmit text messages, the text is first encoded using one of the character-encoding schemes (such as ASCII), and then the bit stream is converted into an electrical signal.

Note: In extended ASCII, each character is represented by 8 bits. Using 8 bits, a number of graphic characters and control characters can be represented.

To represent all the world's languages, Unicode has been developed. Unicode was originally designed to use 16 bits per character and can be used to encode the characters of any recognized language in the world. Modern programming languages such as Java and markup languages such as XML support Unicode.

It is important to note that the ASCII/Unicode coding mechanism is not the best approach, according to Shannon. If we consider the frequency of occurrence of the letters of a language and use short codewords for frequently occurring letters, the coding will be more efficient. However, more processing will be required, and more delay will result.
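As a rough illustration, the conversion of text into a 7-bit ASCII bit stream can be sketched in a few lines (a minimal Python sketch; the function name is ours):

```python
def ascii_to_bits(text):
    """Encode each character as a 7-bit ASCII codeword."""
    return ''.join(f"{ord(c):07b}" for c in text)

# 5 characters x 7 bits = 35 bits on the wire
bits = ascii_to_bits("HELLO")
print(len(bits))  # 35
```

Note that every character costs exactly 7 bits here, regardless of how frequently it occurs; this is the fixed-length property that frequency-based codes improve on.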

Page 2: Coding of Text, Voice, Image, And Video

7/23/2019 Coding of Text, Voice, Image, And Video

http://slidepdf.com/reader/full/coding-of-text-voice-image-and-video 2/15

UCCN2043 – Lecture Notes

Page 2 of 15 

An early coding mechanism for text messages that comes closer to Shannon's ideal was developed by Morse. The Morse code was used extensively for communication in the old days. Many ships used the Morse code until May 2000. In Morse code, characters are represented by dots and dashes. Morse code is no longer used in standard communication systems.

Note: Morse code uses dots and dashes to represent various English characters. It is an efficient code because short codes are used to represent high-frequency letters and long codes are used to represent low-frequency letters. The letter E is represented by just one dot, and the letter Q is represented by dash dash dot dash.

4.2 Voice 

To transmit voice from one place to another, the speech (acoustic signal) is first converted into an electrical signal using a transducer, the microphone. This electrical signal is an analog signal. The voice signal corresponding to the speech "how are you" is shown in Figure 4.1.

Figure 4.1: Speech waveform.

The important characteristics of the voice signal are given here:

•  The voice signal occupies a bandwidth of 4 kHz, i.e., the highest frequency component in the voice signal is 4 kHz. Though higher frequency components are present, they are not significant, so a filter is used to remove all the high-frequency components above 4 kHz. In telephone networks, the bandwidth is limited to only 3.4 kHz.

•  The pitch varies from person to person. Pitch is the fundamental frequency in the voice signal. In a male voice, the pitch is in the range of 50–250 Hz. In a female voice, the pitch is in the range of 200–400 Hz.

•  The speech sounds can be classified broadly as voiced sounds and unvoiced sounds. Signals corresponding to voiced sounds (such as the vowels a, e, i, o, u) will be periodic signals and will have high amplitude. Signals corresponding to unvoiced sounds (such as th, s, z, etc.) will look like noise signals and will have low amplitude.


•  The voice signal is considered a non-stationary signal, i.e., the characteristics of the signal (such as pitch and energy) vary. However, if we take small portions of the voice signal of about 20msec duration, the signal can be considered stationary. In other words, during this small duration, the characteristics of the signal do not change much. Therefore, the pitch value can be calculated from a 20msec segment of the voice signal. However, if we take the next 20msec, the pitch may be different.

The voice signal occupies a bandwidth of 4 kHz. The voice signal can be broken down into a fundamental frequency and its harmonics. The fundamental frequency, or pitch, is low for a male voice and high for a female voice.

These characteristics are used while converting the analog voice signal into digital form. Analog-to-digital conversion of voice signals can be done using one of two techniques: waveform coding and vocoding.

Note: The characteristics of speech signals described here are used extensively for speech processing applications such as text-to-speech conversion and speech recognition.

Music signals have a bandwidth of 20 kHz. The techniques used for converting music signals into digital form are the same as for voice signals.

4.2.1 Waveform Coding

Waveform coding is done in such a way that the analog electrical signal can be reproduced at the receiving end with minimum distortion. Hundreds of waveform coding techniques have been proposed by many researchers. We will study two important waveform coding techniques: pulse code modulation (PCM) and adaptive differential pulse code modulation (ADPCM).

Pulse Code Modulation

Pulse Code Modulation (PCM) is the first and the most widely used waveform coding technique. The ITU-T Recommendation G.711 specifies the algorithm for coding speech in PCM format.

The PCM coding technique is based on Nyquist's theorem, which states that if a signal is sampled uniformly at a rate of at least twice its highest frequency component, it can be reconstructed without any distortion.

The highest frequency component in the voice signal is 4 kHz, so we need to sample the waveform at 8000 samples per second, i.e., once every 1/8000th of a second (125 microseconds). We have to find the amplitude of the waveform every 125 microseconds and transmit that value instead of transmitting the analog signal as it is.


The sample values are still analog values, and we can "quantize" these values into a fixed number of levels. As shown in Figure 4.2, if the number of quantization levels is 256, we can represent each sample by 8 bits. So, 1 second of voice signal can be represented by 8000 × 8 bits = 64 kbits. Hence, for transmitting voice using PCM, we require a 64 kbps data rate.

Figure 4.2: Pulse Code Modulation.

However, note that since we are approximating the sample values through quantization, there will be distortion in the reconstructed signal; this distortion is known as quantization noise.
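The bit-rate arithmetic and a uniform quantizer can be sketched as follows (a deliberate simplification: G.711 actually uses logarithmic rather than uniform quantization, and the function name is ours):

```python
fs = 8000             # samples per second (Nyquist rate for 4 kHz voice)
bits_per_sample = 8   # 256 quantization levels
print(fs * bits_per_sample)  # 64000 bps = 64 kbps

def quantize(x, bits=8):
    """Uniformly quantize x in [-1, 1) to one of 2**bits integer levels.
    The rounding here is the source of quantization noise."""
    levels = 2 ** bits
    return min(levels - 1, int((x + 1) / 2 * levels))

print(quantize(0.0))   # 128: zero amplitude maps to mid-scale
print(quantize(-1.0))  # 0: the lowest level
```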

In the PCM coding technique standardized by ITU in the G.711 recommendation, the nonlinear characteristic of human hearing is exploited: the ear is more sensitive to quantization noise in low-amplitude signals than to noise in large-amplitude signals.

In G.711, a logarithmic (non-linear) quantization function is applied to the speech signal, so the small signals are quantized with higher precision. Two quantization functions, called A-law and µ-law, have been defined in G.711.

•  µ-law is used in the U.S. and Japan.

•  A-law is used in Europe and the countries that follow European standards.

The speech quality produced by the PCM coding technique is called toll quality speech and is taken as the reference for comparing the quality of other speech coding techniques.

For CD-quality audio, the sampling rate is 44.1 kHz (one sample every 23 microseconds), and each sample is coded with 16 bits. For a two-channel stereo audio stream, the bit rate required is

2 × 44,100 × 16 = 1,411,200 bps ≈ 1.41 Mbps.
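The same calculation in code, using the numbers from the text:

```python
channels, fs, bits = 2, 44_100, 16       # stereo, 44.1 kHz, 16-bit samples
bit_rate = channels * fs * bits
print(bit_rate)  # 1411200 bps, i.e. about 1.41 Mbps
```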

Adaptive Differential Pulse Code Modulation 

One simple modification that can be made to PCM is to code the difference between two successive samples rather than coding the samples directly. This technique is known as differential pulse code modulation (DPCM).
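Coding differences rather than raw samples can be sketched as follows (a toy lossless version; a real DPCM coder also quantizes the differences, and the function names are ours):

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous one."""
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Accumulate the differences to recover the original samples."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 105, 104, 101]
print(dpcm_encode(samples))  # [100, 2, 3, -1, -3]
```

Because successive voice samples are strongly correlated, the differences are small numbers and can be coded with fewer bits than the samples themselves.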


Another characteristic of the voice signal that can be used is that a sample value can be predicted from past sample values. At the transmitting side, we predict the sample value, find the difference between the predicted value and the actual value, and then send the difference value. This technique is known as adaptive differential pulse code modulation (ADPCM).

In ADPCM, each sample is represented by 4 bits, and hence the data rate required is 32kbps. Therefore, using ADPCM, voice signals can be coded at 32kbps with hardly any degradation of quality as compared to PCM.

ITU-T Recommendation G.721 specifies the coding algorithm. In ADPCM, the value of the speech sample is not transmitted; rather, the difference between the predicted value and the actual sample value is. Generally, the ADPCM coder takes PCM coded speech data and converts it to ADPCM data.

The block diagram of an ADPCM encoder is shown in Figure 4.3(a).

Figure 4.3: (a) ADPCM Encoder

Eight-bit µ-law PCM samples are input to the encoder and are converted into linear format. Each sample value is predicted using a prediction algorithm, and then the predicted value of the linear sample is subtracted from the actual value to generate the difference signal. Adaptive quantization is performed on this difference value to produce a 4-bit ADPCM sample value, which is transmitted. Instead of representing each sample by 8 bits, in ADPCM only 4 bits are used.

Figure 4.3: (b) ADPCM Decoder


At the receiving end, the decoder, shown in Figure 4.3(b), obtains the de-quantized version of the difference signal. This value is added to the value generated by the adaptive predictor to produce the linear PCM coded speech, which is adjusted to reconstruct µ-law-based PCM coded speech.

There are many other waveform coding techniques, such as delta modulation (DM) and continuously variable slope delta modulation (CVSD). Using these, the coding rate can be reduced to 16kbps, 9.8kbps, and so on. As the coding rate is reduced, the quality of the speech also goes down. There are, however, coding techniques with which good quality speech can be produced at low coding rates.

The PCM coding technique is used extensively in telephone networks. ADPCM is used in telephone networks as well as in many radio systems such as digital enhanced cordless telecommunications (DECT).

Note: Over the past 50 years, hundreds of waveform coding techniques have been developed with which data rates can be reduced to as low as 9.8kbps while still obtaining good quality speech.

4.2.2 Vocoding

A radically different method of coding speech signals was proposed by H. Dudley in 1939. He named his coder the vocoder, a term derived from VOice CODER. In a vocoder, the electrical model for speech production shown in Figure 4.4 is used.

Figure 4.4: Electrical model of speech production

This model is called the source–filter model because the speech production mechanism is considered as two distinct entities: a filter to model the vocal tract and an excitation source.

The excitation source consists of a pulse generator and a noise generator. The filter is excited by the pulse generator to produce voiced sounds (vowels) and by the noise generator to produce unvoiced sounds (consonants).


The vocal tract filter is a time-varying filter: the filter coefficients vary with time. As the characteristics of the voice signal vary slowly with time, for time periods on the order of 20msec the filter coefficients can be assumed to be constant.

In vocoding techniques, at the transmitter, the speech signal is divided into frames of 20msec duration. Each frame contains 160 samples. Each frame is analyzed to check whether it is a voiced frame or an unvoiced frame by using parameters such as energy, amplitude levels, etc.

For voiced frames, the pitch is determined. For each frame, the filter coefficients are also determined. These parameters (voiced/unvoiced classification, filter coefficients, and pitch for voiced frames) are transmitted to the receiver.
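A toy version of the voiced/unvoiced decision based on frame energy might look like this (the threshold is an arbitrary assumption; real coders combine energy with several other parameters):

```python
def classify_frame(frame, energy_threshold=0.1):
    """Label a frame by its average energy: voiced frames (vowels)
    have high amplitude, unvoiced frames look like low-amplitude noise."""
    energy = sum(s * s for s in frame) / len(frame)
    return "voiced" if energy > energy_threshold else "unvoiced"

# 160-sample frames, as in a 20msec frame at 8000 samples/sec.
print(classify_frame([0.5, -0.5] * 80))    # voiced   (energy 0.25)
print(classify_frame([0.01, -0.01] * 80))  # unvoiced (energy 0.0001)
```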

At the receiving end, the speech signal is reconstructed using the electrical model of speech production. Using this approach, the data rate can be reduced to as low as 1.2kbps. However, compared to waveform coding techniques, the quality of speech will not be very good.

A number of techniques are used for calculating the filter coefficients. Linear prediction is the most widely used of these techniques.

Linear Prediction

The basic concept of linear prediction is that a sample of a voice signal can be approximated as a linear combination of the past samples of the signal.

If Sn is the nth speech sample, then

Sn = Σ (k = 1 to P) ak Sn−k + G Un

where

ak (k = 1, …, P) are the linear prediction coefficients,
G is the gain of the vocal tract filter, and
Un is the excitation to the filter.

The linear prediction coefficients (generally 8 to 12 of them) represent the vocal tract filter coefficients.

Calculating the linear prediction coefficients involves solving P linear equations. One of the most widely used methods for solving these equations is the Levinson–Durbin algorithm.
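For the first-order special case (P = 1), the single coefficient has a closed form, namely the ratio of two autocorrelation values, so no Levinson–Durbin recursion is needed. An illustrative sketch (function name and test signal are ours, not part of any standard):

```python
def lpc_order1(samples):
    """Estimate the coefficient a1 that minimizes the energy of the
    prediction residual e[n] = s[n] - a1 * s[n-1]."""
    r0 = sum(s * s for s in samples)                                   # R(0)
    r1 = sum(samples[n] * samples[n - 1] for n in range(1, len(samples)))  # R(1)
    return r1 / r0 if r0 else 0.0

# A signal generated by s[n] = 0.9 * s[n-1] is predicted almost
# perfectly, so the estimated coefficient comes out close to 0.9.
signal = [0.9 ** n for n in range(50)]
print(round(lpc_order1(signal), 3))
```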

Coding of the voice signal using linear prediction analysis involves the following steps:

•  At the transmitting end, divide the voice signal into frames, each of 20msec duration. For each frame, calculate the linear prediction coefficients and pitch and find out whether the frame is voiced or unvoiced. Convert these values into code words and send them to the receiving end.

•  At the receiver, using these parameters and the speech production model, reconstruct the voice signal.


In the linear prediction technique, a voice sample is approximated as a linear combination of the past P samples. The linear prediction coefficients are calculated every 20 milliseconds and sent to the receiver, which reconstructs the speech samples using these coefficients. Using this approach, voice signals can be compressed to as low as 1.2kbps.


The quality of speech will be very good for data rates down to 9.6kbps, but the voice sounds synthetic at lower data rates. Slight variations of this technique are used extensively in many practical systems such as mobile communication systems, speech synthesizers, etc.

Note: Variations of the LPC technique are used in many commercial systems, such as mobile communication systems and Internet telephony.

4.3 Image 

To transmit an image, the image is divided into grids called pixels (or picture elements). The higher the number of grids, the higher the resolution. Grid sizes such as 1024 × 768 and 800 × 600 are generally used in computer graphics.

For black-and-white pictures, each pixel is given a certain gray-scale value. If there are 256 gray-scale levels, each pixel is represented by 8 bits. So, to represent a picture with a grid size of 400 × 600 pixels with each pixel of 8 bits, 240kbytes of storage is required.

To represent color, the levels of the three fundamental colors (red, green, and blue) are combined. More shades of color can be represented if more levels of each color are used.

In image coding, the image is divided into small grids called pixels, and each pixel is quantized. The higher the number of pixels, the higher will be the quality of the reconstructed image.

For example, if an image is coded with a resolution of 352 × 240 pixels, and each pixel is represented by 24 bits, the size of the image is 352 × 240 × 24/8 = 253,440 bytes, i.e., about 247.5 kilobytes.
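The same calculation in code:

```python
width, height, bits_per_pixel = 352, 240, 24
size_bytes = width * height * bits_per_pixel // 8
print(size_bytes, size_bytes / 1024)  # 253440 bytes = 247.5 kilobytes
```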

To store images as well as to send them through a communication medium, the image needs to be compressed. A compressed image occupies less storage space when stored on a medium such as a hard disk or CD-ROM. If the image is sent through a communication medium, the compressed image can be transmitted faster.

One of the most widely used image coding formats is the JPEG format. The Joint Photographic Experts Group (JPEG) proposed this standard for coding of images. The block diagram of JPEG image compression is shown in Figure 4.5.

Figure 4.5: JPEG compression


For compressing the image using the JPEG compression technique, the image is divided into blocks of 8 by 8 pixels, and each block is processed using the following steps:

1.  Apply the discrete cosine transform (DCT), which takes the 8 × 8 matrix and produces an 8 × 8 matrix that contains the frequency coefficients. This is similar to the Fast Fourier Transform (FFT) used in digital signal processing. The output matrix represents the image in the spatial frequency domain.

2.  Quantize the frequency coefficients obtained in Step 1. This is just rounding off the values to the nearest quantization level. As a result, the quality of the image will slightly degrade.

3.  Convert the quantization levels into bits. Since there will be little change between consecutive frequency coefficients, the differences between the frequency coefficients are encoded instead of the coefficients directly.

Compression ratios of 30:1 can be achieved using JPEG compression. In other words, a 300kB image can be reduced to about 10kB.
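Step 1 above can be sketched with a direct, unoptimized DCT (real encoders use fast algorithms and per-coefficient quantization tables; this is purely illustrative):

```python
import math

def dct2(block, N=8):
    """Naive 2-D DCT-II of an N x N pixel block."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt((1 if u == 0 else 2) / N)
            cv = math.sqrt((1 if v == 0 else 2) / N)
            out[u][v] = cu * cv * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
    return out

# A flat block concentrates all its energy in the DC coefficient,
# which is why smooth image regions compress so well.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0]))  # 800; every other coefficient is ~0
```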

Note: JPEG image compression is used extensively in Web page development. As compared to bitmapped files (which have a .bmp extension), JPEG images (which have a .jpg extension) occupy less space and hence can be downloaded faster when we access a Web site.

4.4 Video 

A video signal occupies a bandwidth of 5MHz. Using the Nyquist sampling theorem, we need to sample the video signal at 10 million samples per second. If we use 8-bit PCM, the video signal requires a data rate of 80Mbps. This is a very high data rate, and this coding technique is not suitable for digital transmission of video. A number of video coding techniques have been proposed to reduce the data rate.

For video coding, the video is considered a series of frames. At least 16 frames per second are required to get the perception of moving video. Each frame is compressed using image compression techniques and transmitted. Using this technique, video can be compressed to 64kbps, though the quality will not be very good.

Video encoding is an extension of image encoding. As shown in Figure 4.6, a series of images or frames, typically 16 to 30 frames, is transmitted per second. Due to the persistence of vision, these discrete images appear as moving video.

Accordingly, the data rate for transmission of video will be the number of frames per second multiplied by the data rate for one frame. The data rate is reduced to about 64kbps in desktop video conferencing systems, where the resolution of the image and the number of frames are reduced considerably. The resulting video is generally acceptable for conducting business meetings over the Internet or corporate intranets, but not for transmission of, say, dance programs, because the video will be very jerky.


Figure 4.6: Video coding through frames and pixels.

A variety of video compression standards have been developed. Notable among them is MPEG-2, which is used for video broadcasting. MPEG-4 is used in video conferencing applications, and HDTV standards are used for high-definition television broadcasting.

The Moving Picture Experts Group (MPEG) released a number of standards for video coding. The following standards are presently in use:

MPEG-2: This standard is for digital video broadcasting. The data rates are between 3 and 7.5 Mbps. The picture quality is much better than analog TV. This standard is used in broadcasting through direct broadcast satellites.

MPEG-4: This standard is used extensively for coding, creation, and distribution of audio-visual content for many applications because it supports a wide range of data rates. The MPEG-4 standard addresses the following aspects:

•  Representing audio-visual content, called media objects.

•  Describing the composition of these objects to create compound media objects.

•  Multiplexing and synchronizing the data.

The primitive objects can be still images, audio, text, graphics, video, or synthesized speech. Video coding between 5kbps and 10Mbps, speech coding from 1.2kbps to 24kbps, audio (music) coding at 128kbps, etc. are possible.

MP3 (MPEG Audio Layer 3) is the standard for distribution of music at a 128kbps data rate; it is part of the MPEG family of standards.


For video conferencing, 384 kbps and 2.048 Mbps data rates are very commonly used to obtain better quality as compared to 64kbps. Video conferencing equipment that supports these data rates is commercially available.

MPEG-4 is used in mobile communication systems for supporting video conferencing while on the move. It is also used in video conferencing over the Internet.

In spite of the many developments in digital communication, video broadcasting continues to be analog in most countries. Many standards have been developed for digital video applications. When optical fiber is used extensively as the transmission medium, perhaps then digital video will gain popularity. The important European digital formats for video are given here:

Multimedia CIF format: Width in pixels 360; height in pixels 288; frames/second 6.25 to 25; bit rate without compression 7.8 to 31 Mbps; with compression 1 to 3 Mbps.

Video conferencing (QCIF format): Width in pixels 180; height in pixels 144; frames per second 6.25 to 25; bit rate without compression 1.9 to 7.8 Mbps; with compression 0.064 to 1 Mbps.

Digital TV, ITU-R BT.601 format: Width 720; height 576; frames per second 25; bit rate without compression 166 Mbps; with compression 5 to 10 Mbps.

HDTV, ITU-R BT.709 format: Width 1920; height 1250; frames per second 25; bit rate without compression 960 Mbps; with compression 20 to 40 Mbps.

Note: Commercialization of digital video broadcasting has not happened very fast. It is expected that utilization of HDTV will take off in the first decade of the twenty-first century.

4.5 Pulse Modulation and Sampling (Glover 5.2 @ page 163) 

Referring to Figure 4.7, the principal transmitter subsystems include an Analog-to-Digital Converter (ADC), which performs the sampling, quantization, and PCM encoding processes. A good understanding of pulse modulation techniques needs to be established, as pulse modulation is part of the ADC process and, as such, constitutes an important part of a digital communications transmitter.

However, it is important to note that pulse modulations can also be used as modulation schemes in their own right for analogue communications.


Figure 4.7: ADC Process in digital communications

4.5.1 Pulse Modulation

Pulse modulation describes the process whereby the amplitude, width, or position of individual pulses in a periodic pulse train is varied (i.e. modulated) in sympathy with the amplitude of a baseband information signal, g(t).

Figures 4.8(a) to (d) show the analog input signal, Pulse Amplitude Modulation, Pulse Width Modulation, and Pulse Position Modulation respectively.

Figure 4.8: Illustration of pulse amplitude, width and position modulation


Since pulse amplitude modulation (PAM) relies on changes in pulse amplitude, it requires a larger signal-to-noise ratio (SNR) than pulse position modulation (PPM) or pulse width modulation (PWM). This is essentially because a given amount of additive noise can change the amplitude of a pulse (with rapid rise and fall times) by a greater fraction than the position of its edges (Figure 4.9).

Figure 4.9: Effects of noise on pulses: (a) noise-induced position and width errors completely absent for an ideal pulse; (b) small noise-induced position and width errors for a realistic pulse.

Pulse modulation may be an end in itself, allowing, for example, many separate information-carrying signals to share a single physical channel by interleaving the individual signal pulses, as illustrated in Figure 4.10. Such pulse interleaving is called time division multiplexing (TDM) and is discussed in detail in Lesson 7.

Figure 4.10: Time division multiplexing of two pulse amplitude modulated signals


Pulse modulation, however, may also represent an intermediate stage in the generation of digitally modulated signals from the input analog signal. This process is called sampling.

Note: It is important to realise that pulse modulation is not, in itself, a digital technique but an analogue one.

4.5.2 Sampling (Glover & "Pulse Amplitude Modulation, Pulse Code Modulation and Sampling.pdf") 

The process of selecting or recording the ordinate values of a continuous (usually analogue) function at specific (usually equally spaced) values of its abscissa is called sampling. If the function is a signal which varies with time, then the samples are sometimes called a time series. This is the most common type of sampling process encountered in electronic communications, although spatial sampling of images is also important.

There are obvious similarities between sampling and PAM. In fact, in many cases, the two processes are indistinguishable. In an ADC process, sampling (or PAM, if you prefer) always precedes quantization and PCM.

PCM modifies the pulses created by PAM to create a completely digital signal. This is done by quantizing the PAM pulses, i.e., assigning integer values in a specific range to the sampled instances.
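The whole chain, i.e. sample (PAM), quantize, and emit binary codewords (PCM), can be sketched as follows (function names and the test tone are ours; 8-bit uniform quantization is used for simplicity):

```python
import math

def pcm_encode(signal_fn, fs, n_samples, bits=8):
    """Sample signal_fn at rate fs (PAM), uniformly quantize each
    sample in [-1, 1) to 2**bits levels, and emit binary PCM codewords."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        x = signal_fn(n / fs)                            # sampling (PAM)
        q = min(levels - 1, int((x + 1) / 2 * levels))   # quantization
        codes.append(f"{q:0{bits}b}")                    # PCM encoding
    return codes

# One millisecond of a 1 kHz tone sampled at the 8 kHz voice rate.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
codes = pcm_encode(tone, fs=8000, n_samples=8)
print(codes[0])  # "10000000": zero amplitude maps to mid-scale
```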

Figure 4.11 shows the entire analog-to-digital conversion process in an ADC. Figure 4.12 illustrates the PAM process, and Figure 4.13 illustrates the quantization and PCM process.

Figure 4.11: From analog signal to PCM digital code


Figure 4.12: The (a) analog input signal and (b) PAM signal

Figure 4.13: PCM process and transmission of PCM signal