Interleaving and Speech Coding


Speech Coding: To transform the speech that enters the headset into the radio waves that are transmitted to the base station, GSM performs the following operations [2]. A short introduction to each operation can be found below.

Of course, to decode the message, all of these stages must be undone in reverse order. The decoding stages are not shown in the diagram for clarity.

Speech Coding

GSM is a digital communications standard, but voice is analog, and therefore it must be converted to a digital bit stream. GSM uses Pulse Code Modulation (64 kbps) to digitize voice, and then uses the Full-Rate speech codec to remove the redundancy in the signal and achieve a bit rate of 13 kbps.

In order to send our voice across a radio network, we have to turn it into a digital signal. GSM uses a method called RPE-LPC (Regular Pulse Excited - Linear Predictive Coder with a Long Term Predictor Loop, also known as RPE-LTP) to turn our analog voice into a compressed digital equivalent. Once we have a digital signal, we have to add redundancy so that we can recover from errors when we transmit our digital voice over the radio channel. GSM uses convolutional codes to encode the digital speech representation.

Speech Encoding

RPE-LPC

In modern land-line telephone systems, digital coding is used. The electrical variations induced in the microphone are sampled at a rate of 8 kHz, and each sample is converted into an 8-bit binary number representing one of 256 distinct values. Since we sample 8000 times per second and each sample is 8 bits, we have a bit rate of 8 kHz x 8 bits = 64 kbps. This bit rate is unrealistic to transmit across a radio network, since interference will likely ruin the transmitted waveform.

GSM speech encoding compresses the speech waveform to a lower bit rate using RPE-LPC. An LPC encoder fits a given speech signal against a set of vocal characteristics [1]. The best-fit parameters are transmitted and used by the decoder to generate synthetic speech that is similar to the original. Information from previous samples is used to predict the current sample. The signal is represented by the coefficients of the linear combination of the previous samples, plus an encoded form of the residual (the difference between the predicted and the actual sample). Speech is divided into 20 ms frames, each of which is encoded as 260 bits, giving a total bit rate of 13 kbps. This way GSM can carry 4 times (floor[64 kbps / 13 kbps]) as many phone calls as a regular land-line telephone. See Figure 1 for a representation of RPE-LPC.

Figure 1- A block diagram detailing how an analog voice is digitized and encoded to produce a digital voice signal.
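The bit rate arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the text; the variable names are illustrative and not part of any GSM specification.

```python
# Land-line PCM: 8000 samples per second, 8 bits per sample.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
pcm_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE          # 64000 bit/s

# GSM full-rate codec: 260 bits for every 20 ms of speech.
BITS_PER_SPEECH_FRAME = 260
SPEECH_FRAME_MS = 20
codec_bps = BITS_PER_SPEECH_FRAME * 1000 // SPEECH_FRAME_MS   # 13000 bit/s

# Number of compressed calls that fit in one PCM channel.
calls_per_pcm_channel = pcm_bps // codec_bps        # floor(64/13) = 4

print(pcm_bps, codec_bps, calls_per_pcm_channel)
```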

Channel Encoding

Once we have a compressed digital signal, we must add a number of bits for error control to protect the signal from interference. These bits are called redundancy bits. The GSM system uses convolutional encoding to achieve this protection. The exact algorithms differ for speech and for the different data rates. The method used for speech blocks is described below.

Bit Composition of the Speech Signal

Recall that the RPE-LPC encoder produces a block of 260 bits every 20 ms. It was found (through testing) that some of the 260 bits are more important than others. The composition of these 260 bits is as follows:

Class Ia - 50 bits (most sensitive to bit errors)
Class Ib - 132 bits (moderately sensitive to bit errors)
Class II - 78 bits (least sensitive to bit errors)

Because some bits are more important than others, GSM adds redundancy bits to each of the three classes differently. The Class Ia bits are encoded in a cyclic encoder. The Class Ib bits (together with the encoded Class Ia bits) are encoded using convolutional encoding. Finally, the Class II bits are simply appended, unprotected, to the output of the convolutional encoder. Below is the operation of each encoder as related to each class of bits.

   Cyclic Encoding

The Class Ia bits are encoded using a cyclic encoder that adds three bits of redundancy. The resulting block consists of the original Class Ia bits m0,...,m49 together with the three redundancy bits b0, b1, b2 added by the cyclic encoder, so the cyclic encoder produces 50 + 3 = 53 bits. Cyclic codes are linear codes (the sum of any two codewords is also a codeword), as we have seen in class. In addition to being linear, a cyclic shift, or rotation, of a codeword produces another codeword. Since the code used in GSM is a (53,50) code, the generator polynomial used in the encoding is of degree 53 - 50 = 3. The specific polynomial used in GSM is x^3 + x + 1 [3]. The following block diagram can produce the codeword. Once the data has been completely shifted through the system, Reg0 through Reg2 will contain the three additional bits.
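The shift-register division described above can be sketched in Python. This is a minimal illustration of systematic encoding with the generator x^3 + x + 1, not the exact GSM procedure: the bit ordering, the placement of the parity bits after the message, and the function names are assumptions made for the example.

```python
GENERATOR = 0b1011  # x^3 + x + 1


def remainder(bits):
    """Divide the bit sequence (highest power first) by the generator
    and return the 3-bit remainder as an integer."""
    reg = 0
    for bit in bits:
        reg = (reg << 1) | bit
        if reg & 0b1000:          # degree reached 3: subtract the generator
            reg ^= GENERATOR
    return reg


def encode(message_bits):
    """Append 3 parity bits so that the codeword is divisible by the generator."""
    message_bits = list(message_bits)
    r = remainder(message_bits + [0, 0, 0])   # message times x^3
    parity = [(r >> 2) & 1, (r >> 1) & 1, r & 1]
    return message_bits + parity


def has_error(codeword):
    """A non-zero remainder means the received block is not a codeword."""
    return remainder(codeword) != 0


class_ia = [i % 2 for i in range(50)]   # hypothetical Class Ia bits
codeword = encode(class_ia)

corrupted = list(codeword)
corrupted[10] ^= 1                      # flip one bit in transit
print(len(codeword), has_error(codeword), has_error(corrupted))
```

The receiver only needs to run the same division: a zero remainder is taken as "no error detected", anything else marks the frame as bad.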

GSM chose to use cyclic encoding because it makes it possible to quickly determine whether errors are present. The three redundancy bits produced by the cyclic encoder enable the receiver to quickly detect that an error has occurred. If an error is detected, the current 53-bit frame is discarded and replaced by the last known "good" frame.

Convolutional Encoding

The 53 bits produced by the cyclic encoder are combined with the 132 Class Ib bits (plus a tail of 4 extra bits so that the encoder can be flushed) and encoded using the convolutional encoder. The convolutional encoder adds one redundancy bit for every bit that it sees, based on the last four bits in the sequence. Below is a block diagram detailing the convolutional encoder.

Figure 2 - A block diagram of a convolutional encoder.

The convolutional encoder retains a memory of the last four bits in the sequence (a single bit is retained in each flip-flop). These bits are added together using a modulo-2 adder. The resulting bit is sent to the output via path 1. The encoder sends a second bit to the output via path 2. As a result, the convolutional encoder encodes one input bit into two output bits.

In GSM there are 4 flip-flops, and the convolution performed uses the polynomials D^4 + D^3 + 1 and D^4 + D^3 + D + 1 [3].
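A rate-1/2 encoder with these two polynomials can be sketched as follows. The polynomials come from the text; the order in which the two output bits are emitted and the function name are assumptions made for illustration.

```python
def convolutional_encode(bits):
    """Rate-1/2 convolutional encoder with
    G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4.
    state[0] holds the previous input (D^1), state[3] the oldest (D^4)."""
    state = [0, 0, 0, 0]
    out = []
    for u in bits:
        g0 = u ^ state[2] ^ state[3]
        g1 = u ^ state[0] ^ state[2] ^ state[3]
        out += [g0, g1]               # assumed output order: path 1, then path 2
        state = [u] + state[:3]       # shift the new bit into the register
    return out


# A single 1 followed by zeros reads out the polynomial coefficients.
impulse = convolutional_encode([1, 0, 0, 0, 0])
print(impulse)

# 53 cyclic-coded bits + 132 Class Ib bits + 4 tail zeros = 189 input bits.
block = [1] * 185 + [0] * 4
coded = convolutional_encode(block)
print(len(coded))
```

Feeding four zero tail bits at the end returns the register to the all-zero state, which is why the tail is needed before the next block.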

GSM chose to employ a convolutional encoder because of its ability to correct errors efficiently. To correct errors, the decoder makes use of trellis diagrams.

Once the convolutional encoder has encoded the bits, a new bit sequence of 2 x (53 + 132 + 4) = 2 x 189 = 378 bits is produced. The 78 Class II bits are appended to these 378 bits without any coding, since they are the least sensitive to errors. As a result, the channel-encoded bit sequence is now 378 + 78 = 456 bits long. Therefore, each 20 ms speech block produces 456 bits, which corresponds to a bit rate of 22.8 kbps.
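The bit accounting for one 20 ms speech block can be written out step by step. This is just the arithmetic from the text; the constant names are illustrative.

```python
CLASS_IA_BITS = 50
CLASS_IB_BITS = 132
CLASS_II_BITS = 78
PARITY_BITS = 3        # added by the cyclic encoder
TAIL_BITS = 4          # flush the convolutional encoder
SPEECH_BLOCK_MS = 20

conv_input_bits = CLASS_IA_BITS + PARITY_BITS + CLASS_IB_BITS + TAIL_BITS  # 189
conv_output_bits = 2 * conv_input_bits                                      # 378
channel_bits = conv_output_bits + CLASS_II_BITS                             # 456
channel_rate_kbps = channel_bits / SPEECH_BLOCK_MS                          # bits per ms

print(conv_input_bits, conv_output_bits, channel_bits, channel_rate_kbps)
```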

To further protect against bit errors, the 456-bit sequence is then diagonally interleaved, as described in the Interleaving section.

Channel Coding

Once the voice signal has been coded into a digital bit stream, extra bits are added to the bit stream so that the receiver can recognize and correct errors that may have occurred during transmission. GSM uses a technique called convolutional coding.

Interleaving

Interleaving is the process of rearranging the bits. Interleaving allows the error correction algorithms to correct more of the errors that could have occurred during transmission. By interleaving the code, there is less chance that a whole chunk of code is lost. Consider this example to see how interleaving works. We need to transmit 20 bits. Furthermore, 10 bits can be transmitted in one transmission burst, and the error correcting mechanism can correct 3 errors per 10 bits. Compare the two scenarios, with and without interleaving.

With interleaving the receiver is able to recover all 20 bits correctly, but without interleaving we lose 1 complete burst.
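The two scenarios can be reproduced with a small simulation. The original figure is not available, so the details here are assumptions: two 10-bit blocks, a channel fade that corrupts 6 consecutive transmitted bits, and a decoder that can fix at most 3 errors per block.

```python
MAX_CORRECTABLE = 3          # errors the decoder can fix per 10-bit block
BURST_START, BURST_LEN = 2, 6  # assumed error burst on the channel


def interleave(blocks):
    """Send bits round-robin: a0, b0, a1, b1, ..."""
    return [bit for group in zip(*blocks) for bit in group]


def deinterleave(stream, n_blocks):
    """Undo interleave(): every n-th bit belongs to the same block."""
    return [stream[i::n_blocks] for i in range(n_blocks)]


def corrupt(stream, start, length):
    """Flip `length` consecutive bits, modelling a burst of channel errors."""
    out = list(stream)
    for i in range(start, start + length):
        out[i] ^= 1
    return out


def errors_per_block(sent, received):
    return [sum(s != r for s, r in zip(sb, rb)) for sb, rb in zip(sent, received)]


blocks = [[0] * 10, [1] * 10]

# Scenario 1: no interleaving, block A is sent in full, then block B.
rx_plain = corrupt(blocks[0] + blocks[1], BURST_START, BURST_LEN)
plain_errors = errors_per_block(blocks, [rx_plain[:10], rx_plain[10:]])

# Scenario 2: the same error burst hits the interleaved stream.
rx_inter = corrupt(interleave(blocks), BURST_START, BURST_LEN)
inter_errors = errors_per_block(blocks, deinterleave(rx_inter, 2))

print(plain_errors, inter_errors)
```

Without interleaving one block collects all 6 errors and cannot be corrected; with interleaving the same 6 errors are shared as 3 per block, which the decoder can handle.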

In GSM the interleaving is much more complicated than the simple example above. The 456 bits of the channel-coded block are divided into eight 57-bit blocks by selecting the 0th, 8th, 16th, ..., 448th bits for the first block, the 1st, 9th, 17th, ..., 449th bits for the second block, and so on [2]. Then the bits of the first 4 blocks are placed in the even bit positions of 4 consecutive bursts, and the bits of the second 4 blocks are placed in the odd bit positions of the following 4 bursts, so that each speech block is spread over 8 bursts [2].
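The first step, splitting 456 bits into eight 57-bit blocks, amounts to taking every 8th bit. A minimal sketch (the function name is illustrative; the mapping of blocks onto bursts is not modelled here):

```python
def split_into_blocks(bits, n_blocks=8):
    """Block k receives bits k, k+8, k+16, ... of the coded sequence."""
    return [bits[k::n_blocks] for k in range(n_blocks)]


# Use the bit indices themselves as data so the mapping is easy to read.
coded = list(range(456))
blocks = split_into_blocks(coded)

print(len(blocks), len(blocks[0]))
print(blocks[0][:3], blocks[0][-1])
print(blocks[1][:3], blocks[1][-1])
```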

Multiple Access

GSM allows many users to use their cellphones at the same time. GSM uses a combination of Time-Division Multiple Access (TDMA) and Frequency-Division Multiple Access (FDMA) to share the limited bandwidth that regulators provide to the service providers. FDMA divides the spectrum into small frequency slices, and TDMA then divides each frequency slice in time into many blocks. An individual GSM user receives one block out of every several blocks. Because the frequency slice is divided in time, the voice signal is no longer transmitted continuously; instead, the data is transmitted in bursts. The burst assembly operation takes the final encoded data and groups it into bursts.

The Multiple Access Scheme defines how the GSM radio frequencies can be shared by simultaneous communications between different mobile stations located in different cells. GSM uses a mix of Frequency Division Multiple Access (FDMA) and Time Division Multiple Access (TDMA), combined with frequency hopping, for its Multiple Access Scheme. Each user is given a pair of frequencies (one for uplink and one for downlink) and a time slot within a time frame. The time frame provides the basic unit of logical channels.

GSM Frequency Spectrum

Frequency Allocation

There are two frequency bands of 25 MHz each that have been allocated for the use of GSM. The band 890 - 915 MHz is used for the uplink direction (from the mobile station to the base station). The band 935 - 960 MHz is used for the downlink direction (from the base station to the mobile station) [1].

Figure 1: GSM Frequency Bands [2]

FDMA and TDMA

FDMA divides the frequency spectrum into small slices, which are assigned to the users. Since the radio spectrum is limited and users do not free their assigned frequency until they are completely finished with it, the number of users in the system can be quickly limited [1]. As the number of users increases, the required frequency spectrum also increases. TDMA allows many users to share a common channel. The unit of time in TDMA is called a burst. Each user is assigned its own burst within a collection of bursts called a frame.

Carrier Frequencies

GSM uses TDMA within an FDMA structure. As a result, different users can transmit using the same frequency, but they cannot transmit at the same time. Each 25 MHz frequency band is divided using an FDMA scheme into 124 one-way carrier frequencies. Each base station is assigned one or more carriers to use in its cell. Adjacent carrier frequencies are spaced 200 kHz apart. Normally, a 25 MHz band would be divisible into 125 carrier frequencies, but in GSM the first carrier frequency is used as a guard band between GSM and other services that might be operating on lower frequencies.

Figure 2: Frequency Division in the Uplink Spectrum
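The carrier layout can be sketched numerically. The 200 kHz spacing, the band edges, and the unused first slot come from the text; numbering the carriers 1 to 124 upward from the bottom of the band, and pairing each uplink carrier with the downlink carrier 45 MHz above it (935 - 890 MHz), are assumptions of this example. Frequencies are kept in kHz to avoid floating-point rounding.

```python
UPLINK_START_KHZ = 890_000
DOWNLINK_START_KHZ = 935_000
BAND_WIDTH_KHZ = 25_000
CARRIER_SPACING_KHZ = 200
DUPLEX_OFFSET_KHZ = DOWNLINK_START_KHZ - UPLINK_START_KHZ   # 45 MHz

slots_in_band = BAND_WIDTH_KHZ // CARRIER_SPACING_KHZ       # 125
usable_carriers = slots_in_band - 1                          # first slot is a guard band


def carrier_pair_khz(n):
    """Return (uplink, downlink) centre frequencies in kHz for carrier n (1..124)."""
    if not 1 <= n <= usable_carriers:
        raise ValueError("carrier number must be between 1 and 124")
    uplink = UPLINK_START_KHZ + CARRIER_SPACING_KHZ * n
    return uplink, uplink + DUPLEX_OFFSET_KHZ


print(slots_in_band, usable_carriers)
print(carrier_pair_khz(1), carrier_pair_khz(124))
```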

Bursts

Each carrier frequency is then divided in time using a TDMA scheme. Time on each carrier frequency is organized into 120 ms multiframes. A multiframe is made up of 26 frames. Two of these frames are used for control purposes, while the remaining 24 frames are used for traffic.

Figure 3: Structure of a Multiframe 

Each frame can in turn be divided into 8 bursts, and each of the 8 bursts is assigned to a single user. In a TDMA system, a burst is the unit of time, and each burst lasts for approximately 0.577 ms.

[Frame diagram: 8 consecutive bursts of 0.577 ms each]

Figure 4: Structure of a Frame
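The timing figures follow directly from the multiframe structure. The sketch below also checks that one burst per traffic frame is enough to carry the 22.8 kbps channel-coded speech stream; it assumes 114 data bits per burst (two 57-bit sequences, as in a normal burst).

```python
MULTIFRAME_MS = 120.0
FRAMES_PER_MULTIFRAME = 26
TRAFFIC_FRAMES_PER_MULTIFRAME = 24
BURSTS_PER_FRAME = 8
DATA_BITS_PER_BURST = 2 * 57    # assumption: payload of a normal burst

frame_ms = MULTIFRAME_MS / FRAMES_PER_MULTIFRAME     # about 4.615 ms
burst_ms = frame_ms / BURSTS_PER_FRAME               # about 0.577 ms

# One user gets one burst in each of the 24 traffic frames.
user_bits_per_multiframe = TRAFFIC_FRAMES_PER_MULTIFRAME * DATA_BITS_PER_BURST
user_rate_kbps = user_bits_per_multiframe / MULTIFRAME_MS   # bits per ms

print(round(frame_ms, 3), round(burst_ms, 3), user_rate_kbps)
```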

  Burst Structure

In GSM, there are 4 different types of bursts. A normal burst is used to carry speech and data information. The structure of the normal burst is shown below. Each burst consists of 3 tail bits at each end, 2 data sequences of 57 bits, a 26-bit training sequence for equalization, and 8.25 guard bits. There are also 2 stealing bits (1 for each data sequence) that are used by Fast Associated Control Channels (FACCH).

The frequency correction burst and the synchronization burst have the same length as a normal burst. They have different internal structures to differentiate them from normal bursts. The frequency correction burst is used in Frequency Correction Channels (FCCH) and the synchronization burst is used in Synchronization Channels (SCH). The random access burst is shorter than a normal burst, and is only used on Random Access Channels (RACH).

 

Figure 5: Burst Structure [1] 
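Adding up the fields of a normal burst gives its length in bit periods, and dividing by the burst duration gives the gross bit rate on the carrier. This is a sketch of that arithmetic using exact fractions; the burst duration is taken as 120 ms / 26 frames / 8 bursts from the multiframe structure.

```python
from fractions import Fraction

# Field lengths of a normal burst, in bit periods.
NORMAL_BURST_FIELDS = {
    "tail (start)": Fraction(3),
    "data 1": Fraction(57),
    "stealing bit 1": Fraction(1),
    "training sequence": Fraction(26),
    "stealing bit 2": Fraction(1),
    "data 2": Fraction(57),
    "tail (end)": Fraction(3),
    "guard period": Fraction(33, 4),   # 8.25 bit periods
}

burst_bits = sum(NORMAL_BURST_FIELDS.values())         # 156.25
burst_ms = Fraction(120) / 26 / 8                      # 15/26 ms, about 0.577 ms
gross_rate_kbps = burst_bits / burst_ms                # bits per ms = kb/s

print(float(burst_bits), float(burst_ms), float(gross_rate_kbps))
```

The result, roughly 270.8 kb/s, matches the transmission rate of about 271 kb/s quoted for the GSM modulator.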

Channels

A channel relates to the recurrence of one burst in every frame. The channel is characterized by both its frequency and its position within the TDMA frame. This characterization is cyclical, and the channel pattern repeats roughly every 3.5 hours. There are two major categories of channels in GSM: traffic channels and control channels. Channels can also be classified as dedicated or common. Dedicated channels are assigned to a mobile station, while common channels are used by idle mobile stations.

Traffic Channels

Traffic channels transport speech and data information. A traffic channel uses a group of 26 TDMA frames called a 26-multiframe. In this standard, traffic channels for the uplink and the downlink are separated by 3 bursts. Because of this, the mobile station does not need to transmit and receive at the same time. A full-rate traffic channel uses 1 time slot in each of the traffic frames in a multiframe.

Control Channels

Control channels deal with network management messages and channel maintenance tasks. These channels can be used by either idle or dedicated mobile stations. Some of the common channel types are:

Broadcast Control Channels
Frequency Correction Channels
Synchronization Channels
Random Access Channels
Paging Channels
Access Grant Channels

Broadcast channels are used by the base station to provide the mobile station with network synchronization information. There are 3 functions that a broadcast channel can have. The Broadcast Control Channel (BCCH) provides the mobile station with the parameters it needs to identify and access the network. The Synchronization Channel (SCH) gives the mobile station the training sequence needed to demodulate the information transmitted by the base station. The Frequency Correction Channel (FCCH) supplies the mobile station with the frequency reference of the system so that it can synchronize with the network. Every GSM cell broadcasts exactly one FCCH and one SCH, which are defined to be on time slot 0 in the TDMA frame [1]. Paging Channels are used to alert the mobile station of incoming calls. Random Access Channels are used by the mobile station to request access to the network. The base station uses an Access Grant Channel to inform the mobile station which channel it should use.

Ciphering

Ciphering is used to encrypt the data so that no one can overhear the conversation of another user. In GSM the two parties involved in encrypting and decrypting the data are the Authentication Center (AuC) and the SIM card in the mobile phone. Each SIM card holds a unique secret key, which is known by the AuC. The SIM card and the AuC then follow a couple of algorithms to first authenticate the user, and then encrypt and decrypt the data.

For authentication, the AuC sends a 128-bit random number to the mobile phone [1]. The SIM card uses its secret key and the A3 algorithm to perform a function on the random number and sends back the 32-bit result [1]. Since the AuC knows the SIM card's secret key, it performs the same function and checks that the result obtained from the mobile phone matches its own result. If it does, the mobile user is authenticated.

Once authentication has been performed, the random number and the secret key are used in the A8 algorithm to obtain a 64-bit ciphering key [1]. This ciphering key is used with the TDMA frame number in the A5 algorithm to generate a 114-bit sequence [1]. Note that the ciphering key is constant throughout a conversation, but the 114-bit sequence is different for every TDMA frame. The 114-bit sequence is XORed with the two 57-bit blocks in a TDMA burst [1]. Only the mobile phone and the AuC can decrypt the data, since they are the only ones that have access to the secret key, which is needed to generate the ciphering key and the 114-bit sequence.

Note that the A3, A5, and A8 algorithms have not been published; however, some information about A5 has been leaked. A5 reportedly has an effective key length of 40 bits, which allows the encryption to be broken in a matter of days, but since cellular calls have a short lifetime, this weakness is not considered an issue [3].
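The XOR step can be illustrated in Python. The real A5 algorithm is not reproduced here: the keystream generator below is a stand-in built on SHA-256 purely so the example runs, and its name and interface are hypothetical. Only the structure (a 114-bit sequence that depends on the ciphering key and the frame number, XORed with the 2 x 57 data bits) follows the text.

```python
import hashlib


def stand_in_keystream(cipher_key: bytes, frame_number: int, n_bits: int = 114):
    """NOT A5. A placeholder that derives n_bits pseudo-random bits from
    the ciphering key and the TDMA frame number."""
    digest = hashlib.sha256(cipher_key + frame_number.to_bytes(4, "big")).digest()
    bits = [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]
    return bits[:n_bits]


def xor_bits(data_bits, keystream):
    """XOR is its own inverse, so this both encrypts and decrypts."""
    return [d ^ k for d, k in zip(data_bits, keystream)]


cipher_key = bytes(8)                      # hypothetical 64-bit ciphering key
burst_data = [i % 2 for i in range(114)]   # two 57-bit data blocks

ks_frame_1 = stand_in_keystream(cipher_key, frame_number=1)
ks_frame_2 = stand_in_keystream(cipher_key, frame_number=2)

ciphertext = xor_bits(burst_data, ks_frame_1)
recovered = xor_bits(ciphertext, ks_frame_1)
print(recovered == burst_data, ks_frame_1 != ks_frame_2)
```

Because the frame number changes with every TDMA frame, the same ciphering key yields a different 114-bit sequence for every burst, which is the property the text describes.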

 

Modulation

The original analog voice signal has been digitized, interleaved, grouped, and encoded, and the digital data is ready to be transmitted. The digital bit stream must be encoded in a pulse and transmitted over radio frequencies. Modulation changes the 1's and 0's of the digital representation into another representation that is more suitable for transmission over the airwaves.

GSM Physical Layer Modulation

GSM uses Gaussian-Filtered Minimum Shift Keying (GMSK) as its modulation scheme. Before GMSK can be explained, some fundamentals of Minimum Shift Keying (MSK) must be known.

MSK

Signal Pulse

MSK uses changes in phase to represent 0's and 1's, but unlike most other keying schemes we have seen in class, the pulse sent to represent a 0 or a 1 depends not only on the information being sent, but also on what was previously sent. The pulse used in MSK is the following [1]:

$$s(t) = \sqrt{\frac{2E_b}{T_b}} \cos\left[2\pi f_c t + \theta(t)\right], \qquad 0 \le t \le T_b$$

where

$$\theta(t) = \theta(0) + \frac{\pi h}{T_b}\, t$$ if a '1' was sent

$$\theta(t) = \theta(0) - \frac{\pi h}{T_b}\, t$$ if a '0' was sent

Here E_b is the energy per bit, T_b is the bit duration, f_c is the carrier frequency, and h is the deviation ratio. Right from the equation we can see that \theta(t) depends not only on the symbol being sent (through the change in sign), but also on \theta(0), which means that the pulse also depends on what was previously sent. To see how this works, let's work through an example. Assume the data being sent is 111010000; then the phase of the signal would fluctuate as seen in Figure 1.

If it is assumed that h = 1/2, then the figure simplifies. The phase can now go up or down in increments of pi/2, and the values that the phase can take (at integer multiples of Tb) are {-pi/2, 0, pi/2, pi} [1]. The above example now changes to the graph in Figure 2. The figure illustrates one feature of MSK that may not be obvious: when a long run of the same symbol is transmitted, the phase does not go to infinity, but wraps around 0 phase.
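The phase trajectory for h = 1/2 can be traced with integer arithmetic. In this sketch the phase is counted in units of pi/2, assumed to start at 0, and wrapped into the set {-pi/2, 0, pi/2, pi}; the function name is illustrative.

```python
def msk_phase_in_quarter_turns(bits):
    """Phase at t = 0, Tb, 2Tb, ... in units of pi/2 for h = 1/2.
    A '1' advances the phase by pi/2, a '0' retards it by pi/2.
    Values are wrapped into {-1, 0, 1, 2}, i.e. {-pi/2, 0, pi/2, pi}."""
    k = 0                       # unwrapped phase in units of pi/2
    trajectory = [0]            # assumed starting phase: 0
    for b in bits:
        k += 1 if b else -1
        trajectory.append((k + 1) % 4 - 1)
    return trajectory


example = [1, 1, 1, 0, 1, 0, 0, 0, 0]           # the sequence 111010000
print(msk_phase_in_quarter_turns(example))

# A long run of identical symbols keeps cycling through the same four values.
print(msk_phase_in_quarter_turns([1] * 10))
```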

 

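Signal Constellation

So what does the signal constellation of MSK look like? Taking the equation for the pulse and using the trigonometric identity for the cosine of a sum, we get [1]:

$$s(t) = \sqrt{\frac{2E_b}{T_b}} \cos[\theta(t)] \cos(2\pi f_c t) - \sqrt{\frac{2E_b}{T_b}} \sin[\theta(t)] \sin(2\pi f_c t)$$

with E_b, T_b and f_c denoting the bit energy, bit duration and carrier frequency. It turns out that, for h = 1/2, the function above can be simplified into the following [1]:

$$s(t) = s_1 \phi_1(t) + s_2 \phi_2(t)$$

where

$$\phi_1(t) = \sqrt{\frac{2}{T_b}} \cos\left(\frac{\pi t}{2T_b}\right) \cos(2\pi f_c t), \qquad s_1 = \sqrt{E_b}\, \cos[\theta(0)]$$

and

$$\phi_2(t) = \sqrt{\frac{2}{T_b}} \sin\left(\frac{\pi t}{2T_b}\right) \sin(2\pi f_c t), \qquad s_2 = -\sqrt{E_b}\, \sin[\theta(T_b)]$$

Thus the equations for s1 and s2 depend only on \theta(0) and \theta(T_b), with each taking one of two possible values. Therefore there are 4 different possibilities [1]:

\theta(0) = 0, \theta(T_b) = pi/2 (a '1' was sent): s1 = +\sqrt{E_b}, s2 = -\sqrt{E_b}
\theta(0) = pi, \theta(T_b) = pi/2 (a '0' was sent): s1 = -\sqrt{E_b}, s2 = -\sqrt{E_b}
\theta(0) = pi, \theta(T_b) = -pi/2 (a '1' was sent): s1 = -\sqrt{E_b}, s2 = +\sqrt{E_b}
\theta(0) = 0, \theta(T_b) = -pi/2 (a '0' was sent): s1 = +\sqrt{E_b}, s2 = +\sqrt{E_b}

Now that the signal space has been defined by \phi_1 and \phi_2, and the range of values for s1 and s2 has been determined, the signal constellation can be drawn. See Figure 3 for the signal constellation [1].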

Advantages of MSK

Even though the derivation of MSK was produced by analyzing the changes in phase, MSK is actually a form of frequency-shift keying (FSK) with

$$f_1 - f_2 = \frac{1}{2T_b}$$

(where f1 and f2 are the frequencies used for the pulses). MSK produces an FSK signal with the minimum difference between the two frequencies such that the two signals do not interfere with each other [1].

MSK produces a power spectral density that falls off much faster than the spectrum of QPSK. While the QPSK spectrum falls off as the inverse square of the frequency, the MSK spectrum falls off as the inverse fourth power of the frequency. Thus MSK can operate in a smaller bandwidth than QPSK [1].

GMSK

Even though MSK's power spectral density falls off quite fast, it does not fall off fast enough to avoid interference between adjacent signals in the frequency band. To take care of this problem, the original binary signal is passed through a Gaussian-shaped filter before it is modulated with MSK.

Frequency Response

The principal parameter in designing an appropriate Gaussian filter is the time-bandwidth product WTb. Please see Figure 4 for the frequency response of different Gaussian filters. Note that MSK has a time-bandwidth product of infinity [1].

As can be seen from Figure 4, GMSK's power spectrum drops off much more quickly than MSK's. Furthermore, as WTb is decreased, the roll-off becomes steeper.

Time-Domain Response

Since lower time-bandwidth products produce a faster power-spectrum roll-off, why not choose a very small time-bandwidth product? It happens that with lower time-bandwidth products the pulse is spread over a longer time, which can cause intersymbol interference. Please see Figure 5 for the time-domain response of the Gaussian filter [1].

Therefore as a compromise between spectral efficiency and time-domain performance, an intermediate time-bandwidth product must be chosen. 
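The trade-off can be made concrete with the impulse response of a Gaussian low-pass filter. The formula used below is the standard one for a Gaussian filter with 3 dB bandwidth W; it is not given in the text above, so treat it, and the function names, as assumptions of this sketch. Time is measured in units of Tb.

```python
import math

LN2 = math.log(2)


def gaussian_impulse_response(t, wtb, tb=1.0):
    """h(t) = sqrt(2*pi/ln2) * W * exp(-2*(pi*W*t)^2 / ln2), with W = wtb / tb."""
    w = wtb / tb
    return math.sqrt(2 * math.pi / LN2) * w * math.exp(-2 * (math.pi * w * t) ** 2 / LN2)


def pulse_spread(wtb, tb=1.0):
    """Standard deviation of the Gaussian pulse, in the same units as tb."""
    return math.sqrt(LN2) / (2 * math.pi * wtb / tb)


def area(wtb, step=0.01, span=10.0):
    """Numerically integrate h(t) to confirm unit DC gain."""
    n = int(2 * span / step)
    return sum(gaussian_impulse_response(-span + i * step, wtb) for i in range(n + 1)) * step


for wtb in (0.2, 0.3, 0.5):
    print(wtb, round(pulse_spread(wtb), 3), round(gaussian_impulse_response(0.0, wtb), 3))
```

A smaller WTb gives a lower, wider pulse: more of each bit's energy leaks into the neighbouring bit intervals, which is the intersymbol interference mentioned above.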

GSM Specifics

In the GSM standard a time-bandwidth product of 0.3 was chosen as a compromise between spectral efficiency and intersymbol interference. With this value of WTb, 99% of the power spectrum is within a bandwidth of 250 kHz, and since the GSM spectrum is divided into 200 kHz channels for multiple access, there is very little interference between the channels [2]. With WTb = 0.3, GSM transmits at approximately 271 kb/s. (It cannot go faster, since that would cause intersymbol interference) [2].