MELP Implementation Guide


MELP ENCODER

The block diagram of the encoder is described as follows:

The input to the encoder is human speech, and the output is a bit stream to be transmitted. The coder operates at 2400 bits per second with a frame time of 22.5 ms, so each frame contributes 54 bits to the output bit stream. The input speech signal is sampled by an A/D converter at 8000 Hz, and the 22.5 ms frame time leads to a frame size of 180 samples.

1. LOW FREQUENCY REMOVAL

The first step is to remove the low frequencies between DC and 60 Hz. This is accomplished by a 4th order Chebyshev type II highpass filter with a 60 Hz cutoff frequency and a stopband rejection of 30 dB. From now on, the output of this filter is considered the input speech to the system.
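A minimal sketch of this front-end filter using SciPy (the function name and frame constants are illustrative, not from the standard):

```python
import numpy as np
from scipy.signal import cheby2, lfilter

FS = 8000      # sampling frequency, Hz
FRAME = 180    # 22.5 ms frame at 8 kHz

# 4th-order Chebyshev type II highpass, 30 dB stopband rejection at 60 Hz
b_hp, a_hp = cheby2(4, 30, 60, btype='highpass', fs=FS)

def remove_low_frequencies(speech):
    """Suppress the DC-60 Hz band before any further analysis."""
    return lfilter(b_hp, a_hp, speech)
```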

2. PITCH CALCULATION

Several steps, shown in this block diagram, accomplish the process of finding the pitch:


2.1. INTEGER PITCH CALCULATION

In order to find the integer pitch, the input speech is first passed through a 1 kHz lowpass filter (6th order Butterworth). The integer pitch is the first pitch estimate produced by the process, and it is computed with a normalized autocorrelation function. The standard requires the autocorrelation to be evaluated over lags 20 to 160, so the computation uses a buffer of 2*160 = 320 samples. Every calculation is centered on the last sample in the current frame.


The normalized autocorrelation function produces one value per lag. We are interested in the maximum value of the autocorrelation result, and the lag of this maximum is the integer pitch:

r(tau) = c(0, tau) / sqrt( c(0,0) * c(tau,tau) )

where

c(m, n) = sum over k of s(k+m) * s(k+n),

with the sum taken over a 160-sample window centered on the last sample of the current frame.
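A sketch of this search in NumPy (window placement and bounds handling are simplifying assumptions; `s` must hold enough history and look-ahead around `center`):

```python
def norm_autocorr(s, center, lag):
    """r(lag) = c(0,lag) / sqrt(c(0,0)*c(lag,lag)), 160-sample window."""
    k = np.arange(center - 80, center + 80)
    c = lambda m, n: float(np.dot(s[k + m], s[k + n]))
    denom = np.sqrt(c(0, 0) * c(lag, lag))
    return c(0, lag) / denom if denom > 0 else 0.0

def integer_pitch(s_lp, center, lo=20, hi=160):
    """Integer pitch = lag with the maximum normalized autocorrelation."""
    r = [norm_autocorr(s_lp, center, t) for t in range(lo, hi + 1)]
    return lo + int(np.argmax(r)), max(r)
```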

2.2. PITCH REFINEMENT AND VOICING ANALYSIS

First of all, the speech signal is passed through 5 parallel bandpass filters. The filters are 6th order Butterworth with passbands of 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz, as shown in the figure below:


2.2.1. PITCH REFINEMENT

This pitch refinement algorithm uses the output of the first bandpass filter (0-500 Hz), and the candidates are the integer pitch values from the current and the previous frames. The basic assumption is that we are dealing with sampled values, so the "real" pitch lies at a fractional offset delta from the integer pitch (T + delta). These are the steps for finding the offset, and with it the fractional pitch: for every integer pitch candidate we apply a normalized autocorrelation function over lags from 5 samples shorter to 5 samples longer than the candidate. Then a fractional pitch refinement is performed around the optimum integer pitch lag. Assuming the integer lag has a value of T samples, we perform a linear interpolation according to the maximum correlation between lags T and T+1. We assumed that the pitch lies between T and T+1, but in fact it may lie between T-1 and T. To resolve this we compute c(0, T-1) and c(0, T+1) and decide in which interval the maximum falls. If c(0, T-1) > c(0, T+1), then the maximum falls between T-1 and T, and we have to decrement T by one before making the linear interpolation. The formula for the offset delta is:

delta = [c(0,T+1)*c(T,T) - c(0,T)*c(T,T+1)] / [c(0,T+1)*(c(T,T) - c(T,T+1)) + c(0,T)*(c(T+1,T+1) - c(T,T+1))]


And the normalized autocorrelation at the fractional pitch value is given by:

r(T + delta) = [(1 - delta)*c(0,T) + delta*c(0,T+1)] / sqrt( c(0,0) * [ (1 - delta)^2 * c(T,T) + 2*delta*(1 - delta)*c(T,T+1) + delta^2 * c(T+1,T+1) ] )

This produces two fractional pitch candidates and their corresponding normalized autocorrelation values. The candidate with the higher correlation becomes the fractional pitch P2, and its normalized autocorrelation is saved as r(P2).
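A sketch of the refinement step under the same windowing assumptions as above:

```python
def fractional_refine(s, center, T):
    """Refine an integer lag T to T + delta using the interpolation
    formulas above; returns (fractional pitch, its correlation)."""
    k = np.arange(center - 80, center + 80)
    c = lambda m, n: float(np.dot(s[k + m], s[k + n]))
    if c(0, T - 1) > c(0, T + 1):      # maximum falls in [T-1, T]
        T -= 1
    num = c(0, T+1)*c(T, T) - c(0, T)*c(T, T+1)
    den = (c(0, T+1)*(c(T, T) - c(T, T+1))
           + c(0, T)*(c(T+1, T+1) - c(T, T+1)))
    d = min(max(num / den, 0.0), 1.0) if den else 0.0
    r_num = (1 - d)*c(0, T) + d*c(0, T+1)
    r_den = np.sqrt(c(0, 0) * ((1 - d)**2 * c(T, T)
                               + 2*d*(1 - d)*c(T, T+1)
                               + d**2 * c(T+1, T+1)))
    return T + d, (r_num / r_den if r_den > 0 else 0.0)
```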

2.2.2. VOICING ANALYSIS

In order to make an accurate voicing analysis we split the speech signal into five spectral bands. The algorithm combines two methods:

The first method is to apply the normalized autocorrelation function on every band. We seek the maximum value of the autocorrelation, which indicates the strength of the voicing in that band. This method works well for frames that contain stationary voicing, but not when the pitch is changing, because the autocorrelation function will then produce small values that do not represent the voicing itself.

The second method is to find the envelope of each bandpass signal by full-wave rectification followed by a smoothing filter with a zero at DC and a complex pole pair at 150 Hz with radius 0.97; in effect, a resonator in the pitch-frequency range combined with DC removal. We apply the autocorrelation function to this envelope and take the maximum result.

The voicing decision in every band, representing the voicing strength, is the higher of the two methods' results. The five voicing strengths are saved in Vbp(i), where i = 1, ..., 5.
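A sketch of the envelope extraction (the filter realizes the zero at DC and the 150 Hz pole pair described above):

```python
def envelope(band_signal, fs=8000):
    """Full-wave rectify, then smooth with H(z) having a zero at DC and
    a complex pole pair at 150 Hz with radius 0.97."""
    r, w0 = 0.97, 2*np.pi*150/fs
    b = [1.0, -1.0]                          # zero at DC
    a = [1.0, -2*r*np.cos(w0), r*r]          # complex pole pair at 150 Hz
    return lfilter(b, a, np.abs(band_signal))
```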


Another tool that helps in the voicing decision is the peakiness calculation. It is computed on the residual signal over a 160-sample window centered on the last sample of the current frame; the residual signal is denoted r(n). The peakiness value is the ratio between the RMS value and the average magnitude, and it detects peaks in the residual signal:

peakiness = sqrt( (1/160) * sum of r(n)^2 ) / ( (1/160) * sum of |r(n)| )

If peakiness > 1.34, the lowest band (0-500 Hz) is forced voiced, i.e. Vbp(1) = 1.0. If peakiness > 1.6, the lowest three bands (0-500 Hz, 500-1000 Hz, and 1000-2000 Hz) are forced voiced, i.e. Vbp(1) = Vbp(2) = Vbp(3) = 1.0.
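In code, this decision rule is a few lines (thresholds as quoted above):

```python
def apply_peakiness(vbp, residual_window):
    """Force low-band voicing for peaky residuals (160-sample window)."""
    rms = np.sqrt(np.mean(residual_window**2))
    avg = np.mean(np.abs(residual_window))
    peak = rms / avg if avg > 0 else 0.0
    if peak > 1.34:
        vbp[0] = 1.0
    if peak > 1.6:
        vbp[0] = vbp[1] = vbp[2] = 1.0
    return vbp
```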

2.3. FINAL PITCH CALCULATION

In order to calculate the final pitch we use the residual signal, passed through a 6th order Butterworth lowpass filter with a 1 kHz cutoff frequency. First, the normalized autocorrelation function is evaluated over lags from 5 samples shorter to 5 samples longer than P2 rounded to the nearest integer. Then another fractional pitch refinement is applied around the optimum integer pitch lag. This leads to the final pitch P3 and the normalized autocorrelation r(P3).

The parameters taking part in the algorithm that finds the final pitch are as follows:

The input speech signal.
The residual signal.
The fractional pitch P2.
The long term average pitch Pavg.
The final pitch P3.
The normalized autocorrelation r(P3).

2.3.1. PITCH DOUBLING CHECK 


This tool allows the encoder to detect pitch values that are multiples of the actual pitch. In order to do so, we give this tool the pitch value P that we want to check and a doubling threshold value Dth. The output of this tool is the checked pitch Pc and its correlation rc.

The algorithm is described as follows: first a fractional pitch refinement is calculated around P, leading to initial values for Pc and rc. Next we look for the largest divisor k for which the candidate submultiple P/k passes the threshold test. Each candidate is checked in two steps: the first is to calculate a fractional pitch refinement around P/k, producing Pk; the second is a doubling-check verification of whether r(Pk) exceeds Dth times rc. If such a k is found, a fractional pitch refinement around Pk updates Pc and rc. Afterwards, if the checked pitch is short enough, a final double verification is performed: the tool is applied to the doubled pitch and the smaller of the two consistent pitch values is kept. This tool protects us against spurious short pitch values.

2.3.2. GAIN CALCULATION

The gain is measured on the input signal twice per frame, with a window length determined as follows:

When Vbp(1) > 0.6, the window length is the shortest multiple of P2 which is longer than 120 samples. If this length exceeds 320 samples, it is divided by 2.
When Vbp(1) <= 0.6, the window length is 120 samples.

The formula of the gain, the RMS value in dB of the signal over a window of length L, is:

G = 10 * log10( 0.01 + (1/L) * sum of s(n)^2 )

First we calculate the first window, producing the parameter G1, centered 90 samples before the last sample in the current frame.


Second we calculate the second window, producing the parameter G2, centered on the last sample in the current frame.
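A sketch of both pieces (the 0.01 floor and the window rule follow the text above; the centering arithmetic is a simplifying assumption):

```python
def gain_db(s, center, length):
    """RMS level in dB over `length` samples centered at `center`."""
    w = s[center - length // 2 : center - length // 2 + length]
    return 10.0 * np.log10(0.01 + np.mean(np.asarray(w, float)**2))

def gain_window_length(vbp1, p2):
    """Shortest multiple of P2 above 120 samples when voiced, else 120."""
    if vbp1 > 0.6:
        length = int(round((np.floor(120.0 / p2) + 1) * p2))
        return length // 2 if length > 320 else length
    return 120
```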

2.3.3. AVERAGE PITCH UPDATE

The long-term average pitch Pavg is used for smoothing strong pitch values. It is based on a buffer that contains the three most recent strong pitch values. If r(P2) > 0.8 and G2 > 30 dB (where G represents the gain), then P2 is a strong pitch value and is shifted into this buffer. If this condition is false, then all three pitch values in the buffer are moved toward the default pitch, 50 samples, according to:

p(i) = 0.95 * p(i) + 0.05 * 50, i = 1, 2, 3.

Afterwards the average pitch Pavg is updated as the median value of the three values in the buffer.
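A sketch of this update (buffer handling is an assumption; thresholds as above):

```python
DEFAULT_PITCH = 50.0
pitch_buf = [DEFAULT_PITCH] * 3

def update_average_pitch(p2, r2, g2):
    """Shift in strong pitches; otherwise decay toward the default."""
    global pitch_buf
    if r2 > 0.8 and g2 > 30.0:
        pitch_buf = [p2] + pitch_buf[:2]
    else:
        pitch_buf = [0.95*p + 0.05*DEFAULT_PITCH for p in pitch_buf]
    return float(np.median(pitch_buf))
```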

2.3.4. FINAL PITCH ALGORITHM

In this stage the final pitch calculation is made. The algorithm is described below:

3. LPC ANALYSIS

The linear prediction analysis splits into two paths: the first is the analysis of the speech signal, and the other is the analysis of the residual signal.


3.1. SPEECH SIGNAL

The linear prediction uses 10 coefficients, computed on the input speech signal. We take the speech signal and multiply it by a Hamming window of 200 samples (25 ms), centered on the last sample in the current frame. Then we calculate the autocorrelation function of this window. The coefficients are found by the Levinson-Durbin recursion, which exploits the Toeplitz structure of the autocorrelation matrix. At this stage we have 10 coefficients that represent the predictor of this window.

The second step is a 15 Hz bandwidth expansion, done by multiplying each prediction coefficient by a power of a constant factor:

a(i) <- a(i) * 0.994^i, i = 1, ..., 10,

where the factor 0.994 corresponds to a 15 Hz bandwidth expansion at the 8 kHz sampling rate.
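A compact sketch of this analysis (a textbook Levinson-Durbin; the sign convention A(z) = 1 + sum a(i) z^-i is an implementation choice, not from the text):

```python
def levinson_durbin(R, order):
    """Solve the Toeplitz normal equations; A(z) = 1 + sum a[i] z^-i."""
    a = np.zeros(order + 1)
    a[0], err = 1.0, R[0]
    for i in range(1, order + 1):
        k = -(R[i] + np.dot(a[1:i], R[i-1:0:-1])) / err
        a[1:i] += k * a[i-1:0:-1]
        a[i] = k
        err *= 1.0 - k*k
    return a

def lpc_coefficients(speech, center, order=10, gamma=0.994):
    """200-sample Hamming window LPC with 15 Hz bandwidth expansion."""
    w = speech[center - 100 : center + 100] * np.hamming(200)
    R = np.array([np.dot(w[:len(w)-k], w[k:]) for k in range(order + 1)])
    a = levinson_durbin(R, order)
    a[1:] *= gamma ** np.arange(1, order + 1)
    return a
```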

3.2. RESIDUAL SIGNAL

In order to get the residual signal out of the speech signal, we pass the speech signal through the prediction-error filter whose coefficients are the 10 LPC coefficients already calculated:

A(z) = 1 - sum over i of a(i) * z^-i, i = 1, ..., 10.

This is an FIR filter, and its output is the residual signal. Because we are working with a speech buffer of 320 samples, the residual signal is also 320 samples long. The residual window is centered on the last sample in the current frame.
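With the coefficients from `lpc_coefficients` above (whose convention already folds the signs into `a`), the residual is a single FIR filtering pass:

```python
def residual_signal(speech_buf, a):
    """Prediction-error filtering: e(n) = s(n) + sum a[i] * s(n-i)."""
    return lfilter(a, [1.0], speech_buf)
```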

4. APERIODIC FLAG


The aperiodic flag is a tool that helps deal with the fact that in voiced frames the speech signal is not perfectly periodic. In these cases the aperiodic flag goes up and tells the decoder to use a non-periodic excitation to simulate unstable glottal pulses.

The aperiodic flag is set to "1" according to the lower band (0-500 Hz), when Vbp(1) < 0.5, and set to "0" otherwise.

5. FOURIER MAGNITUDE CALCULATION

This analysis measures the Fourier magnitudes of the residual signal in the frequency domain. It uses a vector of 200 samples of the residual signal and performs a 512-point FFT with zero padding. The output is the magnitudes of the first 10 pitch harmonics of the residual signal in the Fourier domain.

6. QUANTIZATION

This chapter deals with the quantization of the encoder's parameters.

6.1. QUANTIZATION OF PREDICTION COEFFICIENTS

First we convert the 10 LPC coefficients found in the earlier stages into 10 LSF components. LSF means Line Spectral Frequencies, and they are represented in Hz. The second step is to sort these 10 LSF components in ascending order; the result is the LSF vector. Then we have to make sure that the 10 LSF frequencies are separated from one another by a minimum of 50 Hz, and if not, a separation algorithm must be applied as described below:

Calculating: d(i) = f(i+1) - f(i)
Required: d(i) >= 50 Hz


The result of this algorithm is a vector that is organized in ascending order and with a difference of at least 50 Hz between adjacent elements.
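A minimal sketch of such a separation pass (one forward sweep; the standard's exact procedure may differ):

```python
def enforce_lsf_separation(lsf, min_sep=50.0, fs=8000.0):
    """Sort and push adjacent LSFs apart to at least 50 Hz spacing."""
    f = np.sort(np.asarray(lsf, dtype=float))
    for i in range(1, len(f)):
        if f[i] - f[i-1] < min_sep:
            f[i] = f[i-1] + min_sep
    return np.clip(f, min_sep, fs/2 - min_sep)
```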


The next stage is to implement the vector quantization. This is done by the MSVQ, a multi-stage vector quantizer. The MSVQ codebook consists of four stages of sizes 128, 64, 64 and 64, as shown in this figure:

The algorithm finds the quantized vector which, as seen in the above figure, is the sum of the vectors selected in each stage. The main purpose of the MSVQ is to find the quantized vector that best represents the original LSF vector. In order to do so the MSVQ finds the codebook vectors that minimize the square of the weighted Euclidean distance, d, between the original LSF and the quantized LSF vectors:

d(f, f_hat) = sum over i of w(i) * ( f(i) - f_hat(i) )^2


Where f(i) is the i-th component of the unquantized LSF vector, and the weight w(i) is derived from P(f(i)), the inverse prediction filter power spectrum evaluated at frequency f(i); that is, the weighting follows the original spectrum.

The search is performed by saving, at every stage, the best 8 index combinations. In the 1st stage we save the 8 codebook vectors that give the minimum error. In the 2nd stage the search is combined with the results of the 1st stage, and again the best 8 combinations are kept. In this way the algorithm searches all 4 stages. When the search finishes, the best combination of indexes over all stages is selected as the quantized vector.

The final step is arranging the quantized LSF vector in ascending order, and assuring that the difference between every two adjacent frequencies is at least 50 Hz. The algorithm for this was described earlier.
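An M-best (M = 8) MSVQ search can be sketched as follows (`codebooks` is a list of four arrays shaped (stage size, 10); the data layout is an assumption):

```python
def msvq_search(f, codebooks, weights, m_best=8):
    """Keep the 8 lowest weighted-distance partial sums after each stage;
    the quantized vector is the sum of one codeword per stage."""
    candidates = [((), np.zeros_like(f))]       # (indexes, partial sum)
    for cb in codebooks:
        expanded = []
        for idx, q in candidates:
            for j, v in enumerate(cb):
                cand = q + v
                err = float(np.sum(weights * (f - cand)**2))
                expanded.append((err, idx + (j,), cand))
        expanded.sort(key=lambda t: t[0])
        candidates = [(idx, cand) for _, idx, cand in expanded[:m_best]]
    return candidates[0]                        # (best indexes, f_hat)
```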

6.2. PITCH QUANTIZATION

The final pitch value P3 is quantized on a logarithmic scale with a 99-level uniform quantizer ranging from 20 samples to 160 samples. These pitch values are then mapped to a 7-bit codeword using a look-up table. The all-zero codeword signals the unvoiced state, and it is sent if Vbp(1) <= 0.6.
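A sketch of the level computation (the mapping of levels into 7-bit codewords uses the standard's look-up table; the simple offset here is only a stand-in):

```python
def quantize_pitch(p3, vbp1):
    """99 uniform levels on log10(pitch) over 20-160 samples."""
    if vbp1 <= 0.6:
        return 0                                  # unvoiced: all-zero code
    lo, hi = np.log10(20.0), np.log10(160.0)
    level = round((np.log10(np.clip(p3, 20, 160)) - lo) / (hi - lo) * 98)
    return int(level) + 1      # stand-in for the codeword look-up table
```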

6.3. GAIN QUANTIZATION


The gain is represented by two values, G1 and G2. G2 is quantized with a 5-bit uniform quantizer ranging from 10 to 77 dB. G1 is quantized to 3 bits using the following algorithm, where G2,prev denotes the G2 of the previous frame:

If G2 for the current frame is within 5 dB of G2,prev, and G1 is within 3 dB of the average of the G2 values for the current and previous frames, then quantizer_index = 0, which tells the decoder to set G1 to the mean of the G2 values for the current and the previous frames.


Otherwise, the frame represents a transition, and G1 is quantized with a 7-level uniform quantizer ranging from 6 dB below the minimum of the G2 values for the current and previous frames to 6 dB above the maximum of those values. If the values saturate, they are clamped to 10 dB and 77 dB.
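In code, the two quantizers look roughly like this (the index layout is an assumption):

```python
def quantize_gains(g1, g2, g2_prev):
    """G2: 5-bit uniform over 10-77 dB; G1: index 0 in steady state,
    else a 7-level quantizer spanning the local G2 range +/- 6 dB."""
    g2_idx = int(round((np.clip(g2, 10, 77) - 10) / 67.0 * 31))
    avg = 0.5 * (g2 + g2_prev)
    if abs(g2 - g2_prev) < 5.0 and abs(g1 - avg) < 3.0:
        return g2_idx, 0                  # decoder sets G1 to the mean
    lo = max(min(g2, g2_prev) - 6.0, 10.0)
    hi = min(max(g2, g2_prev) + 6.0, 77.0)
    g1_idx = 1 + int(round((np.clip(g1, lo, hi) - lo) / (hi - lo) * 6))
    return g2_idx, g1_idx
```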

6.4. BANDPASS VOICING QUANTIZATION

This is the quantizing algorithm for the voicing strengths Vbp(1), ..., Vbp(5), producing the quantized voicing strengths: when Vbp(1) <= 0.6 the frame is unvoiced and all quantized strengths are set to 0; otherwise the quantized Vbp(1) is set to 1, and each of the four remaining strengths is quantized to 1 if it exceeds 0.6 and to 0 otherwise.

6.5. FOURIER MAGNITUDE QUANTIZATION

The algorithm for this quantization is described in the following steps:

-Reconstructing the predictor coefficients from the quantized LSF vector.


-Generating the residual window based on the quantized predictor coefficients.

-Applying a 200-sample Hamming window and performing a 512-point complex FFT with zero padding.

-Transforming the complex FFT output to magnitudes; the harmonics are found with a spectral peak-picking algorithm.

The spectral peak-picking algorithm finds the maximum magnitudes around the pitch harmonics. The search is divided into several areas with a width of 512/p FFT bins, where p is the quantized pitch. The location of harmonic number k within the divided areas is given by k * 512/p. The number of harmonics is the smaller of 10 and the quantized pitch divided by 4. After the harmonics are found they are normalized to an RMS value of 1. If the process finishes with fewer than 10 harmonics found, the remaining ones are given the value 1.
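A sketch of this peak-picking step (the bin arithmetic and window handling are illustrative assumptions, not the standard's exact search):

```python
def fourier_magnitudes(residual, pitch_hat, nfft=512, n_harm=10):
    """Pick the peak magnitude in a width-(nfft/pitch) region around each
    expected harmonic bin, then normalize the found harmonics to RMS 1."""
    w = residual[:200] * np.hamming(200)          # 200-sample window
    mag = np.abs(np.fft.rfft(w, nfft))
    n = min(n_harm, int(pitch_hat // 4))          # min(10, pitch/4)
    width = nfft / pitch_hat
    harm = np.ones(n_harm)                        # unfound harmonics stay 1
    for k in range(1, n + 1):
        lo = max(int(k*width - width/2), 0)
        hi = min(int(k*width + width/2) + 1, len(mag))
        harm[k - 1] = mag[lo:hi].max()
    harm[:n] /= np.sqrt(np.mean(harm[:n]**2))     # RMS-normalize
    return harm
```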

In this stage we take the resulting 10 harmonics and quantize them with a codebook of 256 vectors (an 8-bit vector quantizer). The codebook is searched using a perceptually weighted Euclidean distance, with weights that emphasize low frequencies over higher ones; the weights are evaluated at the frequency in Hz of each harmonic for the default pitch period of 60 samples. The searching process minimizes the squared weighted error between the magnitudes and the codebook values.

7. ERROR PROTECTION AND BIT PACKING


To improve performance in channel errors, the unused coder parameters of the unvoiced mode are replaced with FEC (forward error correction). Three Hamming (7,4) codes and one Hamming (8,4) code are used.

In unvoiced mode there is no need to transmit the 8 bits of Fourier magnitudes, the 4 bits of bandpass voicing, or the 1-bit aperiodic flag. That means we have a total of 13 spare bits to be filled by the FEC algorithm: the parity bits of the Hamming codes, 3 bits for each (7,4) code and 4 bits for the (8,4) code.

The algorithm protects the first MSVQ index and the two gain values.

The parity generator matrices for the Hamming (7,4) code and the Hamming (8,4) code are given in the standard.

The algorithm is described as follows: the protected 4 data bits are placed into a column vector and multiplied (modulo 2) by the parity generator matrix. The result is a parity vector, 3 bits for a (7,4) code and 4 bits for the (8,4) code, to be transmitted.


The parity vectors are placed in the spare bits of the unvoiced frame and transmitted.
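A sketch of the parity computation (the matrix shown is a common Hamming (7,4) parity generator chosen for illustration; the standard specifies its own matrices):

```python
# Illustrative Hamming (7,4) parity generator (not the standard's matrix)
H74 = np.array([[1, 1, 0, 1],
                [1, 0, 1, 1],
                [0, 1, 1, 1]])

def parity_bits(data_bits, P=H74):
    """Modulo-2 product of the parity generator with the protected bits."""
    return (P @ (np.asarray(data_bits) & 1)) % 2

# e.g. parity_bits([1, 0, 1, 1]) -> the 3 parity bits of one (7,4) code
```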

8. TRANSMISSION BIT STREAM

The total of 54 bits output by the encoder per frame is listed below. They are divided into 2 columns depending on the mode of the frame, voiced or unvoiced. These 54 bits are counted before forward error correction.


The arrangement of these 54 bits in the bit stream also depends on the mode of the frame, voiced or unvoiced; for an unvoiced frame the forward error correction bits are placed so as to protect the relevant bits.



MELP DECODER

Overview

The MELP parameters which are quantized and transmitted are the final pitch P3; the bandpass voicing strengths Vbp(1), ..., Vbp(5); the two gain values G1 and G2; the 10 linear prediction coefficients; the Fourier magnitudes; and the aperiodic flag.

The use of the following quantization procedures is required for interoperability among various implementations.

Decoder Block Diagram

Bit Unpacking and Error Correction:

After the encoder operation, all the described parameters are transmitted to the decoder through some medium, which may introduce bit errors at the decoder side; it is therefore important to verify the received data with the error correction mechanism.

The received bits are unpacked from the channel and assembled into the parameter codewords, according to this table:


Parameter decoding differs between voiced and unvoiced modes, according to the following table, which shows how the 54 bits in a MELP frame are allocated among the parameters:

The pitch is decoded first, since it contains the mode information.

Pitch Decoding:


The table described below is used in decoding the 7-bit pitch code to determine whether a frame is voiced or unvoiced, or whether a frame erasure is indicated. If the pitch code is all-zero or has only one bit set, then the unvoiced mode is used. If exactly two bits are set, a frame erasure is indicated. Otherwise, the pitch value is decoded and the voiced mode is used.
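These rules amount to a popcount over the 7-bit codeword:

```python
def pitch_code_mode(code7):
    """All-zero or one bit set -> unvoiced; exactly two bits set ->
    frame erasure; anything else -> voiced."""
    ones = bin(code7 & 0x7F).count('1')
    return 'unvoiced' if ones <= 1 else 'erasure' if ones == 2 else 'voiced'
```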

In the unvoiced mode, the (8,4) Hamming code is decoded to correct single bit errors and detect double errors. If an uncorrectable error is detected, a frame erasure is indicated. Otherwise, the (7,4) Hamming codes are decoded, correcting single errors but without double error detection. (The theory of error correction with Hamming codes was explained in the Encoder chapter.)

If any erasure is detected in the current frame, by the Hamming code, by the pitch code, or directly signaled from the channel, then a frame repeat mechanism is implemented.

All of the parameters for the current frame are replaced with the parameters from the previous frame. In addition, the first gain term is set equal to the second gain term so that no gain transitions are allowed.

If an erasure is not indicated, the remaining parameters are decoded.

The LSFs are checked for ascending order and minimum separation as described in Section 6.1 (QUANTIZATION OF PREDICTION COEFFICIENTS) of the Encoder.

In the unvoiced mode, default parameter values are used for the pitch, jitter, bandpass voicing, and Fourier magnitudes.

The pitch value is set to 50 samples, the jitter is set to 25%, all of the bandpass voicing strengths are set to 0, and the Fourier magnitudes are set to 1. In the voiced mode, Vbp1 is set to 1; jitter is set to 25% if the aperiodic flag is 1, otherwise jitter is set to 0%. The bandpass voicing strength for each of the upper four bands is set to 1 if the corresponding bit is 1; otherwise the voicing strength is set to 0. There is one exception: if the pattern 0001 is received for the four upper bands, then the highest band's voicing strength is set to 0.

When the special all-zero code for the first gain parameter, G1, is received, some errors in the second gain parameter, G2, can be detected and corrected. This correction process provides improved performance in channel errors.

Gain Decoding:

The decoding of the two gain parameters is shown in the following flow chart:


Noise Attenuation:

For quiet input signals, a small amount of gain attenuation is applied to both decoded gain parameters using a power subtraction rule. This attenuation is a simplified, frequency-invariant case of the Smoothed Spectral Subtraction noise suppression method defined in:

L. Arslan, A. McCree, and V. Viswanathan, “New Methods for Adaptive Noise

Suppression,” Proceedings of IEEE ICASSP 1995, pp. 812-815.

This article describes the estimation of the speech signal gain without the environment noise. Before determining the attenuation for the first gain term, G1, a background noise estimate, En, is updated as follows:


The noise estimator moves up by 3 dB per second and down by 12 dB per second at the gain update rate of 88.9 updates per second, and it is clamped between 10 and 80 dB.

Noise estimation is disabled for repeated frames in order to prevent repeated attenuation.
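A sketch of such an estimator (the per-update step sizes follow from the per-second rates above; the tracking rule itself is an assumption):

```python
UP_RATE = 3.0 / 88.9        # dB per update when the level rises
DOWN_RATE = 12.0 / 88.9     # dB per update when the level falls

def update_noise_estimate(en, gain_db):
    """Track the background level toward the decoded gain, clamped."""
    if gain_db > en:
        en = min(en + UP_RATE, gain_db)
    else:
        en = max(en - DOWN_RATE, gain_db)
    return float(np.clip(en, 10.0, 80.0))
```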

The background noise estimate is also used in the adaptive spectral enhancement

calculation - this method is described later.

Gain Refinement:

The gain G1 is modified by subtracting a (positive) correction term, given in dB, computed from En, the background noise estimate in dB, and G1, the first gain term in dB. The correction is clamped to a maximum value of 6 dB to avoid fluctuations and signal distortion.


To ensure that the attenuation is applied only to quiet signals, the value used in the equation below is clamped to an upper limit of 20 dB.

The noise estimation and gain modification steps are then repeated for the second gain term, G2.

Noise estimation and gain attenuation are disabled for repeated frames.

Parameter Interpolation:

All MELP synthesis parameters are interpolated pitch-synchronously for each

synthesized pitch period.

The interpolated parameters are the gain (in dB), LSF’s, pitch, jitter, Fourier 

magnitudes, pulse and noise coefficients for mixed excitation, and spectral tilt

coefficient for the adaptive spectral enhancement filter.

Gain is linearly interpolated between the second gain of the prior frame, G2,prev, and the first gain of the current frame, G1, if the starting point t0 of the new pitch period is less than 90. Otherwise, gain is interpolated between G1 and G2.

Normally, the other parameters are linearly interpolated between the past and current frame values. The interpolation factor, int, for these parameters is based on the starting point of the new pitch period:

int = t0 / 180

There are two exceptions to this interpolation procedure.

First, if there is an onset with a high pitch frequency, pitch interpolation is disabled and the new pitch is used immediately. This condition is met when G2 is more than 6 dB greater than G2,prev and the current frame's pitch period is less than half the prior frame's pitch period.

The second exception also involves a gain onset. If G2 differs from G2,prev by more than 6 dB, then the LSFs, spectral tilt, and pitch are interpolated using the interpolated gain trajectory as a basis, since the gain is transmitted twice per frame and has a more accurate interpolation path. In this case, the interpolation factor is given by:

int = (Gint - G2,prev) / (G2 - G2,prev)


where Gint is the interpolated gain.

This interpolation factor is then clamped between 0 and 1.
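A sketch combining the normal and the gain-onset interpolation factors:

```python
def interpolation_factor(t0, g_int, g2, g2_prev):
    """t0/180 normally; gain-trajectory based on a >6 dB gain onset."""
    if abs(g2 - g2_prev) > 6.0:
        fac = (g_int - g2_prev) / (g2 - g2_prev)
    else:
        fac = t0 / 180.0
    return float(np.clip(fac, 0.0, 1.0))
```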

Mixed Excitation Generation:

The mixed excitation is generated as the sum of the filtered pulse and noise excitations. As described in the Bit Unpacking paragraph, the parameter values (pitch, jitter, bandpass voicing, and Fourier magnitudes) differ between voiced and unvoiced signals.

The pulse excitation is computed using an inverse discrete Fourier transform of one pitch period in length. The final equation for the pulse excitation is:


Pitch Period Estimation:

The pitch period, T, is the interpolated pitch value plus the jitter times the pitch, where the jitter is the interpolated jitter strength times the output of a uniform random number generator between -1 and 1. This pitch period is rounded to the nearest integer and clamped between 20 and 160. The equation describing the final pitch period is:

T = round( pitch * (1 + jitter * U(-1, 1)) )

All of the phases of the pulse excitation are set to zero, hence M(k) is real. Since the excitation is real, the magnitudes obey:

M(T - k) = M(k), k = 1, 2, ..., L

where L = T/2 if T is an even number, and L = (T-1)/2 if T is an odd number.

The DC term, M(0), is set to 0. Magnitude terms M(k), k = 1, 2, ..., 10, are set to the interpolated values of the Fourier magnitudes, and any magnitudes not otherwise specified are set to 1.

To prevent rapid changes at the start of the pitch period, the pulse excitation is circularly shifted by ten samples so that the main excitation pulse occurs at the tenth sample of the period. The pulse is then multiplied by the square root of the pitch to give a unity-RMS signal, and then multiplied by 1000 to give a nominal signal level.
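A sketch of the period construction and the zero-phase inverse DFT (scaling per the text; `np.fft.irfft` supplies the M(T-k) = M(k) symmetry):

```python
def pitch_period(pitch, jitter, rng=np.random):
    """T = pitch * (1 + jitter * U(-1,1)), rounded, clamped to 20-160."""
    t = pitch * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    return int(np.clip(round(t), 20, 160))

def pulse_excitation(fourier_mags, T):
    """One zero-phase period: M(0)=0, M(1..10) interpolated, rest 1."""
    L = T // 2
    M = np.ones(L + 1)
    M[0] = 0.0
    n = min(10, L)
    M[1:n + 1] = fourier_mags[:n]
    x = np.roll(np.fft.irfft(M, T), 10)   # main pulse at the tenth sample
    return 1000.0 * np.sqrt(T) * x        # ~unity RMS, nominal level
```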

The noise is generated by a uniform random number generator with an RMS value of 1000 and a range of -1732 to 1732.

The pulse and noise excitation signals are then filtered and summed to form the mixed excitation. The pulse filter for the current frame is given by the sum of the bandpass filter coefficients for the voiced frequency bands, while the noise filter is given by the sum of the bandpass filter coefficients for the unvoiced bands.
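By linearity, summing the band filters first or filtering per band and summing gives the same result; a sketch of the latter (the 0.5 voicing cutoff and data layout are assumptions):

```python
def mixed_excitation(pulse, bp_filters, vbp, rng=np.random):
    """Voiced bands pass the pulse, unvoiced bands pass the noise."""
    noise = rng.uniform(-1732.0, 1732.0, len(pulse))   # RMS ~ 1000
    out = np.zeros(len(pulse))
    for h, v in zip(bp_filters, vbp):        # five FIR band filters
        source = pulse if v >= 0.5 else noise
        out += lfilter(h, [1.0], source)
    return out
```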


These filter coefficients are interpolated pitch-synchronously. The bandpass filter coefficients for each of the five bands are given in Appendix A of the MELP Federal Standard specification.

Adaptive Spectral Enhancement

The adaptive spectral enhancement filter is applied to the mixed excitation signal. This filter is a tenth order pole/zero filter, with additional first-order tilt compensation. Its coefficients are generated by bandwidth expansion of the linear prediction filter transfer function, A(z), corresponding to the interpolated LSFs. The transfer function of the enhancement filter, H(z), is given by:

H(z) = [ A(z / (0.5p)) / A(z / (0.8p)) ] * (1 + mu * z^-1)

where mu is the tilt coefficient and p is the signal probability,


and the tilt coefficient mu is first calculated from the first reflection coefficient k1 (scaled by one half), then interpolated, then multiplied by p, the signal probability.

The first reflection coefficient, k1, is calculated from the decoded LSFs. By the MELP predictor coefficient sign convention, k1 is usually negative for voiced spectra.

The signal probability p is estimated by comparing the current interpolated gain, Gint, to the background noise estimate En, using a formula given in the standard. This signal probability is clamped between 0 and 1.

Linear Prediction Synthesis:

The synthesis uses a direct form filter, with the coefficients corresponding to the interpolated LSFs. The interpolated LSFs, together with the vector quantization parameters, decode the LPC filter coefficients, so the LPC synthesis filter is 1/A(z), where A(z) = 1 - sum over i of a(i) * z^-i.

Gain Adjustment:

Since the excitation is generated at an arbitrary level, the transmitted gain must be imposed on the synthesized speech. The correct scaling factor, Sg, is computed for each synthesized pitch period of length T by dividing the desired RMS value (G must be converted from dB to linear) by the RMS value of the unscaled synthetic speech signal s_hat(n):

Sg = 10^(G/20) / sqrt( (1/T) * sum of s_hat(n)^2 )
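In code, the scale factor for one period:

```python
def gain_scale_factor(g_db, synth_period):
    """Desired linear RMS over the measured RMS of the unscaled period."""
    desired = 10.0 ** (g_db / 20.0)
    actual = np.sqrt(np.mean(synth_period**2))
    return desired / actual if actual > 0 else 1.0
```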


To prevent discontinuities in the synthesized speech, this scale factor is linearly

interpolated between the previous and current values for the first ten samples of the

 pitch period.

Pulse Dispersion:

The pulse dispersion filter is a 65th order FIR filter derived from a spectrally-

flattened triangle pulse.

The coefficients are listed in Appendix B in the MELP Federal Standard

specification.

Synthesis Loop Control

After processing each pitch period, the decoder updates the synthesis position t0 by adding T, the number of samples in the period just synthesized. If t0 < 180, synthesis for the current frame continues from the Parameter Interpolation step. Otherwise, the decoder buffers the remainder of the current period, which extends beyond the end of the current frame, and subtracts 180 from t0 to produce its initial value for the next frame.
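A sketch of this bookkeeping:

```python
def advance_synthesis(t0, T, frame_len=180):
    """Returns (new t0, whether to synthesize another period now)."""
    t0 += T
    if t0 < frame_len:
        return t0, True               # continue with the current frame
    return t0 - frame_len, False      # carry the overshoot to next frame
```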

Notice: Some of the material on this site is taken from books, federal standards, and other sites. This is purely an educational site; if you find we have violated any copyright, please contact us and we will remove the material in violation.