MELP Implementation Guide
5/11/2018 MELP Implementation Guide - slidepdf.com
http://slidepdf.com/reader/full/melp-implementation-guide 1/32
MELP
ENCODER
The block diagram of the encoder is described as follows:
The input to the encoder is human speech, and the output is a bit stream to be transmitted. This coder operates at 2400 bits per second with a frame time of 22.5 ms, so each frame contributes 54 bits to the output bit stream. The input speech signal is sampled by an A/D converter at 8000 Hz; with a frame time of 22.5 ms, this gives a frame size of 180 samples.
1. LOW FREQUENCY REMOVAL
The first step is to remove the low frequencies between DC and 60 Hz. This is accomplished by a 4th order Chebyshev type II highpass filter with a cutoff frequency of 60 Hz and a stopband rejection of 30 dB. From now on, the output of this filter will be considered the input speech to the system.
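As a sketch, this highpass stage can be realized with a standard filter-design call; the standard itself specifies fixed coefficients, so the cheby2 design call below is an assumption that merely matches the stated specification:

```python
import numpy as np
from scipy.signal import cheby2, lfilter

# 4th-order Chebyshev type II highpass: 60 Hz cutoff, 30 dB stopband
# rejection, at the 8 kHz MELP sampling rate.
FS = 8000.0
b, a = cheby2(4, 30, 60.0 / (FS / 2), btype="highpass")

def remove_low_frequencies(speech):
    """Suppress the DC..60 Hz region before any further analysis."""
    return lfilter(b, a, speech)
```

A DC offset fed into this filter is attenuated by at least 30 dB, while content well above 60 Hz passes essentially unchanged.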
2. PITCH CALCULATION
Several steps, shown in this block diagram, accomplish the process of finding the pitch:
2.1. INTEGER PITCH CALCULATION
To find the integer pitch, the input speech is first passed through a 6th order Butterworth lowpass filter with a 1 kHz cutoff. The integer pitch is the first pitch estimate produced by the process, and it is computed with an autocorrelation function. The standard requires that the autocorrelation be evaluated over lags 20 to 160, which leads to a window of 2*160 = 320 samples for the autocorrelation. Every calculation is centered on the last sample in the current frame.
The normalized autocorrelation function is

r(tau) = c(0, tau) / sqrt(c(0, 0) * c(tau, tau))

where c(t1, t2) = sum_n s[n + t1] * s[n + t2], with the sum taken over the analysis window. We are interested in the maximum value of the autocorrelation result, and the lag of this maximum is the integer pitch T.
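The integer pitch search above can be sketched as follows. The normalized-autocorrelation maximization is as described in the text; the exact placement of the 160-sample window around the frame's last sample is an assumption of this sketch:

```python
import numpy as np

def integer_pitch(s, center, lo=20, hi=160, n=160):
    """Integer pitch by maximizing r(tau) = c(0,tau)/sqrt(c(0,0)*c(tau,tau))
    over lags lo..hi. `s` is the 1 kHz-lowpassed speech and `center` is the
    index of the last sample in the current frame."""
    def c(t1, t2):
        idx = np.arange(-n // 2, n // 2) + center
        return float(np.dot(s[idx + t1], s[idx + t2]))
    best_tau, best_r = lo, -1.0
    for tau in range(lo, hi + 1):
        denom = np.sqrt(c(0, 0) * c(tau, tau))
        r = c(0, tau) / denom if denom > 0 else 0.0
        if r > best_r:
            best_tau, best_r = tau, r
    return best_tau, best_r
```

For a perfectly periodic input, the maximum lands on the period (or one of its multiples, which is exactly why the pitch doubling check described later exists).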
2.2. PITCH REFINEMENT AND VOICING ANALYSIS
First, the speech signal is passed through 5 parallel bandpass filters.
The filters are 6th order Butterworth with passbands as shown in the figure below:
2.2.1. PITCH REFINEMENT
The pitch refinement algorithm uses the output of the first bandpass filter (0-500 Hz), and the candidates are the integer pitch values from the current and the previous frames. The basic assumption is that we are dealing with sampled values, so the "real" pitch lies at some fractional offset Delta from the integer pitch (T + Delta). These are the steps for finding the offset and, with it, the fractional pitch: for every integer pitch candidate we apply a normalized autocorrelation function over lags from 5 samples shorter to 5 samples longer than the candidate. Then a fractional pitch refinement is performed around the optimum integer pitch lag. Assuming the integer lag has a value of T samples, we perform a linear interpolation according to the maximum of c(0, tau) between lags T and T+1. We assumed that the pitch falls between T and T+1, but in fact it may fall between T-1 and T. To resolve this we compute c(0, T-1) and c(0, T+1) and decide in which interval the maximum falls. If c(0, T-1) > c(0, T+1), then the maximum falls between T-1 and T, and we decrement T by one before performing the linear interpolation. The fractional offset Delta is given by:

Delta = [c(0, T+1) * c(T, T) - c(0, T) * c(T, T+1)] / [c(0, T+1) * (c(T, T) - c(T, T+1)) + c(0, T) * (c(T+1, T+1) - c(T, T+1))]
And the normalized autocorrelation at the fractional pitch value is given by:

r(T + Delta) = [(1 - Delta) * c(0, T) + Delta * c(0, T+1)] / sqrt(c(0, 0) * [(1 - Delta)^2 * c(T, T) + 2 * Delta * (1 - Delta) * c(T, T+1) + Delta^2 * c(T+1, T+1)])

This produces two fractional pitch candidates and their corresponding normalized autocorrelation values. The candidate with the higher autocorrelation becomes the fractional pitch P2, and its normalized autocorrelation value is saved as r(P2).
2.2.2. VOICING ANALYSIS
To make an accurate voicing analysis we split the speech signal into five spectral bands. The algorithm combines two methods:
The first method applies the normalized autocorrelation function to every band. We look for the maximum value of the autocorrelation, which indicates the voicing strength in that band.
This method works well for frames containing stationary voicing, but not when the pitch is changing, because the autocorrelation then yields small values that do not represent the voicing itself.
The second method computes the envelope of the bandpass signal using full-wave rectification followed by a smoothing filter with a zero at DC and a complex pole pair at 150 Hz with radius 0.97. In other words, it combines a smoothing filter with DC removal. We apply the autocorrelation function to this envelope and take the maximum result.
The voicing decision in each band, representing the voicing strength, is the higher of the two methods' results.
The five voicing strengths are saved as Vbp_i, where i = 1, ..., 5.
Another tool that helps in the voicing decision is the peakiness calculation. It is computed on the residual signal over a 160-sample window centered on the last sample in the current frame. The residual signal is denoted r[n].
The peakiness value is the ratio between the RMS value and the average absolute value, and it detects peaks in the residual signal. If the peakiness exceeds 1.34, the lowest band (0-500 Hz) is declared voiced, i.e., Vbp1 is forced to 1. If it exceeds 1.60, the lowest three bands (0-500 Hz, 500-1000 Hz, and 1000-2000 Hz) are declared voiced, i.e., Vbp1, Vbp2, and Vbp3 are forced to 1.
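A sketch of the peakiness measure and the forcing rules; the thresholds 1.34 and 1.60 are the values given in the MELP standard's peakiness rules, so treat them as standard-derived rather than part of this document's lost formulas:

```python
import numpy as np

def peakiness(residual):
    """Peakiness = RMS / mean absolute value over the analysis window.
    Large values indicate isolated spikes in the residual."""
    r = np.asarray(residual, dtype=float)
    mean_abs = np.mean(np.abs(r))
    if mean_abs == 0.0:
        return 0.0
    return np.sqrt(np.mean(r * r)) / mean_abs

def apply_peakiness_rules(p, vbp):
    """Force low-band voicing decisions from the peakiness value."""
    if p > 1.34:
        vbp[0] = 1.0           # 0-500 Hz forced voiced
    if p > 1.60:
        vbp[1] = vbp[2] = 1.0  # 500-1000 and 1000-2000 Hz forced voiced
    return vbp
```

A flat residual has peakiness 1.0; a single dominant spike pushes it far above the thresholds.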
2.3. FINAL PITCH CALCULATION
To calculate the final pitch we use the residual signal, passed through a 6th order Butterworth lowpass filter with a 1 kHz cutoff frequency.
First, the normalized autocorrelation function is evaluated over lags from 5 samples shorter to 5 samples longer than P2 rounded to the nearest integer. Then another fractional pitch refinement is applied around the optimum integer pitch lag. This leads to the final pitch P3 and the normalized autocorrelation r(P3).
The parameters taking part in the final pitch algorithm are as follows:
The input speech signal.
The residual signal.
The fractional pitch P2.
The long-term average pitch Pavg.
The final pitch P3.
The normalized autocorrelation r(P3).
2.3.1. PITCH DOUBLING CHECK
This tool allows the encoder to detect pitch values that are multiples of the actual pitch. To use it, we supply the pitch P that we want to check and a doubling threshold value Dth. The output of the tool is the checked pitch Pc and its correlation r(Pc).
The algorithm is described as follows:
First, a fractional pitch refinement is calculated around P, which gives initial values for Pc and r(Pc). Next, we search for the largest integer divisor k such that the pitch candidate Pc/k still yields a sufficiently strong correlation relative to Dth * r(Pc). This check is done in two steps: first a fractional pitch refinement around Pc/k produces a candidate pitch Pk; then a double-check verification is applied to Pk. If such a k is found, a fractional pitch refinement is performed around Pk, and the result updates Pc and r(Pc). Afterwards, if required, the double-check verification is performed again. The effect of the double-check tool is that, given two pitch inputs, it returns the smaller value of the two; this protects us against spurious short pitch values.
2.3.2. GAIN CALCULATION
The gain calculation is performed on the input signal twice per frame, with a window length determined as follows:
When Vbp1 > 0.6, the window length is the shortest multiple of P2 which is longer than 120 samples. If this length exceeds 320 samples, it is divided by 2.
Otherwise, the window length is 120 samples.
The formula for the gain is:

G = 10 * log10(0.01 + (1/L) * sum_n s[n]^2)

The gain is the RMS value, measured in dB, of the signal over the window of length L.
First we calculate the gain over the first window, giving the parameter G1, whose window is centered 90 samples before the last sample in the current frame.
Secondly, we calculate the gain over the second window, giving the parameter G2, whose window is centered on the last sample in the current frame.
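A sketch of the gain measurement. The 0.01 floor inside the logarithm follows the MELP gain formula and simply keeps the result finite on silence:

```python
import numpy as np

def gain_db(s, center, length):
    """RMS level in dB over a window of `length` samples centered at
    index `center` of signal `s`."""
    half = length // 2
    w = np.asarray(s[center - half:center - half + length], dtype=float)
    return 10.0 * np.log10(0.01 + np.dot(w, w) / length)
```

Per frame, G1 is measured on a window centered 90 samples before the last sample of the frame, and G2 on a window centered on the last sample itself, with window lengths chosen as described above.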
2.3.3. AVERAGE PITCH UPDATE
The long-term average pitch Pavg is used for smoothing strong pitch values. It is based on a buffer that contains the three most recent strong pitch values. If r(P2) > 0.8 and G2 > 30 dB (where G represents the gain), then P2 is a strong pitch value and is placed in this buffer. If this condition is false, all three pitch values in the buffer are moved toward the default pitch of 50 samples, according to:

p_i = 0.95 * p_i + 0.05 * 50

Afterwards, the average pitch Pavg is updated as the median of the three values in the buffer.
2.3.4. FINAL PITCH ALGORITHM
In this stage we can make the final pitch calculation. The algorithm is described
below:
3. LPC ANALYSIS
The analysis of the linear predictor will spread into 2 paths: the first one is the
analysis of the speech signal, and the other one is the analysis of the residual
signal.
3.1. SPEECH SIGNAL
The linear prediction is implemented by using 10 coefficients, on the input speech
signal. We take the speech signal and multiply it by a hamming window of 200
samples (25 ms), centered on the last sample in the current frame. Then we
calculate the autocorrealtion function on this window. Finding the coefficients is
by the recursion of levinson – durbin, that uses a toeplitz matrix. In this stage we
have 10 coefficients that represent the predictor of this window.
The second step is to make a 15 Hz bandwidth expansion: this will be done by the
calculations as follows:
,Where
The result of this process is that we to multiply the prediction coefficients by the
factor where .
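The bandwidth expansion is a one-line operation once the coefficients are available; this sketch uses the 0.994 factor stated above:

```python
import numpy as np

# Multiplying the i-th LPC coefficient by gamma**i pushes the predictor
# poles slightly inside the unit circle, which makes the quantized
# filter better conditioned.
GAMMA = 0.994

def bandwidth_expand(lpc, gamma=GAMMA):
    """lpc = [a1, ..., a10]; returns [a1*gamma, a2*gamma**2, ...]."""
    a = np.asarray(lpc, dtype=float)
    return a * gamma ** np.arange(1, len(a) + 1)
```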
3.2. RESIDUAL SIGNAL
To obtain the residual signal from the speech signal, we pass the speech signal through a filter whose coefficients are the 10 LPC coefficients already calculated. The filter is the inverse prediction filter:

A(z) = 1 - sum_{k=1..10} a_k * z^(-k)

This is an FIR filter, and its output is the residual signal. Because we are working with a speech vector of length 320 samples, the residual signal is also 320 samples long. The residual window is centered on the last sample in the current frame.
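The inverse filtering can be sketched directly with a standard FIR filter call; the predictor sign convention (A(z) = 1 - sum a_k z^-k) is the one assumed throughout this sketch:

```python
import numpy as np
from scipy.signal import lfilter

def lpc_residual(speech, lpc):
    """Pass speech through the FIR inverse filter A(z), leaving the
    prediction residual. `lpc` holds [a1, ..., a10]."""
    a = np.asarray(lpc, dtype=float)
    fir = np.concatenate(([1.0], -a))   # coefficients of A(z)
    return lfilter(fir, [1.0], speech)
```

If the speech was actually generated by the predictor, the residual recovers the excitation exactly.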
4. APERIODIC FLAG
The aperiodic flag is a tool that handles the problem that, in voiced frames, the speech signal is not perfectly periodic. In these cases the aperiodic flag is raised, telling the decoder to use an aperiodic excitation that simulates unstable glottal pulses.
The aperiodic flag is set to "1" when the voicing strength of the lower band (0-500 Hz) falls below its decision threshold, and set to "0" otherwise.
5. FOURIER MAGNITUDE CALCULATION
This analysis measures the Fourier magnitudes of the residual signal in the frequency domain. It uses a vector of 200 samples of the residual signal and performs a 512-point FFT with zero padding.
The output is the magnitude of the first 10 pitch harmonics of the residual signal in the Fourier domain.
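A sketch of the magnitude measurement. The standard performs a peak search around each harmonic; the direct bin-rounding below is a simplification of that step:

```python
import numpy as np

def fourier_magnitudes(residual, pitch, n_fft=512, n_harm=10):
    """Magnitudes of the first 10 pitch harmonics of a 200-sample
    residual window, via a zero-padded 512-point FFT. Harmonic k sits
    near FFT bin k * n_fft / pitch."""
    w = np.zeros(n_fft)
    w[:200] = residual[:200]
    spec = np.abs(np.fft.fft(w))
    mags = []
    for k in range(1, n_harm + 1):
        bin_k = int(round(k * n_fft / pitch))
        mags.append(spec[min(bin_k, n_fft // 2)])
    mags = np.asarray(mags)
    rms = np.sqrt(np.mean(mags ** 2))
    return mags / rms if rms > 0 else mags   # normalize to unit RMS
```

For a residual dominated by its fundamental, the first harmonic carries most of the energy, and the returned vector has unit RMS.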
6. QUANTIZATION
This chapter deals with the quantization of the encoder parameters.
6.1. QUANTIZATION OF PREDICTION COEFFICIENTS
First, we convert the 10 LPC coefficients found in the earlier stages into 10 LSF components. LSF stands for Line Spectral Frequencies, and they are represented in Hz. The second step is to arrange these 10 LSF components in ascending order, giving the LSF vector. We must then make sure that the 10 LSF frequencies are separated from one another by at least 50 Hz; if they are not, a separation algorithm is applied as described below:
Calculating: d_i = f_{i+1} - f_i
Required: d_i >= 50 Hz
The result of this algorithm is a vector organized in ascending order with a difference of at least 50 Hz between adjacent elements.
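A minimal sketch of the separation rule; the repeated half-shift sweep below is an assumption, not the standard's exact procedure, but it enforces the same ordering and 50 Hz spacing constraints:

```python
def enforce_lsf_separation(lsf, min_sep=50.0, n_pass=10):
    """Sort the LSFs ascending and push adjacent values apart until each
    pair differs by at least `min_sep` Hz."""
    f = sorted(lsf)
    for _ in range(n_pass):        # a few sweeps suffice in practice
        ok = True
        for i in range(len(f) - 1):
            gap = f[i + 1] - f[i]
            if gap < min_sep:
                ok = False
                shift = (min_sep - gap) / 2.0
                f[i] -= shift      # move both neighbors apart equally
                f[i + 1] += shift
        if ok:
            break
    return f
```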
The next stage is the vector quantization, performed by an MSVQ (multi-stage vector quantizer). The MSVQ codebook consists of four stages of sizes 128, 64, 64, and 64, as shown in this figure:
The algorithm finds the quantized vector, which, as seen in the figure, is the sum of the vectors selected in each stage. The main purpose of the MSVQ is to find the quantized vector that best represents the original LSF vector. To do so, the MSVQ finds the codebook vectors that minimize the weighted Euclidean distance d between the original LSF and the quantized LSF vectors:

d(f, f_hat) = sum_{i=1..10} w_i * (f_i - f_hat_i)^2
Where:
f_i is the i-th component of the unquantized LSF vector, and the weight w_i is derived from the inverse prediction filter power spectrum evaluated at frequency f_i. That means the weighting follows the original spectrum.
The search keeps the best 8 index combinations at every stage. In the 1st stage, the 8 codebook vectors giving the minimum error are saved. In the 2nd stage, the search is combined with the survivors of the 1st stage, and again the best 8 combinations are kept. The algorithm proceeds this way through all 4 stages.
When the search finishes, the combination with the smallest total error determines the four codebook indexes and the quantized LSF vector.
The final step is arranging the quantized LSF vector in ascending order and ensuring that the difference between every two adjacent frequencies is at least 50 Hz. The algorithm for this was described earlier.
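The M-best (M = 8) search above can be sketched as follows. The toy codebooks in the test are placeholders; the real 128/64/64/64 codebooks come from the standard:

```python
import numpy as np

def msvq_search(target, stages, weights, m_best=8):
    """M-best multi-stage VQ search. `stages` is a list of codebooks
    (arrays of shape [size, dim]); the quantized vector is the sum of
    one codeword per stage. Only the m_best partial sums with the
    lowest weighted squared error survive each stage."""
    target = np.asarray(target, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Each survivor is (error, accumulated_sum, index_path).
    survivors = [(0.0, np.zeros_like(target), [])]
    for cb in stages:
        candidates = []
        for _, acc, path in survivors:
            for idx, code in enumerate(cb):
                s = acc + code
                d = target - s
                candidates.append((float(np.dot(w * d, d)), s, path + [idx]))
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:m_best]
    _, quantized, path = survivors[0]
    return path, quantized
```

Keeping 8 survivors instead of 1 (a plain greedy search) lets a slightly worse first-stage choice win overall once later stages are added in.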
6.2. PITCH QUANTIZATION
The final pitch value is quantized on a logarithmic scale with a 99-level uniform quantizer ranging from 20 samples to 160 samples. These pitch values are then mapped to a 7-bit codeword using a look-up table. The all-zero codeword signals the unvoiced state and is sent when Vbp1 <= 0.6.
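The log-domain rounding can be sketched as below; the mapping of quantizer levels to 7-bit codewords uses a look-up table in the standard, which is omitted here:

```python
import numpy as np

# 99-level uniform quantizer on a log scale from 20 to 160 samples.
P_MIN, P_MAX, LEVELS = 20.0, 160.0, 99

def quantize_pitch(p):
    """Return (level_index, reconstructed_pitch) for a pitch in samples."""
    p = min(max(p, P_MIN), P_MAX)
    step = (np.log10(P_MAX) - np.log10(P_MIN)) / (LEVELS - 1)
    idx = int(round((np.log10(p) - np.log10(P_MIN)) / step))
    return idx, 10.0 ** (np.log10(P_MIN) + idx * step)
```

Because the grid is uniform in the log domain, the relative quantization error is roughly constant (about 1%) across the whole 20-160 sample range.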
6.3. GAIN QUANTIZATION
The gain is represented by two values: G1 and G2. G2 is quantized with a 5-bit uniform quantizer ranging from 10 to 77 dB. G1 is quantized to 3 bits using the following algorithm:
Let G2,prev denote the second gain of the previous frame.
The algorithm is: if G2 for the current frame is within 5 dB of G2,prev, and G1 is within 3 dB of the average of the G2 values for the current and the previous frames, then quantizer_index = 0, which tells the decoder to set G1 to the mean of the G2 values for the current and the previous frames.
Otherwise, the frame represents a transition, and G1 is quantized with a 7-level uniform quantizer ranging from 6 dB below the minimum of the G2 values for the current and previous frames to 6 dB above the maximum of those values. If the values saturate, they are clamped to 10 dB and 77 dB.
6.4. BANDPASS VOICING QUANTIZATION
This is the quantizing algorithm for the voicing strengths Vbp1, ..., Vbp5. The result is the quantized voicing strengths.
6.5. FOURIER MAGNITUDE QUANTIZATION
The algorithm for this quantization is described in the following steps:
-Computing the predictor coefficients from the quantized LSF vector.
-Generating the residual window based on quantized predictor
coefficients.
-Applying a 200 sample Hamming window and performing a 512-point complex FFT with zero padding.
-Transforming the complex FFT output to magnitudes; the harmonics are found with a spectral peak-picking algorithm.
The spectral peak-picking algorithm finds the maximum magnitudes of the pitch harmonics. The search is divided into several areas, each with a width of 512/P FFT samples, where P is the quantized pitch. The location of the i-th harmonic is given by i * 512/P, where i is the number of the required harmonic. The number of harmonics is the smaller of 10 and the number of harmonics available for the given pitch. After the harmonics are found, they are normalized to an RMS value of 1. If the process finishes with fewer than 10 harmonics found, the remaining ones are given the value 1.
In this stage we take the resulting 10 harmonics and quantize them with a codebook of 256 vectors (an 8-bit index). The codebook is searched using a perceptually weighted Euclidean distance, with weights that emphasize low frequencies over higher ones:
The weights are a function of f_i, the frequency in Hz of the i-th harmonic for the default pitch period of 60 samples. The search minimizes the squared weighted error between the magnitudes and the codebook values.
7. ERROR PROTECTION AND BIT PACKING
To improve performance under channel errors, the coder parameters that are unused in the unvoiced mode are replaced with FEC (forward error correction). Three Hamming (7,4) codes and one Hamming (8,4) code are used.
In the unvoiced mode there is no need to transmit the 8 bits of Fourier magnitudes, the 4 bits of bandpass voicing, and the 1 bit of the aperiodic flag. That leaves a total of 13 spare bits for the FEC algorithm, which fills them with the parity bits of the Hamming codes.
The algorithm protects the first MSVQ index, and the two gain values.
The parity generator matrix for the Hamming (7,4) code is:
The parity generator matrix for the Hamming (8,4) code is:
The algorithm is as follows: the protected data bits are placed into a column vector and multiplied by the parity generator matrix. The result is the parity vector to be transmitted.
The protected data bits are arranged into 4-bit vectors, with corresponding 3-bit and 4-bit parity vectors:
The process for 3-bits is as follows:
The process for 4-bits is as follows:
The parity vector is placed in the spare bits of the unvoiced frame and transmitted.
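The parity generation is a matrix multiplication modulo 2. The document's own parity matrices did not survive extraction, so the (7,4) matrix below is the textbook Hamming arrangement used purely as a stand-in:

```python
import numpy as np

# Textbook Hamming (7,4) parity matrix: 4 data bits -> 3 parity bits.
P74 = np.array([[1, 1, 0],
                [1, 0, 1],
                [0, 1, 1],
                [1, 1, 1]])

def hamming74_parity(data4):
    """3 parity bits for 4 data bits (appended to form a 7-bit codeword)."""
    return (np.asarray(data4) @ P74) % 2

def hamming84_parity(data4):
    """Hamming (8,4): the extra bit is the overall parity of the 7-bit word."""
    p = hamming74_parity(data4)
    overall = (int(np.sum(data4)) + int(np.sum(p))) % 2
    return np.concatenate([p, [overall]])
```

The resulting (7,4) code has minimum distance 3, which is what allows the decoder to correct single-bit errors; the eighth bit of the (8,4) code raises the distance to 4, adding double-error detection.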
8. TRANSMISSION BIT STREAM
The 54 bits that form the output of the encoder are listed below.
They are divided into 2 columns depending on the mode of the frame: voiced or unvoiced. These 54 bits are counted without forward error correction.
The arrangement of these 54 bits in the bit stream also depends on the mode of the frame, voiced or unvoiced; for an unvoiced frame, the forward error correction bits are placed so as to protect the relevant bits.
Notice: Some of the material in this site taken from books, federal standards, and
other sites. This is purely an Educational site - if you find we have violated any
copyright, please contact us and we will remove the material in violation.
MELP
DECODER
Overview
The MELP parameters which are quantized and transmitted are the final pitch, P3; the bandpass voicing strengths, Vbp1, ..., Vbp5; the two gain values, G1 and G2; the linear prediction coefficients, a1, ..., a10; the Fourier magnitudes; and the aperiodic flag.
The use of the following quantization procedures is required for interoperability among various implementations.
Decoder Block Diagram
Bit Unpacking and Error Correction:
After the encoder operation, all the described parameters are transmitted to the decoder through some medium, which may introduce bit errors at the decoder side, so it is important to verify the received data with the error correction mechanism.
The received bits are unpacked from the channel and assembled into the parameter codewords, according to this table:
Parameter decoding is different for voiced and unvoiced modes, according to the following table, which shows how the 54 bits in a MELP frame are allocated among the parameters:
The pitch is decoded first, since it contains the mode information.
Pitch Decoding:
The table below is used in decoding the 7-bit pitch code to determine whether a frame is voiced or unvoiced, or whether a frame erasure is indicated.
If the pitch code is all-zero or has only one bit set, then the unvoiced mode is used.
If two bits are set, a frame erasure is indicated. Otherwise, the pitch value is
decoded and the voiced mode is used.
In the unvoiced mode, the (8,4) Hamming code is decoded to correct single bit
errors and detect double errors. If an uncorrectable error is detected, a frame
erasure is indicated. Otherwise, the (7,4) Hamming codes are decoded, correcting
single errors but without double error detection. (The theory of error correction with Hamming codes was explained in the Encoder chapter.)
If any erasure is detected in the current frame, by the Hamming code, by the pitch
code, or directly signaled from the channel, then a frame repeat mechanism is
implemented.
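The mode classification from the pitch codeword reduces to a population count; this sketch follows the rules stated above:

```python
def decode_pitch_mode(code7):
    """Classify a 7-bit pitch codeword: all-zero or one bit set means
    unvoiced, exactly two bits set signals a frame erasure, and
    anything else decodes as a voiced pitch value."""
    ones = bin(code7 & 0x7F).count("1")
    if ones <= 1:
        return "unvoiced"
    if ones == 2:
        return "erasure"
    return "voiced"
```

Treating all weight-0 and weight-1 words as unvoiced, and weight-2 words as erasures, means any single bit error on the unvoiced (all-zero) codeword still decodes correctly, and any double error is caught.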
All of the parameters for the current frame are replaced with the parameters from
the previous frame. In addition, the first gain term is set equal to the second gain
term so that no gain transitions are allowed.
If an erasure is not indicated, the remaining parameters are decoded.
The LSF’s are checked for ascending order and minimum separation as described
in Section 6.1. - QUANTIZATION OF PREDICTION COEFFICIENTS in the
Encoder.
In the unvoiced mode, default parameter values are used for the pitch, jitter,
bandpass voicing, and Fourier magnitudes.
The pitch value is set to 50 samples, the jitter is set to 25%, all of the bandpass voicing strengths are set to 0, and the Fourier magnitudes are set to 1. In the voiced mode, Vbp1 is set to 1; jitter is set to 25% if the aperiodic flag is 1, and to 0% otherwise. The bandpass voicing strength for each of the upper four bands is set to 1 if the corresponding bit is 1; otherwise the voicing strength is set to 0. There is one exception: if the pattern 0001 is received for Vbp2, Vbp3, Vbp4, Vbp5, respectively, then Vbp5 is set to 0.
When the special all-zero code for the first gain parameter, G1, is received, some
errors in the second gain parameter, G2, can be detected and corrected.
This correction process provides improved performance in channel errors.
Gain Decoding:
The decoding for the two gain parameters is shown in the following Flow Chart:
Noise Attenuation:
For quiet input signals, a small amount of gain attenuation is applied to both
decoded gain parameters using a power subtraction rule.
This attenuation is a simplified, frequency-invariant case of the Smoothed Spectral Subtraction noise suppression method defined in:
L. Arslan, A. McCree, and V. Viswanathan, "New Methods for Adaptive Noise Suppression," Proceedings of IEEE ICASSP 1995, pp. 812-815.
This article describes the estimation of the speech signal gain without the environment noise.
Before determining the attenuation for the first gain term, G1, a background noise estimate, Gn, is updated as follows:
We can see that the noise estimator moves up by 3 dB per second and down by 12 dB per second at the gain update rate of 88.9 updates per second, and it is clamped between 10 and 80 dB.
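The per-update step sizes follow directly from the stated rates; the update rule below is a sketch of that tracking behavior:

```python
# 3 dB/s upward and 12 dB/s downward at 88.9 gain updates per second.
UP_STEP = 3.0 / 88.9      # ~0.034 dB per update
DOWN_STEP = 12.0 / 88.9   # ~0.135 dB per update

def update_noise_estimate(gn, g1):
    """One update of the background noise estimate `gn` (dB) toward the
    observed first gain `g1` (dB), clamped to [10, 80] dB."""
    if g1 > gn:
        gn = min(gn + UP_STEP, g1)
    else:
        gn = max(gn - DOWN_STEP, g1)
    return min(max(gn, 10.0), 80.0)
```

The asymmetric rates make the estimator slow to rise (so speech does not get mistaken for noise) but quick to fall when the background gets quieter.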
Noise estimation is disabled for repeated frames in order to prevent repeated
attenuation.
The background noise estimate is also used in the adaptive spectral enhancement
calculation - this method is described later.
Refinement of Gain G1:
Gain G1 is modified by subtracting a (positive) correction term, given in dB in terms of:
Gn - the background noise estimate [dB].
G1 - the first gain term [dB].
The correction is clamped to a maximum value of 6 dB to avoid fluctuations and signal distortion.
To ensure that the attenuation is applied only to quiet signals, the value used in the equation below is clamped to an upper limit of 20 dB.
The noise estimation and gain modification steps are then repeated for the second gain term, G2.
Noise estimation and gain attenuation are disabled for repeated frames.
Parameter Interpolation:
All MELP synthesis parameters are interpolated pitch-synchronously for each
synthesized pitch period.
The interpolated parameters are the gain (in dB), LSF’s, pitch, jitter, Fourier
magnitudes, pulse and noise coefficients for mixed excitation, and spectral tilt
coefficient for the adaptive spectral enhancement filter.
Gain is linearly interpolated between the second gain of the prior frame, G2,prev, and the first gain of the current frame, G1, if the starting point, t0, of the new pitch period is less than 90. Otherwise, gain is interpolated between G1 and G2.
Normally, the other parameters are linearly interpolated between the past and current frame values. The interpolation factor, int, for these parameters is based on the starting point of the new pitch period:

int = t0 / 180
There are two exceptions to this interpolation procedure.
First, if there is an onset with a high pitch frequency, pitch interpolation is disabled
and the new pitch is immediately used.
This condition is met when the current gain is more than 6 dB greater than the previous frame's gain and the current frame's pitch period is less than half the prior frame's pitch period.
The second exception also involves a gain onset. If G1 differs from G2,prev by more than 6 dB, then the LSFs, spectral tilt, and pitch are interpolated using the interpolated gain trajectory as a basis, since the gain is transmitted twice per frame and has a more accurate interpolation path. In this case, the interpolation factor is given by:

int = (Gint - G2,prev) / (G1 - G2,prev)
where Gint is the interpolated gain.
This interpolation factor is then clamped between 0 and 1.
Mixed Excitation Generation:
The mixed excitation is generated as the sum of the filtered pulse and noise excitations.
As described in the Bit Unpacking paragraph, the parameter values (pitch, jitter, bandpass voicing, and Fourier magnitudes) differ between voiced and unvoiced signals.
The pulse excitation is computed using an inverse Discrete Fourier Transform of one pitch period in length.
The final equation for the pulse excitation is:

e[n] = (1/T) * sum_{k=0..T-1} M(k) * exp(j * 2 * pi * k * n / T)
Pitch Period Estimation:
The pitch period, T, is the interpolated pitch value plus the jitter times the pitch, where the jitter is the interpolated jitter strength times the output of a uniform random number generator between -1 and 1. This pitch period is rounded to the nearest integer and clamped between 20 and 160. The final pitch period is therefore:

T = round(P * (1 + J * x)), x uniform in [-1, 1]
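The pitch-period computation above can be sketched as:

```python
import random

def pitch_period(pitch, jitter_strength, rng=random.random):
    """Final pitch period: the interpolated pitch perturbed by the
    interpolated jitter strength times a uniform random value in
    [-1, 1], rounded and clamped to [20, 160] samples."""
    x = 2.0 * rng() - 1.0                 # uniform in [-1, 1]
    t = int(round(pitch * (1.0 + jitter_strength * x)))
    return min(max(t, 20), 160)
```

With a jitter strength of 0.25 (the "jittery voiced" state), successive periods vary by up to a quarter of the pitch, which is what breaks up the buzzy quality of purely periodic excitation.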
All of the phases of the pulse excitation are set to zero, hence M(k) is real. Since the excitation is real, the magnitudes obey:
M(T - k) = M(k), k = 1, 2, ..., L
where L = T/2 if T is even, and L = (T-1)/2 if T is odd.
The DC term, M(0), is set to 0.
Magnitude terms M(k), k = 1, 2, ..., 10, are set to the interpolated values of the Fourier magnitudes, and any magnitudes not otherwise specified are set to 1.
To prevent rapid changes at the start of the pitch period, the pulse excitation is circularly shifted by ten samples of delay, so the main excitation pulse occurs at the tenth sample of the period. The pulse is then multiplied by the square root of the pitch to give a unity-RMS signal, and then multiplied by 1000 to give a nominal signal level.
The noise is generated by a uniform random number generator with an RMS value
of 1000, and range of -1732 to 1732.
The pulse and noise excitation signals are then filtered and summed to form the
mixed excitation.
The pulse filter for the current frame is given by the sum of all the bandpass filter
coefficients for the voiced frequency bands, while the noise filter is given by the
sum of the bandpass filter coefficients for the unvoiced bands.
These filter coefficients are interpolated pitch-synchronously. The bandpass filter coefficients for each of the five bands are given in Appendix A of the MELP Federal Standard specification.
Adaptive Spectral Enhancement
The adaptive spectral enhancement filter is applied to the mixed excitation signal.
This filter is a tenth order pole/zero filter, with additional first-order tilt
compensation. Its coefficients are generated by bandwidth expansion of the linear
prediction filter transfer function, A(z) , corresponding to the interpolated LSF’s.
The transfer function of the enhancement filter, H(z), is given by:
where,
and the tilt coefficient is first calculated from the first reflection coefficient, then interpolated, and then multiplied by p, the signal probability.
The first reflection coefficient, k1, is calculated from the decoded LSFs.
By the MELP predictor coefficient sign convention, k1 is usually negative for voiced spectra.
The signal probability p is estimated by comparing the current interpolated gain, Gint, to the background noise estimate, Gn; the resulting value is clamped between 0 and 1.
Linear Prediction Synthesis:
The synthesis uses a direct-form filter, with the coefficients corresponding to the interpolated LSFs.
The interpolated LSFs, together with the vector quantization parameters, decode the LPC filter coefficients, so the synthesis filter is 1/A(z), where

A(z) = 1 - sum_{k=1..10} a_k * z^(-k)
Gain Adjustment:
Since the excitation is generated at an arbitrary level, the speech gain must be introduced into the synthesized speech.
The correct scaling factor, Sg, is computed for each synthesized pitch period of length T by dividing the desired RMS value (the gain G must be converted from dB to linear) by the RMS value of the unscaled synthetic speech signal:

Sg = 10^(G/20) / RMS(unscaled synthetic speech over the period)
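The scaling step can be sketched as follows, using the dB-to-linear conversion implied by the encoder's gain definition:

```python
import numpy as np

def scale_to_gain(synth, gain_db):
    """Scale one synthesized pitch period so its RMS matches the decoded
    gain. Returns (scale_factor, scaled_signal)."""
    target_rms = 10.0 ** (gain_db / 20.0)   # dB -> linear RMS target
    rms = np.sqrt(np.mean(np.square(synth)))
    sg = target_rms / rms if rms > 0 else 0.0
    return sg, sg * np.asarray(synth, dtype=float)
```

In the full decoder the scale factor is not applied as a hard step: as described next, it is linearly interpolated from the previous period's value over the first ten samples.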
To prevent discontinuities in the synthesized speech, this scale factor is linearly
interpolated between the previous and current values for the first ten samples of the
pitch period.
Pulse Dispersion:
The pulse dispersion filter is a 65th order FIR filter derived from a spectrally flattened triangle pulse.
The coefficients are listed in Appendix B in the MELP Federal Standard
specification.
Synthesis Loop Control
After processing each pitch period, the decoder updates t0 by adding T, the number of samples in the period just synthesized. If t0 < 180, synthesis for the current frame continues from the Parameter Interpolation step. Otherwise, the decoder buffers the remainder of the current period, which extends beyond the end of the current frame, and subtracts 180 from t0 to produce its initial value for the next frame.