Formant Tracking Using LPC Root Solving

Speech Signal Processing

  • Robust formant tracking using LPC root solving

    Team Members:

    Patel Shabaz Basheer (EE09B041)

    Pinjala Sandeep (EE09B025)

    Indian Institute of Technology Hyderabad

  • Motivation

    Formant estimates are useful in applications such as speech recognition,
    speech enhancement, noise reduction, and adaptive filtering in hearing aids.

  • Formants

    Formants are the spectral peaks of the sound spectrum of the voice. In
    speech science and phonetics, the term formant is also used for an acoustic
    resonance of the human vocal tract.

  • Algorithm

    [Block diagram of the algorithm; input signal S[n]]

  • Pre-emphasis Filter

    To improve the overall SNR across the band of interest, the magnitude of
    the higher frequencies is boosted relative to that of the lower frequencies.

    This algorithm uses a common pre-emphasis method: filtering the speech
    signal with a high-pass filter (HPF).

    This removes the contribution of the glottal waveform and the radiation
    load, and redistributes the energy more evenly across the frequencies in
    the band.

    A first-order pre-emphasis high-pass filter is given by

    y[n] = S[n] - a1*S[n-1]

    where S[n] is the input speech signal, y[n] is the pre-emphasized output,
    and a1 is the pre-emphasis coefficient.
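    The first-order pre-emphasis difference equation can be sketched in a few
    lines. This is only an illustration: the function name `pre_emphasis` and
    the default coefficient 0.97 are assumptions, not values taken from the
    slides.

```python
import numpy as np

def pre_emphasis(s, a1=0.97):
    """First-order high-pass pre-emphasis: y[n] = s[n] - a1 * s[n-1].

    a1 = 0.97 is an assumed, commonly used value; the slides do not state it.
    """
    s = np.asarray(s, dtype=float)
    y = np.empty_like(s)
    y[0] = s[0]                    # no previous sample for the first output
    y[1:] = s[1:] - a1 * s[:-1]    # difference with the scaled previous sample
    return y
```

    A constant (DC) input is almost entirely removed after the first sample,
    which is the expected high-pass behaviour.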

  • Hilbert Transformer (conversion into analytic signal)

    Converting a real signal into an analytic signal has several advantages.
    The main one, when working with the adaptive filter bank, is that the
    analytic signal provides the complex-valued signal on which the subsequent
    filtering operates.

    Sc[n] = SR[n] + j*SH[n]

    where

    Sc[n] is the analytic signal,

    SR[n] is the real signal,

    SH[n] is the Hilbert transform of SR[n].
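    One standard way to obtain Sc[n] is the FFT method: zero the negative
    frequencies and double the positive ones. The sketch below assumes this
    method and a hypothetical function name `analytic_signal`; the slides do
    not say how the Hilbert transformer was implemented.

```python
import numpy as np

def analytic_signal(s_real):
    """Return Sc[n] = SR[n] + j*SH[n] using the FFT method."""
    s_real = np.asarray(s_real, dtype=float)
    n = len(s_real)
    spectrum = np.fft.fft(s_real)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin unchanged
        weights[1 : n // 2] = 2.0         # double the positive frequencies
    else:
        weights[1 : (n + 1) // 2] = 2.0
    # Negative-frequency bins keep weight 0, which makes the result analytic.
    return np.fft.ifft(spectrum * weights)
```

    For a cosine with a whole number of cycles in the frame, the real part is
    the cosine itself and the imaginary part is the matching sine.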

  • Adaptive Band-pass Filtering

    The adaptive band-pass filter suppresses interference from neighboring
    formant frequencies while tracking an individual formant frequency as it
    varies with time. Each filter therefore tracks only a single formant
    frequency.

    Each adaptive band-pass filter consists of:

    1) All Zero Filters (AZF)

    2) Dynamic Tracking Filter (DTF)

  • AZF (All Zero Filters)

    The AZF in each formant filter is an adaptive all-zero filter whose three
    zeros are always placed at the previous formant frequency estimates from
    the other three formant filters.

    The filter's transfer function is:

    The value of Kk[n] ensures unity gain and zero phase lag at the estimated
    formant frequency of the kth component. An additional zero is placed at the
    pitch estimate to suppress the effect of the pitch.
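    The description above can be illustrated with a small sketch that builds an
    all-zero filter and normalises it to unity gain and zero phase at its own
    formant. The zero radius `r_z`, the function names, and the omission of the
    pitch zero are assumptions made for illustration; the exact transfer
    function is not reproduced in this transcript.

```python
import numpy as np

def freq_response(b, f_hz, fs):
    """Evaluate H(z) = sum_m b[m] * z^-m at z = exp(j*2*pi*f_hz/fs)."""
    z = np.exp(2j * np.pi * f_hz / fs)
    return np.sum(b * z ** (-np.arange(len(b))))

def azf_coefficients(other_formants_hz, own_formant_hz, fs, r_z=0.98):
    """Complex FIR coefficients of an all-zero filter for one formant band.

    One zero (assumed radius r_z) is placed at each of the other formants'
    previous estimates; the result is scaled so that the gain is exactly
    1 + 0j at this filter's own previous formant estimate.
    """
    b = np.array([1.0 + 0.0j])
    for f in other_formants_hz:
        zero = r_z * np.exp(2j * np.pi * f / fs)
        b = np.convolve(b, [1.0, -zero])     # multiply by (1 - zero * z^-1)
    gain = freq_response(b, own_formant_hz, fs)
    return b / gain                          # plays the role of Kk[n]
```

    In a tracker, the coefficients would be recomputed at every time index from
    the latest estimates of the other formants.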

  • DTF (Dynamic Tracking Filters)

    The Dynamic Tracking Filter (DTF) in each formant filter is a single-pole
    filter whose pole is always placed at the previous value of that formant's
    estimate. The transfer function of the kth DTF at time index n is:
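    A single-pole complex filter with unity gain at the pole frequency is one
    form consistent with this description. The sketch below assumes that form,
    a fixed pole radius `r_p`, and a pole frequency held constant over the
    call; none of these details are given in the transcript.

```python
import numpy as np

def dtf_filter(x, formant_hz, fs, r_p=0.9):
    """Single-pole complex band-pass filter centred on formant_hz.

    Difference equation: y[n] = (1 - r_p) * x[n] + pole * y[n-1],
    with pole = r_p * exp(j*2*pi*formant_hz/fs), so the gain at
    formant_hz is exactly 1. r_p = 0.9 is an assumed value.
    """
    pole = r_p * np.exp(2j * np.pi * formant_hz / fs)
    y = np.zeros(len(x), dtype=complex)
    prev = 0.0 + 0.0j
    for n, sample in enumerate(x):
        prev = (1.0 - r_p) * sample + pole * prev
        y[n] = prev
    return y
```

    A complex tone at the pole frequency passes essentially unchanged once the
    start-up transient has decayed, while a tone far from it is attenuated.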

  • Voiced Speech Detector

    This detector checks whether the speech in the window frame under
    consideration is voiced.

    This is done by estimating the pitch period of the windowed signal from
    its autocorrelation.

    For voiced speech, the pitch period is expected to lie in the range of
    4 ms to 9 ms for male and female speakers.
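    A minimal sketch of such a detector follows. The 4 ms to 9 ms search range
    comes from the slide; the function name `detect_voiced` and the normalised
    peak threshold of 0.3 are assumptions.

```python
import numpy as np

def detect_voiced(frame, fs, min_period_ms=4.0, max_period_ms=9.0,
                  threshold=0.3):
    """Return (voiced, pitch_period_ms) for one frame.

    The frame is called voiced when the autocorrelation peak inside the
    allowed pitch-period range exceeds `threshold` times the lag-0 value.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1:]               # keep lags 0, 1, 2, ...
    if r[0] <= 0:
        return False, None                   # silent frame
    lo = int(np.ceil(min_period_ms * 1e-3 * fs))
    hi = int(np.floor(max_period_ms * 1e-3 * fs))
    lag = lo + int(np.argmax(r[lo:hi + 1]))  # strongest peak in the range
    voiced = bool(r[lag] / r[0] > threshold)
    return voiced, (1000.0 * lag / fs if voiced else None)
```

    The frame must be longer than the largest lag searched (9 ms), otherwise
    the peak search runs past the end of the autocorrelation.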

  • Energy Detector

    After the speech signal is filtered by the adaptive band-pass filter bank,
    the energy of the signal in that window frame is calculated. The energy of
    the formant band must be higher than a specified energy threshold.

    LPC root solving is performed only if the frame satisfies the
    minimum-energy criterion and belongs to the voiced part of the speech.
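    The gating rule can be written as a short sketch. The function names and
    the use of the sum of squared magnitudes as the energy measure are
    assumptions; the threshold value is not given in the slides and is left as
    a parameter.

```python
import numpy as np

def frame_energy(frame):
    """Energy of a (possibly complex) frame: sum of squared magnitudes."""
    frame = np.asarray(frame)
    return float(np.sum(np.abs(frame) ** 2))

def should_run_lpc(frame, voiced, energy_threshold):
    """LPC root solving runs only for voiced frames with enough band energy."""
    return bool(voiced) and frame_energy(frame) > energy_threshold
```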

  • LPC Root Solving

    Linear prediction analysis provides a good approximation to the vocal tract
    spectral envelope, especially in voiced regions of speech, where the
    all-pole model of LPC applies.

    In unvoiced and transient regions of speech, the LPC model is less
    effective than in voiced regions but still gives acceptable results.

    Linear prediction can be stated as finding the coefficients ak that give
    the best prediction, i.e. that minimize the mean-squared prediction error
    of the speech sample S[n] in terms of the past samples S[n-k].

    The prediction error of a linear predictor of order p is:

    E[n] = S[n] - Σ_{k=1}^{p} ak*S[n-k]
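    The step from predictor coefficients to formant frequencies can be sketched
    as follows: the roots of A(z) = 1 - Σ ak*z^-k are the model poles, and a
    pole at angle θ corresponds to the frequency θ*fs/(2π). The sketch fits the
    coefficients by least squares (the covariance method), which is an
    assumption; the slides do not say which LPC method or order was used, and
    `lpc_formants` is a hypothetical name.

```python
import numpy as np

def lpc_formants(s, order, fs):
    """Estimate resonance frequencies (Hz) from the roots of the LPC polynomial.

    Works for real or complex (analytic) input. Only roots with positive
    frequency are returned, sorted in ascending order.
    """
    s = np.asarray(s)
    # Each row holds the past samples [s[n-1], ..., s[n-order]] for one n.
    past = np.column_stack(
        [s[order - k: len(s) - k] for k in range(1, order + 1)]
    )
    target = s[order:]
    a, *_ = np.linalg.lstsq(past, target, rcond=None)   # minimises sum |E[n]|^2
    roots = np.roots(np.concatenate(([1.0], -a)))       # poles of 1/A(z)
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return np.sort(freqs[freqs > 0])
```

    For a band-limited analytic signal that contains a single formant, an order
    of 1 yields one complex pole whose angle gives the formant frequency
    directly.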

  • Moving Average

    The moving-average block computes the moving average of each formant
    frequency and assigns that value as the estimate if the segment is
    unvoiced or the energy of the formant band is below the threshold.

    In all other cases, when the energy is above the threshold and the speech
    is voiced, the formant estimate from the LPC analysis is assigned.

    The moving average of the formant frequency is given by:
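    The decision rule can be sketched as below. The averaging formula itself is
    not reproduced in this transcript, so the sketch assumes a plain mean of
    all previous estimates, with the history seeded by the initial formant
    value; `update_formant` is a hypothetical name.

```python
def update_formant(history, lpc_estimate_hz, voiced, energy, energy_threshold):
    """Pick the formant estimate for the current frame and record it.

    history: non-empty list of previous estimates in Hz (seeded with the
    initial formant value). Assumed averaging rule: mean of all previous
    estimates.
    """
    if voiced and energy > energy_threshold:
        estimate = lpc_estimate_hz                 # trust the LPC root
    else:
        estimate = sum(history) / len(history)     # fall back to the average
    history.append(estimate)
    return estimate
```

    Starting from a history of [500.0], a voiced frame with an LPC estimate of
    520 Hz returns 520.0, and a following unvoiced frame returns 510.0, the
    mean of the two stored values.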

  • Results

    The algorithm described above was applied to speech .wav files.

    Formant Tracker performance for the database speech signal

  • Formant Tracker performance on the database speech signal with background
    noise at an SNR of 40 dB

  • RMS errors of formant trackers in the presence of AWGN at varying SNR values

  • Discussion

    Because the adaptive filter is started from initial values of the formant
    frequencies, the outputs also depend on the specific initial values given.
    In the few cases where the actual formant frequency does not lie near the
    initial formant frequency given as input, the filters would be adding a few
    more poles and zeros rather than removing the interfering components. Such
    cases were, however, found to be rare.

    Difficulty also arises if background noise or a sudden change in the
    formant frequencies causes the tracker to wander far from the true formant
    values. Hence, it was necessary to place a limit on the frequency range
    allowed for each formant.

  • References

    [1] Bruce, Ian C., et al. "Robust formant tracking in noise." Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. Vol. 1. IEEE, 2002.

    [2] Rao, A., and R. Kumaresan. "On decomposing speech into modulated components." Speech and Audio Processing, IEEE Transactions on 8.3 (2000): 240-254.

    [3] Jindal, Poonam. "Algorithms for tracking formant frequencies of a continuous speech with speaker variability." Thesis.

    [4] Snell, Roy C., and Fausto Milinazzo. "Formant location from LPC analysis data." Speech and Audio Processing, IEEE Transactions on 1.2 (1993): 129-134.

  • Demo on Matlab!!!

  • Thank you

  • Questions