Linear Predictive Coding for Speech Compression


Transcript of Linear Predictive Coding for Speech Compression

Page 1: Linear Predictive Coding for Speech Compression

Linear Predictive Coding for Speech Compression

Dev Ghosh, ECE 463

9 March 2006

Page 2: Linear Predictive Coding for Speech Compression

Overview

General Model for Speech Synthesis
Channel Vocoder
Linear Predictive Coder (LPC-10)
Code Excited Linear Prediction (CELP)
Novel Application: Sub-band adaptive filtering based on a cochlear model

Page 3: Linear Predictive Coding for Speech Compression

Model for Speech Synthesis

Speech is produced by forcing air through the vocal cords, larynx, pharynx, mouth and nose

At the transmitter, speech is divided into segments; each segment is analyzed to determine the excitation signal and the parameters of the vocal tract filter

[Block diagram: Excitation Source → Vocal tract filter → Speech]
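A minimal sketch of this source-filter model in Python (the sample rate, pitch period, gain, and filter coefficients below are purely illustrative, not taken from any standard): a pulse-train or noise excitation is passed through an assumed all-pole vocal tract filter.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                                # sample rate (Hz), illustrative
n = np.arange(fs // 5)                   # one 200 ms segment

# Excitation: periodic pulse train (voiced) or white noise (unvoiced)
pitch_period = 80                        # samples (100 Hz pitch), illustrative
voiced_excitation = np.zeros(len(n))
voiced_excitation[::pitch_period] = 1.0
unvoiced_excitation = np.random.randn(len(n))

# Assumed all-pole vocal tract filter: G / (1 - 1.3 z^-1 + 0.8 z^-2)
a = np.array([1.0, -1.3, 0.8])           # denominator coefficients
G = 0.5                                  # gain
speech_voiced = lfilter([G], a, voiced_excitation)
speech_unvoiced = lfilter([G], a, unvoiced_excitation)
```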

Page 4: Linear Predictive Coding for Speech Compression

Channel Vocoder - analysis

Each segment of input speech analyzed by a bank of (bandpass) analysis filters

Energy at output of each filter is estimated 50 times a second and transmitted to receiver

Decision is made whether the segment is voiced (/a/, /e/, /o/) or unvoiced (/s/, /f/)

Estimate of pitch period (period of fundamental harmonic) is determined
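A rough sketch of the analysis stage, assuming illustrative band edges, a 4th-order Butterworth filter bank, and 20 ms blocks (50 estimates per second); the actual channel vocoder filter design differs.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 8000
block = fs // 50                                             # 20 ms blocks
bands = [(200, 400), (400, 800), (800, 1600), (1600, 3200)]  # illustrative band edges (Hz)

def band_energies(speech):
    """Energy per band per 20 ms block, as a channel vocoder analyzer would transmit."""
    energies = []
    for lo, hi in bands:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        y = lfilter(b, a, speech)
        nblocks = len(y) // block
        # mean squared value of each 20 ms block of the band output
        e = [np.mean(y[k * block:(k + 1) * block] ** 2) for k in range(nblocks)]
        energies.append(e)
    return np.array(energies)            # shape: (num_bands, num_blocks)
```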

Page 5: Linear Predictive Coding for Speech Compression

Voiced vs. Unvoiced Speech

Page 6: Linear Predictive Coding for Speech Compression

Channel vocoder - synthesis

Vocal tract filter is implemented by a bank of (bandpass) synthesis filters

For voiced segments, a periodic pulse generator is the input

For unvoiced segments, a pseudonoise source is the input

Pulse period is determined by the pitch estimate

Filter outputs are scaled by the received energy estimates

The channel vocoder was the first approach to speech compression

Page 7: Linear Predictive Coding for Speech Compression

Linear Predictive Coder

Models the vocal tract as a single linear filter:

$y_n = \sum_i a_i y_{n-i} + G\,\epsilon_n$

Output: $y_n$, Input: $\epsilon_n$, Gain: $G$

Input is random noise (unvoiced) or a periodic pulse train (voiced)

LPC-10 is a standard (2.4 kbits/sec at 8000 samples/sec)
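The difference equation can be written out directly as a recursion; this is only a sketch of the synthesis model itself (coefficients and gain are made up), not LPC-10 analysis.

```python
import numpy as np

def lpc_synthesize(a, excitation, G):
    """Run y_n = sum_i a_i * y_{n-i} + G * eps_n sample by sample."""
    M = len(a)
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(a[i] * y[n - 1 - i] for i in range(M) if n - 1 - i >= 0)
        y[n] = past + G * excitation[n]
    return y

# Example: unvoiced (white-noise) excitation through an assumed 2nd-order filter
eps = np.random.randn(2000)
y = lpc_synthesize([1.3, -0.8], eps, G=0.5)
```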

Page 8: Linear Predictive Coding for Speech Compression

LPC - Voiced/Unvoiced Decision

Voiced speech has more energy and lower frequency than unvoiced

Speech segment is lowpass filtered; the energy at the output relative to the background noise is used to make the voicing decision

Zero-crossings counted to determine frequency

Continuity criterion: voicing decisions of neighboring frames are taken into account
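A crude sketch of such a decision for a single frame, with an assumed lowpass cutoff and thresholds and no continuity logic; LPC-10's actual rules are more elaborate.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 8000

def is_voiced(frame, noise_energy, energy_ratio=4.0, max_zc_per_sec=2500):
    """Crude voiced/unvoiced decision from low-band energy and zero-crossing rate."""
    b, a = butter(4, 1000 / (fs / 2), btype="low")   # lowpass the frame (assumed cutoff)
    low = lfilter(b, a, frame)
    energy = np.mean(low ** 2)
    # count sign changes to estimate the dominant frequency
    zero_crossings = np.sum(np.abs(np.diff(np.sign(frame))) > 0)
    zc_per_sec = zero_crossings * fs / len(frame)
    return energy > energy_ratio * noise_energy and zc_per_sec < max_zc_per_sec
```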

Page 9: Linear Predictive Coding for Speech Compression

LPC - Estimating Pitch Period

Extracting pitch from a short noisy segment is difficult

One approach is to maximize the autocorrelation

The periodicity is often not strong enough

A threshold can't be used because the maximum value is not known in advance
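For comparison with the AMDF approach on the next slide, an autocorrelation-based pitch search might look like this; the lag range assumes 8 kHz sampling and a 50-400 Hz pitch range, both of which are assumptions rather than values from the slides.

```python
import numpy as np

def pitch_by_autocorrelation(frame, min_lag=20, max_lag=160):
    """Pick the lag that maximizes the autocorrelation (frame must be longer than max_lag)."""
    frame = frame - np.mean(frame)
    r = [np.dot(frame[:-lag], frame[lag:]) for lag in range(min_lag, max_lag)]
    return min_lag + int(np.argmax(r))
```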

Page 10: Linear Predictive Coding for Speech Compression

LPC - Estimating Pitch Period

LPC-10 uses the average magnitude difference function (AMDF):

$\mathrm{AMDF}(P) = \frac{1}{N}\sum_i |y_i - y_{i-P}|$

If $\{y_n\}$ is periodic with period $P_0$, samples $P_0$ apart will have values close to each other, and the AMDF will have a minimum at $P_0$

AMDF is periodic for voiced segments and roughly flat for unvoiced segments

AMDF reaches its minimum when $P$ is the pitch period, and spurious minima in unvoiced segments are shallow
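A direct implementation of the AMDF search as defined above; the lag range is again an assumed 50-400 Hz pitch range at 8 kHz.

```python
import numpy as np

def amdf(frame, P):
    """Average magnitude difference function at lag P."""
    N = len(frame) - P
    return np.mean(np.abs(frame[P:P + N] - frame[:N]))

def pitch_by_amdf(frame, min_lag=20, max_lag=160):
    """Pitch period estimate = lag with the deepest AMDF minimum."""
    values = [amdf(frame, P) for P in range(min_lag, max_lag)]
    return min_lag + int(np.argmin(values))
```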

Page 11: Linear Predictive Coding for Speech Compression

LPC - Obtaining Vocal Tract Filter

At the transmitter, we want the filter coefficients that best match the segment in a mean squared error sense

$e_n^2 = \left(y_n - \sum_i a_i y_{n-i} - G\,\epsilon_n\right)^2$

The autocorrelation approach assumes $\{y_n\}$ is stationary, which yields

$\mathbf{A} = \mathbf{R}^{-1}\mathbf{P}$

A recursive solution uses the Levinson-Durbin algorithm
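A sketch of the autocorrelation method: estimate $R(0),\dots,R(M)$ from the (windowed) frame and run the Levinson-Durbin recursion, which also produces the reflection coefficients $k_i$ used later. The function names are illustrative.

```python
import numpy as np

def autocorr(frame, M):
    """Autocorrelation values R(0)..R(M) of one (windowed) frame."""
    return np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(M + 1)])

def levinson_durbin(R, M):
    """Solve the Toeplitz system R a = p for the predictor coefficients a_1..a_M.
    Returns (a, k): predictor coefficients and reflection coefficients."""
    a = np.zeros(M + 1)
    k = np.zeros(M + 1)
    E = R[0]
    for m in range(1, M + 1):
        acc = R[m] - np.dot(a[1:m], R[m - 1:0:-1])
        k[m] = acc / E
        new_a = a.copy()
        new_a[m] = k[m]
        new_a[1:m] = a[1:m] - k[m] * a[m - 1:0:-1]
        a = new_a
        E *= (1 - k[m] ** 2)
    return a[1:], k[1:]

# Example: 10th-order fit to one frame
# a, k = levinson_durbin(autocorr(frame, 10), 10)
```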

Page 12: Linear Predictive Coding for Speech Compression

LPC - Obtaining the Vocal Tract Filter

Covariance approach discards stationarity assumption (not valid for speech signals)

$c_{ij} = E[\,y_{n-i}\,y_{n-j}\,]$

which yields $\mathbf{C}\mathbf{A} = \mathbf{S}$

Page 13: Linear Predictive Coding for Speech Compression

LPC - Obtaining the Vocal Tract Filter

The $c_{ij}$ are estimated as

$c_{ij} = \sum_n y_{n-i}\,y_{n-j}$

We no longer assume that values of $y_n$ outside the segment are zero

A Cholesky decomposition is required to solve the system

Reflection coefficients are used to update the voicing decision
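A sketch of the covariance method under these definitions: build $\mathbf{C}$ and $\mathbf{S}$ using samples before the segment as history, then solve $\mathbf{C}\mathbf{A} = \mathbf{S}$ with a Cholesky factorization. The indexing conventions here are assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def covariance_lpc(y, start, N, M):
    """Covariance-method LPC for the segment y[start:start+N] (requires start >= M).
    Samples before `start` serve as history instead of being assumed zero."""
    n = np.arange(start, start + N)
    C = np.empty((M, M))
    S = np.empty(M)
    for i in range(1, M + 1):
        S[i - 1] = np.sum(y[n] * y[n - i])
        for j in range(1, M + 1):
            C[i - 1, j - 1] = np.sum(y[n - i] * y[n - j])
    A = cho_solve(cho_factor(C), S)      # C is symmetric positive definite
    return A
```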

Page 14: Linear Predictive Coding for Speech Compression

LPC - Transmitting Parameters

Tenth order filter used for voiced speech and fourth order for unvoiced

The vocal tract filter is sensitive to errors in reflection coefficients close to one, so

$g_i = \dfrac{1 + k_i}{1 - k_i}$

are quantized and transmitted instead of the $k_i$
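The mapping and its inverse, as given above (the quantizer itself is not shown):

```python
import numpy as np

def reflection_to_g(k):
    """g_i = (1 + k_i) / (1 - k_i): spreads values with k_i near one over a wide range."""
    k = np.asarray(k, dtype=float)
    return (1.0 + k) / (1.0 - k)

def g_to_reflection(g):
    """Inverse mapping at the receiver: k_i = (g_i - 1) / (g_i + 1)."""
    g = np.asarray(g, dtype=float)
    return (g - 1.0) / (g + 1.0)
```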

Page 15: Linear Predictive Coding for Speech Compression

Code Excited Linear Prediction

A single pulse per pitch period leads to a buzzy twang

A variety of excitation signals is allowed instead

For each segment, the encoder finds the excitation vector that generates synthesized speech best matching the speech being coded
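A bare-bones sketch of that search over an assumed fixed codebook, using the all-pole synthesis filter from the LPC model; real CELP coders also search gains, use an adaptive codebook, and apply perceptual weighting.

```python
import numpy as np
from scipy.signal import lfilter

def celp_search(target, codebook, a, G):
    """Pick the codebook excitation whose synthesized output best matches
    the target segment in a squared-error sense. `a` are the predictor coefficients a_i."""
    denom = np.concatenate(([1.0], -np.asarray(a)))   # all-pole filter G / (1 - sum a_i z^-i)
    best_index, best_err = 0, np.inf
    for idx, excitation in enumerate(codebook):
        synth = lfilter([G], denom, excitation)
        err = np.sum((target - synth) ** 2)
        if err < best_err:
            best_index, best_err = idx, err
    return best_index, best_err
```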

Page 16: Linear Predictive Coding for Speech Compression

Sub-band adaptive filtering

Multi-channel speech enhancement system

The greater the number of sub-bands used, the faster the convergence of the overall system

Page 17: Linear Predictive Coding for Speech Compression

Cochlear Modelling

Sub-band filters are distributed logarithmically in frequency to approximate the distribution of filters in the cochlea

Page 18: Linear Predictive Coding for Speech Compression

Adaptive Noise Cancellation

LMS algorithm is used to model differential transfer function between noise signals in a number of sub-bands

Lower power and shorter filters used in each sub-band

Convergence is equal across all bands if power is distributed equally and filter lengths are the same

Otherwise, convergence is dominated by the sub-band with the greatest power
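A sketch of the LMS update for one sub-band (filter length and step size are illustrative): the reference noise is adaptively filtered to match the noise component in the primary input, and the error signal is the enhanced speech.

```python
import numpy as np

def lms_noise_canceller(primary, reference, num_taps=32, mu=0.01):
    """Adaptive noise cancellation with LMS in one sub-band.
    primary:   speech + noise (band-limited)
    reference: correlated noise-only signal (same band)
    Returns the error signal, i.e. the enhanced speech."""
    w = np.zeros(num_taps)
    out = np.zeros(len(primary))
    for n in range(num_taps - 1, len(primary)):
        x = reference[n - num_taps + 1:n + 1][::-1]  # current and past reference samples
        y = np.dot(w, x)                             # estimate of the noise in `primary`
        e = primary[n] - y                           # error = enhanced speech sample
        w += 2 * mu * e * x                          # LMS weight update
        out[n] = e
    return out
```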