„Bandwidth Extension of Speech Signals“ 2nd Workshop on Wideband Speech Quality in Terminals and...

„Bandwidth Extension of Speech Signals“ 2nd Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction 22nd and 23rd June 2005 - Mainz, Germany Bernd Iser [email protected]



2nd Workshop on Wideband Speech Quality - June 2005

Contents

Motivation

Model for Speech Production Process

Bandwidth Extension

• Generation of the excitation signal

-Non-linear characteristics

-Results using non-linear characteristics

• Generation of the spectral envelope

-Codebook approach

-Neural network approach

-Linear mapping approach

• Power adjustment

Current Results

• Audio samples

Outlook

Motivation

Audio examples: original audio signal and band-limited audio signal.

Problem: Degradation of speech quality due to suppression/cancellation of frequency bands (e.g., transmission over a telephone network)

Idea: Extrapolate the missing frequency components from the band-limited signal

Advantage: The network as well as the transmission system can remain unchanged

But: In most cases the environment provides more bandwidth, e.g.,

- MOST bus: 11025 Hz sampling rate, or

- GSM: 8000 Hz sampling rate

Generation of the Excitation Signal

Block diagram of BWE:

(Figure. Blocks: envelope estimation, removing spectral envelope, excitation signal extension, phase manipulation, band stop, power adjustment. Signals and parameters: input signal, narrowband parameters, excitation signal (source), spectral envelope (filter), model gain, output signal.)

Generation of the Excitation Signal

Generation of a „broadband“ excitation signal:

• Extension of the pitch structure in case of voiced sounds.

• Generation of a noise-like excitation signal in case of unvoiced sounds.

Generation of the Excitation Signal

Approaches for the generation of a „broadband“ excitation signal:

„Harmonic modeling“

• Placing spectral components (pitch, voicing)

• Function generators: sine (pitch, voicing), noise, ...

Shifting / modulation approaches (frequency / time domain)

• Fixed

• Pitch adaptive (requires pitch analysis!)

Application of non-linear characteristics

• Piecewise defined characteristics (distributions): half-wave rectification, full-wave rectification, saturation, ...

• Quadratic, cubic, tanh, ... characteristics (functions)
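As an illustration of the fixed shifting / modulation approach in the time domain, the following minimal sketch multiplies a narrowband tone by a cosine carrier, which moves each spectral line by plus and minus the carrier frequency. The sampling rate, tone frequency, shift, and function names are assumptions chosen for the example, and any subsequent filtering is omitted.

```python
import numpy as np

def shift_by_modulation(x, shift_hz, fs):
    """Multiply x by a cosine carrier; a spectral line at f moves to f - shift and f + shift."""
    n = np.arange(len(x))
    return 2.0 * np.cos(2.0 * np.pi * shift_hz * n / fs) * x

def amplitude_at(x, freq_hz, fs):
    """Amplitude of the spectral line at freq_hz (assumes freq_hz falls on an FFT bin)."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2.0)
    return spectrum[int(round(freq_hz * len(x) / fs))]

fs = 16000                      # assumed sampling rate of the extended signal
t = np.arange(1600) / fs        # 0.1 s, so FFT bins are 10 Hz apart
narrowband = np.cos(2.0 * np.pi * 1000.0 * t)          # stand-in for one narrowband harmonic
shifted = shift_by_modulation(narrowband, 3000.0, fs)  # fixed shift of 3 kHz (assumed value)

# The 1 kHz line has been replaced by lines at 2 kHz (mirrored) and 4 kHz.
print(amplitude_at(shifted, 2000.0, fs), amplitude_at(shifted, 4000.0, fs))
```

A pitch-adaptive variant would choose the shift as a multiple of the estimated pitch so that the shifted harmonics stay on the harmonic grid, which is why it requires pitch analysis.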

Generation of the Excitation Signal

Application of a non-linear characteristic:

Applied to a harmonic signal that has been filtered by a bandpass, the non-linear characteristic produces a signal that contains the missing harmonics. Notice the aliasing in the upper frequencies.
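The effect can be reproduced with a small synthetic example. This is only a sketch under assumed values (200 Hz pitch, 8 kHz sampling rate, harmonics 5 to 10 surviving the bandpass): half-wave rectification restores a spectral line at the pitch frequency that the band-limited signal no longer contains.

```python
import numpy as np

fs = 8000                       # assumed narrowband sampling rate
f0 = 200.0                      # assumed pitch frequency
t = np.arange(800) / fs         # 0.1 s, so FFT bins are 10 Hz apart

# Band-limited harmonic signal: only harmonics 5..10 (1000..2000 Hz) pass the bandpass.
bandlimited = sum(np.cos(2.0 * np.pi * k * f0 * t) for k in range(5, 11))

def half_wave_rectify(x):
    """Piecewise characteristic: keep positive samples, set negative samples to zero."""
    return np.maximum(x, 0.0)

def harmonic_amplitude(x, k):
    """Amplitude of the k-th harmonic of f0 in the spectrum of x."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2.0)
    return spectrum[int(round(k * f0 * len(x) / fs))]

rectified = half_wave_rectify(bandlimited)
before = harmonic_amplitude(bandlimited, 1)   # fundamental is missing
after = harmonic_amplitude(rectified, 1)      # fundamental has been regenerated
print(before, after)
```

Because the pitch divides the sampling rate here, the aliased components fall exactly onto harmonic bins and are not visible as separate lines in this particular example.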

Generation of the Excitation Signal

Application of a non-linear characteristic:

If the input signal is upsampled (e.g., by a factor of 4) before the half-wave rectification is performed, almost no aliasing can be observed after lowpass filtering and downsampling.
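A minimal sketch of this upsample, rectify, lowpass, downsample chain is given below. It uses idealised FFT-based resampling on a block that is treated as periodic, which is an assumption made to keep the example short; a real-time system would use interpolation and decimation filters instead. The 210 Hz pitch is chosen so that aliasing products fall between the harmonics and can be measured.

```python
import numpy as np

def half_wave_rectify(x):
    return np.maximum(x, 0.0)

def upsample(x, factor):
    """Ideal band-limited interpolation via zero-padding of the spectrum."""
    n = len(x)
    padded = np.zeros(factor * n // 2 + 1, dtype=complex)
    padded[: n // 2 + 1] = np.fft.rfft(x)
    return np.fft.irfft(padded, n=factor * n) * factor

def lowpass_and_downsample(x, factor):
    """Ideal lowpass (keep only bins below the new Nyquist frequency), then decimate."""
    n_out = len(x) // factor
    kept = np.fft.rfft(x)[: n_out // 2 + 1].copy()
    kept[-1] = 0.0              # discard the bin exactly at the new Nyquist frequency
    return np.fft.irfft(kept, n=n_out) / factor

def off_harmonic_energy(x, harmonic_bin):
    """Energy in all FFT bins that are not multiples of the pitch bin (aliasing products)."""
    magnitude = np.abs(np.fft.rfft(x)) / len(x)
    bins = np.arange(len(magnitude))
    return float(np.sum(magnitude[bins % harmonic_bin != 0] ** 2))

fs, n = 8000, 800               # FFT bins are 10 Hz apart
f0 = 210.0                      # pitch on bin 21 (assumed value)
t = np.arange(n) / fs
x = sum(np.cos(2.0 * np.pi * k * f0 * t) for k in range(5, 11))

direct = half_wave_rectify(x)
oversampled = lowpass_and_downsample(half_wave_rectify(upsample(x, 4)), 4)

alias_direct = off_harmonic_energy(direct, 21)
alias_oversampled = off_harmonic_energy(oversampled, 21)
print(alias_direct, alias_oversampled)
```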

Generation of the Excitation Signal

Application of a cubic characteristic in the time domain:

• Predictor error filtering for extracting the excitation signal

(Figure: predictor error filter.)
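The two steps can be sketched as follows: estimate linear prediction coefficients, run the predictor error filter to obtain the excitation, and apply the cubic characteristic sample by sample. The autocorrelation method, the predictor order, and the synthetic test signal are assumptions for illustration; the slides do not state how the predictor is computed.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Autocorrelation method: solve the normal equations R a = r for the predictor a."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def predictor_error_filter(x, a):
    """e[n] = x[n] - sum_k a[k-1] * x[n-k]; removes the spectral envelope described by a."""
    return np.convolve(x, np.concatenate(([1.0], -a)))[: len(x)]

def cubic_characteristic(e):
    """Non-linear characteristic applied in the time domain."""
    return e ** 3

# Synthetic test signal: white noise shaped by a known second-order all-pole filter.
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
for i in range(2, len(x)):
    x[i] += 1.3 * x[i - 1] - 0.6 * x[i - 2]

a = lpc_coefficients(x, order=2)
excitation = predictor_error_filter(x, a)
extended_excitation = cubic_characteristic(excitation)
print(a)
```

The cubic characteristic changes the signal power, so a power adjustment after the extension is needed in any case.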

Generation of the Spectral Envelope

(Figure: block diagram of BWE, repeated.)

Generation of the Spectral Envelope

• Extension of the spectral envelope.

• Placing the formants of the estimated envelope where the broadband formants are.

Generation of the Spectral Envelope

Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:

Codebook

• „Narrowband“ and „broadband“ codebook trained jointly using envelopes of wideband data and bandlimited counterparts

• Weight codebook entries with inverse distance to input envelope and sum them up (LSF)

• Possibility of including other features than spectral envelope in „narrowband“ codebook using a special distance measure

• Codebook approach as classification stage with post processing by e.g., neural network or linear mapping

• Can be implemented taking predecessor and successor into account

Generation of the Spectral Envelope

Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:

Neural network

• Exploit the quasi-stationarity of speech by using a memory

• Feeding NN with other features than just spectral envelope

• Various architectures and training algorithms

• Can be used as post processing after codebook classification

Generation of the Spectral Envelope

Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:

Linear mapping

• Can be implemented taking predecessor and successor into account

• Can be used as post processing after codebook classification
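One possible reading of "taking predecessor and successor into account" is to append the features of the previous and the next frame to each frame before the mapping. The slides do not specify the mechanism, so the sketch below, including the function name and the edge handling, is an assumption.

```python
import numpy as np

def stack_context(features):
    """Append the predecessor and successor frame to every frame (edges repeat the edge frame)."""
    previous = np.vstack([features[:1], features[:-1]])
    following = np.vstack([features[1:], features[-1:]])
    return np.hstack([previous, features, following])

frames = np.arange(8.0).reshape(4, 2)   # 4 frames with 2 hypothetical features each
stacked = stack_context(frames)
print(stacked.shape)                    # (4, 6)
```

Note that using the successor frame introduces one frame of additional delay.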

Generation of the Spectral Envelope

Codebook:

(Figure: the envelope of the input signal is compared, using a distance measure, with the entries of the „narrowband“ codebook. The output is formed from the „broadband“ counterparts by weighting the codebook entries with the „inverse“ distance.)

Generation of the Spectral Envelope

Computation of the output LSFs (equation shown on the slide), with N being the LSF order and M the codebook size, respectively.
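A minimal sketch of such a computation, assuming the output LSF vector is the normalised sum of the M „broadband“ codebook entries, each weighted with the inverse of the distance between the input envelope and the corresponding „narrowband“ entry. The Euclidean distance, the normalisation, the small constant `eps`, and the toy codebooks are assumptions, not values from the slides.

```python
import numpy as np

def extend_envelope(nb_lsf, nb_codebook, bb_codebook, eps=1e-12):
    """Inverse-distance weighted sum of broadband codebook entries.

    nb_codebook: (M, N_nb) narrowband entries, bb_codebook: (M, N) broadband counterparts.
    """
    distances = np.linalg.norm(nb_codebook - nb_lsf, axis=1)   # Euclidean, as one option
    weights = 1.0 / (distances + eps)                          # eps avoids division by zero
    weights /= weights.sum()
    return weights @ bb_codebook

# Hypothetical codebooks with M = 3 entries.
nb_codebook = np.array([[0.2, 0.5], [0.4, 0.9], [0.6, 1.2]])
bb_codebook = np.array([[0.2, 0.5, 1.5, 2.4],
                        [0.4, 0.9, 1.8, 2.6],
                        [0.6, 1.2, 2.0, 2.9]])

exact = extend_envelope(nb_codebook[1], nb_codebook, bb_codebook)
between = extend_envelope(np.array([0.3, 0.7]), nb_codebook, bb_codebook)
print(exact, between)
```

Since the normalised weights are non-negative and sum to one, the result is a convex combination of ordered LSF vectors and therefore remains ordered, which is one reason why averaging in the LSF domain is convenient.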

Generation of the Spectral Envelope

Training of codebook (LBG algorithm):

1. Initialising: Compute the centroid of the whole training data.

2. Splitting: Each centroid is split into two nearby vectors by applying a perturbation.

3. Quantization: The whole training data is assigned to the centroids using a chosen distance measure, and afterwards the centroids are recalculated. Step 3 is repeated until the result no longer changes significantly.

4. If the desired codebook size has been reached, stop. Otherwise continue with step 2.

Distance measures (formulas shown on the slide):

• Spectral distortion

• Likelihood ratio distance measure

• Minkowski distance of order p (p = 1: city block distance, p = 2: Euclidean distance)
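The four training steps can be sketched as follows. This is an illustrative version, not the author's implementation: the additive perturbation, the stopping tolerance, and the use of the arithmetic mean as centroid (which is the optimal centroid only for the squared Euclidean distance) are assumptions.

```python
import numpy as np

def distances(data, codebook, p=2):
    """Minkowski distance of order p between every training vector and every centroid."""
    diff = np.abs(data[:, None, :] - codebook[None, :, :])
    return np.sum(diff ** p, axis=2) ** (1.0 / p)

def lbg(data, size, perturbation=0.01, tol=1e-6, max_iter=100, p=2):
    """Train a codebook with `size` entries (a power of two) from data of shape (T, N)."""
    codebook = data.mean(axis=0, keepdims=True)                 # 1. initialising
    offset = perturbation * data.std(axis=0)
    while len(codebook) < size:
        codebook = np.vstack([codebook + offset, codebook - offset])   # 2. splitting
        previous = np.inf
        for _ in range(max_iter):                               # 3. quantization
            d = distances(data, codebook, p)
            nearest = d.argmin(axis=1)
            distortion = d[np.arange(len(data)), nearest].mean()
            for k in range(len(codebook)):
                members = data[nearest == k]
                if len(members) > 0:                            # an empty cell keeps its centroid
                    codebook[k] = members.mean(axis=0)
            if previous - distortion <= tol * distortion:       # no significant change
                break
            previous = distortion
    return codebook                                             # 4. desired size reached

# Toy training data: two well-separated clusters.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.1, (100, 2)), rng.normal(10.0, 0.1, (100, 2))])
codebook = lbg(data, size=2)
codebook = codebook[np.argsort(codebook[:, 0])]
print(codebook)
```

For joint training of the „narrowband“ and „broadband“ codebooks, one option is to cluster on one feature set and compute the centroids of the other set from the same assignments; the slides do not say which variant was used.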

Generation of the Spectral Envelope

Linear Mapping:

Narrowband input features (LPC, CC, LSF):

Broadband input features (LPC, CC, LSF):

Aim to find mapping matrix:

Optimization criterion:

Leads to optimal mapping matrix:
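As a sketch under the assumption that the optimization criterion is the squared error between the true and the mapped broadband features, the mapping matrix follows from an ordinary least-squares fit. With training frames stored as rows of `X_nb` and `Y_bb` (hypothetical names), the closed form is W = Yᵀ X (Xᵀ X)⁻¹; the code uses `np.linalg.lstsq`, which solves the same problem more robustly.

```python
import numpy as np

def fit_linear_mapping(X_nb, Y_bb):
    """Least-squares W minimising ||Y_bb - X_nb @ W.T||^2; rows are training frames."""
    W_transposed, *_ = np.linalg.lstsq(X_nb, Y_bb, rcond=None)
    return W_transposed.T

# Synthetic data: 4 narrowband features mapped to 6 broadband features (assumed sizes).
rng = np.random.default_rng(0)
W_true = rng.standard_normal((6, 4))
X_nb = rng.standard_normal((200, 4))
Y_bb = X_nb @ W_true.T

W = fit_linear_mapping(X_nb, Y_bb)
y_estimate = W @ X_nb[0]        # broadband estimate for one narrowband frame
print(W.shape)
```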

Generation of the Spectral Envelope

Linear Mapping as post processing algorithm after codebook classification:

Note that this principle can be applied to other approaches as well. For example, the multiplication with the linear mapping matrix could be replaced by a neural network that has been trained on the data classified to the respective codebook entry.
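The combination can be sketched as a two-stage procedure: the „narrowband“ codebook selects a class, and a mapping matrix trained only on the frames of that class produces the broadband features. Function names, the Euclidean classification, and the toy data are assumptions; the sketch also assumes that every class receives enough training frames.

```python
import numpy as np

def classify(x_nb, nb_codebook):
    """Index of the nearest narrowband codebook entry (Euclidean distance assumed)."""
    return int(np.argmin(np.linalg.norm(nb_codebook - x_nb, axis=1)))

def train_class_mappings(X_nb, Y_bb, nb_codebook):
    """One least-squares mapping matrix per codebook entry, trained on its own frames."""
    labels = np.array([classify(x, nb_codebook) for x in X_nb])
    mappings = []
    for k in range(len(nb_codebook)):
        W_t, *_ = np.linalg.lstsq(X_nb[labels == k], Y_bb[labels == k], rcond=None)
        mappings.append(W_t.T)
    return mappings

def extend(x_nb, nb_codebook, mappings):
    """Classify the frame, then apply the mapping matrix of its class."""
    return mappings[classify(x_nb, nb_codebook)] @ x_nb

# Toy data: two classes, each with its own (hypothetical) true mapping.
rng = np.random.default_rng(1)
nb_codebook = np.array([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])
A = [rng.standard_normal((5, 3)) for _ in range(2)]
X_nb = np.vstack([c + 0.1 * rng.standard_normal((50, 3)) for c in nb_codebook])
Y_bb = np.vstack([X_nb[:50] @ A[0].T, X_nb[50:] @ A[1].T])

mappings = train_class_mappings(X_nb, Y_bb, nb_codebook)
x_new = np.array([0.9, 1.1, 1.0])
y_new = extend(x_new, nb_codebook, mappings)
print(y_new)
```

Replacing `mappings[k] @ x_nb` with a per-class neural network would follow the same structure.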

Power Adjustment

(Figure: block diagram of BWE, repeated.)

Power Adjustment

Power comparison:

Computation of the gain from the ratio of the power of the extended signal to the power of the input signal, both measured within the telephone band
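A minimal sketch of such a power comparison is given below. It assumes that both signals are available at the same sampling rate, that the telephone band is taken as 300 Hz to 3400 Hz, and that the gain is chosen so that the scaled extended signal has the same in-band power as the input signal; none of these details are stated on the slide.

```python
import numpy as np

def band_power(x, fs, f_lo=300.0, f_hi=3400.0):
    """Power of x inside [f_lo, f_hi], computed from the FFT of the block."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(np.sum(np.abs(spectrum[in_band]) ** 2)) / len(x) ** 2

def power_adjustment_gain(extended, reference, fs):
    """Gain that matches the telephone-band power of `extended` to that of `reference`."""
    return np.sqrt(band_power(reference, fs) / band_power(extended, fs))

fs = 16000
t = np.arange(1600) / fs        # FFT bins are 10 Hz apart
reference = np.cos(2.0 * np.pi * 1000.0 * t)                 # input signal (in band)
extended = 0.5 * reference + np.cos(2.0 * np.pi * 6000.0 * t)  # too quiet in band, plus new content

gain = power_adjustment_gain(extended, reference, fs)
adjusted = gain * extended
print(gain)
```

In a frame-based system the gain would typically be smoothed over time to avoid audible jumps.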

Current Results

Setup used to produce results:

Database

• TIMIT processed with WM NetSim tool (training, English)

-Phone filter / GSM / phone filter

Algorithm

• Excitation signal

-Lower part extended using half-wave rectification

-Higher part extended using half-wave rectification

• Spectral envelope

-Codebook classification using 64 entries

-Post processing with linear mapping

Current Results

Audio samples: telephone-limited and extended versions for each of four speakers (Female 1, Female 2, Male 1, Male 2).

Outlook

Outlook on future work:

Integration of additional features into codebook training

• Pitch information

• Information on „voicedness“

Add „comfort-noise“

Training of neural network

• Using additional features

• In combination with codebook


Thank you for your attention!