„Bandwidth Extension of Speech Signals“
2nd Workshop on Wideband Speech Quality in Terminals and Networks:
Assessment and Prediction
22nd and 23rd June 2005 - Mainz, Germany
Bernd Iser
2nd Workshop on Wideband Speech Quality - June 2005
Contents
Motivation
Model for Speech Production Process
Bandwidth Extension
• Generation of the excitation signal
-Non-linear characteristics
-Results using non-linear characteristics
• Generation of the spectral envelope
-Codebook approach
-Neural network approach
-Linear mapping approach
• Power adjustment
Current Results
• Audio samples
Outlook
Motivation
[Audio examples: band-limited audio signal; original audio signal]
Problem: Degradation of speech quality due to suppression/cancellation of frequency bands (e.g., transmission over the telephone network)
Idea: Extrapolate the missing frequency components from the band-limited signal
Advantage: Network as well as transmission system can remain unchanged
But: In most cases the environment provides more bandwidth (e.g., MOST-bus: 11025 Hz sampling rate, or GSM: 8000 Hz sampling rate)
Generation of the Excitation Signal
Block diagram of BWE:
[Figure: block diagram with the blocks envelope estimation, removing spectral envelope, excitation signal extension, phase manipulation, band stop, and power adjustment; labeled signals: input signal, narrowband parameters, excitation signal (source), spectral envelope (filter), model gain, output signal]
Generation of the Excitation Signal
Generation of a „broadband“ excitation signal:
• Extension of pitch structure in case of voiced sounds.
• Generation of a noise-like excitation signal in case of unvoiced sounds.
Generation of the Excitation Signal
Approaches for the generation of a „broadband“ excitation signal:
„Harmonic Modeling“
• Placing spectral components (pitch, voicing)
• Function generators: sine (pitch, voicing), noise, ...
Shifting / modulation approaches (frequency / time domain)
• Fixed
• Pitch adaptive (requires pitch analysis!)
Application of non-linear characteristics
• Piecewise defined characteristics (distributions): half-wave rectification, full-wave rectification, saturation, ...
• Quadratic, cubic, tanh, ... characteristics (functions)
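The non-linear characteristics listed above can be sketched as simple sample-by-sample functions. This is only an illustration: the function names, the saturation limit, and the tanh drive factor are assumptions and do not come from the slides.

```python
import math

def half_wave(x):
    """Half-wave rectification: keep positive samples, zero the rest."""
    return [max(v, 0.0) for v in x]

def full_wave(x):
    """Full-wave rectification: absolute value of every sample."""
    return [abs(v) for v in x]

def saturation(x, limit=0.5):
    """Hard clipping at +/- limit (the limit is an assumed example value)."""
    return [max(-limit, min(limit, v)) for v in x]

def quadratic(x):
    return [v * v for v in x]

def cubic(x):
    return [v ** 3 for v in x]

def tanh_characteristic(x, drive=2.0):
    """Smooth saturation; 'drive' is an assumed scaling factor."""
    return [math.tanh(drive * v) for v in x]

# One period of a sine wave as a toy input
signal = [math.sin(2 * math.pi * n / 8) for n in range(8)]
rectified = half_wave(signal)
```

All of these are memoryless, so they can be applied per sample in the time domain without any pitch analysis.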
Generation of the Excitation Signal
Application of a non-linear characteristic:
Applied to a band-pass filtered harmonic signal, the non-linear characteristic restores the missing harmonics. Note the aliasing at the upper frequencies.
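A small numerical check of this effect, as a sketch under simplifying assumptions: a single sine tone stands in for the band-pass filtered harmonic signal, and the frame length and tone frequency are arbitrary example values. Before rectification the second harmonic is absent; after half-wave rectification it is clearly present.

```python
import cmath
import math

def dft_amplitude(x, k):
    """Amplitude of DFT bin k, scaled so that a unit sine at bin k gives 1.0."""
    n_samples = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
              for n, v in enumerate(x))
    return 2.0 * abs(acc) / n_samples

N = 64                # assumed frame length
FUNDAMENTAL_BIN = 4   # assumed test tone (exactly 4 periods per frame)

tone = [math.sin(2 * math.pi * FUNDAMENTAL_BIN * n / N) for n in range(N)]
rectified = [max(v, 0.0) for v in tone]   # half-wave rectification

# Amplitude of the second harmonic before and after the non-linearity
second_before = dft_amplitude(tone, 2 * FUNDAMENTAL_BIN)
second_after = dft_amplitude(rectified, 2 * FUNDAMENTAL_BIN)
```

Half-wave rectification of a sine produces a DC component and even harmonics. Harmonics above half the sampling rate fold back into the band, which is the aliasing mentioned on the slide.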
Generation of the Excitation Signal
Application of a non-linear characteristic:
If the input signal is upsampled (e.g., by a factor of 4) before the half-wave rectification is performed, almost no aliasing can be observed after low-pass filtering and downsampling.
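The following sketch compares direct rectification with the upsample, rectify, low-pass, downsample chain on a toy sine tone. The filter design (Hamming-windowed sinc, 63 taps, cutoff slightly below the original Nyquist frequency), the circular convolution, and the test tone are assumptions chosen for a compact demonstration; they are not taken from the slides.

```python
import cmath
import math

def lowpass_taps(cutoff, num_taps=63):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def circular_filter(x, taps):
    """Circular convolution, so the periodic test signal shows no edge transients."""
    size = len(x)
    return [sum(h * x[(n - k) % size] for k, h in enumerate(taps))
            for n in range(size)]

def rectify_oversampled(x, factor=4):
    """Upsample by 'factor', half-wave rectify, low-pass, and downsample again."""
    taps = lowpass_taps(0.45 / factor)
    stuffed = []
    for v in x:
        stuffed.append(v * factor)            # gain compensates the zero stuffing
        stuffed.extend([0.0] * (factor - 1))
    upsampled = circular_filter(stuffed, taps)
    rectified = [max(v, 0.0) for v in upsampled]
    band_limited = circular_filter(rectified, taps)
    return band_limited[::factor]

def dft_amplitude(x, k):
    n_samples = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
              for n, v in enumerate(x))
    return 2.0 * abs(acc) / n_samples

N = 64
tone = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]   # tone at bin 5
direct = [max(v, 0.0) for v in tone]
oversampled = rectify_oversampled(tone)

# The 8th harmonic (bin 40) lies above Nyquist (bin 32) and folds to bin 64 - 40 = 24.
ALIAS_BIN = 24
alias_direct = dft_amplitude(direct, ALIAS_BIN)
alias_oversampled = dft_amplitude(oversampled, ALIAS_BIN)
```

In this toy setup the aliased component at bin 24 is much smaller in the oversampled path, while the wanted second harmonic at bin 10 is retained.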
Generation of the Excitation Signal
Application of a cubic characteristic in the time domain:
• Predictor error filtering for extracting the excitation signal
[Figure: predictor error filter]
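A minimal sketch of this step, assuming the predictor coefficients are obtained with the autocorrelation method and the Levinson-Durbin recursion (the slides do not state how the predictor is computed). The predictor error filter removes the spectral envelope, and the cubic characteristic is then applied to the resulting excitation estimate. All function names and the toy signal are illustrative.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return predictor error filter coefficients [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= (1.0 - k * k)
    return a

def prediction_error(x, a):
    """FIR filtering with A(z) = 1 + a1 z^-1 + ...; yields the excitation estimate."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def cubic(x):
    return [v ** 3 for v in x]

# Toy first-order signal: an impulse filtered by 1 / (1 - 0.9 z^-1)
frame = [0.9 ** n for n in range(200)]
coeffs = levinson_durbin(autocorrelation(frame, 1), 1)
excitation = prediction_error(frame, coeffs)     # close to a single impulse
extended_excitation = cubic(excitation)
```

For real speech frames a higher predictor order would be used; order 1 is only enough for this toy signal.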
Generation of the Spectral Envelope
[Figure: block diagram of BWE, repeated]
Generation of the Spectral Envelope
• Extension of spectral envelope.
• Placing formants of estimated envelope where broadband formants are.
Generation of the Spectral Envelope
Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:
Codebook
• „Narrowband“ and „broadband“ codebook trained jointly using envelopes of wideband data and bandlimited counterparts
• Weight codebook entries with inverse distance to input envelope and sum them up (LSF)
• Possibility of including other features than spectral envelope in „narrowband“ codebook using a special distance measure
• Codebook approach as classification stage with post processing, e.g., by a neural network or linear mapping
• Can be implemented taking predecessor and successor into account
Generation of the Spectral Envelope
Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:
Neural network
• Exploit quasi-stationarity of speech by using a memory
• Feeding NN with other features than just spectral envelope
• Various architectures and training algorithms
• Can be used as post processing after codebook classification
Generation of the Spectral Envelope
Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:
Linear mapping
• Can be implemented taking predecessor and successor into account
• Can be used as post processing after codebook classification
Generation of the Spectral Envelope
Codebook:
[Figure: the envelope of the input signal is compared (distance measure) with the entries of the „narrowband“ codebook; the „broadband“ counterparts are output, with the codebook entries weighted by the „inverse“ distance]
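The inverse-distance weighting can be sketched as follows. The Euclidean distance, the small constant that avoids division by zero, and all names are assumptions made for this illustration; the slides leave the distance measure open.

```python
def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def estimate_broadband(nb_envelope, nb_codebook, bb_codebook, eps=1e-12):
    """Weight every broadband entry with the inverse distance between the
    input envelope and the corresponding narrowband entry, then normalize."""
    weights = [1.0 / (euclidean(nb_envelope, entry) + eps)
               for entry in nb_codebook]
    total = sum(weights)
    dim = len(bb_codebook[0])
    return [sum(w * entry[i] for w, entry in zip(weights, bb_codebook)) / total
            for i in range(dim)]

# Toy codebooks with two entries (one narrowband feature, two broadband features)
nb_codebook = [[0.0], [2.0]]
bb_codebook = [[0.0, 10.0], [4.0, 20.0]]
halfway = estimate_broadband([1.0], nb_codebook, bb_codebook)
exact = estimate_broadband([0.0], nb_codebook, bb_codebook)
```

An input halfway between two narrowband entries yields the average of their broadband counterparts, while an input that coincides with an entry yields practically that entry's counterpart alone.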
Generation of the Spectral Envelope
Computation of the output LSFs (with N being the LSF order and M the codebook size, respectively):
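The equation itself is not preserved in this transcript. One plausible form, consistent with the inverse-distance weighting described for the codebook approach, is given here as an assumption: the symbols x (narrowband input envelope), x_m and y_m (m-th narrowband and broadband codebook entries), and d (distance measure) are introduced only for this sketch.

```latex
\hat{y}_i \;=\; \frac{\sum_{m=1}^{M} w_m \, y_{m,i}}{\sum_{m=1}^{M} w_m},
\qquad
w_m \;=\; \frac{1}{d(\mathbf{x}, \mathbf{x}_m)},
\qquad
i = 1, \dots, N
```

Here \hat{y}_i denotes the i-th output LSF. Normalizing by the sum of the weights keeps the output inside the range spanned by the codebook entries.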
Generation of the Spectral Envelope
Training of codebook (LBG-algorithm):
1. Initialising: Compute the centroid for the whole training data.
2. Splitting: Each centroid is split into two nearby vectors by applying a perturbation.
3. Quantization: The whole training data is assigned to the centroids using a certain distance measure, and afterwards the centroids are calculated again. Step 3 is repeated until the result no longer shows any significant changes.
4. If the desired codebook size is reached, abort. Otherwise continue with step 2.
Distance measures:
• Spectral distortion: Minkowski distance with exponent p (p = 1: city block distance, p = 2: Euclidean distance)
• Likelihood ratio distance measure
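The four training steps can be sketched as below. This is a minimal illustration, not the training code behind the slides: it assumes an additive perturbation, a Minkowski distance, a codebook size that is a power of two, and an iteration limit as a safeguard.

```python
def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is the city block, p=2 the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def lbg(training, size, perturbation=0.01, tol=1e-9, p=2, max_iter=100):
    """Train a codebook with 'size' entries (assumed to be a power of two)."""
    codebook = [centroid(training)]                      # step 1: initialising
    while len(codebook) < size:                          # step 4: size check
        # step 2: split every centroid into two nearby vectors
        codebook = [[v + sign * perturbation for v in c]
                    for c in codebook for sign in (+1, -1)]
        previous = float("inf")
        for _ in range(max_iter):                        # step 3: quantization
            clusters = [[] for _ in codebook]
            distortion = 0.0
            for vec in training:
                dists = [minkowski(vec, c, p) for c in codebook]
                best = dists.index(min(dists))
                clusters[best].append(vec)
                distortion += dists[best]
            # recompute centroids; keep the old entry if a cluster is empty
            codebook = [centroid(cl) if cl else c
                        for cl, c in zip(clusters, codebook)]
            if abs(previous - distortion) <= tol:
                break
            previous = distortion
    return codebook

# Toy training data with two obvious clusters
training = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
codebook = sorted(lbg(training, 2))
```

For the joint training mentioned on the codebook slide, each training vector would contain the narrowband and the broadband envelope of the same frame, so that both codebooks stay aligned.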
Generation of the Spectral Envelope
Linear Mapping:
Narrowband input features (LPC, CC, LSF):
Broadband input features (LPC, CC, LSF):
Aim to find mapping matrix:
Optimization criterion:
Leads to optimal mapping matrix:
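The formulas of this slide are not preserved in the transcript. A standard least-squares formulation that matches the listed steps is sketched here; the notation (x(n), y(n), W, X, Y, K training frames) is an assumption and may differ from the original slide.

```latex
% Narrowband and broadband feature vectors of frame n
\mathbf{x}(n) \in \mathbb{R}^{N_x}, \qquad \mathbf{y}(n) \in \mathbb{R}^{N_y}

% Linear mapping
\hat{\mathbf{y}}(n) = \mathbf{W}\,\mathbf{x}(n)

% Optimization criterion over K training frames
\mathbf{W}_{\mathrm{opt}} = \arg\min_{\mathbf{W}} \sum_{n=1}^{K}
  \bigl\lVert \mathbf{y}(n) - \mathbf{W}\,\mathbf{x}(n) \bigr\rVert^{2}

% Closed-form solution with X = [x(1), ..., x(K)] and Y = [y(1), ..., y(K)]
\mathbf{W}_{\mathrm{opt}} = \mathbf{Y}\,\mathbf{X}^{\mathrm{T}}
  \bigl(\mathbf{X}\,\mathbf{X}^{\mathrm{T}}\bigr)^{-1}
```

The closed form follows from setting the gradient of the squared error with respect to W to zero, and it requires X X^T to be invertible.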
Generation of the Spectral Envelope
Linear Mapping as post processing algorithm after codebook classification:
Note that this principle can be applied to other approaches. E.g., one could replace the multiplication with the linear mapping matrix by the processing of a neural network which has been trained for the respective codebook entry.
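A possible structure for this combination is sketched below. It assumes that each codebook entry has its own mapping matrix and that the matrix is applied to the deviation of the input features from the selected narrowband entry; the figure for this slide is not preserved, so this arrangement and all names are assumptions.

```python
def nearest_entry(vec, codebook):
    """Classification: index of the closest codebook entry (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, entry)) for entry in codebook]
    return dists.index(min(dists))

def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def extend_envelope(nb_features, nb_codebook, bb_codebook, mappings):
    """Classify with the narrowband codebook, then refine the broadband
    entry with the mapping matrix that belongs to the selected class."""
    idx = nearest_entry(nb_features, nb_codebook)
    deviation = [f - c for f, c in zip(nb_features, nb_codebook[idx])]
    correction = mat_vec(mappings[idx], deviation)
    return [b + c for b, c in zip(bb_codebook[idx], correction)]

# Toy example: two classes, one narrowband and two broadband features
nb_codebook = [[0.0], [10.0]]
bb_codebook = [[1.0, 2.0], [3.0, 4.0]]
mappings = [[[1.0], [0.0]],     # mapping matrix for class 0
            [[0.0], [2.0]]]     # mapping matrix for class 1
estimate_a = extend_envelope([9.0], nb_codebook, bb_codebook, mappings)
estimate_b = extend_envelope([0.5], nb_codebook, bb_codebook, mappings)
```

Replacing `mat_vec(mappings[idx], ...)` by a class-specific neural network would give the variant mentioned in the note above.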
Power Adjustment
[Figure: block diagram of BWE, repeated]
Power Adjustment
Power comparison:
Computation of the gain from the ratio of the power of the extended signal to the power of the input signal within the telephone band
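A minimal sketch of such a gain computation, under several assumptions: both signals are available at the same sampling rate, the telephone band is taken as 300 Hz to 3400 Hz, the band power is measured with a plain DFT, and the gain is the square root of the power ratio so that the extended signal matches the input signal inside the telephone band. None of these details are specified on the slide.

```python
import cmath
import math

def band_power(x, sample_rate, f_low, f_high):
    """Sum of squared DFT magnitudes of all bins between f_low and f_high."""
    n = len(x)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_low <= freq <= f_high:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(x))
            power += abs(coeff) ** 2
    return power

def power_adjustment_gain(extended, reference, sample_rate,
                          f_low=300.0, f_high=3400.0):
    """Gain that scales 'extended' to the power of 'reference' within the band."""
    p_ext = band_power(extended, sample_rate, f_low, f_high)
    p_ref = band_power(reference, sample_rate, f_low, f_high)
    return math.sqrt(p_ref / p_ext) if p_ext > 0.0 else 1.0

# Toy frame: the extended signal is too quiet by a factor of 2 in the telephone
# band and additionally contains a 6 kHz component outside of it.
FS = 16000
FRAME = 160
reference = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(FRAME)]
extended = [0.5 * r + 0.3 * math.sin(2 * math.pi * 6000 * i / FS)
            for i, r in enumerate(reference)]
gain = power_adjustment_gain(extended, reference, FS)
adjusted = [gain * v for v in extended]
```

Because only the telephone band enters the comparison, the newly generated components outside that band do not influence the gain.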
Current Results
Setup used to produce results:
Database
• TIMIT processed with WM NetSim tool (training, English)
-Phone filter / GSM / phone filter
Algorithm
• Excitation signal
-Lower part extended using half-wave rectification
-Higher part extended using half-wave rectification
• Spectral envelope
-Codebook classification using 64 entries
-Post processing with linear mapping
Current Results
Audio samples (telephone limited and extended) for four speakers: Female 1, Female 2, Male 1, Male 2
Outlook
Outlook on future work:
Integration of additional features into codebook training
• Pitch information
• Information on „voicedness“
Add „comfort-noise“
Training of neural network
• Using additional features
• In combination with codebook