Basics of Hidden Markov Models: Basics of HMM-based speech synthesis
Transcript of a 20-page slide presentation (Advanced Signal Processing 2, SE; Andrea Sereinig, Graz, 2008)

Page 1: Basics of HMM-based speech synthesis (Title and Contents)

Advanced Signal Processing 2, SE

Professor Horst Cerjak, 19.12.2005

Andrea Sereinig, Graz, 2008-04-29

Basics of Hidden Markov Models

Basics of HMM-based speech synthesis

Contents

• Speech Parameter Generation Using Dynamic Features

• Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis

• Multi-Space Probability Distribution HMM

• Simultaneous Modelling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis


Page 2: Speech Parameter Generation using Dynamic Features (1)


Speech Parameter Generation using Dynamic Features (1)

1. Problem

O = (o1, o2, …, oT) … vector sequence of speech parameters
q = (q1, q2, …, qT) … state sequence of an HMM λ
ot … vector of speech parameters at time t
ct … static feature vector (e.g. cepstral coefficients)
Δct … dynamic feature vector (e.g. delta cepstral coefficients)

Determine the parameter sequence c = (c1, c2, …, cT) that maximizes P(O | q, λ). A reconstructed sketch of these expressions follows below.
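A reconstructed sketch of the expressions above in LaTeX notation, based on the definitions given on this slide (the regression window w(τ) used for the delta features is an assumption, not taken from the slide):

\[ O = (o_1, o_2, \ldots, o_T), \qquad q = (q_1, q_2, \ldots, q_T) \]
\[ o_t = \begin{bmatrix} c_t \\ \Delta c_t \end{bmatrix}, \qquad \Delta c_t = \sum_{\tau=-L}^{L} w(\tau)\, c_{t+\tau} \]
\[ \hat{c} = \arg\max_{c}\ P(O \mid q, \lambda) \]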


Page 3: Speech Parameter Generation using Dynamic Features (2)


Speech Parameter Generation using Dynamic Features (2)

2. Solution to the Problem

For a given q, maximizing P(O, q | λ) with respect to c is equivalent to maximizing P(O | q, λ) with respect to c, since P(q | λ) does not depend on O.

The output distribution of each state is a single-mixture Gaussian distribution.

By setting the partial derivative of log P(O | q, λ) with respect to c to zero, we obtain a set of linear equations; a reconstructed sketch follows below.
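A reconstructed sketch of this derivation in the compact matrix form of [2] and [3]: writing O = Wc, where W is the matrix that appends the dynamic features to the static sequence c, and stacking the state means and covariances along the given q,

\[ \log P(O \mid q, \lambda) = -\tfrac{1}{2}\,(Wc - \mu)^\top U^{-1} (Wc - \mu) + \mathrm{const}, \]
\[ \frac{\partial \log P(O \mid q, \lambda)}{\partial c} = 0 \;\Longrightarrow\; W^\top U^{-1} W\, c = W^\top U^{-1} \mu, \]
\[ \mu = \big[ \mu_{q_1}^\top, \ldots, \mu_{q_T}^\top \big]^\top, \qquad U = \mathrm{diag}\big[ U_{q_1}, \ldots, U_{q_T} \big], \]

where \( \mu_{q_t} \) and \( U_{q_t} \) are the mean vector and covariance matrix of the single Gaussian of state \( q_t \), covering both the static and the delta part (as listed on the next slide).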


Page 4: Speech Parameter Generation using Dynamic Features (3)


Speech Parameter Generation using Dynamic Features (3)

2. Solution to the Problem

… mean vector of ct
… covariance matrix of ct
… mean vector of Δct
… covariance matrix of Δct

To obtain the optimal q and c, this set of equations has to be solved for every possible state sequence; a numerical sketch for a single given state sequence follows below.
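A minimal numpy sketch of solving this set of equations for one given state sequence. The diagonal covariances and the delta window (-0.5, 0, 0.5) are illustrative assumptions, not taken from the slides:

import numpy as np

def build_window_matrix(T, dim, delta_win=(-0.5, 0.0, 0.5)):
    """Build W so that W @ c stacks [c_t; delta c_t] for every frame t."""
    L = len(delta_win) // 2
    W = np.zeros((2 * T * dim, T * dim))
    for t in range(T):
        # static part: c_t is copied unchanged
        W[2 * t * dim:(2 * t + 1) * dim, t * dim:(t + 1) * dim] = np.eye(dim)
        # dynamic part: delta c_t as a regression over neighbouring frames
        for k, w in enumerate(delta_win):
            tau = t + k - L
            if 0 <= tau < T and w != 0.0:
                W[(2 * t + 1) * dim:(2 * t + 2) * dim,
                  tau * dim:(tau + 1) * dim] += w * np.eye(dim)
    return W

def generate_static_features(means, variances):
    """means/variances: (T, 2*dim) arrays with the [static; delta] mean and
    diagonal covariance of the state visited at each frame along q.
    Returns the static parameter sequence c as a (T, dim) array."""
    T, two_dim = means.shape
    dim = two_dim // 2
    W = build_window_matrix(T, dim)
    mu = means.reshape(-1)               # stacked mean vector
    u_inv = 1.0 / variances.reshape(-1)  # stacked diagonal precision
    A = W.T @ (u_inv[:, None] * W)       # W^T U^-1 W
    b = W.T @ (u_inv * mu)               # W^T U^-1 mu
    return np.linalg.solve(A, b).reshape(T, dim)

The dense np.linalg.solve is only for clarity; the coefficient matrix is banded, so in practice a banded Cholesky factorization or the recursive algorithm of [1] is used.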


Page 5: Speech Parameter Generation using Dynamic Features (3)


Speech Parameter Generation using Dynamic Features (3)


2. Solution to the Problem

Page 6: Speech Parameter Generation using Dynamic Features (4)


Speech Parameter Generation using Dynamic Features (4)


3. Example

[1], [2]

Page 7: Speech Parameter Generation Algorithms (1)


Speech Parameter Generation Algorithms (1)

Previously: the state sequence was assumed to be known.

Now: (part of) the state sequence is hidden.

Goal: determine the speech parameter vector sequence O so that P(O | λ) is maximized with respect to O, where the state and mixture sequence is hidden and (q, i) indicates the i-th mixture of state q.

ot … speech parameter vector

A reconstructed sketch of these expressions follows below.
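A reconstructed sketch, writing Q for the hidden state and mixture sequence (the symbol Q is an assumption; the slide's own notation is not recoverable from the transcript):

\[ O = (o_1, o_2, \ldots, o_T), \qquad Q = \big( (q_1, i_1), (q_2, i_2), \ldots, (q_T, i_T) \big) \]
\[ \hat{O} = \arg\max_{O}\ P(O \mid \lambda) = \arg\max_{O} \sum_{\text{all } Q} P(O, Q \mid \lambda) \]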


Page 8: Speech Parameter Generation Algorithms (2)


Speech Parameter Generation Algorithms (2)

1. Derived Algorithm (Based on the EM Algorithm)

Find a critical point of the likelihood function P(O | λ).

Q(O, O') … auxiliary function of the current and the new parameter vector sequences O and O', respectively


… occupancy probability
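A sketch of the two quantities whose symbols are missing above, written in the standard EM form (cf. [3]):

\[ Q(O, O') = \sum_{\text{all } Q} P(O, Q \mid \lambda)\, \log P(O', Q \mid \lambda) \]
\[ \gamma_t(q, i) = P\big( q_t = (q, i) \mid O, \lambda \big) \qquad \text{(occupancy probability)} \]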

Page 9: Speech Parameter Generation Algorithms (3)


Speech Parameter Generation Algorithms (3)


1. Derived Algorithm

Under the condition O' = WC', the C' that maximizes Q(O, O') is given by a set of linear equations (a reconstructed sketch follows below).

These equations are solved with a recursive algorithm for dealing with dynamic features.
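A reconstructed sketch of this set of equations in the occupancy-weighted form of [3] (a sketch, not a verbatim reproduction of the slide):

\[ W^\top \bar{U}^{-1} W\, C' = W^\top \bar{U}^{-1} \bar{\mu}, \]

where, for every frame \( t \),

\[ \bar{U}_t^{-1} = \sum_{q, i} \gamma_t(q, i)\, U_{q,i}^{-1}, \qquad \bar{U}_t^{-1} \bar{\mu}_t = \sum_{q, i} \gamma_t(q, i)\, U_{q,i}^{-1} \mu_{q,i}. \]

Because the occupancies \( \gamma_t(q, i) \) depend on O, the procedure is iterated: generate C', recompute the occupancies, and repeat until convergence.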

Page 10: Speech Parameter Generation Algorithms (4)


Speech Parameter Generation Algorithms (4)

2. Example

• 5-state left-to-right HMMs
• Speech sampled at 16 kHz and windowed by a 25.6 ms Blackman window with 5 ms shift
• Mel-cepstral coefficients obtained by mel-cepstral analysis
• Sentence fragment “kiNzokuhiroo”


[3]

Page 11: Multi-Space Probability Distribution HMM (1)


Multi-Space Probability Distribution HMM (1)

1. Problem: We can apply neither conventional discrete HMMs nor continuous HMMs to observations which consist of both continuous values and discrete symbols [4].

Example observation: o1 consists of a set of space indices X1 = {1, 2, G} and a three-dimensional vector x1 ∈ R³; x1 is drawn from one of the spaces Ω1, Ω2, ΩG ⊂ R³. The pdf for x1 is w1 N1(x1) + w2 N2(x1) + wG NG(x1).


[4]
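In general form, the multi-space output distribution sums the pdfs of the spaces whose indices appear in the observed index set (a sketch following [4]; for a zero-dimensional space the density is defined to be 1, so that term reduces to its weight):

\[ o_t = (X_t, x_t), \qquad b(o_t) = \sum_{g \in X_t} w_g\, \mathcal{N}_g(x_t), \qquad \sum_{g=1}^{G} w_g = 1 \]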

Page 12: Multi-Space Probability Distribution HMM (2)


Multi-Space Probability Distribution HMM (2)

2. Example: A man fishing in a pond


Sample space Ω:

• Ω1 = R²: 2D space for length and height of red fish
• Ω2 = R²: 2D space for length and height of blue fish
• Ω3 = R¹: 1D space for diameters of tortoises
• Ω4 = R⁰: 0D space for articles of junk

The weights w1 to w4 are determined by the ratio of red fish, blue fish, tortoises and junk in the pond. N1 and N2 are 2D pdfs for the lengths and heights of the fish; N3 is a 1D pdf for the diameter of the tortoises.

When the man catches a red fish, the observation o = ({1}, x) is made. The same catch by night, when red and blue fish cannot be distinguished, gives o = ({1, 2}, x). A toy numerical sketch of this example follows below.
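A toy numerical sketch of the pond example. The weights, means and covariances below are invented for illustration; only the structure of the observations follows the slide:

import numpy as np
from scipy.stats import multivariate_normal

# One entry per space: its weight and, for spaces of dimension > 0, a Gaussian pdf.
# The zero-dimensional "junk" space has no pdf; its density is defined to be 1.
spaces = {
    1: {"w": 0.4, "pdf": multivariate_normal([20.0, 6.0], np.diag([4.0, 1.0]))},  # red fish, 2D
    2: {"w": 0.3, "pdf": multivariate_normal([25.0, 8.0], np.diag([9.0, 2.0]))},  # blue fish, 2D
    3: {"w": 0.2, "pdf": multivariate_normal([15.0], [[4.0]])},                   # tortoises, 1D
    4: {"w": 0.1, "pdf": None},                                                   # junk, 0D
}

def observation_prob(index_set, x=None):
    """b(o) for o = (index_set, x): weighted sum over the pdfs of the listed spaces."""
    total = 0.0
    for g in index_set:
        w, pdf = spaces[g]["w"], spaces[g]["pdf"]
        total += w * (1.0 if pdf is None else pdf.pdf(x))
    return total

print(observation_prob({1}, [21.0, 6.5]))     # daytime: a red fish is caught, o = ({1}, x)
print(observation_prob({1, 2}, [21.0, 6.5]))  # by night red and blue cannot be told apart, o = ({1, 2}, x)
print(observation_prob({4}))                  # a piece of junk, nothing to measure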

Page 13: Multi-Space Probability Distribution HMM (3)


Multi-Space Probability Distribution HMM (3)

3. Algorithm

• Each state i has G pdfs Ni1, …, NiG and corresponding weights wi1, …, wiG.
• The observation probability P(O | λ) is calculated with the forward-backward algorithm (a minimal sketch of the forward pass follows below).
• We then need to maximize the observation likelihood P(O | λ) over all parameters; the re-estimation formulas for maximum-likelihood estimation are derived in analogy to the Baum-Welch algorithm.


[4]
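A minimal sketch of the forward pass that yields P(O | λ). Here obs_prob(i, o) stands for the state output probability b_i(o) of state i (for an MSD-HMM, a weighted sum over the space pdfs of that state, as in the pond example above); all names are illustrative:

import numpy as np

def forward_likelihood(pi, A, obs_prob, observations):
    """pi: initial state probabilities (N,); A: transition matrix (N, N);
    obs_prob(i, o): output probability of observation o in state i.
    Returns P(O | lambda). No scaling or log-domain arithmetic is used,
    so this underflows for long sequences; real implementations scale."""
    N = len(pi)
    alpha = np.array([pi[i] * obs_prob(i, observations[0]) for i in range(N)])
    for o in observations[1:]:
        alpha = np.array([np.dot(alpha, A[:, i]) * obs_prob(i, o) for i in range(N)])
    return float(alpha.sum())

The re-estimation (Baum-Welch style) step additionally needs the backward probabilities; it then updates the transition probabilities, the space weights and the Gaussian parameters of each space.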

Page 14: Simultaneous modelling of Spectrum, Pitch and Duration (1)


Simultaneous modelling of Spectrum, Pitch and Duration (1)

1. Simultaneous Modelling

• Pitch patterns are modelled by an MSD-HMM (the observation sequence is composed of continuous values and discrete symbols).
• The spectrum is modelled by continuous probability distributions.
• State duration densities are modelled by single Gaussian distributions (the dimension of a state duration density equals the number of HMM states).

An illustrative sketch of such a multi-stream model follows below.


[5]
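An illustrative sketch (not the actual HTS data structures) of how one model can carry the three kinds of distributions listed above:

from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Gaussian:
    mean: np.ndarray
    var: np.ndarray                # diagonal covariance

@dataclass
class MSDPitchStream:
    """Pitch (F0) stream: a continuous voiced space with a Gaussian pdf and a
    zero-dimensional unvoiced space that only carries the remaining weight."""
    w_voiced: float
    voiced_pdf: Gaussian

    @property
    def w_unvoiced(self) -> float:
        return 1.0 - self.w_voiced

@dataclass
class State:
    spectrum: Gaussian             # mel-cepstral coefficients (and their deltas)
    pitch: MSDPitchStream          # modelled by the MSD-HMM

@dataclass
class PhoneModel:
    states: List[State]            # e.g. 5 emitting states
    duration: Gaussian             # single Gaussian; its dimension equals the number of states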

Page 15: Simultaneous modelling of Spectrum, Pitch and Duration (2)


Simultaneous modelling of Spectrum, Pitch and Duration (2)

2. Context Dependent Model

Many contextual factors are taken into account, such as:

• position of the breath group in the sentence
• position of the current phoneme in the current accentual phrase
• {preceding, current, succeeding} part of speech
• {preceding, current, succeeding} phoneme
• etc.

Decision-tree based context clustering is applied because:

• as the number of contextual factors increases, their combinations increase exponentially, so the model parameters cannot be estimated with sufficient accuracy from limited training data
• it is impossible to prepare a speech database which includes all combinations of contextual factors


Page 16: Simultaneous modelling of Spectrum, Pitch and Duration (3)


Simultaneous modelling of Spectrum, Pitch and Duration (3)

2. Context Dependent Model (continued)

Spectrum, pitch and duration have their own influential contextual factors, so the distributions of the respective parameters are clustered independently.


[5]

Page 17: Simultaneous modelling of Spectrum, Pitch and Duration (4)


Simultaneous modelling of Spectrum, Pitch and Duration (4)

3. Text-to-Speech Synthesis System

• Arbitrary text is converted to a context-based label sequence.
• A sentence HMM is constructed according to the label sequence.
• State durations are determined so as to maximize the likelihood of the state duration densities.
• A sequence of mel-cepstral coefficients and pitch values is generated.
• Speech is synthesized directly by the MLSA filter.

A rough pipeline sketch follows below.


[5]

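A rough orchestration sketch of these run-time steps. Each callable argument stands for one stage described above and is supplied by the caller; none of the names refer to an actual API:

def synthesize(text, text_to_labels, build_sentence_hmm,
               determine_durations, generate_parameters, mlsa_synthesis):
    """Hypothetical outline of the HMM-based text-to-speech run-time procedure."""
    labels = text_to_labels(text)                   # context-based label sequence
    hmm = build_sentence_hmm(labels)                # sentence HMM from the labels
    durations = determine_durations(hmm)            # max-likelihood state durations
    mcep, f0 = generate_parameters(hmm, durations)  # mel-cepstral coefficients + pitch
    return mlsa_synthesis(mcep, f0)                 # waveform via the MLSA filter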

Page 18: References


References

[1] K. Tokuda, T. Kobayashi and S. Imai, Speech Parameter Generation from HMM Using Dynamic Features, Proc. ICASSP, 1995

[2] http://hts.sp.nitech.ac.jp/?Publications – see ‘Attach file’ tokuda_TTSworkshop2002.pdf

[3] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi and T. Kitamura, Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis, Proc. ICASSP, 2000

[4] K. Tokuda, T. Masuko, N. Miyazaki and T. Kobayashi, Multi-Space Probability Distribution HMM, IEICE Trans. Inf. & Syst., 2000

Page 19: References


References

[5] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi and T. Kitamura, Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis, Proc. EUROSPEECH, 1999

Page 20: Thanks


Thank you for your attention.