Basics of Hidden Markov Models: Basics of HMM-based speech synthesis
Transcript of a 20-page slide presentation (Advanced Signal Processing 2, SE; Andrea Sereinig, Graz, 2008)

Page 1: Basics of HMM-based speech synthesis (Title and Contents)

Advanced Signal Processing 2, SE

Professor Horst Cerjak, 19.12.2005

Andrea Sereinig, Graz, 2008-04-29

Basics of Hidden Markov Models

Basics of HMM-based speech synthesis

Contents

• Speech Parameter Generation Using Dynamic Features

• Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis

• Multi-Space Probability Distribution HMM

• Simultaneous Modelling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis


Page 2: Speech Parameter Generation using Dynamic Features (1)


Speech Parameter Generation using Dynamic Features (1)

1. Problem

O = (o1, o2, …, oT) … vector sequence of speech parameters
q = (q1, q2, …, qT) … state sequence of an HMM λ
ot … vector of speech parameters at time t
ct … static feature vector (e.g. cepstral coefficients)
Δct … dynamic feature vector (e.g. delta cepstral coefficients)

Determine the parameter sequence c = (c1, c2, …, cT) that maximizes P(O | q, λ). A reconstructed sketch of these expressions follows below.
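A reconstructed sketch of the expressions above in LaTeX notation, based on the definitions given on this slide (the regression window w(τ) used for the delta features is an assumption, not taken from the slide):

\[ O = (o_1, o_2, \ldots, o_T), \qquad q = (q_1, q_2, \ldots, q_T) \]
\[ o_t = \begin{bmatrix} c_t \\ \Delta c_t \end{bmatrix}, \qquad \Delta c_t = \sum_{\tau=-L}^{L} w(\tau)\, c_{t+\tau} \]
\[ \hat{c} = \arg\max_{c}\ P(O \mid q, \lambda) \]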


Page 3: Speech Parameter Generation using Dynamic Features (2)


Speech Parameter Generation using Dynamic Features (2)

2. Solution to the Problem

For a given q, maximizing P(O, q | λ) with respect to c is equivalent to maximizing P(O | q, λ) with respect to c, since P(q | λ) does not depend on O.

The output distribution of each state is a single-mixture Gaussian distribution.

By setting the partial derivative of log P(O | q, λ) with respect to c to zero, we obtain a set of linear equations; a reconstructed sketch follows below.
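A reconstructed sketch of this derivation in the compact matrix form of [2] and [3]: writing O = Wc, where W is the matrix that appends the dynamic features to the static sequence c, and stacking the state means and covariances along the given q,

\[ \log P(O \mid q, \lambda) = -\tfrac{1}{2}\,(Wc - \mu)^\top U^{-1} (Wc - \mu) + \mathrm{const}, \]
\[ \frac{\partial \log P(O \mid q, \lambda)}{\partial c} = 0 \;\Longrightarrow\; W^\top U^{-1} W\, c = W^\top U^{-1} \mu, \]
\[ \mu = \big[ \mu_{q_1}^\top, \ldots, \mu_{q_T}^\top \big]^\top, \qquad U = \mathrm{diag}\big[ U_{q_1}, \ldots, U_{q_T} \big], \]

where \( \mu_{q_t} \) and \( U_{q_t} \) are the mean vector and covariance matrix of the single Gaussian of state \( q_t \), covering both the static and the delta part (as listed on the next slide).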


Page 4: Speech Parameter Generation using Dynamic Features (3)


Speech Parameter Generation using Dynamic Features (3)

2. Solution to the Problem

… mean vector of ct
… covariance matrix of ct
… mean vector of Δct
… covariance matrix of Δct

To obtain the optimal q and c, this set of equations has to be solved for every possible state sequence; a numerical sketch for a single given state sequence follows below.
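A minimal numpy sketch of solving this set of equations for one given state sequence. The diagonal covariances and the delta window (-0.5, 0, 0.5) are illustrative assumptions, not taken from the slides:

import numpy as np

def build_window_matrix(T, dim, delta_win=(-0.5, 0.0, 0.5)):
    """Build W so that W @ c stacks [c_t; delta c_t] for every frame t."""
    L = len(delta_win) // 2
    W = np.zeros((2 * T * dim, T * dim))
    for t in range(T):
        # static part: c_t is copied unchanged
        W[2 * t * dim:(2 * t + 1) * dim, t * dim:(t + 1) * dim] = np.eye(dim)
        # dynamic part: delta c_t as a regression over neighbouring frames
        for k, w in enumerate(delta_win):
            tau = t + k - L
            if 0 <= tau < T and w != 0.0:
                W[(2 * t + 1) * dim:(2 * t + 2) * dim,
                  tau * dim:(tau + 1) * dim] += w * np.eye(dim)
    return W

def generate_static_features(means, variances):
    """means/variances: (T, 2*dim) arrays with the [static; delta] mean and
    diagonal covariance of the state visited at each frame along q.
    Returns the static parameter sequence c as a (T, dim) array."""
    T, two_dim = means.shape
    dim = two_dim // 2
    W = build_window_matrix(T, dim)
    mu = means.reshape(-1)               # stacked mean vector
    u_inv = 1.0 / variances.reshape(-1)  # stacked diagonal precision
    A = W.T @ (u_inv[:, None] * W)       # W^T U^-1 W
    b = W.T @ (u_inv * mu)               # W^T U^-1 mu
    return np.linalg.solve(A, b).reshape(T, dim)

The dense np.linalg.solve is only for clarity; the coefficient matrix is banded, so in practice a banded Cholesky factorization or the recursive algorithm of [1] is used.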


Page 5: Speech Parameter Generation using Dynamic Features (3)


Speech Parameter Generation using Dynamic Features (3)


2. Solution to the Problem

Page 6: Speech Parameter Generation using Dynamic Features (4)


Speech Parameter Generation using Dynamic Features (4)


3. Example

[1], [2]

Page 7: Speech Parameter Generation Algorithms (1)


Speech Parameter Generation Algorithms (1)

Previously: the state sequence was assumed to be known.

Now: (part of) the state sequence is hidden.

Goal: determine the speech parameter vector sequence O so that P(O | λ) is maximized with respect to O, where the state and mixture sequence is hidden and (q, i) indicates the i-th mixture of state q.

ot … speech parameter vector

A reconstructed sketch of these expressions follows below.
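A reconstructed sketch, writing Q for the hidden state and mixture sequence (the symbol Q is an assumption; the slide's own notation is not recoverable from the transcript):

\[ O = (o_1, o_2, \ldots, o_T), \qquad Q = \big( (q_1, i_1), (q_2, i_2), \ldots, (q_T, i_T) \big) \]
\[ \hat{O} = \arg\max_{O}\ P(O \mid \lambda) = \arg\max_{O} \sum_{\text{all } Q} P(O, Q \mid \lambda) \]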


Page 8: Speech Parameter Generation Algorithms (2)


Speech Parameter Generation Algorithms (2)

1. Derived Algorithm (Based on the EM Algorithm)

Find a critical point of the likelihood function P(O | λ).

Q(O, O') … auxiliary function of the current and the new parameter vector sequences O and O', respectively


… occupancy probability
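A sketch of the two quantities whose symbols are missing above, written in the standard EM form (cf. [3]):

\[ Q(O, O') = \sum_{\text{all } Q} P(O, Q \mid \lambda)\, \log P(O', Q \mid \lambda) \]
\[ \gamma_t(q, i) = P\big( q_t = (q, i) \mid O, \lambda \big) \qquad \text{(occupancy probability)} \]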

Page 9: Speech Parameter Generation Algorithms (3)


Speech Parameter Generation Algorithms (3)


1. Derived Algorithm

Under the condition O' = WC', the C' that maximizes Q(O, O') is given by a set of linear equations (a reconstructed sketch follows below).

These equations are solved with a recursive algorithm for dealing with dynamic features.
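A reconstructed sketch of this set of equations in the occupancy-weighted form of [3] (a sketch, not a verbatim reproduction of the slide):

\[ W^\top \bar{U}^{-1} W\, C' = W^\top \bar{U}^{-1} \bar{\mu}, \]

where, for every frame \( t \),

\[ \bar{U}_t^{-1} = \sum_{q, i} \gamma_t(q, i)\, U_{q,i}^{-1}, \qquad \bar{U}_t^{-1} \bar{\mu}_t = \sum_{q, i} \gamma_t(q, i)\, U_{q,i}^{-1} \mu_{q,i}. \]

Because the occupancies \( \gamma_t(q, i) \) depend on O, the procedure is iterated: generate C', recompute the occupancies, and repeat until convergence.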

Page 10: Speech Parameter Generation Algorithms (4)


Speech Parameter Generation Algorithms (4)

2. Example

• 5-state left-to-right HMMs
• Speech sampled at 16 kHz and windowed by a 25.6 ms Blackman window with 5 ms shift
• Mel-cepstral coefficients obtained by mel-cepstral analysis
• Sentence fragment “kiNzokuhiroo”


[3]

Page 11: Multi-Space Probability Distribution HMM (1)


Multi-Space Probability Distribution HMM (1)

1. Problem: We can apply neither conventional discrete HMMs nor continuous HMMs to observations which consist of both continuous values and discrete symbols [4].

Example observation: o1 consists of a set of space indices X1 = {1, 2, G} and a three-dimensional vector x1 ∈ R³; x1 is drawn from one of the spaces Ω1, Ω2, ΩG ⊂ R³. The pdf for x1 is w1 N1(x1) + w2 N2(x1) + wG NG(x1).


[4]
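In general form, the multi-space output distribution sums the pdfs of the spaces whose indices appear in the observed index set (a sketch following [4]; for a zero-dimensional space the density is defined to be 1, so that term reduces to its weight):

\[ o_t = (X_t, x_t), \qquad b(o_t) = \sum_{g \in X_t} w_g\, \mathcal{N}_g(x_t), \qquad \sum_{g=1}^{G} w_g = 1 \]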

Page 12: Multi-Space Probability Distribution HMM (2)


Multi-Space Probability Distribution HMM (2)

2. Example: A man fishing in a pond


Sample space Ω:

• Ω1 = R²: 2D space for length and height of red fish
• Ω2 = R²: 2D space for length and height of blue fish
• Ω3 = R¹: 1D space for diameters of tortoises
• Ω4 = R⁰: 0D space for articles of junk

The weights w1 to w4 are determined by the ratio of red fish, blue fish, tortoises and junk in the pond. N1 and N2 are 2D pdfs for the lengths and heights of the fish; N3 is a 1D pdf for the diameter of the tortoises.

When the man catches a red fish, the observation o = ({1}, x) is made. The same catch by night, when red and blue fish cannot be distinguished, gives o = ({1, 2}, x). A toy numerical sketch of this example follows below.
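A toy numerical sketch of the pond example. The weights, means and covariances below are invented for illustration; only the structure of the observations follows the slide:

import numpy as np
from scipy.stats import multivariate_normal

# One entry per space: its weight and, for spaces of dimension > 0, a Gaussian pdf.
# The zero-dimensional "junk" space has no pdf; its density is defined to be 1.
spaces = {
    1: {"w": 0.4, "pdf": multivariate_normal([20.0, 6.0], np.diag([4.0, 1.0]))},  # red fish, 2D
    2: {"w": 0.3, "pdf": multivariate_normal([25.0, 8.0], np.diag([9.0, 2.0]))},  # blue fish, 2D
    3: {"w": 0.2, "pdf": multivariate_normal([15.0], [[4.0]])},                   # tortoises, 1D
    4: {"w": 0.1, "pdf": None},                                                   # junk, 0D
}

def observation_prob(index_set, x=None):
    """b(o) for o = (index_set, x): weighted sum over the pdfs of the listed spaces."""
    total = 0.0
    for g in index_set:
        w, pdf = spaces[g]["w"], spaces[g]["pdf"]
        total += w * (1.0 if pdf is None else pdf.pdf(x))
    return total

print(observation_prob({1}, [21.0, 6.5]))     # daytime: a red fish is caught, o = ({1}, x)
print(observation_prob({1, 2}, [21.0, 6.5]))  # by night red and blue cannot be told apart, o = ({1, 2}, x)
print(observation_prob({4}))                  # a piece of junk, nothing to measure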

Page 13: Multi-Space Probability Distribution HMM (3)


Multi-Space Probability Distribution HMM (3)

3. Algorithm

• Each state i has G pdfs Ni1, …, NiG and corresponding weights wi1, …, wiG.
• The observation probability P(O | λ) is calculated with the forward-backward algorithm (a minimal sketch of the forward pass follows below).
• We then need to maximize the observation likelihood P(O | λ) over all parameters; the re-estimation formulas for maximum-likelihood estimation are derived in analogy to the Baum-Welch algorithm.


[4]
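A minimal sketch of the forward pass that yields P(O | λ). Here obs_prob(i, o) stands for the state output probability b_i(o) of state i (for an MSD-HMM, a weighted sum over the space pdfs of that state, as in the pond example above); all names are illustrative:

import numpy as np

def forward_likelihood(pi, A, obs_prob, observations):
    """pi: initial state probabilities (N,); A: transition matrix (N, N);
    obs_prob(i, o): output probability of observation o in state i.
    Returns P(O | lambda). No scaling or log-domain arithmetic is used,
    so this underflows for long sequences; real implementations scale."""
    N = len(pi)
    alpha = np.array([pi[i] * obs_prob(i, observations[0]) for i in range(N)])
    for o in observations[1:]:
        alpha = np.array([np.dot(alpha, A[:, i]) * obs_prob(i, o) for i in range(N)])
    return float(alpha.sum())

The re-estimation (Baum-Welch style) step additionally needs the backward probabilities; it then updates the transition probabilities, the space weights and the Gaussian parameters of each space.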

Page 14: Simultaneous modelling of Spectrum, Pitch and Duration (1)


Simultaneous modelling of Spectrum, Pitch and Duration (1)

1. Simultaneous Modelling

• Pitch patterns are modelled by an MSD-HMM (the observation sequence is composed of continuous values and discrete symbols).
• The spectrum is modelled by continuous probability distributions.
• State duration densities are modelled by single Gaussian distributions (the dimension of a state duration density equals the number of HMM states).

An illustrative sketch of such a multi-stream model follows below.


[5]
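An illustrative sketch (not the actual HTS data structures) of how one model can carry the three kinds of distributions listed above:

from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Gaussian:
    mean: np.ndarray
    var: np.ndarray                # diagonal covariance

@dataclass
class MSDPitchStream:
    """Pitch (F0) stream: a continuous voiced space with a Gaussian pdf and a
    zero-dimensional unvoiced space that only carries the remaining weight."""
    w_voiced: float
    voiced_pdf: Gaussian

    @property
    def w_unvoiced(self) -> float:
        return 1.0 - self.w_voiced

@dataclass
class State:
    spectrum: Gaussian             # mel-cepstral coefficients (and their deltas)
    pitch: MSDPitchStream          # modelled by the MSD-HMM

@dataclass
class PhoneModel:
    states: List[State]            # e.g. 5 emitting states
    duration: Gaussian             # single Gaussian; its dimension equals the number of states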

Page 15: Simultaneous modelling of Spectrum, Pitch and Duration (2)


Simultaneous modelling of Spectrum, Pitch and Duration (2)

2. Context Dependent Model

Many contextual factors are taken into account, such as:

• position of the breath group in the sentence
• position of the current phoneme in the current accentual phrase
• {preceding, current, succeeding} part of speech
• {preceding, current, succeeding} phoneme
• etc.

Decision-tree based context clustering is applied because:

• as the number of contextual factors increases, their combinations increase exponentially, so the model parameters cannot be estimated with sufficient accuracy from limited training data
• it is impossible to prepare a speech database which includes all combinations of contextual factors


Page 16: Simultaneous modelling of Spectrum, Pitch and Duration (3)


Simultaneous modelling of Spectrum, Pitch and Duration (3)

2. Context Dependent Model (continued)

Spectrum, pitch and duration have their own influential contextual factors, so the distributions of the respective parameters are clustered independently.


[5]

Page 17: Simultaneous modelling of Spectrum, Pitch and Duration (4)


Simultaneous modelling of Spectrum, Pitch and Duration (4)

3. Text-to-Speech Synthesis System

• Arbitrary text is converted to a context-based label sequence.
• A sentence HMM is constructed according to the label sequence.
• State durations are determined so as to maximize the likelihood of the state duration densities.
• A sequence of mel-cepstral coefficients and pitch values is generated.
• Speech is synthesized directly by the MLSA filter.

A rough pipeline sketch follows below.


[5]

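A rough orchestration sketch of these run-time steps. Each callable argument stands for one stage described above and is supplied by the caller; none of the names refer to an actual API:

def synthesize(text, text_to_labels, build_sentence_hmm,
               determine_durations, generate_parameters, mlsa_synthesis):
    """Hypothetical outline of the HMM-based text-to-speech run-time procedure."""
    labels = text_to_labels(text)                   # context-based label sequence
    hmm = build_sentence_hmm(labels)                # sentence HMM from the labels
    durations = determine_durations(hmm)            # max-likelihood state durations
    mcep, f0 = generate_parameters(hmm, durations)  # mel-cepstral coefficients + pitch
    return mlsa_synthesis(mcep, f0)                 # waveform via the MLSA filter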

Page 18: References


References

[1] K. Tokuda, T. Kobayashi and S. Imai, Speech Parameter Generation from HMM Using Dynamic Features, Proc. ICASSP, 1995

[2] http://hts.sp.nitech.ac.jp/?Publications – see ‘Attach file’ tokuda_TTSworkshop2002.pdf

[3] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi and T. Kitamura, Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis, Proc. ICASSP, 2000

[4] K. Tokuda, T. Masuko, N. Miyazaki and T. Kobayashi, Multi-Space Probability Distribution HMM, IEICE Trans. Inf. & Syst., 2000

Page 19: References


References

[5] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi and T. Kitamura, Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis, Proc. EUROSPEECH, 1999

Page 20: Thanks


Thank you for your attention.