Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 1 Andrea Sereinig Graz,...
-
Upload
bruce-norton -
Category
Documents
-
view
229 -
download
3
Transcript of Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 1 Andrea Sereinig Graz,...
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20051
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Basics of HMM-based speech synthesis
Contents
• Speech Parameter Generation Using Dynamic Features
• Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis
• Multi-Space Probability Distribution HMM• Simultaneous Modelling of Spectrum, Pitch and
Duration in HMM-based Speech Synthesis
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20052
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation using Dynamic Features (1)
1. Problem
vector sequence of speech parameter
state sequence of an HMM λ
vector of speech parameter at time t
ct … static feature vector (e.g. cepstral coefficient)Δct … dynamic feature vector (e.g. delta
cepstral coefficient)
Determine parameter sequence c that maximizes
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20053
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation using Dynamic Features (2)
2. Solution to the Problem
For a given q, maximizing ,with respect to c is equivalent to maximizing , with respect to c, since does not depend on O.
… single mixture Gaussian distribution
By setting to maximize we obtain a set of equations
with
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20054
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation using Dynamic Features (3)
2. Solution to the Problem
… mean vector of ct
… covariance matrix of ct
… mean vector of Δct
… covariance matrix of Δ ct
To obtain optimal q and c the set of equations has to be solved for every possible state sequence
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20055
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation using Dynamic Features (3)
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
2. Solution to the Problem
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20056
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation using Dynamic Features (4)
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
[1], [2]
[2]
3. Example
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20057
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation Algorithms (1)
Previously: state sequence was assumed to be known
Now: (part of) state sequence is hidden
Goal: determine speech parameter vector sequence
so that is
maximized with respect to O,
where is the state and mixture sequence, i.e. (q, i) indicates the i-th mixture of state q
… speech parameter vector
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20058
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation Algorithms (2)
1. Derived Algorithm Based on EM-Algorithm Find critical point of the likelihood function P(O| λ)
… auxiliary function of
current and new parameter vector sequence O and O’, respectively
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
… occupancy probability
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.20059
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation Algorithms (3)
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
1. Derived Algorithm Under condition O’=WC’, C’ which maximizes Q(O,O’) is
given by
Solved with recursive algorithm for dealing with dynamic features
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200510
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Speech Parameter Generation Algorithms (4)
2. Example 5-state left-to-right HMMs Speech sampled at 16 kHz & windowed by 25.6ms
Blackman window with 5ms shift Mel-cepstral coefficients obtained by mel-cepstral analysis Sentence fragment “kiNzokuhiroo”
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
[3]
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200511
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Observation o1 consists of a set of space indices X1{1,2,G} and a 3D vector x1 R³
x1 is drawn from one of three spaces Ω1, Ω2, Ω3 R³
Pdf for x1 : w1N1(x)+w2N2(x)+wGNG(x)
Multi-Space Probability Distribution HMM (1)
1. Problem: We cannot apply both the conventional and continuous HMMs to observations which consist of continuous values and discrete symbols [4]
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
[4]
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200512
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Multi-Space Probability Distribution HMM (2)
2. Example: A man fishing in a pondContents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
Sample space Ω
Ω1: 2D space for length and height of red fish
Ω2: 2D space for length and height of blue fish
Ω3: 1D space for diameters of tortoises
Ω4: 0D space for articles of junk
Weights w1 to w4 are determined by ratio of blue and red fish, tortoises and junk in the pond
N1, N2: 2D pdfs for sizes and heights of fish
N3: 1D pdf for diameter of tortoise
Man catches red fish: observation o=({1},x) is made
Same by night: o=({1,2},x)
Ω1=R²
Ω2=R²
Ω3=R¹
Ω4=Rº
4
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200513
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Multi-Space Probability Distribution HMM (3)
3. Algorithm
Each state i has G pdfs NiG and their
weights wiG
The observation probability of O, P(O| λ)
is calculated with the forward-backward
algorithm
Then, we need to maximize the
observation likelihood of P(O| λ) over all parameters
reestimation formulas for the maximum likelihood estimation are calculated in analogy to the Baum-Welch-Algorithm
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling of
Spectrum, Pitch and
Duration
[4]
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200514
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
1. Simultaneous Modelling
Pitch patterns modelled by MSD-HMM
observation sequence composed of continuous values and discrete symbols
Spectrum modelled by continuous probability distribution
State duration densities modelled by single Gaussian distributions (dimension of SDD equal to number of HMM states)
Simultaneous modelling of Spectrum, Pitch and Duration (1)
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling
of Spectrum, Pitch and
Duration
[5]
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200515
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Simultaneous modelling of Spectrum, Pitch and Duration (2)
2. Context Dependent ModelMany contextual factors taken into account such as:
• Position of breath group in sentence
• Position of current phoneme in current accentual phrase
• {preceding, current succeeding} part of speech
• {preceding, current succeeding} phoneme
• (etc.)
Decision-tree based context clustering applied because• As contextual factors increase, their combinations increase
exponentially
model parameters cannot be estimated with sufficient accuracy by limited training data
• Impossible to prepare speech database which includes all combinations of contextual factors
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling
of Spectrum, Pitch and
Duration
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200516
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Simultaneous modelling of Spectrum, Pitch and Duration (3)
2. Context Dependent Modelspectrum, pitch and duration have their own influential contextual factors
distributions of the respective parameters are clustered independently
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling
of Spectrum, Pitch and
Duration
[5]
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200517
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Simultaneous modelling of Spectrum, Pitch and Duration (4)
arbitrary text converted to context-based label sequence
sentence HMM constructed according to label sequence
State durations determined so as to maximize likelihood of state duration densities
Sequence of mel-cepstrum coefficients and pitch values generated
Speech synthesized directly by MLSA filter
Contents
Speech Parameter
Generation
- Dynamic Features
- Algorithms
Multi-Space Probability
Distribution HMM
Simultaneous modelling
of Spectrum, Pitch and
Duration
[5]
3. Text-to-Speech Synthesis System
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200518
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
References
[1] K. Tokuda, T. Kobayashi and S. Imai, Speech Parameter Generation from HMM using Dynamic Features, Proc. ICASSP, 1995
[2] http://hts.sp.nitech.ac.jp/?Publications – see ‘Attach file’ tokuda_TTSworkshop2002.pdf
[3] Tokuda, Yoshimura, Masuko, Kobayashi, Kitamura, Speech Parameter Generation Algortihms for HMM-based Speech Synthesis, ICASSP, 2000
[4] Tokuda, Masuko, Miyazaki, Kobayashi, Multi-Space Probability Distribution HMM, IEICE Trans. Inf. & Syst., 2000
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200519
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
References
[5] Yoshimura, Tokuda, Masuko, Kobayashi, Kitamura, Simultaneous modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis, Proc EU-ROSPEECH, 1999
Advanced Signal Processing 2, SE
Professor Horst Cerjak, 19.12.200520
Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models
Thanks for the attention