Modeling and Generation of Accentual Phrase F0 Contours Based on Discrete HMMs Synchronized at Mora-Unit Transitions
Atsuhiro Sakurai (Texas Instruments Japan, Tsukuba R&D Center)
Koji Iwano (currently with Tokyo Institute of Technology, Japan)
Keikichi Hirose (Dept. of Frontier Engineering, The University of Tokyo, Japan)
Introduction to Corpus-Based Intonation Modeling
• Traditional approach: rules derived from linguistic expertise. Human-dependent (too complicated and not satisfactory, because the phenomena involved are not completely understood)
• Corpus-based approach: modeling derived from statistical analysis of speech corpora. Automatic (potential to improve as better speech corpora become available)
Background
• HMMs are widely used in speech recognition, and fast learning algorithms exist
• Macroscopic discrete HMMs associated with accentual phrases can store information such as accent type and prosodic structure
• Morae are extremely important for describing Japanese intonation: sequences of high and low morae can characterize accent types
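As a concrete illustration (not in the original slides), the standard Tokyo-Japanese accent types can be written out as high/low (H/L) mora sequences; `hl_pattern` is a hypothetical helper name:

```python
def hl_pattern(accent_type: int, n_morae: int) -> str:
    """Return the high/low (H/L) mora pattern of a Tokyo-Japanese
    accentual phrase: type 0 rises after the first mora and stays high;
    type 1 starts high and falls; type n (n >= 2) falls after mora n."""
    if accent_type == 0:
        return "L" + "H" * (n_morae - 1)
    if accent_type == 1:
        return "H" + "L" * (n_morae - 1)
    # type n >= 2: low onset, high up to the accented mora, then low
    return "L" + "H" * (accent_type - 1) + "L" * (n_morae - accent_type)

print(hl_pattern(0, 4))  # LHHH
print(hl_pattern(1, 4))  # HLLL
print(hl_pattern(2, 4))  # LHLL
```

Such H/L sequences are exactly the kind of mora-level regularity the discrete HMMs are meant to capture.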
Overview of the Method
• Definition of HMM and alphabet:
– Accent types modeled by discrete HMMs
– 2-code mora F0 contour alphabet used as output symbols
– State transitions synchronized with mora transitions
• Classification of HMMs and training:
– HMMs classified according to linguistic attributes
– Training by the usual forward-backward (FB) algorithm
• Generation of F0 contours:
– Best sequence of symbols generated by a modified Viterbi algorithm
The Mora-F0 Alphabet
• Two codes: stylized mora F0 contours and mora-to-mora F0: 34 symbols each
• Obtained by LBG clustering from a 500-sentence database (ATR continuous speech database, speaker MHT)
• All the database is labeled using the 2-code symbols.
[Diagram: HMM state transitions synchronized with mora transitions across an accentual phrase]
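The codebook construction can be sketched as follows. This is a minimal LBG (Linde-Buzo-Gray) illustration: the 3-dimensional toy contour representation, the additive split perturbation, and the truncation of the doubled codebook to the non-power-of-two size 34 are all assumptions, since the slides do not give these details:

```python
import numpy as np

def lbg_codebook(data: np.ndarray, size: int, eps: float = 0.05,
                 iters: int = 20) -> np.ndarray:
    """LBG vector quantization: start from the global centroid, then
    repeatedly split every codeword (+/- eps perturbation) and refine
    with k-means until the codebook reaches the requested size."""
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(iters):  # k-means refinement after each split
            d = np.linalg.norm(data[:, None, :] - codebook[None], axis=2)
            labels = d.argmin(axis=1)
            for k in range(len(codebook)):
                if np.any(labels == k):
                    codebook[k] = data[labels == k].mean(axis=0)
    return codebook[:size]  # 34 is not a power of two, so we truncate

# e.g. quantize stylized mora F0 contours into 34 symbols
rng = np.random.default_rng(0)
contours = rng.normal(size=(500, 3))   # toy stand-in for the real contours
symbols = lbg_codebook(contours, 34)
print(symbols.shape)                    # (34, 3)
```

Each mora contour is then labeled with the index of its nearest codeword, giving the discrete output alphabet for the HMMs.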
The Accentual Phrase HMM
• Accentual phrases are classified according to:– Accent type
– Position of accentual phrase in the sentence
– (Optional: number of morae, part-of-speech, syntactic structure)
Example: "Karewa Tookyookara kuru." (He comes from Tokyo.)

Accentual phrase  Accent type  Position  Label sequence
Karewa            1            1         3 mora labels
Tookyookara      0            2         6 mora labels
kuru              1            3         2 mora labels

Each mora Mi receives a 2-code label [shape_i, F0_i]: its stylized F0 contour shape and its mora F0 (M1: [shape1, F01], M2: [shape2, F02], ...).
HMM Topologies
(a) Accent types 0 and 1
(b) Other accent types
Training Database
• ATR Continuous Speech Database (500 sentences, speaker MHT)
• Segmented into morae and accentual phrases
• Mora labels using the mora-F0 alphabet: shape (stylized F0 contour) and mora F0
• Accentual phrase labels: number of morae, position in the sentence
Output Code Generation
How to use the HMM for synthesis?
A) Recognition: given one output sequence, the search yields its likelihood and the best path
B) Synthesis: the search yields the best output sequence and the best path
Intonation Modeling Using HMM
Viterbi search for the recognition problem:

for t = 2, 3, ..., T
  for i_t = 1, 2, ..., S
    Dmin(t, i_t) = min_{i_{t-1}} { Dmin(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log b(y(t) | i_t)] }
    ψ(t, i_t) = argmin_{i_{t-1}} { Dmin(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log b(y(t) | i_t)] }
  next i_t
next t
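The recursion above can be sketched in Python; the (implicitly uniform) start at t = 1 and the backtracking step are details the slide leaves implicit:

```python
import numpy as np

def viterbi(neg_log_a, neg_log_b, y):
    """Viterbi search in the -log domain, matching the recursion above:
    Dmin(t, i) = min_j { Dmin(t-1, j) + [-log a(i|j)] + [-log b(y[t]|i)] }.
    neg_log_a[j, i] = -log a(i|j); neg_log_b[i, k] = -log b(k|i)."""
    S = neg_log_a.shape[0]
    T = len(y)
    D = np.full((T, S), np.inf)
    psi = np.zeros((T, S), dtype=int)
    D[0] = neg_log_b[:, y[0]]               # uniform start, for simplicity
    for t in range(1, T):
        for i in range(S):
            costs = D[t - 1] + neg_log_a[:, i] + neg_log_b[i, y[t]]
            psi[t, i] = costs.argmin()
            D[t, i] = costs[psi[t, i]]
    # backtrack the best path from the best final state
    path = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return D[-1].min(), path[::-1]

# toy 2-state example: each state prefers one symbol
nla = -np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))   # transitions
nlb = -np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))   # emissions
cost, path = viterbi(nla, nlb, [0, 0, 1, 1])        # path -> [0, 0, 1, 1]
```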
Intonation Modeling Using HMM
Modified Viterbi search for the synthesis problem (the observed symbol y(t) is replaced by ymax(t) = argmax_y b(y | i_t), the most probable symbol of the current state):

for t = 2, 3, ..., T
  for i_t = 1, 2, ..., S
    Dmin(t, i_t) = min_{i_{t-1}} { Dmin(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log b(ymax(t) | i_t)] }
    ψ(t, i_t) = argmin_{i_{t-1}} { Dmin(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log b(ymax(t) | i_t)] }
  next i_t
next t
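A sketch of the synthesis variant: since no observation is given, each state contributes its most probable symbol ymax, and the search returns the symbols emitted along the best path. Function and variable names are illustrative:

```python
import numpy as np

def viterbi_synthesis(neg_log_a, neg_log_b, T):
    """Modified Viterbi for synthesis: each state i contributes
    ymax(i) = argmin_k [-log b(k|i)], its most probable symbol;
    returns (best output sequence, best state path) over T steps."""
    S = neg_log_a.shape[0]
    ymax = neg_log_b.argmin(axis=1)        # best symbol per state
    bmin = neg_log_b.min(axis=1)           # its -log probability
    D = np.full((T, S), np.inf)
    psi = np.zeros((T, S), dtype=int)
    D[0] = bmin                            # uniform start, for simplicity
    for t in range(1, T):
        for i in range(S):
            costs = D[t - 1] + neg_log_a[:, i] + bmin[i]
            psi[t, i] = costs.argmin()
            D[t, i] = costs[psi[t, i]]
    path = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path = path[::-1]
    return [int(ymax[i]) for i in path], path

# toy example: state 0 prefers symbol 0, state 1 prefers symbol 1
nla = -np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
nlb = -np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
syms, spath = viterbi_synthesis(nla, nlb, 3)
```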
Use of Bigram Probabilities
for t = 2, 3, ..., T
  for i_t = 1, 2, ..., S
    Dmin(t, i_t) = min_{i_{t-1}} { Dmin(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log max_k b(y_k(t) | y(t-1), i_t)] }
    ψ(t, i_t) = argmin_{i_{t-1}} { Dmin(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log max_k b(y_k(t) | y(t-1), i_t)] }
  next i_t
next t

where k = 1, ..., K (dimension of y): the emission probability is now conditioned on the previous symbol y(t-1), i.e. a bigram over output symbols.
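A hedged sketch of the bigram-extended search. Following the recursion above, the symbol attached to each (t, state) cell is chosen given the symbol stored at the predecessor cell, which is a greedy approximation rather than an exact search over symbol histories; the separate start distribution `neg_log_b0` at t = 1 is an assumption:

```python
import numpy as np

def viterbi_bigram(neg_log_a, neg_log_b0, neg_log_b2, T):
    """neg_log_b0[i, k]   : -log b(k|i), used at t = 0 (assumed start)
       neg_log_b2[p, i, k]: -log b(k | previous symbol p, state i)"""
    S = neg_log_a.shape[0]
    D = np.full((T, S), np.inf)
    psi = np.zeros((T, S), dtype=int)
    sym = np.zeros((T, S), dtype=int)       # best symbol stored per cell
    sym[0] = neg_log_b0.argmin(axis=1)
    D[0] = neg_log_b0.min(axis=1)
    for t in range(1, T):
        for i in range(S):
            # emission cost of predecessor j depends on its stored symbol
            emis = neg_log_b2[sym[t - 1], i, :].min(axis=1)   # shape (S,)
            costs = D[t - 1] + neg_log_a[:, i] + emis
            j = int(costs.argmin())
            psi[t, i] = j
            sym[t, i] = int(neg_log_b2[sym[t - 1, j], i].argmin())
            D[t, i] = costs[j]
    path = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path = path[::-1]
    return [int(sym[t, s]) for t, s in enumerate(path)], path

# toy 1-state model whose bigram strongly prefers alternating symbols
nla = -np.log(np.array([[1.0]]))
nlb0 = -np.log(np.array([[0.9, 0.1]]))
nlb2 = -np.log(np.array([[[0.1, 0.9]], [[0.9, 0.1]]]))  # (K_prev, S, K)
syms, path = viterbi_bigram(nla, nlb0, nlb2, 4)          # syms alternate
```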
Accent Type Modeling Using HMM
[Figure: average generated F0 contours, log(Hz) (≈3.65–4.15) vs. mora number, for accent types "Type0"–"Type3"]
Phrase Boundary Level Modeling Using HMM
[Figure: average generated F0 contours, log(Hz) (≈3.9–4.08) vs. mora number, for phrase boundary levels "level1"–"level3"]

Correspondence with J-ToBI labels:

Boundary level  J-ToBI break index  Pause
1               3                   Y
2               3                   N
3               2                   N
[Figure: six panels of log F0 [Hz] (±0.4) vs. t [msec] (0–500) for accentual phrases PH1_0, PH1_1, and PH1_2, each comparing the original contour (*.original) with the bigram-based output (*.bigram)]
The Effect of Bigrams
Comments
• We presented a novel approach to intonation modeling for TTS synthesis based on discrete mora-synchronous HMMs.
• From now on, more features should be included in the HMM modeling (phonetic context, part-of-speech, etc.), and the approach should be compared to rule-based methods.
• Training data scarcity is a major problem to overcome (by feature clustering, an F0 contour generation model, etc.).
Hidden Markov Models (HMM)
A Hidden Markov Model (HMM) is a finite state automaton in which both state transitions and outputs are stochastic. At each time period it moves to a new state, generating a new output symbol according to the output distribution of that state.
[Diagram: 4-state left-to-right HMM over symbols 1, 2, ..., K, with self-transitions a11–a44, forward transitions a12, a23, a34, a skip transition a13, and output distributions b(1|i)–b(K|i) in each state i]
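To make the definition concrete, here is a generative sketch of a discrete HMM; the emit-then-transition ordering and the function name are assumptions for illustration:

```python
import random

def sample_hmm(a, b, start, T, seed=0):
    """Generate a state/symbol sequence from a discrete HMM:
    at each step, emit a symbol from the current state's output
    distribution b[state], then move according to the transition
    row a[state]."""
    rng = random.Random(seed)
    states, symbols = [], []
    state = rng.choices(range(len(a)), weights=start)[0]
    for _ in range(T):
        states.append(state)
        symbols.append(rng.choices(range(len(b[state])), weights=b[state])[0])
        state = rng.choices(range(len(a)), weights=a[state])[0]
    return states, symbols

# deterministic toy model: states alternate, state i always emits symbol i
a = [[0.0, 1.0], [1.0, 0.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
states, symbols = sample_hmm(a, b, start=[1.0, 0.0], T=4)
print(states, symbols)   # [0, 1, 0, 1] [0, 1, 0, 1]
```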
Step 1: Database Creation
• Use the ATR continuous speech database (500 sentences, speaker MHT)
• Segment into mora units
• Assign mora labels
• Extract F0 patterns
• Cluster with the LBG algorithm
• Assign cluster classes to the whole database
Discussion and Future Work
• Training data is scarce
• Integration into a TTS system requires further work:
– Take other linguistic information into account (phonemes, number of morae, part of speech, etc.)
– Devise ways to overcome the data shortage (clustering, etc.)
– Study how to connect the models