LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.
![Page 1: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/1.jpg)
LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH
Speech Lab., CM, NCTU
Chen Yu Chiang
2007/2/8
![Page 2: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/2.jpg)
Outline
• Introduction
• Base Latent Prosody Models (LPM)
  • A Statistical Syllable Duration Model
  • A Statistical Syllable Pitch Contour Model
• Automatic Prosody Labeling based on LPM
• Summary
![Page 3: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/3.jpg)
Introduction (1/11)
What is prosody?
• Prosody is an inherent suprasegmental feature of human speech.
• It carries the stress, intonation patterns, and timing structures of continuous speech, which determine the naturalness and understandability of an utterance.
![Page 4: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/4.jpg)
Introduction (2/11)
From the listener's point of view, prosody consists of the systematic perception and recovery of a speaker's intentions based on:
• Pause: indicates phrases and avoids running out of air.
• Pitch: the rate of vocal-fold cycling (fundamental frequency, or F0) as a function of time.
• Rate/relative duration: phoneme durations, timing, and rhythm.
• Loudness (energy): relative amplitude/volume.
For simplicity, we may say “抑, 揚, 頓, 挫, 輕, 重, 緩, 急” (lowering, raising, pausing, breaking, light, heavy, slow, fast).
![Page 5: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/5.jpg)
Introduction (3/11)
The affecting factors of prosody
• Linguistic: lexical, syntactic, semantic, pragmatic
• Para-linguistic: intentional, attitudinal, stylistic
• Non-linguistic: physical, emotional
![Page 6: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/6.jpg)
Introduction (4/11)
Issues concerned in prosody modeling
• Labeling of important prosodic cues
• Construction of the prosody hierarchy
• Modeling of the syntax-prosody relationship
• Prediction of prosodic phrase boundaries (breaks) from text, etc.
Applications
• Automatic Speech Recognition (ASR): important prosodic cues can be extracted from the input utterance to assist both acoustic and linguistic decoding.
• Text-to-Speech (TTS): a good prosody model can be used to generate appropriate prosodic features from the input text.
![Page 7: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/7.jpg)
Introduction (5/11)
Important characteristics of Mandarin Chinese
• A tonal language (four lexical tones and one neutral tone)
• The tonality of a monosyllable is mainly characterized by the shape of its fundamental frequency (F0) contour
• A syllable-based language (411 base-syllables)
![Page 8: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/8.jpg)
Introduction (6/11)
• Syllable duration is also strongly affected by the phonetic structure of the base-syllable.
• Generally speaking, syllable duration increases as the number of constituent phonemes increases.
• For example:
  • Syllables with single vowels are the shortest.
  • Syllables with stop initials or no initials, and without nasal endings, are shorter.
  • Syllables with fricative initials and with nasal endings are longer.
![Page 9: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/9.jpg)
Introduction (7/11)
[Figures: standard tone patterns; effect of context and intonation on the tone patterns]
![Page 10: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/10.jpg)
Introduction (8/11)
Because Mandarin is a tonal language, there is a tight interaction in Mandarin speech among the four lexical tones, the neutral tone, the base-syllable types, and the underlying speech prosody/intonation.
![Page 11: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/11.jpg)
Introduction (9/11)
• To find the underlying prosody/intonation structure, we propose the Latent Prosody Models (LPM).
• The LPM considers several companding factors (CFs), also called affecting factors, on the syllable pitch contour and syllable duration, including tone, initial-final type, base-syllable type, and prosodic state.
• The prosodic state (treated as a latent variable) is conceptually defined as the state of a syllable in a prosodic phrase. It is used as a substitute for high-level linguistic information such as a word, phrase, or syntactic boundary.
• The models can be trained on an unlabeled database.
![Page 12: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/12.jpg)
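Introduction (10/11)
LPMs are formulated on the assumption that all affecting factors combine either multiplicatively or additively:

$$Z_n = X_n\,\gamma_{t_n}\,\gamma_{y_n}\,\gamma_{j_n}\,\gamma_{l_n}\,\gamma_{s_n}$$

$$Z_n = X_n + \gamma_{t_n} + \gamma_{y_n} + \gamma_{j_n} + \gamma_{l_n} + \gamma_{s_n}$$

where $Z_n$ is the observed prosodic feature vector, $X_n$ is the normalized feature vector, and the $\gamma$ terms are the affecting factors.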
![Page 13: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/13.jpg)
Introduction (11/11)
• The main purpose of using the prosodic state in place of conventional high-level linguistic information is to separate the effects of low-level and high-level linguistic features on speech.
• This modeling approach avoids some unsolved problems, such as the inconsistency between prosodic and syntactic structures and the ambiguity of word segmentation and word chunking in Mandarin Chinese.
• Hence, based on the LPM, the proposed prosody labeling model can focus on the global effect of mapping high-level linguistic features to prosodic states and break indices, since the interference caused by low-level linguistic features has already been removed by the LPM.
![Page 14: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/14.jpg)
References
1. Sin-Horng Chen, Wen-Hsing Lai, and Yih-Ru Wang, "A new duration modeling approach for Mandarin speech", IEEE Transactions on Speech and Audio Processing, vol. 11, no. 4, Jul. 2003, pp. 308-320.
2. Sin-Horng Chen, Wen-Hsing Lai, and Yih-Ru Wang, "A statistics-based pitch contour model for Mandarin speech", J. Acoust. Soc. Am., vol. 117, no. 2, Feb. 2005, pp. 908-925.
3. Chen-Yu Chiang, Yih-Ru Wang, and Sin-Horng Chen, "On the inter-syllable coarticulation effect of pitch modeling for Mandarin speech", INTERSPEECH 2005, pp. 3269-3272.
4. Chen-Yu Chiang, Xiao-Dong Wang, Yuan-Fu Liao, Yih-Ru Wang, Sin-Horng Chen, and Keikichi Hirose, "Latent prosody model of continuous Mandarin speech", ICASSP 2007.
![Page 15: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/15.jpg)
Base Latent Prosody Models (LPM)
• A Statistical Syllable Duration Model
• A Statistical Syllable Pitch Contour Model
![Page 16: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/16.jpg)
A Statistical Syllable Duration Model
• In ASR, state duration models are constructed to assist recognition.
• In TTS, synthesis of proper duration information is essential for natural speech.
• An extension includes the modeling of initial and final durations.
• Multiplicative and additive models are compared.
![Page 17: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/17.jpg)
The Multiplicative Duration Model

$$Z_n = X_n\,\gamma_{t_n}\,\gamma_{y_n}\,\gamma_{j_n}\,\gamma_{l_n}\,\gamma_{s_n}$$

• $Z_n$: observed duration of the nth syllable
• $X_n$: normalized duration of the nth syllable
• $\gamma_r$: CF of affecting factor $r$
• $t_n$: lexical tone of the nth syllable
• $y_n$: prosodic state of the nth syllable
• $j_n$: base-syllable of the nth syllable
• $l_n$: utterance of the nth syllable
• $s_n$: speaker of the nth syllable
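As a minimal sketch of how the multiplicative model is applied, the snippet below divides an observed duration by the product of its CFs to obtain the normalized duration, and multiplies back to recover the observation. The tone CFs and the prosodic-state CF of 1.69 are the values listed on the "Analyses of CFs" slide; the observed duration of 60 frames and the base-syllable, utterance, and speaker CFs of 1.0 are hypothetical placeholders, and the function names are illustrative only.

```python
import math

# Tone CFs for tones 1-5 as listed on the "Analyses of CFs" slide.
TONE_CF = {1: 1.00, 2: 1.02, 3: 0.99, 4: 1.03, 5: 0.84}


def normalize_duration(observed, cfs):
    """X_n = Z_n / (product of all CFs attached to syllable n)."""
    return observed / math.prod(cfs)


def synthesize_duration(normalized, cfs):
    """Z_n = X_n * (product of all CFs attached to syllable n)."""
    return normalized * math.prod(cfs)


# Hypothetical syllable: tone 5, prosodic state 15 (syllable CF 1.69 on the slide).
# The base-syllable, utterance, and speaker CFs are set to 1.0 as an assumption.
cfs = [TONE_CF[5], 1.69, 1.0, 1.0, 1.0]
z_n = 60.0  # observed duration in frames (hypothetical)
x_n = normalize_duration(z_n, cfs)
```

Because the factors multiply, a neutral-tone syllable (CF 0.84) is modeled as shorter than the same syllable with a full tone, regardless of its prosodic state.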
![Page 18: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/18.jpg)
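Training of the Model (1/2)
Expectation-Maximization (EM) algorithm
• $\lambda = \{\mu, \nu, \gamma_t, \gamma_y, \gamma_j, \gamma_l, \gamma_s\}$: the set of parameters to be estimated
• $N$: the total number of training samples
• $Y$: the total number of prosodic states
• Auxiliary function in the E-step ($\lambda$: new set, $\bar{\lambda}$: old set):

$$Q(\lambda, \bar{\lambda}) = \sum_{n=1}^{N} \sum_{y_n=1}^{Y} p(y_n \mid Z_n, \bar{\lambda}) \,\log p(Z_n, y_n \mid \lambda)$$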
![Page 19: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/19.jpg)
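Training of the Model (2/2)
• Assumption: $X_n$ follows a normal distribution with mean $\mu$ and variance $\nu$.

$$p(y_n \mid Z_n, \bar{\lambda}) = \frac{p(Z_n, y_n \mid \bar{\lambda})}{\sum_{y_n'=1}^{Y} p(Z_n, y_n' \mid \bar{\lambda})}$$

$$p(Z_n, y_n \mid \lambda) = N\!\left(Z_n;\; \mu\,\gamma_{t_n}\gamma_{y_n}\gamma_{j_n}\gamma_{l_n}\gamma_{s_n},\; \nu\,\gamma_{t_n}^2\gamma_{y_n}^2\gamma_{j_n}^2\gamma_{l_n}^2\gamma_{s_n}^2\right)$$

• The M-step uses sequential optimizations of the parameters.
• After training, each syllable is assigned its most likely prosodic state:

$$y_n^* = \arg\max_{y_n}\; p(y_n \mid Z_n, \lambda)$$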
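The E-step posterior and the final state assignment can be sketched as follows. This is only an illustration under stated assumptions: the joint density is taken exactly as transcribed on the slide (a Gaussian in $Z_n$ with no separate prior term over states), all CFs other than the prosodic state are folded into a single `known_cf` argument, and the M-step is omitted. The 16 state CFs and the normalized mean and variance (42.34 and 2.52 frames) are the multiplicative-model syllable values reported on the results slides; the function names are hypothetical.

```python
import math

# Multiplicative syllable CFs for prosodic states 0-15, as listed on the results slide.
STATE_CF = [0.56, 0.72, 0.79, 0.84, 0.89, 0.91, 0.95, 0.98,
            1.00, 1.02, 1.05, 1.09, 1.14, 1.22, 1.33, 1.69]


def gaussian_pdf(z, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def state_posteriors(z, mu, nu, known_cf, state_cfs):
    """p(y | Z) for every state y, using p(Z, y) = N(Z; mu*g, nu*g^2), g = known_cf * cf_y."""
    joint = []
    for cf in state_cfs:
        g = known_cf * cf
        joint.append(gaussian_pdf(z, mu * g, nu * g * g))
    total = sum(joint)
    return [j / total for j in joint]


def assign_state(posteriors):
    """Index of the state with the largest posterior (the argmax rule on the slide)."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)


# Hypothetical observation of 71 frames with all other CFs equal to 1.0 (assumed).
post = state_posteriors(71.0, 42.34, 2.52, 1.0, STATE_CF)
best = assign_state(post)
```

In a full EM loop these posteriors would weight the parameter updates in the M-step; here they are only used to pick a label.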
![Page 20: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/20.jpg)
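The Additive Duration Model
Model:

$$Z_n = X_n + \gamma_{t_n} + \gamma_{y_n} + \gamma_{j_n} + \gamma_{l_n} + \gamma_{s_n}$$

Auxiliary function:

$$Q(\lambda, \bar{\lambda}) = \sum_{n=1}^{N} \sum_{y_n=1}^{Y} p(y_n \mid Z_n, \bar{\lambda}) \,\log p(Z_n, y_n \mid \lambda) + (\text{constraint term})$$

The constraint term involves the CFs $\gamma_{t_n} + \gamma_{y_n} + \gamma_{j_n} + \gamma_{l_n} + \gamma_{s_n}$ summed over the $N$ training samples; its exact form is not legible in the transcript.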
![Page 21: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/21.jpg)
Experimental Database (1/2)
• MIC: a high-quality, read-style microphone-speech database
  • MIC-sent: 455 phonetically balanced sentential utterances
  • MIC-para: 300 paragraphic utterances
• Training: 102,529 syllables
• Testing: 22,109 syllables
• 20 kHz sampling rate, downsampled to 8 kHz
• 1 frame = 5 ms
![Page 22: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/22.jpg)
Experimental Database (2/2)

| Data set | Speaker | Sentences | Paragraphs | Syllables |
|---|---|---|---|---|
| Training | Male A | 1-455 | 1-200 | 34,670 |
| Training | Female B | 1-455 | 1-50 | 12,945 |
| Training | Male C | 1-455 | 1-100 | 20,748 |
| Training | Female D | 1-455 | 1-200 | 34,166 |
| Testing | Female E | None | 201-300 | 22,109 |
![Page 23: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/23.jpg)
Experimental Results (1/7)

| | Training mean | Training variance | Testing mean | Testing variance |
|---|---|---|---|---|
| Syllable | 44.31 (42.34) "43.89" | 180.17 (2.52) "2.53" | 41.08 (44.77) "43.77" | 136.26 (4.44) "3.97" |
| Initial | 17.21 (16.63) "17.20" | 62.28 (0.74) "0.78" | 13.83 (18.36) "17.05" | 40.02 (5.92) "1.73" |
| Final | 31.75 (31.50) "31.44" | 117.06 (2.12) "1.84" | 30.94 (33.90) "31.38" | 104.15 (3.40) "2.85" |

Units: mean in frames and variance in frames²; 1 frame = 5 ms.
• Plain values: observed durations
• ( ): normalized durations in the multiplicative model with 16 prosodic states
• " ": normalized durations in the additive model with 16 prosodic states
![Page 24: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/24.jpg)
Experimental Results (2/7)
[Figure: Histograms of observed (left) and normalized (right) syllable duration in the multiplicative model for the training set. x-axis: duration (frame); y-axis: number.]
![Page 25: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/25.jpg)
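Experimental Results (3/7)
Analyses of CFs

CFs for tones

| tone | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| CF | 1.00 | 1.02 | 0.99 | 1.03 | 0.84 |

CFs for prosodic states (mult.: multiplicative model; add.: additive model)

| state | syllable (mult.) | syllable (add.) | initial (mult.) | initial (add.) | final (mult.) | final (add.) |
|---|---|---|---|---|---|---|
| 0 | 0.56 | -16.07 | 0.30 | -11.20 | 0.50 | -14.28 |
| 1 | 0.72 | -12.36 | 0.49 | -6.82 | 0.68 | -10.24 |
| 2 | 0.79 | -9.69 | 0.63 | -6.22 | 0.75 | -7.94 |
| 3 | 0.84 | -7.71 | 0.71 | -4.98 | 0.80 | -6.45 |
| 4 | 0.89 | -5.79 | 0.80 | -3.82 | 0.84 | -5.15 |
| 5 | 0.91 | -4.70 | 0.85 | -3.60 | 0.87 | -4.24 |
| 6 | 0.95 | -3.14 | 0.86 | -2.92 | 0.91 | -2.99 |
| 7 | 0.98 | -1.94 | 0.89 | -2.49 | 0.95 | -1.73 |
| 8 | 1.00 | 0.00 | 0.96 | -1.40 | 0.98 | -0.86 |
| 9 | 1.02 | 0.12 | 1.00 | -0.41 | 1.00 | 0.00 |
| 10 | 1.05 | 1.69 | 1.04 | 0.00 | 1.02 | 0.73 |
| 11 | 1.09 | 4.10 | 1.09 | 0.89 | 1.08 | 3.12 |
| 12 | 1.14 | 5.87 | 1.12 | 1.39 | 1.14 | 5.10 |
| 13 | 1.22 | 9.65 | 1.19 | 3.56 | 1.24 | 8.50 |
| 14 | 1.33 | 15.08 | 1.30 | 6.03 | 1.40 | 13.42 |
| 15 | 1.69 | 28.74 | 1.61 | 12.69 | 1.86 | 25.49 |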
![Page 26: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/26.jpg)
Experimental Results (4/7)
用 14* 百 14 子 9 蓮 15* 、蕾 11 絲 4 花 15* 、姬 7 百 11 合 15* 、龍 13 膽 15* 、土 5耳 9 其 11 桔 10 梗 13* 和 14* 蒜 4 香 1藤 12* 為 4 材 15* ,以 14* 維 4 納 6 斯 13* 執 8 壺 2 的 14* 石 10 膏 13* 花 3 器14* 烘 10 托 15* ,好 4 一 2 趟 11* 春 4雨 14* 濛 3濛 10 的 15* 郊 9外 14* 田 4野 9風 13 光 15* 。
Examples of Prosodic State Labeling (* denotes a word boundary)
![Page 27: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/27.jpg)
Experimental Results (5/7)
[Figure: Decision tree of base-syllable CFs for the syllable duration model. The number associated with a node is the mean of the CFs belonging to the cluster; node means range from 0.79 to 1.22. A solid line indicates a positive answer and a dashed line a negative answer. Node questions: {b, d, g}?; single vowel; compound vowel; open vowel; {f, s, sh, shi, h}; {ts, ch, chi}.]
![Page 28: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/28.jpg)
Experimental Results (6/7)
[Figure: Decision tree of base-syllable CFs for the initial duration model. Node questions: null initial; {b, d, g}; {p, t, k}; {ts, ch, chi}; {f, s, sh, shi, h}; single vowel; with medial; vowel begins with {i}; vowel begins with {u}.]
![Page 29: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/29.jpg)
Experimental Results (7/7)
[Figure: Decision tree of base-syllable CFs for the final duration model. Node questions: null initial; single vowel; compound vowel; with medial; vowel begins with {i}; {m, n, l, r}; {b, d, g}; {ts, ch, chi}.]
![Page 30: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/30.jpg)
A Statistical Syllable Pitch Contour Model (1/7)
• Mandarin is a tonal language. Tonal information appears in the pitch contour.
• Pitch contour patterns in continuous speech vary widely and can deviate dramatically from their canonical forms.
• An utterance's pitch contour is separated into a global-trend pitch mean model and a locally varying shape model.
• A quantitative description of the coarticulation effect is given.
![Page 31: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/31.jpg)
A Statistical Syllable Pitch Contour Model (2/7)
Gaussian normalization

$$\tilde{f}(t) = \frac{f(t) - \mu_k}{\sigma_k}\,\sigma_{all} + \mu_{all}$$

• $f(t)$: original pitch period of frame $t$
• $\tilde{f}(t)$: normalized pitch period of frame $t$
• $\mu_k$: mean of speaker $k$
• $\sigma_k$: standard deviation of speaker $k$
• $\mu_{all}$: averaged mean of all speakers
• $\sigma_{all}$: averaged standard deviation of all speakers
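The normalization above can be sketched in a few lines. The pitch-period values below are hypothetical and are not taken from the MIC database, and the use of the population standard deviation (`pstdev`) is an assumption, since the slide does not say which estimator was used.

```python
from statistics import mean, pstdev


def gaussian_normalize(pitch, mu_k, sigma_k, mu_all, sigma_all):
    """Map speaker k's pitch periods onto the averaged mean and standard deviation."""
    return [(f - mu_k) / sigma_k * sigma_all + mu_all for f in pitch]


# Hypothetical pitch periods (ms) for two speakers.
speakers = {"A": [8.0, 9.0, 10.0], "B": [4.0, 5.0, 6.0]}

# Per-speaker statistics, then their averages over all speakers.
stats = {k: (mean(v), pstdev(v)) for k, v in speakers.items()}
mu_all = mean(m for m, _ in stats.values())
sigma_all = mean(s for _, s in stats.values())

normalized = {
    k: gaussian_normalize(v, stats[k][0], stats[k][1], mu_all, sigma_all)
    for k, v in speakers.items()
}
```

After this step every speaker's pitch periods share the same mean and spread, so the remaining differences can be attributed to the other affecting factors.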
![Page 32: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/32.jpg)
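A Statistical Syllable Pitch Contour Model (3/7)
Discrete orthogonal polynomial basis functions (discrete Legendre polynomials):

$$\phi_0\!\left(\tfrac{i}{M}\right) = 1$$

$$\phi_1\!\left(\tfrac{i}{M}\right) = \left[\frac{12M}{M+2}\right]^{1/2} \left[\frac{i}{M} - \frac{1}{2}\right]$$

$$\phi_2\!\left(\tfrac{i}{M}\right) = \left[\frac{180M^3}{(M-1)(M+2)(M+3)}\right]^{1/2} \left[\left(\frac{i}{M}\right)^2 - \frac{i}{M} + \frac{M-1}{6M}\right]$$

$$\phi_3\!\left(\tfrac{i}{M}\right) = \left[\frac{2800M^5}{(M-1)(M-2)(M+2)(M+3)(M+4)}\right]^{1/2} \left[\left(\frac{i}{M}\right)^3 - \frac{3}{2}\left(\frac{i}{M}\right)^2 + \frac{6M^2 - 3M + 2}{10M^2}\,\frac{i}{M} - \frac{(M-1)(M-2)}{20M^2}\right]$$

for $0 \le i \le M$ and $M \ge 3$.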
![Page 33: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/33.jpg)
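A Statistical Syllable Pitch Contour Model (4/7)
Parameterized pitch contour

$$\hat{f}\!\left(\tfrac{i}{M}\right) = \sum_{j=0}^{3} a_j\,\phi_j\!\left(\tfrac{i}{M}\right), \qquad 0 \le i \le M$$

$$a_j = \frac{1}{M+1} \sum_{i=0}^{M} f\!\left(\tfrac{i}{M}\right) \phi_j\!\left(\tfrac{i}{M}\right)$$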
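To make the parameterization concrete, the sketch below builds the four discrete Legendre basis functions for a syllable of M + 1 pitch frames, projects a contour onto them to get the coefficients a_0..a_3, and reconstructs the smoothed contour. The function names and the example contour are illustrative assumptions; the formulas are the standard discrete Legendre polynomials, which are orthonormal under the inner product (1/(M+1)) Σ over i = 0..M.

```python
import math


def legendre_basis(M):
    """Return phi[j][i] for j = 0..3 and i = 0..M (requires M >= 3)."""
    if M < 3:
        raise ValueError("M must be at least 3")
    c1 = math.sqrt(12.0 * M / (M + 2))
    c2 = math.sqrt(180.0 * M ** 3 / ((M - 1) * (M + 2) * (M + 3)))
    c3 = math.sqrt(2800.0 * M ** 5 / ((M - 1) * (M - 2) * (M + 2) * (M + 3) * (M + 4)))
    phi = [[], [], [], []]
    for i in range(M + 1):
        u = i / M
        phi[0].append(1.0)
        phi[1].append(c1 * (u - 0.5))
        phi[2].append(c2 * (u * u - u + (M - 1) / (6.0 * M)))
        phi[3].append(c3 * (u ** 3 - 1.5 * u * u
                            + (6.0 * M * M - 3.0 * M + 2.0) / (10.0 * M * M) * u
                            - (M - 1) * (M - 2) / (20.0 * M * M)))
    return phi


def fit_coefficients(contour):
    """a_j = 1/(M+1) * sum_i f(i/M) * phi_j(i/M) for j = 0..3."""
    M = len(contour) - 1
    phi = legendre_basis(M)
    return [sum(f * p for f, p in zip(contour, phi[j])) / (M + 1) for j in range(4)]


def reconstruct(coeffs, M):
    """f_hat(i/M) = sum_j a_j * phi_j(i/M) for i = 0..M."""
    phi = legendre_basis(M)
    return [sum(coeffs[j] * phi[j][i] for j in range(4)) for i in range(M + 1)]


# Hypothetical 11-frame pitch-period contour (ms) that happens to be a cubic in i/M.
M = 10
contour = [7.0 + 2.0 * (i / M) - 3.0 * (i / M) ** 2 + (i / M) ** 3 for i in range(M + 1)]
a = fit_coefficients(contour)
smoothed = reconstruct(a, M)
```

Here a[0] is the mean of the contour, while a[1] to a[3] describe its shape, which matches the split into a pitch mean model and a pitch shape model.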
![Page 34: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/34.jpg)
A Statistical Syllable Pitch Contour Model (5/7)
Pitch mean modeling

$$Z_n = (Y_n + \beta_{s_n})\,\alpha_{s_n}$$

• $Z_n$: observed log-pitch mean
• $Y_n$: speaker-compensated log-pitch mean
• $\alpha_{s_n}$: speaker's dynamic range change CF
• $\beta_{s_n}$: speaker's level shift CF
![Page 35: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/35.jpg)
A Statistical Syllable Pitch Contour Model (6/7)

$$Y_n = X_n + \beta_{t_n} + \beta_{pt_n} + \beta_{ft_n} + \beta_{i_n} + \beta_{f_n} + \beta_{p_n}$$

• $X_n$: normalized log-pitch mean of the nth syllable
• $\beta_r$: CF of affecting factor $r$
• $t_n$: current lexical tone of the nth syllable
• $pt_n$: previous lexical tone of the nth syllable
• $ft_n$: following lexical tone of the nth syllable
• $i_n$: initial class of the nth syllable
• $f_n$: final class of the nth syllable
• $p_n$: prosodic state of the nth syllable
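As a small worked illustration of the additive pitch mean model, the sketch below removes the tone-related CFs from a speaker-compensated log-pitch mean. The three tone tables are the values reported on the "CFs of Current, Previous and Following Tones" slide; the initial-class, final-class, and prosodic-state CFs default to 0.0 purely as a simplifying assumption, and the function name and input value are hypothetical.

```python
# CFs (log pitch period) for tones 1-5, from the tone CF slide.
BETA_T = {1: -0.154, 2: 0.054, 3: 0.160, 4: -0.035, 5: 0.128}   # current tone
BETA_PT = {1: -0.022, 2: -0.034, 3: 0.018, 4: 0.024, 5: 0.029}  # previous tone
BETA_FT = {1: 0.022, 2: -0.003, 3: -0.047, 4: 0.011, 5: 0.013}  # following tone


def normalized_log_pitch_mean(y_n, t, pt, ft, beta_i=0.0, beta_f=0.0, beta_p=0.0):
    """X_n = Y_n minus the sum of all additive CFs attached to syllable n."""
    return y_n - (BETA_T[t] + BETA_PT[pt] + BETA_FT[ft] + beta_i + beta_f + beta_p)


# Hypothetical syllable: tone 3, preceded by tone 3 and followed by tone 4.
y_n = 1.949 + 0.160 + 0.018 + 0.011
x_n = normalized_log_pitch_mean(y_n, t=3, pt=3, ft=4)
```

Because the model works on log pitch period, a positive CF (such as tone 3) corresponds to a longer period, that is, a lower pitch.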
![Page 36: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/36.jpg)
A Statistical Syllable Pitch Contour Model (7/7)
Pitch shape modeling

$$Z_n = X_n + b_{tc_n} + b_{q_n} + b_{s_n} + b_{i_n} + b_{f_n}$$

• $Z_n = [a_1\; a_2\; a_3]^T$: observed pitch shape vector of the nth syllable
• $X_n$: normalized pitch shape vector of the nth syllable
• $b_r$: CF vector for affecting factor $r$
• $tc_n$: lexical tone combination of the nth syllable (pause < 13 frames: tight coupling effect; pause >= 13 frames: loose coupling effect)
• $q_n$: prosodic state of pitch shape
![Page 37: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/37.jpg)
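Experimental Results (1/6)
Observed Log-Pitch (unit of pitch period: ms)

| | training mean | training (co)variance | test mean | test (co)variance |
|---|---|---|---|---|
| mean | 1.949 | 0.0372 | 1.948 | 0.0345 |

Shape (×0.01), training set:

$$\text{mean} = \begin{bmatrix} 3.545 \\ 0.289 \\ 0.650 \end{bmatrix}, \qquad \text{covariance} = \begin{bmatrix} 85.055 & 3.922 & 5.041 \\ 3.922 & 9.176 & 0.601 \\ 5.041 & 0.601 & 2.009 \end{bmatrix}$$

Shape (×0.01), test set:

$$\text{mean} = \begin{bmatrix} 4.210 \\ 0.947 \\ 0.241 \end{bmatrix}, \qquad \text{covariance} = \begin{bmatrix} 94.984 & 3.356 & 4.700 \\ 3.356 & 21.064 & 0.672 \\ 4.700 & 0.672 & 4.653 \end{bmatrix}$$

Shape entries are magnitudes as recovered from the transcript; minus signs, if any, are not preserved there.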
![Page 38: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/38.jpg)
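Experimental Results (2/6)
Normalized Log-Pitch with 16 Prosodic States (unit of pitch period: ms)

| | training mean | training (co)variance | training RMSE | test mean | test (co)variance | test RMSE |
|---|---|---|---|---|---|---|
| mean | 1.948 | 0.000402 | 0.0203 | 1.948 | 0.000344 | 0.0183 |

Shape (×0.01), training set:

$$\text{mean} = \begin{bmatrix} 3.066 \\ 0.699 \\ 0.401 \end{bmatrix}, \qquad \text{covariance} = \begin{bmatrix} 9.568 & 0.453 & 0.670 \\ 0.453 & 1.709 & 0.232 \\ 0.670 & 0.232 & 1.152 \end{bmatrix}, \qquad \text{RMSE} = \begin{bmatrix} 3.341 \\ 1.183 \\ 1.021 \end{bmatrix}$$

Shape (×0.01), test set:

$$\text{mean} = \begin{bmatrix} 3.168 \\ 0.609 \\ 0.580 \end{bmatrix}, \qquad \text{covariance} = \begin{bmatrix} 21.588 & 0.559 & 1.370 \\ 0.559 & 3.101 & 0.808 \\ 1.370 & 0.808 & 2.362 \end{bmatrix}, \qquad \text{RMSE} = \begin{bmatrix} 3.306 \\ 1.267 \\ 1.505 \end{bmatrix}$$

Shape entries are magnitudes as recovered from the transcript; minus signs, if any, are not preserved there.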
![Page 39: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/39.jpg)
Experimental Results (3/6)
[Figure: Histograms of observed (left) and normalized (right) log-pitch mean for the training set. x-axis: pitch mean; y-axis: number.]
![Page 40: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/40.jpg)
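Experimental Results (4/6)
Examples of the Reconstructed Pitch Contours
Inside test: "在國人消費習慣改變,國民所得提高,信用貸款市場,成為潛力市場。" (With the public's consumption habits changing and national income rising, the credit loan market has become a market with potential.)
[Figure: original and predicted pitch contours. x-axis: frame; y-axis: pitch period (ms).]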
![Page 41: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/41.jpg)
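Experimental Results (5/6)
Examples of the Reconstructed Pitch Contours
Outside test: "在意國政經混亂中臨危受命的齊安培,未來在政經兩方面都有不少艱困任務待完成。" (Ciampi, who took office at a critical moment amid Italy's political and economic turmoil, still has many difficult political and economic tasks to complete.)
[Figure: original and predicted pitch contours. x-axis: frame; y-axis: pitch period (ms).]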
![Page 42: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/42.jpg)
Experimental Results (6/6)
Influences of the 16 Unified Prosodic States
[Figure: x-axis: prosodic state (0 to 16); y-axis: pitch period (ms), ranging from 4 to 11.]
![Page 43: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/43.jpg)
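Analyses of the Inferred Model (1/13)
CFs of Current, Previous and Following Tones in Pitch Mean Model

| tone | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $\beta_t$ (current tone) | -0.154 | 0.054 | 0.160 | -0.035 | 0.128 |
| $\beta_{pt}$ (previous tone) | -0.022 | -0.034 | 0.018 | 0.024 | 0.029 |
| $\beta_{ft}$ (following tone) | 0.022 | -0.003 | -0.047 | 0.011 | 0.013 |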
![Page 44: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/44.jpg)
Analyses of the Inferred Model (2/13)
Comparison of a Tone 3 Preceding Another Tone 3 with Canonical Tone 2 and Tone 3
[Figure: x-axis: frame (0 to 20); y-axis: pitch period (ms). Curve labels: 033, 133, 233, 333, 433, 533, 020, 030.]
![Page 45: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/45.jpg)
Analyses of the Inferred Model (3/13)
Comparison of a Tone 4 Preceding Another Tone 4 with Canonical Tone 4
[Figure: x-axis: frame (0 to 20); y-axis: pitch period (ms). Curve labels: 044, 144, 244, 344, 444, 544, 040.]
![Page 46: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/46.jpg)
Analyses of the Inferred Model (4/13)
CFs of Initial/Final Classes in Pitch Mean Model (unit of pitch period: ms)

| class | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| $\beta_i$ (initial class) | -0.008 | 0.004 | 0.011 | -0.013 | 0.003 | -0.014 | 0.003 |
| $\beta_f$ (final class) | 0.011 | -0.001 | -0.004 | 0.008 | -0.005 | -0.019 | 0.004 |

Initial classes: null initial; {b, d, g}; {f, s, sh, shi, h}; {m, n, l, r}; {ts, ch, chi}; {p, t, k}; {tz, j, ji}
Final classes: low vowels; middle vowels; high vowels; compound vowels; vowels with nasal ending; retroflexion; null vowels
![Page 47: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/47.jpg)
Analyses of the Inferred Model (5/13)
CFs of Initial/Final Classes in Pitch Shape Model (unit of pitch period: ms)
[Table: three-dimensional CF vectors $b_i$ (initial classes 0 to 6) and $b_f$ (final classes 0 to 6), in units of ×0.01. The assignment of entries to classes is not recoverable from the transcript.]
![Page 48: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/48.jpg)
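Analyses of the Inferred Model (6/13)
CFs of Speakers in Pitch Mean Model (unit of pitch period: ms)

| speaker | 1 (M) | 2 (F) | 3 (M) | 4 (F) |
|---|---|---|---|---|
| $\alpha_s$ (dynamic range change) | 1.014 | 0.971 | 1.026 | 0.981 |
| $\beta_s$ (level shift) | -0.030 | 0.049 | -0.044 | 0.041 |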
![Page 49: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/49.jpg)
Analyses of the Inferred Model (7/13)
CFs of Speakers in Pitch Shape Model (unit of pitch period: ms)
[Table: three-dimensional CF vectors $b_s$ for speakers 1 (M), 2 (F), 3 (M), and 4 (F), in units of ×0.01. The assignment of entries to speakers is not recoverable from the transcript.]
![Page 50: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/50.jpg)
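Analyses of the Inferred Model (8/13)
CFs of Prosodic States in Pitch Mean Model (unit of pitch period: ms)

| state | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| $\beta_p$ | -0.400 | -0.225 | -0.159 | -0.113 | -0.081 | -0.047 | -0.016 | 0.014 |

| state | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|
| $\beta_p$ | 0.039 | 0.073 | 0.102 | 0.130 | 0.161 | 0.196 | 0.265 | 0.348 |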
![Page 51: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/51.jpg)
Analyses of the Inferred Model (9/13)
CFs of Prosodic States in Pitch Shape Model (unit of pitch period: ms)
[Table: three-dimensional CF vectors $b_q$ for prosodic states 0 to 15, in units of ×0.01. The assignment of entries to states is not recoverable from the transcript.]
![Page 52: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/52.jpg)
Analyses of the Inferred Model (10/13)
![Page 53: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/53.jpg)
Analyses of the Inferred Model (11/13)
![Page 54: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/54.jpg)
Analyses of the Inferred Model (12/13)
Statistics of the Prosodic Labeling

| PM \ Break | Non-boundary | Minor boundary | Major boundary |
|---|---|---|---|
| Non-PM | 89.18% | 9.80% | 1.02% |
| Minor PM | 57.73% | 33.48% | 8.80% |
| Secondary Major PM | 30.52% | 44.65% | 24.83% |
| Major PM | 19.31% | 31.66% | 49.02% |

Major PM = {, 。 ! ; ?}, Secondary Major PM = {、 :}, Minor PM = {brace, bracket, dot}

The location after syllable $n$ is labeled from the difference between the prosodic states $p_n$ and $p_{n+1}$:
• major boundary if the difference is between 10 and 15
• minor boundary if the difference is between 4 and 9
• non-boundary otherwise
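The boundary rule on this slide can be sketched as a small function. One assumption is made loudly here: the transcript does not preserve the order of the subtraction, so the sketch takes the difference as p_n minus p_(n+1), on the reasoning that a pitch reset after a break moves the next syllable to a lower prosodic state. The function name and the example state sequence are hypothetical.

```python
def label_boundary(p_n, p_next):
    """Label the juncture after syllable n from two consecutive prosodic states."""
    # ASSUMPTION: the difference is p_n - p_(n+1); the slide transcript does not
    # preserve the sign, so this direction is a guess.
    diff = p_n - p_next
    if 10 <= diff <= 15:
        return "major"
    if 4 <= diff <= 9:
        return "minor"
    return "non-boundary"


# Hypothetical prosodic-state sequence for six syllables.
states = [9, 14, 3, 8, 2, 5]
labels = [label_boundary(a, b) for a, b in zip(states, states[1:])]
```

Each label refers to the juncture between two consecutive syllables, so a sequence of six states yields five labels.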
![Page 55: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/55.jpg)
Analyses of the Inferred Model (13/13)
這位約翰霍普金斯大學名譽教授 *在第一屆國際 &性高潮會議中說 *,他對這一始於 &一九八O年代的性趨勢 &感到 ...這場比賽 *將於今日下午2時 &在 &台北 &市立棒球場舉行 *,黑鷹組織 &所屬 &三級棒球隊 *,包括台南六信 *、台東農工 &、屏東鶴聲國中 *、台東鹿野國中 &及台南善化國小等隊 *,將各著球隊服裝&到場加油 *,預計人數有近千人以上 *。黑鷹兩位教練 *黃永裕及&江泰權 *,對於此場比賽 *不敢掉以輕心 *,除了排出鑽石陣容外,也要親自上場 *。黑鷹所 ...商人非法囤積 &大量爆竹 *,萬一發生爆炸事件 *,不但會造成死傷慘劇 *,自己也可能成為 &受害最大 ...世界性的環保潮流 &,使人們日益重視環境汙染的問題 *;而觀光旅遊 & 這個﹁無煙囪工業 *﹂正好吻合此一 *健康訴求 * ,因此可預期& 今年將是遊樂區 ...
Examples of Possible Minor (&) and Major (*) Prosodic Phrase Boundaries
![Page 56: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/56.jpg)
Conclusions
• The models are effective at isolating several main affecting factors.
• The variance of the modeled duration and pitch is greatly reduced.
• The estimated companding factors (CFs) conform well to prior linguistic knowledge.
• The prosodic-state labels produced are linguistically meaningful.
![Page 57: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/57.jpg)
Automatic Prosody Labeling based on LPM
![Page 58: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/58.jpg)
Break types
• In this study, five break types are defined: B0 to B4.
• B0: a tightly coupled syllable boundary, where the pitch contour across the syllable juncture may be connected and is strongly affected by the contextual syllables.
• B1: a normal syllable boundary, which loosely couples two consecutive syllables and has no pitch reset.
• B2: a prosodic word boundary, which has a short pause or an irregular pitch reset.
• B3/B4: minor/major breaks with medium and long pauses, respectively. They are usually accompanied by medium or large pitch resets.
![Page 59: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/59.jpg)
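Break Labeling Algorithm

$$
\begin{aligned}
\mathbf{B}^*, \mathbf{p}^* &= \arg\max_{\mathbf{B},\mathbf{p}} P(\mathbf{p}, \mathbf{B} \mid \mathbf{x}, \mathbf{Pau}, \mathbf{L}, \mathbf{t}) \\
&= \arg\max_{\mathbf{B},\mathbf{p}} P(\mathbf{p}, \mathbf{B}, \mathbf{x}, \mathbf{Pau} \mid \mathbf{L}, \mathbf{t}) \\
&= \arg\max_{\mathbf{B},\mathbf{p}} P(\mathbf{x}, \mathbf{Pau} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t})\, P(\mathbf{p}, \mathbf{B} \mid \mathbf{L}, \mathbf{t})
\end{aligned}
$$

B: break type
p: prosodic state
x: pitch contour
Pau: pause duration
L: high-level linguistic feature
t: low-level linguistic feature (tone)
First factor, P(x, Pau | p, B, L, t): acoustic-prosodic model
Second factor, P(p, B | L, t): linguistic-prosodic model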
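The decomposition above can be illustrated with a small search over candidate labelings. This is only a sketch: the slides do not say how the argmax is searched, so exhaustive enumeration is used here purely for clarity, and `toy_acoustic_logp` and `toy_linguistic_logp` are hypothetical stand-ins for the two trained models.

```python
import itertools
import math

BREAKS = range(5)   # break types B0..B4
STATES = range(3)   # toy number of prosodic states (an assumption for brevity)

def map_decode(n_syl, acoustic_logp, linguistic_logp):
    """Return the (B, p) pair maximizing
    log P(x, Pau | p, B, L, t) + log P(p, B | L, t) by exhaustive search."""
    best, best_score = None, -math.inf
    for B in itertools.product(BREAKS, repeat=n_syl):
        for p in itertools.product(STATES, repeat=n_syl):
            score = acoustic_logp(B, p) + linguistic_logp(B, p)
            if score > best_score:
                best, best_score = (B, p), score
    return best, best_score

# Hypothetical scores: the acoustic term peaks at one labeling,
# the linguistic term is flat.
TARGET_B, TARGET_P = (1, 4), (0, 2)

def toy_acoustic_logp(B, p):
    return (-sum((b - tb) ** 2 for b, tb in zip(B, TARGET_B))
            - sum((q - tq) ** 2 for q, tq in zip(p, TARGET_P)))

def toy_linguistic_logp(B, p):
    return 0.0

best, score = map_decode(2, toy_acoustic_logp, toy_linguistic_logp)
print(best, score)
```

Exhaustive search grows exponentially with utterance length, so it is only workable for toy inputs like this one.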
![Page 60: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/60.jpg)
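Acoustic-prosodic model (1/3)

$$
\begin{aligned}
P(\mathbf{x}, \mathbf{Pau} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t})
&= P(\mathbf{x} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t})\, P(\mathbf{Pau} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t}) \\
&\approx P(\mathbf{x} \mid \mathbf{p}, \mathbf{B}, \mathbf{t})\, P(\mathbf{Pau} \mid \mathbf{B}, \mathbf{L}) \\
&= \prod_{k=1}^{K} \prod_{n=1}^{N_k} P(\mathbf{x}_{k,n} \mid p_{k,n}, B_{k,n-1}, B_{k,n}, t_{k,n-1}, t_{k,n}, t_{k,n+1})\, P(Pau_{k,n} \mid B_{k,n}, L_{k,n})
\end{aligned}
$$

First factor, P(x_{k,n} | ...): the syllable pitch contour model (Base LPM)
Second factor, P(Pau_{k,n} | B_{k,n}, L_{k,n}): the pause-break model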
![Page 61: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/61.jpg)
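Acoustic-prosodic model (2/3)
The syllable pitch contour model

$$
\mathbf{x}_{k,n} = \mathbf{y}_{k,n} + \mathbf{PT}_{t_{k,n}} + \mathbf{PP}_{p_{k,n}} + \mathbf{PC}^{f}_{B_{k,n-1},\,tp_{k,n-1}} + \mathbf{PC}^{b}_{B_{k,n},\,tp_{k,n}} + \boldsymbol{\mu}
$$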
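A minimal sketch of how the additive pitch contour model normalizes an observed vector, assuming 4-dimensional feature vectors and lookup tables keyed by tone, prosodic state, and (break type, tp) pairs. All table values and keys below are made up for illustration, and treating `tp` as a tone pair is an assumption.

```python
import numpy as np

# Hypothetical pattern tables; real values would come from training.
PT = {3: np.array([0.10, -0.02, 0.00, 0.00])}               # keyed by tone t
PP = {7: np.array([0.05, 0.00, 0.00, 0.00])}                # keyed by prosodic state p
PC_f = {(1, (2, 3)): np.array([0.01, 0.01, 0.00, 0.00])}    # keyed by (B_{n-1}, tp_{n-1})
PC_b = {(2, (3, 4)): np.array([-0.01, 0.00, 0.00, 0.00])}   # keyed by (B_n, tp_n)
mu = np.array([5.0, 0.0, 0.0, 0.0])                         # global mean

def predicted_offset(t, p, B_prev, tp_prev, B_cur, tp_cur):
    """Sum of all pattern terms plus the global mean."""
    return PT[t] + PP[p] + PC_f[(B_prev, tp_prev)] + PC_b[(B_cur, tp_cur)] + mu

def normalize(x, t, p, B_prev, tp_prev, B_cur, tp_cur):
    """Recover the normalized vector y by removing every modeled factor from x."""
    return x - predicted_offset(t, p, B_prev, tp_prev, B_cur, tp_cur)

# Build an observation from a known residual, then recover the residual.
y_true = np.array([0.02, -0.01, 0.00, 0.01])
labels = dict(t=3, p=7, B_prev=1, tp_prev=(2, 3), B_cur=2, tp_cur=(3, 4))
x_obs = y_true + predicted_offset(**labels)
y_hat = normalize(x_obs, **labels)
print(y_hat)
```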
![Page 62: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/62.jpg)
Acoustic-prosodic model (3/3)

$$
P(\mathbf{x}_{k,n} \mid p_{k,n}, B_{k,n-1}, B_{k,n}, t_{k,n-1}, t_{k,n}, t_{k,n+1})
= N\big(\mathbf{x}_{k,n};\ \mathbf{PT}_{t_{k,n}} + \mathbf{PP}_{p_{k,n}} + \mathbf{PC}^{f}_{B_{k,n-1},\,tp_{k,n-1}} + \mathbf{PC}^{b}_{B_{k,n},\,tp_{k,n}} + \boldsymbol{\mu},\ \mathbf{R}\big)
$$

The pause-break model

$$
P(Pau_{k,n} \mid B_{k,n}, L_{k,n}) = g\big(Pau_{k,n};\ \alpha_{B_{k,n},L_{k,n}},\ \beta_{B_{k,n},L_{k,n}}\big)
$$
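The Gaussian term of the pitch contour model can be evaluated in the log domain as sketched below. The function name and the toy inputs are illustrative assumptions. The pause-break density g is not sketched because the slides do not state its form.

```python
import numpy as np

def gaussian_logpdf(x, mean, R):
    """log N(x; mean, R) for a full covariance matrix R."""
    d = x - mean
    k = x.shape[0]
    _, logdet = np.linalg.slogdet(R)      # log-determinant, stable for small variances
    quad = d @ np.linalg.solve(R, d)      # d^T R^{-1} d without forming the inverse
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)

# Toy check: at the mean with identity covariance, only the constant term remains.
x = np.zeros(4)
logp = gaussian_logpdf(x, np.zeros(4), np.eye(4))
print(logp)
```

In the model above, `mean` would be the sum of the pattern terms and the global mean, and `R` the covariance of the normalized vectors.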
![Page 63: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/63.jpg)
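Linguistic-prosodic model

$$
\begin{aligned}
P(\mathbf{p}, \mathbf{B} \mid \mathbf{L}, \mathbf{t})
&\approx P(\mathbf{p}, \mathbf{B} \mid \mathbf{L})
= P(\mathbf{p} \mid \mathbf{B}, \mathbf{L})\, P(\mathbf{B} \mid \mathbf{L})
\approx P(\mathbf{p} \mid \mathbf{B})\, P(\mathbf{B} \mid \mathbf{L}) \\
&= \prod_{k=1}^{K} \left[ P(p_{k,1}) \prod_{n=2}^{N_k} P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1}) \right] \left[ \prod_{n=1}^{N_k} P(B_{k,n} \mid L_{k,n}) \right]
\end{aligned}
$$

First bracket: prosodic state transition model
Second bracket: linguistic-break model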
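As a rough illustration, the log of the linguistic-prosodic term for a single utterance could be accumulated as follows. The table layout (`init`, `trans`, `brk`) and the uniform toy probabilities are assumptions made for this sketch, not values from the slides.

```python
import math

def linguistic_logp(p, B, L, init, trans, brk):
    """log [ P(p_1) * prod_n P(p_n | p_{n-1}, B_{n-1}) * prod_n P(B_n | L_n) ]
    for one utterance; sequences use 0-based indexing."""
    lp = math.log(init[p[0]])
    for n in range(1, len(p)):
        lp += math.log(trans[B[n - 1]][p[n - 1]][p[n]])   # state transition given previous break
    for n in range(len(B)):
        lp += math.log(brk(B[n], L[n]))                   # break given linguistic features
    return lp

# Hypothetical toy model: 2 prosodic states, 5 break types, all uniform.
init = [0.5, 0.5]
trans = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(5)]      # trans[B_prev][p_prev][p_cur]

def brk(b, feats):
    return 0.2                                            # ignores the features in this toy

lp = linguistic_logp(p=[0, 1, 1], B=[1, 2, 4], L=[None, None, None],
                     init=init, trans=trans, brk=brk)
print(lp)
```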
![Page 64: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/64.jpg)
Training of the Model
To estimate the parameters of the break labeling model, a sequential optimization procedure based on the ML criterion is adopted. It first defines a log-likelihood function expressed by

$$
\begin{aligned}
Q = \log \Bigg\{ & \Bigg[ \prod_{k=1}^{K} \prod_{n=1}^{N_k} P(\mathbf{x}_{k,n} \mid p_{k,n}, B_{k,n-1}, B_{k,n}, t_{k,n-1}, t_{k,n}, t_{k,n+1})\, P(Pau_{k,n} \mid B_{k,n}, L_{k,n}) \Bigg] \\
& \times \Bigg[ \prod_{k=1}^{K} P(p_{k,1}) \prod_{n=2}^{N_k} P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1}) \prod_{n=1}^{N_k} P(B_{k,n} \mid L_{k,n}) \Bigg] \Bigg\}
\end{aligned}
$$
![Page 65: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/65.jpg)
Initialization of Break Labeling
[Figure: decision tree assigning initial break labels. Decision nodes: Pause ≥ 300 ms; Pause ≥ 125 ms; Pause ≥ 75 ms; PM; Normalized pitch reset ≥ threshold; Pitch pause ≥ 30 ms (two nodes); Interword. Leaf labels: B0, B1, B2, B3, B4.]
![Page 66: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/66.jpg)
Experimental Database
Performance of the proposed pitch modeling method was evaluated using a Mandarin speech database
The database contained the read speech of a single female professional announcer
Its texts were all short paragraphs composed of several sentences selected from the Sinica Tree-Bank Corpus
The database consisted of 380 utterances with 52192 syllables
Sampling rate: 16 kHz
All segmentations and F0 values were manually corrected
![Page 67: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/67.jpg)
Experimental Results
The learning curve
![Page 68: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/68.jpg)
Experimental Results
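Covariance matrices of observed and normalized feature vectors

$$
\mathbf{R}_x = \begin{bmatrix} 932.3 & 0 & 0 & 0 \\ 0 & 89.9 & 0 & 0 \\ 0 & 0 & 17.8 & 0 \\ 0 & 0 & 0 & 5.0 \end{bmatrix} \times 10^{-4},
\qquad
\mathbf{R}_y = \begin{bmatrix} 9.0 & 0 & 0 & 0 \\ 0 & 31.9 & 0 & 0 \\ 0 & 0 & 11.1 & 0 \\ 0 & 0 & 0 & 3.8 \end{bmatrix} \times 10^{-4}
$$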
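The variance reduction implied by these two matrices can be worked out directly. The short sketch below only re-uses the diagonal values shown above; the variable names are illustrative.

```python
import numpy as np

# Diagonal entries of the observed (R_x) and normalized (R_y) covariance matrices.
Rx = np.diag([932.3, 89.9, 17.8, 5.0]) * 1e-4
Ry = np.diag([9.0, 31.9, 11.1, 3.8]) * 1e-4

# Fraction of variance removed in each dimension, and in total (ratio of traces).
per_dim_reduction = 1.0 - np.diag(Ry) / np.diag(Rx)
total_reduction = 1.0 - np.trace(Ry) / np.trace(Rx)

print(np.round(per_dim_reduction, 3))
print(round(float(total_reduction), 3))
```

The first dimension accounts for most of the total variance, so its reduction from 932.3 to 9.0 dominates the overall figure.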
![Page 69: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/69.jpg)
Experimental Results-syllable pitch contour model(1/12)
The learned pitch contour of 5 tones
![Page 70: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/70.jpg)
Experimental Results-syllable pitch contour model(2/12)
Prosodic state patterns
![Page 71: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/71.jpg)
Experimental Results-syllable pitch contour model(3/12)
Coarticulation patterns
![Page 72: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/72.jpg)
Experimental Results-syllable pitch contour model(4/12)
![Page 73: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/73.jpg)
Experimental Results-syllable pitch contour model(5/12)
![Page 74: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/74.jpg)
Experimental Results-syllable pitch contour model(6/12)
![Page 75: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/75.jpg)
Experimental Results-syllable pitch contour model(7/12)
![Page 76: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/76.jpg)
Experimental Results-syllable pitch contour model(8/12)
![Page 77: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/77.jpg)
Experimental Results-syllable pitch contour model(9/12)
![Page 78: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/78.jpg)
Experimental Results-syllable pitch contour model(10/12)
![Page 79: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/79.jpg)
Experimental Results-syllable pitch contour model(11/12)
![Page 80: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/80.jpg)
Experimental Results-syllable pitch contour model(12/12)
![Page 81: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/81.jpg)
Experimental Results-Pause-break model (1/2)
Pause-break model
| Break type | B0 | B1 | B2 | B3 | B4 |
|---|---|---|---|---|---|
| Mean pause duration (sec) | 0.002 | 0.009 | 0.035 | 0.206 | 0.479 |
![Page 82: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/82.jpg)
Experimental Results-Pause-break model (2/2)
$P(Pau_{k,n} \mid B_{k,n} = 4, L_{k,n})$
![Page 83: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/83.jpg)
Experimental Results-length of prosodic units (1/3)
Histogram of length of prosodic group
![Page 84: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/84.jpg)
Experimental Results-length of prosodic units (2/3)
Histogram of length of prosodic phrase
![Page 85: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/85.jpg)
Experimental Results-length of prosodic units (3/3)
Histogram of length of word
![Page 86: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/86.jpg)
Experimental Results
Count of break indices
![Page 87: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/87.jpg)
Experimental Results
Count of prosodic state
![Page 88: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/88.jpg)
Experimental Results
Prob. of prosodic state after B3
![Page 89: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/89.jpg)
Experimental Results
Prob. of prosodic state before B3
![Page 90: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/90.jpg)
Experimental Results
Prob. of prosodic state after B4
![Page 91: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/91.jpg)
Experimental Results
Prob. of prosodic state before B4
![Page 92: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/92.jpg)
Experimental Results-prosodic state transition model(1/5)
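$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 4)$

| Pn-1 \ Pn | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.01 | 0.01 | 0.01 | 0.02 | 0.01 | 0.08 | 0.08 | 0.02 | 0.04 | 0.13 | 0.01 | 0.12 | 0.13 | 0.12 | 0.12 | 0.09 |
| 2 | 0.00 | 0.01 | 0.01 | 0.01 | 0.04 | 0.00 | 0.10 | 0.01 | 0.08 | 0.16 | 0.07 | 0.00 | 0.18 | 0.11 | 0.13 | 0.08 |
| 3 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.04 | 0.00 | 0.10 | 0.03 | 0.07 | 0.00 | 0.23 | 0.10 | 0.19 | 0.12 | 0.08 |
| 4 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.04 | 0.00 | 0.13 | 0.00 | 0.14 | 0.00 | 0.04 | 0.13 | 0.20 | 0.16 | 0.13 |
| 5 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.06 | 0.00 | 0.00 | 0.13 | 0.00 | 0.17 | 0.00 | 0.33 | 0.10 | 0.17 | 0.00 |
| 6 | 0.00 | 0.00 | 0.01 | 0.00 | 0.06 | 0.01 | 0.00 | 0.20 | 0.00 | 0.03 | 0.14 | 0.00 | 0.07 | 0.08 | 0.28 | 0.10 |
| 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.01 | 0.00 | 0.00 | 0.08 | 0.01 | 0.00 | 0.26 | 0.00 | 0.43 | 0.00 | 0.17 |
| 8 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.09 | 0.01 | 0.00 | 0.11 | 0.00 | 0.38 | 0.00 | 0.35 | 0.00 |
| 9 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.07 | 0.01 | 0.01 | 0.01 | 0.01 | 0.24 | 0.01 | 0.35 | 0.01 | 0.25 |
| 10 | 0.01 | 0.01 | 0.01 | 0.01 | 0.12 | 0.01 | 0.01 | 0.01 | 0.22 | 0.03 | 0.01 | 0.01 | 0.31 | 0.07 | 0.16 | 0.01 |
| 11 | 0.02 | 0.02 | 0.02 | 0.02 | 0.04 | 0.02 | 0.02 | 0.04 | 0.02 | 0.06 | 0.02 | 0.25 | 0.04 | 0.19 | 0.04 | 0.15 |
| 12 | 0.01 | 0.01 | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.08 | 0.02 | 0.10 | 0.07 | 0.01 | 0.08 | 0.08 | 0.28 | 0.17 |
| 13 | 0.02 | 0.02 | 0.02 | 0.02 | 0.15 | 0.02 | 0.02 | 0.12 | 0.04 | 0.10 | 0.06 | 0.15 | 0.12 | 0.02 | 0.08 | 0.04 |
| 14 | 0.03 | 0.03 | 0.03 | 0.05 | 0.03 | 0.03 | 0.19 | 0.03 | 0.03 | 0.11 | 0.05 | 0.08 | 0.16 | 0.05 | 0.05 | 0.03 |
| 15 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.10 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.10 | 0.10 | 0.05 |
| 16 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.07 | 0.03 | 0.03 | 0.03 | 0.17 | 0.10 | 0.07 | 0.07 | 0.03 | 0.10 | 0.10 |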
![Page 93: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/93.jpg)
Experimental Results-prosodic state transition model(2/5)
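$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 3)$

| Pn-1 \ Pn | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.03 | 0.03 | 0.03 | 0.15 | 0.05 | 0.03 | 0.08 | 0.03 | 0.08 | 0.03 | 0.20 | 0.05 | 0.10 | 0.05 | 0.05 | 0.03 |
| 2 | 0.01 | 0.03 | 0.08 | 0.11 | 0.01 | 0.15 | 0.09 | 0.03 | 0.01 | 0.16 | 0.15 | 0.05 | 0.05 | 0.05 | 0.01 | 0.01 |
| 3 | 0.00 | 0.01 | 0.06 | 0.05 | 0.10 | 0.07 | 0.00 | 0.18 | 0.01 | 0.26 | 0.00 | 0.07 | 0.09 | 0.03 | 0.04 | 0.00 |
| 4 | 0.00 | 0.01 | 0.04 | 0.00 | 0.11 | 0.00 | 0.22 | 0.00 | 0.12 | 0.00 | 0.20 | 0.11 | 0.00 | 0.10 | 0.04 | 0.02 |
| 5 | 0.00 | 0.00 | 0.00 | 0.09 | 0.13 | 0.00 | 0.00 | 0.10 | 0.18 | 0.00 | 0.00 | 0.32 | 0.10 | 0.00 | 0.05 | 0.01 |
| 6 | 0.00 | 0.00 | 0.08 | 0.02 | 0.00 | 0.00 | 0.28 | 0.00 | 0.01 | 0.33 | 0.01 | 0.03 | 0.00 | 0.19 | 0.00 | 0.04 |
| 7 | 0.00 | 0.00 | 0.00 | 0.05 | 0.00 | 0.17 | 0.00 | 0.28 | 0.00 | 0.00 | 0.25 | 0.00 | 0.13 | 0.01 | 0.08 | 0.00 |
| 8 | 0.00 | 0.00 | 0.03 | 0.00 | 0.15 | 0.00 | 0.13 | 0.00 | 0.25 | 0.00 | 0.00 | 0.00 | 0.26 | 0.12 | 0.00 | 0.05 |
| 9 | 0.00 | 0.00 | 0.03 | 0.16 | 0.00 | 0.00 | 0.00 | 0.11 | 0.00 | 0.00 | 0.46 | 0.04 | 0.00 | 0.00 | 0.16 | 0.00 |
| 10 | 0.00 | 0.01 | 0.02 | 0.00 | 0.10 | 0.18 | 0.00 | 0.00 | 0.15 | 0.23 | 0.00 | 0.00 | 0.24 | 0.01 | 0.00 | 0.04 |
| 11 | 0.01 | 0.05 | 0.03 | 0.01 | 0.10 | 0.01 | 0.19 | 0.06 | 0.01 | 0.08 | 0.01 | 0.17 | 0.03 | 0.06 | 0.14 | 0.01 |
| 12 | 0.00 | 0.00 | 0.00 | 0.10 | 0.00 | 0.13 | 0.01 | 0.07 | 0.17 | 0.00 | 0.00 | 0.25 | 0.00 | 0.19 | 0.00 | 0.04 |
| 13 | 0.01 | 0.01 | 0.04 | 0.01 | 0.14 | 0.01 | 0.01 | 0.37 | 0.02 | 0.01 | 0.14 | 0.01 | 0.15 | 0.01 | 0.08 | 0.01 |
| 14 | 0.01 | 0.01 | 0.03 | 0.04 | 0.05 | 0.01 | 0.08 | 0.10 | 0.09 | 0.02 | 0.22 | 0.06 | 0.07 | 0.14 | 0.03 | 0.05 |
| 15 | 0.02 | 0.02 | 0.02 | 0.04 | 0.02 | 0.02 | 0.02 | 0.17 | 0.08 | 0.17 | 0.13 | 0.06 | 0.11 | 0.02 | 0.02 | 0.08 |
| 16 | 0.05 | 0.05 | 0.10 | 0.05 | 0.05 | 0.05 | 0.05 | 0.10 | 0.05 | 0.05 | 0.10 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |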
![Page 94: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/94.jpg)
Experimental Results-prosodic state transition model(3/5)
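$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 2)$

| Pn-1 \ Pn | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.06 | 0.13 | 0.15 | 0.09 | 0.04 | 0.02 | 0.11 | 0.02 | 0.06 | 0.09 | 0.02 | 0.02 | 0.09 | 0.02 | 0.02 | 0.02 |
| 2 | 0.05 | 0.05 | 0.09 | 0.22 | 0.01 | 0.05 | 0.19 | 0.12 | 0.03 | 0.01 | 0.12 | 0.01 | 0.03 | 0.01 | 0.01 | 0.01 |
| 3 | 0.02 | 0.01 | 0.05 | 0.11 | 0.17 | 0.03 | 0.00 | 0.22 | 0.04 | 0.06 | 0.00 | 0.16 | 0.00 | 0.09 | 0.02 | 0.01 |
| 4 | 0.01 | 0.00 | 0.04 | 0.00 | 0.19 | 0.00 | 0.06 | 0.00 | 0.35 | 0.00 | 0.00 | 0.15 | 0.14 | 0.01 | 0.04 | 0.00 |
| 5 | 0.02 | 0.00 | 0.03 | 0.03 | 0.00 | 0.00 | 0.18 | 0.00 | 0.00 | 0.39 | 0.00 | 0.13 | 0.04 | 0.16 | 0.00 | 0.02 |
| 6 | 0.00 | 0.00 | 0.00 | 0.15 | 0.00 | 0.00 | 0.00 | 0.38 | 0.00 | 0.00 | 0.29 | 0.00 | 0.00 | 0.05 | 0.11 | 0.00 |
| 7 | 0.00 | 0.01 | 0.04 | 0.00 | 0.00 | 0.00 | 0.15 | 0.00 | 0.13 | 0.22 | 0.00 | 0.00 | 0.31 | 0.06 | 0.06 | 0.01 |
| 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 | 0.00 | 0.17 | 0.00 | 0.09 | 0.37 | 0.00 | 0.28 | 0.00 | 0.01 |
| 9 | 0.00 | 0.01 | 0.03 | 0.03 | 0.00 | 0.01 | 0.02 | 0.00 | 0.00 | 0.23 | 0.00 | 0.00 | 0.37 | 0.00 | 0.24 | 0.06 |
| 10 | 0.00 | 0.01 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.10 | 0.00 | 0.00 | 0.00 | 0.36 | 0.00 | 0.47 | 0.00 | 0.00 |
| 11 | 0.01 | 0.00 | 0.00 | 0.04 | 0.00 | 0.04 | 0.02 | 0.01 | 0.11 | 0.01 | 0.01 | 0.00 | 0.43 | 0.00 | 0.23 | 0.09 |
| 12 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.04 | 0.00 | 0.05 | 0.00 | 0.00 | 0.18 | 0.00 | 0.20 | 0.17 | 0.26 | 0.06 |
| 13 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.02 | 0.03 | 0.01 | 0.00 | 0.08 | 0.00 | 0.33 | 0.37 | 0.10 |
| 14 | 0.01 | 0.01 | 0.01 | 0.08 | 0.01 | 0.02 | 0.00 | 0.00 | 0.12 | 0.02 | 0.03 | 0.00 | 0.13 | 0.05 | 0.29 | 0.22 |
| 15 | 0.01 | 0.01 | 0.02 | 0.03 | 0.00 | 0.10 | 0.01 | 0.04 | 0.01 | 0.07 | 0.03 | 0.13 | 0.05 | 0.03 | 0.21 | 0.23 |
| 16 | 0.01 | 0.03 | 0.04 | 0.04 | 0.03 | 0.01 | 0.09 | 0.01 | 0.01 | 0.11 | 0.01 | 0.12 | 0.08 | 0.06 | 0.14 | 0.22 |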
![Page 95: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/95.jpg)
Experimental Results-prosodic state transition model(4/5)
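$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 1)$

| Pn-1 \ Pn | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.09 | 0.29 | 0.18 | 0.07 | 0.04 | 0.02 | 0.02 | 0.02 | 0.04 | 0.04 | 0.02 | 0.04 | 0.02 | 0.02 | 0.02 | 0.02 |
| 2 | 0.09 | 0.28 | 0.30 | 0.10 | 0.01 | 0.07 | 0.00 | 0.05 | 0.02 | 0.02 | 0.00 | 0.00 | 0.01 | 0.02 | 0.01 | 0.00 |
| 3 | 0.04 | 0.24 | 0.21 | 0.30 | 0.00 | 0.12 | 0.03 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.03 | 0.01 | 0.00 | 0.00 |
| 4 | 0.02 | 0.13 | 0.26 | 0.17 | 0.30 | 0.00 | 0.00 | 0.08 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 5 | 0.00 | 0.05 | 0.22 | 0.35 | 0.00 | 0.18 | 0.00 | 0.14 | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 |
| 6 | 0.02 | 0.10 | 0.00 | 0.34 | 0.07 | 0.00 | 0.21 | 0.00 | 0.07 | 0.12 | 0.00 | 0.00 | 0.03 | 0.03 | 0.00 | 0.00 |
| 7 | 0.00 | 0.03 | 0.11 | 0.18 | 0.22 | 0.00 | 0.33 | 0.00 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 8 | 0.00 | 0.02 | 0.15 | 0.00 | 0.45 | 0.00 | 0.00 | 0.24 | 0.00 | 0.00 | 0.11 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 |
| 9 | 0.01 | 0.00 | 0.00 | 0.35 | 0.00 | 0.20 | 0.00 | 0.24 | 0.00 | 0.10 | 0.00 | 0.00 | 0.08 | 0.00 | 0.00 | 0.01 |
| 10 | 0.00 | 0.02 | 0.06 | 0.00 | 0.00 | 0.00 | 0.43 | 0.00 | 0.34 | 0.00 | 0.00 | 0.15 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11 | 0.00 | 0.01 | 0.05 | 0.00 | 0.36 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | 0.00 | 0.09 | 0.00 | 0.10 | 0.05 | 0.01 |
| 12 | 0.00 | 0.01 | 0.00 | 0.14 | 0.00 | 0.16 | 0.00 | 0.34 | 0.00 | 0.17 | 0.05 | 0.00 | 0.11 | 0.00 | 0.00 | 0.00 |
| 13 | 0.00 | 0.01 | 0.04 | 0.00 | 0.09 | 0.00 | 0.13 | 0.00 | 0.24 | 0.08 | 0.00 | 0.29 | 0.00 | 0.10 | 0.02 | 0.01 |
| 14 | 0.00 | 0.00 | 0.01 | 0.06 | 0.00 | 0.07 | 0.00 | 0.18 | 0.02 | 0.19 | 0.00 | 0.17 | 0.12 | 0.11 | 0.04 | 0.02 |
| 15 | 0.00 | 0.01 | 0.00 | 0.02 | 0.00 | 0.00 | 0.08 | 0.00 | 0.12 | 0.08 | 0.00 | 0.19 | 0.19 | 0.19 | 0.09 | 0.04 |
| 16 | 0.00 | 0.01 | 0.01 | 0.03 | 0.00 | 0.00 | 0.02 | 0.03 | 0.03 | 0.08 | 0.00 | 0.12 | 0.10 | 0.24 | 0.23 | 0.07 |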
![Page 96: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/96.jpg)
Experimental Results-prosodic state transition model(5/5)
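$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 0)$

| Pn-1 \ Pn | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.19 | 0.10 | 0.03 | 0.13 | 0.13 | 0.03 | 0.03 | 0.03 | 0.06 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 |
| 2 | 0.11 | 0.31 | 0.16 | 0.13 | 0.01 | 0.06 | 0.02 | 0.06 | 0.02 | 0.01 | 0.02 | 0.02 | 0.01 | 0.02 | 0.01 | 0.01 |
| 3 | 0.03 | 0.14 | 0.33 | 0.24 | 0.00 | 0.00 | 0.12 | 0.00 | 0.03 | 0.04 | 0.01 | 0.02 | 0.01 | 0.00 | 0.01 | 0.00 |
| 4 | 0.02 | 0.06 | 0.21 | 0.10 | 0.31 | 0.00 | 0.00 | 0.21 | 0.00 | 0.00 | 0.02 | 0.03 | 0.00 | 0.02 | 0.00 | 0.00 |
| 5 | 0.00 | 0.01 | 0.02 | 0.38 | 0.00 | 0.40 | 0.00 | 0.00 | 0.15 | 0.00 | 0.01 | 0.00 | 0.02 | 0.00 | 0.01 | 0.00 |
| 6 | 0.02 | 0.00 | 0.21 | 0.00 | 0.46 | 0.00 | 0.00 | 0.15 | 0.09 | 0.01 | 0.00 | 0.02 | 0.00 | 0.02 | 0.00 | 0.01 |
| 7 | 0.01 | 0.02 | 0.04 | 0.00 | 0.18 | 0.00 | 0.46 | 0.00 | 0.00 | 0.17 | 0.00 | 0.08 | 0.00 | 0.03 | 0.00 | 0.00 |
| 8 | 0.00 | 0.02 | 0.00 | 0.22 | 0.24 | 0.00 | 0.00 | 0.00 | 0.35 | 0.00 | 0.07 | 0.00 | 0.06 | 0.01 | 0.02 | 0.00 |
| 9 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.23 | 0.00 | 0.47 | 0.00 | 0.00 | 0.00 | 0.20 | 0.06 | 0.00 | 0.00 | 0.00 |
| 10 | 0.00 | 0.00 | 0.03 | 0.00 | 0.15 | 0.00 | 0.34 | 0.00 | 0.00 | 0.36 | 0.00 | 0.00 | 0.00 | 0.09 | 0.01 | 0.01 |
| 11 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.54 | 0.00 | 0.16 | 0.00 | 0.26 | 0.00 | 0.00 | 0.00 |
| 12 | 0.00 | 0.01 | 0.00 | 0.05 | 0.00 | 0.11 | 0.00 | 0.20 | 0.00 | 0.21 | 0.00 | 0.30 | 0.00 | 0.08 | 0.02 | 0.00 |
| 13 | 0.00 | 0.00 | 0.01 | 0.00 | 0.03 | 0.00 | 0.12 | 0.03 | 0.19 | 0.00 | 0.16 | 0.00 | 0.31 | 0.06 | 0.07 | 0.02 |
| 14 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.01 | 0.00 | 0.10 | 0.00 | 0.20 | 0.00 | 0.25 | 0.08 | 0.17 | 0.11 | 0.04 |
| 15 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 | 0.05 | 0.00 | 0.08 | 0.16 | 0.23 | 0.20 | 0.17 | 0.08 |
| 16 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 0.03 | 0.07 | 0.00 | 0.02 | 0.13 | 0.26 | 0.28 | 0.16 |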
![Page 97: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/97.jpg)
Experimental Results-The decision tree of linguistic-break model
![Page 98: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/98.jpg)
Experimental Results-break labeling example
![Page 99: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/99.jpg)
Summary
In base LPM
The prosodic state was introduced to replace conventional high-level linguistic information, so as to separate the effects of low-level and high-level linguistic features on speech
Effectiveness on isolating several main factors
Greatly reducing the variance of the modeled duration/pitch
The estimated companding factors conformed well to the prior linguistic knowledge
The prosodic-state labels produced are linguistically meaningful
![Page 100: LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8.](https://reader035.fdocuments.net/reader035/viewer/2022062519/5697bffc1a28abf838cc1c40/html5/thumbnails/100.jpg)
Summary
In Automatic Prosody Labeling
We propose a new automatic prosody labeling algorithm based on the base LPM
We treat both break type and prosodic state as latent variables
The preliminary experimental results are both linguistically and acoustically meaningful
Further discussion of each model is needed