SSML Extension for Expressive Mandarin TTS. Shuang Li, Hongwu Yang, Lianhong Cai. Tsinghua University.


Page 1

SSML Extension for Expressive Mandarin TTS

Shuang Li, Hongwu Yang, Lianhong Cai

Tsinghua University

Page 2

Outline

Motivation

Expression of Speech

Proposed SSML extension

Conclusion

Page 3

Motivation(1/3)

• Sentences with the same text can be expressed with different styles, emotions and moods

• Current TTS systems lack variability

Page 4

Motivation(2/3)

• Current SSML cannot define speaking style, emotion and mood
– Good news: 生日快乐 "Happy birthday", expressed in happiness (emotion)
– Bad news: 张总去世了 "Director Zhang passed away", expressed in sadness (emotion)
– Information provider: 飞往纽约的飞机将要起飞 "The flight to New York is about to take off", expressed in a mild mood
– Dialog: 是中国队赢了吗? "Did the Chinese team win?", emphasizing "Chinese", with interrogative mood

• Current SSML cannot easily show the difference between the expressions above

Page 5

Motivation(3/3)

[Diagram: components of expressive speech]

• Emotion: positive, neutral, negative
• Style: news, sports commentary, dialog, information providing, ……
• Characteristic

• Emotion, style and characteristic are relatively independent but cannot be separated
• Characteristic and style: relatively stable, global features
• Emotion: short-time, local feature

Coverage by current SSML tags:
• Expressing pattern: no tag
• Physiological/social characteristics: voice tag
• Physiological reactions: no tag

Expressive speech:
• Uses different speaking styles
• Represents the speaker's attitude, purpose and emotion
• Is more harmonious with the circumstances

Page 6

Outline

Motivation

Expression of Speech

Proposed SSML extension

Conclusion

Page 7

Expression of Speech

• Style: speaking style (dialog, news, information providing, …)
• Mood: mood (request, acquisition, affirmation, apology, …)
• Emotion: emotional activities (neutral, negative, positive)

[Diagram with three layers: (1) style, mood, emotion; (2) intonation, emphasis, speaking rate, break; (3) spectral features, duration, energy, pitch]

Page 8

Hierarchical framework of Prosody

• Break level
– B0: no break
– B1: syllable
– B2: prosodic word
– B3: prosodic phrase
– B4: breath group
– B5: prosodic group

• Chiu-yu Tseng et al., "Fluent speech prosody: Framework and modeling," Speech Communication 46 (2005), 284-309

Page 9

我永远忘不了 <B3/25ms> 一张对日抗战时的新闻照片, <B3/507ms> 轰炸后的废墟焦土上,<B3/272ms> 一个衣不蔽体、 <B3/384ms> 满身尘土灰烟的幼儿 <B3/100ms> 坐在地上 <B3/75ms> 无助的大哭着。 <B5/1110ms> 那是一再令我热泪盈眶的镜头。 <B3/507ms> 新闻摄影中的战争传真 <B3/276ms> 已不能只称是照片了。 <B5/802ms>

• From Chiu-yu Tseng, report in Beijing University, Oct 11, 2005
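The annotated passage above can be read mechanically. The following is a minimal Python sketch, not part of the original slides, that splits such text at its `<Bn/NNNms>` markers. It assumes the number after the slash is the pause duration in milliseconds; the names `Chunk`, `BREAK_NAMES` and `parse_breaks` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Break levels as listed on the "Hierarchical framework of Prosody" slide.
BREAK_NAMES = {
    0: "no break",
    1: "syllable",
    2: "prosodic word",
    3: "prosodic phrase",
    4: "breath group",
    5: "prosodic group",
}


class Chunk(NamedTuple):
    text: str      # text preceding the break marker
    level: int     # break level, 0..5
    pause_ms: int  # assumed meaning: pause length in milliseconds


# Matches markers such as <B3/507ms>.
_MARKER = re.compile(r"<B([0-5])/(\d+)ms>")


def parse_breaks(annotated: str) -> List[Chunk]:
    """Return one Chunk per marker; text after the last marker is ignored."""
    chunks = []
    pos = 0
    for m in _MARKER.finditer(annotated):
        text = annotated[pos:m.start()].strip()
        chunks.append(Chunk(text, int(m.group(1)), int(m.group(2))))
        pos = m.end()
    return chunks


sample = "坐在地上 <B3/75ms> 无助的大哭着。 <B5/1110ms>"
for chunk in parse_breaks(sample):
    print(chunk.text, BREAK_NAMES[chunk.level], chunk.pause_ms)
```

Each resulting chunk carries the break level that closes it, which is the information the proposed hierarchical tags are meant to encode.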

Page 10

Outline

Motivation

Expression of Speech

Proposed SSML extension

Conclusion

Page 11

Proposed tag (1/2)

• Utterance: prosodic group, expressing a complete meaning
– Attributes:
  style: speaking style
    Value: News, Reading, Information provider, dialog, etc.
  emotion: speaking emotion
    Value: Happy, Sad, Angry, Calm, Despair, etc.
    +1 for positive, 0 for neutral, -1 for negative
  mood: speaking mood
    Value: given, request, acquisition, affirmation, apology, etc.

Page 12

Proposed tag (2/2)

• BG: breath group
– Attributes:
  intonation
    Value: indicative, interrogative, imperative

• PPh: prosodic phrase

• PW: prosodic word

• Syl: Syllable
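As an illustration of how the proposed elements nest, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The helper names (`make_utterance`, `add_bg`, `add_pph`) are hypothetical, and the attribute values chosen for the sample sentence are illustrative only. Only `intonation` is validated, since it is the one attribute the slides give a closed value list for.

```python
import xml.etree.ElementTree as ET

# Closed value list given on the slide for the bg "intonation" attribute.
INTONATIONS = {"indicative", "interrogative", "imperative"}


def make_utterance(style, emotion, mood=None):
    """Create an <utterance> element; emotion may be a name or +1/0/-1."""
    attrs = {"style": style, "emotion": str(emotion)}
    if mood is not None:
        attrs["mood"] = mood
    return ET.Element("utterance", attrs)


def add_bg(utterance, intonation):
    """Append a breath group <bg> to an utterance."""
    if intonation not in INTONATIONS:
        raise ValueError(f"unknown intonation: {intonation!r}")
    return ET.SubElement(utterance, "bg", {"intonation": intonation})


def add_pph(bg, text):
    """Append a prosodic phrase <pph> holding plain text."""
    pph = ET.SubElement(bg, "pph")
    pph.text = text
    return pph


# Illustrative values for the dialog example from the motivation slides.
utt = make_utterance("dialog", 0, "acquisition")
bg = add_bg(utt, "interrogative")
add_pph(bg, "是中国队赢了吗?")
print(ET.tostring(utt, encoding="unicode"))
```

The `style`, `emotion` and `mood` values are left unvalidated because the slides list them with "etc.", so the sets appear to be open-ended.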

Page 13

Some examples(1/3)

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="information provider" emotion="-1" mood="apology">
    <bg intonation="indicative">
      <pph>1121 次航班</pph> <!-- Flight 1121 -->
      <pph>延误 <!-- has been delayed -->
        <pw><emphasis level="strong">1 小时</emphasis></pw> <!-- for an hour -->
      </pph>
      <break strength="medium" time="215ms"/>
      <pph>请旅客们到</pph> <!-- Please go to -->
      <pw><emphasis level="moderate">G6</emphasis></pw>
      <pph>候机厅等候</pph> <!-- the waiting room -->
    </bg>
  </utterance>
</speak>

Page 14

Some examples(2/3)

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="dialog" emotion="neutral" mood="acquisition">
    <bg intonation="interrogative">
      <pph><pw>
        <emphasis level="strong">张威</emphasis> <!-- Zhang Wei -->
      </pw></pph>
      <break strength="medium" time="75ms"/>
      <pph>担心肖荫开车发晕</pph> <!-- is afraid of Xiao Yin being dizzy when driving -->
    </bg>
  </utterance>
</speak>

Page 15

Some examples(3/3)

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="dialog" emotion="angry">
    <bg intonation="interrogative">
      <prosody rate="x-fast">难道不是你的错吗?</prosody> <!-- Isn't it your fault? -->
      <break strength="medium" time="520ms"/>
    </bg>
    <bg intonation="imperative">
      以后你小心一点 <!-- Be careful next time -->
    </bg>
  </utterance>
</speak>
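Documents like the three examples can be checked for consistent nesting of the prosodic elements. The sketch below is one possible check in Python, not something the slides specify. It assumes that each of utterance, bg, pph, pw and syl may only appear inside a strictly higher level, that levels may be skipped, and that other elements such as emphasis, prosody and break are transparent. The names `RANK` and `nesting_errors` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hierarchy from the proposal, highest level first (assumed strict ordering).
RANK = {"utterance": 5, "bg": 4, "pph": 3, "pw": 2, "syl": 1}


def nesting_errors(elem, enclosing_rank=6, path=""):
    """Return paths of prosodic elements nested inside an equal or lower level."""
    errors = []
    tag = elem.tag.split("}")[-1]  # drop a namespace prefix such as {http://...}
    here = f"{path}/{tag}"
    rank = RANK.get(tag)
    if rank is not None:
        if rank >= enclosing_rank:
            errors.append(here)
        enclosing_rank = rank
    for child in elem:
        errors.extend(nesting_errors(child, enclosing_rank, here))
    return errors


good = ET.fromstring(
    '<speak><utterance style="dialog" emotion="angry">'
    '<bg intonation="imperative"><prosody rate="x-fast">'
    '<pw>小心</pw></prosody></bg></utterance></speak>'
)
bad = ET.fromstring(
    '<speak><pph><bg intonation="indicative">延误</bg></pph></speak>'
)
print(nesting_errors(good))
print(nesting_errors(bad))
```

Treating non-prosodic elements as transparent keeps standard SSML elements usable at any depth, which matches how the examples place `emphasis` inside `pw` and `prosody` inside `bg`.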

Page 16

Outline

Motivation

Expression of Speech

Proposed SSML extension

Conclusion

Page 17

Conclusion & question?

• 5 elements for the hierarchical prosodic structure
– utterance, bg, pph, pw, syl

• 3 expressive attributes for utterance
– style
– emotion
– mood

• 1 attribute for bg
– intonation