Producing Emotional Speech Thanks to Gabriel Schubiner.
![Page 1: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/1.jpg)
Producing Emotional
Speech
Thanks to Gabriel Schubiner
![Page 2: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/2.jpg)
Papers
Generation of Affect in Synthesized Speech
Corpus-based approach to synthesis
Expressive visual speech using talking head
Demos
Affect Editor Quiz/Demo
Synface Demo
![Page 3: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/3.jpg)
Affect in Speech
Goals
Addition of Emotion to Synthetic speech
Acoustic Model
Typology of parameters of emotional speech
Quantification
Addresses problem of expressiveness
What benefit is gained from expressive speech?
![Page 4: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/4.jpg)
Emotion Theory/Assumptions
Emotion -> Nervous System -> Speech Output
Binary distinction
Parasympathetic vs Sympathetic
based on physical changes
universal emotions
![Page 5: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/5.jpg)
Approaches to Affect
Generative
Emotion -> Physical -> Acoustic
Descriptive
Observed acoustic params imposed
![Page 6: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/6.jpg)
Descriptive Framework
4 Parameter groups
Pitch
Timing
Voice Quality
Articulation
Assumption of independence
How could this affect design and results?
![Page 7: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/7.jpg)
Pitch
Accent Shape
Average Pitch
Contour Slope
Final Lowering
Pitch Range
Reference Line
Timing
Exaggeration (not used)
Fluent Pauses
Hesitation Pauses
Speech Rate
Stress Frequency
Stressed Stressable
![Page 8: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/8.jpg)
Voice Quality
Breathiness
Brilliance
Loudness
Pause Discontinuity
Pitch Discontinuity
Tremor
Laryngealization
Articulation
Precision
![Page 9: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/9.jpg)
Implementation
Each parameter has scale
Each scale is independent
from other parameters
between positive and negative
![Page 10: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/10.jpg)
Implementation
Settings grouped into preset conditions for each emotion
based on prior studies
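
A minimal sketch of how independent parameter scales and per-emotion presets might be represented. The -10 to 10 bounds, the class and function names, and every preset value are illustrative assumptions, not settings taken from the paper.

```python
from dataclasses import dataclass, field

# Assumed bounds for every scale; 0 is taken to mean "neutral".
SCALE_MIN, SCALE_MAX = -10, 10

# A few parameter names from the slides (not the full typology).
PARAMETER_NAMES = (
    "average_pitch", "pitch_range", "speech_rate",
    "breathiness", "loudness", "precision",
)


@dataclass
class AffectSettings:
    """One value per parameter; unset parameters stay neutral (0)."""
    values: dict = field(default_factory=dict)

    def set_value(self, name, value):
        if name not in PARAMETER_NAMES:
            raise KeyError(name)
        # Clamp to the scale. Setting one parameter never touches another,
        # which mirrors the independence assumption.
        self.values[name] = max(SCALE_MIN, min(SCALE_MAX, value))

    def get_value(self, name):
        return self.values.get(name, 0)


# Hypothetical presets: the numbers are placeholders, not published values.
PRESETS = {
    "sad": {"average_pitch": -3, "speech_rate": -6, "loudness": -4},
    "angry": {"pitch_range": 8, "speech_rate": 5, "loudness": 9},
}


def settings_for(emotion):
    """Build the settings object for one preset emotion."""
    settings = AffectSettings()
    for name, value in PRESETS[emotion].items():
        settings.set_value(name, value)
    return settings
```

Because each parameter is stored and clamped on its own, a preset is just a partial assignment; anything it does not mention stays at the neutral value.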
![Page 11: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/11.jpg)
Program Flow: Input
Emotion -> parameter representation
Utterance -> clauses
Agent, Action, Object, Locative
Clause and lexeme annotations
Finds all possible locations of affect and chooses whether or not to use
![Page 12: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/12.jpg)
Program Flow
Utterance -> Tree structure -> linear phonology
“compiled” for specific synthesizer with software to simulate affects not available in hardware
![Page 13: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/13.jpg)
![Page 14: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/14.jpg)
Perception
30 Utterances
5 sentences * 6 affects
Forced choice of one of six affects
magnitude and comments
![Page 15: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/15.jpg)
Elicitation Sentences
Intro
I’m almost finished
I’m going to the city
I saw your name in the paper X
I thought you really meant it
Look at that picture
![Page 16: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/16.jpg)
Pop Quiz!!!
![Page 17: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/17.jpg)
Pop Quiz Solutions
I’m almost finished
Disgust : Surprise : Sadness : Gladness : Anger : Fear
I’m going to the city
Surprise : Gladness : Anger : Disgust : Sadness : Fear
I thought you really meant it
Anger : Disgust : Gladness : Sadness : Fear : Surprise
Look at that picture
Anger : Fear : Disgust : Sadness : Gladness : Surprise
![Page 18: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/18.jpg)
Results
approx. 50% recognition rate
91% sadness
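
To put these rates in context, the sketch below works out the size of the stimulus set and the chance level for a six-way forced choice. It assumes a listener with no information guesses uniformly among the six affects; the function name is hypothetical.

```python
# Design from the slides: 5 sentences, each rendered in 6 affects.
sentences = 5
affects = ["anger", "disgust", "fear", "gladness", "sadness", "surprise"]

n_utterances = sentences * len(affects)  # 5 * 6 = 30

# Assumed baseline: uniform guessing in a forced choice among six labels.
chance_rate = 1 / len(affects)


def gain_over_chance(observed_rate, chance=chance_rate):
    """How many times better than guessing an observed rate is."""
    return observed_rate / chance


print(n_utterances)
print(round(chance_rate, 3))
print(round(gain_over_chance(0.50), 2))  # overall rate reported above
print(round(gain_over_chance(0.91), 2))  # sadness rate reported above
```

Under that assumption, the overall rate is roughly three times chance and the sadness rate more than five times chance.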
![Page 19: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/19.jpg)
![Page 20: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/20.jpg)
Conclusions
Effective?
Thoughts?
![Page 21: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/21.jpg)
Corpus-based Approach to
Expressive Speech Synthesis
![Page 22: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/22.jpg)
Corpus
Collect utterances in each emotion
emotion-dependent semantics
One speaker
Good news, Bad news, Question
![Page 23: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/23.jpg)
Model: Feature Vector
Features
Lexical stress
Phrase-level stress
Distance from beginning of phrase
Distance from end of phrase
POS
Phrase-type
End of syllable pitch
![Page 24: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/24.jpg)
Model: Classification
Predicts F0
5 syllable window
Uses feature vector to predict observation vector
observation vector: log(p), Δp
p = end of syllable pitch
Decision Tree
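
A rough sketch of the data preparation this slide implies: an observation vector per syllable and a 5-syllable context window over the features. It assumes Δp is the change in end-of-syllable pitch from the previous syllable and that the window is centred on the current syllable; both are guesses, and the decision tree learner itself is left out.

```python
import math


def observation_vectors(pitches):
    """Return (log p, delta p) for each syllable's end-of-syllable pitch.

    Assumption: delta p is the difference from the previous syllable,
    and is 0.0 for the first syllable.
    """
    observations = []
    previous = None
    for p in pitches:
        delta = 0.0 if previous is None else p - previous
        observations.append((math.log(p), delta))
        previous = p
    return observations


def windowed_features(features, width=5, pad=None):
    """Collect the features of `width` syllables centred on each syllable.

    Positions that fall outside the utterance are filled with `pad`.
    """
    half = width // 2
    windows = []
    for i in range(len(features)):
        window = []
        for j in range(i - half, i + half + 1):
            window.append(features[j] if 0 <= j < len(features) else pad)
        windows.append(window)
    return windows
```

Each window would be the tree's input and the matching observation vector its prediction target.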
![Page 25: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/25.jpg)
Model: Target Duration
Similar to predicting F0
build tree with goal of providing a Gaussian at each leaf
Use mean of class as target duration
discretization
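
The leaf step can be sketched as follows, assuming the tree has already assigned each training syllable to a leaf. The leaf identifiers and durations in any call are hypothetical; the function names are not from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_leaf_gaussians(samples):
    """Fit one Gaussian per leaf.

    samples: iterable of (leaf_id, duration) pairs.
    Returns a dict mapping leaf_id -> (mean, standard deviation).
    """
    by_leaf = defaultdict(list)
    for leaf, duration in samples:
        by_leaf[leaf].append(duration)
    return {leaf: (mean(d), pstdev(d)) for leaf, d in by_leaf.items()}


def target_duration(leaf_gaussians, leaf):
    """Use the mean of the leaf's Gaussian as the target duration."""
    return leaf_gaussians[leaf][0]
```

Taking the mean throws away the spread at each leaf, which is one way to read the "discretization" note: every syllable that reaches a leaf gets the same target.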
![Page 26: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/26.jpg)
Models
Uses acoustic analogue of n-grams
captures sense of context
compared to describing full emotion as sequence
compare to Affect Editor
Uses only F0 and length (compare to Affect Editor)
Includes information about which utterance the features are derived from
intentional bias, justified?
![Page 27: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/27.jpg)
Model: Synthesis
Data tagged with original expression and emotion
expression-cost matrix
noted trade-off:
emotional intensity vs. smoothness
Paralinguistic events
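
One way an expression-cost matrix could enter unit selection is sketched below. The labels echo the corpus categories, but the cost values, the `weight` parameter, and the idea that a single `base_cost` stands in for the usual target and join costs are all assumptions.

```python
# Hypothetical costs for using a unit recorded in one expression (inner key)
# when the target expression is another (outer key). Values are made up.
EXPRESSION_COST = {
    "neutral": {"neutral": 0.0, "good_news": 0.5, "bad_news": 0.5},
    "good_news": {"neutral": 0.5, "good_news": 0.0, "bad_news": 1.0},
    "bad_news": {"neutral": 0.5, "good_news": 1.0, "bad_news": 0.0},
}


def select_unit(candidates, target_expression, weight=1.0):
    """Pick the candidate with the lowest total cost.

    candidates: list of (unit_id, expression_tag, base_cost), where
    base_cost stands in for the smoothness-related costs.
    weight: how strongly a mismatch in expression is penalised.
    """
    def total_cost(candidate):
        _, tag, base_cost = candidate
        return base_cost + weight * EXPRESSION_COST[target_expression][tag]

    return min(candidates, key=total_cost)[0]


candidates = [("u1", "neutral", 0.2), ("u2", "good_news", 0.6)]
print(select_unit(candidates, "good_news", weight=0.0))
print(select_unit(candidates, "good_news", weight=1.0))
```

Raising `weight` pulls in units that match the requested expression even when they join less smoothly, which is the intensity versus smoothness trade-off noted above.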
![Page 28: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/28.jpg)
SSML
Compare to Cahn’s typology
Abstraction layers
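
As a point of comparison for the abstraction-layer question, this sketch wraps text in the standard SSML `prosody` element, whose `pitch`, `rate`, and `volume` attributes are low-level settings. The emotion-to-attribute mapping is hypothetical, and the markup shown is plain SSML, not necessarily the markup used in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an emotion label to SSML prosody attributes.
EMOTION_TO_PROSODY = {
    "sad": {"pitch": "low", "rate": "slow", "volume": "soft"},
    "glad": {"pitch": "high", "rate": "fast", "volume": "loud"},
}


def to_ssml(text, emotion):
    """Return an SSML string with the text wrapped in a prosody element."""
    speak = ET.Element("speak", {"version": "1.0"})
    prosody = ET.SubElement(speak, "prosody", EMOTION_TO_PROSODY[emotion])
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


print(to_ssml("Look at that picture", "sad"))
```

A higher abstraction layer would let the author write only the emotion label and leave the mapping to acoustic settings to the synthesizer.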
![Page 29: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/29.jpg)
Perception Experiment
Distinguish same utterance spoken with neutral and affected prosody
Semantic content problematic?
![Page 30: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/30.jpg)
Results
Binary decision
Reasonable gain over baseline?
![Page 31: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/31.jpg)
Conclusion
Major contributions?
Paths forward?
![Page 32: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/32.jpg)
Synthesis of Expressive Visual Speech on a
Talking Head
![Page 33: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/33.jpg)
< Not these Talking Heads... >
![Page 34: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/34.jpg)
Synthesis Background
Manipulation of video images
Virtual model with deformation parameters
Synchronized with time-aligned transcription
Articulatory Control Model
Cohen & Massaro (1993)
![Page 35: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/35.jpg)
Data
Single actor
Given specific emotion as instruction
6 emotions + neutral
![Page 36: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/36.jpg)
Facial Animation Parameters
Face independent
FAP Matrix * scaling factor + position0
Weighted deformations of distance between vertices and feature point
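
The formula on this slide can be read as a linear displacement from a rest position. The sketch below assumes that reading; the flat coordinate layout, the function name, and the row/column convention of the matrix are illustrative choices.

```python
def displace(position0, fap_matrix, fap_values, scale):
    """Apply position = position0 + scale * (fap_matrix @ fap_values).

    position0: rest coordinates as a flat list.
    fap_matrix: one row of weights per coordinate, one column per FAP.
    fap_values: current value of each FAP.
    scale: face-specific scaling factor.
    """
    return [
        p0 + scale * sum(w * f for w, f in zip(row, fap_values))
        for p0, row in zip(position0, fap_matrix)
    ]
```

Keeping the scaling factor separate from the weights is what would let the same FAP values drive faces of different proportions.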
![Page 37: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/37.jpg)
Modeling
Phonetic segments assigned target parameter vector
temporal blending over dominance functions
Principal components
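
Temporal blending with dominance functions, in the style of Cohen & Massaro (1993), can be sketched as a weighted average whose weights decay exponentially with distance from each segment's centre. The default constants are arbitrary, and a single scalar stands in for each segment's target parameter vector.

```python
import math


def dominance(t, center, alpha=1.0, theta=10.0, c=1.0):
    """Negative-exponential dominance of a segment centred at `center`."""
    return alpha * math.exp(-theta * abs(t - center) ** c)


def blend(t, segments):
    """Dominance-weighted average of the segment targets at time t.

    segments: list of (center_time, target_value) pairs.
    """
    weights = [dominance(t, center) for center, _ in segments]
    total = sum(weights)
    return sum(w * target for w, (_, target) in zip(weights, segments)) / total


segments = [(0.0, 0.0), (1.0, 1.0)]
print(blend(0.5, segments))
```

Near a segment's centre its target dominates; between centres the targets mix, which gives a smooth transition rather than a jump.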
![Page 38: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/38.jpg)
ML
Separate models for each emotion
6:1 training:testing ratio
models -> PC traj -> FAP traj * emotion param matrix
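
The PC-to-FAP step can be sketched as the usual back-projection from principal-component scores, assuming the components and mean come from a PCA of the recorded FAP data. The function names are hypothetical, and the emotion parameter matrix mentioned above is not modelled here.

```python
def pc_to_fap(pc_frame, components, mean):
    """Reconstruct one FAP frame from principal-component scores.

    pc_frame: K scores for this frame.
    components: K rows, each a direction of length N in FAP space.
    mean: N-dimensional mean FAP vector.
    """
    return [
        m + sum(score * comp[i] for score, comp in zip(pc_frame, components))
        for i, m in enumerate(mean)
    ]


def pc_trajectory_to_fap(pc_trajectory, components, mean):
    """Apply the reconstruction to every frame of a PC trajectory."""
    return [pc_to_fap(frame, components, mean) for frame in pc_trajectory]
```

Modelling in PC space keeps the number of trajectories small; the full FAP trajectories are only recovered at the end.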
![Page 39: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/39.jpg)
Results
More extreme emotions easier to perceive
73% sad, 60% angry, 40% sad
![Page 40: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/40.jpg)
Synface Demo
![Page 41: Producing Emotional Speech Thanks to Gabriel Schubiner.](https://reader035.fdocuments.net/reader035/viewer/2022062516/56649d845503460f94a6bb45/html5/thumbnails/41.jpg)
Discussion
Changes in approach from Cahn to Eide
Production compared to Detection