Optical Phonetics and Visual Perception of Lexical and Phrasal Stress in English Patricia Keating,...
Date posted: 19-Dec-2015
Optical Phonetics and Visual Perception of Lexical and Phrasal
Stress in English
Patricia Keating, Marco Baroni,
Sven Mattys, Rebecca Scarborough,
Abeer Alwan, Edward T. Auer,
Lynne E. Bernstein
Introduction
• Phrasal (focal) stress can be perceived visually above chance, though intonation cannot (e.g. Bernstein et al. 1989).
• Many studies have shown that stress is marked by longer, larger, and faster movements of jaw, lips, and tongue; sometimes by eyebrow movements; and acoustically mainly by f0 (pitch accents), lengthening, and loudness.
• Jaw lowering and acoustic duration are known to correlate with auditory perception of stress, and eyebrow movement with visual perception.
Optical phonetics of stress
• Extents, durations, and velocities of movements of lips, chin, and eyebrows, and mouth opening, are all potentially visible to perceivers.
• Our production (optical) measures are position and movement measures of visible fleshpoints.
This study
• Production experiment: Do speakers show any consistent optical correlates of phrasal and lexical stresses?
• Perception experiment: Are there differences in the visual intelligibility of phrasal and lexical stress, and of the different speakers?
• Production-perception comparison: Which, if any, of the optical production correlates account for visual intelligibility?
Production methods: Lexical stress materials
• 4 minimal pairs
– DIScharge / disCHARGE
– DIScount / disCOUNT
– PERvert / perVERT
– SUBject / subJECT
• 4 non-minimal pairs
– DEbit / casSETTE
– INstance / conVINCE
– BUSiness / subMIT
– COUrage / gaZELLE
• Minimal pairs read as given, and also reiterantly
• Non-minimal pairs only reiterantly
• 2 reiterant syllables
– “buh” = [bʌ] / [bə]
– “fer” = [fɝ] / [fɚ]
– differ in mouth opening
• Total: 40 words (8 read as real words, 32 read reiterantly)
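The 40-word total can be checked with a short enumeration. This is a minimal sketch, assuming each word of each pair is rendered reiterantly once with “buh” and once with “fer”, and that the minimal pairs are additionally read as real words; the `build_items` helper and its tuple labels are hypothetical.

```python
from itertools import product

# Lexical stress items from the slide; capitals mark the stressed syllable.
MINIMAL_PAIRS = [("DIScharge", "disCHARGE"), ("DIScount", "disCOUNT"),
                 ("PERvert", "perVERT"), ("SUBject", "subJECT")]
NON_MINIMAL_PAIRS = [("DEbit", "casSETTE"), ("INstance", "conVINCE"),
                     ("BUSiness", "subMIT"), ("COUrage", "gaZELLE")]
REITERANT_SYLLABLES = ("buh", "fer")


def build_items() -> list[tuple[str, str]]:
    """Return hypothetical (rendition, target word) pairs; rendition is 'real', 'buh' or 'fer'."""
    items = []
    # Minimal pairs are also read as the real words themselves.
    for pair in MINIMAL_PAIRS:
        items.extend(("real", word) for word in pair)
    # Every word of every pair is read reiterantly, once with each syllable.
    for pair in MINIMAL_PAIRS + NON_MINIMAL_PAIRS:
        items.extend((syllable, word) for word, syllable in product(pair, REITERANT_SYLLABLES))
    return items


items = build_items()
print(len(items))  # 8 real + 16 words x 2 syllables = 40
```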
Production methods: Phrasal stress materials
“So TOMMY gave Timmy a song from Debby.”
“So Tommy gave TIMMY a song from Debby.”
“So Tommy gave Timmy a song from DEBBY.”
“So Tommy gave Timmy a song from Debby.”
• narrow (contrast) accent on one name or “neutral” broad focus
• these 4 stress conditions x 6 combinations of names = 24 sentences
• sentences not read reiterantly
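The 4 × 6 = 24 design can be sketched as follows, assuming that the “6 combinations of names” are the 3! = 6 orderings of Tommy, Timmy, and Debby across the three name slots (the example sentence “So TIMMY gave Tommy a song from Debby” fits this reading). The `build_sentences` function and the capitalization convention for the accented name are illustrative only.

```python
from itertools import permutations

NAMES = ("Tommy", "Timmy", "Debby")
# None = neutral broad focus; 0, 1, 2 = narrow accent on the name in that slot.
CONDITIONS = (None, 0, 1, 2)


def build_sentences() -> list[str]:
    """Hypothetical generator: every name order crossed with every stress condition."""
    sentences = []
    for order in permutations(NAMES):        # 3! = 6 name orders (assumed)
        for focus in CONDITIONS:              # 4 stress conditions
            names = [n.upper() if i == focus else n for i, n in enumerate(order)]
            sentences.append("So {} gave {} a song from {}.".format(*names))
    return sentences


sentences = build_sentences()
print(len(sentences))  # 6 x 4 = 24
print(sentences[1])    # So TOMMY gave Timmy a song from Debby.
```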
Production methods: Both stress contrasts involve nuclear accent
• Lexical stress items read in isolation
• Phrasal stress items read with narrow focus to show contrast and/or emphasis
[Figure: “…a song from TIMMY” (phrasal stress) and “DIScount” (lexical stress), both transcribed H* L-L%]
Production methods: Speakers
• 3 male Californians differing in perceptually determined visual intelligibility for segments
– low-medium = Sp-LO
– medium = Sp-MID
– high = Sp-HI
• VISUAL INTELLIGIBILITY SCORING:
– speakers video-recorded reading 320 (other) sentences
– 8 expert deaf lipreaders transcribed the sentences, yielding % correct visual intelligibility scores
Production methods: Recording set-up and procedure
• Videorecording
– professional quality
– teleprompter under camera
• DAT recording
• Facial motion recorded with the Qualisys™ system
– 120 Hz sampling rate
– 20 small passive retroreflectors
– three cameras
– infrared flash
– 3D position for each retroreflector
• Items blocked by stress location
• Two tokens of each item
Production methods: Facepoint marker locations and measurements
[Figure: face diagram showing eyebrow, head, lip, and chin marker locations]
• Left eyebrow displacement
• Head displacement
• Interlip maximum distance
• Interlip opening displacement
• Interlip closing displacement
• Lower lip opening peak velocity
• Lower lip closing peak velocity
• Chin opening displacement
• Chin opening peak velocity
• Chin closing displacement
• Chin closing peak velocity
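The slides name these measures but do not define how they are computed. The sketch below shows one plausible way, under loudly stated assumptions: interlip distance is the Euclidean distance between one upper-lip and one lower-lip marker, opening (closing) displacement runs from the minimum before (after) the largest peak to that peak, and peak velocity comes from central differences at the 120 Hz sampling rate. All function names are hypothetical, and the same functions could presumably be applied to a chin or eyebrow position signal.

```python
import math

SAMPLE_RATE_HZ = 120.0  # Qualisys sampling rate given on the recording slide

Point = tuple[float, float, float]


def interlip_distance(upper: list[Point], lower: list[Point]) -> list[float]:
    """Frame-by-frame 3D distance between an upper-lip and a lower-lip marker."""
    return [math.dist(u, l) for u, l in zip(upper, lower)]


def velocity(signal: list[float], fs: float = SAMPLE_RATE_HZ) -> list[float]:
    """Central-difference velocity (units/s); one-sided differences at the ends."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        out.append((signal[hi] - signal[lo]) * fs / (hi - lo))
    return out


def gesture_measures(signal: list[float], fs: float = SAMPLE_RATE_HZ) -> dict[str, float]:
    """Hypothetical opening/closing measures around the single largest peak."""
    peak = max(range(len(signal)), key=signal.__getitem__)
    onset = min(range(peak + 1), key=signal.__getitem__)            # minimum before peak
    offset = min(range(peak, len(signal)), key=signal.__getitem__)  # minimum after peak
    vel = velocity(signal, fs)
    return {
        "max_distance": signal[peak],
        "opening_displacement": signal[peak] - signal[onset],
        "opening_peak_velocity": max(vel[onset:peak + 1]),
        "closing_displacement": signal[peak] - signal[offset],
        "closing_peak_velocity": -min(vel[peak:offset + 1]),  # reported as a speed
    }


# Toy example (not study data): upper lip fixed, lower lip moving down and back up.
openings = [1, 1, 2, 4, 6, 5, 3, 1]
upper = [(0.0, 0.0, 0.0)] * len(openings)
lower = [(0.0, -d, 0.0) for d in openings]
print(gesture_measures(interlip_distance(upper, lower)))
```

Real marker data would normally be low-pass filtered before differentiation; that step is omitted here to keep the sketch short.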
Production methods: Data analysis
• Prosody of audio speech signals checked by two transcribers (some small differences found between prompted and produced stresses, but these differences generally do not affect the analyses presented here)
• Only the tokens used in the perception study (1 of the 2 tokens of each item) are analyzed here
• Effects of stress on the 11 facepoint marker measurements tested by (factorial) ANOVAs
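The slides report factorial ANOVAs, whose other factors are not listed here. As a simplified illustration only, the sketch below computes a one-way ANOVA with stress as the sole factor, which is what each factorial model reduces to when every other factor is ignored. The `one_way_anova` name is hypothetical, and the toy values are not study data.

```python
from statistics import mean


def one_way_anova(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way, between-groups ANOVA."""
    pooled = [x for g in groups for x in g]
    if len(groups) < 2 or any(len(g) == 0 for g in groups) or len(pooled) <= len(groups):
        raise ValueError("need at least two non-empty groups and some within-group df")
    grand = mean(pooled)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy values for one measure, e.g. stressed vs. unstressed tokens (not study data).
print(one_way_anova([1, 2, 3], [4, 5, 6]))  # (13.5, 1, 4)
```

A p-value would come from the F distribution, for example `scipy.stats.f.sf(F, df_between, df_within)` if SciPy is available.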
Production results: Overview
• Stress is well-marked by these measures
• Lexical vs. phrasal stress: phrasal stress shows more significantly different measures, and larger differences between stressed and unstressed items, than lexical stress
• Reiterant vs. nonreiterant words: both sets show the stress effect
Production results: Significant differences due to lexical stress
[Figure: Interlip opening displacement, syllable 1 vs. syllable 2, all reiterant words]
• 5 of 11 measures distinguish stress
– 3 opening gesture measures, e.g. Interlip opening displacement
– Head displacement and Interlip maximum distance
• Generally holds across speakers and across real vs. reiterant words
Production results: Significant differences due to phrasal stress
• All 11 measures distinguish stress, e.g. Chin closing peak velocity
• Chin and eyebrow measures are more consistent across speakers
[Figure: Chin closing peak velocity for accented vs. unaccented names in 1st, 2nd, and 3rd position]
Production results: Significant head and eyebrow movements
• Stress in words: head moves, eyebrow does not
• Stress in phrases:
– head down (2 speakers)
– eyebrow up
[Figure: head movement and eyebrow movement during “So TIMMY gave Tommy a song from Debby”]
Production results: An aside: Eyebrows and F0
• 40 sentences from the phrasal stress corpus
• F0 from audio, and right and left eyebrow positions, at 12 ms intervals
• Significant correlations between eyebrows and F0, but accounting for little variance (only 1-4%)
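“Variance accounted for” here is the squared correlation coefficient, so 1 to 4% corresponds to |r| of roughly 0.1 to 0.2. The sketch below computes Pearson's r between two series, assuming (as an illustration) that F0 and eyebrow position have already been sampled on the same 12 ms grid with unvoiced frames removed. The function names are hypothetical; on Python 3.10+ `statistics.correlation` returns the same r.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long, time-aligned series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(r: float) -> float:
    """Share of variance in one series linearly accounted for by the other (r squared)."""
    return r * r


# Reading the slide in reverse: what |r| do 1% and 4% of variance imply?
for share in (0.01, 0.04):
    print(f"{share:.0%} of variance -> |r| = {math.sqrt(share):.2f}")
```

With thousands of 12 ms frames, even such small correlations can reach significance, which is consistent with the slide's "significant but little variance" finding.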
Perception methods
• 1 token of each item from the production corpus (120 words, 72 sentences), each presented twice (384 total trials)
• 16 hearing perceivers (not screened for lipreading ability)
• Test video clip (no sound) on right monitor, clickable response choices on left monitor
• Lexical stress: Response choices were pairs of real words, even for reiterant items
• Sentences: Click on one name, or on “NoStress”
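The trial counts follow from the design: 3 speakers × 40 words and 3 speakers × 24 sentences, each shown twice. Assuming every one of the 16 perceivers saw every trial, the same arithmetic also reproduces the response counts reported with the overall perception results (N=2304, 3072, 768). The constant names are illustrative.

```python
SPEAKERS = 3
WORDS_PER_SPEAKER = 40        # 8 real-word readings + 32 reiterant renditions
REITERANT_PER_SPEAKER = 32
SENTENCES_PER_SPEAKER = 24    # 4 stress conditions x 6 name combinations
PRESENTATIONS = 2
PERCEIVERS = 16

words = SPEAKERS * WORDS_PER_SPEAKER                           # 120
sentences = SPEAKERS * SENTENCES_PER_SPEAKER                   # 72
trials_per_perceiver = (words + sentences) * PRESENTATIONS     # 384


def responses(items_per_speaker: int) -> int:
    """Total responses pooled over speakers, repetitions, and perceivers (assumed fully crossed)."""
    return SPEAKERS * items_per_speaker * PRESENTATIONS * PERCEIVERS


n_sentences = responses(SENTENCES_PER_SPEAKER)                              # 2304
n_reiterant = responses(REITERANT_PER_SPEAKER)                              # 3072
n_nonreiterant = responses(WORDS_PER_SPEAKER - REITERANT_PER_SPEAKER)       # 768
print(trials_per_perceiver, n_sentences, n_reiterant, n_nonreiterant)
```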
Perception results: Overview
• Stress is perceived above chance
• Lexical vs. phrasal stress: phrasal stress is perceived better
• Reiterant vs. nonreiterant words: perceived equally well
Perception results: Overall results, all above chance
[Figure: % correct for sentences (N=2304; chance 25%), reiterant words (N=3072; chance 50%), and non-reiterant words (N=768; chance 50%)]
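The slides do not say how “significantly above chance” was determined for each perceiver. One common approach, sketched here as an assumption, is a one-tailed exact binomial test: find the smallest number of correct responses whose probability under guessing falls below α = 0.05. The per-perceiver trial counts (144 sentence trials at 25% chance, 240 word trials at 50% chance) assume each perceiver saw every item twice; the function names are hypothetical.

```python
from math import comb
from typing import Optional


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def above_chance_threshold(n: int, p: float, alpha: float = 0.05) -> Optional[int]:
    """Smallest number correct out of n whose one-tailed p-value under chance p is below alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) < alpha:
            return k
    return None


# Per-perceiver counts, assuming all items were presented twice to each perceiver.
for label, n, p in [("sentences", 144, 0.25), ("words", 240, 0.5)]:
    k = above_chance_threshold(n, p)
    print(f"{label}: at least {k}/{n} correct ({100 * k / n:.1f}%)")
```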
Perception results: Lexical vs. phrasal stress
[Figure: individual subjects’ % correct for all lexical vs. phrasal items, relative to the levels that are significantly above chance]
• Phrasal stress perceived better (significantly so, by paired t-test)
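A paired t-test compares each perceiver's two scores with each other, so it tests the mean within-subject difference. Because the two tasks have different chance levels (50% vs. 25%), the inputs would presumably be each subject's score relative to the above-chance level for that task, as in the figure. The sketch below returns only t and its degrees of freedom; the `paired_t` name is hypothetical, and `scipy.stats.ttest_rel(a, b)` gives t together with a p-value if SciPy is available.

```python
import math
from statistics import mean, stdev


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Toy relative scores for three perceivers (not study data): phrasal vs. lexical.
phrasal = [3.0, 5.0, 7.0]
lexical = [1.0, 2.0, 3.0]
print(paired_t(phrasal, lexical))
```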
Perception results: Lexical stress
All lexical speech conditions perceived equally well overall:
• Reiterant vs. non-reiterant
• buh vs. fer
• Minimal vs. non-minimal pairs
[Figure: % correct for buh, fer, and non-reiterant items, minimal vs. non-minimal pairs]
Perception results: Speakers (lexical stress)
• All speakers’ lexical stress perceived above chance (50%)
• Sp-LO perceived better on reiterant words
[Figure: % correct by speaker (Sp-LO, Sp-MID, Sp-HI) for non-reiterant, reiterant minimal-pair, and reiterant non-minimal-pair items]
Perception results: Phrasal stress
• 3 focal positions perceived equally well, and correct above chance for almost every item
• Responses to Neutral condition at chance
[Figure: % correct by position of stress in sentence (1, 2, 3) and for the Neutral condition]
Perception results: Speakers (phrasal stress)
• All speakers’ phrasal stress perceived above chance (25%)
• Sp-MID perceived less accurately
• Sp-LO best for Neutral condition (not shown here)
[Figure: % correct by speaker (Sp-LO, Sp-MID, Sp-HI)]
Production-perception comparisons: Speaker differences
• Prosodic intelligibility: Sp-LO highest for words and for Neutral sentences; Sp-MID lowest for sentences
• Production: Sp-LO shows larger lip differences than Sp-MID on sentences, and the largest Chin closing displacement on words (but Sp-HI has the largest head movement differences)
• Prosodic intelligibility is unrelated to segmental intelligibility: compare the rankings above with the speaker labels LO, MID, HI, which reflect their segmental intelligibility
Production-perception comparisons: Correlational analyses of sentences
• Tested relations between production measures and % correct perception of phrasal stresses
• 10 of 11 measures correlated significantly with perception, with chin measures accounting for the most variance (up to 40%)
• Only Interlip maximum distance (mouth opening) did not correlate with perception
• Partial correlations (controlling for contributions of various lip measures) show independent contributions to perception of
– Chin opening displacement (15% of variance)
– Chin opening peak velocity (11%)
– Lower lip opening peak velocity (11%)
• Closing gestures generally make no independent contributions to perception
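A partial correlation measures how strongly a production measure tracks perception once a control variable is held constant; squaring it gives the independent share of variance, so 15% corresponds to a partial r of about 0.39. The slides control for several lip measures at once; this sketch handles only a single control variable using the standard first-order formula (with several controls, one would instead regress both variables on all controls and correlate the residuals). Function names are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x: list[float], y: list[float], z: list[float]) -> float:
    """First-order partial correlation of x and y, controlling for a single variable z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


# Toy data (not study data): x = a chin measure, y = % correct, z = a lip measure.
chin = [1.0, 2.0, 3.0, 4.0]
correct = [1.0, 3.0, 2.0, 4.0]
lip = [1.0, -1.0, -1.0, 1.0]
r = partial_r(chin, correct, lip)
print(f"partial r = {r:.2f}, unique variance = {r * r:.0%}")
```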
Summary
• Lexical and phrasal stress are visually perceived above chance
• Phrasal stress is marked by more and larger production differences, and perceived better
• Chin opening accounts for most variance in perception of phrasal stress
• Speakers’ visual intelligibility for prosody does not correspond to segmental