Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective


Page 1: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Xiaohan Ma, Binh H. Le, and Zhigang Deng

Department of Computer Science, University of Houston

Page 2: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Motivation

• Avatars have been increasingly used in human-computer interfaces
  – Teleconferencing, computer-mediated communication, distance education, online virtual worlds, etc.

• Human-like avatar gestures significantly influence human perception
  – Facial expressions
  – Hand gestures
  – Lip movements
  – Head movements

• Head movements are one of the crucial visual cues that facilitate engaging social interaction and communication

Page 3: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

How do talking avatar head movements affect human perception?

Page 4: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Our Quantitative Perspective

• Uncover how talking avatar head movements affect human perception
  – User-rated naturalness of head animations
  – Joint features extracted from head animations (with audio)
    • Acoustic speech features
    • Head motion patterns
  – Quantitatively analyze the association between the extracted joint features and the user ratings

[Pipeline diagram: talking avatar head animations → feature extraction → joint features; talking avatar head animations → user evaluation → perception (ratings); analysis of the association between the joint features and the ratings]

Page 5: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Data Acquisition and Processing

• Acquisition of the audio-head motion dataset
  – Head motion & speech were recorded simultaneously
  – Head motion: optical motion capture system (120 Hz)
  – Speech: microphone (48 kHz)

• Processing of the captured audio-head motion dataset
  – Head motion: 3 Euler rotation angles per frame
  – Speech: pitch and RMS energy
  – Aligned the head & speech datasets to the same frame rate (24 FPS)

[Illustration: head rotation about the X, Y, and Z axes]
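As a minimal sketch of this acquisition-and-alignment step, the Python snippet below downsamples 120 Hz Euler angles to 24 FPS and extracts per-frame pitch and RMS energy from 48 kHz speech. It assumes NumPy and librosa; the function names, pitch range (60~400 Hz), and windowing choices are illustrative assumptions, not details from the paper.

```python
import numpy as np
import librosa

SR = 48_000          # speech sample rate (48 kHz), as on this slide
MOCAP_HZ = 120       # optical motion capture rate
TARGET_FPS = 24      # common frame rate after alignment

def align_head_motion(euler_angles: np.ndarray) -> np.ndarray:
    """Downsample (T, 3) Euler-angle frames from 120 Hz to 24 FPS.

    120 / 24 = 5, so simple decimation by 5 keeps frames aligned.
    """
    return euler_angles[::MOCAP_HZ // TARGET_FPS]

def speech_features(audio: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Extract per-frame pitch (F0) and RMS energy at 24 FPS."""
    hop = SR // TARGET_FPS  # 2000 samples per 24-FPS frame
    f0 = librosa.yin(audio, fmin=60, fmax=400, sr=SR, hop_length=hop)
    rms = librosa.feature.rms(y=audio, frame_length=2 * hop,
                              hop_length=hop)[0]
    n = min(len(f0), len(rms))  # trim to a common length
    return f0[:n], rms[:n]
```

Both outputs are then per-frame sequences at 24 FPS, directly comparable with the downsampled rotation angles.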

Page 6: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Subjective Evaluation

• Using the captured dataset, we generated 60 head animation clips
  – Based on 15 recorded speech clips
  – 4 different audio-head motion generation techniques
  – A mosaic was applied over the mouth region

• User study
  – 18 participants
  – Ages: 23~28
  – Gender: female (16.67%), male (83.33%)
  – Language: fluent English speakers
  – User rating scale: 1~5

• The four generation techniques:
  – Original: playback of the captured data
  – HMMs [Busso et al. 05]
  – Mood-Swings [Chuang et al. 05]
  – Random: randomly generated head motion

Page 7: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Speech-Head Motion Features and Perception

• Measure the correlation between head motion and speech features
  – Canonical Correlation Analysis (CCA)

• Pitch-head motion coupling and human perception
  – Computed Pearson coefficient: 0.731

• Energy-head motion coupling and human perception
  – The relation appears random; it is definitely not linear
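To make this concrete, here is a hedged sketch of how the per-clip CCA and the Pearson coefficient could be computed with scikit-learn and SciPy; `clips` and `ratings` are hypothetical inputs (per-clip feature pairs and mean user ratings), not the paper's actual data or code.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from scipy.stats import pearsonr

def clip_cca(speech_feat: np.ndarray, head_feat: np.ndarray) -> float:
    """First canonical correlation between (T, d1) speech features
    (e.g., pitch as a (T, 1) column) and (T, 3) Euler-angle features."""
    cca = CCA(n_components=1)
    u, v = cca.fit_transform(speech_feat, head_feat)
    return float(np.corrcoef(u[:, 0], v[:, 0])[0, 1])

def rating_correlation(clips, ratings):
    """Pearson coefficient between per-clip CCA values and user ratings.

    clips: iterable of (speech_feat, head_feat) arrays per clip.
    ratings: mean user rating per clip.
    """
    ccas = [clip_cca(s, h) for s, h in clips]
    r, _ = pearsonr(ccas, ratings)
    return r
```

Per the slide, this kind of per-clip pitch-head CCA correlates with the mean user ratings at r = 0.731, while the RMS-energy variant shows no comparable linear trend.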

Page 8: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Speech-Head Motion Features and Perception

Implications for HCI

• Validates the tight coordination between speech and head motion: precise timing is required in generation
  – Delayed head movement generation may significantly degrade human perception

• An approximately linear correlation between user ratings and the CCA of pitch-head motion
  – Prosody-driven head motion synthesis could be fundamentally sound

• No simple linear correlation between user ratings and the CCA of RMS energy-head motion
  – RMS energy may vary substantially among sentences

Page 9: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Frequency-Domain Analysis of Head Motion

• Frequency-domain analysis of head motion
  – Head motion: rotation angles
  – Frequency spectrum: FFT applied to the head rotation angle vector

• Association between head motion spectrum and human perception
  – Examined for squared magnitudes less than 5 degrees

[3D plot. X-axis: average user rating (2.1~4.2). Y-axis: squared magnitude of the three Euler angles of head rotation (0~5 degrees). Z-axis: frequency spectrum (0~19 Hz).]
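A minimal sketch of this FFT step, assuming the rotation data is a NumPy array of per-frame Euler angles; `fps` is the sampling rate of the motion sequence, and all names are illustrative rather than the paper's implementation.

```python
import numpy as np

def head_motion_spectrum(angles: np.ndarray, fps: float):
    """angles: (T, 3) head rotation angles (X, Y, Z Euler angles).

    Returns (freqs, power): frequency of each bin in Hz and the
    squared FFT magnitude for each rotation axis.
    """
    T = angles.shape[0]
    spec = np.fft.rfft(angles, axis=0)        # one-sided FFT per axis
    power = np.abs(spec) ** 2                 # squared magnitude
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)   # bin frequencies in Hz
    return freqs, power
```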

Page 10: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Frequency-Domain Analysis of Head Motion

Key observations

• Highly rated clips: low-frequency motion
  – Natural head motion: less than 10 Hz

• Lowly rated clips: high-frequency motion
  – Typically larger than 12 Hz
  – With a small range of head movements

Implications for HCI

• The comfortable head motion frequency zone: 0~12 Hz
• Smoothing as a post-process for talking avatar head motion generation
  – Smooth: post-process the synthesized head motions
  – Simply crop the high-frequency part from the synthesized head motions (see the sketch after the illustration below)

[Illustration: examples of low-frequency vs. high-frequency head motion patterns]
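The suggested "crop the high frequencies" post-process could look like the following sketch, continuing the assumptions above: zero out spectral components above a cutoff (here the slide's 12 Hz comfort-zone boundary), then reconstruct the motion with the inverse FFT.

```python
import numpy as np

def lowpass_head_motion(angles: np.ndarray, fps: float,
                        cutoff_hz: float = 12.0) -> np.ndarray:
    """Zero out spectral components above `cutoff_hz` in each
    Euler-angle channel and return the smoothed (T, 3) motion."""
    T = angles.shape[0]
    spec = np.fft.rfft(angles, axis=0)
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    spec[freqs > cutoff_hz, :] = 0.0          # crop high frequencies
    return np.fft.irfft(spec, n=T, axis=0)    # back to the time domain
```

A hard spectral crop can introduce ringing (Gibbs artifacts); a tapered filter such as a Butterworth low-pass would be a smoother alternative, but the hard crop matches the post-processing this slide describes.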

Page 11: Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Conclusion and Future Work

• Summary of our findings
  – The coupling between pitch and head motion has a strong linear correlation with human perception
  – Perceived-natural head motions consist mainly of low-frequency motion components; high-frequency components (>12 Hz) significantly damage human perception

• Future work
  – Multi-party conversation scenarios
  – Analysis of other fundamental speech features: pauses, repetitions, etc.

Acknowledgments: This work is supported in part by NSF IIS-0914965, Texas Norman Hackerman Advanced Research Program 003652-0058-2007, and research gifts from Google and Nokia.