
Vocal Tract & Lip Shape Estimation

By

MS Shah & Vikash Sethia

Supervisor: Prof. PC Pandey

EE Dept, IIT Bombay

AIM-2003, EE Dept, IIT Bombay, 27th June, 2003


ABSTRACT: The display of intensity, pitch, and vocal tract shape is considered helpful in speech training of the hearing impaired. A speech analysis package has been developed in MATLAB for displaying speech waveforms, pitch and energy contours, spectrograms, and areagrams (two-dimensional plots of the cross-sectional area of the vocal tract as a function of time and position along the tract length). Vocal tract shape estimation works satisfactorily for vowels, but during stop closures the place of closure cannot be estimated because the signal energy is very low. Methods need to be investigated for predicting vocal tract shape during stop closure from the shapes estimated on either side of the closure. Work is in progress on lip shape estimation, which may find application in video telephony.


Introduction

Hearing impairment → Lack of auditory feedback during speech production → Speech impairment

Speech training for hearing-impaired children by visual (using a mirror) & tactile feedback: some important features and efforts are not distinguishable

Speech training aids: display of articulatory efforts and acoustic parameters (vocal tract and lip shape, pitch, and energy variations)


Vocal tract shape estimation

General model for speech production system

$$ s(n) = u(n) * g(n) * v(n) * r(n) $$

where * denotes convolution and

s(n) = speech signal,

u(n) = glottal excitation,

g(n) = glottis impulse response,

v(n) = impulse response of the vocal tract,

r(n) = impulse response of radiation from lips.
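The cascade in this model can be illustrated with a minimal Python sketch, assuming the four terms are combined by discrete convolution. The sequences below are hypothetical toy values chosen only for readability; they are not measured glottal, tract, or radiation responses.

```python
def convolve(x, h):
    """Full discrete linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def synthesize(u, g, v, r):
    """s(n) = u(n) * g(n) * v(n) * r(n), with * denoting convolution."""
    return convolve(convolve(convolve(u, g), v), r)


# Hypothetical toy sequences (assumptions, not values from the slides).
u = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # impulse train with period 4
g = [1.0, 0.5]                                # stand-in glottal pulse shape
v = [1.0, 0.9, 0.81]                          # truncated decaying tract response
r = [1.0, -1.0]                               # first difference as a radiation stand-in

s = synthesize(u, g, v, r)
```

Because convolution is commutative and associative, the order in which g, v, and r are applied does not change s(n), which is why the analysis stage can treat them as one combined response.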


Acoustic tube model of the vocal tract

At the mth section,

volume velocity:

$$ u_m(x,t) = u_m^{+}(x,t) - u_m^{-}(x,t) $$

pressure:

$$ p_m(x,t) = \frac{\rho c}{A_m}\left[ u_m^{+}(x,t) + u_m^{-}(x,t) \right] $$

reflection coefficient:

$$ r_m = \frac{A_{m+1} - A_m}{A_{m+1} + A_m} $$
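As a small illustration of the junction relation, the sketch below computes reflection coefficients from a list of section areas and inverts the mapping. It assumes the sign convention r_m = (A_{m+1} - A_m) / (A_{m+1} + A_m); other texts use the opposite sign, so treat the convention and the example areas as assumptions.

```python
def reflection_coefficients(areas):
    """r_m = (A_{m+1} - A_m) / (A_{m+1} + A_m) at each junction (assumed sign)."""
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]


def areas_from_reflection(r, first_area=1.0):
    """Invert the relation: A_{m+1} = A_m * (1 + r_m) / (1 - r_m)."""
    areas = [first_area]
    for rm in r:
        areas.append(areas[-1] * (1.0 + rm) / (1.0 - rm))
    return areas


# Hypothetical relative areas for a four-section tube.
areas = [1.0, 2.0, 4.0, 1.0]
r = reflection_coefficients(areas)
recovered = areas_from_reflection(r, first_area=areas[0])
```

Only area ratios are determined by the reflection coefficients, so one area has to be fixed (here `first_area`) to obtain a shape; this is why the estimated values are relative areas.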


Speech analysis model (Wakita-1973)

Assumption: vocal tract represented as an all-pole filter with

$$ h(n) = g(n) * v(n) * r(n) $$

Algorithmic steps:

• inverse filtering for error signal with LMS technique

• set of simultaneous equations solved with Robinson’s algorithm for reflection coefficients & relative area values
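The analysis steps can be sketched in Python as follows. This is not the VTAG-1 code: it uses the textbook Levinson-Durbin recursion as a stand-in for Robinson's algorithm to solve the Toeplitz normal equations that arise from minimizing the mean-square prediction error. The area mapping A_{m+1} = A_m (1 - k_m) / (1 + k_m) is an assumed convention; the sign and the direction of indexing differ between references.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of one analysis frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + sum_k a_k z^-k.

    Returns (a, k, err): inverse-filter coefficients, reflection (PARCOR)
    coefficients, and the final prediction-error energy.
    """
    if r[0] <= 0.0:
        raise ValueError("frame has no energy; coefficients are undefined")
    a = [1.0] + [0.0] * order
    err = r[0]
    k = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki
        k.append(ki)
    return a, k, err


def relative_areas(k, first_area=1.0):
    """Assumed convention: A_{m+1} = A_m * (1 - k_m) / (1 + k_m)."""
    areas = [first_area]
    for km in k:
        areas.append(areas[-1] * (1.0 - km) / (1.0 + km))
    return areas


# Hypothetical autocorrelation of a first-order process with pole at 0.5.
r = [1.0, 0.5, 0.25]
a, k, err = levinson_durbin(r, order=2)
areas = relative_areas(k)
```

The guard on `r[0]` reflects the difficulty noted for stop closures: when the frame energy is close to zero, the equations are ill-conditioned and the estimated areas carry little information.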


Implementation

■ Set-up: PC with sound card for signal acquisition (sampling rate used: 11.025 kSa/s)

■ “VTAG-1” developed for speech processing & display

Pre-emphasis for 6 dB/octave equalization; analysis window: 256-sample Hamming with 50% overlap

Robinson’s algorithm for obtaining reflection coefficients & area values

Bézier form algorithm for interpolation of area values
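The front-end settings listed above can be sketched in Python as below. The slides give the window length, window type, overlap, and sampling rate; the first-order pre-emphasis filter and its coefficient `alpha = 0.95` are assumptions, as is the 200 Hz test tone.

```python
import math


def pre_emphasize(x, alpha=0.95):
    """First-order difference y[n] = x[n] - alpha * x[n-1] (about +6 dB/octave)."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n):
    """Symmetric Hamming window of n samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(x, length=256, overlap=0.5):
    """Split x into Hamming-windowed frames with the given fractional overlap."""
    hop = int(length * (1.0 - overlap))
    w = hamming(length)
    return [[x[start + i] * w[i] for i in range(length)]
            for start in range(0, len(x) - length + 1, hop)]


fs = 11025                       # sampling rate from the slide, in samples/s
x = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(1024)]  # test tone
frames = windowed_frames(pre_emphasize(x))
```

At 11.025 kSa/s a 256-sample window spans about 23.2 ms, and the 50% overlap gives a new frame every 128 samples, or about 11.6 ms.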


VTAG-1 result for all-vowel word /aIje/


Synthesized vowels: /a/, /u/, /i/


Amplitude/pitch modulated synthesized vowel /a/: amplitude modulated, pitch modulated, amplitude & pitch modulated


Spectrograms for V-C-V sequences: /aka/, /aga/, /ata/, /ada/


Areagrams for V-C-V sequences: /aka/, /aga/, /ata/, /ada/


Lip shape estimation

Mouth parameters:

Parameter estimation:

• Pitch tracking: odd harmonics absent for analysis window length = 2 × pitch period

• Magnitude spectrum above 4000 Hz clipped to zero

• Mean & variance used for generation of predictor surfaces
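The three parameter-estimation steps can be illustrated with a short Python sketch. It assumes a rectangular window spanning exactly two pitch periods, and it assumes that the mean and variance are taken over the clipped magnitude spectrum; the slides do not state either detail, and the generation of predictor surfaces is not shown. The test signal and its 50-sample pitch period are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def clip_above(mag, fs, n, cutoff_hz=4000.0):
    """Zero every bin whose centre frequency k * fs / n exceeds cutoff_hz."""
    return [m if k * fs / n <= cutoff_hz else 0.0 for k, m in enumerate(mag)]


def mean_and_variance(values):
    """Population mean and variance of a list of numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


fs = 11025
period = 50   # hypothetical pitch period in samples (220.5 Hz at this rate)
one_period = [math.sin(2.0 * math.pi * t / period)
              + 0.5 * math.sin(4.0 * math.pi * t / period)
              for t in range(period)]
frame = one_period * 2            # window length = 2 x pitch period

mag = magnitude_spectrum(frame)
clipped = clip_above(mag, fs, len(frame))
mean, var = mean_and_variance(clipped)
```

With a window of exactly two periods, the harmonics of the pitch fall on even-numbered DFT bins, so the odd-numbered bins are empty up to rounding error. A pitch tracker can use that property to check whether a candidate period is correct.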


Lip shape estimation results

Pitch and mean vs. variance result (1): synthesized amplitude modulated vowel /u/


Pitch and mean vs. variance result (2): synthesized pitch/amplitude modulated vowel /a/


Pitch and mean vs. variance result (3): synthesized pitch modulated vowel /i/


Summary

■ Analysis & display package VTAG-1 developed for pitch/energy variation, spectrogram, & areagram (2-D plot of vocal tract area), to investigate the problems in estimating vocal tract shape, for use in a speech training aid for hearing-impaired children.


■ Area estimation for vowels: not affected by amplitude & pitch variation

■ Area estimation during stop closure: place of closure cannot be estimated from the analysis result during stop closure

■ Further work:

Investigate methods for predicting vocal tract area during stop closure from the areas estimated on either side of the closure

Implement algorithm for generation of predictor surfaces for extraction of lip shape estimation parameters