Audio Visual Speech Recognition

Post on 02-Aug-2015

Intro about utilizing image processing technology in speech recognition

Image Processing Techniques for Speech Recognition

Presented by

Amr Medhat, Mostafa Fathy and Sameh Serag

Supervised by

Dr. Magda Fayek

Date: 5-5-2004

Agenda

• Introduction

• Audio Visual Modeling

• Spectrogram Reading

• Spectrogram Filtering

Introduction

• What is Speech Recognition?

• Speech Recognition Problems

– noise

– inter- and intra-speaker variation

– continuous speech: no boundaries between words

• Image Processing is a possible solution

Audio Visual Speech Modeling

• Reading speech from facial and lip movements.

• Categorizing mouth shapes into visual phonemes (visemes)

– phoneme: the smallest distinctive unit of speech sound

• Why?

– distinguish between phonemes that are easily confused (such as f/s and m/n)

– improve recognition performance in noisy environments

Demo

Ready for the challenge?

• Listen to this audio and try to understand the speech content: vox_mix[1].mov

• Listen to the speech with the video image: dig_tranexp[1].mov

• Did you understand the content? Get a prize from IBM

• Play the answer: vox_clean[1].mov (5893642)

How?

Audio Feature Extraction

Visual Feature Extraction

Audio-Visual Integration

Visual Features

• Geometric lip dimensions

– Lip shape: height/width of the inner/outer lip

• Visibility of the tongue/teeth
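The geometric lip dimensions listed for the visual features can be illustrated with a minimal Python sketch. It assumes that four lip landmarks (left corner, right corner, top, bottom) have already been located in the image; the function name `lip_geometry` and the pixel coordinates are hypothetical and do not come from the slides.

```python
from math import dist

def lip_geometry(left, right, top, bottom):
    """Return (width, height, height/width ratio) from four (x, y) lip landmarks."""
    width = dist(left, right)    # distance between the mouth corners
    height = dist(top, bottom)   # distance between upper and lower lip
    ratio = height / width if width else 0.0
    return width, height, ratio

# Hypothetical outer-lip landmarks in pixel coordinates.
outer = lip_geometry(left=(10, 20), right=(50, 20), top=(30, 12), bottom=(30, 28))
```

The same function could be applied to inner-lip landmarks to obtain the second set of dimensions.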

Audio-Visual Integration

• Feature Fusion

• Synchronization Problem

• Use low-resolution image
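Feature fusion and the synchronization problem can be sketched together in Python. The sketch assumes that audio features arrive at a higher frame rate than video frames, that both sequences cover the same time span and are non-empty, and that fusion means simple concatenation; `fuse_features` is a hypothetical name, and the slides do not say which fusion scheme was used.

```python
def fuse_features(audio_frames, visual_frames):
    """Concatenate each audio vector with the visual vector covering the same time."""
    n_audio, n_visual = len(audio_frames), len(visual_frames)
    fused = []
    for i, audio_vec in enumerate(audio_frames):
        # Map audio frame index i onto the slower visual frame sequence.
        j = min(i * n_visual // n_audio, n_visual - 1)
        fused.append(list(audio_vec) + list(visual_frames[j]))
    return fused

audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # 4 audio frames
visual = [[1.0], [2.0]]                                    # 2 video frames
fused = fuse_features(audio, visual)
```

Each visual frame is repeated for the audio frames it spans, so the fused sequence keeps the audio frame rate.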

Spectrograms

• A Spectrogram:

– Translation of speech into the visual domain

[Figure: spectrogram with axes labeled frequency and time]
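As a rough illustration of how speech becomes a time-frequency image, the following Python sketch computes a magnitude spectrogram with a short-time Fourier transform. The frame length, hop size, Hann window and test tone are assumptions chosen for the example; the slides do not state how their spectrograms were produced.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len//2."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of one frequency bin; a real system would use an FFT.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Hypothetical test tone: a 1 kHz sine sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
spec = spectrogram(tone)
# Each bin spans rate / frame_len = 125 Hz, so the tone should peak in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each inner list is one column of the image (one instant in time), and each position in it is one frequency bin.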

Spectrogram Reading

Waveform and Spectrogram of the word: "phonetician"

Spectrogram Filtering

Required:

How?

Using: Morphological Processing

Morphological Processing

• Based on the theory of Mathematical Morphology

• Stresses the role of shape in image preprocessing used for region identification.

• Important Morphological operations:

– Erosion

– Dilation

– Opening

– Closing

Erosion & Dilation

• Erosion

– Used to shrink objects.

• Dilation

– Dual of erosion.

– Used to fill small gaps or valleys between shapes.

• Both are irreversible
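A minimal Python sketch of binary erosion and dilation may help make these definitions concrete. It assumes images stored as lists of 0/1 rows, a 3x3 square structuring element, and pixels outside the image treated as 0; none of these choices are stated in the slides.

```python
def _neighborhood(img, r, c):
    """Values in the 3x3 window around (r, c); pixels outside the image count as 0."""
    rows, cols = len(img), len(img[0])
    return [img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
            for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]

def erode(img):
    """A pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return [[min(_neighborhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[max(_neighborhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]

# A 3x3 block plus one isolated pixel.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
eroded = erode(image)      # block shrinks to its center; isolated pixel vanishes
restored = dilate(eroded)  # block grows back, but the isolated pixel is lost
```

Because `restored` differs from `image`, the example also shows why erosion cannot be undone by a later dilation.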

Opening & Closing

• Both are used for smoothing an object contour

• Opening: erosion followed by dilation

– smooths the contour from the inside of the object; separates objects

• Closing: dilation followed by erosion

– smooths the contour from the outside of the object; fills in small holes
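Opening and closing can be sketched in Python as compositions of the two basic operations. The sketch assumes binary images as lists of 0/1 rows, a 3x3 square structuring element and zero padding at the border; the helper `_morph` is a hypothetical name introduced only for this example.

```python
def _morph(img, pick):
    """Apply pick (min = erosion, max = dilation) over each zero-padded 3x3 window."""
    rows, cols = len(img), len(img[0])

    def window(r, c):
        return [img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
                for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]

    return [[pick(window(r, c)) for c in range(cols)] for r in range(rows)]

def opening(img):
    """Erosion followed by dilation: removes thin bridges between objects."""
    return _morph(_morph(img, min), max)

def closing(img):
    """Dilation followed by erosion: fills small holes inside objects."""
    return _morph(_morph(img, max), min)

# Two 3x3 blocks joined by a one-pixel-wide bridge.
bridged = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
opened = opening(bridged)  # the bridge at row 2, column 4 is removed

# A 5x5 block with a one-pixel hole in its center.
holed = [[1 if 1 <= r <= 5 and 1 <= c <= 5 and (r, c) != (3, 3) else 0
          for c in range(7)] for r in range(7)]
closed = closing(holed)    # the hole at row 3, column 3 is filled
```

In the first example opening separates the two blocks; in the second, closing fills the hole while leaving the block's outline unchanged.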

[Figures: examples of erosion and dilation]

Spectrogram Filtering

[Diagram: processing steps labeled convert, thresholding, dilation, erosion, ANDing, convert]
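One possible reading of the steps named on this slide is sketched below in Python. It is an assumption, not the presenters' confirmed method: the spectrogram is thresholded into a binary mask, the mask is dilated and then eroded, and "ANDing" is taken to mean keeping the original magnitudes only where the mask is set. The two "convert" steps are omitted, and `filter_spectrogram`, `_morph` and the threshold value are hypothetical.

```python
def _morph(img, pick):
    """Apply pick (min = erosion, max = dilation) over each zero-padded 3x3 window."""
    rows, cols = len(img), len(img[0])

    def window(r, c):
        return [img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
                for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]

    return [[pick(window(r, c)) for c in range(cols)] for r in range(rows)]

def filter_spectrogram(spec, threshold):
    """Threshold, dilate, erode, then mask the original magnitudes."""
    # 1. Thresholding: strong components become 1, everything else 0.
    mask = [[1 if v >= threshold else 0 for v in row] for row in spec]
    # 2-3. Dilation then erosion: bridges small gaps in the mask.
    mask = _morph(_morph(mask, max), min)
    # 4. ANDing (assumed meaning): keep original values where the mask is set.
    return [[v if m else 0.0 for v, m in zip(row, mrow)]
            for row, mrow in zip(spec, mask)]

# Hypothetical spectrogram: a strong ridge in row 2 with a weak dip at column 3.
spec = [[1.0] * 7 for _ in range(5)]
spec[2] = [1.0, 9.0, 9.0, 2.0, 9.0, 9.0, 1.0]
filtered = filter_spectrogram(spec, threshold=5.0)
```

In this example the low-level background is zeroed out while the ridge survives intact, including the weak sample at the dip that thresholding alone would have discarded.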