74.419 Artificial Intelligence 2004 Speech & Natural Language Processing

• Speech Recognition
  • acoustic signal as input
  • conversion into written words
• Natural Language Processing
  • written text as input
  • sentences (well-formed or not)
• Spoken Language Understanding
  • analysis of spoken language (transcribed speech)

Speech & Natural Language Processing

Areas in Speech Recognition
• Signal Processing
• Phonetics
• Word Recognition

Areas in Natural Language Processing
• Morphology
• Grammar & Parsing (syntactic analysis)
• Semantics
• Pragmatics
• Discourse / Dialogue
• Spoken Language Understanding

Speech Production & Reception

Sound and Hearing
• change in air pressure → sound wave
• reception through inner ear membrane / microphone
• break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → Frequency Spectrum
• perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)
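The frequency-analysis step above can be illustrated with a small sketch. It is not taken from the slides: it is a minimal radix-2 FFT in plain Python, applied to a hypothetical test signal (two sine tones at 250 Hz and 1000 Hz, sampled at 8000 Hz). The function names, the sample rate, and the tone frequencies are all assumptions chosen so that each tone falls exactly on one frequency bin.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency in Hz, magnitude) pairs for the lower half of the bins."""
    n = len(samples)
    bins = fft(samples)
    return [(k * sample_rate / n, abs(bins[k]) / n) for k in range(n // 2)]


# Hypothetical test signal (an assumption, not data from the lecture).
SAMPLE_RATE = 8000   # samples per second
N = 256              # number of samples, a power of two
signal = [
    math.sin(2 * math.pi * 250 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    for t in range(N)
]

spectrum = magnitude_spectrum(signal, SAMPLE_RATE)
# The two strongest bins should sit at the two tone frequencies.
strongest = sorted(spectrum, key=lambda fm: fm[1], reverse=True)[:2]
peak_frequencies = sorted(freq for freq, _ in strongest)
print(peak_frequencies)
```

In a real recognizer the spectrum would be computed per time frame rather than over the whole signal, and a library routine would normally replace the hand-written FFT.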

Speech Recognition

Acoustic / sound wave
→ Signal Processing / Analysis: Filtering, Sampling; Spectral Analysis; FFT
→ Frequency Spectrum
→ Features (Phonemes; Context)
→ Phoneme Recognition: HMM, Neural Networks
→ Phonemes
→ Grammar or Statistics
→ Phoneme Sequences / Words
→ Grammar or Statistics for likely word sequences
→ Word Sequence / Sentence
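The phoneme-recognition stage names Hidden Markov Models without showing how one is decoded. The sketch below is one common way to do it, the Viterbi algorithm, applied to a toy model. Everything in the model is an assumption made for illustration: the three phoneme states ("k", "ae", "t"), the coarse feature labels ("burst", "voiced", "silence"), and all probabilities are invented, not taken from the lecture or from trained data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, probability


# Toy left-to-right phoneme model (all values are illustrative assumptions).
STATES = ("k", "ae", "t")
START_P = {"k": 1.0, "ae": 0.0, "t": 0.0}
TRANS_P = {
    "k": {"k": 0.5, "ae": 0.5, "t": 0.0},
    "ae": {"k": 0.0, "ae": 0.6, "t": 0.4},
    "t": {"k": 0.0, "ae": 0.0, "t": 1.0},
}
EMIT_P = {
    "k": {"burst": 0.7, "voiced": 0.1, "silence": 0.2},
    "ae": {"burst": 0.1, "voiced": 0.8, "silence": 0.1},
    "t": {"burst": 0.6, "voiced": 0.1, "silence": 0.3},
}

# One feature label per time frame.
frame_features = ["burst", "voiced", "voiced", "burst"]
path, probability = viterbi(frame_features, STATES, START_P, TRANS_P, EMIT_P)
print(path, probability)
```

For this input the decoder assigns one phoneme to each frame (k, ae, ae, t). Practical systems work with log probabilities to avoid numerical underflow on long utterances; plain products are used here only to keep the sketch short.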

Speech Signal

Analog-Digital Conversion of acoustic signal → Sampling; analysis in Time Frames = “windows”

Characteristics of a Speech Signal
• formants: strong frequency components (resonances of the vocal tract); characterize e.g. vowels, gender of speaker; dark stripes in the spectrogram
• pitch: fundamental frequency (baseline for the higher-frequency harmonics, which the formants shape)
• place of articulation (recognition model based on model of vocal tract)
• change in frequency distribution

Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)
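Two of the ideas on this slide, splitting the sampled signal into time frames and finding the pitch, can be sketched in a few lines. The example below is an illustration under stated assumptions, not the method used in the lecture: it estimates the fundamental frequency from the autocorrelation peak of each frame, and the synthetic signal (a 125 Hz fundamental with two weaker harmonics), the 8000 Hz sample rate, the 50 ms frame length, and the 10 ms hop are all invented for the example.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Split a sampled signal into overlapping time frames ("windows")."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def estimate_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Only lags that correspond to pitches between f_min and f_max are searched.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)

    def autocorrelation(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorrelation)
    return sample_rate / best_lag


# Synthetic voiced sound (an assumption): 125 Hz fundamental plus harmonics.
SAMPLE_RATE = 8000
F0 = 125.0
signal = [
    math.sin(2 * math.pi * F0 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2 * F0 * t / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 3 * F0 * t / SAMPLE_RATE)
    for t in range(1600)  # 0.2 seconds of signal
]

frame_list = split_into_frames(signal, frame_len=400, hop=80)  # 50 ms / 10 ms
pitches = [estimate_pitch(frame, SAMPLE_RATE) for frame in frame_list]
print(len(frame_list), pitches[0])
```

Each frame should yield an estimate close to the 125 Hz fundamental. Real speech is noisier and only partly voiced, so practical pitch trackers add a voicing decision and smoothing across frames.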

Speech Recognition Characteristics

Speech Recognition vs. Speaker Identification

Speaker-dependent vs. speaker-independent

Single word vs. continuous speech

Large vs. small vocabulary

Additional References

Huang, X., A. Acero & H.-W. Hon: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, NJ, 2001.
