74.419 Artificial Intelligence 2004 Speech & Natural Language Processing
• Speech Recognition
  • acoustic signal as input
  • conversion into written words
• Natural Language Processing
  • written text as input
  • sentences (well-formed or not)
• Spoken Language Understanding
  • analysis of spoken language (transcribed speech)
Speech & Natural Language Processing
Areas in Speech Recognition
• Signal Processing
• Phonetics
• Word Recognition
Areas in Natural Language Processing
• Morphology
• Grammar & Parsing (syntactic analysis)
• Semantics
• Pragmatics
• Discourse / Dialogue
• Spoken Language Understanding
Speech Production & Reception
Sound and Hearing
• change in air pressure → sound wave
• reception through inner ear membrane / microphone
• break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → Frequency Spectrum
• perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)
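
The "mathematical frequency analysis" step can be illustrated with a small sketch. It uses a naive discrete Fourier transform written with the Python standard library. An FFT computes the same spectrum, only faster. The test signal (a 440 Hz tone with a weaker 880 Hz overtone), the 8000 Hz sampling rate, and the function names are assumptions made for illustration and do not come from the slides.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin from 0 Hz up to the Nyquist bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            # Correlate the signal with a complex sinusoid of k cycles per frame.
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring the 0 Hz (DC) bin."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(samples)


# Assumed test signal: 400 samples at 8000 Hz, so bins are 20 Hz apart.
SAMPLE_RATE = 8000
signal = [
    math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 880 * t / SAMPLE_RATE)
    for t in range(400)
]
peak_hz = dominant_frequency(signal, SAMPLE_RATE)
print(f"strongest component: {peak_hz:.1f} Hz")
```

The list returned by `dft_magnitudes` is the frequency spectrum of one stretch of signal. A recognizer would compute such a spectrum repeatedly over short stretches of speech rather than once over the whole recording.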
Speech Recognition (processing stages, from signal to sentence)
• Acoustic / sound wave
• Filtering, Sampling; Spectral Analysis (FFT) → Frequency Spectrum
• Signal Processing / Analysis → Features (Phonemes; Context)
• Phoneme Recognition (HMM, Neural Networks) → Phonemes
• Grammar or Statistics → Phoneme Sequences / Words
• Grammar or Statistics for likely word sequences → Word Sequence / Sentence
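
The last stage, "statistics for likely word sequences", might be sketched as a bigram language model that scores competing word sequences which sound alike. The toy `CORPUS`, the add-one smoothing, and all function names are illustrative assumptions. The slides do not say which statistical model is meant.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would use counts from large text collections.
CORPUS = [
    "we want ice cream",
    "they want ice cream today",
    "I scream when I am scared",
    "we want to go home",
]


def tokenize(sentence):
    """Wrap a sentence in start/end markers so every word has a successor."""
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = tokenize(s)
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab_size = len(unigrams)
    words = tokenize(sentence)
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


unigrams, bigrams = train_bigrams(CORPUS)
# Two acoustically similar candidates of equal length.
candidates = ["we want ice cream", "we want I scream"]
best = max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))
print(best)
```

Both candidates have the same number of words, so the comparison is not biased by length. With this corpus the model prefers the sequence whose word pairs it has actually seen.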
Speech Signal
Analog-Digital Conversion of acoustic signal → Sampling in Time Frames (“windows”)
Characteristics of a Speech Signal
• formants: strong frequency components (vocal-tract resonances); characterize e.g. vowels, gender of speaker; visible as dark stripes in a spectrogram
• pitch: fundamental frequency (baseline for the higher-frequency harmonics that the formants emphasize)
• place of articulation (recognition model based on model of vocal tract)
• change in frequency distribution
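
Splitting a sampled signal into time frames ("windows") could look like the sketch below. The 25 ms frame length, the 10 ms hop, the Hamming taper, and the function names are assumptions chosen for illustration. The slides do not specify any of them.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len, advancing by hop."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def hamming(frame_len):
    """Hamming window coefficients, which taper a frame toward its edges."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
        for i in range(frame_len)
    ]


def windowed_frames(samples, frame_len, hop):
    """Frame the signal and multiply every frame by the Hamming window."""
    window = hamming(frame_len)
    return [
        [x * w for x, w in zip(frame, window)]
        for frame in frame_signal(samples, frame_len, hop)
    ]


# Assumed values: 8 kHz sampling, 25 ms frames (200 samples), 10 ms hop (80 samples).
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
frames = windowed_frames(signal, frame_len=200, hop=80)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Overlapping frames let the analysis follow how the frequency distribution changes over time. The taper reduces the artificial edges that cutting a frame out of a longer signal would otherwise introduce.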
Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)
Speech Signal
Analog-Digital Conversion of Acoustic Signals → Sampling
Analysis of Signal in Time Frames (“windows”)
Characteristics of a Speech Signal
• formants: strong frequency components (vocal-tract resonances); characterize e.g. vowels, gender of speaker; visible as dark stripes in a spectrogram
• pitch: fundamental frequency (baseline for the higher-frequency harmonics that the formants emphasize)
• place of articulation (recognition model based on model of vocal tract)
• change in frequency distribution
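
Pitch, as the fundamental frequency of a frame, can be estimated in several ways. The sketch below uses autocorrelation, which is one common choice and not necessarily the method the lecture had in mind. The search range of 50 to 400 Hz, the synthetic test frame, and the function name are assumptions for illustration.

```python
import math


def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    The lag at which the frame best matches a shifted copy of itself is
    taken as the pitch period.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    if len(frame) <= max_lag:
        raise ValueError("frame too short for the requested fmin")
    # Fixed summation length so that every lag is scored over the same span.
    n = len(frame) - max_lag
    best_lag, best_score = min_lag, float("-inf")
    # Candidate lags run from min_lag up to, but not including, max_lag.
    for lag in range(min_lag, max_lag):
        score = sum(frame[i] * frame[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Assumed test frame: 100 Hz fundamental with two weaker harmonics,
# 400 samples (50 ms) at 8000 Hz.
SAMPLE_RATE = 8000
frame = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
    for t in range(400)
]
pitch = estimate_pitch(frame, SAMPLE_RATE)
print(f"estimated pitch: {pitch:.1f} Hz")
```

The harmonics at 200 Hz and 300 Hz repeat within the 100 Hz period, so the best-matching lag is still the period of the fundamental. Real speech is only approximately periodic, and practical estimators add normalization and voicing checks that this sketch omits.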
Speech Recognition Characteristics
• Speech recognition vs. speaker identification
• Speaker-dependent vs. speaker-independent
• Single-word vs. continuous speech
• Large vs. small vocabulary
Additional References
Huang, X., A. Acero & H.-W. Hon: Spoken Language Processing. A Guide to Theory, Algorithm, and System Development. Prentice-Hall, NJ, 2001.