SPEECH RECOGNITION 2 DAY 15 – SEPT 30, 2013 Brain & Language LING 4110-4890-5110-7960 NSCI 4110-4891-6110 Harry Howard Tulane University

SPEECH RECOGNITION 2

DAY 15 – SEPT 30, 2013

Brain & Language

LING 4110-4890-5110-7960

NSCI 4110-4891-6110

Harry Howard

Tulane University


Course organization

• The syllabus, these slides and my recordings are available at http://www.tulane.edu/~howard/LING4110/.
• If you want to learn more about EEG and neurolinguistics, you are welcome to participate in my lab. This is also a good way to get started on an honors thesis.
• The grades are posted to Blackboard.

9/30/13 Brain & Language, Harry Howard, Tulane University

REVIEW


Review

• Pitch shows fundamental frequency (F0)
• Spectrogram shows formants (F1-3)
• Sound wave
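To make the link between the sound wave and the pitch track concrete, here is a minimal sketch that estimates F0 from a waveform by normalized autocorrelation. It is not Praat's algorithm. The function names, the synthetic test signal, and the 75-500 Hz search range are assumptions chosen for illustration.

```python
import math


def synth_periodic(f0, sample_rate, n_samples, n_harmonics=5):
    """Toy periodic signal: a sum of harmonics of f0 with amplitude 1/h."""
    return [
        sum(math.sin(2 * math.pi * h * f0 * t / sample_rate) / h
            for h in range(1, n_harmonics + 1))
        for t in range(n_samples)
    ]


def estimate_f0(samples, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate F0 as sample_rate / lag, where lag is the shift at which
    the signal correlates best with itself (normalized autocorrelation).
    fmin and fmax are assumed search bounds, not values from the slides."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    window = len(samples) - max_lag
    if window <= 0:
        raise ValueError("signal too short for the requested fmin")

    head = samples[:window]
    head_energy = sum(x * x for x in head)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        tail = samples[lag:lag + window]
        tail_energy = sum(x * x for x in tail)
        if head_energy == 0 or tail_energy == 0:
            continue  # silent stretch: correlation is undefined
        score = sum(a * b for a, b in zip(head, tail))
        score /= math.sqrt(head_energy * tail_energy)
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("signal is silent")
    return sample_rate / best_lag


# 50 ms of a 100 Hz toy signal sampled at 8 kHz.
signal = synth_periodic(100.0, 8000, 400)
print(estimate_f0(signal, 8000))
```

The lag that best lines the wave up with itself is one period long, so its reciprocal is F0. Formants are a separate matter: they show up in the spectrogram as resonances of the vocal tract, not as the repetition rate of the wave.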


SPEECH RECOGNITION

Ingram §5


• use Praat in class


Vowel articulation

• Tongue height: high, (mid), low
  • Put your hand under your jaw and say the vowel of:
    • mat, met, mate, mitt, meat
    • meat, mitt, mate, met, mat
• Tongue advancement: front, central, back
• Lip configuration: rounded, neutral, retracted


Vowel description

              Front       Central     Back
High          i ɪ                     u ʊ
(Mid)         e ɛ         ɝ ə ɚ ʌ     o ɔ
Low           æ           a
Lips          Retracted   Neutral     Rounded
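The vowel chart can also be written down as a small lookup table, which makes it easy to ask which vowels share a feature. This is a minimal sketch with hypothetical names (`CHART`, `describe`, `natural_class`). The placement of ʌ and a in the central column is an assumption read off the flattened chart, and the lip configuration is taken to follow the column, as the bottom row of the chart suggests.

```python
from collections import namedtuple

Vowel = namedtuple("Vowel", ["height", "advancement", "lips"])

# Lip configuration per column, following the bottom row of the chart.
LIPS = {"front": "retracted", "central": "neutral", "back": "rounded"}

CHART = {
    ("high", "front"): ["i", "ɪ"],
    ("high", "back"): ["u", "ʊ"],
    ("mid", "front"): ["e", "ɛ"],
    ("mid", "central"): ["ɝ", "ə", "ɚ", "ʌ"],  # placement of ʌ is assumed
    ("mid", "back"): ["o", "ɔ"],
    ("low", "front"): ["æ"],
    ("low", "central"): ["a"],  # placement of a is assumed
}

# Flatten the chart into symbol -> feature bundle.
VOWELS = {
    symbol: Vowel(height, advancement, LIPS[advancement])
    for (height, advancement), symbols in CHART.items()
    for symbol in symbols
}


def describe(symbol):
    """Return the three-term description of a vowel symbol."""
    v = VOWELS[symbol]
    return f"{v.height} {v.advancement} {v.lips}"


def natural_class(**features):
    """Return all vowels that match every given feature value."""
    return sorted(
        symbol for symbol, v in VOWELS.items()
        if all(getattr(v, name) == value for name, value in features.items())
    )


print(describe("i"))                  # high front retracted
print(natural_class(height="high"))   # all four high vowels
```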


Sample vowel spectrograms

• Wide band spectrograms of the vowels of American English in a /b__d/ context.
• Top row, left to right: [i, ɪ, eɪ, ɛ, æ]. Bottom row, left to right: [ɑ, ɔ, o, ʊ, u].


Acoustic cues and distinctive features

• Three problems
  a. Input signal
  b. Internal representation
  c. Interface between (a) and (b)
• Lexical information retrieval
  • but we only need the phonological form of a lexical item


Why speech recognition is difficult

• The segmentation problem
• The variability problem
  • coarticulation
• The speaking environment
• Speakers’ vocal tracts
• Speech rate and style
• Rate of information transmission
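The segmentation problem can be illustrated with a toy example: continuous speech has no spaces, so the same string of sounds may split into words in more than one way. The sketch below lists every split of a phone string into entries of a lexicon. The lexicon, the rough transcriptions, and the function name `segmentations` are made up for illustration.

```python
def segmentations(phones, lexicon):
    """Return every way to split `phones` into consecutive lexicon entries.

    `lexicon` maps a phone string to a word; each parse is a list of words.
    """
    if not phones:
        return [[]]  # one way to parse nothing: the empty parse
    parses = []
    for end in range(1, len(phones) + 1):
        chunk = phones[:end]
        if chunk in lexicon:
            for rest in segmentations(phones[end:], lexicon):
                parses.append([lexicon[chunk]] + rest)
    return parses


# Hypothetical four-word lexicon with rough phonemic forms.
TOY_LEXICON = {
    "aɪ": "I",
    "aɪs": "ice",
    "skrim": "scream",
    "krim": "cream",
}

parses = segmentations("aɪskrim", TOY_LEXICON)
print(parses)  # [['I', 'scream'], ['ice', 'cream']]
```

Both parses are consistent with the phone string, so the listener needs more than the lexicon to choose between them. The sketch also assumes the phones are already identified, which sets the variability problem aside entirely.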


Lexical retrieval

• Speech perception involves phonological parsing prior to lexical access
• It is not enough to know the lexicon beforehand.
• Phonetic forms and phonological representations
• Speech/speaker normalization
• Distinctive features and acoustic cues
• Underspecified vs. fully specified
• Discrete vs. continuous
• Hierarchical organization vs. entrainment
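One simple way to picture speaker normalization is to rescale each speaker's formant values relative to that speaker's own mean and spread (a per-speaker z-score). The sketch below is only an illustration of the idea, not a claim about how listeners do it. The F1 numbers are invented, and the second speaker is simply the first scaled by 1.2 to mimic a shorter vocal tract.

```python
import statistics


def zscore_normalize(formants):
    """Rescale one speaker's formant values to mean 0 and unit spread.

    `formants` maps a vowel symbol to a formant value in Hz.
    """
    values = list(formants.values())
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        raise ValueError("need at least two distinct formant values")
    return {vowel: (hz - mean) / spread for vowel, hz in formants.items()}


# Invented F1 values in Hz, for illustration only (not measurements).
speaker_a = {"i": 300.0, "æ": 700.0, "u": 320.0}
# Hypothetical second speaker: every value scaled up by 20%.
speaker_b = {vowel: 1.2 * hz for vowel, hz in speaker_a.items()}

norm_a = zscore_normalize(speaker_a)
norm_b = zscore_normalize(speaker_b)

for vowel in speaker_a:
    print(vowel, round(norm_a[vowel], 3), round(norm_b[vowel], 3))
```

The raw values differ between the two speakers, but the normalized values coincide, so each vowel keeps its place relative to that speaker's other vowels. Real speakers differ by more than a uniform scale factor, so this only covers the simplest case.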


NEXT TIME

Finish Ingram §6.

☞ Go over questions at end of chapter.
