Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to...
![Page 1: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/1.jpg)
Acoustic Models
Taylor Berg-Kirkpatrick – CMU
Slides: Dan Klein – UC Berkeley
Algorithms for NLP
![Page 2: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/2.jpg)
Speech Signals
![Page 3: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/3.jpg)
Speech in a Slide
§ Frequency gives pitch; amplitude gives volume
§ Frequencies at each time slice processed into observation vectors
[Figure: waveform (amplitude axis) labeled s p ee ch l a b, with the observation vector sequence … x12 x13 x12 x14 x14 …]
![Page 4: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/4.jpg)
Articulation
![Page 5: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/5.jpg)
Articulatory System
[Figure: sagittal section of the vocal tract (Techmer 1880), labeled: nasal cavity, oral cavity, pharynx, vocal folds (in the larynx), trachea, lungs]
Text from Ohala, Sept 2001, from Sharon Rose slide
![Page 6: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/6.jpg)
Space of Phonemes
§ Standard international phonetic alphabet (IPA) chart of consonants
![Page 7: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/7.jpg)
Place
![Page 8: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/8.jpg)
Places of Articulation
[Figure labels: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]
Figure thanks to Jennifer Venditti
![Page 9: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/9.jpg)
Labial place
[Figure labels: bilabial, labiodental]
Figure thanks to Jennifer Venditti
Bilabial: p, b, m
Labiodental: f, v
![Page 10: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/10.jpg)
Coronal place
[Figure labels: dental, alveolar, post-alveolar/palatal]
Figure thanks to Jennifer Venditti
Dental: th/dh
Alveolar: t/d/s/z/l/n
Post-alveolar: sh/zh/y
![Page 11: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/11.jpg)
Dorsal Place
[Figure labels: velar, uvular, pharyngeal]
Figure thanks to Jennifer Venditti
Velar: k/g/ng
![Page 12: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/12.jpg)
Space of Phonemes
§ Standard international phonetic alphabet (IPA) chart of consonants
![Page 13: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/13.jpg)
Manner
![Page 14: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/14.jpg)
Manner of Articulation
§ In addition to varying by place, sounds vary by manner
§ Stop: complete closure of articulators, no air escapes via mouth
  § Oral stop: palate is raised (p, t, k, b, d, g)
  § Nasal stop: oral closure, but palate is lowered (m, n, ng)
§ Fricatives: substantial closure, turbulent: (f, v, s, z)
§ Approximants: slight closure, sonorant: (l, r, w)
§ Vowels: no closure, sonorant: (i, e, a)
![Page 15: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/15.jpg)
Space of Phonemes
§ Standard international phonetic alphabet (IPA) chart of consonants
![Page 16: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/16.jpg)
Vowels
![Page 17: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/17.jpg)
Vowel Space
![Page 18: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/18.jpg)
Acoustics
![Page 19: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/19.jpg)
“She just had a baby”
§ What can we learn from a wavefile?
  § No gaps between words (!)
  § Vowels are voiced, long, loud
  § Length in time = length in space in waveform picture
  § Voicing: regular peaks in amplitude
  § When stops closed: no peaks, silence
  § Peaks = voicing: .46 to .58 (vowel [iy]), from second .65 to .74 (vowel [ax]), and so on
  § Silence of stop closure (1.06 to 1.08 for first [b], or 1.26 to 1.28 for second [b])
  § Fricatives like [sh]: intense irregular pattern; see .33 to .46
![Page 20: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/20.jpg)
Time-Domain Information
[Figure: waveforms of “bad”, “pad”, “spat”, “pat”]
Example from Ladefoged
![Page 21: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/21.jpg)
Simple Periodic Waves of Sound
[Figure: simple periodic wave; x-axis: Time (s), 0 to 0.02; y-axis: amplitude, -0.99 to 0.99]
• Y axis: Amplitude = amount of air pressure at that point in time
  • Zero is normal air pressure, negative is rarefaction
• X axis: Time
• Frequency = number of cycles per second
  • 20 cycles in .02 seconds = 1000 cycles/second = 1000 Hz
![Page 22: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/22.jpg)
Complex Waves: 100 Hz + 1000 Hz
[Figure: complex wave; x-axis: Time (s), 0 to 0.05; y-axis: Amplitude, -0.9654 to 0.99]
![Page 23: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/23.jpg)
Spectrum
[Figure: spectrum; x-axis: Frequency in Hz, marked at 100 and 1000; y-axis: Coefficient]
Frequency components (100 and 1000 Hz) on x-axis
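A minimal sketch of how such a spectrum could be computed, assuming a sample rate of 8000 Hz and a naive O(N²) DFT (both are illustrative choices, not taken from the slides). It builds the 100 Hz + 1000 Hz wave over 0.05 s and looks for the two strongest frequency bins.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns |X[k]| for bins k < N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

SAMPLE_RATE = 8000  # Hz (assumed for illustration)
N_SAMPLES = 400     # 0.05 s at 8000 Hz, matching the complex-wave time axis

# Sum of a 100 Hz and a 1000 Hz sine wave.
wave = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    for t in range(N_SAMPLES)
]

mags = dft_magnitudes(wave)
bin_hz = SAMPLE_RATE / N_SAMPLES  # width of one frequency bin in Hz

# Indices of the two largest coefficients, in increasing frequency order.
peaks = sorted(sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2])
peak_freqs = [k * bin_hz for k in peaks]
print(peak_freqs)
```

Both component frequencies complete a whole number of cycles in the window, so the energy lands in exactly two bins; with real speech the peaks would be smeared across neighboring bins.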
![Page 24: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/24.jpg)
Part of [ae] waveform from “had”
§ Note complex wave repeating nine times in figure
§ Plus smaller waves which repeat 4 times for every large pattern
§ Large wave has frequency of 250 Hz (9 times in .036 seconds)
§ Small wave roughly 4 times this, or roughly 1000 Hz
§ Two little tiny waves on top of peak of 1000 Hz waves
[Figure: waveform; x-axis: Time; y-axis: Amplitude]
![Page 25: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/25.jpg)
Spectrum of an Actual Speech
[Figure: spectrum; x-axis: Frequency (Hz), 0 to 5000; y-axis: Coefficient, 0 to 40]
![Page 26: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/26.jpg)
Spectrograms
[Figure: waveform (ampl vs. time) with one time slice passed through an FFT to give a spectrum (coeff vs. freq; Frequency (Hz), 0 to 5000; coefficient, 0 to 40)]
![Page 27: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/27.jpg)
Spectrograms
[Figure: a row of per-time-slice spectra (each with axes Frequency (Hz), 0 to 5000, and coefficient, 0 to 40) above the waveform (ampl vs. time)]
![Page 28: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/28.jpg)
Spectrograms
[Figure: spectrogram (freq vs. time) above the waveform (ampl vs. time)]
![Page 29: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/29.jpg)
Types of Graphs
[Figure: spectrogram (freq vs. time), waveform (ampl vs. time), and spectrum (coeff vs. freq; Frequency (Hz), 0 to 5000; coefficient, 0 to 40)]
![Page 30: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/30.jpg)
Back to Spectra
§ Spectrum represents these freq components
§ Computed by Fourier transform, algorithm which separates out each frequency component of wave
§ x-axis shows frequency, y-axis shows magnitude (in decibels, a log measure of amplitude)
§ Peaks at 930 Hz, 1860 Hz, and 3020 Hz
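A small sketch of the decibel conversion mentioned for the y-axis, assuming the common amplitude convention of 20·log10(amplitude / reference). The reference value of 1.0 is an illustrative assumption; the slides do not say which reference was used.

```python
import math

def to_decibels(amplitude, reference=1.0):
    """Convert an amplitude to decibels relative to `reference` (assumed 1.0)."""
    return 20.0 * math.log10(amplitude / reference)

# Each tenfold increase in amplitude adds 20 dB.
for amp in (1.0, 10.0, 100.0):
    print(amp, to_decibels(amp))
```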
![Page 31: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/31.jpg)
Source/Filter
![Page 32: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/32.jpg)
Why these Peaks?
§ Articulation process:
  § The vocal cord vibrations create harmonics
  § The mouth is an amplifier
  § Depending on shape of mouth, some harmonics are amplified more than others
![Page 33: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/33.jpg)
Figures from Ratree Wayland
Vowel [i] at increasing pitches
[Figure panels: F#2, A2, C3, F#3, A3, C4 (middle C), A4]
![Page 34: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/34.jpg)
Resonances of the Vocal Tract
§ The human vocal tract as an open tube:
[Figure: tube with a closed end and an open end, length 17.5 cm]
§ Air in a tube of a given length will tend to vibrate at resonance frequency of tube.
§ Constraint: Pressure differential should be maximal at (closed) glottal end and minimal at (open) lip end.
Figure from W. Barry
![Page 35: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/35.jpg)
From Sundberg
![Page 36: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/36.jpg)
Computing the 3 Formants of Schwa
§ Let the length of the tube be L (here L = 17.5 cm, with speed of sound c = 35,000 cm/s)
  § F1 = c/λ1 = c/(4L) = 35,000/(4 × 17.5) = 500 Hz
  § F2 = c/λ2 = c/((4/3)L) = 3c/(4L) = 3 × 35,000/(4 × 17.5) = 1500 Hz
  § F3 = c/λ3 = c/((4/5)L) = 5c/(4L) = 5 × 35,000/(4 × 17.5) = 2500 Hz
§ So we expect a neutral vowel to have 3 resonances at 500, 1500, and 2500 Hz
§ These vowel resonances are called formants
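The arithmetic above can be checked with a short sketch. The function name `tube_resonances` is hypothetical; it assumes the quarter-wavelength pattern from the slide, where the k-th resonance of a tube closed at one end is (2k − 1)·c/(4L).

```python
def tube_resonances(length_cm, speed_cm_per_s=35000.0, count=3):
    """Resonances of a tube closed at one end: F_k = (2k - 1) * c / (4L)."""
    return [
        (2 * k - 1) * speed_cm_per_s / (4.0 * length_cm)
        for k in range(1, count + 1)
    ]

# A 17.5 cm vocal tract, as on the slide.
formants = tube_resonances(17.5)
print(formants)  # [500.0, 1500.0, 2500.0]
```

A shorter tube raises every resonance proportionally, for example `tube_resonances(14.0)` starts at 625 Hz.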
![Page 37: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/37.jpg)
From Mark Liberman
![Page 38: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/38.jpg)
Seeing Formants: the Spectrogram
![Page 39: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/39.jpg)
Vowel Space
![Page 40: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/40.jpg)
Seeing Formants: the Spectrogram
![Page 41: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/41.jpg)
American English Vowel Space
[Figure: vowel chart with axes FRONT to BACK and HIGH to LOW; vowels shown: iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux]
Figures from Jennifer Venditti, H. T. Bunnell
![Page 42: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/42.jpg)
Spectrograms
![Page 43: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/43.jpg)
How to Read Spectrograms
§ [bab]: closure of lips lowers all formants: so rapid increase in all formants at beginning of “bab”
§ [dad]: first formant increases, but F2 and F3 slight fall
§ [gag]: F2 and F3 come together: this is a characteristic of velars. Formant transitions take longer in velars than in alveolars or labials
From Ladefoged “A Course in Phonetics”
![Page 44: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/44.jpg)
“She came back and started again”
1. lots of high-freq energy
3. closure for k
4. burst of aspiration for k
5. ey vowel; faint 1100 Hz formant is nasalization
6. bilabial nasal
7. short b closure, voicing barely visible
8. ae; note upward transitions after bilabial stop at beginning
9. note F2 and F3 coming together for "k"
From Ladefoged “A Course in Phonetics”
![Page 45: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/45.jpg)
Dialect Issues
§ Speech varies from dialect to dialect (examples are American vs. British English)
  § Syntactic (“I could” vs. “I could do”)
  § Lexical (“elevator” vs. “lift”)
  § Phonological
  § Phonetic
§ Mismatch between training and testing dialects can cause a large increase in error rate
[Figure: American vs. British examples of “all” and “old”]
![Page 46: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/46.jpg)
Speech Recognition
![Page 47: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/47.jpg)
The Noisy Channel Model
Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions
Language model: Distributions over sequences of words (sentences)
![Page 48: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/48.jpg)
Speech Model
[Figure: layered model, with the parts labeled “Language model” and “Acoustic model”]
Words: w1 w2
Sound types: s1 s2 s3 s4 s5 s6 s7
Acoustic observations: a1 a2 a3 a4 a5 a6 a7
![Page 49: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/49.jpg)
Acoustic Model
[Figure: the “Acoustic model” part of the speech model]
Sound types: s1 s2 s3 s4 s5 s6 s7
Acoustic observations: a1 a2 a3 a4 a5 a6 a7
![Page 50: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/50.jpg)
Frame Extraction
§ A frame (25 ms wide) extracted every 10 ms
[Figure: overlapping 25 ms windows, shifted by 10 ms, producing a1 a2 a3]
Figure: Simon Arnfield
Preview of feature extraction for each frame:
1) DFT (Spectrum)
2) Log (Calibrate?)
3) another DFT (!!??)
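A minimal sketch of the framing step, assuming a 16 kHz sample rate and no windowing function (both are illustrative assumptions; real front ends typically also apply a tapered window to each frame). The name `extract_frames` is hypothetical.

```python
def extract_frames(samples, sample_rate, width_ms=25, step_ms=10):
    """Slice a signal into overlapping frames: `width_ms` wide, one every `step_ms`."""
    width = sample_rate * width_ms // 1000  # samples per frame
    step = sample_rate * step_ms // 1000    # samples between frame starts
    return [
        samples[start:start + width]
        for start in range(0, len(samples) - width + 1, step)
    ]

# One second of placeholder audio at an assumed 16 kHz sample rate.
signal = list(range(16000))
frames = extract_frames(signal, 16000)
print(len(frames), len(frames[0]))  # 98 frames of 400 samples each
```

Consecutive frames share 15 ms of audio, which is why one second yields about 100 frames rather than 40.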
![Page 51: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/51.jpg)
Feature Extraction
![Page 52: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/52.jpg)
Digitizing Speech
Figure: Bryan Pellom
![Page 53: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/53.jpg)
Source/Filter
§ Articulation process:
  § The vocal cord vibrations create harmonics
  § The mouth is an amplifier
  § Depending on shape of mouth, some harmonics are amplified more than others
![Page 54: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/54.jpg)
Problem with Raw Spectrum
Figures from Ratree Wayland
![Page 55: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/55.jpg)
Deconvolution / Liftering
![Page 56: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/56.jpg)
Deconvolution / Liftering
[Figure: signal s decomposed into components e and f]
![Page 57: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/57.jpg)
Deconvolution / Liftering
[Figure: log(s) = log(e) + log(f)]
![Page 58: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/58.jpg)
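Deconvolution / Liftering
s = e ∗ f (convolution in time, which is a product of spectra in the frequency domain)
log(s) = log(e) + log(f) (taken on the spectra)
IDFT(log(s))
Graphs from Dan Ellis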
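A toy numerical check of the identities above, assuming circular convolution on short hand-picked sequences (the values of `e` and `f` are invented for illustration and are not real excitation or filter signals). It confirms that log magnitude spectra add under convolution and computes IDFT(log(s)) with a naive DFT.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(e, f):
    """Circular convolution of two equal-length sequences."""
    n = len(e)
    return [sum(e[j] * f[(t - j) % n] for j in range(n)) for t in range(n)]

def log_magnitude(x):
    """Log of the magnitude spectrum."""
    return [math.log(abs(c)) for c in dft(x)]

def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum, i.e. IDFT(log(s))."""
    log_mag = log_magnitude(x)
    n = len(log_mag)
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

e = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]    # toy "source" (hypothetical values)
f = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]  # toy "filter" (hypothetical values)
s = circular_convolve(e, f)

# log|S| should equal log|E| + log|F| at every frequency bin.
max_gap = max(
    abs(a - (b + c))
    for a, b, c in zip(log_magnitude(s), log_magnitude(e), log_magnitude(f))
)
print(max_gap)
```

Because the inverse DFT is linear, the cepstrum of `s` is also the sum of the cepstra of `e` and `f`, which is what makes it possible to separate them by keeping only some cepstral coefficients.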
![Page 59: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/59.jpg)
Mel Freq. Cepstral Coefficients
§ Do FFT to get spectral information
  § Like the spectrogram we saw earlier
§ Apply Mel scaling (New)
  § Models human ear; more sensitivity in lower freqs
  § Approx linear below 1 kHz, log above, equal samples above and below 1 kHz
§ Take Log
§ Do discrete cosine transform
[Graph: Wikipedia]
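A sketch of the Mel scaling step, assuming one widely used formula, mel = 2595·log10(1 + f/700). The slides do not state which variant of the Mel scale they use, so the constants and the helper names here are assumptions for illustration.

```python
import math

def hz_to_mel(hz):
    """Map frequency in Hz to mels (assumed 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_frequencies(low_hz, high_hz, count):
    """Frequencies in Hz that are evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (count - 1)
    return [mel_to_hz(lo + i * step) for i in range(count)]

# Ten filter centers between 0 and 8000 Hz: dense at low frequencies, sparse at high.
centers = mel_spaced_frequencies(0.0, 8000.0, 10)
print([round(c) for c in centers])
```

Under this formula 1000 Hz maps to roughly 1000 mels, and the spacing between neighboring centers grows with frequency, reflecting the ear's finer resolution at low frequencies.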
![Page 60: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/60.jpg)
Final Feature Vector
§ 39 (real) features per 10 ms frame:
  § 12 MFCC features
  § 12 delta MFCC features
  § 12 delta-delta MFCC features
  § 1 (log) frame energy
  § 1 delta (log) frame energy
  § 1 delta-delta (log) frame energy
§ So each frame is represented by a 39D vector
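A sketch of how the 39 dimensions could be assembled, assuming deltas are simple central differences between neighboring frames (real systems often use a regression over a wider window) and assuming the 13 static values per frame are 12 MFCCs plus log energy. The layout here is 13 static, then 13 delta, then 13 delta-delta values, which reorders the slide's list but gives the same total.

```python
def deltas(frames):
    """Central difference over time for each feature; edge frames are repeated."""
    padded = [frames[0]] + frames + [frames[-1]]
    return [
        [(nxt - prv) / 2.0 for prv, nxt in zip(padded[i], padded[i + 2])]
        for i in range(len(frames))
    ]

def stack_features(static):
    """Concatenate static, delta, and delta-delta features for every frame."""
    d = deltas(static)
    dd = deltas(d)
    return [s + a + b for s, a, b in zip(static, d, dd)]

# Placeholder static features: 5 frames x 13 values (hypothetical numbers).
static = [[float(t * i) for i in range(13)] for t in range(5)]
features = stack_features(static)
print(len(features), len(features[0]))  # 5 frames, 39 values each
```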
![Page 61: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/61.jpg)
Emission Model
![Page 62: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/62.jpg)
HMMs for Continuous Observations
§ Before: discrete set of observations
§ Now: feature vectors are real-valued
§ Solution 1: discretization
§ Solution 2: continuous emissions
  § Gaussians
  § Multivariate Gaussians
  § Mixtures of multivariate Gaussians
§ A state is progressively
  § Context independent subphone (~3 per phone)
  § Context dependent phone (triphones)
  § State tying of CD phone
![Page 63: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/63.jpg)
Vector Quantization
§ Idea: discretization
  § Map MFCC vectors onto discrete symbols
  § Compute probabilities just by counting
§ This is called vector quantization or VQ
§ Not used for ASR any more
§ But: useful to consider as a starting point
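A toy sketch of vector quantization, assuming a fixed hand-written codebook and squared Euclidean distance (in practice the codebook would be learned, for example by clustering). All vectors below are invented 2-D stand-ins for MFCC vectors.

```python
from collections import Counter

def quantize(vector, codebook):
    """Return the index of the nearest codebook entry (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

# Hypothetical codebook and observations.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
observations = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3], [1.1, 0.8]]

# Map each real-valued vector onto a discrete symbol.
symbols = [quantize(v, codebook) for v in observations]

# Estimate emission probabilities just by counting symbols.
counts = Counter(symbols)
emission_probs = {sym: c / len(symbols) for sym, c in counts.items()}
print(symbols, emission_probs)
```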
![Page 64: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/64.jpg)
Gaussian Emissions
§ VQ is insufficient for top-quality ASR
  § Hard to cover high-dimensional space with codebook
  § Moves ambiguity from the model to the preprocessing
§ Instead: assume the possible values of the observation vectors are normally distributed.
  § Represent the observation likelihood function as a Gaussian?
From bartus.org/akustyk
![Page 65: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/65.jpg)
Gaussians for Acoustic Modeling
§ P(x):
[Figure: bell curve of P(x) against x; P(x) is highest here at mean; P(x) is low here, far from mean]
A Gaussian is parameterized by a mean and a variance:
P(x) = (1 / sqrt(2πσ²)) · exp(−(x − µ)² / (2σ²))
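A minimal sketch of the univariate Gaussian density, using the standard formula with mean µ and variance σ². The function name `gaussian_pdf` is hypothetical.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

# Highest at the mean, low far from the mean.
print(gaussian_pdf(0.0, 0.0, 1.0))
print(gaussian_pdf(3.0, 0.0, 1.0))
```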
![Page 66: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/66.jpg)
Multivariate Gaussians
§ Instead of a single mean µ and variance σ²:
§ Vector of means µ and covariance matrix Σ
§ Usually assume diagonal covariance (!)
  § This isn’t very true for FFT features, but is less bad for MFCC features
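A sketch of why the diagonal assumption is convenient: with a diagonal covariance the multivariate log density is just a sum of independent univariate terms, one per dimension, with no matrix inverse or determinant needed. The function name is hypothetical.

```python
import math

def diag_gaussian_logpdf(x, means, variances):
    """Log density of a Gaussian with diagonal covariance (one variance per dimension)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )

# Two-dimensional example with made-up parameters.
print(diag_gaussian_logpdf([0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))  # equals -log(2*pi)
```

The cost is D multiplications per frame and state instead of D² for a full covariance, at the price of ignoring correlations between features.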
![Page 67: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/67.jpg)
Gaussians: Size of Σ
§ µ = [0 0], Σ = I;  µ = [0 0], Σ = 0.6I;  µ = [0 0], Σ = 2I
§ As Σ becomes larger, Gaussian becomes more spread out; as Σ becomes smaller, Gaussian more compressed
Text and figures from Andrew Ng
![Page 68: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/68.jpg)
Gaussians: Shape of Σ
§ As we increase the off-diagonal entries, more correlation between value of x and value of y
Text and figures from Andrew Ng
![Page 69: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/69.jpg)
But we’re not there yet
§ Single Gaussians may do a bad job of modeling a complex distribution in any dimension
§ Even worse for diagonal covariances
§ Solution: mixtures of Gaussians
From openlearn.open.ac.uk
![Page 70: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/70.jpg)
Mixtures of Gaussians
§ Mixtures of Gaussians:
From robots.ox.ac.uk, http://www.itee.uq.edu.au/~comp4702
![Page 71: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/71.jpg)
GMMs
§ Summary: each state has an emission distribution P(x|s) (likelihood function) parameterized by:
  § M mixture weights
  § M mean vectors of dimensionality D
  § Either M covariance matrices of D×D or M D×1 diagonal variance vectors
§ Like soft vector quantization after all
  § Think of the mixture means as being learned codebook entries
  § Think of the Gaussian densities as a learned codebook distance function
  § Think of the mixture of Gaussians like a multinomial over codes
  § (Even more true given shared Gaussian inventories, cf next week)
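A sketch of the emission log-likelihood log P(x|s) for one state under a diagonal-covariance GMM, assuming the M weights, means, and variance vectors are already known (training them is not shown). The function names and the parameter values are illustrative only.

```python
import math

def diag_gaussian_logpdf(x, means, variances):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )

def gmm_loglik(x, weights, means, variances):
    """log sum_m w_m N(x; mu_m, diag(var_m)), computed stably with log-sum-exp."""
    comps = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))

# Hypothetical 2-component, 2-dimensional mixture.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# A point near either mean scores higher than a point between the two.
print(gmm_loglik([4.0, 0.0], weights, means, variances))
print(gmm_loglik([2.0, 0.0], weights, means, variances))
```

In the soft-VQ reading of the slide, the per-component terms in `comps` play the role of distances to codebook entries, and the weights act as a multinomial over codes.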
![Page 72: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/72.jpg)
State Model
![Page 73: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/73.jpg)
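State Transition Diagrams
§ Bayes Net: HMM as a Graphical Model
[Figure: chain of states w, w, w, each with an observation x]
§ State Transition Diagram: Markov Model as a Weighted FSA
[Figure: weighted FSA over the words: the, cat, dog, chased, has]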
![Page 74: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/74.jpg)
ASR Lexicon
Figure: J&M
![Page 75: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/75.jpg)
Lexical State Structure
Figure: J&M
![Page 76: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/76.jpg)
Adding an LM
Figure from Huang et al., page 618
![Page 77: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/77.jpg)
State Space
§ State space must include
§ Current word (|V| on order of 20K+)
§ Index within current word (|L| on order of 5)
§ E.g. (lec[t]ure) (though not in orthography!)
§ Acoustic probabilities only depend on phone type
§ E.g. P(x|lec[t]ure) = P(x|t)
§ From a state sequence, can read a word sequence
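As a rough sketch of this state space, a state can be held as a (word, index) pair. The toy lexicon and both function names below are hypothetical, chosen only to illustrate the two properties on the slide.

```python
# Hypothetical toy lexicon: word -> phone sequence (ARPAbet-style labels).
LEXICON = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
}

def phone_of(state):
    """Emissions depend only on the phone type, so map (word, index) -> phone."""
    word, index = state
    return LEXICON[word][index]

def words_from_states(states):
    """Read the word sequence off a state sequence.

    A new word is assumed to start whenever the index is 0 and the state
    differs from the previous one. Repeats of a one-phone word would need
    explicit word-boundary information, which this sketch does not model.
    """
    words = []
    previous = None
    for state in states:
        word, index = state
        if index == 0 and state != previous:
            words.append(word)
        previous = state
    return words
```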
![Page 78: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/78.jpg)
State Refinement
![Page 79: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/79.jpg)
Phones Aren’t Homogeneous
![Page 80: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/80.jpg)
Need to Use Subphones
Figure: J&M
![Page 81: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/81.jpg)
A Word with Subphones
Figure: J&M
![Page 82: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/82.jpg)
Modeling phonetic context
[Figure: panels labeled "w iy", "r iy", "m iy", "n iy"]
![Page 83: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/83.jpg)
“Need” with triphone models
Figure: J&M
![Page 84: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/84.jpg)
Lots of Triphones
§ Possible triphones: 50 x 50 x 50 = 125,000
§ How many triphone types actually occur?
§ 20K word WSJ Task (from Bryan Pellom)
§ Word internal models: need 14,300 triphones
§ Cross word models: need 54,400 triphones
§ Need to generalize models, tie triphones
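The gap between possible and observed triphones can be seen by counting contexts in a lexicon. The sketch below is illustrative only: the two-word lexicon is made up, and padding each word with a "#" boundary symbol is one assumed convention for word-internal contexts.

```python
NUM_PHONES = 50
POSSIBLE_TRIPHONES = NUM_PHONES ** 3  # 50 x 50 x 50

def word_internal_triphones(lexicon):
    """Collect the (left, phone, right) contexts that occur inside words."""
    seen = set()
    for phones in lexicon.values():
        padded = ["#"] + list(phones) + ["#"]  # "#" marks a word boundary
        for left, mid, right in zip(padded, padded[1:], padded[2:]):
            seen.add((left, mid, right))
    return seen

# Hypothetical toy lexicon for illustration.
toy_lexicon = {
    "need": ["n", "iy", "d"],
    "neat": ["n", "iy", "t"],
}
toy_triphones = word_internal_triphones(toy_lexicon)
```

Both toy words share the context (#, n, iy), so two three-phone words yield five distinct triphones rather than six. Cross-word models would also have to consider neighbouring words' phones in place of "#", which is why their count is larger.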
![Page 85: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/85.jpg)
State Tying / Clustering
§ [Young, Odell, Woodland 1994]
§ How do we decide which triphones to cluster together?
§ Use phonetic features (or ‘broad phonetic classes’)
§ Stop
§ Nasal
§ Fricative
§ Sibilant
§ Vowel
§ Lateral
Figure: J&M
![Page 86: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/86.jpg)
State Space
§ State space now includes
§ Current word: |W| is order 20K
§ Index in current word: |L| is order 5
§ Subphone position: 3
§ E.g. (lec[t-mid]ure)
§ Acoustic model depends on clustered phone context
§ But this doesn’t grow the state space
§ But, adding the LM context for trigram+ does
§ (after the, lec[t-mid]ure)
§ This is a real problem for decoding
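A back-of-the-envelope count shows why the LM context is the problem. The numbers are the orders of magnitude from the slide; treating every (n-1)-word history as a distinct state is an assumed worst case, and the function names are hypothetical.

```python
V = 20_000       # vocabulary size |W|, order of magnitude from the slide
L = 5            # phone positions per word |L|
SUBPHONES = 3    # subphone positions per phone (e.g. begin, mid, end)

def acoustic_state_count(v=V, l=L, sub=SUBPHONES):
    """States of the form (word, index in word, subphone position)."""
    return v * l * sub

def state_count_with_lm(n, v=V, l=L, sub=SUBPHONES):
    """Upper bound when each state also carries the n-1 previous words."""
    return acoustic_state_count(v, l, sub) * v ** (n - 1)
```

With these values the acoustic part is 300,000 states, while a trigram history (two previous words) multiplies that by 20,000 squared.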
![Page 87: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/87.jpg)
Decoding
![Page 88: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/88.jpg)
Inference Tasks
Most likely word sequence: d - ae - d
Most likely state sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
![Page 89: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/89.jpg)
Viterbi Decoding
Figure: Enrique Benimeli
![Page 90: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/90.jpg)
Viterbi Decoding
Figure: Enrique Benimeli
![Page 91: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/91.jpg)
Emission Caching
§ Problem: scoring all the P(x|s) values is too slow
§ Idea: many states share tied emission models, so cache them
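One way to picture the caching idea is a per-frame memo table keyed by the tied model rather than by the state. The class below is a hypothetical sketch; `score_fn` and `state_to_model` are assumed interfaces, not part of any particular toolkit.

```python
class EmissionCache:
    """Evaluate each tied emission model at most once per frame."""

    def __init__(self, score_fn, state_to_model):
        self.score_fn = score_fn              # (model_id, frame) -> log P(x|model)
        self.state_to_model = state_to_model  # state -> tied model id
        self.cache = {}
        self.evaluations = 0                  # counts real model evaluations

    def log_emission(self, t, frame, state):
        """Return log P(x_t | state), reusing the score of its tied model."""
        model_id = self.state_to_model[state]
        key = (t, model_id)
        if key not in self.cache:
            self.cache[key] = self.score_fn(model_id, frame)
            self.evaluations += 1
        return self.cache[key]
```

If thousands of states at time t map to a few hundred tied models, only those few hundred densities are actually computed for that frame.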
![Page 92: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/92.jpg)
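Prefix Trie Encodings
§ Problem: many partial-word states are indistinguishable
§ Solution: encode word production as a prefix trie (with pushed weights)
§ A specific instance of minimizing weighted FSAs [Mohri, 94]
Figure: Aubert, 02
[Figure: the words "n i d", "n i t", "n o t" as separate paths with weights 0.04, 0.02, 0.01, and as a prefix trie with arcs n, i, d, t, o, t and weights 0.04, 0.25, 0.5, 1, 1, 1]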
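The pushed weights can be reproduced with a small sketch. It assumes the pairing nid = 0.04, nit = 0.02, not = 0.01 (read off the figure labels, so treat it as a guess) and uses max-pushing: each arc carries the best word probability below it divided by the best one below its parent. The function name is hypothetical.

```python
def build_pushed_trie(word_probs):
    """Map each non-empty prefix to the weight of the arc entering it.

    Weights along any complete word multiply back to that word's probability.
    """
    best = {(): 1.0}  # best[prefix] = max probability of a word with that prefix
    for phones, prob in word_probs.items():
        for i in range(1, len(phones) + 1):
            prefix = tuple(phones[:i])
            best[prefix] = max(best.get(prefix, 0.0), prob)
    return {
        prefix: best[prefix] / best[prefix[:-1]]
        for prefix in best
        if prefix
    }

arcs = build_pushed_trie({
    ("n", "i", "d"): 0.04,
    ("n", "i", "t"): 0.02,
    ("n", "o", "t"): 0.01,
})
```

Under that assumed pairing the arc into "n" carries 0.04, the arcs "n o" and "n i t" carry 0.25 and 0.5, and the remaining arcs carry 1, so most of each word's cost is paid on its first arc, where pruning can use it early.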
![Page 93: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/93.jpg)
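Beam Search
§ Problem: trellis is too big to compute v(s) vectors
§ Idea: most states are terrible, keep v(s) only for top states at each time
§ Important: still dynamic programming; collapse equivalent states
[Figure: hypotheses "the b.", "the m.", "and then.", "at then." are extended to "the ba.", "the be.", "the bi.", "the ma.", "the me.", "the mi.", "then a.", "then e.", "then i."; the beam keeps "the ba.", "the be.", "the ma.", "then a."]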
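A minimal sketch of beam-pruned Viterbi follows, assuming log-domain scores, transitions stored as nested dictionaries, and a hypothetical `log_emit(state, observation)` callback. Keying the table by state is what keeps it dynamic programming: hypotheses that reach the same state are collapsed to the best one.

```python
def prune(table, beam_size):
    """Keep only the beam_size highest-scoring states."""
    ranked = sorted(table.items(), key=lambda kv: kv[1][0], reverse=True)
    return dict(ranked[:beam_size])

def beam_viterbi(observations, log_init, log_trans, log_emit, beam_size):
    """Return (log score, state path) of the best hypothesis that survives pruning."""
    # v maps state -> (best log score so far, path that achieves it)
    v = {
        s: (lp + log_emit(s, observations[0]), [s])
        for s, lp in log_init.items()
    }
    v = prune(v, beam_size)
    for x in observations[1:]:
        nxt = {}
        for prev, (score, path) in v.items():
            for s, lt in log_trans.get(prev, {}).items():
                cand = score + lt + log_emit(s, x)
                # Collapse equivalent hypotheses: one entry per state.
                if s not in nxt or cand > nxt[s][0]:
                    nxt[s] = (cand, path + [s])
        v = prune(nxt, beam_size)
    best_state = max(v, key=lambda s: v[s][0])
    return v[best_state]
```

With `beam_size` at least the number of states this reduces to exact Viterbi; smaller beams trade possible search errors for speed.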
![Page 94: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/94.jpg)
LM Factoring
§ Problem: Higher-order n-grams explode the state space
§ (One) Solution:
§ Factor state space into (word index, lm history)
§ Score unigram prefix costs while inside a word
§ Subtract unigram cost and add trigram cost once word is complete
[Figure: prefix trie with arcs n, i, d, t, o, t and weights 0.04, 0.25, 0.5, 1, 1, 1, entered from the LM history "the"]
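The bookkeeping in the last two bullets can be sketched as follows, working in log space. The function name is hypothetical, and the trigram probability used in the usage note is a made-up value for illustration.

```python
import math

def lm_scores_along_word(arc_log_weights, log_unigram_word, log_trigram_word):
    """Running LM log score after each trie arc, then the corrected final score.

    Inside the word only the pushed unigram arc weights are charged. When the
    word completes, the unigram cost is removed and the trigram cost is added.
    """
    running = []
    total = 0.0
    for w in arc_log_weights:
        total += w
        running.append(total)
    final = total - log_unigram_word + log_trigram_word
    return running, final
```

For example, with pushed arc weights 0.04, 1, 0.5 (a word of unigram probability 0.02) and an assumed trigram probability of 0.1, the running score ends at log 0.02 and the corrected final score is log 0.1.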
![Page 95: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/95.jpg)
LM Reweighting
§ Noisy channel suggests
§ In practice, want to boost LM
§ Also, good to have a “word bonus” to offset LM costs
§ These are both consequences of broken independence assumptions in the model
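The formulas for these bullets did not survive in the transcript. A common way to write them is given below as an assumed reconstruction, not the slide's own notation: x is the acoustic observation sequence, w the word sequence, |w| its length in words, and the symbols alpha (LM weight) and beta (word bonus) are names introduced here.

```latex
% Noisy channel decoding rule
w^* = \arg\max_w \; P(x \mid w)\, P(w)

% Boosting the LM with an exponent \alpha > 1
w^* = \arg\max_w \; P(x \mid w)\, P(w)^{\alpha}

% Adding a per-word bonus \beta, written in log space
w^* = \arg\max_w \; \log P(x \mid w) + \alpha \log P(w) + \beta\, |w|
```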
![Page 96: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/96.jpg)
![Page 97: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/97.jpg)
Training
![Page 98: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/98.jpg)
Training Mixture Models
§ Input: wav files with unaligned transcriptions
§ Forced alignment
§ Computing the “Viterbi path” over the training data (where the transcription is known) is called “forced alignment”
§ We know which word string to assign to each observation sequence.
§ We just don’t know the state sequence.
§ So we constrain the path to go through the correct words (by using a special example-specific language model)
§ And otherwise run the Viterbi algorithm
§ Result: aligned state sequence
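A simplified sketch of forced alignment: instead of building an example-specific language model, it assumes the transcription has already been expanded into a left-to-right state sequence, and each frame either stays in the current state or advances to the next. The function name, the `log_emit` callback, and the default transition scores are all hypothetical.

```python
def forced_align(frames, state_seq, log_emit, log_stay=0.0, log_next=0.0):
    """Viterbi path constrained to visit state_seq in order; one label per frame."""
    T, N = len(frames), len(state_seq)
    if T < N:
        raise ValueError("need at least one frame per state")
    NEG = float("-inf")
    v = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    v[0][0] = log_emit(state_seq[0], frames[0])  # must start in the first state
    for t in range(1, T):
        for j in range(N):
            stay = v[t - 1][j] + log_stay
            move = v[t - 1][j - 1] + log_next if j > 0 else NEG
            if move > stay:
                best, back[t][j] = move, j - 1
            else:
                best, back[t][j] = stay, j
            if best > NEG:
                v[t][j] = best + log_emit(state_seq[j], frames[t])
    # Trace back from the final state at the final frame.
    j = N - 1
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return [state_seq[j] for j in path]
```

The returned per-frame labels are the aligned state sequence, which can then be used to collect the frames assigned to each state when re-estimating its mixture.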
![Page 99: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/99.jpg)
State Tying
§ Creating CD phones:
§ Start with monophone, do EM training
§ Clone Gaussians into triphones
§ Build decision tree and cluster Gaussians
§ Clone and train mixtures (GMMs)
§ General idea:
§ Introduce complexity gradually
§ Interleave constraint with flexibility
![Page 100: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/100.jpg)
Standard subphone/mixture HMM
Temporal Structure
Gaussian Mixtures
Model          Error rate
HMM Baseline   25.1%
![Page 101: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/101.jpg)
An Induced Model
Standard Model
Single Gaussians
Fully Connected
[Petrov, Pauls, and Klein, 07]
![Page 102: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/102.jpg)
Hierarchical Split Training with EM
[Figure: error rates labeled 32.1%, 28.7%, 25.6%, 23.9%]
Model            Error rate
HMM Baseline     25.1%
5 Split rounds   21.4%
![Page 103: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/103.jpg)
Refinement of the /ih/-phone
![Page 104: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/104.jpg)
Refinement of the /ih/-phone
![Page 105: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/105.jpg)
Refinement of the /ih/-phone
![Page 106: Algorithms for NLPtbergkir/11711fa17/FA17 11-711... · 2017. 9. 14. · §Peaks = voicing: .46 to .58 (vowel [iy], from second .65 to .74 (vowel [ax]) and so on §Silence of stop](https://reader035.fdocuments.net/reader035/viewer/2022062416/61031e72cf23ee660e7b5d04/html5/thumbnails/106.jpg)
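HMM states per phone
[Figure: bar chart of HMM states per phone, y-axis from 0 to 35; phones on the x-axis: ae, ao, ay, eh, er, ey, ih, f, r, s, sil, aa, ah, ix, iy, z, cl, k, sh, n, vcl, ow, l, m, t, v, uw, aw, ax, ch, w, th, el, dh, uh, p, en, oy, hh, jh, ng, y, b, d, dx, g, zh, epi]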