Statistical machine translation
(cs.toronto.edu, frank/csc401/lectures2017/6-1_SMT_2.pdf)
CSC401/2511 – Spring 2017
Statistical machine translation
• Machine translation seemed to be an intractable problem by the late 1940s, until a change in perspective…

“When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”
Warren Weaver, March 1947

Claude Shannon, July 1948:
[Figure: a transmitter sends a source signal through a noisy channel to a receiver; the source is characterized by P(s) and the channel by P(o|s).]
The noisy channel
• Source P(E): the language model.
• Channel P(F|E): the translation model.
• The decoder receives the observed F′ and recovers E′:
E* = argmax_E P(F|E) P(E)
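The decision rule above can be sketched in a few lines of Python. The `lm` and `tm` tables below are made-up toy scores standing in for a trained language model P(E) and translation model P(F|E); only the argmax logic reflects the slide.

```python
# Hypothetical toy model scores (illustrative only): a language model P(E)
# and a translation model P(F|E) over a tiny candidate set.
lm = {"the house": 0.6, "house the": 0.1, "the home": 0.3}          # P(E)
tm = {("la maison", "the house"): 0.5,                              # P(F|E)
      ("la maison", "house the"): 0.5,
      ("la maison", "the home"): 0.2}

def decode(f, candidates):
    """Pick E* = argmax_E P(F|E) P(E), the noisy-channel decision rule."""
    return max(candidates, key=lambda e: tm.get((f, e), 0.0) * lm[e])

print(decode("la maison", list(lm)))  # "the house": 0.5*0.6 beats the rest
```

Note how the language model breaks the tie between "the house" and "house the", which the translation model scores equally.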
How to train P(F|E)?
• Solution: collect statistics on vast parallel texts, e.g., the Canadian Hansards: bilingual Parliamentary proceedings.

English: “…citizen of Canada has the right to vote in an election of members of the House of Commons or of a legislative assembly and to be qualified for membership…”

French: “…citoyen canadien a le droit de vote et est éligible aux élections législatives fédérales ou provinciales…”
Sentence alignment
• Sentences can also be unaligned across translations.
• E.g., “He was happy.” (E1) “He had bacon.” (E2) → “Il était heureux parce qu’il avait du bacon.” (F1)

[Figure: two sentence-alignment patterns over sentences E1…E7 and F1…F7; in the second, some beads are 2:1 rather than strictly 1:1.]
Sentence alignment
• We often need to align sentences before we can align words.
• Two broad methods of sentence alignment are:
  • methods based on sentence length, or
  • methods based on lexical matches, or “cognates”.
• What about phrase and word alignments?
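The length-based idea can be sketched as a small dynamic program. This is only an illustration of the approach behind Gale & Church-style alignment: the cost here is a toy absolute difference of lengths, not their actual statistical cost, and only 1:1, 2:1, and 1:2 beads are allowed.

```python
# A minimal sketch of length-based sentence alignment.  All scoring here
# is illustrative; src_lens and tgt_lens are sentence lengths (in words).
def align_by_length(src_lens, tgt_lens):
    INF = float("inf")
    n, m = len(src_lens), len(tgt_lens)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in ((1, 1), (2, 1), (1, 2)):     # allowed bead types
                if i + di <= n and j + dj <= m:
                    # Toy cost: mismatch in total length of the bead.
                    c = abs(sum(src_lens[i:i+di]) - sum(tgt_lens[j:j+dj]))
                    if cost[i][j] + c < cost[i+di][j+dj]:
                        cost[i+di][j+dj] = cost[i][j] + c
                        back[i+di][j+dj] = (i, j)
    # Trace back the best bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    return beads[::-1]

# Two short source sentences merged into one long target sentence (2:1):
print(align_by_length([3, 3], [8]))  # [((0, 2), (0, 1))]
```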
Word alignment
• Word alignments can be 1:1, N:1, 1:N, 0:1, 1:0, …
  • E.g., a “zero fertility” word is not translated (1:0);
  • “spurious” words are generated from ‘nothing’ (0:1);
  • one word may be translated as several words (1:N).
• Note that this is only one possible alignment.
Intuition of statistical MT
• The words ‘the’ and ‘maison’ co-occur frequently, but not as frequently as ‘the’ and ‘la’.
• P(la|the) should be higher than P(fleur|the), P(bleue|the), and even P(maison|the).
• Note: we’re considering all possible word alignments…
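The co-occurrence intuition can be illustrated by counting over a handful of made-up sentence pairs (these are not Hansard data): ‘maison’ co-occurs with ‘the’, but ‘la’ co-occurs with it more often.

```python
from collections import Counter

# Toy parallel corpus, invented for illustration.
pairs = [("the house", "la maison"),
         ("the blue house", "la maison bleue"),
         ("the car", "la voiture")]

# Count every (French word, English word) pair occurring in the same
# sentence pair, over all possible word alignments.
cooc = Counter()
for e_sent, f_sent in pairs:
    for e in e_sent.split():
        for f in f_sent.split():
            cooc[(f, e)] += 1

print(cooc[("la", "the")], cooc[("maison", "the")])  # 3 2
```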
IBM Model 1
IBM Model 1: the NULL word
• The NULL word is an imaginary word that we need in order to account for the production of spurious words.
[Figure: the “NULL” word prepended to the English sentence.]
IBM Model 1: some definitions
• English sentence E has l_E words, e_1 … e_{l_E}, plus the NULL word, e_0.
• French sentence F has l_F words, f_1 … f_{l_F}.
[Figure: e_0 (NULL) and e_1 … e_6 above, f_1 … f_9 below.]
IBM Model 1: alignments
• An alignment, a, identifies the English word that ‘produced’ the given French word at each index.
• a = {a_1, …, a_{l_F}}, where a_j ∈ {0, …, l_E}.
• E.g., a = {0, 3, 0, 1, 4, 5, 6, 6, 6}
[Figure: f_1 aligned to e_0 (a_1 = 0), …, f_9 aligned to e_6 (a_9 = 6).]
IBM Model 1: alignments
• There are (l_E + 1)^{l_F} possible alignments (since |a| = l_F).
• IBM-1 doesn’t know that some are very bad in reality.
• E.g., a = {3, 3, 3, 3, 3, 3, 3, 3, 3}
[Figure: every French word f_1 … f_9 aligned to the same English word e_3.]
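The (l_E + 1)^{l_F} count is easy to verify by enumeration on small toy lengths (assumed here: l_E = 2 and l_F = 3, since enumerating all 7^9 alignments for the sentence above would work the same way, only slowly).

```python
from itertools import product

# Each French position j independently picks a producer a_j in {0..l_E},
# so the alignments are exactly the (l_E + 1)-ary strings of length l_F.
l_E, l_F = 2, 3
alignments = list(product(range(l_E + 1), repeat=l_F))
print(len(alignments), (l_E + 1) ** l_F)  # 27 27
```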
IBM Model 1: alignments
• IBM Model 1 assumes that all alignments are equally likely, given only the lengths (not the words) of the sentences:
∀a, P(a | E, l_F) = 1 / (l_E + 1)^{l_F}
(uniform over all possible alignments)
• This is a major simplifying assumption, but it gets the process started.
Equally likely alignments a priori
[Figure: two different alignments of e_0 … e_6 with f_1 … f_9, with P(first alignment) = P(second alignment).]
IBM Model 1: translation probability
• Given an alignment a and an English sentence E, what is the probability of a French sentence F?
P(F | a, E)
• In IBM-1 (another simplifying assumption):
P(F | a, E) = ∏_{j=1}^{l_F} P(f_j | e_{a_j})
i.e., the probability of the j-th French word, given that it was generated from the a_j-th English word.
IBM Model 1: translation probability
• E = Canada ’s program has been implemented
• a = {0, 3, 0, 1, 4, 5, 6, 6, 6}
• F = Le programme du Canada a été mis en application
• P(F | a, E) = P(Le | NULL) P(programme | program)
             × P(du | NULL) P(Canada | Canada) P(a | has)
             × P(été | been) P(mis | implemented)
             × P(en | implemented)
             × P(application | implemented)
IBM Model 1: generation
• To generate a French sentence F from English E:
  1. Pick a length for F (with probability P(l_F)).
  2. Pick an alignment (with uniform probability 1/(l_E + 1)^{l_F}).
  3. Sample French words with probability P(F | a, E) = ∏_{j=1}^{l_F} P(f_j | e_{a_j}).
• So,
P(F, a | E) = P(a | E) P(F | a, E) = [P(l_F) / (l_E + 1)^{l_F}] ∏_{j=1}^{l_F} P(f_j | e_{a_j})
• This is how we imagine English gets corrupted in the noisy channel.
IBM-1: alignment as hidden variable
• If P(F, a | E) describes the process of generating French words and alignments from English words…
• …then P(F | E) = Σ_{a ∈ A} P(F, a | E), where A is the set of all possible alignments.
• Remember, the noisy channel model states that French words are really encoded English words!
IBM-1: training
• Our training data D is a set of pairs of corresponding French and English sentences, D = {(F_i, E_i)}, i = 0 … N.
• If we knew the word alignments, a, learning P(f|e) would be trivial with MLE:
P(f | e) = Count(f, e) / Count(e)
where Count(f, e) is the number of times f and e are aligned.
• But the alignments are hidden. We need to use…
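The MLE in the fully observed case is a one-liner over counts. The aligned (f, e) pairs below are invented for illustration.

```python
from collections import Counter

# Hypothetical observed alignments: each item is one aligned (f, e) pair.
aligned = [("la", "the"), ("maison", "house"),
           ("la", "the"), ("bleue", "blue")]

# P(f|e) = Count(f, e) / Count(e), exactly as on the slide.
pair_count = Counter(aligned)
e_count = Counter(e for _, e in aligned)
p = {(f, e): pair_count[(f, e)] / e_count[e] for f, e in pair_count}

print(p[("la", "the")])  # 2/2 = 1.0
```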
IBM-1: expectation-maximization
1. Initialize the translation parameters P(f|e) (e.g., randomly).
2. Expectation: given the current θ = P(f|e), compute the expected value of Count(f, e) for all words in the training data D.
3. Maximization: given the expected value of Count(f, e), compute the maximum likelihood estimate of θ = P(f|e).
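These three steps can be sketched as a compact runnable loop. This uses the standard IBM-1 formulation in which each French word independently picks its English producer, so expected counts come from a per-word normalization rather than enumerating whole alignments. On the toy data of the example that follows (with no NULL word) it reproduces the same θ_1 as the 1:1-restricted walkthrough, though the two diverge on later iterations.

```python
from collections import defaultdict

def ibm1_em(pairs, iterations=1):
    """pairs: list of (E, F) token lists.  Returns t[(f, e)] = P(f|e)."""
    f_vocab = {f for _, F in pairs for f in F}
    t = defaultdict(lambda: 1.0 / len(f_vocab))        # uniform initialization
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for E, F in pairs:                             # E-step: expected counts
            for f in F:
                denom = sum(t[(f, e)] for e in E)      # normalize over this E
                for e in E:
                    c = t[(f, e)] / denom              # expected alignment mass
                    count[(f, e)] += c
                    total[e] += c
        # M-step: MLE (pairs never seen aligned drop to probability 0)
        t = defaultdict(float,
                        {fe: c / total[fe[1]] for fe, c in count.items()})
    return t

# The lecture's toy data, without a NULL word:
pairs = [(["blue", "house"], ["maison", "bleue"]),
         (["the", "house"], ["la", "maison"])]
t1 = ibm1_em(pairs, iterations=1)
print(t1[("maison", "house")], t1[("bleue", "house")])  # 0.5 0.25
```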
IBM-1 EM: Example
• Imagine our training data is
D = { (blue house, maison bleue), (the house, la maison) }.
• The vocabularies are V_E = {blue, house, the} and V_F = {maison, bleue, la}.
• For simplicity, we consider only 1:1 alignments: there is no NULL word, and there are no zero-fertility words.
IBM-1 EM: Example
• First, we initialize our parameters, θ = P(f|e).
• In the Expectation step, we compute expected counts:
  • TCount(f, e): the total number of times e and f are aligned;
  • Total(e): the total (expected) count of e.
  • This has to be done in steps, by first computing P(F|a, E), then P(a|F, E).
• In the Maximization step, we perform MLE with the expected counts.
IBM-1 EM: Example initialization
1. Make a table of P(f|e) for all possible pairs f and e. Initialize uniformly across rows.
θ_0:
P(maison|blue) = 1/3    P(bleue|blue) = 1/3    P(la|blue) = 1/3
P(maison|house) = 1/3   P(bleue|house) = 1/3   P(la|house) = 1/3
P(maison|the) = 1/3     P(bleue|the) = 1/3     P(la|the) = 1/3
IBM-1 E: compute P(F|a, E)
2. Make a grid where each sentence pair is a row, and each possible word-alignment is a column.
IBM-1 E: compute P(F|a, E)
3. For each sentence pair and alignment, compute P(F|a, E) = ∏_j P(f_j | e_{a_j}):

‘Sentence’ 1, alignment 1: P(F|a, E) = P(maison|blue) × P(bleue|house) = 1/3 × 1/3 = 1/9
‘Sentence’ 1, alignment 2: P(F|a, E) = P(bleue|blue) × P(maison|house) = 1/3 × 1/3 = 1/9
‘Sentence’ 2, alignment 1: P(F|a, E) = P(la|the) × P(maison|house) = 1/3 × 1/3 = 1/9
‘Sentence’ 2, alignment 2: P(F|a, E) = P(maison|the) × P(la|house) = 1/3 × 1/3 = 1/9
IBM-1 E: compute P(a|E, F)
• We want the probability of an alignment a, so that we can compute the expected Count(f_j, e_i).
P(a | E, F) = P(F, a | E) / Σ_{a ∈ A} P(F, a | E)  (*)
            = P(F | a, E) / Σ_{a ∈ A} P(F | a, E)  (**)
• This is not the same as the probability P(a | E, l_F); i.e., it won’t always be uniform.
(*) Because P(a | E, F) = P(a, E, F) / [P(E) P(F|E)], P(a, E, F) = P(F, a | E) P(E), and P(F|E) = Σ_{a ∈ A} P(F, a | E).
(**) Rewrite P(F, a | E) as on the generation slide, and P(l_F) / (l_E + 1)^{l_F} cancels out.
IBM-1 E: compute P(a|E, F)
4. For each element in your grid, divide P(F|a, E) by the sum of its row:

‘Sentence’ 1, alignment 1: P(a|E, F) = (1/9) / (1/9 + 1/9) = 1/2
‘Sentence’ 1, alignment 2: P(a|E, F) = (1/9) / (1/9 + 1/9) = 1/2
‘Sentence’ 2, alignment 1: P(a|E, F) = (1/9) / (1/9 + 1/9) = 1/2
‘Sentence’ 2, alignment 2: P(a|E, F) = (1/9) / (1/9 + 1/9) = 1/2
IBM-1 E: compute TCount
5. For each possible word pair e and f, sum P(a|E, F) from step 4 across all alignments and sentence pairs where e is aligned with f:
• maison and blue are aligned only in alignment 1 of sentence 1, so TCount(maison, blue) = P(a=1 | F_1, E_1) = 1/2.
• maison and house are aligned in alignment 2 of sentence 1 and alignment 1 of sentence 2, so TCount(maison, house) = P(a=2 | F_1, E_1) + P(a=1 | F_2, E_2) = 1/2 + 1/2 = 1.
• TCount(bleue, blue) = ?   TCount(la, blue) = ?   TCount(bleue, house) = ?   TCount(la, house) = ?   TCount(maison, the) = ?   TCount(bleue, the) = ?   TCount(la, the) = ?
• Note: this is a new table, not the θ = P(f|e) table from before!
IBM-1 E: compute Total
E.g., Total(blue) = 1/2 + 1/2 = 1; Total(house) = 1 + 1/2 + 1/2 = 2; …

TCount(maison, blue) = 1/2            TCount(bleue, blue) = 1/2    TCount(la, blue) = 0
TCount(maison, house) = 1/2 + 1/2 = 1 TCount(bleue, house) = 1/2   TCount(la, house) = 1/2
TCount(maison, the) = 1/2             TCount(bleue, the) = 0       TCount(la, the) = 1/2

6. Sum over the rows of this table to get the total estimates for each English word, e.
IBM-1 M: recompute P(f|e)
7. Compute P(f|e) = TCount(f, e) / Total(e). This is your model after iteration 1.
θ_1:
P(maison|blue) = (1/2)/1 = 1/2   P(bleue|blue) = (1/2)/1 = 1/2    P(la|blue) = 0/1 = 0
P(maison|house) = 1/2            P(bleue|house) = (1/2)/2 = 1/4   P(la|house) = (1/2)/2 = 1/4
P(maison|the) = (1/2)/1 = 1/2    P(bleue|the) = 0/1 = 0           P(la|the) = (1/2)/1 = 1/2
• Note the ‘correct’ and ‘incorrect’ changes in probability.
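The iterations of this walkthrough can be reproduced exactly in code, with alignments restricted to 1:1 permutations (no NULL word, no fertility) and exact fractions, as in the example.

```python
from itertools import permutations
from math import prod
from fractions import Fraction
from collections import defaultdict

# The lecture's toy data.
pairs = [(["blue", "house"], ["maison", "bleue"]),
         (["the", "house"], ["la", "maison"])]
f_vocab = {f for _, F in pairs for f in F}
t = defaultdict(lambda: Fraction(1, len(f_vocab)))    # theta_0: uniform

for _ in range(2):                                    # two EM iterations
    count, total = defaultdict(Fraction), defaultdict(Fraction)
    for E, F in pairs:
        # P(F|a,E) for every 1:1 alignment a (a permutation of E's indices).
        aligns = list(permutations(range(len(E))))
        probs = [prod(t[(f, E[a[j]])] for j, f in enumerate(F))
                 for a in aligns]
        norm = sum(probs)
        for a, p in zip(aligns, probs):               # P(a|E,F) = p / norm
            for j, f in enumerate(F):
                count[(f, E[a[j]])] += p / norm
                total[E[a[j]]] += p / norm
    t = defaultdict(Fraction,                         # M-step: MLE
                    {fe: c / total[fe[1]] for fe, c in count.items()})

print(t[("maison", "house")], t[("la", "the")])  # 2/3 2/3
```

After two iterations this matches θ_2 entry for entry, including the broken tie between P(maison|blue) and P(bleue|blue).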
IBM-1 EM: Repeat
• You have finished one iteration of EM when you have completed step 7.
• Go back to step 2 and repeat.
IBM-1 E: compute P(F|a, E)
2: make the grid; 3: compute products of P(f|e):

‘Sentence’ 1, alignment 1: P(F|a, E) = P(maison|blue) × P(bleue|house) = 1/2 × 1/4 = 1/8
‘Sentence’ 1, alignment 2: P(F|a, E) = P(bleue|blue) × P(maison|house) = 1/2 × 1/2 = 1/4
‘Sentence’ 2, alignment 1: P(F|a, E) = P(la|the) × P(maison|house) = 1/2 × 1/2 = 1/4
‘Sentence’ 2, alignment 2: P(F|a, E) = P(maison|the) × P(la|house) = 1/2 × 1/4 = 1/8
IBM-1 E: compute P(a|E, F)
4: divide by the sum of the rows in step 3:

‘Sentence’ 1, alignment 1: P(a|E, F) = (1/8) / (1/8 + 1/4) = 1/3
‘Sentence’ 1, alignment 2: P(a|E, F) = (1/4) / (1/8 + 1/4) = 2/3
‘Sentence’ 2, alignment 1: P(a|E, F) = (1/4) / (1/4 + 1/8) = 2/3
‘Sentence’ 2, alignment 2: P(a|E, F) = (1/8) / (1/4 + 1/8) = 1/3
IBM-1 E: compute TCount & Total
5. Compute TCount by summing the relevant probabilities from step 4;
6. Compute Total by summing rows:

TCount(maison, blue) = 1/3              TCount(bleue, blue) = 2/3    TCount(la, blue) = 0
TCount(maison, house) = 2/3 + 2/3 = 4/3 TCount(bleue, house) = 1/3   TCount(la, house) = 1/3
TCount(maison, the) = 1/3               TCount(bleue, the) = 0       TCount(la, the) = 2/3

Total(blue) = 1/3 + 2/3 = 1; Total(house) = 4/3 + 1/3 + 1/3 = 2; Total(the) = 1/3 + 2/3 = 1
IBM-1 M: recompute P(f|e)
• Compute P(f|e) = TCount(f, e) / Total(e).
θ_2:
P(maison|blue) = (1/3)/1 = 1/3    P(bleue|blue) = (2/3)/1 = 2/3    P(la|blue) = 0/1 = 0
P(maison|house) = (4/3)/2 = 2/3   P(bleue|house) = (1/3)/2 = 1/6   P(la|house) = (1/3)/2 = 1/6
P(maison|the) = (1/3)/1 = 1/3     P(bleue|the) = 0/1 = 0           P(la|the) = (2/3)/1 = 2/3
• Ties have been broken; e.g., P(maison|blue) ≠ P(bleue|blue).
Assignment 2
• Build n-gram language models, with add-δ smoothing.
• Learn word-level alignments with the IBM-1 model using data from the Canadian Hansard.
• Combine the language and alignment models into a simple French-to-English translator.
• There are some bonus marks available for substantially going beyond the minimal requirements.
Assignment 2 – languages
• Sentences have already been split and aligned for you.
• Words have not been aligned.
• You don’t need to know French for this assignment.
• French is more ‘rigid’ than English, so its use of contractions, e.g., is more regular.
• You have to do some pre-processing of French sentences, but those rules are given to you explicitly.
Assignment 2 – practical
• Posted 13 February. Due 10 March.
• Will be programmed in Matlab.
• Various support functions for this assignment will be available on CDF.
• Marks will be given more for understanding the algorithms and concepts than for specific results.