Statistical machine translation - cs.toronto.edufrank/csc401/lectures2017/6-1_SMT_2.pdf · canadien...


CSC401/2511 – Spring 2017
Slide 2

Statistical machine translation
• Machine translation seemed to be an intractable problem by the late 1940s, until a change in perspective…

"When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"
Warren Weaver, March 1947

Claude Shannon, July 1948
[Diagram: Transmitter $P(X)$ → $X$ → Noisy channel → $Y$ → Receiver $P(Y \mid X)$]

Slide 3
The noisy channel
[Diagram: Source $P(E)$ (language model) → $E'$ → Channel $P(F \mid E)$ (translation model) → $F'$ (observed $F$) → Decoder → $E^*$]

$$E^* = \operatorname*{argmax}_{E} P(F \mid E)\, P(E)$$
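The decoding rule $E^* = \operatorname{argmax}_E P(F \mid E) P(E)$ can be sketched as a search over a fixed list of candidate English sentences. This is only an illustration: the function name `decode`, the candidate list, and the toy scores in `LM` and `TM` are assumptions made up for this example, not values from the lecture or from any corpus.

```python
import math

def decode(french, candidates, lm_logprob, tm_logprob):
    """Return the candidate E that maximises log P(F | E) + log P(E)."""
    return max(candidates, key=lambda e: tm_logprob(french, e) + lm_logprob(e))

# Hypothetical toy scores (assumptions for illustration only).
LM = {"the house": math.log(0.6), "house the": math.log(0.1)}      # log P(E)
TM = {("la maison", "the house"): math.log(0.5),                   # log P(F | E)
      ("la maison", "house the"): math.log(0.5)}

best = decode("la maison", list(LM),
              lm_logprob=lambda e: LM[e],
              tm_logprob=lambda f, e: TM[(f, e)])
print(best)  # the house
```

Here the translation model cannot tell the two candidates apart, so the language model decides. A real decoder would search a much larger space than a hand-written list.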

Slide 4
How to train $P(F \mid E)$?
• Solution: collect statistics on vast parallel texts, e.g., the Canadian Hansards: bilingual Parliamentary proceedings.

English: "… citizen of Canada has the right to vote in an election of members of the House of Commons or of a legislative assembly and to be qualified for membership …"
French: "… citoyen canadien a le droit de vote et est éligible aux élections législatives fédérales ou provinciales …"

Slide 5
Sentence alignment
• Sentences can also be unaligned across translations.
• E.g., "He was happy." ($E_1$) "He had bacon." ($E_2$) → "Il était heureux parce qu'il avait du bacon." ($F_1$)
[Figure: two alignment diagrams over English sentences $E_1 \dots E_7$ and French sentences $F_1 \dots F_7$.]

Slide 6
Sentence alignment
• We often need to align sentences before we can align words.
• Two broad methods of sentence alignment are:
  • methods based on sentence length,
  • methods based on lexical matches, or "cognates".
• What about phrase and word alignments?

Slide 7
Word alignment
• Word alignments can be 1:1, N:1, 1:N, 0:1, 1:0, … E.g.,
  • "zero fertility" word: not translated (1:0),
  • "spurious" words: generated from 'nothing' (0:1),
  • one word translated as several words (1:N).
• Note that this is only one possible alignment.

Slide 8
Intuition of statistical MT
• The words 'the' and 'maison' co-occur frequently, but not as frequently as 'the' and 'la'.
• $P(\text{la} \mid \text{the})$ should be higher than $P(\text{fleur} \mid \text{the})$, $P(\text{bleue} \mid \text{the})$, and even $P(\text{maison} \mid \text{the})$.
• Note: we're considering all possible word alignments…

Slide 9
IBM Model 1

Slide 10
IBM Model 1: the NULL word
• The NULL word is an imaginary word that we need to account for the production of spurious words.
[Figure: alignment diagram with the "NULL" word labelled.]

Slide 11
IBM Model 1: some definitions
• English sentence $E$ has $L_E$ words, $e_1 \dots e_{L_E}$, plus the NULL word, $e_0$.
• French sentence $F$ has $L_F$ words, $f_1 \dots f_{L_F}$.
[Figure: English words $e_0$ (NULL), $e_1 \dots e_6$ and French words $f_1 \dots f_9$.]

Slide 12
IBM Model 1: alignments
• An alignment, $a$, identifies the English word that 'produced' the given French word at each index.
• $a = \{a_1, \dots, a_{L_F}\}$ where $a_j \in \{0, \dots, L_E\}$
• E.g., $a = \{0, 3, 0, 1, 4, 5, 6, 6, 6\}$
[Figure: English words $e_0, e_1 \dots e_6$ linked to French words $f_1 \dots f_9$, with $a_1 = 0$ and $a_9 = 6$ marked.]

Slide 13
IBM Model 1: alignments
• There are $(L_E + 1)^{L_F}$ possible alignments (since $|a| = L_F$).
• IBM-1 doesn't know that some are very bad in reality.
• E.g., $a = \{3, 3, 3, 3, 3, 3, 3, 3, 3\}$
[Figure: every French word $f_1 \dots f_9$ linked to the single English word $e_3$.]
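To make the count $(L_E + 1)^{L_F}$ concrete, the sketch below enumerates every alignment for a tiny case and compares the number found with the formula. The helper names `all_alignments` and `count_alignments` are hypothetical, chosen only for this illustration.

```python
from itertools import product

def all_alignments(len_e, len_f):
    """Yield every alignment (a_1..a_LF) with a_j in {0..L_E}; 0 is the NULL word."""
    return product(range(len_e + 1), repeat=len_f)

def count_alignments(len_e, len_f):
    """Closed form from the slide: (L_E + 1) ** L_F."""
    return (len_e + 1) ** len_f

small = list(all_alignments(2, 3))        # L_E = 2, L_F = 3
print(len(small), count_alignments(2, 3)) # 27 27
print(count_alignments(6, 9))             # 40353607 for the 6-word / 9-word example
```

Even for the short example sentence pair there are about 40 million alignments, which is why enumeration is only practical for toy inputs.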

Slide 14
IBM Model 1: alignments
• IBM Model 1 assumes that all alignments of $E$ are equally likely given only the length (not the words) of $F$:
  $$\forall a,\quad P(a \mid E, L_F) = \frac{1}{(L_E + 1)^{L_F}}$$
  i.e., uniform over all possible alignments.
• This is a major simplifying assumption, but it gets the process started.

Slide 15
Equally likely alignments a priori
[Figure: two different alignments of $e_0, e_1 \dots e_6$ to $f_1 \dots f_9$, shown as having equal probability, $P(\cdot) = P(\cdot)$.]

Slide 16
IBM Model 1: translation probability
• Given an alignment $a$ and an English sentence $E$, what is the probability of a French sentence $F$, i.e., $P(F \mid a, E)$?
• In IBM-1,
  $$P(F \mid a, E) = \prod_{j=1}^{L_F} P(f_j \mid e_{a_j})$$
  (another simplifying assumption), where $P(f_j \mid e_{a_j})$ is the probability of the $j^{th}$ French word, given that it was generated from the $a_j^{th}$ English word.

Slide 17
IBM Model 1: translation probability
• $E$ = Canada's program has been implemented
• $a = \{0, 3, 0, 1, 4, 5, 6, 6, 6\}$
• $F$ = Le programme du Canada à été mis en application
• $P(F \mid a, E) = P(\text{Le} \mid \emptyset)\, P(\text{programme} \mid \text{program}) \times P(\text{du} \mid \emptyset)\, P(\text{Canada} \mid \text{Canada})\, P(\text{à} \mid \text{has}) \times P(\text{été} \mid \text{been})\, P(\text{mis} \mid \text{implemented}) \times P(\text{en} \mid \text{implemented}) \times P(\text{application} \mid \text{implemented})$
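The product on this slide can be sketched in a few lines. The tokenisation of "Canada's" into `Canada` and `'s` is an assumption (it is what makes the alignment indices 1 to 6 line up), and the table `t` holds a made-up value of 0.5 for each aligned pair purely for illustration; real values would come from training.

```python
from math import prod

def p_f_given_a_e(french, english, alignment, t):
    """P(F | a, E) = product over j of t[(f_j, e_{a_j})]; english[0] is the NULL word."""
    return prod(t[(f, english[a])] for f, a in zip(french, alignment))

english = ["NULL", "Canada", "'s", "program", "has", "been", "implemented"]
french = ["Le", "programme", "du", "Canada", "à", "été", "mis", "en", "application"]
alignment = [0, 3, 0, 1, 4, 5, 6, 6, 6]

# Hypothetical translation table: 0.5 for every pair used by this alignment.
t = {(f, english[a]): 0.5 for f, a in zip(french, alignment)}

p = p_f_given_a_e(french, english, alignment, t)
print(p)  # 0.001953125, i.e. 0.5 ** 9
```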

Slide 18
IBM Model 1: generation
• To generate a French sentence $F$ from English $E$:
  1. Pick a length of $F$ (with probability $P(L_F)$).
  2. Pick an alignment (with uniform probability, $\frac{1}{(L_E + 1)^{L_F}}$; slide 14).
  3. Sample French words with probability $P(F \mid a, E) = \prod_{j=1}^{L_F} P(f_j \mid e_{a_j})$ (slide 16).
• So,
  $$P(F, a \mid E) = P(a \mid E)\, P(F \mid a, E) = \frac{P(L_F)}{(L_E + 1)^{L_F}} \prod_{j=1}^{L_F} P(f_j \mid e_{a_j})$$
• This is how we imagine English gets corrupted in the noisy channel.

Slide 19
IBM-1: alignment as hidden variable
• If $P(F, a \mid E)$ describes the process of generating French words and alignments from English words…
• Then $P(F \mid E) = \sum_{a \in \mathcal{A}} P(F, a \mid E)$, where $\mathcal{A}$ is the set of all possible alignments.
• Remember, the noisy channel model states that French words are really encoded English words!
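As a rough illustration of treating the alignment as a hidden variable, the sketch below computes the joint $P(F, a \mid E)$ from slide 18 and sums it over every alignment by brute force. The function names, the uniform table `t`, and the choice `p_len=1` for $P(L_F)$ are all assumptions for this toy example; exact fractions are used so the result is easy to check by hand.

```python
from fractions import Fraction
from itertools import product

def joint(french, english, alignment, t, p_len):
    """P(F, a | E) = P(L_F) / (L_E + 1)^L_F * prod_j t[(f_j, e_{a_j})]."""
    l_e, l_f = len(english) - 1, len(french)      # english[0] is the NULL word
    p = Fraction(p_len) / (l_e + 1) ** l_f
    for f, a in zip(french, alignment):
        p *= t[(f, english[a])]
    return p

def marginal(french, english, t, p_len):
    """P(F | E): sum the joint over every alignment in {0..L_E}^L_F."""
    return sum(joint(french, english, a, t, p_len)
               for a in product(range(len(english)), repeat=len(french)))

english = ["NULL", "the", "house"]
french = ["la", "maison"]
# Hypothetical uniform table over a 3-word French vocabulary.
t = {(f, e): Fraction(1, 3) for f in ["la", "maison", "bleue"] for e in english}

print(marginal(french, english, t, p_len=1))  # 1/9
```

With a uniform table, each of the 9 alignments contributes $\frac{1}{9} \cdot \frac{1}{9}$, so the sum is $\frac{1}{9}$.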

Slide 22
IBM-1: training
• Our training data $\mathcal{O}$ is a set of pairs of corresponding French and English sentences, $\mathcal{O} = \{(F_i, E_i)\}$, $i = 0..N$.
• If we knew the word alignments, $a$, learning $P(f \mid e)$ would be trivial with MLE: $P(f \mid e) = \frac{Count(f, e)}{Count(e)}$, where $Count(f, e)$ is the number of times $f$ and $e$ are aligned.
• But the alignments are hidden. We need to use…
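If the alignments were observed, the MLE on this slide would reduce to counting. The sketch below assumes a hypothetical hand-aligned corpus in which each alignment entry is a 0-based index into the English sentence (no NULL word), and it takes $Count(e)$ to be the number of alignment links that involve $e$.

```python
from collections import Counter
from fractions import Fraction

def mle_from_alignments(corpus):
    """P(f | e) = Count(f, e) / Count(e), counted over observed alignment links."""
    pair_count, e_count = Counter(), Counter()
    for french, english, alignment in corpus:
        for f, a in zip(french, alignment):
            pair_count[(f, english[a])] += 1
            e_count[english[a]] += 1
    return {(f, e): Fraction(c, e_count[e]) for (f, e), c in pair_count.items()}

# Hypothetical hand-aligned data: alignment[j] is the English position for french[j].
corpus = [
    (["maison", "bleue"], ["blue", "house"], [1, 0]),
    (["la", "maison"], ["the", "house"], [0, 1]),
]
t = mle_from_alignments(corpus)
print(t[("maison", "house")])  # 1
```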

Slide 23
IBM-1: expectation-maximization
1. Initialize translation parameters $P(f \mid e)$ (e.g., randomly).
2. Expectation: Given the current $\theta_k = P(f \mid e)$, compute the expected value of $Count(f, e)$ for all words in training data $\mathcal{O}$.
3. Maximization: Given the expected value of $Count(f, e)$, compute the maximum likelihood estimate of $\theta_k = P(f \mid e)$.

Slide 24
IBM-1 EM: Example
• Imagine our training data is $\mathcal{O} = \{(\text{blue house}, \text{maison bleue}),\ (\text{the house}, \text{la maison})\}$.
• The vocabularies are $V_E = \{\text{blue}, \text{house}, \text{the}\}$ and $V_F = \{\text{maison}, \text{bleue}, \text{la}\}$.
• For simplicity, we consider only 1:1 alignments: there is no NULL word, and there are no zero-fertility words.

Slide 25
IBM-1 EM: Example
• First, we initialize our parameters, $\theta = P(f \mid e)$.
• In the Expectation step, we compute expected counts:
  • $TCount(f, e)$: the total number of times $e$ and $f$ are aligned.
  • $Total(e)$: the total number of $e$.
  This has to be done in steps, by first computing $P(F \mid a, E)$, then $P(a \mid F, E)$.
• In the Maximization step, we perform MLE with the expected counts.

Slide 26
IBM-1 EM: Example initialization
1. Make a table of $P(f \mid e)$ for all possible pairs $f$ and $e$. Initialize uniformly across rows.

$\theta_0$:
• $P(\text{maison} \mid \text{blue}) = \frac{1}{3}$, $P(\text{bleue} \mid \text{blue}) = \frac{1}{3}$, $P(\text{la} \mid \text{blue}) = \frac{1}{3}$
• $P(\text{maison} \mid \text{house}) = \frac{1}{3}$, $P(\text{bleue} \mid \text{house}) = \frac{1}{3}$, $P(\text{la} \mid \text{house}) = \frac{1}{3}$
• $P(\text{maison} \mid \text{the}) = \frac{1}{3}$, $P(\text{bleue} \mid \text{the}) = \frac{1}{3}$, $P(\text{la} \mid \text{the}) = \frac{1}{3}$
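Step 1 can be sketched as building a dictionary keyed by $(f, e)$ pairs. The function name `init_uniform` and the corpus layout (a list of French/English token-list pairs) are assumptions for this illustration; exact fractions make the $\frac{1}{3}$ entries visible.

```python
from fractions import Fraction

def init_uniform(corpus):
    """Set t[(f, e)] = 1 / |V_F| for every French word f and English word e."""
    v_f = {f for french, _ in corpus for f in french}
    v_e = {e for _, english in corpus for e in english}
    return {(f, e): Fraction(1, len(v_f)) for f in v_f for e in v_e}

corpus = [(["maison", "bleue"], ["blue", "house"]),
          (["la", "maison"], ["the", "house"])]
theta0 = init_uniform(corpus)
print(theta0[("maison", "blue")])  # 1/3
```

Each row (fixed $e$) sums to 1, which is what makes it a valid distribution over French words.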

Slide 27
IBM-1 E: compute $P(F \mid a, E)$
2. Make a grid where each sentence pair is a row, and each possible word-alignment is a column.
[Figure: grid with rows 'Sentence' 1 and 'Sentence' 2, and columns Alignment 1 and Alignment 2.]

Slide 28
IBM-1 E: compute $P(F \mid a, E)$
3. For each sentence pair and alignment, compute (slide 16) $P(F \mid a, E) = \prod_j P(f_j \mid e_{a_j})$.

'Sentence' 1:
• $a = a_{1,1}$: $P(F \mid a, E) = P(\text{maison} \mid \text{blue}) \times P(\text{bleue} \mid \text{house}) = \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}$
• $a = a_{1,2}$: $P(F \mid a, E) = P(\text{bleue} \mid \text{blue}) \times P(\text{maison} \mid \text{house}) = \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}$

'Sentence' 2:
• $a = a_{2,1}$: $P(F \mid a, E) = P(\text{la} \mid \text{the}) \times P(\text{maison} \mid \text{house}) = \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}$
• $a = a_{2,2}$: $P(F \mid a, E) = P(\text{maison} \mid \text{the}) \times P(\text{la} \mid \text{house}) = \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9}$
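Steps 2 and 3 can be sketched by treating each 1:1 alignment as a permutation of English positions. This is a minimal sketch under the example's restriction (no NULL word, equal sentence lengths); the function name `likelihoods` and the tuple encoding of an alignment are assumptions for illustration.

```python
from fractions import Fraction
from itertools import permutations

def likelihoods(french, english, t):
    """Map each 1:1 alignment (a permutation of English positions) to P(F | a, E)."""
    out = {}
    for a in permutations(range(len(english))):
        p = Fraction(1)
        for f, i in zip(french, a):      # french[j] is generated by english[a[j]]
            p *= t[(f, english[i])]
        out[a] = p
    return out

vocab_f, vocab_e = ["maison", "bleue", "la"], ["blue", "house", "the"]
theta0 = {(f, e): Fraction(1, 3) for f in vocab_f for e in vocab_e}

# One row per sentence pair, one entry per alignment.
grid = [likelihoods(["maison", "bleue"], ["blue", "house"], theta0),
        likelihoods(["la", "maison"], ["the", "house"], theta0)]
print(grid[0][(0, 1)])  # 1/9
```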

Slide 29
IBM-1 E: compute $P(a \mid E, F)$
• We want the probability of an alignment $a$ so that we can compute the expected $Count(f_j, e_i)$:
  $$P(a \mid E, F) \overset{(*)}{=} \frac{P(F, a \mid E)}{\sum_{a' \in \mathcal{A}} P(F, a' \mid E)} \overset{(**)}{=} \frac{P(F \mid a, E)}{\sum_{a' \in \mathcal{A}} P(F \mid a', E)}$$
• This is not the same as the probability $P(a \mid E, L_F)$, i.e., it won't always be uniform.
• (*) Because $P(a \mid E, F) = \frac{P(a, E, F)}{P(E)\, P(F \mid E)}$, $P(a, E, F) = P(F, a \mid E)\, P(E)$, and $P(F \mid E) = \sum_{a \in \mathcal{A}} P(F, a \mid E)$ (slide 19).
• (**) Rewrite $P(F, a \mid E)$ as on slide 18, and $\frac{P(L_F)}{(L_E + 1)^{L_F}}$ cancels out.

Slide 30
IBM-1 E: compute $P(a \mid E, F)$
4. For each element in your grid, divide $P(F \mid a, E)$ by the sum of the row (slide 29).

'Sentence' 1:
• Alignment 1: $P(a \mid E, F) = \frac{1/9}{1/9 + 1/9} = \frac{1}{2}$
• Alignment 2: $P(a \mid E, F) = \frac{1/9}{1/9 + 1/9} = \frac{1}{2}$

'Sentence' 2:
• Alignment 1: $P(a \mid E, F) = \frac{1/9}{1/9 + 1/9} = \frac{1}{2}$
• Alignment 2: $P(a \mid E, F) = \frac{1/9}{1/9 + 1/9} = \frac{1}{2}$
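Step 4 is a row-wise normalisation. The sketch below assumes each grid row is a dictionary from an alignment label to $P(F \mid a, E)$; the labels and the function name `posteriors` are hypothetical. The second row uses the values $\frac{1}{8}$ and $\frac{1}{4}$ that appear for 'Sentence' 1 in the second pass (slide 35) to show a non-uniform result.

```python
from fractions import Fraction

def posteriors(row):
    """P(a | E, F) for one sentence pair: each P(F | a, E) divided by the row sum."""
    total = sum(row.values())
    return {a: p / total for a, p in row.items()}

row_iter1 = {"alignment 1": Fraction(1, 9), "alignment 2": Fraction(1, 9)}
row_iter2 = {"alignment 1": Fraction(1, 8), "alignment 2": Fraction(1, 4)}

print(posteriors(row_iter1))  # both alignments get 1/2
print(posteriors(row_iter2))  # 1/3 and 2/3
```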

Slide 31
IBM-1 E: compute TCount
5. For each possible word pair $e$ and $f$, sum $P(a \mid E, F)$ from step 4 across all alignments and sentence pairs where $e$ is aligned with $f$.
• This is a new table, not the $\theta = P(f \mid e)$ table from before!
• maison and blue are aligned only in alignment 1, sentence 1, where $P(a = 1 \mid F_1, E_1) = \frac{1}{2}$, so $TCount(\text{maison}, \text{blue}) = \frac{1}{2}$.
• maison and house are aligned in alignment 2, sentence 1 and in alignment 1, sentence 2, where $P(a = 2 \mid F_1, E_1) = \frac{1}{2}$ and $P(a = 1 \mid F_2, E_2) = \frac{1}{2}$, so $TCount(\text{maison}, \text{house}) = \frac{1}{2} + \frac{1}{2} = 1$.
• Still to be filled in: $TCount(\text{bleue}, \text{blue}) = ?$, $TCount(\text{la}, \text{blue}) = ?$, $TCount(\text{bleue}, \text{house}) = ?$, $TCount(\text{la}, \text{house}) = ?$, $TCount(\text{maison}, \text{the}) = ?$, $TCount(\text{bleue}, \text{the}) = ?$, $TCount(\text{la}, \text{the}) = ?$

Slide 32
IBM-1 E: compute Total
6. Sum over the rows of this table to get the total estimates for each English word, $e$.
• E.g., $Total(\text{blue}) = \frac{1}{2} + \frac{1}{2} = 1$, $Total(\text{house}) = 1 + \frac{1}{2} + \frac{1}{2} = 2$, …

TCount:
• $TCount(\text{maison}, \text{blue}) = \frac{1}{2}$, $TCount(\text{bleue}, \text{blue}) = \frac{1}{2}$, $TCount(\text{la}, \text{blue}) = 0$
• $TCount(\text{maison}, \text{house}) = \frac{1}{2} + \frac{1}{2} = 1$, $TCount(\text{bleue}, \text{house}) = \frac{1}{2}$, $TCount(\text{la}, \text{house}) = \frac{1}{2}$
• $TCount(\text{maison}, \text{the}) = \frac{1}{2}$, $TCount(\text{bleue}, \text{the}) = 0$, $TCount(\text{la}, \text{the}) = \frac{1}{2}$
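Steps 5 and 6 amount to accumulating posterior mass per word pair and per English word. In the sketch below, the input format (a list of posterior and link-list pairs) and the function name `expected_counts` are assumptions for illustration; the posteriors are the $\frac{1}{2}$ values from step 4.

```python
from collections import defaultdict
from fractions import Fraction

def expected_counts(aligned):
    """aligned: list of (posterior, [(f, e), ...]), one entry per sentence pair and alignment."""
    tcount, total = defaultdict(Fraction), defaultdict(Fraction)
    for post, links in aligned:
        for f, e in links:
            tcount[(f, e)] += post   # step 5: TCount(f, e)
            total[e] += post         # step 6: Total(e)
    return tcount, total

half = Fraction(1, 2)
aligned = [
    (half, [("maison", "blue"), ("bleue", "house")]),   # sentence 1, alignment 1
    (half, [("bleue", "blue"), ("maison", "house")]),   # sentence 1, alignment 2
    (half, [("la", "the"), ("maison", "house")]),       # sentence 2, alignment 1
    (half, [("maison", "the"), ("la", "house")]),       # sentence 2, alignment 2
]
tcount, total = expected_counts(aligned)
print(tcount[("maison", "house")], total["house"])  # 1 2
```

Pairs that are never aligned, such as (la, blue), are simply absent from `tcount`, which corresponds to the zeros in the table.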

Slide 33
IBM-1 M: Recompute $P(f \mid e)$
7. Compute $P(f \mid e) = \frac{TCount(f, e)}{Total(e)}$. This is your model after iteration 1.

$\theta_1$:
• $P(\text{maison} \mid \text{blue}) = \frac{1/2}{1} = \frac{1}{2}$, $P(\text{bleue} \mid \text{blue}) = \frac{1/2}{1} = \frac{1}{2}$, $P(\text{la} \mid \text{blue}) = \frac{0}{1} = 0$
• $P(\text{maison} \mid \text{house}) = \frac{1}{2}$, $P(\text{bleue} \mid \text{house}) = \frac{1/2}{2} = \frac{1}{4}$, $P(\text{la} \mid \text{house}) = \frac{1/2}{2} = \frac{1}{4}$
• $P(\text{maison} \mid \text{the}) = \frac{1/2}{1} = \frac{1}{2}$, $P(\text{bleue} \mid \text{the}) = \frac{0}{1} = 0$, $P(\text{la} \mid \text{the}) = \frac{1/2}{1} = \frac{1}{2}$

Note the 'correct' and 'incorrect' changes in probability.
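Step 7 is a per-row division. The sketch below plugs in the TCount and Total values from the first pass; the function name `m_step` and the dictionary layout are assumptions for illustration.

```python
from fractions import Fraction

def m_step(tcount, total):
    """P(f | e) = TCount(f, e) / Total(e) for every pair in the TCount table."""
    return {(f, e): c / total[e] for (f, e), c in tcount.items()}

half, zero = Fraction(1, 2), Fraction(0)
tcount = {("maison", "blue"): half, ("bleue", "blue"): half, ("la", "blue"): zero,
          ("maison", "house"): Fraction(1), ("bleue", "house"): half, ("la", "house"): half,
          ("maison", "the"): half, ("bleue", "the"): zero, ("la", "the"): half}
total = {"blue": Fraction(1), "house": Fraction(2), "the": Fraction(1)}

theta1 = m_step(tcount, total)
print(theta1[("bleue", "house")])  # 1/4
```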

Slide 34
IBM-1 EM: Repeat
• You have finished 1 iteration of EM when you have completed Step 7.
• Go back to Step 2 and repeat.
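Putting steps 2 to 7 together, one EM iteration for the toy example can be sketched as below. It keeps the example's simplification of 1:1 alignments only (permutations, no NULL word) and enumerates alignments by brute force, so it is meant for tiny inputs only. The function name `em_iteration` and the corpus layout are assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

def em_iteration(corpus, t):
    """One pass of steps 2-7 over (french, english) pairs; returns the new P(f | e) table."""
    tcount, total = defaultdict(Fraction), defaultdict(Fraction)
    for french, english in corpus:
        # Steps 2-3: P(F | a, E) for every 1:1 alignment of this sentence pair.
        scores = {}
        for a in permutations(range(len(english))):
            p = Fraction(1)
            for f, i in zip(french, a):
                p *= t[(f, english[i])]
            scores[a] = p
        z = sum(scores.values())
        # Steps 4-6: add the posterior p / z of each alignment to TCount and Total.
        for a, p in scores.items():
            for f, i in zip(french, a):
                tcount[(f, english[i])] += p / z
                total[english[i]] += p / z
    # Step 7: re-estimate P(f | e).
    return {(f, e): tcount[(f, e)] / total[e] for (f, e) in t}

corpus = [(["maison", "bleue"], ["blue", "house"]),
          (["la", "maison"], ["the", "house"])]
vocab_f, vocab_e = ["maison", "bleue", "la"], ["blue", "house", "the"]
theta0 = {(f, e): Fraction(1, 3) for f in vocab_f for e in vocab_e}

theta1 = em_iteration(corpus, theta0)
theta2 = em_iteration(corpus, theta1)
print(theta1[("maison", "house")], theta2[("maison", "house")])  # 1/2 2/3
```

Running it twice from the uniform table gives the same values as the hand-computed tables for iterations 1 and 2.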

Slide 35
IBM-1 E: compute $P(F \mid a, E)$
2: make grid
3: compute products of $P(f \mid e)$

'Sentence' 1:
• Alignment 1: $P(F \mid a, E) = P(\text{maison} \mid \text{blue}) \times P(\text{bleue} \mid \text{house}) = \frac{1}{2} \cdot \frac{1}{4} = \frac{1}{8}$
• Alignment 2: $P(F \mid a, E) = P(\text{bleue} \mid \text{blue}) \times P(\text{maison} \mid \text{house}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

'Sentence' 2:
• Alignment 1: $P(F \mid a, E) = P(\text{la} \mid \text{the}) \times P(\text{maison} \mid \text{house}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$
• Alignment 2: $P(F \mid a, E) = P(\text{maison} \mid \text{the}) \times P(\text{la} \mid \text{house}) = \frac{1}{2} \cdot \frac{1}{4} = \frac{1}{8}$

Slide 36
IBM-1 E: compute $P(a \mid E, F)$
4: divide by sum of rows in step 3

'Sentence' 1:
• Alignment 1: $P(a \mid E, F) = \frac{1/8}{1/8 + 1/4} = \frac{1}{3}$
• Alignment 2: $P(a \mid E, F) = \frac{1/4}{1/8 + 1/4} = \frac{2}{3}$

'Sentence' 2:
• Alignment 1: $P(a \mid E, F) = \frac{1/4}{1/4 + 1/8} = \frac{2}{3}$
• Alignment 2: $P(a \mid E, F) = \frac{1/8}{1/4 + 1/8} = \frac{1}{3}$

Slide 37
IBM-1 E: compute TCount & Total
5. Compute TCount by summing relevant probabilities from step 4.
6. Compute Total by summing rows.

TCount:
• $TCount(\text{maison}, \text{blue}) = \frac{1}{3}$, $TCount(\text{bleue}, \text{blue}) = \frac{2}{3}$, $TCount(\text{la}, \text{blue}) = 0$
• $TCount(\text{maison}, \text{house}) = \frac{2}{3} + \frac{2}{3} = \frac{4}{3}$, $TCount(\text{bleue}, \text{house}) = \frac{1}{3}$, $TCount(\text{la}, \text{house}) = \frac{1}{3}$
• $TCount(\text{maison}, \text{the}) = \frac{1}{3}$, $TCount(\text{bleue}, \text{the}) = 0$, $TCount(\text{la}, \text{the}) = \frac{2}{3}$

Total:
• $Total(\text{blue}) = \frac{1}{3} + \frac{2}{3} = 1$, $Total(\text{house}) = \frac{4}{3} + \frac{1}{3} + \frac{1}{3} = 2$, $Total(\text{the}) = \frac{1}{3} + \frac{2}{3} = 1$

Slide 38
IBM-1 M: Recompute $P(f \mid e)$
• Compute $P(f \mid e) = \frac{TCount(f, e)}{Total(e)}$

$\theta_2$:
• $P(\text{maison} \mid \text{blue}) = \frac{1/3}{1} = \frac{1}{3}$, $P(\text{bleue} \mid \text{blue}) = \frac{2/3}{1} = \frac{2}{3}$, $P(\text{la} \mid \text{blue}) = \frac{0}{1} = 0$
• $P(\text{maison} \mid \text{house}) = \frac{4/3}{2} = \frac{2}{3}$, $P(\text{bleue} \mid \text{house}) = \frac{1/3}{2} = \frac{1}{6}$, $P(\text{la} \mid \text{house}) = \frac{1/3}{2} = \frac{1}{6}$
• $P(\text{maison} \mid \text{the}) = \frac{1/3}{1} = \frac{1}{3}$, $P(\text{bleue} \mid \text{the}) = \frac{0}{1} = 0$, $P(\text{la} \mid \text{the}) = \frac{2/3}{1} = \frac{2}{3}$

Ties have been broken, e.g., $P(\text{maison} \mid \text{blue}) \neq P(\text{bleue} \mid \text{blue})$.

Slide 39
Assignment 2
• Build n-gram language models, with add-$\delta$ smoothing.
• Learn word-level alignments with the IBM-1 model using data from the Canadian Hansard.
• Combine the language and alignment models into a simple French-to-English translator.
• There are some bonus marks available for substantially going beyond the minimal requirements.

Slide 40
Assignment 2 – languages
• Sentences have already been split and aligned for you.
• Words have not been aligned.
• You don't need to know French for this assignment.
• French is more 'rigid' than English, so its use of contractions, e.g., is more regular.
• You have to do some pre-processing of French sentences, but those rules are given to you explicitly.

Slide 41
Assignment 2 – practical
• Posted 13 February. Due 10 March.
• Will be programmed in Matlab.
• Various support functions for this assignment will be available on CDF.
• Marks will be given more for understanding the algorithms and concepts than for specific results.