Phrase-Based MT: Decoding - Carnegie Mellon...

61
Phrase-Based MT: Decoding February 7, 2013 Tuesday, February 19, 13

Transcript of Phrase-Based MT: Decoding - Carnegie Mellon...

Page 1: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Phrase-Based MT: Decoding

February 7, 2013

Tuesday, February 19, 13

Page 2: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Phrase Based MT

• Recipe

• Segmentation / Alignment model

• Phrase model

• Language Model

e⇤ = argmax

ep(e | f)

= argmax

ep(f | e)⇥ p(e)

⇡ argmax

ep(f,a | e)⇥ p(e)

Tuesday, February 19, 13

Page 3: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Phrase Tables

das Thema

the issue 0.41

das Themathe point 0.72

das Themathe subject 0.47

das Thema

the thema 0.99

es gibtthere is 0.96

es gibtthere are 0.72

morgen tomorrow 0.9

fliege ichwill I fly 0.63

fliege ich will fly 0.17fliege ich

I will fly 0.13

f e p(f | e)

Tuesday, February 19, 13

Page 4: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Reordering Model

Tuesday, February 19, 13

Page 5: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Translation Process

• Task: translate this sentence from German into English

er geht ja nicht nach hause

Chapter 6: Decoding 2

Tuesday, February 19, 13

Page 6: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Translation Process

• Task: translate this sentence from German into English

er geht ja nicht nach hause

er

he

• Pick phrase in input, translate

Chapter 6: Decoding 3

Tuesday, February 19, 13

Page 7: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Translation Process

• Task: translate this sentence from German into English

er geht ja nicht nach hause

er ja nicht

he does not

• Pick phrase in input, translate

– it is allowed to pick words out of sequence reordering– phrases may have multiple words: many-to-many translation

Chapter 6: Decoding 4

Tuesday, February 19, 13

Page 8: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Translation Process

• Task: translate this sentence from German into English

er geht ja nicht nach hause

er geht ja nicht

he does not go

• Pick phrase in input, translate

Chapter 6: Decoding 5

Tuesday, February 19, 13

Page 9: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Translation Process

• Task: translate this sentence from German into English

er geht ja nicht nach hause

er geht ja nicht nach hause

he does not go home

• Pick phrase in input, translate

Chapter 6: Decoding 6

Tuesday, February 19, 13

Page 10: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Computing Translation Probability

• Probabilistic model for phrase-based translation:

ebest = argmaxe

IY

i=1

�(

¯fi|ei) d(starti � endi�1 � 1) plm(e)

• Score is computed incrementally for each partial hypothesis

• Components

Phrase translation Picking phrase ¯fi to be translated as a phrase ei

! look up score �(

¯fi|ei) from phrase translation tableReordering Previous phrase ended in endi�1, current phrase starts at starti! compute d(starti � endi�1 � 1)

Language model For n-gram model, need to keep track of last n� 1 words! compute score plm(wi|wi�(n�1), ..., wi�1) for added words wi

Chapter 6: Decoding 7

Tuesday, February 19, 13

Page 11: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Translation Options

he

er geht ja nicht nach hause

it, it

, he

isare

goesgo

yesis

, of course

notdo not

does notis not

afterto

according toin

househome

chamberat home

notis not

does notdo not

homeunder housereturn home

do not

it ishe will be

it goeshe goes

isare

is after alldoes

tofollowingnot after

not to

,

notis not

are notis not a

• Many translation options to choose from

– in Europarl phrase table: 2727 matching phrase pairs for this sentence– by pruning to the top 20 per phrase, 202 translation options remain

Chapter 6: Decoding 8

Tuesday, February 19, 13

Page 12: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Translation Options

he

er geht ja nicht nach hause

it, it

, he

isare

goesgo

yesis

, of course

notdo not

does notis not

afterto

according toin

househome

chamberat home

notis not

does notdo not

homeunder housereturn home

do not

it ishe will be

it goeshe goes

isare

is after alldoes

tofollowingnot after

not tonot

is notare notis not a

• The machine translation decoder does not know the right answer– picking the right translation options– arranging them in the right order

! Search problem solved by heuristic beam search

Chapter 6: Decoding 9

Tuesday, February 19, 13

Page 13: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding: Precompute Translation Options

er geht ja nicht nach hause

consult phrase translation table for all input phrases

Chapter 6: Decoding 10

Tuesday, February 19, 13

Page 14: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding: Start with Initial Hypothesis

er geht ja nicht nach hause

initial hypothesis: no input words covered, no output produced

Chapter 6: Decoding 11

Tuesday, February 19, 13

Page 15: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding: Hypothesis Expansion

er geht ja nicht nach hause

are

pick any translation option, create new hypothesis

Chapter 6: Decoding 12

Tuesday, February 19, 13

Page 16: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding: Hypothesis Expansion

er geht ja nicht nach hause

are

it

he

create hypotheses for all other translation options

Chapter 6: Decoding 13

Tuesday, February 19, 13

Page 17: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding: Hypothesis Expansion

er geht ja nicht nach hause

are

it

hegoes

does not

yes

go

to

home

home

also create hypotheses from created partial hypothesis

Chapter 6: Decoding 14

Tuesday, February 19, 13

Page 18: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding: Find Best Path

er geht ja nicht nach hause

are

it

hegoes

does not

yes

go

to

home

home

backtrack from highest scoring complete hypothesis

Chapter 6: Decoding 15

Tuesday, February 19, 13

Page 19: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Complexity• This is an NP-complete problem

• Reduction to TSP (sketch)

• Each source word is a city

• A bigram LM encodes the distance between pairs of cities

• Knight (1999) has careful proof

• How do we solve such problems?

• Dynamic programming [risk free]

• The state is the current city C & the set of previous visited cities

• Doesn’t matter the order the previous list was visited in as long as we keep the best path to C through

• How many states are there?

• Approximate search [risky]

Tuesday, February 19, 13

Page 20: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Recombination

• Two hypothesis paths lead to two matching hypotheses

– same number of foreign words translated– same English words in the output– di↵erent scores

it is

it is

• Worse hypothesis is dropped

it is

Chapter 6: Decoding 17

Tuesday, February 19, 13

Page 21: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Recombination

• Two hypothesis paths lead to hypotheses indistinguishable in subsequent search

– same number of foreign words translated– same last two English words in output (assuming trigram language model)– same last foreign word translated– di↵erent scores

it

he

does not

does not

• Worse hypothesis is dropped

it

he does not

Chapter 6: Decoding 18

Tuesday, February 19, 13

Page 22: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Restrictions on Recombination

• Translation model: Phrase translation independent from each other

! no restriction to hypothesis recombination

• Language model: Last n�1 words used as history in n-gram language model

! recombined hypotheses must match in their last n� 1 words

• Reordering model: Distance-based reordering model based on distance toend position of previous input phrase

! recombined hypotheses must have that same end position

• Other feature function may introduce additional restrictions

Chapter 6: Decoding 19

Tuesday, February 19, 13

Page 23: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Pruning

• Recombination reduces search space, but not enough

(we still have a NP complete problem on our hands)

• Pruning: remove bad hypotheses early

– put comparable hypothesis into stacks(hypotheses that have translated same number of input words)

– limit number of hypotheses in each stack

Chapter 6: Decoding 20

Tuesday, February 19, 13

Page 24: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Stacks

are

it

he

goes does not

yes

no wordtranslated

one wordtranslated

two wordstranslated

three wordstranslated

• Hypothesis expansion in a stack decoder

– translation option is applied to hypothesis– new hypothesis is dropped into a stack further down

Chapter 6: Decoding 21

Tuesday, February 19, 13

Page 25: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Stack Decoding Algorithm

1: place empty hypothesis into stack 02: for all stacks 0...n� 1 do3: for all hypotheses in stack do4: for all translation options do5: if applicable then6: create new hypothesis7: place in stack8: recombine with existing hypothesis if possible9: prune stack if too big

10: end if11: end for12: end for13: end for

Chapter 6: Decoding 22

Tuesday, February 19, 13

Page 26: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

Tuesday, February 19, 13

Page 27: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

Mary

Maria no dio una bofetada a la bruja verdef:Q[0] Q[1] Q[2] ...

Tuesday, February 19, 13

Page 28: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

Maria no dio una bofetada a la bruja verdef:Q[0] Q[1] Q[2] ...

Tuesday, February 19, 13

Page 29: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

: did not : **------- : 0.3

ecp

Mary did not

Maria no dio una bofetada a la bruja verdef:Q[0] Q[1] Q[2] ...

Tuesday, February 19, 13

Page 30: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

: did not : **------- : 0.3

ecp

Mary did not

did not

Maria no dio una bofetada a la bruja verdef:Q[0] Q[1] Q[2] ...

Tuesday, February 19, 13

Page 31: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

: did not : **------- : 0.45

ecp

Mary did not

did not

Maria no dio una bofetada a la bruja verdef:

did not

Q[0] Q[1] Q[2] ...

Tuesday, February 19, 13

Page 32: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

: did not : **------- : 0.45

ecp

Mary did not

did not

did not

: Mary not : **------- : 0.1

ecp

not

: not slap : *****---- : 0.316

ecpslap

Maria no dio una bofetada a la bruja verdef:Q[0] Q[1] Q[2] ...

Tuesday, February 19, 13

Page 33: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Pruning

• Pruning strategies

– histogram pruning: keep at most k hypotheses in each stack– stack pruning: keep hypothesis with score ↵ ⇥ best score (↵ < 1)

• Computational time complexity of decoding with histogram pruning

O(max stack size⇥ translation options⇥ sentence length)

• Number of translation options is linear with sentence length, hence:

O(max stack size⇥ sentence length2)

• Quadratic complexity

Chapter 6: Decoding 23

Tuesday, February 19, 13

Page 34: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Reordering Limits

• Limiting reordering to maximum reordering distance

• Typical reordering distance 5–8 words

– depending on language pair– larger reordering limit hurts translation quality

• Reduces complexity to linear

O(max stack size⇥ sentence length)

• Speed / quality trade-o↵ by setting maximum stack size

Chapter 6: Decoding 24

Tuesday, February 19, 13

Page 35: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Translating the Easy Part First?

the tourism initiative addresses this for the first time

the

dietm:-0.19,lm:-0.4,

d:0, all:-0.65

tourism

touristischetm:-1.16,lm:-2.93

d:0, all:-4.09

the first time

das erste maltm:-0.56,lm:-2.81

d:-0.74. all:-4.11

initiative

initiativetm:-1.21,lm:-4.67

d:0, all:-5.88

both hypotheses translate 3 wordsworse hypothesis has better score

Chapter 6: Decoding 25

Tuesday, February 19, 13

Page 36: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Estimating Future Cost

• Future cost estimate: how expensive is translation of rest of sentence?

• Optimistic: choose cheapest translation options

• Cost for each translation option

– translation model: cost known

– language model: output words known, but not context! estimate without context

– reordering model: unknown, ignored for future cost estimation

Chapter 6: Decoding 26

Tuesday, February 19, 13

Page 37: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Cost Estimates from Translation Options

the tourism initiative addresses this for the first time

-1.0 -2.0 -1.5 -2.4 -1.0 -1.0 -1.9 -1.6-1.4

-4.0 -2.5

-1.3

-2.2

-2.4

-2.7

-2.3

-2.3

-2.3

cost of cheapest translation options for each input span (log-probabilities)

Chapter 6: Decoding 27

Tuesday, February 19, 13

Page 38: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Cost Estimates for all Spans• Compute cost estimate for all contiguous spans by combining cheapest options

first future cost estimate for n words (from first)word 1 2 3 4 5 6 7 8 9the -1.0 -3.0 -4.5 -6.9 -8.3 -9.3 -9.6 -10.6 -10.6

tourism -2.0 -3.5 -5.9 -7.3 -8.3 -8.6 -9.6 -9.6initiative -1.5 -3.9 -5.3 -6.3 -6.6 -7.6 -7.6addresses -2.4 -3.8 -4.8 -5.1 -6.1 -6.1

this -1.4 -2.4 -2.7 -3.7 -3.7for -1.0 -1.3 -2.3 -2.3the -1.0 -2.2 -2.3first -1.9 -2.4time -1.6

• Function words cheaper (the: -1.0) than content words (tourism -2.0)• Common phrases cheaper (for the first time: -2.3)

than unusual ones (tourism initiative addresses: -5.9)

Chapter 6: Decoding 28

Tuesday, February 19, 13

Page 39: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Combining Score and Future Cost

the first time

das erste maltm:-0.56,lm:-2.81

d:-0.74. all:-4.11

the tourism initiative

die touristische

initiativetm:-1.21,lm:-4.67

d:0, all:-5.88

-6.1 -9.3

this for ... time

für diese zeittm:-0.82,lm:-2.98

d:-1.06. all:-4.86

-6.9 -2.2

-5.88

-11.98

-6.1 +

= -4.11

-13.41

-9.3 +

= -4.86

-13.96

-9.1 +

=

• Hypothesis score and future cost estimate are combined for pruning

– left hypothesis starts with hard part: the tourism initiative

score: -5.88, future cost: -6.1 ! total cost -11.98

– middle hypothesis starts with easiest part: the first time

score: -4.11, future cost: -9.3 ! total cost -13.41

– right hypothesis picks easy parts: this for ... time

score: -4.86, future cost: -9.1 ! total cost -13.96

Chapter 6: Decoding 29

Tuesday, February 19, 13

Page 40: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0 1.5e-9

: <s> Mary : *-------- : 0.9 8.6e-9

: <s> Maria : *-------- : 0.3 8.6e-9

Mary

Maria

Maria no dio una bofetada a la bruja verdef:

ecp fc:

ecp fc:

ecp fc:

: <s> Not : -*------- : 0.4 1.0e-9

ecp fc:

Not

Future costs make thesehypotheses comparable.}

Q[0] Q[1] Q[2] ...

Tuesday, February 19, 13

Page 41: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Other Decoding Algorithms

• A* search

• Greedy hill-climbing

• Using finite state transducers (standard toolkits)

Chapter 6: Decoding 30

Tuesday, February 19, 13

Page 42: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

A* Search

pro

ba

bili

ty +

he

uristic e

stim

ate

number of words covered

① depth-first

expansion to completed path

② recombination

③ alternative path leading to hypothesis beyond threshold

cheapest score

• Uses admissible future cost heuristic: never overestimates cost

• Translation agenda: create hypothesis with lowest score + heuristic cost

• Done, when complete hypothesis created

Chapter 6: Decoding 31

Tuesday, February 19, 13

Page 43: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Greedy Hill-Climbing

• Create one complete hypothesis with depth-first search (or other means)

• Search for better hypotheses by applying change operators

– change the translation of a word or phrase– combine the translation of two words into a phrase– split up the translation of a phrase into two smaller phrase translations– move parts of the output into a di↵erent position– swap parts of the output with the output at a di↵erent part of the sentence

• Terminates if no operator application produces a better translation

Chapter 6: Decoding 32

Tuesday, February 19, 13

Page 44: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Marginal Decodinge⇤ = argmax

ep(e | f)

= argmax

ep(f | e)⇥ p(e)

⇡ argmax

ep(f,a | e)⇥ p(e)

Does this last approximation matter?

- Variational & MCMC explored- marginal benefits, depending on training- Really hard problem (Sima’an, 1997)

Tuesday, February 19, 13

Page 45: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Maria no dio una bofetada a la bruja verde

Mary not give

did not

no

a slap to

by

the witch green

did not give

slap

a slap

to the

the

green witch

the witch

Adapted from Koehn (2006)

hag bawdy

Tuesday, February 19, 13

Page 46: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Maria no dio una bofetada a la bruja verde

Mary not give

did not

no

a slap to

by

the witch green

did not give

slap

a slap

to the

the

the witch

Adapted from Koehn (2006)

green witch

hag bawdy

Tuesday, February 19, 13

Page 47: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Maria no dio una bofetada a la bruja verde

Mary not give

did not

no

a slap to

by

the witch green

did not give

slap

a slap

to the

the

the witch

Adapted from Koehn (2006)

green witch

hag bawdy

Tuesday, February 19, 13

Page 48: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding algorithm• Translation as a search problem

• Partial hypothesis keeps track of

• which source words have been translated (coverage vector)

• n-1 most recent words of English (for LM!)

• a back pointer list to the previous hypothesis + (e,f) phrase pair used

• the (partial) translation probability

• the estimated probability of translating the remaining words (precomputed, a function of the coverage vector)

• Start state: no translated words, E=<s>, bp=nil

• Goal state: all translated words

Tuesday, February 19, 13

Page 49: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding algorithm• Q[0] ← Start state

• for i = 0 to |f|-1

• Keep b best hypotheses at Q[i]

• for each hypothesis h in Q[i]

• for each untranslated span in h.c for which there is a translation <e,f> in the phrase table

• h’ = h extend by <e,f>

• Is there an item in Q[|h’.c|] with = LM state?

• yes: update the item bp list and probability

• no: Q[|h’.c|] ← h’

• Find the best hypothesis in Q[|f|], reconstruction translation by following back pointers

Tuesday, February 19, 13

Page 50: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

Tuesday, February 19, 13

Page 51: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

Mary

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

Tuesday, February 19, 13

Page 52: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

Tuesday, February 19, 13

Page 53: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

: did not : **------- : 0.3

ecp

Mary did not

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

Tuesday, February 19, 13

Page 54: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

: did not : **------- : 0.3

ecp

Mary did not

did not

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

Tuesday, February 19, 13

Page 55: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

: did not : **------- : 0.45

ecp

Mary did not

did not

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

did not

Tuesday, February 19, 13

Page 56: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0

ecp

: <s> Mary : *-------- : 0.9

ecp

: <s> Maria : *-------- : 0.3

ecp

Mary

Maria

: did not : **------- : 0.45

ecp

Mary did not

did not

did not

: Mary not : **------- : 0.1

ecp

not

: not slap : *****---- : 0.316

ecpslap

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

Tuesday, February 19, 13

Page 57: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Reordering• Language express words in different orders

• bruja verde vs. green witch

• Phrase pairs can “memorize” some of these

• More general: in decoding, “skip ahead”

• Problem:

• Won’t “easy parts” of the sentence be translated first?

• Solution:

• Future cost estimate

• For every coverage vector, estimate what it will cost to translate the remaining untranslated words

• When pruning, use p * future cost!

Tuesday, February 19, 13

Page 58: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0 1.5e-9

: <s> Mary : *-------- : 0.9 8.6e-9

: <s> Maria : *-------- : 0.3 8.6e-9

Mary

Maria

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

ecp fc:

ecp fc:

ecp fc:

Tuesday, February 19, 13

Page 59: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0 1.5e-9

: <s> Mary : *-------- : 0.9 8.6e-9

: <s> Maria : *-------- : 0.3 8.6e-9

Mary

Maria

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

ecp fc:

ecp fc:

ecp fc:

: <s> Not : -*------- : 0.4 1.0e-9

ecp fc:

Not

Tuesday, February 19, 13

Page 60: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

: <s> : --------- : 1.0 1.5e-9

: <s> Mary : *-------- : 0.9 8.6e-9

: <s> Maria : *-------- : 0.3 8.6e-9

Mary

Maria

Q[0] Q[1] Q[2] ...Maria no dio una bofetada a la bruja verdef:

ecp fc:

ecp fc:

ecp fc:

: <s> Not : -*------- : 0.4 1.0e-9

ecp fc:

Not

Future costs make thesehypotheses comparable.}

Tuesday, February 19, 13

Page 61: Phrase-Based MT: Decoding - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/08.pbmt2.pdf · Phrase Tables das Thema the issue 0.41 the point 0.72 the subject 0.47

Decoding summary

• Finding the best hypothesis is NP-hard

• Even with no language model, there are an exponential number of states!

• Solution 1: limit reordering

• Solution 2: (lossy) pruning

Tuesday, February 19, 13