IBM Translation Models

Instructor: Yoav Artzi

CS5740: Natural Language Processing, Spring 2016

Slides adapted from Michael Collins

The Noisy Channel Model

• Goal: translate from French to English
• Have a model p(e|f) to estimate the probability of an English sentence e given a French sentence f
• Estimate the parameters from a training corpus
• A noisy channel model has two components:
  – p(e), the language model
  – p(f|e), the translation model
• Giving:

$$p(e \mid f) = \frac{p(e, f)}{p(f)} = \frac{p(e)\,p(f \mid e)}{\sum_{e} p(e)\,p(f \mid e)}$$

and

$$\arg\max_{e} p(e \mid f) = \arg\max_{e} p(e)\,p(f \mid e)$$
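
A minimal sketch of this decision rule, assuming a small fixed candidate set and toy stand-ins for the two models (the scoring functions and candidates below are illustrative assumptions, not from the slides):

```python
# Toy noisy-channel decoder; both model functions are placeholders.

def log_p_e(e):
    """Stand-in language model score: prefers shorter sentences."""
    return -len(e.split())

def log_p_f_given_e(f, e):
    """Stand-in translation model score: prefers length-matched pairs."""
    return -2 * abs(len(f.split()) - len(e.split()))

def decode(f, candidates):
    """argmax_e p(e) * p(f|e), computed in log space."""
    return max(candidates, key=lambda e: log_p_e(e) + log_p_f_given_e(f, e))

f = "Le programme a ete mis en application"
candidates = ["the program has been implemented",
              "and the program has been implemented"]
print(decode(f, candidates))  # -> "and the program has been implemented"
```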

Overview

• IBM Model 1
• IBM Model 2
• EM Training of Models 1 and 2

IBM Model 1: Alignments

• How do we model p(f|e)?
• English sentence e has l words e_1, ..., e_l; French sentence f has m words f_1, ..., f_m
• An alignment a identifies which English word each French word originated from
• Formally, an alignment a is {a_1, ..., a_m} where each a_j ∈ {0, ..., l} (position 0 is a special NULL word)
• There are (l + 1)^m possible alignments
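
Since each of the m alignment variables independently takes one of l + 1 values, the alignments can be enumerated directly; a small sketch (the sentence pair is a made-up toy example, not from the slides):

```python
from itertools import product

e = ["the", "dog"]   # l = 2 English words
f = ["le", "chien"]  # m = 2 French words
l, m = len(e), len(f)

# Each a_j ranges over {0, ..., l}; position 0 is the special NULL word.
alignments = list(product(range(l + 1), repeat=m))
print(len(alignments) == (l + 1) ** m)  # True: 9 alignments
print(alignments[:3])                   # (0, 0), (0, 1), (0, 2)
```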

IBM Model 1: Alignments

l = 6, m = 7
e = And the program has been implemented
f = Le programme a ete mis en application

• One alignment is {2, 3, 4, 5, 6, 6, 6}

IBM Model 1: Alignments

• Another (bad!) alignment is {1, 1, 1, 1, 1, 1, 1}


Alignments in the IBM Models

• We define two models: p(a|e,m) and p(f|a,e,m)
• Giving:

$$p(f, a \mid e, m) = p(a \mid e, m)\, p(f \mid a, e, m)$$

• Also:

$$p(f \mid e, m) = \sum_{a \in A} p(a \mid e, m)\, p(f \mid a, e, m)$$

where A is the set of all possible alignments
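
A brute-force sketch of this marginalization, anticipating Model 1's uniform choice of p(a|e,m) (defined a few slides below) and using a made-up toy translation table t; the numbers are purely illustrative:

```python
from itertools import product

# Toy translation table t(f_word | e_word); the entries are assumptions.
t = {("le", "the"): 0.7, ("chien", "dog"): 0.8,
     ("le", "dog"): 0.1, ("chien", "the"): 0.05}

def p_f_given_e(f, e):
    """p(f|e,m) = sum over all alignments a of p(a|e,m) * p(f|a,e,m)."""
    l, m = len(e), len(f)
    e_null = ["NULL"] + e                      # position 0 = NULL word
    total = 0.0
    for a in product(range(l + 1), repeat=m):  # all (l+1)^m alignments
        p_a = 1.0 / (l + 1) ** m               # Model 1's uniform p(a|e,m)
        p_f = 1.0
        for j in range(m):
            p_f *= t.get((f[j], e_null[a[j]]), 0.0)
        total += p_a * p_f
    return total

print(p_f_given_e(["le", "chien"], ["the", "dog"]))
```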

Most Likely Alignments

• For any alignment a we can also calculate the posterior

$$p(a \mid f, e, m) = \frac{p(f, a \mid e, m)}{\sum_{a' \in A} p(f, a' \mid e, m)}$$

where p(f, a|e, m) = p(a|e, m) p(f|a, e, m) as before
• For a given (f, e) pair, we can also compute the most likely alignment (details in notes)
• The original IBM models are rarely used for translation, but are still key for recovering alignments

Example Alignment

• French:
  le conseil a rendu son avis , et nous devons à présent adopter un nouvel avis sur la base de la première position .
• English:
  the council has stated its position , and now , on the basis of the first position , we again have to give our opinion .
• Alignment:
  the/le council/conseil has/à stated/rendu its/son position/avis ,/,
  and/et now/présent ,/NULL on/sur the/le basis/base of/de the/la
  first/première position/position ,/NULL we/nous again/NULL
  have/devons to/a give/adopter our/nouvel opinion/avis ./.

IBM Model 1: Alignments

• In IBM Model 1 all alignments a are equally likely:

$$p(a \mid e, m) = \frac{1}{(1 + l)^m}$$

• Reasonable assumption?
  – It is a simplifying assumption, but it gets things started …

IBM Model 1: Translation Probabilities

• Next step: come up with an estimate for p(f|a, e, m)
• In Model 1, this is:

$$p(f \mid a, e, m) = \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$

IBM Model 1: Example

l = 6, m = 7
e = And the program has been implemented
f = Le programme a ete mis en application
a = {2, 3, 4, 5, 6, 6, 6}

t(f|e)        And    the    program  has    been   implemented
Le            0.2    0.6    0.1      0.025  0.05   0.025
programme     0.05   0.2    0.45     0.1    0.1    0.1
a             0.1    0.1    0.15     0.2    0.15   0.3
ete           0.05   0.05   0.05     0.05   0.7    0.1
mis           0.2    0.05   0.05     0.05   0.25   0.4
en            0.25   0.1    0.25     0.25   0.1    0.05
application   0.01   0.03   0.01     0.02   0.03   0.9

IBM Model 1: Example

$$\begin{aligned}
p(f \mid a, e) ={} & t(\text{Le} \mid \text{the}) \times t(\text{programme} \mid \text{program}) \times t(\text{a} \mid \text{has}) \\
& \times t(\text{ete} \mid \text{been}) \times t(\text{mis} \mid \text{implemented}) \\
& \times t(\text{en} \mid \text{implemented}) \times t(\text{application} \mid \text{implemented}) \\
={} & 0.0006804
\end{aligned}$$

$$p(f, a \mid e, 7) = \frac{1}{(1 + 6)^7} \times 0.0006804 \approx 8.26186 \times 10^{-10}$$
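
A quick sketch reproducing the slide's arithmetic from the table above (the table is read into a dict keyed by (french, english)):

```python
t = {("Le", "the"): 0.6, ("programme", "program"): 0.45,
     ("a", "has"): 0.2, ("ete", "been"): 0.7,
     ("mis", "implemented"): 0.4, ("en", "implemented"): 0.05,
     ("application", "implemented"): 0.9}

e = ["And", "the", "program", "has", "been", "implemented"]      # l = 6
f = ["Le", "programme", "a", "ete", "mis", "en", "application"]  # m = 7
a = [2, 3, 4, 5, 6, 6, 6]  # positions into e, 1-indexed (0 would be NULL)

p_f = 1.0
for j, a_j in enumerate(a):
    p_f *= t[(f[j], e[a_j - 1])]
print(p_f)                           # 0.0006804...
print(p_f / (1 + len(e)) ** len(f))  # ~8.26186e-10 = p(f, a|e, 7)
```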

IBM Model 1: The Generative Process

To generate a French string f from an English string e:
• Step 1: Pick an alignment a with probability

$$\frac{1}{(l + 1)^m}$$

• Step 2: Pick the French words with probability

$$p(f \mid a, e, m) = \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$

The final result:

$$p(f, a \mid e, m) = p(a \mid e, m) \times p(f \mid a, e, m) = \frac{1}{(1 + l)^m} \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$
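
The two-step story can be turned into a sampler directly; a minimal sketch, assuming a made-up conditional table t_dist (not from the slides):

```python
import random

# Toy conditional distributions t(. | e_word); purely illustrative.
t_dist = {
    "NULL": [("en", 1.0)],
    "the": [("le", 0.7), ("la", 0.3)],
    "dog": [("chien", 0.9), ("le", 0.1)],
}

def generate(e, m):
    e_null = ["NULL"] + e
    # Step 1: alignment a, each a_j uniform over {0, ..., l}
    a = [random.randint(0, len(e)) for _ in range(m)]
    # Step 2: each f_j drawn from t(. | e_{a_j})
    f = []
    for a_j in a:
        words, probs = zip(*t_dist[e_null[a_j]])
        f.append(random.choices(words, weights=probs)[0])
    return f, a

random.seed(0)
print(generate(["the", "dog"], m=2))
```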

Example Lexical Entry

… de la situation au niveau des négociations de l'ompi …
… of the current position in the wipo negotiations …

nous ne sommes pas en mesure de décider , …
we are not in position to decide …

… le point de vue de la commission face à ce problème complexe .
… the commission 's position on this complex problem .

Overview

• IBM Model 1
• IBM Model 2
• EM Training of Models 1 and 2

IBM Model 2

• Only difference: we now introduce alignment distortion parameters q(i|j, l, m)
• q(i|j, l, m) is the probability that the j'th French word is connected to the i'th English word, given that the lengths of e and f are l and m
• Define

$$p(a \mid e, m) = \prod_{j=1}^{m} q(a_j \mid j, l, m)$$

where a = {a_1, ..., a_m}
• Gives

$$p(f, a \mid e, m) = \prod_{j=1}^{m} q(a_j \mid j, l, m)\, t(f_j \mid e_{a_j})$$
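
A sketch of Model 2's joint probability as a function; q and t here are assumed to be dicts of parameters, keyed the way the slides index them (this keying is my own convention):

```python
def p_f_a(f, a, e, q, t):
    """p(f, a|e, m) = prod_j q(a_j|j, l, m) * t(f_j|e_{a_j})."""
    l, m = len(e), len(f)
    e_null = ["NULL"] + e
    p = 1.0
    for j in range(1, m + 1):  # French positions, 1-indexed as on the slides
        i = a[j - 1]           # aligned English position, 0 = NULL
        p *= q.get((i, j, l, m), 0.0) * t.get((f[j - 1], e_null[i]), 0.0)
    return p

# Toy parameters (assumptions, not trained values):
e, f, a = ["the", "dog"], ["le", "chien"], [1, 2]
q = {(1, 1, 2, 2): 0.6, (2, 2, 2, 2): 0.7}
t = {("le", "the"): 0.7, ("chien", "dog"): 0.8}
print(p_f_a(f, a, e, q, t))  # 0.6*0.7 * 0.7*0.8 = 0.2352
```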


IBM Model 2: The Generative Process

To generate a French string f from an English string e:
• Step 1: Pick an alignment a = {a_1, ..., a_m} with probability

$$\prod_{j=1}^{m} q(a_j \mid j, l, m)$$

• Step 2: Pick the French words with probability

$$p(f \mid a, e, m) = \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$

The final result:

$$p(f, a \mid e, m) = p(a \mid e, m) \times p(f \mid a, e, m) = \prod_{j=1}^{m} q(a_j \mid j, l, m)\, t(f_j \mid e_{a_j})$$

Recovering Alignments

• If we have parameters q and t, we can easily recover the most likely alignment for any sentence pair
• Given a sentence pair e_1, e_2, ..., e_l, f_1, f_2, ..., f_m, define

$$a_j = \arg\max_{a \in \{0, \ldots, l\}} q(a \mid j, l, m) \times t(f_j \mid e_a)$$

for j = 1 ... m

e = And the program has been implemented
f = Le programme a ete mis en application
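
Because each a_j can be chosen independently, recovery is a per-position argmax; a sketch reusing the parameter-dict conventions from the earlier sketches (q and t are assumed already trained):

```python
def recover_alignment(f, e, q, t):
    """a_j = argmax over a in {0, ..., l} of q(a|j,l,m) * t(f_j|e_a)."""
    l, m = len(e), len(f)
    e_null = ["NULL"] + e
    a = []
    for j in range(1, m + 1):
        best = max(range(l + 1),
                   key=lambda i: q.get((i, j, l, m), 0.0)
                                 * t.get((f[j - 1], e_null[i]), 0.0))
        a.append(best)
    return a

# Toy parameters (assumptions, not trained values):
q = {(1, 1, 2, 2): 0.6, (2, 2, 2, 2): 0.7}
t = {("le", "the"): 0.7, ("chien", "dog"): 0.8}
print(recover_alignment(["le", "chien"], ["the", "dog"], q, t))  # [1, 2]
```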

Overview

• IBM Model 1
• IBM Model 2
• EM Training of Models 1 and 2

The Parameter Estimation Problem

• Input: (e^(k), f^(k)) for k = 1 ... n
  Each e^(k) is an English sentence, each f^(k) is a French sentence, e.g.:
  e^(100) = And the program has been implemented
  f^(100) = Le programme a ete mis en application
• Output: parameters t(f|e) and q(i|j, l, m)
• A key challenge: we do not have alignments in our training examples

Parameter Estimation if Alignments are Observed

• Assume alignments are observed in the training data:
  e^(100) = And the program has been implemented
  f^(100) = Le programme a ete mis en application
  a^(100) = <2, 3, 4, 5, 6, 6, 6>
• Training data is (e^(k), f^(k), a^(k)) for k = 1 ... n
  Each e^(k) is an English sentence, each f^(k) is a French sentence, each a^(k) is an alignment
• Maximum-likelihood parameter estimates are trivial:

$$t_{ML}(f \mid e) = \frac{\text{count}(e, f)}{\text{count}(e)} \qquad q_{ML}(i \mid j, l, m) = \frac{\text{count}(i, j, l, m)}{\text{count}(j, l, m)}$$

Pseudo Code
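
A sketch of the estimation this slide's pseudocode describes, following the count formulas above (the dict-based representation is my own convention):

```python
from collections import defaultdict

def estimate_observed(corpus):
    """corpus: list of (e_words, f_words, a) with a[j-1] in {0, ..., l}."""
    c_ef = defaultdict(float)   # count(e, f)
    c_e = defaultdict(float)    # count(e)
    c_q = defaultdict(float)    # count(i, j, l, m)
    c_qd = defaultdict(float)   # count(j, l, m)
    for e, f, a in corpus:
        l, m = len(e), len(f)
        e_null = ["NULL"] + e
        for j in range(1, m + 1):
            i = a[j - 1]
            c_ef[(f[j - 1], e_null[i])] += 1
            c_e[e_null[i]] += 1
            c_q[(i, j, l, m)] += 1
            c_qd[(j, l, m)] += 1
    t = {k: c / c_e[k[1]] for k, c in c_ef.items()}
    q = {k: c / c_qd[k[1:]] for k, c in c_q.items()}
    return t, q

corpus = [(["the", "dog"], ["le", "chien"], [1, 2])]
t, q = estimate_observed(corpus)
print(t[("le", "the")], q[(1, 1, 2, 2)])  # 1.0 1.0
```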

Parameter Estimation with the EM Algorithm

• Input: (e^(k), f^(k)) for k = 1 ... n
  Each e^(k) is an English sentence, each f^(k) is a French sentence
• The algorithm is closely related to the algorithm with observed alignments, but with two key differences:
  – It is iterative: start with an initial (e.g., random) choice of the q and t parameters; at each iteration, compute expected "counts" based on the data and the current parameters, and re-estimate the parameters from those counts
  – The definition of the delta function is different: instead of indicating an observed alignment, it is the posterior probability

$$\delta(k, i, j) = \frac{q(i \mid j, l_k, m_k)\, t(f_j^{(k)} \mid e_i^{(k)})}{\sum_{i'=0}^{l_k} q(i' \mid j, l_k, m_k)\, t(f_j^{(k)} \mid e_{i'}^{(k)})}$$

Pseudo Code
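
A sketch of one plain reading of the EM procedure above for Model 2 (Model 1 falls out by holding q fixed and uniform); it mirrors estimate_observed, with delta replacing the 0/1 observed counts:

```python
from collections import defaultdict

def em(corpus, t, q, iterations=10):
    """corpus: list of (e_words, f_words); t, q: initial parameter dicts."""
    for _ in range(iterations):
        c_ef, c_e = defaultdict(float), defaultdict(float)
        c_q, c_qd = defaultdict(float), defaultdict(float)
        for e, f in corpus:
            l, m = len(e), len(f)
            e_null = ["NULL"] + e
            for j in range(1, m + 1):
                # E-step: delta(k,i,j) = q(i|j,l,m) t(f_j|e_i) / normalizer
                scores = [q.get((i, j, l, m), 0.0)
                          * t.get((f[j - 1], e_null[i]), 0.0)
                          for i in range(l + 1)]
                z = sum(scores)
                if z == 0.0:
                    continue
                for i in range(l + 1):
                    d = scores[i] / z
                    c_ef[(f[j - 1], e_null[i])] += d
                    c_e[e_null[i]] += d
                    c_q[(i, j, l, m)] += d
                    c_qd[(j, l, m)] += d
        # M-step: re-estimate parameters from the expected counts
        t = {k: c / c_e[k[1]] for k, c in c_ef.items()}
        q = {k: c / c_qd[k[1:]] for k, c in c_q.items()}
    return t, q
```

A common scheme initializes t uniformly over word pairs that co-occur in some sentence pair, trains Model 1 first (q held uniform), and uses the resulting t to initialize Model 2.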

Justification for the Algorithm

• Input: (e^(k), f^(k)) for k = 1 ... n
  Each e^(k) is an English sentence, each f^(k) is a French sentence
• The log-likelihood function:

$$L(t, q) = \sum_{k=1}^{n} \log p(f^{(k)} \mid e^{(k)})$$

• The maximum-likelihood estimates are:

$$\arg\max_{t, q} L(t, q)$$

• The EM algorithm will converge to a local maximum of the log-likelihood function
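
The marginal p(f|e, m) needed for L(t, q) need not be computed by brute force: because the alignment variables are independent, the sum over all (l+1)^m alignments factorizes into per-position sums. A sketch for monitoring convergence, under the same dict conventions as the earlier sketches:

```python
import math

def log_likelihood(corpus, t, q):
    """L(t, q) = sum_k log p(f^(k)|e^(k)), using
    p(f|e, m) = prod_j sum_i q(i|j,l,m) * t(f_j|e_i)."""
    total = 0.0
    for e, f in corpus:
        l, m = len(e), len(f)
        e_null = ["NULL"] + e
        for j in range(1, m + 1):
            s = sum(q.get((i, j, l, m), 0.0)
                    * t.get((f[j - 1], e_null[i]), 0.0)
                    for i in range(l + 1))
            total += math.log(s) if s > 0.0 else float("-inf")
    return total
```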

Summary

• Key ideas in the IBM translation models:
  – Alignment variables
  – Translation parameters, e.g., t(chien|dog)
  – Distortion parameters, e.g., q(2|1, 6, 7)
• The EM algorithm: an iterative algorithm for training the q and t parameters
• Once the parameters are trained, we can recover the most likely alignments on our training examples, e.g.:

e^(100) = And the program has been implemented
f^(100) = Le programme a ete mis en application