
Why Generative Models Underperform Surface Heuristics

UC Berkeley Natural Language Processing

John DeNero, Dan Gillick, James Zhang, and Dan Klein

Overview: Learning Phrases

Standard pipeline: sentence-aligned corpus → directional word alignments → intersected and grown word alignments → phrase table (translation model)

Example phrase table entries:

cat ||| chat ||| 0.9
the cat ||| le chat ||| 0.8
dog ||| chien ||| 0.8
house ||| maison ||| 0.6
my house ||| ma maison ||| 0.9
language ||| langue ||| 0.9
…
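To make the last step of this pipeline concrete, here is a minimal sketch of relative-frequency phrase scoring in Python, assuming the phrase pairs have already been extracted from the grown alignments. The function name and toy data are illustrative, and only one conditional direction is shown; real systems typically score both directions.

    from collections import Counter

    def score_phrase_table(extracted_pairs):
        """Relative-frequency scoring: phi(e | f) = count(f, e) / count(f).

        `extracted_pairs` holds one (french, english) tuple per extraction
        event across the aligned corpus.
        """
        pair_counts = Counter(extracted_pairs)
        f_counts = Counter(f for f, _ in extracted_pairs)
        return {(f, e): n / f_counts[f] for (f, e), n in pair_counts.items()}

    # Toy usage: "chat" extracted twice with "cat", once with "spade".
    phi = score_phrase_table([("chat", "cat"), ("chat", "cat"), ("chat", "spade")])
    print(phi[("chat", "cat")])    # 0.666...
    print(phi[("chat", "spade")])  # 0.333...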

Overview: Learning Phrases

Alternative pipeline: sentence-aligned corpus → phrase-level generative model → phrase table (translation model), producing the same form of phrase table.

Phrase-level generative model:

• Early successful phrase-based SMT system [Marcu & Wong ‘02]

• Challenging to train

• Underperforms the heuristic approach

Outline

I) Generative phrase-based alignment
   • Motivation
   • Model structure and training
   • Performance results

II) Error analysis
   • Properties of the learned phrase table
   • Contributions to increased error rate

III) Proposed improvements

Motivation for Learning Phrases

Translate!

Input sentence: J ' ai un chat .
Output sentence: I have a spade .

(Literally, "J ' ai un chat ." means "I have a cat ."; the spade comes from phrase pairs learned from the idiom appeler un chat un chat = call a spade a spade, shown next.)

Motivation for Learning Phrases

Word alignment for the idiom: appelle un chat un chat ↔ call a spade a spade
(appelle ↔ call; each un ↔ a; each chat ↔ spade)

Example extracted phrase pairs:

appelle ||| call
chat un chat ||| spade a spade

Motivation for Learning Phrases

Word alignment: appelle un chat un chat ↔ call a spade a spade

All phrase pairs (up to length 3) extracted from this alignment:

appelle ||| call
appelle un ||| call a
appelle un chat ||| call a spade
un ||| a (×2)
un chat ||| a spade (×2)
un chat un ||| a spade a
chat ||| spade (×2)
chat un ||| spade a
chat un chat ||| spade a spade

… appelle un chat un chat …
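The extraction illustrated above can be sketched in a few lines of Python. This is a simplified extractor, assuming the alignment is given as a set of (French index, English index) links and ignoring the usual extension over unaligned boundary words; names are illustrative.

    def extract_phrases(f_words, e_words, links, max_len=3):
        """Enumerate phrase pairs consistent with the word alignment `links`."""
        pairs = []
        for f1 in range(len(f_words)):
            for f2 in range(f1, min(f1 + max_len, len(f_words))):
                # English positions linked to the French span [f1, f2]
                e_hits = [e for f, e in links if f1 <= f <= f2]
                if not e_hits:
                    continue
                e1, e2 = min(e_hits), max(e_hits)
                # Consistency: no link inside the English span may point
                # outside the French span.
                if any(e1 <= e <= e2 and not f1 <= f <= f2 for f, e in links):
                    continue
                pairs.append((" ".join(f_words[f1:f2 + 1]),
                              " ".join(e_words[e1:e2 + 1])))
        return pairs

    # The idiom above, with its monotone word-for-word alignment; this
    # reproduces the list of pairs on the slide, duplicates included.
    f = "appelle un chat un chat".split()
    e = "call a spade a spade".split()
    for pair in extract_phrases(f, e, {(i, i) for i in range(5)}):
        print(" ||| ".join(pair))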

A Phrase Alignment Model Compatible with Pharaoh

les chats aiment le poisson frais . ↔ cats like fresh fish .

Training Regimen That Respects Word Alignment

[Alignment grid: les chats aiment le poisson frais . ↔ cats like fresh fish . Candidate phrase pairs that conflict with the word alignment are rejected (marked X).]


Only 46% of training sentences contributed to training.

Performance Results

[Plot: BLEU (36-40) vs. EM iterations (0-4), for 100k- and 25k-sentence training corpora. Iteration 0 corresponds to the heuristically generated parameters.]

Performance Results

BLEU by system:

Heuristic (100k)   39.0
Heuristic (50k)    38.5
Heuristic (25k)    38.3
Learned (100k)     38.8

Lost training data is not the whole story: learned parameters trained on 4× as much data still underperform the heuristic.

Outline

I) Generative phrase-based alignment
   • Model structure and training
   • Performance results

II) Error analysis
   • Properties of the learned phrase table
   • Contributions to increased error rate

III) Proposed improvements

Training Corpus

French: carte sur la table

English: map on the table

French: carte sur la table

English: notice on the chart

Example: Maximizing Likelihood with Competing Segmentations

One solution EM can find, with competing segmentations sharing probability:

carte ||| map ||| 0.5
carte ||| notice ||| 0.5
carte sur ||| map on ||| 0.5
carte sur ||| notice on ||| 0.5
carte sur la ||| map on the ||| 0.5
carte sur la ||| notice on the ||| 0.5
sur ||| on ||| 1.0
la ||| the ||| 1.0
sur la ||| on the ||| 1.0
sur la table ||| on the table ||| 0.5
sur la table ||| on the chart ||| 0.5
la table ||| the table ||| 0.5
la table ||| the chart ||| 0.5
table ||| table ||| 0.5
table ||| chart ||| 0.5

Likelihood computation for "carte sur la table": each of the 7 segmentations scores 0.25, so each sentence-pair likelihood is 0.25 * 7 / 7 = 0.25


Example: Maximizing Likelihood with Competing Segmentations

A determinized table that EM prefers:

carte ||| map ||| 1.0
carte sur ||| notice on ||| 1.0
carte sur la ||| notice on the ||| 1.0
sur ||| on ||| 1.0
sur la ||| on the ||| 1.0
sur la table ||| on the table ||| 1.0
la ||| the ||| 1.0
la table ||| the table ||| 1.0
table ||| chart ||| 1.0

Likelihood of the "notice on the chart" pair: 1.0 * 2 / 7 = 0.28 > 0.25

Likelihood of the "map on the table" pair: 1.0 * 2 / 7 = 0.28 > 0.25
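The arithmetic on these two slides can be checked mechanically. The sketch below enumerates the 7 monotone segmentations of the 4-word sentence pair (phrase length at most 3, matching the tables) and averages the product of phrase scores over segmentations, mirroring the "* 7 / 7" computation. Pairing both sides at identical word positions is a simplification that happens to hold for this example, not the full model.

    def segmentations(n, max_len=3):
        """All ways to split n positions into contiguous spans of length <= max_len."""
        if n == 0:
            yield []
            return
        for length in range(1, min(max_len, n) + 1):
            for rest in segmentations(n - length, max_len):
                yield [length] + rest

    def parse(table):
        """Read 'french ||| english ||| p' lines into a dict."""
        phi = {}
        for line in table.strip().splitlines():
            f, e, p = [part.strip() for part in line.split("|||")]
            phi[(f, e)] = float(p)
        return phi

    def likelihood(f_words, e_words, phi, max_len=3):
        """Uniform average over segmentations of the product of phrase scores."""
        segs = list(segmentations(len(f_words), max_len))
        total = 0.0
        for seg in segs:
            p, start = 1.0, 0
            for length in seg:
                f = " ".join(f_words[start:start + length])
                e = " ".join(e_words[start:start + length])
                p *= phi.get((f, e), 0.0)  # unseen pair: zero probability
                start += length
            total += p
        return total / len(segs)

    FLAT = """carte ||| map ||| 0.5
    carte ||| notice ||| 0.5
    carte sur ||| map on ||| 0.5
    carte sur ||| notice on ||| 0.5
    carte sur la ||| map on the ||| 0.5
    carte sur la ||| notice on the ||| 0.5
    sur ||| on ||| 1.0
    la ||| the ||| 1.0
    sur la ||| on the ||| 1.0
    sur la table ||| on the table ||| 0.5
    sur la table ||| on the chart ||| 0.5
    la table ||| the table ||| 0.5
    la table ||| the chart ||| 0.5
    table ||| table ||| 0.5
    table ||| chart ||| 0.5"""

    DETERMINIZED = """carte ||| map ||| 1.0
    carte sur ||| notice on ||| 1.0
    carte sur la ||| notice on the ||| 1.0
    sur ||| on ||| 1.0
    sur la ||| on the ||| 1.0
    sur la table ||| on the table ||| 1.0
    la ||| the ||| 1.0
    la table ||| the table ||| 1.0
    table ||| chart ||| 1.0"""

    f = "carte sur la table".split()
    for name, phi in (("flat", parse(FLAT)), ("determinized", parse(DETERMINIZED))):
        for target in ("map on the table", "notice on the chart"):
            print(name, target, round(likelihood(f, target.split(), phi), 3))
    # flat gives 0.25 for both pairs; determinized gives ~0.286 for both,
    # so EM prefers the determinized table.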

EM Training Significantly Decreases Entropy of the Phrase Table

French phrase entropy: the entropy of each French phrase's distribution over its English translations.

[Histogram: percent of French phrases (0-40%) per entropy bin (0-.01, .01-.5, .5-1, 1-1.5, 1.5-2, >2), Learned vs. Heuristic.]

After EM training, 10% of French phrases have deterministic translation distributions.
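Here "entropy" is the Shannon entropy of a French phrase's translation distribution; a short sketch, with made-up numbers:

    import math

    def translation_entropy(phi, french):
        """Entropy (bits) of one French phrase's distribution over translations."""
        probs = [p for (f, _), p in phi.items() if f == french]
        return -sum(p * math.log2(p) for p in probs if p > 0)

    phi = {("table", "table"): 0.5, ("table", "chart"): 0.5,  # two options
           ("carte", "map"): 1.0}                             # determinized
    print(translation_entropy(phi, "table"))  # 1.0 bit
    print(translation_entropy(phi, "carte"))  # 0.0 bits: falls in the 0-.01 bin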

Effect 1: Useful Phrase Pairs Are Lost Due to Critically Small Probabilities

In 10k translated sentences, no phrases with weight less than 10⁻⁵ were used by the decoder.

[Bar chart: effective table size in thousands of phrase pairs (0-400), Heuristic vs. Learned.]
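The effective table size plotted above is then just a count over that cutoff; a minimal sketch:

    def effective_table_size(phi, cutoff=1e-5):
        """Count phrase pairs whose weight clears the decoder-usage cutoff."""
        return sum(1 for p in phi.values() if p >= cutoff)

    phi = {("chat", "cat"): 0.97, ("chat", "spade"): 0.03, ("chat", "dog"): 1e-7}
    print(effective_table_size(phi))  # 2: the 1e-7 pair is effectively lost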

Effect 2: Determinized Phrases Override Better Candidates During Decoding

Input: the situation varies to an enormous degree

Heuristic output: the situation varie d ' une immense degré
Learned output: the situation varie d ' une immense caractérise

Translation scores for the French phrase degré:

               φEM    φH
amount         ~0     0.02
extent         0.01   0.02
level          0.26   0.38
degree         0.64   0.49

Translation scores for the French phrase caractérise:

               φEM    φH
degree         0.998  ~0
features       ~0     0.05
characterized  0.001  0.21
characterizes  0.001  0.49

EM has determinized caractérise toward "degree" (φEM ≈ 1), so the learned model substitutes caractérise where the heuristic table correctly prefers degré.

Effect 3: Ambiguous Foreign Phrases Become Active During Decoding

Determinized phrases (probability ≈ 1) contribute no model cost, so the decoder can use them freely.

[Figure: candidate translations for the French apostrophe.]

Outline

I) Generative phrase-based alignment
   • Model structure and training
   • Performance results

II) Error analysis
   • Properties of the learned phrase table
   • Contributions to increased error rate

III) Proposed improvements

Motivation for Reintroducing Entropy to the Phrase Table

1. Useful phrase pairs are lost due to critically small probabilities.

2. Determinized phrases override better candidates.

3. Ambiguous foreign phrases become active during decoding.

Reintroducing Lost Phrases

[Bar chart: BLEU on 25k sentences (36.5-39) for Learned, Heuristic, and Interpolated phrase tables.]

Interpolation yields up to 1.0 BLEU improvement
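A minimal sketch of the interpolation, assuming a simple linear mix of the two tables; the weight lam below is illustrative, not the tuned value behind these results:

    def interpolate(phi_heuristic, phi_learned, lam=0.5):
        """phi(f, e) = lam * phi_H(f, e) + (1 - lam) * phi_EM(f, e).

        Pairs that EM drove to (near) zero are reintroduced with weight
        lam * phi_H, restoring the lost phrases discussed above.
        """
        pairs = set(phi_heuristic) | set(phi_learned)
        return {pair: lam * phi_heuristic.get(pair, 0.0)
                      + (1 - lam) * phi_learned.get(pair, 0.0)
                for pair in pairs}

    phi = interpolate({("chat", "cat"): 0.6, ("chat", "spade"): 0.4},
                      {("chat", "cat"): 1.0})
    print(phi)  # {('chat', 'cat'): 0.8, ('chat', 'spade'): 0.2}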

Smoothing Phrase Probabilities

Reserves probability mass for unseen translations based on the length of the French phrase.

[Bar chart: BLEU on 25k sentences (36.5-39) for Learned, Heuristic, and Smoothed phrase tables.]
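The slides specify only that the reserved mass depends on French phrase length; the discount schedule below is a hypothetical instantiation for illustration, not the paper's formula:

    def smooth(phi, reserve):
        """Discount each translation so that reserve(len(f)) probability mass
        per French phrase is held out for unseen translations."""
        return {(f, e): (1.0 - reserve(len(f.split()))) * p
                for (f, e), p in phi.items()}

    # Hypothetical schedule: reserve less mass for longer, more specific phrases.
    phi = smooth({("chat", "cat"): 1.0, ("un chat", "a cat"): 1.0},
                 reserve=lambda n: 0.3 / n)
    print(phi)  # {('chat', 'cat'): 0.7, ('un chat', 'a cat'): 0.85}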

Conclusion

Generative phrase models determinize the phrase table via the latent segmentation variable.

A determinized phrase table introduces errors at decoding time.

Modest improvements can be realized by reintroducing entropy to the phrase table.

Questions?