
9/12/2003 LTI Student Research Symposium 1

An Integrated Phrase Segmentation/Alignment Algorithm

for Statistical Machine Translation

Joy

Advisor: Stephan Vogel and Alex Waibel


Outline

• Background

• Phrase Alignment Algorithms in SMT

• Segmentation Approaches

• Integrated Segmentation and Alignment Algorithm (ISA)

• Experiments

• Discussions


Statistical Machine Translation

• Statistical Machine Translation (Brown et al., 1993) – the Noisy Channel Model

• Translating from F to E

• Given a test sentence f, generate the translation e*, which is

  e* = argmax_e Pr(e|f) = argmax_e Pr(e) · Pr(f|e)

• Pr(e): Language Model (LM)

• Pr(f|e): Translation Model (TM)
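In code, this decoding rule is just an argmax over candidate translations, scored by the product of LM and TM probabilities. A minimal sketch in log space, where `log_lm` and `log_tm` are made-up toy stand-ins, not trained models:

```python
# Hypothetical toy scoring functions (assumptions, not real SMT components).
def log_lm(e):
    """log Pr(e): a toy language model score."""
    return -0.5 * len(e.split())

def log_tm(f, e):
    """log Pr(f|e): a toy translation model score."""
    return -abs(len(f.split()) - len(e.split()))

def decode(f, candidates):
    """e* = argmax_e Pr(e) * Pr(f|e), computed as a sum in log space."""
    return max(candidates, key=lambda e: log_lm(e) + log_tm(f, e))

best = decode("le chat", ["the cat", "a cat sat"])
```

Real decoders search a huge hypothesis space rather than a fixed candidate list; the argmax structure is the same.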


Training

– Training
  • Using large English corpora (e.g. the Wall Street Journal) to train an LM

• Using bilingual corpora (e.g. Canadian Hansard) to train the TM

– To get the building blocks for Pr(f|e)

» Word-to-word or phrase-to-phrase translations

» Reordering information

» Other features


Alignment

• Alignment for one sentence pair (e, f):

  – Suppose e has l words: e = e_1 e_2 ... e_l, and f has m words: f = f_1 f_2 ... f_m

  – Then the alignment a can be represented as a = a_1 a_2 ... a_m: m values, each between 0 and l

  – a_j = i means f_j is "aligned" to e_i, where e_0 stands for the NULL word

  – In short: the alignment tells us which word in e is translated into which word in f
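The alignment vector is easy to make concrete; a small sketch following the slide's convention that a[j] = i maps f_j to e_i, with index 0 reserved for the NULL word (the example words are illustrative only):

```python
# e_0 is the NULL word; e has l = 4 real words, f has m = 4 words.
e = ["NULL", "the", "house", "is", "small"]
f = ["das", "Haus", "ist", "klein"]
a = [1, 2, 3, 4]  # m values, each between 0 and l

def aligned_pairs(e, f, a):
    """Return the (f_j, e_{a_j}) word pairs the alignment implies."""
    return [(f[j], e[a[j]]) for j in range(len(f))]

pairs = aligned_pairs(e, f, a)
```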


Alignment Example


Alignment Models

• Alignment algorithms:
  – IBM Models 1 to 5 (Brown et al.)
  – HMM model, similar to IBM Model 2 (Vogel)
  – Competitive linking (Melamed)
  – Flow network (Gaussier)
  – Others


IBM Model 1

• IBM Model 1

  – Easy to train

  – Simple to understand

  – Used very often in MT research

  – One serious problem for the IBM models:
    • The word-to-word alignment assumption
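Part of why Model 1 is easy to train is that its EM procedure fits in a few lines; a self-contained sketch on a two-sentence toy corpus (illustrative only, not the training setup used in these experiments):

```python
from collections import defaultdict

# Two-sentence toy parallel corpus (illustrative only).
corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"])]

t = defaultdict(lambda: 0.25)  # t(f|e), uniform initialization

for _ in range(10):  # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for e_sent, f_sent in corpus:
        for f in f_sent:
            # E-step: fractional counts, normalized over candidate e words
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f|e) from the expected counts
    for (f, e) in count:
        t[(f, e)] = count[(f, e)] / total[e]
```

After a few iterations t(das|the) dominates, since "das"/"the" co-occur in both pairs while the content words disambiguate each other.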


Phrase-to-phrase Alignment

• Phrase-to-phrase alignment is better:

  – Mismatch between languages

  – Phrases encapsulate the context of words

  – Phrases encapsulate local reordering


Outline

• Background

• Phrase Alignment Algorithms in SMT

• Segmentation Approaches

• Integrated Segmentation and Alignment Algorithm (ISA)

• Experiments

• Discussions


Alignment Algorithms

• Based on an initial word alignment:

  – Train a word alignment

  – Read off phrase-to-phrase alignments from the Viterbi path

  – Examples:
    • HMM phrase alignment (Vogel)

• Alignment templates from IBM 4 (Och)

• Bilingual bracketing (Wu, B. Zhao)

• Popular in SMT research


Outline

• Background

• Phrase Alignment Algorithms in SMT

• Segmentation Approaches

• Integrated Segmentation and Alignment Algorithm (ISA)

• Experiments

• Discussions


Segmentation Approaches

• Identify monolingual phrases and segment/bracket each phrase as a single unit (a "super-word") (Zhang, 2000)

• Train the regular word-to-word alignment


Problems in Segmentation Approaches

• Segmentation uses only monolingual information

• Good segmentations may make alignment even harder


Outline

• Background

• Phrase Alignment Algorithms in SMT

• Segmentation Approaches

• Integrated Segmentation and Alignment Algorithm (ISA)

• Experiments

• Discussions


Integrated Segmentation and Alignment

• Let’s look at an example first


Integrated Segmentation and Alignment

• Represent a sentence pair (e, f) as a matrix D

• D(i, j) = I'(e_i, f_j), where I' is a modified point-wise mutual information

• A partition over D is a series of non-overlapping rectangular regions d_1, d_2, ..., d_m

• A region d_k(rs, re, cs, ce) indicates that the phrase e~_k = e_rs ... e_re is aligned to the phrase f~_k = f_cs ... f_ce

• Segmentation and alignment are achieved at the same time
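The matrix D can be built from corpus-level counts. The slides call I' a *modified* point-wise mutual information without giving its exact form, so this sketch uses standard PMI as a stand-in:

```python
import math

def pmi_matrix(e_words, f_words, cooc, e_count, f_count, n_pairs):
    """D[i][j] = PMI(e_i, f_j) = log(c(e,f) * N / (c(e) * c(f))).
    cooc, e_count, f_count are corpus-level co-occurrence and marginal
    counts over N = n_pairs sentence pairs, assumed pre-collected."""
    D = [[0.0] * len(f_words) for _ in e_words]
    for i, e in enumerate(e_words):
        for j, f in enumerate(f_words):
            c_ef = cooc.get((e, f), 0)
            if c_ef > 0:
                D[i][j] = math.log(c_ef * n_pairs / (e_count[e] * f_count[f]))
    return D

# Toy counts: "the" and "das" co-occur in 2 of 4 sentence pairs.
D = pmi_matrix(["the"], ["das"], {("the", "das"): 2},
               {"the": 2}, {"das": 2}, 4)
```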


Integrated Segmentation and Alignment

• The best partition is the one that maximizes Σ_k I'(e~_k, f~_k)

• It is computationally intractable to search all possible partitions:

  – Exponential in the sentence length

  – DP: not a good idea

    • "An optimal policy has the property that whatever the initial state and the initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." -- Richard Bellman's Principle of Optimality

    • But here, the decision of how to expand the first cell changes the search space for the rest of the cells

• Instead, use a computationally cheap algorithm to find "good" partitions


An Example


A Computationally Cheap Algorithm

• Assumption:

  – If the translation of e1 e2 is f, then I'(e1, f) should be very "similar" to I'(e2, f)

• Algorithm:

  – Step 1: find the cell in D with the maximum value of I'

  – Step 2: expand this cell to a rectangular region in which all cells have an I' similar to that of this cell

  – Repeat Steps 1 and 2 until no more regions can be found
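The two-step greedy loop can be sketched as follows. The similarity test (an absolute threshold on the seed's row and column) and the masking of used rows/columns are simplifying assumptions, not the exact ISA criteria:

```python
def greedy_partition(D, sim=0.5):
    """Greedily carve aligned regions out of score matrix D (sketch).
    Step 1: pick the best remaining cell. Step 2: expand it to the free
    rows/columns whose scores are within `sim` of the seed's score."""
    rows, cols = len(D), len(D[0])
    free_r, free_c = set(range(rows)), set(range(cols))
    regions = []
    while free_r and free_c:
        # Step 1: find the remaining cell with the maximum score
        i, j = max(((r, c) for r in free_r for c in free_c),
                   key=lambda rc: D[rc[0]][rc[1]])
        seed = D[i][j]
        if seed <= 0:  # no informative cells left
            break
        # Step 2: expand to free rows/columns with a similar score
        rs = {r for r in free_r if abs(D[r][j] - seed) <= sim}
        cs = {c for c in free_c if abs(D[i][c] - seed) <= sim}
        regions.append((sorted(rs), sorted(cs)))
        free_r -= rs
        free_c -= cs
    return regions

# A 2x2 matrix with two clear diagonal alignments yields two regions.
regions = greedy_partition([[2.0, 0.0], [0.0, 2.0]])
```

Each iteration consumes at least the seed's row and column, so the loop terminates after at most min(l, m) regions.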


Example: Apply the Algorithm


Estimate the probabilities for phrase translations

• The decoder needs the conditional probabilities P(f~|e~)

• These cannot be estimated directly: data sparseness

• Instead, convert I'(f~, e~) to P(f~|e~)

• IBM Model 1 style:

  P(f~|e~) = Π_{j=1..m} Σ_{i=0..l} P(f_j|e_i)

• Context-dependent style:

  P(f~, e~) = Σ_{i,j} P(f~, e~, f_i, e_j) ≈ Σ_{i,j} P(f~|f_i) · P(f_i, e_j) · P(e~|e_j)

  where:

  P(f~|f_i) = Π_{i'=1..m} P(f_{i'}|f_i)   and   P(e~|e_j) = Π_{j'=1..n} P(e_{j'}|e_j)
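The IBM Model 1 style estimate is a product over the words of f~ of sums over the words of e~ (plus the NULL word); a direct sketch, assuming a word-level translation table t is already trained:

```python
def phrase_prob(f_phrase, e_phrase, t):
    """P(f~|e~) = prod_j sum_i t(f_j|e_i), with e_0 = NULL (IBM-1 style).
    t maps (f_word, e_word) pairs to word translation probabilities."""
    prob = 1.0
    for f in f_phrase:
        prob *= sum(t.get((f, e), 0.0) for e in ["NULL"] + e_phrase)
    return prob

# Toy translation table (illustrative values only).
t = {("das", "the"): 0.7, ("das", "NULL"): 0.1, ("haus", "house"): 0.8}
p = phrase_prob(["das", "haus"], ["the", "house"], t)  # (0.1+0.7) * 0.8
```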


Outline

• Background

• Phrase Alignment Algorithms in SMT

• Segmentation Approaches

• Integrated Segmentation and Alignment Algorithm (ISA)

• Experiments

• Discussions


Experiments

• Chinese-English

• Small data track

• Evaluation: NIST score against 4 human references

             Sentences    Chinese Words   English Words
  Training   3540 pairs   90 K            115 K
  Testing    993          26 K            -


Results

• Baseline: IBM Model 1 + HMM phrase alignment

• Compared to using ISA alone, and to ISA + Baseline

                  Prec    Length Penalty   Final Score
  Baseline        6.77    1.00             6.77
  ISA             6.97    0.97             6.78
  ISA+Baseline    7.05    0.99             7.06


T-test

• Student's t-test at the sentence level

                               Precision Scores          Final Scores
                               t-value    Confidence     t-value    Confidence
  ISA vs. Baseline             6.6084     99.99%         1.6890     95.00%
  ISA+Baseline vs. Baseline    9.0772     99.99%         4.1007     99.99%
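A sentence-level Student's t-test pairs each test sentence's score under the two systems and tests whether the mean difference is significant; a stdlib-only sketch of the paired t statistic (toy numbers, not the reported scores):

```python
import math

def paired_t(scores_a, scores_b):
    """Paired Student's t statistic over per-sentence score differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Toy per-sentence scores for two systems (illustrative values only).
t_stat = paired_t([2.0, 3.0, 5.0, 4.0], [1.0, 2.5, 3.0, 4.0])
```

The confidence level then comes from the t distribution with n-1 degrees of freedom.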


Compared to IBM1

• Large data track (2.6M English words, 414K Chinese words)

• Using a 20M-word LM

• LDC+IBM: NIST = 6.6235, length penalty = 0.9998

• LDC+IBM+ISA: NIST = 7.4234, length penalty = 0.9915 (Incr. +0.800)

            LDC+IBM                     LDC+IBM+ISA
            #Type   %      Contrib.     #Type   %      Contrib.    Incr.
  1-gram    2425    0.60   5.587        2680    0.66   6.161       +0.574
  2-gram    3601    0.22   0.877        4369    0.27   1.091       +0.214
  3-gram    1807    0.08   0.130        2403    0.11   0.186       +0.056
  4-gram    788     0.03   0.024        1096    0.05   0.036       +0.012
  5-gram    382     0.02   0.007        499     0.02   0.011       +0.004
  Sum                      6.625                       7.486       +0.861


No IBM1 is Better

• Small data track (LDC + IBM1 + ISA)

• ISA is better than IBM1 even on unigram match

  W_IBM1           NIST   1-gram   2-gram   3-gram
  1.00             6.70   5.43     1.07     0.16
  0.50             6.76   5.47     1.08     0.16
  0.20             6.78   5.49     1.09     0.16
  0.02             6.79   5.50     1.09     0.16
  0.00 (no IBM1)   6.81   5.51     1.10     0.16


Summary

• Integrated Alignment and Segmentation

• Simple algorithm

• Enhanced translation quality
  – Better than the IBM models
  – Higher quality than HMM alignment

• A major component in the CMU SMT system


ISA Toolkit

• Location:
  – /afs/cs.cmu.edu/user/joy/Release/PhraseAlign

• Documentation:
  – /afs/cs.cmu.edu/user/joy/Release/PhraseAlign/documentation/readme.txt

• Speed:
  – Example: 4172 sentence pairs (133K English words, 20K Chinese words)
  – About 160 seconds for the alignment (10 loops per sentence pair)


Selected References

• Franz Josef Och, Christoph Tillmann, Hermann Ney, "Improved Alignment Models for Statistical Machine Translation," Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 20-28, University of Maryland, College Park, MD, June 1999.

• Stephan Vogel, Hermann Ney, Christoph Tillmann, "HMM-based Word Alignment in Statistical Translation," Proceedings of COLING '96: The 16th International Conference on Computational Linguistics, pp. 836-841, Copenhagen, August 1996.

• Stephan Vogel, Ying Zhang, Fei Huang, Alicia Tribble, Ashish Venugopal, Bing Zhao, Alex Waibel, "The CMU Statistical Translation System," to appear in the Proceedings of MT Summit IX, New Orleans, LA, USA, September 2003.

• Ying Zhang, Ralf D. Brown, Robert E. Frederking, Alon Lavie, "Pre-processing of Bilingual Corpora for Mandarin-English EBMT," Proceedings of MT Summit VIII, Santiago de Compostela, Spain, September 2001.

• Ying Zhang, Stephan Vogel, Alex Waibel, "Integrated Phrase Segmentation and Alignment Algorithm for Statistical Machine Translation," Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE'03), Beijing, China, October 2003.