Dynamically shaping the reordering search space of phrase-based SMT

Arianna Bisazza & Marcello Federico

Phrase-based SMT

•  No sentence structure, can only model local dependencies

•  Compared with tree-based SMT: smaller models, faster decoding, very competitive for translating between similar languages

•  Most popular framework in SMT production scenarios today

Bisazza & Federico – Dynamically shaping the reordering search space of PSMT

•  Problem: doesn't handle long-range reordering well!

•  Goal of this work: dynamically shape the space of reorderings explored during search

•  Better translation and faster decoding with loose reordering constraints

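Phrase-based SMT

[Figure: source words (SRC: wordS1 ... wordS7) and target words (TRG: wordT1 wordT2 wordT3 wordT4 ...), annotated with LM scores and distortion (Disto.) scores]

Weighted combination of model scores:

$$\alpha_{TM} \log P_{TM\text{-}d}(f|e) + \alpha_{TM\text{-}i} \log P_{TM\text{-}i}(e|f) + \alpha_{LM} \log P_{LM}(e) + \alpha_{RM} \log P_{RM}(f_{t-1}, f_t) + \dots$$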


Reordering search space


•  Searching over all permutations is NP-hard

•  Hard reordering constraints applied on word-to-word jumps

Word-to-word jump distances (row: previous source word, column: next source word; "-" marks the word itself):

       w0  w1  w2  w3  w4  w5  w6  w7  w8  w9
<s>     0   1   2   3   4   5   6   7   8   9
w0      -   0   1   2   3   4   5   6   7   8
w1      2   -   0   1   2   3   4   5   6   7
w2      3   2   -   0   1   2   3   4   5   6
w3      4   3   2   -   0   1   2   3   4   5
w4      5   4   3   2   -   0   1   2   3   4
w5      6   5   4   3   2   -   0   1   2   3
w6      7   6   5   4   3   2   -   0   1   2
w7      8   7   6   5   4   3   2   -   0   1
w8      9   8   7   6   5   4   3   2   -   0
w9     10   9   8   7   6   5   4   3   2   -

Linear distortion limit (DL), shown in the figure for DL=3
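The jump distances in the matrix above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are illustrative, the formula |next - (prev + 1)| is read off the matrix values, and the check ignores which words have already been translated, which a real decoder would also track.

```python
def jump_distance(prev, nxt):
    """Distance of translating source position `nxt` right after `prev`.

    `prev` is -1 for the sentence start <s>; a monotone step
    (nxt == prev + 1) has distance 0.
    """
    return abs(nxt - (prev + 1))


def distance_matrix(n):
    """Rows: <s>, w0..w(n-1); columns: w0..w(n-1). None marks a word following itself."""
    return [
        [None if nxt == prev else jump_distance(prev, nxt) for nxt in range(n)]
        for prev in range(-1, n)
    ]


def allowed_jumps(prev, n, dl):
    """Source positions reachable from `prev` under a linear distortion limit `dl`."""
    return [j for j in range(n) if j != prev and jump_distance(prev, j) <= dl]


matrix = distance_matrix(10)
print(matrix[0])                 # row <s>
print(matrix[2])                 # row w1
print(allowed_jumps(4, 10, 3))   # positions reachable after w4 with DL=3
```

With DL=3, only a narrow band around the monotone continuation stays reachable from each word, which is the band the slide highlights.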

The problem with DL

[Figure: word-to-word jump matrices over w0 ... w10 for Arabic-English (AR, EN) and German-English (DE, EN) examples]

The problem with DL


SRC   verb subj. obj. compl.
      ywASl sfyr Almmlkp AlErbyp AlsEwdyp ldY lbnAn EbdAlEzyz xwjp tHrk -h fy AtjAh ...
      continues ambassador Kingdom Arabian Saudi to Lebanon Abdulaziz Khawja move his in direction
REF   The Kingdom of Saudi Arabia ’s ambassador to Lebanon Abdulaziz Khawja continues his moves towards ...
BASE  continue to Saudi Arabian ambassador to Lebanon , Abdulaziz Khwja its move in the direction of ...
NEW   The Kingdom of Saudi Arabia ’s ambassador to Lebanon , Abdulaziz Khwja continue its move in the direction of ...

SRC   adv. verb obj. subj. compl.
      fymA dEA -hm r}ys Almktb AlsyAsy l- Hrkp HmAs xAld m$El AlY AltzAm AlHyAd
      meanwhile called them head bureau political of movement Hamas Khaled Mashal to necessity neutrality
REF   Meanwhile, the Head of the Political Bureau of the Hamas movement, Khaled Mashal, called upon them to remain neutral
BASE  The called them, head of Hamas’ political bureau, Khalid Mashal, to remain neutral
NEW   The head of Hamas’ political bureau, Khalid Mashal, called on them to remain neutral

Figure 3: Long reordering examples showing improvements over the baseline system (BASE) when the DL is raised to 18 and early pruning based on WaW reordering scores is enabled (NEW).

Long jumps statistics and examples. To better understand the behavior of the early-pruning system, we extract phrase-to-phrase jump statistics from the decoder log file. We find that 132 jumps beyond the non-prunable zone (D>5) were performed to translate the 586 sentences of eval09-nw; 38 out of these were longer than 8 and mostly concentrated on the VS-sentence subset (27 jumps D>8 performed in vs-09) [13]. This and the higher reordering scores suggest that long jumps are mainly carried out to correctly reorder clause-initial verbs over long subjects.

Fig. 3 shows two Arabic sentences taken from eval09-nw that were erroneously reordered by the baseline system. The system including the WaW model and early reordering pruning, instead, produced the correct translation. The first sentence is a typical example of VSO order with a long subject: while the baseline system left the verb in its Arabic position, producing an incomprehensible translation, the new system placed it rightly between the English subject and object. This reordering involved two long jumps: one with D=9 backward and one with D=8 forward.

The second sentence displays another, less common, Arabic construction: namely VOS, with a personal pronoun object. In this case, a backward jump with D=10 and a forward jump with D=8 were necessary to achieve the correct reordering.

[13] Statistics computed on the medium-LM system.

6 Conclusions

We have trained a discriminative model to predict likely reordering steps in a way that is complementary to state-of-the-art PSMT reordering models. We have effectively integrated it into a PSMT decoder as an additional feature, ensuring that its total score over a complete translation hypothesis is consistent across different phrase segmentations. Lastly, we have proposed early reordering pruning as a novel method to dynamically shape the input reordering space and capture long-range reordering phenomena that are often critical when translating between languages with different syntactic structures.

Evaluated on a popular Arabic-English news translation task against a strong baseline, our approach leads to similar or even higher BLEU, METEOR and KRS scores at a very high distortion limit (18), which is by itself an important achievement. At the same time, the reordering of verbs, measured with a novel version of the KRS, is consistently improved, while decoding gets significantly faster. The improvements are also confirmed when a very large LM is used and the decoder’s beam size is doubled, which shows that our method reduces not only search errors but also model errors even when baseline models are very strong.

Word reordering is probably the most difficult aspect of SMT and an important factor of both its quality and efficiency. Given its strong interaction with the other aspects of SMT, it appears natural to solve




•  Current solution: increase distortion limit

DL = 3 → 3,000 word permutations
DL = 7 → 1,246,000 word permutations

Coarse definition of reordering space:
→ slower decoding
→ worse translations

Word-after-Word reordering model


Word-after-Word model

… w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b
… and dozens of militants from the brigades took part in the march

Labels shown in the figure: yes no no no no no no no

•  Predict whether input word j should be translated right after input word i

•  Maximum-entropy binary classifier

•  Features of i, j, their context and words between i and j

Feature examples:

•  wi=“w-” and wj=“E$rAt”

•  pi=conj and pj=nns

•  ball=“$Ark fy AltZAhrp”

•  b*=“$Ark”
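As a rough illustration of how such pair features could feed a binary maximum-entropy (logistic) classifier, here is a minimal Python sketch. The function names, the feature string format and the weights are all hypothetical. Only the tags of "w-" (conj) and "E$rAt" (nns) come from the slide, and the b* feature is left out because its definition is not given here.

```python
import math


def waw_features(words, tags, i, j):
    """Indicator features for the question: should word j be translated right after word i?"""
    lo, hi = sorted((i, j))
    between = words[lo + 1:hi]          # all words strictly between i and j
    return {
        f"wi={words[i]}|wj={words[j]}",
        f"pi={tags[i]}|pj={tags[j]}",
        "ball=" + " ".join(between),
    }


def waw_probability(features, weights, bias=0.0):
    """Logistic probability of 'yes' given the active features and their weights."""
    z = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


words = ["w-", "$Ark", "fy", "AltZAhrp", "E$rAt", "AlmslHyn", "mn", "AlktA}b"]
# "x" is a placeholder tag; the slide only gives the tags of w- and E$rAt.
tags = ["conj", "x", "x", "x", "nns", "x", "x", "x"]

feats = waw_features(words, tags, 0, 4)
# Hypothetical weights; a trained model would learn these from word-aligned data.
weights = {"wi=w-|wj=E$rAt": 1.2, "pi=conj|pj=nns": 0.4}

print(sorted(feats))
print(waw_probability(feats, weights))
```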

Decoder integration


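Additional feature function (usual approach):

$$\alpha_{TM} \log P_{TM\text{-}d}(f|e) + \alpha_{TM\text{-}i} \log P_{TM\text{-}i}(e|f) + \alpha_{LM} \log P_{LM}(e) + \alpha_{RM} \log P_{RM}(f_{t-1}, f_t) + \alpha_{WaW} \log P_{WaW}(w_{t-1}, w_t) + \dots$$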

+ Dynamically prune the reordering search space: ‘early reordering pruning’ (novel approach)
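The "usual approach" of adding the WaW model as one more weighted feature can be sketched as follows. The weights and probabilities are made up for illustration, and `loglinear_score` is a hypothetical helper rather than part of any decoder.

```python
import math


def loglinear_score(log_probs, weights):
    """Weighted sum of feature log-probabilities for one translation hypothesis."""
    return sum(weights[name] * value for name, value in log_probs.items())


# Hypothetical feature weights (the alphas on the slide).
weights = {"TM-d": 0.2, "TM-i": 0.2, "LM": 0.3, "RM": 0.1, "WaW": 0.2}

# Two hypothetical hypotheses that differ only in their WaW probability.
base = {
    "TM-d": math.log(0.4),
    "TM-i": math.log(0.5),
    "LM": math.log(0.1),
    "RM": math.log(0.3),
}
hyp_a = dict(base, WaW=math.log(0.7))   # reordering step the WaW model likes
hyp_b = dict(base, WaW=math.log(0.2))   # reordering step the WaW model dislikes

score_a = loglinear_score(hyp_a, weights)
score_b = loglinear_score(hyp_b, weights)
print(score_a > score_b)
```

Used this way, the WaW model only re-ranks hypotheses that the search has already built. Early reordering pruning instead uses the same scores to decide which jumps are explored in the first place.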


Early reordering pruning



Standard search: explore all jumps within fixed DL, then score with all models (figure: DL=6)

Our method: only explore long reorderings that are likely according to the reordering model

DL=6, WaW scores: 0.2 0.2 0.4 0.6 0.6 0.2 0.7 0.4

Histogram and threshold pruning based on WaW score

DL=6, WaW scores after pruning: 0.6 0.6 0.7

“Safe zone” always explored (ϑ=2)

DL=6, WaW scores shown: 0.2 0.4 0.6 0.6 0.7, 0.2

[Figure: matrix of WaW scores for pairs of source words, divided into three zones: off limits, prunable zone, non-prunable zone]
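The three zones can be sketched in Python as below. This is an illustration under stated assumptions, not the decoder's actual code: the function name and tuple layout are hypothetical, the threshold is treated as an absolute cut-off on the WaW score, and the assignment of the slide's eight scores to jump distances 0 to 7 is assumed for the example.

```python
def early_reordering_pruning(candidates, dl, theta, histogram, threshold):
    """Choose which source positions to explore next.

    candidates: list of (position, distance, waw_score) tuples.
    - distance <= theta: non-prunable ("safe") zone, always explored
    - distance > dl: off limits, never explored
    - otherwise: prunable zone, explored only if the score reaches
      `threshold` and ranks among the `histogram` best
    """
    safe = [c for c in candidates if c[1] <= theta]
    prunable = [c for c in candidates if theta < c[1] <= dl and c[2] >= threshold]
    prunable.sort(key=lambda c: c[2], reverse=True)
    kept = safe + prunable[:histogram]
    return sorted(c[0] for c in kept)


# Assumed layout: the last translated word is w1, so w2 is the monotone
# continuation (distance 0) and w9 is 7 positions away.
scores = [0.2, 0.2, 0.4, 0.6, 0.6, 0.2, 0.7, 0.4]
candidates = [(pos, pos - 2, s) for pos, s in zip(range(2, 10), scores)]

print(early_reordering_pruning(candidates, dl=6, theta=2, histogram=3, threshold=0.5))
print(early_reordering_pruning(candidates, dl=6, theta=2, histogram=1, threshold=0.5))
```

Under these assumptions, the low-scoring long jumps are dropped before any other model scores them, while short jumps inside the safe zone are kept regardless of their WaW score.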


Experiments


Experimental setup

•  NIST-MT09 Arabic-English newswire (eval09)

•  Hierarchical lexicalized reordering models [Galley & Manning 08]

•  Early distortion cost [Moore & Quirk 07]

•  Evaluation by:
     BLEU for lexical match & local order
     KRS (Kendall Reordering Score) for global order [Birch & al. 10]

•  Two testing conditions:
     medium-scale LM, stack size 200
     large-scale LM, stack size 400
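To give an intuition for a Kendall-based measure of global word order, here is a minimal sketch that scores a hypothesis ordering against a reference ordering by the share of concordant pairs. It is a plain Kendall tau similarity written for illustration; the exact KRS definition in [Birch & al. 10] is not given on the slides and may differ, for example in normalisation or length penalties.

```python
from itertools import combinations


def kendall_similarity(hyp_order, ref_order):
    """1 minus the fraction of item pairs ordered differently in hyp and ref.

    Both arguments are orderings of the same items (e.g. source word
    positions in the order they are translated). Returns 1.0 for identical
    orderings and 0.0 for fully reversed ones.
    """
    ref_rank = {item: rank for rank, item in enumerate(ref_order)}
    ranks = [ref_rank[item] for item in hyp_order]
    n = len(ranks)
    if n < 2:
        return 1.0
    discordant = sum(1 for a, b in combinations(range(n), 2) if ranks[a] > ranks[b])
    return 1.0 - discordant / (n * (n - 1) / 2)


reference = [0, 1, 2, 3]
print(kendall_similarity([0, 1, 2, 3], reference))  # identical order
print(kendall_similarity([1, 2, 3, 0], reference))  # first word moved to the end
print(kendall_similarity([3, 2, 1, 0], reference))  # reversed order
```

A single long-range misplacement, such as a clause-initial verb left in place, affects many pairs at once, which is why a pair-based score is sensitive to global order.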


Results (medium-scale)

Early reo. pruning:
-  histogram: 3
-  threshold: 0.1
-  non-prunable zone of width ϑ=5

Translation Quality

[Figure: KRS vs. BLEU plot for three systems: base,DL8; base,DL18; +WaW,DL18 +reoPrune]

+0.6 BLEU, +1.0 KRS

Decoding Time (ms/word)

base,DL8                 87
base,DL18               164
+WaW,DL18 +reoPrune      68


Results (large-scale)

Translation Quality

[Figure: KRS vs. BLEU plot for three systems: base,DL8; base,DL18; +WaW,DL18 +reoPrune]

+1.2 BLEU, +1.6 KRS

Decoding Time (ms/word)

base,DL8               2579
base,DL18              5462
+WaW,DL18 +reoPrune    1588

More metrics & language pairs in [Bisazza 2013]


Example


[Figure: Arabic-English long reordering examples comparing the REF, BASE and NEW outputs (Figure 3)]

Conclusions


•  Phrase-based SMT remains a strong baseline in many language pairs, but typically at the expense of long-reordering phenomena

•  We presented a method to capture long-range reordering in phrase-based SMT without sacrificing efficiency

•  Results: better reordering and translation quality in a large-scale Arabic-English translation system

•  Can be seen as a mix of pre-ordering and decoding-time reordering approaches

•  Same idea can be applied to other reordering models!

Thanks for your attention!
