Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Yuval Marton and Philip Resnik Department of Linguistics and the Laboratory for Computational Linguistics and Information Processing (CLIP) at the Institute for Advanced Computer Studies (UMIACS) University of Maryland, College Park, MD 20742-7505, USA {ymarton, resnik} @t umiacs.umd.edu ACL’08, Columbus, Ohio, June 2008



Page 2

Why Has Source-side Syntax Not Helped SMT as Much as Target-side Syntax?

• Most previous work: Syntactic representations → data-driven patterns
• Chiang 05, ours: Data-driven patterns → syntactic constraints
• Why the failure in the latter direction?
– Noisy / inaccurate parsing info?
– Too coarse a usage of syntax info?
• We argue the latter: rule granularity and constraint conditions are key
• We show that adding (soft) syntactic constraints to data-driven patterns yields substantial improvements.

[Figure: prior work placed on two axes. Horizontal axis: Using Source-side Parses -- Using Both -- Using Target-side Parses. Vertical axis: Relaxing Syntax -- Adding Syntax. Works shown: Cowan et al. 2006; Zollmann and Venugopal 2006; DeNeefe et al. 2007; Marcu et al. 2006; Galley et al. 2006; Chiang 2005; Marton and Resnik 2008; Riezler and Maxwell 2006.]

Page 3

Outline

• Background
• Hiero
– Soft Syntactic Constraints
• Adding Syntax
– Rule Granularity
– Constraint Conditions
• Experiments
• Conclusions + Future Work

Page 4

Knowledge and Constraints

• Rule-based vs. Data-driven
• Formal vs. linguistic syntax (Chiang 2005)
– Formal Syntax (e.g., Synchronous CFG)
– Linguistic Syntax (parses)
– Doubly dissociated: formal-only, linguistic-only, none, both (current trend)
• Hard vs. Soft Constraints
– Hard constraint: limit possible space (only allow rules compatible with constraint)
– Soft constraint: skew space towards constraint (but clear patterns in data ‘win’ even if incompatible with constraint)
– Soft syntactic constraint: boost weight of data-driven rules that are compatible with parsing info.

[Figure: grid of Formal +/- by Linguistic +/-, with cells F+ L+, F- L+, F+ L-, F- L-, and labels Univ.Hard and Univ.Soft.]
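The hard vs. soft distinction above can be illustrated with a minimal sketch. It is not taken from the talk: the candidate records, the `data_score` field, and the `boost` value are all hypothetical, chosen only to show that a hard constraint filters the space while a soft constraint merely shifts scores.

```python
def rank_hard(candidates):
    """Hard constraint: discard every candidate incompatible with the parse."""
    allowed = [c for c in candidates if c["compatible"]]
    return sorted(allowed, key=lambda c: c["data_score"], reverse=True)


def rank_soft(candidates, boost):
    """Soft constraint: keep all candidates, add a bonus to compatible ones."""
    def score(c):
        return c["data_score"] + (boost if c["compatible"] else 0.0)
    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates: A has strong support in the data but is
# incompatible with the parse; B and C are compatible but weaker.
candidates = [
    {"name": "A", "data_score": 3.0, "compatible": False},
    {"name": "B", "data_score": 1.5, "compatible": True},
    {"name": "C", "data_score": 1.0, "compatible": True},
]

hard_order = [c["name"] for c in rank_hard(candidates)]
soft_order = [c["name"] for c in rank_soft(candidates, boost=1.0)]
strong_boost_order = [c["name"] for c in rank_soft(candidates, boost=2.0)]
```

With a small boost the clear data-driven pattern A still wins; only a larger boost lets a compatible candidate overtake it, and A is never removed from the space.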

Page 5

Hiero

• Chiang 2005, 2007
• Weighted synchronous CFG
– Unnamed non-terminals: X → <e, f>, e.g., X → <今年 X1, X1 this year>
• Translation model features: e.g., log p(e|f)
• Log-linear model: + rule penalty feature, “glue” rules

[Figure: example phrase pairs: 的竞选 / Election; 投票 在初选 / voted in the primaries]
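As a rough illustration of how a log-linear model scores a rule, the sketch below computes a weighted sum of log-domain feature values, which is the logarithm of a product of features raised to their weights. The feature names, values, and weights are hypothetical and are not the ones used in Hiero or in the experiments.

```python
import math


def rule_score(features, weights):
    """Log-linear score: weighted sum of (log-domain) feature values."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights; in practice they are tuned on a development set.
weights = {"log_p_e_given_f": 1.0, "log_p_f_given_e": 0.5, "rule_penalty": -0.2}

# Hypothetical feature values for a single rule.
features = {
    "log_p_e_given_f": math.log(0.5),
    "log_p_f_given_e": math.log(0.25),
    "rule_penalty": 1.0,
}

score = rule_score(features, weights)
```

Adding a new soft syntactic feature in this view means adding one more entry to `features` and one more tunable entry to `weights`.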

Page 6

Soft Syntactic Constraints

• Chiang’s 2005 constituency feature
– Boost rule if source-side matches a constituent span
– Constituency-incompatible emergent patterns can still ‘win’ (in spite of no boost)
– Good idea, but a negative result
• But what if…

Page 7

Rule granularity

• Chiang: Single weight for all constituents (parse tags)
• … But what if we can assign a separate feature and weight for each constituent?
• E.g., NP-only: (NP=)
• Or VP-only: (VP=)

Page 8

Constraint Conditions

• VP-only, revisited:
– We saw VP-match (VP=)
– We can also incur a cost for crossing constituent boundaries: VP-cross (VP+)
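One way to make the match and cross conditions concrete is sketched below. The representation is an assumption made for illustration: constituents are `(label, start, end)` triples over half-open token spans, "match" means the rule's source span equals a constituent span, and "cross" means the span partially overlaps a constituent without either containing the other. The talk does not spell out these definitions at this level of detail.

```python
def is_match(span, constituents, label):
    """Return True if `span` coincides exactly with a `label` constituent."""
    return any(lab == label and (start, end) == span
               for lab, start, end in constituents)


def is_cross(span, constituents, label):
    """Return True if `span` partially overlaps some `label` constituent."""
    i, j = span
    for lab, start, end in constituents:
        if lab != label:
            continue
        overlaps = i < end and start < j
        span_inside = start <= i and j <= end
        constituent_inside = i <= start and end <= j
        if overlaps and not span_inside and not constituent_inside:
            return True
    return False


# Toy parse of a 5-token source sentence (made-up spans, for illustration only).
parse = [("IP", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]

vp_match = is_match((2, 5), parse, "VP")   # span is exactly the VP
vp_cross = is_cross((1, 3), parse, "VP")   # span straddles the VP's left edge
vp_inside = is_cross((3, 5), parse, "VP")  # span is nested inside the VP
```

Under these assumptions a span nested inside the VP neither matches nor crosses it, so neither VP= nor VP+ would fire for that span.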

Page 9

Feature Space

• {NP, VP, IP, CP, …} x {=,+}
• Basic translation models:
– For each feature, add (only it) to default feature set, assigning it a separate weight.
• Feature “combo” translation models:
– NP2 (double feature): add both NP+ and NP= with separate weights each
– NP_ (conflated feature) ties weights of NP+, NP=
– XP+, XP=, XP2, XP_: conflate all “standard” X-bar Theory XP constituents (max projections) in each condition
– All-labels+ (Chiang’s), All-labels=, All-labels_, All-labels2
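The naming scheme above can be sketched as follows. The helper `feature_values` and its count arguments are hypothetical; the sketch only shows how the basic, double ("2"), and conflated ("_") variants could differ in which features receive their own weight. Tying the weights of NP+ and NP= is modeled here as a single feature whose value is the sum of both counts, since w*a + w*b = w*(a + b).

```python
from itertools import product

# Only the labels listed explicitly on the slide; the elided ones are omitted.
LABELS = ["NP", "VP", "IP", "CP"]
CONDITIONS = ["=", "+"]

# Basic feature space: one candidate feature per (label, condition) pair.
basic_features = [label + cond for label, cond in product(LABELS, CONDITIONS)]


def feature_values(label, variant, match_count, cross_count):
    """Map a model variant to the feature values it contributes for one label."""
    if variant == "=":
        return {label + "=": match_count}
    if variant == "+":
        return {label + "+": cross_count}
    if variant == "2":  # double feature: two features, two separate weights
        return {label + "=": match_count, label + "+": cross_count}
    if variant == "_":  # conflated feature: one shared (tied) weight
        return {label + "_": match_count + cross_count}
    raise ValueError("unknown variant: " + variant)


np_double = feature_values("NP", "2", match_count=2, cross_count=1)
np_conflated = feature_values("NP", "_", match_count=2, cross_count=1)
```

The double variant gives the tuner two degrees of freedom per label, while the conflated variant gives it one.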

Page 10

Settings

• Hiero default feature set for baseline
– Chinese baseline also included a rule-based number translation feature (Chiang 2007)
• LM: SRI Language Modeling Toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing (Chen & Goodman, 1998).
• Word-level alignments: GIZA++ (Och & Ney, 2000).
• Source-side parses:
– Chinese: Huang et al. (2008)
– Arabic: Stanford Parser v.2007-08-19 (Klein & Manning 2003)
• Optimized using MERT (Och 2003)
– with BLEU (Papineni et al. 2002)
– and the NIST-implemented “shortest” effective ref. length.
– Dev set: Chinese NIST MT03; Arabic NIST MT02.
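For readers unfamiliar with the "shortest" effective reference length, the sketch below shows how it would enter BLEU's brevity penalty for a single segment, assuming it means taking the shortest of the available references. The function names are hypothetical, whitespace tokenization is a simplification, and this is not the NIST scoring script itself, which accumulates lengths over the whole test set.

```python
import math


def shortest_ref_length(references):
    """Length (in whitespace tokens) of the shortest reference translation."""
    return min(len(ref.split()) for ref in references)


def brevity_penalty(candidate_len, ref_len):
    """BLEU brevity penalty: 1 if the candidate is long enough, else exp(1 - r/c)."""
    if candidate_len == 0:
        return 0.0
    if candidate_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / candidate_len)


# Made-up references for one segment.
references = ["a b c d", "a b c d e f"]
r = shortest_ref_length(references)
bp_equal = brevity_penalty(4, r)
bp_short = brevity_penalty(2, r)
```

Choosing the shortest reference makes the penalty more lenient toward short output than choosing the closest or average reference length would.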

Page 11

Chinese-English

• Replicated Chiang 2005 (negative result)
• NP=, QP+, VP+ up to 0.74 BLEU points better.
• NP_, VP2, IP2, all-labels_, XP+ up to 1.65 BLEU points better.
• Validated on the NIST MT08 test set

Page 12

Arabic-English

• New result for Chiang’s feature (MT06, MT08)
• PP+, AdvP= up to 1.40 BLEU points better than Chiang’s and baseline.
• AP2, AdvP2 up to 1.94 BLEU points better.
• Validated on the NIST MT08 test set

Page 13

PP+ Example: Arabic MT06

Source: ... (PP (IN ب) (NP (NP (NN تعىىن) (NP (NN مندوب) (NP (NNP سورىا) (NNP لدى)))) (DT ال) (NP (NN امم) (NP (NN ال) (JJ متحدة))))))) …

Gloss: …(PP (IN in) (NP (NP (NN appointment) (NP (NN representative) (NP (NNP syria) (NNP to)))) (DT the) (NP (NN nations) (NP (NN the) (JJ united))))))) …

Reference: [the third decree ordered] the appointment of the syrian representative to the united nations …

Baseline: … to appoint syria to the united nations representative …

PP+: … to appoint a representative of syria to the united nations …

Page 14

Discussion

• Direct contribution
– Feature better translates related phrases (not shown here)
• Indirect contribution
– Translation of other parts can be (and is) influenced (to appoint a representative of syria to the united nations vs. to appoint syria to the united nations representative)
• Feature contributions are not additive
– In fact, some combos do worse than each feature alone.
• Within-language consistency across test sets
• Across-language variation, but IP & VP do well.

Page 15

Conclusion: Our Approach

• Data-driven approach (Stat MT)
• Using Formal syntax (SCFG)
• While adding Soft constraints (weights)
• of linguistic syntax (rule-based parses)
• With fine-grained constituent features and constraint conditions

Page 16

Main Contributions

• First time to achieve improvement using (soft) syntax info in Hiero
• Previous (Chiang 2005) negative result: not (or not only) due to noisy parses
– Finer syntactic rule resolution helps (NP, VP, …)
– Finer (soft) constraint conditions help (NP=, NP+, VP=, VP+, …)
– Selective application: parse labels that are not “standard” XP constituent labels seem to be more noisy than helpful
– Feature combos’ contributions are not additive (they can cancel each other out)
• Inter-language variation, but IP and VP generally do well cross-linguistically.
– Within-language consistency (across test sets)

Page 17

Future Work

• Why do feature combos’ contributions sometimes cancel each other out?
– We found no simple correlation between finer-grained feature scores (and/or boundary condition) and combination or conflation scores.
• Why did no NP variant yield much gain in Arabic?
• Exploit other forms of soft constraints

Page 18

Thanks

• This work was supported in part by DARPA prime agreement HR0011-06-2-0001.
• Thanks to David Chiang and Adam Lopez for making their source code available;
• Thanks to the Stanford Parser team and Mary Harper for making their parsers available;
• Thanks to David Chiang, Amy Weinberg, and CLIP Laboratory colleagues, particularly Adam Lopez, Chris Dyer, and Smaranda Muresan, for discussion and invaluable assistance.

Page 19

Hiero Default Feature Set and the “Standard” XP Label Set

• Hiero default feature set:
– LM, p(e|f), p(f|e), plex(e|f), plex(f|e), rule (phrase) penalty and glue rule feature weights.
– Chinese-only: number translation feature
• “Standard” linguistic labels: {CP, IP, NP, VP, PP, ADJP, ADVP, QP, LCP, DNP}
– Excluding non-maximal projection labels such as VV, NNP, etc.
– Excluding labels such as PRN (parentheses), FRAG (fragment), etc.
• XP= : disjunction of {CP=, IP=, …, DNP=}
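The XP= disjunction can be sketched as a single check over the "standard" label set. As before, the `(label, start, end)` representation of constituents and the helper name `xp_match` are assumptions made for illustration, not part of the original system.

```python
# The "standard" label set as listed on the slide.
STANDARD_XP = ("CP", "IP", "NP", "VP", "PP", "ADJP", "ADVP", "QP", "LCP", "DNP")


def xp_match(span, constituents, labels=STANDARD_XP):
    """XP= fires if the span exactly matches a constituent with any standard label."""
    return any(lab in labels and (start, end) == span
               for lab, start, end in constituents)


# Toy parse (made-up spans): PRN is not a standard XP label, NP is.
parse = [("PRN", 0, 2), ("NP", 2, 4)]

fires_on_np = xp_match((2, 4), parse)
fires_on_prn = xp_match((0, 2), parse)
```

Passing the full set of parse labels as `labels` instead would correspond to the All-labels= variant under the same assumptions.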

Page 20

Training sets + full results

Page 21

AdvP2 Example: Arabic MT06

Reference: since the age of cassettes started , it is almost impossible to find a house free of cassettes ,

Baseline: after entering the cassette now seems [missing almost] impossible that homes [missing free] of these films [not so good]

AdvP2: after entering the cassette now almost impossible [correct] that free of [correct] the house tapes [good/better]

Page 22

PP+.AdvP= Example: Arabic MT06

Reference: … in statements made to the iranian students news agency , " we cannot expect anything else from people who have a zionist past "

Baseline: … " we cannot expect other [-wise from] people who are moving a zionist . "

PP+.AdvP=: … " we cannot expect other [-wise from] people who for them [=who have], was [=past] a zionist . "

Page 23

PP+ Example: Arabic MT06

Reference: [the third decree ordered] the appointment of the syrian representative to the united nations …

Baseline: … to appoint syria to the united nations representative [bad word order] …

PP+: … to appoint a representative of syria to the united nations …

Reference: … outside a bank in southeast baghdad .

Baseline: … in front of the bank , [misplaced comma] southeast of baghdad .

PP+: … in front of the bank of south east of baghdad .

Note: this second example may be misleading, as it is not a good representative of the feature’s contribution.