Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Yuval Marton and Philip Resnik Department of Linguistics and the Laboratory for Computational Linguistics and Information Processing (CLIP) at the Institute for Advanced Computer Studies (UMIACS) University of Maryland, College Park, MD 20742-7505, USA {ymarton, resnik} @t umiacs.umd.edu ACL’08, Columbus, Ohio, June 2008


2

Why Has Source-side Syntax Not Helped SMT as Much as Target-side Syntax?

• Most previous work: Syntactic representations → data-driven patterns

• Chiang 05, ours: Data-driven patterns → syntactic constraints

• Why the failure in the latter direction?

– Noisy / inaccurate parsing info?

– Too coarse a usage of syntax info?

• We argue the latter: rule granularity and constraint conditions are key

• We show that adding (soft) syntactic constraints to data-driven patterns yields substantial improvements.

[Figure: related work arranged along two axes. Horizontal axis: Using Source-side Parses -- Using Both -- Using Target-side Parses. Vertical axis: Relaxing Syntax -- Adding Syntax. Works plotted: Cowan et al. 2006; Zollmann and Venugopal 2006; DeNeefe et al. 2007; Marcu et al. 2006; Galley et al. 2006; Chiang 2005; Marton and Resnik 2008; Riezler and Maxwell 2006; Mi, Huang, and Liu 2008; Zhang, Jiang, Aw, Li, Lim, Tan, Li 2008; Colin Cherry 2008.]

3

Outline

• Background

• Hiero

– Soft Syntactic Constraints

• Adding Syntax

– Rule Granularity

– Constraint Conditions

• Experiments

• Conclusions + Future Work

4

Knowledge and Constraints

• Syntactic-tree-based vs. Data-driven

• Formal vs. linguistic syntax (Chiang 2005)

– Formal Syntax (e.g., Synchronous CFG)

– Linguistic Syntax (parses)

• Hard vs. Soft Constraints

– Hard constraint: limit possible space (only allow rules compatible with constraint)

– Soft constraint: skew space towards constraint (but clear patterns in data ‘win’ even if incompatible with constraint)

– Soft syntactic constraint: reward weight of data-driven rules that are compatible with parsing info.

Univ.Hard

Univ.Soft
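
The contrast between hard and soft constraints on this slide can be illustrated with a minimal sketch. The two rules, their log-probabilities, and the bonus weight below are made-up values for illustration, not numbers from the talk.

```python
# Hypothetical rules: (name, log-probability from data, compatible with the parse?)
RULES = [
    ("frequent_but_incompatible", -0.5, False),
    ("rare_but_compatible", -2.0, True),
]

def best_under_hard_constraint(rules):
    """Hard constraint: incompatible rules are removed from the search space."""
    allowed = [r for r in rules if r[2]]
    return max(allowed, key=lambda r: r[1])[0]

def best_under_soft_constraint(rules, weight):
    """Soft constraint: every rule stays; compatible rules get a weighted bonus."""
    scored = [(name, logprob + (weight if ok else 0.0)) for name, logprob, ok in rules]
    return max(scored, key=lambda r: r[1])[0]

hard_choice = best_under_hard_constraint(RULES)
soft_choice_small = best_under_soft_constraint(RULES, weight=0.3)
soft_choice_large = best_under_soft_constraint(RULES, weight=2.0)
print(hard_choice, soft_choice_small, soft_choice_large)
```

With a small bonus the strong data-driven pattern still wins even though it is incompatible with the parse; only a large enough weight lets the compatible rule overtake it.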

5

Hiero

• Chiang 2005, 2007

• Weighted synchronous CFG

– Unnamed non-terminals: X → <e, f>

e.g., X → <今年 X1, X1 this year>

• Translation model features: e.g., log p(e|f)

• Log-linear model: + rule penalty feature, “glue” rules

[Figure: example aligned phrases, 的竞选 / Election and 投票 在初选 / voted in the primaries]
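
The log-linear model equation did not survive in the transcript. As a hedged reminder, the standard form of the rule weight in Chiang (2005) is sketched below; the symbols r, φ_i, and λ_i are notation assumed here rather than taken from the slide, and the full model additionally includes language model and length terms.

```latex
% r = X \rightarrow \langle e, f \rangle is a single synchronous rule,
% \phi_i are its feature values (e.g. p(e|f)) and \lambda_i the feature weights.
w(r) = \prod_i \phi_i(r)^{\lambda_i}
\qquad\Longleftrightarrow\qquad
\log w(r) = \sum_i \lambda_i \log \phi_i(r)
```

Under this form, a soft syntactic constraint is simply one more feature φ_i with its own tuned weight λ_i.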

6

Soft Syntactic Constraints

• Chiang’s 2005 constituency feature

– Reward rule’s score if rule’s source-side matches a constituent span

– Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward)

– Good idea, but a negative result

• But what if…
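
A minimal sketch of a constituency feature of this kind follows. The nested-tuple tree format, the half-open (start, end) span convention, and the English toy sentence are assumptions made for illustration; the actual Hiero implementation is not shown in the slides.

```python
def collect_spans(tree, start=0, spans=None):
    """Walk a nested-tuple parse (label, child, ...) and record each constituent's span.

    Leaves are plain strings; spans are half-open (start, end) word indices.
    Returns (end_index, {span: set_of_labels}).
    """
    if spans is None:
        spans = {}
    label, children = tree[0], tree[1:]
    end = start
    for child in children:
        if isinstance(child, str):
            end += 1
        else:
            end, _ = collect_spans(child, end, spans)
    spans.setdefault((start, end), set()).add(label)
    return end, spans

def constituency_feature(span, spans):
    """Fire (1) when the rule's source span is exactly a constituent, else 0."""
    return 1 if span in spans else 0

# Toy parse of "he voted in the primaries" (hypothetical example).
TREE = ("IP", ("NP", "he"), ("VP", "voted", ("PP", "in", ("NP", "the", "primaries"))))
_, SPANS = collect_spans(TREE)

print(constituency_feature((1, 5), SPANS))  # "voted in the primaries" is a VP
print(constituency_feature((0, 2), SPANS))  # "he voted" is not a constituent
```

Note that the feature only adds a reward; a rule covering "he voted" is still available to the decoder.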

7

Rule granularity

• Chiang: Single weight for all constituents (parse tags)

• … But what if we can assign a separate feature and weight for each constituent?

• E.g., NP-only: (NP=)

• Or VP-only: (VP=)

8

Constraint Conditions

• VP-only, revisited:

– We saw VP-match (VP=): reward exact match of a VP sub-tree span

– We can also incur a penalty for crossing constituent boundaries: e.g., VP-cross (VP+)
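
The two conditions can be sketched per label as follows. The definition of "crossing" used here (the rule span and a constituent overlap, but neither contains the other) is an assumption about what the slide means, and the constituent list is a toy example.

```python
def crosses(span, constituent):
    """True when the two half-open spans overlap but neither contains the other."""
    i, j = span
    a, b = constituent
    return (i < a < j < b) or (a < i < b < j)

def label_features(span, constituents, labels):
    """Return {"VP=": 0/1, "VP+": 0/1, ...} for a rule's source span.

    constituents is a list of (label, start, end) with half-open word indices.
    """
    feats = {}
    for label in labels:
        spans = [(a, b) for (lab, a, b) in constituents if lab == label]
        feats[label + "="] = int(span in spans)
        feats[label + "+"] = int(any(crosses(span, c) for c in spans))
    return feats

# Toy constituents for "he voted in the primaries" (hypothetical example).
CONSTITUENTS = [("IP", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("PP", 2, 5), ("NP", 3, 5)]

exact_vp = label_features((1, 5), CONSTITUENTS, ["NP", "VP"])
cross_vp = label_features((0, 2), CONSTITUENTS, ["NP", "VP"])
print(exact_vp)
print(cross_vp)
```

In a tuned model one would expect a positive weight on a match feature and a negative weight on a crossing feature, but the weights are learned, so that expectation is not guaranteed.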

9

Feature Space

• {NP, VP, IP, CP, …} x {match=, cross-boundary+}

• Basic translation models:

– For each feature, add only that feature to the default feature set, assigning it a separate weight.

• Feature “combo” translation models:

– NP2 (double feature): add both NP+ and NP= with separate weights each

– NP_ (conflated feature) ties weights of NP+, NP=

– XP=, XP+, XP2, XP_: conflate all labels that correspond to “standard” X-bar Theory XP constituents in each condition.

– All-labels= (Chiang’s), All-labels+, All-labels_, All-labels2
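
One way to read the naming scheme above is sketched below. The function name and the choice to implement weight tying as a single feature whose value is the sum of the match and cross values are assumptions for illustration, not the authors' code.

```python
def expand_model(name, feats):
    """Map a model name such as "NP=", "NP+", "NP2" or "NP_" to the features it adds.

    feats holds the per-span values of the basic features, e.g. {"NP=": 1, "NP+": 0}.
    """
    label, kind = name[:-1], name[-1]
    match, cross = feats[label + "="], feats[label + "+"]
    if kind == "=":
        return {name: match}
    if kind == "+":
        return {name: cross}
    if kind == "2":  # double feature: two features, each with its own weight
        return {label + "=": match, label + "+": cross}
    if kind == "_":  # conflated feature: one shared weight for both conditions
        return {name: match + cross}
    raise ValueError("unknown model name: " + name)

basic = {"NP=": 1, "NP+": 0}
double = expand_model("NP2", basic)
conflated = expand_model("NP_", basic)
print(double, conflated)
```

The double variant gives the tuner two weights to set, while the conflated variant gives it one; which works better is an empirical question.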

10

Settings

• Hiero default feature set for baseline

– Chinese baseline also included a specialized number translation feature (Chiang 2007)

• LM: SRI Language Modeling Toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing (Chen & Goodman, 1998).

• Word-level alignments: GIZA++ (Och & Ney, 2000).

• Source-side parses:

– Chinese: Huang et al. (2008)

– Arabic: Stanford Parser v.2007-08-19 (Klein & Manning 2003)

• Optimized using MERT (Och 2003)

– with BLEU (Papineni et al. 2002)

– and the NIST-implemented “shortest” effective ref. length.

– Dev set: Chinese NIST MT03; Arabic NIST MT02.
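
The “shortest” effective reference length affects only BLEU's brevity penalty. A minimal sketch follows, assuming the usual definition from Papineni et al. (2002) with the per-segment shortest reference length summed over the test set; the function name and the lengths are hypothetical.

```python
import math

def brevity_penalty_shortest(candidate_lengths, reference_lengths):
    """BLEU brevity penalty using the shortest reference length for each segment.

    candidate_lengths: one length per segment.
    reference_lengths: one list of reference lengths per segment.
    """
    c = sum(candidate_lengths)
    r = sum(min(refs) for refs in reference_lengths)
    if c == 0:
        return 0.0
    return 1.0 if c >= r else math.exp(1.0 - r / c)

short_output = brevity_penalty_shortest([9], [[10, 12]])
long_enough = brevity_penalty_shortest([10], [[10, 12]])
print(short_output, long_enough)
```

Because the shortest reference sets the target, this variant penalizes short output less than one based on the closest or average reference length would.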

11

Chinese-English

• Replicated Chiang 2005 constituency feature (negative result)

• NP=, QP+, VP+ up to 0.74 BLEU points better.

• XP+, IP2, all-labels_, VP2, NP_, up to 1.65 BLEU points better.

• Validated on the NIST MT08 test set

*,**: sig. better than baseline

+,++: better than Chiang-05

(replicated)

12

Arabic-English

• New result for Chiang’s constituency feature (MT06, MT08)

• PP+, AdvP= up to 1.40 BLEU better than Chiang’s and baseline.

• AP2, AdvP2 up to 1.94 better.

• Validated on the NIST MT08 test set

*,**: sig. better than baseline

+,++: better than Chiang-05

New!

13

PP+ Example: Arabic MT06

Source ... (PP (IN ب) (NP (NP (NN تعىىن) (NP (NN مندوب) (NP (NNP سورىا) (NNP لدى)))) (DT ال) (NP (NN امم) (NP (NN ال) (JJ متحدة))))))) …

Gloss …(PP (IN in) (NP (NP (NN appointment) (NP (NN representative) (NP (NNP syria) (NNP to)))) (DT the) (NP (NN nations) (NP (NN the) (JJ united))))))) …

Reference [the third decree ordered] the appointment of the syrian representative to the united nations …

Baseline … to appoint syria to the united nations representative …

PP+ … to appoint a representative of syria to the united nations …

14

Discussion

• Direct contribution

– If specified phrase types translate better

• Indirect contribution

– Translation of other parts can be (and is) influenced (to appoint a representative of syria to the united nations vs. to appoint syria to the united nations representative)

• Feature combinations do not always help

– In fact, some combos do worse than each feature alone.

• Within-language consistency across test sets

– Chinese: improvement using NP, VP, IP, (XP, all-labels)

– Arabic: improvement using PP, AP, AdvP, (IP, VP, XP)

• Across-language variation, but IP & VP do well.

15

Conclusion: Our Approach

Data-driven approach (Stat MT)

using formal syntax (SCFG)

while adding soft constraints (weights)

of linguistic syntax (parses)

with fine-grained constituent features (NP, VP, …)

and constraint conditions (match=,cross+)

16

Main Contributions

• First time to achieve improvement using (soft) syntax info in Hiero

• Previous (Chiang 2005) negative result: not (or not only) due to noisy parses

– Finer syntactic rule resolution helps (NP, VP, …)

– Finer (soft) constraint conditions help (NP=, NP+, VP=, VP+, …)

– Selective application: parse labels that are not “standard” XP constituent labels seem to be more noisy than helpful

– Feature combos do not always help (might do worse)

• Inter-language variation, but IP and VP generally do well cross-linguistically.

– Within-language consistency (across test sets)

17

Future Work

• Why do feature combos sometimes do even worse than each of the single features?

– no simple correlation between finer-grained features’ effectiveness and their effectiveness when used in combination with other features

– no simple correlation between boundary conditions (e.g., VP= vs. VP+) and their effectiveness for a given feature

• Why did no NP variant yield much gain in Arabic?

• Exploit other forms of soft constraints

18

Thanks

• This work was supported in part by DARPA prime agreement HR0011-06-2-0001.

• Thanks to David Chiang and Adam Lopez for making their source code available;

• Thanks to the Stanford Parser team and Mary Harper for making their parsers available;

• Thanks to David Chiang, Amy Weinberg, and CLIP Laboratory colleagues, particularly Adam Lopez, Chris Dyer, and Smaranda Muresan, for discussion and invaluable assistance.

19

Hiero Default Feature Set and the “Standard” XP Label Set

• Hiero default feature set:

– LM, p(e|f), p(f|e), p_lex(e|f), p_lex(f|e), rule (phrase) penalty and glue rule feature weights.

– Chinese-only: number translation feature

• “Standard” linguistic labels: {CP, IP, NP, VP, PP, ADJP, ADVP, QP, LCP, DNP}

– Excluding non-maximal projection labels such as VV, NNP, etc.

– Excluding labels such as PRN (parentheses), FRAG (fragment), etc.

• XP= : disjunction of {CP=, IP=, …, DNP=}
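
The XP= disjunction can be sketched directly from the label set above. The (label, start, end) constituent format and the function name are assumptions for illustration.

```python
# The "standard" label set listed on this slide.
STANDARD_XP = ("CP", "IP", "NP", "VP", "PP", "ADJP", "ADVP", "QP", "LCP", "DNP")

def xp_match(span, constituents):
    """XP= fires when the span exactly matches a constituent with any standard label."""
    return int(any(label in STANDARD_XP and (start, end) == span
                   for label, start, end in constituents))

# Hypothetical constituents: a parenthetical (PRN) and a noun phrase.
EXAMPLE = [("PRN", 0, 2), ("NP", 2, 4)]

print(xp_match((2, 4), EXAMPLE))  # matches an NP, so XP= fires
print(xp_match((0, 2), EXAMPLE))  # matches only PRN, which is excluded
```

An All-labels= feature in the style of Chiang's original would drop the membership test and fire for the PRN span as well.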

20

Training sets + full results

21

AdvP2 example in Arabic MT06

Reference since the age of cassettes started , it is almost impossible to find a house free of cassettes ,

Baseline after entering the cassette now seems [missing almost] impossible that homes [missing free] of these films [not so good]

AdvP2 after entering the cassette now almost impossible [correct] that free of [correct] the house tapes [good/better]

22

PP+.AdvP= Example: Arabic MT06

Reference … in statements made to the iranian students news agency , " we cannot expect anything else from people who have a zionist past "

Baseline … " we cannot expect other [-wise from] people who are moving a zionist . "

PP+.AdvP= … " we cannot expect other [-wise from] people who for them [=who have], was [=past] a zionist . "

23

PP+ Example: Arabic MT06

Reference [the third decree ordered] the appointment of the syrian representative to the united nations …

Baseline … to appoint syria to the united nations representative [bad word order] …

PP+ … to appoint a representative of syria to the united nations …

Reference … outside a bank in southeast baghdad .

Baseline … in front of the bank , [misplaced comma] southeast of baghdad .

PP+ … in front of the bank of south east of baghdad .

Note that this example might be misleading in not being a good representative example of the feature’s contribution.