Soft Syntactic Constraints for Hierarchical Phrase-Based Translation
Yuval Marton and Philip Resnik
Department of Linguistics and the Laboratory for Computational Linguistics and Information Processing (CLIP) at the Institute for Advanced Computer Studies (UMIACS)
University of Maryland, College Park, MD 20742-7505, USA
{ymarton, resnik} @t umiacs.umd.edu
ACL’08, Columbus, Ohio, June 2008
2
Why Has Source-side Syntax Not Helped SMT as Much as Target-side Syntax?
• Most previous work: Syntactic representations → data-driven patterns
• Chiang 05, ours: Data-driven patterns → syntactic constraints
• Why the failure in the latter direction?
  – Noisy / inaccurate parsing info?
  – Too coarse a usage of syntax info?
• We argue the latter: rule granularity and constraint conditions are key
• We show that adding (soft) syntactic constraints to data-driven patterns yields substantial improvements.
[Figure: prior work placed on two axes. Horizontal axis: Using Source-side Parses / Using Both / Using Target-side Parses. Vertical axis: Relaxing Syntax / Adding Syntax. Works shown: Cowan et al. 2006; Zollmann and Venugopal 2006; DeNeefe et al. 2007; Marcu et al. 2006; Galley et al. 2006; Chiang 2005; Marton and Resnik 2008; Riezler and Maxwell 2006.]
3
Outline
• Background
• Hiero
  – Soft Syntactic Constraints
• Adding Syntax
  – Rule Granularity
  – Constraint Conditions
• Experiments
• Conclusions + Future Work
4
Knowledge and Constraints
• Rule-based vs. Data-driven
• Formal vs. linguistic syntax (Chiang 2005)
  – Formal Syntax (e.g., Synchronous CFG)
  – Linguistic Syntax (parses)
  – Doubly dissociated: formal-only, linguistic-only, none, both (current trend)
• Hard vs. Soft Constraints
  – Hard constraint: limit possible space (only allow rules compatible with constraint)
  – Soft constraint: skew space towards constraint (but clear patterns in data ‘win’ even if incompatible with constraint)
  – Soft syntactic constraint: boost weight of data-driven rules that are compatible with parsing info.
[Figure: 2×2 grid of Formal +/- by Linguistic +/- with cells F+ L+, F- L+, F+ L-, F- L-; additional labels: Univ. Hard, Univ. Soft]
5
Hiero
• Chiang 2005, 2007
• Weighted synchronous CFG
  – Unnamed non-terminals: X → <e, f>, e.g., X → <今年 X1, X1 this year>
• Translation model features: e.g., log p(e|f)
• Log-linear model: + rule penalty feature, “glue” rules
[Figure: example with the word groups 的竞选 / Election and 投票 在初选 / voted in the primaries]
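To make the weighted-SCFG and log-linear bullets concrete, here is a minimal sketch of how a rule with one unnamed non-terminal might be represented and scored. The class name, the feature names, and all weights and probabilities below are illustrative assumptions, not values or identifiers from Hiero itself.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Rule:
    """One synchronous rule X -> <source, target>; "X1" marks the linked gap."""
    source: Tuple[str, ...]
    target: Tuple[str, ...]
    features: Dict[str, float] = field(default_factory=dict)


def rule_score(rule: Rule, weights: Dict[str, float]) -> float:
    """Log-linear score of one rule: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value
               for name, value in rule.features.items())


def derivation_score(rules, weights: Dict[str, float]) -> float:
    """A derivation's score is the sum of its rules' log-linear scores."""
    return sum(rule_score(r, weights) for r in rules)


# The example rule from the slide, with made-up feature values.
this_year = Rule(
    source=("今年", "X1"),
    target=("X1", "this", "year"),
    features={"log_p_e_given_f": math.log(0.5), "rule_penalty": 1.0},
)
# Hypothetical weights; in practice these are tuned on a dev set.
weights = {"log_p_e_given_f": 1.0, "rule_penalty": -0.5}
score = rule_score(this_year, weights)
```

Because the model is a weighted sum, adding a soft constraint only means adding one more named feature and one more weight.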
6
Soft Syntactic Constraints
• Chiang’s 2005 constituency feature
  – Boost rule if source-side matches a constituent span
  – Constituency-incompatible emergent patterns can still ‘win’ (in spite of no boost)
  – Good idea, but a negative result
• But what if…
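A constituency feature of this kind can be sketched as follows: collect every constituent span from the source-side parse, then fire the feature when a rule's source span equals one of them. The nested-tuple tree encoding, the function names, and the toy tree are assumptions made for illustration; this is not the original implementation.

```python
def collect_spans(tree, start=0, spans=None):
    """Walk a parse tree given as nested tuples (label, child, ...).

    Leaves are plain strings (one word each). Returns (end, spans), where
    spans is a list of (label, start, end) with end exclusive.
    """
    if spans is None:
        spans = []
    position = start
    for child in tree[1:]:
        if isinstance(child, str):
            position += 1                      # a word advances the position
        else:
            position, _ = collect_spans(child, position, spans)
    spans.append((tree[0], start, position))
    return position, spans


def constituency_feature(spans, i, j):
    """1.0 if source span [i, j) equals some constituent span, else 0.0."""
    return 1.0 if any(s == i and e == j for _, s, e in spans) else 0.0


# Toy four-word source sentence: an NP over words 0-1 and a VP over words 2-3.
toy_tree = ("IP",
            ("NP", ("NN", "w0"), ("NN", "w1")),
            ("VP", ("VV", "w2"), ("NP", ("NN", "w3"))))
_, toy_spans = collect_spans(toy_tree)

matches_np = constituency_feature(toy_spans, 0, 2)   # span is exactly the NP
straddles = constituency_feature(toy_spans, 1, 3)    # span cuts across NP and VP
```

Note that a single feature like this treats every label alike, which is the limitation the following slides address.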
7
Rule granularity
• Chiang: Single weight for all constituents (parse tags)
• … But what if we can assign a separate feature and weight for each constituent?
• E.g., NP-only: (NP=)
• Or VP-only: (VP=)
8
Constraint Conditions
• VP-only, revisited:
  – We saw VP-match (VP=)
  – We can also incur a cost for crossing constituent boundaries: VP-cross (VP+)
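The two conditions can be sketched per label as below. The match test follows the slide directly. The crossing test assumes the usual crossing-brackets definition (the rule span and the constituent overlap, but neither contains the other); the slides do not spell out the exact definition, so treat that part, along with the function names and toy spans, as an assumption.

```python
def label_match(spans, label, i, j):
    """True if span [i, j) is exactly a constituent with the given label."""
    return any(l == label and s == i and e == j for l, s, e in spans)


def label_cross(spans, label, i, j):
    """True if span [i, j) partially overlaps a constituent with this label.

    Assumed definition: the two spans share at least one word, and neither
    one fully contains the other.
    """
    for l, s, e in spans:
        if l != label:
            continue
        overlap = max(s, i) < min(e, j)
        constituent_contains_span = s <= i and j <= e
        span_contains_constituent = i <= s and e <= j
        if overlap and not constituent_contains_span and not span_contains_constituent:
            return True
    return False


def span_features(spans, labels, i, j):
    """Feature values such as {"VP=": 1.0, "VP+": 0.0} for one rule span."""
    feats = {}
    for label in labels:
        feats[label + "="] = 1.0 if label_match(spans, label, i, j) else 0.0
        feats[label + "+"] = 1.0 if label_cross(spans, label, i, j) else 0.0
    return feats


# Constituents of a toy four-word sentence, as (label, start, end-exclusive).
toy_spans = [("NP", 0, 2), ("VP", 2, 4), ("NP", 3, 4), ("IP", 0, 4)]

exact_vp = span_features(toy_spans, ["NP", "VP"], 2, 4)     # the VP itself
straddling = span_features(toy_spans, ["NP", "VP"], 1, 3)   # cuts into NP and VP
```

Whether a fired cross feature ends up as a cost depends on the sign of its tuned weight; the feature function itself only reports that the condition holds.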
9
Feature Space
• {NP, VP, IP, CP, …} x {=,+}
• Basic translation models:
  – For each feature, add (only it) to default feature set, assigning it a separate weight.
• Feature “combo” translation models:
  – NP2 (double feature): add both NP+ and NP= with separate weights each
  – NP_ (conflated feature) ties weights of NP+, NP=
  – XP+, XP=, XP2, XP_: conflate all “standard” X-bar Theory XP constituents (max projections) in each condition
  – All-labels+ (Chiang’s), All-labels=, All-labels_, All-labels2
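The feature space and the combo variants can be sketched as follows, assuming the per-span feature values are already available as a dictionary such as {"NP=": 1.0, "NP+": 0.0}. The label list is the “standard” XP set given on the backup slide; the function names and the weights in the example are hypothetical.

```python
from itertools import product

# "Standard" XP labels as listed on the backup slide.
STANDARD_XP = ["CP", "IP", "NP", "VP", "PP", "ADJP", "ADVP", "QP", "LCP", "DNP"]


def basic_features(labels):
    """Cross product {labels} x {=, +}; each basic model adds exactly one."""
    return [label + cond for label, cond in product(labels, ("=", "+"))]


def double_score(fired, label, w_match, w_cross):
    """'2' variant (e.g. NP2): both conditions, each with its own weight."""
    return w_match * fired[label + "="] + w_cross * fired[label + "+"]


def conflated_score(fired, label, w_tied):
    """'_' variant (e.g. NP_): one tied weight shared by both conditions."""
    return w_tied * (fired[label + "="] + fired[label + "+"])


def xp_feature(fired, cond, labels=STANDARD_XP):
    """XP= or XP+: fires if the condition holds for any standard XP label."""
    return 1.0 if any(fired.get(label + cond, 0.0) > 0 for label in labels) else 0.0


names = basic_features(["NP", "VP"])
fired = {"NP=": 0.0, "NP+": 1.0, "VP=": 1.0, "VP+": 0.0}
np2 = double_score(fired, "NP", w_match=0.3, w_cross=-0.2)   # made-up weights
np_tied = conflated_score(fired, "NP", w_tied=0.1)           # made-up weight
xp_match = xp_feature(fired, "=")
```

The double variant lets tuning reward matches and penalise crossings independently, while the conflated variant spends a single weight on both.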
10
Settings
• Hiero default feature set for baseline
  – Chinese baseline also included a rule-based number translation feature (Chiang 2007)
• LM: SRI Language Modeling Toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing (Chen & Goodman, 1998).
• Word-level alignments: GIZA++ (Och & Ney, 2000).
• Source-side parses:
  – Chinese: Huang et al. (2008)
  – Arabic: Stanford Parser v.2007-08-19 (Klein & Manning 2003)
• Optimized using MERT (Och 2003)
  – with BLEU (Papineni et al. 2002)
  – and the NIST-implemented “shortest” effective ref. length.
  – Dev set: Chinese NIST MT03; Arabic NIST MT02.
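For readers unfamiliar with the “shortest” effective reference length, the sketch below shows how it enters BLEU's brevity penalty: for each segment, the shortest of the available references is taken, and the lengths are summed over the test set. This is a simplified illustration with assumed function names and toy data, not the NIST scoring script.

```python
import math


def shortest_ref_length(references_per_segment):
    """Sum, over segments, of the length of the shortest tokenized reference."""
    return sum(min(len(ref) for ref in refs) for refs in references_per_segment)


def brevity_penalty(candidate_len, ref_len):
    """BLEU brevity penalty: 1 if the output is longer than the reference
    length, otherwise exp(1 - r/c)."""
    if candidate_len == 0:
        return 0.0
    if candidate_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / candidate_len)


# Two segments, each with two tokenized references (toy data).
refs = [
    [["a", "b", "c"], ["a", "b"]],
    [["x", "y", "z", "w"], ["x", "y", "z"]],
]
r = shortest_ref_length(refs)          # 2 + 3
bp_short = brevity_penalty(4, r)       # system output shorter than r
bp_equal = brevity_penalty(5, r)       # system output exactly r long
```

Choosing the shortest reference makes the penalty more lenient towards short output than the closest-length variant does.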
11
Chinese-English
• Replicated Chiang 2005 (negative result)
• NP=, QP+, VP+ up to 0.74 BLEU points better.
• NP_, VP2, IP2, all-labels_, XP+ up to 1.65 BLEU points better.
• Validated on the NIST MT08 test set
12
Arabic-English
• New result for Chiang’s feature (MT06, MT08)
• PP+, AdvP= up to 1.40 BLEU points better than Chiang’s and baseline.
• AP2, AdvP2 up to 1.94 BLEU points better.
• Validated on the NIST MT08 test set
13
PP+ Example: Arabic MT06
Source: ... (PP (IN ب) (NP (NP (NN تعىىن) (NP (NN مندوب) (NP (NNP سورىا) (NNP لدى)))) (DT ال) (NP (NN امم) (NP (NN ال) (JJ متحدة))))))) …
Gloss: …(PP (IN in) (NP (NP (NN appointment) (NP (NN representative) (NP (NNP syria) (NNP to)))) (DT the) (NP (NN nations) (NP (NN the) (JJ united))))))) …
Reference: [the third decree ordered] the appointment of the syrian representative to the united nations …
Baseline: … to appoint syria to the united nations representative …
PP+: … to appoint a representative of syria to the united nations …
14
Discussion
• Direct contribution
  – Feature better translates related phrases (not shown here)
• Indirect contribution
  – Translation of other parts can be (and is) influenced (to appoint a representative of syria to the united nations vs. to appoint syria to the united nations representative)
• Feature contributions are not additive
  – In fact, some combos do worse than each feature alone.
• Within-language consistency across test sets
• Across-language variation, but IP & VP do well.
15
Conclusion: Our Approach
• Data-driven approach (Stat MT)
• Using Formal syntax (SCFG)
• While adding Soft constraints (weights)
• of linguistic syntax (rule-based parses)
• With fine-grained constituent features and constraint conditions
16
Main Contributions
• First improvement achieved using (soft) syntax info in Hiero
• Previous (Chiang 2005) negative result: not (or not only) due to noisy parses
  – Finer syntactic rule resolution helps (NP, VP, …)
  – Finer (soft) constraint conditions help (NP=, NP+, VP=, VP+, …)
  – Selective application: parse labels that are not “standard” XP constituent labels seem to be more noisy than helpful
  – Feature combos’ contributions are not additive (they can cancel each other out)
• Inter-language variation, but IP and VP generally do well cross-linguistically.
  – Within-language consistency (across test sets)
17
Future Work
• Why do feature combos’ contributions sometimes cancel each other out?
  – We found no simple correlation between finer-grained feature scores (and/or boundary condition) and combination or conflation scores.
• Why did no NP variant yield much gain in Arabic?
• Exploit other forms of soft constraints
18
Thanks
• This work was supported in part by DARPA prime agreement HR0011-06-2-0001.
• Thanks to David Chiang and Adam Lopez for making their source code available;
• Thanks to the Stanford Parser team and Mary Harper for making their parsers available;
• Thanks to David Chiang, Amy Weinberg, and CLIP Laboratory colleagues, particularly Adam Lopez, Chris Dyer, and Smaranda Muresan, for discussion and invaluable assistance.
19
Hiero Default Feature Set and the “Standard” XP Label Set
• Hiero default feature set:
  – LM, p(e|f), p(f|e), p_lex(e|f), p_lex(f|e), rule (phrase) penalty and glue rule feature weights.
  – Chinese-only: number translation feature
• “Standard” linguistic labels: {CP, IP, NP, VP, PP, ADJP, ADVP, QP, LCP, DNP}
  – Excluding non-maximal projection labels such as VV, NNP, etc.
  – Excluding labels such as PRN (parentheses), FRAG (fragment), etc.
• XP= : disjunction of {CP=, IP=, …, DNP=}
20
Training sets + full results
21
AdvP2 Example: Arabic MT06
Reference: since the age of cassettes started , it is almost impossible to find a house free of cassettes ,
Baseline: after entering the cassette now seems [missing almost] impossible that homes [missing free] of these films [not so good]
AdvP2: after entering the cassette now almost impossible [correct] that free of [correct] the house tapes [good/better]
22
PP+.AdvP= Example: Arabic MT06
Reference: … in statements made to the iranian students news agency , " we cannot expect anything else from people who have a zionist past "
Baseline: … " we cannot expect other [-wise from] people who are moving a zionist . "
PP+.AdvP=: … " we cannot expect other [-wise from] people who for them [=who have], was [=past] a zionist . "
23
PP+ Example: Arabic MT06
Reference: [the third decree ordered] the appointment of the syrian representative to the united nations …
Baseline: … to appoint syria to the united nations representative [bad word order] …
PP+: … to appoint a representative of syria to the united nations …
Reference: … outside a bank in southeast baghdad .
Baseline: … in front of the bank , [misplaced comma] southeast of baghdad .
PP+: … in front of the bank of south east of baghdad .
Note: this second example may be misleading, as it is not a representative example of the feature’s contribution.