Post on 14-Dec-2015
Yang Liu
State Key Laboratory of Intelligent Technology and Systems
Tsinghua National Laboratory for Information Science and Technology
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
ACL 2013
Introduction
Current statistical machine translation approaches fall roughly into two categories: phrase-based and syntax-based.
This paper proposes a shift-reduce parsing algorithm to combine the advantages of both.
The translation units are string-to-dependency phrase pairs.
A maximum entropy model is used to resolve conflicts between competing actions.
Introduction
Datasets: the NIST Chinese-English translation datasets
Evaluation: BLEU & TER, compared against phrase-based and syntax-based baselines
Shift-Reduce Parsing for Phrase-based String-to-Dependency Translation
Example: zongtong jiang yu siyue lai lundun fangwen
→ The President will visit London in April
GIZA++
Context-free grammar parser
Shift-Reduce Parsing for Phrase-based String-to-Dependency Translation
Target dependency structures fall into two broad categories:
- well-formed:
  - fixed
  - floating (left or right, according to the position of the head)
- ill-formed

rule  source phrase        target phrase       dependency  category
r1    fangwen              visit               {}          fixed
r2    yu siyue             in April            {1 2}       fixed
r3    zongtong jiang       The President will  {2 1}       floating left
r4    yu siyue lai lundun  London in April     {2 3}       floating right
r5    zongtong jiang       President will      {}          ill-formed
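The categories above can be sketched as a small classifier over target-side head links. This is a simplified reading of the well-formedness definitions of Shen et al. (2008); the `head`/`children` encoding is our own illustration, not the paper's data structures.

```python
def categorize(span, head, children):
    """Classify the dependency structure of a target phrase.

    span:     set of target word positions in the phrase
    head:     head[i] = position of word i's parent (0 = sentence root)
    children: children[i] = positions of word i's dependents
    """
    span = set(span)
    # roots: words whose parent lies outside the phrase
    roots = [w for w in span if head[w] not in span]
    complete = lambda w: all(c in span for c in children[w])
    # every non-root word must bring all of its dependents along
    if any(not complete(w) for w in span if w not in roots):
        return "ill-formed"
    if len(roots) == 1:
        return "fixed"  # single head; it may pick up more children later
    # floating: several complete siblings sharing one external head
    parents = {head[w] for w in roots}
    if len(parents) == 1 and all(complete(w) for w in roots):
        h = parents.pop()
        return "floating left" if h > max(span) else "floating right"
    return "ill-formed"


# "The President will visit London in April" (positions 1-7)
head = {1: 2, 2: 4, 3: 4, 4: 0, 5: 4, 6: 4, 7: 6}
children = {1: [], 2: [1], 3: [], 4: [2, 3, 5, 6], 5: [], 6: [7], 7: []}

print(categorize({4}, head, children))        # r1 "visit"           -> fixed
print(categorize({6, 7}, head, children))     # r2 "in April"        -> fixed
print(categorize({1, 2, 3}, head, children))  # r3 "The President will" -> floating left
print(categorize({5, 6, 7}, head, children))  # r4 "London in April" -> floating right
print(categorize({2, 3}, head, children))     # r5 "President will"  -> ill-formed
```

Note how r5 fails: "President" is missing its dependent "The", so the structure cannot float even though both roots share the head "visit".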
shift-reduce algorithm - example
A state is a tuple <S, C>, where S is the stack of partial dependency structures and C is the set of covered source words.
- The parser starts from an empty state.
- It terminates when all source words have been translated and the stack holds a single complete dependency tree.
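A minimal sketch of the <S, C> state machine, assuming three actions (shift, reduce-left, reduce-right); the nested-tuple tree representation is a toy stand-in, not the paper's data structure.

```python
def shift(state, structure, positions):
    """Push a translated phrase's structure; mark its source span covered."""
    stack, covered = state
    return (stack + [structure], covered | set(positions))

def reduce(state, direction):
    """Merge the top two stack items into one dependency structure.
    'left'/'right' stands for the reduce-left/reduce-right action;
    the merged tree is shown as a nested tuple for illustration."""
    stack, covered = state
    merged = (direction, stack[-2], stack[-1])
    return (stack[:-2] + [merged], covered)

def finished(state, n_source_words):
    """All source words translated and a single tree left on the stack."""
    stack, covered = state
    return len(covered) == n_source_words and len(stack) == 1

# toy derivation over a 7-word source sentence
state = ([], set())                                 # empty initial state
state = shift(state, "The President will", [1, 2])
state = shift(state, "visit", [7])
state = reduce(state, "left")
print(finished(state, 7))   # False: source words 3-6 are still uncovered
state = shift(state, "London in April", [3, 4, 5, 6])
state = reduce(state, "right")
print(finished(state, 7))   # True: full coverage, one tree on the stack
```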
A Maximum Entropy Based Shift-Reduce Parsing Model
Structure labels: h = fixed, l = left floating, r = right floating
A Maximum Entropy Based Shift-Reduce Parsing Model
maximum entropy model:

p(a | c, s_t, s_{t-1}) = exp(θ · h(a, c, s_t, s_{t-1})) / Σ_{a'} exp(θ · h(a', c, s_t, s_{t-1}))

- a ∈ {S, Rl, Rr}: the action (shift, reduce-left, reduce-right)
- c: a boolean indicating whether all source words are covered
- h(a, c, s_t, s_{t-1}): vector of binary features over the action and the top two stack items
- θ: vector of feature weights
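The model is a softmax over the three actions; a minimal sketch in which `features(a)` stands in for the binary feature vector h, and the toy feature names and weights are illustrative, not the trained model.

```python
import math

ACTIONS = ("S", "Rl", "Rr")  # shift, reduce-left, reduce-right

def action_prob(action, features, theta):
    """p(a | c, st, st-1) as a softmax over theta . h(a, c, st, st-1).
    `features(a)` returns the names of fired binary features;
    theta maps feature names to weights."""
    score = lambda a: sum(theta.get(f, 0.0) for f in features(a))
    z = sum(math.exp(score(a)) for a in ACTIONS)
    return math.exp(score(action)) / z

# toy binary features conjoining the action with the coverage flag c
features = lambda a: [f"action={a}", f"action={a}&covered"]
theta = {"action=S": 1.0, "action=Rl": 0.5, "action=S&covered": -2.0}

probs = {a: action_prob(a, features, theta) for a in ACTIONS}
print(round(sum(probs.values()), 6))  # 1.0 — a proper distribution
```

With full coverage (c = true), the strong negative weight on shifting pushes probability mass toward the reduce actions, which is how such features can discourage shifting when a tree should be completed.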
A Maximum Entropy Based Shift-Reduce Parsing Model
To train the model, a gold-standard action sequence is needed for each training example, but such a sequence is not annotated and may not be unique.
To alleviate this problem, the action sequences that yield a training example are represented as a derivation graph.
Decoding
linear model with the following features:
standard features:
- relative frequencies in two directions
- lexical weights in two directions
- phrase penalty
- distance-based reordering model
- lexicalized reordering model
- n-gram language model
- word penalty
Decoding (continue)
dependency features:
- ill-formed structure penalty
- dependency language model
- maximum entropy parsing model
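Both feature groups feed a single linear model; a minimal sketch, assuming log-valued model features, in which every feature name and number is an illustrative placeholder rather than a tuned value from the paper.

```python
def decoder_score(features, weights):
    """Linear model: weighted sum of standard and dependency features."""
    return sum(weights[name] * value for name, value in features.items())

# placeholder feature values for one hypothesis (names are our own)
features = {
    "phrase_trans_fwd": -2.1,   # relative frequency, source->target (log)
    "lex_weight_fwd": -3.4,     # lexical weight, source->target (log)
    "lm": -12.7,                # n-gram language model (log prob)
    "word_penalty": 7,          # target length
    "dep_lm": -8.3,             # dependency language model (log prob)
    "illformed_penalty": 1,     # count of ill-formed structures used
    "maxent_parse": -1.2,       # maximum entropy parsing model (log prob)
}
weights = {name: 0.1 for name in features}  # uniform toy weights

print(round(decoder_score(features, weights), 2))  # -1.97
```

In practice the weights would be tuned (e.g. on the development set), but the scoring itself stays this simple weighted sum.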
Decoding
During decoding, the context information in the stack keeps changing (the dependency language model and maximum entropy model probabilities), which makes exact dynamic programming difficult.
Hypergraph reranking (Huang and Chiang, 2007; Huang, 2008) is therefore used, dividing decoding into two passes.
Decoding
To improve rule coverage, the ill-formed structures of Shen et al. (2008) are also used:
- if an ill-formed structure has a single root, it is treated as a (pseudo) fixed structure;
- every other ill-formed structure is split into a (pseudo) left floating structure and a (pseudo) right floating structure.
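The conversion can be sketched as below. Treating the direction of each root's external head as the left/right splitting criterion is our assumption for illustration; the slide does not spell out the exact rule.

```python
def to_pseudo_structures(span, head):
    """Convert an ill-formed structure for use in translation rules:
    a single root -> one (pseudo) fixed structure; otherwise split the
    roots into a (pseudo) left floating part (external head to the
    right of the span) and a (pseudo) right floating part (head to
    the left). Empty parts are dropped."""
    span = set(span)
    roots = sorted(w for w in span if head[w] not in span)
    if len(roots) == 1:
        return [("pseudo fixed", sorted(span))]
    left = [w for w in roots if head[w] > max(span)]
    right = [w for w in roots if head[w] < min(span)]
    parts = [("pseudo left floating", left), ("pseudo right floating", right)]
    return [(label, ws) for label, ws in parts if ws]

# r5 "President will" from the earlier example: both roots depend on
# "visit" (position 4), which lies to the right of the span {2, 3}
head = {1: 2, 2: 4, 3: 4, 4: 0, 5: 4, 6: 4, 7: 6}
print(to_pseudo_structures({2, 3}, head))  # [('pseudo left floating', [2, 3])]
```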
Experiments
evaluated on Chinese-English translation
training data: 2.9M sentence pairs, containing 76.0M Chinese words and 82.2M English words
development set: 2002 NIST MT Chinese-English dataset
test sets: 2003-2005 NIST datasets
Experiments
The Stanford parser was used to obtain dependency trees for the English sentences.
A 4-gram language model was trained on the Xinhua portion of the GIGAWORD corpus, which contains 238M English words.
A 3-gram dependency language model was trained on the English dependency trees.
Experiments
Compared with:
- the Moses phrase-based decoder (Koehn et al., 2007)
- a re-implementation of the bottom-up string-to-dependency decoder (Shen et al., 2008)
b limit: 100, phrase table limit: 20
Experiments
Moses shares the same feature set with our system except for the dependency features.
For the bottom-up string-to-dependency system, we included both well-formed and ill-formed structures in chart parsing.
Experiments
                                     Moses    dependency   this work
rule number                          103M     587M         124M
avg. decoding time (per sentence)    3.67 s   13.89 s      4.56 s
Experiments
Conclusion
This paper proposes a shift-reduce parsing algorithm for phrase-based string-to-dependency translation. The approach combines the advantages of the phrase-based and string-to-dependency models, and in Chinese-to-English translation experiments it outperforms both baselines (phrase-based and syntax-based).
Future work
Add more contextual information to the maximum entropy model to better resolve conflicts; separately, adapt the dynamic programming algorithm of Huang and Sagae (2010) to improve the string-to-dependency decoder.