Stochastic Inversion Transduction Grammars Dekai Wu
11-734 Advanced Machine Translation Seminar
Presented by:
Sanjika Hewavitharana
04/13/2006
Overview
Simple Transduction Grammars
Inversion Transduction Grammars (ITGs)
Stochastic ITGs
Parsing with SITGs
Applications of SITGs
Main Reading: Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora (1997)
Introduction
Mathematical models of translation
  IBM Models (Brown et al.): string generates string
  Syntax-based (Yamada & Knight): tree generates string
  ITG (Wu): two trees are generated simultaneously
ITGs
  A formalism for modeling bilingual sentence pairs
  Not intended for use as full translation models, but for parallel corpus analysis
  Extract useful structures from input data
Generative view rather than translation view
  Two output trees are generated simultaneously, one for each language
Transduction Grammars
A simple transduction grammar is a CFG whose terminals are pairs of symbols (or singletons)
Can be used to model the generation of bilingual sentence pairs
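E: The Financial Secretary and I will be accountable.
C: [Chinese sentence missing from the transcript]
Rules: $A \to x/y$, $A \to x/\epsilon$, $A \to \epsilon/y$, where $x \in L_1$, $y \in L_2$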
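As a rough illustration of how a simple transduction grammar generates a sentence pair, the sketch below expands a toy grammar whose terminals are couples or singletons. The grammar, the romanized Chinese tokens, and the names `GRAMMAR` and `generate` are all hypothetical and only serve the example.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of right-hand sides.
# A terminal is a pair (x, y): x goes to language 1, y to language 2.
# None on one side marks a singleton (nothing is emitted in that language).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [[("I", "wo")], [("the", None), ("teacher", "laoshi")]],
    "VP": [["AUX", "V"]],
    "AUX": [[("will", "hui")]],
    "V": [[("come", "lai")], [("go", "qu")]],
}

def generate(symbol, rng):
    """Expand `symbol`; return the token lists produced for each language."""
    if isinstance(symbol, tuple):  # terminal couple or singleton
        x, y = symbol
        return ([x] if x else []), ([y] if y else [])
    rhs = rng.choice(GRAMMAR[symbol])
    out1, out2 = [], []
    for child in rhs:  # children appear in the same order in both languages
        c1, c2 = generate(child, rng)
        out1 += c1
        out2 += c2
    return out1, out2

e, c = generate("S", random.Random(0))
print(" ".join(e), "|", " ".join(c))
```

Because every rule emits its children in the same order on both sides, this grammar can only produce sentence pairs with parallel constituent order.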
Transduction Grammar Rules E.g.
Simple Rules:
Inversion Rule:
Transduction Grammars
In general, simple transduction grammars are not very useful
  The two languages must share exactly the same grammatical structure
  So some sentence pairs cannot be generated
ITG removes the rigid parallel-ordering constraint
  Constituent order in one language may be the inverse of the other language
  Order is the same for both (square brackets): $A \to [B\ C]$
  Order is inverted for one (angle brackets): $A \to \langle B\ C \rangle$
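The two orientations can be sketched as a single combination step. In this minimal example a constituent is assumed to be a pair of token lists, one per language; the function name `combine` and the placeholder tokens are hypothetical.

```python
def combine(left, right, inverted=False):
    """Join two constituents, each a (lang1_tokens, lang2_tokens) pair.

    Straight (square brackets): both languages read left then right.
    Inverted (angle brackets): language 1 reads left then right,
    language 2 reads right then left.
    """
    l1, l2 = left
    r1, r2 = right
    if inverted:
        return l1 + r1, r2 + l2
    return l1 + r1, l2 + r2

B = (["e1"], ["c1"])
C = (["e2"], ["c2"])
print(combine(B, C))                 # A -> [B C]
print(combine(B, C, inverted=True))  # A -> <B C>
```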
ITGs
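e.g.
With an ITG we can parse the previous sentence pair
  Inversion rule: $VP \to \langle VV\ PP \rangle$
[Figure: English (e) and Chinese (c) token sequences under the straight rule $A \to [B\ C]$ and the inverted rule $A \to \langle B\ C \rangle$]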
ITG Parse Tree
Expressiveness of ITGs
Not all matchings are possible with an ITG, e.g. 'inside-out' matchings are not allowed
This helps to reduce the combinatorial growth of matchings with the number of tokens
The number of matchings eliminated increases rapidly as the number of tokens increases
The author claims this is a benefit
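To make the restriction concrete, the sketch below counts, by brute force, how many one-to-one matchings of n tokens can be built from straight and inverted binary splits. It is a simplification that ignores singletons and treats a matching as a permutation; the function names are hypothetical.

```python
from itertools import permutations

def itg_ok(perm):
    """True if `perm` can be built from straight/inverted binary splits."""
    n = len(perm)
    if n <= 1:
        return True
    for k in range(1, n):
        left, right = perm[:k], perm[k:]
        # Straight: everything on the left maps below everything on the right.
        # Inverted: everything on the left maps above everything on the right.
        if max(left) < min(right) or min(left) > max(right):
            if itg_ok(left) and itg_ok(right):
                return True
    return False

def count_itg_matchings(n):
    """Number of permutations of n tokens that an ITG can generate."""
    return sum(itg_ok(p) for p in permutations(range(n)))

for n in range(1, 6):
    print(n, count_itg_matchings(n))
```

For four tokens this yields 22 of the 24 permutations; the two excluded ones, (1, 3, 0, 2) and (2, 0, 3, 1), are the 'inside-out' matchings.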
Normal Form of ITG
For any ITG there exists an equivalent grammar in normal form
The right-hand side of every rule is one of:
  Terminal couples: $A \to x/y$
  Terminal singletons: $A \to x/\epsilon$, $A \to \epsilon/y$
  Pairs of nonterminals with straight orientation: $A \to [B\ C]$
  Pairs of nonterminals with inverted orientation: $A \to \langle B\ C \rangle$
Stochastic ITGs
A probability can be assigned to each rewrite rule, e.g. $a_{NN \to [A\ N]} = 0.4$, $b_A(x, y) = 0.001$
The probabilities of all the rules with a given left-hand side must sum to 1
An SITG will give the most probable matching (ML) parse for a sentence pair
  Similar to Viterbi or CYK (chart) parsing
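The sum-to-one condition can be checked mechanically. The sketch below uses a hypothetical rule table (the rules, tokens, and probabilities are made up for illustration) and verifies that the probabilities for each left-hand side add up to 1.

```python
import math
from collections import defaultdict

# Hypothetical toy SITG: (left-hand side, right-hand side) -> probability.
RULES = {
    ("NN", "[A N]"): 0.4,
    ("NN", "<A N>"): 0.1,
    ("NN", "book/shu"): 0.5,
    ("A", "red/hong"): 1.0,
    ("N", "book/shu"): 1.0,
}

def is_normalized(rules):
    """True if, for every left-hand side, the rule probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return all(math.isclose(total, 1.0) for total in totals.values())

print(is_normalized(RULES))
```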
Parsing with SITGs
Every node (q) in the parse tree has 5 elements:
  Begin & end indices for the language-1 string (s, t)
  Begin & end indices for the language-2 string (u, v)
  Non-terminal category (i)
Each cell (in the chart) stores the probability of the most likely parse covering the appropriate substrings, rooted in the appropriate category
Parsing with SITGs - Algorithm
Initialize the cells corresponding to terminals using a translation lexicon
For the other cells, recursively find the most probable way of obtaining that nonterminal category.
Compute the probability by multiplying the probability of the rule by the probabilities of both the constituents
Store that probability plus the orientation of the rule
Complexity: $O(n^3 m^3)$
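The steps above can be sketched for the simplest case of a single nonterminal. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: it uses memoized recursion instead of filling a chart bottom-up, the scores `a` and `eps` and the toy lexicon are hypothetical and not normalized, and the name `btg_parse` is invented for the example.

```python
from functools import lru_cache

def btg_parse(e, c, lex, a=0.01, eps=1e-4):
    """Return (probability, tree) of the best parse of the pair (e, c).

    lex maps (lang1_word, lang2_word) to a couple probability;
    singletons get the small constant eps; both binary rules get a.
    """
    e, c = tuple(e), tuple(c)

    def b(x, y):
        if x is None or y is None:  # singleton
            return eps
        return lex.get((x, y), 0.0)

    @lru_cache(maxsize=None)
    def best(s, t, u, v):
        size = (t - s) + (v - u)
        cands = []
        # Initialization: cells covered directly by a couple or a singleton.
        if size == 1 or (t - s == 1 and v - u == 1):
            x = e[s] if t > s else None
            y = c[u] if v > u else None
            cands.append((b(x, y), (x, y)))
        # Recursion: try every split point in both strings, both orientations.
        for S in range(s, t + 1):
            for U in range(u, v + 1):
                if 0 < (S - s) + (U - u) < size:  # straight
                    p1, t1 = best(s, S, u, U)
                    p2, t2 = best(S, t, U, v)
                    cands.append((a * p1 * p2, ("[]", t1, t2)))
                if 0 < (S - s) + (v - U) < size:  # inverted
                    p1, t1 = best(s, S, U, v)
                    p2, t2 = best(S, t, u, U)
                    cands.append((a * p1 * p2, ("<>", t1, t2)))
        # Keep the most probable analysis together with its orientation.
        return max(cands, key=lambda cand: cand[0])

    return best(0, len(e), 0, len(c))

lexicon = {("arrive", "dao"): 0.5, ("tomorrow", "mingtian"): 0.5}
prob, tree = btg_parse(["arrive", "tomorrow"], ["mingtian", "dao"], lexicon)
print(prob, tree)
```

On this toy pair the best parse is the inverted one, `('<>', ('arrive', 'dao'), ('tomorrow', 'mingtian'))`. Each cell is indexed by (s, t, u, v) and tries every split point in both strings, which is where the cubic factor in each sentence length comes from.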
Applications of SITGs
Segmentation
Bracketing
Alignment
Bilingual Constraint Transfer
Mining parallel sentences from comparable corpora [Wu & Fung 2005]
Applications of SITGs - Segmentation
Word boundaries are not marked in Chinese text
  No word chunks available for matching
One option: do word segmentation as preprocessing
  Might produce chunks that do not agree bilingually
Solution: extend the algorithm to accommodate segmentation
  Allow the initialization step to find strings of any length in the translation lexicon
  The recursive step stores the most probable way of creating a constituent, whether it came from the lexicon or from rules
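The extended initialization step might look roughly like the following sketch, which looks up every substring of the unsegmented Chinese string in the lexicon. For brevity it assumes single English tokens on the other side; the function name `lexical_cells`, the lexicon, and its probabilities are hypothetical.

```python
def lexical_cells(e_tokens, c_chars, lex):
    """List (s, t, u, v, prob) for every English token / Chinese substring
    pair found in the lexicon, with half-open index ranges."""
    cells = []
    for s, word in enumerate(e_tokens):
        for u in range(len(c_chars)):
            for v in range(u + 1, len(c_chars) + 1):
                p = lex.get((word, c_chars[u:v]))
                if p is not None:
                    cells.append((s, s + 1, u, v, p))
    return cells

# Hypothetical lexicon entries; the Chinese side is an unsegmented string.
lexicon = {("Beijing", "北京"): 0.9, ("welcomes", "欢迎"): 0.8, ("you", "你"): 0.9}
for cell in lexical_cells(["Beijing", "welcomes", "you"], "北京欢迎你", lexicon):
    print(cell)
```

The parser can then treat these cells as already-built constituents, so the segmentation that survives is whichever one takes part in the most probable bilingual parse.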
Applications of SITGs – Bracketing
How to assign structure to a sentence with no grammar available?
  Especially problematic for minority languages
A solution using ITGs:
  Get a parallel corpus pairing it with some other language
  Get a reasonable translation dictionary
  Parse it with a bracketing transduction grammar
Bracketing Transduction Grammar
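A minimal ITG
  Only one nonterminal: A
  Production rules:
    $A \xrightarrow{a} [A\ A]$
    $A \xrightarrow{a} \langle A\ A \rangle$
    $A \xrightarrow{b_{ij}} u_i / v_j$
    $A \xrightarrow{b_{i\epsilon}} u_i / \epsilon$
    $A \xrightarrow{b_{\epsilon j}} \epsilon / v_j$
Lexical translation probabilities $b_{ij}$ have prominence
  Small probability values for the two singleton production rules
  Also, a very small value for $a$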
Bracketing with Singletons
Singletons cause bracketing errors
Some refinements:
  Depending on the language, bias the singleton attachment either to the left or to the right of a constituent
  Apply a series of transformations which push the singletons as close as possible to the couples, e.g. [ x A B ] ⇌ x A B ⇌ x A B ⇌ [x A ] B
Before:
After:
Bracketing Experiments
Used 2000 Chinese-English sentence pairs from the HKUST corpus
Some filtering:
  Remove sentence pairs that were not adequately covered by the lexicon (>1 unknown word)
  Remove sentence pairs with many unmatched words (>2)
Bracketing precision:
  80% for English
  78% for Chinese
Errors mainly due to lexical imperfections
  A statistical lexicon (~6.5k English, ~5.5k Chinese words)
Can be improved with extra information, e.g. POS, grammar-based bracketer
Applications of SITGs - Alignment
Alignments (phrasal or word) are a natural byproduct of bilingual parsing
Unlike ‘parse-parse-match’ methods, this
  Doesn’t require a robust grammar for both languages
  Guarantees compatibility between parses
  Has a principled way of choosing between possible alignments
Provides a more reasonable ‘distortion penalty’
Recent empirical studies show ITGs produce better alignments in various applications [Wu & Fung 2005]
Bilingual Constraint Transfer
A high-quality parse for one language can be leveraged to get structure for the other
Alter the parsing algorithm: only allow constituents that match the parse that already exists for the well-studied language
This works for any sort of constraint supplied for the well-studied language
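One way to read "match the existing parse" is that a candidate constituent must not cross any bracket of that parse. The sketch below implements that interpretation only; spans are assumed to be half-open (start, end) token ranges, and the function names and the example brackets are hypothetical.

```python
def crosses(span, bracket):
    """True if two half-open spans overlap without one containing the other."""
    s, t = span
    i, j = bracket
    return (s < i < t < j) or (i < s < j < t)

def allowed(span, brackets):
    """True if `span` is compatible with every bracket of the known parse."""
    return not any(crosses(span, b) for b in brackets)

# Hypothetical known parse of a five-token sentence: [[0 1] [2 3 4]]
known = [(0, 5), (0, 2), (2, 5)]
print(allowed((2, 4), known))  # nested inside (2, 5)
print(allowed((1, 3), known))  # straddles the boundary at 2
```

Inside the parser, a check like this would be applied to the well-studied language's span before a cell is filled, so incompatible constituents are never considered.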
References:
Dekai Wu (1997), Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, Computational Linguistics, Vol. 23, no. 3, pp. 377-403.
Dekai Wu (1995), Grammarless Extraction of Phrasal Translation Examples from Parallel Texts, 6th Intl. Conf. on Theoretical and Methodological Issues in Machine Translation, Vol. 2, pp. 354-372, Leuven, Belgium.
Dekai Wu and Pascale Fung (2005), Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora, 2nd Intl. Joint Conf. on Natural Language Processing (IJCNLP-2005), Jeju, Korea, October.