
Stochastic Inversion Transduction Grammars Dekai Wu

11-734 Advanced Machine Translation Seminar

Presented by:

Sanjika Hewavitharana

04/13/2006

Overview

Simple Transduction Grammars

Inversion Transduction Grammars (ITGs)

Stochastic ITGs

Parsing with SITGs

Applications of SITGs

Main Reading: Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora (1997)

http://acl.ldc.upenn.edu/J/J97/J97-3002.pdf


Introduction

Mathematical models of translation:

IBM Models (Brown et al.): string generates string

Syntax-based (Yamada & Knight): tree generates string

ITG (Wu): two trees are generated simultaneously

ITGs: a formalism for modeling bilingual sentence pairs

Not intended to be used as full translation models, but for parallel corpus analysis: extracting useful structures from input data

Generative view rather than translation view: two output trees are generated simultaneously, one for each language

Transduction Grammars

A simple transduction grammar is a CFG whose terminals are pairs of symbols (or singletons)

Can be used to model the generation of bilingual sentence pairs

E The Financial Secretary and I will be accountable.

C

Lexical rules: A → x/y, A → x/ε, A → ε/y, where x ∈ L1, y ∈ L2

Transduction grammar rules, e.g.:

[Figure: example simple rules and an inversion rule]




In general, simple transduction grammars are not very useful: the two languages must share exactly the same grammatical structure

So some sentence pairs cannot be generated

ITGs remove the rigid parallel ordering constraint: constituent order in one language may be the inverse of the order in the other language

Order is the same for both languages (square brackets): A → [B C]

Order is inverted for one language (angle brackets): A → ⟨B C⟩

ITGs

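e.g.

With an ITG we can parse the previous sentence pair using the inversion rule VP → ⟨VV PP⟩

[Figure: output trees for the straight rule A → [B C] and the inverted rule A → ⟨B C⟩ in English (e) and Chinese (c); under the inverted rule the Chinese constituents appear in the order C B]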
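The straight/inverted distinction can be made concrete by computing the two output strings (yields) of an ITG tree. This is a minimal sketch under an assumed tuple encoding of trees, chosen only for illustration: under [ ] both languages concatenate the children left to right, while under ⟨ ⟩ language 2 concatenates them right to left. The phrase pairs are illustrative and treat multi-word phrases as single tokens.

```python
def yields(tree):
    """Return (language-1 tokens, language-2 tokens) generated by an ITG tree.

    Hypothetical encoding: a leaf is a couple (x, y); an internal node is
    ("[]", left, right) for a straight rule or ("<>", left, right) for an inverted one.
    """
    if tree[0] not in ("[]", "<>"):
        x, y = tree
        return [x], [y]
    op, left, right = tree
    l1, l2 = yields(left)
    r1, r2 = yields(right)
    if op == "[]":
        return l1 + r1, l2 + r2  # same order in both languages
    return l1 + r1, r2 + l2      # language-2 order is reversed


# VP -> <VV PP>, with PP -> [P NP]; illustrative English-Chinese phrase pairs.
vp = ("<>", ("be accountable", "負責"),
            ("[]", ("to", "向"), ("the Financial Secretary", "財政司")))
english, chinese = yields(vp)
```

Here `english` is `['be accountable', 'to', 'the Financial Secretary']` and `chinese` is `['向', '財政司', '負責']`: the single inverted node moves the verb after the prepositional phrase on the Chinese side only.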

ITG Parse Tree

Expressiveness of ITGs


Not all matchings are possible with an ITG, e.g. ‘inside-out’ matchings are not allowed

This helps to reduce the combinatorial growth of matchings with the number of tokens

The number of matchings eliminated increases rapidly as the number of tokens increases

The author argues this is a benefit
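To make the restriction concrete: assuming one-to-one couples and no singletons, a reordering of n tokens is reachable by binary straight/inverted rules exactly when it can be built by repeatedly merging adjacent blocks (the so-called separable permutations). The sketch below is not from the slides; it checks this with a shift-reduce pass and counts permissible reorderings for small n. The function names are hypothetical.

```python
from itertools import permutations


def itg_permissible(perm):
    """True if binary straight/inverted rules can produce this reordering.

    perm[i] is the language-2 position of the i-th language-1 token.
    Shift-reduce: keep a stack of value intervals and merge the top two
    whenever they are adjacent (straight if ascending, inverted if descending).
    """
    stack = []
    for p in perm:
        stack.append((p, p))
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1


def count_permissible(n):
    """Number of the n! reorderings of n tokens that an ITG can generate."""
    return sum(itg_permissible(p) for p in permutations(range(n)))
```

For n = 4 this admits 22 of the 24 reorderings; the two rejected ones, (1, 3, 0, 2) and (2, 0, 3, 1), are the inside-out patterns. By n = 6 only 394 of 720 remain, which illustrates how quickly the eliminated fraction grows.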

Normal Form of ITG

For any ITG there exists an equivalent grammar in the normal form

The right-hand side of every rule is one of:

Terminal couples

Terminal singletons

Pairs of non-terminals with straight orientation

Pairs of non-terminals with inverted orientation

A → x/y, A → x/ε, A → ε/y, A → [B C], A → ⟨B C⟩

Stochastic ITGs

A probability can be assigned to each rewrite rule

The probabilities of all the rules with a given left hand side must sum to 1.

An SITG gives the most probable (ML) matching parse for a sentence pair, computed with dynamic programming similar to Viterbi or CYK (chart) parsing

e.g. a straight rule [ ] with probability 0.4, and a lexical couple x/y with probability b(x, y) = 0.001
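The two constraints above (one probability per rule, probabilities per left-hand side summing to 1) and the way a parse is scored can be sketched as follows. The toy grammar and its numbers are hypothetical; the probability of a derivation is the product of the probabilities of the rules it uses.

```python
import math
from collections import defaultdict

# Hypothetical toy SITG in normal form, keyed by (left-hand side, right-hand side).
# "[A A]" is a straight rule, "<A A>" an inverted rule, "x/y" a lexical couple.
RULES = {
    ("A", "[A A]"): 0.3,
    ("A", "<A A>"): 0.1,
    ("A", "the/la"): 0.2,
    ("A", "red/rouge"): 0.2,
    ("A", "car/voiture"): 0.2,
}


def is_normalized(rules, tol=1e-9):
    """Check that the probabilities of all rules sharing a left-hand side sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return all(math.isclose(total, 1.0, abs_tol=tol) for total in totals.values())


def derivation_probability(rules, derivation):
    """Probability of a derivation: the product of the probabilities of its rules."""
    return math.prod(rules[rule] for rule in derivation)


# Derivation of [the/la <red/rouge car/voiture>], listed top-down, left to right.
DERIVATION = [
    ("A", "[A A]"), ("A", "the/la"),
    ("A", "<A A>"), ("A", "red/rouge"), ("A", "car/voiture"),
]
```

With these numbers the derivation scores 0.3 × 0.2 × 0.1 × 0.2 × 0.2 = 0.00024; the parser's job is to find the derivation that maximizes this product.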

Parsing with SITGs

Every node (q) in the parse tree has 5 elements:

Begin & end indices for the language-1 string (s, t)

Begin & end indices for the language-2 string (u, v)

Non-terminal category (i)

Each cell (in the chart) stores the probability of the most likely parse covering the appropriate substrings, rooted in the appropriate category

Parsing with SITGs - Algorithm

Initialize the cells corresponding to terminals using a translation lexicon

For the other cells, recursively find the most probable way of obtaining that nonterminal category.

Compute the probability by multiplying the probability of the rule by the probabilities of both the constituents

Store that probability plus the orientation of the rule

Complexity: O(n^3 m^3), where n and m are the lengths of the two sentences
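The steps above can be sketched for the simplest case, a bracketing grammar with a single nonterminal A. This is a minimal illustration, not Wu's implementation: it handles lexical couples only (no singletons), the rule probabilities and lexicon are hypothetical, and memoized recursion stands in for explicit chart filling while computing the same cells.

```python
from functools import lru_cache


def viterbi_btg_parse(e, c, lexicon, p_straight=0.5, p_inverted=0.5):
    """Most probable parse of the sentence pair (e, c) with one nonterminal A.

    lexicon maps (language-1 word, language-2 word) to b(x, y).
    Returns (probability, tree); a leaf is a couple (x, y) and an internal
    node is ("[]", left, right) or ("<>", left, right). Singletons are not handled.
    """

    @lru_cache(maxsize=None)
    def best(s, t, u, v):
        # Cell (s, t, u, v): best constituent covering e[s:t] and c[u:v].
        if t - s == 1 and v - u == 1:
            # Initialization: one word on each side, scored by the lexicon.
            return lexicon.get((e[s], c[u]), 0.0), (e[s], c[u])
        best_p, best_tree = 0.0, None
        for S in range(s + 1, t):
            for U in range(u + 1, v):
                # Straight [A A]: e[s:S] pairs with c[u:U], e[S:t] with c[U:v].
                p1, t1 = best(s, S, u, U)
                p2, t2 = best(S, t, U, v)
                p = p_straight * p1 * p2
                if p > best_p:
                    best_p, best_tree = p, ("[]", t1, t2)
                # Inverted <A A>: e[s:S] pairs with c[U:v], e[S:t] with c[u:U].
                p1, t1 = best(s, S, U, v)
                p2, t2 = best(S, t, u, U)
                p = p_inverted * p1 * p2
                if p > best_p:
                    best_p, best_tree = p, ("<>", t1, t2)
        # Store the probability together with the orientation of the winning rule.
        return best_p, best_tree

    return best(0, len(e), 0, len(c))


LEXICON = {("the", "la"): 0.9, ("red", "rouge"): 0.8, ("car", "voiture"): 0.7}
prob, tree = viterbi_btg_parse(["the", "red", "car"], ["la", "voiture", "rouge"], LEXICON)
```

Here `tree` is `("[]", ("the", "la"), ("<>", ("red", "rouge"), ("car", "voiture")))` and `prob` is 0.126 up to floating-point rounding. There are O(n^2 m^2) cells and each tries up to n·m split points, which gives the O(n^3 m^3) bound for a fixed grammar.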

Applications of SITGs

Segmentation

Bracketing

Alignment

Bilingual Constraint Transfer

Mining parallel sentences from comparable corpora

[Wu & Fung 2005]

Applications of SITGs - Segmentation

Word boundaries are not marked in Chinese text, so no word chunks are available for matching

One option: do word segmentation as preprocessing. However, this might produce chunks that do not agree bilingually

Solution: extend the algorithm to accommodate segmentation

Allow the initialization step to find strings of any length in the translation lexicon

The recursive step stores the most probable way of creating a constituent, whether it came from the lexicon or from rules
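The modified initialization step can be sketched as follows: instead of seeding only one-word cells, it seeds a cell for every pair of substrings whose concatenation appears in the translation lexicon. The lexicon format, the `max_len` cap, and the example entries are assumptions for illustration; a parser would then keep, for each cell, the better of this lexical score and the best rule-based combination.

```python
def lexical_cells(e, c, lexicon, max_len=3):
    """Seed chart cells (s, t, u, v) from lexicon entries of any length up to max_len.

    e is a list of language-1 words; c is an unsegmented list of Chinese characters.
    lexicon maps (language-1 phrase, Chinese string) to a probability.
    """
    cells = {}
    for s in range(len(e)):
        for t in range(s + 1, min(s + max_len, len(e)) + 1):
            e_phrase = " ".join(e[s:t])
            for u in range(len(c)):
                for v in range(u + 1, min(u + max_len, len(c)) + 1):
                    c_word = "".join(c[u:v])  # candidate Chinese word
                    p = lexicon.get((e_phrase, c_word))
                    if p is not None:
                        cells[(s, t, u, v)] = p
    return cells


# Hypothetical lexicon; the segmentation of 我们去 falls out of the matches.
LEXICON = {("we", "我们"): 0.6, ("go", "去"): 0.5, ("I", "我"): 0.7}
cells = lexical_cells(["we", "go"], list("我们去"), LEXICON)
```

Here `cells` is `{(0, 1, 0, 2): 0.6, (1, 2, 2, 3): 0.5}`: the two-character word 我们 is matched as a unit without any prior segmentation.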

Applications of SITGs – Bracketing

How to assign structure to a sentence with no grammar available? This is especially problematic for minority languages

A solution using ITGs:

Get a parallel corpus pairing it with some other language

Get a reasonable translation dictionary

Parse it with a bracketing transduction grammar

Bracketing Transduction Grammar

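A minimal ITG

Only one nonterminal: A

Production rules:

A → [A A] with probability a

A → ⟨A A⟩ with probability a

A → u_i/v_j with probability b_ij, for all lexical translation pairs i, j

A → u_i/ε with probability b_iε, for all language-1 words i

A → ε/v_j with probability b_εj, for all language-2 words j

Lexical translation probabilities b_ij have prominence

Small probability values for the two singleton production rules

Also, a very small value for a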

Bracketing with Singletons

Singletons cause bracketing errors. Some refinements:

Depending on the language, bias singleton attachment either to the left or to the right of a constituent

Apply a series of transformations that push the singletons as close as possible to couples, e.g. [x ⟨A B⟩] ⇌ ⟨x ⟨A B⟩⟩ ⇌ ⟨⟨x A⟩ B⟩ ⇌ ⟨[x A] B⟩

[Figure: example bracketings before and after the singleton transformations]
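Why these rewrites are safe: assuming x is a language-1 singleton x/ε, all four bracketings in the chain generate the same pair of strings, so the transformations change only the bracketing, not the sentence pair. The sketch below checks this with a small yield function; the tuple encoding (None for ε) is an assumption for illustration.

```python
def yields(tree):
    """(language-1 tokens, language-2 tokens) of an ITG tree.

    Leaf: (x, y), with None standing for the empty string ε.
    Node: ("[]", left, right) or ("<>", left, right).
    """
    if tree[0] not in ("[]", "<>"):
        x, y = tree
        return ([x] if x is not None else []), ([y] if y is not None else [])
    op, left, right = tree
    l1, l2 = yields(left)
    r1, r2 = yields(right)
    return (l1 + r1, l2 + r2) if op == "[]" else (l1 + r1, r2 + l2)


x = ("x", None)          # singleton x/ε
A, B = ("a1", "a2"), ("b1", "b2")
chain = [
    ("[]", x, ("<>", A, B)),   # [x <A B>]
    ("<>", x, ("<>", A, B)),   # <x <A B>>
    ("<>", ("<>", x, A), B),   # <<x A> B>
    ("<>", ("[]", x, A), B),   # <[x A] B>
]
results = [yields(t) for t in chain]
```

Every entry of `results` equals `(['x', 'a1', 'b1'], ['b2', 'a2'])`: since x contributes nothing to language 2, the orientation of the node that attaches it does not matter.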

Bracketing Experiments

Used 2,000 Chinese-English sentence pairs from the HKUST corpus

Some filtering:

Remove sentence pairs that were not adequately covered by the lexicon (more than 1 unknown word)

Remove sentence pairs with many unmatched words (more than 2)

Bracketing precision: 80% for English, 78% for Chinese

Errors mainly due to lexical imperfections

A statistical lexicon (~6.5k English, ~5.5k Chinese words)

Can be improved with extra information, e.g. POS tags or a grammar-based bracketer

Applications of SITGs - Alignment

Alignments (phrasal or word) are a natural byproduct of bilingual parsing

Unlike ‘parse-parse-match’ methods, this approach:

Doesn’t require a robust grammar for both languages

Guarantees compatibility between parses

Has a principled way of choosing between possible alignments

Provides a more reasonable ‘distortion penalty’

Recent empirical studies show ITGs produce better alignments in various applications [Wu & Fung 2005]
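Because every couple in a parse tree is an alignment link, word alignments can be read off a bilingual parse directly. A minimal sketch, assuming the same illustrative tuple encoding of trees and couples only (no singletons): a straight node places its right child after the left one in both languages, while an inverted node places it first in language 2.

```python
def alignment(tree):
    """Return (links, len1, len2) for an ITG parse tree.

    links are (i, j) pairs: language-1 position i aligned to language-2 position j.
    Leaf: a couple (x, y). Node: ("[]", left, right) or ("<>", left, right).
    """
    if tree[0] not in ("[]", "<>"):
        return [(0, 0)], 1, 1
    op, left, right = tree
    l_links, l1, l2 = alignment(left)
    r_links, r1, r2 = alignment(right)
    if op == "[]":
        # Straight: the right child follows the left child in both languages.
        links = l_links + [(i + l1, j + l2) for i, j in r_links]
    else:
        # Inverted: in language 2 the right child comes before the left child.
        links = [(i, j + r2) for i, j in l_links] + [(i + l1, j) for i, j in r_links]
    return links, l1 + r1, l2 + r2


# Parse of "the red car" / "la voiture rouge" (illustrative).
tree = ("[]", ("the", "la"), ("<>", ("red", "rouge"), ("car", "voiture")))
links, n, m = alignment(tree)
```

Here `links` is `[(0, 0), (1, 2), (2, 1)]`, i.e. the/la, red/rouge, car/voiture; the inversion is what produces the crossing links.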

Bilingual Constraint Transfer

A high-quality parse for one language can be leveraged to get structure for the other

Alter the parsing algorithm: only allow constituents that match the parse that already exists for the well-studied language

This works for any sort of constraint supplied for the well-studied language
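One simple way to realize this constraint is to reject any language-1 span that crosses a bracket of the given parse. This is sketched as an assumption, not as Wu's exact formulation. Spans are half-open (start, end) pairs and the helper names are hypothetical.

```python
def crosses(span, bracket):
    """True if two half-open spans overlap without either containing the other."""
    s, t = span
    a, b = bracket
    return s < a < t < b or a < s < b < t


def allowed(span, brackets):
    """Keep a language-1 span only if it crosses no bracket of the known parse."""
    return not any(crosses(span, bracket) for bracket in brackets)


# Brackets of an existing parse of "the red car": [the [red car]].
GIVEN = {(0, 3), (1, 3)}
```

Inside a biparser such as a Viterbi chart parser, the cell for span (s, t) would simply get probability 0 whenever `allowed((s, t), GIVEN)` is false. With the parse above, "the red" (0, 2) is rejected and "red car" (1, 3) is kept.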

References:

Dekai Wu (1997), Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, Computational Linguistics, Vol. 23, no. 3, pp. 377-403.

Dekai Wu (1995), Grammarless Extraction of Phrasal Translation Examples from Parallel Texts, 6th Intl. Conf. on Theoretical and Methodological Issues in Machine Translation, Vol. 2, pp. 354-372. Leuven, Belgium.

Dekai Wu and Pascale Fung (2005), Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora, 2nd Intl. Joint Conf. on Natural Language Processing (IJCNLP-2005), Jeju, Korea, October.



http://www.cs.ust.hk/%7Edekai/library/WU_Dekai/tmi95.Wu.ps


http://www.cs.ust.hk/%7Edekai/library/WU_Dekai/WuFung_IJCNLP2005.pdf

