Linguistics 187/287 Week 6
Transcript of Linguistics 187/287 Week 6
Martin Forst, Ron Kaplan, and Tracy King

Topics: Generation, Term-rewrite System, Machine Translation
Generation

Parsing: string to analysis
Generation: analysis to string
What type of input?
How to generate?
Why generate?

Machine translation: Lang1 string -> Lang1 f-structure -> Lang2 f-structure -> Lang2 string
Sentence condensation: long string -> f-structure -> smaller f-structure -> new string
Question answering
Production of NL reports
– State of machine or process
– Explanation of logical deduction
Grammar debugging
F-structures as input

Use f-structures as input to the generator
May parse sentences that shouldn't be generated
May want to constrain the number of generated options
Input f-structure may be underspecified
XLE generator

Use the same grammar for parsing and generation
Advantages:
– maintainability
– write rules and lexicons once
But:
– special generation tokenizer
– different OT ranking
Generation tokenizer/morphology

White space
– Parsing: multiple white space becomes a single TB
  John appears. -> John TB appears TB . TB
– Generation: a single TB becomes a single space (or nothing)
  John TB appears TB . TB -> John appears.   *John appears .
Suppress variant forms
– Parse both favor and favour
– Generate only one
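The whitespace policy can be sketched in a few lines of Python (a simplified illustration, not XLE's actual tokenizer; the TB marker symbol and the punctuation set are assumptions):

```python
TB = "\x01"  # stand-in symbol for a token boundary (TB)

def parse_tokenize(s):
    """Parsing direction: runs of whitespace collapse to a single TB."""
    return TB.join(s.split()) + TB

def gen_detokenize(tokenized):
    """Generation direction: each TB becomes one space, or nothing
    before punctuation (assumed punctuation set)."""
    text = ""
    for tok in tokenized.rstrip(TB).split(TB):
        if text and tok not in {".", ",", "!", "?"}:
            text += " "
        text += tok
    return text

print(gen_detokenize(parse_tokenize("John   appears .")))  # John appears.
```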
Morphconfig for parsing & generation

STANDARD ENGLISH MORPHOLOGY (1.0)
TOKENIZE:
P!eng.tok.parse.fst G!eng.tok.gen.fst
ANALYZE:
eng.infl-morph.fst G!amerbritfilter.fst
G!amergen.fst
----
Reversing the parsing grammar

The parsing grammar can be used directly as a generator
Adapt the grammar with a special OT ranking: GENOPTIMALITYORDER
Why do this?
– parse ungrammatical input
– have too many options
Ungrammatical input

Linguistically ungrammatical
– They walks.
– They ate banana.
Stylistically ungrammatical
– No ending punctuation: They appear
– Superfluous commas: John, and Mary appear.
– Shallow markup: [NP John and Mary] appear.
Too many options

All the generated options can be linguistically valid, but too many for applications
Occurs when more than one string has the same, legitimate f-structure
PP placement:
– In the morning I left. / I left in the morning.
Using the Gen OT ranking

Generally much simpler than in the parsing direction
– Usually only use standard marks and NOGOOD; no * marks, no STOPPOINT
– Can have a few marks that are shared by several constructions
  one or two for dispreferred
  one or two for preferred
Example: Prefer initial PP
S --> (PP: @ADJUNCT @(OT-MARK GenGood))
NP: @SUBJ;
VP.
VP --> V
(NP: @OBJ)
(PP: @ADJUNCT).
GENOPTIMALITYORDER NOGOOD +GenGood.
parse: they appear in the morning.
generate: without OT: In the morning they appear.
They appear in the morning.
with OT: In the morning they appear.
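The effect of a ranking like GENOPTIMALITYORDER NOGOOD +GenGood can be mimicked with a small filter (a sketch under assumptions — XLE evaluates OT marks inside its packed representations, not over an explicit candidate list):

```python
def ot_filter(candidates, nogood=frozenset(), preferred=frozenset()):
    """candidates: list of (string, marks) pairs.
    Drop candidates carrying a NOGOOD mark, then keep only those
    with the highest count of preferred marks (e.g. GenGood)."""
    survivors = [(s, m) for s, m in candidates if not (m & set(nogood))]
    if not survivors:
        return []
    best = max(len(m & set(preferred)) for _, m in survivors)
    return [s for s, m in survivors if len(m & set(preferred)) == best]

candidates = [("In the morning they appear.", {"GenGood"}),
              ("They appear in the morning.", set())]
print(ot_filter(candidates, preferred={"GenGood"}))
# ['In the morning they appear.']
```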
Debugging the generator

When generating from an f-structure produced by the same grammar, XLE should always generate
Unless:
– OT marks block the only possible string
– something is wrong with the tokenizer/morphology
  regenerate-morphemes: if this gets a string, the tokenizer/morphology is not the problem
Hard to debug: XLE has robustness features to help
Underspecified Input

F-structures provided by applications are not perfect
– may be missing features
– may have extra features
– may simply not match the grammar coverage
Missing and extra features are often systematic
– specify in XLE which features can be added and deleted
Not matching the grammar is a more serious problem
Adding features

English to French translation:
– English nouns have no gender
– French nouns need gender
– Solution: have XLE add gender; the French morphology will control the value
Specify additions in xlerc:
– set-gen-adds add "GEND"
– can add multiple features: set-gen-adds add "GEND CASE PCASE"
– XLE will optionally insert the feature
Note: Unconstrained additions make generation undecidable
Example

Input f-structure (no GEND):
[ PRED 'dormir<SUBJ>'
  SUBJ [ PRED 'chat'
         NUM sg
         SPEC def ]
  TENSE present ]

After GEND is added:
[ PRED 'dormir<SUBJ>'
  SUBJ [ PRED 'chat'
         NUM sg
         GEND masc
         SPEC def ]
  TENSE present ]

The cat sleeps. -> Le chat dort.
Deleting features

French to English translation
– delete the GEND feature
Specify deletions in xlerc
– set-gen-adds remove "GEND"
– can remove multiple features: set-gen-adds remove "GEND CASE PCASE"
– XLE obligatorily removes the features: no GEND feature will remain in the f-structure
– if a feature takes an f-structure value, that f-structure is also removed
Changing values

If the values of a feature do not match between the input f-structure and the grammar:
– delete the feature and then add it
Example: case assignment in translation
– set-gen-adds remove "CASE"
  set-gen-adds add "CASE"
– allows dative case in input to become accusative
  e.g., an exceptional case marking verb in the input language but regular case in the output language
Generation for Debugging

Checking for grammar and lexicon errors
– create-generator english.lfg
– reports ill-formed rules, templates, feature declarations, lexical entries
Checking for ill-formed sentences that can be parsed
– parse a sentence
– see if all the results are legitimate strings
– regenerate "they appear."
Rewriting/Transfer System
Why a Rewrite System

Grammars produce c-/f-structure output
Applications may need to manipulate this
– Remove features
– Rearrange features
– Continue linguistic analysis (semantics, knowledge representation – next week)
XLE has a general-purpose rewrite system (aka "transfer" or "xfr" system)
Sample Uses of Rewrite System

Sentence condensation
Machine translation
Mapping to logic for knowledge representation and reasoning
Tutoring systems
What does the system do?

Input: set of "facts"
Apply a set of ordered rules to the facts
– this gradually changes the set of input facts
Output: new set of facts
The rewrite system uses the same ambiguity management as XLE
– can efficiently rewrite packed structures, maintaining the packing
Example F-structure Facts

PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)

F-structures get var(#)
Special arg facts
lex_id for each PRED
Facts have two arguments (except arg)
The rewrite system allows for any number of arguments
Rule format

Obligatory rule: LHS ==> RHS.
Optional rule: LHS ?=> RHS.
Unresourced fact: |- clause.

LHS:
clause : match and delete
+clause : match and keep
-LHS : negation (don't have fact)
LHS, LHS : conjunction
( LHS | LHS ) : disjunction
{ ProcedureCall } : procedural attachment

RHS:
clause : replacement facts
0 : empty set of replacement facts
stop : abandon the analysis
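As a toy illustration (an assumption-laden sketch, not XLE's pattern matcher — real rules use variables like %F and keep alternatives packed), facts can be modeled as tuples and the two rule types as set transformations:

```python
# Facts are tuples such as ("NTYPE", 1, "common"), writing var(1) as 1.

def add_spec_if_noun(facts):
    """Obligatory rule: +NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F,def).
    The +clause keeps NTYPE; the -clause requires SPEC to be absent."""
    nouns = {f[1] for f in facts if f[0] == "NTYPE"}
    specified = {f[1] for f in facts if f[0] == "SPEC"}
    return facts | {("SPEC", n, "def") for n in nouns - specified}

def pluralize_optionally(facts):
    """Optional rule: NUM(%F,pl) ?=> NUM(%F,sg).
    Each match splits the choice space: the original analysis is
    kept alongside the rewritten one (simplified to a flat list)."""
    analyses = [facts]
    for fact in list(facts):
        if fact[0] == "NUM" and fact[2] == "pl":
            analyses.append((facts - {fact}) | {("NUM", fact[1], "sg")})
    return analyses

facts = {("NTYPE", 1, "common"), ("NUM", 1, "pl"), ("PRED", 1, "girl")}
facts = add_spec_if_noun(facts)         # adds ("SPEC", 1, "def")
analyses = pluralize_optionally(facts)  # two alternatives: pl and sg
```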
Example rules

"PRS (1.0)"
grammar = toy_rules.

"obligatorily add a determiner if there is a noun with no spec"
+NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F,def).

"optionally make plural nouns singular; this will split the choice space"
NUM(%F, pl) ?=> NUM(%F, sg).

Input facts:
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)
Example Obligatory Rule

"obligatorily add a determiner if there is a noun with no spec"
+NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F,def).

Input facts:
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)

Output facts: all the input facts plus SPEC(var(1),def)
Example Optional Rule

"optionally make plural nouns singular; this will split the choice space"
NUM(%F, pl) ?=> NUM(%F, sg).

Input facts:
PERS(var(1),3)
PRED(var(1),girl)
CASE(var(1),nom)
NTYPE(var(1),common)
NUM(var(1),pl)
SPEC(var(1),def)
SUBJ(var(0),var(1))
PRED(var(0),laugh)
TNS-ASP(var(0),var(2))
TENSE(var(2),pres)
arg(var(0),1,var(1))
lex_id(var(0),1)
lex_id(var(1),0)

Output facts: all the input facts plus a choice split:
A1: NUM(var(1),pl)
A2: NUM(var(1),sg)
Output of example rules

Output is a packed f-structure
Generation gives two sets of strings
– The girls {laugh.|laugh!|laugh}
– The girl {laughs.|laughs!|laughs}
Manipulating sets

Sets are represented with an in_set feature
– He laughs in the park with the telescope
  ADJUNCT(var(0),var(2))
  in_set(var(4),var(2))
  in_set(var(5),var(2))
  PRED(var(4),in)
  PRED(var(5),with)
Might want to optionally remove adjuncts
– but not negation
Example Adjunct Deletion Rules

"optionally remove member of adjunct set"
+ADJUNCT(%%, %AdjSet), in_set(%Adj, %AdjSet), -PRED(%Adj, not) ?=> 0.

"obligatorily remove adjunct with nothing in it"
ADJUNCT(%%, %Adj), -in_set(%%, %Adj) ==> 0.

He laughs with the telescope in the park.
He laughs in the park with the telescope.
He laughs with the telescope.
He laughs in the park.
He laughs.
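The choice space these rules create can be counted with a toy sketch (an assumed simplification: each subset of deletable adjuncts is one alternative; the four f-structure variants below correspond to the five strings above because the two-adjunct variant can be realized with either PP order):

```python
from itertools import combinations

def adjunct_variants(adjuncts):
    """All analyses reachable by optionally deleting adjuncts;
    'not' is protected, mirroring the -PRED(%Adj, not) condition."""
    protected = [a for a in adjuncts if a == "not"]
    removable = [a for a in adjuncts if a != "not"]
    variants = []
    for r in range(len(removable) + 1):
        for combo in combinations(removable, r):
            variants.append(protected + list(combo))
    return variants

print(len(adjunct_variants(["in the park", "with the telescope"])))  # 4
```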
Manipulating PREDs

Changing the value of a PRED is easy
– PRED(%F,girl) ==> PRED(%F,boy).
Changing the argument structure is trickier
– Make any changes to the grammatical functions
– Make the arg facts correlate with these
Example Passive Rule

"make actives passive:
make the subject NULL; make the object the subject;
put in features"

SUBJ( %Verb, %Subj), arg( %Verb, %Num, %Subj),
OBJ( %Verb, %Obj), CASE( %Obj, acc)
==>
SUBJ( %Verb, %Obj), arg( %Verb, %Num, NULL), CASE( %Obj, nom),
PASSIVE( %Verb, +), VFORM( %Verb, pass).

the girls saw the monkeys ==> The monkeys were seen.
in the park the girls saw the monkeys ==> In the park the monkeys were seen.
Templates and Macros

Rules can be encoded as templates:
n2n(%Eng,%Frn) ::
  PRED(%F,%Eng), +NTYPE(%F,%%)
  ==> PRED(%F,%Frn).
@n2n(man, homme).
@n2n(woman, femme).

Macros encode groups of clauses/facts:
sg_noun(%F) :=
  +NTYPE(%F,%%), +NUM(%F,sg).
@sg_noun(%F), -SPEC(%F)
  ==> SPEC(%F,def).
Unresourced Facts

Facts can be stipulated in the rules and referred to
– Often used as a lexicon of information not encoded in the f-structure
For example, a list of days and months for the manipulation of dates:
|- day(Monday). |- day(Tuesday). etc.
|- month(January). |- month(February). etc.

+PRED(%F,%Pred), ( day(%Pred) | month(%Pred) ) ==> …
Rule Ordering

Rewrite rules are ordered (unlike LFG syntax rules, but like finite-state rules)
– Output of rule1 is input to rule2
– Output of rule2 is input to rule3
This allows for feeding and bleeding
– Feeding: insert facts used by later rules
– Bleeding: remove facts needed by later rules
Can make debugging challenging
Example of Rule Feeding

Early rule: insert SPEC on nouns
+NTYPE(%F,%%), -SPEC(%F,%%) ==> SPEC(%F, def).

Later rule: allow plural nouns to become singular only if they have a specifier (to avoid bad count nouns)
NUM(%F,pl), +SPEC(%F,%%) ==> NUM(%F,sg).
Example of Rule Bleeding

Early rule: turn actives into passives (simplified)
SUBJ(%F,%S), OBJ(%F,%O) ==> SUBJ(%F,%O), PASSIVE(%F,+).

Later rule: impersonalize actives
SUBJ(%F,%%), -PASSIVE(%F,+) ==> SUBJ(%F,%S), PRED(%S,they), PERS(%S,3), NUM(%S,pl).

– will apply to intransitives and verbs with (X)COMPs, but not transitives
Debugging

XLE command line: tdbg
– steps through the rules, stating how they apply

Input: girls laughed

============================================
Rule 1: +(NTYPE(%F,A)), -(SPEC(%F,B)) ==> SPEC(%F,def)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 4-10
Rule 1 matches: [+(2)] NTYPE(var(1),common)
1 --> SPEC(var(1),def)
============================================
Rule 2: NUM(%F,pl) ?=> NUM(%F,sg)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 11-17
Rule 2 matches: [3] NUM(var(1),pl)
1 --> NUM(var(1),sg)
============================================
Rule 5: SUBJ(%Verb,%Subj), arg(%Verb,%Num,%Subj), OBJ(%Verb,%Obj), CASE(%Obj,acc) ==> SUBJ(%Verb,%Obj), arg(%Verb,%Num,NULL), CASE(%Obj,nom), PASSIVE(%Verb,+), VFORM(%Verb,pass)
File /tilde/thking/courses/ling187/hws/thk.pl, lines 28-37
Rule does not apply
Running the Rewrite System

create-transfer : adds menu items
load-transfer-rules FILE : loads rules from a file
The f-structure window under commands has:
– transfer : prints the output of the rules in the XLE window
– translate : runs the output through the generator

Need to do (where the path is $XLEPATH/lib):
setenv LD_LIBRARY_PATH /afs/ir.stanford.edu/data/linguistics/XLE/SunOS/lib
Rewrite Summary

The XLE rewrite system lets you manipulate the output of parsing
– Creates versions of the output suitable for applications
– Can involve significant reprocessing
Rules are ordered
Ambiguity management is as with parsing
Grammatical Machine Translation

Stefan Riezler & John Maxwell
Translation System

Source string
  -> XLE Parsing (German LFG)
  -> Source f-structures
  -> Transfer (translation rules)
  -> Target f-structures
  -> XLE Generation (English LFG)
  -> Target string
+ Lots of statistics
Transfer-Rule Induction from aligned bilingual corpora

1. Use standard techniques to find many-to-many candidate word alignments in source-target sentence pairs
2. Parse source and target sentences using LFG grammars for German and English
3. Select the most similar f-structures in source and target
4. Define many-to-many correspondences between substructures of the f-structures based on the many-to-many word alignment
5. Extract primitive transfer rules directly from aligned f-structure units
6. Create the powerset of possible combinations of basic rules and filter according to contiguity and type-matching constraints
Induction

Example sentences:
  Dafür bin ich zutiefst dankbar.
  I have a deep appreciation for that.
Many-to-many word alignment:
  Dafür{6 7} bin{2} ich{1} zutiefst{3 4 5} dankbar{5}
F-structure alignment:
Extracting Primitive Transfer Rules

Rule (1) maps lexical predicates
Rule (2) maps lexical predicates and interprets the subj-to-subj link as an indication to map the subj of the source with this predicate into the subject of the target, and the xcomp of the source into the object of the target

%X1, %X2, %X3, … are variables for f-structures

(1) PRED(%X1, ich) ==> PRED(%X1, I)

(2) PRED(%X1, sein), SUBJ(%X1,%X2), XCOMP(%X1,%X3) ==>
    PRED(%X1, have), SUBJ(%X1,%X2), OBJ(%X1,%X3)
Extracting Complex Transfer Rules

Complex rules are created by taking all combinations of primitive rules, and filtering

(4) zutiefst dankbar sein ==> have a deep appreciation
(5) zutiefst dankbar dafür sein ==> have a deep appreciation for that
(6) ich bin zutiefst dankbar dafür ==> I have a deep appreciation for that
Transfer Contiguity constraint

Transfer contiguity constraint:
1. Source and target f-structures each have to be connected
2. F-structures in the transfer source can only be aligned with f-structures in the transfer target, and vice versa
Analogous to the constraint on contiguous and alignment-consistent phrases in phrase-based SMT
Prevents extraction of a rule that would translate dankbar directly into appreciation, since appreciation is also aligned to zutiefst
Transfer contiguity allows learning idioms like es gibt - there is from configurations that are local in the f-structure but non-local in the string, e.g., es scheint […] zu geben - there seems […] to be
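The alignment-consistency half of the constraint can be sketched as follows (a simplified assumption — the full check also requires both f-structure sides to be connected). Using the alignment from the Dafür example (German positions 1-5, English positions 1-7):

```python
def consistent(src, tgt, links):
    """A source/target unit pair is extractable only if every
    alignment link is either entirely inside or entirely outside
    the pair -- no link may cross its boundary."""
    return all((i in src) == (j in tgt) for i, j in links)

# Dafür{6 7} bin{2} ich{1} zutiefst{3 4 5} dankbar{5} as link pairs:
links = {(1, 6), (1, 7), (2, 2), (3, 1), (4, 3), (4, 4), (4, 5), (5, 5)}

# dankbar (5) alone cannot pair with appreciation (5): zutiefst (4)
# is also linked to English position 5, so link (4, 5) crosses.
print(consistent({5}, {5}, links))           # False
print(consistent({4, 5}, {3, 4, 5}, links))  # True
```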
Linguistic Filters on Transfer Rules

Morphological stemming of PRED values
(Optional) filtering of f-structure snippets based on consistency of linguistic categories
– Extraction of the snippet that translates zutiefst dankbar into a deep appreciation maps incompatible categories, adjectival and nominal; valid in a string-based world
– The translation of sein to have might be discarded because of the adjectival vs. nominal types of their arguments
– The larger rule mapping zutiefst dankbar sein to have a deep appreciation is OK, since the verbal types match
Transfer

Parallel application of transfer rules in non-deterministic fashion
– Unlike the XLE ordered-rule rewrite system
Each fact must be transferred by exactly one rule
A default rule transfers any fact as itself
Transfer works on a chart, using the parser's unification mechanism for consistency checking
Selection of the most probable transfer output is done by beam decoding on the transfer chart
Generation

Bi-directionality allows us to use the same grammar for parsing the training data and for generation in the translation application
The generator has to be fault-tolerant in cases where the transfer system operates on a FRAGMENT parse or produces invalid f-structures from valid input f-structures
Robust generation from unknown (e.g., untranslated) predicates and from unknown f-structures
Robust Generation

Generation from unknown predicates:
– The unknown German word "Hunde" is analyzed by the German grammar to extract the stem (e.g., PRED = Hund, NUM = pl) and is then inflected using English default morphology ("Hunds")
Generation from unknown constructions:
– A default grammar that allows any attribute to be generated in any order is mixed in as a suboptimal option in the standard English grammar; e.g., if the SUBJ cannot be generated as a sentence-initial NP, it will be generated in any position as any category
  » an extension/combination of set-gen-adds and OT ranking
Statistical Models

1. Log-probability of source-to-target transfer rules, where the probability r(e|f) of a rule that transfers source snippet f into target snippet e is estimated by relative frequency:

$$r(e \mid f) = \frac{\mathrm{count}(f \rightarrow e)}{\sum_{e'} \mathrm{count}(f \rightarrow e')}$$

2. Log-probability of target-to-source transfer rules, estimated by relative frequency
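The relative-frequency estimate can be illustrated with toy counts (the extracted-rule pairs below are invented for the example):

```python
from collections import Counter

# Invented observations of (source snippet, target snippet) rule pairs:
observed = [("sein", "be"), ("sein", "have"), ("sein", "be")]
pair_counts = Counter(observed)
source_counts = Counter(f for f, _ in observed)

def r(e, f):
    """r(e|f) = count(f -> e) / sum over e' of count(f -> e')."""
    return pair_counts[(f, e)] / source_counts[f]

print(r("be", "sein"))  # 2/3
```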
Statistical Models, cont.

3. Log-probability of lexical translations l(e|f) from source to target snippets, estimated from Viterbi alignments a* between source word positions i = 1, …, n and target word positions j = 1, …, m for stems f_i and e_j in snippets f and e, with relative word-translation frequencies t(e_j|f_i):

$$l(e \mid f) = \prod_{j} \frac{1}{|\{i \mid (i,j) \in a^{*}\}|} \sum_{(i,j) \in a^{*}} t(e_j \mid f_i)$$

4. Log-probability of lexical translations from target to source snippets
Statistical Models, cont.

5. Number of transfer rules
6. Number of transfer rules with frequency 1
7. Number of default transfer rules
8. Log-probability of strings of predicates from the root to the frontier of the target f-structure, estimated from predicate trigrams in English f-structures
9. Number of predicates in the target f-structure
10. Number of constituent movements during generation, based on the original order of the head predicates of the constituents
Statistical Models, cont.

11. Number of generation repairs
12. Log-probability of the target string as computed by a trigram language model
13. Number of words in the target string
Experimental Evaluation

Experimental setup:
– German-to-English on the Europarl parallel corpus (Koehn '02)
– Training and evaluation on sentences of length 5-15, for quick experimental turnaround
– Resulting in a training set of 163,141 sentences, a development set of 1,967 sentences, and a test set of 1,755 sentences (used in Koehn et al. HLT '03)
– Improved bidirectional word alignment based on GIZA++ (Och et al. EMNLP '99)
– LFG grammars for German and English (Butt et al. COLING '02; Riezler et al. ACL '02)
– SRI trigram language model (Stolcke '02)
– Comparison with PHARAOH (Koehn et al. HLT '03) and IBM Model 4 as produced by GIZA++ (Och et al. EMNLP '99)
Experimental Evaluation, cont.

Around 700,000 transfer rules extracted from f-structures chosen by a dependency similarity measure
The system operates on n-best lists of parses (n=1), transferred f-structures (n=10), and generated strings (n=1,000)
Selection of the most probable translations in two steps:
– Most probable f-structure by beam search (n=20) on the transfer chart using features 1-10
– Most probable string selected from the strings generated from the selected n-best f-structures using features 11-13
Feature weights for the modules trained by MER on 750 in-coverage sentences of the development set
Automatic Evaluation

NIST scores (ignoring punctuation) & Approximate Randomization for significance testing (see above)
44% in coverage of the grammars; 51% FRAGMENT parses and/or generation repair; 5% timeouts
– In coverage: the difference between LFG and P is not significant
– Suboptimal robustness techniques decrease overall quality

|               | M4    | LFG   | P     |
|---------------|-------|-------|-------|
| in-coverage   | 5.13  | *5.82 | *5.99 |
| full test set | *5.57 | *5.62 | 6.40  |
Manual Evaluation

Closer look at in-coverage examples:
– Random selection of 500 in-coverage examples
– Two independent judges indicated a preference for LFG or PHARAOH, or equality, in a blind test
– Separate evaluation under the criteria of grammaticality/fluency and translational/semantic adequacy
– Significance assessed by Approximate Randomization via stratified shuffling of preference ratings between systems
Manual Evaluation

Result differences on agreed-on ratings are statistically significant at p < 0.0001
Net improvement in translational adequacy on agreed-on examples is 11.4% on 500 sentences (57/500), amounting to a 5% overall improvement in the hybrid system (44% of 11.4%)
Net improvement in grammaticality on agreed-on examples is 15.4% on 500 sentences, amounting to a 6.7% overall improvement in the hybrid system

| j1\j2 | adequacy: P | adequacy: LFG | adequacy: eq | grammaticality: P | grammaticality: LFG | grammaticality: eq |
|-------|----|-----|-----|----|-----|-----|
| P     | 48 | 8   | 7   | 36 | 2   | 9   |
| LFG   | 10 | 105 | 18  | 6  | 113 | 17  |
| equal | 53 | 60  | 192 | 51 | 44  | 223 |
Examples: LFG > PHARAOH

src: in diesem fall werde ich meine verantwortung wahrnehmen
ref: then i will exercise my responsibility
LFG: in this case i accept my responsibility
P: in this case i shall my responsibilities

src: die politische stabilität hängt ab von der besserung der lebensbedingungen
ref: political stability depends upon the improvement of living conditions
LFG: the political stability hinges on the recovery the conditions
P: the political stability is rejects the recovery of the living conditions
Examples: PHARAOH > LFG
src: das ist schon eine seltsame vorstellung von gleichheit
ref: a strange notion of equality
LFG: equality that is even a strange idea
P: this is already a strange idea of equality
src: frau präsidentin ich beglückwünsche herrn nicholson zu seinem ausgezeichneten bericht
ref: madam president I congratulate mr nicholson on his excellent report
LFG: madam president I congratulate mister nicholson on his report excellented
P: madam president I congratulate mr nicholson for his excellent report
![Page 64: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/64.jpg)
Discussion

High percentage of out-of-coverage examples
– Accumulation of 2 x 20% error rates in parsing training data
– Errors in rule extraction
– Together result in ill-formed transfer rules causing high number of generation failures/repairs
Propagation of errors through the system also for in-coverage examples
– Error analysis: 69% transfer errors, 10% due to parse errors

Discrepancy between NIST and manual evaluation
– Suboptimal integration of the generator, making training and translation with large n-best lists infeasible
– Language and distortion models applied after generation
![Page 65: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/65.jpg)
Conclusion
Integration of grammar-based generator into dependency-based SMT system achieves state-of-the-art NIST and improved grammaticality and adequacy on in-coverage examples
A hybrid system is possible, since it can be determined whether a sentence is within the coverage of the system
![Page 66: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/66.jpg)
Grammatical Machine Translation II
Ji Fang, Martin Forst, John Maxwell, and Michael Tepper
![Page 67: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/67.jpg)
Overview of different approaches to MT

| Approach | Level of transfer | Transfer | Disambiguation |
|---|---|---|---|
| “Traditional” MT (e.g. Systran) | String (with minimal analysis) | Mainly hand-developed rules | Heuristics |
| Statistical MT (e.g. Google) | String (morphological analysis, syntactic rearrangements) | Phrase correspondences with statistics acquired on bitexts | Machine-learned (transfer probabilities, LM) |
| Grammatical MT I (2006) | F-structure | Term-rewriting rules with statistics, induced from parsed bitexts | Machine-learned (ME models, LM) |
| Context-Based MT (Meaningful Machines) | String | Semi-automatically developed phrase pairs | Machine-learned (LM) |
| Grammatical MT II (2008) | F-structure | Term-rewriting rules without statistics, induced from semi-automatically developed phrase pairs, potentially bitexts | Machine-learned (ME models, LM) |
![Page 68: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/68.jpg)
Limitations of string-based approaches

Transfer rules/correspondences of little generality
Problems with long-distance dependencies
Perform less well for morphologically rich (target) languages
N-gram LM-based disambiguation seems to have leveled out
![Page 69: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/69.jpg)
Limitations of string-based approaches - little generality

From Europarl: Das tut mir leid. = I’m sorry [about that].
Google (SMT): I’m sorry. Perfect! But: as soon as the input changes a bit, we get garbage.

Das tut ihr leid. ‘She is sorry about that.’ → It does their suffering.
Der Tod deines Vaters tut mir leid. ‘I am sorry about the death of your father.’ → The death of your father I am sorry.
Der Tod deines Vaters tut ihnen leid. ‘They are sorry about the death of your father.’ → The death of your father is doing them sorry.
![Page 70: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/70.jpg)
Limitations of string-based approaches - problems with LDDs

From Europarl: Dies stellt eine der großen Herausforderungen für die französische Präsidentschaft dar. = This is one of the major issues of the French Presidency.
Google (SMT): This is one of the major challenges for the French presidency represents.
The particle verb is identified and translated correctly
But: the two verbs are ungrammatical; they seem to be too far apart to be filtered out by the LM
![Page 71: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/71.jpg)
Limitations of string-based approaches - rich morphology

Language pairs involving morphologically rich languages, e.g. Finnish, are hard
From Koehn (2005, MT Summit)
![Page 72: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/72.jpg)
Limitations of string-based approaches - rich morphology

Morphologically rich, free word order languages, e.g. German, are particularly hard as target languages.
Again from Koehn (2005, MT Summit)
![Page 73: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/73.jpg)
Limitations of string-based approaches - n-gram LMs

Even for morphologically poor languages, improving n-gram LMs becomes increasingly expensive.
Adding data helps improve translation quality (BLEU scores), but not enough.
Assuming the best improvement rate observed in Brants et al. (2007), ~400 million times the available data would be needed to attain human translation quality by LM improvement alone.
![Page 74: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/74.jpg)
Limitations of string-based approaches - n-gram LMs

From Brants et al. (2007)
Best improvement rate: +0.7 BLEU points per doubling of data
Would need 40 more doublings to reach human translation quality (42 + 0.7 × 40 ≈ 70)
Necessary training data in tokens: 1e22 (1e10 × 2^40 ≈ 1e22)
That is ~4e8 times the current English web (estimate: 2.5e13 × 4e8 = 1e22)
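The extrapolation can be reproduced in a few lines (a back-of-the-envelope sketch; the constants are the rough figures quoted on the slide):

```python
current_bleu = 42.0        # approximate current system score
human_bleu = 70.0          # rough BLEU level of human translation
gain_per_doubling = 0.7    # best rate observed in Brants et al. (2007)

# Doublings of training data needed at that (optimistic) rate.
doublings = (human_bleu - current_bleu) / gain_per_doubling   # = 40
tokens_now = 1e10          # current LM training data in tokens
tokens_needed = tokens_now * 2 ** doublings                   # ≈ 1.1e22
web_size = 2.5e13          # rough token count of the English web
print(round(doublings), f"{tokens_needed:.1e}", f"{tokens_needed / web_size:.0e}")
```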
![Page 75: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/75.jpg)
Limitations of bitext-based approaches

Generally available bitexts are limited in size and specialized in genre
– Parliament proceedings
– UN texts
– Judiciary texts (from multilingual countries)
Makes it hard to repurpose bitext-based systems to new genres
Induced transfer rules/correspondences often of mediocre quality
– “Loose” translations
– Bad alignments
![Page 76: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/76.jpg)
Limitations of bitext-based approaches - availability and quality

Readily available bitexts are limited in size and specialized in genre
Approaches to auto-extracting bitexts from the web exist.
Additional data help to some degree, but then the effect levels out.
– Still a genre bias in bitexts, despite automatic acquisition?
– Still more general problems with alignment quality etc.?
![Page 77: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/77.jpg)
Limitations of bitext-based approaches - availability and quality

Much more data needed to attain human translation quality
Logarithmic gains (at best) by adding bitext data
From Munteanu & Marcu (2005)
Baseline: 100K - 95M English words
Mid line (+auto): + 90K - 2.1M
Top line (+oracle): + 90K - 2.1M
![Page 78: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/78.jpg)
Context-Based MT / Meaningful Machines

Combines example-based MT (EBMT) and SMT
Very large (target) language model; large amount of monolingual text required
No transfer statistics, thus no parallel text required
Translation lexicon is developed semi-automatically (i.e. hand-validated)
Lexicon has slotted phrase pairs (like EBMT), e.g. “NP1 biss ins Gras.” = “NP1 bit the dust.”
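A slotted phrase pair of this kind can be pictured as a pair of templates sharing named slots; the sketch below is purely illustrative and not the actual Meaningful Machines lexicon format (PhrasePair and its fields are invented names):

```python
from dataclasses import dataclass

@dataclass
class PhrasePair:
    src: str     # source template with named slots
    tgt: str     # target template reusing the same slot names
    slots: dict  # slot name -> category constraint on the filler

entry = PhrasePair(
    src="{NP1} biss ins Gras.",
    tgt="{NP1} bit the dust.",
    slots={"NP1": "NP"},
)

def instantiate(pair: PhrasePair, fillers: dict) -> str:
    """Fill the target template with (already translated) slot fillers."""
    return pair.tgt.format(**fillers)

print(instantiate(entry, {"NP1": "The old dog"}))  # The old dog bit the dust.
```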
![Page 79: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/79.jpg)
Context-Based MT / Meaningful Machines - pros

High-quality translation lexicon seems to allow for
– Easier repurposing of system(s) to new genres
– Better translation quality
From Carbonell (2006)
![Page 80: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/80.jpg)
Context-Based MT / Meaningful Machines - cons

Works really well for English-Spanish. How about other language pairs?
Same problems with n-gram LMs as “traditional” SMT; probably affects pairs involving a morphologically rich (target) language particularly badly.
How much manual labor is involved in the development of the translation lexicon?
Computationally expensive
![Page 81: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/81.jpg)
Grammatical Machine Translation

Syntactic transfer-based approach
Parsing and generation identical/similar between GMT I and GMT II

[Translation pyramid: parse source and score f-structures (ascending side); transfer and score target FSs via f-structure transfer rules (top); generate and pick the best realization (descending side); string-level statistical methods along the base]
![Page 82: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/82.jpg)
Grammatical Machine Translation: GMT I vs. GMT II

GMT I
– Transfer rules induced from parsed bitexts
– Target f-structures ranked using individual transfer rule statistics

GMT II
– Transfer rules induced from manually/semi-automatically constructed phrase lexicon
– Target f-structures ranked using monolingually trained bilexical dependency statistics and general transfer rule statistics
![Page 83: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/83.jpg)
GMT II

Where do the transfer rules come from? Where do statistics/machine learning come in?

[Translation pyramid, annotated:
– parse source, score f-structures: log-linear model trained on a syntactically annotated monolingual corpus
– transfer, score target FSs: log-linear model trained on bitext data; includes the score from the parse ranking model and very general transfer features
– generate, pick best realization: log-linear model trained on bitext data; includes the scores from the other two models and the features/score of a monolingually trained model for realization ranking
– f-structure transfer rules: induced from manually/semi-automatically compiled phrase pairs with “slots”; potentially, but not necessarily, from bitexts]
![Page 84: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/84.jpg)
GMT II - The phrase dictionary

Contains phrase pairs with “slot” categories (Ddeff, Ddef, NP1nom, NP1, etc.) that allow for well-formed phrases without being included in induced rules
Currently hand-written
Will hopefully be compiled (semi-)automatically from bilingual dictionaries
Bitexts might also be used; how exactly remains to be defined.
![Page 85: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/85.jpg)
GMT II - Rule induction from the phrase dictionary

Sub-FSs of “slot” variables are not included
FS attributes can be defined as irrelevant for translation, e.g. CASE (in both en and de), GEND (in de). Attributes so defined are never included in induced rules.
  set-gen-adds remove CASE GEND
FS attributes can be defined as “remove_equal_features”. Attributes defined as such are not included in induced rules when they are equal.
  set remove_equal_features NUM OBJ OBL-AG PASSIVE SUBJ TENSE
→ more general rules
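The effect of these two settings on rule induction can be sketched as a filter over attribute-value facts (a toy illustration; filter_facts and the dict-based facts are invented, and the real induction operates on XLE f-structure terms):

```python
# Attributes never included in induced rules (cf. set-gen-adds remove ...)
ALWAYS_REMOVE = {"CASE", "GEND"}
# Attributes dropped only when source and target values agree
REMOVE_IF_EQUAL = {"NUM", "OBJ", "OBL-AG", "PASSIVE", "SUBJ", "TENSE"}

def filter_facts(src_facts: dict, tgt_facts: dict):
    """Return the (source, target) attribute facts an induced rule keeps."""
    keep_src, keep_tgt = {}, {}
    for feats, keep in [(src_facts, keep_src), (tgt_facts, keep_tgt)]:
        for attr, val in feats.items():
            if attr in ALWAYS_REMOVE:
                continue  # irrelevant for translation
            if attr in REMOVE_IF_EQUAL and src_facts.get(attr) == tgt_facts.get(attr):
                continue  # equal on both sides, so omitted
            keep[attr] = val
    return keep_src, keep_tgt

src = {"PRED": "Verfassung", "CASE": "nom", "GEND": "fem", "NUM": "sg"}
tgt = {"PRED": "constitution", "NUM": "sg"}
print(filter_facts(src, tgt))
# ({'PRED': 'Verfassung'}, {'PRED': 'constitution'})
```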
![Page 86: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/86.jpg)
GMT II - Rule induction from the phrase dictionary (noun)

Ddeff Verfassung = Ddef constitution

PRED(%X1, Verfassung),
NTYPE(%X1, %Z2),
NSEM(%Z2, %Z3),
COMMON(%Z3, count),
NSYN(%Z2, common)
==>
PRED(%X1, constitution),
NTYPE(%X1, %Z4),
NSYN(%Z4, common).
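Read the rule as: match the left-hand facts on the source f-structure, delete them, and assert the right-hand facts, with %X1 shared between both sides. A toy version over facts represented as (attribute, node, value) triples (illustrative only; XLE's transfer component handles variables and rule sets far more generally):

```python
# Source f-structure for "Verfassung" as (attribute, node, value) facts.
facts = {
    ("PRED", "f1", "Verfassung"),
    ("NTYPE", "f1", "f2"),
    ("NSEM", "f2", "f3"),
    ("COMMON", "f3", "count"),
    ("NSYN", "f2", "common"),
}

def rewrite_verfassung(facts):
    """Apply the Verfassung -> constitution rule if its left side matches."""
    pred = next((f for f in facts if f[0] == "PRED" and f[2] == "Verfassung"), None)
    if pred is None:
        return facts                      # rule does not apply
    x1 = pred[1]                          # the shared variable %X1
    ntype = next(f for f in facts if f[0] == "NTYPE" and f[1] == x1)
    nsem = next(f for f in facts if f[0] == "NSEM" and f[1] == ntype[2])
    consumed = {pred, ntype, nsem,
                ("COMMON", nsem[2], "count"), ("NSYN", ntype[2], "common")}
    # Replace the matched facts by the (smaller) target-side facts.
    return (facts - consumed) | {
        ("PRED", x1, "constitution"),     # %X1 carried over
        ("NTYPE", x1, "g1"),              # fresh node for %Z4
        ("NSYN", "g1", "common"),
    }

out = rewrite_verfassung(facts)
```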
![Page 87: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/87.jpg)
GMT II - Rule induction from the phrase dictionary (adjective)

europäische = European

PRED(%X1, europäisch) ==> PRED(%X1, European).

To accommodate certain non-parallelisms with respect to SUBJs of adjectives etc., a special mechanism removes SUBJs of non-verbs and makes them addable in generation.
![Page 88: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/88.jpg)
GMT II - Rule induction from the phrase dictionary (verb)

NP1nom koordiniert NP2acc. = NP1 coordinates NP2.

PRED(%X1, koordinieren),
arg(%X1, 1, %A2),
arg(%X1, 2, %A3),
VTYPE(%X1, main)
==>
PRED(%X1, coordinate),
arg(%X1, 1, %A2),
arg(%X1, 2, %A3),
VTYPE(%X1, main).
![Page 89: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/89.jpg)
GMT II - Rule induction (argument switching)

NP1nom tut NP2dat leid. = NP2 is sorry about NP1.

PRED(%X1, leid#tun),
SUBJ(%X1, %A2),
OBJ-TH(%X1, %A3),
VTYPE(%X1, main)
==>
PRED(%X1, be),
SUBJ(%X1, %A3),
XCOMP-PRED(%X1, %Z1),
PRED(%Z1, sorry),
OBL(%Z1, %Z2),
PRED(%Z2, about),
OBJ(%Z2, %A2),
VTYPE(%X1, copular).
![Page 90: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/90.jpg)
GMT II - Rule induction (head switching)

Ich versuche nur, mich jeder Demagogie zu enthalten. = It is just that I am trying not to indulge in demagoguery.

NP1nom Vfin nur. = It is just that NP1 Vs.

+ADJUNCT(%X1,%Z2), in_set(%X3,%Z2), PRED(%X3,nur), ADV-TYPE(%X3,unspec)
==>
PRED(%Z4,be), SUBJ(%Z4,%X3), NTYPE(%X3,%Z5), NSYN(%Z5,pronoun), GEND-SEM(%Z5,nonhuman), HUMAN(%Z5,-), NUM(%Z5,sg), PERS(%Z5,3), PRON-FORM(%Z5,it), PRON-TYPE(%Z5,expl_), arg(%Z4,1,%Z6), PRED(%Z6,just), SUBJ(%Z6,%Z7), arg(%Z6,1,%A1), COMP-FORM(%A1,that), COMP(%Z6,%A1), nonarg(%Z6,1,%Z7), ATYPE(%Z6,predicative), DEGREE(%Z6,positive), nonarg(%Z4,1,%X3), TNS-ASP(%Z4,%Z8), MOOD(%Z8,indicative), TENSE(%Z8,pres), XCOMP-PRED(%Z4,%Z6), CLAUSE-TYPE(%Z4,decl), PASSIVE(%Z4,-), VTYPE(%A2,copular).
![Page 91: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/91.jpg)
GMT II - Rule induction (more on head switching)

In addition to rewriting terms, the system re-attaches the rewritten FS if necessary. Here, this might be the case for %X1.

+ADJUNCT(%X1,%Z2), in_set(%X3,%Z2), PRED(%X3,nur), ADV-TYPE(%X3,unspec)
==>
PRED(%Z4,be), SUBJ(%Z4,%X3), NTYPE(%X3,%Z5), NSYN(%Z5,pronoun), GEND-SEM(%Z5,nonhuman), HUMAN(%Z5,-), NUM(%Z5,sg), PERS(%Z5,3), PRON-FORM(%Z5,it), PRON-TYPE(%Z5,expl_), arg(%Z4,1,%Z6), PRED(%Z6,just), SUBJ(%Z6,%Z7), arg(%Z6,1,%A1), COMP-FORM(%A1,that), COMP(%Z6,%A1), nonarg(%Z6,1,%Z7), ATYPE(%Z6,predicative), DEGREE(%Z6,positive), nonarg(%Z4,1,%X3), TNS-ASP(%Z4,%Z8), MOOD(%Z8,indicative), TENSE(%Z8,pres), XCOMP-PRED(%Z4,%Z6), CLAUSE-TYPE(%Z4,decl), PASSIVE(%Z4,-), VTYPE(%A2,copular).
![Page 92: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/92.jpg)
GMT II - Pros and cons of rule induction from a phrase dictionary

Development of phrase pairs can be carried out by someone with little knowledge of the grammar and transfer system; manual development of transfer rules would require experts (for boring, repetitive labor).
Phrase pairs can remain stable while grammars keep evolving. Since transfer rules are induced fully automatically, they can easily be kept in sync with grammars.
Induced rules are of much higher quality than rules induced from parsed bitexts (GMT I).
Although there is hope that phrase pairs can be constructed semi-automatically from bilingual dictionaries, it is not yet clear to what extent this can be automated.
If rule induction from parsed bitexts can be improved, the two approaches might well be complementary.
![Page 93: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/93.jpg)
Lessons Learned for Parallel Grammar Development

Absence of a feature like PERF=+/- is not equivalent to PERF=-.
FS-internal features should not say anything about the function of the FS
– Example: PRON-TYPE=poss instead of PRON-TYPE=pers
Compounds should be analyzed similarly, whether spelt together (de) or apart (en)
– Possible with SMOR
– Very hard or even impossible with DMOR
![Page 94: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/94.jpg)
Absence of PERF ≠ PERF=-
![Page 95: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/95.jpg)
No function info in FS-internal features

I think NP1 Vs. = In my opinion NP1 Vs.
![Page 96: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/96.jpg)
Parallel analysis of compounds
![Page 97: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/97.jpg)
More Lessons Learned for Parallel Grammar Development

ParGram needs to agree on a parallel PRED value for (personal) pronouns
We need an “interlingua” for numbers, clock times, dates, etc.
Guessers should analyze (composite) names similarly
![Page 98: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/98.jpg)
Parallel PRED values for (personal) pronouns

Otherwise the number of rules we have to learn for them explodes.
de-en: pro/er → he, pro/er → it, pro/sie → she, pro/sie → it, pro/es → it, pro/es → he, pro/es → she
Also: the PRED-NUM-PERS combination may make no sense! Result: a lot of generator effort for nothing…
en-de: he → pro/er, she → pro/sie, it → pro/es, it → pro/er, it → pro/sie, …
![Page 99: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/99.jpg)
Interlingua for numbers, clock times, dates, etc.

We cannot possibly learn transfer rules for all dates.
![Page 100: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/100.jpg)
Guessed (composite) names

We cannot possibly learn transfer rules for all proper names in this world.
![Page 101: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/101.jpg)
And Yet More Lessons Learned for Grammar Development

Reflexive pronouns: PERS and NUM agreement should be ensured via inside-out function application, e.g. ((SUBJ ^) PERS) = (^ PERS).
Semantically relevant features should not be hidden in CHECK
![Page 102: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/102.jpg)
Reflexive pronouns

Introduce their own values for PERS and NUM
– Overgeneration: *Ich wasche sich.
– NUM ambiguity for (frequent) “sich”
– Less generalization possible in transfer rules for inherently reflexive verbs: 6 rules necessary instead of 1.
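The recommended inside-out treatment can be mimicked on a toy f-structure: instead of the reflexive contributing its own PERS/NUM, the values are copied from the enclosing SUBJ (resolve_reflexive and the dict encoding are invented for illustration):

```python
# Toy f-structure for a clause like "Ich wasche mich": a SUBJ and a
# reflexive OBJ that carries no PERS/NUM of its own.
clause = {
    "SUBJ": {"PRED": "pro", "PERS": 1, "NUM": "sg"},   # "ich"
    "OBJ":  {"PRED": "pro", "PRON-TYPE": "refl"},      # "mich"/"sich"
}

def resolve_reflexive(clause):
    """Copy PERS/NUM from the enclosing SUBJ onto the reflexive,
    mimicking ((SUBJ ^) PERS) = (^ PERS), rather than letting the
    pronoun introduce its own (possibly conflicting) values."""
    obj = clause["OBJ"]
    if obj.get("PRON-TYPE") == "refl":
        obj["PERS"] = clause["SUBJ"]["PERS"]
        obj["NUM"] = clause["SUBJ"]["NUM"]
    return clause

resolved = resolve_reflexive(clause)
print(resolved["OBJ"])
```

With agreement enforced this way, *Ich wasche sich is ruled out because the reflexive cannot carry third-person features under a first-person SUBJ.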
![Page 103: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/103.jpg)
Reflexive pronouns
![Page 104: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/104.jpg)
Semantically relevant features in CHECK

sie = they; Sie = you (formal)
Since CHECK features are not used for translation, the distinction between “sie” and “Sie” is lost.
![Page 105: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/105.jpg)
Planned experiments - Motivation

We do not have the resources to develop a “general purpose” phrase dictionary in the short or medium term.
Nevertheless, we want to get an idea of how well our new approach may scale.
![Page 106: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/106.jpg)
Planned Experiments 1

Manually develop a phrase dictionary for a few hundred Europarl sentences
Train the target FS ranking model and realization ranking model on those sentences
Evaluate output in terms of BLEU, NIST, and manually
Can we make this new idea work under ideal conditions? It seems we can.
![Page 107: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/107.jpg)
Planned Experiments 2

Manually develop a phrase dictionary for a few hundred Europarl sentences
Use a bilingual dictionary to add possible phrase pairs that may distract the system
Train the target FS ranking model and realization ranking model on those sentences
Evaluate output in terms of BLEU, NIST, and manually
How well can our system deal with the “distractors”?
![Page 108: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/108.jpg)
Planned Experiments 3

Manually develop a phrase dictionary for a few hundred Europarl sentences
Use a bilingual dictionary to add possible phrase pairs that may distract the system
Degrade the phrase dictionary at various levels of severity
– Take out a certain percentage of phrase pairs
– Shorter phrases may be penalized less than longer ones
Train the target FS ranking model and realization ranking model on those sentences
Evaluate output in terms of BLEU, NIST, and manually
How good or bad is the output of the system when the bilingual phrase dictionary lacks coverage?
![Page 109: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/109.jpg)
Main Remaining Challenges

Get a comprehensive and high-quality dictionary of phrase pairs
Get more and better (i.e. more normalized and parallel) analyses from grammars
Improve ranking models, in particular on the source side
Improve generation behavior of grammars - so far, grammar development has mostly been “parsing-oriented”
Efficiency, in particular on the generation side, e.g. packed transfer and generation
![Page 110: Linguistics 187/287 Week 6](https://reader030.fdocuments.net/reader030/viewer/2022033019/568144ff550346895db1cb78/html5/thumbnails/110.jpg)