![Page 1: The AVENUE Project Data Elicitation System Lori Levin Language Technologies Institute School of Computer Science Carnegie Mellon University.](https://reader035.fdocuments.net/reader035/viewer/2022062409/56649f265503460f94c3d4eb/html5/thumbnails/1.jpg)
The AVENUE Project Data Elicitation System
Lori Levin
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Joint work with
• Dr. Jeff Good
• Dr. Robert Frederking
• Alison Alvarez
Outline
• The AVENUE MT project
  – Including a list of languages we have worked on
• The elicitation tool
  – Including which kinds of fonts it works for
• The elicitation corpus
  – Including which languages it has been translated into
• Tools for building and revising elicitation corpora
MT Approaches
[Diagram: MT pyramid for the example "Mi chiamo Lori" → "My name is Lori"]
• Source: Mi chiamo Lori
• Syntactic Parsing: Pronoun-acc-1-sg chiamare-1sg N
• Semantic Analysis
• Interlingua: introduce-self
• Sentence Planning
• Text Generation: [np poss-1sg “name”] BE-pres N
• Target: My name is Lori
• Transfer Rules
• Direct: SMT, EBMT
• AVENUE: Automate Rule Learning
AVENUE Machine Translation System
Rule components:
• Type information
• Synchronous Context Free Rules
• Alignments
• x-side constraints
• y-side constraints
• xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))
Example rule:
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)
Jaime Carbonell (PI), Alon Lavie (Co-PI), Lori Levin (Co-PI)
Rule learning: Katharina Probst
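To make the role of the alignments in the rule above concrete, here is a minimal Python sketch. It is not the AVENUE transfer engine: the `TransferRule` class, its `reorder` method, and the toy lexicon are hypothetical names introduced for illustration, and the agreement and definiteness constraints are ignored. It only shows how the pairs (X1::Y1)(X1::Y3)(X2::Y4)(X3::Y2) map the three English constituents onto four target positions.

```python
from dataclasses import dataclass


@dataclass
class TransferRule:
    """Hypothetical container for one synchronous rule (constraints omitted)."""
    source_rhs: list[str]               # x-side constituent sequence
    target_rhs: list[str]               # y-side constituent sequence
    alignments: list[tuple[int, int]]   # 1-based (x index, y index) pairs

    def reorder(self, source_words, lexicon):
        """Place a translation of each aligned source word at its y-side slot."""
        if len(source_words) != len(self.source_rhs):
            raise ValueError("source does not match the x-side of the rule")
        target = [None] * len(self.target_rhs)
        for x, y in self.alignments:
            target[y - 1] = lexicon[source_words[x - 1]]
        return target


# The NP rule from the slide: [DET ADJ N] -> [DET N DET ADJ]
np_rule = TransferRule(
    source_rhs=["DET", "ADJ", "N"],
    target_rhs=["DET", "N", "DET", "ADJ"],
    alignments=[(1, 1), (1, 3), (2, 4), (3, 2)],
)

# Toy lexicon (an assumption), read off the slide's "TL: ha-ish ha-zaqen".
toy_lexicon = {"the": "ha-", "old": "zaqen", "man": "ish"}

target = np_rule.reorder(["the", "old", "man"], toy_lexicon)
print(target)
```

Because X1 is aligned to both Y1 and Y3, the determiner is copied onto the noun and the adjective, which is the definiteness agreement that the y-side constraints state.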
AVENUE
• Rules can be written by hand or learned automatically.
• Hybrid
  – Rule-based transfer
  – Statistical decoder
  – Multi-engine combinations with SMT and EBMT
AVENUE systems (Small and experimental, but tested on unseen data)
• Hebrew-to-English
  – Alon Lavie, Shuly Wintner, Katharina Probst
  – Hand-written and automatically learned
  – Automatic rules trained on 120 sentences perform slightly better than about 20 hand-written rules.
• Hindi-to-English
  – Lavie, Peterson, Probst, Levin, Font, Cohen, Monson
  – Automatically learned
  – Performs better than SMT when training data is limited to 50K words
AVENUE systems (Small and experimental, but tested on unseen data)
• English-to-Spanish
  – Ariadna Font Llitjos
  – Hand-written, automatically corrected
• Mapudungun-to-Spanish
  – Roberto Aranovich and Christian Monson
  – Hand-written
• Dutch-to-English
  – Simon Zwarts
  – Hand-written
Outline
• The AVENUE MT project
• The elicitation tool
• The questionnaire
• Tools for building questionnaires
Elicitation
• Get data from someone who is
  – Bilingual
  – Literate
    • With consistent spelling
  – Not experienced with linguistics
English-Hindi Example
Elicitation Tool: Erik Peterson
English-Chinese Example
Note: Translator has to insert spaces between words in Chinese.
English-Arabic Example
Outline
• The AVENUE MT project
• The elicitation tool
• The elicitation corpus
• Tools for building elicitation corpora
Size of Questionnaire
• Around 3200 sentences
• 20K words
EC Sample: clause level
Example sentences:
• Mary is writing a book for John.
• Who let him eat the sandwich?
• Who had the machine crush the car?
• They did not make the policeman run.
• Mary had not blinked.
• The policewoman was willing to chase the boy.
• Our brothers did not destroy files.
• He said that there is not a manual.
• The teacher who wrote a textbook left.
• The policeman chased the man who was a thief.
• Mary began to work.
Phenomena covered:
• Tense, aspect, transitivity, animacy
• Questions, causation and permission
• Interaction of lexical and grammatical aspect
• Volitionality
• Embedded clauses and sequence of tense
• Relative clauses
• Phase aspect
EC Sample: noun phrase level
Example sentences:
• The man quit in November.
• The man works in the afternoon.
• The balloon floated over the library.
• The man walked over the platform.
• The man came out from among the group of boys.
• The long weekly meeting ended.
• The large bus to the post office broke down.
• The second man laughed.
• All five boys laughed.
Phenomena covered:
• Temporal and locative meanings
• Quantifiers
• Numbers
• Combinations of different types of modifiers
  – My book
    • Possession, definiteness
  – A book of mine
    • Possession, indefiniteness
Organization into Minimal Pairs
srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell

srcsent: Tú estás cayendo.
tgtsent: Eymi petu ütrünagimi.
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling

srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
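The records above can be read as key/value entries, and a minimal pair is two records that differ in exactly one controlled respect. The Python sketch below illustrates that idea only; `parse_record` and `differing_fields` are hypothetical helper names, not part of the AVENUE tools, and the sketch assumes the two "fell" translations are spelled identically.

```python
def parse_record(text):
    """Split the 'key: value' lines of one elicitation record into a dict."""
    record = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record


def differing_fields(a, b, fields=("srcsent", "tgtsent", "context")):
    """Return the fields on which two records disagree."""
    return [f for f in fields if a[f] != b[f]]


john_fell = parse_record("""
srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell
""")

john_falling = parse_record("""
srcsent: Tú estás cayendo.
tgtsent: Eymi petu ütrünagimi.
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling
""")

mary_fell = parse_record("""
srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
""")

gender_pair = differing_fields(john_fell, mary_fell)
aspect_pair = differing_fields(john_fell, john_falling)
print(gender_pair, aspect_pair)
```

In the gender pair only the context changes while the Mapudungun sentence stays the same, which suggests that addressee gender is not marked here. In the aspect pair both source and target change.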
Feature Detection: Spanish
The girl saw a red book.
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
La niña vió un libro rojo

A girl saw a red book
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
Una niña vió un libro rojo

I saw the red book
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi el libro rojo

I saw a red book.
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi un libro rojo

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: no
Marked-on-dependent: yes
Marked-on-governor: no
Marked-on-other: no
Add/delete-word: no
Change-in-alignment: no
Feature Detection: Chinese
A girl saw a red book.
((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8))
有 一个 女人 看见 了 一本 红色 的 书 。
The girl saw a red book.
((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7))
女人 看见 了 一本 红色的 书
Feature: definiteness
Values: definite, indefinite
Function-of-*: subject
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: no
Feature Detection: Chinese
I saw the red book
((1, 3)(2, 4)(2, 5)(4, 1)(5, 2))
红色的 书, 我 看见 了

I saw a red book.
((1,1)(2,2)(2,3)(2, 4)(4,5)(5,6))
我 看见 了 一本 红色的 书 。

Feature: definiteness
Values: definite, indefinite
Function-of-*: object
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: yes
Feature Detection: Hebrew
A girl saw a red book.
((2,1)(3,2)(5,4)(6,3))
ילדה ראתה ספר אדום

The girl saw a red book
((1,1)(2,1)(3,2)(5,4)(6,3))
הילדה ראתה ספר אדום

I saw a red book.
((2,1)(4,3)(5,2))
ראיתי ספר אדום

I saw the red book.
((2,1)(3,3)(3,4)(4,4)(5,3))
ראיתי את הספר האדום

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: yes
Marked-on-dependent: yes
Marked-on-governor: no
Add-word: no
Change-in-alignment: no
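Two of the properties reported on the feature detection slides, Add/delete-word and Change-in-alignment, can be approximated by comparing the two members of a minimal pair. The Python sketch below is one possible heuristic, not the project's detector: all function names are hypothetical, a word is counted as added or deleted when the token counts differ, and an alignment change is taken to mean that the shared source words appear in a different relative order on the target side.

```python
import re


def parse_alignment(text):
    """Turn '((1,1)(2,2)(4,5))' into a set of (source, target) index pairs."""
    return {(int(s), int(t)) for s, t in re.findall(r"\((\d+)\s*,\s*(\d+)\)", text)}


def tokens(sentence):
    """Whitespace tokens after dropping a few punctuation marks (naive)."""
    return re.sub(r"[.,。，]", " ", sentence).split()


def target_order(alignment, sources):
    """Order source words by their leftmost aligned target position."""
    first = {s: min(t for s2, t in alignment if s2 == s) for s in sources}
    return sorted(sources, key=lambda s: (first[s], s))


def compare_pair(tgt_a, align_a, tgt_b, align_b):
    """Compare the two target sentences of a minimal pair."""
    a, b = parse_alignment(align_a), parse_alignment(align_b)
    shared = {s for s, _ in a} & {s for s, _ in b}
    return {
        "add/delete-word": len(tokens(tgt_a)) != len(tokens(tgt_b)),
        "change-in-alignment": target_order(a, shared) != target_order(b, shared),
    }


# Spanish object pair: "I saw the red book" vs "I saw a red book"
spanish = compare_pair(
    "Yo vi el libro rojo", "((1,1)(2,2)(3,3)(4,5)(5,4))",
    "Yo vi un libro rojo", "((1,1)(2,2)(3,3)(4,5)(5,4))",
)

# Chinese object pair from the slides (definite object is fronted)
chinese = compare_pair(
    "红色的 书, 我 看见 了", "((1, 3)(2, 4)(2, 5)(4, 1)(5, 2))",
    "我 看见 了 一本 红色的 书 。", "((1,1)(2,2)(2,3)(2, 4)(4,5)(5,6))",
)

print(spanish)
print(chinese)
```

Comparing relative order instead of raw index pairs keeps an inserted word from being reported as a reordering merely because it shifts the later indices. Locating where a feature is marked (head, dependent, governor) would need more information than this sketch uses.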
Feature Detection Feeds into…
• Corpus Navigation: which minimal pairs to pursue next.
  – Don’t pursue gender in Mapudungun
  – Do pursue definiteness in Hebrew
• Morphology Learning:
  – Morphological learner identifies the forms of the morphemes
  – Feature detection identifies the functions
• Rule learning:
  – Rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered
    • E.g., Adjectives and nouns agree in gender, number, and definiteness in Hebrew.
Languages
• The set of feature structures with English sentences has been delivered to the Linguistic Data Consortium as part of the Reflex program.
• Translated (by LDC) into:
  – Thai
  – Bengali
• Plans to translate into:
  – Seven “strategic” languages per year for five years.
    • As one small part of a language pack (BLARK) for each language.
Languages
• Spanish version in progress at New Mexico State University (Helmreich and Cowie)
  – Plans to translate into Guarani
• Portuguese version in progress in Brazil (Marcello Modesto)
  – Plans to translate into Karitiana
    • 200 speakers
• Plans to translate into Inupiaq (Kaplan and MacLean)
Previous Elicitation Work
• Pilot corpus
  – Around 900 sentences
  – No feature structures
• Mapudungun
  – Two partial translations
• Quechua
  – Three translations
• Aymara
  – Seven translations
• Hebrew
• Hindi
  – Several translations
• Dutch
Feature Structures
• The EC is actually a corpus of feature structures that happen to have English or Spanish sentences attached to them.
Bengali example with feature structure
srcsent: The large bus to the post office broke down.
context:
tgtsent:
((actor ((modifier ((mod-role mod-descriptor)(mod-role role-loc-general-to))) (np-identifiability identifiable)(np-specificity specific)(np-biological-gender bio-gender-n/a)(np-animacy anim-inanimate)(np-person person-third)(np-function fn-actor)(np-general-type common-noun-type)(np-number num-sg)(np-pronoun-exclusivity inclusivity-n/a)(np-pronoun-antecedent antecedent-n/a)(np-distance distance-neutral)))
(c-general-type declarative-clause)(c-my-causer-intentionality intentionality-n/a)(c-comparison-type comparison-n/a)(c-relative-tense relative-n/a)(c-our-boundary boundary-n/a)(c-comparator-function comparator-n/a)(c-causee-control control-n/a)(c-our-situations situations-n/a)(c-comparand-type comparand-n/a)(c-causation-directness directness-n/a)(c-source source-neutral)(c-causee-volitionality volition-n/a)(c-assertiveness assertiveness-neutral)(c-solidarity solidarity-neutral)(c-polarity polarity-positive)(c-v-grammatical-aspect gram-aspect-neutral)(c-adjunct-clause-type adjunct-clause-type-n/a)(c-v-phase-aspect phase-aspect-neutral)(c-v-lexical-aspect activity-accomplishment)(c-secondary-type secondary-neutral)(c-event-modality event-modality-none)(c-function fn-main-clause)(c-minor-type minor-n/a)(c-copula-type copula-n/a)(c-v-absolute-tense past)(c-power-relationship power-peer)(c-our-shared-subject shared-subject-n/a)(c-question-gap gap-n/a))
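The feature structure above is written as nested parenthesized (feature value) pairs, so a small s-expression reader is enough to load it. The Python sketch below is an illustration under that assumption; `parse_sexpr` and `clause_features` are hypothetical names, and the input is a shortened excerpt of the structure shown on the slide.

```python
import re


def parse_sexpr(text):
    """Parse a parenthesized feature structure into nested Python lists."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced parentheses")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced parentheses")
    return stack[0][0]


def clause_features(fs):
    """Collect the top-level (feature value) pairs whose value is an atom."""
    return {item[0]: item[1]
            for item in fs
            if len(item) == 2 and isinstance(item[1], str)}


# Shortened excerpt of the "large bus" feature structure.
excerpt = """
((actor ((np-animacy anim-inanimate)(np-number num-sg)(np-person person-third)))
 (c-general-type declarative-clause)
 (c-polarity polarity-positive)
 (c-v-absolute-tense past))
"""

fs = parse_sexpr(excerpt)
clause = clause_features(fs)
actor = dict(fs[0][1])
print(clause)
print(actor)
```

Noun-phrase roles such as `actor` carry a nested list of features, while clause-level features sit at the top level with atomic values; the helper relies on that distinction.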
Why feature structures?
• Decide what grammatical meaning to elicit.
• Represent it in a feature structure.
• Formulate an English or Spanish sentence that expresses that meaning.
  – We can use the same corpus of feature structures for several elicitation languages
• Have the informant translate it.
Grammatical meanings vs syntactic categories
• Features and values are based on a collection of grammatical meanings
  – Many of which are similar to the grammatemes of the Prague Treebanks
Grammatical Meanings
YES
• Semantic Roles
• Identifiability
• Specificity
• Time
  – Before, after, or during time of speech
• Modality
NO
• Case
• Voice
• Determiners
• Auxiliary verbs
Grammatical Meanings
YES
• How is identifiability expressed?
  – Determiner
  – Word order
  – Optional case marker
  – Optional verb agreement
• How is specificity expressed?
• How are generics expressed?
• How are predicate nominals marked?
NO
• How are English determiners translated?
  – The boy cried.
  – The lion is a fierce beast.
  – I ate a sandwich.
  – He is a soldier.
    • Il est soldat.
Argument Roles
• Actor
• Undergoer
• Predicate and predicatee
  – The woman is the manager.
• Recipient
  – I gave a book to the students.
• Beneficiary
  – I made a phone call for Sam.
Why not subject and object?
• Languages use their voice systems for different purposes.
• Mapudungun obligatorily uses an inverse marked verb when third person acts on first or second person.
  – Verb agrees with undergoer
  – Undergoer exhibits other subjecthood properties
  – Actor may be object.
• Yes: How are actor and undergoer encoded in combination with other semantic features like adversity (Japanese) and person (Mapudungun)?
• No: How is English voice translated into another language?
Argument Roles
• Accompaniment
  – With someone
  – With pleasure
• Material
  – (out) of wood
• About 20 more roles
  – From the Lingua checklist; Comrie & Smith (1977)
  – Many also found in tectogrammatical representations in the Prague Treebanks
• Around 80 locative relations
  – From Lingua checklist
• Many temporal relations
Noun Phrase Features
• Person
• Number
• Biological gender
• Animacy
• Distance (for deictics)
• Identifiability
• Specificity
• Possession
• Other semantic roles
  – Accompaniment, material, location, time, etc.
• Type
  – Proper, common, pronoun
• Cardinals
• Ordinals
• Quantifiers
• Given and new information
  – Not used yet because of limited context in the elicitation tool.
Clause level features
• Tense
• Aspect
  – Lexical, grammatical, phase
• Type
  – Declarative, open-q, yes-no-q
• Function
  – Main, argument, adjunct, relative
• Source
  – Hearsay, first-hand, sensory, assumed
• Assertedness
  – Asserted, presupposed, wanted
• Modality
  – Permission, obligation
  – Internal, external
Other clause types (Constructions)
• Causative
  – Make/let/have someone do something
• Predication
  – May be expressed with or without an overt copula.
• Existential
  – There is a problem.
• Impersonal
  – One doesn’t smoke in restaurants in the US.
• Lament
  – If only I had read the paper.
• Conditional
• Comparative
• Etc.
Outline
• The AVENUE MT project
• The elicitation tool
• The elicitation corpus
• Tools for elicitation corpora
Mar 1, 2006
Tools for Creating Elicitation Corpora
[Pipeline diagram, shown on four consecutive slides]
Diagram labels:
• List of semantic features and values
• Feature Specification
• Feature Maps: which combinations of features and values are of interest (Clause-Level, Noun-Phrase, Tense & Aspect, Modality, …)
• Feature Structure Sets
• Reverse Annotated Feature Structure Sets: add English sentences
• The Corpus
• Sampling
• Smaller Corpus
Tools highlighted on successive slides:
• XML Schema, XSLT Script
• Combination Formalism
• Feature Structure Viewer
Feature Specification
• Defines Features and their values
• Sets default values for features
• Specifies feature requirements and restrictions
• Written in XML
Feature Specification
Feature: c-copula-type (a copula is a verb like “be”; some languages do not have copulas)
Values:
copula-n/a
  Restrictions: 1. ~(c-secondary-type secondary-copula)
  Notes:
copula-role
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A role is something like a job or a function. "He is a teacher" "This is a vegetable peeler"
copula-identity
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "Clark Kent is Superman" "Sam is the teacher"
copula-location
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "The book is on the table" There is a long list of locative relations later in the feature specification.
copula-description
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A description is an attribute. "The children are happy." "The books are long."
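The restrictions listed for c-copula-type can be read as simple co-occurrence checks. The Python sketch below encodes them in a dictionary and validates a set of clause features. The actual specification is written in XML; this dictionary encoding, the names `COPULA_RESTRICTIONS` and `copula_value_allowed`, and the reading of `~` as negation are all assumptions made for illustration.

```python
# Each value of c-copula-type either requires or forbids one other feature value.
SECONDARY_COPULA = ("c-secondary-type", "secondary-copula")

COPULA_RESTRICTIONS = {
    "copula-n/a": ("forbid", SECONDARY_COPULA),        # ~(c-secondary-type secondary-copula)
    "copula-role": ("require", SECONDARY_COPULA),
    "copula-identity": ("require", SECONDARY_COPULA),
    "copula-location": ("require", SECONDARY_COPULA),
    "copula-description": ("require", SECONDARY_COPULA),
}


def copula_value_allowed(features):
    """Check the c-copula-type value of a clause against its restriction."""
    kind, (feature, value) = COPULA_RESTRICTIONS[features["c-copula-type"]]
    present = features.get(feature) == value
    return present if kind == "require" else not present


ok_role = copula_value_allowed(
    {"c-copula-type": "copula-role", "c-secondary-type": "secondary-copula"})
bad_na = copula_value_allowed(
    {"c-copula-type": "copula-n/a", "c-secondary-type": "secondary-copula"})
bad_identity = copula_value_allowed(
    {"c-copula-type": "copula-identity", "c-secondary-type": "secondary-neutral"})
print(ok_role, bad_na, bad_identity)
```

A check of this kind can discard ill-formed combinations before any sentence is written for them.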
Feature Maps
• Some features interact in the grammar
  – English –s reflects person and number of the subject and tense of the verb.
  – In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement:
    • He is running.
    • Is he running?
• We need to check many, but not all combinations of features and values.
• Using unlimited feature combinations leads to an unmanageable number of sentences
Feature Combination Template
((predicatee((np-general-type pronoun-type common-noun-type)
(np-person person-first person-second person-third)
(np-number num-sg num-pl)
(np-biological-gender bio-gender-male bio-gender-female)))
{[(predicate ((np-general-type common-noun-type)(np-person person-third)))(c-copula-type role)]
[(predicate ((adj-general-type quality-type)(c-copula-type attributive)))]
[(predicate ((np-general-type common-noun-type)(np-person person-third)(c-copula-type identity)))]}
(c-secondary-type secondary-copula)
(c-polarity #all)
(c-general-type declarative)
(c-speech-act sp-act-state)
(c-v-grammatical-aspect gram-aspect-neutral)
(c-v-lexical-aspect state)
(c-v-absolute-tense past present future)
(c-v-phase-aspect durative))
Summarizes 288 feature structures, which are automatically generated.
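The expansion of a template into feature structures is essentially a Cartesian product over the listed values, followed by filtering. The Python sketch below illustrates this with `itertools.product`. It is not the project's combination formalism, and it rests on two assumptions that the slide does not state: that `#all` for c-polarity expands to two values (positive and negative), and that a restriction rules out common nouns in first or second person. Under those assumptions the count comes to 288, matching the slide; without the restriction it would be 432.

```python
from itertools import product

# Multi-valued features of the predicatee, as listed in the template.
predicatee = {
    "np-general-type": ["pronoun-type", "common-noun-type"],
    "np-person": ["person-first", "person-second", "person-third"],
    "np-number": ["num-sg", "num-pl"],
    "np-biological-gender": ["bio-gender-male", "bio-gender-female"],
}

# One entry per [...] alternative in the {...} block (its c-copula-type).
predicate_alternatives = ["role", "attributive", "identity"]

# Multi-valued clause features; single-valued ones do not affect the count.
clause = {
    "c-polarity": ["polarity-positive", "polarity-negative"],  # assumed meaning of #all
    "c-v-absolute-tense": ["past", "present", "future"],
}


def expand(options):
    """All combinations of one value per feature."""
    keys = list(options)
    return [dict(zip(keys, combo)) for combo in product(*(options[k] for k in keys))]


def allowed(np):
    """Assumed restriction: common nouns are third person only."""
    return np["np-general-type"] == "pronoun-type" or np["np-person"] == "person-third"


structures = [
    {"predicatee": np, "c-copula-type": alt, **cl}
    for np in expand(predicatee) if allowed(np)
    for alt in predicate_alternatives
    for cl in expand(clause)
]
unrestricted = len(expand(predicatee)) * len(predicate_alternatives) * len(expand(clause))
print(len(structures), unrestricted)
```

The arithmetic behind the count is 16 predicatee variants × 3 predicate alternatives × 2 polarities × 3 tenses.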
Adding Sentences to Feature Structures
srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker;

((actor ((np-function fn-actor)(np-animacy anim-human)(np-biological-gender bio-gender-female)(np-general-type proper-noun-type)(np-identifiability identifiable)(np-specificity specific)…))
(pred ((np-function fn-predicate-nominal)(np-animacy anim-human)(np-biological-gender bio-gender-female)(np-general-type common-noun-type)(np-specificity specificity-neutral)…))
(c-v-lexical-aspect state)(c-copula-type copula-role)(c-secondary-type secondary-copula)(c-solidarity solidarity-neutral)(c-v-grammatical-aspect gram-aspect-neutral)(c-v-absolute-tense past)(c-v-phase-aspect phase-aspect-neutral)(c-general-type declarative-clause)(c-polarity polarity-negative)(c-my-causer-intentionality intentionality-n/a)(c-comparison-type comparison-n/a)(c-relative-tense relative-n/a)(c-our-boundary boundary-n/a)…)
Difficult Issues in Adding Sentences
• Have to remember that the grammatical meanings don’t correspond exactly to English morphemes.
  – Identifiability and specificity vs the and a
  – Modality, tense, aspect vs auxiliary verbs
• The meaning has to be clear to a translator.
  – If English is going to be the source language for translation, the clearest way to say something may not be the most common way it is said in real text or conversation.
Hard Problems
• Expressing meanings that are not grammaticalized in English.
  – Evidentiality:
    • He stole the bread.
    • Context: Translate this as if you do not have first hand knowledge. In English, we might say, “They say that he stole the bread” or “I hear that he stole the bread.”
Hard Problems
• Reverse annotating things that can be said in several ways in English.
  – Impersonals:
    • One doesn’t smoke here.
    • You don’t smoke here.
    • They don’t smoke here.
    • There’s no smoking here.
    • Credit cards aren’t accepted.
  – Problem in the Reflex corpus because space was limited.
Evaluation
• Current funding has not covered evaluation of the questionnaire.
  – Except for informal observations as it was translated into several languages.
• Does it elicit the meanings it was intended to elicit?
  – Informal observation: usually
• Is it useful for machine translation?
Navigation
• Currently, feature combinations are specified by a human.
• Plan to work in active learning mode.
  – Build seed questionnaire
  – Translate some data
  – Do some learning
  – Identify most valuable pieces of information to get next
  – Generate an RTB for those pieces of information
  – Translate more
  – Learn more
  – Generate more, etc.
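The planned active learning cycle can be sketched as a loop over translate, learn, rank, and generate steps. The Python skeleton below only fixes the control flow. Every name in it is hypothetical, the four steps are passed in as functions because the slides do not say how they would be implemented, and the toy stand-ins at the bottom exist only to make the sketch runnable.

```python
def elicitation_loop(seed, translate, learn, rank_gaps, generate, rounds=3):
    """Alternate between learning and eliciting the most valuable new items."""
    corpus = list(translate(seed))      # translate the seed questionnaire
    model = learn(corpus)               # do some learning
    for _ in range(rounds):
        gaps = rank_gaps(model)         # most valuable information to get next
        if not gaps:
            break
        new_items = generate(gaps)      # generate items for those gaps
        corpus.extend(translate(new_items))
        model = learn(corpus)
    return corpus, model


# Toy stand-ins (assumptions): an "item" is just the name of a feature to probe.
ALL_FEATURES = {"tense", "gender", "definiteness"}


def toy_translate(items):
    return list(items)


def toy_learn(corpus):
    return set(corpus)


def toy_rank_gaps(model):
    return sorted(ALL_FEATURES - model)[:1]


def toy_generate(gaps):
    return list(gaps)


corpus, model = elicitation_loop(
    ["tense"], toy_translate, toy_learn, toy_rank_gaps, toy_generate)
print(corpus)
```

In a real setting the ranking step would draw on feature detection results, for example to stop pursuing a feature that a language does not mark.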
Summary
• Feature Specification:
  – lists features and values
  – Grammatical meanings
• Feature Combinations
• Set of Feature Structures
• Add English or Spanish Sentences
• Get a translation and word alignment from a bilingual, literate informant