PDT: Tectogrammatical Representation

March 5, 2008. Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics. PDT: Tectogrammatical Representation. Jan Hajič, Institute of Formal and Applied Linguistics, School of Computer Science, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic.


Page 1: PDT: Tectogrammatical Representation

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics


PDT: Tectogrammatical Representation

Jan Hajič

Institute of Formal and Applied Linguistics

School of Computer Science

Faculty of Mathematics and Physics

Charles University, Prague

Czech Republic

Page 2: PDT: Tectogrammatical Representation


Tectogrammatical Annotation (t-layer)

Underlying (deep) syntax

4 sublayers (integrated):
  dependency structure, (detailed) functors
    valency annotation
  topic/focus and deep word order
  coreference (mostly grammatical only)
  all the rest (“grammatemes”):
    detailed functors, underlying gender, number, ...

Total 39 attributes (vs. 5 at m-layer, 2 at a-layer)

Page 3: PDT: Tectogrammatical Representation


Analytical vs. Tectogrammatical representation

[Figure: analytical tree vs. tectogrammatical tree (TR: sublayer 1 only shown). Callouts: underlying verb + tense; deep function; elided Actor in; prepositions out; another ellipsis...]

Page 4: PDT: Tectogrammatical Representation


Layer 3: Tectogrammatical

Underlying (deep) syntax

4 sublayers:
  dependency structure, (detailed) functors
  topic/focus and deep word order
  coreference (mostly grammatical only)
  all the rest (grammatemes):
    detailed functors
    underlying gender, number, ...

Page 5: PDT: Tectogrammatical Representation


Tectogrammatical Functors

“Actants”: ACT, PAT, EFF, ADDR, ORIG
  modify: verbs, nouns, adjectives
  cannot repeat in a clause, usually obligatory

Free modifications (~ 50), semantically defined
  can repeat; optional, sometimes obligatory
  Ex.: LOC, DIR1, ...; TWHEN, TTILL, ...; RSTR; BEN, ATT, ACMP, INTT, MANN; MAT, APP; ID, DPHR, ...

Special
  Coordination, Rhematizers, Foreign phrases, ...
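The repeatability rule for actants can be checked mechanically. The following is a minimal sketch, not part of the PDT tooling: the function name `repeated_actants` is hypothetical, the input is simply the list of functors found on the dependents of one governor, and coordination (where several members may legitimately share a functor) is ignored.

```python
from collections import Counter

# The five actant functors named on the slide.
ACTANTS = {"ACT", "PAT", "EFF", "ADDR", "ORIG"}

def repeated_actants(child_functors):
    """Return the actant functors that occur more than once under one governor."""
    counts = Counter(f for f in child_functors if f in ACTANTS)
    return sorted(f for f, n in counts.items() if n > 1)

# Free modifications such as LOC may repeat; actants may not.
ok = repeated_actants(["ACT", "PAT", "LOC", "LOC", "TWHEN"])
bad = repeated_actants(["ACT", "ACT", "PAT"])
print(ok, bad)
```

An empty result means the clause respects the rule; any functor returned is a candidate annotation error.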

syntactic semantic

Page 6: PDT: Tectogrammatical Representation


Tectogrammatical Example

Analytical verb form: he would be allowed to be enrolled

Additional attributes (grammatemes): conditional + “allow”

Collapsed

Page 7: PDT: Tectogrammatical Representation


Tectogrammatical Example

Predicate with copula (state): you were fired

Page 8: PDT: Tectogrammatical Representation


Tectogrammatical Example

Passive construction (action): (The) book has been translated [by Mr. X]

Disappeared Added

Page 9: PDT: Tectogrammatical Representation


Tectogrammatical Example

Object: he gave Mary a book

Obj goes into ACT, PAT, ADDR, EFF or ORIG based on governor’s valency frame

Page 10: PDT: Tectogrammatical Representation


Tectogrammatical Example

Relative clause (embedded): the woman, who had a French accent, was very pretty

Page 11: PDT: Tectogrammatical Representation


Tectogrammatical Example

Incomplete phrases: Peter works well, but Paul badly

Added

Page 12: PDT: Tectogrammatical Representation


Layer 3: Tectogrammatical

Underlying (deep) syntax

4 sublayers:
  dependency structure, (detailed) functors
  topic/focus and deep word order
  coreference (mostly grammatical only)
  all the rest (grammatemes):
    detailed functors
    underlying gender, number, ...

Page 13: PDT: Tectogrammatical Representation


Deep Word Order, Topic/Focus

Example:

Baker bakes rolls. vs. Baker(IC) bakes rolls. (IC marks the intonation center.)

Analytical dep. tree:

Page 14: PDT: Tectogrammatical Representation


Deep Word Order, Topic/Focus

Deep word order:
  from “old” information to the “new” one (left-to-right) at every level (head included)
  projectivity by definition (almost...)
    i.e., partial level-based order -> total d.w.o.

Topic/focus/contrastive topic
  attribute of every node (t, f, c)
  restricted by d.w.o. and other constraints
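The step from a partial, level-based order to a total deep word order can be illustrated with a short sketch. It assumes each node stores its dependents in two ordered lists, those preceding the head and those following it; the class `TNode` and the function `deep_word_order` are hypothetical names, and the ordering given to the example tree is only an illustration, not a real annotation. Because each subtree is emitted as one contiguous block, the result is projective by construction.

```python
from dataclasses import dataclass, field

@dataclass
class TNode:
    label: str
    left: list = field(default_factory=list)   # dependents ordered before the head
    right: list = field(default_factory=list)  # dependents ordered after the head

def deep_word_order(node):
    """Flatten the per-level orders into one total order (subtrees stay contiguous)."""
    order = []
    for child in node.left:
        order.extend(deep_word_order(child))
    order.append(node.label)
    for child in node.right:
        order.extend(deep_word_order(child))
    return order

# Illustrative ordering for "Baker bakes rolls."
tree = TNode("bake", left=[TNode("Baker")], right=[TNode("rolls")])
total = deep_word_order(tree)
# Position in the total order; labels are assumed unique in this toy example.
deepord = {label: i for i, label in enumerate(total)}
print(total, deepord)
```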

Page 15: PDT: Tectogrammatical Representation


Layer 3: Tectogrammatical

Underlying (deep) syntax

4 sublayers:
  dependency structure, (detailed) functors
  topic/focus and deep word order
  coreference (mostly grammatical only)
  all the rest (grammatemes):
    detailed functors
    underlying gender, number, ...

Page 16: PDT: Tectogrammatical Representation


Coreference (intro only: see Silvie’s part)

Grammatical (easy)
  relative clauses: which, who
    Peter and Paul, who ...
  control: infinitival constructions
    John promised to go ...
  reflexive pronouns: {him,her,them}self(-ves)
    Mary saw herself in ...

[Tree: promise (PRED) governs John (ACT) and go (PAT); go governs he (ACT) and home (DIR3)]
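A control construction such as "John promised to go" can be sketched as a small table of nodes with a grammatical coreference link. Everything below is illustrative: the node ids, the dictionary layout and the helper names are hypothetical, and the link from the restored actor "he" to "John" is an assumption that encodes what the control relation expresses.

```python
# id -> node; "coref_gram" holds the id of the antecedent, if any.
nodes = {
    "n1": {"lemma": "promise", "functor": "PRED", "parent": None, "coref_gram": None},
    "n2": {"lemma": "John", "functor": "ACT", "parent": "n1", "coref_gram": None},
    "n3": {"lemma": "go", "functor": "PAT", "parent": "n1", "coref_gram": None},
    "n4": {"lemma": "he", "functor": "ACT", "parent": "n3", "coref_gram": "n2"},
    "n5": {"lemma": "home", "functor": "DIR3", "parent": "n3", "coref_gram": None},
}

def antecedent(node_id, nodes):
    """Follow coreference links until a node without a link is reached."""
    seen = set()
    while nodes[node_id]["coref_gram"] is not None and node_id not in seen:
        seen.add(node_id)  # guards against accidental cycles
        node_id = nodes[node_id]["coref_gram"]
    return node_id

def actor_of(verb_id, nodes):
    """Lemma of the (resolved) ACT dependent of a verb, or None if there is none."""
    for nid, node in nodes.items():
        if node["parent"] == verb_id and node["functor"] == "ACT":
            return nodes[antecedent(nid, nodes)]["lemma"]
    return None

print(actor_of("n3", nodes))
```

Resolving the link is what lets a consumer of the tree see that the one who goes is the one who promised.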

Page 17: PDT: Tectogrammatical Representation


Coreference

Textual

Ex.: Peter moved to Iowa after he finished his PhD.

[Tree: move (PRED) governs Peter (ACT), Iowa (DIR1) and finish (TWHEN); finish governs he (ACT) and PhD (PAT); PhD governs he (APP)]

Page 18: PDT: Tectogrammatical Representation


Layer 3: Tectogrammatical

Underlying (deep) syntax

4 sublayers:
  dependency structure, (detailed) functors
  topic/focus and deep word order
  coreference (mostly grammatical only)
  all the rest (grammatemes):
    detailed functors
    underlying gender, number, ...

Page 19: PDT: Tectogrammatical Representation


“Grammatemes”

Detailed functors (subfunctors)
  only for some functors:
    TWHEN: before/after
    LOC: next-to, behind, in-front-of, ...
    also: ACMP, BEN, CPR, DIR1, DIR2, DIR3, EXT

Lexical (underlying)
  number (SG/PL), tense, modality, degree of comparison, ...
  strictly only where necessary (agreement!)

Page 20: PDT: Tectogrammatical Representation


Tectogrammatical attributes I

node typing
  complex, coap, qcomplex, root, atom, ...

functor, subfunctor
  TWHEN: TWHEN.basic, TWHEN.before

is_member, is_generated, is_parenthesis, is_dsp_root, is_state, quot_type, ...

grammatemes (16):
  aspect, degcmp, deontmod, sempos, tense, indeftype, politeness, person, ...

Page 21: PDT: Tectogrammatical Representation


Tectogrammatical attributes II

topic/focus: tfa, deepord

valency: t_lemma, val_frame.rf

bookkeeping: id

coref_gram.rf, coref_text.rf, compl.rf
  reference to TR node, type of coreference

sentmod

Linking to analytical layer
  a.lex.rf (“main” anal. node), a.aux.rf (others)
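To make the attribute inventory more concrete, here is a minimal sketch of a node record holding a handful of the attributes listed on the two slides. It is not the PDT data format: dots in attribute names are replaced by underscores because Python identifiers cannot contain dots, the defaults are arbitrary, and the example ids are invented. The helper `split_functor` assumes the dotted notation `TWHEN.before` seen on the slide.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TNode:
    id: str
    t_lemma: str
    functor: str
    subfunctor: Optional[str] = None
    nodetype: str = "complex"
    tfa: Optional[str] = None            # "t", "f" or "c"
    deepord: Optional[int] = None
    val_frame_rf: Optional[str] = None   # reference into the valency lexicon
    a_lex_rf: Optional[str] = None       # "main" analytical node
    a_aux_rf: list = field(default_factory=list)       # other analytical nodes
    coref_gram_rf: list = field(default_factory=list)  # ids of antecedent nodes
    is_generated: bool = False
    gram: dict = field(default_factory=dict)           # grammatemes, e.g. tense

def split_functor(label):
    """Split 'TWHEN.before' into ('TWHEN', 'before'); plain functors get None."""
    functor, _, sub = label.partition(".")
    return functor, (sub or None)

functor, sub = split_functor("TWHEN.before")
node = TNode(id="t-example-1", t_lemma="finish", functor=functor, subfunctor=sub,
             tfa="f", deepord=3, a_lex_rf="a-example-7", gram={"sempos": "v"})
print(node.functor, node.subfunctor)
```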

Page 22: PDT: Tectogrammatical Representation


Fully Annotated Sentence

He spends his days sketching passers-by, or trying to.

Page 23: PDT: Tectogrammatical Representation


Definition of Valency

Ability (“desire”) of words (verbs, nouns, adjectives) to combine themselves with other units of meaning

Properties of valency:
  Specific for every word meaning (in general)
    leave: sb left sth for sb vs. sb left from somewhere
    same as in PropBank: leave.02 vs. leave.01
  Typically strongly correlates with surface form
    morphological case (~ ending), preposition+case, ...
  Semantic constraints are very dangerous

Page 24: PDT: Tectogrammatical Representation


Structure of Valency

Schema:
  word (lemma)
    word sense group 1
      valency frame: slot1 slot2 slot3
        surface expression
    word sense group 2
      ...

PDT VALLEX (Cz), EngVallex (En)

Example:
  vyměnit (to replace)
    vyměnit1
      ACT   PAT   EFF
      Nom.  Acc.  za+Acc.
    vyměnit2
      ...
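The nesting lemma, sense, frame, slot, surface expression maps naturally onto a small data structure. The sketch below is one possible in-memory shape, not the actual PDT-VALLEX format; the class and method names are hypothetical, and the pairing of ACT with Nom., PAT with Acc. and EFF with za+Acc. follows the column alignment of the slide's example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    functor: str
    surface: str   # surface expression, e.g. a case or preposition+case

@dataclass(frozen=True)
class Frame:
    frame_id: str
    slots: tuple

    def surface_of(self, functor):
        """Surface expression required for a functor, or None if the frame lacks it."""
        for slot in self.slots:
            if slot.functor == functor:
                return slot.surface
        return None

# lemma -> list of frames (one per word sense group)
lexicon = {
    "vyměnit": [
        Frame("vyměnit1", (Slot("ACT", "Nom."),
                           Slot("PAT", "Acc."),
                           Slot("EFF", "za+Acc."))),
    ],
}

frame = lexicon["vyměnit"][0]
print(frame.surface_of("EFF"))
```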

Page 25: PDT: Tectogrammatical Representation


PDT-VALLEX Entry

dosáhnout: “to reach”, “to get [sb to do sth]”

[Figure: browser/user-formatted example of the entry]

Page 26: PDT: Tectogrammatical Representation


Corpus <-> Valency Lexicon

Lexicon:
  ENTRY: uzavřít
    vf1: ACT(.1) CPHR({smlouva}.4)
      ex.: u. dohodu (close a contract)
    vf2: ACT(.1) PAT(.4)
      ex.: u. pokoj (close a room, house)

Corpus:
  Sentence 2035: Sentence 15345: Sentence 51042:
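The linear frame notation used in the lexicon entry is regular enough to parse with two small regular expressions. This is a rough sketch under stated assumptions: it handles only the flat `FUNCTOR(form)` shape shown in this entry, it reads a digit after a dot as the required morphological case, and the function name `parse_frame` is hypothetical.

```python
import re

SLOT = re.compile(r"(\w+)\(([^()]*)\)")   # FUNCTOR(form)
CASE = re.compile(r"\.(\d)")              # ".4" -> case 4

def parse_frame(text):
    """Turn 'ACT(.1) PAT(.4)' into a list of slot dictionaries."""
    slots = []
    for functor, form in SLOT.findall(text):
        m = CASE.search(form)
        slots.append({
            "functor": functor,
            "form": form,
            "case": int(m.group(1)) if m else None,
        })
    return slots

vf1 = parse_frame("ACT(.1) CPHR({smlouva}.4)")
vf2 = parse_frame("ACT(.1) PAT(.4)")
print(vf1)
print(vf2)
```

With frames in this form, each corpus sentence can carry a reference to the frame it instantiates, which is the corpus-to-lexicon link the slide depicts.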

Page 27: PDT: Tectogrammatical Representation


Valency & Form: Constraints

Tree structure:
  n1
    n2
    n3
      n4

(Sets of) Constraints:
  n1: lemma=uvažovat mode=active
  n2: case=Nom afun=Sb
  n3: lemma=o afun=AuxP
  n4: case=Loc afun=Obj
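A constraint tree of this kind can be matched against an analytical subtree with a short recursive function. The sketch below is an illustration only: the dictionary layout and the name `matches` are hypothetical, n4 is assumed to hang under the preposition node n3, the sample sentence is invented, and the matcher does not enforce that two pattern children bind to different tree children.

```python
def matches(pattern, node):
    """True if node meets the pattern's constraints and every pattern child
    is matched by some child of node."""
    if any(node.get(key) != value for key, value in pattern["constraints"].items()):
        return False
    return all(
        any(matches(p_child, n_child) for n_child in node.get("children", []))
        for p_child in pattern.get("children", [])
    )

# Constraint tree from the slide.
n4 = {"constraints": {"case": "Loc", "afun": "Obj"}}
n3 = {"constraints": {"lemma": "o", "afun": "AuxP"}, "children": [n4]}
n2 = {"constraints": {"case": "Nom", "afun": "Sb"}}
n1 = {"constraints": {"lemma": "uvažovat", "mode": "active"}, "children": [n2, n3]}

# Invented analytical tree for a sentence like "Petr uvažuje o změně."
tree = {
    "lemma": "uvažovat", "mode": "active", "afun": "Pred",
    "children": [
        {"lemma": "Petr", "case": "Nom", "afun": "Sb"},
        {"lemma": "o", "afun": "AuxP",
         "children": [{"lemma": "změna", "case": "Loc", "afun": "Obj"}]},
    ],
}

print(matches(n1, tree))
```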

Page 28: PDT: Tectogrammatical Representation


Example: Valency & Form

1:2 relative clause

to_say: ACT EFF

lemma=say mode=active
  afun=Sb case=Nom
  afun=AuxC lemma=that
    afun=Obj POS=verb

• linear representation: EFF(that[.v])

Page 29: PDT: Tectogrammatical Representation


Valency and Text Generation

Using valency for...
  ...getting the correct (lemma, tag) of verb arguments

Example:

Tectogrammatical tree:
  starat_se  PRED  (“to take care of”)
    Martin   ACT
    tygr     PAT   (“tiger”)

VALLEX entry: starat (se) ACT(.1) PAT(o.[.4])

Generated (lemma, tag) pairs:
  Martin   ....1..........
  se       ...............
  starat   V..............
  o        ...............
  tygr     ....4..........

Martin se stará o tygry.

“Martin takes care of tigers.”
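The generation step on this slide can be approximated in a few lines: the valency frame supplies the preposition and case for each argument, and these are written into an underspecified 15-position tag. The sketch below makes several assumptions that are not in the slides: the frame is hand-coded as (preposition, case) pairs rather than parsed, the output order (actor, reflexive, verb, other arguments) is fixed by the code, and the names `tag` and `realize` are hypothetical.

```python
def tag(pos=".", case=None):
    """15-position tag with only POS (position 1) and case (position 5) filled in."""
    positions = ["."] * 15
    positions[0] = pos
    if case is not None:
        positions[4] = str(case)
    return "".join(positions)

# Hand-coded from the entry: starat (se) ACT(.1) PAT(o.[.4])
FRAME = {"ACT": (None, 1), "PAT": ("o", 4)}

def realize(verb, reflexive, args, frame):
    """Return (lemma, tag) pairs for the verb and its arguments."""
    def argument(functor):
        prep, case = frame[functor]
        items = [(prep, tag())] if prep else []
        return items + [(args[functor], tag(case=case))]

    out = argument("ACT")
    if reflexive:
        out.append((reflexive, tag()))
    out.append((verb, tag(pos="V")))
    for functor in args:
        if functor != "ACT":
            out.extend(argument(functor))
    return out

result = realize("starat", "se", {"ACT": "Martin", "PAT": "tygr"}, FRAME)
for lemma, t in result:
    print(lemma, t)
```

A morphological generator would then fill in the remaining tag positions and produce the inflected forms.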